pime: A package for discovery of novel differences among microbial communities.

Luiz Fernando Würdig Roesch, Priscila Thiago Dobbler, Victor Satler Pylro, Bryan Kolaczkowski, Jennifer C. Drew, Eric W. Triplett

Published in: Molecular Ecology Resources (2019)

The data used for profiling microbial communities are usually sparse, with some microbes having high abundance in a few samples and being nearly absent in others. However, current bioinformatics tools able to deal with this sparsity are lacking. pime (Prevalence Interval for Microbiome Evaluation) was designed to remove those taxa that may be high in relative abundance in just a few samples but have a low prevalence overall. The reliability and robustness of pime were compared against existing methods and tested using independent 16S rRNA data sets. pime filters out microbial taxa that are not shared within a per-treatment prevalence interval, starting at 5% prevalence and increasing in 5% increments at each filtering step. For each prevalence interval, hundreds of decision trees were calculated to predict the likelihood of detecting differences between treatments. The user selected the best prevalence-filtered data set by choosing the prevalence interval that retained a large portion of the 16S rRNA sequences while also showing the lowest error rate. To estimate the likelihood of introducing type I error while building prevalence-filtered data sets, an error-detection step was also included. A pime reanalysis of published data sets uncovered expected microbial associations beyond those previously reported, which may be masked when only relative abundance is considered.
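The per-treatment prevalence filtering and the "fraction of sequences retained" criterion described in the abstract can be illustrated with a minimal Python sketch. This is not pime's code (pime is an R package and its internals are not shown here). The data layout (`counts[sample][taxon]`, `groups[sample]`), the function names `prevalence_filter`, `retained_fraction` and `scan_intervals`, the rule that a taxon's reads are kept in a treatment only if it reaches the threshold within that treatment, the use of `>=` at the cutoff, and the 95% upper bound are all assumptions made for illustration. The decision-tree error rate used for selecting an interval is not modeled.

```python
from typing import Dict, List, Tuple

# Hypothetical layout (not pime's): counts[sample][taxon] = read count.
Counts = Dict[str, Dict[str, int]]


def prevalence_filter(counts: Counts, groups: Dict[str, str], threshold_pct: int) -> Counts:
    """Within each treatment, keep only taxa present (count > 0) in at least
    threshold_pct percent of that treatment's samples (assumed rule)."""
    by_group: Dict[str, List[str]] = {}
    for sample, label in groups.items():
        by_group.setdefault(label, []).append(sample)

    filtered: Counts = {}
    for samples in by_group.values():
        taxa = {t for s in samples for t in counts[s]}
        kept = set()
        for taxon in taxa:
            present = sum(1 for s in samples if counts[s].get(taxon, 0) > 0)
            # Integer comparison avoids floating-point rounding at the cutoff.
            if present * 100 >= threshold_pct * len(samples):
                kept.add(taxon)
        for s in samples:
            filtered[s] = {t: c for t, c in counts[s].items() if t in kept}
    return filtered


def retained_fraction(original: Counts, filtered: Counts) -> float:
    """Share of all reads that survive filtering."""
    total = sum(sum(row.values()) for row in original.values())
    kept = sum(sum(row.values()) for row in filtered.values())
    return kept / total if total else 0.0


def scan_intervals(counts: Counts, groups: Dict[str, str],
                   start: int = 5, stop: int = 95, step: int = 5
                   ) -> Dict[int, Tuple[Counts, float]]:
    """Filter at 5%, 10%, ... and report the fraction of reads kept at each step."""
    results = {}
    for pct in range(start, stop + 1, step):
        table = prevalence_filter(counts, groups, pct)
        results[pct] = (table, retained_fraction(counts, table))
    return results
```

For example, with two treatments of two samples each, a taxon seen in only one of a treatment's two samples (50% prevalence) survives the 50% step but is dropped from that treatment from 55% onward. In the workflow the abstract describes, each filtered table would then be passed to a decision-tree classifier, and the user would weigh its error rate against the retained fraction when choosing an interval.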
