Contamination source modeling with SCRuB improves cancer phenotype prediction from microbiome data.
George I Austin, Heekuk Park, Yoli Meydan, Dwayne Seeram, Tanya Sezin, Yue Clare Lou, Brian A Firek, Michael J Morowitz, Jillian F Banfield, Angela M Christiano, Itsik Pe'er, Anne-Catrin Uhlemann, Liat Shenhav, Tal Korem
Published in: Nature Biotechnology (2023)
Sequencing-based approaches for the analysis of microbial communities are susceptible to contamination, which could mask biological signals or generate artifactual ones. Methods for in silico decontamination using controls are routinely used, but do not make optimal use of information shared across samples and cannot handle taxa that only partially originate in contamination or leakage of biological material into controls. Here we present Source tracking for Contamination Removal in microBiomes (SCRuB), a probabilistic in silico decontamination method that incorporates shared information across multiple samples and controls to precisely identify and remove contamination. We validate the accuracy of SCRuB in multiple data-driven simulations and experiments, including induced contamination, and demonstrate that it outperforms state-of-the-art methods by an average of 15-20 times. We showcase the robustness of SCRuB across multiple ecosystems, data types and sequencing depths. Demonstrating its applicability to microbiome research, SCRuB facilitates improved predictions of host phenotypes, most notably the prediction of treatment response in melanoma patients using decontaminated tumor microbiome data.
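To make concrete why taxa that only partially originate in contamination are hard to handle, here is a toy sketch in Python. It is not the SCRuB algorithm: it only models an observed sample as a mixture of a true community and a contaminant profile, and compares a naive "drop every taxon seen in the control" filter with a mixture-aware subtraction. All function names, compositions, and the contamination fraction `p` are hypothetical, and `p` is assumed to be known, whereas a real method would have to estimate it from samples and controls.

```python
# Toy illustration only; NOT the SCRuB algorithm.
# All names and numbers below are hypothetical.

def mix(true_comp, contaminant, p):
    """Expected observed composition when a fraction p of reads is contamination."""
    return [(1 - p) * t + p * c for t, c in zip(true_comp, contaminant)]

def drop_control_taxa(observed, contaminant):
    """Naive filter: zero out every taxon present in the control, then renormalize."""
    kept = [0.0 if c > 0 else o for o, c in zip(observed, contaminant)]
    total = sum(kept)
    return [k / total for k in kept]

def subtract_mixture(observed, contaminant, p):
    """Mixture-aware removal: subtract the contaminant share, clip at 0, renormalize."""
    kept = [max(o - p * c, 0.0) for o, c in zip(observed, contaminant)]
    total = sum(kept)
    return [k / total for k in kept]

# Relative abundances of four taxa: A, B, C, D.
true_comp = [0.5, 0.3, 0.2, 0.0]    # C is genuinely present in the sample
contaminant = [0.0, 0.0, 0.5, 0.5]  # C also appears in the contaminant; D is contaminant-only
p = 0.2                             # assumed-known contamination fraction

observed = mix(true_comp, contaminant, p)
naive = drop_control_taxa(observed, contaminant)
aware = subtract_mixture(observed, contaminant, p)

print("observed:", [round(x, 3) for x in observed])
print("naive   :", [round(x, 3) for x in naive])
print("aware   :", [round(x, 3) for x in aware])
```

In this toy setting the naive filter discards taxon C entirely and inflates A and B, while the mixture-aware subtraction recovers the original composition. That recovery depends on `p` and the contaminant profile being known exactly, which is the part a probabilistic method has to infer.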
Keyphrases
- risk assessment
- drinking water
- health risk
- human health
- electronic health record
- end stage renal disease
- big data
- newly diagnosed
- single cell
- ejection fraction
- chronic kidney disease
- heavy metals
- healthcare
- climate change
- machine learning
- peritoneal dialysis
- molecular dynamics
- data analysis
- oxidative stress
- diabetic rats
- deep learning
- patient reported
- drug induced
- childhood cancer
- skin cancer