To assemble or not to assemble: metagenomic profiling of microbially mediated biogeochemical pathways in complex communities.
Jiayin Zhou, Wen Song, Qichao Tu. Published in: Briefings in Bioinformatics (2023)
High-throughput profiling of microbial functional traits involved in various biogeochemical cycling pathways using shotgun metagenomic sequencing is routinely applied in microbial ecology and environmental science. Multiple bioinformatics data processing approaches are available, including assembly-based (single-sample assembly and multi-sample assembly) and read-based (merged reads and raw data) approaches. However, it remains unclear how these approaches differ in data analysis and how they affect the interpretation of results. In this study, using two typical shotgun metagenome datasets recovered from geographically distant coastal sediments, the performance of the different data processing approaches was compared from both technical and biological/ecological perspectives. Microbially mediated biogeochemical cycling pathways, including nitrogen cycling, sulfur cycling and B12 biosynthesis, were analyzed. Multi-sample assembly provided the largest amount of usable information for the targeted functional traits, but at a high cost in computational resources and running time. Single-sample assembly and read-based analysis yielded comparable amounts of usable information, but the former was much more time- and resource-consuming. Critically, the choice of approach introduced much stronger variation in microbial profiles than biological differences did. Nevertheless, community-level differences between the two sampling sites were consistently observed regardless of the approach used. In choosing an appropriate approach, researchers should balance the trade-offs among multiple factors, including the scientific question, the amount of usable information, computational resources and time cost. This study is expected to provide valuable technical insights and guidelines for the various approaches used in metagenomic data analysis.
Keyphrases
- data analysis
- high intensity
- electronic health record
- high throughput
- big data
- single cell
- microbial community
- antibiotic resistance genes
- heavy metals
- healthcare
- climate change
- genome wide
- health information
- risk assessment
- single molecule
- lymph node
- machine learning
- dna methylation
- cancer therapy
- drug delivery
- wastewater treatment
- polycyclic aromatic hydrocarbons
- water quality