Critical Assessment of Short-Read Assemblers for the Metagenomic Identification of Foodborne and Waterborne Pathogens Using Simulated Bacterial Communities.
Zhao ChenJianghong MengPublished in: Microorganisms (2022)
Metagenomics offers the highest level of strain discrimination of bacterial pathogens from complex food and water microbiota. With the rapid evolvement of assembly algorithms, defining an optimal assembler based on the performance in the metagenomic identification of foodborne and waterborne pathogens is warranted. We aimed to benchmark short-read assemblers for the metagenomic identification of foodborne and waterborne pathogens using simulated bacterial communities. Bacterial communities on fresh spinach and in surface water were simulated by generating paired-end short reads of Illumina HiSeq, MiSeq, and NovaSeq at different sequencing depths. Multidrug-resistant Salmonella Indiana SI43 and Pseudomonas aeruginosa PAO1 were included in the simulated communities on fresh spinach and in surface water, respectively. ABySS, IDBA-UD, MaSuRCA, MEGAHIT, metaSPAdes, and Ray Meta were benchmarked in terms of assembly quality, identifications of plasmids, virulence genes, Salmonella pathogenicity island, antimicrobial resistance genes, chromosomal point mutations, serotyping, multilocus sequence typing, and whole-genome phylogeny. Overall, MEGHIT, metaSPAdes, and Ray Meta were more effective for metagenomic identification. We did not obtain an optimal assembler when using the extracted reads classified as Salmonella or P. aeruginosa for downstream genomic analyses, but the extracted reads showed consistent phylogenetic topology with the reference genome when they were aligned with Salmonella or P. aeruginosa strains. In most cases, HiSeq, MiSeq, and NovaSeq were comparable at the same sequencing depth, while higher sequencing depths generally led to more accurate results. As assembly algorithms advance and mature, the evaluation of assemblers should be a continuous process.
Keyphrases
- antimicrobial resistance
- escherichia coli
- bioinformatics analysis
- gram negative
- multidrug resistant
- pseudomonas aeruginosa
- listeria monocytogenes
- machine learning
- biofilm formation
- single cell
- antibiotic resistance genes
- klebsiella pneumoniae
- acinetobacter baumannii
- deep learning
- single molecule
- copy number
- staphylococcus aureus
- dna methylation
- transcription factor
- high resolution
- quality improvement