Benchmarking Bioinformatic Tools for Amplicon-Based Sequencing of Norovirus.

Amy H Fitzpatrick, Agnieszka Rupnik, Helen O'Shea, Fiona Crispie, Sinéad Keaveney, Paul D Cotter
Published in: Applied and Environmental Microbiology (2022)
In order to survey noroviruses in our environment, it is essential that both wet-lab and computational methods are fit for purpose. Using a simulated sequencing data set, denoising-based (DADA2, Deblur and USEARCH-UNOISE3) and clustering-based pipelines (VSEARCH and FROGS) were compared with respect to their ability to represent composition and sequence information. Open source classifiers (Ribosomal Database Project [RDP], BLASTn, IDTAXA, QIIME2 naive Bayes, and SINTAX) were trained using three different databases: a custom database, the NoroNet database, and the Human calicivirus database. Each classifier and database combination was compared from the perspective of their classification accuracy. VSEARCH provides a robust option for analyzing viral amplicons based on composition analysis; however, all pipelines could return OTUs with high similarity to the expected sequences. Importantly, pipeline choice could lead to more false positives (DADA2) or underclassification (FROGS), a key aspect when considering pipeline application for source attribution. Classification was more strongly impacted by the classifier than the database, although disagreement increased with norovirus GII.4 capsid variant designation. We recommend the use of the RDP classifier in conjunction with VSEARCH; however, maintenance of the underlying database is essential for optimal use.

IMPORTANCE In benchmarking bioinformatic pipelines for analyzing high-throughput sequencing (HTS) data sets, we provide method standardization for bioinformatics broadly and specifically for norovirus in situations for which no officially endorsed methods exist at present. This study provides recommendations for the appropriate analysis and classification of norovirus amplicon HTS data and will be widely applicable during outbreak investigations.

Keyphrases
  • adverse drug
  • machine learning
  • electronic health record
  • deep learning
  • big data
  • single cell
  • endothelial cells
  • sars cov
  • high throughput sequencing
  • decision making
  • induced pluripotent stem cells
  • amino acid