Critical Assessment of Metagenome Interpretation: a benchmark of metagenomics software.

Alexander Sczyrba, Peter Hofmann, Peter Belmann, David Koslicki, Stefan Janssen, Johannes Dröge, Ivan Gregor, Stephan Majda, Jessika Fiedler, Eik Dahms, Andreas Bremges, Adrian Fritz, Ruben Garrido-Oter, Tue Sparholt Jørgensen, Nicole Shapiro, Philip D Blood, Alexey Gurevich, Yang Bai, Dmitrij Turaev, Matthew Z DeMaere, Rayan Chikhi, Niranjan Nagarajan, Christopher Quince, Fernando Meyer, Monika Balvočiūtė, Lars Hestbjerg Hansen, Søren J Sørensen, Burton K H Chia, Bertrand Denis, Jeff L Froula, Zhong Wang, Robert Egan, Dongwan Don Kang, Jeffrey J Cook, Charles Deltel, Michael Beckstette, Claire Lemaitre, Pierre Peterlongo, Guillaume Rizk, Dominique Lavenier, Yu-Wei Wu, Steven W Singer, Chirag Jain, Marc Strous, Heiner Klingenberg, Peter Meinicke, Michael D Barton, Thomas Lingner, Hsin-Hung Lin, Yu-Chieh Liao, Genivaldo Gueiros Z Silva, Daniel A Cuevas, Robert A Edwards, Surya Saha, Vitor C Piro, Bernhard Y Renard, Mihai Pop, Hans-Peter Klenk, Markus Göker, Nikos C Kyrpides, Tanja Woyke, Julia A Vorholt, Paul Schulze-Lefert, Edward M Rubin, Aaron E Darling, Thomas Rattei, Alice Carolyn McHardy
Published in: Nature Methods (2017)
Methods for assembly, taxonomic profiling and binning are key to interpreting metagenome data, but a lack of consensus about benchmarking complicates performance assessment. The Critical Assessment of Metagenome Interpretation (CAMI) challenge has engaged the global developer community to benchmark their programs on highly complex and realistic data sets, generated from ∼700 newly sequenced microorganisms and ∼600 novel viruses and plasmids and representing common experimental setups. Assembly and genome binning programs performed well for species represented by individual genomes but were substantially affected by the presence of related strains. Taxonomic profiling and binning programs were proficient at high taxonomic ranks, with a notable performance decrease below family level. Parameter settings markedly affected performance, underscoring their importance for program reproducibility. The CAMI results highlight current challenges but also provide a roadmap for software selection to answer specific research questions.
Keyphrases
  • public health
  • Escherichia coli
  • electronic health record
  • data analysis
  • healthcare
  • single cell
  • big data
  • mental health
  • DNA methylation
  • genome wide