Performance evaluation of six popular short-read simulators.

Mark Milhaven, Susanne P. Pfeifer
Published in: Heredity (2022)
High-throughput sequencing data enables the comprehensive study of genomes and the variation therein. Essential for the interpretation of this genomic data is a thorough understanding of the computational methods used for processing and analysis. Whereas "gold-standard" empirical datasets exist for this purpose in humans, synthetic (i.e., simulated) sequencing data can offer important insights into the capabilities and limitations of computational pipelines for any arbitrary species and/or study design. Yet the ability of read simulator software to emulate the genomic characteristics of empirical datasets remains poorly understood. Here, we compare the performance of six popular short-read simulators (ART, DWGSIM, InSilicoSeq, Mason, NEAT, and wgsim) and discuss important considerations for selecting suitable models for benchmarking.
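The following is a minimal sketch of what a short-read simulator does. It assumes single-end reads, uniformly sampled start positions, and a uniform per-base substitution error rate, which corresponds to the simplest kind of error model. It is not the algorithm of any of the six tools compared. The functions `simulate_reads` and `observed_error_rate` and their parameters are hypothetical names chosen for illustration. Because each read's true origin is recorded, the sketch also shows why simulated data provide a known ground truth against which processing pipelines can be benchmarked.

```python
import random


def simulate_reads(reference, n_reads, read_len, error_rate, seed=0):
    """Sample single-end reads uniformly from a reference sequence.

    Hypothetical illustration only: real simulators additionally model
    strand, paired-end insert sizes, indels, and position-specific
    base-quality and error profiles.
    Returns a list of (true_start, read_sequence) tuples.
    """
    if read_len > len(reference):
        raise ValueError("read_len must not exceed the reference length")
    rng = random.Random(seed)  # seeded so that runs are reproducible
    bases = "ACGT"
    reads = []
    for _ in range(n_reads):
        # Choose a start position so the whole read fits inside the reference.
        start = rng.randint(0, len(reference) - read_len)
        read = list(reference[start:start + read_len])
        for i, base in enumerate(read):
            # Substitute each base independently with probability error_rate,
            # always replacing it with a different nucleotide.
            if rng.random() < error_rate:
                read[i] = rng.choice([b for b in bases if b != base])
        reads.append((start, "".join(read)))
    return reads


def observed_error_rate(reference, reads):
    """Fraction of read bases that differ from the reference at their true origin."""
    mismatches = total = 0
    for start, seq in reads:
        truth = reference[start:start + len(seq)]
        mismatches += sum(a != b for a, b in zip(seq, truth))
        total += len(seq)
    return mismatches / total if total else 0.0


if __name__ == "__main__":
    ref_rng = random.Random(42)
    reference = "".join(ref_rng.choice("ACGT") for _ in range(10_000))
    reads = simulate_reads(reference, n_reads=1_000, read_len=100, error_rate=0.01)
    print(f"observed substitution rate: {observed_error_rate(reference, reads):.4f}")
```

Comparing the observed error rate, or any other read property, against the value a simulator was configured with (and against empirical data) is the same kind of check, in miniature, that a benchmarking study of read simulators performs at scale.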