Assembly of long, error-prone reads using repeat graphs.

Mikhail Kolmogorov, Jeffrey Yuan, Yu Lin, Pavel A Pevzner
Published in: Nature biotechnology (2019)
Accurate genome assembly is hampered by repetitive regions. Although long single molecule sequencing reads are better able to resolve genomic repeats than short-read data, most long-read assembly algorithms do not provide the repeat characterization necessary for producing optimal assemblies. Here, we present Flye, a long-read assembly algorithm that generates arbitrary paths in an unknown repeat graph, called disjointigs, and constructs an accurate repeat graph from these error-riddled disjointigs. We benchmark Flye against five state-of-the-art assemblers and show that it generates better or comparable assemblies, while being an order of magnitude faster. Flye nearly doubled the contiguity of the human genome assembly (as measured by the NGA50 assembly quality metric) compared with existing assemblers.
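The NGA50 metric mentioned in the abstract can be illustrated with a small sketch. It assumes the usual definition used by assembly evaluation tools: contigs are first split at misassemblies and trimmed to the parts that align to the reference, and NGA50 is the length of the shortest aligned block in the smallest set of longest blocks that together cover at least half of the reference genome. The function name `nga50` and its inputs are hypothetical; the sketch takes the aligned block lengths as already computed and is not taken from Flye or from the paper.

```python
def nga50(aligned_block_lengths, reference_length):
    """Return NGA50 for a list of aligned block lengths (hypothetical helper).

    Blocks are assumed to be contigs already broken at misassemblies and
    trimmed to their aligned portions. Returns None if the blocks cover
    less than half of the reference.
    """
    covered = 0
    # Walk the blocks from longest to shortest, accumulating covered bases.
    for length in sorted(aligned_block_lengths, reverse=True):
        covered += length
        # Stop at the first block that brings coverage to >= 50% of the reference.
        if 2 * covered >= reference_length:
            return length
    return None


if __name__ == "__main__":
    # Toy numbers for illustration only, not data from the paper.
    print(nga50([50, 40, 30, 20, 10], reference_length=200))
```

Because the threshold is tied to the reference length rather than to the total assembly size, a fragmented or incomplete assembly cannot raise its score by simply leaving out hard regions, which is why the metric is used to compare contiguity across assemblers.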
Keyphrases
  • single molecule
  • machine learning
  • deep learning
  • living cells
  • atomic force microscopy
  • genome wide
  • high frequency
  • gene expression
  • electronic health record
  • dna methylation
  • single cell
  • copy number
  • high speed