Chromosome-scale assemblies of plant genomes using nanopore long reads and optical maps.

Caroline Belser, Benjamin Istace, Erwan Denis, Marion Dubarry, Franc-Christophe Baurens, Cyril Falentin, Mathieu Genete, Wahiba Berrabah, Anne-Marie Chèvre, Régine Delourme, Gwenaëlle Deniot, France Denoeud, Philippe Duffé, Stefan Engelen, Arnaud Lemainque, Maria Manzanares-Dauleux, Guillaume Martin, Jérôme Morice, Benjamin Noel, Xavier Vekemans, Angélique D'Hont, Mathieu Rousseau-Gueutin, Valérie Barbe, Corinne Cruaud, Patrick Wincker, Jean-Marc Aury
Published in: Nature Plants (2018)
Plant genomes are often highly repetitive and polyploid, which makes them challenging to assemble. The introduction of short-read technologies 10 years ago substantially increased the number of available plant genomes, but these assemblies are generally incomplete and fragmented, and only a few are at the chromosome scale. More recently, Pacific Biosciences and Oxford Nanopore commercialized sequencing technologies that can read long DNA fragments (kilobases to megabases) and, combined with efficient algorithms, provide high-quality assemblies in terms of contiguity and completeness of repetitive regions [1-4]. However, even though genome assemblies based on long reads exhibit high contig N50s (>1 Mb), long reads alone are still insufficient to decipher genome organization at the chromosome level. Here, we describe a strategy based on long reads (MinION or PromethION sequencers) and optical maps (Saphyr system) that can produce chromosome-level assemblies, and we demonstrate its applicability by generating high-quality genome sequences for two new dicotyledon morphotypes, Brassica rapa Z1 (yellow sarson) and Brassica oleracea HDEM (broccoli), and one new monocotyledon, Musa schizocarpa (banana). All three assemblies show contig N50s of >5 Mb and contain scaffolds that represent entire chromosomes or chromosome arms.
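The abstract reports contiguity as contig N50, so a small worked example may help readers unfamiliar with the metric. The sketch below uses the common definition: the length of the contig at which the cumulative length, counted from the longest contig downward, first reaches half of the total assembly length. The function name `n50` and the toy contig lengths are illustrative assumptions, not data or code from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    Contigs are sorted from longest to shortest; N50 is the length of the
    contig at which the running total first reaches half the assembly size.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (lengths in bp), not taken from the paper.
toy_contigs = [8_000_000, 6_000_000, 5_000_000, 2_000_000, 1_000_000]
toy_n50 = n50(toy_contigs)
print(f"N50 = {toy_n50:,} bp")
```

In this toy case the total is 22 Mb, and the two longest contigs together (14 Mb) already exceed half of it, so the N50 is the length of the second contig. A ">5 Mb contig N50" therefore means that at least half of the assembled sequence lies in contigs of 5 Mb or longer.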
Keyphrases
  • single molecule
  • copy number
  • machine learning
  • high resolution
  • high speed
  • circulating tumor cells
  • arabidopsis thaliana
  • genome wide identification