HapSolo: an optimization approach for removing secondary haplotigs during diploid genome assembly and scaffolding.

Edwin A Solares, Yuan Tao, Anthony D Long, Brandon S Gaut
Published in: BMC bioinformatics (2021)
HapSolo rapidly identified candidate assemblies that yielded improvements in assembly metrics, including decreased genome size and improved N50 scores. Contig N50 scores improved by 35%, 9% and 9% for Chardonnay, mosquito and the thorny skate, respectively, relative to unreduced primary assemblies. The benefits of HapSolo were amplified by downstream analyses, which we illustrated by scaffolding with Hi-C data. We found, for example, that prior to the application of HapSolo, only 52% of the Chardonnay genome was captured in the largest 19 scaffolds, corresponding to the number of chromosomes. After the application of HapSolo, this value increased to ~84%. For the mosquito, the share of the genome captured in the largest three scaffolds, corresponding to the number of chromosomes, improved from 61% to 86%, and the improvement was even more pronounced for thorny skate. We compared the scaffolding results to assemblies that were based on PurgeDups for identifying secondary contigs, with generally superior results for HapSolo.
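
The two metrics quoted in the abstract, contig N50 and the share of the genome captured in the largest k scaffolds, can be computed from a list of sequence lengths. The following is a minimal sketch, not HapSolo's own code: the function names `n50` and `fraction_in_largest` and the toy lengths are hypothetical, chosen only for illustration.

```python
def n50(lengths):
    """Return the N50 of a set of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly size.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


def fraction_in_largest(lengths, k):
    """Return the fraction of total assembly size held by the k longest sequences."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    if total == 0:
        raise ValueError("total assembly size must be positive")
    return sum(ordered[:k]) / total


# Toy lengths (hypothetical, arbitrary units), not data from the paper.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_lengths)                       # 70: 80 + 70 reaches half of 300
toy_top3 = fraction_in_largest(toy_lengths, 3)   # 200 / 300
```

With k set to the chromosome number (19 for Chardonnay, 3 for mosquito), `fraction_in_largest` corresponds to the kind of percentage reported above, assuming scaffold lengths are read from the assembly FASTA.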
Keyphrases
  • genome wide
  • aedes aegypti
  • dengue virus
  • tissue engineering
  • electronic health record
  • dna methylation
  • zika virus
  • big data