A second unveiling: Haplotig masking of the eastern oyster genome improves population-level inference.
Jonathan B PuritzXiming GuoMatthew P HareYan HeLaDeana W HillierShubo JinMing LiuKathleen E LotterhosPat MinxTejashree ModakDina ProestouEdward S RiceChad TomlinsonWesley C WarrenErin WitkopHonggang ZhaoMarta Gomez-ChiarriPublished in: Molecular ecology resources (2023)
Genome assembly can be challenging for species that are characterized by high amounts of polymorphism, heterozygosity, and large effective population sizes. High levels of heterozygosity can result in genome mis-assemblies and a larger than expected genome size due to the haplotig versions of a single locus being assembled as separate loci. Here, we describe the first chromosome-level genome for the eastern oyster, Crassostrea virginica. Publicly released and annotated in 2017, the assembly has a scaffold N50 of 54 mb and is over 97.3% complete based on BUSCO analysis. The genome assembly for the eastern oyster is a critical resource for foundational research into molluscan adaptation to a changing environment and for selective breeding for the aquaculture industry. Subsequent resequencing data suggested the presence of haplotigs in the original assembly, and we developed a post hoc method to break up chimeric contigs and mask haplotigs in published heterozygous genomes and evaluated improvements to the accuracy of downstream analysis. Masking haplotigs had a large impact on SNP discovery and estimates of nucleotide diversity and had more subtle and nuanced effects on estimates of heterozygosity, population structure analysis, and outlier detection. We show that haplotig masking can be a powerful tool for improving genomic inference, and we present an open, reproducible resource for the masking of haplotigs in any published genome.