SpecHap: a diploid phasing algorithm based on spectral graph theory.
Yonghan Yu, Lingxi Chen, Xinyao Miao, Shuai Cheng Li
Published in: Nucleic Acids Research (2021)
Haplotype phasing plays an important role in understanding the genetic data of diploid eukaryotic organisms. Different sequencing technologies, such as next-generation and third-generation sequencing, produce different kinds of genetic data that require haplotype assembly. Although multiple diploid haplotype phasing algorithms exist, only a few work equally well across all sequencing technologies. In this work, we propose SpecHap, a novel haplotype assembly tool that leverages spectral graph theory. On both in silico and whole-genome sequencing datasets, SpecHap consumed less memory and required less CPU time, yet achieved accuracy comparable to state-of-the-art methods across all test instances, which comprise sequencing data from next-generation sequencing, linked-reads, high-throughput chromosome conformation capture, PacBio single-molecule real-time, and Oxford Nanopore long-reads. Furthermore, SpecHap successfully phased an individual Ambystoma mexicanum, a species with a gigantic diploid genome, within 6 CPU hours and with 945 MB peak memory usage, while other tools failed to yield results because they either ran out of memory (40 GB) or exceeded the time limit (5 days). Our results demonstrate that SpecHap is scalable, efficient, and accurate for diploid phasing across many sequencing platforms.
Keyphrases
- single cell
- single molecule
- copy number
- high throughput
- rna seq
- electronic health record
- machine learning
- working memory
- big data
- optical coherence tomography
- genome wide
- magnetic resonance imaging
- atomic force microscopy
- neural network
- convolutional neural network
- magnetic resonance
- high resolution
- hiv infected
- molecular dynamics simulations
- antiretroviral therapy
- dna methylation
- molecular docking
- computed tomography
- cell free