Performance difference of graph-based and alignment-based hybrid error correction methods for error-prone long reads.

Anqi Wang, Kin Fai Au
Published in: Genome Biology (2020)
Error-prone third-generation sequencing (TGS) long reads can be corrected using high-quality second-generation sequencing (SGS) short reads, an approach referred to as hybrid error correction. Here we investigate the influence of the principal algorithmic factors in the two major types of hybrid error correction methods, graph-based and alignment-based, through mathematical modeling and analysis of both simulated and real data. Our study reveals the distribution of accuracy gain with respect to the original long read error rate. We also demonstrate that an original error rate of 19% is the limit for perfect correction, beyond which long reads are too error-prone to be corrected by these methods.
Keyphrases
  • single cell
  • machine learning
  • big data