CoLoRd: compressing long reads.

Marek Kokot, Adam Gudyś, Heng Li, Sebastian Deorowicz
Published in: Nature Methods (2022)
The cost of maintaining exabytes of data produced by sequencing experiments every year has become a major issue in today's genomic research. In spite of the increasing popularity of third-generation sequencing, the existing algorithms for compressing long reads exhibit only a minor advantage over general-purpose gzip. We present CoLoRd, an algorithm able to reduce the size of third-generation sequencing data by an order of magnitude without affecting the accuracy of downstream analyses.
Keyphrases
  • single cell
  • machine learning
  • electronic health record
  • deep learning
  • big data
  • copy number
  • artificial intelligence
  • data analysis