CoLoRd: compressing long reads.

Marek Kokot, Adam Gudyś, Heng Li, Sebastian Deorowicz

Published in: Nature Methods (2022)

The cost of maintaining exabytes of data produced by sequencing experiments every year has become a major issue in today's genomic research. In spite of the increasing popularity of third-generation sequencing, the existing algorithms for compressing long reads exhibit only a minor advantage over the general-purpose gzip. We present CoLoRd, an algorithm able to reduce the size of third-generation sequencing data by an order of magnitude without affecting the accuracy of downstream analyses.

Keyphrases

single cell
machine learning
electronic health record
deep learning
big data
copy number
artificial intelligence
data analysis