Scalable, ultra-fast, and low-memory construction of compacted de Bruijn graphs with Cuttlefish 2.

Jamshed Khan, Marek Kokot, Sebastian Deorowicz, Rob Patro
Published in: Genome biology (2022)
The de Bruijn graph is a key data structure in modern computational genomics, and construction of its compacted variant sits upstream of many genomic analyses. As the quantity of genomic data grows rapidly, this construction step often becomes a computational bottleneck. We present Cuttlefish 2, which significantly advances the state of the art for this problem. On a commodity server, it reduces the graph construction time for 661K bacterial genomes, totaling 2.58 Tbp, from 4.5 days to 17-23 h. It also constructs the graph for 1.52 Tbp of white spruce reads in approximately 10 h, whereas the closest competitor requires 54-58 h and considerably more memory.
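To make the term "compacted de Bruijn graph" concrete, here is a minimal Python sketch of the idea: collect the distinct k-mers of the input, connect k-mers that overlap by k-1 characters, and merge every maximal non-branching path into a single string (a unitig). This is a naive in-memory illustration, not the Cuttlefish 2 algorithm. The function name `compacted_de_bruijn` is hypothetical, and the sketch assumes a node-centric graph over the forward strand only (no reverse complements) with the alphabet ACGT.

```python
def compacted_de_bruijn(seqs, k):
    """Return the unitigs (maximal non-branching paths) of a toy de Bruijn graph.

    Assumptions (illustrative only): node-centric graph, forward strand only,
    alphabet ACGT, all k-mers held in memory.
    """
    # Vertices: every distinct k-mer occurring in the input sequences.
    kmers = set()
    for s in seqs:
        for i in range(len(s) - k + 1):
            kmers.add(s[i:i + k])

    def succs(x):
        # k-mers whose (k-1)-prefix equals the (k-1)-suffix of x.
        return [x[1:] + c for c in "ACGT" if x[1:] + c in kmers]

    def preds(x):
        # k-mers whose (k-1)-suffix equals the (k-1)-prefix of x.
        return [c + x[:-1] for c in "ACGT" if c + x[:-1] in kmers]

    def starts_unitig(x):
        # x starts a unitig unless it has exactly one predecessor
        # and that predecessor has exactly one successor.
        p = preds(x)
        return len(p) != 1 or len(succs(p[0])) != 1

    def extend(x, seen, require_unique_pred):
        # Walk forward from x while the path does not branch.
        path, cur = x, x
        seen.add(x)
        while True:
            s = succs(cur)
            if len(s) != 1 or s[0] in seen:
                break
            if require_unique_pred and len(preds(s[0])) != 1:
                break
            cur = s[0]
            seen.add(cur)
            path += cur[-1]
        return path

    unitigs, seen = [], set()
    # First pass: unitigs that begin at a branching or source vertex.
    for x in sorted(kmers):
        if starts_unitig(x):
            unitigs.append(extend(x, seen, require_unique_pred=True))
    # Second pass: whatever remains lies on isolated cycles.
    for x in sorted(kmers):
        if x not in seen:
            unitigs.append(extend(x, seen, require_unique_pred=False))
    return unitigs


if __name__ == "__main__":
    # Two toy reads that share the prefix ACGT and then diverge.
    print(compacted_de_bruijn(["ACGTT", "ACGTA"], 3))
```

In this toy input the k-mers ACG and CGT collapse into the single unitig ACGT, while the two diverging k-mers stay separate. Tools built for real data must avoid holding the full k-mer set in memory like this, which is where the time and memory costs quoted above come from.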
Keyphrases
  • convolutional neural network
  • electronic health record
  • working memory
  • copy number
  • neural network
  • big data
  • high resolution
  • single cell
  • mass spectrometry
  • machine learning
  • protein protein
  • high density