Analysis-ready VCF at Biobank scale using Zarr.

Eric A Czech, Timothy R Millar, Tom White, Ben Jeffery, Alistair Miles, Sam Tallman, Rafal Wojdyla, Shadi Zabad, Jeff Hammerbacher, Jerome Kelleher
Published in: bioRxiv: the preprint server for biology (2024)
Large row-encoded VCF files are a major bottleneck for current research, and storing and processing these files incurs a substantial cost. The VCF Zarr specification, building on widely used, open-source technologies, has the potential to greatly reduce these costs, and may enable a diverse ecosystem of next-generation tools for analysing genetic variation data directly from cloud-based object stores.
Keyphrases
  • human health
  • climate change
  • working memory
  • big data
  • risk assessment
  • data analysis
  • deep learning