Analysis-ready VCF at Biobank scale using Zarr.
Eric A Czech, Timothy R Millar, Tom White, Ben Jeffery, Alistair Miles, Sam Tallman, Rafal Wojdyla, Shadi Zabad, Jeff Hammerbacher, Jerome Kelleher
Published in: bioRxiv: the preprint server for biology (2024)
Large row-encoded VCF files are a major bottleneck for current research, and storing and processing these files incurs a substantial cost. The VCF Zarr specification, building on widely used, open-source technologies, has the potential to greatly reduce these costs, and may enable a diverse ecosystem of next-generation tools for analysing genetic variation data directly from cloud-based object stores.