Imputation of low-coverage sequencing data from 150,119 UK Biobank genomes.
Simone Rubinacci, Robin J Hofmeister, Bárbara Sousa da Mota, Olivier Delaneau. Published in: Nature Genetics (2023)
The release of 150,119 UK Biobank sequences represents an unprecedented opportunity to use them as a reference panel for imputing low-coverage whole-genome sequencing data with high accuracy, but current methods cannot cope with the size of the data. Here we introduce GLIMPSE2, a low-coverage whole-genome sequencing imputation method that scales sublinearly in both the number of samples and the number of markers. It achieves efficient whole-genome imputation from the UK Biobank reference panel while retaining high accuracy for ancient and modern genomes, particularly at rare variants and for very low-coverage samples.
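To make "low-coverage" concrete, here is a generic sketch. It is not taken from the paper and is not GLIMPSE2's code. It computes per-site genotype likelihoods from reference and alternate read counts, assuming a simple binomial read model with independent reads and a uniform per-base error rate (the `error` parameter and both function names are hypothetical). With zero or one read at a site, the likelihoods barely separate the three genotypes. This is the situation in which imputation against a reference panel becomes necessary.

```python
def genotype_likelihoods(n_ref: int, n_alt: int, error: float = 0.01) -> list:
    """P(reads | genotype) for genotypes 0/0, 0/1, 1/1 at one biallelic site.

    Simplifying assumption (hypothetical model): reads are independent and
    every base has the same error rate `error`.
    """
    # Probability that a single read shows the ALT allele under each genotype:
    # 0/0 -> only via error, 0/1 -> half the reads, 1/1 -> all but errors.
    p_alt = [error, 0.5, 1.0 - error]
    return [p ** n_alt * (1.0 - p) ** n_ref for p in p_alt]


def normalise(likelihoods: list) -> list:
    """Rescale likelihoods so they sum to 1 (a flat-prior posterior)."""
    total = sum(likelihoods)
    return [x / total for x in likelihoods]


if __name__ == "__main__":
    # No reads: all three genotypes are equally likely (uninformative site).
    print(normalise(genotype_likelihoods(0, 0)))
    # One ALT read: 0/0 is unlikely, but 0/1 and 1/1 remain hard to tell apart.
    print(normalise(genotype_likelihoods(0, 1)))
    # Deep coverage with balanced reads: 0/1 clearly dominates.
    print(normalise(genotype_likelihoods(15, 15)))
```

At roughly 1x depth most sites look like the first two cases. Per-site evidence alone is therefore weak, and an imputation method has to borrow information from haplotypes shared with the reference panel.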