Minimizing Reference Bias with an Impute-First Approach.

Naga Sai Kavya Vaddadi, Taher Mun, Ben Langmead

Published in: bioRxiv : the preprint server for biology (2023)

Pangenome indexes reduce reference bias in sequencing data analysis. However, a greater reduction in bias can be achieved using a personalized reference, e.g. a diploid human reference constructed to match a donor individual's alleles. We present a novel impute-first alignment framework that combines elements of genotype imputation and pangenome alignment. It begins by genotyping the individual from a sub-sample of the input reads. It next uses a reference panel and an efficient imputation algorithm to impute a personalized diploid reference. Finally, it indexes the personalized reference and applies a read aligner, which could be a linear or graph aligner, to align the full read set to the personalized reference. This framework achieves higher variant-calling recall (99.54% vs. 99.37%), precision (99.36% vs. 99.18%), and F1 (99.45% vs. 99.28%) than a graph-based pangenome. The personalized reference is also smaller and faster to query than a pangenome index, making it an overall advantageous choice for whole-genome DNA sequencing experiments.
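The three accuracy figures above are related: F1 is the harmonic mean of precision and recall. The following is a minimal sketch of that relationship, using only the rounded percentages quoted in the abstract. The function name `f1_score` and the variable names are illustrative assumptions, not part of the authors' software.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded percentages quoted in the abstract (assumed inputs for illustration).
impute_first = f1_score(precision=99.36, recall=99.54)
pangenome = f1_score(precision=99.18, recall=99.37)

print(f"impute-first F1:    {impute_first:.2f}%")
print(f"graph pangenome F1: {pangenome:.2f}%")
print(f"difference:         {impute_first - pangenome:.2f} percentage points")
```

Because the inputs are already rounded to two decimals, a recomputed F1 can differ from the reported value in the last digit; the reported figures were presumably derived from unrounded counts.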

Keyphrases