SNP-Slice: A Bayesian nonparametric framework to resolve SNP haplotypes in mixed infections.

Nianqiao P Ju, Jiawei Liu, Qixin He
Published in: bioRxiv : the preprint server for biology (2023)
Multi-strain infection is a common yet under-investigated phenomenon in many pathogens. Currently, biologists analyzing SNP information have to discard mixed-infection samples, because existing downstream analyses require monogenomic infection inputs. This practice impedes our understanding of the real genetic diversity, co-infection patterns, and genomic relatedness of pathogens. A reliable tool to learn and resolve SNP haplotypes from polygenomic data is urgently needed in molecular epidemiology. In this work, we develop a slice sampling Markov Chain Monte Carlo algorithm, named SNP-Slice, to learn not only the SNP haplotypes of all strains in the population but also which strains infect each host. Our method reconstructs SNP haplotypes and allele frequencies accurately without reference panels and outperforms state-of-the-art methods at estimating the multiplicity of infection, allele frequencies, and heterozygosity. We illustrate the performance of SNP-Slice on empirical malaria and HIV datasets.