SNP-Slice: A Bayesian nonparametric framework to resolve SNP haplotypes in mixed infections.
Nianqiao P Ju, Jiawei Liu, Qixin He. Published in: bioRxiv : the preprint server for biology (2023)
Multi-strain infection is a common yet under-investigated phenomenon in many pathogens. Currently, biologists analyzing SNP information have to discard mixed infection samples, because existing downstream analyses require monogenomic infection inputs. This protocol impedes our understanding of the real genetic diversity, co-infection patterns, and genomic relatedness of pathogens. A reliable tool to learn and resolve SNP haplotypes from polygenomic data is urgently needed in molecular epidemiology. In this work, we develop a slice sampling Markov chain Monte Carlo algorithm, named SNP-Slice, to learn not only the SNP haplotypes of all strains in the population but also which strains infect each host. Our method reconstructs SNP haplotypes and allele frequencies accurately without reference panels and outperforms state-of-the-art methods at estimating the multiplicity of infection, allele frequencies, and heterozygosity. We illustrate the performance of SNP-Slice on empirical malaria and HIV datasets.
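To make the estimated quantities concrete, the following minimal sketch shows how multiplicity of infection, allele frequencies, and expected heterozygosity could be read off once haplotypes and host-strain assignments are known. It is not the SNP-Slice algorithm and performs no inference. The toy data, the function names, and the strain-weighted definition of allele frequency are all assumptions made for illustration; the only standard formula used is the expected heterozygosity of a biallelic SNP, 2p(1 - p).

```python
# Hypothetical toy data, not taken from the paper.
# haplotypes[k][j] is the allele (0 or 1) of strain k at SNP j.
haplotypes = [
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]
# assignment[i][k] is 1 if host i carries strain k, else 0.
assignment = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 1],
    [1, 1, 1],
]


def multiplicity_of_infection(assignment):
    """Number of distinct strains carried by each host."""
    return [sum(row) for row in assignment]


def allele_frequencies(assignment, haplotypes):
    """Frequency of allele 1 at each SNP.

    Assumption: each strain is counted once per host it infects.
    """
    n_strains = len(haplotypes)
    n_snps = len(haplotypes[0])
    strain_counts = [sum(row[k] for row in assignment) for k in range(n_strains)]
    total = sum(strain_counts)
    return [
        sum(strain_counts[k] * haplotypes[k][j] for k in range(n_strains)) / total
        for j in range(n_snps)
    ]


def expected_heterozygosity(freqs):
    """Expected heterozygosity 2p(1 - p) at each biallelic SNP."""
    return [2 * p * (1 - p) for p in freqs]


moi = multiplicity_of_infection(assignment)
freqs = allele_frequencies(assignment, haplotypes)
het = expected_heterozygosity(freqs)
print(moi)
print(freqs)
print(het)
```

In this toy example a host with more than one assigned strain is a mixed infection; a tool of the kind described above would have to infer both matrices from the observed SNP data rather than take them as given.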
Keyphrases
- genetic diversity
- genome wide
- high density
- dna methylation
- escherichia coli
- copy number
- hiv infected
- antiretroviral therapy
- monte carlo
- machine learning
- human immunodeficiency virus
- gene expression
- magnetic resonance imaging
- hiv positive
- gram negative
- magnetic resonance
- mass spectrometry
- high resolution
- south africa
- multidrug resistant
- hiv testing
- atomic force microscopy
- high speed
- men who have sex with men