Lineage-informative microhaplotypes for spatio-temporal surveillance of Plasmodium vivax malaria parasites.
Sasha V Siegel, Roberto Amato, Hidayat Trimarsanto, Edwin Sutanto, Mariana Kleinecke, Kathryn Murie, Georgia Whitton, Aimee R Taylor, James A Watson, Mallika Imwong, Ashenafi Assefa, Awab Ghulam Rahim, Nguyen Hoang Chau, Tran Tinh Hien, Justin A Green, Gavin Koh, Nicholas J White, Nicholas Day, Dominic P Kwiatkowski, Julian C Rayner, Richard N Price, Sarah Auburn
Published in: medRxiv : the preprint server for health sciences (2023)
Challenges in understanding the origin of recurrent Plasmodium vivax infections constrain the surveillance of antimalarial efficacy and transmission of this neglected parasite. Recurrent infections within an individual may arise from activation of dormant liver stages (relapse), blood-stage treatment failure (recrudescence) or new inoculations (reinfection). Molecular inference of familial relatedness (identity-by-descent or IBD) based on whole-genome sequence data, together with analysis of the intervals between parasitaemic episodes ("time-to-event" analysis), can help resolve the probable origin of recurrences. Whole-genome sequencing of predominantly low-density P. vivax infections is challenging, so an accurate and scalable genotyping method to determine the origins of recurrent parasitaemia would be of significant benefit. We have developed a P. vivax genome-wide informatics pipeline to select specific microhaplotype panels that can capture IBD within small, amplifiable segments of the genome. Using a global set of 615 P. vivax genomes, we derived a panel of 100 microhaplotypes, each comprising 3-10 high-frequency SNPs within <200 bp sequence windows. This panel exhibits high diversity in regions of the Asia-Pacific, Latin America and the Horn of Africa (median expected heterozygosity H_E = 0.70-0.81), and it captured 89% (273/307) of the polyclonal infections detected with genome-wide datasets. Using data simulations, we demonstrate lower error in estimating pairwise IBD using microhaplotypes, relative to traditional biallelic SNP barcodes. Our panel exhibited high accuracy in predicting the country of origin (median Matthews correlation coefficient >0.9 in 90% of countries tested), and it also captured local infection outbreak and bottlenecking events. The informatics pipeline is available open-source and yields microhaplotypes that can be readily transferred to high-throughput amplicon sequencing assays for surveillance in malaria-endemic regions.
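The two quantities the abstract relies on when describing the panel, the SNP window constraint (3-10 SNPs within <200 bp) and the per-marker diversity H_E, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names `candidate_windows` and `expected_heterozygosity` are hypothetical, the window scan is a naive one anchored at each SNP, and H_E is assumed to be the standard sample-size-corrected estimator, n/(n-1) * (1 - sum of squared allele frequencies), which the abstract does not confirm.

```python
from collections import Counter


def candidate_windows(snp_positions, max_span=200, min_snps=3, max_snps=10):
    """Return lists of SNP positions that fit in one candidate window.

    Assumption: a window is anchored at each SNP and extended to the right
    while the distance from the anchor stays below max_span (in bp) and the
    window holds at most max_snps SNPs. Overlapping windows are kept.
    """
    positions = sorted(snp_positions)
    windows = []
    for i, start in enumerate(positions):
        # Take up to max_snps SNPs from the anchor, keep those within the span.
        members = [p for p in positions[i:i + max_snps] if p - start < max_span]
        if len(members) >= min_snps:
            windows.append(members)
    return windows


def expected_heterozygosity(haplotypes):
    """Sample-size-corrected expected heterozygosity for one marker.

    Each element of haplotypes is the multi-SNP allele (for example "AAT")
    observed in one monoclonal sample.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


if __name__ == "__main__":
    # Toy SNP coordinates on one chromosome (illustrative values only).
    print(candidate_windows([100, 150, 180, 500, 900, 950, 990, 1020]))
    # Toy microhaplotype calls at one window (illustrative values only).
    print(expected_heterozygosity(["AAT", "AAT", "GCT", "GCA"]))
```

Treating the whole multi-SNP allele as one multiallelic marker is what lets a microhaplotype reach H_E values above the 0.5 ceiling of a single biallelic SNP. A real selection step would presumably also filter on SNP frequency, rank windows by diversity and remove overlaps, none of which is attempted here.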