Lineage-informative microhaplotypes for recurrence classification and spatio-temporal surveillance of Plasmodium vivax malaria parasites.
Sasha V Siegel, Hidayat Trimarsanto, Roberto Amato, Kathryn Murie, Aimee R Taylor, Edwin Sutanto, Mariana Kleinecke, Georgia Whitton, James A Watson, Mallika Imwong, Ashenafi Assefa, Awab Ghulam Rahim, Hoang Chau Nguyen, Tinh Hien Tran, Justin A Green, Gavin C K W Koh, Nicholas J White, Nicholas Day, Dominic P Kwiatkowski, Julian C Rayner, Richard N Price, Sarah Auburn
Published in: Nature Communications (2024)
Challenges in classifying recurrent Plasmodium vivax infections constrain surveillance of antimalarial efficacy and transmission. Recurrent infections may arise from activation of dormant liver stages (relapse), blood-stage treatment failure (recrudescence) or reinfection. Molecular inference of familial relatedness (identity-by-descent or IBD) can help resolve the probable origin of recurrences. As whole genome sequencing of P. vivax remains challenging, targeted genotyping methods are needed for scalability. We describe a P. vivax marker discovery framework to identify and select panels of microhaplotypes (multi-allelic markers within small, amplifiable segments of the genome) that can accurately capture IBD. We evaluate panels of 50-250 microhaplotypes discovered in a global set of 615 P. vivax genomes. A candidate global 100-microhaplotype panel exhibits high marker diversity in the Asia-Pacific, Latin America and the Horn of Africa (median H_E = 0.70-0.81) and identifies 89% of the polyclonal infections detected with genome-wide datasets. Data simulations reveal lower error in estimating pairwise IBD using microhaplotypes relative to traditional biallelic SNP barcodes. The candidate global panel also exhibits high accuracy in predicting geographic origin and captures local infection outbreak and bottlenecking events. Our framework is open-source, enabling customised microhaplotype discovery and selection, with potential for porting to other species or data resources.
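The marker diversity statistic reported above can be illustrated with a minimal sketch. It assumes the standard population-genetic definition of expected heterozygosity, one minus the sum of squared allele frequencies, with an optional n/(n-1) small-sample correction; the abstract does not state which estimator the authors used. The function name and the allele calls are hypothetical and chosen only to show why a multi-allelic microhaplotype can exceed the 0.5 ceiling of a biallelic SNP.

```python
from collections import Counter


def expected_heterozygosity(alleles, unbiased=True):
    """Expected heterozygosity of one marker from a list of allele calls.

    Assumption: H_E = 1 - sum(p_i^2), optionally scaled by n / (n - 1).
    This is the textbook estimator, not necessarily the one used in the paper.
    """
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two allele calls")
    counts = Counter(alleles)
    # Sum of squared allele frequencies (probability two draws match).
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    h = 1.0 - sum_sq
    return h * n / (n - 1) if unbiased else h


# Hypothetical data: one allele call per monoclonal infection.
snp_calls = ["A"] * 50 + ["G"] * 50                                   # biallelic SNP
mhap_calls = ["ACT"] * 25 + ["ACC"] * 25 + ["GCT"] * 25 + ["GTC"] * 25  # microhaplotype

print(expected_heterozygosity(snp_calls, unbiased=False))
print(expected_heterozygosity(mhap_calls, unbiased=False))
```

With perfectly balanced alleles, the biallelic marker cannot exceed 0.5 under the uncorrected estimator, whereas a four-allele microhaplotype reaches 0.75, which is in the range the abstract reports for the candidate panel.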
Keyphrases
- genome wide
- plasmodium falciparum
- dna methylation
- copy number
- public health
- small molecule
- electronic health record
- single cell
- high throughput
- machine learning
- deep learning
- free survival
- big data
- ulcerative colitis
- neuropathic pain
- data analysis
- cancer therapy
- autism spectrum disorder
- spinal cord injury
- single molecule
- human health