VAMPr: VAriant Mapping and Prediction of antibiotic resistance via explainable features and machine learning.
Jiwoong Kim, David E Greenberg, Reed Pifer, Shuang Jiang, Guanghua Xiao, Samuel A Shelburne, Andrew Y Koh, Yang Xie, Xiaowei Zhan
Published in: PLoS Computational Biology (2020)
Antimicrobial resistance (AMR) is an increasing threat to public health. Current methods of determining AMR rely on inefficient phenotypic approaches, and the mechanisms of AMR remain incompletely understood for many pathogen-antimicrobial combinations. Given the rapid, ongoing increase in the availability of high-density genomic data for a diverse array of bacteria, algorithms that use genomic information to predict phenotype could be both clinically useful and helpful in discovering previously unrecognized AMR pathways. To facilitate understanding of the connections between DNA variation and phenotypic AMR, we developed a new bioinformatics tool, VAriant Mapping and Prediction of antibiotic resistance (VAMPr), to (1) derive gene ortholog-based sequence features for protein variants; (2) interrogate these explainable gene-level variants for their known or novel associations with AMR; and (3) build accurate models to predict AMR from whole genome sequencing data. We curated publicly available sequencing data for 3,393 bacterial isolates from 9 species that had AMR phenotypes for 29 antibiotics. We detected 14,615 variant genotypes and built 93 association and prediction models. The association models confirmed known genetic mechanisms of antibiotic resistance, such as blaKPC and carbapenem resistance, consistent with the accuracy of our approach. The prediction models achieved high accuracy in internal nested cross-validation (mean accuracy of 91.1% across all antibiotic-pathogen combinations) and were also validated on external clinical datasets. The VAMPr variant detection method and the association and prediction models will be valuable tools for basic scientists conducting AMR research, with potential for clinical applicability.
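To make step (2) concrete, the sketch below tests whether the presence of a single variant is associated with a resistance phenotype across a set of isolates. It is an illustration only: the abstract does not name the association statistic, so Fisher's exact test on a 2x2 table is an assumption here, and the function names (`contingency_table`, `fisher_exact_two_sided`) and the eight toy isolates are hypothetical and not taken from the VAMPr data or code.

```python
from math import comb


def contingency_table(has_variant, is_resistant):
    """Count isolates in each cell of the 2x2 variant-by-phenotype table."""
    a = b = c = d = 0
    for v, r in zip(has_variant, is_resistant):
        if v and r:
            a += 1  # variant present, resistant
        elif v:
            b += 1  # variant present, susceptible
        elif r:
            c += 1  # variant absent, resistant
        else:
            d += 1  # variant absent, susceptible
    return a, b, c, d


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for a 2x2 table.

    Assumption: this statistic is chosen for illustration; the abstract
    does not say which test the association models use.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over every table that is no more probable than the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Toy data (made up): 1 = variant present / resistant, 0 = absent / susceptible.
has_variant = [1, 1, 1, 1, 0, 0, 0, 0]
is_resistant = [1, 1, 1, 1, 0, 0, 0, 0]

table = contingency_table(has_variant, is_resistant)
p_value = fisher_exact_two_sided(*table)
print(table, round(p_value, 4))
```

With these toy isolates the variant and the phenotype coincide perfectly, so the table is (4, 0, 0, 4) and the p-value is 2/70, about 0.0286. In a real analysis the same test would be repeated for each variant and each antibiotic-pathogen combination, which would call for multiple-testing correction.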
Keyphrases
- antimicrobial resistance
- copy number
- high density
- machine learning
- high resolution
- public health
- big data
- electronic health record
- genome wide
- healthcare
- small molecule
- risk assessment
- high throughput
- DNA methylation
- mass spectrometry
- single cell
- cystic fibrosis
- artificial intelligence
- Escherichia coli
- health information
- transcription factor
- cell free
- drug resistant
- binding protein
- genome wide identification
- circulating tumor cells
- global health