Using random forest to predict antimicrobial minimum inhibitory concentrations of nontyphoidal Salmonella in Taiwan.
Chia-Chi Wang, Yu-Ting Hung, Che-Yu Chou, Shih-Ling Hsuan, Zeng-Weng Chen, Pei-Yu Chang, Tong-Rong Jan, Chun-Wei Tung
Published in: Veterinary Research (2023)
Antimicrobial resistance (AMR) is a global health issue, and AMR surveillance is useful for understanding resistance trends and planning intervention strategies. Salmonella, which is widely distributed in food-producing animals, is regarded by the World Health Organization (WHO) as a first priority for inclusion in AMR surveillance programs. Recent advances in rapid and affordable whole-genome sequencing (WGS) have led to the emergence of WGS as a one-stop test for predicting antimicrobial susceptibility. Because variation in sequencing and minimum inhibitory concentration (MIC) measurement methods can lead to inconsistent results, this study aimed to develop WGS-based random forest models for predicting the MIC values of 24 drugs using data generated by the same laboratories in Taiwan. The WGS data were transformed into feature vectors of 10-mers for machine learning. In rigorous validation and independent tests, the models performed well, with an average mean absolute error (MAE) of less than 1 in both validation and independent testing. Feature selection was then applied to identify top-ranked 10-mers that can further improve prediction performance. For surveillance purposes, genome sequence-based machine learning methods could be used to monitor the difference between predicted and experimental MIC values; a large difference may warrant investigation of emerging genomic determinants.
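As a rough illustration of the steps the abstract describes, the sketch below counts 10-mers in a sequence, lays the counts out as a fixed-order feature vector, computes the MAE between experimental and predicted MIC values, and flags isolates whose difference is large. It is not the authors' pipeline: the function names, the toy sequences, the use of a log2 MIC scale, and the flagging threshold of 2 are all assumptions made for the example, and the random forest model itself is left out so the code depends only on the Python standard library.

```python
from collections import Counter
from math import log2


def kmer_counts(sequence, k=10):
    """Count overlapping k-mers, skipping windows with non-ACGT characters."""
    sequence = sequence.upper()
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def to_feature_vector(counts, vocabulary):
    """Arrange k-mer counts in a fixed order so every genome shares one layout."""
    return [counts.get(kmer, 0) for kmer in vocabulary]


def mean_absolute_error(observed, predicted):
    """Average absolute difference between experimental and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def flag_large_differences(observed, predicted, threshold=2.0):
    """Return indices of isolates whose prediction error meets the threshold.

    The threshold is a hypothetical choice, not a value from the study.
    """
    return [
        i
        for i, (o, p) in enumerate(zip(observed, predicted))
        if abs(o - p) >= threshold
    ]


# Toy sequences standing in for assembled genomes (hypothetical data).
genomes = ["ACGTACGTACGTAACC", "ACGTACGTACGTTTGG"]
per_genome = [kmer_counts(g, k=10) for g in genomes]
vocabulary = sorted(set().union(*per_genome))
features = [to_feature_vector(c, vocabulary) for c in per_genome]

# Hypothetical MIC values in mg/L, compared on an assumed log2 scale.
experimental = [log2(v) for v in (2.0, 4.0, 64.0)]
predicted = [log2(v) for v in (2.0, 8.0, 8.0)]
print(mean_absolute_error(experimental, predicted))
print(flag_large_differences(experimental, predicted))
```

In a full workflow, vectors like `features` would be passed to a regression model such as a random forest, and the flagged isolates would be the candidates for follow-up on emerging genomic determinants.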
Keyphrases
- machine learning
- antimicrobial resistance
- public health
- global health
- big data
- artificial intelligence
- climate change
- sars cov
- escherichia coli
- electronic health record
- deep learning
- respiratory syndrome coronavirus
- neural network
- randomized controlled trial
- listeria monocytogenes
- quality improvement
- single cell
- gene expression
- copy number
- data analysis
- risk assessment
- drug induced
- sensitive detection
- loop mediated isothermal amplification