Machine learning is a potentially powerful tool in the fight against antimicrobial resistance (AMR), a critical global health issue, because it can identify resistance mechanisms from DNA sequence data without prior knowledge. The first step in building a machine learning model is feature extraction from sequencing data. Traditional methods such as single nucleotide polymorphism (SNP) calling and k-mer counting yield numerous, often redundant features, which complicates prediction and analysis. In this paper, we propose PanKA, a method that uses the pangenome to extract a concise set of relevant features for predicting AMR. PanKA not only enables fast model training and prediction but also improves accuracy. Applied to the bacterial species Escherichia coli and Klebsiella pneumoniae, our model predicts AMR more accurately than conventional and state-of-the-art methods.
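To illustrate why k-mer counting tends to produce numerous features, the following minimal sketch builds a k-mer count matrix from two toy sequences. It is not part of PanKA; the function names (`kmer_counts`, `kmer_feature_matrix`) and the toy sequences are assumptions made only for this illustration.

```python
from collections import Counter


def kmer_counts(sequence: str, k: int) -> Counter:
    """Count every overlapping k-mer in a DNA sequence."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def kmer_feature_matrix(sequences, k):
    """Build a samples-by-k-mers count matrix over all observed k-mers."""
    counts = [kmer_counts(s, k) for s in sequences]
    # The feature set is the union of k-mers seen in any sample.
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[kmer] for kmer in vocabulary] for c in counts]
    return vocabulary, matrix


# Hypothetical toy sequences that differ at a single position.
genomes = ["ACGTACGTGA", "ACGTTCGTGA"]
vocabulary, matrix = kmer_feature_matrix(genomes, k=3)
print(len(vocabulary), "features for", len(genomes), "samples")
```

In this toy case a single substitution alters several overlapping 3-mers at once, so one underlying variant shows up as multiple correlated features. The feature space is bounded by 4^k possible k-mers, which grows quickly with k.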
Keyphrases
- machine learning
- antimicrobial resistance
- Escherichia coli
- Klebsiella pneumoniae
- big data
- global health
- artificial intelligence
- deep learning
- healthcare
- multidrug resistant
- public health
- electronic health record
- risk assessment
- DNA methylation
- single molecule
- data analysis
- circulating tumor
- climate change
- human health
- virtual reality
- genetic diversity