Machine learning has the potential to be a powerful tool in the fight against antimicrobial resistance (AMR), a critical global health issue. Machine learning can identify resistance mechanisms from DNA sequence data without prior knowledge of those mechanisms. The first step in building such a model is extracting features from the sequencing data. Traditional methods such as single nucleotide polymorphism (SNP) calling and k-mer counting yield numerous, often redundant features, which complicates both prediction and analysis. In this paper, we propose PanKA, a method that uses the pangenome to extract a concise set of relevant features for predicting AMR. PanKA not only enables fast model training and prediction but also improves accuracy. Applied to the bacterial species Escherichia coli and Klebsiella pneumoniae, our model predicts AMR more accurately than conventional and state-of-the-art methods.
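To illustrate why conventional k-mer counting produces so many features, the following is a minimal sketch of generic k-mer feature extraction. It is not PanKA's method; the function names `count_kmers` and `kmer_feature_vector` and the toy sequences are hypothetical, and for simplicity it does not merge reverse-complement (canonical) k-mers, which real pipelines usually do.

```python
from collections import Counter
from itertools import product

VALID = set("ACGT")

def count_kmers(seq: str, k: int) -> Counter:
    """Count overlapping k-mers in a DNA sequence (hypothetical helper).

    Windows containing non-ACGT characters (e.g. N) are skipped.
    Reverse complements are NOT merged in this sketch.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= VALID:
            counts[kmer] += 1
    return counts

def kmer_feature_vector(seq: str, k: int) -> list[int]:
    """Map a sequence onto a fixed column order of all 4**k possible k-mers.

    A fixed ordering means every genome gets the same columns, which is
    what a tabular ML model needs, but the width grows as 4**k.
    """
    columns = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = count_kmers(seq, k)
    return [counts.get(km, 0) for km in columns]

if __name__ == "__main__":
    print(count_kmers("ACGTACGT", 3))
    print(len(kmer_feature_vector("ACGT", 2)))  # 4**2 columns
```

For example, `count_kmers("ACGTACGT", 3)` counts ACG and CGT twice each and GTA and TAC once each. With k = 31, a common choice in bacterial genomics, the space of possible columns is 4^31 (about 4.6 × 10^18); most never occur, but the k-mers actually observed across a collection of genomes can still number in the millions, many of them redundant because they are linked to the same gene or variant.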