Protein Language Models and Machine Learning Facilitate the Identification of Antimicrobial Peptides.
David Medina-Ortiz, Sebastian Contreras, Diego Fernández, Nicole Soto-García, Iván Moya, Gabriel Cabas-Mora, Alvaro Olivera-Nappa
Published in: International journal of molecular sciences (2024)
Peptides are bioactive molecules whose functional versatility in living organisms has led to successful applications in diverse fields. In recent years, the amount of data on peptide sequences and functions collected in open repositories has increased substantially, allowing more complex computational models to be applied to study the relationship between peptide composition and function. This work introduces AMP-Detector, a sequence-based classification model for detecting the functional biological activity of peptides, with a focus on accelerating the discovery and de novo design of potential antimicrobial peptides (AMPs). AMP-Detector introduces a novel sequence-based pipeline for training binary classification models that integrates protein language models with machine learning algorithms. This pipeline produced 21 models targeting antimicrobial, antiviral, and antibacterial activity, with average precision exceeding 83%. Benchmark analyses showed that our models outperformed existing methods for AMPs and delivered comparable results for other types of biological activity. Using the Peptide Atlas, we applied AMP-Detector to identify over 190,000 potential AMPs and demonstrated that it can be integrated with generative learning to aid de novo design, yielding over 500 novel AMPs. The combination of our methodology, robust models, and a generative design strategy offers a significant advancement in peptide-based drug discovery and represents a pivotal tool for therapeutic applications.
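The general shape of such a pipeline (numerical representation of each sequence, followed by a binary classifier scored by precision) can be illustrated with a minimal sketch. This is not the AMP-Detector implementation. To stay dependency-free, the hypothetical `embed` function uses amino-acid frequencies as a stand-in for a protein language model embedding, the classifier is a plain logistic regression trained by stochastic gradient descent, and the sequences are made-up toy strings, not real peptides from any database.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def embed(sequence):
    """Stand-in for a protein language model embedding (assumption):
    returns the 20-dimensional amino-acid frequency vector."""
    n = len(sequence)
    return [sequence.count(a) / n for a in AMINO_ACIDS]

def train_logistic(X, y, epochs=500, lr=0.5):
    """Fit a logistic regression with plain stochastic gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - yi  # gradient of the log-loss with respect to z
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def predict(model, x):
    """Return 1 (active) if the decision value is non-negative, else 0."""
    w, b = model
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if z >= 0 else 0

def precision(y_true, y_pred):
    """Precision = TP / (TP + FP); 0.0 if nothing was predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0

# Toy, made-up sequences (illustrative only): label 1 = "active", 0 = "inactive".
toy_sequences = [
    "KKLLKKLLKK", "RRWWRRWW", "KLKLKLKLKR", "RKRKWLLK",
    "DDEEDDEE", "GSGSGSGS", "EEDDSSGG", "DGDGEGEG",
]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]

features = [embed(s) for s in toy_sequences]
model = train_logistic(features, toy_labels)
train_predictions = [predict(model, x) for x in features]
train_precision = precision(toy_labels, train_predictions)
print("training precision:", train_precision)
```

In a realistic setting, `embed` would be replaced by embeddings from a pretrained protein language model, the classifier by a tuned machine learning algorithm, and precision would be measured on held-out data rather than on the training set.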
Keyphrases
- machine learning
- deep learning
- artificial intelligence
- protein kinase
- amino acid
- drug discovery
- autism spectrum disorder
- big data
- minimally invasive
- single cell
- magnetic resonance imaging
- high throughput
- drug delivery
- magnetic resonance
- multidrug resistant
- binding protein
- high resolution
- ionic liquid
- cancer therapy
- protein protein
- image quality
- real time pcr