pLM4Alg: Protein Language Model-Based Predictors for Allergenic Proteins and Peptides.
Zhenjiao Du, Yixiang Xu, Changqi Liu, Yonghui Li. Published in: Journal of Agricultural and Food Chemistry (2023)
The rising prevalence of allergy demands efficient and accurate bioinformatic tools that expedite allergen identification and risk assessment while reducing the cost and time of wet-lab experiments. Recently, pretrained protein language models (pLMs) have successfully predicted protein structure and function. However, to the best of our knowledge, they have not been used to predict allergenic proteins or peptides. This study therefore aims to develop robust models for allergenic protein/peptide prediction using five pLMs of varying sizes and to systematically assess their performance through fine-tuning with a convolutional neural network. The developed pLM4Alg models achieved state-of-the-art performance, with accuracy, Matthews correlation coefficient, and area under the curve of 93.4-95.1%, 0.869-0.902, and 0.981-0.990, respectively. Moreover, pLM4Alg is the first model capable of handling prediction tasks involving sequences with missing residues and sequences containing nonstandard amino acid residues. To facilitate easy access, a user-friendly web server (https://f6wxpfd3sh.us-east-1.awsapprunner.com) has been established. pLM4Alg is expected to become the leading machine learning-based prediction model for allergenic peptides and proteins. Used together with other predictors, it holds great promise for accelerating allergy research.
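The three metrics reported above (accuracy, Matthews correlation coefficient, and area under the ROC curve) can be illustrated with a minimal Python sketch for a binary allergen/non-allergen classifier. The function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made for illustration only; they are not taken from the pLM4Alg code or data.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def matthews_corrcoef(y_true, y_pred):
    """MCC computed from the binary confusion matrix (labels are 0 or 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative (ties count as half). Quadratic in the number of samples,
    which is fine for a small illustration."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = allergen, 0 = non-allergen.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]          # assumed model probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"MCC      = {matthews_corrcoef(y_true, y_pred):.3f}")
print(f"AUC      = {roc_auc(y_true, scores):.3f}")
```

Accuracy and MCC depend on the chosen decision threshold, whereas AUC is computed from the raw scores and summarizes ranking quality across all thresholds.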
Keyphrases
- amino acid
- protein protein
- machine learning
- convolutional neural network
- risk assessment
- healthcare
- autism spectrum disorder
- binding protein
- deep learning
- risk factors
- computed tomography
- heavy metals
- mass spectrometry
- magnetic resonance imaging
- magnetic resonance
- high resolution
- atopic dermatitis
- infectious diseases
- contrast enhanced