Prediction of peptide hormones using an ensemble of machine learning and similarity-based methods.
Dashleen Kaur, Akanksha Arora, Palani Vigneshwar, Gajendra Pal Singh Raghava. Published in: Proteomics (2024)
Peptide hormones serve as genome-encoded signal transduction molecules that play essential roles in multicellular organisms, and their dysregulation can lead to various health problems. In this study, we propose a method for predicting hormonal peptides with high accuracy. The dataset used for training, testing, and evaluating our models consisted of 1174 hormonal and 1174 non-hormonal peptide sequences. Initially, we developed similarity-based methods using the BLAST and MERCI software. Although these similarity-based methods had a high probability of correct prediction, they had limitations: some query sequences returned no hits, so predictions were possible for only a limited number of sequences. To overcome these limitations, we further developed machine learning and deep learning-based models. Our logistic regression-based model achieved a maximum AUROC of 0.93 with an accuracy of 86% on an independent validation dataset. To harness the strengths of both similarity-based and machine learning-based models, we developed an ensemble method that achieved an AUROC of 0.96 with an accuracy of 89.79% and a Matthews correlation coefficient (MCC) of 0.8 on the validation set. To help researchers predict and design hormone peptides, we developed a web-based server called HOPPred. This server offers a unique feature that allows the identification of hormone-associated motifs within hormone peptides. The server can be accessed at: https://webs.iiitd.edu.in/raghava/hoppred/.
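The abstract does not state how the similarity-based and machine learning outputs are combined, so the following is only a minimal sketch of one plausible ensemble scheme: the classifier probability is shifted up or down when a BLAST hit or a MERCI motif is found, and left unchanged when there is no hit. The function names, the `weight` parameter, the label strings, and the 0.5 threshold are all hypothetical and are not taken from HOPPred. The MCC helper uses the standard confusion-matrix formula.

```python
from math import sqrt


def ensemble_score(ml_prob, blast_hit=None, motif_hit=False, weight=0.5):
    """Combine an ML probability with similarity evidence (assumed scheme).

    ml_prob   -- probability from a classifier such as logistic regression
    blast_hit -- "hormone", "non-hormone", or None when BLAST returns no hit
    motif_hit -- True if a hormone-associated motif was found in the peptide
    weight    -- hypothetical bonus/penalty added per piece of evidence
    """
    score = ml_prob
    if blast_hit == "hormone":
        score += weight
    elif blast_hit == "non-hormone":
        score -= weight
    if motif_hit:
        score += weight
    return score


def predict(score, threshold=0.5):
    """Label a peptide as hormonal when the combined score passes the threshold."""
    return "hormone" if score >= threshold else "non-hormone"


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # No similarity evidence: the ML probability alone decides.
    print(predict(ensemble_score(0.4)))
    # A BLAST hit against a known hormone pushes a borderline peptide over.
    print(predict(ensemble_score(0.4, blast_hit="hormone")))
```

In this scheme a peptide with no BLAST or motif hit falls back to the machine learning prediction, which addresses the no-hit limitation the abstract attributes to similarity-based methods used alone.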
Keyphrases
- machine learning
- deep learning
- convolutional neural network
- artificial intelligence
- mental health
- polycystic ovary syndrome
- big data
- healthcare
- protein protein
- public health
- amino acid
- neural network
- type diabetes
- small molecule
- gene expression
- computed tomography
- insulin resistance
- metabolic syndrome
- human health
- diffusion weighted imaging
- adipose tissue
- multidrug resistant
- genetic diversity