EMNGly: predicting N-linked glycosylation sites using the language models for feature extraction.

Xiaoyang Hou, Yu Wang, Dongbo Bu, Yaojun Wang, Chuncui Huang

Published in: Bioinformatics (Oxford, England) (2023)

EMNGly is a new approach for predicting N-linked glycosylation sites. It uses a pretrained protein language model (Evolutionary Scale Modeling) and a pretrained protein structure model (Inverse Folding Model) for feature extraction, and a support vector machine for classification. Ten-fold cross-validation and independent tests show that the approach outperforms existing techniques. On a benchmark independent test set, it achieves a Matthews correlation coefficient, sensitivity, specificity, and accuracy of 0.8282, 0.9343, 0.8934, and 0.9143, respectively.
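The four reported metrics all derive from the confusion matrix of a binary classifier (glycosylation site vs. non-site). The sketch below shows the standard definitions only; it is not the authors' evaluation code. The function name `binary_metrics` and the example counts are hypothetical and were chosen for illustration, not taken from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute MCC, sensitivity, specificity, and accuracy from
    confusion-matrix counts (standard textbook definitions)."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")

    # Sensitivity (recall): fraction of true sites that were predicted as sites.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of non-sites that were predicted as non-sites.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Accuracy: fraction of all predictions that were correct.
    accuracy = (tp + tn) / total

    # Matthews correlation coefficient; defined as 0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "mcc": mcc,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
    }


# Hypothetical counts, for illustration only (not the paper's data).
example = binary_metrics(tp=90, fp=20, tn=80, fn=10)
print(example)
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix, so it remains informative when positive and negative sites are imbalanced.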

Keyphrases

deep learning
machine learning
autism spectrum disorder
gene expression
dna methylation
computed tomography
magnetic resonance
genome wide
molecular dynamics simulations
small molecule
single molecule