Viral Immunogenicity Prediction by Machine Learning Methods.
Nikolet Doneva, Ivan Dimitrov. Published in: International Journal of Molecular Sciences (2024)
Since viruses are among the main causes of infectious illnesses, prophylaxis is essential for efficient disease control. Vaccines play a pivotal role in mitigating the transmission of various viral infections and fortifying our defenses against them. The initial step in modern vaccine design and development involves the identification of potential vaccine targets through computational techniques. Here, using datasets of 1588 known viral immunogens and 468 viral non-immunogens, we apply machine learning algorithms to develop models for the prediction of protective immunogens of viral origin. The datasets are split into training and test sets in a 4:1 ratio. The protein structures are encoded by E-descriptors and transformed into uniform vectors by the auto- and cross-covariance methods. The most relevant descriptors are selected by the gain ratio technique. The models generated by the Random Forest, Multilayer Perceptron, and XGBoost algorithms demonstrate superior predictive performance on the test sets, surpassing predictions made by VaxiJen 2.0, an established gold standard in viral immunogenicity prediction. The key attributes determining immunogenicity in viral proteins are specific fingerprints in hydrophobicity and steric properties.
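The auto- and cross-covariance (ACC) step mentioned above can be illustrated with a minimal sketch, which shows how proteins of different lengths map to vectors of one fixed length. It assumes the commonly used form ACC_jk(lag) = sum over i of E_j(i) * E_k(i + lag), divided by (n - lag); the abstract itself does not give the formula or the lag. The names `TOY_SCALES`, `encode`, and `acc_transform` are hypothetical, and the toy scale values are placeholders, not the published E-descriptor values.

```python
from typing import Dict, List, Sequence, Tuple

# Placeholder two-value scales for three residues. These are NOT the real
# E-descriptors (which assign five values to each of the 20 amino acids);
# they exist only to make the sketch runnable.
TOY_SCALES: Dict[str, Tuple[float, ...]] = {
    "A": (1.0, 0.0),
    "G": (0.0, 1.0),
    "L": (0.5, -0.5),
}


def encode(sequence: str, scales: Dict[str, Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Replace each residue letter with its tuple of descriptor values."""
    return [scales[residue] for residue in sequence]


def acc_transform(rows: Sequence[Sequence[float]], max_lag: int) -> List[float]:
    """Auto- and cross-covariance transform of a per-residue descriptor matrix.

    For every lag in 1..max_lag and every descriptor pair (j, k), average
    rows[i][j] * rows[i + lag][k] over all valid positions i. Pairs with
    j == k are auto-covariance terms, pairs with j != k are cross-covariance
    terms. The output length is d * d * max_lag regardless of sequence length.
    """
    n = len(rows)
    if n <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    d = len(rows[0])
    vector: List[float] = []
    for lag in range(1, max_lag + 1):
        for j in range(d):
            for k in range(d):
                total = sum(rows[i][j] * rows[i + lag][k] for i in range(n - lag))
                vector.append(total / (n - lag))
    return vector


# Two sequences of different lengths yield vectors of the same length.
short_vec = acc_transform(encode("AGA", TOY_SCALES), max_lag=2)
long_vec = acc_transform(encode("AGLAGLA", TOY_SCALES), max_lag=2)
print(len(short_vec), len(long_vec))
```

With five descriptors per residue, as in the E-descriptor encoding, the same function would produce 25 values per lag, so the choice of `max_lag` (an assumption here) fixes the width of the feature table passed to feature selection and the classifiers.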