MHC2AffyPred: A machine-learning approach to estimate affinity of MHC class II peptides based on structural interaction fingerprints.
Siddhi P Jani, Sivakumar Prasanth Kumar, Naman Mangukia, Saumya K Patel, Himanshu A Pandya, Rakesh M Rawal
Published in: Proteins (2022)
Understanding how MHC class II (MHC-II) binding peptides of differing lengths form specific interactions at the core and extended sites within the large MHC-II pocket is important for peptide design in immunological research. Previous efforts have generated peptide conformations amenable to MHC-II binding and calculated the binding energy of the resulting complexes, but they were not directed toward relating the peptide conformation in MHC-II structures to the binding affinity (BA), expressed as IC50. We present here a machine-learning approach to calculate the BA of peptides within the MHC-II pocket for HLA-DRA1, HLA-DRB1, HLA-DP, and HLA-DQ allotypes. Instead of generating ensembles of peptide conformations in the conventional way, we created a biased set of conformations by using the peptides in the crystal structures of pMHC-II complexes as templates, followed by site-directed peptide docking. The structural interaction fingerprints generated from these docked pMHC-II structures, together with Moran autocorrelation descriptors, were used to train a random forest regressor specific to each MHC-II peptide length (9-mer to 19-mer). The entire workflow is automated using Linux shell and Perl scripts so that the MHC2AffyPred program can be applied to any characterized MHC-II allotype, and it is freely available at https://github.com/SiddhiJani/MHC2AffyPred. MHC2AffyPred attained better performance (correlation coefficient [CC] of .612-.898) than the MHCII3D (.03-.594) and NetMHCIIpan-3.2 (.289-.692) programs for the HLA-DRA1 and HLA-DRB1 types. Similarly, MHC2AffyPred achieved a CC between .91 and .98 for HLA-DP and HLA-DQ peptides (13-mer to 17-mer). Further, a case study on MHC-II binding 15-mer peptides of severe acute respiratory syndrome coronavirus-2 showed IC50 estimates closely comparable to those of the sequence-based NetMHCIIpan v3.2 and v4.0 programs, with correlations of .998 and .570, respectively.
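To make the Moran autocorrelation descriptors mentioned in the abstract more concrete, the following is a minimal sketch of how such descriptors are commonly computed for a peptide sequence. It is an illustration under stated assumptions, not the MHC2AffyPred implementation: the choice of the Kyte-Doolittle hydropathy index as the residue property, the maximum lag of 5, and the function names `moran_autocorrelation` and `moran_descriptors` are all hypothetical and do not come from the paper.

```python
# Kyte-Doolittle hydropathy index per amino acid.
# ASSUMPTION: the property scale is chosen for illustration only; the
# abstract does not say which residue properties MHC2AffyPred uses.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def moran_autocorrelation(peptide, lag, scale=KYTE_DOOLITTLE):
    """Moran autocorrelation of a residue property at a given sequence lag.

    I(d) = [sum_i (P_i - mean)(P_{i+d} - mean) / (N - d)]
           / [sum_i (P_i - mean)^2 / N]
    """
    values = [scale[aa] for aa in peptide]
    n = len(values)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(peptide)")

    mean = sum(values) / n
    deviations = [v - mean for v in values]
    variance = sum(d * d for d in deviations) / n
    if variance < 1e-12:
        # Convention adopted here (an assumption): a peptide with no
        # variation in the property gets a descriptor of 0.
        return 0.0

    covariance = sum(
        deviations[i] * deviations[i + lag] for i in range(n - lag)
    ) / (n - lag)
    return covariance / variance


def moran_descriptors(peptide, max_lag=5, scale=KYTE_DOOLITTLE):
    """Return the Moran descriptors for lags 1..max_lag (max_lag is assumed)."""
    return [moran_autocorrelation(peptide, lag, scale)
            for lag in range(1, max_lag + 1)]


if __name__ == "__main__":
    # Example 13-mer; one descriptor per lag is printed.
    print(moran_descriptors("PKYVKQNTLKLAT"))
```

In a workflow like the one described, a vector of this kind would be concatenated with the structural interaction fingerprint of the docked complex before being passed to a length-specific regressor. How the published program combines the two feature sets is not specified in the abstract.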