Identification of Protein-Ligand Binding Sites by Sequence Information and Ensemble Classifier.
Yijie Ding, Jijun Tang. Published in: Journal of Chemical Information and Modeling (2017)
Identifying protein-ligand binding sites is an important step in drug discovery and structure-based drug design. Detecting these sites with traditional experimental methods is expensive and time-consuming, so computational approaches offer an effective alternative. Many recent computational methods rely on protein structure information. However, these methods are limited to the scenario in which both the sequence of the target protein is known and sufficient 3D structure information is available. Studies indicate that sequence-based computational approaches for predicting protein-ligand binding sites are more practical. In this paper, we propose a novel computational model for predicting protein-ligand binding sites from protein sequence. We apply the Discrete Cosine Transform (DCT) to extract features from the Position-Specific Scoring Matrix (PSSM). To improve accuracy, Predicted Relative Solvent Accessibility (PRSA) information is also used. The predictor is built with an ensemble weighted sparse representation model combined with random under-sampling. To evaluate our method, we conduct comprehensive tests on test sets covering 12 ligand types. Results show that our method achieves a better Matthews correlation coefficient (MCC) than other state-of-the-art methods on the independent test sets of ATP (0.506), ADP (0.511), AMP (0.393), GDP (0.579), GTP (0.641), Mg2+ (0.317), Fe3+ (0.490) and HEME (0.640). In terms of MCC, our proposed method outperforms earlier predictors on 8 of the 12 ligand types.
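As a rough illustration of two ideas mentioned in the abstract, the following minimal sketch applies a 2D DCT to a PSSM window and keeps its low-frequency coefficients as features, and computes MCC from confusion-matrix counts. It is not the authors' implementation: the function names (`dct2_1d`, `dct2_2d`, `dct_features`, `mcc`), the orthonormal DCT-II variant, the window size, and the number of retained coefficients are all assumptions chosen for illustration.

```python
import math


def dct2_1d(x):
    """Orthonormal DCT-II of a 1D sequence (assumed variant, not from the paper)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def dct2_2d(mat):
    """2D DCT-II: transform every row, then every column."""
    rows = [dct2_1d(r) for r in mat]
    cols = [dct2_1d(list(c)) for c in zip(*rows)]
    # cols is stored column-major; transpose back to row-major.
    return [list(r) for r in zip(*cols)]


def dct_features(window, keep_rows, keep_cols):
    """Flatten the low-frequency (top-left) block of the 2D DCT.

    `window` is a hypothetical PSSM slice: one row per residue in a
    sliding window, one column per amino-acid type.
    """
    coeffs = dct2_2d(window)
    return [coeffs[r][c] for r in range(keep_rows) for c in range(keep_cols)]


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Toy 4x4 "PSSM window" with made-up scores, for illustration only.
    toy_window = [
        [1, -2, 0, 3],
        [0, 1, -1, 2],
        [2, 0, 1, -3],
        [-1, 2, 0, 1],
    ]
    print(dct_features(toy_window, keep_rows=2, keep_cols=2))
    print(mcc(tp=40, tn=900, fp=20, fn=40))
```

Keeping only the top-left block of DCT coefficients compresses each window into a short fixed-length vector; in a full pipeline such a vector would presumably be concatenated with the PRSA values before classification.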