Prediction of CYP450 Enzyme-Substrate Selectivity Based on the Network-Based Label Space Division Method.

Xiaoqi Shan, Xiangeng Wang, Cheng-Dong Li, Yanyi Chu, Yufang Zhang, Yi Xiong, Dong-Qing Wei

Published in: Journal of Chemical Information and Modeling (2019)

A drug may be metabolized by multiple cytochrome P450 (CYP450) isoforms. Predicting the metabolic fate of drugs is important for preventing drug-drug interactions during the development of novel pharmaceuticals. In this study, prediction of CYP450 enzyme-substrate selectivity is formulated as a multilabel learning task. First, we compared the modeling performance of feature combinations drawn from four categories of features: physicochemical property descriptors, mol2vec descriptors, extended connectivity fingerprints, and Molecular ACCess System (MACCS) key fingerprints. After identifying the best combination of features, we applied seven multilabel models: multilabel k-nearest neighbor (ML-kNN), multilabel twin support vector machine, and five network-based label space division (NLSD) methods (NLSD-MLP, NLSD-XGB, NLSD-EXT, NLSD-RF, and NLSD-SVM). Six of the models (ML-kNN, NLSD-MLP, NLSD-XGB, NLSD-EXT, NLSD-RF, and NLSD-SVM) perform better than the previous work. NLSD-XGB achieves the best performance, with an average top-1 prediction success of 91.1%, an average top-2 prediction success of 96.2%, and an average top-3 prediction success of 98.2%. Compared with the previous work, NLSD-XGB shows a significant improvement of more than 11% in top-1 prediction success in the 10-times-repeated 5-fold cross-validation test and of more than 14% in the 10-times-repeated hold-out test. To the best of our knowledge, this is the first time the network-based label space division model has been introduced in drug metabolism, and it performs well in this task.
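The top-k prediction success figures quoted above can be illustrated with a minimal sketch. The abstract does not define the metric, so the code assumes a common reading: a compound counts as a success when at least one of its k highest-scoring isoforms is a true metabolizing isoform. The function name `top_k_success`, the scores, and the labels are all hypothetical and are not taken from the paper.

```python
def top_k_success(scores, truths, k):
    """Fraction of compounds with at least one true isoform among the
    k highest-scoring isoforms (assumed definition, not from the paper)."""
    hits = 0
    for score_row, true_labels in zip(scores, truths):
        # Rank isoforms by descending predicted score and keep the first k.
        ranked = sorted(score_row, key=score_row.get, reverse=True)[:k]
        if any(label in true_labels for label in ranked):
            hits += 1
    return hits / len(scores)


# Hypothetical model scores for three compounds over three isoforms.
scores = [
    {"CYP3A4": 0.9, "CYP2D6": 0.4, "CYP2C9": 0.1},
    {"CYP3A4": 0.7, "CYP2D6": 0.6, "CYP2C9": 0.2},
    {"CYP3A4": 0.5, "CYP2D6": 0.3, "CYP2C9": 0.8},
]
# Hypothetical true label sets (a compound may have several isoforms).
truths = [{"CYP3A4"}, {"CYP2D6"}, {"CYP3A4", "CYP2D6"}]

for k in (1, 2, 3):
    print(f"top-{k} success: {top_k_success(scores, truths, k):.3f}")
```

Under this reading the metric can only stay the same or rise as k grows, which matches the ordering of the reported top-1, top-2, and top-3 values.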
