Developing QSAR Models with Defined Applicability Domains on PPARγ Binding Affinity Using Large Data Sets and Machine Learning Algorithms.
Zhongyu Wang, Jingwen Chen, Huixiao Hong. Published in: Environmental science & technology (2021)
Chemicals may cause adverse effects on human health by binding to peroxisome proliferator-activated receptor γ (PPARγ). Hence, PPARγ binding affinity is useful for evaluating chemicals with potential endocrine-disrupting effects. Quantitative structure-activity relationship (QSAR) regression models with defined applicability domains (ADs) are important for efficient screening of chemicals with PPARγ binding activity. However, a lack of large data sets has hindered the development of such models. In this study, based on PPARγ binding affinity data sets curated from various sources, 30 QSAR models were developed using molecular fingerprints, two-dimensional descriptors, and five machine learning algorithms. Structure-activity landscapes (SALs) of the training compounds were described by network-like similarity graphs (NSGs). Based on the NSGs, local discontinuity scores were calculated and found to be positively correlated with the cross-validation absolute prediction errors of the models across the different training sets, descriptors, and algorithms. Moreover, innovative ADs were defined based on pairwise similarities between compounds and were found to outperform some conventional ADs. The curated data sets and developed regression models could be useful for evaluating PPARγ-involved adverse effects of chemicals. The SAL analysis and the innovative ADs could facilitate understanding of prediction results from QSAR models.
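The abstract does not give formulas for the local discontinuity score or the similarity-based AD, so the following Python sketch only illustrates the general idea under stated assumptions. Fingerprints are represented as sets of on-bit indices and compared with Tanimoto similarity. The discontinuity score is assumed to be the mean similarity-weighted activity difference to neighbours above a similarity cutoff, and a query compound is assumed to be inside the AD if it has enough sufficiently similar training compounds. The function names, the cutoff of 0.5, the `min_neighbours` parameter, and the toy data are all hypothetical and are not taken from the paper.

```python
from typing import List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def local_discontinuity(idx: int, fps: List[Set[int]], activities: List[float],
                        sim_cutoff: float = 0.5) -> float:
    """Assumed definition: mean of |activity difference| * similarity over
    all training neighbours whose similarity to compound idx is >= sim_cutoff."""
    terms = []
    for j, fp in enumerate(fps):
        if j == idx:
            continue
        sim = tanimoto(fps[idx], fp)
        if sim >= sim_cutoff:
            terms.append(abs(activities[idx] - activities[j]) * sim)
    # A compound with no neighbours has no measurable local discontinuity.
    return sum(terms) / len(terms) if terms else 0.0


def in_domain(query_fp: Set[int], train_fps: List[Set[int]],
              sim_cutoff: float = 0.5, min_neighbours: int = 1) -> bool:
    """Assumed AD rule: the query is in-domain if at least min_neighbours
    training compounds have similarity >= sim_cutoff to it."""
    count = sum(1 for fp in train_fps if tanimoto(query_fp, fp) >= sim_cutoff)
    return count >= min_neighbours


# Toy training set (hypothetical on-bit indices and binding affinities).
train_fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 4, 6}, {10, 11, 12}]
activities = [5.0, 7.0, 5.2, 6.0]

scores = [local_discontinuity(i, train_fps, activities) for i in range(len(train_fps))]
similar_query_ok = in_domain({1, 2, 3}, train_fps)
distant_query_ok = in_domain({20, 21}, train_fps)
```

In this toy set, the first compound has two close neighbours with different activities and therefore a nonzero score, while the structurally isolated fourth compound scores zero. In practice the fingerprints would come from a cheminformatics toolkit, and the cutoffs would be tuned against cross-validation errors.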