Applicability Domains Based on Molecular Graph Contrastive Learning Enable Graph Attention Network Models to Accurately Predict 15 Environmental End Points.
Haobo Wang, Wenjia Liu, Jingwen Chen, Zhongyu Wang
Published in: Environmental Science & Technology (2023)
In silico models for predicting physicochemical properties and environmental fate parameters are necessary for the sound management of chemicals. This study employed graph attention network (GAT) algorithms to construct such models for 15 end points. The GAT models outperformed the previous state-of-the-art models, and their performance was not influenced by the presence or absence of compounds with certain structures. Molecular similarity density (ρ_s) was found to be a key metric characterizing data set modelability, in addition to the proportion of compounds at activity cliffs. By introducing molecular graph (MG) contrastive learning, MG-based ρ_s and molecular inconsistency in activities (I_A) were calculated and used to characterize the structure-activity landscape (SAL)-based applicability domain, AD_SAL{ρ_s, I_A}. Coupling the GAT models with AD_SAL{ρ_s, I_A} significantly improved the prediction coefficient of determination (R^2) on all the end points, by an average of 14.4%, and brought every end point to R^2 > 0.9, a level that could hardly be achieved previously. The models were employed to screen persistent, mobile, and/or bioaccumulative chemicals from inventories consisting of about 10^6 chemicals. Given the current state-of-the-art model performance and the coverage of various environmental end points, the constructed models with AD_SAL{ρ_s, I_A} may serve as benchmarks for future efforts to improve modeling efficacy.
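
The abstract does not give formulas for the similarity density or the activity inconsistency, so the following Python sketch is only an illustration of how a two-criterion applicability domain of this kind might be applied to a query compound. It assumes, hypothetically, that the density is the sum of similarities to training compounds above a cutoff and that the inconsistency is the similarity-weighted standard deviation of those neighbors' activities. The function names, the cutoff, and the thresholds `rho_min` and `ia_max` are all assumptions and are not taken from the paper.

```python
import numpy as np

def similarity_density(sim_row, threshold=0.5):
    """Assumed definition: sum of similarities to training compounds
    whose similarity to the query is at or above `threshold`."""
    mask = sim_row >= threshold
    return float(sim_row[mask].sum())

def activity_inconsistency(sim_row, activities, threshold=0.5):
    """Assumed definition: similarity-weighted standard deviation of the
    activities of the neighbors selected by `threshold`."""
    mask = sim_row >= threshold
    if not mask.any():
        return float("inf")  # no neighbors: treat as maximally uncertain
    weights = sim_row[mask]
    values = activities[mask]
    mean = np.average(values, weights=weights)
    return float(np.sqrt(np.average((values - mean) ** 2, weights=weights)))

def in_domain(sim_row, activities, rho_min=1.0, ia_max=0.5, threshold=0.5):
    """A query is inside the domain when it has enough similar training
    compounds and those compounds agree in activity (hypothetical cutoffs)."""
    rho = similarity_density(sim_row, threshold)
    ia = activity_inconsistency(sim_row, activities, threshold)
    return rho >= rho_min and ia <= ia_max

# Toy data: similarities of two query compounds to three training compounds.
train_activities = np.array([1.0, 1.2, 5.0])
well_covered = np.array([0.9, 0.8, 0.1])
isolated = np.array([0.2, 0.1, 0.3])

print(in_domain(well_covered, train_activities))
print(in_domain(isolated, train_activities))
```

In a workflow like the one described, predictions for compounds outside the domain would be flagged or excluded, which is the mechanism by which a domain filter can raise R^2 on the retained compounds.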