Login / Signup

General Graph Neural Network-Based Model To Accurately Predict Cocrystal Density and Insight from Data Quality and Feature Representation.

Jiali GuoMing SunXueyan ZhaoChaojie ShiHaoming SuYanzhi GuoXuemei Pu
Published in: Journal of chemical information and modeling (2023)
Cocrystal engineering as an effective way to modify solid-state properties has inspired great interest from diverse material fields while cocrystal density is an important property closely correlated with the material function. In order to accurately predict the cocrystal density, we develop a graph neural network (GNN)-based deep learning framework by considering three key factors of machine learning (data quality, feature presentation, and model architecture). The result shows that different stoichiometric ratios of molecules in cocrystals can significantly influence the prediction performances, highlighting the importance of data quality. In addition, the feature complementary is not suitable for augmenting the molecular graph representation in the cocrystal density prediction, suggesting that the complementary strategy needs to consider whether extra features can sufficiently supplement the lacked information in the original representation. Based on these results, 4144 cocrystals with 1:1 stoichiometry ratio are selected as the dataset, supplemented by the data augmentation of exchanging a pair of coformers. The molecular graph is determined to learn feature representation to train the GNN-based model. Global attention is introduced to further optimize the feature space and identify important atoms to realize the interpretability of the model. Benefited from the advantages, our model significantly outperforms three competitive models and exhibits high prediction accuracy for unseen cocrystals, showcasing its robustness and generality. Overall, our work not only provides a general cocrystal density prediction tool for experimental investigations but also provides useful guidelines for the machine learning application. All source codes are freely available at https://github.com/Xiao-Gua00/CCPGraph.
Keyphrases