Predict Ionization Energy of Molecules Using Conventional and Graph-Based Machine Learning Models.
Yufeng Liu, Zhenyu Li
Published in: Journal of Chemical Information and Modeling (2023)
Ionization energy (IE) is an important property of molecules. It is highly desirable to predict IE efficiently, for example with machine learning (ML)-powered quantitative structure-property relationships (QSPR). In this study, we systematically compare the performance of different ML models in predicting the IE of molecules with distinct functional groups, using data obtained from the NIST Chemistry WebBook. Mordred and PaDEL are used to generate informative and computationally inexpensive descriptors for conventional ML models. Adding a descriptor that indicates whether the molecule is a radical significantly improves the performance of these models. Support vector regression (SVR) is the best conventional ML model for IE prediction. Among graph-based models, AttentiveFP performs even better than SVR. The difference between these two types of models comes mainly from their predictions for radical molecules, where the local environment around an unpaired electron is better described by graph-based models. These results provide not only high-performance models for IE prediction but also useful guidance for choosing models to obtain reliable QSPR.
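The abstract does not say how the radical indicator is computed. As a minimal sketch under that caveat, one simple choice (an assumption here, not the authors' method) is electron-count parity: a species with an odd total number of electrons must have at least one unpaired electron. The helper names `electron_count`, `radical_flag`, and `with_radical_flag` are hypothetical, and the element table is deliberately short.

```python
import re

# Atomic numbers for a few common elements; extend as needed.
ATOMIC_NUMBER = {
    "H": 1, "B": 5, "C": 6, "N": 7, "O": 8, "F": 9,
    "Si": 14, "P": 15, "S": 16, "Cl": 17, "Br": 35, "I": 53,
}


def electron_count(formula: str, charge: int = 0) -> int:
    """Total electron count for a flat formula such as 'C2H5' or 'CH3Cl'."""
    # Reject empty strings, parentheses, and anything else we do not parse.
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    total = 0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if symbol not in ATOMIC_NUMBER:
            raise ValueError(f"unknown element: {symbol!r}")
        total += ATOMIC_NUMBER[symbol] * (int(count) if count else 1)
    # A positive charge removes electrons, a negative charge adds them.
    return total - charge


def radical_flag(formula: str, charge: int = 0) -> int:
    """Return 1 if the electron count is odd (open shell), else 0."""
    return electron_count(formula, charge) % 2


def with_radical_flag(descriptors: list[float], formula: str) -> list[float]:
    """Append the radical indicator to an existing descriptor vector."""
    return descriptors + [float(radical_flag(formula))]


if __name__ == "__main__":
    for f in ("CH4", "CH3", "C2H5", "NO"):
        print(f, radical_flag(f))
```

The parity rule misses open-shell species with an even electron count (triplet O2, biradicals), so a structure-based count of unpaired electrons from a cheminformatics toolkit would be more reliable where structures are available. The resulting 0/1 value would simply be one extra column next to the Mordred or PaDEL descriptors before fitting a conventional model such as SVR.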