Interpreting chemisorption strength with AutoML-based feature deletion experiments.
Zhuo Li, Changquan Zhao, Haikun Wang, Yanqing Ding, Yechao Chen, Philippe Schwaller, Ke Yang, Cheng Hua, Yulian He
Published in: Proceedings of the National Academy of Sciences of the United States of America (2024)
The chemisorption energy of reactants on a catalyst surface is among the most informative characteristics for understanding and pinpointing the optimal catalyst. The intrinsic complexity of catalyst surfaces and chemisorption reactions makes it difficult to identify the pivotal physical quantities that determine the chemisorption energy. In response, the study proposes a methodology, the feature deletion experiment, based on Automatic Machine Learning (AutoML) for knowledge extraction from a high-throughput density functional theory (DFT) database. The study reveals that, for binary alloy surfaces, the geometric information of the local adsorption site is the primary physical quantity determining the chemisorption energy, ahead of the electronic and physicochemical properties of the catalyst alloys. By integrating the feature deletion experiment with instance-wise variable selection (INVASE), a neural network-based explainable AI (XAI) tool, we established the best-performing feature set containing 21 intrinsic, non-DFT-computed properties, achieving a mean absolute error (MAE) of 0.23 eV across a periodic table-wide chemical space involving more than 1,600 types of alloy surfaces and 8,400 chemisorption reactions. This study demonstrates the stability, consistency, and potential of AutoML-based feature deletion experiments in developing concise, predictive, and theoretically meaningful models for complex chemical problems with minimal human intervention.
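The general idea of a feature deletion experiment can be illustrated with a minimal sketch: fit a model on all feature groups, refit it with one group removed at a time, and compare the resulting MAE. Everything below is an assumption made for illustration and is not taken from the paper: the data are synthetic, a small k-nearest-neighbour regressor stands in for the AutoML pipeline, and the group names and the functions `knn_predict`, `mae_with_columns`, and `feature_deletion_experiment` are hypothetical.

```python
import random

def knn_predict(train_X, train_y, x, k=5):
    """Predict the mean target of the k nearest training rows (squared Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), y)
        for row, y in zip(train_X, train_y)
    )
    nearest = dists[:k]
    return sum(y for _, y in nearest) / len(nearest)

def mae_with_columns(X_train, y_train, X_test, y_test, cols):
    """Fit and evaluate using only the listed feature columns; return the test MAE."""
    tr = [[row[c] for c in cols] for row in X_train]
    te = [[row[c] for c in cols] for row in X_test]
    errors = [abs(knn_predict(tr, y_train, x) - y) for x, y in zip(te, y_test)]
    return sum(errors) / len(errors)

def feature_deletion_experiment(X_train, y_train, X_test, y_test, groups):
    """Return the baseline MAE and the MAE increase caused by deleting each group."""
    all_cols = sorted(c for cols in groups.values() for c in cols)
    baseline = mae_with_columns(X_train, y_train, X_test, y_test, all_cols)
    increase = {}
    for name, cols in groups.items():
        kept = [c for c in all_cols if c not in cols]
        deleted_mae = mae_with_columns(X_train, y_train, X_test, y_test, kept)
        increase[name] = deleted_mae - baseline
    return baseline, increase

# Synthetic stand-in data (an assumption, not the DFT database from the study):
# the target depends strongly on columns 0-1, weakly on column 2, and not at
# all on columns 3-5.
rng = random.Random(0)

def make_rows(n):
    X = [[rng.uniform(-1.0, 1.0) for _ in range(6)] for _ in range(n)]
    y = [2.0 * r[0] - 1.5 * r[1] + 0.3 * r[2] + rng.gauss(0.0, 0.05) for r in X]
    return X, y

X_train, y_train = make_rows(300)
X_test, y_test = make_rows(100)

# Hypothetical feature groups, named after the categories in the abstract.
groups = {
    "geometric": [0, 1],
    "electronic": [2, 3],
    "physicochemical": [4, 5],
}

baseline, increase = feature_deletion_experiment(
    X_train, y_train, X_test, y_test, groups
)
print(f"baseline MAE: {baseline:.3f}")
for name, delta in sorted(increase.items(), key=lambda kv: -kv[1]):
    print(f"delete {name:16s} MAE change: {delta:+.3f}")
```

In this toy setup the group whose deletion raises the MAE the most is read as the most important one. Deleting an uninformative group may even lower the error, which is one reason such experiments can also help trim a feature set.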
Keyphrases
- machine learning
- density functional theory
- neural network
- deep learning
- high throughput
- artificial intelligence
- big data