Deep exploration of random forest model boosts the interpretability of machine learning studies of complicated immune responses and lung burden of nanoparticles.

Fubo Yu Changhong Wei Peng Deng Ting Peng Xiangang Hu

Published in: Science advances (2021)

The development of machine learning provides solutions for predicting the complicated immune responses and pharmacokinetics of nanoparticles (NPs) in vivo. However, highly heterogeneous data in NP studies remain challenging because of the low interpretability of machine learning. Here, we propose a tree-based random forest feature importance and feature interaction network analysis framework (TBRFA) and accurately predict the pulmonary immune responses and lung burden of NPs, with the correlation coefficient of all training sets >0.9 and half of the test sets >0.75. This framework overcomes the feature importance bias brought by small datasets through a multiway importance analysis. TBRFA also builds feature interaction networks, boosts model interpretability, and reveals hidden interactional factors (e.g., various NP properties and exposure conditions). TBRFA provides guidance for the design and application of ideal NPs and discovers the feature interaction networks that contribute to complex systems with small-size data in various fields.

Keyphrases