Deep exploration of random forest model boosts the interpretability of machine learning studies of complicated immune responses and lung burden of nanoparticles.
Fubo Yu, Changhong Wei, Peng Deng, Ting Peng, Xiangang Hu
Published in: Science Advances (2021)
The development of machine learning provides solutions for predicting the complicated immune responses and pharmacokinetics of nanoparticles (NPs) in vivo. However, highly heterogeneous data in NP studies remain challenging because of the low interpretability of machine learning. Here, we propose a tree-based random forest feature importance and feature interaction network analysis framework (TBRFA) and accurately predict the pulmonary immune responses and lung burden of NPs, with correlation coefficients above 0.9 for all training sets and above 0.75 for half of the test sets. This framework overcomes the feature importance bias introduced by small datasets through a multiway importance analysis. TBRFA also builds feature interaction networks, boosts model interpretability, and reveals hidden interacting factors (e.g., various NP properties and exposure conditions). TBRFA provides guidance for the design and application of ideal NPs and discovers the feature interaction networks that contribute to complex systems with small datasets in various fields.
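To make the two ideas in the abstract more concrete, the following is a minimal sketch of how several tree-structure importance measures and a feature interaction network could be derived from a forest. It is not the authors' TBRFA implementation. The abstract does not say which importance measures are combined, so the three used here (node count, times chosen as root, mean minimal depth) are assumptions, as are the hand-written toy trees, the feature names, and the helper functions `walk`, `multiway_importance`, and `interaction_edges`.

```python
from collections import Counter
from statistics import mean

# Hypothetical toy forest (NOT data from the paper). Each split node is a
# tuple (feature, left_child, right_child); a leaf is None.
FOREST = [
    ("dose", ("diameter", None, None),
             ("exposure_time", ("diameter", None, None), None)),
    ("diameter", ("dose", None, None), ("surface_area", None, None)),
    ("dose", ("surface_area", ("diameter", None, None), None), None),
]


def walk(node, depth=0, parent=None):
    """Yield (feature, depth, parent_feature) for every split node in a tree."""
    if node is None:
        return
    feature, left, right = node
    yield feature, depth, parent
    yield from walk(left, depth + 1, feature)
    yield from walk(right, depth + 1, feature)


def multiway_importance(forest):
    """Collect several structural importance measures per feature (assumed set)."""
    n_nodes, times_root, min_depths = Counter(), Counter(), {}
    for tree in forest:
        first_seen = {}  # shallowest depth of each feature within this tree
        for feature, depth, _ in walk(tree):
            n_nodes[feature] += 1
            if depth == 0:
                times_root[feature] += 1
            first_seen[feature] = min(depth, first_seen.get(feature, depth))
        for feature, depth in first_seen.items():
            min_depths.setdefault(feature, []).append(depth)
    # Simplification: mean minimal depth is averaged only over trees that
    # actually use the feature.
    return {
        f: {
            "n_nodes": n_nodes[f],
            "times_root": times_root[f],
            "mean_min_depth": mean(min_depths[f]),
        }
        for f in n_nodes
    }


def interaction_edges(forest):
    """Count undirected parent-child feature pairs as interaction edges."""
    edges = Counter()
    for tree in forest:
        for feature, _, parent in walk(tree):
            if parent is not None and parent != feature:
                edges[tuple(sorted((parent, feature)))] += 1
    return edges


if __name__ == "__main__":
    for name, measures in multiway_importance(FOREST).items():
        print(name, measures)
    for pair, weight in interaction_edges(FOREST).most_common():
        print(pair, weight)
```

Looking at several measures side by side, rather than one score, is one way to see whether a feature's apparent importance is stable. The edge counts give a simple weighted network of features that tend to split one after another. In practice the trees would come from a fitted random forest instead of being written by hand.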
Keyphrases
- machine learning
- immune response
- big data
- artificial intelligence
- deep learning
- network analysis
- climate change
- toll-like receptor
- electronic health record
- dendritic cells
- pulmonary hypertension
- magnetic resonance imaging
- risk factors
- case control
- RNA-seq
- data analysis
- inflammatory response
- single cell
- diffusion weighted imaging
- computed tomography
- walled carbon nanotubes