DeepKEGG: a multi-omics data integration framework with biological insights for cancer recurrence prediction and biomarker discovery.
Wei Lan, Haibo Liao, Qingfeng Chen, Lingzhi Zhu, Yi Pan, Yi-Ping Phoebe Chen. Published in: Briefings in Bioinformatics (2024)
Deep learning-based multi-omics data integration methods can reveal the mechanisms of cancer development, discover cancer biomarkers and identify pathogenic targets. However, current methods ignore potential correlations between samples when integrating multi-omics data. In addition, providing accurate biological explanations remains difficult because of the complexity of deep learning models. There is therefore an urgent need for a deep learning-based multi-omics integration method that explores the potential correlations between samples and provides model interpretability. Herein, we propose a novel interpretable multi-omics data integration method (DeepKEGG) for cancer recurrence prediction and biomarker discovery. In DeepKEGG, a biological hierarchical module uses the biological relationships between genes/miRNAs and pathways to establish local connections between neuron nodes and to support model interpretability. In addition, a pathway self-attention module is constructed to explore the correlation between different samples and to generate a latent pathway feature representation that enhances the prediction performance of the model. Lastly, an attribution-based feature importance calculation method is used to discover biomarkers related to cancer recurrence and to provide a biological interpretation of the model. Experimental results demonstrate that DeepKEGG outperforms other state-of-the-art methods in 5-fold cross-validation. Furthermore, case studies indicate that DeepKEGG serves as an effective tool for biomarker discovery. The code is available at https://github.com/lanbiolab/DeepKEGG.
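The two architectural ideas named in the abstract can be illustrated with a small sketch. It is not taken from the DeepKEGG repository: the function names (`masked_linear`, `sample_self_attention`), the toy mask, the weights and the use of identity query/key/value projections are all assumptions made for illustration. The first function restricts gene-to-pathway connections to annotated pairs, which is one common way to build a pathway-guided sparse layer; the second applies scaled dot-product self-attention across samples, so that each sample's pathway features become a weighted mix of all samples' features.

```python
import math

def masked_linear(x, weights, mask):
    """Map gene-level features to pathway-level features.

    x:       gene values for one sample, length n_genes
    weights: n_genes x n_pathways nested list (hypothetical learned weights)
    mask:    n_genes x n_pathways nested list; mask[g][p] is 1 when gene g
             is annotated to pathway p, else 0 (hypothetical annotation)
    """
    n_pathways = len(mask[0])
    return [
        sum(x[g] * weights[g][p] * mask[g][p] for g in range(len(x)))
        for p in range(n_pathways)
    ]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sample_self_attention(pathway_feats):
    """Scaled dot-product self-attention across samples (rows).

    Assumption: queries, keys and values are the raw pathway features
    (identity projections), which keeps the sketch short.
    """
    dim = len(pathway_feats[0])
    attended = []
    for query in pathway_feats:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in pathway_feats
        ]
        attn = softmax(scores)
        attended.append([
            sum(attn[j] * pathway_feats[j][c] for j in range(len(pathway_feats)))
            for c in range(dim)
        ])
    return attended

# Toy data: 3 genes, 2 pathways, 2 samples (all values are made up).
mask = [[1, 0], [1, 1], [0, 1]]
weights = [[0.5, 0.9], [1.0, -1.0], [0.3, 2.0]]
samples = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]

pathway_feats = [masked_linear(s, weights, mask) for s in samples]
attended = sample_self_attention(pathway_feats)
```

Because the mask zeroes every unannotated gene-pathway pair, each pathway node depends only on its member genes, which is what makes its activation attributable to specific genes. A real model would learn the weights and the attention projections by gradient descent.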
Keyphrases
- deep learning
- papillary thyroid
- squamous cell
- single cell
- machine learning
- electronic health record
- small molecule
- big data
- high throughput
- squamous cell carcinoma
- convolutional neural network
- childhood cancer
- gene expression
- free survival
- genome wide
- working memory
- wastewater treatment
- lymph node
- young adults
- dna methylation
- human health
- climate change
- sentinel lymph node