An integrated network representation of multiple cancer-specific data for graph-based machine learning.
Limeng Pu, Manali Singha, Hsiao-Chun Wu, Costas Busch, J. Ramanujam, Michal Brylinski
Published in: npj Systems Biology and Applications (2022)
Genomic profiles of cancer cells provide valuable information on genetic alterations in cancer. Several recent studies employed these data to predict the response of cancer cell lines to drug treatment. Nonetheless, because of the multifactorial phenotypes and intricate mechanisms of cancer, accurately predicting the effect of pharmacotherapy on a specific cell line from genetic information alone is problematic. Emphasizing the system-level complexity of cancer, we devised a procedure to integrate multiple heterogeneous data, including biological networks, genomics, inhibitor profiling, and gene-disease associations, into a unified graph structure. To construct compact yet information-rich cancer-specific networks, we developed a novel graph reduction algorithm. Driven not only by topological information but also by biological knowledge, the graph reduction increases the feature-only entropy while preserving the valuable graph-feature information. Subsequent comparative benchmarking simulations employing a tissue-level cross-validation protocol demonstrate that a graph-based predictor of drug efficacy achieves an accuracy of 0.68, notably higher than the accuracies measured for more traditional, matrix-based techniques on the same data. Overall, the non-Euclidean representation of the cancer-specific data improves the performance of machine learning in predicting the response of cancer to pharmacotherapy. The generated data are freely available to the academic community at https://osf.io/dzx7b/ .
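The abstract does not spell out the cross-validation protocol. The sketch below assumes that "tissue-level cross-validation" means holding out every cell line from one tissue of origin per fold (a grouped, leave-one-tissue-out scheme), so that a model is always evaluated on a tissue it never saw during training. The function name `tissue_level_splits` and the toy cell-line and tissue labels are hypothetical and are not taken from the paper or its data release; scikit-learn's `LeaveOneGroupOut` offers the same grouping for array-based pipelines.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def tissue_level_splits(
    cell_lines: Sequence[str], tissues: Sequence[str]
) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (held-out tissue, training cell lines, test cell lines) per tissue.

    Hypothetical helper: all cell lines of the held-out tissue go to the test
    set, so no cell line from that tissue appears in the training set.
    """
    if len(cell_lines) != len(tissues):
        raise ValueError("cell_lines and tissues must have the same length")

    # Group cell lines by their tissue of origin.
    by_tissue: Dict[str, List[str]] = defaultdict(list)
    for line, tissue in zip(cell_lines, tissues):
        by_tissue[tissue].append(line)

    # One fold per tissue, visited in sorted order for reproducibility.
    for held_out in sorted(by_tissue):
        test = list(by_tissue[held_out])
        train = [
            line
            for tissue, lines in sorted(by_tissue.items())
            if tissue != held_out
            for line in lines
        ]
        yield held_out, train, test


if __name__ == "__main__":
    # Toy, made-up labels purely for illustration.
    lines = ["A", "B", "C", "D", "E"]
    tissues = ["lung", "lung", "breast", "skin", "breast"]
    for tissue, train, test in tissue_level_splits(lines, tissues):
        print(tissue, train, test)
```

Grouping by tissue rather than shuffling individual cell lines is a stricter test: it prevents a predictor from scoring well simply because closely related cell lines from the same tissue appear in both the training and test sets.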