Systems Biology and Machine Learning Identify Genetic Overlaps Between Lung Cancer and Gastroesophageal Reflux Disease.
Sanjukta Dasgupta
Published in: Omics: a journal of integrative biology (2024)
One Health and planetary health emphasize the common molecular mechanisms that connect complex human diseases, as well as the links between human and planetary ecosystem health. For example, lung cancer (LC) and gastroesophageal reflux disease (GERD) each pose a significant burden on planetary health, and the coexistence of GERD in patients with LC is often associated with a poor prognosis. This study reports on the genetic overlaps between these two conditions using systems biology-driven bioinformatics and machine learning-based algorithms. Nine hub genes (IGHV1-3, COL3A1, ITGA11, COL1A1, MS4A1, SPP1, MMP9, MMP7, and LOC102723407) were found to be significantly altered in both LC and GERD compared with controls, and pathway analyses suggested a significant association with the matrix remodeling pathway. The expression of these genes was validated in two additional datasets. Random forest and K-nearest neighbor, two machine learning-based algorithms, achieved accuracies of 89% and 85% for distinguishing LC and GERD, respectively, from controls using these hub genes. Additionally, potential drug targets were identified, with molecular docking confirming that doxycycline binds matrix metalloproteinase 7 (binding affinity: -6.8 kcal/mol). This study is the first of its kind to combine in silico analyses and machine learning algorithms to identify gene signatures related to both LC and GERD, along with promising drug candidates that warrant further research toward therapeutic innovation in both conditions. Finally, the study also suggests upstream regulators, including microRNAs and transcription factors, that can inform future mechanistic research on LC and GERD.
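To illustrate the kind of classification step the abstract describes, the following is a minimal sketch of a K-nearest neighbor classifier evaluated by leave-one-out cross-validation on a nine-gene expression panel. It is not the author's pipeline: the expression values are synthetic (drawn from Gaussian distributions), the choice of k = 5, the class shift, and the helper names (`make_synthetic_samples`, `knn_predict`, `leave_one_out_accuracy`) are assumptions made for illustration, and the random forest model mentioned in the abstract is not reproduced here. Only the gene names come from the abstract.

```python
import math
import random

# Hub genes named in the abstract; all expression values below are SYNTHETIC.
HUB_GENES = ["IGHV1-3", "COL3A1", "ITGA11", "COL1A1", "MS4A1",
             "SPP1", "MMP9", "MMP7", "LOC102723407"]


def make_synthetic_samples(n_per_class, shift, seed=0):
    """Return (features, labels): controls ~ N(0, 1), cases ~ N(shift, 1) per gene.

    Hypothetical stand-in for real, normalized expression data.
    """
    rng = random.Random(seed)
    features, labels = [], []
    for label, mean in ((0, 0.0), (1, shift)):
        for _ in range(n_per_class):
            features.append([rng.gauss(mean, 1.0) for _ in HUB_GENES])
            labels.append(label)
    return features, labels


def knn_predict(train_x, train_y, query, k=5):
    """Majority vote among the k training samples nearest to `query` (Euclidean)."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0


def leave_one_out_accuracy(features, labels, k=5):
    """Hold out each sample once, predict it from the rest, return the hit rate."""
    hits = 0
    for i, (x, y) in enumerate(zip(features, labels)):
        rest_x = features[:i] + features[i + 1:]
        rest_y = labels[:i] + labels[i + 1:]
        hits += knn_predict(rest_x, rest_y, x, k) == y
    return hits / len(features)


# Assumed toy setup: 30 controls and 30 cases, with cases shifted by 1.5 per gene.
features, labels = make_synthetic_samples(n_per_class=30, shift=1.5)
accuracy = leave_one_out_accuracy(features, labels)
print(f"Leave-one-out KNN accuracy on synthetic data: {accuracy:.2f}")
```

With real data, each row would hold one sample's normalized expression values for the nine genes, and the accuracy would depend on the cohort, preprocessing, and validation scheme; the number printed here reflects only the synthetic class separation chosen above.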
Keyphrases
- gastroesophageal reflux disease
- machine learning
- poor prognosis
- genome wide
- molecular docking
- healthcare
- public health
- simultaneous determination
- mass spectrometry
- artificial intelligence
- mental health
- deep learning
- endothelial cells
- big data
- transcription factor
- genome wide identification
- health information
- multiple sclerosis
- liquid chromatography
- tandem mass spectrometry
- human health
- molecular dynamics simulations
- copy number
- risk assessment
- gene expression
- ms ms
- health promotion
- risk factors
- current status
- binding protein
- high resolution
- pluripotent stem cells