Linking gene expression to clinical outcomes in pediatric Crohn's disease using machine learning.
Kevin A Chen, Nina C Nishiyama, Meaghan M Kennedy Ng, Alexandria Shumway, Chinmaya U Joisa, Matthew R Schaner, Grace Lian, Caroline Beasley, Lee-Ching Zhu, Surekha Bantumilli, Muneera R Kapadia, Shawn M Gomez, Terrence S Furey, Shehzad Z Sheikh
Published in: Scientific Reports (2024)
Pediatric Crohn's disease (CD) is characterized by a severe disease course with frequent complications. We sought to apply machine learning-based models to predict the risk of developing future complications in pediatric CD using ileal and colonic gene expression. Gene expression data were generated from 101 formalin-fixed, paraffin-embedded (FFPE) ileal and colonic biopsies obtained from treatment-naïve CD patients and controls. Clinical outcomes, including development of strictures or fistulas and progression to surgery, were analyzed using differential expression and modeled using machine learning. Differential expression analysis revealed downregulation of pathways related to inflammation and extracellular matrix production in patients with strictures. Machine learning-based models were able to incorporate colonic gene expression and clinical characteristics to predict outcomes with high accuracy. Models showed an area under the receiver operating characteristic curve (AUROC) of 0.84 for strictures, 0.83 for remission, and 0.75 for surgery. Genes with potential prognostic importance for strictures (REG1A, MMP3, and DUOX2) were not identified in single-gene differential analysis but were found to contribute strongly to predictive models. Our findings in FFPE tissue support the importance of colonic gene expression and the potential for machine learning-based models in predicting outcomes for pediatric CD.
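For readers unfamiliar with the AUROC values reported above, the metric can be read as the probability that a model scores a randomly chosen patient who developed the outcome higher than a randomly chosen patient who did not. The following is a minimal sketch of that pairwise definition. The function name `auroc` and the toy labels and scores are hypothetical and are not taken from the study; the abstract does not say how the authors computed the metric.

```python
def auroc(labels, scores):
    """Pairwise AUROC: fraction of (positive, negative) pairs in which the
    positive case receives the higher score; tied pairs count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): 1 = outcome occurred, 0 = did not.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.8, 0.5, 0.3, 0.2, 0.1]

toy_auroc = auroc(labels, scores)
print(f"toy AUROC = {toy_auroc:.2f}")
```

Under this reading, a value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two groups.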
Keyphrases
- gene expression
- machine learning
- dna methylation
- ulcerative colitis
- minimally invasive
- big data
- genome wide
- artificial intelligence
- end stage renal disease
- oxidative stress
- newly diagnosed
- risk factors
- adipose tissue
- prognostic factors
- metabolic syndrome
- chronic kidney disease
- early onset
- single cell
- deep learning
- signaling pathway
- risk assessment
- transcription factor
- surgical site infection
- climate change
- bioinformatics analysis
- insulin resistance