Gene-SGAN: discovering disease subtypes with imaging and genetic signatures via multi-view weakly-supervised deep clustering.
Zhijian Yang, Junhao Wen, Ahmed Abdulkadir, Yuhan Cui, Guray Erus, Elizabeth Mamourian, Randa Melhem, Dhivya Srinivasan, Sindhuja T Govindarajan, Jiong Chen, Mohamad Habes, Colin L Masters, Paul Maruff, Jurgen Fripp, Luigi Ferruci, Marilyn S Albert, Sterling C Johnson, John C Morris, Pamela LaMontagne, Daniel S Marcus, Tammie L S Benzinger, David A Wolk, Li Shen, Jingxuan Bao, Susan M Resnick, Haochang Shou, Ilya M Nasrallah, Christos Davatzikos
Published in: Nature Communications (2024)
Disease heterogeneity has been a critical challenge for precision diagnosis and treatment, especially in neurologic and neuropsychiatric diseases. Many diseases can display multiple distinct brain phenotypes across individuals, potentially reflecting disease subtypes that can be captured using MRI and machine learning methods. However, biological interpretability and treatment relevance are limited if the derived subtypes are not associated with genetic drivers or susceptibility factors. Herein, we describe Gene-SGAN, a multi-view, weakly-supervised deep clustering method that dissects disease heterogeneity by jointly considering phenotypic and genetic data, thereby conferring genetic correlations to the disease subtypes and associated endophenotypic signatures. We first validate the generalizability, interpretability, and robustness of Gene-SGAN in semi-synthetic experiments. We then demonstrate its application to real multi-site datasets from 28,858 individuals, deriving subtypes of Alzheimer's disease and brain endophenotypes associated with hypertension from MRI and single nucleotide polymorphism data. The derived brain phenotypes displayed significant differences in neuroanatomical patterns, genetic determinants, and biological and clinical biomarkers, indicating potentially distinct underlying neuropathologic processes, genetic drivers, and susceptibility factors. Overall, Gene-SGAN is broadly applicable to disease subtyping and endophenotype discovery, and is herein tested on disease-related, genetically-associated neuroimaging phenotypes.
Keyphrases
- genome wide
- machine learning
- copy number
- magnetic resonance imaging
- blood pressure
- electronic health record
- computed tomography
- magnetic resonance
- multiple sclerosis
- high resolution
- small molecule
- transcription factor
- blood brain barrier
- deep learning
- smoking cessation
- subarachnoid hemorrhage
- functional connectivity
- combination therapy