A predictive ensemble classifier for the gene expression diagnosis of ASD at ages 1 to 4 years.
Bokan Bao, Javad Zahiri, Vahid H Gazestani, Linda Lopez, Yaqiong Xiao, Raphael Kim, Teresa H Wen, Austin W T Chiang, Srinivasa Nalabolu, Karen Pierce, Kimberly Robasky, Tianyun Wang, Kendra Hoekzema, Evan E Eichler, Nathan E Lewis, Eric Courchesne
Published in: Molecular Psychiatry (2022)
Autism Spectrum Disorder (ASD) diagnosis remains behavior-based, and the median age of diagnosis is ~52 months, nearly 5 years after its first-trimester origin. Accurate and clinically translatable early-age diagnostics do not exist due to ASD genetic and clinical heterogeneity. Here we collected clinical, diagnostic, and leukocyte RNA data from 240 ASD and typically developing (TD) toddlers (175 toddlers for training and 65 for test). To identify gene expression ASD diagnostic classifiers, we developed 42,840 models composed of 3570 gene expression feature selection sets and 12 classification methods. We found that 742 models had AUC-ROC ≥ 0.8 on both Training and Test sets. Weighted Bayesian model averaging of these 742 models yielded an ensemble classifier model with accurate performance in Training and Test gene expression datasets, with ASD diagnostic classification AUC-ROC scores of 85-89% and AUC-PR scores of 84-92%. ASD toddlers with ensemble scores above and below the overall ASD ensemble mean of 0.723 (on a scale of 0 to 1) had similar diagnostic and psychometric scores, but those below this ASD ensemble mean had more prenatal risk events than TD toddlers. Ensemble model feature genes were involved in cell cycle, inflammation/immune response, transcriptional gene regulation, cytokine response, and PI3K-AKT, RAS and Wnt signaling pathways. We additionally collected targeted DNA sequencing smMIPs data on a subset of ASD risk genes from 217 of the 240 ASD and TD toddlers. This DNA sequencing found about the same percentage of SFARI Level 1 and 2 ASD risk gene mutations in TD (12 of 105) as in ASD (13 of 112) toddlers, and classification based only on the presence of a mutation in these risk genes performed at a chance level of 49%. By contrast, the leukocyte ensemble gene expression classifier correctly classified 88% of TD and ASD toddlers with ASD risk gene mutations.
Our ensemble ASD gene expression classifier is diagnostically predictive and replicable across different toddler ages, races, and ethnicities; outperforms a risk gene mutation classifier; and has potential for clinical translation.
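The arithmetic and the ensemble step described in the abstract can be illustrated with a minimal Python sketch. It makes several assumptions that are not taken from the paper: the function names (`auc_roc`, `weighted_ensemble`, `mutation_only_accuracy`) are hypothetical, the ensemble weights are simply proportional to training AUC-ROC as a stand-in for the authors' weighted Bayesian model averaging (whose exact weighting is not given here), and the "49%" figure is reproduced as plain accuracy of a rule that calls a toddler ASD whenever a risk gene mutation is present, which is one plausible reading of the reported chance-level result.

```python
import numpy as np

# Model grid from the abstract: feature selection sets x classification methods.
N_FEATURE_SETS = 3570
N_METHODS = 12
N_MODELS = N_FEATURE_SETS * N_METHODS  # 42,840 candidate models


def mutation_only_accuracy(asd_mut, asd_total, td_mut, td_total):
    """Accuracy of the rule 'mutation present => ASD' (assumed reading of the 49%)."""
    correct = asd_mut + (td_total - td_mut)  # ASD with mutation + TD without mutation
    return correct / (asd_total + td_total)


def auc_roc(labels, scores):
    """Rank-based AUC-ROC (Mann-Whitney U), using average ranks for tied scores."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores, kind="mergesort")
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    for value in np.unique(scores):  # replace tied ranks by their mean
        tied = scores == value
        ranks[tied] = ranks[tied].mean()
    n_pos = int(labels.sum())
    n_neg = len(labels) - n_pos
    u = ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def weighted_ensemble(train_scores, test_scores, y_train, y_test, threshold=0.8):
    """Keep models with AUC-ROC >= threshold on both sets, then average their scores.

    train_scores / test_scores have shape (n_models, n_samples).
    Weights proportional to training AUC-ROC are an illustrative assumption,
    not the paper's Bayesian model averaging weights.
    """
    train_scores = np.asarray(train_scores, dtype=float)
    test_scores = np.asarray(test_scores, dtype=float)
    train_auc = np.array([auc_roc(y_train, s) for s in train_scores])
    test_auc = np.array([auc_roc(y_test, s) for s in test_scores])
    keep = (train_auc >= threshold) & (test_auc >= threshold)
    if not keep.any():
        raise ValueError("no model passes the AUC-ROC threshold on both sets")
    weights = train_auc[keep] / train_auc[keep].sum()
    return keep, weights, weights @ test_scores[keep]


if __name__ == "__main__":
    print(N_MODELS)
    # Counts from the abstract: 13 of 112 ASD and 12 of 105 TD toddlers carry a mutation.
    print(round(mutation_only_accuracy(13, 112, 12, 105), 3))
```

With the counts given in the abstract, the mutation-only rule gets (13 + 93) of 217 toddlers right, which is about 0.49 and consistent with the reported chance-level performance under the accuracy assumption above.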
Keyphrases
- autism spectrum disorder
- gene expression
- attention deficit hyperactivity disorder
- intellectual disability
- dna methylation
- machine learning
- pi3k akt
- cell cycle
- cell proliferation
- immune response
- deep learning
- genome wide
- signaling pathway
- oxidative stress
- magnetic resonance
- high resolution
- single cell
- single molecule
- epithelial mesenchymal transition
- magnetic resonance imaging
- rna seq
- endoplasmic reticulum stress