Multi-omics cannot replace sample size in genome-wide association studies.
David A A Baranger, Alexander S Hatoum, Renato Polimanti, Joshua C Gray, Howard J Edenberg, Ryan Bogdan, Arpana Agrawal
Published in: Genes, brain, and behavior (2023)
The integration of multi-omics information (e.g., epigenetics and transcriptomics) can be useful for interpreting findings from genome-wide association studies (GWAS). It has been suggested that multi-omics could circumvent or greatly reduce the need to increase GWAS sample sizes for novel variant discovery. We tested whether incorporating multi-omics information into earlier, smaller GWAS boosts true-positive discovery of genes that were later revealed by larger GWAS of the same or similar traits. We applied 10 analytic approaches to integrating multi-omics data from 12 sources (e.g., the Genotype-Tissue Expression project) to test whether earlier, smaller GWAS of 4 brain-related traits (alcohol use disorder/problematic alcohol use, major depression/depression, schizophrenia, and intracranial volume/brain volume) could detect genes that were revealed by a later, larger GWAS. Multi-omics data did not reliably identify novel genes in earlier, less-powered GWAS (positive predictive value [PPV] <0.2; 80% false-positive associations). Machine learning predictions marginally increased the number of identified novel genes, correctly identifying 1-8 additional genes, but only for well-powered early GWAS of highly heritable traits (i.e., intracranial volume and schizophrenia). Although multi-omics, particularly positional mapping (i.e., fastBAT, MAGMA, and H-MAGMA), can help to prioritize genes within genome-wide significant loci (PPVs = 0.5-1.0) and translate them into information about disease biology, it does not reliably increase novel gene discovery in brain-related GWAS. To increase power for the discovery of novel genes and loci, increasing sample size is required.
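The positive predictive value reported in the abstract can be read as the share of genes flagged in an early GWAS that a later, larger GWAS confirms. The following is a minimal sketch of that calculation, not the authors' pipeline: the function name `positive_predictive_value` and the gene labels are hypothetical placeholders, and it assumes each analysis has already been reduced to a set of gene identifiers.

```python
def positive_predictive_value(predicted, confirmed):
    """Return the fraction of predicted genes that appear in the confirmed set."""
    predicted = set(predicted)
    if not predicted:
        raise ValueError("no predicted genes; PPV is undefined")
    true_positives = predicted & set(confirmed)
    return len(true_positives) / len(predicted)


# Hypothetical placeholder labels, NOT results from the study.
# Genes a multi-omics method flagged as novel in the earlier, smaller GWAS:
early_predictions = {"GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"}
# Genes revealed by the later, larger GWAS of the same or a similar trait:
later_gwas_genes = {"GENE_B", "GENE_X", "GENE_Y"}

ppv = positive_predictive_value(early_predictions, later_gwas_genes)
false_positive_share = 1 - ppv
print(f"PPV = {ppv:.2f}, false-positive share = {false_positive_share:.2f}")
```

In this toy case one of five flagged genes is confirmed, so the PPV is 0.2 and four of five predictions are false positives, which is the kind of threshold the abstract describes for less-powered GWAS.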
Keyphrases
- genome wide
- dna methylation
- single cell
- genome wide association
- copy number
- genome wide identification
- small molecule
- genome wide association study
- machine learning
- high throughput
- bioinformatics analysis
- white matter
- resting state
- quality improvement
- genome wide analysis
- transcription factor
- big data
- multiple sclerosis
- electronic health record
- brain injury
- high resolution
- poor prognosis
- blood brain barrier
- data analysis
- drinking water
- mass spectrometry