A general statistic to test an optimally weighted combination of common and/or rare variants.
Jianjun Zhang, Baolin Wu, Qiuying Sha, Shuanglin Zhang, Xuexia Wang
Published in: Genetic Epidemiology (2019)
Both genome-wide association studies and next-generation sequencing data analyses are widely used to identify common and/or rare genetic variants associated with disease susceptibility. Rare variants generally have large effects, but they are hard to detect because of their low frequencies. Many existing statistical methods for rare-variant association studies employ a weighted combination scheme, which usually assigns subjective weights or suboptimal weights based on ad hoc assumptions (e.g., ignoring dependence between rare variants). In this study, we analytically derived optimal weights for both common and rare variants and proposed a general and novel approach (G-TOW) to test the association between an optimally weighted combination of variants in a gene or pathway and a continuous or dichotomous trait, while easily adjusting for covariates. Results of the simulation studies show that G-TOW has properly controlled type I error rates and is the most powerful test among the methods we compared, both when testing effects of rare and common variants together and when testing rare variants only. We also illustrate the effectiveness of G-TOW using the Genetic Analysis Workshop 17 (GAW17) data. Additionally, we applied G-TOW and other competing methods to test for disease-associated genes in real schizophrenia data. G-TOW successfully verified the genes FYN and VPS39, which have been reported in existing publications to be associated with schizophrenia. Both of these genes are missed by the weighted sum statistic and the sequence kernel association test. The simulation studies and real data analysis indicate that G-TOW is a powerful test.
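To make the idea of a weighted combination test concrete, the following is a minimal sketch in Python of a generic covariate-adjusted test for a continuous trait. It is not the G-TOW statistic: the abstract does not state the analytically derived optimal weights, so the sketch substitutes a simple allele-frequency-based weight as a placeholder assumption. The function names (`residualize`, `frequency_weights`, `weighted_combination_test`), the least-squares covariate adjustment, and the permutation p-value are all illustrative choices, not taken from the paper.

```python
import numpy as np


def residualize(v, covariates):
    """Remove the linear effect of the covariates (plus an intercept) by least squares."""
    n = v.shape[0]
    design = np.column_stack([np.ones(n), covariates])
    beta, *_ = np.linalg.lstsq(design, v, rcond=None)
    return v - design @ beta


def frequency_weights(G):
    """Placeholder weights 1/sqrt(p(1-p)) that up-weight rarer variants.

    Assumption for illustration only; these are NOT the optimal weights
    derived in the paper.
    """
    maf = np.clip(G.mean(axis=0) / 2.0, 1e-4, 1 - 1e-4)
    return 1.0 / np.sqrt(maf * (1.0 - maf))


def weighted_combination_test(y, G, Z, weights, n_perm=999, seed=0):
    """Score-type test of a weighted combination of variants, adjusted for covariates.

    y: (n,) continuous trait; G: (n, m) genotypes coded 0/1/2;
    Z: (n, k) covariates; weights: (m,) variant weights.
    Returns the observed statistic and a permutation p-value.
    """
    rng = np.random.default_rng(seed)
    y_res = residualize(y, Z)       # trait with covariate effects removed
    G_res = residualize(G, Z)       # genotypes with covariate effects removed
    score = G_res @ weights         # one combined genotype score per individual

    def stat(trait):
        return (trait @ score) ** 2 / (score @ score)

    observed = stat(y_res)
    # Permute the adjusted trait to approximate the null distribution.
    perm = np.array([stat(rng.permutation(y_res)) for _ in range(n_perm)])
    p_value = (1 + np.sum(perm >= observed)) / (n_perm + 1)
    return observed, p_value
```

Given a trait vector `y`, a genotype matrix `G`, and a covariate matrix `Z`, calling `weighted_combination_test(y, G, Z, frequency_weights(G))` returns the statistic and its permutation p-value. The weights enter only through the combined score, so a different weighting scheme can be swapped in without changing the rest of the test. A dichotomous trait would need a different adjustment (for example, residuals from a logistic model), which this sketch does not cover.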
Keyphrases
- copy number
- genome wide
- data analysis
- magnetic resonance
- randomized controlled trial
- dna methylation
- bipolar disorder
- systematic review
- machine learning
- contrast enhanced
- gene expression
- network analysis
- genome wide identification
- big data
- genome wide association study
- bioinformatics analysis
- physical activity
- sleep quality
- deep learning
- cell free