Login / Signup

A high-dimensional omnibus test for set-based association analysis.

Haitao YangXin WangZechen ZhangFuzhao ChenHongyan CaoLina YanXia GaoHui DongYuehua Cui
Published in: Briefings in bioinformatics (2024)
Set-based association analysis is a valuable tool in studying the etiology of complex diseases in genome-wide association studies, as it allows for the joint testing of variants in a region or group. Two common types of single nucleotide polymorphism (SNP)-disease functional models are recognized when evaluating the joint function of a set of SNP: the cumulative weak signal model, in which multiple functional variants with small effects contribute to disease risk, and the dominating strong signal model, in which a few functional variants with large effects contribute to disease risk. However, existing methods have two main limitations that reduce their power. Firstly, they typically only consider one disease-SNP association model, which can result in significant power loss if the model is misspecified. Secondly, they do not account for the high-dimensional nature of SNPs, leading to low power or high false positives. In this study, we propose a solution to these challenges by using a high-dimensional inference procedure that involves simultaneously fitting many SNPs in a regression model. We also propose an omnibus testing procedure that employs a robust and powerful P-value combination method to enhance the power of SNP-set association. Our results from extensive simulation studies and a real data analysis demonstrate that our set-based high-dimensional inference strategy is both flexible and computationally efficient and can substantially improve the power of SNP-set association analysis. Application to a real dataset further demonstrates the utility of the testing strategy.
Keyphrases
  • genome wide
  • data analysis
  • copy number
  • dna methylation
  • high density
  • gene expression
  • minimally invasive
  • single cell
  • virtual reality