A rare variant analysis framework using public genotype summary counts to prioritize disease-predisposition genes.
Wenan Chen, Shuoguo Wang, Saima Sultana Tithi, David W Ellison, Daniel J Schaid, Gang Wu
Published in: Nature communications (2022)
Sequencing cases without matched healthy controls hinders prioritization of germline disease-predisposition genes. To circumvent this problem, genotype summary counts from public data sets can serve as controls. However, systematic inflation and false positives can arise if confounding factors are not controlled. We propose a framework, consistent summary counts based rare variant burden test (CoCoRV), to address these challenges. CoCoRV implements consistent variant quality control and filtering, ethnicity-stratified rare variant association test, accurate estimation of inflation factors, powerful FDR control, and detection of rare variant pairs in high linkage disequilibrium. When we applied CoCoRV to pediatric cancer cohorts, the top genes identified were cancer-predisposition genes. We also applied CoCoRV to identify disease-predisposition genes in adult brain tumors and amyotrophic lateral sclerosis. Given that potential confounding factors were well controlled after applying the framework, CoCoRV provides a cost-effective solution to prioritizing disease-risk genes enriched with rare pathogenic variants.
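To illustrate what an ethnicity-stratified burden test on summary counts can look like, here is a minimal sketch. It uses the standard Cochran-Mantel-Haenszel (CMH) chi-square statistic to combine one 2x2 carrier table per ancestry group. This is an assumption made for illustration and is not taken from the CoCoRV software. The function name `cmh_burden_test`, the input layout, and the example counts are all hypothetical.

```python
import math

def cmh_burden_test(strata):
    """Two-sided CMH chi-square test (1 df, continuity-corrected) for one gene.

    strata: list of (case_carriers, case_noncarriers,
                     control_carriers, control_noncarriers),
            one tuple per ancestry group (hypothetical input layout).
    Returns (statistic, p_value).
    """
    observed = expected = variance = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small carries no information
        cases = a + b
        controls = c + d
        carriers = a + c
        noncarriers = b + d
        observed += a
        # Expected case carriers if carrier status is independent of disease
        expected += cases * carriers / n
        # Hypergeometric variance of the case-carrier count in this stratum
        variance += cases * controls * carriers * noncarriers / (n * n * (n - 1))
    if variance == 0:
        return 0.0, 1.0
    stat = max(abs(observed - expected) - 0.5, 0.0) ** 2 / variance
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Made-up counts for two ancestry groups: carriers are enriched among cases.
enriched = [(10, 990, 20, 19980), (5, 495, 10, 9990)]
stat, p_value = cmh_burden_test(enriched)
print(f"CMH statistic = {stat:.2f}, p = {p_value:.3g}")
```

Stratifying keeps differences in carrier frequency between ancestry groups from being mistaken for a case-control difference. This chi-square approximation can be unreliable when carrier counts are very small, which is typical of rare variants, so an exact test is usually the safer choice in practice.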
Keyphrases
- genome wide
- genome wide identification
- healthcare
- amyotrophic lateral sclerosis
- quality control
- papillary thyroid
- mental health
- genome wide analysis
- squamous cell carcinoma
- copy number
- mass spectrometry
- dna damage
- big data
- oxidative stress
- dna repair
- young adults
- lymph node metastasis
- artificial intelligence
- human immunodeficiency virus