Estimating clinical risk in gene regions from population sequencing cohort data.
James D Fife, Christopher A Cassa
Published in: medRxiv : the preprint server for health sciences (2023)
While pathogenic variants significantly increase disease risk in many genes, it is still challenging to estimate the clinical impact of rare missense variants more generally. Even in genes such as BRCA2 or PALB2, large cohort studies find no significant association between breast cancer and rare germline missense variants collectively. Here we introduce REGatta, a method to improve the estimation of clinical risk in gene segments. We define gene regions using the density of pathogenic diagnostic reports, and then calculate the relative risk in each of these regions using 109,581 exome sequences from women in the UK Biobank. We apply this method in seven established breast cancer genes, and identify regions in each gene with statistically significant differences in breast cancer incidence for rare missense carriers. Even in genes with no significant difference at the gene level, this approach significantly separates rare missense variant carriers at higher or lower risk (BRCA2 regional model OR=1.46 [1.12, 1.79], p=0.0036 vs. BRCA2 gene model OR=0.96 [0.85, 1.07], p=0.4171). We find high concordance between these regional risk estimates and high-throughput functional assays of variant impact. We compare with existing methods and the use of protein domains (Pfam) as regions, and find REGatta better identifies individuals at elevated or reduced risk. These regions provide useful priors that can potentially be used to improve risk assessment and clinical management.
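To make the idea of a per-region risk estimate concrete, the following is a minimal sketch of computing an odds ratio with a 95% Wald confidence interval for carriers in each region of a gene. It is not the REGatta implementation: the abstract does not specify the statistical model, and the function names (`odds_ratio_wald`, `regional_odds_ratios`), the region boundaries, and the toy counts are all hypothetical. It also assumes each carrier has exactly one rare missense variant, and that non-carriers of a region are simply everyone in the cohort without a variant in that region.

```python
import math

def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: carrier cases      b: carrier controls
    c: non-carrier cases  d: non-carrier controls
    """
    # Haldane-Anscombe correction: add 0.5 to every cell if any cell is zero.
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR).
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper

def regional_odds_ratios(carriers, regions, n_cases, n_controls):
    """Estimate an odds ratio per region.

    carriers: list of (variant_position, is_case) tuples, one per carrier
              (assumption: one rare missense variant per individual).
    regions:  list of (name, start, end) tuples with inclusive bounds
              (hypothetical boundaries, not derived from diagnostic reports).
    n_cases, n_controls: cohort-wide totals, including non-carriers.
    """
    results = {}
    for name, start, end in regions:
        in_region = [is_case for pos, is_case in carriers if start <= pos <= end]
        a = sum(in_region)        # carrier cases in this region
        b = len(in_region) - a    # carrier controls in this region
        c = n_cases - a           # cases without a variant in this region
        d = n_controls - b        # controls without a variant in this region
        results[name] = odds_ratio_wald(a, b, c, d)
    return results

if __name__ == "__main__":
    # Toy data for illustration only; not taken from the UK Biobank.
    toy_carriers = [(10, True), (15, False), (120, False), (130, False), (140, True)]
    toy_regions = [("region_A", 1, 100), ("region_B", 101, 200)]
    for name, (or_, lo, hi) in regional_odds_ratios(toy_carriers, toy_regions, 50, 50).items():
        print(f"{name}: OR={or_:.2f} [{lo:.2f}, {hi:.2f}]")
```

A gene-level estimate corresponds to calling `regional_odds_ratios` with a single region spanning the whole gene, which is how a regional signal can be diluted when higher-risk and lower-risk segments are pooled.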
Keyphrases
- genome wide
- copy number
- genome wide identification
- risk assessment
- high throughput
- dna methylation
- intellectual disability
- type diabetes
- pregnant women
- emergency department
- transcription factor
- machine learning
- cross sectional
- oxidative stress
- young adults
- autism spectrum disorder
- big data
- dna damage
- electronic health record
- bioinformatics analysis