Genome-wide Analysis of Rare Haplotypes Associated with Breast Cancer Risk.

Fan Wang, Won Jong Moon, William Letsou, Yadav Sapkota, Zhaoming Wang, Cindy Im, Jessica L Baedke, Leslie L Robison, Yutaka Yasui
Published in: Cancer research (2022)
Numerous common genetic variants have been linked to breast cancer (BCa) risk, but they only partially explain the total BCa heritability. Inference from Nordic population-based twin data indicates rare high-risk loci as the chief determinant of BCa risk. Here, we use haplotypes, rather than single variants, to identify rare high-risk loci for BCa. With computationally-phased genotypes from 181,034 white British women in the UK Biobank, a genome-wide haplotype-BCa association analysis was conducted using sliding windows of 5-500 consecutive array-genotyped variants. In the discovery stage, haplotype-BCa associations were evaluated retrospectively in the pre-study-enrollment data including 5,487 BCa cases. BCa hazard ratios (HRs) for additive haplotypic effects were estimated using Cox regression. The replication analysis included a prospective cohort of women free of BCa at enrollment, of whom 3,524 later developed BCa. This two-stage analysis detected 13 rare loci (frequency <1%), each associated with an appreciable BCa-risk increase (discovery: HRs=2.84-6.10, P-value<5×10⁻⁸; replication: HRs=2.08-5.61, P-value<0.01). In contrast, the variants that formed these rare haplotypes individually exhibited much smaller effects. Functional annotation revealed extensive cis-regulatory DNA elements in BCa-related cells underlying the replicated rare haplotypes. Using phased, imputed genotypes from 30,064 cases and 25,282 controls in the DRIVE OncoArray case-control study, six of the 13 rare-loci associations were found generalizable (odds ratio estimates: 1.48-7.67, P-value<0.05). This study demonstrates the complementary advantage of utilizing rare haplotypes to capture novel risk loci and suggests the potential for the discovery of more genetic elements contributing to cancer heritability as large datasets of germline whole-genome sequencing become available.
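The sliding-window step described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `rare_window_haplotypes` and `additive_dosage` are hypothetical, and coding each phased chromosome as a string of '0'/'1' alleles is an assumption made for illustration. The sketch enumerates the haplotypes seen in each window of consecutive variants, keeps those below a frequency cutoff (1% here, matching the abstract's definition of rare), and counts how many copies one individual carries, which is the covariate an additive model would use. The Cox regression itself is not shown.

```python
from collections import Counter


def rare_window_haplotypes(haplotypes, window, max_freq=0.01):
    """Yield (start, haplotype, frequency) for each rare haplotype per window.

    haplotypes: equal-length strings of '0'/'1' alleles, one string per
    phased chromosome (an assumed toy encoding, not a real file format).
    window: number of consecutive variants in each sliding window.
    """
    n_chrom = len(haplotypes)
    n_variants = len(haplotypes[0])
    for start in range(n_variants - window + 1):
        # Count every distinct allele pattern observed in this window.
        counts = Counter(h[start:start + window] for h in haplotypes)
        for hap, count in sorted(counts.items()):
            freq = count / n_chrom
            if freq < max_freq:
                yield start, hap, freq


def additive_dosage(chromosome_pair, start, target):
    """Return how many copies (0, 1, or 2) of `target` one individual carries."""
    end = start + len(target)
    return sum(h[start:end] == target for h in chromosome_pair)


if __name__ == "__main__":
    # Toy data: 200 chromosomes, one of which carries a distinct pattern.
    toy = ["0000"] * 199 + ["0110"]
    for start, hap, freq in rare_window_haplotypes(toy, window=2):
        print(start, hap, freq)
    print(additive_dosage(("0110", "0000"), start=1, target="11"))
```

In the study the window length ranged from 5 to 500 variants, so a scan of this kind would be repeated for each window size; the toy example uses a window of 2 only to keep the data readable.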