The frequency of pathogenic variation in the All of Us cohort reveals ancestry-driven disparities.
Eric Venner, Karynne E Patterson, Divya Kalra, Marsha M Wheeler, Yi-Ju Chen, Sara E Kalla, Bo Yuan, Jason H Karnes, Kimberly Walker, Joshua D Smith, Sean McGee, Aparna Radhakrishnan, Andrew Haddad, Philip E Empey, Qiaoyan Wang, Lee Lichtenstein, Diana Toledo, Gail Jarvik, Anjene M Addington, Richard A Gibbs
Published in: Communications Biology (2024)
Disparities in the data underlying clinical genomic interpretation are an acknowledged problem, but there is a paucity of data demonstrating them. The All of Us Research Program is collecting data including whole-genome sequences, health records, and surveys for at least a million participants with diverse ancestry and access to healthcare, representing one of the largest biomedical research repositories of its kind. Here, we examine pathogenic and likely pathogenic variants that were identified in the All of Us cohort. The European ancestry subgroup showed the highest overall rate of pathogenic variation, with 2.26% of participants having a pathogenic variant. Other ancestry groups had lower rates of pathogenic variation, including 1.62% in the African ancestry group and 1.32% in the Latino/Admixed American ancestry group. Pathogenic variants were most frequently observed in genes related to Breast/Ovarian Cancer or Hypercholesterolemia. Variant frequencies in many genes were consistent with the data from the public gnomAD database, with some notable exceptions resolved using gnomAD subsets. Differences in pathogenic variant frequency observed between ancestral groups generally indicate ascertainment biases in knowledge about those variants, but some deviations may be indicative of differences in disease prevalence. This work will allow precision medicine efforts to target the revealed disparities.
Keyphrases
- healthcare
- electronic health record
- big data
- copy number
- clinical trial
- public health
- genome wide association study
- cardiovascular disease
- gene expression
- type diabetes
- quality improvement
- risk factors
- dna methylation
- climate change
- social media
- artificial intelligence
- affordable care act
- deep learning
- study protocol
- bioinformatics analysis
- genome wide analysis
- drug induced