Genetic association analysis of 77,539 genomes reveals rare disease etiologies.
Daniel Greene, Daniela Pirri, Karen Frudd, Ege Sackey, Mohammed Al-Owain, Arnaud P J Giese, Khushnooda Ramzan, Sehar Riaz, Itaru Yamanaka, Nele Boeckx, Chantal Thys, Bruce D Gelb, Paul Brennan, Verity Hartill, Julie Harvengt, Tomoki Kosho, Sahar Mansour, Mitsuo Masuno, Takako Ohata, Helen Stewart, Khalid Taibah, Claire L S Turner, Faiqa Imtiaz, Saima Riazuddin, Takayuki Morisaki, Pia Ostergaard, Bart L Loeys, Hiroko Morisaki, Zubair M Ahmed, Graeme M Birdsey, Kathleen Freson, Andrew Mumford, Ernest Turro
Published in: Nature Medicine (2023)
The genetic etiologies of more than half of rare diseases remain unknown. Standardized genome sequencing and phenotyping of large patient cohorts provide an opportunity for discovering the unknown etiologies, but this depends on efficient and powerful analytical methods. We built a compact database, the 'Rareservoir', containing the rare variant genotypes and phenotypes of 77,539 participants sequenced by the 100,000 Genomes Project. We then used the Bayesian genetic association method BeviMed to infer associations between genes and each of 269 rare disease classes assigned by clinicians to the participants. We identified 241 known and 19 previously unidentified associations. We validated associations with ERG, PMEPA1 and GPR156 by searching for pedigrees in other cohorts and using bioinformatic and experimental approaches. We provide evidence that (1) loss-of-function variants in the Erythroblast Transformation Specific (ETS)-family transcription factor encoding gene ERG lead to primary lymphoedema, (2) truncating variants in the last exon of transforming growth factor-β regulator PMEPA1 result in Loeys-Dietz syndrome and (3) loss-of-function variants in GPR156 give rise to recessive congenital hearing impairment. The Rareservoir provides a lightweight, flexible and portable system for synthesizing the genetic and phenotypic data required to study rare disease cohorts with tens of thousands of participants.
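To illustrate the general idea of Bayesian gene-level association testing mentioned above, the following is a minimal sketch of a Bayesian model comparison between "no association" and "association" for a single gene and disease class. It is not BeviMed and does not reproduce its model; it is a simplified beta-binomial comparison chosen for illustration. The function names, the uniform Beta(1, 1) priors, the prior probability of association of 0.01 and the counts in the usage lines are all assumptions, not values from the study.

```python
import math


def log_beta(x: float, y: float) -> float:
    """Natural log of the Beta function B(x, y)."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def log_marginal(cases: int, total: int, a: float = 1.0, b: float = 1.0) -> float:
    """Log marginal likelihood of `cases` affected among `total` people,
    integrating the case probability over a Beta(a, b) prior (assumed)."""
    return log_beta(a + cases, b + total - cases) - log_beta(a, b)


def posterior_association(
    carrier_cases: int,
    carriers: int,
    noncarrier_cases: int,
    noncarriers: int,
    prior: float = 0.01,  # assumed prior probability of association
) -> float:
    """Posterior probability that case status depends on carrying a rare
    variant in the gene, under a simplified two-model comparison."""
    # Null model: one case probability shared by carriers and non-carriers.
    log_null = log_marginal(
        carrier_cases + noncarrier_cases, carriers + noncarriers
    )
    # Association model: separate case probabilities for the two groups.
    log_alt = log_marginal(carrier_cases, carriers) + log_marginal(
        noncarrier_cases, noncarriers
    )
    # Posterior log-odds = log Bayes factor + prior log-odds.
    logit = (log_alt - log_null) + math.log(prior / (1.0 - prior))
    # Numerically stable logistic transform.
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)


# Hypothetical counts: 9 of 10 carriers affected vs 50 of 10,000 non-carriers.
enriched = posterior_association(9, 10, 50, 10_000)
# Hypothetical counts: 1 of 200 carriers affected, the same rate as non-carriers.
not_enriched = posterior_association(1, 200, 50, 10_000)
print(f"enriched gene: {enriched:.4f}, non-enriched gene: {not_enriched:.6f}")
```

In this sketch, a gene whose rare-variant carriers are strongly enriched among cases receives a posterior probability close to 1, while a gene with the same case rate in carriers and non-carriers falls below its prior, because the extra parameter of the association model is penalized when the data do not need it.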
Keyphrases
- copy number
- genome wide
- transcription factor
- transforming growth factor
- dna methylation
- epithelial mesenchymal transition
- genome wide identification
- emergency department
- high throughput
- dna binding
- machine learning
- big data
- signaling pathway
- intellectual disability
- autism spectrum disorder
- electronic health record
- deep learning
- adverse drug