Inferring compound heterozygosity from large-scale exome sequencing data.
Michael H Guo, Laurent C Francioli, Sarah L Stenton, Julia K Goodrich, Nicholas A Watts, Moriel H Singer-Berk, Emily Groopman, Philip W Darnowsky, Matthew Solomonson, Samantha Baxter, Grace Tiao, Benjamin M Neale, Joel N Hirschhorn, Michael J Bamshad, Mark J Daly, Anne H O'Donnell-Luria, Konrad J Karczewski, Daniel G MacArthur, Kaitlin E Samocha
Published in: bioRxiv: the preprint server for biology (2023)
Severe recessive diseases arise when both the maternal and the paternal copies of a gene carry, or are impacted by, a damaging genetic variant in the affected individual. When a patient carries two different potentially causal variants, accurate diagnosis requires determining that these two variants occur on different copies of the chromosome (i.e., are in trans) rather than on the same copy (i.e., in cis). However, current approaches for determining phase, beyond parental testing, are limited in clinical settings. We developed a strategy for inferring phase for rare variant pairs within genes, leveraging haplotype patterns observed in exome sequencing data from the Genome Aggregation Database (gnomAD v2, n = 125,748). When applied to trio data where phase is known, our approach estimates phase with high accuracy, even for very rare variants (frequency <1×10⁻⁴), and also correctly phases 95.2% of variant pairs in a set of 293 patients carrying presumed causal compound heterozygous variants. We provide a public resource of phasing estimates from gnomAD, including phasing estimates for coding variants across the genome and counts per gene of rare variants in trans, that can aid interpretation of rare co-occurring variants in the context of recessive disease.
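The abstract does not spell out how haplotype patterns are turned into a phase estimate. As a rough illustration only, the sketch below assumes one common approach: estimate the four haplotype frequencies for a variant pair from a 3×3 table of population genotype counts using expectation-maximization (EM), then compare the probability of the trans configuration with that of the cis configuration. The function names, the count layout, and the example numbers are hypothetical and are not taken from the paper or from any gnomAD API.

```python
# Minimal sketch, assuming an EM-based haplotype frequency estimate.
# counts[i][j] = number of individuals carrying i alternate alleles at
# variant 1 and j alternate alleles at variant 2 (i, j in {0, 1, 2}).
# Haplotypes are indexed as: 0 = ref/ref, 1 = ref/alt, 2 = alt/ref, 3 = alt/alt.

def estimate_haplotype_freqs(counts, n_iter=500, tol=1e-12):
    n = counts
    total_haps = 2 * sum(sum(row) for row in n)
    # Haplotype counts that are unambiguous (everything except double hets).
    base = [
        2 * n[0][0] + n[0][1] + n[1][0],  # ref/ref
        2 * n[0][2] + n[0][1] + n[1][2],  # ref/alt
        2 * n[2][0] + n[1][0] + n[2][1],  # alt/ref
        2 * n[2][2] + n[1][2] + n[2][1],  # alt/alt
    ]
    double_hets = n[1][1]  # phase-ambiguous individuals
    f = [0.25, 0.25, 0.25, 0.25]
    for _ in range(n_iter):
        # E-step: split double hets between cis and trans configurations.
        cis = f[0] * f[3]
        trans = f[1] * f[2]
        denom = cis + trans
        p_cis = cis / denom if denom > 0 else 0.5
        expected = [
            base[0] + double_hets * p_cis,
            base[1] + double_hets * (1 - p_cis),
            base[2] + double_hets * (1 - p_cis),
            base[3] + double_hets * p_cis,
        ]
        # M-step: re-estimate haplotype frequencies.
        new_f = [c / total_haps for c in expected]
        converged = max(abs(a - b) for a, b in zip(f, new_f)) < tol
        f = new_f
        if converged:
            break
    return f


def prob_trans(f):
    """Probability that a double heterozygote carries the variants in trans."""
    trans = f[1] * f[2]
    cis = f[0] * f[3]
    denom = trans + cis
    return trans / denom if denom > 0 else float("nan")


# Hypothetical counts: the two variants are each seen in 10 carriers
# but never together, which points towards trans.
never_together = [[9980, 10, 0], [10, 0, 0], [0, 0, 0]]
# Hypothetical counts: the two variants are only ever seen together,
# which points towards cis.
always_together = [[9990, 0, 0], [0, 10, 0], [0, 0, 0]]

p_never = prob_trans(estimate_haplotype_freqs(never_together))
p_always = prob_trans(estimate_haplotype_freqs(always_together))
print(p_never, p_always)
```

In this toy setup, a pair that never co-occurs in the reference population gets a high trans probability, while a pair that always co-occurs gets a low one. With very rare variants the counts are small, so estimates of this kind are inherently noisy.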