Hypothesis-free phenotype prediction within a genetics-first framework.

Chang Lu, Jan Zaucha, Rihab Gam, Hai Fang, Ben Smithers, Matt E Oates, Miguel Bernabe-Rubio, James Williams, Natalie Thurlby, Arun Prasad Pandurangan, Himani Tandon, Hashem Shihab, Raju Kalaivani, Minkyung Sung, Adam J Sardar, Bastian Greshake Tzovoras, Davide Danovi, Julian Gough

Published in: Nature Communications (2023)

Cohort-wide sequencing studies have revealed that the largest category of variants is those deemed 'rare', even for the subset located in coding regions (99% of known coding variants are seen in less than 1% of the population). Associative methods give some understanding of how rare genetic variants influence disease and organism-level phenotypes. But here we show that additional discoveries can be made through a knowledge-based approach using protein domains and ontologies (function and phenotype) that considers all coding variants regardless of allele frequency. We describe an ab initio, genetics-first method that makes molecular knowledge-based interpretations of exome-wide non-synonymous variants for phenotypes at the organism and cellular level. By using this reverse approach, we identify plausible genetic causes for developmental disorders that have eluded other established methods and present molecular hypotheses for the causal genetics of 40 phenotypes generated from a direct-to-consumer genotype cohort. This system offers a chance to extract further discovery from genetic data after standard tools have been applied.

Keyphrases