Predicting the direction of phenotypic difference.

David Gokhman, Keith D Harris, Shai Carmi, Gili Greenbaum
Published in: bioRxiv : the preprint server for biology (2024)
Predicting phenotypes from genomic data is a key goal in genetics, but for most complex phenotypes, predictions are hampered by incomplete genotype-to-phenotype mapping. Here, we describe a more attainable approach than quantitative predictions, which is aimed at qualitatively predicting phenotypic differences. Despite incomplete genotype-to-phenotype mapping, we show that it is relatively easy to determine which of two individuals has a greater phenotypic value. This question is central in many scenarios, e.g., comparing disease risk between individuals, the yield of crop strains, or the anatomy of extinct vs extant species. To evaluate prediction accuracy, i.e., the probability that the individual with the greater predicted phenotype indeed has a greater phenotypic value, we developed an estimator of the ratio between known and unknown effects on the phenotype. We evaluated prediction accuracy using human data from tens of thousands of individuals from either the same family or the same population, as well as data from different species. We found that, in many cases, even when only a small fraction of the loci affecting a phenotype is known, the individual with the greater phenotypic value can be identified with over 90% accuracy. Our approach also circumvents some of the limitations in transferring genetic association results across populations. Overall, we introduce an approach that enables accurate predictions of key information on phenotypes - the direction of phenotypic difference - and suggest that more phenotypic information can be extracted from genomic data than previously appreciated.
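The question of which of two individuals has the greater phenotypic value can be illustrated with a toy model. The sketch below is not the authors' estimator. It assumes, purely for illustration, that the phenotypic difference between two individuals is the sum of a known component K and an unknown component U, both independent zero-mean Gaussians, and that the prediction is the sign of K. The function names and parameters (`known_sd`, `unknown_sd`, `known_diff`) are hypothetical.

```python
import math
import random


def theoretical_accuracy(known_sd: float, unknown_sd: float) -> float:
    """P(sign(K) == sign(K + U)) under the assumed Gaussian toy model.

    Uses the bivariate-normal sign-agreement formula with
    rho = corr(K, K + U) = known_sd / sqrt(known_sd^2 + unknown_sd^2).
    """
    rho = known_sd / math.hypot(known_sd, unknown_sd)
    return 0.5 + math.asin(rho) / math.pi


def conditional_accuracy(known_diff: float, unknown_sd: float) -> float:
    """P(K + U has the sign of K) for one pair with observed K = known_diff."""
    if unknown_sd == 0:
        return 1.0
    z = abs(known_diff) / unknown_sd
    # Standard normal CDF evaluated at z, written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def simulated_accuracy(known_sd: float, unknown_sd: float,
                       n_pairs: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo check of theoretical_accuracy under the same assumptions."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_pairs):
        k = rng.gauss(0.0, known_sd)    # known part of the difference
        u = rng.gauss(0.0, unknown_sd)  # unknown part of the difference
        if (k > 0) == (k + u > 0):
            correct += 1
    return correct / n_pairs


if __name__ == "__main__":
    print(theoretical_accuracy(1.0, 1.0))
    print(simulated_accuracy(1.0, 1.0))
    print(conditional_accuracy(2.0, 1.0))
```

Under these assumptions, equal known and unknown standard deviations give an average accuracy of 0.75 across all pairs, while a single pair whose known difference is twice the unknown standard deviation is ordered correctly with probability of about 0.977. This suggests one way in which accuracy for a given pair can be high even when much of the genotype-to-phenotype map is missing: what matters in the toy model is the size of the known difference relative to the spread of the unknown effects.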
Keyphrases
  • electronic health record
  • high resolution
  • big data
  • copy number
  • escherichia coli
  • endothelial cells
  • healthcare
  • genome wide
  • machine learning
  • high density
  • data analysis
  • deep learning