Incremental data integration for tracking genotype-disease associations.

Tomasz Konopka, Damian Smedley
Published in: PLoS computational biology (2020)
Functional annotation of genes remains a challenge in fundamental biology and is a limiting factor for translational medicine. Computational approaches have been developed to process heterogeneous data into meaningful metrics, but often do not address how findings might be updated when new evidence comes to light. To address this challenge, we describe requirements for a framework for incremental data integration and propose an implementation based on phenotype ontologies and Bayesian probability updates. We apply the framework to quantify similarities between gene annotations and disease profiles. Within this scope, we categorize human diseases according to how well they can be recapitulated by animal models and quantify similarities between human diseases and mouse models produced by the International Mouse Phenotyping Consortium. The flexibility of the approach allows us to incorporate negative phenotypic data to better prioritize candidate genes, and to stratify disease mapping using sex-dependent phenotypes. All our association scores can be updated and we exploit this feature to showcase integration with curated annotations from high-precision assays. Incremental integration is thus a suitable framework for tracking functional annotations and linking to complex human pathology.
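The idea of updating an association score as evidence arrives can be illustrated with a generic Bayesian update. The sketch below is not the authors' implementation: the function names, the use of a true-positive and false-positive rate per piece of evidence, and the numeric values are all assumptions chosen for illustration.

```python
import math


def update_probability(prior, tpr, fpr):
    """Apply Bayes' rule once.

    prior: current probability that the gene-disease association is real.
    tpr:   assumed probability of seeing this evidence if the association is real.
    fpr:   assumed probability of seeing this evidence if it is not.
    """
    numerator = prior * tpr
    denominator = numerator + (1.0 - prior) * fpr
    return numerator / denominator


def integrate(prior, evidence):
    """Fold a sequence of (tpr, fpr) pairs into the score, one at a time."""
    posterior = prior
    for tpr, fpr in evidence:
        posterior = update_probability(posterior, tpr, fpr)
    return posterior


# Hypothetical evidence: one matching phenotype (tpr > fpr) and one
# negative phenotype (tpr < fpr), which lowers the score.
supporting = (0.8, 0.2)
negative = (0.3, 0.6)

score_batch = integrate(0.1, [supporting, negative])
# The same result is reached by storing the intermediate score and
# updating it later, which is what makes the scheme incremental.
score_stepwise = integrate(integrate(0.1, [supporting]), [negative])
assert math.isclose(score_batch, score_stepwise)
```

Because each update only needs the current score and the new evidence, earlier data never has to be reprocessed. Under the independence assumption made here, the final score also does not depend on the order in which evidence arrives.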
Keyphrases
  • endothelial cells
  • electronic health record
  • big data
  • induced pluripotent stem cells
  • machine learning
  • healthcare
  • primary care
  • genome wide
  • gene expression
  • deep learning
  • copy number