Disease category-specific annotation of variants using an ensemble learning framework.

Zhen Cao, Yanting Huang, Ran Duan, Peng Jin, Zhaohui S Qin, Tinghu Zhang
Published in: Briefings in Bioinformatics (2021)
Understanding the impact of non-coding sequence variants on complex diseases is an essential problem. We present a novel ensemble learning framework, CASAVA, to score genomic loci by disease category-specific risk. Using disease-associated variants identified by GWAS as training data, and diverse sequencing-based genomic and epigenomic profiles as features, CASAVA provides risk predictions for 24 major disease categories across the human genome. Our studies showed that the CASAVA scores at a genomic locus provide a reasonable prediction of disease-specific and disease category-specific risk for non-coding variants located within that locus. Taking MHC2TA and immune system diseases as an example, we demonstrate the potential of CASAVA in revealing variant-disease associations. A website (http://zhanglabtools.org/CASAVA) has been built to facilitate easy access to CASAVA scores.
Keyphrases
  • copy number
  • endothelial cells
  • gene expression
  • single cell
  • DNA methylation
  • risk assessment
  • deep learning
  • big data
  • electronic health record
  • climate change
  • artificial intelligence