Inbred Strain Variant Database (ISVdb): A Repository for Probabilistically Informed Sequence Differences Among the Collaborative Cross Strains and Their Founders.

Daniel Oreper, Yanwei Cai, Lisa M Tarantino, Fernando Pardo-Manuel de Villena, William Valdar
Published in: G3 (Bethesda, Md.) (2017)
The Collaborative Cross (CC) is a panel of recently established multiparental recombinant inbred mouse strains. For the CC, as for any multiparental population (MPP), effective experimental design and analysis benefit from detailed knowledge of the genetic differences between strains. Such differences can be directly determined by sequencing, but until now whole-genome sequencing was not publicly available for individual CC strains. An alternative and complementary approach is to infer genetic differences by combining two pieces of information: probabilistic estimates of the CC haplotype mosaic from a custom genotyping array, and probabilistic variant calls from sequencing of the CC founders. The computation for this inference, especially when performed genome-wide, can be intricate and time-consuming, requiring the researcher to generate nontrivial and potentially error-prone scripts. To provide standardized, easy-to-access CC sequence information, we have developed the Inbred Strain Variant Database (ISVdb). The ISVdb provides, for all the exonic variants from the Sanger Institute mouse sequencing dataset, direct sequence information for CC founders and, critically, the imputed sequence information for CC strains. Notably, the ISVdb also: (1) provides predicted variant consequence metadata; (2) allows rapid simulation of F1 populations; and (3) preserves imputation uncertainty, which will allow imputed data to be refined in the future as additional sequencing and genotyping data are collected. The ISVdb information is housed in an SQL database and is easily accessible through a custom online interface (http://isvdb.unc.edu), reducing the analytic burden on any researcher using the CC.
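The inference described in the abstract, combining probabilistic haplotype estimates for a CC strain with probabilistic variant calls for the founders, can be illustrated with a minimal sketch. This is not the ISVdb implementation: the function name `impute_allele_probs`, the dictionary layout, and every probability value below are hypothetical and chosen only for illustration. The sketch also assumes the strain is fully inbred at the locus, so a single founder haplotype (rather than a pair) is considered.

```python
from collections import defaultdict


def impute_allele_probs(haplotype_probs, founder_allele_probs):
    """Marginalize founder allele calls over a strain's haplotype uncertainty.

    haplotype_probs: {founder: P(strain carries this founder's haplotype at the locus)}
    founder_allele_probs: {founder: {allele: P(founder carries this allele)}}
    Returns {allele: P(strain carries this allele)}.
    Both input layouts are hypothetical, not the ISVdb schema.
    """
    total = sum(haplotype_probs.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError("haplotype probabilities must sum to 1")

    imputed = defaultdict(float)
    for founder, hap_prob in haplotype_probs.items():
        for allele, call_prob in founder_allele_probs[founder].items():
            # Law of total probability: weight each founder's call
            # by how likely the strain descends from that founder here.
            imputed[allele] += hap_prob * call_prob
    return dict(imputed)


# Illustrative values only: two of the eight CC founders, one variant.
haplotype_probs = {"A/J": 0.9, "C57BL/6J": 0.1}
founder_allele_probs = {
    "A/J": {"A": 0.98, "G": 0.02},
    "C57BL/6J": {"A": 0.0, "G": 1.0},
}

result = impute_allele_probs(haplotype_probs, founder_allele_probs)
print(result)
```

Returning the full allele distribution, rather than only the most likely allele, mirrors the abstract's point about preserving imputation uncertainty: the same probabilities can be recomputed when better haplotype or founder data become available.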
Keyphrases
  • genome wide
  • escherichia coli
  • health information
  • single cell
  • copy number
  • gene expression
  • big data
  • amino acid
  • high resolution
  • machine learning
  • deep learning
  • data analysis
  • quantum dots