ExAgBov: A public database of annotated variations from hundreds of bovine whole-exome sequencing samples.
Rotem Raz, Zvi Roth, Moran Gershoni. Published in: Scientific Data (2022)
Large reference datasets of annotated genetic variation from genome-scale sequencing are essential for interpreting identified variants, their functional impact, and their possible contribution to diseases and traits. However, to date, no such database of annotated variation from broad cattle populations has been publicly available. To close this gap and advance bovine NGS-driven variant discovery and interpretation, we obtained and analyzed raw data deposited in the public SRA repository. Short reads from 262 whole-exome sequencing samples of Bos taurus were mapped to the Bos taurus ARS-UCD1.2 reference genome. Variants were called following the GATK Best Practices workflow, and all recorded variants were comprehensively annotated with the Ensembl Variant Effect Predictor (VEP). An in-depth analysis of the population structure revealed the breeds comprising the database. The Exomes Aggregate of Bovine (ExAgBov) is a comprehensively annotated dataset of more than 20 million short variants, of which ~2% are located within open reading frames, splice regions, and UTRs, and more than 60,000 variants are predicted to be deleterious.
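As an illustration of how a VEP-annotated variant record might be checked for the categories named above (open reading frames, splice regions, UTRs), the following is a minimal Python sketch. It assumes the annotations are stored in the VCF INFO field under the `CSQ` key, as VEP does when writing VCF output, with the sub-field order declared in the VCF header. The function names, the example header and record, and the particular grouping of Sequence Ontology terms in `GENIC_TERMS` are assumptions made for illustration; they are not taken from the ExAgBov pipeline.

```python
import re

# Sequence Ontology consequence terms grouped here as "ORF, splice region, or UTR".
# This grouping is an assumption for illustration, not the authors' definition.
GENIC_TERMS = {
    "missense_variant", "synonymous_variant", "stop_gained", "frameshift_variant",
    "inframe_insertion", "inframe_deletion", "splice_region_variant",
    "5_prime_UTR_variant", "3_prime_UTR_variant",
}


def csq_fields(header_line):
    """Extract the ordered CSQ sub-field names from the VEP ##INFO header line."""
    match = re.search(r'Format: ([^"]+)"', header_line)
    if match is None:
        raise ValueError("no CSQ format found in header line")
    return match.group(1).strip().split("|")


def consequences(info, fields):
    """Return all consequence terms found in the CSQ entries of one INFO string."""
    terms = set()
    for item in info.split(";"):
        if item.startswith("CSQ="):
            # One comma-separated entry per annotated transcript or feature.
            for entry in item[4:].split(","):
                values = dict(zip(fields, entry.split("|")))
                # Several terms for one feature are joined with '&'.
                terms.update(values.get("Consequence", "").split("&"))
    terms.discard("")
    return terms


def is_genic(info, fields):
    """True if any annotation falls in the assumed ORF/splice/UTR term set."""
    return bool(consequences(info, fields) & GENIC_TERMS)


# Hypothetical header and record, shortened to four CSQ sub-fields.
header = ('##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence '
          'annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL">')
fields = csq_fields(header)
info = ("AC=2;AF=0.01;CSQ=A|missense_variant&splice_region_variant|MODERATE|GENE1,"
        "A|intron_variant|MODIFIER|GENE2")
print(is_genic(info, fields))
```

Counting the records for which `is_genic` is true across the whole dataset would give a fraction comparable in spirit to the ~2% reported above, although the exact figure depends on which consequence terms are included.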