ACDC, a global database of amphibian cytochrome-b sequences using reproducible curation for GenBank records.

Matthijs P van den Burg, Salvador Herrando-Pérez, David R Vieites
Published in: Scientific data (2020)
Genetic data are a crucial and exponentially growing resource across all biological sciences, yet curated databases are scarce. The widespread occurrence of sequence and (meta)data errors in public repositories calls for comprehensive improvements of curation protocols leading to robust research and downstream analyses. We collated and curated all available GenBank cytochrome-b sequences for amphibians, a benchmark marker in this globally declining vertebrate clade. The Amphibia's Curated Database of Cytochrome-b (ACDC) consists of 36,514 sequences representing 2,309 species from 398 genera (median = 2 species/genus, with an interquartile range of 1-7). We updated the taxonomic identity of >4,800 sequences (ca. 13%) and found 2,359 (6%) conflicting sequences, with 84% of the errors originating from taxonomic misidentifications. The database (accessible at https://doi.org/10.6084/m9.figshare.9944759) also includes an R script to replicate our study for other loci and taxonomic groups. We provide recommendations to improve genetic-data quality in public repositories and flag species for which there is a need for taxonomic refinement in the face of the increased rate of amphibian extinctions in the Anthropocene.
Keyphrases