The importance of timely metadata curation to the global surveillance of genetic diversity.

Eric D Crandall, Rachel H Toczydlowski, Libby Liggins, Ann E Holmes, Maryam Ghoojaei, Michelle R Gaither, Briana E Wham, Andrea L Pritt, Cory Noble, Tanner J Anderson, Randi L Barton, Justin T Berg, Sofia G Beskid, Alonso Delgado, Emily Farrell, Nan Himmelsbach, Samantha R Queeno, Thienthanh Trinh, Courtney Weyand, Andrew Bentley, John Deck, Cynthia Riginos, Gideon S Bradburd, Robert J Toonen
Published in: Conservation biology : the journal of the Society for Conservation Biology (2023)
Genetic diversity within species represents a fundamental yet underappreciated level of biodiversity. Because genetic diversity can indicate species resilience to changing climate, its measurement is relevant to many national and global conservation policy targets. Many studies produce large amounts of genome-scale genetic diversity data for wild populations, but most (87%) do not include the associated spatial and temporal metadata necessary for them to be reused in monitoring programs or for acknowledging the sovereignty of nations or Indigenous Peoples. We undertook a "distributed datathon" to quantify the availability of these missing metadata and to test the hypothesis that their availability decays with time. We also worked to remediate missing metadata by extracting them from associated published papers, online repositories, and from direct communication with authors. Starting with 848 candidate genomic datasets (reduced representation and whole genome) from the International Nucleotide Sequence Database Collaboration, we determined that 561 contained mostly samples from wild populations. We successfully restored spatiotemporal metadata for 78% of these 561 datasets (N = 440 datasets comprising 45,105 individuals from 762 species in 17 phyla). Looking at papers and online repositories was much more fruitful than contacting authors, who only replied to our email requests 45% of the time. Overall, 23% of our email queries to authors unearthed useful metadata. Importantly, we found that the probability of retrieving spatiotemporal metadata declined significantly with the age of the dataset, with a 13.5% yearly decrease for metadata located in published papers or online repositories and up to a 22% yearly decrease for metadata that were only available from authors. 
This rapid decay in metadata availability, mirrored in studies of other types of biological data, should motivate swift updates to data sharing policies and researcher practices to ensure that the valuable context provided by metadata is not lost to conservation science forever.