Poor data stewardship will hinder global genetic diversity surveillance.

Rachel H Toczydlowski, Libby Liggins, Michelle R Gaither, Tanner J Anderson, Randi L Barton, Justin T Berg, Sofia G Beskid, Beth Davis, Alonso Delgado, Emily Farrell, Maryam Ghoojaei, Nan Himmelsbach, Ann E Holmes, Samantha R Queeno, Thienthanh Trinh, Courtney A Weyand, Gideon S Bradburd, Cynthia Riginos, Robert J Toonen, Eric D Crandall
Published in: Proceedings of the National Academy of Sciences of the United States of America (2021)
Genomic data are being produced and archived at a prodigious rate, and current studies could become historical baselines for future global genetic diversity analyses and monitoring programs. However, when we evaluated the potential utility of genomic data from wild and domesticated eukaryote species in the world's largest genomic data repository, we found that most archived genomic datasets (86%) lacked the spatiotemporal metadata necessary for genetic biodiversity surveillance. Labor-intensive scouring of a subset of published papers yielded geospatial coordinates and collection years for only 33% (39% if place names were considered) of these genomic datasets. Streamlined data input processes, updated metadata deposition policies, and enhanced scientific community awareness are urgently needed to preserve these irreplaceable records of today's genetic biodiversity and to plug the growing metadata gap.
Keyphrases
  • genetic diversity
  • copy number
  • electronic health record
  • public health
  • big data
  • genome wide
  • mental health
  • randomized controlled trial
  • systematic review
  • gene expression
  • risk assessment
  • rna seq