Current stewardship practices in invasion biology limit the value and secondary use of genomic data.
Amy L Vaughan, Elahe Parvizi, Paige Matheson, Angela McGaughran, Manpreet K Dhami. Published in: Molecular ecology resources (2023)
Invasive species threaten native biota, putting fragile ecosystems at risk and having a large-scale impact on primary industries. Growing trade networks and the popularity of personal travel make incursions a more frequent risk, one only compounded by global climate change. The increasing publication of whole-genome sequences offers an opportunity for cross-species assessment of invasive potential. However, the degree to which published sequences are accompanied by satisfactory spatiotemporal data is unclear. We assessed the metadata associated with 199 whole-genome assemblies of 89 invasive terrestrial invertebrate species and found that only 38% of these were derived from field-collected samples. Seventy-six assemblies (38%) reported an 'undescribed' sample origin and, while further examination of associated literature narrowed this gap to 23.6%, an absence of spatial data remained for 47 of the total assemblies. Of the 76 assemblies that were ultimately determined to be field-collected, associated metadata relevant for invasion studies was predominantly lacking: only 35% (27 assemblies) provided granular location data, and 33% (n = 25) lacked sufficient collection date information. Our results support recent calls for standardized metadata in genome sequencing data submissions, highlighting the impact of missing metadata on current research in invasion biology (and likely other fields). Notably, large-scale consortia tended to provide the most complete metadata submissions in our analysis; such cross-institutional collaborations can foster a culture of increased adherence to improved metadata submission standards and a standard of metadata stewardship that enables reuse of genomes in invasion science.