Who are you? A framework to identify and report genetic sample mix-ups.

Laura Duntsch, Patricia Brekke, John G Ewen, Anna W Santure
Published in: Molecular Ecology Resources (2022)
Sample mix-ups occur when samples have accidentally been duplicated, mislabelled or swapped. When samples are subsequently genotyped or sequenced, this can lead to individual IDs being incorrectly linked to genetic data, resulting in incorrect or biased research results, or reduced power to detect true biological patterns. We surveyed the community and found that almost 80% of responding researchers have encountered sample mix-ups. However, many recent studies in the field of molecular ecology do not appear to systematically report individual assignment checks as part of their publications. Although checks may be done, lack of consistent reporting means that it is difficult to assess whether sample mix-ups have occurred or been detected. Here, we present an easy-to-follow sample verification framework that can utilise existing metadata, including species, population structure, sex and pedigree information. We demonstrate its application to a data set representing individuals of a threatened Aotearoa New Zealand bird species, the hihi, genotyped on a 50K SNP array. We detected numerous incorrect genotype-ID associations when comparing observed and genetic sex or comparing to relationships in a verified microsatellite pedigree. The framework proposed here helped to confirm 488 individuals (39%), correct another 20 bird-genotype links, and detect hundreds of incorrect sample IDs, emphasizing the value of routinely checking genetic and genomic data sets for their accuracy. We therefore promote the implementation and reporting of this simple yet effective sample verification framework as a standardized quality control step for studies in the field of molecular ecology.
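One of the checks named in the abstract, comparing observed sex against genetic sex, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name, the sample IDs, the "M"/"F" coding and the three report categories are all assumptions made for illustration, and the genetic sex calls are assumed to have been inferred beforehand from the genotype data.

```python
from typing import Dict, List, Optional


def flag_sex_mismatches(
    observed: Dict[str, Optional[str]],
    genetic: Dict[str, Optional[str]],
) -> Dict[str, List[str]]:
    """Sort sample IDs into confirmed, mismatch, or unverifiable.

    Hypothetical helper: `observed` maps sample ID to field-recorded sex,
    `genetic` maps sample ID to sex inferred from genotypes. None means
    the value is unknown.
    """
    report: Dict[str, List[str]] = {
        "confirmed": [],
        "mismatch": [],
        "unverifiable": [],
    }
    # Walk over every ID seen in either source so no sample is silently dropped.
    for sample_id in sorted(observed.keys() | genetic.keys()):
        obs = observed.get(sample_id)
        gen = genetic.get(sample_id)
        if obs is None or gen is None:
            # Missing metadata or missing genetic call: cannot be checked.
            report["unverifiable"].append(sample_id)
        elif obs == gen:
            report["confirmed"].append(sample_id)
        else:
            # Disagreement marks a candidate mix-up that needs follow-up.
            report["mismatch"].append(sample_id)
    return report


# Made-up example records, not data from the study.
observed_sex = {"bird_01": "F", "bird_02": "M", "bird_03": None}
genetic_sex = {"bird_01": "F", "bird_02": "F", "bird_03": "M"}
report = flag_sex_mismatches(observed_sex, genetic_sex)
```

A sex mismatch alone only flags a candidate mix-up. In the framework described above it would be weighed together with the other metadata checks, such as pedigree relationships, before a genotype-ID link is corrected or discarded.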
Keyphrases
  • genome wide
  • copy number
  • healthcare
  • quality control
  • electronic health record
  • primary care
  • mental health
  • emergency department
  • adverse drug
  • genetic diversity
  • machine learning
  • artificial intelligence
  • single cell