Genetic sex assignment in wild populations using genotyping-by-sequencing data: A statistical threshold approach.

William R Stovall, Helen R Taylor, Michael Black, Stefanie Grosser, Kim Rutherford, Neil J Gemmell
Published in: Molecular ecology resources (2018)
Establishing the sex of individuals in wild systems can be challenging and often requires genetic testing. Genotyping-by-sequencing (GBS) and other reduced-representation DNA sequencing (RRS) protocols (e.g., RADseq, ddRAD) have enabled the analysis of genetic data on an unprecedented scale. Here, we present a novel approach for the discovery and statistical validation of sex-specific loci in GBS data sets. We used GBS to genotype 166 New Zealand fur seals (NZFS, Arctocephalus forsteri) of known sex. We retained monomorphic loci as potential sex-specific markers in the locus discovery phase. We then used (i) a sex-specific locus threshold (SSLT) to identify significantly male-specific loci within our data set; and (ii) a significant sex-assignment threshold (SSAT) to confidently assign sex in silico, based on the presence or absence of significantly male-specific loci, to individuals in our data set treated as unknowns (98.9% accuracy for females; 95.8% for males, estimated via cross-validation). Furthermore, we assigned sex to 86 individuals of true unknown sex using our SSAT and assessed the effect of SSLT adjustments on these assignments. From 90 verified sex-specific loci, we developed a panel of three sex-specific PCR primers that we used to ascertain sex independently of our GBS data, which we show amplify reliably in at least two other pinniped species. Using monomorphic loci normally discarded from large SNP data sets is an effective way to identify robust sex-linked markers for nonmodel species. Our novel pipeline can be used to identify and statistically validate monomorphic and polymorphic sex-specific markers across a range of species and RRS data sets.
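The two-threshold idea in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not give the statistical form of the SSLT or the SSAT, so both are stood in for by fixed placeholder cut-offs. The function names, the `min_male_fraction` and `min_hits` parameters, and the input layout (a mapping from each individual to the set of loci detected in it) are all hypothetical.

```python
from typing import Dict, Set


def male_specific_loci(
    presence: Dict[str, Set[str]],
    sex: Dict[str, str],
    min_male_fraction: float = 0.5,  # placeholder standing in for the SSLT
) -> Set[str]:
    """Return loci seen in no known female and in enough known males.

    presence maps individual ID -> set of loci detected in that individual.
    sex maps individual ID -> "M" or "F" for individuals of known sex.
    """
    males = [ind for ind, s in sex.items() if s == "M"]
    females = [ind for ind, s in sex.items() if s == "F"]
    all_loci: Set[str] = set().union(*presence.values())

    selected: Set[str] = set()
    for locus in all_loci:
        in_females = sum(locus in presence[ind] for ind in females)
        in_males = sum(locus in presence[ind] for ind in males)
        # Keep the locus only if it is absent from every female and
        # present in at least the required share of males.
        if in_females == 0 and in_males >= min_male_fraction * len(males):
            selected.add(locus)
    return selected


def assign_sex(
    loci_present: Set[str],
    male_loci: Set[str],
    min_hits: int = 2,  # placeholder standing in for the SSAT
) -> str:
    """Call an individual male if it carries enough male-specific loci."""
    hits = len(loci_present & male_loci)
    return "M" if hits >= min_hits else "F"
```

In this toy version, raising `min_male_fraction` shrinks the set of male-specific loci, which in turn changes how many individuals reach `min_hits`. That is the kind of sensitivity the abstract describes assessing when the SSLT is adjusted.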
Keyphrases
  • genome wide
  • electronic health record
  • big data
  • genome wide association study
  • genetic diversity
  • high throughput
  • machine learning
  • gene expression
  • genome wide association
  • data analysis
  • single molecule