A novel workflow to improve genotyping of multigene families in wildlife species: An experimental set-up with a known model system.

Mark A F Gillingham, B Karina Montero, Kerstin Wilhelm, Kara Grudzus, Simone Sommer, Pablo S C Santos
Published in: Molecular Ecology Resources (2020)
Genotyping complex multigene families in novel systems is particularly challenging. Target primers frequently amplify multiple loci simultaneously, leading to high rates of PCR and sequencing artefacts such as chimeras and allele amplification bias. Most genotyping pipelines have been validated in nonmodel systems in which the real genotype is unknown and the generation of artefacts may be highly repeatable. Further hindering accurate genotyping, the relationship between artefacts and genotype complexity (i.e. number of alleles per genotype) within a PCR remains poorly described. Here, we investigated the latter by experimentally combining multiple known major histocompatibility complex (MHC) haplotypes of a model organism (chicken, Gallus gallus, 43 artificial genotypes with 2-13 alleles per amplicon). In addition to well-defined 'optimal' primers, we simulated a nonmodel species situation by designing 'cross-species' primers based on sequence data from closely related Galliform species. We applied a novel open-source genotyping pipeline (ACACIA; https://gitlab.com/psc_santos/ACACIA), and compared its performance with another, previously published pipeline (AmpliSAS). Allele calling accuracy was higher when using ACACIA (98.5% versus 97% and 77.8% versus 75% for the 'optimal' and 'cross-species' data sets, respectively). Systematic allele dropout of three alleles owing to primer mismatch in the 'cross-species' data set explained high allele calling repeatability (100% when using ACACIA) despite low accuracy, demonstrating that repeatability can be misleading when evaluating genotyping workflows. Genotype complexity was positively associated with nonchimeric artefacts, chimeric artefacts (nonlinearly, levelling off when amplifying more than 4-6 alleles) and allele amplification bias. Our study demonstrates pitfalls that researchers should avoid in order to reliably genotype complex multigene families.
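The point that high repeatability can coexist with low accuracy can be illustrated with a minimal sketch. The metric definitions below are assumptions made for illustration only (a Jaccard-style allele accuracy and a pairwise replicate agreement); they are not taken from the paper or from ACACIA or AmpliSAS, and the allele names and genotype are hypothetical.

```python
from itertools import combinations


def allele_accuracy(true_alleles, called_alleles):
    """Illustrative accuracy: shared alleles divided by the union of true and called alleles."""
    true_set, called_set = set(true_alleles), set(called_alleles)
    union = true_set | called_set
    return len(true_set & called_set) / len(union) if union else 1.0


def repeatability(replicate_calls):
    """Illustrative repeatability: fraction of replicate pairs with identical allele calls."""
    pairs = list(combinations([frozenset(c) for c in replicate_calls], 2))
    if not pairs:
        raise ValueError("need at least two replicates")
    return sum(a == b for a, b in pairs) / len(pairs)


# Hypothetical genotype with four true alleles; "A4" systematically drops out
# (for example because of a primer mismatch), so every replicate misses it.
true_genotype = {"A1", "A2", "A3", "A4"}
replicates = [{"A1", "A2", "A3"}, {"A1", "A2", "A3"}]

rep = repeatability(replicates)
acc = sum(allele_accuracy(true_genotype, r) for r in replicates) / len(replicates)
print(f"repeatability={rep:.2f}, accuracy={acc:.2f}")  # repeatability=1.00, accuracy=0.75
```

Because the dropout is systematic, both replicates agree perfectly with each other while still missing a true allele, which is why agreement between replicates alone cannot validate a workflow when the true genotype is unknown.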
Keyphrases
  • genetic diversity
  • genome wide
  • high throughput
  • electronic health record
  • DNA methylation
  • single cell
  • machine learning
  • cell therapy
  • drug induced