High-throughput sequencing-based microsatellite genotyping for polyploids to resolve allele dosage uncertainty and improve analyses of genetic diversity, structure and differentiation: A case study of the hexaploid Camellia oleifera.

Xiangyan Cui, Caihua Li, Shengyuan Qin, Zebin Huang, Bin Gan, Zhengwen Jiang, Xiaomao Huang, Xiaoqiang Yang, Qin Li, Xiaoguo Xiang, Jiakuan Chen, Yao Zhao, Jun Rong
Published in: Molecular Ecology Resources (2021)
Conventional microsatellite (simple sequence repeat, SSR) genotyping methods cannot accurately identify polyploid genotypes, which leads to allele dosage uncertainty and introduces biases in population genetic analyses. Here, a new SSR genotyping method was developed to directly infer accurate polyploid genotypes. The frequency distribution of SSR sequences was obtained from deep-coverage high-throughput sequencing data, and corrections were applied to account for the "stutter peak" and the amplification efficiency of SSR sequences. Perl scripts and an online SSR genotyping tool, "SSRSeq", were provided to process the sequencing data and output genotypes with corrected allele dosages. Hexaploid Camellia oleifera is the dominant woody oilseed crop in China. Understanding the geographical pattern of genetic variation in wild C. oleifera is essential for the conservation and utilization of its genetic resources. Six wild C. oleifera populations were sampled across geographical ranges in the subtropical evergreen broadleaf forests of China. Using 35 SSR markers, the high-throughput sequencing-based SSRSeq method was applied to obtain accurate hexaploid genotypes of wild C. oleifera. The results demonstrated that the new method could resolve allele dosage uncertainty and considerably improve analyses of genetic diversity, structure and differentiation in polyploids. The genetic variation patterns of wild C. oleifera across geographical ranges agree with the "central-marginal hypothesis", which states that genetic diversity is high in the central population and declines from the central to the peripheral populations, while genetic differentiation increases from the centre to the periphery. The method and findings can facilitate the utilization of wild C. oleifera genetic resources for the breeding of cultivated C. oleifera.
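To make the idea of allele dosage concrete, the following Python sketch assigns integer copy numbers to the alleles observed at one locus of a hexaploid, based on their read counts. It is not the SSRSeq algorithm or the authors' Perl code: the function name `infer_dosage`, the least-squares fitting criterion, and the example read counts are all assumptions made for illustration, and the sketch assumes that the stutter and amplification-efficiency corrections described in the abstract have already been applied to the counts.

```python
from collections import Counter
from itertools import combinations_with_replacement


def infer_dosage(read_counts, ploidy=6):
    """Pick the integer allele dosages that best match observed read proportions.

    read_counts: dict mapping allele (e.g. repeat length) -> corrected read count.
    Hypothetical helper: every observed allele is given at least one copy, the
    remaining copies are distributed in all possible ways, and the assignment
    with the smallest squared error against the read proportions is returned.
    """
    alleles = sorted(read_counts)
    spare = ploidy - len(alleles)
    if spare < 0:
        raise ValueError("more alleles observed than the ploidy allows")

    total = sum(read_counts.values())
    observed = {a: read_counts[a] / total for a in alleles}

    best, best_err = None, float("inf")
    for extra in combinations_with_replacement(alleles, spare):
        dosage = Counter(alleles)   # one copy of each observed allele
        dosage.update(extra)        # plus the spare copies
        err = sum((dosage[a] / ploidy - observed[a]) ** 2 for a in alleles)
        if err < best_err:
            best, best_err = dict(dosage), err
    return best


# Made-up read counts for three alleles at one locus
print(infer_dosage({150: 510, 154: 320, 158: 170}))
# → {150: 3, 154: 2, 158: 1}
```

With conventional fragment-length genotyping, the same individual would typically be scored only as carrying alleles 150, 154 and 158, leaving the 3:2:1 copy-number ratio unresolved.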
Keyphrases
  • genetic diversity
  • high throughput sequencing
  • genome wide
  • climate change
  • electronic health record
  • high resolution
  • big data
  • DNA methylation
  • high throughput
  • deep learning
  • data analysis