De novo identification and targeted sequencing of SSRs efficiently fingerprints Sorghum bicolor sub-population identity.
John P Baggett, Richard L Tillett, Elizabeth A Cooper, Melinda K Yerka. Published in: PLoS ONE (2021)
Recent plant breeding studies of several species have demonstrated the utility of incorporating molecular assessments of genetic distance into trait-linked SNP genotyping during the development of parent lines to maximize yield gains due to heterosis. SSRs (Short Sequence Repeats) are the molecular marker of choice to determine genetic diversity, but the methods historically used to sequence them have been burdensome. The ability to analyze SSRs in a higher-throughput manner independent of laboratory conditions would increase their utility in molecular ecology, germplasm curation, and plant breeding programs worldwide. This project reports simple bioinformatics methods that can be used to generate genome-wide de novo SSRs in silico, followed by targeted Next Generation Sequencing (NGS) validation of those that provide the most information about the sub-population identity of a breeding line, which influences heterotic group selection. While these methods were optimized in sorghum [Sorghum bicolor (L.) Moench], they were developed to be applied to any species with a reference genome and high-coverage whole-genome sequencing data of individuals from the sub-populations to be characterized. An analysis of published sorghum genomes selected to represent its five main races (bicolor, caudatum, durra, kafir, and guinea; 75 accessions total) identified 130,120 SSR motifs. Average lengths were 23.8 bp and 95% were between 10 and 92 bp, making them suitable for NGS. Validation through targeted sequencing amplified 188 of 192 assayed SSR loci. Results highlighted the distinctness of accessions from the guinea sub-group margaritiferum from all other sorghum accessions, consistent with previous studies of nuclear and mitochondrial DNA. SSRs that efficiently fingerprinted margaritiferum individuals (Xgma1-Xgma6) are presented. Developing similar fingerprints of other sub-populations (Xunr1-Xunr182) was not possible due to the extensive admixture between them in the data set analyzed.
In summary, these methods were able to fingerprint specific sub-populations when rates of admixture between them were low.
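To make the in silico SSR identification step concrete, the following is a minimal Python sketch of how perfect tandem repeats can be located in a DNA sequence with regular expressions. It is not the authors' pipeline: the abstract does not name the tools or thresholds used, so the function name `find_ssrs` and the minimum repeat counts in `MIN_REPEATS` are hypothetical values chosen only for illustration.

```python
import re

# Hypothetical thresholds (not from the paper): minimum number of tandem
# copies required for each motif length, from 1 bp to 6 bp.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, end, motif, copies) tuples for perfect SSRs.

    Coordinates are 0-based and end-exclusive, so end - start is the
    length of the repeat tract in bp.
    """
    seq = seq.upper()
    hits = []
    for unit_len, min_n in sorted(min_repeats.items()):
        # One motif of unit_len bases, then at least (min_n - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are just a shorter unit repeated
            # (e.g. "ATAT" is really the 2 bp motif "AT").
            redundant = any(
                motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            )
            if redundant:
                continue
            copies = (m.end() - m.start()) // unit_len
            hits.append((m.start(), m.end(), motif, copies))
    return sorted(hits)


if __name__ == "__main__":
    example = "GGC" + "AT" * 8 + "CCGT" + "AAG" * 5 + "TTC"
    for start, end, motif, copies in find_ssrs(example):
        print(start, end, motif, copies, end - start)
```

Run over a reference genome, a scan of this kind yields candidate loci whose tract lengths can then be compared across accessions. Selecting the loci most informative about sub-population identity, as described in the abstract, is a separate step that this sketch does not attempt.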