SoySNP618K array: A high-resolution single nucleotide polymorphism platform as a valuable genomic resource for soybean genetics and breeding.
Yan-Fei Li, Ying-Hui Li, Shan-Shan Su, Jochen Christoph Reif, Zhao-Ming Qi, Xiao-Bo Wang, Xing Wang, Yu Tian, De-Lin Li, Li-Juan Qiu, Zhang-Xiong Liu, Ze-Jun Xu, Guang-Hui Fu, Ya-Liang Ji, Qing-Shan Chen, Ji-Qiang Liu, Li-Juan Qiu
Published in: Journal of Integrative Plant Biology (2022)
Innovations in genomics have enabled the development of low-cost, high-resolution, single nucleotide polymorphism (SNP) genotyping arrays that accelerate breeding progress and support basic research in crop science. Here, we developed and validated the SoySNP618K array (618,888 SNPs) for the important crop soybean. The SNPs were selected from whole-genome resequencing data containing 2,214 diverse soybean accessions; 29.34% of the SNPs mapped to genic regions representing 86.85% of the 56,044 annotated high-confidence genes. Identity-by-state analyses of 318 soybeans revealed 17 redundant accessions, highlighting the potential of the SoySNP618K array in supporting gene bank management. The patterns of population stratification and genomic regions enriched through domestication were highly consistent with previous findings based on resequencing data, suggesting that the ascertainment bias in the SoySNP618K array was largely compensated for. Genome-wide association mapping in combination with reported quantitative trait loci enabled fine-mapping of genes known to influence flowering time, E2 and GmPRR3b, and of a new candidate gene, GmVIP5. Moreover, genomic prediction of flowering and maturity time in 502 recombinant inbred lines was highly accurate (>0.65). Thus, the SoySNP618K array is a valuable genomic tool that can be used to address many questions in applied breeding, germplasm management, and basic crop research.
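As an illustration of the identity-by-state (IBS) screening mentioned in the abstract, the following is a minimal sketch of how pairwise IBS similarity can flag putatively redundant accessions. It is not the authors' pipeline. The genotype coding (0/1/2 alternate-allele counts, `None` for missing calls), the function names, the toy accessions, and the 0.99 threshold are all assumptions made for this example; the abstract does not state the threshold used in the study.

```python
from itertools import combinations


def ibs_similarity(g1, g2):
    """Mean IBS between two genotype vectors coded 0/1/2 (None = missing).

    Per SNP, the score is 1 - |a - b| / 2, so identical genotypes score 1,
    opposite homozygotes score 0, and a het/hom pair scores 0.5.
    """
    total = 0.0
    n = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # skip SNPs with a missing call in either accession
        total += 1.0 - abs(a - b) / 2.0
        n += 1
    return total / n if n else float("nan")


def redundant_pairs(genotypes, threshold=0.99):
    """Return (name1, name2, ibs) for every pair at or above the threshold.

    The threshold is a hypothetical cutoff chosen for illustration only.
    """
    pairs = []
    for (name1, g1), (name2, g2) in combinations(genotypes.items(), 2):
        s = ibs_similarity(g1, g2)
        if s >= threshold:
            pairs.append((name1, name2, s))
    return pairs


# Toy data: three hypothetical accessions genotyped at six SNPs.
toy = {
    "acc_A": [0, 2, 1, 0, 2, 2],
    "acc_B": [0, 2, 1, 0, 2, 2],
    "acc_C": [2, 0, 1, 2, 0, None],
}

if __name__ == "__main__":
    for name1, name2, s in redundant_pairs(toy):
        print(f"{name1} and {name2} look redundant (IBS = {s:.3f})")
```

On real array data the same comparison would run over hundreds of thousands of SNPs per pair, so a vectorized implementation or a dedicated genetics tool would be the practical choice; the loop above is only meant to make the calculation explicit.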
Keyphrases
- high resolution
- genome wide
- copy number
- dna methylation
- genome wide association
- low cost
- climate change
- mass spectrometry
- high throughput
- high density
- electronic health record
- single cell
- public health
- big data
- risk assessment
- human health
- liquid chromatography
- genome wide identification
- deep learning
- machine learning
- cell free