
Genomic prediction based on preselected single-nucleotide polymorphisms from genome-wide association study and imputed whole-genome sequence data annotation for growth traits in Duroc pigs.

Yuling Zhang, Zhanwei Zhuang, Yiyi Liu, Jinyan Huang, Menghao Luan, Xiang Zhao, Linsong Dong, Jian Ye, Ming Yang, Enqin Zheng, Gengyuan Cai, Zhenfang Wu, Jie Yang
Published in: Evolutionary Applications (2024)
The use of whole-genome sequence (WGS) data is expected to improve the genomic prediction (GP) accuracy of complex traits because WGS data may contain variants that are in strong linkage disequilibrium with causal mutations. However, a few previous studies have shown little or no improvement in prediction accuracy from WGS data. Incorporating prior biological information into GP is an attractive strategy that might improve prediction accuracy. In this study, a total of 6334 pigs were genotyped using 50K chips and subsequently imputed to the WGS level. This cohort includes two prior discovery populations, comprising 294 Landrace pigs and 186 Duroc pigs, and two validation populations, consisting of 3770 American Duroc pigs and 2084 Canadian Duroc pigs. We then used variant annotation information and genome-wide association study (GWAS) results from the WGS data to perform GP for six growth traits in the two Duroc pig populations. Based on variant annotation, we partitioned the imputed WGS variants into genomic classes, such as intron, intergenic, and untranslated regions. Based on the GWAS results from the WGS data, we obtained trait-associated single-nucleotide polymorphisms (SNPs). We then applied the genomic feature best linear unbiased prediction (GFBLUP) and genomic best linear unbiased prediction (GBLUP) models to estimate genomic breeding values for the growth traits with these variant panels, which included six genomic classes and the trait-associated SNPs. Compared with 50K chip data, GBLUP with imputed WGS data gave no increase in prediction accuracy. Using only the annotation-based panels also gave no increase in prediction accuracy compared to GBLUP with 50K data, but adding annotation information to the GFBLUP model with imputed WGS data improved prediction accuracy by 0.00% to 2.82%. In conclusion, a GFBLUP model that incorporates prior biological information might increase the advantage of using imputed WGS data for GP.
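For readers unfamiliar with the two models named in the abstract, the following is a minimal sketch of the general idea, not the authors' pipeline. It assumes a VanRaden-style genomic relationship matrix, variance ratios that are fixed by hand rather than estimated (real analyses typically estimate them, for example by REML), an overall mean as the only fixed effect, and simulated genotypes. The function names, the weight `w_feature`, and the ratio `lam` are hypothetical and introduced only for illustration. GBLUP uses one relationship matrix built from all markers; GFBLUP splits the markers into a prior "feature" set (for example a genomic class or GWAS-selected SNPs) and the remainder, each with its own relationship matrix.

```python
import numpy as np

def vanraden_grm(M):
    """Genomic relationship matrix from a 0/1/2 genotype matrix (animals x markers)."""
    p = M.mean(axis=0) / 2.0            # allele frequencies estimated from the sample
    Z = M - 2.0 * p                     # centre each marker column
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def gblup_predict(K, y, train, test, lam):
    """Predict genomic values of `test` animals from `train` phenotypes.

    lam is the assumed ratio residual variance / genetic variance.
    """
    mu = y[train].mean()
    K_tt = K[np.ix_(train, train)]
    alpha = np.linalg.solve(K_tt + lam * np.eye(len(train)), y[train] - mu)
    return K[np.ix_(test, train)] @ alpha

def gfblup_kernel(M, feature_cols, w_feature):
    """Combine feature and remainder relationship matrices.

    w_feature is the assumed share of genetic variance explained by the
    feature set; with fixed variance components, two random genomic effects
    are equivalent to one effect with this weighted kernel.
    """
    mask = np.zeros(M.shape[1], dtype=bool)
    mask[feature_cols] = True
    return w_feature * vanraden_grm(M[:, mask]) + (1.0 - w_feature) * vanraden_grm(M[:, ~mask])

# Simulated toy data (not from the study): 60 animals, 200 markers,
# with the first 10 markers carrying true effects.
rng = np.random.default_rng(0)
M = rng.binomial(2, 0.3, size=(60, 200)).astype(float)
y = M[:, :10] @ rng.normal(size=10) + rng.normal(size=60)
train, test = np.arange(45), np.arange(45, 60)

G = vanraden_grm(M)
pred_gblup = gblup_predict(G, y, train, test, lam=1.0)

K = gfblup_kernel(M, feature_cols=np.arange(10), w_feature=0.5)
pred_gfblup = gblup_predict(K, y, train, test, lam=1.0)

# One common accuracy measure: correlation between predictions and
# phenotypes in the validation animals.
acc_gblup = np.corrcoef(pred_gblup, y[test])[0, 1]
acc_gfblup = np.corrcoef(pred_gfblup, y[test])[0, 1]
```

In this sketch the only difference between the two models is the kernel passed to `gblup_predict`, which is why a well-chosen feature set can help and a poorly chosen one cannot be expected to.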
Keyphrases
  • electronic health record
  • big data
  • genome wide
  • genome wide association study
  • machine learning
  • data analysis
  • gene expression
  • high throughput
  • deep learning
  • hepatitis c virus
  • hiv infected
  • genetic diversity
  • amino acid