Effects of reference population size and structure on genomic prediction of maternal traits in two pig lines using whole-genome sequence-, high-density- and combined annotation-dependent depletion genotypes.
Maria V Kjetså, Arne B Gjuvsland, Eli Grindflek, Theo Meuwissen
Published in: Journal of animal breeding and genetics = Zeitschrift fur Tierzuchtung und Zuchtungsbiologie (2024)
The aim of this study was to investigate the reference population size required to obtain substantial prediction accuracy within and across lines, and the effect of using a multi-line reference population, for genomic prediction of maternal traits in pigs. The data came from two nucleus pig populations: one pure-bred Landrace (L) line and one Synthetic (S) Yorkshire/Large White line. Up to 30 K animals were genotyped in each line, and all had records on maternal traits. Prediction accuracy was tested with three marker data sets: high-density SNP (HD), whole-genome sequence (WGS), and markers derived from WGS based on the pig combined annotation-dependent depletion score (pCADD). Two genomic prediction methods (GBLUP and BayesGC) were compared for four maternal traits: total number of piglets born (TNB), total number of stillborn piglets (STB), shoulder lesion score and body condition score. The main results showed that a reference population of 3 K to 6 K animals was generally sufficient to achieve high within-line prediction accuracy. However, when the reference population was increased to 30 K animals, prediction accuracy increased significantly for TNB and STB. For multi-line prediction, accuracy depended mostly on the number of within-line animals in the reference data. The S-line generally gave higher prediction accuracy than the L-line. Using pCADD scores to reduce the number of WGS markers, in combination with GBLUP, generally reduced prediction accuracy relative to GBLUP with HD genotypes. BayesGC benefited from a large reference population and was less dependent on the choice of marker data set to achieve high prediction accuracy.
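
To make the link between reference population size and GBLUP accuracy concrete, the following is a minimal sketch in Python with NumPy. It is not the authors' pipeline: the genotypes, marker effects, heritability (0.5) and population sizes are all simulated or assumed for illustration, the relationship matrix follows the standard VanRaden construction, and the only fixed effect is the phenotype mean. The function names `vanraden_g` and `gblup_predict` are hypothetical. BayesGC and pCADD-based marker selection are not covered.

```python
import numpy as np


def vanraden_g(genotypes):
    """Genomic relationship matrix from 0/1/2 genotypes (VanRaden construction)."""
    p = genotypes.mean(axis=0) / 2.0           # allele frequency per marker
    z = genotypes - 2.0 * p                    # centre each marker column
    return z @ z.T / (2.0 * np.sum(p * (1.0 - p)))


def gblup_predict(g, y_ref, ref_idx, val_idx, h2):
    """Predict breeding values of validation animals from reference phenotypes."""
    lam = (1.0 - h2) / h2                      # ratio of residual to genetic variance
    g_rr = g[np.ix_(ref_idx, ref_idx)]
    g_vr = g[np.ix_(val_idx, ref_idx)]
    resid = y_ref - y_ref.mean()               # phenotype mean as the only fixed effect
    alpha = np.linalg.solve(g_rr + lam * np.eye(len(ref_idx)), resid)
    return g_vr @ alpha


# Simulated data (assumed values, not taken from the study).
rng = np.random.default_rng(1)
n_animals, n_markers, h2 = 1300, 200, 0.5
freq = rng.uniform(0.1, 0.9, n_markers)
genotypes = rng.binomial(2, freq, size=(n_animals, n_markers)).astype(float)

effects = rng.normal(0.0, 1.0, n_markers)
tbv = genotypes @ effects
tbv -= tbv.mean()                              # true breeding values, centred
noise_sd = np.sqrt(tbv.var() * (1.0 - h2) / h2)
phenotypes = tbv + rng.normal(0.0, noise_sd, n_animals)

g = vanraden_g(genotypes)
val_idx = np.arange(1000, 1300)                # 300 validation animals

accuracies = {}
for n_ref in (100, 1000):
    ref_idx = np.arange(n_ref)                 # first n_ref animals form the reference
    pred = gblup_predict(g, phenotypes[ref_idx], ref_idx, val_idx, h2)
    # Accuracy: correlation between predicted and true breeding values.
    accuracies[n_ref] = np.corrcoef(pred, tbv[val_idx])[0, 1]
    print(f"reference size {n_ref}: accuracy {accuracies[n_ref]:.2f}")
```

In this toy setting the simulated animals are unrelated and every marker is causal, so the gain from a larger reference population is far more pronounced than it would be in a real nucleus population with family structure. The sketch only illustrates the direction of the effect, not the magnitudes reported in the study.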