Abstract
Selective genotyping of crossbred (CB) animals to include in traditionally purebred (PB) dominated genetic evaluations has been shown to provide an increase in the response to selection for CB performance. However, the inclusion of phenotypes from selectively genotyped CB animals, without the phenotypes of their non-genotyped cohorts, could cause bias in estimated variance components (VC) and subsequent estimated breeding values (EBV). The objective of the study was to determine the impact of selective CB genotyping on VC estimates and subsequent bias in EBV when non-genotyped CB animals are not included in genetic evaluations. A swine crossbreeding scheme producing 3-way CB animals was simulated to create selectively genotyped datasets. The breeding scheme consisted of three PB breeds each with 25 males and 450 females, F1 crosses with 1200 females and 12,000 CB progeny. Eighteen chromosomes each with 100 QTL and 4k SNP markers were simulated. Both PB and CB performances were considered to be moderately heritable (h2 = 0.4). Factors evaluated were as follows: 1) CB phenotype and genotype inclusion of 15% (n = 1800) or 35% (n = 4200), 2) genetic correlation between PB and CB performance (rpc = 0.1, 0.5, or 0.7), and 3) selective genotyping strategy. Genotyping strategies included the following: 1) Random: random CB selection, 2) Top: highest CB phenotype, and 3) Extreme: half highest and half lowest CB phenotypes. Top and Extreme selective genotyping strategies were considered by selecting animals in full-sib (FS) families or among the CB population (T). In each generation, 4320 PB selection candidates contributed phenotypic and genotypic records. Each scenario was replicated 15 times. VC were estimated for PB and CB performance utilizing bivariate models using pedigree relationships with dams of CB animals considered to be unknown. Estimated values of VC for PB performance were not statistically different from true values. 
Top selective genotyping strategies produced deflated estimates of phenotypic VC for CB performance compared to true values. When using estimated VC, Top_T and Extreme_T produced the most biased EBV, yet EBV of PB selection candidates for CB performance were most accurate when using Extreme_T. Results suggest that randomly selecting CB animals to genotype or selectively genotyping Top or Extreme CB animals within full-sib families can lead to accurate estimates of additive genetic VC for CB performance and unbiased EBV.
Keywords: commercial data, genetic parameters, swine
Introduction
Advances in the rate of genetic gain of crossbred (CB) animals can be achieved in swine crossbreeding schemes by implementing combined CB and purebred (PB) selection (CCPS) when the breeding goal is CB performance (Wei and van der Werf, 1994; Dekkers, 2007; Esfandyari et al., 2015; van Grevenhof and van der Werf, 2015). Substantial gains in CB performance can be achieved using CCPS compared to selection for only PB performance when the genetic correlation between PB and CB performance (rpc) is below 0.7 (Dekkers, 2007). Through a multi-breed reference population containing both PB and CB animals, CCPS produces estimated breeding values (EBV) for CB performance for PB selection candidates. With the availability of affordable genotyping, inclusion of CB phenotypes into PB genetic evaluations could be achieved through genotyping CB animals, which does not require tracking CB pedigrees. However, it is currently impractical to genotype and phenotype all CB animals for inclusion in genetic evaluations; therefore, a subset of the CB population must be chosen. It has been shown that an increase in CB performance can be achieved when using a reference population that is comprised of a combination of PB information and a random subset of the CB population, compared to a purely PB reference population (See et al., 2020). Recent studies have suggested that selective genotyping can provide an increased response to selection compared to the random selection of CB records (Howard et al., 2018; Gowane et al., 2019; Chu et al., 2020; See et al., 2021), and several of these studies suggest that two-tailed selective genotyping strategies outperform random or single-tailed strategies in the response to selection (Gowane et al., 2019; Chu et al., 2020; See et al., 2021). Chu et al. (2020) evaluated selective genotyping across two environments and did not include phenotypes of unselected animals in genetic evaluations. 
The authors suggested that selective genotyping of top performing individuals in the breeding environment, and top and bottom performing individuals from the commercial environment would increase genetic gains, compared to choosing top animals in the breeding environment and random animals in the commercial environment. Similarly, See et al. (2021) reported an increased rate of genetic gain in CB animals when incorporating a proportion of CB data into genetic evaluations using selective genotyping, while not including the phenotypes of unselected CB individuals. According to See et al. (2021), two-tailed selective genotyping provided an advantage in EBV accuracy, EBV bias, and genetic gain in a CCPS swine breeding scheme compared to random and single-tailed selection strategies when true variance components (VC) were used. If phenotypic information from unselected animals is unavailable to estimate VC, then VC and subsequent EBV could be biased.
Best linear unbiased prediction (BLUP) can provide unbiased breeding values of a population under selection if the selection criteria are included (Henderson, 1975). However, by selectively genotyping CB animals, and thereby selectively including phenotypes of some animals but not others, this requirement could be violated. Furthermore, if only selectively genotyped CB animals contribute phenotypes to genetic evaluations, estimates of variance could be inflated or deflated depending upon the selection criteria (Cesarani et al., 2019; Chu et al., 2020). In swine breeding, it is impractical to track pedigrees of CB animals for numerous reasons, including the use of pooled semen, management practices relating to semen delivery and administration, and the separation of commercial relatives of the nucleus population(s) by both location and ownership. Therefore, the inclusion of CB phenotypes into genetic evaluations could be aided when the CB phenotype is accompanied by genotype information. Gowane et al. (2019) showed that single-tail selective genotyping strategies, such as selecting the top performing animals, can result in biased EBV and variance estimates when using genomic BLUP. When using a combined relationship matrix (Aguilar et al., 2010) of pedigree and genomic information (ssGBLUP), breeding values and VC can be estimated free of bias due to the addition of phenotypes from animals which were not selectively genotyped (Howard et al., 2018; Gowane et al., 2019). The study carried out by Chu et al. (2020) is the only one reporting the bias in VC estimation when pedigree information was missing and unselected animals were not phenotyped. Over-estimation of VC was reported when selectively genotyping the top and bottom performing individuals. Additionally, VC were under-estimated when using single-tailed selection strategies. 
Such deviations from true VC could have an impact on the prediction of EBV in a CCPS breeding scheme when unselected CB animals do not contribute phenotypic information.
The objective of the current study was to investigate the impact of CB selective genotyping schemes, for various levels of rpc and proportions of CB data inclusion, on 1) VC estimates of PB and CB performance and 2) subsequent bias in EBV predicted using estimated VC.
Methods
All data used in this study were simulated, and thus animal care and use approval was not required. Simulation of genotypes, phenotypes, and pedigrees was performed using AlphaSimR (Gaynor et al., 2019). A three-way terminal swine crossbreeding scheme was simulated to produce data sets via CB selective genotyping used for VC estimation. Description of the simulation structure and gene flow of the crossbreeding scheme was previously discussed by See et al. (2020). To reduce computational requirements of the simulation, each PB and CB dam produced one litter per generation with a litter size of 4 and 10, respectively.
Population and genotype simulation
Development of phenotypes, genotypes, and founder animals has been previously discussed by See et al. (2020). Briefly, historical haplotypes were simulated using the Markovian Coalescent Simulator (Chen et al., 2009). Simulation of historical haplotypes began with an effective population size (Ne) of 100,000 which was gradually reduced to 100. Historical haplotypes were then sampled at random with replacement to create 5,400 founder animals. A genome was simulated to mimic the Sus scrofa genome with a SNP density of 72K, which included 18 chromosomes each with 100 non-overlapping QTL. The SNP chip was simulated by randomly selecting 4K SNP markers per chromosome. Mutation and recombination rates were $2.5 \times 10^{-8}$ and $1 \times 10^{-8}$, respectively. Further, CB and PB performance were simulated free of dominance and epistatic variance with additive and phenotypic variance of 4 and 10, respectively (h2 = 0.40). The genetic correlation between PB breeds was considered to be 1, and the genetic correlations between PB breeds and CB animals were held constant across breeds to reduce the number of variables being compared between selective genotyping strategies.
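As a worked check of the simulated parameters, the heritability and residual variance follow directly from the additive and phenotypic variances stated above. The additive genetic covariance implied by each rpc scenario evaluated in this study (0.1, 0.5, and 0.7) is also shown. The symbols are notation assumed here for illustration, and the residual variance assumes that phenotypic variance is the sum of additive and residual variance only, consistent with the absence of dominance and epistasis.

```latex
\begin{align*}
h^2 &= \frac{\sigma^2_a}{\sigma^2_p} = \frac{4}{10} = 0.40, \\
\sigma^2_e &= \sigma^2_p - \sigma^2_a = 10 - 4 = 6, \\
\sigma_{a_{PB,CB}} &= r_{pc}\sqrt{\sigma^2_{a_{PB}}\,\sigma^2_{a_{CB}}} = 4\,r_{pc}
  \;\Rightarrow\; 0.4,\ 2.0,\ 2.8 \quad \text{for } r_{pc} = 0.1,\ 0.5,\ 0.7.
\end{align*}
```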
The general flow of the crossbreeding scheme has previously been discussed by See et al. (2020). Briefly, founder animals were split into three PB breeds (breeds A, B, and C) with equal numbers of males and females. To develop breed differences, within-breed random mating occurred for 65 generations. In the first 15 generations of random mating, the population size was gradually reduced to 475 consisting of 25 males and 450 females. At the end of random mating, selective mating occurred for one generation within breed. A three-tier swine breeding program was then simulated using the previously developed PB lines. Crossbreeding began in the second generation by simulating an F1 cross (AB) between males of breed A (n = 25) and females of breed B (n = 750). In the third and final generation, females from AB (n = 1,200) were crossed with sires from C (n = 25) to produce terminal CB progeny (C(AB)). The crossbreeding scheme resulted in a total of 12,000 CB progeny to be considered for selective genotyping. From generation 0 until 2, the selection criteria for PB selection candidates were EBV for PB performance. With the addition of CB performance in genetic evaluations in generation 3, the selection criteria for PB selection candidates were EBV for CB performance. Sire replacement rates for breeds A, B, and C were 0.75, 0.75, and 0.7, respectively. Dam replacement rates for breeds A, B, and C were 0.6, 0.6, and 0.5, respectively. The replacement rate of AB females was 0.65.
Genotyping strategies
Variance components and associated bias were calculated from three genotyping strategies: 1) Random: random CB animals were selected, 2) Top: CB animals with the highest phenotypes, and 3) Extreme: half CB animals with the highest and half with the lowest phenotypes. Strategies 2 and 3 were conducted by comparing CB genotype candidates to either their full-sibs (FS) or all their contemporaries (all animals within the same generation; T). Together with Random, this resulted in five unique selective genotyping strategies: Random, Top_FS, Top_T, Extreme_FS, and Extreme_T. Genotype candidates were selected from each full-sib family when possible. For example, when performing Extreme selection among full-sibs to choose 1,800 genotype candidates, 900 full-sib families were randomly chosen, and the family members with the highest and the lowest phenotype were selected from each. The average number of sires, dams, and selected CB animals from each selective genotyping strategy is presented in Table 1. In each selective genotyping strategy, a proportion of CB animals were chosen to be genotyped and phenotyped. Proportions (number) of CB data inclusion investigated were 15% (1,800) or 35% (4,200). For each of the genotyping strategies, it was assumed that 80% (4,320) of PB selection candidates contributed phenotypes and genotypes to genetic evaluations each generation. PB selection candidates which contributed phenotypic and genotypic information to genetic evaluations were selected at random. Proportions of CB and PB with phenotypes and genotypes included were chosen to develop reference populations which were comprised of 10% and 20% of CB data as suggested by See et al. (2020). Selective genotyping strategies were evaluated when rpc was 0.1, 0.5, or 0.7. Only genotyped CB animals contributed phenotypes in genetic evaluations or were used to estimate VC. Each unique rpc and selective genotyping scenario was replicated 15 times.
Table 1.
Summary statistics of selected CB used in genetic evaluations including mean (SD) number of related sires and dams and CB phenotype, averaged over 15 replicates
| CB % | Strategy | Number of Dams | Number of Sires | Phenotype |
|---|---|---|---|---|
| 15 | Random | 1437 (16.06) | 74 (0.77) | 1.41 (0.61) |
| 15 | Top_T | 1390 (15.16) | 74 (0.77) | 6.07 (0.61) |
| 15 | Top_FS | 1479 (17.15) | 74 (0.77) | 4.94 (0.61) |
| 15 | Extreme_T | 1432 (14.41) | 74 (0.77) | 1.41 (0.61) |
| 15 | Extreme_FS | 1409 (15.71) | 74 (0.77) | 1.41 (0.61) |
| 35 | Random | 1489 (15.42) | 74 (0.77) | 1.41 (0.62) |
| 35 | Top_T | 1483 (16.72) | 74 (0.77) | 4.58 (0.61) |
| 35 | Top_FS | 1491 (15.74) | 74 (0.77) | 3.90 (0.61) |
| 35 | Extreme_T | 1489 (15.48) | 74 (0.77) | 1.41 (0.61) |
| 35 | Extreme_FS | 1491 (15.74) | 74 (0.77) | 1.34 (0.61) |
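The five genotyping strategies can be sketched in a few lines of Python. This is a minimal illustration and not the code used in the study: the function name `select_cb`, the record layout, and the toy phenotypes are hypothetical. The rule used to fill the quota within full-sib families (cycling through randomly ordered families until enough animals are chosen) is an assumption that reproduces the 900-family example given for Extreme selection of 1,800 animals.

```python
import random
from collections import defaultdict

def select_cb(records, n_select, strategy, seed=1):
    """Return the CB records chosen for genotyping (illustrative sketch).

    records  : list of (animal_id, family_id, phenotype) tuples
    strategy : "Random", "Top_T", "Extreme_T", "Top_FS" or "Extreme_FS"
    Assumes n_select does not exceed the number of records.
    """
    rng = random.Random(seed)
    ranked = sorted(records, key=lambda r: r[2], reverse=True)  # high -> low
    if strategy == "Random":
        return rng.sample(records, n_select)
    if strategy == "Top_T":
        return ranked[:n_select]
    if strategy == "Extreme_T":
        n_top = n_select // 2
        n_low = n_select - n_top
        return ranked[:n_top] + ranked[len(ranked) - n_low:]
    if strategy not in ("Top_FS", "Extreme_FS"):
        raise ValueError(f"unknown strategy: {strategy}")

    # Within-family strategies: group animals by full-sib family.
    # Because 'ranked' is sorted, each family list is ordered high -> low.
    by_family = defaultdict(list)
    for rec in ranked:
        by_family[rec[1]].append(rec)
    pools = list(by_family.values())

    chosen = []
    # Assumed quota rule: visit families in random order, repeating rounds
    # until the quota is met or no candidates remain.
    while len(chosen) < n_select and any(pools):
        rng.shuffle(pools)
        for pool in pools:
            if len(chosen) >= n_select:
                break
            if pool:
                chosen.append(pool.pop(0))      # highest remaining sib
            if strategy == "Extreme_FS" and pool and len(chosen) < n_select:
                chosen.append(pool.pop())       # lowest remaining sib
    return chosen

# Toy data: 1,200 full-sib families of 10 CB progeny (12,000 animals).
# Phenotypes are drawn independently here, so no family effect is simulated.
toy_rng = random.Random(42)
cb = [(f * 10 + s, f, toy_rng.gauss(0.0, 10 ** 0.5))
      for f in range(1200) for s in range(10)]
strategies = ("Random", "Top_T", "Extreme_T", "Top_FS", "Extreme_FS")
picked = {s: select_cb(cb, 1800, s) for s in strategies}
```

With 1,800 animals requested, this sketch draws Extreme_FS candidates from 900 families and Top_FS candidates from all 1,200 families, with a second sib taken from 600 of them.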
Data
Data sets used for evaluation were simulated for each rpc scenario and selective genotyping strategy. Pedigrees used a maximum of 25,240 individuals, beginning with founder animals produced two generations prior to the initiation of selection. Complete pedigree relationships from breeds A, B, and C along with genomic inferred sire relationships of CB animals were included in the pedigree. Pedigrees, phenotypes, and genotypes of AB females were considered missing and were not used in genetic evaluations. Genotypes included were those from selectively genotyped CB animals and 80% of breeds A, B, and C selection candidates which were randomly genotyped (n = 12,960). Phenotypes included in genetic evaluations included those from selectively genotyped CB animals and all selection candidates from breeds A, B, and C (n = 16,200). Across PB and CB, the phenotyping and genotyping scheme resulted in a maximum of 18,585 genotypic and 21,825 phenotypic records in an evaluation.
Analysis and estimation of variance components
Genetic evaluations used to make selection decisions were conducted using one- and two-trait ssGBLUP models using the BLUPf90 suite of programs (Misztal et al., 2002). In genetic evaluations which were performed after generation 0, all PB selection candidates were evaluated jointly. Traits across PB breeds were considered to be the same trait and were evaluated using univariate models. Univariate models were used when evaluating PB performance and included fixed effects of the overall mean and generation of birth. Bivariate models were used when evaluating CB and PB performance and included fixed effects of the overall mean, generation of birth, and breed fraction covariates. Both univariate and bivariate models included random animal effects which were assumed to be $N(\mathbf{0}, \mathbf{H}\sigma^2_{a_{PB}})$ and $N(\mathbf{0}, \mathbf{G} \otimes \mathbf{H})$, respectively, where $\sigma^2_{a_{PB}}$ is the additive genetic variance associated with PB performance, $\otimes$ is the Kronecker product, and $\mathbf{H}$ is the blended relationship matrix including pedigree and genomic relationships calculated according to Aguilar et al. (2010). The $\mathbf{G} = \begin{bmatrix} \sigma^2_{a_{PB}} & \sigma_{a_{PB,CB}} \\ \sigma_{a_{PB,CB}} & \sigma^2_{a_{CB}} \end{bmatrix}$ is the additive genetic (co)variance matrix of PB and CB performance, where $\sigma^2_{a_{CB}}$ is the additive genetic variance of CB performance and $\sigma_{a_{PB,CB}}$ is the additive genetic covariance between PB and CB performance. The residuals from univariate and bivariate models were assumed to have homogeneous variance. The (co)variance values used in all ssGBLUP evaluations were the true values dependent upon the rpc scenario.
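At the end of the simulation, the resulting data sets produced from selective genotyping of CB animals were used to estimate VC for each genotyping strategy. Variance components were estimated using AIREMLf90 (Misztal et al., 2002) using the following bivariate model:
$$\begin{bmatrix} \mathbf{y}_{PB} \\ \mathbf{y}_{CB} \end{bmatrix} = \begin{bmatrix} \mathbf{X}_{PB} & \mathbf{0} \\ \mathbf{0} & \mathbf{X}_{CB} \end{bmatrix} \begin{bmatrix} \mathbf{b}_{PB} \\ \mathbf{b}_{CB} \end{bmatrix} + \begin{bmatrix} \mathbf{Z}_{PB} & \mathbf{0} \\ \mathbf{0} & \mathbf{Z}_{CB} \end{bmatrix} \begin{bmatrix} \mathbf{u}_{PB} \\ \mathbf{u}_{CB} \end{bmatrix} + \begin{bmatrix} \mathbf{e}_{PB} \\ \mathbf{e}_{CB} \end{bmatrix}$$
where the subscripts denote model terms relating to PB or CB performance, $\mathbf{y}$ is a vector of phenotypes, $\mathbf{b}$ is a vector of the fixed effects including the overall mean, generation of birth, and breed fraction covariates, $\mathbf{u}$ is a vector of breeding values assumed to be $N(\mathbf{0}, \mathbf{G} \otimes \mathbf{A})$, with $\mathbf{G}$ the additive genetic (co)variance matrix of PB and CB performance, and $\mathbf{e}$ is a vector of random residuals assumed to be
$$N\left(\mathbf{0}, \begin{bmatrix} \mathbf{I}\sigma^2_{e_{PB}} & \mathbf{0} \\ \mathbf{0} & \mathbf{I}\sigma^2_{e_{CB}} \end{bmatrix}\right),$$
where $\sigma^2_{e_{PB}}$ was the residual variance of PB performance, $\sigma^2_{e_{CB}}$ was the residual variance of CB performance, $\mathbf{A}$ was the numerator relationship matrix, and $\mathbf{I}$ was an identity matrix. The incidence matrices $\mathbf{X}$ and $\mathbf{Z}$ related fixed and random effects with phenotypic records. Starting values of (co)variance used to estimate VC were those initially simulated. The numerator relationship matrix was used to estimate VC as suggested by Aldridge et al. (2020) to reduce the computational time. Estimated VC were compared to true VC calculated at the end of the simulation. True additive genetic variance was calculated as the variance of residuals produced from a linear model for true breeding values (TBV) regressed on generation of birth to remove the effect of genetic trend. Similarly, the true phenotypic variance was calculated as the variance of residuals produced from a linear model for phenotype regressed on generation of birth. Linear models used to calculate true VC for PB performance included the effects of generation of birth, breed, replicate, and the interaction between breed and replicate. Linear models used to calculate true VC for CB performance included the effects of generation of birth and replicate.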
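The calculation of true VC as the variance of residuals after removing the generation effect can be sketched as follows. This is a simplified illustration and not the study's code: the function name `detrended_variance` is hypothetical, generation of birth is treated as a class effect (an assumption, since the text does not state whether it was fitted as a class effect or a covariate), and the breed and replicate terms are omitted.

```python
from collections import defaultdict
from statistics import fmean, variance

def detrended_variance(values, generations):
    """Variance of residuals after fitting a mean for each generation.

    With generation as the only (class) effect, the residual of each record
    is its deviation from the mean of its own generation.
    """
    by_generation = defaultdict(list)
    for value, gen in zip(values, generations):
        by_generation[gen].append(value)
    gen_means = {gen: fmean(vals) for gen, vals in by_generation.items()}
    residuals = [value - gen_means[gen]
                 for value, gen in zip(values, generations)]
    return variance(residuals)  # sample variance (n - 1 denominator)

# Toy TBV with a genetic trend of +1 per generation.
tbv = [-1.0, 1.0, 0.0, 2.0, 1.0, 3.0]
gen = [0, 0, 1, 1, 2, 2]
raw_var = variance(tbv)                     # includes the genetic trend
true_var = detrended_variance(tbv, gen)     # trend removed
```

In this toy example the raw variance is inflated by the trend across generations, while the detrended variance reflects only the spread within generations.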
Bias and accuracy of PB EBV for CB performance were calculated using EBV produced from ssGBLUP models using estimated VC. Bias and accuracy of EBV were calculated at the conclusion of the simulation for each breed and each unique combination of genetic correlation scenario and selective genotyping strategy. Bias was defined as the regression coefficient (β) produced from the regression of TBV on EBV, which has an expected value of 1 for unbiased EBV (E[β] = 1). The accuracy of PB EBV for CB performance was calculated as the Pearson correlation between TBV and EBV.
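The two summary statistics defined above can be computed as in the following sketch. The function name `bias_and_accuracy` and the toy values are hypothetical. A coefficient above 1 indicates under-dispersed EBV, and a coefficient below 1 indicates over-dispersed EBV.

```python
from statistics import fmean

def bias_and_accuracy(tbv, ebv):
    """Return (beta, r): slope of TBV regressed on EBV, and Pearson correlation."""
    mean_t, mean_e = fmean(tbv), fmean(ebv)
    s_te = sum((t - mean_t) * (e - mean_e) for t, e in zip(tbv, ebv))
    s_ee = sum((e - mean_e) ** 2 for e in ebv)
    s_tt = sum((t - mean_t) ** 2 for t in tbv)
    beta = s_te / s_ee                    # regression coefficient of TBV on EBV
    r = s_te / (s_ee * s_tt) ** 0.5       # Pearson correlation
    return beta, r

tbv = [1.0, 2.0, 3.0, 4.0]
shrunken = [0.5 * t for t in tbv]         # EBV compressed toward the mean
inflated = [2.0 * t for t in tbv]         # EBV spread too widely
beta_under, r_under = bias_and_accuracy(tbv, shrunken)   # beta > 1
beta_over, r_over = bias_and_accuracy(tbv, inflated)     # beta < 1
```

Both toy EBV sets rank the animals perfectly, so the correlation is 1 in each case even though the regression coefficients differ. This mirrors how accuracy and bias can diverge for a given genotyping strategy.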
Results
Estimates and true values of additive and phenotypic VC were different from the values initially simulated. As the rpc increased, the difference between the simulated values and the true and estimated VC increased. True values of additive variance for PB performance when the rpc was 0.1, 0.5, and 0.7 were 3.0, 2.8, and 2.9, and the corresponding true values of phenotypic variance were 8.9, 8.8, and 8.9, respectively. True values of additive variance for CB performance when the rpc was 0.1, 0.5, or 0.7 were 3.1, 3.1, and 2.8, and the corresponding true values of phenotypic variance were 9.0, 9.1, and 8.8, respectively. True values for the genetic correlation between PB and CB performance when the simulated rpc was 0.1, 0.5, or 0.7 were 0.21, 0.46, and 0.66, respectively. As expected, estimated VC for PB performance did not differ from true values. Estimates of additive genetic and phenotypic variance for PB performance were consistent across selective genotyping strategies and were not impacted by CB inclusion rate, rpc, or family sampling method (results not shown).
Figure 1 presents the VC estimates for CB performance obtained from the data sets produced by each selective genotyping strategy. Estimated additive variance for CB performance varied greatly across selective genotyping schemes and CB data inclusion levels. The true value of additive variance for CB performance, averaged across CB data inclusion levels, was 3.0, while the estimated values ranged from 0.1 to 30. Regardless of the proportion of CB data included, estimates of additive genetic variance did not differ from true values when using Random, Top_FS, or Extreme_FS selective genotyping (Figure 1A). Additionally, non-random selective genotyping strategies resulted in inflated or deflated estimates of phenotypic variance for CB performance (Figure 1B). Estimates of phenotypic variance ranged from 1.7 to 35, while the true value averaged over CB inclusion levels and rpc scenarios was 9.0. Using Extreme_T selective genotyping greatly over-estimated the additive and phenotypic VC. Alternatively, using Top_T selective genotyping greatly under-estimated the additive and phenotypic VC. Non-random selective genotyping strategies were impacted by the proportion of CB data included. When the proportion of CB data was high, non-random strategies tended to produce estimated VC which were closer to true values compared to when the proportion of CB data was low. Additive genetic and phenotypic estimates of variance were impacted by full-sib family sampling within selective genotyping strategies. When CB animals were selectively genotyped using either Top or Extreme selective genotyping within FS families, the estimated additive genetic variance did not differ from true values. Conversely, using either Top or Extreme selective genotyping within FS families resulted in a decrease and an increase in the phenotypic variance, respectively. Estimated VC for CB performance were used to calculate heritability estimates (Figure 1C). 
Heritability estimates were not impacted by the simulated rpc. The true heritability for CB performance averaged across CB inclusion rates and rpc scenarios was 0.34. Random selective genotyping produced heritability estimates which were not different from true values (h2 = 0.34). Top_FS and Extreme_T selective genotyping strategies produced heritability estimates for CB performance which were greatly over-estimated (0.71 and 0.78 averaged over CB inclusion rate and rpc scenario, respectively). Increases in heritability estimates using Top_FS selective genotyping were driven by an under-estimation of residual variance (Supplementary Figure S1). However, increases in heritability estimates using Extreme_T selective genotyping were driven by an over-estimation of additive genetic variance. Conversely, Top_T and Extreme_FS produced heritability estimates for CB performance which were under-estimated (0.09 and 0.11 averaged over CB inclusion, respectively). The decrease in heritability estimates using Top_T was driven by an under-estimation of additive genetic variance (Figure 1A). The decrease in heritability estimates using Extreme_FS selective genotyping was driven by an over-estimation of the residual variance.
Figure 1.
Estimated variance components for CB performance when selectively genotyping a percentage of crossbred animals to be included in bivariate models under various levels of rpc compared to the true VC averaged across replicates and scenarios. (A) Estimated additive genetic variance for CB performance. (B) Estimated phenotypic variance for CB performance. (C) Estimates of heritability for CB performance.
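The direction of the phenotypic variance distortion under population-wide (T) sampling can be illustrated with a small simulation. This is a toy sketch under stated assumptions and not the study's simulation: phenotypes are drawn independently from a normal distribution with the simulated phenotypic variance of 10, with no family structure or genetic model, and all variable names are hypothetical.

```python
import random
from statistics import variance

rng = random.Random(7)

# 12,000 toy CB phenotypes with phenotypic variance 10, sorted low -> high.
pheno = sorted(rng.gauss(0.0, 10 ** 0.5) for _ in range(12000))
n = 1800  # 15% CB inclusion

var_all = variance(pheno)                                    # full population
var_random = variance(rng.sample(pheno, n))                  # Random
var_top = variance(pheno[-n:])                               # Top_T analogue
var_extreme = variance(pheno[: n // 2] + pheno[-(n // 2):])  # Extreme_T analogue
```

In this sketch the one-tailed sample has a much smaller variance than the full population, the two-tailed sample has a much larger one, and the random sample stays close to the population value. This is the same direction as the deflated and inflated phenotypic variance estimates reported for Top_T and Extreme_T.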
Estimated values for the genetic correlation between PB and CB performance for each selective genotyping strategy and CB inclusion rate are presented in Figure 2. In general, neither selective genotyping strategy nor CB inclusion rate greatly impacted the ability to estimate rpc when the simulated rpc was 0.5 or 0.7. As the CB inclusion rate increased, the difference between the estimated rpc and the true rpc tended to decrease. Extreme and Top selective genotyping inflated the estimated rpc to the greatest degree when 15% of CB were included, regardless of the simulated rpc.
Figure 2.
Estimated rpc when selectively genotyping a percentage of crossbred animals to be included in bivariate models under various levels of simulated rpc, compared to the true rpc. The true rpc was calculated as the correlation between true breeding values for PB and CB performance and was averaged across replicates and scenarios.
Estimates of bias in EBV of PB selection candidates from breed C are presented in Figure 3. Bias was defined as the regression coefficient produced from regressing TBV on EBV. In general, trends in bias from breed C selection candidates were not different compared to those from breeds A and B (Supplementary Table S1). Bias of PB EBV for CB performance was not impacted by the simulated rpc. Conversely, EBV bias was inversely related to the CB inclusion rate: an increase in CB data resulted in a decrease in EBV bias. Top_T selective genotyping produced the most under-dispersed EBV, while Extreme_T produced the most over-dispersed EBV. In general, EBV produced using Random, Top_FS, and Extreme_FS selective genotyping were unbiased. Bias of EBV was greater using Extreme_FS when the rate of CB inclusion was 15% compared to when it was 35%. Similar trends in the accuracy of EBV for PB selection candidates from breed C were observed due to selective genotyping strategy (Figure 4). Again, trends in EBV accuracy from breed C were not different compared to those from breeds A and B (Supplementary Table S1). As expected, EBV accuracy increased as the simulated rpc and CB inclusion rate increased. Across rpc scenarios and CB inclusion rates, EBV produced from Random and Extreme_T selective genotyping strategies were the most accurate (0.57 to 0.74). Conversely, the least accurate EBV were produced by Top_T (0.32 to 0.61).
Figure 3.
Estimated breeding value bias for CB performance in PB selection candidates from breed C. Bias was calculated as the regression coefficient produced from the regression of true breeding value on EBV produced from BLUP models using estimated variance components.
Figure 4.
Estimated breeding value accuracy for CB performance in PB selection candidates from breed C. Accuracy was calculated as the Spearman correlation between true breeding value and EBV produced from BLUP models using estimated variance components.
Discussion
Using simulation, this study provided evidence of inaccurate estimates of VC when performing selective genotyping without including non-genotyped animals in genetic evaluations, similar to other studies (Cesarani et al., 2019; Chu et al., 2020; Wang et al., 2020). Further, this study has quantified the bias introduced in EBV through the use of incorrect VC. In the current study, true and estimated VC were reduced compared to those which were initially simulated. Such reductions in genetic parameter estimates are to be expected in a population under directional selective pressure (Falconer and Mackay, 1996).
Random genotyping resulted in the most accurate estimates of VC, in agreement with previous studies (Cesarani et al., 2019; Chu et al., 2020; Wang et al., 2020). The efficiency of Random selective genotyping in estimating VC is to be expected, given that a random sample of CB data is the most likely to be representative of the population. Wang et al. (2020) evaluated the bias in VC estimates using both pedigree-based relationship matrices and single-step relationship matrices in broilers. When selectively genotyping the top 20% of broilers, models using single-step relationship matrices produced VC which were highly over-estimated, in contrast with the current study. Chu et al. (2020) estimated VC using Extreme selective genotyping via a GBLUP model. The authors reported that Extreme selective genotyping resulted in an over-estimation of the phenotypic and additive genetic VC compared to simulated values, in agreement with the current study. Additionally, Chu et al. (2020) reported that Top selective genotyping greatly reduced the additive genetic and phenotypic VC estimates compared to those which were simulated.
In the current study, estimates of additive genetic VC were closer to true values when selectively genotyping CB animals using Top or Extreme sampling within FS families. When using pedigree relationships to estimate VC, additive genetic variance is best estimated by leveraging the variance between families (Hill, 2013). Making comparisons within FS families provided a better representation of the true between-family variance, yet the within-family variance was still skewed due to the selective genotyping strategy. This inaccurate representation of the within-family variance due to selective genotyping resulted in over- and under-estimation of residual VC when using Extreme_FS and Top_FS selective genotyping, respectively. When selective genotyping occurred among the entire CB population, the between-family variance was inflated while the within-family variance was deflated. This artificial change in family variance due to sampling method led to biased estimates of VC. In addition to artificial changes in family variance, estimates of VC in the current study could be inflated or deflated compared to previous reports due to the number of selective generations simulated. In the current study, only one generation of CB animals was produced where selective genotyping took place, whereas others have recommended using multiple generations of data to accurately estimate VC for a population under selection (Cesarani et al., 2019). In practice, multiple generations might not always be available when phenotypic data collection for new traits or strategic data collection schemes are initiated. In the current study, a single generation was used to lessen computational demands. Further departures of estimated VC from true values using Extreme and Top selective genotyping strategies could be attributed to unknown dam relationships for CB animals and/or the absence of genomic data in VC estimation. 
In swine breeding, the absence of CB dam relationships is to be expected, as current practice does not include routine genotyping and pedigree tracking of CB dams. Admittedly, VC could be estimated using genomic data in addition to pedigree relationships which would reduce the dependency on accurate representation of between family variance (Hill, 2013) and could increase the accuracy of VC estimation. The absence of both genomic data and CB dam pedigree relationships along with using only one generation of CB data could have hindered the ability to accurately estimate additive genetic VC when sampling among the entire CB population.
Changes in bias across selective genotyping strategies were driven by the use of inaccurate estimates of additive genetic VC. In general, using incorrect estimates of additive genetic VC produced EBV which were biased. Selectively genotyping using Top_T or Extreme_T animals produced the greatest amount of EBV bias compared to Random, in agreement with previous studies (Chu et al., 2020; Wang et al., 2020). Top_T and Extreme_T produced estimates of bias which were far greater than and far lower than the expected value, respectively. Such drastic differences are due to the use of incorrect additive genetic VC to predict EBV. Estimated additive genetic VC using Top_T and Extreme_T were under- and over-estimated, respectively, compared to true values. Trends of over- and under-dispersion of EBV were less pronounced with Extreme and Top selective genotyping when utilizing FS family sampling. This is due to the use of additive genetic VC which were near true values. Selective genotyping using Extreme_FS produced slightly under-dispersed EBV when the proportion of CB data was 15%. This is likely due to the over-estimation of the genetic correlation between PB and CB performance under the same scenario. Despite Extreme_T producing over-dispersed EBV from inaccurate VC, EBV of PB selection candidates using Extreme_T were the most accurate. The accuracy of Extreme_T selective genotyping could be equated to case–control studies often used in QTL detection (Van Gestel et al., 2000). In such studies, the use of phenotypes which differ greatly often increases the probability of detecting causal SNP variants. In this case, the effects of SNP markers which are in the tightest linkage with QTL are likely predicted more accurately, resulting in an increase in the accuracy of EBV.
Previous studies found that the prediction of EBV and estimation of VC could be unbiased when utilizing ssGBLUP with selective genotyping (Cesarani et al., 2019; Howard et al., 2018; Gowane et al., 2019). With ssGBLUP, pedigree and genomic relationships are combined into a single relationship matrix, which allows for the inclusion of phenotypes from non-genotyped animals. The inclusion of non-genotyped data in genetic evaluations allows the model to account for the phenotypic pre-selection of genotyped animals (Vitezica et al., 2011; Veerkamp et al., 2011; Aasmundstad et al., 2015). In the current study, phenotypes of non-genotyped CB animals were not used to estimate VC or included in the genetic evaluations. In practice, most swine breeding programs have limited or unknown pedigree relationships to infer kinship between CB and PB populations. In some cases, specialized testing herds have been developed to measure CB phenotypes for the purposes of genetic evaluation; however, this is practiced only by large-scale swine genetic suppliers. Outside of dedicated testing herds, it is not feasible to track CB pedigree relationships because pooled semen composed of multiple sires is used and dam relationships are commonly unknown. Therefore, CB animals which contribute phenotypic information to genetic evaluations must be related to PB selection candidates through genomic relationships. The alternative to this scenario would be to utilize single sire semen in a more controlled testing herd environment. However, numerous factors contribute to the economic feasibility of using pooled semen combined with CB genotyping compared to single sire semen and pedigree tracking of CB animals, and the outcome is likely herd and system dependent.
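For reference, the combined relationship matrix in ssGBLUP is usually handled through its inverse. The form below is the one commonly given in the literature (e.g., Aguilar et al., 2010); the symbols are not defined elsewhere in this article and are introduced here only for illustration: A is the pedigree relationship matrix for all animals, A22 is its block for genotyped animals, and G is the genomic relationship matrix.

```latex
\mathbf{H}^{-1} = \mathbf{A}^{-1} +
\begin{bmatrix}
\mathbf{0} & \mathbf{0} \\
\mathbf{0} & \mathbf{G}^{-1} - \mathbf{A}_{22}^{-1}
\end{bmatrix}
```

Non-genotyped animals enter only through the pedigree part, which is why their phenotypes can contribute to the evaluation only when they are linked to genotyped animals through recorded pedigree.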
This study compared the ability to estimate VC and EBV when selectively genotyping CB animals in a swine CCPS breeding scheme. Selectively genotyping a proportion of CB animals in a swine CCPS breeding scheme, where the breeding goal is CB performance, can increase the accuracy of PB EBV for CB performance. However, non-random selective genotyping strategies can lead to biased estimates of VC and subsequent EBV if estimated VC are used in the genetic evaluation. When the rpc was 0.1, 0.5, or 0.7, simultaneously selecting top and bottom performing CB animals resulted in over-dispersed EBV when using estimated VC and resulted in the greatest overestimation of phenotypic and additive genetic VC. Further, there are practical advantages to random genotype selection compared to simultaneously selecting the top and bottom performing CB animals. Using selective genotyping implies that all CB animals are phenotyped, which is unlikely for phenotypes outside of routine carcass traits. Therefore, it is recommended that selective genotyping be used to collect data from CB animals to inform selection decisions, and that either random selection or phenotype-based selection within full-sib families be used for the purpose of estimating VC.
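The effect of each sampling strategy on the spread of the sampled phenotypes can be sketched with a small simulation. The example below is a minimal illustration under stated assumptions, not the simulation used in this study: 1200 full-sib families of 10 CB progeny, a phenotypic variance of 1, a between-family variance of 0.2, and two sampled animals per family (20% of the population, chosen so that within-family sampling gives whole animals). It reports the raw variance of the sampled phenotypes rather than model-based VC estimates, and all function names are hypothetical.

```python
import random
from statistics import pvariance


def simulate_families(n_fam=1200, fam_size=10, var_fam=0.2, seed=7):
    """Hypothetical CB phenotypes: a shared full-sib family effect plus a
    within-family deviation, scaled so the phenotypic variance is 1."""
    rng = random.Random(seed)
    sd_fam = var_fam ** 0.5
    sd_within = (1.0 - var_fam) ** 0.5
    fams = []
    for _ in range(n_fam):
        f = rng.gauss(0.0, sd_fam)
        fams.append([f + rng.gauss(0.0, sd_within) for _ in range(fam_size)])
    return fams


def sample_variances(fams, per_fam=2, seed=11):
    """Variance of sampled phenotypes per strategy (per_fam must be even)."""
    rng = random.Random(seed)
    everyone = sorted(y for fam in fams for y in fam)
    n = per_fam * len(fams)  # same number of animals in every strategy
    half, h = n // 2, per_fam // 2
    samples = {
        "Random": rng.sample(everyone, n),
        "Top_T": everyone[-n:],
        "Extreme_T": everyone[:half] + everyone[-half:],
        "Top_FS": [y for fam in fams for y in sorted(fam)[-per_fam:]],
        "Extreme_FS": [y for fam in fams
                       for y in sorted(fam)[:h] + sorted(fam)[-h:]],
    }
    return {name: pvariance(vals) for name, vals in samples.items()}


for name, var in sample_variances(simulate_families()).items():
    print(f"{name:10s} {var:.2f}")
```

Under these assumptions, Random returns a variance close to 1, Top sampling deflates it, and Extreme sampling inflates it, with the within-family versions departing less from the population value than the among-population versions.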
Supplementary Material
Acknowledgments
We thank the University of Nebraska-Lincoln Holland Computing Center (HCC) for use of their computational resources.
Glossary
Abbreviations
- BLUP
best linear unbiased prediction
- CB
crossbred
- CCPS
combined crossbred and purebred selection
- EBV
estimated breeding values
- FS
full-sib family selection
- HS
half-sib family selection
- Ne
effective population size
- PB
purebred
- rpc
genetic correlation between PB and CB performance
- r̂pc
estimated genetic correlation between PB and CB performance
- ssGBLUP
single-step genomic best linear unbiased prediction
- VC
variance components
Conflict of interest statement
The authors declare no real or perceived conflicts of interest.
Literature Cited
- Aasmundstad, T., Andersen-Ranberg I., Nordbø Ø., Meuwissen T., Vangen O., and Grindflek E. 2015. The effect of including genomic relationships in the estimation of genetic parameters of functional traits in pigs. J. Anim. Breed. Genet. 132:386–391. doi: 10.1111/jbg.12156.
- Aguilar, I., Misztal I., Johnson D. L., Legarra A., Tsuruta S., and Lawlor T. J. 2010. Hot topic: a unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of Holstein final score. J. Dairy Sci. 93:743–752. doi: 10.3168/jds.2009-2730.
- Aldridge, M. N., Vandenplas J., Bergsma R., and Calus M. P. 2020. Variance estimates are similar using pedigree or genomic relationships with or without the use of metafounders or the algorithm for proven and young animals. J. Anim. Sci. 98:1–9. doi: 10.1093/jas/skaa019.
- Cesarani, A., Pocrnic I., Macciotta N. P. P., Fragomeni B. O., Misztal I., and Lourenco D. A. L. 2019. Bias in heritability estimates from genomic restricted maximum likelihood methods under different genotyping strategies. J. Anim. Breed. Genet. 136:40–50. doi: 10.1111/jbg.12367.
- Chu, T. T., Sørensen A. C., Lund M. S., Meier K., Nielsen T., and Su G. 2020. Phenotypically selective genotyping realizes more genetic gains in a rainbow trout breeding program in the presence of genotype-by-environment interactions. Front. Genet. 11:866. doi: 10.3389/fgene.2020.00866.
- Dekkers, J. C. 2007. Marker-assisted selection for commercial crossbred performance. J. Anim. Sci. 85:2104–2114. doi: 10.2527/jas.2006-683.
- Esfandyari, H., Sørensen A. C., and Bijma P. 2015. A crossbred reference population can improve the response to genomic selection for crossbred performance. Genet. Sel. Evol. 47:76. doi: 10.1186/s12711-015-0155-z.
- Falconer, D. S., and Mackay T. F. C. 1996. Introduction to quantitative genetics. Longman Group, Essex, UK.
- Gaynor, R. C. G., Gorjanc G., Wilson E., Money D., and Hickey J. M. 2019. AlphaSimR: breeding program simulations. Available from https://CRAN.R-project.org/package=AlphaSimR, R package version 0.9.0.
- Gowane, G. R., Lee S. H., Clark S., Moghaddar N., Al-Mamun H. A., and van der Werf J. H. J. 2019. Effect of selection and selective genotyping for creation of reference on bias and accuracy of genomic prediction. J. Anim. Breed. Genet. 136:390–407. doi: 10.1111/jbg.12420.
- Henderson, C. R. 1975. Best linear unbiased estimation and prediction under a selection model. Biometrics 31:423–447. doi: 10.1017/s135772980005715.
- Hill, W. G. 2013. On estimation of genetic variance within families using genome-wide identity-by-descent sharing. Genet. Sel. Evol. 45:32. doi: 10.1186/1297-9686-45-32.
- Howard, J. T., Rathje T. A., Bruns C. E., Wilson-Wells D. F., Kachman S. D., and Spangler M. L. 2018. The impact of selective genotyping on the response to selection using single-step genomic best linear unbiased prediction. J. Anim. Sci. 96:4532–4542. doi: 10.1093/jas/sky330.
- Misztal, I., Tsuruta S., Strabel T., Auvray B., Druet T., and Lee D. H. 2002. BLUPF90 and related programs (GF90). In: Proceedings of the 7th world congress on genetics applied to livestock production; Montpellier, France; p. 743–744.
- See, G. M., Mote B. E., and Spangler M. L. 2020. Impact of inclusion rates of crossbred phenotypes and genotypes in nucleus selection programs. J. Anim. Sci. 98:12. doi: 10.1093/jas/skaa360.
- See, G. M., Mote B. E., and Spangler M. L. 2021. Selective genotyping strategies of crossbred progeny for combined crossbred and purebred selection schemes in swine breeding programs. J. Anim. Sci. doi: 10.1093/jas/skab041.
- Van Gestel, S., Houwing-Duistermaat J. J., Adolfsson R., van Duijn C. M., and Van Broeckhoven C. 2000. Power of selective genotyping in genetic association analyses of quantitative traits. Behav. Genet. 30:141–146. doi: 10.1023/a:1001907321955.
- van Grevenhof, I. E., and van der Werf J. H. 2015. Design of reference populations for genomic selection in crossbreeding programs. Genet. Sel. Evol. 47:14. doi: 10.1186/s12711-015-0104-x.
- Veerkamp, R. F., Mulder H. A., Thompson R., and Calus M. P. 2011. Genomic and pedigree-based genetic parameters for scarcely recorded traits when some animals are genotyped. J. Dairy Sci. 94:4189–4197. doi: 10.3168/jds.2011-4223.
- Vitezica, Z. G., Aguilar I., Misztal I., and Legarra A. 2011. Bias in genomic predictions for populations under selection. Genet. Res. (Camb). 93:357–366. doi: 10.1017/S001667231100022X.
- Wei, M., and van der Werf J. H. 1994. Maximizing genetic response in crossbreds using both purebred and crossbred information. Anim. Sci. 59:401–413. doi: 10.1017/S0003356100007923.
- Wang, L., Janss L. L., Madsen P., Henshall J., Huang C. H., Marois D., Alemu S., Sørensen A. C., and Jensen J. 2020. Effect of genomic selection and genotyping strategy on estimation of variance components in animal models using different relationship matrices. Genet. Sel. Evol. 52:3–14. doi: 10.1186/s12711-020-00550-w.