Abstract
Simple Summary
Compared to BLUP, in single-step genomic BLUP (ssGBLUP), $G^{-1} - A_{22}^{-1}$ is added to the inverse of the pedigree relationship matrix ($A^{-1}$), forming $H^{-1}$, where G is the genomic relationship matrix and $A_{22}$ is the block of A for genotyped animals. Incompatibility between G and $A_{22}$ may cause inflated genetic variance. Blending and tuning G with $A_{22}$ partially solves the problem. However, further conditioning might still be needed, which is usually performed via $\tau G^{-1} - \omega A_{22}^{-1}$. This may violate the properties upon which H is built. Alternative ways of weighting the components of $H^{-1}$ are presented to prevent/minimise violations of the properties of H.
Abstract
The single-step genomic BLUP (ssGBLUP) is used worldwide for the simultaneous genetic evaluation of genotyped and non-genotyped animals. It is easily extendible to all BLUP models by replacing the pedigree-based additive genetic relationship matrix (A) with an augmented pedigree–genomic relationship matrix (H). Theoretically, H does not introduce any artificially inflated variance. However, inflated genetic variances have been observed due to the incompatibility between the genomic relationship matrix (G) and A used in H. Usually, G is blended and tuned with $A_{22}$ (the block of A for genotyped animals) to improve its numerical condition and compatibility. If deflation/inflation is still needed, a common approach is weighting $G^{-1} - A_{22}^{-1}$ in the form of $\tau G^{-1} - \omega A_{22}^{-1}$, which is added to $A^{-1}$ to form $H^{-1}$. In some situations, this can violate the conditional properties upon which H is built. Different ways of weighting the components of $H^{-1}$ ($G^{-1} - A_{22}^{-1}$, $G^{-1}$, $A^{-1}$, and $H^{-1}$ itself) were studied to avoid/minimise violations of the conditional properties of H. Data were simulated for ten populations over twenty generations. Responses to weighting different components of $H^{-1}$ were measured in terms of the regression of phenotypes on the estimated breeding values (the lower the slope, the higher the inflation) and the correlation between phenotypes and the estimated breeding values (predictive ability). Increasing the weight on $H^{-1}$ increased the inflation. The responses to weighting $G^{-1}$ were similar to those for $H^{-1}$. Increasing the weight on $A^{-1}$ (together with $A_{22}^{-1}$) was not influential and slightly increased the inflation. Predictive ability is a direct function of the slope of the regression line and followed similar trends. Responses to weighting $G^{-1} - A_{22}^{-1}$ depend on whether evaluations are inflated or deflated when moving from $A^{-1}$ to $H^{-1}$ and on the compatibility of the two matrices with the heritability used in the model. One possibility is to combine two of these weighting schemes.
Given recent advances in ssGBLUP, conditioning $H^{-1}$ might prove to be an interim solution and may not be needed in the future.
Keywords: conditional property, inflated, relationship matrix, single-step GBLUP, weighting
1. Introduction
The unified genetic evaluation of genotyped and non-genotyped animals has been of great interest. In an initial attempt, Misztal et al. [1] suggested a unified pedigree (A) and genomic (G) relationship matrix, in which the genomic relationships between genotyped animals replaced their pedigree relationship coefficients in A. Denoting non-genotyped and genotyped animals with subscripts 1 and 2, respectively:
$$\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & G \end{bmatrix} \tag{1}$$
This relationship matrix did not condition the distributions of breeding values for genotyped and non-genotyped animals on each other, leading to incoherencies in the joint distribution of genetic values for genotyped and non-genotyped animals. Legarra et al. [2] presented an augmented (A and G) relationship matrix in which the genetic values of non-genotyped animals were conditioned on the genetic values of genotyped animals. The resulting matrix was:
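$$H = \begin{bmatrix} A_{11} - A_{12}A_{22}^{-1}A_{21} + A_{12}A_{22}^{-1}GA_{22}^{-1}A_{21} & A_{12}A_{22}^{-1}G \\ GA_{22}^{-1}A_{21} & G \end{bmatrix} \tag{2}$$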
which can be simplified to any of the following:
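$$H = A + \begin{bmatrix} A_{12}A_{22}^{-1}(G - A_{22})A_{22}^{-1}A_{21} & A_{12}A_{22}^{-1}(G - A_{22}) \\ (G - A_{22})A_{22}^{-1}A_{21} & G - A_{22} \end{bmatrix} \tag{3}$$
$$H = \begin{bmatrix} A_{11} + A_{12}A_{22}^{-1}(G - A_{22})A_{22}^{-1}A_{21} & A_{12}A_{22}^{-1}G \\ GA_{22}^{-1}A_{21} & G \end{bmatrix} \tag{4}$$
$$H = A + \begin{bmatrix} A_{12}A_{22}^{-1} \\ I \end{bmatrix}(G - A_{22})\begin{bmatrix} A_{22}^{-1}A_{21} & I \end{bmatrix} \tag{5}$$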
In matrix H, the genomic information in G influences the relationships between non-genotyped and genotyped animals and among non-genotyped animals. Later, it was shown that $H^{-1}$ can be obtained directly, without forming and inverting H [3,4]:
$$H^{-1} = A^{-1} + \begin{bmatrix} 0 & 0 \\ 0 & G^{-1} - A_{22}^{-1} \end{bmatrix} \tag{6}$$
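As a numerical sanity check of Equations (2) and (6), the Python/NumPy sketch below builds H explicitly for a tiny hypothetical pedigree (six animals, the last three genotyped) and confirms that its inverse equals $A^{-1}$ with $G^{-1} - A_{22}^{-1}$ added to the genotyped block. The pedigree, the random genotypes, and the 5% blending with $A_{22}$ are illustrative assumptions, and `pedigree_A` and `build_H` are hypothetical helper names.

```python
import numpy as np


def pedigree_A(sire, dam):
    """Numerator relationship matrix A by the tabular method.

    Parents are 0-based indices (-1 = unknown) and must precede their offspring.
    """
    n = len(sire)
    A = np.zeros((n, n))
    for i in range(n):
        s, d = sire[i], dam[i]
        # Diagonal: 1 + inbreeding, where inbreeding = half the parents' relationship.
        A[i, i] = 1.0 + (0.5 * A[s, d] if s >= 0 and d >= 0 else 0.0)
        for j in range(i):
            # Off-diagonal: half the relationship of j with each known parent of i.
            a = 0.0
            if s >= 0:
                a += 0.5 * A[j, s]
            if d >= 0:
                a += 0.5 * A[j, d]
            A[i, j] = A[j, i] = a
    return A


def build_H(A, G, n1):
    """Equation (2): animals 0..n1-1 are non-genotyped, the rest are genotyped."""
    A11, A12 = A[:n1, :n1], A[:n1, n1:]
    A21, A22 = A[n1:, :n1], A[n1:, n1:]
    A22i = np.linalg.inv(A22)
    H11 = A11 - A12 @ A22i @ A21 + A12 @ A22i @ G @ A22i @ A21
    H12 = A12 @ A22i @ G
    return np.block([[H11, H12], [H12.T, G]])


# Hypothetical pedigree: 0 and 1 founders, 2 and 3 full sibs, 4 = 2 x 3, 5 = 2 x unknown.
A = pedigree_A(sire=[-1, -1, 0, 0, 2, 2], dam=[-1, -1, 1, 1, 3, -1])
n1 = 3
A22 = A[n1:, n1:]

# Hypothetical genotypes for animals 3-5; VanRaden-type G blended with A22.
rng = np.random.default_rng(1)
M = rng.integers(0, 3, size=(3, 200)).astype(float)
p = M.mean(axis=0) / 2
W = M - 2 * p
G = 0.95 * (W @ W.T) / (2 * np.sum(p * (1 - p))) + 0.05 * A22

H_inv_direct = np.linalg.inv(build_H(A, G, n1))
H_inv_fast = np.linalg.inv(A)
H_inv_fast[n1:, n1:] += np.linalg.inv(G) - np.linalg.inv(A22)  # Equation (6)
print(np.abs(H_inv_direct - H_inv_fast).max())
```

The maximum absolute difference should be at the level of floating-point rounding, which is why practical implementations assemble $H^{-1}$ from Equation (6) rather than inverting H.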
Note that:
Matrix G is not always full-rank (e.g., when the number of genotyped animals is greater than the number of loci, or when there are duplicated genotypes, such as for identical twins). To force G to be positive-definite and to avoid large diagonal values of $G^{-1}$ caused by the poor numerical condition of G, the first step of conditioning G often involves blending it with $A_{22}$, which is always positive-definite (except in the presence of identical twins or clones [5]) and numerically well conditioned (i.e., $G_w = (1-k)G + kA_{22}$, 0 < k < 1). Blending introduces residual polygenic effects (genetic effects not captured by the genetic markers) into the evaluation model without explicitly modelling them, where the scalar k is the ratio of the polygenic to the total additive genetic variance [6].
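A small sketch of why blending helps: when allele frequencies are estimated from the same animals, the centred genotype matrix has zero column sums, so G built from n animals has rank at most n − 1, and a duplicated genotype lowers the rank further. By contrast, $(1-k)G + kA_{22}$ is positive-definite for any 0 < k < 1 whenever $A_{22}$ is. The genotypes, the pedigree-based $A_{22}$ (animals 3 and 4 recorded as full sibs, all others unrelated), and k = 0.05 are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
n, m = 5, 300
M = rng.integers(0, 3, size=(n, m)).astype(float)  # hypothetical 0/1/2 genotypes
M[4] = M[3]                                        # animal 4 duplicates animal 3

p = M.mean(axis=0) / 2
W = M - 2 * p                                      # centred genotypes
G = W @ W.T / (2 * np.sum(p * (1 - p)))            # singular: rank < n

# Hypothetical A22: all unrelated except animals 3 and 4 (full sibs in the pedigree).
A22 = np.eye(n)
A22[3, 4] = A22[4, 3] = 0.5

k = 0.05
G_w = (1 - k) * G + k * A22                        # blended G

print("rank(G):", np.linalg.matrix_rank(G))
print("smallest eigenvalue of G:  ", np.linalg.eigvalsh(G).min())
print("smallest eigenvalue of G_w:", np.linalg.eigvalsh(G_w).min())
```

Because the smallest eigenvalue of $A_{22}$ here is 0.5, the smallest eigenvalue of $G_w$ is bounded below by roughly k × 0.5, so even a small k makes $G_w$ invertible.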
It is theoretically true that no artificially inflated variance is introduced via the H matrix [2]. However, inflated genetic variances have been observed due to incompatibilities between G and $A_{22}$ [6,7,8,9]. Incompatible G and $A_{22}$ lead to incorrectly weighted pedigree and genomic information [7,8]. Besides different distributions of the G and $A_{22}$ elements, incomplete and incorrect pedigree information, and genotyping and imputation errors, incompatibilities between G and $A_{22}$ can be due to the non-random selection of genotyped animals [10] and the different bases and scales of the two matrices [7]. Matrices $A_{22}$ and G regress data to different means. Matrix $A_{22}$ regresses solutions towards the pedigree founders (animals in the pedigree with unknown parents), or towards genetic groups if these are included in the pedigree. On the other hand, G regresses solutions towards a founder population comprising the genotyped animals [5,10], since the real allele frequencies in the founder population are unknown. The average genetic merit of genotyped animals can differ from that of the founders, especially in the presence of selection. Different approaches (referred to as tuning) have been used to correct the base difference between G and $A_{22}$ [7,11] and to rebase and scale G to improve its consistency with $A_{22}$ [10]. Those approaches were tested by Nilforooshan [9] on New Zealand Romney sheep. Christensen [8] and Gao et al. [6] tuned G by matching its averages to the averages of $A_{22}$ (Equations (7) and (8), respectively).
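$$\begin{cases} \beta\,\overline{\mathrm{diag}(G)} + \alpha = \overline{\mathrm{diag}(A_{22})} \\ \beta\,\overline{G} + \alpha = \overline{A_{22}} \end{cases} \tag{7}$$
$$\begin{cases} \beta\,\overline{\mathrm{diag}(G)} + \alpha = \overline{\mathrm{diag}(A_{22})} \\ \beta\,\overline{\mathrm{offdiag}(G)} + \alpha = \overline{\mathrm{offdiag}(A_{22})} \end{cases} \tag{8}$$
where a bar denotes the average of the corresponding elements.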
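Each tuning system is a 2 × 2 linear system in α and β. The sketch below assumes that one variant matches the average diagonal and the average of all elements, and the other matches the average diagonal and the average off-diagonal element; the exact definitions should be checked against [6,8]. `tune_G` and the toy matrices are hypothetical, and α is added to every element of βG.

```python
import numpy as np


def offdiag_mean(X):
    """Average of the off-diagonal elements of a square matrix."""
    n = X.shape[0]
    return (X.sum() - np.trace(X)) / (n * (n - 1))


def tune_G(G, A22, second="all"):
    """Return alpha, beta and beta*G + alpha matching two averages of A22.

    Both variants match the average diagonal; the second equation matches either
    the average of all elements (second="all") or of off-diagonals ("offdiag").
    """
    if second == "all":
        g2, a2 = G.mean(), A22.mean()
    elif second == "offdiag":
        g2, a2 = offdiag_mean(G), offdiag_mean(A22)
    else:
        raise ValueError(f"unknown variant: {second}")
    gd, ad = np.diag(G).mean(), np.diag(A22).mean()
    # beta * gd + alpha = ad  and  beta * g2 + alpha = a2
    beta, alpha = np.linalg.solve(np.array([[gd, 1.0], [g2, 1.0]]), np.array([ad, a2]))
    return alpha, beta, beta * G + alpha


G = np.array([[1.10, 0.20, -0.10], [0.20, 0.90, 0.05], [-0.10, 0.05, 1.00]])
A22 = np.array([[1.00, 0.25, 0.125], [0.25, 1.00, 0.25], [0.125, 0.25, 1.00]])
for variant in ("all", "offdiag"):
    alpha, beta, G_star = tune_G(G, A22, variant)
    print(variant, round(alpha, 4), round(beta, 4))
```

By construction, the tuned matrix reproduces the two targeted averages of $A_{22}$ exactly; the two variants differ only in which second average is matched.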
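The α and β scalars obtained by solving either of the equations above are used to transform G into $G^{*} = \beta G + \alpha$ (with α added to every element). Another solution proposed to tackle inflated genomic evaluations (i.e., an increased variance of genomic predictions) resulting from incorrectly scaled genomic and pedigree information was scaling $G^{-1} - A_{22}^{-1}$ in the form of $\tau G^{-1} - \omega A_{22}^{-1}$ [3,12,13]. Applying τ and ω is equivalent to transforming G into $\tilde{G} = \left(\tau G^{-1} + (1-\omega)A_{22}^{-1}\right)^{-1}$ [3,9], which equals $G\left(\tau I + (1-\omega)A_{22}^{-1}G\right)^{-1}$. It can also be expressed as an equivalent modification of Equation (2) [12].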
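The equivalence between $\tau G^{-1} - \omega A_{22}^{-1}$ and a transformed G, $\tilde{G} = (\tau G^{-1} + (1-\omega)A_{22}^{-1})^{-1}$, can be checked numerically. In this sketch G and $A_{22}$ are arbitrary positive-definite toy matrices, and τ = 0.9 and ω = 0.7 are hypothetical values.

```python
import numpy as np

inv = np.linalg.inv
rng = np.random.default_rng(3)
X = rng.normal(size=(4, 4))
G = X @ X.T + 4 * np.eye(4)          # hypothetical positive-definite G
A22 = 0.5 * np.eye(4) + 0.5          # hypothetical A22 (all pairs related by 0.5)
tau, omega = 0.9, 0.7                # hypothetical weights

weighted = tau * inv(G) - omega * inv(A22)
G_tilde = inv(tau * inv(G) + (1 - omega) * inv(A22))
G_tilde_alt = G @ inv(tau * np.eye(4) + (1 - omega) * inv(A22) @ G)

print(np.allclose(weighted, inv(G_tilde) - inv(A22)))  # True
print(np.allclose(G_tilde, G_tilde_alt))               # True
```

Both identities hold for any τ > 0 and ω < 1 with positive-definite G and $A_{22}$, since the bracketed sum is then positive-definite and invertible.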
Reducing τ and ω towards 0 brings the effective G closer to $A_{22}$, because $\tau G^{-1} + (1-\omega)A_{22}^{-1}$ then approaches $A_{22}^{-1}$. However, it is not easily quantifiable how G and $A_{22}$ are proportionally combined. With τ and ω deviating from each other and from 1, there is a risk of distorting the conditional properties of H, because the changes made to $A_{22}^{-1}$ are not reflected in the other blocks of $A^{-1}$. Whereas 1 − k and k are the commonly used blending coefficients of G and $A_{22}$, τ and ω are the commonly used blending coefficients of $G^{-1}$ and $A_{22}^{-1}$, i.e.,
$$H^{-1} = A^{-1} + \begin{bmatrix} 0 & 0 \\ 0 & \tau G^{-1} - \omega A_{22}^{-1} \end{bmatrix} \tag{9}$$
Considering the above equation, there is no legitimate reason for τ to lie outside the range 0 to 1, or for ω to lie outside the range −1 to 1. Martini et al. [12] studied τ ranging from 0.1 to 2 and ω ranging from −1 to 1 in steps of 0.1, leading to 420 analyses. Dealing with two parameters increases the number of analyses and validation tests, which span a two-dimensional space. This also assumes that the k coefficient has already been chosen and does not need to be validated. The most coherent approach for finding k is restricted maximum likelihood (REML), as proposed by Christensen and Lund [4], rather than screening empirical values by validation.
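The size of the two-dimensional grid quoted above follows from 20 values of τ times 21 values of ω; a quick check:

```python
import itertools

taus = [round(0.1 * i, 1) for i in range(1, 21)]       # 0.1, 0.2, ..., 2.0
omegas = [round(0.1 * i, 1) for i in range(-10, 11)]   # -1.0, -0.9, ..., 1.0
grid = list(itertools.product(taus, omegas))
print(len(taus), len(omegas), len(grid))  # 20 21 420
```

Every additional tuning parameter multiplies the number of analyses in this way, which is one practical argument for weighting schemes with a single parameter.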
Weighting $G^{-1}$ and $A_{22}^{-1}$ as $\tau G^{-1} - \omega A_{22}^{-1}$ has been used until recently [12,13,14,15,16,17]. Several improvements have been made to ssGBLUP [18], and the use of τ and ω is declining. For example, one of the factors leading to the need for an ω considerably less than 1 was that inbreeding coefficients were considered in $A_{22}^{-1}$ but not in $A^{-1}$ [19]. The aim of this study was to communicate the problems that might occur when using $\tau G^{-1} - \omega A_{22}^{-1}$, and to investigate possible solutions for weighting the components of $H^{-1}$ if the modifications in G are not satisfactory and weighting of the components is still needed for the deflation/inflation of genomic breeding values.
2. Methods
2.1. Possible Problems with $\tau G^{-1} - \omega A_{22}^{-1}$
The matrix $\omega A_{22}^{-1}$ in Equation (9) is unconditional and is not reflected in the other blocks of $A^{-1}$. As such, some combinations of τ and ω can distort the conditional properties of H. However, any τ = ω ranging from 0 to 1 is legitimate and can be considered as a blending of $H^{-1}$ and $A^{-1}$. While it might make sense to weight $G^{-1}$ and $A_{22}^{-1}$ to bring them closer to each other and make them more compatible, weighting $A_{22}^{-1}$ causes incompatibility between $A_{22}^{-1}$ and $A^{-1}$. Matrix $H^{-1}$ can also be written as:
| (10) |
| (11) |
When weighting the components of $H^{-1}$ in Equation (10), the aim is to preserve the existing quadratic form. This study introduces weighting (by a scalar w) on components that are unlikely to distort the conditional properties of H. Weighting can be performed on any of the following components:
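1. $H^{-1}$ itself
2. $G^{-1} - A_{22}^{-1}$
3. $G^{-1}$
4. $A^{-1}$ (together with $A_{22}^{-1}$)
5–7. Three further components, considered in Sections 2.6–2.8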
2.2. Weighting $H^{-1}$
This scenario is helpful when the heritability estimate (h2) does not match the data or H. Heritability may change over time and as a result of selection, so an outdated h2 may differ from the current h2 of the trait in the population. Estimating variance components is computationally expensive. The h2 estimate might have come from a subset of the population or from a matrix other than H (A or G). Different relationship matrices contain different information and may result in different genetic variances and h2 estimates [20].
2.3. Weighting $G^{-1} - A_{22}^{-1}$
Aguilar et al. [3] suggested using equal τ and ω. Weighting $G^{-1} - A_{22}^{-1}$ by a scalar w is equivalent to τ = ω = w.
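Writing w for the common weight (τ = ω = w) and regrouping terms shows why a weight between 0 and 1 can be read as a blend of $H^{-1}$ and $A^{-1}$ (a short derivation that uses only Equation (6)):

```latex
A^{-1} + w\begin{bmatrix} 0 & 0 \\ 0 & G^{-1} - A_{22}^{-1} \end{bmatrix}
  = w\left(A^{-1} + \begin{bmatrix} 0 & 0 \\ 0 & G^{-1} - A_{22}^{-1} \end{bmatrix}\right) + (1 - w)A^{-1}
  = wH^{-1} + (1 - w)A^{-1}
```

With w = 1 this is $H^{-1}$, with w = 0 it is $A^{-1}$, and any 0 ≤ w ≤ 1 gives a convex combination of the two, with all blocks of $A^{-1}$ weighted consistently.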
2.4. Weighting $G^{-1}$
This scenario can be understood as scaling the h2 corresponding to G to the h2 corresponding to A. No violation is made to the conditional properties of H, and weighting $G^{-1}$ by a scalar w is equivalent to using G/w in H. Therefore, G/w instead of G is propagated through the blocks of H. A G/w that is more compatible with $A_{22}$ brings the genomic information closer to $A_{22}$ and makes it more compatible with A.
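The equivalence with using G/w follows directly from Equation (6), with w denoting the scalar weight on $G^{-1}$:

```latex
A^{-1} + \begin{bmatrix} 0 & 0 \\ 0 & wG^{-1} - A_{22}^{-1} \end{bmatrix}
  = A^{-1} + \begin{bmatrix} 0 & 0 \\ 0 & (G/w)^{-1} - A_{22}^{-1} \end{bmatrix}
```

Substituting G/w into Equation (2) then gives, for example, the non-genotyped block $A_{11} - A_{12}A_{22}^{-1}A_{21} + w^{-1}A_{12}A_{22}^{-1}GA_{22}^{-1}A_{21}$, so only the genomic part of each block is rescaled, while the purely pedigree-based part is left untouched.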
2.5. Weighting $A^{-1}$
This scenario can be understood as scaling the h2 corresponding to A to the h2 corresponding to G. When $A^{-1}$ is weighted by a scalar w, $A_{22}^{-1}$ in Equation (6) should be changed to $wA_{22}^{-1}$, which is equivalent to multiplying all the pedigree-based blocks of $H^{-1}$ in Equation (11) by w. With an h2 estimate based on pedigree information, weighting $G^{-1}$ is preferred over weighting $A^{-1}$.
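A short derivation (ours, using only Equation (6)) makes this interpretation concrete. Writing w for the scalar weight and $H_{wG}$ for H built from wG instead of G:

```latex
wA^{-1} + \begin{bmatrix} 0 & 0 \\ 0 & G^{-1} - wA_{22}^{-1} \end{bmatrix}
  = w\left(A^{-1} + \begin{bmatrix} 0 & 0 \\ 0 & (wG)^{-1} - A_{22}^{-1} \end{bmatrix}\right)
  = wH_{wG}^{-1}
```

Its inverse is $H_{wG}/w$. The genotyped block is therefore $(wG)/w = G$, unchanged, while the pedigree-derived blocks are rescaled; for example, the non-genotyped block becomes $w^{-1}A_{11} + A_{12}A_{22}^{-1}(G - w^{-1}A_{22})A_{22}^{-1}A_{21}$.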
2.6. Weighting
Considering Equation (10), weighting is equivalent to weighting all the components of , except , similar to that of the weighting scenario.
2.7. Weighting
Considering Equation (11), weighting should coincide with weighting the other blocks of to preserve its conditional properties, as well as weighting , similar to that of the weighting scenario.
2.8. Weighting
Considering Equation (10), weighting is equivalent to:
| (12) |
However, this is not recommended as it imposes a different pedigree-based h2 on the genotyped and non-genotyped animals in . Furthermore, as becomes smaller, the relationships between genotyped and non-genotyped animals are weakened.
2.9. The Experiments
Since the scenarios in Sections 2.6 and 2.7 are equivalent to other scenarios, and the scenario in Section 2.8 is not recommended, the four scenarios of weighting $H^{-1}$, $G^{-1} - A_{22}^{-1}$, $G^{-1}$, and $A^{-1}$ (Sections 2.2–2.5) were tested. These scenarios were tested with a weight w ranging from 0.8 to 1.2 to assess how each of them responds to w deviating from 1. Because weighting $G^{-1} - A_{22}^{-1}$ requires w to be between 0 and 1, it was studied with w ranging from 0.8 to 1. Predictive ability was calculated as Pearson's correlation between the phenotypes and the estimated breeding values. Phenotypes were regressed on the estimated breeding values, where a lower slope indicates inflation and a higher slope indicates deflation.
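Both validation statistics can be computed in a few lines. This is a minimal sketch: the helper name is hypothetical, animals with missing phenotypes (NaN) are assumed to be dropped first, and the toy numbers are illustrative only.

```python
import numpy as np


def inflation_and_predictive_ability(y, ebv):
    """Slope of y regressed on ebv and Pearson correlation r(y, ebv).

    Animals with missing phenotypes (NaN) are dropped first.
    """
    y = np.asarray(y, dtype=float)
    ebv = np.asarray(ebv, dtype=float)
    keep = ~np.isnan(y)
    y, ebv = y[keep], ebv[keep]
    slope = np.cov(y, ebv)[0, 1] / np.var(ebv, ddof=1)  # < 1 inflation, > 1 deflation
    r = np.corrcoef(y, ebv)[0, 1]
    return slope, r


# Toy example: EBVs twice as dispersed as the phenotypes they predict (slope 0.5).
ebv = np.array([-2.0, 0.0, 2.0, 4.0, 1.0])
y = np.array([-1.0, 0.0, 1.0, 2.0, np.nan])
print(inflation_and_predictive_ability(y, ebv))
```

The toy data illustrate why the slope is the more informative statistic for inflation: the correlation is perfect even though the estimated breeding values are twice as dispersed as they should be.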
3. Materials
Data were simulated for a species with a 1:1 sex ratio, a litter size of 2, and a generation overlap of 1. The pedigree, phenotypes, and genotypes were simulated using the R package pedSimulate [21]. Initially, ten generations were simulated, starting with a base generation (F0) of 100 animals (50 of each sex). No non-random pre-mating mortality or selection was applied to F0. Genotypes were simulated on 5000 markers, and allele frequencies were sampled from a uniform distribution ranging from 0.1 to 0.9. Marker (allele substitution) effects were simulated from a gamma distribution with shape and rate parameters equal to 2. The distribution was rebased to have a mean of 0 and scaled to give a variance of (true) marker breeding values in F0 of $\sigma^2_g = 9$. Residual polygenic and environmental (residual) effects were simulated from normal distributions with variances $\sigma^2_a = 1$ and $\sigma^2_e = 30$, respectively.
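The base-generation step can be sketched in Python/NumPy. This is not the pedSimulate implementation [21]. It assumes that F0 genotypes are drawn as Binomial(2, p), that a marker breeding value is the sum over markers of allele count times effect, and that Gamma(shape = 2, rate = 2) maps to NumPy's `gamma` with scale = 1/rate; the seed is arbitrary. With the stated variances, the F0 heritability is (9 + 1)/(9 + 1 + 30) = 0.25, and polygenic effects make up 1/(9 + 1) = 0.1 of the additive genetic variance.

```python
import numpy as np

rng = np.random.default_rng(2022)          # arbitrary seed
n_f0, n_snp = 100, 5000

p = rng.uniform(0.1, 0.9, size=n_snp)      # allele frequencies ~ U(0.1, 0.9)
M = rng.binomial(2, p, size=(n_f0, n_snp)).astype(float)  # F0 genotypes 0/1/2

# Marker effects ~ Gamma(shape = 2, rate = 2); NumPy takes scale = 1 / rate.
effects = rng.gamma(shape=2.0, scale=1.0 / 2.0, size=n_snp)
effects -= effects.mean()                  # rebase to a mean of 0
g = M @ effects                            # marker breeding values
effects *= np.sqrt(9.0 / np.var(g))        # rescale so that Var(g) = 9 in F0
g = M @ effects

a = rng.normal(0.0, 1.0, size=n_f0)               # residual polygenic, variance 1
e = rng.normal(0.0, np.sqrt(30.0), size=n_f0)     # environmental, variance 30

print(np.var(g), np.var(a), np.var(e))
```

Rescaling the effects after computing the breeding values makes the F0 marker variance exactly 9, whereas the sampled polygenic and environmental variances only approximate 1 and 30 in a sample of 100 animals.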
Following F0, half of the males were mated to half of the females, which were all randomly selected and mated. Where the numbers of mating animals per sex were not equal, the sex with the higher number of animals underwent random selection to match the number of animals of the opposite sex. These ten generations were followed by ten more generations, in which 50% of male candidates (to become sires of the next generation) were selected for their marker breeding value and mated to the same number of randomly selected females. Genotypes in each subsequent generation were obtained by combining sampled gametes from the parents’ genotypes.
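A minimal sketch of the two mechanics in this paragraph, under stated assumptions. Loci are treated as unlinked: each parent transmits the counted allele with probability equal to half its genotype. This may differ from how pedSimulate [21] samples gametes. Selection keeps the top 50% of male candidates by marker breeding value. `make_offspring` and `select_top` are hypothetical helper names.

```python
import numpy as np


def make_offspring(sire_geno, dam_geno, rng):
    """Offspring 0/1/2 genotypes from one sampled gamete per parent (unlinked loci)."""
    sire_geno = np.asarray(sire_geno, dtype=float)
    dam_geno = np.asarray(dam_geno, dtype=float)
    # A parent with genotype x transmits the counted allele with probability x / 2.
    return rng.binomial(1, sire_geno / 2) + rng.binomial(1, dam_geno / 2)


def select_top(breeding_values, fraction=0.5):
    """Indices of the best `fraction` of candidates by (marker) breeding value."""
    n_keep = int(len(breeding_values) * fraction)
    return np.argsort(breeding_values)[::-1][:n_keep]


rng = np.random.default_rng(1)
child = make_offspring([2, 2, 0, 1], [0, 2, 0, 1], rng)
print(child)  # first three loci are fixed (1, 2, 0); the last one is random
print(select_top(np.array([0.3, -1.2, 2.5, 0.9])))  # [2 3]
```

Homozygous parents always transmit the same allele, so only loci where a parent is heterozygous contribute Mendelian sampling variation.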
Phenotypes were calculated as y = μ + g + a + e, where μ is the population mean, and g, a, and e are the vectors of effects corresponding to $\sigma^2_g$, $\sigma^2_a$, and $\sigma^2_e$. Genotypes before F8 and phenotypes for the last generation (F19) and before F7 were set to missing. In addition, 5% of the known dams and 5% of the known sires (after F0) were randomly set to missing. As such, missing pedigree and phenotype information, genomic pre-selection, and base and scale deviations between A and G were accommodated in the simulation. Data simulation was repeated ten times to reduce the possibility of observing results specific to a single dataset.
No fixed effect was simulated, and the data were analysed using the following mixed model equations:
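$$\begin{bmatrix} \mathbf{1}'\mathbf{1} & \mathbf{1}'Z \\ Z'\mathbf{1} & Z'Z + H^{-1}\lambda \end{bmatrix} \begin{bmatrix} \hat{\mu} \\ \hat{u} \end{bmatrix} = \begin{bmatrix} \mathbf{1}'y \\ Z'y \end{bmatrix} \tag{13}$$
where Z is the matrix relating phenotypes to animals, $\mathbf{1}$ and $\hat{u}$ are the vectors of ones and predicted breeding values, $\hat{\mu}$ is the mean estimate, and λ is the ratio of the residual to the additive genetic variance. Matrix G was used in $H^{-1}$ and built according to method 1 of VanRaden [5], where $G = WW'/[2\sum_j p_j(1-p_j)]$, W is the genotype matrix centred by twice the allele frequencies, and $p_j$ is the allele frequency of marker j. Markers with a minor allele frequency below 0.02 were discarded before calculating G. Then, G was blended with $A_{22}$ as $(1-k)G + kA_{22}$.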
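Computing G as described can be sketched as follows. The allele frequencies are assumed to be estimated from the genotyped animals themselves (the text does not state which frequencies were used). `vanraden_G` and `blend` are hypothetical helper names, the identity matrix stands in for a hypothetical $A_{22}$, and k is left as a parameter.

```python
import numpy as np


def vanraden_G(M, maf_min=0.02):
    """VanRaden (2008) method 1 from 0/1/2 genotypes (rows = animals)."""
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2                    # allele frequencies (assumed: from the data)
    keep = np.minimum(p, 1 - p) >= maf_min    # discard markers with MAF below threshold
    M, p = M[:, keep], p[keep]
    W = M - 2 * p                             # centre each marker by 2p
    return W @ W.T / (2 * np.sum(p * (1 - p)))


def blend(G, A22, k):
    """Blended G: (1 - k) G + k A22, with 0 < k < 1."""
    return (1 - k) * G + k * A22


M = np.array([[0, 1, 2, 0],
              [1, 1, 0, 0],
              [2, 1, 1, 0]])                  # last marker is monomorphic and dropped
G = vanraden_G(M)
G_w = blend(G, np.eye(3), k=0.05)             # hypothetical A22 = identity
print(G)
print(G_w)
```

Dropping rare markers before computing the denominator keeps near-monomorphic loci from contributing noise while adding almost nothing to $2\sum_j p_j(1-p_j)$.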
4. Results
The simulated pedigrees had a population size of 2162.8 ± 358.3 (mean ± sd), with 1326.4 ± 298.2 genotyped animals, 1324.6 ± 277.2 phenotyped animals, 1074.7 ± 156.8 males, and 1088.1 ± 202.9 females. Inflation and predictive ability estimates were averaged over the ten simulated pedigrees and are presented in Figure 1 and Figure 2.
Figure 1.
Regression coefficients of the phenotypes on genomic breeding values for different components of $H^{-1}$ weighted by w. Each data point is an average of ten observations, one per simulated population.
Figure 2.
Correlation coefficients between phenotypes and genomic breeding values for different components of $H^{-1}$ weighted by w. Each data point is an average of ten observations, one per simulated population.
Different components were weighted by a weight w ranging from 0.8 to 1.2, except for $G^{-1} - A_{22}^{-1}$, for which w ranged from 0.8 to 1. Weighting $G^{-1}$ and $H^{-1}$ showed similar trends for inflation (Figure 1) and predictive ability (Figure 2), with the trends being slightly less steep for $G^{-1}$ than for $H^{-1}$. Weighting $A^{-1}$ (accompanied by weighting $A_{22}^{-1}$) showed slightly decreasing trends, with the regression slope decreasing by 0.01 (i.e., inflation increasing by 0.01) and the predictive ability decreasing by 4.4 over the range of w. Both the inflation and the predictive ability increased when $G^{-1} - A_{22}^{-1}$ was weighted with w decreasing from 1 to 0.8.
5. Discussion
Matrices G and $A_{22}$ imply different means and variances for genotyped animals. This can cause differently scaled genomic and pedigree information in $H^{-1}$ [3]. Usually, G is blended and tuned (rebased and scaled) with $A_{22}$. If genomic breeding values are still inflated, complementary weighting of the components of $H^{-1}$ might be needed. A common practice is to weight $G^{-1} - A_{22}^{-1}$ using $\tau G^{-1} - \omega A_{22}^{-1}$. It was shown that some combinations of τ and ω are likely to distort the properties of H that provide conditionality between the breeding values of genotyped and non-genotyped animals. Other ways of weighting the components of $H^{-1}$ that are unlikely to distort the conditional properties of H were presented.
Weighting $H^{-1}$ by a scalar w > 1 is equivalent to reducing h2 and increasing inflation due to increased dispersion. It is equivalent to adding $(w-1)(1-h^2)/h^2$ to $1/h^2$, or to weighting the genetic variance by 1/w. Due to selection, h2 can be lower than expected, and the reduction in h2 is expected to be greater under genomic selection. A change in genetic variance caused by genomic selection is propagated from G throughout H. The predictive ability declined with increasing w (Figure 2), which might be concerning. However, predictive ability is a direct function of the slope of the regression line (Figure 1). Therefore, the slope of the regression line (inflation) should be the main concern.
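The equivalence between weighting $H^{-1}$ and rescaling the variance ratio can be checked with a minimal mixed-model solver for an overall mean plus animal effects. The phenotypes, the positive-definite stand-in for $H^{-1}$, h2 = 0.25 (so λ = (1 − h2)/h2 = 3), and w = 1.2 are hypothetical, and `solve_mme` is a hypothetical helper. The check shows only the algebraic identity; it makes no claim about the direction of inflation in any particular dataset.

```python
import numpy as np


def solve_mme(y, Z, H_inv, lam):
    """Solve [1'1 1'Z; Z'1 Z'Z + H_inv*lam] [mu; u] = [1'y; Z'y]."""
    X = np.ones((len(y), 1))
    lhs = np.block([[X.T @ X, X.T @ Z],
                    [Z.T @ X, Z.T @ Z + H_inv * lam]])
    rhs = np.concatenate([X.T @ y, Z.T @ y])
    sol = np.linalg.solve(lhs, rhs)
    return sol[0], sol[1:]


y = np.array([10.0, 12.0, 9.0])               # hypothetical phenotypes
Z = np.array([[1.0, 0, 0, 0],
              [0, 1.0, 0, 0],
              [0, 0, 1.0, 0]])                # animal 4 has no record
H_inv = np.array([[2.0, -1.0, 0.0, 0.0],
                  [-1.0, 2.5, -0.5, 0.0],
                  [0.0, -0.5, 2.0, -1.0],
                  [0.0, 0.0, -1.0, 2.0]])     # hypothetical positive-definite H^-1
h2, w = 0.25, 1.2
lam = (1 - h2) / h2                           # residual / additive variance ratio

mu_w, u_w = solve_mme(y, Z, w * H_inv, lam)   # weight H^-1 by w
mu_l, u_l = solve_mme(y, Z, H_inv, w * lam)   # same as additive variance divided by w

# Implied heritability: 1/h2_new = 1/h2 + (w - 1)(1 - h2)/h2
h2_new = 1 / (1 + w * lam)
print(np.allclose(u_w, u_l), h2_new, 1 / (1 / h2 + (w - 1) * (1 - h2) / h2))
```

Because $H^{-1}$ enters the equations only through the product $H^{-1}\lambda$, any weight on $H^{-1}$ is indistinguishable from a change in the assumed variance ratio.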
Weighting $A^{-1}$ (accompanied by weighting $A_{22}^{-1}$) had little influence on inflation and predictive ability. Predictive ability and the slope of the regression line decreased slightly (inflation increased slightly) as w increased. The likely reason is that H is a genomic relationship matrix extended from G for genotyped animals to non-genotyped animals via the $A_{12}A_{22}^{-1}$ coefficients (Equations (2)–(5)). As such, G is more influential than A in defining the variances in H. This was confirmed by the similar trends for weighting $G^{-1}$ and $H^{-1}$ (Figure 1 and Figure 2). The trends in the regression slope (inflation) and predictive ability were slightly steeper for $H^{-1}$ than for $G^{-1}$, as a result of the combined weighting of $A^{-1}$, $A_{22}^{-1}$, and $G^{-1}$. Weighting $G^{-1} - A_{22}^{-1}$ by w < 1 increased the inflation, but at a lower rate than weighting $H^{-1}$ or $G^{-1}$ with w > 1.
The inflation results are expected to be valid for other data, as weighting $H^{-1}$ or its components is equivalent to inversely weighting the genetic variance, regardless of the data. The exception is weighting $G^{-1} - A_{22}^{-1}$. Whether weighting $G^{-1} - A_{22}^{-1}$ with a larger w results in inflation or deflation depends on whether using $H^{-1}$ instead of $A^{-1}$ results in inflation or deflation. If using $H^{-1}$ results in inflation, then weighting $G^{-1} - A_{22}^{-1}$ with a larger w (more emphasis on $H^{-1}$ than on $A^{-1}$) results in greater inflation. The predictive ability improved when $G^{-1} - A_{22}^{-1}$ was weighted with w decreasing from 1 to 0.8. Generally, predictive ability increases as the slope of the regression line increases, so predictive ability considered without inflation can be misleading. Since the trends for predictive ability and the slope of the regression line were in opposite directions for weighting $G^{-1} - A_{22}^{-1}$, this shows that the predictive ability benefited from blending $H^{-1}$ and $A^{-1}$, mainly because the h2 was more compatible with a blend of $H^{-1}$ and $A^{-1}$ than with $H^{-1}$ alone.
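This study does not completely rule out using $\tau G^{-1} - \omega A_{22}^{-1}$. However, the weighting of the components should meet specific conditions to avoid/minimise violating the conditional properties of H. As such, weighting $H^{-1}$, $G^{-1}$, $A^{-1}$ (together with $A_{22}^{-1}$), or $G^{-1} - A_{22}^{-1}$ (with τ = ω between 0 and 1) are better alternatives to $\tau G^{-1} - \omega A_{22}^{-1}$ with arbitrary τ and ω. By definition, none of these four options is better than the others. However, it is important to achieve good compatibility between the resulting $H^{-1}$ and h2 without blending $H^{-1}$ and $A^{-1}$ at a high rate (i.e., with low emphasis on genomic information).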
Concerning pedigree and genomic errors, regardless of the emphasis given to pedigree and genomic information, genotype errors propagate through non-genotyped animals, and pedigree errors incorrectly and insufficiently propagate genotype information through non-genotyped animals. Therefore, the correctness and the completeness of pedigree and genomic information are vital for accurate and unbiased ssGBLUP evaluations.
Future research may focus on genetic parameters that change over time or across populations in genomic predictions. It is possible to reduce inflation in genomic predictions for young animals by using smaller additive genetic variances. This could be done by rescaling $H^{-1}$ with a diagonal matrix D, without putting an overall weight on $H^{-1}$. Matrix D is a diagonal matrix of positive values descending as a function of the animal's age, and the researcher would need to decide the range of d = diag(D). With recent advances in ssGBLUP (mentioned by Misztal et al. [18]) that improve the compatibility between A and G, conditioning $H^{-1}$ might become an interim solution from the past or be reduced to weighting a single component of $H^{-1}$.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The data, code, and the results supporting the findings of this study are openly available in Mendeley Data at doi:10.17632/cn9jzpj7fg.1 [22].
Conflicts of Interest
M.A.N. is employed at Livestock Improvement Corporation, Hamilton, New Zealand. He declares that the research was conducted in the absence of any commercial or financial interest.
Funding Statement
This work was supported by the NZ Ministry for Primary Industries, SFF Futures Programme: Resilient Dairy-Innovative breeding for a sustainable dairy future (grant number PGP06-17006).
Footnotes
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
1. Misztal I., Legarra A., Aguilar I. Computing procedures for genetic evaluation including phenotypic, full pedigree, and genomic information. J. Dairy Sci. 2009;92:4648–4655. doi: 10.3168/jds.2009-2064.
2. Legarra A., Aguilar I., Misztal I. A relationship matrix including full pedigree and genomic information. J. Dairy Sci. 2009;92:4656–4663. doi: 10.3168/jds.2009-2061.
3. Aguilar I., Misztal I., Johnson D.L., Legarra A., Tsuruta S., Lawlor T.J. Hot topic: A unified approach to utilise phenotypic, full pedigree, and genomic information for genetic evaluation of Holstein final score. J. Dairy Sci. 2010;93:743–752. doi: 10.3168/jds.2009-2730.
4. Christensen O.F., Lund M.S. Genomic prediction when some animals are not genotyped. Genet. Sel. Evol. 2010;42:2. doi: 10.1186/1297-9686-42-2.
5. VanRaden P.M. Efficient methods to compute genomic predictions. J. Dairy Sci. 2008;91:4414–4423. doi: 10.3168/jds.2007-0980.
6. Gao H., Christensen O.F., Madsen P., Nielsen U.S., Zhang Y., Lund M.S., Su G. Comparison on genomic predictions using three GBLUP methods and two single-step blending methods in the Nordic Holstein population. Genet. Sel. Evol. 2012;44:8. doi: 10.1186/1297-9686-44-8.
7. Forni S., Aguilar I., Misztal I. Different genomic relationship matrices for single-step analysis using phenotypic, pedigree and genomic information. Genet. Sel. Evol. 2011;43:1. doi: 10.1186/1297-9686-43-1.
8. Christensen O.F. Compatibility of pedigree-based and marker-based relationship matrices for single-step genetic evaluation. Genet. Sel. Evol. 2012;44:37. doi: 10.1186/1297-9686-44-37.
9. Nilforooshan M.A. Application of single-step GBLUP in New Zealand Romney sheep. Anim. Prod. Sci. 2020;60:1139–1144. doi: 10.1071/AN19315.
10. Vitezica Z.G., Aguilar I., Misztal I., Legarra A. Bias in genomic predictions for populations under selection. Genet. Res. 2011;93:357–366. doi: 10.1017/S001667231100022X.
11. Chen C.Y., Misztal I., Aguilar I., Legarra A., Muir W.M. Effect of different genomic relationship matrices on accuracy and scale. J. Anim. Sci. 2011;89:2673–2679. doi: 10.2527/jas.2010-3555.
12. Martini J.W.R., Schrauf M.F., Garcia-Baccino C.A., Pimentel E.C.G., Munilla S., Rogberg-Muñoz A., Cantet R.J.C., Reimer C., Gao N., Wimmer V., et al. The effect of the H−1 scaling factors τ and ω on the structure of H in the single-step procedure. Genet. Sel. Evol. 2018;50:16. doi: 10.1186/s12711-018-0386-x.
13. Misztal I., Aguilar I., Legarra A., Lawlor T.J. Choice of parameters for single-step genomic evaluation for type. Proceedings of the 61st Annual EAAP Meeting; Heraklion, Greece. 23–27 August 2010; p. 357.
14. Kang H., Ning C., Zhou L., Zhang S., Yan Q., Liu J.-F. Short communication: Single-step genomic evaluation of milk production traits using multiple-trait random regression model in Chinese Holsteins. J. Dairy Sci. 2018;101:11143–11149. doi: 10.3168/jds.2018-15090.
15. Imai A., Kuniga T., Yoshioka T., Nonaka K., Mitani N., Fukamachi H., Hiehata N., Yamamoto M., Hayashi T. Single-step genomic prediction of fruit-quality traits using phenotypic records of non-genotyped relatives in citrus. PLoS ONE. 2019;14:e0221880. doi: 10.1371/journal.pone.0221880.
16. Alvarenga A.B., Veroneze R., Oliveira H.R., Marques D.B.D., Lopes P.S., Silva F.F., Brito L.F. Comparing alternative single-step GBLUP approaches and training population designs for genomic evaluation of crossbred animals. Front. Genet. 2020;11:263. doi: 10.3389/fgene.2020.00263.
17. Fu C., Ostersen T., Christensen O.F., Xiang T. Single-step genomic evaluation with metafounders for feed conversion ratio and average daily gain in Danish Landrace and Yorkshire pigs. Genet. Sel. Evol. 2021;53:79. doi: 10.1186/s12711-021-00670-x.
18. Misztal I., Lourenco D., Tsuruta S., Aguilar I., Masuda Y., Bermann M., Cesarani A., Legarra A. How ssGBLUP became suitable for national dairy cattle evaluations. Proceedings of the 12th World Congress on Genetics Applied to Livestock Production; Rotterdam, The Netherlands. 3–8 July 2022; p. 357. Available online: https://www.wageningenacademic.com/pb-assets/wagen/WCGALP2022/52_009.pdf (accessed on 5 October 2022).
19. Lourenco D.A.L., Legarra A., Tsuruta S., Masuda Y., Aguilar I., Misztal I. Single-step genomic evaluations from theory to practice: Using SNP chips and sequence data in BLUPF90. Genes. 2020;11:790. doi: 10.3390/genes11070790.
20. Legarra A. Comparing estimates of genetic variance across different relationship models. Theor. Pop. Biol. 2016;107:26–60. doi: 10.1016/j.tpb.2015.08.005.
21. Nilforooshan M.A. pedSimulate—An R package for simulating pedigree, genetic merit, phenotype, and genotype data. R. Bras. Zootec. 2022;51:e20210131. doi: 10.37496/rbz5120210131.
22. Nilforooshan M.A. Code & Data—A Note on the Conditioning of the H-1 Matrix Used in Single-Step GBLUP. Mendeley Data V1. 2022. doi: 10.17632/cn9jzpj7fg.1 (accessed on 15 November 2022).