Abstract
For several decades, animal models (AMs) in the form of best linear unbiased prediction (BLUP) have been used for the genetic evaluation of animals. The order of the equation system equals the number of effects in the model, which includes all the animals in the pedigree, and solving these large equation systems has been a challenge. The reduced AM (RAM), introduced in 1980, allowed equations to be set up for parents instead of all animals, which greatly reduced the number of equations to be solved. The RAM is followed by a back-solving step, in which breeding values of progeny are obtained conditional on parental breeding values. Initially, pedigree information was used to model genetic relationships between animals. With the availability of genomic information, genomic BLUP (GBLUP), single-step GBLUP (ssGBLUP), and single-step marker models were developed. Single-step methods use pedigree and genomic information for the simultaneous genetic evaluation of genotyped and nongenotyped animals. Few studies have addressed RAM development for genetic evaluation models that use genomic information. This study extended the concept of RAM from BLUP to the single-step methods. Using example data, three RAMs were described for ssGBLUP. The order of animal equations was reduced from the total number of animals to (1) genotyped animals and nongenotyped parents, (2) genotyped animals and nongenotyped phenotyped animals, and (3) genotyped animals and nongenotyped parents of phenotyped nongenotyped nonparents. Solutions for the remaining animals are obtained in a back-solving step. All the RAMs produced results identical to those of the full ssGBLUP. Advances in computational hardware have alleviated many computational limitations, but data sizes are growing rapidly in the number of animals, traits, phenotypes, and genotypes, and in genotype density. There is an opportunity for a RAM comeback for the single-step methods to reduce the computational demands by reducing the number of equations.
Keywords: animal model, genotyped, parents, phenotyped, reduced model, single-step
Reduced animal models used to reduce computational demands for best linear unbiased prediction (BLUP) are now applicable to single-step genomic BLUP.
Introduction
For several decades, animal genetic evaluations have been performed using the “animal model” (AM) in the form of best linear unbiased prediction (BLUP; Henderson, 1975), taking into account genetic relationships between animals via pedigree information. With the availability of commercial genotyping platforms for livestock species, genomic BLUP (GBLUP; VanRaden, 2008) was developed, which was limited to information on genotyped animals. Following GBLUP, single-step methods were developed to incorporate pedigree and genomic information for the simultaneous genetic evaluation of genotyped and nongenotyped animals. Single-step evaluation comes in two forms: an AM called single-step GBLUP (ssGBLUP; Aguilar et al., 2010, Christensen and Lund, 2010) and a marker model (Fernando et al., 2014, 2016).
Worldwide, many genetic evaluation centers perform BLUP and single-step evaluations [ssGBLUP or single-step marker model (ssMM)] alongside each other, mainly for validation purposes and for qualifying their genomic data. Theoretically, the genetic evaluation of animals unrelated to genotyped animals and their relatives is the same in ssGBLUP as in BLUP. However, minor differences are expected because the BLUP and ssGBLUP equations may converge differently (Vandenplas et al., 2018).
The concept of a reduced model (an equivalent to the full model) was first introduced to animal breeding by Henderson (1974), who absorbed equations for fixed effects into equations for sire effects. Quaas and Pollak (1980) took the reduced model to the next level by absorbing equations for random effects [reduced animal model (RAM)]. RAM allowed equations to be set up for parents in the mixed-model equations (MME). Since the number of parents is less than the number of progeny in most livestock populations, the order of equations to be formed and solved was greatly reduced (Mrode, 2005). This came with the huge benefit of reducing computational demands, given the computational limitations at the time. After solving RAM MME for breeding values of parents, breeding values of nonparents are obtained via back-solving predicted parental breeding values. The back-solving procedure is simpler and computationally less demanding than obtaining breeding values of nonparents via solving the MME for the full AM.
There have been tremendous advances in computational hardware, but on the other hand, data sizes used in genetic evaluations have increased considerably, mainly by larger pedigrees, more traits and phenotypes, and the incorporation of genomic data in genetic evaluations (VanRaden, 2008, Aguilar et al., 2010, Christensen and Lund, 2010, Fernando et al., 2014, 2016). There is an opportunity for RAM to reduce computational demands for modern (single-step) genetic evaluations.
Nilforooshan and Garrick (2021) introduced another RAM, which reduces the number of equations to phenotyped animals for BLUP, genotyped phenotyped animals for GBLUP, and genotyped animals and nongenotyped phenotyped animals for ssGBLUP. A back-solving step follows the RAM to calculate breeding values for animals not included in the MME. The aim of this technical note is to demonstrate the development of RAM for single-step methods, with numerical examples for ssGBLUP.
Materials
The example data, code (written in the R programming language), and results of this study are publicly available (Nilforooshan, 2022).
Ethical statement
No Animal Care and Use Committee approvals were required for this study because example data were used.
Data
The sample pedigree is illustrated in Figure 1, in which animals with odd numbers are females and animals with even numbers are males. Animals 5 and 10 are genotyped. Let us consider the genomic relationship matrix for these animals as
Figure 1. Pedigree structure. Genotyped animals are in dashed circles, and phenotyped animals are in a gray background.
The phenotypes were 5.64, 4.30, 4.32, 5.39, 7.72, and 4.36 for animals 4, 5, 9, 10, 11, and 12, respectively. Genetic and residual variances were considered to be 1 and 2, respectively.
Methods
Pedigree pruning
Animals with no flow of genetic information to or from genotyped animals can be discarded from the analysis. The breeding values of those animals are independent of genomic information and can be obtained via BLUP. Animal 3 in the example pedigree (Figure 1) is a removal candidate. Expressing the inverse of the pedigree-based additive genetic relationship matrix (A−1) as
| (1) |
where genotyped animals are denoted as 2, rows and columns corresponding to A00 can be discarded. A33 contains nongenotyped animals that are a parent, progeny, or mate of a genotyped animal, and A44 contains nongenotyped animals that are a parent, progeny, or mate of an animal in A33. Grouping
into A11, H−1 (the inverse of the augmented pedigree- and genomic-based additive genetic relationship matrix) is reduced from
where is reduced to is reduced to ,
and
This assumes no or negligible change in the solutions for the fixed effects in X1 and X2.
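The pruning step can be sketched as a graph search. The sketch below is a minimal illustration, not the author's code: it reads "no genetic information flow" as "not connected to any genotyped animal through a chain of parent-progeny links" (an assumption; mates are connected through their shared progeny), and the pedigree, the genotyped set, and the function name `connected_to_genotyped` are hypothetical.

```python
from collections import deque


def connected_to_genotyped(pedigree, genotyped):
    """Return the animals linked to any genotyped animal through a chain of
    parent-progeny links (mates are linked through their shared progeny)."""
    # Build an undirected graph with one edge per known parent-progeny pair.
    neighbours = {animal: set() for animal in pedigree}
    for animal, (sire, dam) in pedigree.items():
        for parent in (sire, dam):
            if parent is not None:
                neighbours.setdefault(parent, set()).add(animal)
                neighbours[animal].add(parent)
    # Breadth-first search started from all genotyped animals at once.
    kept = set(genotyped)
    queue = deque(genotyped)
    while queue:
        current = queue.popleft()
        for other in neighbours[current]:
            if other not in kept:
                kept.add(other)
                queue.append(other)
    return kept


# Hypothetical pedigree: animal -> (sire, dam); None marks an unknown parent.
pedigree = {
    1: (None, None), 2: (None, None), 3: (None, None), 4: (None, None),
    5: (2, 1), 6: (2, 1), 7: (4, 3), 8: (6, 5),
}
genotyped = {8}

kept = connected_to_genotyped(pedigree, genotyped)
pruned = set(pedigree) - kept
print(sorted(kept))    # animals retained for the single-step evaluation
print(sorted(pruned))  # animals that could be evaluated by BLUP alone
```

In this hypothetical pedigree, animals 3, 4, and 7 form a family with no link to the genotyped animal and are the removal candidates.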
The ssGBLUP MME is written as
| (2) |
where y is the vector of phenotypes, X is the incidence matrix relating y to fixed effects, Z is the incidence matrix relating y to animals, R is the diagonal matrix of residual variance(s) corresponding to y, is the additive genetic variance, is the vector of fixed effect solutions, and is the vector of animals’ additive genetic merit solutions.
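For orientation, the standard form of the ssGBLUP mixed-model equations that matches these definitions can be sketched as follows. The symbols $\sigma_u^2$ (additive genetic variance), $\hat{\mathbf{b}}$ (fixed effect solutions), and $\hat{\mathbf{u}}$ (additive genetic merit solutions) are assumed notation. The second expression is the usual form of $\mathbf{H}^{-1}$ (Aguilar et al., 2010, Christensen and Lund, 2010), with genotyped animals ordered last, $\mathbf{G}$ the genomic relationship matrix, and $\mathbf{A}_{22}$ the pedigree relationship matrix among genotyped animals.

```latex
\begin{bmatrix}
\mathbf{X}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{X}'\mathbf{R}^{-1}\mathbf{Z} \\
\mathbf{Z}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{Z}'\mathbf{R}^{-1}\mathbf{Z} + \mathbf{H}^{-1}\sigma_u^{-2}
\end{bmatrix}
\begin{bmatrix} \hat{\mathbf{b}} \\ \hat{\mathbf{u}} \end{bmatrix}
=
\begin{bmatrix} \mathbf{X}'\mathbf{R}^{-1}\mathbf{y} \\ \mathbf{Z}'\mathbf{R}^{-1}\mathbf{y} \end{bmatrix},
\qquad
\mathbf{H}^{-1} = \mathbf{A}^{-1} +
\begin{bmatrix} \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{G}^{-1} - \mathbf{A}_{22}^{-1} \end{bmatrix}
```

Only the block for genotyped animals differs between $\mathbf{H}^{-1}$ and $\mathbf{A}^{-1}$, which is why the rows for nongenotyped animals stay as sparse as in pedigree BLUP.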
Method 1
The RAM method of Quaas and Pollak (1980) for BLUP involves applying for parents (p) and for nonparents (n), where is an incidence matrix of zeros and halves identifying the parents of animals (Mrode, 2005). This matrix has rows corresponding to phenotyped n, and columns corresponding to p. Considering
and being a diagonal block of the diagonal matrix D, where , the RAM is written as
| (3) |
Compared with the full AM, W, , and replace
respectively. The difference between and is the absorption of the Mendelian Sampling term into the residual term (i.e., ) for (Mrode, 2005). Whereas, all animals are directly evaluated by AM (i.e., ), parents are directly evaluated by RAM (i.e., ), and solutions for nonparents () are obtained indirectly via back-solving:
| (4) |
where s and d are sire and dam of n. Matrix B can be simplified to
To extend RAM to ssGBLUP, let us split animals into the three groups of genotyped animals (2), nongenotyped parents (m), and others (n). Combining m and 2 in p, Equation (2) can be written as
| (5) |
where
Following Quaas and Pollak (1980), Equation (5) is reduced to
| (6) |
where
and
Both and are derived using the fast algorithm of Faux and Gengler (2013), which works based on pedigree searching for dependencies between selected animals or the method of Colleau (2002). Back-solving is done using Equation (4). The reason for keeping nonparent genotyped animals in RAM is to keep the subsequent back-solving free from the blocks of . Otherwise, RAM would be computationally unjustifiable.
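The pedigree-only RAM that Method 1 builds on can be checked numerically on a toy case. The sketch below is an illustration under stated assumptions, not the author's R code: the pedigree (unrelated founders 1 and 2 with full-sib nonparent progeny 3 and 4), the records, and the helper names `solve` and `solve_mme` are hypothetical, a single overall mean is the only fixed effect, and the variances are those of the example data. It solves the full AM and the RAM, back-solves the nonparents as parent average plus a fraction of the adjusted record, and prints both sets of solutions for comparison.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def solve_mme(design, weights, y, penalty):
    """Solve (C'R^-1 C + penalty) s = C'R^-1 y for diagonal R with inverse `weights`."""
    k = len(design[0])
    lhs = [[sum(w * c[i] * c[j] for c, w in zip(design, weights)) + penalty[i][j]
            for j in range(k)] for i in range(k)]
    rhs = [sum(w * c[i] * yi for c, w, yi in zip(design, weights, y)) for i in range(k)]
    return solve(lhs, rhs)


var_a, var_e = 1.0, 2.0        # genetic and residual variances of the example data
y = [5.0, 4.0, 6.5]            # hypothetical records of animals 2, 3, and 4

# Full AM: unknowns are [mean, u1, u2, u3, u4]; a_inv is the inverse of A.
a_inv = [[2, 1, -1, -1], [1, 2, -1, -1], [-1, -1, 2, 0], [-1, -1, 0, 2]]
penalty_full = [[0.0] * 5] + [[0.0] + [v / var_a for v in row] for row in a_inv]
design_full = [[1, 0, 1, 0, 0], [1, 0, 0, 1, 0], [1, 0, 0, 0, 1]]
full = solve_mme(design_full, [1 / var_e] * 3, y, penalty_full)

# RAM: unknowns are [mean, u1, u2]; progeny records load 0.5 on each parent and
# their residual variance absorbs the Mendelian sampling variance d * var_a.
d = 0.5                        # both parents known, no inbreeding
penalty_ram = [[0.0, 0.0, 0.0], [0.0, 1 / var_a, 0.0], [0.0, 0.0, 1 / var_a]]
design_ram = [[1, 0, 1], [1, 0.5, 0.5], [1, 0.5, 0.5]]
weights_ram = [1 / var_e, 1 / (var_e + d * var_a), 1 / (var_e + d * var_a)]
ram = solve_mme(design_ram, weights_ram, y, penalty_ram)

# Back-solving: parent average plus a fraction k of the adjusted record.
k = d * var_a / (d * var_a + var_e)
mean, u1, u2 = ram
parent_avg = 0.5 * (u1 + u2)
u_nonparents = [parent_avg + k * (yi - mean - parent_avg) for yi in y[1:]]

print([round(v, 4) for v in full])
print([round(v, 4) for v in ram + u_nonparents])
```

The two printed lists should agree, with three equations solved in the RAM instead of five. In the single-step case the same structure applies, except that genotyped animals stay in the equations and the relationship block for the retained animals comes from H rather than A.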
Method 2
This method (Nilforooshan and Garrick, 2021) reduces the AM for ssGBLUP to genotyped animals and nongenotyped phenotyped animals. Splitting animals into genotyped (2), nongenotyped phenotyped (m), and nongenotyped nonphenotyped (n), and combining m and 2 in p, Equation (2) can be written as
| (7) |
where and matrix 0 corresponds to the columns of Z for animals n. The above equation can be reduced to
| (8) |
where
Back-solving for solutions is done by solving
| (9) |
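The absorption behind this reduction can be sketched as a short derivation. The notation is assumed rather than taken from the original equations: superscripts denote blocks of $\mathbf{H}^{-1}$ or $\mathbf{A}^{-1}$, $\mathbf{H}_{pp}$ is the block of $\mathbf{H}$ for the animals kept in the model, and $\sigma_u^2$ is the additive genetic variance. Animals in n have no records, so their rows of the MME contain only relationship terms, and because $\mathbf{H}^{-1}$ and $\mathbf{A}^{-1}$ differ only in the block for genotyped animals, those terms are blocks of $\mathbf{A}^{-1}$.

```latex
% Rows of the full MME for animals n (no records, so their columns of Z are zero):
\mathbf{H}^{np}\hat{\mathbf{u}}_p + \mathbf{H}^{nn}\hat{\mathbf{u}}_n = \mathbf{0},
\qquad \mathbf{H}^{nn} = \mathbf{A}^{nn},\ \mathbf{H}^{np} = \mathbf{A}^{np}
\;\Rightarrow\;
\mathbf{A}^{nn}\hat{\mathbf{u}}_n = -\mathbf{A}^{np}\hat{\mathbf{u}}_p

% Substituting into the rows for animals p:
\mathbf{Z}_p'\mathbf{R}^{-1}\mathbf{X}\hat{\mathbf{b}} +
\left[\mathbf{Z}_p'\mathbf{R}^{-1}\mathbf{Z}_p +
\left(\mathbf{H}^{pp} - \mathbf{H}^{pn}(\mathbf{H}^{nn})^{-1}\mathbf{H}^{np}\right)\sigma_u^{-2}\right]
\hat{\mathbf{u}}_p = \mathbf{Z}_p'\mathbf{R}^{-1}\mathbf{y},
\qquad
\mathbf{H}^{pp} - \mathbf{H}^{pn}(\mathbf{H}^{nn})^{-1}\mathbf{H}^{np} = \mathbf{H}_{pp}^{-1}
```

The back-solving system is sparse because it involves only pedigree-based blocks, which is consistent with keeping the genomic information out of the back-solving step.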
Method 3
This method further reduces the RAM created by Method 1. Consider the RAM presented in Equation (6). Dividing p (genotyped animals and nongenotyped parents) into genotyped animals and nongenotyped parents of phenotyped nongenotyped nonparents (q), and nongenotyped parents not parent to any phenotyped nongenotyped nonparent animal (r), Equation (6) can be written as
| (10) |
where contains columns of W corresponding to q, and Next, the above equation system is reduced to
| (11) |
where As such, the two-step reduction can be made at once with a one-step reduction to q. Back-solving (solutions for nongenotyped nonparents) is done according to Equation (4), and back-solving for solutions is done similar to Equation (9), by solving
Extension to the single-step marker model
Fernando et al. (2014) developed ssMM, a model equivalent to ssGBLUP, which uses imputed marker covariates for nongenotyped animals and a residual genetic effect accommodating deviations between imputed and true genotypes. Including an additive polygenic effect not captured by the markers (residual polygenic effect) in the marker model presented in Equation (3) of Fernando et al. (2016) gives
| (12) |
where
M2 is the observed and centered genotype matrix for genotyped animals, M1 is the imputed genotype matrix for nongenotyped animals derived by solving A11M1 = −A12M2, is the additive genetic variance of marker effects, σδ2 is the residual polygenic variance, is the vector of marker effect solutions, is the vector of predictions for the deviations between the true and estimated (imputed) marker breeding values for nongenotyped animals, and is the vector of animals’ residual polygenic effect solutions.
Method 1
Applying Method 1 to Equation (12), the ssMM is reduced to genotyped animals and nongenotyped parents (p). Thus, A−1 is replaced with A11 with A12 with with , and with . Consequently, matrices made up of these matrices will change as well.
Back-solving for nongenotyped nonparents (n) is done by extending Equation (4) to
| (13) |
If is not phenotyped, is replaced with If the sire of is genotyped,
Method 2
Applying Method 2 to Equation (12), the ssMM is reduced to genotyped animals and nongenotyped phenotyped animals (p). Thus, A−1 is replaced with A11 with A12 with and Z1 with . Consequently, matrices made up of these matrices will change too.
Back-solving for nongenotyped nonphenotyped animals (n) is performed by solving
| (14) |
Method 3
Applying Method 3 to Equation (12), the ssMM is reduced to genotyped animals and nongenotyped parents of phenotyped nongenotyped nonparents (q). Thus, A−1 is replaced with A11 with A12 with with and with Consequently, matrices made up of these matrices will change too.
Back-solving for nongenotyped nonparents (n) is done according to Equation (13), and back-solving for nongenotyped parents, not parent to any phenotyped nongenotyped nonparent animal (r) is done by solving
Results
Full animal model
The ssGBLUP MME for the full AM was formed and solved to check the correctness of the RAM solutions. Where applicable, columns/rows of matrices/vectors corresponding to animals are presented with animal identification indices. Where column and row indices are the same, only column indices are presented. The required matrices to form the MME were
where is the residual variance. Solving Equation (2) returns
RAM: Method 1
Reducing the AM to genotyped animals and nongenotyped parents (p = {1, 2, 3, 4, 5, 6, 7, 10}), nongenotyped nonparents (n = {8, 9, 11, 12}) are excluded from the model. Changing from AM to RAM, replaces W replaces Z, and replaces H−1, where
Note that even though animal 8 is in n, it has no phenotype and therefore has no designated rows and columns in Dnn, , and rows in W. Solving Equation (6) returns
Back-solving involves Equation (4), where
and returns
Notice that for nonphenotyped animal is assigned to
RAM: Method 2
Reducing the AM to genotyped animals and nongenotyped phenotyped animals (p = {4, 5, 9, 10, 11, 12}), nongenotyped nonphenotyped animals (n = {1, 2, 3, 6, 7, 8}) are excluded from the model. Changing from AM to RAM, Zp replaces Z, and replaces H−1, where
Solving Equation (8) returns
Back-solving involves Equation (9), and returns
RAM: Method 3
Reducing the AM to genotyped animals and nongenotyped parents of phenotyped nongenotyped nonparents (q = {4, 5, 6, 7, 10}), nongenotyped nonparents (n = {8, 9, 11, 12}) and nongenotyped parents not parent to any phenotyped nongenotyped nonparent animal (r = {1, 2, 3}) are excluded from the model. Changing from AM to RAM, replaces , replaces Z, and replaces where
Solving Equation (11) returns
Back-solving for solutions involves Equation (4), and returns
Back-solving for solutions involves solving , and returns
Factors affecting the computational efficiency of RAM: A real data example
The computational efficiency of RAM depends on the size and the computational cost of deriving vs. , and the sparsity of the rows of corresponding to nongenotyped animals (the block of corresponding to genotyped animals sums with which is dense). With optimized computing, the computational costs of forming and are marginally greater than those for Z and for the full AM. This is considering that the back-solving procedure should remain as simple and efficient as possible. The most important factors influencing (some with overlapping influence) the computational efficiency of the ssGBLUP RAM are (1) number of genotyped vs. nongenotyped animals, (2) number of nongenotyped parents vs. nongenotyped nonparents, (3) number of nongenotyped phenotyped vs. nongenotyped nonphenotyped animals, (4) pedigree missing rate, and (5) sparsity of the rows of corresponding to nongenotyped animals (influenced by the sparsity of ).
A real-data example (New Zealand’s dairy cattle pedigree, May 2022) was used to study the dimension reduction by RAM. There were 32,679,354 animals in the pedigree. Considering Equation (1), there were 243,034, 405,911, 25,759,086, and 6,335,004 animals in A22, A33, A44, and A00, respectively. After pruning the animals in A00 from the pedigree, 26,344,350 animals remained. There were 13,980,955 nongenotyped parents in the remaining pedigree, of which 9,041,130 were parents of nongenotyped nonparents. The 243,034 genotyped animals and the 9,041,130 nongenotyped parents of nongenotyped nonparents remained for the single-step RAM. Without pedigree pruning, 243,034 genotyped animals and 10,362,206 nongenotyped parents of nongenotyped nonparents remained in the RAM. Since the aim is to keep the back-solving procedure free from genomic information, only the equations corresponding to nongenotyped animals undergo reduction. It was possible to further reduce nongenotyped parents of nongenotyped nonparents to nongenotyped parents of phenotyped nongenotyped nonparents. However, this further reduction was omitted, so that a single is created and used in all the analyses for different traits. Similarly, the ssMM equations (Equation (12)) corresponding to nongenotyped animals are reduced to nongenotyped parents of nongenotyped nonparents.
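The size of the reduction can be made explicit with a few lines of arithmetic on the counts reported above. This is a small sketch; the variable names are illustrative, and the counts are animal equations for a single trait.

```python
# Counts reported in the text for the New Zealand dairy cattle pedigree.
total_animals = 32_679_354
pruned_pedigree = 26_344_350
genotyped = 243_034
parents_kept_pruned = 9_041_130     # nongenotyped parents of nongenotyped nonparents, pruned pedigree
parents_kept_unpruned = 10_362_206  # the same group without pedigree pruning

# Animal equations left in the single-step RAM in each case.
ram_pruned = genotyped + parents_kept_pruned
ram_unpruned = genotyped + parents_kept_unpruned

print(f"RAM animal equations, pruned pedigree:   {ram_pruned:,}")
print(f"RAM animal equations, unpruned pedigree: {ram_unpruned:,}")
print(f"Share of the pruned pedigree kept: {ram_pruned / pruned_pedigree:.1%}")
print(f"Share of the full pedigree kept:   {ram_unpruned / total_animals:.1%}")
```

In both cases, roughly one third of the animal equations remain to be solved directly; the rest are recovered by back-solving.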
Discussion
Breeding values of some nongenotyped animals are not influenced by, and do not influence, the breeding values of genotyped animals. If BLUP is performed in conjunction with single-step evaluations, those animals can be evaluated via BLUP only.
This study showed that RAM used to reduce the dimension of equations for BLUP can be easily extended to single-step genetic evaluations. Numerical results were shown for ssGBLUP, and the three RAMs produced identical solutions to the full ssGBLUP. No results were presented for the ssMM. However, as the ssMM is an equivalent form of ssGBLUP (Fernando et al., 2014, Bermann et al., 2022), indirectly evaluating animals via directly evaluating marker effects, the same principles applied to RAM for ssGBLUP are valid for the reduced ssMM. Considering Equation (12) and Method 3, the number of equations corresponding to reduces from the number of nongenotyped animals to nongenotyped parents of phenotyped nongenotyped nonparents, and the number of equations corresponding to reduces from the total number of animals to genotyped animals and nongenotyped parents of phenotyped nongenotyped nonparents.
Compared with RAM for BLUP, genotyped animals are present in the reduced single-step model. An efficient reduced model should be followed by easy and fast back-solving that introduces no new information other than the conditionality of the back-solved solutions on the RAM solutions. That conditionality should be defined via sparse matrices, such as the diagonal matrix B in Methods 1 and 3, or blocks of A−1 in Methods 2 and 3, instead of the dense blocks of H−1.
The choice among the three RAMs depends on the number of nongenotyped animals remaining together with genotyped animals in RAM. It seems that Method 2 compromises the sparseness of used in and that back-solving is easier in Methods 1 and 3. Therefore, unless the number of nongenotyped phenotyped animals is considerably less than the number of nongenotyped parents or nongenotyped parents of phenotyped nongenotyped nonparents, Methods 1 and 3 are preferred over Method 2.
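The group sizes that drive this choice follow from simple set operations on the pedigree. The sketch below is illustrative only: the function name `ram_groups`, the pedigree, and the genotyped and phenotyped sets are hypothetical and are not the example data of this study.

```python
def ram_groups(pedigree, genotyped, phenotyped):
    """Animals kept in the equations under each of the three reductions."""
    animals = set(pedigree)
    parents = {p for pair in pedigree.values() for p in pair if p is not None}
    nongenotyped = animals - genotyped
    nongeno_parents = nongenotyped & parents
    # Nongenotyped nonparents that have a phenotype.
    target = (nongenotyped - parents) & phenotyped
    parents_of_target = {p for a in target for p in pedigree[a] if p is not None}
    return {
        "Method 1": genotyped | nongeno_parents,
        "Method 2": genotyped | (nongenotyped & phenotyped),
        "Method 3": genotyped | (parents_of_target - genotyped),
    }


# Hypothetical pedigree: animal -> (sire, dam); None marks an unknown parent.
pedigree = {
    1: (None, None), 2: (None, None), 3: (2, 1), 4: (2, 1),
    5: (4, 3), 6: (4, 3), 7: (6, 5), 8: (6, 5),
}
groups = ram_groups(pedigree, genotyped={7}, phenotyped={3, 5, 7, 8})
for method, kept in groups.items():
    print(method, sorted(kept), f"{len(kept)} of {len(pedigree)} animal equations")
```

Here Method 1 keeps seven animals, Method 2 keeps four, and Method 3 keeps three; in a real population the ranking depends on how many nongenotyped animals are parents and how many are phenotyped.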
Acknowledgments
This work received financial support from the NZ Ministry for Primary Industries, SFF Futures Programme: Resilient Dairy–Innovative breeding for a sustainable dairy future (grant number: PGP06-17006).
Glossary
Abbreviations
- AM: animal model
- BLUP: best linear unbiased prediction
- GBLUP: genomic BLUP
- MME: mixed-model equations
- RAM: reduced AM
- ssGBLUP: single-step GBLUP
- ssMM: single-step marker model
Conflict of Interest Statement
MAN is employed at Livestock Improvement Corporation, Hamilton, New Zealand. He declares that the research was conducted in the absence of any commercial or financial interests.
References
- Aguilar, I., I. Misztal, D. L. Johnson, A. Legarra, S. Tsuruta, and T. J. Lawlor. 2010. Hot topic: A unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of Holstein final score. J. Dairy Sci. 93:743–752. doi: 10.3168/jds.2009-2730
- Bermann, M., D. Lourenco, N. S. Forneris, A. Legarra, and I. Misztal. 2022. On the equivalence between marker effect models and breeding value models and direct genomic values with the Algorithm for Proven and Young. Genet. Sel. Evol. 54:52. doi: 10.1186/s12711-022-00741-7
- Christensen, O. F., and M. S. Lund. 2010. Genomic prediction when some animals are not genotyped. Genet. Sel. Evol. 42:2. doi: 10.1186/1297-9686-42-2
- Colleau, J. J. 2002. An indirect approach to the extensive calculation of relationship coefficients. Genet. Sel. Evol. 34:409–421. doi: 10.1186/1297-9686-34-4-409
- Faux, P., and N. Gengler. 2013. Inversion of a part of the numerator relationship matrix using pedigree information. Genet. Sel. Evol. 45:45. doi: 10.1186/1297-9686-45-45
- Fernando, R. L., J. C. Dekkers, and D. J. Garrick. 2014. A class of Bayesian methods to combine large numbers of genotyped and non-genotyped animals for whole-genome analyses. Genet. Sel. Evol. 46:50. doi: 10.1186/1297-9686-46-50
- Fernando, R. L., H. Cheng, B. L. Golden, and D. J. Garrick. 2016. Computational strategies for alternative single-step Bayesian regression models with large numbers of genotyped and non-genotyped animals. Genet. Sel. Evol. 48:96. doi: 10.1186/s12711-016-0273-2
- Henderson, C. R. 1974. General flexibility of linear model techniques for sire evaluation. J. Dairy Sci. 57:963–972. doi: 10.3168/jds.S0022-0302(74)84993-3
- Henderson, C. R. 1975. Best linear unbiased estimation and prediction under a selection model. Biometrics 31:423–447. doi: 10.2307/2529430
- Mrode, R. A. 2005. Univariate models with one random effect. In: Linear models for the prediction of animal breeding values. 2nd ed. CABI Publishing, Oxfordshire, UK. p. 39–64. doi: 10.1079/9780851990002.0000
- Nilforooshan, M. A. 2022. Reduced animal model for ssGBLUP. figshare (May 28, 2022). doi: 10.6084/m9.figshare.16455681.v2
- Nilforooshan, M. A., and D. Garrick. 2021. Reduced animal models fitting only equations for phenotyped animals. Front. Genet. 12:637626. doi: 10.3389/fgene.2021.637626
- Quaas, R. L., and E. J. Pollak. 1980. Mixed model methodology for farm and ranch beef cattle testing programs. J. Anim. Sci. 51:1277–1287. doi: 10.2527/jas1981.5161277x
- Vandenplas, J., H. Eding, M. P. L. Calus, and C. Vuik. 2018. Deflated preconditioned conjugate gradient method for solving single-step BLUP models efficiently. Genet. Sel. Evol. 50:51. doi: 10.1186/s12711-018-0429-3
- VanRaden, P. M. 2008. Efficient methods to compute genomic predictions. J. Dairy Sci. 91:4414–4423. doi: 10.3168/jds.2007-0980

