Optimum and Decorrelated Constrained Multistage Linear Phenotypic Selection Indices Theory

J Jesus Cerón-Rojas; Fernando H Toledo; Jose Crossa

doi:10.2135/cropsci2019.04.0241

letter

Crop Sci. 2019 Oct 31;59:2585–2600. doi: 10.2135/cropsci2019.04.0241


PMCID: PMC7680945 PMID: 33343016

Abstract

Some authors have evaluated the unconstrained optimum and decorrelated multistage linear phenotypic selection indices (OMLPSI and DMLPSI, respectively) theory. We extended this index theory to the constrained multistage linear phenotypic selection index context, where we denoted OMLPSI and DMLPSI as OCMLPSI and DCMLPSI, respectively. The OCMLPSI (DCMLPSI) is the most general multistage index and includes the OMLPSI (DMLPSI) as a particular case. The OCMLPSI (DCMLPSI) predicts the individual net genetic merit at different individual ages and allows imposing constraints on the genetic gains to make some traits change their mean values based on a predetermined level, while the rest of them remain without restrictions. The OCMLPSI takes into consideration the index correlation values among stages, whereas the DCMLPSI imposes the restriction that the index correlation values among stages be null. The criteria to evaluate OCMLPSI efficiency vs. DCMLPSI efficiency were that the total response of each index must be lower than or equal to the single-stage constrained linear phenotypic selection index response and that the expected genetic gain per trait values should be similar to the constraints imposed by the breeder. We used one real and one simulated dataset to validate the efficiency of the indices. The results indicated that OCMLPSI accuracy when predicting the selection response and expected genetic gain per trait was higher than DCMLPSI accuracy when predicting them. Thus, breeders should use the OCMLPSI when making a phenotypic selection.

In a two-stage context, Ceron-Rojas et al. (2019) described and evaluated the unconstrained optimum and decorrelated multistage linear phenotypic selection indices (OMLPSI and DMLPSI, respectively) theory and concluded that OMLPSI efficiency when predicting the net genetic merit was higher than the DMLPSI efficiency and that breeders should use the OMLPSI when making phenotypic selection. The main difference between the two indices is that although the OMLPSI takes into consideration the correlation values among stages when predicting the net genetic merit, the DMLPSI imposes the restriction that the correlation values among stages be null when it makes the prediction. The main characteristic of the OMLPSI (DMLPSI) in a two-stage context is that at Stage 1, OMLPSI (DMLPSI) is a partial index, but at Stage 2, it is a complete index. This selection procedure is called the part and whole index selection method (Young, 1964; Saxton, 1983) and is valid for any number of stages. The OMLPSI (DMLPSI) is more efficient than the independent culling method because it uses all available information at each stage and incorporates the genetic correlations between traits in the prediction.

The OMLPSI (DMLPSI) combines the single-stage linear phenotypic selection index (LPSI) theory (Smith, 1936; Hazel, 1943) with the independent culling selection method (Cochran, 1951; Young, 1964; Cunningham, 1975; Xu and Muir, 1992) and is useful for selecting more than one trait in the multistage selection context. Breeders apply the OMLPSI (DMLPSI) mainly in animal and tree breeding where, due to early culling, OMLPSI (DMLPSI) is a cost-saving strategy for improving several traits because they do not need to measure all traits at each stage. The OMLPSI (DMLPSI) increases selection intensity on traits measured at an earlier age, and, with fixed facilities, this index selects a greater number of individuals at an earlier age (Xu and Muir, 1991, 1992).

The OMLPSI values may have a non-normal distribution after the first selection stage, and to derive selection intensities for more than two stages, this index requires numeric multiple integration techniques. To solve this problem, the DMLPSI minimizes the mean squared difference between the index and the net genetic merit at each stage under the restriction that the covariance between the DMLPSI values at different stages be zero, thus preventing the correlation between DMLPSI values at different stages. Under this restriction, truncation points and selection intensities can be determined for a fixed total proportion before the breeder carries out selection, and the selected individual index values after the first selection stage may be normally distributed (Xu and Muir, 1992). Nevertheless, due to the indicated restriction, the DMLPSI selection response and accuracy after the first stage could be lower than the OMLPSI selection response.

One additional problem with the OMLPSI (DMLPSI) expected genetic gain per trait (or multitrait selection response) is that its values can increase or decrease in a positive or negative direction without control. In the single-stage context, Kempthorne and Nordskog (1959) developed the restricted LPSI that allows imposing restrictions equal to zero on the expected genetic gain of some traits. Other authors (Mallard, 1972; Harville, 1975; Tallis, 1985) extended the Kempthorne and Nordskog (1959) approach and developed a single-stage constrained LPSI (SCLPSI) that attempts to make some traits change their expected genetic gain values based on a predetermined level while the rest of the traits remain without restrictions. Itoh and Yamada (1987) showed that in reality there is only one optimum SCLPSI; that is, the Mallard (1972), Harville (1975), and Tallis (1985) indices are the same. Xie and Xu (1997) and Ceron-Rojas and Crossa (2018, Chapter 9) extended the DMLPSI and OMLPSI to the constrained context, respectively. The Xie and Xu (1997) index, however, is not an optimum constrained multistage index because their approach is based on the single-stage Tallis (1962) constrained index theory, which is not an optimum index (see Ceron-Rojas and Crossa, 2018, Chapter 3, for details).

Based on the Mallard (1972) constrained phenotypic single-stage index theory, which is an optimum single-stage constrained index (see Ceron-Rojas and Crossa, 2018, Chapter 3, for details), in this work, we extend the OMLPSI and DMLPSI to the constrained multistage selection context. We denote the constrained OMLPSI and DMLPSI as OCMLPSI (optimum constrained multistage LPSI) and DCMLPSI (decorrelated constrained multistage LPSI), respectively. The main difference between the OCMLPSI and the DCMLPSI is that the OCMLPSI imposes only one restriction when solving the OMLPSI equations to obtain its vector of coefficients, whereas the DCMLPSI imposes two restrictions. The OCMLPSI solves the OMLPSI equations subject to the restriction that the covariance between the OCMLPSI and some linear combinations of the genotypes involved be equal to a vector of predetermined proportional gains (or constraints) imposed by the breeder, whereas the DCMLPSI imposes the additional restriction that the covariance between DCMLPSI values at different stages be zero. This additional restriction negatively affects the DCMLPSI selection response and expected genetic gain values per trait after the first stage.

One of the purposes of conducting a multistage selection is to reduce the cost and still obtain a reasonable gain. This means that the OCMLPSI and DCMLPSI could also be optimized with respect to the aggregated economic gain and the cost associated with obtaining measurements on each trait, but in this work, that problem was not considered. In a two-stage context, Namkoong (1970) has detailed how this last problem could be solved for the OMLPSI, whereas Xu and Muir (1992) have described that problem in the DMLPSI context.

We compared the relative efficiency of OCMLPSI and DCMLPSI under the assumption that the net genetic merit and the OCMLPSI and DCMLPSI values have joint multivariate normal distribution. We corroborated the normality assumption at Stage 2 using graphical methods and normality tests (Shapiro and Wilk, 1965; Mardia, 1980). Under this assumption, the regression of the net genetic merit on any linear function of the phenotypic values is linear (Kempthorne and Nordskog, 1959) and the selection response and expected genetic gain per trait results for two or more stages can be summarized arithmetically (Cochran, 1951; Young, 1964). We used two criteria to compare the efficiency of both indices. The first criterion was that the total selection response of each index must be lower than or equal to the SCLPSI selection response (Young, 1964; Saxton, 1983; Ceron-Rojas et al., 2019). The second criterion was that the expected genetic gain per trait values should be similar to the predetermined gains or constraints imposed by the breeder. We used one real and one simulated dataset, each with four traits, to validate OCMLPSI efficiency vs. DCMLPSI efficiency. The results of both datasets indicated that the OCMLPSI is the most efficient index for predicting the net genetic merit, and its accuracy when predicting the selection response and estimating the expected genetic gain per trait was higher than the DCMLPSI accuracy when predicting the selection response and estimating the expected genetic gain per trait. Thus, breeders should use OCMLPSI when making a constrained phenotypic selection.

This study is the first to compare (with real and simulated data) the relative efficiencies of the OCMLPSI vs. DCMLPSI using the total selection response and expected genetic gain per trait as the main criteria.

MATERIALS AND METHODS

Methods

Objectives of the Constrained Multistage Linear Phenotypic Selection Indices

Let m_j be the population mean of the jth trait before selection. One of the main OCMLPSI (DCMLPSI) objectives is to change m_j to m_j + d_j, where d_j is the jth (j = 1, 2, …, r; r = the number of constrained traits) constrained trait or the jth predetermined proportional gain imposed by the breeder on the OCMLPSI (DCMLPSI) expected genetic gain per trait (Mallard, 1972; Cerón-Rojas and Crossa, 2018, Chapter 3). Additional OCMLPSI (DCMLPSI) objectives are (i) to maximize the selection response; (ii) to predict the net genetic merit (H = w′g, where w′ = [w₁ w₂ … w_n] and g′ = [g₁ g₂ … g_n] are 1 × n vectors of economic weights and true unobservable breeding values, respectively); and (iii) to select individuals with the highest H values as parents of the next generation.

The Part and Whole Phenotypic Index Selection Method

Let y′ = [y₁ y₂ … y_n] be a 1 × n vector of scores for n traits and assume that we can select only n_i of them at Stage i (i = 1, 2, …, N; N = number of stages) such that after N stages, n = n₁ + n₂ + … + n_N, where n_i < N < n. We can partition y into N subvectors as y′ = [x′₁ x′₂ … x′_N], where x′_i = [y_i1 y_i2 … y_{in_i}] is the subvector of y at Stage i (i = 1, 2, …, N). This means that at this stage, the ith index is I_i = β_i1y_i1 + β_i2y_i2 + … + β_{in_i}y_{in_i} = β′_ix_i, where β′_i = [β_i1 β_i2 … β_{in_i}] is the index vector of coefficients and x_i was defined earlier. Let

B_{0}^{'} = [\begin{matrix} β_{1}^{'} & 0 & \dots & 0 \\ β_{1}^{'} & β_{2}^{'} & \dots & 0 \\ ⋮ & ⋮ & ⋱ & ⋮ \\ β_{1}^{'} & β_{2}^{'} & \dots & β_{N}^{'} \end{matrix}]

be a transforming matrix; then, for each stage, we can construct an index as

[\begin{matrix} I_{1} \\ I_{2} \\ ⋮ \\ I_{N} \end{matrix}] = [\begin{matrix} β_{1}^{'} & 0 & \dots & 0 \\ β_{1}^{'} & β_{2}^{'} & \dots & 0 \\ ⋮ & ⋮ & ⋱ & ⋮ \\ β_{1}^{'} & β_{2}^{'} & \dots & β_{N}^{'} \end{matrix}] [\begin{matrix} x_{1} \\ x_{2} \\ ⋮ \\ x_{N} \end{matrix}]

(Xu and Muir, 1992; Cerón-Rojas et al., 2019). This last result indicates that until Stage N - 1, each index is partial, but at Stage N, I_N = β′₁x₁ + β′₂x₂ + … +β′_Nx_N is a whole index.
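As a minimal sketch of the part and whole construction, the stage indices can be computed as running sums of the stage terms β′_jx_j. The function name and the coefficient and score values below are hypothetical and serve only to illustrate the structure of the transforming matrix.

```python
import numpy as np

def part_and_whole_indices(betas, xs):
    """Return [I_1, ..., I_N], where I_i = beta_1'x_1 + ... + beta_i'x_i."""
    # One term per stage: the contribution of the traits first measured there.
    stage_terms = [float(np.dot(b, x)) for b, x in zip(betas, xs)]
    # Each index adds the new stage term to everything measured before.
    return list(np.cumsum(stage_terms))

# Hypothetical two-stage example: three traits at Stage 1, one at Stage 2.
betas = [np.array([0.5, -0.2, 0.1]), np.array([0.3])]
xs = [np.array([10.0, 5.0, 2.0]), np.array([4.0])]
indices = part_and_whole_indices(betas, xs)
```

Here `indices[0]` is the partial index at Stage 1 and `indices[-1]` is the whole index at the final stage.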

Young (1964) called the foregoing procedure the part and whole index selection method. Xu and Muir (1992) called that selection procedure selection index updating because as traits become available, each subsequent index contains all traits available up to that stage. This method is more efficient than the independent culling selection method because it uses the genetic correlation among traits and all available information at each stage to predict the net genetic merit (Saxton, 1983). In addition, the independent culling selection method cannot impose constraints on the expected genetic gain of each trait, as the constrained index does.

Genotypic and Phenotypic Covariance Matrices

Let g′ = [g₁ g₂ … g_n], x′_i = [y_i1 y_i2 … y_{in_i}], and y′ = [y₁ y₂ … y_n] be vectors, as defined in the above subsections. Thus, the genotypic covariance matrix of vectors x_i and g for N stages is

[\begin{matrix} Cov (x_{1}, g) \\ Cov (x_{2}, g) \\ ⋮ \\ Cov (x_{N}, g) \end{matrix}] = [\begin{matrix} G_{1} \\ G_{2} \\ ⋮ \\ G_{N} \end{matrix}] = G

whereas the phenotypic covariance matrix of vector y is Var(y) = {P_ij} = P, where Cov(x_i, g) = G_i is the ith submatrix of G, and P_ij = Cov(x_i, x_j) is the ijth (i, j = 1, 2, …, N) submatrix of P.

To obtain the OCMLPSI (DCMLPSI) parameters, we need the following matrices:

Q_{i i} = [\begin{matrix} P_{11} & P_{12} & \dots & P_{1 i} \\ P_{21} & P_{22} & \dots & P_{2 i} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ P_{i 1} & P_{i 2} & \dots & P_{i i} \end{matrix}], A_{i} = [\begin{matrix} G_{1} \\ G_{2} \\ ⋮ \\ G_{i} \end{matrix}]

[1a]

which are submatrices of P and G, respectively. In Appendix A (Eq. [A1] to [A3]), we describe a method to estimate P and G.

Now suppose that the number of traits selected up to Stage i − 1 is n_{i−1} and that at Stage i we select n_i traits, such that n_i ≤ n_{i−1} (or n_{i−1} < n_i). Then, according to the part and whole index selection method, at Stage i, we shall have n_{i−1} + n_i traits. This means that the phenotypic covariance matrix [Q_{(i−1)i}] obtained with the n_{i−1} traits selected at Stage i − 1 and the total n_{i−1} + n_i traits will be of size n_{i−1} × (n_{i−1} + n_i) and can be written as

Q_{(i - 1) i} = \{s_{j c}\}

[1b]

where s_jc is the jcth phenotypic covariance value for j = 1, 2, …, n_{i−1} and c = 1, 2, …, (n_{i−1} + n_i). In addition, n_{i−1} and (n_{i−1} + n_i) are the numbers of rows and columns of matrix Q_{(i−1)i}, respectively. Equation [1b] indicates that Q_{(i−1)i} is a nonsquare and nonsymmetric matrix. Matrix Q_{(i−1)i} is useful for imposing the restrictions that make the DCMLPSI values independent among stages (see Cerón-Rojas et al., 2019, for details).

Selection Response at Stage i

At Stage i, the selection response (R_i) is the ith net genetic merit (H = w′g) mean of the selected population and can be written as

R_{i} = k_{i} σ_{H} ρ_{H I_{i}}

[2]

where k_i is the selection intensity (Xu and Muir, 1992; Cerón-Rojas et al., 2019), $σ_{H} = \sqrt{w' Cw}$ is the standard deviation of H = w′g, Var(g) = C is the covariance matrix of g, and ρ_{HI_i} is the correlation between H = w′g and the index at Stage i (I_i = β′_i x_i). For N stages, the total selection response is R_t = R₁ + R₂ + … + R_N (Cochran, 1951; Young, 1964). Equation [2] indicates that the genetic gain that can be achieved in R_i by selecting for several traits simultaneously within a population of animals or plants is the product of k_i, σ_H, and ρ_{HI_i} (Kempthorne and Nordskog, 1959). Selection intensity is limited by the rate of reproduction of each species, whereas σ_H is beyond human control; hence, the greatest opportunity for increasing selection progress is by ensuring that ρ_{HI_i} is as large as possible (Hazel, 1943). Equation [2] is a useful criterion for comparing the efficiency of different types of indices to predict the net genetic merit (H = w′g; e.g., OCMLPSI efficiency vs. DCMLPSI efficiency). We would expect that the greater Eq. [2] is, the more effective OCMLPSI (DCMLPSI) is at predicting H = w′g. In the multistage selection index context, however, one main restriction is that the whole OCMLPSI (DCMLPSI) selection response be lower than or equal to the SCLPSI response (Saxton, 1983; Cerón-Rojas et al., 2019).

Expected Genetic Gain per Trait at Stage i

The expected genetic gain per trait at Stage i (E_i, or multitrait selection response) is the covariance between the true breeding value vector (g) and the I_i = β′_ix_i value weighted by its standard deviation ($σ_{I_{i}} = \sqrt{β'_{i} Q_{ii} β_{i}}$) and multiplied by the selection intensity (k_i), so that

E_{i} = k_{i} \frac{A_{i}^{'} β_{i}}{σ_{I_{i}}}

[3]

We defined all the parameters of Eq. [3] previously. In the univariate and single-stage breeding scheme, Eq. [3] is the same as the selection response. For N stages, the total expected genetic gain per trait is E_t = E₁ + E₂ + … + E_N (Cochran, 1951; Young, 1964).

In the OCMLPSI context, we will minimize the mean squared difference between the net genetic merit H = w′g and the index I_i = β′_ix_i {i.e., E[(H − I_i)²]} with respect to the vector of coefficients β_i (i = 1, 2, …, N) under the assumption that Eq. [3] values are equal to the d_j (j = 1, 2, …, r; r = number of constraints) values imposed by the breeder. The resulting vector of coefficients (β_i) should maximize Eq. [2] and make the Eq. [3] values be near the d_j values. In the DCMLPSI context, however, it is necessary to impose the additional restriction that the DCMLPSI values among stages are independent, as we shall see in the next two subsections.

The OCMLPSI Vector of Coefficients at Stage i

Let d′ = [d₁ d₂ … d_r] be a 1 × r vector of constraints or predetermined proportional gains per trait imposed by the breeder, and

D^{'} = [\begin{matrix} d_{r} & 0 & \dots & 0 & - d_{1} \\ 0 & d_{r} & \dots & 0 & - d_{2} \\ ⋮ & ⋮ & ⋱ & ⋮ & ⋮ \\ 0 & 0 & \dots & d_{r} & - d_{r - 1} \end{matrix}]

be a Mallard (1972) matrix of size (r − 1) × r, where d_j (j = 1, 2, …, r) is the jth element of vector d′. In addition, let U′ be a Kempthorne and Nordskog (1959) matrix of size (n − r) × n (n = number of traits and r = number of constraints) of 1s and 0s, where 1 indicates that the trait is constrained and 0 indicates that the trait has no constraints (see Cerón-Rojas and Crossa, 2018, Chapter 3, for details). According to the single-stage Mallard (1972) constrained index theory, to obtain the OCMLPSI vector of coefficients at Stage i, we need to minimize the mean squared difference between the net genetic merit H = w′g and the index I_i = β′_ix_i {i.e., E[(H − I_i)²]} under the restrictions M′_iβ_i = 0, where M′_i = D′U′A′_i and A_i was defined in Eq. [1a].
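The Mallard matrix can be built directly from the vector of predetermined gains. The sketch below follows the layout of D′ given above (d_r on the diagonal and −d_j in the last column); the function name is hypothetical. For the two constraint vectors used with the real and simulated datasets, d′ = [3 −1] and d′ = [5 −2], it gives the D′ matrices reported in the Materials section.

```python
import numpy as np

def mallard_matrix(d):
    """Build D' of size (r - 1) x r from the vector of predetermined gains d."""
    d = np.asarray(d, dtype=float)
    r = d.size
    D_t = np.zeros((r - 1, r))
    for j in range(r - 1):
        D_t[j, j] = d[-1]     # d_r on the diagonal
        D_t[j, -1] = -d[j]    # -d_j in the last column
    return D_t

D_real = mallard_matrix([3, -1])  # constraints used with the real dataset
D_sim = mallard_matrix([5, -2])   # constraints used with the simulated datasets
```

By construction D′d = 0, so any vector of gains proportional to d satisfies the restriction.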

Suppose that matrices Q_ii, U, and A_i and vectors d and w are known at Stage i; then, it is necessary to minimize the function

f_{O} (β_{i}, u) = β_{i}^{'} Q_{i i} β_{i} - 2 w^{'} A_{i}^{'} β_{i} + 2 u^{'} M_{i}^{'} β_{i}

[4]

with respect to the vector of coefficients β_i and the vector of Lagrange multipliers u′ = [u₁ u₂ … u_{r−1}]. The OCMLPSI vector of coefficients at Stage i is

β_{i} = K_{O i} δ_{i}

[5]

where δ_i = Q_ii^-1A_iw, Q_ii^-1 is the inverse of matrix Q_ii, and A_i and w were defined earlier. In addition, K_Oi = [I_i − F_Oi], F_Oi = Q_ii^-1M_i(M′_iQ_ii^-1M_i)^-1M′_i, and I_i is an identity matrix of the same size as Q_ii. When D = U, the vector of coefficients of Eq. [5] imposes null restrictions, and when D = U and U is a null matrix, Eq. [5] is equal to δ_i = Q_ii^-1A_iw, the vector of coefficients of the OMLPSI (Cerón-Rojas et al., 2019). Thus, the OCMLPSI is more general and includes the multistage null phenotypic restricted index (Kempthorne and Nordskog, 1959; Xie and Xu, 1997) and the OMLPSI as particular cases (Cerón-Rojas and Crossa, 2018, Chapter 9).
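To make Eq. [5] concrete, the following sketch computes a Stage 1 OCMLPSI coefficient vector from the real-dataset matrices listed in the Materials section. It assumes the projection form F_Oi = Q_ii^-1M_i(M′_iQ_ii^-1M_i)^-1M′_i with M′_i = D′U′A′_i; the function and variable names are illustrative and are not taken from the authors' R code.

```python
import numpy as np

# Stage 1 matrices from the real dataset (traits RL, SM, EW).
Q11 = np.array([[240.57, -95.62, 2.06],
                [-95.62, 167.20, 4.58],
                [2.06, 4.58, 22.80]])
A1 = np.array([[29.86, -17.90, -4.13, -1.75],
               [-17.90, 18.56, 1.49, -4.88],
               [-4.13, 1.49, 9.24, 16.66]])
w = np.array([19.54, -3.56, 17.01, -2.51])
U_t = np.array([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]])  # U'
D_t = np.array([[-1.0, -3.0]])                                # D' for d' = [3 -1]

def constrained_coefficients(Q, A, w, M):
    """beta = [I - Q^-1 M (M'Q^-1 M)^-1 M'] Q^-1 A w, so that M'beta = 0."""
    Q_inv = np.linalg.inv(Q)
    delta = Q_inv @ A @ w                       # unconstrained (OMLPSI) vector
    F = Q_inv @ M @ np.linalg.inv(M.T @ Q_inv @ M) @ M.T
    return (np.eye(Q.shape[0]) - F) @ delta

M1 = (D_t @ U_t @ A1.T).T        # M_1, so that M_1' = D'U'A_1'
beta1 = constrained_coefficients(Q11, A1, w, M1)
gains = U_t @ A1.T @ beta1       # unscaled gains of the two constrained traits
```

The restriction M′₁β₁ = 0 forces the two constrained gains in `gains` to keep the 3 : −1 ratio, up to a common scale factor.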

The DCMLPSI Vector of Coefficients at Stage i

Let I_D(i−1) = b′_{i−1}x_{i−1} and I_Di = b′_ix_i be the DCMLPSIs at Stages i − 1 and i, respectively. We shall obtain the DCMLPSI vector of coefficients at Stage i with the additional restriction that the covariance between the DCMLPSI values until Stage i − 1 and the I_Di = b′_ix_i values be null. Let J′_{i−1} = [I_D1 I_D2 … I_D(i−1)] be a vector of DCMLPSI values until Stage i − 1 such that the covariance between I_Di and J_{i−1} will be null. Xu and Muir (1992) and Xie and Xu (1997) showed that the covariance between I_Di and J_{i−1} is null when B′_{D(i−1)}Q_{(i−1)i}b_i = 0, where

B_{D (i - 1)}^{'} = [\begin{matrix} b_{1}^{'} & 0 & \dots & 0 \\ b_{1}^{'} & b_{2}^{'} & \dots & 0 \\ ⋮ & ⋮ & ⋱ & ⋮ \\ b_{1}^{'} & b_{2}^{'} & \dots & b_{i - 1}^{'} \end{matrix}],

b′_{i−1} = [b_{(i−1)1} b_{(i−1)2} … b_{(i−1)n_{i−1}}] is the DCMLPSI vector of coefficients at Stage i − 1, Q_{(i−1)i} was defined in Eq. [1b], and b_i is the DCMLPSI vector of coefficients at Stage i. Thus, to obtain b_i, we need to minimize the mean squared difference between H = w′g and I_Di = b′_ix_i {i.e., E[(H − I_Di)²]} under the joint restrictions M′_ib_i = 0 and Cov(I_Di, J_{i−1}) = B′_{D(i−1)}Q_{(i−1)i}b_i = 0.

Let S_{i(i−1)} = Q_{i(i−1)}B_{D(i−1)} be the transpose of matrix S_{(i−1)i} = B′_{D(i−1)}Q_{(i−1)i}, and assume that matrices Q_ii, Q_{i(i−1)}, U, and A_i and vectors d and w are known. To minimize E[(H − I_Di)²] under the restrictions M′_ib_i = 0 and S_{(i−1)i}b_i = B′_{D(i−1)}Q_{(i−1)i}b_i = 0, it is necessary to minimize the function

f_{D} (b_{i}, u, v) = b_{i}^{'} Q_{i i} b_{i} - 2 w^{'} A_{i}^{'} b_{i} + 2 u^{'} M_{i}^{'} b_{i} + 2 v^{'} S_{(i - 1) i} b_{i}

[6]

with respect to the vector of coefficients b_i and the vectors of Lagrange multipliers u′ = [u₁ u₂ … u_{r−1}] and v′ = [v₁ v₂ … v_{i−1}]. The only difference between Eq. [4] and Eq. [6] is the term 2v′S_{(i−1)i}b_i. The DCMLPSI vector of coefficients at Stage i is

b_{i} = K_{D i} δ_{i}

[7]

where δ_i = Q_ii^-1A_iw, and Q_ii^-1, A_i, and w were defined in Eq. [5]. In addition, K_Di = [I_i − F_Di], F_Di = Q_ii^-1V_i(V′_iQ_ii^-1V_i)^-1V′_i, V_i = [M_i S_{i(i−1)}], V′_i is the transpose of V_i (M′_i stacked on S_{(i−1)i}), and I_i is an identity matrix of the same size as Q_ii. Thus, the only difference between Eq. [5] and [7] is matrix S_{i(i−1)}. When matrix S_{i(i−1)} is null, Eq. [5] and [7] are the same, as we would expect. If D = U, Eq. [6] and [7] impose null restrictions; in such cases, we shall have a multistage index similar to the Kempthorne and Nordskog (1959) index.

According to Eq. [5] and [7] results, matrices K_Oi and K_Di transform the OMLPSI vector of coefficients (δ_i = Q_ii^-1A_iw) into the OCMLPSI and the DCMLPSI vectors of coefficients, respectively.
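The decorrelation restriction in Eq. [7] can be checked numerically. The sketch below is a two-stage illustration with the real-dataset matrices from the Materials section; it assumes that A₂ equals the full genotypic matrix Ĉ, that the Stage 1 DCMLPSI vector follows Eq. [5] (no earlier stage exists), and that F_Di has the projection form Q_ii^-1V_i(V′_iQ_ii^-1V_i)^-1V′_i. All names are hypothetical.

```python
import numpy as np

# Real-dataset matrices (traits RL, SM, EW, BW) and economic weights.
P = np.array([[240.57, -95.62, 2.06, 54.40],
              [-95.62, 167.20, 4.58, 15.36],
              [2.06, 4.58, 22.80, 37.20],
              [54.40, 15.36, 37.20, 516.11]])
C = np.array([[29.86, -17.90, -4.13, -1.75],
              [-17.90, 18.56, 1.49, -4.88],
              [-4.13, 1.49, 9.24, 16.66],
              [-1.75, -4.88, 16.66, 179.73]])
w = np.array([19.54, -3.56, 17.01, -2.51])
U_t = np.array([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]])  # U'
D_t = np.array([[-1.0, -3.0]])                                # D'
n1 = 3  # number of traits measured at Stage 1

def restricted_coefficients(Q, A, w, V):
    """Return [I - Q^-1 V (V'Q^-1 V)^-1 V'] Q^-1 A w, which satisfies V'b = 0."""
    Q_inv = np.linalg.inv(Q)
    F = Q_inv @ V @ np.linalg.inv(V.T @ Q_inv @ V) @ V.T
    return (np.eye(Q.shape[0]) - F) @ Q_inv @ A @ w

# Stage 1: only the Mallard restriction applies.
Q11, A1, Q12 = P[:n1, :n1], C[:n1, :], P[:n1, :]
M1 = (D_t @ U_t @ A1.T).T
b1 = restricted_coefficients(Q11, A1, w, M1)

# Stage 2: add the restriction S_(1)2 b_2 = b_1' Q_12 b_2 = 0.
Q22, A2 = P, C
M2 = (D_t @ U_t @ A2.T).T
S21 = (b1 @ Q12).reshape(-1, 1)   # S_2(1) = Q_12' b_1
V2 = np.hstack([M2, S21])
b2 = restricted_coefficients(Q22, A2, w, V2)

cov_I1_I2 = b1 @ Q12 @ b2         # covariance between the two stage indices
```

With both restrictions stacked in V₂, `cov_I1_I2` is zero up to rounding error, which is the property that gives the DCMLPSI its name.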

Maximized Selection Response at Stage i

It can be shown (Cerón-Rojas and Crossa, 2018, Chapter 9) that when we use Eq. [5] and [7] in Eq. [2], we obtain

R_{O i} = k_{O i} \sqrt{β_{i}^{'} Q_{i i} β_{i}}

[8a]

and

R_{D i} = k_{D i} \sqrt{b_{i}^{'} Q_{i i} b_{i}}

[8b]

which are the maximized OCMLPSI and DCMLPSI selection responses at Stage i, respectively. Although in Eq. [2] the selection response can take any value, in Eq. [8a] and [8b], R_Oi and R_Di give the maximum value of Eq. [2] for the OCMLPSI and DCMLPSI, respectively. In addition, in practice k_Oi and k_Di are obtained with different methods; therefore, their values are generally different (i.e., k_Oi ≠ k_Di). In this work, we obtained the k_Oi value in a two-stage breeding scheme according to Eq. [A6] (Appendix B), whereas we obtained the k_Di values according to the Xu and Muir (1992) method. We described a method to estimate Eq. [8a] and [8b] in Appendix B (Eq. [A1] to [A4b]).

The maximized total OCMLPSI and DCMLPSI selection responses for N stages can be written as R_{t_O} = R_O1 + R_O2 + … + R_ON and R_{t_D} = R_D1 + R_D2 + … + R_DN, respectively.

Maximized Expected Genetic Gain per Trait at Stage i

Using Eq. [5] and [7] in Eq. [3], the OCMLPSI and DCMLPSI expected genetic gains per trait at Stage i can be written as

E_{O i} = k_{O i} \frac{A_{i}^{'} β_{i}}{\sqrt{β_{i}^{'} Q_{i i} β_{i}}}

[9a]

and

E_{D i} = k_{D i} \frac{A_{i}^{'} b_{i}}{\sqrt{b_{i}^{'} Q_{i i} b_{i}}}

[9b]

respectively. We defined all the parameters of Eq. [9a] and [9b] earlier. The maximized total OCMLPSI and DCMLPSI expected genetic gains per trait for N stages can be written as E_{t_O} = E_O1 + E_O2 + … + E_ON and E_{t_D} = E_D1 + E_D2 + … + E_DN, respectively. We described a method to estimate Eq. [9a] and [9b] in Appendix B (Eq. [A1] to [A3], and Eq. [A5a] and [A5b]).
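Equations [8a] to [9b] share the same ingredients, so they can be sketched as two small functions. The function names and the toy matrices below are hypothetical and chosen only so that the arithmetic is easy to follow; they are not estimates from either dataset.

```python
import numpy as np

def selection_response(k, Q, coef):
    """R_i = k_i * sqrt(coef' Q_ii coef), as in Eq. [8a] and [8b]."""
    return k * np.sqrt(coef @ Q @ coef)

def expected_gain(k, A, Q, coef):
    """E_i = k_i * A_i' coef / sqrt(coef' Q_ii coef), as in Eq. [9a] and [9b]."""
    return k * (A.T @ coef) / np.sqrt(coef @ Q @ coef)

# Toy example with two traits (all values are hypothetical).
Q = np.array([[4.0, 0.0], [0.0, 9.0]])   # phenotypic covariance matrix
A = np.array([[2.0, 0.0], [0.0, 3.0]])   # genotypic covariance matrix
coef = np.array([1.0, 0.0])              # index coefficients
k = 1.5                                  # selection intensity

R = selection_response(k, Q, coef)       # 1.5 * sqrt(4) = 3.0
E = expected_gain(k, A, Q, coef)         # 1.5 * [2, 0] / 2 = [1.5, 0.0]
```

Totals over stages are then plain sums of the per-stage values returned by these functions.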

Efficiency when Predicting the Net Genetic Merit

According to Lande and Thompson (1990) and Moreau et al. (1998), the efficiency of the indices when predicting the net genetic merit, in percentage terms, is

\emptyset = 100 (T - 1)

[10]

where T = R_Oi/R_Di, R_Oi denotes the OCMLPSI selection response, and R_Di denotes the DCMLPSI selection response. Therefore, when Ø is null, the efficiency of both indices is the same; when Ø > 0, the efficiency of the OCMLPSI is higher than that of the DCMLPSI; and when Ø < 0, DCMLPSI efficiency is higher than OCMLPSI efficiency for predicting the net genetic merit.
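Equation [10] is a one-line computation. As an illustration only, the sketch below applies it to the total responses reported in Table 1 for p = 0.05 (69.75 for the OCMLPSI and 65.69 for the DCMLPSI); using the totals rather than the stage-level responses is an assumption made for this example, and the function name is hypothetical.

```python
def relative_efficiency(r_o, r_d):
    """Efficiency of the OCMLPSI relative to the DCMLPSI, in percent (Eq. [10])."""
    return 100.0 * (r_o / r_d - 1.0)

# Total responses from Table 1 for p = 0.05.
eff_total = relative_efficiency(69.75, 65.69)
```

Under that assumption, `eff_total` is about 6.2, meaning the OCMLPSI total response is roughly 6% higher than the DCMLPSI total response.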

An additional criterion for comparing the indices’ efficiency is that the total selection response R_t = R₁ + R₂ of each index should be lower than or equal to the single-stage constrained index selection response (R = kσ_I), i.e., R_t ≤ R (see Cerón-Rojas et al., 2019, for details).

Adjusting the OCMLPSI Covariance Matrices at Stage 2

At Stage 2, the phenotypic covariance matrix is

P = [\begin{matrix} P_{1} & P_{12} \\ P_{21} & P_{2} \end{matrix}] = Q_{22}

whereas the genotypic covariance matrix is

G = [\begin{matrix} G_{1} \\ G_{2} \end{matrix}] = A_{2}

(Eq. [1a])

These matrices are affected by prior selection on I₁ = β′₁x₁. It is thus necessary to adjust them to take into consideration the effects of I₁ = β′₁x₁ on them. According to Cochran (1951) and Cunningham (1975), both matrices can be adjusted as follows:

P^{*} = P - a \frac{[\begin{matrix} P_{1} \\ P_{21} \end{matrix}] β_{1} β_{1}^{'} [\begin{matrix} P_{1} & P_{12} \end{matrix}]}{β_{1}^{'} P_{1} β_{1}}

[11a]

and

G^{*} = G - a \frac{G_{1}^{'} β_{1} β_{1}^{'} G_{1}}{β_{1}^{'} P_{1} β_{1}}

[11b]

where P* and G* are the adjusted matrices, a = k_O1(k_O1 − u), k_O1 is the selection intensity at Stage 1, u is the truncation point when I₁ = β′₁x₁ is applied, P₁ = Var(x₁), and G₁ = Cov(x₁, g). Thus, the maximized OCMLPSI selection response (Eq. [8a]) and expected genetic gains (Eq. [9a]) at Stage 2 can be written as $R_{O 2} = k_{O 2} \sqrt{β'_{2} P^{*} β_{2}}$ and $E_{O 2} = k_{O 2} (G^{*} β_{2} / \sqrt{β'_{2} P^{*} β_{2}})$, respectively.
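The adjustment in Eq. [11a] and [11b] can be sketched as follows. The matrices are the real-dataset estimates from the Materials section and k₁ and u₁ are the Table 1 values for p = 0.05, but the Stage 1 coefficient vector `beta1` is a hypothetical placeholder, and the function name is illustrative.

```python
import numpy as np

def adjust_for_stage1(P, G, beta1, k1, u1):
    """Adjust P and G for prior selection on I_1 (Eq. [11a] and [11b])."""
    n1 = beta1.size
    a = k1 * (k1 - u1)
    var_I1 = beta1 @ P[:n1, :n1] @ beta1      # beta_1' P_1 beta_1
    cov_y_I1 = P[:, :n1] @ beta1              # [P_1; P_21] beta_1
    cov_g_I1 = G[:n1, :].T @ beta1            # G_1' beta_1
    P_star = P - a * np.outer(cov_y_I1, cov_y_I1) / var_I1
    G_star = G - a * np.outer(cov_g_I1, cov_g_I1) / var_I1
    return P_star, G_star

P = np.array([[240.57, -95.62, 2.06, 54.40],
              [-95.62, 167.20, 4.58, 15.36],
              [2.06, 4.58, 22.80, 37.20],
              [54.40, 15.36, 37.20, 516.11]])
G = np.array([[29.86, -17.90, -4.13, -1.75],
              [-17.90, 18.56, 1.49, -4.88],
              [-4.13, 1.49, 9.24, 16.66],
              [-1.75, -4.88, 16.66, 179.73]])
beta1 = np.array([1.0, -0.5, 2.0])            # hypothetical Stage 1 coefficients
k1, u1 = 1.30, 0.71                           # Table 1, OCMLPSI, p = 0.05

P_star, G_star = adjust_for_stage1(P, G, beta1, k1, u1)
```

A quick sanity check is that the variance of I₁ computed from the adjusted matrix equals (1 − a) times its variance before selection.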

Test of the OCMLPSI (DCMLPSI) Normality Assumption

Several authors (Shapiro and Wilk, 1965; Mardia, 1980; Mohd-Razali and Bee-Wah, 2011; Rani Das and Rahmatullah Imon, 2016) have given details of how to perform a normality test procedure on a dataset, and many statistical packages provide graphs and normality tests.

We corroborated the OCMLPSI (DCMLPSI) normality assumption at Stage 2 with a simulated dataset using a graphical method (histograms) and analytical test procedures (the Shapiro-Wilk and Kolmogorov-Smirnov normality tests). The corroboration procedure was as follows. In a two-stage context, let p = q₁q₂ be the fixed total proportion retained, where q₁ and q₂ denote the proportions selected at Stages 1 and 2, respectively, and let n be the size of the simulated dataset at Stage 1; then, nq₁ will be the number of individuals selected at Stage 1. We used the information of these nq₁ individuals at Stage 2 to construct graphs and statistical tests to corroborate the OCMLPSI (DCMLPSI) normality assumption.

Materials

Real Dataset

The number of genotypes in this real dataset was 3330, and the vector of economic weights (w) was w′ = [19.54 −3.56 17.01 −2.51]. This dataset comes from a commercial egg poultry line (Akbar et al., 1984), and we used it to illustrate the indices’ theoretical results obtained in this work. The estimated phenotypic (P̂) and genotypic (Ĉ) covariance matrices among the rate of lay (RL, number of eggs), age at sexual maturity (SM, d), egg weight (EW, kg), and body weight (BW, kg) were

\hat{P} = [\begin{matrix} 240.57 & - 95.62 & 2.06 & 54.40 \\ - 95.62 & 167.20 & 4.58 & 15.36 \\ 2.06 & 4.58 & 22.80 & 37.20 \\ 54.40 & 15.36 & 37.20 & 516.11 \end{matrix}]

and

\hat{C} = [\begin{matrix} 29.86 & - 17.90 & - 4.13 & - 1.75 \\ - 17.90 & 18.56 & 1.49 & - 4.88 \\ - 4.13 & 1.49 & 9.24 & 16.66 \\ - 1.75 & - 4.88 & 16.66 & 179.73 \end{matrix}]

The total proportions (p) of retained values for this dataset were p = 0.05, 0.10, 0.20, and 0.30 for both indices.

For illustration purposes only, at Stage 1, we selected RL, SM, and EW, where RL and SM were constrained by the vector of predetermined restrictions d’ = [3 -1] and matrices

U^{'} = [\begin{matrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{matrix}]

and D′ = [−1 −3] for both stages, whereas at Stage 2, we selected trait BW only. The vectors of records at Stages 1 and 2 were x′₁ = [RL SM EW] and y′ = [x′₁ x′₂], respectively, where x₂ = BW. At Stage 1, the estimated phenotypic (Q̂₁₁) and genotypic (Â₁) covariance matrices were

{\hat{Q}}_{11} = [\begin{matrix} 240.57 & - 95.62 & 2.06 \\ - 95.62 & 167.20 & 4.58 \\ 2.06 & 4.58 & 22.80 \end{matrix}]

and

{\hat{A}}_{1} = [\begin{matrix} 29.86 & - 17.90 & - 4.13 & - 1.75 \\ - 17.90 & 18.56 & 1.49 & - 4.88 \\ - 4.13 & 1.49 & 9.24 & 16.66 \end{matrix}]

respectively, whereas the estimated covariance matrix of the traits at Stage 1 with traits at Stage 2 (Q̂₁₂) was

{\hat{Q}}_{12} = [\begin{matrix} 240.57 & - 95.62 & 2.06 & 54.40 \\ - 95.62 & 167.20 & 4.58 & 15.36 \\ 2.06 & 4.58 & 22.80 & 37.20 \end{matrix}]
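The three stage-level matrices above are row and column blocks of P̂ and Ĉ, which the following sketch makes explicit. It assumes that the traits are ordered as RL, SM, EW, BW and that the first n₁ = 3 are the Stage 1 traits; the variable names are illustrative.

```python
import numpy as np

P_hat = np.array([[240.57, -95.62, 2.06, 54.40],
                  [-95.62, 167.20, 4.58, 15.36],
                  [2.06, 4.58, 22.80, 37.20],
                  [54.40, 15.36, 37.20, 516.11]])
C_hat = np.array([[29.86, -17.90, -4.13, -1.75],
                  [-17.90, 18.56, 1.49, -4.88],
                  [-4.13, 1.49, 9.24, 16.66],
                  [-1.75, -4.88, 16.66, 179.73]])
n1 = 3  # RL, SM, and EW are measured at Stage 1

Q11 = P_hat[:n1, :n1]   # phenotypic covariances among Stage 1 traits
A1 = C_hat[:n1, :]      # Cov(x_1, g): Stage 1 traits with all breeding values
Q12 = P_hat[:n1, :]     # Stage 1 traits with all traits available at Stage 2
```

Q12 has n₁ rows and n₁ + n₂ columns, which matches the nonsquare shape described for Eq. [1b].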

Simulated Datasets

These datasets are available in the Application of a Genomics Selection Index to Real and Simulated Data repository, at http://hdl.handle.net/11529/10199. They were simulated by Ceron-Rojas et al. (2015) with QU-GENE software (Podlich and Cooper, 1998) using 2500 molecular markers and 315 quantitative trait loci (QTLs) for eight phenotypic selection cycles (C0 to C7), each with four traits (T₁, T₂, T₃, and T₄), 500 genotypes, and four replicates for each genotype. The authors distributed the markers uniformly across 10 chromosomes and the QTLs randomly across the 10 chromosomes to simulate maize (Zea mays L.) populations. A different number of QTLs affected each of the four traits: 300, 100, 60, and 40, respectively. The common QTLs affecting the traits generated genotypic correlations of −0.5, 0.4, 0.3, −0.3, −0.2, and 0.1 between T₁ and T₂, T₁ and T₃, T₁ and T₄, T₂ and T₃, T₂ and T₄, and T₃ and T₄, respectively. The economic weights for T₁, T₂, T₃, and T₄ were 1, −1, 1, and 1, respectively.

We used four phenotypic selection cycles (C1 to C4) with p = 0.01, 0.10, and 0.30 in each cycle. At Stage 1 we selected T₁, T₂, and T₃, where T₁ and T₂ were constrained with vector d′ = [5 −2] and matrices

U^{'} = [\begin{matrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{matrix}]

and D′ = [−2 −5] for both stages. At Stage 2 we selected only trait T₄; thus, the vector of observations at Stage 1 was x′₁ = [T₁ T₂ T₃] and at Stage 2, y′ = [x′₁ x′₂], where x₂ = T₄.

RESULTS

Real Data

Truncation Points, Proportion Retained, and Selection Intensities for Two Stages

Figure 1 shows the relationship among the truncation points (u₁ and u₂), the total proportion retained (p = q₁q₂), and the heights of the ordinate of the normal curve, z(u₁) = e^{−0.5u₁²}/√(2π) and z(u₂) = e^{−0.5u₂²}/√(2π). We found the OCMLPSI selection intensities for Stages 1 and 2 [k₁ = z(u₁)/q₁ and k₂ = z(u₂)/q₂] according to Eq. [A6] (Appendix B) as follows. For a fixed value of p = q₁q₂ (e.g., p = 0.05), we used an iterative process with an R code. By successively changing the possible values of q₁ (q₂ = p/q₁), u₁, and u₂, we found the maximum value of the estimated total OCMLPSI (DCMLPSI) selection response, Ȓ_t = Ȓ₁ + Ȓ₂ (Fig. 2). For example, for the real dataset and p = 0.05, the estimated total OCMLPSI selection response was Ȓ_t = Ȓ₁ + Ȓ₂ = 25.592k₁ + 26.580k₂ = 69.75, where Ȓ₁ = k₁σ̂_{I₁} = 33.265 and Ȓ₂ = k₂σ̂_{I₂} = 36.481 were the estimated selection responses at each stage (Table 1), whereas σ̂_{I₁} = 25.592 and σ̂_{I₂} = 26.579 were the estimated standard deviations of Î₁ and Î₂ for Stages 1 and 2, respectively. Thus, for this dataset, the values of the truncation points (u₁ = 0.71 and u₂ = 0.81), proportions retained (q₁ = 0.24 and q₂ = 0.21), and selection intensities (k₁ = 1.30 and k₂ = 1.37) at both stages were those associated with the maximum estimated total OCMLPSI selection response, Ȓ_t = 69.75. Table 1 presents additional truncation points, proportions retained, and selection intensities for p = q₁q₂ = 0.10, 0.20, and 0.30, associated with the OCMLPSI and DCMLPSI selection responses.
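The search over q₁ described above can be sketched with the Python standard library. This is a simplified illustration, not the authors' R code: it assumes k = z(u)/q with u the standard normal truncation point for a retained proportion q (which reproduces the Table 1 intensities to rounding), and it holds the two index standard deviations fixed at the values reported for p = 0.05, whereas the paper adjusts the Stage 2 matrices for prior selection. All function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def truncation_point(q):
    """Truncation point u such that a proportion q lies above it."""
    return STD_NORMAL.inv_cdf(1.0 - q)

def selection_intensity(q):
    """k = z(u) / q, with z the standard normal density at u."""
    return STD_NORMAL.pdf(truncation_point(q)) / q

def total_response(q1, p, sd1, sd2):
    """R_t = k_1 * sd1 + k_2 * sd2 with q_2 = p / q_1 (sd1, sd2 held fixed)."""
    q2 = p / q1
    return selection_intensity(q1) * sd1 + selection_intensity(q2) * sd2

p, sd1, sd2 = 0.05, 25.592, 26.580
# Candidate q_1 values; q_1 must exceed p so that q_2 = p / q_1 stays below 1.
grid = [i / 1000 for i in range(51, 1000)]
best_q1 = max(grid, key=lambda q1: total_response(q1, p, sd1, sd2))
best_q2 = p / best_q1
```

Because the standard deviations are held fixed here, `best_q1` need not coincide with the q₁ = 0.24 reported in Table 1; the sketch only shows the shape of the iterative search.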

Fig. 1 — Theoretical relationship between the truncation points (u), the proportion retained (p), and the density values [z(u)] of the truncation points (after Ceron-Rojas et al., 2019).

Fig. 2 — Distribution of the total estimated optimum and decorrelated constrained multistage linear phenotypic selection index (OCMLPSI and DCMLPSI, respectively) selection response values for a real dataset, and the fixed total proportion retained (p) = 0.05 and 0.10.

Table 1.

Real data for total proportion retained (p), estimated optimum and decorrelated constrained multistage linear phenotypic selection index (OCMLPSI and DCMLPSI, respectively) truncation points (u₁ and u₂), proportions retained (q₁ and q₂), selection intensities (k₁ and k₂), and selection responses (Ȓ₁, Ȓ₂, and Ȓ_t = Ȓ₁ + Ȓ₂) for Stages 1 and 2. Values of Ȓ correspond to the one-stage estimated constrained linear phenotypic selection index selection response.

Index	p	u₁	u₂	q₁	q₂	k₁	k₂	Ȓ₁	Ȓ₂	Ȓ_t	Ȓ
OCMLPSI	0.05	0.71	0.81	0.24	0.21	1.30	1.37	33.26	36.48	69.75	71.66
	0.10	0.41	0.55	0.34	0.29	1.07	1.18	27.45	31.95	59.40	60.96
	0.20	0.03	0.23	0.49	0.41	0.81	0.95	20.85	26.65	47.50	48.63
	0.30	−0.25	0.00	0.60	0.50	0.64	0.80	16.48	22.96	39.44	40.26
DCMLPSI	0.05	0.86	0.65	0.19	0.26	1.42	1.25	36.29	29.40	65.69	71.66
	0.10	0.58	0.37	0.28	0.35	1.20	1.05	30.65	24.66	55.30	60.96
	0.20	0.23	0.03	0.41	0.49	0.95	0.82	24.23	19.24	43.47	48.63
	0.30	−0.03	−0.22	0.51	0.59	0.78	0.67	19.91	15.64	35.55	40.26

Estimated OCMLPSI Selection Response for Two Stages

In the one-stage case, the selection intensity for p = 0.05 was k = 2.063, and the SCLPSI selection response was Ȓ = 71.66 (see Cerón-Rojas and Crossa, 2018, Chapter 3, for details). According to the results detailed in the paragraph above and to Young (1964) and Saxton (1983), the maximum estimated total OCMLPSI selection response (Ȓ_t = 69.75) value should be lower than or equal to the estimated SCLPSI response (Ȓ = 71.66). For this dataset, Ȓ_t = 69.75 explained 97.33% of the Ȓ = 71.66 value. That is, Ȓ_t = 69.75 and Ȓ = 71.66 were very similar.

In Table 1, we present additional maximum estimated OCMLPSI selection response values for p = q₁q₂ = 0.10, 0.20, and 0.30. For each of the latter three p values, the maximum estimated total OCMLPSI selection responses explained 97.44, 97.68, and 97.96%, respectively, of the SCLPSI selection response values. Thus, for this real dataset, the estimated total OCMLPSI selection response and the estimated SCLPSI selection response were very similar, as we would expect.

Estimated DCMLPSI Selection Response for Two Stages

For both stages, the estimated DCMLPSI vectors of coefficients (Eq. [7]) were b̂′₁ = [1.809 1.132 1.065] and b̂′₂ = [0.349 −0.262 1.743 −1.076]. Because at Stage 1 the restriction matrix S_(i−1)i was null [S_(i−1)i = 0], the estimated DCMLPSI vector of coefficients was the same as the OCMLPSI vector of coefficients (i.e., b̂′₁ = β̂′₁; Eq. [5] and [7]); however, at Stage 2, S_(i−1)i ≠ 0 and b̂′₂ ≠ β̂′₂.

For p = 0.05, the selection intensities obtained with the Xu and Muir (1992, Eq. [19]) method were k₁ = 1.42 and k₂ = 1.25 at Stages 1 and 2, respectively, from which the estimated maximized selection responses for both stages were Ȓ₁ = 36.29 and Ȓ₂ = 29.40 (Appendix A, Eq. [A4b]), whereas Ȓ_t = Ȓ₁ + Ȓ₂ = 65.69 was the total estimated selection response. This means that Ȓ_t = 65.69 explained 91.67% of the estimated SCLPSI selection response (Ȓ = 71.66) value.

Table 1 presents additional maximum estimated DCMLPSI selection response values when p = q₁q₂ = 0.10, 0.20, and 0.30. For 0.10, 0.20 and 0.30, the estimated total selection response explained 90.72, 89.40, and 88.30%, respectively, of the estimated SCLPSI selection response.

The results of the last two subsections indicate that the average of the estimated total DCMLPSI and OCMLPSI selection responses explained 90 and 97.60%, respectively, of the average estimated SCLPSI selection response for all p values. This means that the average of the estimated total OCMLPSI selection response was 7.60 percentage points closer to the estimated SCLPSI selection response than the average of the estimated total DCMLPSI selection response. We explain the loss of DCMLPSI efficiency by noting that when DCMLPSI obtained its vector of coefficients, it incorporated an additional restriction, which made the DCMLPSI values independent at different stages. Xu and Muir (1992) and Xie and Xu (1997) indicated that the loss of efficiency is justified because their method for obtaining the selection intensities and total responses gives the breeder the opportunity to implement an unlimited number of selection stages, which otherwise would be very difficult or impossible to do.

Estimated OCMLPSI Expected Genetic Gain per Trait for Two Stages

Let p = 0.05 (k₁ = 1.30 and k₂ = 1.37); then, the estimated OCMLPSI expected genetic gains per trait (Appendix A, Eq. [A5a]) for both stages were Ê′₁ = [1.49 −0.50 0.21 0.46] and Ê′₂ = [1.92 −0.64 0.01 −8.02], while Ê′_t = Ê′₁ + Ê′₂ = [3.41 −1.14 0.21 −7.56] was the total estimated expected genetic gain per trait. Each Ê′_t value was associated with the mean of traits rate of lay (RL, number of eggs), age at sexual maturity (SM, days), egg weight (EW, kg) and body weight (BW, kg). We constrained traits RL and SM by vector d′ = [3 −1] values. This means that the Ê′_t values associated with traits RL and SM (3.41 and −1.14, respectively) overestimated the d′ = [3 −1] values by 13.67 and 14%, respectively.

Table 2 presents additional estimated expected genetic gains per traits RL, SM, EW, and BW for both stages and p = 0.10, 0.20, and 0.30. For these last three p values, the estimated total OCMLPSI expected genetic gains per traits RL and SM explained 95, 74, and 59% of each d’ = [3 -1] value, respectively. That is, the accuracy of the estimated OCMLPSI expected genetic gains per trait decreased when the p values increased from 0.10 to 0.30. For this real dataset, the optimum expected genetic gain per trait efficiency occurred when p = 0.10.

Table 2.

Total proportion (p) retained; estimated optimum and decorrelated constrained multistage linear phenotypic selection index (OCMLPSI and DCMLPSI, respectively) expected genetic gains per trait for four real traits: rate of lay (RL), age at sexual maturity (SM), egg weight (EW), and body weight (BW). Traits RL and SM were constrained with vector d’ (a vector of constraints or predetermined proportional gains per trait imposed by the breeder) = [3 -1] values.

		Stage 1				Stage 2				Total			
Index	p	RL	SM	EW	BW	RL	SM	EW	BW	RL	SM	EW	BW
		no. eggs	d	kg	kg	no. eggs	d	kg	kg	no. eggs	d	kg	kg
OCMLPSI	0.05	1.49	−0.50	0.21	0.46	1.92	−0.64	0.01	−8.02	3.41	−1.14	0.21	−7.56
	0.10	1.23	−0.41	0.17	0.38	1.62	−0.54	0.01	−6.72	2.85	−0.95	0.18	−6.34
	0.20	0.93	−0.31	0.13	0.29	1.27	−0.42	0.01	−5.26	2.21	−0.74	0.14	−4.97
	0.30	0.74	−0.25	0.10	0.23	1.04	−0.35	0.01	−4.30	1.78	−0.59	0.11	−4.07
DCMLPSI	0.05	1.63	−0.54	0.22	0.50	0.52	−0.17	−0.19	−8.72	2.15	−0.72	0.03	−8.22
	0.10	1.37	−0.46	0.19	0.42	0.44	−0.15	−0.16	−7.32	1.81	−0.60	0.03	−6.89
	0.20	1.09	−0.36	0.15	0.34	0.34	−0.11	−0.13	−5.71	1.43	−0.48	0.02	−5.37
	0.30	0.89	−0.30	0.12	0.28	0.28	−0.09	−0.10	−4.64	1.17	−0.39	0.02	−4.36

Estimated DCMLPSI Expected Genetic Gain per Trait for Two Stages

For p = 0.05 (k₁ = 1.42 and k₂ = 1.25), the estimated DCMLPSI expected genetic gains per trait for both stages were Ê′₁ = [1.63 −0.54 0.22 0.50] and Ê′₂ = [0.52 −0.17 −0.19 −8.72], whereas Ê′_t = Ê′₁ + Ê′₂ = [2.15 −0.72 0.03 −8.22] was the total estimated expected genetic gain per trait. Each Ê′_t value is associated with traits RL (rate of lay, number of eggs), SM (age at sexual maturity, days), EW (egg weight, kg), and BW (body weight, kg), and traits RL and SM were constrained by vector d′ = [3 −1] values. The Ê′_t values associated with RL and SM (2.15 and −0.72, respectively) explained only 71.67 and 72% of each d′ = [3 −1] value, respectively.

Table 2 presents additional estimated expected genetic gains per traits RL, SM, EW, and BW for both stages and p = 0.10, 0.20, and 0.30. For 0.10, 0.20, and 0.30, the estimated total expected genetic gains per traits RL and SM explained 60.33, 47.67, and 39% of each d′ = [3 −1] value, respectively. Thus, for this dataset, the estimated expected genetic gains per trait underestimated the d′ = [3 −1] values. We explain the loss of DCMLPSI accuracy by noting that when the DCMLPSI obtained its vector of coefficients, it incorporated an additional restriction to make the DCMLPSI values independent among stages. The average estimated DCMLPSI expected genetic gain per trait efficiency was 54.67%, whereas the average estimated OCMLPSI expected genetic gain per trait efficiency was 85% for all p values. Thus, the average of the estimated OCMLPSI accuracy associated with d′ = [3 −1] values was 35% higher than the average of the estimated DCMLPSI accuracy associated with d′ = [3 −1] for the real dataset.
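The percentages quoted in the two subsections above follow from dividing each total expected gain by its constraint. The sketch below recomputes them from the Total columns of Table 2; the dictionary layout and the name `percent_of_target` are illustrative choices, not part of the original analysis.

```python
# Constraint vector d' = [3, -1] for traits RL and SM (real dataset).
D_TARGET = (3.0, -1.0)

# Total expected gains (RL, SM) from Table 2, keyed by proportion retained p.
TOTAL_GAINS = {
    "OCMLPSI": {0.05: (3.41, -1.14), 0.10: (2.85, -0.95),
                0.20: (2.21, -0.74), 0.30: (1.78, -0.59)},
    "DCMLPSI": {0.05: (2.15, -0.72), 0.10: (1.81, -0.60),
                0.20: (1.43, -0.48), 0.30: (1.17, -0.39)},
}


def percent_of_target(gain, target):
    """Expected gain expressed as a percentage of the breeder's constraint."""
    return 100.0 * gain / target


summary = {}
for index, by_p in TOTAL_GAINS.items():
    rl = [percent_of_target(g[0], D_TARGET[0]) for g in by_p.values()]
    sm = [percent_of_target(g[1], D_TARGET[1]) for g in by_p.values()]
    # Average over the four p values, separately for RL and SM.
    summary[index] = (sum(rl) / len(rl), sum(sm) / len(sm))

for index, (rl_avg, sm_avg) in summary.items():
    print(f"{index}: RL {rl_avg:.2f}%  SM {sm_avg:.2f}%")
```

Values above 100% mean the index overshoots the constraint, and values below 100% mean it falls short, which is how the text distinguishes overestimation from underestimation.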

The results of the above four subsections indicate that the accuracy of both indices was higher when they predicted the selection response than when they estimated the expected genetic gain per trait. However, for the real data, the efficiency of the OCMLPSI when predicting the selection response and estimating the expected genetic gain per trait was higher than the DCMLPSI efficiency when predicting the selection response and estimating the expected genetic gain per trait.

OCMLPSI Efficiency vs. DCMLPSI Efficiency to Predict the Net Genetic Merit

Equation [10] is a tool for determining OCMLPSI efficiency vs. DCMLPSI efficiency when predicting the net genetic merit in percentage terms. The estimated average OCMLPSI efficiency to predict the net genetic merit in percentage terms is 100(97.604/90.019 − 1) = 8.426%, where 97.604 and 90.019 are the average percentages of the estimated SCLPSI selection response explained by the estimated total OCMLPSI and DCMLPSI selection responses (Table 1) for all p values, respectively, and 8.426% is OCMLPSI efficiency with respect to DCMLPSI efficiency, in percentage terms, to predict the net genetic merit. Thus, for the Akbar et al. (1984) real dataset, the estimated average OCMLPSI efficiency was 8.426% higher than the estimated average DCMLPSI efficiency for predicting the net genetic merit.
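The arithmetic behind the 8.426% figure can be retraced from Table 1. The sketch below is only a numerical check: it averages the percentage of the SCLPSI response explained by each index over the four p values and then applies the 100(ratio − 1) form used above. The variable names are illustrative.

```python
# Table 1 (real data): p -> (OCMLPSI total, DCMLPSI total, one-stage SCLPSI).
TABLE_1 = {
    0.05: (69.75, 65.69, 71.66),
    0.10: (59.40, 55.30, 60.96),
    0.20: (47.50, 43.47, 48.63),
    0.30: (39.44, 35.55, 40.26),
}


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


# Average percentage of the SCLPSI response explained by each index.
oc_pct = mean(100.0 * oc / single for oc, _, single in TABLE_1.values())
dc_pct = mean(100.0 * dc / single for _, dc, single in TABLE_1.values())

# Relative efficiency of OCMLPSI over DCMLPSI, in percentage terms.
efficiency = 100.0 * (oc_pct / dc_pct - 1.0)

print(f"OCMLPSI {oc_pct:.3f}%  DCMLPSI {dc_pct:.3f}%  efficiency {efficiency:.3f}%")
```

Because the inputs are the rounded table entries, the result agrees with the reported values only to about three decimal places.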

Simulated Data

Estimated OCMLPSI and DCMLPSI Selection Response

For p = q₁q₂ = 0.01, 0.10, and 0.30, Table 3 presents the estimated OCMLPSI and DCMLPSI responses (Ȓ₁, Ȓ₂, and Ȓ_t = Ȓ₁ + Ȓ₂) and SCLPSI responses (Ȓ_0.01, Ȓ_0.10, and Ȓ_0.30) for four simulated selection cycles in a two-stage breeding selection scheme. For p = 0.01, the average of the estimated total OCMLPSI selection responses (22.79) explained 99.80% of the average of the SCLPSI selection responses (22.84), whereas for p = 0.10 and 0.30, the average of the estimated total OCMLPSI selection responses (15.13 and 10.11, respectively) explained 100.60 and 101.81% of the average of the SCLPSI selection responses (15.04 and 9.93, respectively). Thus, for this dataset, the OCMLPSI and SCLPSI results were equivalent for all p values.

For p = 0.01, the average of the estimated total DCMLPSI selection responses (21.84) explained 95.62% of the average of the SCLPSI selection responses (22.84), whereas for p = 0.10 and 0.30, the average of the estimated total DCMLPSI selection responses (14.43 and 9.49, respectively) explained 95.94 and 95.57% of the average of the SCLPSI selection responses (15.04 and 9.93, respectively).

Table 3.

Simulated data for total proportion retained (p) = q₁q₂ = 0.01, 0.10, and 0.30, and estimated optimum and decorrelated constrained multistage linear phenotypic selection indices (OCMLPSI and DCMLPSI, respectively) responses (Ȓ₁, Ȓ₂, and Ȓ_t = Ȓ₁ + Ȓ₂) and single-stage constrained linear phenotypic selection index (SCLPSI) responses (Ȓ_0.01, Ȓ_0.10, and Ȓ_0.30) for four simulated selection cycles in a two-stage breeding scheme.

		p = 0.01			p = 0.10			p = 0.30			SCLPSI		
Index	Cycle	Ȓ₁	Ȓ₂	Ȓ_t	Ȓ₁	Ȓ₂	Ȓ_t	Ȓ₁	Ȓ₂	Ȓ_t	Ȓ_0.01	Ȓ_0.10	Ȓ_0.30
OCMLPSI	1	21.59	3.16	24.75	13.87	2.57	16.44	8.75	2.24	10.99	24.77	16.31	10.77
	2	20.39	3.09	23.47	12.99	2.60	15.59	8.18	2.24	10.42	23.54	15.50	10.24
	3	17.75	3.48	21.23	11.28	2.81	14.09	7.09	2.32	9.41	21.39	14.08	9.30
	4	18.95	2.73	21.68	12.21	2.19	14.40	7.70	1.93	9.63	21.68	14.28	9.43
	Avg.	19.67	3.12	22.79	12.59	2.54	15.13	7.93	2.18	10.11	22.84	15.04	9.93
DCMLPSI	1	21.59	2.08	23.68	15.25	0.47	15.72	10.15	0.20	10.35	24.77	16.31	10.77
	2	20.39	2.12	22.51	14.40	0.48	14.88	9.58	0.20	9.78	23.54	15.50	10.24
	3	18.17	2.27	20.44	12.57	0.80	13.36	8.43	0.33	8.76	21.39	14.08	9.30
	4	18.95	1.77	20.72	13.38	0.40	13.78	8.91	0.17	9.07	21.68	14.28	9.43
	Avg.	19.78	2.06	21.84	13.90	0.54	14.43	9.27	0.22	9.49	22.84	15.04	9.93

The results in this section indicate that although the average of the total OCMLPSI selection response, for all p values, overestimated the average of the SCLPSI by 0.73%, the average of the total DCMLPSI selection response, for all p values, underestimated the average of the SCLPSI by 4.30%. Thus, for this simulated dataset, the OCMLPSI was the best predictor of the net genetic merit, and its accuracy when predicting the selection response was higher than the DCMLPSI accuracy for predicting the selection response.

Estimated OCMLPSI and DCMLPSI Expected Genetic Gains per Trait

Table 4 presents the estimated OCMLPSI and DCMLPSI expected genetic gains per trait (Ê′₁, Ê′₂, and Ê′_t = Ê′₁ + Ê′₂) for four simulated selection cycles and p = q₁q₂ = 0.30 in a two-stage context. Each Ê′_t value was associated with the mean values of traits T₁, T₂, T₃, and T₄. In addition, in both indices, traits T₁ and T₂ were constrained by vector d′ = [5 −2] values. The average of the estimated total OCMLPSI Ê′_t values associated with traits T₁ and T₂ (5.76 and −2.30, respectively) overestimated the d′ = [5 −2] values by 15.20%. However, the average of the estimated total DCMLPSI Ê′_t values associated with traits T₁ and T₂ (5.05 and −2.02, respectively) overestimated the d′ = [5 −2] values by only 1.0%. Nevertheless, note that at Stage 2, the averages of the estimated DCMLPSI expected genetic gains per trait associated with traits T₁, T₂, and T₃ (0.02, −0.01, and 0.00, respectively) were practically null. This means that DCMLPSI efficiency occurred at Stage 1, when restriction matrix S′_(i−1)i was null [S′_(i−1)i = 0] and b̂′₁ = β̂′₁. Thus, for this dataset, the average of the estimated total DCMLPSI expected genetic gains per trait was more efficient for predicting the d′ = [5 −2] values than the OCMLPSI, but the highest DCMLPSI efficiency occurred at Stage 1, when b̂′₁ = β̂′₁ and the estimated standard deviations of OCMLPSI and DCMLPSI values were the same.

Table 4.

Estimated optimum and decorrelated constrained multistage linear phenotypic selection indices (OCMLPSI and DCMLPSI, respectively) expected genetic gains per trait (Ê′₁, Ê′₂, and Ê′_t = Ê′₁ + Ê′₂) for Stages 1 and 2 in four simulated selection cycles with total proportion retained p = q₁q₂ = 0.30. Traits T₁ and T₂ were constrained with vector d′ (a vector of constraints or predetermined proportional gains per trait imposed by the breeder) = [5 −2] values on the two indices.

		Stage 1 (Ê′₁)				Stage 2 (Ê′₂)				Total (Ê′_t = Ê′₁ + Ê′₂)			
Index	Cycle	T₁	T₂	T₃	T₄	T₁	T₂	T₃	T₄	T₁	T₂	T₃	T₄
OCMLPSI	1	4.55	−1.82	2.08	0.31	1.49	−0.59	0.66	0.54	6.03	−2.41	2.73	0.85
	2	4.30	−1.72	1.87	0.29	1.39	−0.56	0.60	0.59	5.69	−2.28	2.47	0.88
	3	3.87	−1.55	1.58	0.09	1.48	−0.59	0.57	0.64	5.36	−2.14	2.15	0.73
	4	4.50	−1.80	1.26	0.14	1.44	−0.58	0.40	0.43	5.94	−2.38	1.67	0.56
	Avg.	4.31	−1.72	1.70	0.21	1.45	−0.58	0.56	0.55	5.76	−2.30	2.25	0.76
DCMLPSI	1	5.28	−2.11	2.41	0.35	0.01	−0.01	0.00	0.18	5.29	−2.12	2.41	0.53
	2	5.04	−2.02	2.19	0.34	0.01	0.00	0.00	0.18	5.05	−2.02	2.20	0.52
	3	4.60	−1.84	1.87	0.11	0.04	−0.01	0.00	0.28	4.64	−1.86	1.87	0.39
	4	5.21	−2.08	1.46	0.16	0.00	0.00	0.00	0.16	5.21	−2.08	1.46	0.32
	Avg.	5.03	−2.01	1.98	0.24	0.02	−0.01	0.00	0.20	5.05	−2.02	1.98	0.44

We also estimated the total expected genetic gains per trait of both indices for p = 0.01, 0.10, and 0.20 (data not shown); however, in all cases, those values were higher than the d′ = [5 -2] values. For example, for p = 0.10, the averages of the estimated total OCMLPSI and DCMLPSI expected genetic gains per trait associated with d’ = [5 -2] were 8.78 and -3.51, and 7.58 and -3.03, respectively.

The difference between the real and simulated datasets in the OCMLPSI and DCMLPSI expected genetic gains per trait is due to the different number of genotypes used to estimate the parameters. That is, in the real dataset, the number of genotypes was 3330, but in the simulated data, the number of genotypes was only 500, which represents only 15% of the number of genotypes used in the real dataset to estimate the parameters of the indices. This means that the number of genotypes used to estimate the indices' parameters was an important factor for both indices in the real and simulated data.

The results of the real and simulated datasets indicated that the OCMLPSI is the most efficient index for predicting the net genetic merit, and its accuracy when predicting the selection response and estimating the expected genetic gain per trait was higher than DCMLPSI accuracy when predicting the selection response and estimating the expected genetic gain per trait.

Normality Test for the Estimated OCMLPSI and DCMLPSI Values at Stage 2

We used the simulated dataset in Cycle 2 to test the normality assumption of the estimated OCMLPSI and DCMLPSI values at Stage 2. In Cycle 1, the number of genotypes was 500. For p = q₁q₂ = 0.05 and 0.30, the q₁ values for OCMLPSI were 0.22 and 0.55, whereas those values for DCMLPSI were 0.06 and 0.31, respectively. Then, at Stage 2, (0.22)(500) = 110 and (0.55)(500) = 270 were the number of genotypes for OCMLPSI, whereas for DCMLPSI, the number of genotypes were (0.06)(500) = 30 and (0.31)(500) = 155. We used these last numbers of genotypes to construct histograms (Fig. 3) of the estimated OCMLPSI and DCMLPSI values at Stage 2.

According to the histograms constructed for the estimated OCMLPSI values, when the number of genotypes changed from 110 (Fig. 3a) to 270 (Fig. 3b), the estimated OCMLPSI values were closer to the normal distribution. The same was true for the estimated DCMLPSI values (Fig. 3c and 3d).

We now describe the Shapiro-Wilk and Kolmogorov-Smirnov normality test results of the estimated OCMLPSI and DCMLPSI values at Stage 2 (Cycle 2) when the number of genotypes was 110 and 270 for OCMLPSI, and 30 and 155 for DCMLPSI. With the simulated dataset, we tested the null hypothesis that the estimated OCMLPSI and DCMLPSI values at Stage 2 have a normal distribution.

The statistic value of the Shapiro-Wilk test should be close to 1.0 to accept the null hypothesis, whereas the statistic value of the Kolmogorov-Smirnov test should be close to 0.0 to accept the null hypothesis (Rani Das and Rahmatullah Imon, 2016). In the present case, for the values associated with OCMLPSI (110 and 270), the statistic values of the Shapiro-Wilk test were 0.958 and 0.989, whereas the statistic values of the Kolmogorov-Smirnov test were 0.080 and 0.044, respectively. Thus, we believe that for the estimated OCMLPSI values, the null hypothesis was true. In a similar manner, for the values associated with DCMLPSI (30 and 155), the statistic values of the Shapiro-Wilk test were 0.967 and 0.991, whereas the statistic values of the Kolmogorov-Smirnov test were 0.094 and 0.029, respectively. We again accept that the estimated DCMLPSI values approach the normal distribution.
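To make the "close to 0.0" reading of the Kolmogorov-Smirnov statistic concrete, the sketch below computes that statistic against a normal distribution fitted to the sample. It uses simulated values, not the index values analysed here, and it does not reproduce the software settings behind the numbers above, which are not stated in this section. The function name `ks_statistic`, the seed, and the sample sizes are assumptions.

```python
import random
from statistics import NormalDist, fmean, stdev


def ks_statistic(values):
    """Largest gap between the empirical CDF of the sample and the CDF of a
    normal distribution with the sample's own mean and standard deviation."""
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(fmean(xs), stdev(xs))
    gap = 0.0
    for i, x in enumerate(xs):
        cdf = fitted.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        gap = max(gap, (i + 1) / n - cdf, cdf - i / n)
    return gap


rng = random.Random(1)                       # fixed seed for repeatability
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(270)]
skewed_sample = [rng.expovariate(1.0) for _ in range(270)]

print(round(ks_statistic(normal_sample), 3),
      round(ks_statistic(skewed_sample), 3))
```

A sample drawn from a normal distribution yields a small statistic, whereas a strongly skewed sample of the same size yields a clearly larger one. When the mean and standard deviation are estimated from the same sample, as here, the usual Kolmogorov-Smirnov critical values are conservative (the Lilliefors correction addresses this), so the statistic is best read as a descriptive distance.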

DISCUSSION

Criteria Used to Evaluate the Relative Efficiency of the Indices

A criterion used to evaluate OCMLPSI efficiency vs. DCMLPSI efficiency when predicting the net genetic merit was that the estimated total OCMLPSI and DCMLPSI selection response must be lower than or equal to the estimated SCLPSI selection response. Additional criteria were the ratio of the OCMLPSI selection response over the DCMLPSI selection response and the estimated expected genetic gain per trait or multitrait selection response. The estimated total selection response of both indices predicted the mean value of the net genetic merit in the progeny population, whereas the estimated expected genetic gain values indicated how close the estimated mean values of the traits are to the predetermined proportional gains (or constraints) imposed by the breeder in each selection cycle. Both parameters are good criteria for comparing the efficiency of the indices, depending on the method used to estimate the vector of coefficients of each index.

Selection Intensities

The selection intensities (k₁ and k₂) of both indices had three main parts: the proportions retained (q₁ and q₂), the truncation points (u₁ and u₂) and the height of the ordinate of the normal curve [$z(u_{1}) = e^{-0.5 u_{1}^{2}} / \sqrt{2π}$ and $z(u_{2}) = e^{-0.5 u_{2}^{2}} / \sqrt{2π}$] (Fig. 1). We obtained the k₁ and k₂ values for OCMLPSI with the Eq. [A6] (Appendix B) method and for DCMLPSI with the Xu and Muir (1992) method. Both approaches were associated with the maximum total selection response (Ȓ_t = k₁σ̂_I₁ + k₂σ̂_I₂) value, and the values of k₁ and k₂ were affected by the method used to obtain their values at Stages 1 and 2. When the p values changed from 0.05 to 0.30, the u₁ and u₂ values decreased, the q₁ and q₂ values increased and the k₁ and k₂ values decreased in both indices, as we would expect.

Equation [A6] (Appendix B) to obtain the OCMLPSI selection intensities in a two-stage context was proposed by Cerón-Rojas et al. (2019). These authors compared their results with the results of Saxton (1983), who used a numerical integration method to obtain truncation points, proportion retained, and selection intensities in a two-stage context. Saxton (1983) applied a two-stage selection scheme in two ways: first, by selecting three traits and then two traits; and second, by first selecting the last two traits and later the first three traits. Under the first scheme, Saxton (1983) found that the estimated total selection response overestimated the single-stage LPSI response by 3.8%, but under the second, he found that the estimated total selection response overestimated the single-stage LPSI response by only 1.5%. These results were very similar to the results obtained by Cerón-Rojas et al. (2019) when they used real data. This means that, at least in a two-stage context, Eq. [A6] was a good method to obtain the truncation points, proportion retained, and selection intensities.

Number of Restrictions Imposed on the Indices

The OCMLPSI solved the OMLPSI equations subject to the restriction that the covariance between the OCMLPSI and some linear combinations of the genotypes involved be equal to a vector of predetermined proportional gains (or constraints) imposed by the breeder. However, in addition to the latter restriction, the DCMLPSI imposed the restriction that the covariance between DCMLPSI values at different stages be zero. The latter restriction decreased DCMLPSI efficiency after Stage 1, and as a result, its selection response and expected genetic gain were lower than the OCMLPSI selection response and expected genetic gain for the real and simulated datasets at Stage 2. Xu and Muir (1991, 1992) and Xie and Xu (1997) indicated that the loss of DCMLPSI efficiency after Stage 1 is justified because their method for obtaining the selection intensities and total responses gives the breeder the opportunity to implement an unlimited number of selection stages, which would otherwise be very difficult or impossible to do. At Stage 1, when the additional DCMLPSI restriction was null, the DCMLPSI and OCMLPSI vectors of coefficients were the same, as we would expect. Incidentally, this corroborated that both indices were applications of the SCLPSI to the multistage context.

According to Xu and Muir (1991, 1992), the restriction that made the covariance between DCMLPSI values at different stages be zero is similar to the Kempthorne and Nordskog (1959) restriction imposed on the expected genetic gain per trait, which makes some traits not change their mean values while the rest of the trait means remain without restrictions. In effect, the DCMLPSI used a projector matrix (e.g., K_Di) to project the OMLPSI vector of coefficients (δ_i) into a space smaller than the original space of δ_i, whereas Kempthorne and Nordskog (1959) used a projector matrix to project the single-stage LPSI vector of coefficients into a space smaller than the original space of the LPSI vector of coefficients. The reduction of the space into which the Kempthorne and Nordskog (1959) matrix projects the LPSI vector of coefficients is equal to the number of zeros that appear in the expected genetic gain per trait, and the selection response and accuracy decrease as the number of restrictions increases (Cerón-Rojas and Crossa, 2018, Chapter 3). Nevertheless, it is not clear whether under the Xu and Muir (1992) restrictions the expected genetic gain per trait, the selection response, and the accuracy decrease as the number of stages increases. If this is true, the Xu and Muir (1992) method could not give the breeder the opportunity to implement an unlimited number of stages, because the expected genetic gain per trait, the selection response, and the accuracy would decrease as the number of stages increases and would soon be null.

In the DMLPSI context, Xie et al. (1997) compared the estimated single-stage LPSI selection response with the estimated DMLPSI selection response for two and three stages and found that at Stages 2 and 3, the estimated total DMLPSI selection response explained only 92 and 87%, respectively, of the estimated LPSI selection response. That is, at Stage 3, the estimated total DMLPSI selection response was lower (5%) than at Stage 2.

Another Way of Writing the OCMLPSI and DCMLPSI Vectors of Coefficients

We wrote the OCMLPSI and DCMLPSI vectors of coefficients (β_i = K_Oiδ_i and b_i = K_Diδ_i, respectively) as a projection of the OMLPSI vector of coefficients (δ_i = Q_ii^-1A_iw) into a space that is perpendicular to the space generated by the columns of matrix M_i (V_i), made by the projector matrices K_Oi and K_Di, which are idempotent (K_Oi = K_Oi² and K_Di = K_Di²). This is the simplest way of writing the OCMLPSI and DCMLPSI vectors of coefficients. However, there is another way of writing the OCMLPSI and DCMLPSI vectors of coefficients based on the Tallis (1985) approach.

The Tallis (1985) approach requires a proportionality constant which, according to Itoh and Yamada (1987), represents the regression coefficient of the net genetic merit (H = w′g) on Q_ii^-1V_i(V′_iP^-1V_i)^-1d₀, where d₀ is the DCMLPSI vector of predetermined restrictions. There are some problems associated with the proportionality constant. For example, if the proportionality constant is positive, it is appropriate for the DCMLPSI (OCMLPSI) vector of coefficients, and there is no problem; however, if the proportionality constant is negative, the indices will move the population means in the opposite direction to the predetermined desired direction.

Another Constrained Multistage Index

Xie and Xu (1997) developed a constrained multistage selection index as an extension of the DMLPSI developed by Xu and Muir (1992) based on the Tallis (1962) index. Using the Akbar et al. (1984) real data, we found that the Xie and Xu (1997) index was not optimum. The average of the estimated Xie and Xu (1997) selection response for p = 0.05, 0.10, 0.20, and 0.30 explained only 68.55% of the one-stage SCLPSI, whereas the average of the estimated total OCMLPSI and DCMLPSI index selection responses explained 97.60 and 90%, respectively, of the estimated SCLPSI. Similarly, the estimated total Xie and Xu (1997) expected genetic gain values per trait explained only 10.17% of the vector d’ = [3 -1] values for both stages. These results indicated that, in effect, the Xie and Xu (1997) index is not optimum and breeders should not use it.

Cerón-Rojas and Crossa (2018) applied the OCMLPSI to the Hicks et al. (1998) dataset, but they used the Young (1964) method to obtain the selection intensities for two stages; thus, their results were approximations because the Young (1964) method overestimated the selection intensities (see Ceron-Rojas et al., 2019, for details).

The Multivariate Normality Assumption of Both Indices

The multivariate normality assumption of the estimated OCMLPSI (DCMLPSI) values was the basis for developing the OCMLPSI (DCMLPSI) theory. Under this assumption, the total OCMLPSI selection response and expected genetic gain per trait for two or more stages are the sums of the responses and expected genetic gains per trait obtained at each stage. We corroborated the normality assumption at Stage 2 using histograms and normality tests. When at Stage 1 the number of genotypes was 500 and the total proportion retained was 5 or 30%, at Stage 2 the estimated OCMLPSI values approached the normal distribution in a similar manner as the estimated DCMLPSI values. These results indicate that the correlations between the estimated OCMLPSI values do not affect the normal distribution of the estimated OCMLPSI values, at least for the simulated dataset. This suggests that when the size of the population at Stage 1 is high (e.g., 500 or more), the correlations between the estimated OCMLPSI values do not affect the normal distribution of the estimated OCMLPSI values in a two-stage context.

CONCLUSIONS

We described the OCMLPSI and DCMLPSI theory and evaluated it in a two-stage context. Based on the estimated total selection response and the total expected genetic gain per trait of each index, we determined their efficiency using a real and a simulated dataset. We found that the OCMLPSI is the most efficient index for predicting the net genetic merit, and its accuracy for predicting the selection response and estimating the expected genetic gain per trait was higher than DCMLPSI accuracy for predicting the selection response and estimating the expected genetic gain per trait. Thus, breeders should use the OCMLPSI when making a selection, not the DCMLPSI.

Acknowledgments

We are thankful for the financial support provided by the Bill & Melinda Gates Foundation (for maize and wheat breeding programs), the CIMMYT CGIAR CRP (maize and wheat), as well as the USAID projects. We acknowledge the financial support provided by the Foundation for Research Levy on Agricultural Products (FFL) and the Agricultural Agreement Research Fund (JA) in Norway through Research NFR Grant 267806.

Abbreviations

BW: body weight
DCMLPSI: decorrelated constrained multistage linear phenotypic selection index
DMLPSI: unconstrained decorrelated multistage linear phenotypic selection index
EW: egg weight
LPSI: single-stage linear phenotypic selection index
OCMLPSI: optimum constrained multistage linear phenotypic selection index
OMLPSI: unconstrained optimum multistage linear phenotypic selection index
NMV: multivariate normal distribution
QTL: quantitative trait locus
REML: restricted maximum likelihood
RL: rate of lay
SCLPSI: single-stage constrained linear phenotypic selection index
SM: age at sexual maturity.

APPENDIX A. The Phenotypic Model to Estimate the Variance Components

In this work, we estimated matrices P and G (Eq. [1a]) using restricted maximum likelihood (REML) because this estimation method does not require a specific design or balanced data and can be used to estimate genetic and residual variance and covariance in any arbitrary pedigree of individuals. In addition, the expectation and maximization algorithm allows computing the REML for the variance components (Lynch and Walsh, 1998, Chapter 27; Cerón-Rojas and Crossa, 2018, Chapter 2).

Let

y_{q} = 1 μ_{q} + Z g_{q} + e_{q}

be the phenotypic model, where y_q is a g × 1 (g = the number of genotypes in the population) vector of phenotypic averages, which has multivariate normal distribution (NMV) with mean 1µ_q and covariance matrix V_q; 1 is a g × 1 vector of ones, µ_q is the mean of the qth trait, Z is a g × g identity matrix; g_q ~ NMV(0, Aσ²_g_q) is a vector of true breeding values, and e_q ~ NMV(0, Iσ²_e_q) is a g × 1 vector of residuals. Matrix A denotes the numerical relationship matrix between individuals (Lynch and Walsh, 1998), and V_q = Aσ²_g_q + Iσ²_e_q. We estimated σ²_g_q and σ²_e_q in the absence of dominance and epistatic effects.

Estimating Matrices G and P using the Expectation and Maximization Algorithm

The expectation and maximization algorithm allows computing the REML for the variance components σ²_g_q and σ²_e_q by iterating the following equations:

σ_{g_{q}}^{2 (n + 1)} = σ_{g_{q}}^{2 (n)} + \frac{{[σ_{g_{q}}^{2 (n)}]}^{2}}{g} \times \{y_{q}^{'} [T^{(n)} A T^{(n)}] y_{q} - tr [T^{(n)} A]\}

[A1]

and

σ_{e_{q}}^{2 (n + 1)} = σ_{e_{q}}^{2 (n)} + \frac{{[σ_{e_{q}}^{2 (n)}]}^{2}}{g} \times \{y_{q}^{'} [T^{(n)} T^{(n)}] y_{q} - tr [T^{(n)}]\}

[A2]

where, after n iterations, $σ_{g_{q}}^{2 (n + 1)}$ and $σ_{e_{q}}^{2 (n + 1)}$ are the estimates of the variance components σ²_g_q and σ²_e_q, respectively. In Eq. [A1] and [A2], tr(·) denotes the trace of the matrix within parentheses; $T = V_{q}^{-1} - V_{q}^{-1} 1 (1^{'} V_{q}^{-1} 1)^{-1} 1^{'} V_{q}^{-1}$, and $V_{q}^{-1}$ is the inverse of matrix V_q = Aσ²_g_q + Iσ²_e_q. In T⁽ⁿ⁾, $V_{q}^{-1 (n)}$ is the inverse of matrix $A σ_{g_{q}}^{2 (n)} + I σ_{e_{q}}^{2 (n)}$.

The additive genetic and residual covariances between the observations of the qth and ith traits, y_q and y_i (σ_giq and σ_eiq, respectively; q, i = 1, 2, ..., t), can be estimated with REML by adapting Eq. [A1] and [A2] as follows. The variance of the sum of y_i and y_q can be written as Var(y_i + y_q) = V_i + V_q + 2C_iq, where V_i = Aσ²_g_i + Iσ²_e_i is the variance of y_i and V_q = Aσ²_g_q + Iσ²_e_q is the variance of y_q; in addition, 2C_iq = 2Aσ_giq + 2Iσ_eiq = 2Cov(y_i, y_q) is twice the covariance of y_i and y_q, where σ_giq and σ_eiq are the additive and residual covariances, respectively, associated with the covariance of y_i and y_q. Thus, one way of estimating σ_giq and σ_eiq is using the following equation:

0.5 Var (y_{i} + y_{q}) - 0.5 Var (y_{i}) - 0.5 Var (y_{q})

[A3]

for which Eq. [A1] and [A2] can be used.
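The iteration in Eq. [A1] to [A3] can be sketched in Python with NumPy. This is only an illustrative sketch under stated assumptions, not the R package used by the authors: the function names (`projection_matrix`, `em_reml`, `em_reml_cov`), the starting values of 1.0, and the fixed number of iterations are all hypothetical choices made here for illustration.

```python
import numpy as np

def projection_matrix(A, sg2, se2):
    """T = V^-1 - V^-1 1 (1' V^-1 1)^-1 1' V^-1, with V = A*sg2 + I*se2."""
    g = A.shape[0]
    V_inv = np.linalg.inv(A * sg2 + np.eye(g) * se2)
    v1 = V_inv @ np.ones(g)                 # V^-1 1 (V^-1 is symmetric)
    return V_inv - np.outer(v1, v1) / v1.sum()

def em_reml(y, A, sg2=1.0, se2=1.0, n_iter=200):
    """Iterate Eq. [A1] and [A2] for one trait (assumed start values)."""
    g = len(y)
    for _ in range(n_iter):
        T = projection_matrix(A, sg2, se2)
        Ty = T @ y
        # y' T A T y = (Ty)' A (Ty) because T is symmetric
        sg2_new = sg2 + sg2**2 / g * (Ty @ A @ Ty - np.trace(T @ A))
        se2_new = se2 + se2**2 / g * (Ty @ Ty - np.trace(T))
        sg2, se2 = sg2_new, se2_new
    return sg2, se2

def em_reml_cov(y_i, y_q, A, **kwargs):
    """Eq. [A3]: covariances from the variance components of y_i + y_q."""
    v_sum = np.array(em_reml(y_i + y_q, A, **kwargs))
    v_i = np.array(em_reml(y_i, A, **kwargs))
    v_q = np.array(em_reml(y_q, A, **kwargs))
    return 0.5 * (v_sum - v_i - v_q)        # (genetic, residual) covariance
```

In practice, a convergence check on successive estimates would replace the fixed `n_iter`, and the relationship matrix A must differ from the identity matrix for the two variance components to be separable.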

Estimating the OCMLPSI and DCMLPSI Selection Response and Expected Genetic Gain per Trait at Stage i

By Eq. [A1] to [A3], the estimates of matrices P and G can be denoted as $\hat{P}$ and $\hat{G}$, and the estimated OCMLPSI and DCMLPSI selection responses (Eq. [8a] and [8b]) at Stage i are

{\hat{R}}_{O i} = k_{O i} \sqrt{{\hat{β}}_{i}^{'} {\hat{Q}}_{i i} {\hat{β}}_{i}}

[A4a]

and

{\hat{R}}_{D i} = k_{D i} \sqrt{{\hat{b}}_{i}^{'} {\hat{Q}}_{i i} {\hat{b}}_{i}}

[A4b]

whereas the estimated OCMLPSI and DCMLPSI expected genetic gains per trait (Eq. [9a] and [9b]) at Stage i are

{\hat{E}}_{O i} = k_{O i} \frac{{\hat{A}}_{i}^{'} {\hat{β}}_{i}}{\sqrt{{\hat{β}}_{i}^{'} {\hat{Q}}_{i i} {\hat{β}}_{i}}}

[A5a]

and

{\hat{E}}_{D i} = k_{D i} \frac{{\hat{A}}_{i}^{'} {\hat{b}}_{i}}{\sqrt{{\hat{b}}_{i}^{'} {\hat{Q}}_{i i} {\hat{b}}_{i}}}

[A5b]

respectively.
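As a minimal sketch of how Eq. [A4a] to [A5b] could be evaluated numerically, the Python functions below take the estimated coefficient vector (β̂_i or b̂_i), the matrices Q̂_ii and Â_i, and the selection intensity k as inputs. The function and argument names are hypothetical, and the numbers in the usage lines are made up purely to show the arithmetic.

```python
import numpy as np

def selection_response(k, coef, Q):
    """Eq. [A4a]/[A4b]: R = k * sqrt(coef' Q coef)."""
    return k * np.sqrt(coef @ Q @ coef)

def expected_gain(k, coef, Q, A_i):
    """Eq. [A5a]/[A5b]: E = k * A_i' coef / sqrt(coef' Q coef)."""
    return k * (A_i.T @ coef) / np.sqrt(coef @ Q @ coef)

# Illustrative (made-up) inputs for a two-trait index
coef = np.array([1.0, 0.0])
Q = np.eye(2)
A_i = np.array([[1.0, 2.0], [3.0, 4.0]])
R = selection_response(2.0, coef, Q)
E = expected_gain(2.0, coef, Q, A_i)
```

The same two functions serve both indices; only the coefficient vector and the selection intensity change between the OCMLPSI and the DCMLPSI.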

APPENDIX B. The OCMLPSI Selection Intensity for Two Stages

We describe a method to obtain the OCMLPSI selection intensity for fixed total proportion p = q₁q₂, where q₁ and q₂ are the proportions of individuals selected at Stage 1 and 2, respectively. This is because Cerón-Rojas et al. (2019) found that the Cochran (1951) and Young (1964) method overestimates the selection intensity. We obtained the DCMLPSI selection intensity according to the Xu and Muir (1992) method.

Let I₁ = β′₁x₁ and I₂ = β′₂y be the OCMLPSI at Stages 1 and 2, respectively, and assume that the indices have a bivariate normal distribution. Let I₁ and I₂ be transformed as u₁ = (I₁ - μ_I₁)/σ_I₁ and u₂ = (I₂ - μ_I₂)/σ_I₂, each with mean zero and variance 1.0, where μ_I₁ and μ_I₂ are the means, whereas σ_I₁ and σ_I₂ are the standard deviations of I₁ and I₂, respectively. The selected population has a bivariate left-truncated normal distribution with probability density function h(u₁, u₂) = f(u₁, u₂)/p, where p = q₁q₂,

f (u_{1}, u_{2}) = \frac{1}{2 π \sqrt{1 - ρ_{12}^{2}}} \exp [- \frac{1}{2 (1 - ρ_{12}^{2})} (u_{1}^{2} + u_{2}^{2} - 2 ρ_{12} u_{1} u_{2})]

and ρ₁₂ is the correlation between u₁ and u₂ (Young, 1964; Cerón-Rojas and Crossa, 2018, Chapters 2 and 9).

Consider the transformations (Springer, 1979) $v_{1} = u_{1}$ and $v_{2} = (u_{2} - ρ_{12} u_{1}) / \sqrt{1 - ρ_{12}^{2}}$, with Jacobian j, where

j^{- 1} = |\begin{matrix} \frac{\partial v_{1}}{\partial u_{1}} & \frac{\partial v_{1}}{\partial u_{2}} \\ \frac{\partial v_{2}}{\partial u_{1}} & \frac{\partial v_{2}}{\partial u_{2}} \end{matrix}| = |\begin{matrix} 1 & 0 \\ \frac{- ρ_{12}}{\sqrt{1 - ρ_{12}^{2}}} & \frac{1}{\sqrt{1 - ρ_{12}^{2}}} \end{matrix}| = \frac{1}{\sqrt{1 - ρ_{12}^{2}}}

where |·| denotes the determinant function and ∂ denotes the partial derivatives of v₁ and v₂ with respect to u₁ and u₂. Thus,

v_{1}^{2} + v_{2}^{2} = \frac{u_{1}^{2} + u_{2}^{2} - 2 ρ_{12} u_{1} u_{2}}{(1 - ρ_{12}^{2})}

and

g (v_{1}, v_{2}) = |j| f (u_{1}, u_{2}) = (\frac{1}{\sqrt{2 π}} e^{- 0.5 v_{1}^{2}}) (\frac{1}{\sqrt{2 π}} e^{- 0.5 v_{2}^{2}})

The transformations indicate that variables v₁ and v₂ are independent, each with a standard normal distribution.
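The factorization above can be checked numerically. The following Python sketch evaluates both sides of g(v₁, v₂) = |j| f(u₁, u₂) at one arbitrary point; the function names and the values of ρ₁₂, u₁, and u₂ are assumptions chosen only for illustration.

```python
import numpy as np

def bivariate_density(u1, u2, rho):
    """Standard bivariate normal density f(u1, u2) with correlation rho."""
    const = 1.0 / (2.0 * np.pi * np.sqrt(1.0 - rho**2))
    quad = (u1**2 + u2**2 - 2.0 * rho * u1 * u2) / (1.0 - rho**2)
    return const * np.exp(-0.5 * quad)

def to_independent(u1, u2, rho):
    """v1 = u1 and v2 = (u2 - rho*u1) / sqrt(1 - rho^2)."""
    return u1, (u2 - rho * u1) / np.sqrt(1.0 - rho**2)

def std_normal_pdf(v):
    """Height of the standard normal curve at v."""
    return np.exp(-0.5 * v**2) / np.sqrt(2.0 * np.pi)

rho, u1, u2 = 0.6, 0.8, 1.3                  # arbitrary illustrative values
v1, v2 = to_independent(u1, u2, rho)
jacobian = np.sqrt(1.0 - rho**2)             # |j|, since j^-1 = 1/sqrt(1 - rho^2)
lhs = jacobian * bivariate_density(u1, u2, rho)
rhs = std_normal_pdf(v1) * std_normal_pdf(v2)
```

Here `lhs` and `rhs` agree up to floating-point error, which is what the independence of v₁ and v₂ requires.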

Variables v₁ and v₂ are associated with the truncation points as u₁ = v₁ and $u_{2} = v_{2} \sqrt{1 - ρ_{12}^{2}} + ρ_{12} v_{1}$. This means that the u₁ and u₂ values should be obtained in two steps: first, we obtained the values of v₁ and v₂ from two independent standard normal distributions; then, we obtained the values of u₁ = v₁, $u_{2} = v_{2} \sqrt{1 - ρ_{12}^{2}} + ρ_{12} v_{1}$, q₁, and q₂. Finally, we obtained the OCMLPSI selection intensity at Stages 1 and 2 as

k_{1} = \frac{z (u_{1})}{q_{1}} and k_{2} = \frac{z (u_{2})}{q_{2}}

[A6]

respectively, where $z (u_{1}) = e^{- 0.5 u_{1}^{2}} / \sqrt{2 π}$ and $z (u_{2}) = e^{- 0.5 u_{2}^{2}} / \sqrt{2 π}$ are the heights of the ordinate of the normal curve at the lowest values of u₁ and u₂ retained, whereas q₁ and q₂ are the proportions of the population of animals or plants selected at each stage (Fig. 2). The values of Eq. [A6] should be associated with the maximum R_t = R₁ + R₂ value (Fig. 2), where R₁ = k₁σ_I₁ and R₂ = k₂σ_I₂ are the selection responses, whereas σ_I₁ and σ_I₂ are the standard deviations of I₁ and I₂ at Stages 1 and 2, respectively.
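The arithmetic of Eq. [A6] can be illustrated with a short Python sketch. It only maps given v₁ and v₂ values to truncation points and evaluates k = z(u)/q for a supplied proportion q; the search over v₁, v₂, q₁, and q₂ for the maximum R_t is not reproduced. All function names are hypothetical.

```python
import math

def ordinate(u):
    """z(u): height of the standard normal curve at truncation point u."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def truncation_points(v1, v2, rho):
    """u1 = v1 and u2 = v2*sqrt(1 - rho^2) + rho*v1."""
    return v1, v2 * math.sqrt(1.0 - rho**2) + rho * v1

def selection_intensity(u, q):
    """Eq. [A6]: k = z(u) / q for truncation point u and proportion q."""
    return ordinate(u) / q

# Single-stage check: truncating a standard normal at u = 0 retains q = 0.5
k_half = selection_intensity(0.0, 0.5)
u1, u2 = truncation_points(1.0, 0.5, 0.6)
```

With u = 0 and q = 0.5, the function returns about 0.798, the familiar selection intensity when half of the population is retained.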

Conflict of Interest

The authors declare that there is no conflict of interest.

Author Contribution Statement

J.J. Cerón-Rojas developed the statistical methods and the theory of both indices and wrote the manuscript. F.H. Toledo developed an R Package to estimate all the parameters needed to perform the data analysis. J. Crossa reviewed and corrected the manuscript. All the authors read and approved the final manuscript.

References

Akbar M.K., Lin C.Y., Gyles N.R., Gavora J.S., and Brown C.J. 1984. Some aspects of selection indices with constraints. Poult. Sci. 63:1899–1905. doi:10.3382/ps.0631899
Cerón-Rojas J.J., Crossa J., Arief V.N., Basford K., Rutkoski J., Jarquin D., et al. 2015. A genomic selection index applied to simulated and real data. G3: Genes, Genomes, Genet. 5:2155–2164. doi:10.1534/g3.115.019869
Cerón-Rojas J.J., and Crossa J. 2018. Linear selection indices in modern plant breeding. Springer, Cham, Switzerland. doi:10.1007/978-3-319-91223-3
Cerón-Rojas J.J., Toledo F.H., and Crossa J. 2019. The relative efficiency of two multi-stage linear phenotypic selection indices to predict the net genetic merit. Crop Sci. 59:2–1051. doi:10.2135/cropsci2018.11.0678
Cochran W.G. 1951. Improvement by means of selection. In: Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability. Univ. California Press, Berkeley, CA. p. 449–470. https://projecteuclid.org/euclid.bsmsp/1200500247 (accessed 7 June 2019).
Cunningham E.P. 1975. Multi-stage index selection. Theor. Appl. Genet. 46:55–61. doi:10.1007/BF00264755
Harville D.A. 1975. Index selection with proportionality constraints. Biometrics 31:223–225. doi:10.2307/2529722
Hazel L.N. 1943. The genetic basis for constructing selection indexes. Genetics 28:476–490.
Hicks C., Muir W.M., and Stick D.A. 1998. Selection index updating for maximizing rate of annual genetic gain in laying hens. Poult. Sci. 77:1–7. doi:10.1093/ps/77.1.1
Itoh Y., and Yamada Y. 1987. Comparisons of selection indices achieving predetermined proportional gains. Genet. Sel. Evol. 19:69–82. doi:10.1186/1297-9686-19-1-69
Kempthorne O., and Nordskog A.W. 1959. Restricted selection indices. Biometrics 15:10–19. doi:10.2307/2527598
Lande R., and Thompson R. 1990. Efficiency of marker-assisted selection in the improvement of quantitative traits. Genetics 124:743–756.
Lynch M., and Walsh B. 1998. Genetics and analysis of quantitative traits. Sinauer Assoc., Sunderland, MA.
Mallard J. 1972. The theory and computation of selection indices with constraints: A critical synthesis. Biometrics 28:713–735. doi:10.2307/2528758
Mardia K.V. 1980. Tests of univariate and multivariate normality. In: Krishnaiah P.R., editor, Handbook of Statistics 1: Analysis of variance. North-Holland Publ., Amsterdam. p. 279–320.
Mohd-Razali N., and Bee-Wah Y. 2011. Power comparisons of Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors and Anderson-Darling tests. J. Stat. Modeling Analytics 2:21–33.
Moreau L., Charcosset A., Hospital F., and Gallais A. 1998. Marker-assisted selection efficiency in populations of finite size. Genetics 148:1353–1365.
Namkoong G. 1970. Optimum allocation of selection intensity in two stages of truncation selection. Biometrics 26:465–476. doi:10.2307/2529102
Podlich D.W., and Cooper M. 1998. QU-GENE: A simulation platform for quantitative analysis of genetic models. Bioinformatics 14:632–653. doi:10.1093/bioinformatics/14.7.632
Rani Das K., and Rahmatullah Imon A.H.M. 2016. A brief review of tests for normality. Am. J. Theor. Appl. Stat. 5:5–12. doi:10.11648/j.ajtas.20160501.12
Saxton A.M. 1983. A comparison of exact and sequential methods in multi-stage index selection. Theor. Appl. Genet. 66:23–28. doi:10.1007/BF00281843
Shapiro S.S., and Wilk M.B. 1965. An analysis of variance test for normality (complete samples). Biometrika 52:591–611. doi:10.2307/2333709
Smith F.H. 1936. A discriminant function for plant selection. Ann. Eugen. 7:240–250. doi:10.1111/j.1469-1809.1936.tb02143.x
Springer M.D. 1979. The algebra of random variables. John Wiley & Sons, New York.
Tallis G.M. 1962. A selection index for optimum genotype. Biometrics 18:120–122. doi:10.2307/2527716
Tallis G.M. 1985. Constrained selection. Jpn. J. Genet. 60:151–155. doi:10.1266/jjg.60.151
Xie C., and Xu S. 1997. Restricted multistage selection indices. Genet. Sel. Evol. 29:193–203. doi:10.1186/1297-9686-29-2-193
Xie C., Xu S., and Mosjidis J.A. 1997. Multistage selection indices for maximum genetic gain and economic efficiency in red clover. Euphytica 98:75–82. doi:10.1023/A:1003074814916
Xu S., and Muir W.M. 1991. Multistage selection for genetic gain by orthogonal transformation. Genetics 129:963–974.
Xu S., and Muir W.M. 1992. Selection index updating. Theor. Appl. Genet. 83:451–458. doi:10.1007/BF00226533
Young S.S.Y. 1964. Multi-stage selection for genetic gain. Heredity 19:131–145. doi:10.1038/hdy.1964.11

