Skip to main content
Wiley Open Access Collection logoLink to Wiley Open Access Collection
. 2022 Feb 6;62(2):537–563. doi: 10.1002/csc2.20676

The statistical theory of linear selection indices from phenotypic to genomic selection

J Jesus Cerón‐Rojas 1,, Jose Crossa 1
PMCID: PMC9305178  PMID: 35911794

Abstract

A linear selection index (LSI) can be a linear combination of phenotypic values, marker scores, and genomic estimated breeding values (GEBVs); phenotypic values and marker scores; or phenotypic values and GEBVs jointly. The main objective of the LSI is to predict the net genetic merit (H), which is a linear combination of unobservable individual traits’ breeding values, weighted by the trait economic values; thus, the target of LSI is not a parameter but rather the unobserved random H values. The LSI can be single‐stage or multi‐stage, where the latter are methods for selecting one or more individual traits available at different times or stages of development in both plants and animals. Likewise, LSIs can be either constrained or unconstrained. A constrained LSI imposes predetermined genetic gain on expected genetic gain per trait and includes the unconstrained LSI as particular cases. The main LSI parameters are the selection response, the expected genetic gain per trait, and its correlation with H. When the population mean is zero, the selection response and expected genetic gain per trait are, respectively, the conditional mean of H and the genotypic values, given the LSI values. The application of LSI theory is rapidly diversifying; however, because LSIs are based on the best linear predictor and on the canonical correlation theory, the LSI theory can be explained in a simple form. We provided a review of the statistical theory of the LSI from phenotypic to genomic selection showing their relationships, advantages, and limitations, which should allow breeders to use the LSI theory confidently in breeding programs.

Core Ideas

  • Linear selection indices (LSIs) are useful in plant and animal breeding.

  • The main LSI objective is to predict the net genetic merit of individuals.

  • We did a complete review of the statistical theory of LSIs.

  • We described the bases of genomic LSIs.


Abbreviations

BLP

best linear predictor

BLUP

best linear unbiased predictor

BP

best predictor

CCA

canonical correlation analysis

CDMLPSI

constrained decorrelated multi‐stage linear phenotypic selection index

CLGSI

constrained linear genomic selection index

CLPSI

constrained linear phenotypic selection index

COMLPSI

constrained optimum multi‐stage linear phenotipic selection index

DMLPSI

decorrelated multi‐stage linear phenotypic selection index

ESIM

eigen selection index method

GEBV

genomic estimated breeding value

GESIM

genomic eigen selection index method

LGSI

linear genomic selection index

LMSI

linear molecular selection index

LPSI

linear phenotypic selection index

LSI

linear selection index

MESIM

molecular eigen selection index method

MSPE

minimum mean square prediction error

MVN

multivariate normal

OMLPSI

optimum multi‐stage linear phenotipic selection index

PPG‐ESIM

predetermined proportional gain eigen selection index method

QTL

quantitative trait loci

REML

restricted maximum likelihood

RESIM

restricted eigen selection index method

RL

rate of lay

RLPSI

restricted linear phenotypic selection index

SM

age at sexual maturity.

1. INTRODUCTION

Smith (1936) published a paper called “A discriminant function for plant selection,” in which he described a statistical method to select parents for the next selection cycle based on a linear combination of several joint quantitative traits. That method makes it possible to improve several traits that differ in additive variability, heritability, economic importance, and in the correlation among their phenotypes and genotypes in the plant and animal breeding context (Hazel et al., 1994). Hazel and Lush (1942) called that selection method “total score,” whereas Hazel (1943) called it “selection index,” and Cerón‐Rojas and Crossa (2018) called it “linear phenotypic selection index” (LPSI). Smith (1936) made a clear distinction between the LPSI (discriminant function) and the net genetic merit (H). The LPSI is a linear combination of observable phenotypic records, and H is an unobservable linear combination of trait breeding values (g) weighted by the trait economic values (W).

The main assumptions made by Smith (1936) to develop the LPSI theory are as follows: the vector of genotypic values g (the average of the phenotypic values across a [large] population of environments, or the genetic constitution of an organism or cell) is composed entirely of additive effects of genes and thus is the plant or animal breeding value, and H is the total individual genotypic value (Hazel & Lush, 1942; Kempthorne & Nordskog, 1959). Finally, the vectors of adjusted means of phenotypic values (y) and H have a joint multivariate normal distribution. By those assumptions, the joint distribution of H and LPSI is bivariate normal, whereas the conditional distribution of H given LPSI (and the conditional distribution of LPSI given H) is univariate normal. Under those assumptions, the regression of H on y is linear (Kempthorne & Nordskog, 1959), and it is possible to write the H and y values as a multiple linear regression model, where H and y are the dependent and independent variables, respectively, and both are random. In this model, the LPSI is the conditional expectation of H given y. Then, to predict H, one must find the values of the LPSI regression coefficients so that the LPSI values may best discriminate those individuals that have the highest H values (Smith, 1936).

Smith (1936) wrote his paper before the Eisenhart (1947) article, which highlighted that there are two fundamentally different explanatory variables, called “fixed” and “random” effects. One important result of this distintion is that whereas fixed effects influence only the mean of y, random effects influence only the variance of y and thus the LPSI and its parameters. Assuming that the population mean of H and y is zero, the selection response is the conditional expectation of H given the LPSI values, whereas the expected genetic gain per trait (or multi‐trait selection response) is the conditional expectation of g given the linear selection index (LSI) values (Cochran, 1951; Kempthorne & Nordskog, 1959; Smith, 1936). The estimates of these last parameters and the correlation between H and the LPSI are the main criteria to validate and compare the efficiency of any LSI.

Some authors (Finney, 1962; Hazel & Lush, 1942; Young, 1961) compared the LPSI efficiency with the independent culling method, where breeders sequentially select for a series of independent traits in the same population, and the tandem selection method, which involves sequential selection on a series of traits where repeated cycles of selection for a single trait are practiced until a desirable expression of the trait is reached. Those authors found that the LPSI selection response was higher than the selection response for the other two selection methods. Thus, under certain conditions (e.g., normality), the LPSI is the best option for selection.

Core Ideas

  • Linear selection indices (LSIs) are useful in plant and animal breeding.

  • The main LSI objective is to predict the net genetic merit of individuals.

  • We did a complete review of the statistical theory of the LSI.

  • We described the bases of genomic LSI.

The LPSI theory has been extended, and now a LSI can be a linear combination of marker scores (Lande & Thompson, 1990), genomic estimated breeding values (GEBVs) (Cerón‐Rojas & Crossa, 2019; Ceron‐Rojas et al., 2015), phenotypic values and marker scores (Lande & Thompson, 1990), and phenotypic values and GEBVs jointly (Dekkers, 2007). The last two indices are called “combined LSIs” in this review. An additional distinction is between single‐stage and multi‐stage LSIs (Cerón‐Rojas & Crossa, 2020a; Cerón‐Rojas et al., 2019a, 2019b; Cochran, 1951; Young, 1964; Xu & Muir, 1992). Multi‐stage LSIs are methods for selecting traits at different times or stages of the crop, such as the age at which breeders can measure and select the trait of economic interest and are used to save money and time because these indices do not need to have a large number of individuals in the selection process.

The main objectives of all LSIs are to predict H and select parents for the next generation; thus, the target of study of any LSI is not a parameter but rather the unobserved random H values. Additional objectives of LSIs are to maximize the selection response, the expected genetic gain per trait, and the correlation with H. By imposing restrictions equal to some predetermined values on the expected genetic gain per trait, several authors (Harville, 1975; Kempthorne & Nordskog, 1959; Mallard, 1972; Tallis, 1985) developed the constrained LPSI (CLPSI) theory. In addition, by applying the LPSI and CLPSI theories in the context of genomic selection, Ceron‐Rojas et al. (2015) and Cerón‐Rojas and Crossa (2019) developed the unconstrained and constrained linear genomic selection indices (LGSI and CLGSI, respectively), which are linear combinations of GEBVs. Cerón‐Rojas et al. (2016b) have shown that the CLPSI vector of coefficients is a linear combination of the LPSI vector of coefficients or a projection of the unconstrained LPSI vector of coefficients to a different space.

All the LSIs described above assume that the economic weights are known constants; however, for several reasons, this assumption is rarely fulfilled (Vandepitte & Hazel, 1977). Furthermore, several authors (Hazel, 1943; Lin, 1978; Yamada et al., 1975) have indicated that there is no one general method for assigning economic weights to the traits because those economic weights can change from time to time or vary from one location to another in breeding programs. For this reason, modified indices, such as the base index (Williams, 1962), modified base index (Smith, 1983), and non‐weighted multiplicative index (Elston, 1963), have been proposed in plant and animal breeding. However, those authors did not show that the latter LSIs maximize the correlation with the net genetic merit. In addition, it can be difficult to obtain the selection response and the expected genetic gain per trait for those indices.

The preceding problems led Cerón‐Rojas et al. (2008a, 2016a) to develop the eigen selection index method (ESIM), the restricted ESIM (RESIM), and the predetermined proportional gain ESIM (PPG‐ESIM) in the canonical correlation context (Hotelling, 1936). The ESIM is an unrestricted index, but the RESIM and PPG‐ESIM allow imposing null and predetermined restrictions, respectively, on the expected genetic gains of some traits, while the rest remain without restrictions. These indices do not require economic weights when predicting H and making selections because they use the first eigenvector of the matrix of multi‐trait heritabilities (in the ESIM case) or the first eigen vector of a linear transformation of the matrix of multi‐trait heritabilities (in the RESIM and PPG‐ESIM case) in the prediction. Cerón‐Rojas et al. (2008b) adapted the ESIM to the marker context and developed an index (molecular ESIM [MESIM]) similar to the linear molecular selection index (LMSI), whereas Cerón‐Rojas and Crossa (2018) adapted ESIM to the genomic selection context and developed a combined genomic eigen selection index method (genomic ESIM [GESIM]), similar to the Dekkers (2007) index.

All indices associated with the LPSI theory (unconstrained and constrained; phenotypic, genomic, multi‐stage, etc.) are particular cases of the best linear predictor (BLP) theory, which is based on the minimum mean square prediction error (MSPE), because this approach allows us to obtain the unique best MSPE predictor (Bickel & Doksum, 2015). The MSPE is the expected squared deviation between the true value and the estimate, and it provides an effective measure of proximity to the true parameter of interest (Carlin & Louis, 2000). On the other hand, all indices associated with ESIM are based on the canonical correlation analysis (CCA) theory (Hotelling, 1936).

Searle et al. (2006) have indicated that the best predictor (BP), BLP, and the best linear unbiased predictor (BLUP) are associated as follows. When all the parameters of the joint distribution of H and LSI are known, BP is available, whereas for BLP only the first and second moments are assumed to be known, and for BLUP the second but not first moments are assumed to be known. Under multivariate normality, BP and BLP are the same (Bickel & Doksum, 2015; Christensen, 2011; Harville, 2018); however, in practice, those approaches need estimates of their parameters. In this last case, the LSI could not be BLP (Searle et al., 2006). Nevertheless, with good estimates of the phenotypic and genotypic covariance matrices (e.g., by restricted maximum likelihood [REML] estimate), the BLP (or LPSI) is close to the BLUP (Christensen, 2011; Henderson, 1990).

Several authors (Baker, 1986; Bramscap, 1984; Dekkers & Settar, 2004; Hazel et al., 1994; Henderson, 1963; Lin, 1978; Moreau et al., 2007; Nordskog, 1978; Phillipsson et al., 1994) have given partial reviews of the LSIs theory, such as the unconstrained and constrained LPSI and the LMSI, among others. In addition, using minimum mathematical rigor, Cerón‐Rojas and Crossa (2018) wrote a user manual for breeders based on the LSI theory, and they gave many illustrative practical examples. In the current study, we reviewed the statistical theory of the LSIs from the unconstrained and constrained LPSI to the unconstrained and constrained LGSI. That is, based on the BLP theory, in the present work, we unified all the statistical theory of the LSIs associated with the LPSI (e.g., CLPSI, LGSI, CLGSI, etc.) from phenotypic to genomic selection. This requires greater mathematical rigor; as such, it is not possible to provide as many illustrative examples as Cerón‐Rojas and Crossa (2018) did. In addition, based on the CCA theory, we present a unified theory of the ESIM, RESIM, and PPG‐ESIM.

This review has four main parts. The first describes two earlier approaches to the unconstrained LSI theory and a description of the BLUP theory. The second describes the unconstrained LSI theory, including the LPSI, multi‐stage LPSI, LGSI, and combined LSI. Similarly, the third part contains the constrained LSI that includes the CLPSI, constrained multi‐stage LPSI, and CLGSI. Finally, in the fourth part, we describe ESIM, RESIM, PPG‐ESIM, and their relationships with the LPSI and CLPSI. The MESIM and GESIM are only variants of ESIM and the combined LSIs, which can be found in Cerón‐Rojas and Crossa (2018). The main objective of this review is to introduce researchers to the statistical theory of the LSIs. Our intended audience for this study is mainly plant breeders and graduate‐level students who are interested in LSI theory and who come from animal and plant breeding programs.

In subsection “Numerical example: LPSI vs ESIM,” we provide an illustrative example using the LPSI and ESIM theory. In subsection “RIndSel,” we introduce RIndSel (Alvarado et al., 2018; Pacheco et al., 2017; Perez‐Elizalde et al., 2014), an R‐software that is useful for estimating the LPSI, LGSI, CLPSI, ESIM, etc., parameters and for selecting individual candidates as parents for the next selection cycle. Furthermore, RIndSel is completely automated, which eliminates the need to learn a specialized syntax for anyone who is a novice user of R‐software. This means that the users only need to learn how to introduce their data into the program and how to interpret the RIndSel results. This software, and a complete RIndSel user manual, can be downloaded from https://data.cimmyt.org/dataset.xhtml?persistentId=hdl:11529/10854.

2. NOTATION

We use uppercase letters to denote random variables and lowercase letters to denote their particular realizations. For example, X is a random variable, whereas x (X = x) is a particular realization of X. Vectors are denoted with bold lowercase letters (e.g., g and y) and matrices with bold uppercase letters (X). Sometimes it is necessary to denote vectors with bold capital letters; for example, E denotes the vector of expected genetic gain per trait.

The expectation of the random variable X[E(X)] is scalar, whereas the conditional expectation of X given Y = y [E(X|Y = y)] is a function of y. Thus, before we observe Y, the value of E(X|Y = y) is unknown, so it is a random variable that we denote as E(X|Y).

3. TWO EARLY APPROACHES TO THE INDEX SELECTION THEORY

Pearson (1903) and Pearl and Surface (1909) were the first to describe the index theory before the Smith (1936) LPSI theory. As we shall see, Pearson (1903) predicted breeding values based on a linear mixed model, whereas Pearl and Surface (1909) used a nonlinear approximation to the index theory. In addition, whereas the Pearson (1903) index selects genotypes, the Pearl and Surface (1909) index selects phenotypes. This last index selects phenotypes because it does not maximize its correlation with the net genetic merit.

3.1. Pearson index

Suppose that X and Z are design matrices and that θ and g are vectors of fixed and random (e.g., breeding values) effects, respectively, associated to the trait of n individual candidates (y) to selection, and assume that y and g have multivariate normal (MVN) distribution, also denoted as “∼”, i.e., y ∼ MVN and g ∼ MVN. Consider the following linear mixed model:

y=Xθ+Zg+ε (1)

where y ∼ MVN(, V) is a n × 1 vector of observations for one trait, and V = σ g 2 ZAZ′ + I n σε 2 are the unconditional expectation and variance of y, respectively, and var(g) = Aσ g 2 is the unconditional variance of g ∼MVN (0, Aσ g 2); var(y|g) = I n σε 2 is the conditional variance of the vector of residuals ε ∼MVN (0, I n σε 2), 0 denotes the null expectations of g and ε, A is the numerator relationship matrix (Lynch & Walsh, 1998; Mrode, 2014), and I n is an identity matrix of size n × n.

Based on the multivariate normal distribution theory, Pearson (1903) predicted g as

g^=E(g|y)=σg2AZV1(yXθ^)

which now is called “single trait linear selection index” and where V −1 is the inverse matrix of the variance of y, whereas θ^=(XV1X)1XV1y (Xu, 2003). Henderson (1990) indicated that the above equation is one of the main Pearson (1903) results. Currently, g^=E(g|y)is called the BLUP of g and is one of the main tools to make genotypic selection in plant (Piepho et al., 2008) and animal (Mrode, 2014) breeding programs.

The BLUP properties have been developed by Henderson (1949, 1950, 1963, 1984, 1990). This author also showed that the BLUP and BLP (or LPSI) are similar; however, whereas the BLUP predicts vector g, the LPSI predicts a linear combination of g (i.e., the net genetic merit), as shown in Equations 24. In addition, Mrode (2014) indicated that BLUP reduces to LPSI when no adjustments for environmental factors are needed. This means that one of the main advantages of BLUP compared with BLP occurs when records have to be pre‐adjusted for fixed or for environmental effects. When the environmental effects are random, Xu (2003) writes Equation 1 as

y=1nμ+Xγ+Zg+Wπ+ε

where π is a vector of interaction genotypic–environment effects, W = XZ, and ⊗ denotes the Kronecker product (Lych & Walsh, 1998). In this case, E(y) = 1 n μ, where 1 n is a vector of n ones, μ is the population mean, V = Xvar(γ)X′ + σ g 2 ZAZ′ + Wvar(π)W′ + I n σε 2, and var(γ) and var(π) are the variance of γ and π, respectively. Note that in this last case, nongenetic effects (included in γ), such as location and year effects, are treated as random effects, whereas in Equation 1 those effects are treated as fixed and were included in θ. For additional details, see Xu (2003).

3.2. Pearl and Surface index

The Pearl and Surface (1909) approach to the index theory is based on the index numbers theory (Ralph et al., 2015). By this reasoning, those authors called to their index “selection index numbers.” The Pearl and Surface (1909) arguments to develop the selection index numbers are the same as those indicated by Smith (1936) in his paper. Pearl and Surface (1909) defined the selection index numbers as

I=υ1yυ2y

where y is a vector of traits of economic interest for the breeder that become more desirable as their values increase, and y* is a vector of traits of economic interest that become more desirable as their values decrease. In the above equation, ν 1 and ν 2 are vector of constants to be given arbitrary values in the proportions that the different variables are to be weighted. The Pearl and Surface (1909) index can be considered a nonlinear selection index and is does not maximize its correlation with the net genetic merit (Equation 2). As we have indicated in the Introduction, in this work we describe the LSIs associated to the LPSI theory and the ESIM index theory.

3.3. BLUP in the multi‐trait context

Equation 1 is only useful to predict the individual breeding values associated to one trait. However, it is easy to extend Equation 1 to the multi‐trait selection context, and it is possible to include in Equation 1 non‐additive genetic effects, such as dominant genetic effects (or intra‐locus dominance allelic interaction) and epistasis effects (or inter‐loci allelic interaction). Multi‐trait evaluation is the optimum methodology to evaluate plants and animals because this method increases the accuracy of predictions when it accounts for the phenotypic and genetic correlation between individual traits (Mrode, 2014). For t traits, we can write the vector of breeding and phenotypic values as g′ = [g1 g2g t ] and y′ = [y1 y2y t ], respectively, from where, for two traits, Equation 1 can be written as

y1=X1θ1+Z1g1+ε1

for Trait 1 and as

y2=X2θ2+Z2g2+ε2

for Trait 2. Thus, in this context, Equation 1 is equivalent to the following model:

y1y2=X100X2θ1θ2+Z100Z2g1g2+ε1ε2

where 0 is a null matrix and the other parameters are defined in a similar manner as in Equation 1. Nevertheless, in this case varg1g2=CA, where C=σg12σg12σg21σg22 is an additive genetic covariance matrix, and and A are as defined in Equation 1. In addition, var(y 1, y 2 | g 1, g 2) = RI n , where R=σε12σε12σε21σε12 is a residual covariance matrix. The unconditional covariance matrix of [y1y2] is V=vary1y2=Z[CA]Z+RIn, where Z=Z100Z2, and I n is as defined in Equation 1. It is possible to extend the above equation to any number of traits of economic interest (Mrode, 2014).

For one trait, g 1 can be partitioned into additive genetic (a 1) effects (or intra‐locus additive allelic interaction), dominant genetic (δ 1) effects, and epistasis (ι1) effects, such that g 1 = a 1 + δ1 + ι1. Epistasis refers to the interaction among additive and dominance genetic effects, for instance, additive by additive, dominance by dominance, additive by dominance, additive by additive by dominance, etc.; that is, ι1 = aa 1 + dd 1 + ad 1 + …

When we separate non‐additive genetic effects from additive effects, we remove the confounding factors that would otherwise bias the results of the analysis (Cockerham, 1954; Fisher, 1918; Holland, 2001; Kempthorne, 1954; Mrode, 2014). In this case, the variance of g 1 is var(g1)=Aσa2+Θσd2+(AA)σaa2+(AΘ)σad2+(ΘΘ)σdd2+

where Θ is the dominance relationship matrix, “” denotes the Hadamard product of two matrices or the element‐by‐element multiplication for unlinked loci (Gianola & de los Campos, 2009; Hill, 2010), σa 2 is the additive variance component, σd 2 is the deviation of dominance variance component, etc.

For t traits, asumming additive and dominance effects, and epistasis effects such as additive by additive, dominance by dominance, and additive by dominance, the variance of g′ = [g1 g2g t ] can be written as

CA+CDΘ+CAA(AA)+CDD(ΘΘ)+CAD(AΘ)

where C is as defined earlier; C D is the covariance matrix of the dominant genetic interactions of effects between alleles within loci; C AA, C DD, and C AD are the covariance matrices of the epistasis effects among loci; and A, Θ, ⊗, and “” are as defined above. Lynch and Walsh (1998) and Mrode (2014) have given details of how Equation 1 can be written for additive and non‐additive genetic effects jointly.

Although the above results seem very interesting, non‐additive effects are not transmitted to offspring in non‐inbreeding populations (Lynch & Walsh, 1998); the breeding value (or additive genetic effects a 1) is the only component that can be selected and is thus the main component of interest in breeding programs (Mrode, 2014). Furthermore, in the index selection context, Andersson et al. (1998) found that the introduction of the dominance variance (e.g., 25% of the total phenotypic variance) has only a small positive effect (<1.5%) on the selection response; thus, to describe the LSI theory, the assumption that only additive effects are transmitted from generation to generation seems acceptable. This last asumption implies that the combined effects of all factors that make y′ = [y1 y2y t ] different from g′ = [g1 g2g t ], such as non‐additive effects, are in the residual vector (Hazel & Lush, 1942). Mrode (2014) indicated that non‐additive effects tend to be confounded with others, such as common maternal environment; thus, breeders should take care when they estimate non‐additive effects. Du and Hoeschele (2000) described methods to estimate additive and non‐additive variance components in the Bayesian context. In a similar manner, using linear mixed models theory, in the phenotypic and genomic selection context, Lynch and Walsh (1998) and Su et al. (2012), respectively, described methods to estimate additive and nonadditive variance components.

As we will see later, to predict H, the LPSI uses only matrix C and includes matrix A only for the estimation of C (Cerón‐Rojas & Crossa, 2018). The LPSI is the BLP of H, not the BLUP of H (Bulmer, 1980; Cochran, 1951; Henderson, 1963). However, as we have indicated in Equation 1, the BLP (LPSI) and BLUP theory are related, but, as we have indicated earlier, whereas the BLUP predicts g′ = [g1 g2g t ], the LPSI predicts linear combination of the elements of g (see Equations (2), (3), (4)).

3.4. Matrix P as a special case of matrix V

We have indicated that in the LSI context the assumption that only additive effects are transmitted from generation to generation is acceptable. For t traits, matrix V is equal to

vary1yt=Z[CA]Z+RIn=V

where

Z=Z100Z20000Zt

whereas matrices C and R are of size t × t. Here, matrix V is associated with the prediction of vector g′ = [g 1g 2′ … g t ′] (Equation 1). Now suppose that matrix Z* = I nt (where I nt is an identity matrix of size nt × nt) and that the level of inbreeding among individuals is null; then A = I n , and the matrix V can be written as

V=C+R

As we shall see later, in this last case matrix V is associated with the BLP (LPSI) theory. Hereafter we denote it as P, which is the usual notation in the LPSI context.

By the foregoing results it is evident that, in the prediction of the net genetic merit (Equation 2), the LSI theory does not use matrix A. However, matrix A is used for the estimation of C (Cerón‐Rojas & Crossa, 2018). This seems to be a disaventage of the LPSI theory with respect to the BLUP theory. As we have indicated earlier, BLUP predicts vector g′ = [g 1g 2′ … g t ′], not the net genetic merit; this means that BLUP is associated only with a part of the LPSI theory: that associated to the expected genetic gain per trait (see Equations 8 and 10), which also predict the mean values of g′ = [g 1g 2′ … g t ′]. Jeyaruban et al. (1995, Abstract, p. 1) compared the efficiency of BLUP and the LPSI and concluded that: “The relative selection response with the selection indices compared to the BLUP estimates were from 94.5 to 99.4% of BLUP. Inbreeding was higher in the BLUP selected populations, which could offset any advantage of BLUP.”

4. UNCONSTRAINED LINEAR SELECTION INDICES

4.1. The LPSI theory

In this section, we describe the LPSI theory and all the LSIs associated to this theory.

4.2. Basic conditions to construct a valid LPSI in the single‐stage selection context

Conditions to construct a valid LPSI are as follows. (a) The phenotypic value is additively composed of two main parts: the genotypic value and the environmental contribution. (b) The genotypic value is the individual breeding value; thus, it is composed only of additive genes effects. (c) The net genetic merit of an individual is the genotypic economic value (Kempthorne & Nordskog, 1959). (d) Selection should be made at only one stage of the plant's or animal's life cycle. (e) The generations do not overlap. (f) All individuals below a certain level of desirability are culled without exception. (g) Selected individuals have an equal opportunity to have offspring (Hazel & Lush, 1942). (h) The LPSI values in the c th selection cycle and the LPSI values in the (c + 1)th selection cycle are not correlated. (i) The correlation between the LPSI and the net genetic merit should be at its maximum in each selection cycle. These conditions are valid for all unconstrained or constrained LPSIs in the single‐stage selection context. Cerón‐Rojas and Crossa (2018) have given conditions for the unconstrained and constrained LGSI.

4.3. The net genetic merit

Let g q ′ = [Gq 1 Gq 2Gqt ] be a vector of true unobservable genotypic random variables for t traits with multivariate normal distribution and null expectation. Then the q th (q = 1, 2, …, n; = number of individuals) individual net genetic merit is

Hq=wgq (2)

where w′ = [w 1 w 2wt ] is a 1 × t (t = number of traits) vector of known economic weights. The variance of Hq is denoted as σ H 2 = wCw. Smith (1936) called Equation 2 “the genetic value of a plant or line.”

In Equation 2, g q′ = [Gq 1 Gq 2Gqt ] is a vector of random variables whereas in the BLUP equation described earlier, g′ = [g1 g2g t ] is a vector of particular realization of breeding values. Note that using the BLUP of g′ = [g1 g2g t ] (i.e., g^=[g^1g^2g^t]), a predictor of Equation 2 is:

h^=j=1twjg^j

where h^=[H^1H^2H^n] is a vector of estimated net genetic merit values for n indviduals, and wj is the jth element of the vector of economic weights w (Dekkers, 1997; Portes et al., 2020). The predictor h^ combines the BLUP and LPSI theory to predict the net genetic merit.

4.4. The net genetic merit and its relationship with y

For the qth individual, the relationship between Equation 2 and the vector of random phenotypic variables y q = [Yq 1 Yq 2Yqt ] for t traits is

Hq=mH+β(yqμ)+eq (3)

where mH is the mean of Hq , μ′ = [μ1 μ2 … μ t ] is a vector 1 × t of phenotypic means of y q , and eq is the deviation representing the extent to which β′(y q μ) differs from the expected value of Hq (Young, 1964). In Equation 3, y q = [Yq 1 Yq 2Yqt ] is a vector of random variables, whereas in the BLUP equation described earlier, y′ = [y1 y 2y t ] is a vector of particular realization of phenotypes values for t traits. In addition, the jth (= 1, 2, …, t) element of vector μ is equivalent to (Equation 1) when X = 1 (a vector of 1s) and θ = μ j . We have assumed that eq has a normal distribution, null expectation, and variance σ e 2 and that the covariance between any pairs of eq and er (qr) is zero. As we will see later, β′(y q μ) = Iq is the LPSI. It is possible to show that the random error (eq ) and the LPSI are uncorrelated (Bickel & Doksum, 2015). The variance of Hq can be written as σ H 2 = wCw = σ I 2 + σ e 2, where σ I 2 = β′Pβ is the variance of Iq , and P = C + R is the phenotypic covariance matrix of y q = [Yq 1 Yq 2Yqt ]; C is as defined earlier, and R is the residual covariance matrix. All these last three matrices are of size t × t.

Equation 3 is a multiple linear regression model where the independent (y q ) and dependent (Hq ) variables are random and have a joint multivariable normal distribution (Rencher & Schaalje, 2008). Nevertheless, whereas in the standard regression model the values of Hq and y q are observables, in Equation 3 only the y q values are observable. For this reason, Hq should be predicted using, for example, the BLP.

4.5. The BLP of H

Hereafter we will denote the vector of phenotypic trait random variables as Y and its particular realization values as y; and, to simplify notation, we will denote Hq and Iq as H and I, respectively. Suppose that H and Y have a joint multivariate normal distribution and that we are looking for a linear predictor of H assuming that Y is the only information that we can use to predict the unobsevable random variable H. We want to find a predictor function of Y such that this predictor is “close” to H in the sense of the MSPE, which is a measure used in the mathematical theory of prediction (Bickel, 2015). Suppose that E(H) = mH , E(Y) = μ, and that var(H), var(Y) = P, and the covariance between H and Y [cov(H, Y) = wC] exist and are finite, then the BLP of H (or the LPSI) is

E(H|Y=y)=mH+wCP1(yμ)=β0+βy (4)

where β0 = mH βμ, wC and μ are as defined above, P −1 is the inverse of the matrix P, and β = P −1 Cw. Cochran (1951) was the first to indicate that the LPSI is the BLP of H. Equation 4 is unique and unbiased (Christensen, 2011; Searle et al., 2006) and represents a model that is linear in y as well as in β. In addition, it indicates that the problem of predicting H is simply estimating its conditional mean, whereas the LPSI variance is σ I 2 = βP β. The assumption of multivariate normal distribution of H and Y is a sufficient condition for Equation 4 to be linear; moreover, any function that is expressible as Equation 4 is regarded as linear even if the vector β depends on the joint distribution of H and Y (Bickel & Doksum, 2015; Harville, 2018). Smith (1936) called Equation 4 “discriminant function.”

By Equations (1), (2), (3), (4), the random vectors g and Y have joint multivariate normal distribution with mean E(g) = 0 and E(Y) = μ, and covariance matrix varYg=PCCC, where P and C are as defined above. Moreover, the distribution of LPSI and H = wg values is a bivariate normal distribution, with mean E(H) = mH and E(I) = mI and covariance matrix varHI=wCwwCβwCββPβ=σH2σHIσHIσI2, where β, w′, σ H 2, and σ I 2 were defined earlier and σ HI = wCβ is the covariance between H and LPSI.

As we will see later, because we can assume that β0 = mH βμ is a fixed contant, the maximized LPSI expected genetic gain per trait (Equation 10), the maximized correlation between H and LPSI (Equation 11), and the maximized LPSI selection response (Equation 12), are not affected if in Equation 4 we assume that E(H) = 0 and E(Y) = 0. Likewise, when we select parents for the next generation, they will be the same in both cases, that is, when E(H) = 0 and E(Y) = 0 and when E(H) = mH and E(Y) = μ. Thus, we think that there are not loss of generality or loss of accuracy to describe the LSI theory if we assume that E(H) = 0 and E(Y) = 0. In practice, E(Y) = 0 only if y is centered with respect to μ.

4.6. The LPSI prediction error

Because β = P −1 Cw, then σ HI = w and σ I 2 = βP β are the same, from where the variance of the LPSI prediction error is

var(H|I)=σH2σHI2σI2=σH2σI2=σe2 (5)

Equation 5 and equation σ e 2 = σ H 2(1 − ρ HI 2) are the same, where ρ HI 2 = (σ HI 2)/(σ H 2σ I 2) and σ HI 2 = (wCβ)2 are, respectively, the square of correlation and the square of covariance between H and LPSI. We defined the other parameters earlier.

The conditional mean of H is dependent on y (Equation 4); however, the conditional variance of H (Equation 5) is independent of y. Thus, under the multivariate normal assumption, Equations 4 and 5 provide a linear model with constant variance (Rencher & Schaalje, 2008). Moreover, if the marginal density of LPSI is normal and if the conditional density of H given LPSI is normal for every LPSI value, the assumption of bivariate normality of LPSI and H is guaranteed (Arnold et al., 1999; Kagan & Wesolowski, 1996).

4.7. The intensity of selection in the single‐stage context

Let I be transformed into a random variable T, as T = (ImI )/σ I , where E(I) = mI and σ1 is the standard deviation of I. Assume that all I values higher than I *value corresponding to any proportion of selected individuals (p) will be selected; then the variables T* = (I* − mI )/σ I and T are related to selection intensity (k) as

k=1pE(T)=1pττ2πexp12τ2dτ=z(τ)p (6)

where z(τ)=exp{0.5τ2}2π is the height of the ordinate of the normal curve at the lowest value of τ* retained, and p is the proportion of the population of animal or plant lines that will be selected (Smith, 1936). In the selected population, Equation 6 is the uncoditional expectation of the random variable T [E(T)] weighted by p and was obtained by Smith (1936). It is possible to show that the variance of T (weighted by p), over the interval (τ*, ∞), is equal to 1 + k(τ* − k). Several authors (Hazel & Lush, 1942; Kempthorne & Nordskog, 1959; Xu & Muir, 1992; Young, 1964) have given additional details of Equation 6.

4.8. The conditional expectation of H and g given I

Suppose that the expectations of H and g are null, whereas the expectation of I is E(I) = mI ; then the conditional expectation of H and g given I is, respectively,

E(H|I)=Cov(H,I)σI2(ImI)=βCwσIImIσI=βCwσIT (7)

and

E(g|I)=Cov(g,I)σI2(ImI)=CβσI(ImI)σI=CβσIT (8)

where cov(H, I) = βCw is the covariance between H and I, cov(g, I) = C β is the covariance between g and I, T = (ImI )/σ I , and σ I 2 = βP β and σ1 are the variance and standard deviations of I, respectively. In Equation 7, cov(H, I)/σ I 2 is the relationship between E(H|I) and I (or the regression of H on I), and cov(g, I)/σ I 2 is the relationship between E(g|I) and I in Equation 8.

4.9. The LPSI selection response, expected genetic gain per trait, and correlation

By Equations (5), (6), (7), the LPSI selection response (R) and expected genetic gain per trait (E) are, respectively,

R=kβCwσI=kσHρHI (9)

and

E=kCβσI (10)

where k was defined in Equation 6, σH=wCw is the standard deviation of H, and ρ HI = σHI H σ I is the correlation between H and LPSI. All the other parameters of Equations 9 and 10 were defined earlier. Smith (1936) obtained Equation 9, and Hanson and Johnson (1957) were the first to describe Equation 10. In Equation 9, R is a scalar; in Equation 10, E is a vector t × 1 (t = number of traits) of the expected genetic gain value of each trait. In addition, R is the expectation of H, and E is the expectation of g.

To maximize the selection response (Equation 9), Smith (1936) used differential calculus to derive Equation 9 with respect to β. The result reported by this author was the vector of regression coefficients β = P −1 Cw. He indicated that β = P −1 Cw should maximize the selection response. Thus, Smith (1936) did not use the BLP theory to develop the LPSI theory because using differential calculus is not necessary to obtain β = P −1 Cw in the BLP context (Equation 4).

4.10. The maximized correlation between H and I

Because β = P −1 Cw, the maximized correlation between H and I is

ρmax=β'Pβw'Cw=w'CP1Cww'Cw (11)

where all the terms were defined earlier. Equation 11 represents the proportion of the variance of H that can be attributed to the regression relationship with I. The covariance (σ HI ) and correlation are good measures of relationship only for variables with linear trends and are generally unsuitable for non‐normal random variables with a curvilinear relationship (Rencher, 2002). That is, Equation 11 is a measure of linear dependency between H and I and is called the “multiple correlation coefficient,” whereas ρHImax2=σI2σH2 is called “the coefficient of determination or squared multiple correlation” (Rencher & Schaalje, 2008). Note that while ρ HI = σ HI H σ I can take any value in the interval −1.0 and 1.0, Equation 11 gives the maximum value of ρ HI , and when ρ HI = 0, H and I are independent.

4.11. The maximized selection response

Once again, because β = P −1 Cw, the maximized LPSI selection response is

Rmax=kβ'Pβ=kw'CP1Cw (12)

Equation 12 predicts the mean improvement in H attributable to indirect selection on I only when β = P −1 Cw and is proportional to β'Pβ and k. Thus, whereas in Equation 9 the selection response can take any value, Equation 12 gives the maximum value of Equation 9. This is the main difference between the two equations.

Cerón‐Rojas et al. (2006) showed that Equation 12 is related to the Cauchy–Schwarz inequality, which corroborates that β = P −1 Cw is a global minimum when the mean squared difference between I and H, E[(HI)2], is minimized, and a global maximum when the ρ HI correlation between I and H is maximized because Equation 12 is true only when β = P −1 Cw.

4.12. The maximized correlation, selection response, and expected genetic gains have the same form for all LSIs

Equations 912 have the same form for all LSIs described in this review. Thus, as we will see later, the only difference between those equations and the selection response, expected genetic gain per trait, and correlation, associated with the other LSIs, is the type of information that the LSI vector of coefficients and the matrices contain.

4.13. Summary of the LPSI statistical properties

When H and I have a joint bivariate normal distribution; β = P −1 Cw; and P, C, and w are known, the statistical LPSI properties are the following:

  1. Equations 1012 are maximum.

  2. The variance of I and the covariance between H and I are the same.

  3. The maximized correlation between H and I is equal to Equation 11.

  4. The LPSI is unbiased in the sense that E(I) = E(H) (Searle et al., 2006).

  5. The variance of the predicted error (Equation 5) is minimal.

  6. The total variance of H explained by I is σ I 2 = ρ HI 2σ H 2.

  7. Selecting any upper fraction of the population based on values of I (Equation 4) ensures that for that selected proportion, E(H) is maximized (Equation 9). That is, with truncation selection, Equation 4 maximizes the expected mean of the selected individuals (Cochran, 1951; Henderson, 1990).

  8. The I values are appropriate to make selection, i.e., selecting the plants or animals with the highest predictions to be parents of the next generation (Searle et al., 2006).

  9. The LPSI is the BLP of H in the sense that it (a) is a linear function of y; (b) is “best” in the sense that it has minimum mean squared error of prediction; and (c) is optimal and unique among the class of linear predictors based on y (Robinson, 1991; Searle et al., 2006).

Cerón‐Rojas and Crossa (2018) showed that all the LSIs associated with the LPSI have similar properties as those described in Points 1–9, as exemplified by CLPSI and CLGSI. This occur because the CLPSI and CLGSI vector of coefficients are linear combinations of the LPSI and LGSI vector of coefficients or projections of these vectors to a different space.

4.14. Multi‐stage LPSIs

One stage is the animal or plant age at which breeders can measure and select the trait of economic interest. Cochran (1951) and Young (1964) combined the LPSI theory with the independent culling method and developed the optimum multi‐stage LPSI (OMLPSI), which can include several stages in each selection cycle. The OMLPSI takes into consideration the correlation among indices at different stages when making selections and requires multiple integration techniques to derive selection intensities. In addition, after the first selection stage, the OMLPSI values could be non‐normally distributed, and there are problems of convergence when the traits and the index values are highly correlated (Börner & Reinsch, 2012).

Xu and Muir (1992) developed the decorrelated multi‐stage LPSI (DMLPSI) as one possible solution to the OMLPSI problems. After the first selection stage, the DMLPSI values could be normally distributed, and this index does not require multiple integration techniques to derive selection intensities because it maximizes the correlation between the index and the net genetic merit at each stage under the restriction that the covariance between the DMLPSI values at different stages be zero. To obtain truncation points and selection intensities, Xu and Muir (1992) derived a set of nonlinear equations that can be solved iteratively using a multidimensional Newton method. One problem associated with DMLPSI is that the selection responses and correlation with H are lower than the OMLPSI selection response and correlation after the first selection stage (Cerón‐Rojas et al., 2019a, 2019b).

4.15. The OMLPSI

As we have indicated earlier, there is no loss of generality if we assume that E(H) = 0 and E(y) = 0 to describe the LSI theory. Thus, to simplify notation, hereafter we will assume that E(H) = 0 and E(y) = 0. These last two assumptions are common in the multi‐stage LPSI context (Xie & Xu, 1997, 1998; Xu & Muir, 1992; Young, 1964). In practice, E(y) = 0 only if y is centered with respect to μ = E(y).

Let y′ = [Y 1 Y 2Yt ] be a vector of random variables for t traits. We can partition this vector into S (ti  < St, where ti is the number of traits at Stage i) subvectors as y′ = [ω1 ω2ω S ], where ωi=[Y1Y2Yti] is a subvector of y at Stage i. Thus, at Stage S, t=i=1Sti is the total number of selected traits from vector y and, for each stage, the BLP (or OMLPSI) of H is

E(H|ω1)=β1ω1,E(H|ω1,ω2)=β2ω2,,E(H|ω1,ω2,,ωS)=βSy (13)

where ω′*2 = [ω1 ω2]. In Equation 13, the BLP changes at each stage; however, the form of this predictor is the same as that of the LPSI.

At Stage i, let Ii = β i ω i be the index, where βi=[βi1βi2βiti] is the OMLPSI vector of coefficients, and ω i and ti are as defined earlier; then we can construct the OMLPSIs as follows:

I1I2IS=β10β1β200β1β2βSω1ω2ωS

This last result indicates that until Stage S − 1, each index is partial, but at Stage S, IS = β1ω1 + β2 ω 2 + … + β S ω S is a whole index. Young (1964) called to this procedure “the part and whole index selection,” whereas Xu and Muir (1992) called it “selection index updating” because as traits become available, each subsequent index contains all traits available up to that current stage. The above equation is valid for any unconstrained or constrained multi‐stage index.

4.16. The OMLPSI vector of coefficients

Let P = {P ij } be the covariance matrix of vector y′ =[ω1 ω2ω S ], where P ij = cov(ω i , ω j ) is the ij th (i, j = 1, 2, …, S) submatrix of P, and let Λ = {cov(ω i , g)}′ = [C 1 C 2C S ] be the covariance matrix between ω i (i = 1, 2, …, S) and g, where cov(ω i , g) = C i is a submatrix of matrix Λ at Stage i. Moreover, let Qii and Λi=[C1C2Ci] be submatrices of P and Λ until Stage i (Cerón‐Rojas et al. 2019a), respectively, then the OMLPSI vector of coefficients that maximizes Equations 9 and 10 (and the correlation between OMLPSI and H) at Stage i is

βi=Qii1Λiw (14)

where Q ii −1 is the inverse of matrix Q ii , and w′ = [w 1 w 2w t] is as defined earlier.

4.17. Maximized selection response, expected genetic gain per trait, and correlation at Stage i

By Equation 14, the maximized selection response (Rmaxi), expected genetic per trait (Emaxi), and correlation (ρmaxi) at Stage i, respectively, are

Rmaxi=kiβiQiiβi,Emaxi=kiΛiβiβiQiiβi,andρmaxi=βiQiiβiwCw (15)

where ki is the selection intensity. For all stages, the total selection response and expected genetic gain per trait are RT=i=1SRmaxi and ET=i=1SEmaxi, respectively.

4.18. DMLPSI

4.18.1. Covariance matrix to make the DMLPSI values among stages null

Let ω( i −1) = [ω1 ω2ω i −1] and ω( i ) = [ω1 ω2ω i ] (where ωi=[Y1Y2Yti] is a subvector of y′ = [Y 1 Y 2Yt ] at Stage i) be subvectors of y until Stages i − 1 and i, and let ti −1 and ti −1 + ti be the size of vectors ω( i −1) and ω( i ), respectively; then the covariance matrix between ω( i −1) and ω( i ) (Q ( i −1) i ) will be of size ti −1 × (ti −1 + ti ) and can be written as

Qi1i=σ11σ12σ1(ni1)σ21σ22σ2(ni1)σ(ni1)1σ(ni1)2σ(ni1)(ni1)σ1ni1+niσ1ni1+niσ(ni1)ni1+ni

Matrix Q ( i –1) i is a nonsquare and nonsymmetric phenotypic variance matrix and is useful by making the DMLPSI and constrained decorrelated multi‐stage LPSI (CDMLPSI) values independent among stages. By definition, a covariance matrix is symmetric; however, in this case, matrix Q ( i –1) i is of size ti −1 × (ti −1 + ti ) because until Stages i − 1 there are ti −1 traits, whereas at Stage i there are ti −1 + ti traits. This means that at Stage i − 1 we have an index with ti −1 traits, whereas at Stage i we have an index with ti –1 + ti ; that is, the trait selected until Stage i − 1 and the traits selected until Stage i. By this reasoning, as we indicated earlier, Young (1964) called this procedure “the part and whole index selection,” whereas Xu and Muir (1992) called it “selection index updating” because as traits become available, each subsequent index contains all traits available up to that current stage.

4.18.2. The DMLPSI vector of coefficients at Stage i

Let Ii −1 = b i −1 ω i −1 and Ii = b i ω i be the DMLPSIs at Stages i – 1 and i, respectively, and let J i −1 = [I 1 I 2Ii −1] be a vector of DMLPSIs values until Stage i − 1. We need to obtain the DMLPSI vector of coefficients at Stage i such that the covariance between Ii and J i −1 will be null [i.e., cov(Ii , J i –1) = 0]. By this restriction, Ii and Ii −1 are not correlated; hence the name “decorrelated multi‐stage linear phenotypic selection index.” Xu and Muir (1992) showed that cov(Ii , J i –1) = B i −1 Q ( i −1) i b i , where, Bi1 is an upper triangular matrix (Cerón‐Rojas et al. 2019a) where each column contain the vector of coefficients of the DMLPSI, bi1=[b(i1)1b(i1)2b(i1)ni1] is the DMLPSI vector of coefficients at Stage i − 1, Q ( i −1) i is as defined above, and b i is the DMLPSI vector of coefficients at Stage i.

Let S i ( i −1) = Q i ( i −1) B i −1 and S i ( i −1) = S ( i −1) i = B i −1 Q ( i −1) i be the transpose of matrix S i ( i −1), and assume that matrices Q ii , Q i ( i −1), and Λ i and vector w are known. To minimize the mean squared difference between H = wg and Ii = b i ω i (i.e., E[(HIi )2]), under the restriction S ( i −1) i b i = B i −1 Q ( i −1) i b i = 0, it is necessary to minimize the function

f(bi,ξ)=biQiibi2wΛibi+2ξS(i1)ibi

with respect to vectors b i and ξ′ = [ξ1 ξ2 … ξ i −1], where ξ is a vector of Lagrange multipliers. In matrix notation, the derivative results of b i and ξ are

biξ=QiiSi(i1)S(i1)i01Λiw0

from where the DMLPSI vector of coefficients that minimizes E[(HIi )2] at Stage i can be written as

bi=Kiβi (16)

where K i = [I i Φ i ], Φ i = Q ii −1 S i ( i −1)[S ( i −1) i Q ii −1 S i ( i −1)]−1 S ( i −1) i , β i = Q ii −1 Λ i w (Equation 14), Q ii −1 is the inverse of matrix Q ii , and I i is an identity matrix of the same size as Q ii . When the restriction cov(Ii ,J i −1) = 0 is not imposed, β i = b i , and at Stage 1 (= 1), S i ( i −1) = S 1(0) = Q 1(0) B 0 = 0, Φ 1 = Q 11 −1 S 1(0)(S (0)1 Q 11 −1 S 1(0))−1 S 1(0) = 0, and K i = K 1 = [I0] = I, where I is an identity matrix of the same size as Φ 1. Thus, b 1 = K 1 β 1 = Q 11 −1 Λ 1 w = β 1, where Q 11 −1 is the inverse of matrix Q 11 −1, Λ 1 = C 1, and cov(ω 1,g) = C 1. The method to obtain Equation 16 is valid for all selection indices associated with DMLPSI.

The DMLPSI expected genetic gain per trait, selection response, and correlation at Stage i are similar to those of Equation 15 changing β i for b i = K i β i . The only difference between the DMLPSI and OMLPSI is the way the selection intensities are obtained. Xu and Muir (1992) described a general method to obtain the DMLPSI selection intensity, whereas Cerón‐Rojas et al. (2019a, 2019b) described a method to obtain the OMLPSI selection intensity in the two‐stage context. Methods to obtain the DMLPSI and OMLPSI selection intensities are valid for both the unconstrained and constrained multi‐stage indices.

4.18.3. OMLPSI selection intensity

As we have seen in Equation 6, the selection intensity (k), in the single‐stage context, is related to the height of the ordinate of the normal curve z(τ*) and the proportion selected (p) as k = [z(τ*)]/p. Kempthorne and Nordskog (1959) have indicated that

p=τ12πexp12τ2dτ

from where it is evident that k = [z(τ*)]/p is a left truncated normal distribution (Arismendi, 2013; Wilhelm & Manjunath, 2010). A truncated distribution is a conditional distribution resulting when the domain of the parent distribution is restricted to a smaller region (Hattaway, 2010). The equation k = [z(τ*)]/p is also called the “hazard function,” the “hazard rate,” or the “inverse Mill's ratio” of the normal distribution (Rausand & Hϕyland, 2004).

Young (1964) showed that the intensity of selection for the two‐stage case could be obtained in a similar manner as k = [z(τ*)]/p (Equation 6). Thus, suppose that indices I 1 and I 2 have a joint normal distribution and can be transformed into the standardized random normal variables T 1 = (I 1m 1)/σ1 and T 2 = (I 2m 2)/σ with mean zero and variance 1, where m 1 and m 2 are the means, and σ1 and σ2 are the standard deviations of I 1 and I 2, respectively. In this case, the method of selection is to retain animals or plants with T 1c 1 at Stage 1 and T 1 + T 2c 2 at Stage 2, where c 1 and c 2T 1 are truncation points for I 1 and I 2, respectively. Thus, according to Young (1964), the selected population has the bivariate left truncated normal distribution with probability density function given by

hT1,T2(τ1,τ2)=fT1,T2(τ1,τ2)p

where

fT1,T2(τ1,τ2)=12π1ρ122×exp12(1ρ122)τ12+τ222ρ122τ12τ22

and ρ12 is the correlation between T 1 and T 2. The total proportion selected (p = q 1 q 2) can now be written as

p=q1q2=c1c2τ1fT1,T2(τ1,τ2)dτ2dτ1

where c 1 and c 2 – τ1 are truncation points for I 1 and I 2, respectively. It is evident that when the number of stages increases, the number of variables in fT1,T2(τ1,τ2), and the number of integrals in p also increases.

By the foregoing results, in the multi‐stage selection context it is usual to fix a proportion (p) before selection is carried out and then, based on the fixed p value, to determine the proportion qi (i = 1, 2, …, S) for each stage under the restriction

p=i=1Sqi

where S is the number of stages. In a two‐stage selection context, p = q 1 q 2. It is assumed that p is known but qi is unknown. One way of solving the equation p = q 1 q 2 is by trial and error, depending on the importance of each trait (Hazel & Lush, 1942) or each index at each stage. Cerón‐Rojas et al. (2019a, 2019b) have given a general method to estimate the OMLPSI selection intensities in the two‐stage context, whereas Xu and Muir (1992) have provided a general method to obtain the selection intensities for the DMLPSI. The methods to obtain the selection intensities described above are valid for any multi‐stage index. Until now, there has been no general procedure to obtain the OMLPSI selection intensity for any number of stages.

4.19. LGSI

Togashi et al. (2011) originally proposed the LGSI, which is a linear combination of GEBVs; however, Ceron‐Rojas et al. (2015) and Cerón‐Rojas and Crossa (2019) developed the complete unconstrained and contrained LGSI theory. These authors designed the LGSI for selecting genotypes in inbred populations; that is, LGSI exploits the linkage disequilibrium between markers and quantitative trait loci (QTL) produced when inbred lines are crossed to select parents for the next selection cycle. This index uses all available marker information in the prediction because it obtains the GEBV by multiplying the genomic best linear unbiased predictor of all estimated marker effects in the training population by the marker code values of the testing population (VanRaden, 2008).

4.19.1. The genomic breeding values

Let u j be an m × 1 vector of QTL additive effects associated with markers that affect the jth trait, and let L be a matrix n × m (= number of individual and m = number of markers in the population) of coded marker values (e.g., 2 – 2q, 1 – 2q, and –2q for the marker genotypes AA, Aa, and aa, respectively); then

αj=Luj (17)

is the vector of individual genomic breeding values associated with the jth characteristic (j = 1, 2, …, t; t = number of traits) of the candidates for selection. We assumed that α j has multivariate normal distribution with mean 0 and covariance matrix Gσα 2, where σα 2 is the additive genomic variance of α j ; G = LL′|κ is the matrix of additive genomic relationship, κ=l=1m2ql(1ql) in an F2 population; ql is the frequency of allele A; and 1 − ql is the frequency of allele a for the lth marker (l = 1, 2, …, m).

4.19.2. The net genetic merit and its relationship with the genomic breeding values

The net genetic merit is related to the vector of genomic breeding values (Equation 17) as

H=βα+e (18)

where β′ = [β1 β2 … β t ] is the vector of regression coefficients, α′ = [α 1 α 2α t ], and e is the residual with normal distribution, null expectation, and variance σ e 2. It is possible to show that the covariance between βα and e equals zero.

4.19.3. The LGSI

The BLP of H is the conditional expectation of H given α, that is

E(H|α)=wΓΓ1α=wα (19)

where wΓ = cov(H, α)′ is the covariance between H and α, Γ −1 is the inverse matrix of the genomic covariance matrix Γ = var(α) = cov(α,g), w is as defined in Equation 2, and IG = wα is the LGSI.

4.19.4. The estimator of the LGSI

Let u^j be a predictor of the u j vector of QTL effects associated with the markers that affect the jth trait (Cerón‐Rojas & Crossa, 2019; Ceron‐Rojas et al., 2015); then the expectation of u j given u^j is E(uj|u^j)=u^j. This implies that the expectation of H, given the estimator of the LGSI,

I^G=w1α^1+w2α^2+·+wtα^t

is equal to

E(H|I^G)=I^G

where α^j=Lu^j is the vector of GEBVs associated with trait jth.

4.19.5. Selection response and expected genetic gain per trait

By Equations 810, the LGSI selection response and expected genetic gain per trait are, respectively,

RG=kw'ΓwandEG=kΓww'Γw (20)

We have previously defined all parameters of Equation 20. To simplify notation, in Equation 20 we omitted the intervals between selection cycles, which denote the time required for LGSI to complete one selection cycle (Ceron‐Rojas et al., 2015). Using the Cauchy–Schwarz inequality, Cerón‐Roja and Crossa (2020b) showed that in the ssymptotic context, when the number of markers and genotypes tend to be infinite, RG=kw'Γw is the the upper boundary for the unconstrained LPSI selection response. It is possible to show that, in the constext of the constrained LPSI selection response, there is a similar result (Cerón‐Roja & Crossa, 2019).

4.20. Relationships between the LPSI and LGSI selection response

In the asymptotic context, when the number of markers and genotypes tend to infinite, matrix Γ approaches matrix C, and, at the limit, we would expect that Γ = C (Cerón‐Rojas & Sahagún‐Castellanos, 2016). This means that matrix P = C + R approaches matrix P = Γ + R, from where P −1 = (Γ + R)−1 = Γ −1Γ −1(Γ −1 + R−1)−1 Γ −1 is the inverse of matrix P (Searle et al., 2006), whereas Γ −1 and R −1 are the inverses of matrices Γ and R, respectively. In the asymptotic context, let b = P −1 Γw be the LPSI vector of coefficients; then, the LPSI response can be written as

R=kbPb=kwΓP1Γw=kwΓww(Γ1+R1)1w

Assuming that bPb and wΓw are positive semi‐definite (i.e., bPb ≥ 0 and wΓw ≥ 0), then wΓww′(Γ −1 + R −1)−1 w ≥ 0, from where in the asymptotic context, RG R.

In the non‐asymptotic context, when the number of markers and genotypes is low, matrices C and P can be written as C = Γ + Ξ and P = (Γ + Ξ) + R, respectively, where Ξ = CΓ. Thus, the inverse of matrix P is

4.20.

from where b = P −1(Γ + Ξ)w, and the LPSI selection response (R) is

R=kbPb=kw(Γ+Ξ)P1(Γ+Ξ)w=kwΓw+wΞww[(Γ+Ξ)1+R1]1w

This last equation indicates that in the non‐asymptotic context, RG and R are related in three possible ways:

  1. R>RG if wΞw>w[(Γ+Ξ)1+R1]1w

  2. R=RG if wΞw=w[(Γ+Ξ)1+R1]1w

  3. RG>R if wΞw<w[(Γ+Ξ)1+R1]1w

By Points 2 and 3, in the non‐asymptotic context, RG may be equal to or larger than R depending on the size of w′ Ξ w and w′[(Γ + Ξ)−1 + R −1]−1 w.

5. COMBINED LINEAR SELECTION INDICES

A combined LSI to predict H (Equation 2) is a linear combination of phenotypic values and marker scores (LMSI) or phenotypic values and GEBVs (called combined LGSI). Both indices combine information on markers linked to QTL and phenotypic values of the traits in the prediction of H because it is impossible to identify all QTL affecting the economically important traits (Dekkers, 2007; Dekkers & Settar, 2004; Lande & Thompson, 1990; Li, 1998). The LMSI assumes that favorable alleles and their average effects on phenotype are known; however, this assumption is only valid for traits conditioned by major genes and not for quantitative traits that are influenced by the environment. This is important because many QTL with small effects could interact with the environment and among themselves (Heffner et al., 2009; Hospital et al., 1997). Linear molecular selection index efficiency depends on various factors, such as number and density of markers associated with QTL, population size, trait heritability, additive genetic variances explained by markers, and the precision of the estimated effect of gene substitution (Dekkers & Dentine, 1991). Several authors (Gimelfarb & Lande, 1994, 1995; Zhang & Smith, 1992, 1993) have pointed out the effectiveness of the LMSI in inbred populations with large population sizes and traits with low heritability values when only one trait and its associated molecular score are considered. Moreau et al. (2000, 2007) found that the LMSI was more effective than LPSI only in early‐generation testing and that LMSI had the additional disadvantage of increased costs due to molecular marker evaluation.

In the genomic selection context, Dekkers (2007) developed the “combined LGSI” as a possible solution to LMSI problems. This index uses GEBVs (no marker scores) and phenotypic values jointly to predict H and is very similar to the LMSI. We describe the LMSI, the combined LGSI, and the combined optimum and decorrelated multi‐stage LGSI (OMLGSI and DMLGSI, respectively) developed by Cerón‐Rojas and Crossa (2020a).

5.1. The LMSI molecular score

Suppose that the QTL effects combine additively both within and between loci; then the h th unobservable genetic value Gh can be written as

Gh=k=1Nϑkυk

where ϑ k is the effect of the QTL k th, υ k is the number of favorable alleles at the QTL k th (2, 1, or 0), and N is the number of QTL affecting the j th trait of interest. Because the QTL effect values are not observable, the Gh values are also not observable; however, we can use a linear combination of the markers linked to the QTL (sh ) that affect the jth trait to predict Gh as

sh=k=1Mbklk

where sh is a predictor of Gh , bk is the regression coefficient of the linear regression model, lk is the coded value of the k th markers (e.g., 1, 0, and −1 for marker genotypes AA, Aa, and aa, respectively), and M is the number of selected markers linked to the QTL that affect the j th trait. The above equation is called the “marker score” (Lande & Thompson, 1990). The number of selected markers is only a subset of potential markers linked to QTL in the population under selection; thus, the the variance of the sh values should be lower than or equal to the variance of the Gh values.

5.2. The combined LSI

The combined LSI is

IM=βyy+βCφ=[βyβC]yφ=βt (22)

where φ can be the vector of marker scores, s′ = [s 1 s 2st ], or the vector of genomic breeding values α (Equation 18). When φ = s, Equation 22 is the LMSI, whereas when φ = α, Equation 22 is the combined LGSI. Vector y was defined earlier, whereas β′ = [β y β C ], and t′ = [yφ′].

5.3. Vector of coefficients, maximized selection response, and correlation

Let var(s) = S be the covariance matrix of marker score and Γ as defined in Equation 19; then, according to the LPSI theory, the vector of coefficients that maximizes the combined LSI selection response, expected genetic gain per trait, and correlation is

β=O1Πν (23)

where when O=varys=PSSS and Π=vargs=CSSS, Equation 23 is the LMSI vector of coefficients, whereas when O=varyz and Π=vargz, Equation 23 is the combined LGSI vector of coefficients. Vector v′ = [w0′] is the vector of economic weights, where 0 is a null vector of size t × 1. The other parameters were previously defined.

5.4. The maximized combined LSI selection response

By Equation 23, the maximized LMSI and combined LGSI selection response of both indices is

RC=kβOβ (24)

where k is the selection intensity. When S and Γ are null matrices, β y = b = P −1 Cw (the LPSI vector of coefficients), whereas RC = R (the LPSI selection response). In addition, when the number of markers and genotypes increases, matrix Γ tends to matrix C and at the limit Γ = C, from where β y = 0, β C = w, and Equation 24 is equal to RG=kwΓw (Equation 20). That is, the weights and selection response of the LGSI and the combined LGSI will be equal.

5.5. The combined LGSI expected genetic gain per trait and correlation

The maximized combined LGSI expected genetic gain per trait (E C ) and correlation (ρHIC) are, respectively,

EC=kΠββOβandρHIC=βOβwCw (25)

where k is the selection intensity. Note that the denominator of ρHIC (Equation 25) and the denominator of the maximized LPSI correlation (Equation 11) are very similar; however, the numerator of both equations is different. We would expect that βOβbPb, and then ρHICρHI.

5.6. Asymptotic relationship between combined LGSI and LGSI parameters

The asymptotic relationship between E C (Equation 25) and the LGSI expected genetic gain per trait EG (Equation 20) is as follows. When Γ=C, Π=ΓΓΓΓ and β=0w, from where

EC=k2ΓwwΓw=2EG (26)

This means that in the asymptotic context, the CLGSI expected genetic gain per trait is twice the LGSI expected genetic gain per trait. Of course, 2 is only a proportionality constant; thus, in reality, EC=EG. Similarly, when Γ = C and β′ = [0w′], βO β = wΓw and wCw = wΓw in ρHIC(Equation 25); thus, the maximum correlation between H and IC in the asymptotic context will be equal to 1.0, as we would expect.

5.7. Combined multi‐stage LGSIs

Based on the LMSI, Xie and Xu (1998) developed a decorrelated multi‐stage LMSI. They assumed that their index would increase the efficiency of molecular assisted selection in a two‐stage context by first (Stage 1) selecting immature seedlings (embryos) based on molecular score information only, and later, at Stage 2, by selecting individual traits based on molecular score and phenotypic value information jointly. The Xie and Xu (1998) index has the same LMSI problems as previously indicated. For this reason, Cerón‐Rojas and Crossa (2020a) adapted the Dekkers (2007) index to the multi‐stage context using the OMLPSI and DMLPSI approaches. Both these indices use GEBVs and phenotypic information jointly to predict the net genetic merit; then they are free of the problems of the Xie and Xu (1998) index. In the multi‐stage context, the Cerón‐Rojas and Crossa (2020a) indices are as follows: at Stage 1, we can select immature (seedlings, embryos) based on GEBVs information only and, in Stage 2, we select individual traits based on GEBVs and phenotypic values jointly. This is the Xie and Xu (1998) idea but in the genomic selection context.

The approach based on the OMLPSI optimum index will be called “optimum combined multi‐stage LGSI” (OCMLGSI), whereas the index based on the DMLPSI will be called “decorrelated combined multi‐stage LGSI” (DCMLGSI) because, at stage two, both indices use GEBVs and phenotypic information jointly to predict the net genetic merit.

5.8. The OCMLGSI and DCMLGSI parameters at Stage 1

The LGSI and LPSI are part of the OCMLGSI and DCMLGSI. At stage‐one the estimator of the LGSI (I^G=w1α^1+w2α^2+·+wtα^t, where α^j=Lu^j is the vector of GEBV of trait j th) is used to predict H because at this stage, both indices select immature seedlings (or embryos) based only on GEBV information. The OCMLGSI and DCMLGSI maximized selection responses are, respectively,

RO1=kO1wΓwandRD1=kD1wΓw (27)

where kO1 and kD1 (the selection intensities for each index) are the only difference between RO1 and RD1. The maximized correlation between H = wg and IG = w′α is ρHI=wΓwwCw, where wΓw is the standard deviation of the variance of IG = wα, σH=wCw is the standard deviation of H = wg, and C is the covariance matrix of g.

5.9. Covariance matrices for OCMLGSI and DCMLGSI

The main difference between the multi‐stage LPSIs described earlier (OMLPSI and DMLPSI) and OCMLGSI and DCMLGSI is that, in addition to y′ = [Y 1 Y 2Yt ], the latter two indices incorporate α (Equation 18) in the prediction of H. Let η′ = [αy′]; then vector η can be partitioned into S subvectors as η′ = [αy′] = [ω1 ω2ω S ], where vector y has been partitioned into S − 1 subvector as y′ = [ω2 ω3ω S ]. This means that at Stage 1, ω 1 = α (Equation 18), and Ii = β i ω i is the index at Stage i (i = 2, 3, …, S), where βi=[βi1βi2βiti].

Let g′ = [G1 G2 … G t ] be a vector of true unobservable random breeding values for t traits, and let α, y, and ω 1 be as defined earlier. The covariance between α j and G j (j = 1, 2, …, t) is equal to the variance of α j [i.e., cov(α j , Gj ) = σα 2] (Dekkers, 2007); the covariance matrix of α is var(α) = Γ, and the covariance matrices between ω i (i = 1, 2, …, S) and g for S stages is

covω1,gcovω2,gcovωS,g=ΓC2CS=ΩΓ

where at Stage 1, ω 1 = α, cov(ω 1, g) = Γ is the genomic covariance matrix, and from Stage 2 to S, cov(ω i , g) = C i is the i th genotypic covariance submatrix of Ω Γ. Note that while the size of matrix Γ is t × t, the size of matrix C i will be ni × t, where ni is the size of vector ω i . For example, if g′ = [g 1 g 2 g 3 g 4], ω1 = [α 1 α 2 α 3 α 4], and ω1 = [Y 1 Y 2], then Γ will be of size 4×4, and C 2 will be of size 2×4, where 2 is the number of rows and 4 is the number of columns in matrix C 2. By the foregoing results, the covariance matrix of η′ = [αy′] = [ω′1 ω′2 … ω′ S ] is

=ΓΓ12Γ21P22Γ1SP2SΓS1PS2PSS

where all the gamma matrices are genomics matrices, while the others are phenotypic covariance matrices. Matrices Q ii and Λ i are constructed as

Qii=ΓΓ12Γ21P22Γ1iP2iΓi1Pi2PiiandΛi=ΓC2Ci

which are submatrices of and Ω Γ , respectively, until Stage i.

5.10. The OCMLGSI and DCMLGSI at Stage 2

At Stage 2, OCMLGSI and DCMLGSI is

IC=βGα+βyy=βGβyαy=β2t (28)

where y′ = [ω2 ω3ω S ] and α is as defined earlier; β G and β y are vectors of coefficients of genomic and phenotypic weight values, respectively, β2 = [β G β y ], and t′* = [αy′]. The only difference between OCMLGSI and DCMLGSI is the way vector β2 = [β G β y ] is obtained.

5.11. The OCMLGSI parameters at Stage 2

The OCMLGSI vector of coefficients (β 2) that maximizes the response, correlation, and expected genetic gain per trait (Equation 25) at Stage 2 is

β2=Q221Λ2ν (29)

where Q221 is the inverse of matrix Q22=ΓΓΓP22, where ω2=ω1ω2 and P22=var(ω2); Λ2=ΓC2, C2=cov(g,ω2), ν=0w is a vector where 0 is a null vector of size t × 1, and w is the vector of economic weights. The maximized selection response, and the maximized correlation between H and OCMLGSI, respectively, are

RO2=kO2β2Q22β2andρHIC=β2Q22β2wCw (30)

where kO2 is the OCMLGSI intensity at stage two. The total selection response for Stages 1 and 2 is ROt=RO1+RO2.

5.12. The DCMLGSI parameters at Stage 2

The process to obtain the DCMLGSI vector of coefficients is similar to that described in subsection “The DMLPSI vector of coefficients at Stage i,” where we described how to obtain the DMLPSI vector of coefficients (Equation 16). Subsequently, in this subsection we will only provide the main results of the DCMLGSI for Stage 2. Additional details associated with DCMLGSI are in Cerón‐Rojas and Crossa (2020a).

The DCMLGSI vector of coefficients at Stage 2 is

b2=K2β2 (31)

where K 2 = [I 2Φ 2], Φ 2 = Q 22 −1 S 21[S 12 Q 22 −1 S 21]−1 S 12, and β 2 = Q 22 −1 Λ 2 α (Equation 29); and I 2 is an identity matrix of the same size as matrix Q 22, S 21 = Q 21 B 1; and S21 = S 12 = B1 Q 12 is the matrix of constraints similar to the matrix defined in the DMLPSI context. Matrix K 2 transforms the OCMLGSI vector of coefficients (β 2) into Equation 31 and is the only difference with respect to Equation 29. The maximized DCMLGSI selection response and correlation for two stages can be written as in Equation 30 by changing β 2 = Q 22 −1 Λ 2 ν* by Equation 31 and the OCMLGSI selection intensity (kO2) by the DCMLGSI selection intensity (kD2). Finally, note that at Stage 2 B1 = [w0 S −1], where w = b 1 is the vector of economic weights and 0 S −1 is a null vector of size S − 1. The size of matrix Q 12 depend on the number of traits selected at Stage 2.

6. CONSTRAINED LINEAR SELECTION INDICES

By imposing null restrictions on Equations 10, Kempthorne and Nordskog (1959) developed the null restricted LPSI (denoted RLPSI) in the single‐stage context, which imposes restrictions equal to zero on the trait expected genetic gain. Later, Mallard (1972) generalized the RLPSI and developed an optimum constrained LPSI (CLPSI), which imposes restrictions different to zero and includes the RLPSI as a particular case. Both indices mainly affect the expected genetic gain per trait (Equation 10). Whereas in the single‐stage context the CLPSI vector of coefficients is a linear combination of the LPSI vector of coefficients (β = P −1 Cw), in the multi‐stage context, the constrained optimum multi‐stage LPSI (COMLPSI) vector of coefficients is a linear combination of the OMLPSI vector of coefficients (Equation 14). In turn, the CDMLPSI vector of coefficients is also a linear combination of the OMLPSI vector of coefficients.

6.1. Single‐stage CLPSI

6.1.1. CLPSI main objective and Mallard matrix

Assume that μ j is the population mean of the j th trait before selection. The main objective of the CLPSI is to change μ j to μ j + dj , where dj is a predetermined change in μ j imposed by the breeder on Equation 10. Let

N=dr00d10dr0d200drdr1

a Mallard (1972) matrix (r − 1) × r of predetermined proportional gains, where dj (= 1, 2, …, r) is the j th element of vector d′ = [d 1 d 2dr ], and let U′ be a matrix (tr) × t (t = number of traits; r = number of restricted traits) of 1s and 0s, where 1 indicates that the traits are restricted, and 0 indicates that the traits are unrestricted. The size of matrix U′ depends on the number of restricted traits.

6.1.2. Matrix U

We can construct matrix U′ as follows. Suppose that we restrict one of the t traits; then, we can restrict the first of them as U′ = [1 0 0 … 0], the second one as U′ = [0 1 0 … 0], etc. In addition, we can restrict the first and second traits jointly as U=10001000, the first and third traits as U=10000100, the second and third traits as U=01000100, etc. The procedure used to construct matrix U′ is valid for any number of restricted traits (Cerón‐Rojas & Crossa, 2018).

6.1.3. The CLPSI vector of coefficients

Let M′ = NΨ′ be a Mallard (1972) matrix of restrictions, where Ψ′ = UC is a matrix of null restrictions, and N′, U′, and C are as defined earlier. To obtain the CLPSI vector of coefficients we need to minimize the mean squared difference E[(IH)2] under the restriction Mβ = 0 assuming that P, C, U′, N′, and w are known; that is, we need to minimize the function

f(β,v)=βPβ+wCw2wCβ+2ξMβ

with respect to vectors β and ξ′ = [ξ1 ξ2 … ξ r −1], where ξ is a vector of Lagrange multipliers. The derivative results from β and ξ are

PMM0βξ=Cw0

from where the CLPSI vector of coefficients that minimizes E[(IH)2] under the restriction Mβ = 0 is

βC=Kβ (32)

where K = [I t D], D = P −1 M(MP −1 M)M′, and I t is an identity matrix of size t × t. It is possible to show that when N = U, the restriction Mβ = 0 changes to Ψβ = 0, from where Equation 32 changes to the RLPSI vector of coefficients, that is,

bR=KRβ (33)

where K R = [ID], D = P −1 Ψ(ΨP −1 Ψ)−1 Ψ′, P −1 is the inverse of matrix P, and I is an identity matrix t × t. If N = U and U is a null matrix, then b R = β (the LPSI vector of coefficients). This means that the CLPSI is the most general index and it includes the RLPSI and LPSI as particular cases.

Using restriction UCβ = θ*d, Tallis (1985) showed that Equation 32 can be written as

bT=bR+θδ (34)

where b R = K R β (Equation 34), δ* = P −1 Ψ(ΨP −1 Ψ)−1 d, d′ = [d 1 d 2dr ], and θ* = [βΨ(ΨP −1 Ψ)−1 d]/[d′(ΨP −1 Ψ)−1 d]. When θ* = 0, b T = b R , and if θ* = 0 and U′ is a null matrix, b T = β (Equation 14). In addition, when θ* = 1.0, Equation 34 is equal to b T = b R + δ*, which is not an optimum CLPSI (Mallard, 1972; Tallis, 1962). Itoh and Yamada (1987) have shown that Equation 34 can be written as Equation 32, and vice versa. Thus, Equations 32 and 34 express the same result in a different mathematical way.

There are some problems associated with θ*. For example, the θ* values should be higher than zero and θ* ≠ 1.0 because when the value of θ* is negative, the CLPSI moves the population trait means in the opposite direction to the desired direction (Itoh & Yamada, 1987). Problems associated with the RLPSI and CLPSI are (a) due to the constraints, the correlation between RLPSI and CLPSI with the net genetic merit decreases as the number of constraints increases and (b) the CLPSI may change the means in the opposite direction to the desired direction due to the opposite sign values between w and d.

According to Itoh and Yamada (1987), one possible solution to the above problems could be to use the desired gains LPSI theory (Brascamp, 1984; Itoh & Yamada, 1986) to make a selection. The most important aspect of the desired LPSI gains is that it does not require economic weights, which means that the covariance between I and H [Cov(H,I) = wC β] is not defined. In addition, this index maximizes neither the correlation between I and H IH ) nor the selection response (Equation 9). Another problem with this index is that it is not a predictor of H = wg. Thus, we think that the CLPSI is a better option to select than the desired gains LPSI.

6.2. The maximized CLPSI expected genetic gain per trait, correlation, and response

By changing vector β = P −1 Cw in Equations 32, 33, or 34 in Equations 1012, it is possible to obtain the maximized CLPSI expected genetic gain per trait, correlation, and selection response. That is, the LPSI and CLPSI parameters are equivalent, and the only difference is how the vector of coefficients is obtained.

6.3. COMLPSI

According to the single‐stage CLPSI (Equation 32) and OMLPSI theory described above, to obtain the COMLPSI vector of coefficients at Stage i we need to minimize the mean squared difference between the net genetic merit H = wg and the index Ii = β i ω i , i.e., E[(HIi )2], under the restrictions M i β i = 0, where M i ′ = NUΛ i ′ and Λ i ′ is as defined earlier. Supposing that matrices Q ii , U, and Λ i ′ and vectors d and w are known, then at Stage i it is necessary to minimize the function

fO(βi,v)=βiQiiβi2wΛiβi+2ξMiβi

concerning the vector of coefficients β i and the vector of Lagrange multipliers ξ′ = [ξ1 ξ2 … ξ r −1]. The COMLPSI vector of coefficients at Stage i is

βCi=KOiβi (35)

where β i is as defined in Equation 14, KOi=[IiFOi], FOi=Qii1Mi(MiQii1Mi)1Mi, and I i is an identity matrix of the same size as Q ii . When N = U, the vector of coefficients of Equation 35 is similar to Equation 33, and when N = U is a null matrix, Equation 35 is equal to Equation 14. Thus, the COMLPSI is more general than the OMLPSI and includes the OMLPSI and the multi‐stage null phenotypic restricted index as particular cases.

6.4. CDMLPSI

Xie and Xu (1997) extended the DMLPSI to a constrained DMLPSI (CDMLPSI); however, their approach is based on the single‐stage Tallis (1962) constrained index, which is not optimal. Based on the Mallard (1972) index (Equation 32), which is an optimal index, Cerón‐Rojas et al. (2019b) developed an optimal CDMLPSI, which we will describe in this subsection. The main difference between the CDMLPSI and the COMLPSI is that the COMLPSI imposes only one restriction when it solves the LPSI equations to obtain its vector of coefficients, whereas the CDMLPSI imposes the additional restriction that the covariance among the CDMLPSI values between stages be zero. As we shall see, the CDMLPSI vector of coefficients is a linear combination of the OMLPSI vector of coefficients (Equation 14).

Let Ii −1 = δ i −1 ω i −1 and Ii = δ i ω i be the CDMLPSI at Stages i − 1 and i, respectively, and let J i −1 = [I 1 I 2Ii −1] be a vector of CDMLPSIs values until Stage i − 1 so that the covariance between Ii = δ i ω i and J i −1 is null. To obtain δ i , we need to minimize the mean squared difference between H = wg and Ii = δ i ω i {E[(HIi )2]} under the joint restrictions M i δ i = 0 and cov(Ii , J i −1) = B( i −1) Q ( i −1) i δ i = 0, where M i = NU ′Λ′ i , and Λ i is as defined earlier. In this case, B i −1 is an upper triangular matrix (Cerón‐Rojas et al., 2019a, 2019b) where each column contains the vectors of coefficients of the CDMLPSI until Stage i − 1.

In a manner similar to the DMLPSI, let S i ( i −1) = Q i ( i −1) B ( i −1) and S i ( i −1) = S ( i −1) i ; then at Stage i, the CDMLPSI vector of coefficients that maximizes Equations 9 and 10 and the correlation is

δi=KDiβi (36)

where β i is as defined in Equation 14, KDi=[IiFDi], FDi=Qii1Vi(ViQii1Vi)1Vi, Vi=MiSi(i1), Vi=M"iS(i1)i, and I i is an identity matrix of the same size as Q ii . When matrix S i ( i −1) is null, Equations 36 and 35 are the same. When N = U, Equation 36 imposes null restrictions, giving us a null restricted DMLPSI index similar to Equation 33.

According to Equations 35 and 36, matrices KOi=[IiFOi] and KDi=[IiFDi] transform the OMLPSI vector of coefficients (β i ) into the COMLPSI and the CDMLPSI vectors of coefficients, respectively. In addition, the CDMLPSI expected genetic gain per trait, selection response, and the correlation at Stage i are the same as those of the OMLPSI when replacing β i with δi=KDiβi.

6.5. The single‐stage constrained LGSI

In a manner similar to the single‐stage CLPSI context (Equation 32), it can be shown (Cerón‐Rojas & Crossa, 2019) that the difference between the unconstrained LGSI described above and the constrained LGSI (CLGSI) is the projection matrix K G = [I t D G ], where D G = UN(NUΓUN)−1 NUΓ. Matrices N and U were defined in Equation 32, whereas I t is an identity matrix t × t. Matrix K G is idempotent and projects the LGSI vector of coefficients (w = β) into a space that is smaller than the original space of w. Thus, while w is the vector of coefficients of the unconstrained LGSI,

bG=KGw (37)

is the CLGSI vector of coefficients. In the CLGSI context, the selection response and expected genetic gain per trait are the same as those described in Equation 20 when changing w for b G = K G w.

7. ESIM, RESIM, and PPG‐ESIM

In the current review, we describe ESIM, RESIM, and PPG‐ESIM, which were developed in the canonical correlation context, and do not use economic weights when predicting H = wg and making selections. The other indices associated with ESIM (e.g., MESIM and GESIM) can be seen in Cerón‐Rojas and Crossa (2018).

7.1. ESIM

The fundamental difference between ESIM and the LPSI is the interpretation of the vector w. Whereas w is a vector of economic weights (known and fixed) in the LPSI context, in ESIM w is fixed but unknown and can be estimated in each selection cycle as a linear combination of the ESIM vector of coefficients. Moreover, in ESIM, the breeding values (g) and the phenotypic records (y) are two sets of variables that should have maximum correlation (λ j ), from where the CCA theory is the base of ESIM. The ESIM index can be written as I = b E y, where bE = [b 1 b 2b t] is the unknown index vector of coefficients, and t is the number of traits. In the CCA context, the vector of ESIM coefficients (b E ) is called the “canonical vector,” whereas λ j is called the “canonical correlation” (Wilms & Croux, 2016).

For the jth correlation (λ j ), the CCA theory allows us to write ESIM and H = wg as IE=bEjy and HE=wEjg, respectively, where wEj=C1PbEj, and bEjand λ j are obtained from

(P1Cλj2I)bEj=0 (38)

Note that bEj is the jth vector (j = 1, 2, …, t) of the multi‐trait heritability matrix P −1 C and maximizes the correlation ρHI = σHIHσI described in Equation 9 (Cerón‐Rojas & Crossa, 2018). Equation 38 is the ESIM fundamental results obtained by Cerón‐Rojas, Sahagún‐Castellanos, et al. (2008) and is the base for constructing RESIM, PPG‐ESIM, MESIM, and GESIM. To select with ESIM, breeders should use the first eigenvector (bE1) of matrix P −1 C in IE=bE1y and the first λ1 in the ESIM selection response.

7.2. Changing the signs and proportions of the ESIM vector of coefficients

Equation 38 can be written as (P1C)IbEj=λj2IbEj, where I = F −1 F is an identity matrix of size t × t (t = number of traits), and F = diag{f 1 f 1f t} is a diagonal matrix with values equal to any real number, except zero values. Thus, another way of writing Equation 38 is

[F(P1C)F1λj2I]βEj=0

where βEj=FbEj. Matrices P −1 C and F(P −1 C)F −1 are similar, and both have the same eigenvalues but different eigenvectors (Harville, 2008). When the F values are only 1s, vector bEj is not affected; when the F values are only −1s, vector bEj will change its direction, and if the F values are different from 1 and −1, matrix F will change the proportional values of bEj. In practice, bEj is first obtained from Equation 38 and then multiplied by matrix F to obtain βEj=FbEj; that is, βEj is a linear transformation of bEj. Matrix F(P −1 C)F −1 is called “similarity transformation,” and matrix F is called the “transforming matrix” (Cerón‐Rojas & Crossa, 2018). Vector βEj=FbEj can substitute bEj, and in this case, the optimized ESIM index should be written as IEj=βEjy.

7.3. The maximized ESIM selection response, expected genetic gain per trait, and correlation

Changing β = P −1 Cw (Equation 4) for bE1(Equation 38) in Equation 12, the ESIM selection response can be written similarly as the LPSI response. The same is true for the maximized expected genetic gain per trait, while the ESIM correlation is λ1, the square root of λ1 2. Cerón‐Rojas and Crossa (2018) showed the ESIM correlation could also be written as Equation 11.

There are some problems associated with ESIM. For example, when matrix P is not positive definite and matrix C is not positive semidefinite, the eigenvalues of Equation 38 could be negative, and, in such cases, ESIM should not be used to make selections. Nevertheless, Okamoto (1973) showed that when the number of observations was higher than the number of traits, P was symmetric and positive definite, and its eigenvalues were different with probability one. In turn, Gentle (2007) showed that when P and C had the same size, all eigenvalues of P −1 C existed and were finite. Thus, under the Okamoto (1973) and Gentle (2007) results, ESIM is a good option for making phenotypic selection in breeding programs when it is not possible to obtain vector w for the LPSI.

7.4. Summary of the ESIM statistical properties

  1. The variance of IE and the covariance between HE and IE are the same.

  2. The ESIM variance prediction error is (1λ12)σHE2, where σHE2 is the variance of HE .

  3. The relative effectiveness of IE in predicting HE is the ratio of (1λ12)σHE2 over σHE2 (i.e., 1 − λ1 2); thus, the greater λ1 2 is, the more effective IE is in predicting HE .

  4. The mean squared effect of IE on HE or the total variance of HE explained by IE is σIE2=λ12σHE2, and the relative mean squared effect can be measured by λ12.

Cerón‐Rojas and Crossa (2018) showed that all the above results are valid for RESIM, PPG‐ESIM, MESIM, and GESIM. Note that the above ESIM properties are very similar to the LPSI described earlier. This is so because the only diference among the LPSI an ESIM is the method to obtain the vector of coefficients.

7.5. The constrained eigen selection index method

Similar to the CLPSI (Equation 32), the objective of the predetermined proportional gain ESIM (PPG‐ESIM) is to fix r of t (r < t) traits by predicting only the genetic gains of (tr) of them. In Equation 32, we gave details associated with the CLPSI that will be used in the subsequent subsections. The statistical bases and objectives of the PPG‐ESIM are the same as those of the CLPSI described in Equation 32. That is, PPG‐ESIM tries to change μ j to μ j + dj , where dj is a predetermined change in μ j . In the RESIM, dj = 0 (j = 1, 2, …, r), where r is the number of predetermined proportional gains.

7.6. The PPG‐ESIM vector of coefficients

According to Cerón‐Rojas et al. (2016a), the PPG‐ESIM vector of coefficients that maximizes the PPG‐ESIM selection response and expected genetic gain per trait is

(KEP1CλP2It)bP=0 (39)

where matrices K E = [I t Q E ] and Q E = P −1 M(MP −1 M)M′ are the same as those obtained in Equation 32, I t is an identity matrix t × t, and λ P 2 and b p are the eigenvalue and the eigenvector of matrix K E P C, respectively. When N = U, b p = b R (the vector of coefficients of RESIM), and when U′ is a null matrix, b p = b E (the vector of coefficients of ESIM). That is, PPG‐ESIM is more general than RESIM and ESIM and includes the latter two indices as particular cases.

7.7. Relationships among ESIM, PPG‐ESIM, LPSI, and CLPSI

Cerón‐Rojas et al. (2008a, 2016b) described the relationships between the ESIM and PPG‐ESIM vector of coefficients and the LPSI and CLPSI vector of coefficients as follows. When the LPSI and CLPSI vector of coefficients (β and b C = Kβ, respectively) are in the space generated by the eigenvectors of Equations 38 and 39, respectively, β and b C can be written as

β=j=1tαjbEjandbC=j=1tajbPj

where α j * and aj * are constants that can be estimated by least squares, and bEj and bPj are the eigenvectors of Equations 38 and 39. Based on the Bessel inequality, Cerón‐Rojas et al. (2008a) described a method to determine conditions under which β and b C are in the space generated by the eigenvectors of Equations 38 and 39. The above results allowed Cerón‐Rojas et al. (2016b) to find, in the asymptotic context, approximate expectations and variances of the estimators of β and b C .

7.8. Estimators of the LPSI parameters

Cerón‐Rojas and Crossa (2018) have described the REML method to estimate the LSI parameters. Here, we consider the estimators of the LPSI vector of coefficients, selection response, correlation, and expected genetic gain per trait, as

β^=P^1C^w,R^=kβ^P^β^,ρ^HI=β^P^β^wC^w,andE^=kC^β^σ^I

respectively, and the estimated LPSI value will be expressed as I^=β^y. In the above equations, P^(P^1) and C^ are REML estimators of the phenotypic (P) and genotypic (C) covariance matrices, whereas σ^I=β^P^β^ is the estimator of the standard deviation of I^, and wC^w is the estimator of the standard deviation of H.

The REML method is based on projecting the data into a subspace free of fixed effects and maximizing the likelihood function in this subspace (Blasco, 2001). Thus, because P^ (P^1) and C^ are REML estimators, by the invariance property of maximum likelihood estimators (Rencher & Schaalje, 2008), β^,R^, ρ^HI, and E^ are maximum likelihood estimators and they are minimum variance and asymptotic unbiased estimators for β, R, ρ HI , and E (Cerón‐Rojas & Crossa, 2020b; Muirhead, 2005). Cerón‐Rojas and Crossa (2020b) showed that, in effect, R^ and ρ^HI are asymptotic unbiased estimators and these authors described methods to obtain confidence intervals for the expetations of R^ and ρ^HI. The REML estimators can also be used in the CLPSI, ESIM, combined LGSI, multi‐stage LSI, etc.

7.9. Numerical example: LPSI vs ESIM

We compare the ESIM efficiency versus LPSI efficiency using a real data set from commercial egg poultry lines obtained from Akbar et al. (1984). The estimated phenotypic (P^) and genetic (C^) covariance matrices among the rate of lay (RL, number of eggs), age at sexual maturity (SM, days), and egg weight (kg), were P^=240.5795.622.0795.62167.204.582.074.5822.80 and C^=29.8617.904.1317.9018.561.494.131.499.24, respectively. Furthermore, the number of genotypes and the vector of economic weights were n = 3,330 and w′ = [19.54 −3.56 17.01], respectively, and the selection intensity was 10% (k = 1.755) for both indices.

The estimated LPSI vector of coefficients was β^=wP^1C^=1.821.383.25, and the LPSI estimated selection response, expected genetic gain per trait, and correlation was R^=kβ^P^β^=74.91, E^=kβ^C^β^P^β^=[2.702.200.84], and ρ^HI=β^P^β^wC^w=0.362, respectively.

Because in the ESIM context b^E1b^E1=1, the best way of comparing ESIM results versus LPSI results is when the LPSI coefficient vector is normalized (i.e., when the LPSI coefficient vector is equal to β^*=β^/β^β^ and then β^β^=1); however, it can be shown that the normalization process only affects the estimated LPSI selection response because in that case, R^=74.91 is divided by β^β^. For example, for this data set result, β^β^=15.76; then, the estimated LPSI selection response using β^=β^/β^β^ will be R^=74.9115.74=4.75, while the rest of the estimated LPSI parameters will be the same. When 0<β^β^<1 and 1<R^, the values of R^ will increase, but when 1<β^β^, the values of R^ will decrease, as in this example. The product β^β^ does not affect ρ^HI because it is invariant to scale change. Also, β^β^ does not affect E^ because β^β^ appears in the numerator and denominator of both estimated parameters.

In ESIM, the sign and proportion of the expected genetic gain values for traits RL, SM, and egg weight should be according to the breeder's interest. For example, if the breeder's interest is that the expected genetic gain per trait for RL be positive and negative for SM, the sign and proportion of the values of the first eigenvector should be modified using a linear combination of b^E1 as described earlier (i.e., β^E1=Fb^E1) in order to achieve expected genetic gain per trait values in RL and SM.

The information needed to obtain the estimated ESIM parameters is matrix P^1C^. We need to find the eigenvalues and eigenvectors of the equation [(P^1C^)(P^1C^)μ^jI]b^Ej=0, where μ^j=λ^Ej4(Cerón‐Rojas & Crossa, 2018). In this case, μ^1=λ^E14=0.2115, λ^E12=0.4599 and the estimated ESIM correlation was λ^E1=0.6782. The estimated ESIM eigenvector of coefficients is b^E1=[0.17010.02590.9851], and the estimated ESIM index can be constructed as

I^E=0.170RL+0.026SM+0.985EW

The estimated ESIM selection response and expected genetic gain per trait were R^E=1.755b^E1P^b^E1=9.54 and E^E=1.755b^E1C^b^E1P^b^E1=[3.101.613.18], respectively. Because the estimated LPSI selection response was R^=(74.91/15.74)=4.75, the estimated ESIM selection response was higher than the estimated LPSI response for this data set.

Now suppose that the breeder's interest is to increase RL and decrease SM; then E^ is a good result but E^E is wrong. We can change the sign and proportion of E^E by transforming b^E1 as β^E1=Fb^E1 using a convenient matrix F such as

F=900010001.

In such a case, β^E1=b^E1F=[1.5310.0260.981], R^E=1.755β^P^β^=42.44, and E^E=1.755(β^C^)/(β^P^β^)=2.9901.850.205. However, vector β^E1 has not been normalized. To normalize β^E1, we need to divide it by β^E1β^E1=3.314, but β^E1β^E1 should only affect R^E=42.44, which should be divided by 3.314; that is, R^E=42.44/3.314=12.806. According to the theory of similar matrices (Harville, 2008), the estimated maximized ESIM correlation (λ^E1=0.6782) should not be affected by matrix F.

7.10. RIndSel

RIndSel is a powerful freely available Java software useful for estimating the LPSI, LGSI, CLPSI, ESIM, etc., parameters and selecting individual candidates as parents for the next selection cycle. It is available in the CIMMYT (for its Spanish acronym, or International Maize and Wheat Improvement Center) RIndSel Selection indices for plant breeding repository at https://data.cimmyt.org/dataverse/root?q=rindsel. RIndSel is compatible with Windows XP, 7, 8, and 10 and can be installed on 32‐bit and 64‐bit computers. To input the data information, the user must create an Excel file (saved as a comma‐delimited “.csv”) with the data information to be analyzed, from where the software reads that information and estimates the index parameters.

In the linear mixed model context (Searle et al., 2006), RIndSel uses REML and two experimental designs (a randomized complete block design and a lattice or alpha lattice design) to estimate the phenotypic and genotypic covariance matrices. This software presents the results of the estimated index parameters in two Excel files (e.g., alloutSmith.csv and outSmith.csv, where, in this case, “Smith” denotes the LPSI used for the analysis) and one text file (outextSmith.txt). File “alloutSmith.csv” contains all the trait adjusted means for all genotypes, and the estimated index values, whereas file “outSmith.csv” contains the genotypes and the trait values selected with the index values after these have been ranked. File “outextSmith.txt” contains all information associated with the estimated index parameters such as the estimated phenotypic and genotypic covariance matrices, the estimated selection response, etc. A complete user manual with instructions on how to use RIndSel in plant breeding is available at https://data.cimmyt.org/dataset.xhtml?persistentId=hdl:11529/10854.

7.11. Summary of the main results of this review

In this manuscript, we have summarized and expanded upon the results of the following authors:

  1. Smith (1936) established the LPSI theory on which all LSIs described in this work are based.

  2. Cochran (1951) was the first to indicate that the LPSI is the BLP of the net genetic merit and was the first to develop the two‐stage optimum LPSI.

  3. Kempthorne and Nordskog (1959) were the first to develop the RLPSI.

  4. Young (1964) combined the LPSI theory with the culling selection method and developed the OMLPSI in the multi‐stage context, from where Cerón‐Rojas et al. (2019b) developed the COMLPSI and Cerón‐Rojas and Crossa (2020a) developed a COMLGSI.

  5. Mallard (1972) developed the first optimum constrained LPSI, which allows imposing constraints diferent to zero on the expected genetic gain per trait and includes the RLPSI as a particular case.

  6. Xu and Muir (1992) developed the DMLPSI, from where Cerón‐Rojas et al. (2019b) developed a constrained optimum DMLPSI and Cerón‐Rojas and Crossa (2020a) developed a DCMLGSI.

  7. Cerón‐Rojas et al. (2008a) and Cerón‐Rojas et al. (2016b) developed the ESIM, RESIM, and PPG‐ESIM in the CCA context.

  8. Ceron‐Rojas et al. (2015) and Cerón‐Rojas and Crossa (2019) expanded the unconstrained and constrained LPSI to the genomic selection context and developed the unconstrained and constrained LGSI.

This work complements the classic characterizations of the LSI made by several authors (Bulmer, 1980; Cochran, 1951; Henderson, 1963) and indicates that the LSI theory is a general mathematical‐statistical framework that should allow breeders to confidently use it in plant and animal breeding programs.

8. CONCLUSIONS

Populations of plants or animals are sets of individuals that, in turn, are sets of phenotypic characteristics, such as color, weight, height, etc. Thus, when breeders select parents in a breeding program, they select individuals, not single characteristics, and the selection of any set of traits on the individuals changes not only the mean of the individual's unselected traits but also the correlation between selected and unselected traits (Pearson, 1903). Then, in breeding programs, breeders should select parents using LSI because the prediction procedures with univariate models do not contemplate the genetic correlations between traits. In practice, evaluating plants or animals requires several simultaneous traits. For example, breeders studying yield and grain quality register phenotypic data that include yield components (e.g., grain weight or biomass), grain quality (e.g., flavor, shape, color, nutrient content), and resistance to biotic and abiotic stresses (Jia & Jannink, 2012).

We have described the LSI statistical theory assuming that the net genetic merit and the vector of phenotypic values have multivariate normal distribution. Under this assumption, and when the phenotypic and genotypic covariance matrices are known, the LSIs are particular cases of the BLP theory. Furthermore, we found that regardless of the LSI characteristics (single‐stage or multi‐stage, unconstrained or constrained, etc.), the LSI parameters (selection response, expected genetic gain per trait, and correlation) are the same.

The BLP theory provides the mathematical‐statistical framework to develop the LSI theory. Thus, the LSI is a particular case or an application of the BLP to plant and animal breeding selection. This result allows breeders to locate the LSI theory in a general mathematical framework and then confidently use it when making selections. Nevertheless, some problems of the LSI not discussed in this work are associated with the estimation of the LSI parameters. For example, when the phenotypic and genotypic covariance matrices are estimated, the LPSI could not be BLP; indeed, it is not linear in y (Searle et al., 2006). Nevertheless, Cerón‐Rojas and Crossa (2020b) found that the LPSI is a good predictor of the net genetic merit including when REML estimates the phenotypic and genotypic covariance matrices. Because of this, breeders should use the LSI theory in plant breeding with confidence.

AUTHOR CONTRIBUTIONS

J. Jesus Cerón‐Rojas: Conceptualization; Investigation. Jose Crossa: Conceptualization; Investigation.

CONFLICT OF INTEREST

The authors do not have any conflict of interest.

ACKNOWLEDGMENTS

The authors are grateful to Dr. Roberto Fritsche‐Neto, Senior Scientist‐IRRI, for reading and revising the original version of the manuscript. We very much appreciate the contribution of one anonymus reviewer, who carefully read and corrected the original manuscript work several times during the past months. We introduced many important suggestions made by the anonymous reviewer that significantly improved the quality of the manuscript and its presentation and readability. We also thank Associate Editor Manjit Kang, who invested a significant time handling the revision of this study. Open Access fees are received from the Bill and Melinda Gates Foundation. We acknowledge the financial support provided by the Bill and Melinda Gates Foundation [INV‐003439 BMGF/FCDO Accelerating Genetic Gains in Maize and Wheat for Improved Livelihoods (AG2MW)] as well as USAID projects [Amend. No. 9 MTO 069033, USAID‐CIMMYT Wheat/AGGMW, AGG‐Maize Supplementary Project, AGG (Stress Tolerant Maize for Africa)]. We are also thankful for the financial support provided by the Foundations for Research Levy on Agricultural Products (FFL) and the Agricultural Agreement Research Fund (JA) in Norway through NFR grant 267806, as well as the CIMMYT CRP.

Cerón‐Rojas, J. J. , & Crossa, J. (2022). The statistical theory of linear selection indices from phenotypic to genomic selection. Crop Science, 62, 537–563. 10.1002/csc2.20676

Assigned to Associate Editor Manjit Kang.

REFERENCES

  1. Akbar, M. K. , Lin, C. Y. , Gyles, N. R. , Gavora, J. S. , & Brown, C. J. (1984). Some aspects of selection indices with constraints. Poultry Science, 63, 1899–1905. 10.3382/ps.0631899 [DOI] [Google Scholar]
  2. Alvarado, G. , Pacheco, A. , Perez‐Elizalde, S. , Burgueño, J. , & Rodriguez, F. M. (2018). RinSel: Selection indices with R. pp. 243–256. In Linear selection indices in modern plant breeding (p. 10). Springer. [Google Scholar]
  3. Andersson, E. , Spanos, K. , Mullin, T. , & Lindgren, D. (1998). Phenotypic selection compared to restricted combined index selection for many generations. Silva Fennica, 32(2), 111–120. 10.14214/sf.689 [DOI] [Google Scholar]
  4. Arismendi, J. C. (2013). Multivariate truncated moments. Journal of Multivariate Analysis, 117, 41–75. 10.1016/j.jmva.2013.01.007 [DOI] [Google Scholar]
  5. Arnold, B. C. , Castillo, E. , & Sarabia, M. (1999). Conditional specification of statistical models. Springer. [Google Scholar]
  6. Baker, R. J. (1986). Selection indices in plant breeding. CRC Press. [Google Scholar]
  7. Bickel, P. J. , & Doksum, K. A. (2015). Mathematical statistics: Basic ideas and selected topics (Volume 1, 2nd ed.). CRC Press Taylor & Francis Group. [Google Scholar]
  8. Blasco, A. (2001). The Bayesian controversy in animal breeding. Journal of Animal Science, 79, 2023–2046. 10.2527/2001.7982023x [DOI] [PubMed] [Google Scholar]
  9. Börner, V. , & Reinsch, N. (2012). Optimising multi‐stage dairy cattle breeding schemes including genomic selection using decorrelated or optimum selection indices. Genetics Selection Evolution, 44, 1–11. 10.1186/1297-9686-44-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Bramscap, E. W. (1984). Selection indices with constrains. Animal Breeding Abstract, 52(9), 645–654. [Google Scholar]
  11. Bulmer, M. G. (1980). The mathematical theory of quantitative genetics. Lectures in biomathematics. Clarendon Press. [Google Scholar]
  12. Carlin, B. P. , & Louis, T. A. (2000). Bayes and empirical Bayes methods for data analysis (2nd ed.). Chapman & Hall/CRC. [Google Scholar]
  13. Cerón‐Rojas, J. J. , Crossa, J. , Sahagún‐Castellanos, J. , Castillo‐González, F. , & Santacruz‐Varela, A. (2006). A selection index method based on eigen analysis. Crop Science, 46, 1711–1721. 10.2135/cropsci2005.11-0420 [DOI] [Google Scholar]
  14. Cerón‐Rojas, J. J. , Sahagún‐Castellanos, J. , Castillo‐González, F. , Santacruz‐Varela, A. , & Crossa, J. (2008). A restricted selection index method based on eigenanalysis. Journal of Agricultural Biological and Environmental Statistics, 4, 421–438. [Google Scholar]
  15. Cerón‐Rojas, J. J. , Castillo‐González, F. , Sahagún‐Castellanos, J. , Santacruz‐Varela, A. , Benítez‐Riquelme, I. , & Crossa, J. (2008). A molecular selection index method based on eigenanalysis. Genetics, 180, 547–557. 10.1534/genetics.108.087387 [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. Ceron‐Rojas, J. J. , Crossa, J. , Arief, V. N. , Basford, K. , Rutkoski, J. , Jarquín, D. , Alvarado, G. , Beyene, Y. , Semagn, K. , & DeLacy, I. (2015). A genomic selection index applied to simulated and real data. Genes Genomes Genetics, 5, 2155–2164. [DOI] [PMC free article] [PubMed] [Google Scholar]
  17. Cerón‐Rojas, J. J. , & Sahagún‐Castellanos, J. (2016). Multivariate empirical Bayes to predict the plant breeding values. Agrociencia, 50, 633–648. [Google Scholar]
  18. Cerón‐Rojas, J. J. , Crossa, J. , Toledo, F. H. , & Sahagún‐Castellanos, J. (2016a). A predetermined proportional gains eigen selection index method. Crop Science, 56, 2436–2447. 10.2135/cropsci2015.11.0718 [DOI] [Google Scholar]
  19. Cerón‐Rojas, J. J. , Crossa, J. , & Sahagún‐Castellanos, J. (2016b). Statistical sampling properties of the coefficients of three phenotypic selection indices. Crop Science, 56, 51–58. 10.2135/cropsci2015.03.0189 [DOI] [Google Scholar]
  20. Cerón‐Rojas, J. J. , & Crossa, J. (2018). Linear selection indices in modern plant breeding. Springer. 10.1007/978-3-319-91223-3 [DOI] [Google Scholar]
  21. Cerón‐Rojas, J. J. , & Crossa, J. (2019). Efficiency of a constrained linear genomic selection index to predict the net genetic merit in plants. Genes/Genomes/Genetics, 9, 3981–3994. 10.1534/g3.119.400677 [DOI] [PMC free article] [PubMed] [Google Scholar]
  22. Cerón‐Rojas, J. J. , Toledo, F. H. , & Crossa, J. (2019a). The relative efficiency of two multi‐stage linear phenotypic selection indices to predict the net genetic merit. Crop Science, 59, 1037–1051. 10.2135/cropsci2018.11.0678 [DOI] [PMC free article] [PubMed] [Google Scholar]
  23. Cerón‐Rojas, J. J. , Toledo, F. H. , & Crossa, J. (2019b). Optimum and decorrelated constrained multi‐stage linear phenotypic selection indices theory. Crop Science, 59, 2585–2600. 10.2135/cropsci2019.04.0241 [DOI] [PMC free article] [PubMed] [Google Scholar]
  24. Cerón‐Rojas, J. J. , & Crossa, J. (2020a). Combined multi‐stage linear genomic selection indices to predict the net genetic merit in plant breeding. G3: Genes/Genomes/Genetics, 10, 2087–2101. 10.1534/g3.120.401171 [DOI] [PMC free article] [PubMed] [Google Scholar]
  25. Cerón‐Rojas, J. J. , & Crossa, J. (2020b). Expectation and variance of the estimator of the maximized selection response of linear selection indices with normal distribution. Theoretical and Applied Genetics, 133(9), 2743–2758. 10.1007/s00122-020-03629-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
  26. Christensen, R. (2011). Plane answers to complex questions: The theory of linear models (4th ed.). Springer. [Google Scholar]
  27. Cochran, W. G. (1951). Improvement by means of selection. In J. Neyman (Ed.), Proceedings of the second Berkeley symposium on mathematical statistics and probability (pp. 449—470). University of California Press. [Google Scholar]
  28. Cockerham, C. C. (1954). An extension of the concept of partitioning hereditary variance for analysis of covariances among relatives when epistasis is present. Genetics, 39, 859–882. 10.1093/genetics/39.6.859 [DOI] [PMC free article] [PubMed] [Google Scholar]
  29. Crawley, M. J. (2013). The R book (2nd ed.). John Wiley & Sons. [Google Scholar]
  30. Dekkers, J. C. M. (2007). Prediction of response to marker‐assisted and genomic selection using selection index theory. Journal of Animal Breeding and Genetics, 124, 331–341. 10.1111/j.1439-0388.2007.00701.x [DOI] [PubMed] [Google Scholar]
  31. Dekkers, J. C. M. , & Settar, P. (2004). Long‐term selection with known quantitative trait loci. Plant Breeding Reviews, 24, 311–335. [Google Scholar]
  32. Dekkers, J. C. M. , & Gibson, J. P. (1997). Applying breeding objectives to dairy cattle improvement. Journal of Dairy Science, 81(2), 19‐35. [DOI] [PubMed] [Google Scholar]
  33. Dekkers, J. C. M. , & Dentine, M. R. (1991). Quantitative genetic variation associated with chromosomal markers in segregating populations. Theoretical and Applied Genetics, 81, 212–220. 10.1007/BF00215725 [DOI] [PubMed] [Google Scholar]
  34. Du, F.‐X. , & Hoeschele, I. (2000). Estimation of additive, dominance and epistatic variance components using finite locus models implemented with a single‐site Gibbs and a descent graph sampler. Genetics Research, 76, 187–198. 10.1017/S0016672300004614 [DOI] [PubMed] [Google Scholar]
  35. Eisenhart, C. (1947). The assumptions underlying the analysis of variance. Biometrics, 3(1), 1–21. 10.2307/3001534 [DOI] [PubMed] [Google Scholar]
  36. Elston, R. C. (1963). A weight‐free index for the purpose of ranking or selection with respect to several traits at a time. Biometrics, 19, 85–97. 10.2307/2527573 [DOI] [Google Scholar]
  37. Finney, D. J. (1962). Genetics gains under three methods of selection. Genetics Research, 3, 417–423. 10.1017/S0016672300003256 [DOI] [Google Scholar]
  38. Fisher, R. A. (1918). The correlation between relatives on the supposition of Mendelian inheritance. Earth and Environmental Science, 52, 399–433. 10.1017/S0080456800012163 [DOI] [Google Scholar]
  39. Gentle, J. E. (2007). Matrix algebra theory: Computations and applications in statistics. Springer. [Google Scholar]
  40. Gianola, D. , & de los Campos, G. (2009). Inferring genetic values for quantitative traits non‐parametrically. Genetics Research, 90, 525–540. 10.1017/S0016672308009890 [DOI] [PubMed] [Google Scholar]
  41. Gianola, D. , Perez‐Enciso, M. , & Toro, M. A. (2003). On marker‐assisted prediction of genetic value: Beyond the ridge. Genetics, 163, 347–365. 10.1093/genetics/163.1.347 [DOI] [PMC free article] [PubMed] [Google Scholar]
  42. Gimelfarb, A. , & Lande, R. (1994). Simulation of marker‐assisted selection in hybrid populations. Genetic Research, 63, 39‐47. 10.1017/S0016672300032067 [DOI] [PubMed] [Google Scholar]
  43. Gimelfarb, A. , & Lande, R. (1995). Marker‐assisted selection and marker‐QTL associations in hybrid populations. Theoretical and Applied Genetics, 91, 522‐528. 10.1007/BF00222983 [DOI] [PubMed] [Google Scholar]
  44. Hanson, W. D. , & Johnson, H. W. (1957). Methods for calculating and evaluating a general selection index obtained by pooling information from two or more experiments. Genetics, 42(4), 421–432. 10.1093/genetics/42.4.421 [DOI] [PMC free article] [PubMed] [Google Scholar]
  45. Harville, D. A. (1975). Index selection with proportionality constraints. Biometrics, 31, 223–225. 10.2307/2529722 [DOI] [PubMed] [Google Scholar]
  46. Harville, D. A. (2008). Matrix algebra from a statistician's perspective. Springer. [Google Scholar]
  47. Harville, D. A. (2018). Linear models and the relevant distributions and matrix algebra. CRC Press Taylor & Francis Group. [Google Scholar]
  48. Hattaway, J. T. (2010). Parameter estimation and hypothesis testing for the truncated normal distribution with applications to introductory statistics grades [Master's thesis]. https://scholarsarchive.byu.edu/etd/2053/
  49. Hazel, L. N. (1943). The genetic basis for constructing selection indexes. Genetics, 8, 476–490. 10.1093/genetics/28.6.476 [DOI] [PMC free article] [PubMed] [Google Scholar]
  50. Hazel, L. N. , Dickerson, G. E. , & Freeman, A. E. (1994). The selection index: Then, now, and for the future. Journal of Dairy Science, 77, 3236‐3251. 10.3168/jds.S0022-0302(94)77265-9 [DOI] [PubMed] [Google Scholar]
  51. Hazel, L. N. , & Lush, J. L. (1942). The efficiency of three methods of selection. Journal of Heredity, 33, 393–399. 10.1093/oxfordjournals.jhered.a105102 [DOI] [Google Scholar]
  52. Henderson, C. R. (1990). Statistical methods in animal improvement: Historical overview. In Gianola D. & Hammond K. (Eds.), Advances in statistical methods for genetic improvement of livestock (pp. 2–14). Springer‐Verlag. [Google Scholar]
  53. Henderson, C. R. (1984). Applications of linear models in animal breeding. University of Guelph. [Google Scholar]
  54. Henderson, C. R. (1963). Selection index and expected genetic advance. In Statistical genetics and plant breeding (pp. 141—163). National Academy of Science‐National Research Council. [Google Scholar]
  55. Henderson, C. R. (1949). Estimation of changes in herd environment. Journal of Dairy Science, 32, 706–715. [Google Scholar]
  56. Henderson, C. R. (1950). Estimation of genetic parameters. Annals of Mathematical Statistics, 21, 309–310. [Google Scholar]
  57. Heffner, E. L. , Sorrells, M. E. , & Jannink, J.‐L. (2009). Genomic selection for crop improvement. Crop Science, 49, 1–12. 10.2135/cropsci2008.08.0512 [DOI] [Google Scholar]
  58. Hill, W. G. (2010). Understanding and using quantitative genetic variation. Philosophical Transactions of the Royal Society, 365(B), 73–85. 10.1098/rstb.2009.0203 [DOI] [PMC free article] [PubMed] [Google Scholar]
  59. Holland, J. B. (2001). Epistasis and plant breeding. Plant Breeding Review, 21, 27–112. [Google Scholar]
  60. Hospital, F. , Moreau, L. , Lacoudre, F. , Charcosset, A. , & Gallais, A. (1997). More on the efficiency of marker‐assisted selection. Theoretical and Applied Genetics, 95, 1181–1189. 10.1007/s001220050679 [DOI] [Google Scholar]
  61. Hotelling, H. (1936). Relations between two sets of variables. Biometrika, 28, 321–377. 10.1093/biomet/28.3-4.321 [DOI] [Google Scholar]
  62. Itoh, Y. , & Yamanda, Y. (1986). Re‐examination of selection index for desired gains. Genetics, Selection, Evolution., 18(4), 499–504. 10.1186/1297-9686-18-4-499 [DOI] [PMC free article] [PubMed] [Google Scholar]
  63. Itoh, Y. , & Yamada, Y. (1987). Comparisons of selection indices achieving predetermined proportional gains. Genetics, Selection, Evolution., 19(1), 69–82. 10.1186/1297-9686-19-1-69 [DOI] [PMC free article] [PubMed] [Google Scholar]
  64. Jeyaruban, M. G. , Gibson, J. P. , & Gowe, R. S. (1995). Comparison of index selection and best linear unbiased prediction for simulated layer poultry data. Poultry Science, 74, 1566–1576. 10.3382/ps.0741566 [DOI] [PubMed] [Google Scholar]
  65. Jia, Y. , & Jannink, J.‐L. (2012). Multiple‐trait genomic selection methods increase genetic value prediction accuracy. Genetics, 192, 1513–1522. 10.1534/genetics.112.144246 [DOI] [PMC free article] [PubMed] [Google Scholar]
  66. Kempthorne, O. , & Nordskog, A. W. (1959). Restricted selection indices. Biometrics, 15, 10–19. 10.2307/2527598 [DOI] [Google Scholar]
  67. Kempthorne, O. (1954). The correlation between relatives in a random mating population. Proceedings of the Royal Society of London, 143(B), 102–113. [PubMed] [Google Scholar]
  68. Kagan, A. , & Wesolowski, J. (1996). Normality via conditional normality of linear forms. Statistics & Probability Letters, 29, 229–232. [Google Scholar]
  69. Lande, R. , & Thompson, R. (1990). Efficiency of marker‐assisted selection in the improvement of quantitative traits. Genetics, 124, 743–756. 10.1093/genetics/124.3.743 [DOI] [PMC free article] [PubMed] [Google Scholar]
  70. Li, Z. (1998). Molecular analysis of epistasis affecting complex traits. In Paterson A. H. (Ed.), Molecular dissection of complex traits (pp. 119–130). CRC Press. [Google Scholar]
  71. Lin, C. Y. (1978). Index selection for genetic improvement of quantitative characters. Theoretical and Applied Genetics, 52, 49–56. 10.1007/BF00281316 [DOI] [PubMed] [Google Scholar]
  72. Lynch, M. , & Walsh, B. (1998). Genetics and analysis of quantitative traits. Sinauer. [Google Scholar]
  73. McCulloch, C. A. , & Searle, S. R. (2001). Generalized, linear, and mixed models. John Wiley and Sons. [Google Scholar]
  74. Mallard, J. (1972). The theory and computation of selection indices with constraints: A critical synthesis. Biometrics, 28, 713–735. 10.2307/2528758 [DOI] [Google Scholar]
  75. Moreau, L. , Hospital, F. , & Whittaker, J. (2007). Marker‐assisted selection and introgression. pp. 718–751. In Balding D. J., Bishop M., & Cannings C. (Eds.), Handbook of statistical genetics (Volume 1, 3rd ed.), Wiley. [Google Scholar]
  76. Moreau, L. , Lemarié, S. , Charcosset, A. , & Gallais, A. (2000). Economic efficiency of one cycle of marker‐assisted selection efficiency. Crop Science, 40, 329–337. 10.2135/cropsci2000.402329x [DOI] [Google Scholar]
  77. Mrode, R. A. (2014). Linear models for the prediction of animal breeding values (3rd ed.), CABI. [Google Scholar]
  78. Muirhead, R. J. (2005). Aspects of multivariate statistical theory. John Wiley & Sons. [Google Scholar]
  79. Nordskog, A. W. (1978). Some statistical properties of an index of multiple traits. Theoretical and Applied Genetics, 52, 91–94. 10.1007/BF00281322 [DOI] [PubMed] [Google Scholar]
  80. Okamoto, M. (1973). Distinctness of the eigenvalues of a quadratic form in a multivariate sample. The Annals of Statistics, 1(4), 763–765. 10.1214/aos/1176342472 [DOI] [Google Scholar]
  81. Pacheco, A. , Pérez, S. , Alvarado, G. , Ceron, J. , Rodríguez, F. , Crossa, J. , & Burgueño, J. (2017). RIndSel: Selection indices for plant breeding. CIMMYT Research Data & Software Repository Network. [Google Scholar]
  82. Pearl, R. , & Surface, F. M. (1909). Selection index numbers and their use in breeding. The American Naturalist, 43(511), 385–400. 10.1086/279071 [DOI] [Google Scholar]
  83. Pearson, K. (1903). Mathematical contributions to the theory of evolution. XI. On the influence of natural selection on the variability of organs. Philosophical Transactions of the Royal Society, 200(A), 1–66. [Google Scholar]
  84. Perez‐Elizalde, S. , Cerón‐Rojas, J. J. , Crossa, J. , Fleury, D. , & Alvarado, G. (2014). Rindsel: An R package for phenotypic and molecular selection indices used in plant breeding. In Fleury D. & Whitford R. (Eds.), Crop breeding. Methods in molecular biology (methods and protocols) (Volume 1145). Humana Press. [DOI] [PubMed] [Google Scholar]
  85. Philipson, J. , Banos, G. , & Arnason, T. (1994). Present and future uses of methodology in dairy cattle selection index. Journal of Dairy Science, 77, 3252–3261. 10.3168/jds.S0022-0302(94)77266-0 [DOI] [PubMed] [Google Scholar]
  86. Piepho, H. P. , Möhring, J. , Melchinger, A. E. , & Büchse, A. (2008). BLUP for phenotypic selection in plant breeding and variety testing. Euphytica, 161, 209–228. 10.1007/s10681-007-9449-8 [DOI] [Google Scholar]
  87. Portes, J. V. , Menezes, G. R. O. , Macneil, M. D. , Da Silva, L. O. C. , Gondo, A. , & Braccini Neto, J. (2020). Selection indices for Nellore production systems in the Brazilian Cerrado. Livestock Science, 242, 1–8. 10.1016/j.livsci.2020.104309 [DOI] [Google Scholar]
  88. Ralph, J. , O'Neill, R. , & Winton, J. (2015). A practical introduction to index numbers. John Wiley & Sons. [Google Scholar]
  89. Rencher, A. C. (2002). Methods of multivariate analysis (2nd ed.). Wiley. [Google Scholar]
  90. Rencher, A. C. , & Schaalje, G. B. (2008). Linear models in statistics (2nd ed.). John Wiley and Sons. [Google Scholar]
  91. Robinson, G. K. (1991). The BLUP is a good thing. Statistical Science, 6(1), 15–51. [Google Scholar]
  92. Rausand, M. , & Hϕyland, A. (2004). System reliability theory: Models, statistical methods, and applications (2nd ed.). Wiley. [Google Scholar]
  93. Searle, S. R. , Casella, G. , & McCulloch, C. E. (2006). Variance components. John Wiley and Sons. [Google Scholar]
  94. Smith, C. (1983). Effects of changes in economic weights on the efficiency of index selection. Journal of Animal Science, 56(5), 1057–1064. 10.2527/jas1983.5651057x [DOI] [Google Scholar]
  95. Smith, H. F. (1936). A discriminant function for plant selection. Annals of Eugenics, 7, 240–250. 10.1111/j.1469-1809.1936.tb02143.x [DOI] [Google Scholar]
  96. Su, G. , Christensen, O. F. , Ostersen, T. , Henryon, M. , & Lund, M. S. (2012). Estimating additive and non‐additive genetic variances and predicting genetic merits using genome‐wide dense single nucleotide polymorphism markers. PLOS ONE, 7(9), e45293. 10.1371/journal.pone.0045293 [DOI] [PMC free article] [PubMed] [Google Scholar]
  97. Tallis, G. M. (1962). A selection index for optimum genotype. Biometrics, 18, 120–122. 10.2307/2527716 [DOI] [Google Scholar]
  98. Tallis, G. M. (1985). Constrained selection. The Japanese Journals of Genetics, 60, 151–155. 10.1266/jjg.60.151 [DOI] [Google Scholar]
  99. Togashi, K. , Lin, C. Y. , & Yamazaki, T. (2011). The efficiency of genome‐wide selection for genetic improvement of net merit. Journal of Animal Science, 89, 2972–2980. 10.2527/jas.2009-2606 [DOI] [PubMed] [Google Scholar]
  100. Vandepitte, W. M. , & Hazel, L. N. (1977). The effect of errors in the economic weights on the accuracy of selection indexes. Annales de Genetique et de Selection Animale, 9, 87–103. [DOI] [PMC free article] [PubMed] [Google Scholar]
  101. Vanraden, P. M. (2008). Efficient methods to compute genomic predictions. Journal of Dairy Science, 91, 4414–4423. 10.3168/jds.2007-0980 [DOI] [PubMed] [Google Scholar]
  102. Wilhelm, S. , & Manjunath, B. G. (2010). tmvtnorm: A package for the truncated multivariate normal distribution. The R Journal, 2(1), 24–28. 10.32614/RJ-2010-005 [DOI] [Google Scholar]
  103. Williams, J. S. (1962). The evaluation of a selection index. Biometrics, 18, 375–393. 10.2307/2527479 [DOI] [Google Scholar]
  104. Wilms, I. , & Croux, C. (2016). Robust sparse canonical correlation analysis. BMC Systems Biology, 10, 72. 10.1186/s12918-016-0317-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
  105. Xie, C. , & Xu, S. (1997). Restricted multi‐stage selection indices. Genetics Selection Evolution, 29(2), 193–203. 10.1186/1297-9686-29-2-193 [DOI] [Google Scholar]
  106. Xie, C. , & Xu, S. (1998). Efficiency of multi‐stage marker‐assisted selection in the improvement of multiple quantitative traits. Heredity, 80, 489–498. 10.1046/j.1365-2540.1998.00308.x [DOI] [PubMed] [Google Scholar]
  107. Xu, S. (2003). Advanced statistical methods for estimating genetic variances in plants. Plant Breeding Reviews, 22, 113–163. [Google Scholar]
  108. Xu, S. , & Muir, W. M. (1992). Selection index updating. Theoretical and Applied Genetics, 83, 451–458. 10.1007/BF00226533 [DOI] [PubMed] [Google Scholar]
  109. Yamada, Y. , Yokouchi, K. , & Nishida, A. (1975). Selection index when genetic gains of individual traits are of primary concern. The Japanese Journal of Genetics, 50(1), 33–41. 10.1266/jjg.50.33 [DOI] [Google Scholar]
  110. Young, S. S. Y. (1964). Multi‐stage selection for genetic gain. Heredity, 19, 131–145. 10.1038/hdy.1964.11 [DOI] [Google Scholar]
  111. Young, S. S. Y. (1961). A further examination of the relative efficiency of three methods of selection for genetic gains under less‐restricted conditions. Genetics Research, 2, 106–121. 10.1017/S0016672300000598 [DOI] [Google Scholar]
  112. Zhang, W. , & Smith, C. (1992). Computer simulation of marker‐assisted selection utilizing linkage disequilibrium. Theoretical and Applied Genetics, 83, 813–820. 10.1007/BF00226702 [DOI] [PubMed] [Google Scholar]
  113. Zhang, W. , & Smith, C. (1993). Simulation of marker‐assisted selection utilizing linkage disequilibrium: The effects of several additional factors. Theoretical and Applied Genetics, 86, 492–496. 10.1007/BF00838565 [DOI] [PubMed] [Google Scholar]

Articles from Crop Science are provided here courtesy of Wiley

RESOURCES