2020 Jun 20;133(9):2743–2758. doi: 10.1007/s00122-020-03629-6

Expectation and variance of the estimator of the maximized selection response of linear selection indices with normal distribution

J Jesus Cerón-Rojas 1, Jose Crossa 1
PMCID: PMC7421161  PMID: 32561956

Abstract

Key message

The expectation and variance of the estimator of the maximized index selection response allow the breeders to construct confidence intervals and to complete the analysis of a selection process.

Abstract

The maximized selection response and the correlation of the linear selection index (LSI) with the net genetic merit are the main criteria to compare the efficiency of any LSI. The estimator of the maximized selection response is the square root of the variance of the estimated LSI values multiplied by the selection intensity. The expectation and variance of this estimator allow the breeder to construct confidence intervals and determine the appropriate sample size to complete the analysis of a selection process. Assuming that the estimated LSI values have normal distribution, we obtained those two parameters as follows. First, with the Fourier transform, we found the distribution of the variance of the estimated LSI values, which was a Gamma distribution; therefore, the expectation and variance of this distribution were the expectation and variance of the variance of the estimated LSI values. Second, with these results, we obtained the expectation and the variance of the estimator of the selection response using the Delta method. We validated the theoretical results in the phenotypic selection context using real and simulated datasets. With the simulated datasets, we compared the LSI efficiency when the genotypic covariance matrix is known versus when this matrix is estimated; the differences were not significant. We concluded that our results are valid for any LSI with normal distribution and that the method described in this work is useful for finding the expectation and variance of the estimator of any LSI response in the phenotypic or genomic selection context.

Introduction

The maximized selection response and the correlation of the linear selection index (LSI) with the net genetic merit are the main criteria to compare the efficiency of any LSI. The selection response is the expectation of the net genetic merit of the selected individuals when the mean of the original population is zero, whereas the net genetic merit is a linear combination of the true unobservable breeding values of traits weighted by their respective economic values (Smith 1936; Cochran 1951). The LSI theory is divided into two main parts: (1) the unconstrained LSI (Smith 1936) and (2) the constrained LSI (Kempthorne and Nordskog 1959; Mallard 1972). The constrained LSI imposes restrictions on the expected genetic gain (or multitrait selection response) of some traits so that their expected genetic gains change by a predetermined amount, while the rest of the traits remain without restrictions. This index is the most general LSI, and it includes the unconstrained LSI as a particular case.

The unconstrained and constrained LSI can be a linear combination of phenotypic values (Smith 1936; Mallard 1972), genomic estimated breeding values (GEBV) (Ceron-Rojas et al. 2015; Cerón-Rojas and Crossa 2019), or phenotypic values and GEBV (Dekkers 2007) jointly. It can also be a linear combination of phenotypic values and marker scores (Lande and Thompson 1990). Thus, there are three main kinds of LSI: phenotypic, genomic and marker. The main advantage of the LSI based on GEBV over the other indices lies in the possibility of reducing the intervals between selection cycles by more than two-thirds.

The aims of any LSI are to predict the net genetic merit values of the candidates for selection, select parents for the next generation and maximize the selection response. When the phenotypic and genotypic variance and covariance are known, the maximized selection response is optimum and the LSI is the best linear predictor of the net genetic merit; in addition, the correlation between the net genetic merit and the LSI is maximized, and the mean prediction error is minimized.

The estimator of the maximized selection response is the square root of the variance of the estimated LSI values multiplied by the selection intensity. In this case, the phenotypic and genotypic variances and covariance are estimated and the expectation and variance of the estimator of the maximized selection response are unknown. Then, methods to find the expectation and variance of the estimator of the maximized LSI selection response are of interest to the breeder because they are important to complete the analysis of a selection process and because they allow the breeder to construct confidence intervals and determine the appropriate sample size for each selection cycle in a selection program.

The unconstrained and constrained linear phenotypic selection index (LPSI and CLPSI, respectively) theory was developed under the assumptions that the genotypic values that make up the net genetic merit are composed entirely of the additive effects of genes and that the LPSI (CLPSI) and the net genetic merit have bivariate normal distribution (Smith 1936; Kempthorne and Nordskog 1959; Mallard 1972). The major advantage of these indices is that they assign higher weights to traits whose differences are genetic. Their disadvantages are that they require large amounts of information, economic weights are difficult to assign and the sampling error could be large. Ceron-Rojas et al. (2015) and Cerón-Rojas and Crossa (2019) extended the LPSI and CLPSI theory to the genomic selection context and developed an unconstrained and a constrained linear genomic selection index (LGSI and CLGSI, respectively).

In the LPSI context, Tallis (1960) derived a large sample variance of LPSI weights for individually selecting any number of traits and the estimated LPSI selection response when phenotypic and genetic parameters are estimated in a half-sib analysis; however, the expressions are complicated and do not allow identifying situations where selection indices are likely to be inefficient. Williams (1962a) obtained an exact formula for the sampling variance of the index weights but for only two traits of a specific experimental design. Harris (1964) utilized the Delta method to determine the sampling properties of the index; however, the results are confusing and the author did not present a simple and general formula to find the expectation and variance of the estimator of the LPSI selection response. Hayes and Hill (1980) proposed a transformation of the trait variables used for constructing genetic selection indices, such that the sampling properties of the LPSI weights can be easily computed using a general formula; however, the formula depends on the transformation of the trait variables, which negatively affects the estimated LPSI selection response.

Assuming that the estimated LPSI and CLPSI values have normal distribution (we corroborated the normality assumption using graphical methods and normality tests), we present a simple and general formula to find the expectation and variance of the estimator of the maximized LPSI and CLPSI selection response, which we obtained in two steps. First, we obtained the distribution of the variance of the estimated LPSI and CLPSI values using the Fourier transform (Springer 1979, Chapters 2 and 9). Their distribution was a Gamma distribution, and therefore, the expectation and variance of this distribution were the expectation and variance of the variance of the estimated LPSI and CLPSI values.

In the second step, using the results obtained in the first step, we found the expectation and the variance of the estimator of the maximized LPSI and CLPSI selection responses using the Delta method. We validated the theoretical results using real and simulated datasets. In addition, with the simulated datasets, we compared the LPSI and CLPSI parameters when the genotypic covariance matrix is known versus when this matrix is estimated by restricted maximum likelihood (REML). We did this because while the sampling properties of the estimator of the phenotypic covariance matrix are well known (Rencher and Schaalje 2008), the sampling properties of the estimator of the genotypic covariance matrix are not well known. The results indicated that the differences are not significant. We concluded that our method is useful to find the expectation and variance of the estimator of the maximized selection response for any LSI with normal distribution.

Materials and methods

The net genetic merit and the LPSI

The individual net genetic merit is

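$$H = \mathbf{w}'\mathbf{g}, \tag{1}$$

where $\mathbf{g}' = [g_1 \; g_2 \; \ldots \; g_t]$ and $\mathbf{w}' = [w_1 \; w_2 \; \ldots \; w_t]$ ($t$ = number of traits) are vectors of true unobservable breeding values and known economic values, respectively. The individual linear phenotypic selection index (LPSI) is

$$I = \mathbf{b}'\mathbf{y}, \tag{2}$$

where $\mathbf{b}' = [b_1 \; b_2 \; \ldots \; b_t]$ is the LPSI vector of coefficients, and $\mathbf{y}' = [y_1 \; y_2 \; \ldots \; y_t]$ is the vector of the traits of interest. The variances of $H$ and $I$ are $\sigma_H^2 = \mathbf{w}'\mathbf{C}\mathbf{w}$ and $\sigma_I^2 = \mathbf{b}'\mathbf{P}\mathbf{b}$, respectively, where $\mathbf{C}$ and $\mathbf{P}$ are $t \times t$ covariance matrices of genotypic ($\mathbf{g}$) and trait phenotypic values ($\mathbf{y}$), respectively.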

The LPSI selection response

The LPSI selection response (R) is the expectation of H (Eq. 1) for a proportion p of individuals selected and can be written as

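$$R = k\sigma_H\rho_{HI}, \tag{3}$$

where $k = z(u)/p$ is the intensity of selection, $z(u) = \exp\{-0.5u^2\}/\sqrt{2\pi}$ is the height of the ordinate of the normal curve and $u = (I - \mu_I)/\sigma_I$ is the truncation point, whereas $\mu_I$ and $\sigma_I = \sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}}$ are the mean and standard deviation of $I$ (Eq. 2); $\sigma_H = \sqrt{\mathbf{w}'\mathbf{C}\mathbf{w}}$ is the standard deviation of $H$ and $\rho_{HI} = \dfrac{\mathbf{w}'\mathbf{C}\mathbf{b}}{\sqrt{\mathbf{w}'\mathbf{C}\mathbf{w}}\sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}}}$ is the correlation between $H$ and the LPSI, whereas $\sigma_{HI} = \mathbf{w}'\mathbf{C}\mathbf{b}$ is the covariance between $H$ and $I$.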
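The definition of the selection intensity can be checked numerically. The sketch below assumes that u is the standardized truncation point of a standard normal curve with an upper tail of area p; the function name `selection_intensity` is illustrative and does not come from the paper.

```python
from statistics import NormalDist

def selection_intensity(p: float) -> float:
    """Return k = z(u) / p for a retained proportion p (0 < p < 1)."""
    std_normal = NormalDist()            # mean 0, standard deviation 1
    u = std_normal.inv_cdf(1.0 - p)      # truncation point leaving area p above it
    z = std_normal.pdf(u)                # height of the normal ordinate at u
    return z / p

k = selection_intensity(0.10)
print(round(k, 3))  # 1.755
```

For p = 0.10 this reproduces the value k = 1.755 used for both the real and the simulated datasets.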

The genetic gain in Eq. (3) becomes larger as p becomes smaller, i.e., as selection becomes more intense. Equation (3) is the same for all LSI; the only changes are the type of information (phenotypic or genomic) and the restrictions used when the index vector of coefficients is obtained to predict H and to maximize Eq. (3).

The maximized LPSI selection response and coefficient of correlation

The maximized LPSI selection response and the correlation of the LPSI with the net genetic merit are

$$R_{\max} = k\sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}} = k\sigma_I, \tag{4a}$$
$$\rho_{\max} = \frac{\sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}}}{\sqrt{\mathbf{w}'\mathbf{C}\mathbf{w}}}, \tag{4b}$$

respectively, where $\mathbf{b} = \mathbf{P}^{-1}\mathbf{C}\mathbf{w}$ (Cerón-Rojas and Crossa 2018, Chapter 2). Equation (4a) predicts the mean improvement in $H$ due to indirect selection on $I$ and is proportional to the standard deviation of the LPSI ($\sigma_I$) and the selection intensity $k$. Whereas in Eq. (3) $R$ can take any value, in Eq. (4a) $R_{\max}$ gives the maximum value of Eq. (3). This is the main difference between the two equations.
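As an illustration of Eqs. (4a) and (4b), the following sketch computes the LPSI coefficients, the maximized response and the maximized correlation with NumPy. The matrices, economic weights and k are the estimates reported in the Real data section of this paper; the function name `lpsi` is an assumption made for this example, and the result is the plug-in value based on the estimated matrices rather than on the sample variance of the index values.

```python
import numpy as np

# Estimates reported in the Real data section (traits: GY, PHT, EHT, AD).
P = np.array([[1.40,   4.69,  3.25,  0.12],
              [4.69, 130.57, 68.39,  0.80],
              [3.25,  68.39, 68.22, -0.72],
              [0.12,   0.80, -0.72,  1.44]])
C = np.array([[0.94,  3.76,  2.62, 0.29],
              [3.76, 72.24, 43.81, 1.99],
              [2.61, 43.81, 35.60, 0.31],
              [0.29,  1.99,  0.31, 0.90]])
w = np.array([5.0, -0.3, -0.3, -1.0])   # economic weights
k = 1.755                               # selection intensity for p = 0.10

def lpsi(P, C, w, k):
    """Return (b, R_max, rho_max) for the unconstrained index."""
    b = np.linalg.solve(P, C @ w)       # b = P^{-1} C w, without forming P^{-1}
    sigma_I = np.sqrt(b @ P @ b)        # standard deviation of I = b'y
    sigma_H = np.sqrt(w @ C @ w)        # standard deviation of H = w'g
    return b, k * sigma_I, sigma_I / sigma_H

b, r_max, rho_max = lpsi(P, C, w, k)
print(b, r_max, rho_max)
```

Solving the linear system instead of inverting P is a numerical convenience only; it gives the same b as the formula in the text.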

The expected genetic gain per trait

The main objective of the CLPSI is to maximize Eq. (3) under some restrictions imposed on the expected genetic gain per trait (E), which can be written as

$$\mathbf{E} = k\frac{\mathbf{C}\mathbf{b}}{\sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}}}. \tag{5}$$

We defined all the terms in Eq. (5) earlier. The type of restriction imposed on Eq. (5) can be a null restriction (RLPSI) or a predetermined constraint (CLPSI). Thus, let $\mathbf{d}' = [d_1 \; d_2 \; \ldots \; d_r]$ be a vector of $r$ constraints and assume that $\mu_q$ is the population mean of the $q$th trait ($q = 1, 2, \ldots, r$, where $r$ is the number of constraints) before selection. The CLPSI changes $\mu_q$ to $\mu_q + d_q$, where $d_q$ is a predetermined change in $\mu_q$ imposed by the breeder. When $\mathbf{d}$ is a null vector, we have a null restricted LPSI (RLPSI), which is a particular case of the CLPSI. The restriction effects will be observed on the CLPSI expected genetic gains per trait (Eq. 5), where each restricted trait will have an expected genetic gain according to the $\mathbf{d}' = [d_1 \; d_2 \; \ldots \; d_r]$ values imposed by the breeder.

Equation (5) is the same for all LSI; the only change is the type of information (phenotypic or genomic) and restrictions used when the LSI vectors of coefficients are obtained to predict H and to maximize Eq. (3).
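A short worked step connects Eq. (5) with Eq. (4a) in the unconstrained case. It assumes that C is symmetric and that b = P^{-1}Cw, as defined for the LPSI; under those assumptions the economic-weighted sum of the expected gains per trait equals the maximized selection response.

```latex
\begin{align*}
\mathbf{w}'\mathbf{E}
  &= k\,\frac{\mathbf{w}'\mathbf{C}\mathbf{b}}{\sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}}}
   = k\,\frac{\mathbf{w}'\mathbf{C}\mathbf{P}^{-1}\mathbf{C}\mathbf{w}}{\sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}}} \\
  &= k\,\frac{\mathbf{b}'\mathbf{P}\mathbf{b}}{\sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}}}
   = k\sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}} = R_{\max}.
\end{align*}
```

The third equality uses b'Pb = w'CP^{-1}PP^{-1}Cw = w'CP^{-1}Cw.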

The CLPSI vector of coefficients

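Let

$$\mathbf{D}' = \begin{bmatrix} d_r & 0 & \cdots & 0 & -d_1 \\ 0 & d_r & \cdots & 0 & -d_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & d_r & -d_{r-1} \end{bmatrix}$$

be a Mallard (1972) matrix of size $(r-1) \times r$ of predetermined proportional gains, where $d_q$ ($q = 1, 2, \ldots, r$) is the $q$th element of vector $\mathbf{d}' = [d_1 \; d_2 \; \ldots \; d_r]$, and let $\mathbf{U}'$ be a matrix of 1s and 0s, where 1 indicates that the trait is restricted and 0 that the trait is not restricted (Kempthorne and Nordskog 1959). To obtain the CLPSI vector of coefficients, we minimized the mean-squared difference between $I$ and $H$, $E[(H - I)^2]$, with respect to $\mathbf{b}$ under the restriction $\mathbf{D}'\mathbf{U}'\mathbf{C}\mathbf{b} = \mathbf{0}$, where $\mathbf{C}$ is the covariance matrix of genotypic values.

The CLPSI vector of coefficients is

$$\boldsymbol{\beta} = \mathbf{K}\mathbf{b}, \tag{6}$$

where $\mathbf{K} = [\mathbf{I}_t - \mathbf{Q}]$, $\mathbf{Q} = \mathbf{P}^{-1}\mathbf{M}(\mathbf{M}'\mathbf{P}^{-1}\mathbf{M})^{-1}\mathbf{M}'$, $\mathbf{M}' = \mathbf{D}'\mathbf{U}'\mathbf{C}$, $\mathbf{I}_t$ is an identity matrix of size $t \times t$ and $\mathbf{b} = \mathbf{P}^{-1}\mathbf{C}\mathbf{w}$. When $\mathbf{d}$ is a null vector, $\mathbf{D}' = \mathbf{U}'$, $\mathbf{Q} = \mathbf{P}^{-1}\mathbf{C}\mathbf{U}(\mathbf{U}'\mathbf{C}\mathbf{P}^{-1}\mathbf{C}\mathbf{U})^{-1}\mathbf{U}'\mathbf{C}$, and the CLPSI is the RLPSI. When $\mathbf{D}' = \mathbf{U}'$ and $\mathbf{U}'$ is a null matrix, $\boldsymbol{\beta} = \mathbf{b}$. Thus, the CLPSI is the most general linear phenotypic selection index and includes the LPSI and the RLPSI as particular cases.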
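The construction of the CLPSI coefficients in Eq. (6) can be sketched as follows. The inputs are the estimates and constraints reported in the Real data section; the helper names `mallard_matrix` and `clpsi_coefficients` are hypothetical, and the layout of D' follows the numerical D matrices given for the real and simulated datasets.

```python
import numpy as np

def mallard_matrix(d):
    """Return D' of size (r-1) x r: row q is [0 .. d_r .. 0, -d_q]."""
    d = np.asarray(d, dtype=float)
    r = d.size
    return np.hstack([d[-1] * np.eye(r - 1), -d[:-1, None]])

def clpsi_coefficients(P, C, w, d, U_t):
    """Return (beta, M') with beta = (I - Q) b and M' = D'U'C."""
    b = np.linalg.solve(P, C @ w)                  # unconstrained b = P^{-1} C w
    M_t = mallard_matrix(d) @ U_t @ C              # M' = D'U'C
    PinvM = np.linalg.solve(P, M_t.T)              # P^{-1} M
    Q = PinvM @ np.linalg.solve(M_t @ PinvM, M_t)  # P^{-1} M (M'P^{-1}M)^{-1} M'
    return (np.eye(len(w)) - Q) @ b, M_t

P = np.array([[1.40,   4.69,  3.25,  0.12],
              [4.69, 130.57, 68.39,  0.80],
              [3.25,  68.39, 68.22, -0.72],
              [0.12,   0.80, -0.72,  1.44]])
C = np.array([[0.94,  3.76,  2.62, 0.29],
              [3.76, 72.24, 43.81, 1.99],
              [2.61, 43.81, 35.60, 0.31],
              [0.29,  1.99,  0.31, 0.90]])
w = np.array([5.0, -0.3, -0.3, -1.0])
d = [0.5, -1.0, -0.5]                              # constraints on GY, PHT, EHT
U_t = np.array([[1, 0, 0, 0],
                [0, 1, 0, 0],
                [0, 0, 1, 0]], dtype=float)        # U': first three traits restricted

beta, M_t = clpsi_coefficients(P, C, w, d, U_t)
print(mallard_matrix(d))
print(M_t @ beta)   # numerically zero: the restriction D'U'C beta = 0 holds
```

The final line checks the restriction directly, which is a convenient sanity test whenever d, U' or the covariance estimates change.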

The maximized CLPSI selection response and coefficient of correlation

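The maximized CLPSI selection response and the correlation of the CLPSI with the net genetic merit are

$$R_{\max C} = k\sqrt{\boldsymbol{\beta}'\mathbf{P}\boldsymbol{\beta}} = k\sigma_{IC}, \tag{7a}$$
$$\rho_{\max C} = \frac{\sqrt{\boldsymbol{\beta}'\mathbf{P}\boldsymbol{\beta}}}{\sqrt{\mathbf{w}'\mathbf{C}\mathbf{w}}}, \tag{7b}$$

respectively, where $k$ is the selection intensity. Under $r$ restrictions, Eq. (7a) predicts the mean improvement in $H$ due to indirect selection on $I_C = \boldsymbol{\beta}'\mathbf{y}$.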

Estimators of the LPSI and CLPSI vector of coefficients

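We denote the restricted maximum likelihood (REML) estimators of matrices $\mathbf{C}$ and $\mathbf{P}$ as $\hat{\mathbf{C}}$ and $\hat{\mathbf{P}}$, respectively (Cerón-Rojas and Crossa 2018, Chapter 2), from where the LPSI and CLPSI vectors of coefficients ($\mathbf{b} = \mathbf{P}^{-1}\mathbf{C}\mathbf{w}$ and $\boldsymbol{\beta} = \mathbf{K}\mathbf{b}$) can be estimated, respectively, as

$$\hat{\mathbf{b}} = \hat{\mathbf{P}}^{-1}\hat{\mathbf{C}}\mathbf{w} \quad \text{and} \quad \hat{\boldsymbol{\beta}} = \hat{\mathbf{K}}\hat{\mathbf{b}}, \tag{8}$$

where $\hat{\mathbf{K}} = [\mathbf{I}_t - \hat{\mathbf{Q}}]$, $\hat{\mathbf{Q}} = \hat{\mathbf{P}}^{-1}\hat{\mathbf{M}}(\hat{\mathbf{M}}'\hat{\mathbf{P}}^{-1}\hat{\mathbf{M}})^{-1}\hat{\mathbf{M}}'$ and $\hat{\mathbf{M}}' = \mathbf{D}'\mathbf{U}'\hat{\mathbf{C}}$.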

Estimators of LPSI and CLPSI

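By Eq. (8), the estimators of LPSI ($I = \mathbf{b}'\mathbf{y}$) and CLPSI ($I_C = \boldsymbol{\beta}'\mathbf{y}$) are

$$\hat{I} = \hat{\mathbf{b}}'\mathbf{y} \quad \text{and} \quad \hat{I}_C = \hat{\boldsymbol{\beta}}'\mathbf{y}, \tag{9}$$

respectively. The $\hat{I}$ and $\hat{I}_C$ values (Eq. 9) are used to rank and select genotypes in the population. In this work, we assumed that the $\hat{I}$ and $\hat{I}_C$ values have normal distributions (Fig. 1).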

Fig. 1. Histograms and quantile–quantile plots of the estimated LPSI (Fig. 1a, d, respectively) and CLPSI (Fig. 1b, c, respectively) values for a real dataset with four traits and 247 genotypes

Estimator of the LPSI and CLPSI variances

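The estimator of the variance of the LPSI ($\sigma_I^2 = \mathbf{b}'\mathbf{P}\mathbf{b}$) is

$$S_I^2 = \frac{1}{n-1}\sum_{j=1}^{n}(\hat{I}_j - \hat{m})^2, \tag{10}$$

where $\hat{m} = \frac{1}{n}\sum_{j=1}^{n}\hat{I}_j$ is the arithmetic mean of the $\hat{I}$ values. In a similar manner, the estimator of the variance of the CLPSI ($\sigma_{IC}^2 = \boldsymbol{\beta}'\mathbf{P}\boldsymbol{\beta}$) is

$$S_{IC}^2 = \frac{1}{n-1}\sum_{j=1}^{n}(\hat{I}_{Cj} - \hat{\mu})^2, \tag{11}$$

where $\hat{\mu} = \frac{1}{n}\sum_{j=1}^{n}\hat{I}_{Cj}$ is the arithmetic mean of the $\hat{I}_C$ values. In both equations, $n$ is the size of the population in each selection cycle.

It is possible to estimate $\sigma_I^2 = \mathbf{b}'\mathbf{P}\mathbf{b}$ as $\hat{\sigma}_I^2 = \hat{\mathbf{b}}'\hat{\mathbf{P}}\hat{\mathbf{b}}$, and $\sigma_{IC}^2 = \boldsymbol{\beta}'\mathbf{P}\boldsymbol{\beta}$ as $\hat{\sigma}_{IC}^2 = \hat{\boldsymbol{\beta}}'\hat{\mathbf{P}}\hat{\boldsymbol{\beta}}$; however, in this work, we found that the estimated values of $\hat{\sigma}_I^2$ and $\hat{\sigma}_{IC}^2$ are the same as those of Eqs. (10) and (11), respectively. We estimated $\sigma_I^2$ and $\sigma_{IC}^2$ with $S_I^2$ and $S_{IC}^2$, respectively, because when $\hat{I}$ and $\hat{I}_C$ have normal distribution, it is easier to find the distribution of the $S_I^2$ and $S_{IC}^2$ values (Appendices A–D) than the distribution of the $\hat{\sigma}_I^2 = \hat{\mathbf{b}}'\hat{\mathbf{P}}\hat{\mathbf{b}}$ and $\hat{\sigma}_{IC}^2 = \hat{\boldsymbol{\beta}}'\hat{\mathbf{P}}\hat{\boldsymbol{\beta}}$ values. The expectation and variance of $S_I^2$ and $S_{IC}^2$ are useful to find the expectation and variance of the estimator of the maximized selection responses of both indices.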

Estimators of the maximized selection responses

By Eqs. (10) and (11), the estimators of the maximized LPSI and CLPSI selection responses are

$$\hat{R}_{\max} = k\sqrt{S_I^2} \tag{12}$$

and

$$\hat{R}_{\max C} = k\sqrt{S_{IC}^2}, \tag{13}$$

respectively.
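A minimal sketch of Eqs. (10) to (13) is given below. It takes a list of estimated index values for one selection cycle and returns the estimated maximized response. The function name and the five index values are made up for illustration; only the formula comes from the text.

```python
from math import sqrt
from statistics import variance

def estimated_max_response(index_values, k):
    """Return k * sqrt(S^2), where S^2 uses the n - 1 divisor of Eqs. (10)-(11)."""
    s2 = variance(index_values)      # sample variance with divisor n - 1
    return k * sqrt(s2)

# Hypothetical index values for five genotypes (not taken from the datasets).
index_values = [1.0, 2.0, 3.0, 4.0, 5.0]
r_hat = estimated_max_response(index_values, k=1.755)
print(r_hat)
```

The same function serves the LPSI and the CLPSI; only the index values passed to it differ.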

Testing the normality assumption for the estimated LPSI and CLPSI values

For the real dataset, we checked the normality assumption for the estimated LPSI and CLPSI values using graphical methods (histograms and normal quantile–quantile plots) and analytical test procedures (the Shapiro–Wilk and Kolmogorov–Smirnov normality tests), whereas for the simulated datasets, we used only the analytical test procedures.

If the estimated LPSI and CLPSI values are normally distributed, their histograms should not show a strong negative or positive skew (Fig. 1a, b), and the values should fall on a straight line in the quantile–quantile plots (Fig. 1c, d). Departures from normality show up in the quantile–quantile plots as various kinds of nonlinearity, e.g., S-shaped or banana-shaped patterns (Crawley 2015).

We tested the null hypothesis that the estimated LPSI and CLPSI values have normal distribution using the Shapiro–Wilk and Kolmogorov–Smirnov normality tests. For the null hypothesis not to be rejected, the Shapiro–Wilk statistic should be close to 1.0 and the Kolmogorov–Smirnov statistic should be close to 0.0 (Crawley 2015).
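To make the "close to 0.0" reading of the Kolmogorov–Smirnov statistic concrete, the sketch below computes that statistic by hand against a normal curve fitted with the sample mean and standard deviation. This is an illustrative implementation, not the software used in the paper, which is not stated here; the two samples are simulated and hypothetical. Note that fitting the parameters from the same sample changes the null distribution of the statistic, so the value is shown only as a descriptive distance.

```python
import random
from statistics import NormalDist, mean, stdev

def ks_statistic_normal(values):
    """Largest gap between the empirical CDF and a fitted normal CDF."""
    x = sorted(values)
    n = len(x)
    fitted = NormalDist(mean(x), stdev(x))
    d = 0.0
    for i, xi in enumerate(x, start=1):
        f = fitted.cdf(xi)
        # The empirical CDF jumps from (i - 1)/n to i/n at xi; check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

rng = random.Random(1)
normal_sample = [rng.gauss(10.0, 3.0) for _ in range(500)]   # normal by construction
skewed_sample = [rng.expovariate(1.0) for _ in range(500)]   # strongly right-skewed

print(ks_statistic_normal(normal_sample))   # small distance
print(ks_statistic_normal(skewed_sample))   # clearly larger distance
```

The skewed sample produces a visibly larger statistic than the normal one, which is the pattern the test is meant to detect.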

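Estimator of the maximized LPSI and CLPSI selection responses using $\mathbf{C}$ versus $\hat{\mathbf{C}}$

Based on the Cauchy–Schwarz inequality, in Appendix A (Eqs. A1–A3), we describe an upper bound for the maximized LPSI and CLPSI selection responses. By Eq. (A2), $k\sqrt{\mathbf{w}'\mathbf{C}\mathbf{w}}$ is the maximum possible value of the maximized LPSI selection response ($R_{\max} = k\sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}} = k\sqrt{\mathbf{w}'\mathbf{C}\mathbf{P}^{-1}\mathbf{C}\mathbf{w}}$); i.e., $R_{\max} \le k\sqrt{\mathbf{w}'\mathbf{C}\mathbf{w}}$. In a similar manner, by Eq. (A3), $k\sqrt{\boldsymbol{\delta}'\mathbf{C}\boldsymbol{\delta}}$ is the maximum possible value of the maximized CLPSI selection response ($R_{\max C} = k\sqrt{\boldsymbol{\beta}'\mathbf{P}\boldsymbol{\beta}}$), i.e., $R_{\max C} \le k\sqrt{\boldsymbol{\delta}'\mathbf{C}\boldsymbol{\delta}}$.

In the simulated datasets, the true genotypic covariance matrix $\mathbf{C}$ is known. Thus, in this case, it is possible to estimate the LPSI vector of coefficients as $\hat{\mathbf{b}} = \hat{\mathbf{P}}^{-1}\hat{\mathbf{C}}\mathbf{w}$, where $\hat{\mathbf{C}}$ is the REML estimate of $\mathbf{C}$, and as $\tilde{\mathbf{b}} = \hat{\mathbf{P}}^{-1}\mathbf{C}\mathbf{w}$, where $\mathbf{C}$ is known. In the CLPSI context, we would have $\hat{\boldsymbol{\beta}} = \hat{\mathbf{K}}\hat{\mathbf{b}}$ (Eq. 8) and $\tilde{\boldsymbol{\beta}} = \tilde{\mathbf{K}}\tilde{\mathbf{b}}$, where $\tilde{\mathbf{K}} = [\mathbf{I}_t - \tilde{\mathbf{Q}}]$, $\tilde{\mathbf{Q}} = \hat{\mathbf{P}}^{-1}\tilde{\mathbf{M}}(\tilde{\mathbf{M}}'\hat{\mathbf{P}}^{-1}\tilde{\mathbf{M}})^{-1}\tilde{\mathbf{M}}'$ and $\tilde{\mathbf{M}}' = \mathbf{D}'\mathbf{U}'\mathbf{C}$. In both cases, the only difference between the estimators of the index vectors of coefficients is the genotypic covariance matrix ($\hat{\mathbf{C}}$ versus $\mathbf{C}$). With these results, we can compare the maximized LPSI selection response when it is estimated as $\hat{R}_{\max} = k\sqrt{\mathbf{w}'\hat{\mathbf{C}}\hat{\mathbf{P}}^{-1}\hat{\mathbf{C}}\mathbf{w}}$ and as $\tilde{R}_{\max} = k\sqrt{\mathbf{w}'\mathbf{C}\hat{\mathbf{P}}^{-1}\mathbf{C}\mathbf{w}}$, where the only difference is matrices $\hat{\mathbf{C}}$ and $\mathbf{C}$. If $\hat{\mathbf{C}}$ is a good estimate of $\mathbf{C}$, we would expect $\hat{R}_{\max}$ and $\tilde{R}_{\max}$ to be equivalent, and we would assume that $\hat{\mathbf{C}}$ is a good estimator of $\mathbf{C}$. The same is true for the CLPSI.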

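Variance and confidence interval for the LPSI and CLPSI correlation coefficients using $\mathbf{C}$ and $\hat{\mathbf{C}}$

In Appendix A (Eqs. A4 and A5), we describe the standard deviation of $\rho_{\max}$ (Eq. 4b) and $\rho_{\max C}$ (Eq. 7b) and one way to construct an approximate $100(1-\alpha)\%$ confidence interval for $\rho_{\max}$ and $\rho_{\max C}$. In the simulated dataset selection context, for the REML estimate $\hat{\mathbf{C}}$, the estimated LPSI and CLPSI correlation coefficients ($\rho_{\max}$ and $\rho_{\max C}$, respectively) are $\hat{r}_{\max} = \dfrac{\sqrt{\hat{\mathbf{b}}'\hat{\mathbf{P}}\hat{\mathbf{b}}}}{\sqrt{\mathbf{w}'\hat{\mathbf{C}}\mathbf{w}}}$ and $\hat{r}_{\max C} = \dfrac{\sqrt{\hat{\boldsymbol{\beta}}'\hat{\mathbf{P}}\hat{\boldsymbol{\beta}}}}{\sqrt{\mathbf{w}'\hat{\mathbf{C}}\mathbf{w}}}$, respectively, whereas for matrix $\mathbf{C}$, those estimates are $\tilde{\rho}_{\max} = \dfrac{\sqrt{\tilde{\mathbf{b}}'\hat{\mathbf{P}}\tilde{\mathbf{b}}}}{\sqrt{\mathbf{w}'\mathbf{C}\mathbf{w}}}$ and $\tilde{\rho}_{\max C} = \dfrac{\sqrt{\tilde{\boldsymbol{\beta}}'\hat{\mathbf{P}}\tilde{\boldsymbol{\beta}}}}{\sqrt{\mathbf{w}'\mathbf{C}\mathbf{w}}}$, respectively, where $\tilde{\mathbf{b}} = \hat{\mathbf{P}}^{-1}\mathbf{C}\mathbf{w}$ and $\tilde{\boldsymbol{\beta}} = \tilde{\mathbf{K}}\tilde{\mathbf{b}}$. The only difference between those estimates is matrices $\hat{\mathbf{C}}$ and $\mathbf{C}$. If $\hat{\mathbf{C}}$ is a good estimate of $\mathbf{C}$, we would expect $\hat{r}_{\max}$ and $\tilde{\rho}_{\max}$, and $\hat{r}_{\max C}$ and $\tilde{\rho}_{\max C}$, to be equivalent. In such a case, we would assume that $\hat{\mathbf{C}}$ is a good estimator of $\mathbf{C}$. Therefore, we compared these parameters in a similar manner as we did for the estimators of the maximized LPSI and CLPSI selection responses in the last subsection.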

Real data

To validate the theoretical results of the expectation and variance of the estimator of the maximized LPSI and CLPSI selection response, we used a real maize (Zea mays L.) F2 population with 247 genotypes and four phenotypic traits: grain yield (GY, t/ha), plant height (PHT, cm), ear height (EHT, cm) and anthesis days (AD, d), where we assumed that the breeding objective was to increase GY while decreasing PHT, EHT and AD. The vector of economic weights for GY, PHT, EHT and AD was $\mathbf{w}' = [5 \;\; {-0.3} \;\; {-0.3} \;\; {-1}]$ for both indices. Beyene et al. (2015) described this dataset and denoted it as JMpop1 DTMA Mexico optimum environment.

We estimated $\mathbf{P}$ and $\mathbf{C}$ by REML, and we denoted such estimates as $\hat{\mathbf{P}}$ and $\hat{\mathbf{C}}$, i.e.,

$$\hat{\mathbf{P}} = \begin{bmatrix} 1.40 & 4.69 & 3.25 & 0.12 \\ 4.69 & 130.57 & 68.39 & 0.80 \\ 3.25 & 68.39 & 68.22 & -0.72 \\ 0.12 & 0.80 & -0.72 & 1.44 \end{bmatrix} \quad \text{and} \quad \hat{\mathbf{C}} = \begin{bmatrix} 0.94 & 3.76 & 2.62 & 0.29 \\ 3.76 & 72.24 & 43.81 & 1.99 \\ 2.61 & 43.81 & 35.60 & 0.31 \\ 0.29 & 1.99 & 0.31 & 0.90 \end{bmatrix}.$$

For illustration purposes only, in the CLPSI context, we restricted traits GY, PHT and EHT with vector $\mathbf{d}' = [0.5 \;\; {-1.0} \;\; {-0.5}]$ and matrices

$$\mathbf{U}' = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \quad \text{and} \quad \mathbf{D}' = \begin{bmatrix} -0.5 & 0 & -0.5 \\ 0 & -0.5 & 1.0 \end{bmatrix}$$

when we made selection. For both indices, the total proportion retained for this dataset was $p = 0.10$ ($k = 1.755$).

Simulated datasets

The datasets were simulated by Ceron-Rojas et al. (2015) with QU-GENE software (Podlich and Cooper 1998) using 2500 molecular markers and 315 quantitative trait loci (QTLs) for eight phenotypic selection cycles (C0 to C7), each with four traits (T1, T2, T3 and T4), 500 genotypes and four replicates for each genotype. The authors distributed the markers uniformly across ten chromosomes and the QTLs randomly across the ten chromosomes to simulate maize (Zea mays L.) populations. A different number of QTLs affected each of the four traits: 300, 100, 60, and 40, respectively. The common QTLs affecting the traits generated genotypic correlations of − 0.5, 0.4, 0.3, − 0.3, − 0.2, and 0.1 between T1 and T2, T1 and T3, T1 and T4, T2 and T3, T2 and T4, T3 and T4, respectively. The economic weights for T1, T2, T3 and T4 were 1, − 1, 1 and 1, respectively.

We used seven phenotypic selection cycles (C1 to C7) with $p = 0.10$ ($k = 1.755$) in each cycle. We selected all four traits in each selection cycle. For illustration purposes only, in the CLPSI context, we restricted traits T1, T2 and T3 with vector $\mathbf{d}' = [5 \;\; {-2} \;\; 3]$ and matrices

$$\mathbf{U}' = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \quad \text{and} \quad \mathbf{D}' = \begin{bmatrix} 3 & 0 & -5 \\ 0 & 3 & 2 \end{bmatrix}$$

when we made selection. We estimated $\mathbf{P}$ and $\mathbf{C}$ by REML, and we denoted such estimates as $\hat{\mathbf{P}}$ and $\hat{\mathbf{C}}$. In addition, we used this dataset to compare the results of the maximized LPSI and CLPSI response (and correlation with the net genetic merit) when matrix $\mathbf{C}$ is known and when this matrix is estimated ($\hat{\mathbf{C}}$).

Real and simulated data availability

The real and simulated datasets are available in the Application of a Genomic Selection Index to Real and Simulated Data repository, at https://hdl.handle.net/11529/10199, where the folder of the real dataset is denoted as DATA_SET-3, whereas the folder of the simulated dataset is denoted as PSI_Phenotypes-05.

Results

Theoretical results

Distribution, expectation and variance of $S_I^2$ and $S_{IC}^2$

In Appendix B, we give a brief description of the Fourier transform theory (Eqs. A6 to A8) used to find the distribution of $S_I^2$ and $S_{IC}^2$. Based on the Springer (1979, Chapter 9) results, in Appendix C (Eqs. A9–A11), we present the mathematical process used to obtain the distribution of the $S_I^2$ and $S_{IC}^2$ values, and we show that the distribution of $S_I^2$ and $S_{IC}^2$ is a Gamma distribution $(r, \lambda)$, where $r = \frac{n-2}{2}$ is the shape parameter and $\lambda = \frac{n-1}{2\sigma^2}$ is the rate parameter (Stuart and Ord 1987). The distribution of $S_I^2$ and $S_{IC}^2$ is essentially a scaled Chi-square ($r = \frac{n-2}{2}$, a Chi-square with $n - 2$ degrees of freedom and a scale of $\lambda = \frac{n-1}{2\sigma^2}$). This is expected from their form as sums of squares of normally distributed data.

As shown in Appendix D (Eqs. A12–A15), the expectation and variance of $S_I^2$ and $S_{IC}^2$ were the expectation and variance of the Gamma distribution $(r, \lambda)$. They are useful to obtain the expectation and variance of the estimator of the maximized LPSI (Eq. 12) and the maximized CLPSI (Eq. 13) selection responses. In $r$ and $\lambda$, $n$ is the size of the population in each selection cycle and $\sigma^2$ is a parameter that denotes the unknown and fixed variance of $I = \mathbf{b}'\mathbf{y}$ ($\sigma_I^2 = \mathbf{b}'\mathbf{P}\mathbf{b}$) or the unknown and fixed variance of $I_C = \boldsymbol{\beta}'\mathbf{y}$ ($\sigma_{IC}^2 = \boldsymbol{\beta}'\mathbf{P}\boldsymbol{\beta}$).
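The two moments that the Delta-method step relies on, E(S²) = σ² and Var(S²) = 2σ⁴/(n − 1) (Appendix D, Eqs. A14 and A15), can be checked by simulation. The sketch below is a Monte Carlo illustration under the normality assumption; the sample size, variance, number of replicates and seed are arbitrary choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(2020)
n, sigma2, reps = 50, 4.0, 20000     # hypothetical settings, not from the paper

# Each row mimics the n normally distributed index values of one selection cycle.
samples = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(reps, n))
s2 = samples.var(axis=1, ddof=1)     # S^2 with the n - 1 divisor, as in Eq. (10)

mc_mean, mc_var = s2.mean(), s2.var(ddof=1)
print(mc_mean, sigma2)                       # Monte Carlo mean vs sigma^2
print(mc_var, 2 * sigma2**2 / (n - 1))       # Monte Carlo variance vs 2*sigma^4/(n-1)
```

Both simulated moments should land close to their theoretical counterparts, with the agreement tightening as the number of replicates grows.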

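Expectation and variance of $\hat{R}_{\max} = k\sqrt{S_I^2}$ and $\hat{R}_{\max C} = k\sqrt{S_{IC}^2}$

In Appendix E (Eqs. A16 and A17), we give a brief description of the Delta method, which we used to determine the expectations and the variance of $\hat{R}_{\max} = k\sqrt{S_I^2}$ and $\hat{R}_{\max C} = k\sqrt{S_{IC}^2}$. In this subsection, we present the expectations and variances only in terms of $\hat{R}_{\max}$; however, the results can be applied to any linear selection index with normal distribution.

Let $Y = k\sqrt{S^2} = \hat{R}$, where $k$ (the selection intensity) is a fixed constant, $\mu = E(S^2) = \sigma_I^2$ and $\mathrm{Var}(S^2) = \dfrac{2(\sigma_I^2)^2}{n-1}$ (Appendix D, Eqs. A14 and A15, respectively). According to the Delta method, the expectation, variance and standard deviation of $\hat{R}_{\max}$ are:

$$E(\hat{R}_{\max}) \approx k\sigma_I - \frac{k\sigma_I}{4(n-1)}, \tag{14}$$
$$\mathrm{Var}(\hat{R}_{\max}) \approx \frac{k^2\sigma_I^2}{2(n-1)}, \tag{15}$$
$$\mathrm{SD}(\hat{R}_{\max}) \approx \frac{k\sigma_I}{\sqrt{2(n-1)}}, \tag{16}$$

respectively, where $\sigma_I = \sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}}$ and $\sigma_I^2 = \mathbf{b}'\mathbf{P}\mathbf{b}$ are the unknown and fixed standard deviation and variance of $I = \mathbf{b}'\mathbf{y}$. The results of Eqs. (14) to (16) are the same for the CLPSI, replacing $\sigma_I^2 = \mathbf{b}'\mathbf{P}\mathbf{b}$ with $\sigma_{IC}^2 = \boldsymbol{\beta}'\mathbf{P}\boldsymbol{\beta}$. In Eq. (14), the term $\dfrac{k\sigma_I}{4(n-1)}$ is the bias of the estimator $\hat{R}_{\max}$ and the symbol "$\approx$" denotes an approximation. Equation (14) indicates that in the asymptotic context, $\hat{R}_{\max}$ is an unbiased estimator of $R_{\max} = k\sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}}$, whereas Eq. (15) indicates that the variance of $\hat{R}_{\max}$ tends to zero when $n$ increases. That is, when the number of genotypes ($n$) increases in the training population, the particular realizations of $\hat{R}_{\max}$ will be concentrated around the $R_{\max}$ value. The same is true for the $\hat{R}_{\max C}$ values of the CLPSI and $R_{\max C} = k\sqrt{\boldsymbol{\beta}'\mathbf{P}\boldsymbol{\beta}}$.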
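The Delta-method step can be written out explicitly. The derivation below is a sketch that assumes the usual Taylor-expansion form of the method (second order for the expectation, first order for the variance); the function label g is introduced here only for illustration and is not notation from the appendices.

```latex
\begin{align*}
g(x) &= k\sqrt{x}, \qquad g'(x) = \frac{k}{2\sqrt{x}}, \qquad g''(x) = -\frac{k}{4x^{3/2}}, \\[4pt]
E\big[g(S^2)\big] &\approx g(\mu) + \tfrac{1}{2}\,g''(\mu)\,\mathrm{Var}(S^2)
  = k\sigma_I - \frac{k}{8\sigma_I^{3}}\cdot\frac{2\sigma_I^{4}}{n-1}
  = k\sigma_I - \frac{k\sigma_I}{4(n-1)}, \\[4pt]
\mathrm{Var}\big[g(S^2)\big] &\approx \big[g'(\mu)\big]^2\,\mathrm{Var}(S^2)
  = \frac{k^2}{4\sigma_I^{2}}\cdot\frac{2\sigma_I^{4}}{n-1}
  = \frac{k^2\sigma_I^{2}}{2(n-1)}.
\end{align*}
```

With μ = σ_I², these two lines give Eqs. (14) and (15), and the square root of the second gives Eq. (16).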

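We can estimate Eqs. (14), (15) and (16) as

$$\hat{E}(\hat{R}_{\max}) = kS_I - \frac{kS_I}{4(n-1)}, \tag{17}$$
$$\widehat{\mathrm{Var}}(\hat{R}_{\max}) = \frac{k^2S_I^2}{2(n-1)}, \tag{18}$$
$$\widehat{\mathrm{SD}}(\hat{R}_{\max}) = \frac{kS_I}{\sqrt{2(n-1)}}, \tag{19}$$

respectively, where $S_I$ and $S_I^2$ are the standard deviation and variance of the $\hat{I} = \hat{\mathbf{b}}'\mathbf{y}$ values in each selection cycle. The same is true for $S_{IC}^2$ associated with the estimator of the maximized CLPSI selection response $\hat{R}_{\max C}$.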

Desirable properties of the estimator of the maximized selection responses

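An estimator should be unbiased, i.e., the expectation of the estimator should be equal to the parameter [$E(\hat{R}_{\max}) = R_{\max}$], and the variance of the error of estimation [$\mathrm{Var}(R_{\max} - \hat{R}_{\max})$] and the mean-squared error (MSE, i.e., $\mathrm{Var}(\hat{R}_{\max}) + [\mathrm{bias}(\hat{R}_{\max})]^2$) should be minimum (Montgomery and Ruger 2003, Chapter 7). According to Eq. (14), $E(\hat{R}_{\max}) = R_{\max}$ in the asymptotic context, and by Eq. (15), $\mathrm{Var}(R_{\max} - \hat{R}_{\max}) = \mathrm{Var}(\hat{R}_{\max}) \approx \dfrac{k^2\sigma_I^2}{2(n-1)}$. In addition, because $\dfrac{k\sigma_I}{4(n-1)}$ is the bias of $\hat{R}_{\max}$, $\mathrm{MSE} = \mathrm{Var}(\hat{R}_{\max}) + [\mathrm{bias}(\hat{R}_{\max})]^2 \approx \dfrac{k^2\sigma_I^2}{2(n-1)} + \dfrac{k^2\sigma_I^2}{16(n-1)^2}$. We would expect that when the population size ($n$) is large, $\mathrm{Var}(\hat{R}_{\max})$ and MSE will be minimal. Equations (17) to (19) are useful to estimate $\mathrm{Var}(\hat{R}_{\max})$, $\dfrac{k\sigma_I}{4(n-1)}$, and MSE.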
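The plug-in estimates of Eqs. (17) to (19), together with the bias and MSE discussed above, reduce to simple arithmetic once the estimated response kS_I and the population size n are known. The sketch below uses the real-data values reported in this paper (estimated response 5.87, n = 247); the function name `response_summary` is an assumption made for this example.

```python
import math

def response_summary(r_hat_max: float, n: int) -> dict:
    """Plug-in estimates from Eqs. (17)-(19), where r_hat_max = k * S_I."""
    bias = r_hat_max / (4 * (n - 1))             # estimated bias term of Eq. (14)
    variance = r_hat_max ** 2 / (2 * (n - 1))    # Eq. (18)
    return {
        "expectation": r_hat_max - bias,         # Eq. (17)
        "bias": bias,
        "variance": variance,
        "sd": math.sqrt(variance),               # Eq. (19)
        "mse": variance + bias ** 2,             # Var + bias^2
    }

summary = response_summary(5.87, 247)            # real dataset, LPSI
print(round(summary["bias"], 3))         # 0.006
print(round(summary["sd"], 2))           # 0.26
print(round(summary["expectation"], 2))  # 5.86
```

These rounded values agree with the bias, standard deviation and expectation reported for the LPSI in the real-data results.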

A large-sample confidence interval for $E(\hat{R}_{\max})$

By the central limit theorem (Rencher 2002, Chapter 4), when the sample size $n$ is large (e.g., $n > 40$), the estimated expectation $\hat{E}(\hat{R}_{\max})$ and the estimated standard deviation $\widehat{\mathrm{SD}}(\hat{R}_{\max})$ allow constructing confidence intervals for $E(\hat{R}_{\max})$. A confidence interval (CI) shows the likely range in which the $E(\hat{R}_{\max})$ value would fall if the sampling exercise were to be repeated (Crawley 2015, Chapter 4). A large-sample confidence interval for $E(\hat{R}_{\max})$ is

$$\hat{E}(\hat{R}_{\max}) \pm Z_{\alpha/2}\,\widehat{\mathrm{SD}}(\hat{R}_{\max}), \tag{20}$$

where $\hat{E}(\hat{R}_{\max})$ and $\widehat{\mathrm{SD}}(\hat{R}_{\max})$ were defined earlier, $Z_{\alpha/2}$ is the upper $100\alpha/2$ percentage point of the standard normal distribution, and $0 \le \alpha \le 1$ sets the level of confidence, $100(1-\alpha)\%$. Thus, if for $E(\hat{R}_{\max})$ we want to establish a $100(1-\alpha)\% = 95\%$ CI, in addition to $\widehat{\mathrm{SD}}(\hat{R}_{\max})$, we need to obtain (from the standard normal distribution) the value of $Z_{\alpha/2}$ associated with $\frac{\alpha}{2} = \frac{0.05}{2} = 0.025$, i.e., $Z_{\alpha/2} = 1.96$. Equation (20) holds, regardless of the shape of the population distribution (Montgomery and Ruger 2003, Chapter 8).
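Equation (20) can be sketched in a few lines. The example below uses the rounded real-data values reported in this paper (estimated expectations 5.86 and 5.73, estimated standard deviation 0.26); the function name `large_sample_ci` is illustrative.

```python
from statistics import NormalDist

def large_sample_ci(expectation_hat, sd_hat, alpha=0.05):
    """Return (lower, upper) limits of the 100(1 - alpha)% interval of Eq. (20)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # about 1.96 for alpha = 0.05
    half_width = z * sd_hat
    return expectation_hat - half_width, expectation_hat + half_width

lpsi_ci = large_sample_ci(5.86, 0.26)
clpsi_ci = large_sample_ci(5.73, 0.26)
print([round(v, 2) for v in lpsi_ci])    # [5.35, 6.37]
print([round(v, 2) for v in clpsi_ci])   # [5.22, 6.24]
```

With these rounded inputs the limits match the 95% intervals reported for the real dataset.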

Choice of sample size

By Eq. (20), the length or precision of the $100(1-\alpha)\%$ CI for $E(\hat{R}_{\max})$ is $2Z_{\alpha/2}\,\widehat{\mathrm{SD}}(\hat{R}_{\max})$, whereas the error is $\varepsilon = |\hat{E}(\hat{R}_{\max}) - E(\hat{R}_{\max})|$, where $|\cdot|$ denotes the absolute value of the difference $\hat{E}(\hat{R}_{\max}) - E(\hat{R}_{\max})$. In using $\hat{E}(\hat{R}_{\max})$ to estimate $E(\hat{R}_{\max})$, the error $\varepsilon$ is less than or equal to $Z_{\alpha/2}\,\widehat{\mathrm{SD}}(\hat{R}_{\max})$ with confidence $100(1-\alpha)\%$. We can choose $n$ so that we are $100(1-\alpha)\%$ confident that the error in estimating $E(\hat{R}_{\max})$ is less than a specified bound on the error $\varepsilon$ as follows

$$n = \left[\frac{Z_{\alpha/2}\,\widehat{\mathrm{SD}}(\hat{R}_{\max})}{\varepsilon}\right]^2. \tag{21}$$

If the right-hand side of Eq. (21) is not an integer, it must be rounded up. This will ensure that the level of confidence does not fall below $100(1-\alpha)\%$ (Montgomery and Ruger 2003, Chapter 8). Equation (21) indicates that the lower the $\varepsilon$ value, the higher the $n$ size.
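Equation (21) can be applied as written by treating the estimated standard deviation as a fixed input. In the sketch below, the standard deviation 0.26 is the real-data value reported in this paper, while the error bounds 0.05 and 0.025 are hypothetical choices used only to show the behaviour of the formula; the function name is an assumption.

```python
import math
from statistics import NormalDist

def sample_size(sd_hat, epsilon, alpha=0.05):
    """Return n from Eq. (21), rounded up to the next integer."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return math.ceil((z * sd_hat / epsilon) ** 2)

n_loose = sample_size(0.26, 0.05)     # hypothetical error bound of 0.05
n_tight = sample_size(0.26, 0.025)    # halving the bound roughly quadruples n
print(n_loose, n_tight)               # 104 416
```

Rounding up with `math.ceil` keeps the confidence level from falling below the nominal value, and the two calls illustrate that a smaller error bound demands a larger n.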

Real data numerical results

Normality test for the estimated LPSI and CLPSI values

For the estimated LPSI values, the Shapiro–Wilk and Kolmogorov–Smirnov test values were 0.985 and 0.075, respectively, while for the estimated CLPSI values, those test values were 0.989 and 0.080, respectively. Thus, we assumed that the estimated index values approach the normal distribution.

Histograms and quantile–quantile plots for the estimated LPSI and CLPSI values

With the estimated LPSI and CLPSI values, we constructed histograms (Fig. 1a, b) and quantile–quantile plots (Fig. 1c, d). The histograms of both indices (Fig. 1a, b) do not show a strong negative or positive skew, while in Fig. 1c, d, the estimated LPSI and CLPSI values form a straight line in the quantile–quantile plots. Thus, the estimated LPSI and CLPSI values approach the normal distribution.

Estimate of the maximized LPSI and CLPSI selection responses

For a selection intensity of 10% (k = 1.755), the estimate of the maximized LPSI response was 5.87, whereas the estimate of the maximized CLPSI selection response was 5.74. That is, the estimated selection responses of both indices were very similar. This means that the CLPSI constraint mainly affected the CLPSI expected genetic gains per trait.

Estimated bias, standard deviation and expectation of the estimator of the maximized LPSI and CLPSI selection responses

The bias of the estimator of the maximized LPSI and CLPSI selection responses was equal to 0.006. That is, the estimated bias was the same for both indices. In a similar manner, the standard deviation of the estimator of the maximized LPSI and CLPSI selection responses was 0.26, whereas the expectations of the estimator of the maximized LPSI and CLPSI selection responses were 5.86 and 5.73. These last two values were very similar to the estimated values of the maximized LPSI and CLPSI responses (5.87 and 5.74, respectively). The 95% confidence intervals for the E(R^max) of the estimated LPSI and CLPSI selection responses were, respectively, (5.35, 6.37) and (5.22, 6.24).

Numerical results of the simulated data

For seven simulated selection cycles, in Table 1, we present the Shapiro–Wilk and Kolmogorov–Smirnov statistical test values, the estimated standard deviation, bias, the estimated mean-squared error ($\widehat{\mathrm{MSE}}$), the estimated maximized selection response ($\hat{R}_{\max}$), its estimated expectation [$\hat{E}(\hat{R}_{\max})$], and 95% confidence interval for the $E(\hat{R}_{\max})$ of the LPSI and CLPSI, respectively.

Table 1. Shapiro–Wilk and Kolmogorov–Smirnov (SW and KS, respectively) statistical test values; estimated unconstrained and constrained linear phenotypic selection indices (LPSI and CLPSI, respectively) standard deviation (SD), bias, mean-squared error (MSE), maximized selection response ($\hat{R}_{\max}$ and $\hat{R}_{\max C}$), expectation [$\hat{E}(\hat{R}_{\max})$ and $\hat{E}(\hat{R}_{\max C})$], and 95% confidence interval (CI, LCL lower confidence limit, UCL upper confidence limit) for seven simulated selection cycles when the genotypic covariance matrix was estimated

Estimated LPSI parameters

| Cycle | SW | KS | SD | Bias | MSE | $\hat{R}_{\max}$ | $\hat{E}(\hat{R}_{\max})$ | 95% CI LCL | 95% CI UCL |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.996 | 0.035 | 0.57 | 0.009 | 0.32 | 17.81 | 17.80 | 16.68 | 18.92 |
| 2 | 0.995 | 0.042 | 0.50 | 0.008 | 0.25 | 15.69 | 15.68 | 14.70 | 16.66 |
| 3 | 0.997 | 0.024 | 0.45 | 0.007 | 0.20 | 14.21 | 14.21 | 13.33 | 15.09 |
| 4 | 0.998 | 0.037 | 0.46 | 0.007 | 0.21 | 14.34 | 14.34 | 13.44 | 15.24 |
| 5 | 0.997 | 0.024 | 0.44 | 0.007 | 0.19 | 13.64 | 13.63 | 12.77 | 14.49 |
| 6 | 0.996 | 0.027 | 0.39 | 0.006 | 0.15 | 12.04 | 12.03 | 11.27 | 12.79 |
| 7 | 0.996 | 0.035 | 0.36 | 0.006 | 0.13 | 11.61 | 11.60 | 10.89 | 12.31 |
| Average | 0.997 | 0.032 | 0.46 | 0.007 | 0.21 | 14.19 | 14.18 | 13.30 | 15.07 |

Estimated CLPSI parameters

| Cycle | SW | KS | SD | Bias | MSE | $\hat{R}_{\max C}$ | $\hat{E}(\hat{R}_{\max C})$ | 95% CI LCL | 95% CI UCL |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.998 | 0.024 | 0.50 | 0.008 | 0.25 | 15.79 | 15.78 | 14.80 | 16.76 |
| 2 | 0.996 | 0.032 | 0.47 | 0.008 | 0.22 | 14.98 | 14.97 | 14.05 | 15.89 |
| 3 | 0.998 | 0.024 | 0.42 | 0.007 | 0.18 | 13.58 | 13.57 | 12.75 | 14.39 |
| 4 | 0.998 | 0.038 | 0.39 | 0.006 | 0.15 | 12.36 | 12.36 | 11.60 | 13.12 |
| 5 | 0.996 | 0.025 | 0.40 | 0.006 | 0.16 | 12.80 | 12.79 | 12.01 | 13.57 |
| 6 | 0.995 | 0.025 | 0.36 | 0.006 | 0.13 | 11.23 | 11.23 | 10.52 | 11.94 |
| 7 | 0.995 | 0.031 | 0.36 | 0.006 | 0.13 | 11.23 | 11.23 | 10.52 | 11.94 |
| Average | 0.997 | 0.028 | 0.42 | 0.007 | 0.17 | 13.14 | 13.13 | 12.32 | 13.94 |

Normality test for the estimated LPSI and CLPSI values

The averages of the Shapiro–Wilk and Kolmogorov–Smirnov normality test values for the seven simulated selection cycles associated with the estimated LPSI values were 0.997 and 0.032, respectively, whereas those values associated with the estimated CLPSI values were 0.997 and 0.028 (Table 1), respectively; thus, we assumed that the estimated values of both indices approach the normal distribution.

Estimated standard deviation, bias and MSE of the estimator of the maximized LPSI and CLPSI selection responses

The averages of the estimated standard deviation of the estimator of the maximized LPSI and CLPSI selection responses were 0.46 and 0.42, respectively, whereas the average of the estimated bias for both indices was equal to 0.007. In addition, the averages of the estimated MSE of the estimator of the maximized LPSI and CLPSI selection responses were 0.21 and 0.17, respectively (Table 1). Because the bias and MSE were small relative to the average estimated responses (14.19 and 13.14), the estimators of the maximized LPSI and CLPSI selection responses performed well.

Estimates of the maximized LPSI and CLPSI selection responses, expectation and confidence intervals

For a selection intensity of 10% (k = 1.755), the averages of the estimates of the maximized LPSI and CLPSI selection response values were 14.19 and 13.14, respectively (Table 1). Thus, since the estimated responses of both indices were very similar, the CLPSI constraint mainly affected the CLPSI expected genetic gains per trait.

The averages of the estimated values of the expectations of the estimator of the maximized LPSI and CLPSI selection responses were 14.18 and 13.13. These last two values were very similar to the estimated values of the maximized LPSI and CLPSI responses (14.19 and 13.14, respectively). In addition, the averages of the estimated values of the 95% confidence intervals for the expectations of the estimator of the maximized LPSI and CLPSI selection responses were (13.30, 15.07) and (12.32, 13.94).
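The confidence limits in Table 1 can be recovered from the tabulated expectation and standard deviation. The following Python sketch assumes a normal-theory interval of the form expectation ± Z·SD; this form is an assumption inferred from the tabulated values (the interval equation is not restated in this section), and the function name `response_ci` is illustrative.

```python
from statistics import NormalDist

def response_ci(expectation, sd, alpha=0.05):
    """Normal-theory 100(1 - alpha)% interval: expectation +/- z * sd (assumed form)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    return expectation - z * sd, expectation + z * sd

# Cycle 1 of Table 1 (LPSI): estimated expectation 17.80 and SD 0.57
lcl, ucl = response_ci(17.80, 0.57)
print(round(lcl, 2), round(ucl, 2))
```

Under this assumption, the limits for the other cycles follow in the same way from their own expectation and SD.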

Estimator of the maximized LPSI and CLPSI selection responses using C

For seven simulated selection cycles, in Table 2, we present the estimated LPSI and CLPSI standard deviation, bias, mean-squared error, maximized selection response, expectation, 95% confidence interval for E(R^max) and E(R^maxC) and response upper bound when the genotypic covariance matrix C is known. When we compared those parameters with those obtained with C^ (Table 1), the results were basically the same; that is, the estimated LPSI and CLPSI parameters were very similar whether we used C^ or C. This means that the REML estimate C^ is a good estimator of C, at least for this simulated dataset. Finally, note that the average values of the upper boundary for $R$ ($k\sqrt{w'Cw}$) and $R_C$ ($k\sqrt{\delta'C\delta}$) presented in Table 2 were higher than the estimated maximized LPSI and CLPSI selection responses for C^ and C, as we would expect.

Table 2.

Estimates of the unconstrained and constrained linear phenotypic selection indices (LPSI and CLPSI, respectively) standard deviation (SD), bias, mean-squared error (MSE), maximized selection response (R~max and R~maxC), expectation [E~(R~max) and E~(R~maxC)], 95% confidence interval (CI, LCL lower confidence limit, UCL upper confidence limit) for E(R~max) and response upper bound (Rmax and RmaxC), for seven simulated selection cycles when the genotypic covariance matrix is known

Estimated LPSI parameters when the genotypic covariance matrix is known Upper bound
Cycle SD bias MSE R~max E~(R~max) LCL UCL Rmax
1 0.556 0.009 0.309 17.559 17.550 16.469 18.648 19.63
2 0.480 0.008 0.231 15.179 15.172 14.238 16.121 17.56
3 0.451 0.007 0.204 14.261 14.254 13.376 15.146 16.49
4 0.437 0.007 0.191 13.797 13.790 12.941 14.653 16.32
5 0.435 0.007 0.189 13.742 13.735 12.889 14.594 15.99
6 0.392 0.006 0.154 12.387 12.381 11.619 13.156 14.69
7 0.409 0.006 0.168 12.935 12.928 12.132 13.737 14.90
Average 0.452 0.007 0.206 14.266 14.259 13.381 15.151 16.511
Estimated CLPSI parameters when the genotypic covariance matrix is known Upper bound
Cycle SD bias MSE R~maxC E~(R~maxC) LCL UCL RmaxC
1 0.497 0.008 0.247 15.700 15.692 14.726 16.674 17.47
2 0.456 0.007 0.208 14.391 14.384 13.499 15.284 16.24
3 0.420 0.007 0.176 13.266 13.259 12.443 14.089 15.15
4 0.387 0.006 0.150 12.215 12.209 11.457 12.973 13.95
5 0.395 0.006 0.156 12.466 12.460 11.692 13.239 14.28
6 0.362 0.006 0.131 11.443 11.437 10.733 12.153 13.11
7 0.361 0.006 0.130 11.404 11.399 10.697 12.112 13.14
Average 0.411 0.007 0.171 12.984 12.977 12.178 13.789 14.763

Variance and confidence interval for the LPSI and CLPSI correlations using C and C^

Using the known (C) and estimated (C^) genotypic covariance matrix, in Table 3, we present the estimated LPSI and CLPSI correlation coefficients when the genotypic covariance matrix is known (ρ~max and ρ~maxC) and estimated (r^max and r^maxC), standard deviation (SDρ~max, SDρ~maxC, SDr^max and SDr^maxC), and 95% confidence intervals for the true unknown correlation (ρmax and ρmaxC) for seven simulated selection cycles. For both indices, the estimated parameters were very similar when we used C^ and C. This means that the REML estimate C^ was a good estimator of C, at least for this simulated dataset.

Table 3.

Estimated unconstrained and constrained linear phenotypic selection indices (LPSI and CLPSI, respectively) correlation coefficients when the genotypic covariance matrix is known (ρ~max and ρ~maxC) and estimated (r^max and r^maxC); standard deviation (SDρ~max, SDρ~maxC, SDr^max and SDr^maxC) and 95% confidence interval (CI, LCL lower confidence limit, UCL upper confidence limit) for the true unknown correlation (ρmax and ρmaxC) for seven simulated selection cycles

LPSI correlation coefficient
Genotypic covariance matrix known Estimated Genotypic covariance matrix
Cycle ρ~max SDρ~max LCL UCL r^max SDr^max LCL UCL
1 0.894 0.009 0.875 0.911 0.906 0.008 0.875 0.911
2 0.864 0.011 0.840 0.885 0.883 0.010 0.840 0.885
3 0.865 0.011 0.841 0.885 0.866 0.011 0.841 0.885
4 0.845 0.013 0.818 0.869 0.863 0.011 0.818 0.869
5 0.859 0.012 0.834 0.881 0.855 0.012 0.834 0.881
6 0.843 0.013 0.816 0.867 0.830 0.014 0.816 0.867
7 0.868 0.011 0.845 0.888 0.832 0.014 0.845 0.888
Average 0.863 0.011 0.839 0.884 0.862 0.011 0.839 0.884
CLPSI correlation coefficient
Genotypic covariance matrix known Estimated genotypic covariance matrix
Cycle ρ~maxC SDρ~maxC LCL UCL r^maxC SDr^maxC LCL UCL
1 0.800 0.016 0.766 0.829 0.803 0.016 0.769 0.832
2 0.819 0.015 0.788 0.846 0.842 0.013 0.815 0.866
3 0.804 0.016 0.771 0.833 0.827 0.014 0.797 0.853
4 0.748 0.020 0.707 0.785 0.744 0.020 0.702 0.781
5 0.779 0.018 0.742 0.812 0.803 0.016 0.769 0.832
6 0.779 0.018 0.742 0.811 0.775 0.018 0.738 0.808
7 0.765 0.019 0.727 0.800 0.805 0.016 0.772 0.834
Average 0.785 0.017 0.749 0.817 0.800 0.016 0.766 0.829

Discussion

The multivariate normality assumption

The study of quantitative traits (QTs) in plants and animals is based on the mean and variance of QT phenotypic values. Quantitative traits are phenotypic expressions of plant and animal characteristics that show continuous variability and are the result of many gene effects interacting among them and with the environment (Cerón-Rojas and Crossa 2018, Chapter 2). That is, QTs are the result of unobservable gene effects distributed across plant or animal genomes, which interact among themselves and with the environment to produce the observable characteristic plant and animal phenotypes. The traits that concern plant and animal breeders the most are QTs. They are particularly difficult to analyze because heritable variations of QTs are masked by larger nonheritable variations that make it difficult to determine the genotypic values of individual plants or animals (Smith 1936). However, since QTs usually have normal distribution, it is possible to apply normal distribution theory when analyzing this type of data.

In the context of plant and animal breeding, the most important distribution theory associated with the QTs is the multivariate normal distribution, which has been the basis for developing the LSI theory. Under the multivariate normal distribution assumption, means, variances and covariances completely describe the index and trait values. In addition, if the trait values are not correlated, they are independent; linear combinations of traits are normal; and even when the trait phenotypic values do not have normal distribution, this distribution serves as a useful approximation, especially in inferences involving sample mean vectors, which, by the central limit theorem, have approximately multivariate normal distribution (Rencher 2002, Chapter 4). For these reasons, a fundamental assumption in this work was that the trait values have multivariate normal distribution and that the net genetic merit and the index values have bivariate normal distribution. Under the latter assumption, the regression of the net genetic merit on any linear function of the phenotypic values is linear (Kempthorne and Nordskog 1959).

Based on the normality assumption of the estimated LPSI and CLPSI values, we obtained the expectation and variance of the estimator of the maximized LPSI and CLPSI selection responses. The histograms, quantile–quantile plots and the Shapiro–Wilk and Kolmogorov–Smirnov normality tests of the estimated LPSI and CLPSI values indicated that these values approached the normal distribution. Thus, our results were valid under the normality assumption of the estimated LPSI and CLPSI values.

The expectation and variance of $S_I^2$ and $S_{I_C}^2$

The expectation and variance of $S_I^2$ and $S_{I_C}^2$ were the basis for obtaining the expectation and variance of the estimator of the maximized LPSI and CLPSI selection responses. According to Montgomery and Ruger (2003, Chapter 7), $S_I^2$ and $S_{I_C}^2$ are unbiased estimators of the corresponding variances. In addition, using the maximum likelihood estimator of the variance of the estimated LPSI and CLPSI values ($S_I^2=n^{-1}\sum_{j=1}^{n}(\hat{I}_j-\hat{m})^2$ and $S_{I_C}^2=n^{-1}\sum_{j=1}^{n}(\hat{I}_{Cj}-\hat{\mu})^2$, respectively), it can be shown that Eq. (A15) (Appendix D) can be written as $\frac{2(\sigma_I^2)^2}{n}$ (Stuart and Ord 1987, Chapter 10). These results were similar to our result and did not affect the expectation and variance of the estimated maximized LPSI and CLPSI selection responses because, to obtain this expectation and variance, we assumed that $E(S_I^2)=\sigma_I^2$.

Using the Delta method, Lynch and Walsh (1998, Appendix 1) showed that $\frac{2(S_I^2)^2}{n+2}$ is an unbiased estimator of the variance of $S_I^2$ (Eq. A15, Appendix D) when this variance is obtained as $\frac{2(\sigma_I^2)^2}{n}$. By the Lynch and Walsh (1998, Appendix 1) results, the bias of the expectation of the estimator of the maximized selection response can be written as $\frac{k\sigma_I}{4(n+2)}$ and its estimate as $\frac{kS}{4(n+2)}$. In a similar manner, the variance of the estimator of the maximized selection response can be written as $\frac{k^2\sigma_I^2}{2(n+2)}$ and its estimate as $\frac{k^2S^2}{2(n+2)}$. We would expect that the difference between the results we obtained with our equations and those that are possible to obtain with the Lynch and Walsh (1998, Appendix 1) results would be minimal.
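A minimal numerical sketch of this comparison follows. It assumes that our own expressions have the same form with $n-1$ in place of $n+2$; this follows from applying Eqs. (A16) and (A17) to Eq. (A15), but Eqs. (14) and (15) are not restated here, so the $n-1$ variant should be read as an assumption. The function name and arguments are illustrative.

```python
import math

def response_bias_and_sd(r_hat, n, divisor_offset):
    """Estimated bias and SD of the estimated maximized response r_hat = k * S.

    divisor_offset = -1 uses n - 1 (assumed form of the Delta-method result);
    divisor_offset = +2 uses n + 2 (Lynch and Walsh variance estimator).
    """
    m = n + divisor_offset
    bias = r_hat / (4 * m)          # k * S / (4 * m)
    sd = r_hat / math.sqrt(2 * m)   # square root of k^2 * S^2 / (2 * m)
    return bias, sd

# Cycle 1 of Table 1: estimated response 17.81 with n = 500 simulated index values
for offset in (-1, 2):
    bias, sd = response_bias_and_sd(17.81, 500, offset)
    print(offset, round(bias, 4), round(sd, 3))
```

With these inputs, the two divisors give nearly identical bias and SD, which is the behavior the paragraph above anticipates for large $n$.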

Let $MSE_1$ be the mean-squared error of the estimator of the variance of the selection response when we use Eq. (A15, Appendix D), and let $MSE_2$ be the mean-squared error of the estimator of the variance of the selection response when we use $\frac{2(S_I^2)^2}{n+2}$ to estimate $Var(\hat{R})$ (Eq. 15). Montgomery and Ruger (2003, Chapter 7) have indicated that a good criterion for comparing the relative efficiency of two different estimators is the ratio $\frac{MSE_1}{MSE_2}$. In the present case, this ratio is equal to $\frac{MSE_1}{MSE_2}=\frac{(n+2)^2[8(n-1)+1]}{(n-1)^2[8(n+2)+1]}$, which is independent of $S_I^2$, and when $n$ is large, it is close to 1.0, as we would expect. Thus, we would expect that both approaches would be similar.
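The behavior of this ratio can be checked directly. The short Python sketch below evaluates it for a small sample and for the two sample sizes used in this work (n = 247 and n = 500); the function name `mse_ratio` is illustrative.

```python
def mse_ratio(n):
    """MSE1 / MSE2 for sample size n; the index variance cancels out of the ratio."""
    return ((n + 2) ** 2 * (8 * (n - 1) + 1)) / ((n - 1) ** 2 * (8 * (n + 2) + 1))

for n in (10, 247, 500):
    print(n, round(mse_ratio(n), 4))
```

The ratio is always above 1 and decreases toward 1 as $n$ grows, so the two approaches differ appreciably only for small samples.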

The standard deviation of $S_I^2$ and $S_{I_C}^2$

Due to Jensen’s inequality, $E(S_I)=E[(S_I^2)^{1/2}]<[E(S_I^2)]^{1/2}=\sigma_I$ (Patel and Read 1996, Chapter 5). This means that the standard deviation of the estimated LPSI and CLPSI values ($S_I$ and $S_{I_C}$, respectively) underestimates $\sigma_I=\sqrt{b'Pb}$ and $\sigma_{I_C}=\sqrt{\beta'P\beta}$.

An unbiased estimator of $\sigma_I$ ($\sigma_{I_C}$) is $S_I/c(n)$ [i.e., $E(S_I)=c(n)\sigma_I$], where $c(n)=\sqrt{\frac{2}{n-1}}\,\frac{\Gamma(n/2)}{\Gamma\left(\frac{n-1}{2}\right)}\approx 1-\frac{1}{4n}-\frac{7}{32n^2}-\frac{19}{128n^3}$ is a correction factor (Johnson et al. 1994, Chapter 13; Montgomery and Ruger 2003, Chapter 7). However, when we used $c(n)$ to correct $S_I$ (data not presented), we did not find that $c(n)$ affected the expectation and variance of the estimated selection response. Johnson et al. (1994, Chapter 13) found that, in practice, $c(n)$ only affects $S_I$ when $n\le 10$. Thus, when $n=247$ (real data) or $n=500$ (simulated data), the results should not be affected by $c(n)$.
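To see how quickly the correction factor approaches 1, the following Python sketch evaluates $c(n)$ both from the Gamma-function expression and from the truncated series. It is an illustration only; the function names are hypothetical, and the log-gamma function is used simply to avoid overflow for large $n$.

```python
import math

def c_exact(n):
    """c(n) = sqrt(2 / (n - 1)) * Gamma(n / 2) / Gamma((n - 1) / 2), via log-gamma."""
    log_ratio = math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
    return math.sqrt(2.0 / (n - 1)) * math.exp(log_ratio)

def c_series(n):
    """Truncated series 1 - 1/(4n) - 7/(32 n^2) - 19/(128 n^3)."""
    return 1 - 1 / (4 * n) - 7 / (32 * n ** 2) - 19 / (128 * n ** 3)

for n in (5, 10, 247, 500):
    print(n, round(c_exact(n), 5), round(c_series(n), 5))
```

For the sample sizes used here the factor differs from 1 by well under one percent, which is consistent with the correction having no practical effect on the results.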

Note that $c(n)\sigma_I$ is the expectation of a Nakagami-m distribution (Ramos et al. 2015). Patel and Read (1996, Chapter 5) indicated that this result is valid only when $E(S_I)$ is obtained with respect to the origin of the distribution of $S_I$, but when this expectation is obtained with respect to the average value of $S_I$, there is no concise expression for $E(S_I)$. These authors presented equations for the expectation and variance of $S_I$ that are very similar to those presented in Eqs. (14) and (15) of this work. That is, the Patel and Read (1996) results were in agreement with our results.

The constrained LPSI (CLPSI)

The CLPSI solves the LPSI equations subject to the restriction that the covariance between the CLPSI and some linear combinations of the genotypes involved be equal to a vector of predetermined proportional gains (or constraints) imposed by the breeder. These constraints are similar to the null restriction imposed by the restricted LPSI (RLPSI), which imposes restrictions equal to zero on the expected genetic advances of some traits, while the expected genetic advances of other traits increase (or decrease) without any restrictions. The RLPSI solves the usual LPSI equations subject to the restriction that the covariance between the LPSI and some linear functions of the genotypes involved be equal to zero, thus preventing selection on the index from causing any genetic change in the expected genetic advance of the restricted traits (Cunningham et al. 1970). Although both constraints are similar, their effects on the maximized selection response, the expected genetic gain per trait and the correlation coefficient are different.

The RLPSI uses a projector matrix to project the LPSI vector of coefficients into a space smaller than the original space of the LPSI vector of coefficients. The reduction of the space into which the RLPSI matrix projects the LPSI vector of coefficients is equal to the number of zeros that appear in the expected genetic gain per trait, and the selection response and correlation coefficient decrease as the number of restrictions increases (Cerón-Rojas and Crossa 2018, Chapter 3). In contrast, the CLPSI constraints affect only the expected genetic gain per trait, not the maximized CLPSI selection response (Cerón-Rojas and Crossa 2019). In addition, the maximized CLPSI correlation coefficient is only affected when the number of constraints is equal to or higher than three, but even in this case, the effect may not be significant, as we saw in this work. Thus, the CLPSI is a good predictor of the net genetic merit and breeders can use it with confidence.

The estimated LPSI and CLPSI parameters when the genotypic covariance matrix is known and estimated

While the sampling properties of the estimator of the phenotypic covariance matrix are well known (Rencher and Schaalje 2008), those of the estimator of the genotypic covariance matrix are not. For this reason, in this work, we estimated and compared the LPSI and CLPSI parameters when the genotypic covariance matrix is known and when it is estimated. The results indicated that the differences were not significant; thus, when the phenotypic and genotypic covariance matrices are estimated by REML, breeders can use the LPSI and CLPSI with confidence.

Other LSIs associated with the LPSI and CLPSI

The LPSI and the CLPSI are optimal LSIs when the phenotypic ($P$) and the genotypic ($C$) covariance matrices are known. In practice, however, it is necessary to estimate such matrices. When the estimator of the phenotypic covariance matrix ($\hat{P}$) is not positive definite (all eigenvalues positive) or the estimator of the genotypic covariance matrix ($\hat{C}$) is not positive semidefinite (no negative eigenvalues), the estimator of the LPSI and CLPSI vector of coefficients could be biased when the sample size is small. For this reason, Williams (1962b) proposed using the base linear phenotypic selection index ($I_B=w'y$), which could be a better predictor of $H=w'g$ than the estimated LPSI $\hat{I}=\hat{b}'y$ if the vector of economic values $w$ is indeed known. In that case, $I_B$ has certain advantages because of its simplicity and its freedom from parameter estimation errors. Williams (1962b) pointed out that $I_B$ is superior to $\hat{I}$ unless a large amount of data is available for estimating $P$ and $C$; however, the availability of accurate and fast algorithms for estimating $P$ and $C$ by REML, such as those implemented in RIndSel (Cerón-Rojas and Crossa 2018, Chapter 11), makes $\hat{I}$ a good option for selection. RIndSel (R software to analyze Selection Indices) is a graphical user interface that uses selection index theory to select individual candidates as parents for the next selection cycle in the phenotypic and genomic selection context.

There are some problems associated with $I_B$. For example, what is its selection response when no data are available for estimating $P$ and $C$? $I_B$ is a better selection index than the LPSI only if the correlation between $I_B$ and the net genetic merit is higher than that between the LPSI and the net genetic merit (Hazel 1943). But if estimates of $P$ and $C$ are not available, how can we obtain the correlation between the base index and the net genetic merit? Williams (1962a) pointed out that the correlation between $I_B$ and $H$ can be written as $\rho_B=\sqrt{\frac{w'Cw}{w'Pw}}$ and indicated that the ratio $\rho_B/\rho$ ($\rho$ is the correlation between the LPSI and $H$; see Eqs. 3 and 4b) can be used to compare LPSI efficiency vs. $I_B$ efficiency; however, in the latter case, we at least need to know the estimates of $P$ and $C$, i.e., $\hat{P}$ and $\hat{C}$. For this reason, we think that breeders should use the LPSI when the population size is sufficiently large.
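The efficiency ratio $\rho_B/\rho$ can be illustrated with a small two-trait example. In the Python sketch below, the matrices $P$ and $C$ and the vector $w$ are made-up values, not data from this study, and the helper names are hypothetical. The LPSI correlation is computed as $\sqrt{b'Pb/w'Cw}$ with $b=P^{-1}Cw$, which is the standard LPSI identity and is assumed here to correspond to Eqs. (3) and (4b).

```python
import math

def quad(x, A, y):
    """Bilinear form x' A y for small matrices stored as nested lists."""
    return sum(x[i] * A[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))

def solve_2x2(A, v):
    """Solve A b = v for a 2 x 2 matrix A by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(v[0] * A[1][1] - A[0][1] * v[1]) / det,
            (A[0][0] * v[1] - A[1][0] * v[0]) / det]

# Hypothetical two-trait inputs (illustrative values only)
P = [[10.0, 2.0], [2.0, 8.0]]   # phenotypic covariance matrix
C = [[6.0, 1.0], [1.0, 1.0]]    # genotypic covariance matrix
w = [1.0, 1.0]                  # economic weights

Cw = [sum(C[i][j] * w[j] for j in range(2)) for i in range(2)]
b = solve_2x2(P, Cw)                                   # LPSI coefficients b = P^{-1} C w
rho_lpsi = math.sqrt(quad(b, P, b) / quad(w, C, w))    # correlation between LPSI and H
rho_base = math.sqrt(quad(w, C, w) / quad(w, P, w))    # correlation between base index and H
print(round(rho_lpsi, 4), round(rho_base, 4), round(rho_base / rho_lpsi, 4))
```

Because the LPSI maximizes the correlation with $H$, the ratio cannot exceed 1 when $P$ and $C$ are known; with estimated matrices the comparison depends on the quality of $\hat{P}$ and $\hat{C}$.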

An index similar to the CLPSI described in this work is the desired gains linear phenotypic selection index (Pesek and Baker 1969). The most important aspect of this index is that it does not require economic weights. Its main problem is that it maximizes neither the correlation between $I$ and $H$ ($\rho$) nor the selection response, because the covariance between $I$ and $H$ ($Cov(H,I)=w'Cb$) is not defined: $w'Cb$ requires the vector of economic weights $w$, which that index does not use (Itoh and Yamada 1986, 1988). Another problem is that this index is not associated with $H$; therefore, it is not a predictor of $H$, and $\rho$ and the selection response may not be at their maximum. For this reason, we think that breeders should use the CLPSI described in this work when making selection.

Conclusions

We described a method to obtain the expectation and variance of the estimator of the maximized selection response for unconstrained and constrained linear phenotypic selection indices. The estimator of the maximized selection response was the square root of the variance of the estimated LSI values multiplied by the selection intensity. The expectation and variance allow the breeder to construct confidence intervals and determine the appropriate sample size to complete the analysis of a selection process. We validated the theoretical results in the phenotypic selection context using real and simulated datasets. We concluded that our results are valid for any LSI with normal distribution and that the method described in this work is useful for finding the expectation and variance of the estimator of any LSI response in the phenotypic or genomic selection context.

Acknowledgements

We thank all scientists, field workers and laboratory assistants from National Programs and CIMMYT who collected the real data used in this study. We acknowledge the financial support provided by the Foundation for Research Levy on Agricultural Products (FFL) and the Agricultural Agreement Research Fund (JA) in Norway through NFR grant 267806. We are also thankful for the financial support provided by CIMMYT CRP (maize and wheat), the Bill & Melinda Gates Foundation, as well as the USAID projects (Cornell University and Kansas State University).

Appendix A

Upper boundary for the maximized LPSI and CLPSI selection response

By the Cauchy–Schwarz inequality (Sorensen and Gianola 2002, Chapter 2), the relationship among the variances of $H=w'g$ and $I=b'y$ ($w'Cw$ and $b'Pb$, respectively) and their covariance ($w'Cb$) is $(w'Cb)^2\le(w'Cw)(b'Pb)$. But because the LPSI vector of coefficients is $b=P^{-1}Cw$, that relationship can be written as

$$w'CP^{-1}Cw\le w'Cw. \tag{A1}$$

The maximized LPSI selection response is $R_{max}=k\sqrt{b'Pb}=k\sqrt{w'CP^{-1}Cw}$; thus, the upper boundary for $R_{max}$ is $k\sqrt{w'Cw}$, i.e.,

$$R_{max}=k\sqrt{w'CP^{-1}Cw}\le k\sqrt{w'Cw}. \tag{A2}$$

Equation (A2) indicates that if $P=C$, then $b=w$ and $R_{max}=k\sqrt{w'Cw}$. This is the maximum possible value of $R_{max}$. Cerón-Rojas and Crossa (2019) showed that the upper boundary for the maximized CLPSI selection response is

$$k\sqrt{\delta'C\delta}, \tag{A3}$$

where $\delta=K_Tw$, $K_T=[I_t-Q_T]$ and $Q_T=UD(D'U'CUD)^{-1}D'U'C$.
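One consequence of Eq. (A2) can be checked against the tabulated results. Because the response is $k\sqrt{b'Pb}$ and its bound is $k\sqrt{w'Cw}$, their ratio is $\sqrt{b'Pb/w'Cw}$, which for $b=P^{-1}Cw$ is the correlation between the LPSI and $H$ (a standard identity, taken here as an assumption because the correlation equation is not restated in this appendix). The Python sketch below compares that ratio, computed from the LPSI rows of Table 2, with the correlations reported in Table 3 for the known covariance matrix.

```python
# (estimated response, upper bound) per cycle from Table 2, LPSI rows, C known
table2 = [(17.559, 19.63), (15.179, 17.56), (14.261, 16.49), (13.797, 16.32),
          (13.742, 15.99), (12.387, 14.69), (12.935, 14.90)]
# LPSI correlation per cycle from Table 3, C known
rho_table3 = [0.894, 0.864, 0.865, 0.845, 0.859, 0.843, 0.868]

ratios = [r_max / bound for r_max, bound in table2]
for cycle, (ratio, rho) in enumerate(zip(ratios, rho_table3), start=1):
    print(cycle, round(ratio, 3), rho)
```

Every ratio is below 1, as Eq. (A2) requires, and each agrees with the tabulated correlation up to rounding of the inputs.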

Variance and confidence intervals for ρmax

As $H=w'g$ and $I=b'y$ have bivariate normal distribution, the standard deviation of the estimator of $\rho_{max}$ is

$$\frac{1-\rho_{max}^2}{\sqrt{n}}, \tag{A4}$$

while an approximate 100(1 − α)% confidence interval for $\rho_{max}$ is

$$\tanh\left(v-\frac{Z_{\alpha/2}}{\sqrt{n-3}}\right)\le\rho_{max}\le\tanh\left(v+\frac{Z_{\alpha/2}}{\sqrt{n-3}}\right), \tag{A5}$$

where $\tanh(\cdot)$ is the hyperbolic tangent function and $v=\tanh^{-1}(\hat{r}_{max})$ is its inverse evaluated at $\hat{r}_{max}$, an estimate of $\rho_{max}$; $Z_{\alpha/2}$ is the upper 100α/2 percentage point of the standard normal distribution, and $0\le\alpha\le1$ determines the level of confidence (Rencher and Schaalje 2008, Chapter 10). Equations (A4) and (A5) are also valid for the CLPSI.
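Equations (A4) and (A5) are straightforward to evaluate. The Python sketch below applies them to the first CLPSI cycle of Table 3 with the estimated covariance matrix (correlation 0.803, n = 500); the function names are illustrative, and the estimated correlation is plugged into Eq. (A4) in place of the unknown $\rho_{max}$.

```python
import math
from statistics import NormalDist

def correlation_sd(rho, n):
    """Standard deviation of the estimated correlation, Eq. (A4)."""
    return (1 - rho ** 2) / math.sqrt(n)

def correlation_ci(r_hat, n, alpha=0.05):
    """Approximate 100(1 - alpha)% interval of Eq. (A5), using the inverse hyperbolic tangent."""
    v = math.atanh(r_hat)
    half_width = NormalDist().inv_cdf(1 - alpha / 2) / math.sqrt(n - 3)
    return math.tanh(v - half_width), math.tanh(v + half_width)

sd = correlation_sd(0.803, 500)
lcl, ucl = correlation_ci(0.803, 500)
print(round(sd, 3), round(lcl, 3), round(ucl, 3))
```

The interval is asymmetric around the estimate because the transformation is nonlinear; this keeps both limits inside (−1, 1).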

Appendix B

Fourier transform ($F_t[f_X(x)]$) of $f_X(x)$

The basis for analyzing distributions of sums of continuous random variables that take on both positive and negative values is the Fourier transform, which allows deriving the probability density function of their sums. In this appendix, we give a brief review of the Fourier transform theory.

Let $f_X(x)$, $-\infty<x<\infty$, be a single-valued real function such that the integral

$$\int_{-\infty}^{\infty}\left|f_X(x)\right|e^{itx}dx \tag{A6}$$

converges for some real value of $t$, where $i=\sqrt{-1}$ and $|\cdot|$ denotes the absolute value; then, $f_X(x)$ is said to be Fourier transformable, and

$$F_t[f_X(x)]=\int_{-\infty}^{\infty}e^{itx}f_X(x)dx \tag{A7}$$

is the Fourier transform of $f_X(x)$ (Springer 1979, Chapter 2). Equation (A7) is also called the characteristic function of the random variable $X$ and can be denoted as $\phi_X(t)=E(e^{itX})$. This is the expectation of a complex function, and since $|e^{itX}|=|\cos tX+i\sin tX|=1$, Equation (A7) always exists. Furthermore, when $t=0$, $\phi_X(0)=1$, and $|\phi_X(t)|\le1$ (Soong 2004, Chapter 4).

For $F_t[f_X(x)]$, there is a corresponding inverse transform, which can be written as

$$f_X(x)=\frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-itx}F_t[f_X(x)]dt. \tag{A8}$$

Equation (A8) shows that knowledge of the Fourier transform, or characteristic function (Eq. A7), specifies the distribution of $X$. Furthermore, $f_X(x)$ is uniquely determined from Eq. (A8); that is, no two distinct density functions can have the same characteristic function (Springer 1979, Chapter 2; Soong 2004, Chapter 4).
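As a concrete illustration of Eq. (A7), the Python sketch below approximates the characteristic function of a standard normal density by numerical integration and compares it with the known closed form $e^{-t^2/2}$. The trapezoid rule, the integration limits and the function names are choices made for this example only.

```python
import cmath
import math

def characteristic_function(density, t, lower=-12.0, upper=12.0, steps=24000):
    """Approximate the integral of exp(i t x) f(x) dx with the trapezoid rule."""
    h = (upper - lower) / steps
    total = 0.5 * (cmath.exp(1j * t * lower) * density(lower)
                   + cmath.exp(1j * t * upper) * density(upper))
    for j in range(1, steps):
        x = lower + j * h
        total += cmath.exp(1j * t * x) * density(x)
    return total * h

def standard_normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

phi_one = characteristic_function(standard_normal_pdf, 1.0)
print(phi_one, math.exp(-0.5))
```

The imaginary part is numerically zero because the density is symmetric about the origin, and the value at $t=0$ is 1, in line with the properties listed above.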

Appendix C

Distribution of $S_I^2$ and $S_{I_C}^2$

In this appendix, under the assumption that the estimated LPSI and CLPSI values are normally distributed, we used the Fourier transform to obtain the distribution of $S_I^2$ and $S_{I_C}^2$ (Eqs. 10 and 11, respectively).

Suppose that $\hat{I}_1,\hat{I}_2,\ldots,\hat{I}_n$ is a random sample of size $n$ of estimated index values (LPSI or CLPSI) and that $\sum_{j=1}^{n}\hat{I}_j=0$. Let $S^2=\frac{1}{n-1}\sum_{j=1}^{n}\hat{I}_j^2$ be an estimator of the variance ($\sigma^2$) of the index (in the LPSI context $\sigma^2=b'Pb$, whereas in the CLPSI context, $\sigma^2=\beta'P\beta$). That is, we are assuming that in each selection cycle, the estimated index values are a random sample of the distribution of all possible estimated index values.

To simplify notation, let $\hat{I}_1=X_1,\hat{I}_2=X_2,\ldots,\hat{I}_n=X_n$ and suppose that we obtain the sample of $n$ estimated index values from the normal distribution $f_X(x)=\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{x^2}{2\sigma^2}\right)$, $-\infty<x<\infty$. Let $N=n-1$ ($n$ = number of index values in each selection cycle) and $U=\sum_{i=1}^{n}X_i^2=NS^2$, subject to $\sum_{i=1}^{n}X_i=0$, where $S^2=\frac{1}{n-1}\sum_{j=1}^{n}X_j^2$ is an estimator of the variance ($\sigma^2$) of the LSI. By Eq. (A7), the Fourier transform of the density of $U$, $g_U(u)$, is

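$$F_t[g_U(u)]=\left[\int_{-\infty}^{\infty}\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{x^2}{2\sigma^2}+itx^2\right)dx\right]^{N-1}=[1-2it\sigma^2]^{-\frac{N-1}{2}}, \tag{A9}$$

which is the characteristic function of $\sigma^2$ times a Chi-square random variable with $N-1$ degrees of freedom (or $n-2$ degrees of freedom because $N=n-1$). In addition, by Eq. (A8),

$$g_U(u)=\frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{e^{-itu}}{(1-2it\sigma^2)^{(N-1)/2}}dt=\frac{u^{(N-3)/2}e^{-u/(2\sigma^2)}}{(2\sigma^2)^{(N-1)/2}\Gamma[(N-1)/2]} \tag{A10}$$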

is the inverse transform of the Fourier transform of Eq. (A9) (Springer 1979, Chapter 9).

Let $h_{S^2}(s^2)$ be the density function of $S^2$. It follows from Eq. (A10) and the relationships $U=\sum_{i=1}^{n}X_i^2=NS^2$ and $du=N\,dS^2$ (where $du$ and $dS^2$ are differentials) that

$$h_{S^2}(s^2)=\left(\frac{N}{2\sigma^2}\right)^{(N-1)/2}\frac{(s^2)^{(N-3)/2}e^{-Ns^2/(2\sigma^2)}}{\Gamma[(N-1)/2]} \tag{A11}$$

is the density function of $S^2$ (Springer 1979, Chapter 9), where for $r=\frac{N-1}{2}$, $\Gamma(r)=\int_{0}^{\infty}e^{-z}z^{r-1}dz$ is the Gamma function (Stuart and Ord 1987, Chapter 5). Let $V=S^2$, $\lambda=\frac{N}{2\sigma^2}$ and $r=\frac{N-1}{2}$; then, Eq. (A11) can be written as

$$h_V(v)=\frac{\lambda^rv^{r-1}e^{-\lambda v}}{\Gamma(r)}, \tag{A12}$$

which is a Gamma distribution ($r$, $\lambda$), where for $0<v<\infty$, $r>0$ is the shape parameter, $\lambda$ is the rate parameter and $\Gamma(r)=\int_{0}^{\infty}e^{-z}z^{r-1}dz$ is defined earlier.

Appendix D

The expectation and variance of $S_I^2$ and $S_{I_C}^2$

The characteristic function of Eq. (A12) is $\phi(t)=\left(1-\frac{it}{\lambda}\right)^{-r}$, and the expectation and variance of $V=S^2$ are

$$\frac{r}{\lambda}\quad\text{and}\quad\frac{r}{\lambda^2}, \tag{A13}$$

respectively (Stuart and Ord 1987, Chapter 5).

In the LPSI context, let $\sigma^2=b'Pb$ be the unknown variance of the LPSI; then, by Eq. (A13), the expectation and variance of $S^2$ are

$$E(S^2)=\frac{r}{\lambda}=\frac{n-2}{n-1}\sigma^2\approx\sigma^2 \tag{A14}$$

and

$$Var(S^2)=\frac{r}{\lambda^2}=\frac{2(n-2)}{(n-1)^2}(\sigma^2)^2\approx\frac{2(\sigma^2)^2}{n-1}, \tag{A15}$$

respectively. In Eqs. (A14) and (A15), the symbol “≈” indicates an approximation. Equation (A14) indicates that $S^2$ is an asymptotically unbiased estimator of $\sigma^2=b'Pb$, whereas Eq. (A15) indicates that $Var(S^2)$ tends to zero when $n$ increases. Equations (A14) and (A15) are valid for the CLPSI.
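The size of the approximation in Eqs. (A14) and (A15) can be gauged numerically. The Python sketch below computes the Gamma-distribution moments $r/\lambda$ and $r/\lambda^2$ with $r=(N-1)/2$, $\lambda=N/(2\sigma^2)$ and $N=n-1$, and sets them beside the large-sample values; the function name and the choice $\sigma^2=1$ are illustrative.

```python
def s2_moments(n, sigma2):
    """Gamma(r, lam) mean and variance of S^2 (Eq. A13) and their large-n approximations."""
    N = n - 1
    r = (N - 1) / 2            # shape parameter
    lam = N / (2 * sigma2)     # rate parameter
    exact = (r / lam, r / lam ** 2)
    approx = (sigma2, 2 * sigma2 ** 2 / (n - 1))
    return exact, approx

for n in (10, 247, 500):
    exact, approx = s2_moments(n, 1.0)
    print(n, exact, approx)
```

For n = 247 and n = 500 the two sets of values differ by less than half a percent, which supports treating $S^2$ as practically unbiased at these sample sizes.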

Appendix E

The Delta method

We determined the expectation and the variance of the estimator of the maximized LPSI and CLPSI selection responses using the Delta method (Lynch and Walsh 1998, Appendix 1; Sorensen and Gianola 2002, Chapter 2; Cerón-Rojas and Sahagún-Castellanos 2007, Appendix B). To find the expectation and variance of the estimator of the maximized LPSI and CLPSI selection responses, we need to expand the function $Y=f(X)$ as a Taylor series around the expectation of $X$ and then find the expectation and variance of the expansion of $Y=f(X)$. The first and second derivatives of the function are sufficient to obtain results that are very close to the expected results.

Suppose that $X$ is a random variable with mean $\mu$ ($E(X)=\mu$) and that $Y=f(X)$ is a function of $X$; then, approximations of the expectation and variance of $Y$ are obtained as

$$E(Y)\approx f(\mu)+\frac{1}{2}\left[\frac{d^2}{dX^2}f(X)\right]_{X=\mu}Var(X) \tag{A16}$$

and

$$Var(Y)\approx\left[\frac{d}{dX}f(X)\right]_{X=\mu}^{2}Var(X), \tag{A17}$$

respectively, where $\left[\frac{d}{dX}f(X)\right]_{X=\mu}$ and $\left[\frac{d^2}{dX^2}f(X)\right]_{X=\mu}$ are the first and second derivatives of $f(X)$ with respect to $X$ evaluated at $\mu$, and $Var(X)$ is the variance of $X$.
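Equations (A16) and (A17) can be applied to the response by taking $X=S^2$ and $f(X)=k\sqrt{X}$. The Python sketch below does this with numerical derivatives and the approximate variance of Eq. (A15). The values of $\sigma^2$ and $n$ are illustrative assumptions, $k=1.755$ is the selection intensity used for the simulated data, and the function name is hypothetical.

```python
import math

def delta_method(f, mu, var_x, h=1e-4):
    """Second-order mean and first-order variance approximations (Eqs. A16 and A17)."""
    d1 = (f(mu + h) - f(mu - h)) / (2 * h)              # central first derivative
    d2 = (f(mu + h) - 2 * f(mu) + f(mu - h)) / h ** 2   # central second derivative
    return f(mu) + 0.5 * d2 * var_x, d1 ** 2 * var_x

# Illustrative inputs: sigma2 and n are assumed values, not estimates from the paper
k, sigma2, n = 1.755, 4.0, 500
var_s2 = 2 * sigma2 ** 2 / (n - 1)                      # approximate Var(S^2), Eq. (A15)
mean_r, var_r = delta_method(lambda x: k * math.sqrt(x), sigma2, var_s2)
print(mean_r, var_r)
```

Carrying out the derivatives analytically gives $k\sigma\left[1-\frac{1}{4(n-1)}\right]$ for the mean and $\frac{k^2\sigma^2}{2(n-1)}$ for the variance under the same assumptions, and the numerical values agree with these closed forms.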

Author contributions statement

JCR developed the conceptual framework and wrote the first version. JC revised the theoretical developments of the original version and contributed to writing and editing the manuscript.

Funding

No specific funding used.

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical standards

The authors complied with all required ethical standards.

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

J. Jesus Cerón-Rojas, Email: jesusceronrojas@live.com.mx.

Jose Crossa, Email: j.crossa@cgiar.org.

References

  1. Beyene Y, Semagn K, Mugo S, Tarekegne A, Babu R, et al. Genetic gains in grain yield through genomic selection in eight bi-parental maize populations under drought stress. Crop Sci. 2015;55:154–163. doi: 10.2135/cropsci2014.07.0460. [DOI] [Google Scholar]
  2. Cerón-Rojas JJ, Sahagún-Castellanos J. Estimating QTL biometrics parameters in F2 populations: a new approach. Agrociencia. 2007;41:57–63. [Google Scholar]
  3. Ceron-Rojas JJ, Crossa J, Arief VN, Basford K, Rutkoski J, Jarquín D, Alvarado G, Beyene Y, Semagn K, DeLacy I. A genomic selection index applied to simulated and real data. Genes Genomes Genetics. 2015;5:2155–2164. doi: 10.1534/g3.115.019869. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Cerón-Rojas JJ, Crossa J (2018) Linear Selection Indices in Modern Plant Breeding. Springer, Cham, the Netherlands. 10.1007/978-3-319-91223-3. https://link.springer.com/book/10.1007/978-3-319-91223-3
  5. Cerón-Rojas JJ, Crossa J. Efficiency of a constrained linear genomic selection index to predict the net genetic merit in plants. Genes/Genomes/Genetics. 2019;9:3981–3994. doi: 10.1534/g3.119.400677. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Cochran WG (1951) Improvement by means of selection. In: Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability. University of California Press, Berkeley, pp 449–470. https://projecteuclid.org/euclid.bsmsp/1200500247.
  7. Crawley MJ. Statistics: An introduction using R. 2. United Kingdom: John Wiley & Sons Ltd; 2015. [Google Scholar]
  8. Cunningham EP, Moen RA, Gjedrem T. Restriction of selection indexes. Biometrics. 1970;26(1):67–74. doi: 10.2307/2529045. [DOI] [PubMed] [Google Scholar]
  9. Dekkers JCM. Prediction of response to marker-assisted and genomic selection using selection index theory. J Anim Breed Genet. 2007;124:331–341. doi: 10.1111/j.1439-0388.2007.00701.x. [DOI] [PubMed] [Google Scholar]
  10. Harris DL. Expected and predicted progress from index selection involving estimates of population parameters. Biometrics. 1964;20(1):46–72. doi: 10.2307/2527617. [DOI] [Google Scholar]
  11. Hayes JF, Hill WG. A reparameterization of a genetic selection index to locate its sampling properties. Biometrics. 1980;36(2):237–248. doi: 10.2307/2529975. [DOI] [PubMed] [Google Scholar]
  12. Hazel LN. The genetic basis for constructing selection indexes. Genetics. 1943;8:476–490. doi: 10.1093/genetics/28.6.476. [DOI] [PMC free article] [PubMed] [Google Scholar]
  13. Itoh Y, Yamada Y. Re-examination of selection index for desired gains. Genet Sel Evol. 1986;18(4):499–504. doi: 10.1186/1297-9686-18-4-499. [DOI] [PMC free article] [PubMed] [Google Scholar]
  14. Itoh Y, Yamada Y. Selection indices for desired relative genetic gains with inequality constraints. Theor Appl Genet. 1988;75:731–735. doi: 10.1007/BF00265596. [DOI] [Google Scholar]
  15. Johnson NL, Kotz S, Balakrishnan N. Continuous univariate distributions, 2nd edn. New York: Wiley; 1994. [Google Scholar]
  16. Kempthorne O, Nordskog AW. Restricted selection indices. Biometrics. 1959;15:10–19. doi: 10.2307/2527598. [DOI] [Google Scholar]
  17. Lande R, Thompson R. Efficiency of marker-assisted selection in the improvement of quantitative traits. Genetics. 1990;124:743–756. doi: 10.1093/genetics/124.3.743. [DOI] [PMC free article] [PubMed] [Google Scholar]
  18. Lynch M, Walsh B. Genetics and analysis of quantitative traits. Sunderland: Sinauer; 1998. [Google Scholar]
  19. Mallard J. The theory and computation of selection indices with constraints: a critical synthesis. Biometrics. 1972;28:713–735. doi: 10.2307/2528758. [DOI] [Google Scholar]
  20. Montgomery DC, Ruger GC. Applied statistics and probability for engineer. 3. New York: Wiley; 2003. [Google Scholar]
  21. Patel JK, Read CB. Handbook of the normal distribution. 2. New York: Marcel Dekkers, Inc.; 1996. [Google Scholar]
  22. Pesek J, Baker RJ. Desired improvement in relation to selection indices. Can J Plant Sci. 1969;49:803–804. doi: 10.4141/cjps69-137.
  23. Podlich DW, Cooper M. QU-GENE: a simulation platform for quantitative analysis of genetic models. Bioinformatics. 1998;14:632–653. doi: 10.1093/bioinformatics/14.7.632.
  24. Ramos PL, Louzada F, Ramos E. Posterior properties of the Nakagami-m distribution using non-informative priors and applications in reliability. IEEE Trans Reliab. 2015;14(8):1–13.
  25. Rencher AC. Methods of multivariate analysis, 2nd edn. New York: Wiley; 2002.
  26. Rencher AC, Schaalje GB. Linear models in statistics, 2nd edn. New Jersey: Wiley; 2008.
  27. Smith HF. A discriminant function for plant selection. Ann Eugen. 1936;7:240–250. doi: 10.1111/j.1469-1809.1936.tb02143.x.
  28. Soong TT. Fundamentals of probability and statistics for engineers. England: Wiley; 2004.
  29. Springer MD. The algebra of random variables. New York: Wiley; 1979.
  30. Sorensen D, Gianola D. Likelihood, Bayesian, and MCMC methods in quantitative genetics. New York: Springer; 2002.
  31. Stuart A, Ord JK. Kendall’s advanced theory of statistics, 5th edn. New York: Oxford University Press; 1987.
  32. Tallis GM. The sampling errors of estimated genetic regression coefficients and the error of predicted genetic gains. Aust J Stat. 1960;2:66–77. doi: 10.1111/j.1467-842X.1960.tb00127.x.
  33. Williams JS. Some statistical properties of a genetic selection index. Biometrika. 1962;49:325–337. doi: 10.1093/biomet/49.3-4.325.
  34. Williams JS. The evaluation of a selection index. Biometrics. 1962;18:375–393. doi: 10.2307/2527479.

Articles from TAG. Theoretical and Applied Genetics. Theoretische Und Angewandte Genetik are provided here courtesy of Springer
