Bayesian multitrait kernel methods improve multienvironment genome-based prediction

Osval Antonio Montesinos-López; José Cricelio Montesinos-López; Abelardo Montesinos-López; Juan Manuel Ramírez-Alcaraz; Jesse Poland; Ravi Singh; Susanne Dreisigacker; Leonardo Crespo; Sushismita Mondal; Velu Govidan; Philomin Juliana; Julio Huerta Espino; Sandesh Shrestha; Rajeev K Varshney; José Crossa

doi:10.1093/g3journal/jkab406

2021 Nov 29;12(2):jkab406. doi: 10.1093/g3journal/jkab406

Editor: A Lipka

PMCID: PMC9210316 PMID: 34849802

Abstract

When multitrait data are available, the preferred models are those that account for correlations between phenotypic traits, because when these correlations are moderate or large, modeling them increases genomic prediction accuracy. For this reason, in this article, we explore Bayesian multitrait kernel methods for genomic prediction and illustrate the power of these models with three real datasets. The kernels under study were the linear, Gaussian, polynomial, and sigmoid kernels; they were compared with the conventional Ridge regression and GBLUP multitrait models. The results show that, in general, the Gaussian kernel method outperformed the conventional Bayesian Ridge and GBLUP multitrait linear models by 2.2–17.45% (datasets 1–3) in terms of prediction performance based on the mean square error of prediction. This improvement can be attributed to the ability of the proposed model to capture nonlinear patterns more efficiently than linear multitrait models. However, not all kernels performed well in the datasets used for evaluation, so more than one kernel should be evaluated in order to choose the best one.

Keywords: multitrait, kernel methods, plant breeding, genomic-enabled prediction, genomic prediction, GenPred, shared data resources

Introduction

Genomic selection (GS) has been widely adopted because its predictive methodology enables the selection of candidates before phenotypes are available for all individuals (Meuwissen et al. 2001). Current research in GS includes the use of prediction models that were successful in other fields, the adaptation or development of models specific to GS (Montesinos-López et al. 2019b, 2019c), and models that couple mechanistic and statistical approaches (Tong et al. 2020). At the same time, breeders usually select for multiple traits that are often genetically correlated, with correlations ranging from weak to strong. Multitrait data are often analyzed with uni-trait (UT) models, which assume zero genetic and residual covariances among traits, so information from other traits is not used (Montesinos-López et al. 2019b) when obtaining the expected breeding values of the evaluated individuals for the traits under study (Okeke et al. 2017). However, optimal estimation combines information from multiple traits, with breeding values estimated using multitrait (MT) models (van der Werf 1992; Ducrocq 1994; Okeke et al. 2017).

The use of UT models is very common, partly because fewer MT models are available. However, interest in MT models continues to grow, as pointed out by Mbebi et al. (2021). UT models are trained separately for each trait using only one dependent variable, so they cannot capture the correlation between traits (Montesinos-López et al. 2019b), whereas MT models are trained using all the available traits simultaneously, which is why they are able to capture it. When the correlation between traits is moderate or large, the prediction performance of MT models is, most of the time, better than that of UT models (Montesinos-López et al. 2016, 2019b, 2019c, 2020).

In MT models, even when the traits are unfavorably correlated (opposite signs), improved prediction performance is expected compared to UT models because information can be borrowed across traits (Neyhart et al. 2019). However, from a practical perspective, unfavorable correlations are common and complicate breeders' decisions (Neyhart et al. 2019). Such correlations imply an unfavorable response in one trait when selecting on another (Falconer and Mackay 1996); thus, their underlying cause will affect the prospects for long-term improvement (Neyhart et al. 2019).

There is empirical evidence that MT models (frequentist and Bayesian) outperform UT models when the traits are correlated: Calus and Veerkamp (2011), Jia and Jannink (2012), Jiang et al. (2015), Montesinos-López et al. (2016), He et al. (2016), and Schulthess et al. (2018) reported that, at least for some traits, MT models outperform UT models in terms of prediction accuracy. Schulthess et al. (2018) also reported that MT models improve parameter estimates compared to UT models. Only small differences in prediction performance are observed between frequentist and Bayesian methods.

However, it has also been reported that MT models provide little benefit when the correlation between traits is low (Montesinos-López et al. 2016, 2018, 2019). An early study of multivariate genomic prediction (Jia and Jannink 2012) showed the usefulness of multivariate models, but large differences were observed only when variable selection methods (BayesA and BayesC) were applied to nonpolygenic traits (20 QTLs); little difference was observed for polygenic traits.

Montesinos-López et al. (2019b) pointed out the following seven advantages of MT models over UT models: (1) MT models represent complex relationships between traits more efficiently; (2) they exploit not only the correlation between lines, but also the correlation between traits; (3) they are much more interpretable than a series of UT models; (4) they are more computationally efficient (less training time) than fitting multiple UT models individually; (5) they improve the selection index because they allow more precise estimates of the random effects of lines and of the genetic correlation between traits; (6) they can improve indirect selection because they increase the precision of the estimated genetic correlations between traits; and (7) they increase the power of hypothesis tests relative to UT models.

Although MT models have many advantages over UT models, they require the estimation of more parameters (i.e., genetic and error covariances), which affects both the prediction performance of MT models and the accuracy of breeding value estimates. The larger the number of traits, the larger the number of parameters that need to be estimated (Runcie et al. 2021). Also, the more complex the model and the larger the number of traits included, the greater the chance of convergence problems in the analysis (Runcie et al. 2021). This means that MT models require more data to accurately estimate the additional parameters (Okeke et al. 2017). The optimum training size depends on the effective population size and the available genetic diversity within the population (Arojju et al. 2020). In general, results have shown that Bayesian MT methods have fewer convergence problems than frequentist MT methods (Montesinos-López et al. 2019b).

However, despite these seven advantages, most MT models are unable to capture complex nonlinear patterns in the inputs. For example, MT models with a linear predictor cannot capture such patterns (Cuevas et al. 2016, 2017); however, it is quite straightforward to use the machinery of linear models for nonlinear tasks by means of Reproducing Kernel Hilbert Spaces (RKHS) methods (Gianola and van Kaam 2008). RKHS methods are very common in GS for UT analysis (Cuevas et al. 2016, 2017; Crawford et al. 2018). For example, Long et al. (2010) reported that RKHS methods outperformed linear models for predicting body weight in broiler chickens. Crossa et al. (2010) reported better prediction performance of RKHS methods than of linear Bayesian Lasso regression in wheat. In maize and wheat data, Cuevas et al. (2016, 2017, 2018, 2020) reported better performance of RKHS with Gaussian kernels than of linear GBLUP for several UT genomic prediction models incorporating genomic × environment interaction. Cuevas et al. (2019) also reported that nonlinear kernel methods (Gaussian kernel and arc-cosine kernel) outperformed linear kernel methods in terms of prediction performance using markers and near-infrared spectroscopy data in the predictor pedigree.

The basic idea of RKHS methods is to project the original independent variables, given in a finite-dimensional vector space, into a higher-dimensional (possibly infinite-dimensional) Hilbert space (Gianola and van Kaam 2008). Kernel methods transform the independent variables (inputs) using a kernel function; the transformed inputs can then be used in conventional machine learning techniques at low computational cost, often with better prediction performance (Shawe-Taylor and Cristianini 2004). RKHS methods based on implicit transformations have become very popular for analyzing nonlinear patterns in datasets from various fields. Kernel methods can also measure similarity between objects that lack a natural vector representation (Montesinos-López et al. 2021).
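To make the implicit transformation concrete, the short Python sketch below (our illustration, not code from the study) checks the kernel trick for a homogeneous polynomial kernel of degree 2, $K(x, y) = (x^{T} y)^{2}$: the kernel value equals an inner product in an explicit $p^{2}$-dimensional feature space that never has to be built. Degree 2 is chosen only to keep the feature map small; the Gaussian kernel's feature space is infinite-dimensional, so it can only be used implicitly.

```python
import numpy as np

def phi_poly2(x):
    """Explicit feature map of the degree-2 homogeneous polynomial kernel:
    all pairwise products x_a * x_b (p^2 features for p inputs)."""
    x = np.asarray(x, dtype=float)
    return np.outer(x, x).ravel()

def poly2_kernel(x, y):
    """The same quantity computed implicitly, in O(p) instead of O(p^2)."""
    return float(np.dot(x, y)) ** 2

x = np.array([1.0, 2.0, -1.0])
y = np.array([0.5, -1.0, 3.0])

explicit = float(phi_poly2(x) @ phi_poly2(y))  # inner product in feature space
implicit = poly2_kernel(x, y)                  # kernel evaluation, no feature map
print(explicit, implicit)  # both equal (x^T y)^2 = (-4.5)^2 = 20.25
```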

Owing to its many attractive characteristics, the mixed-model framework under a frequentist approach is still very popular in GS for implementing MT models. However, adoption of the Bayesian paradigm in plant breeding continues to grow thanks to great computational advances and new methodological applications and elucidations. Montesinos-López et al. (2019b) mention the following advantages of Bayesian MT models: (1) they allow prior information to be incorporated; (2) unlike restricted maximum likelihood, they do not need good starting values to estimate the parameters of interest; (3) they increase the precision of parameter estimates (smaller standard errors); (4) conclusions can be drawn about the correlations between the dependent variables, notably, the extent to which the correlations depend on the individual and on the group level; (5) testing whether the effect of an explanatory variable on dependent variable $Y_{1}$ is larger than its effect on $Y_{2}$, when $Y_{1}$ and $Y_{2}$ were observed (totally or partially) on the same individuals, is possible only by means of a multivariate analysis; (6) a single test of the joint effect of an explanatory variable on several dependent variables also requires a multivariate analysis; such a single test can be useful, e.g., to avoid the danger of chance capitalization inherent in carrying out a separate test for each dependent variable; and (7) they do not have strong identifiability problems. In general, the Bayesian MT approach has the advantage of being more parsimonious and providing a more informative and powerful analysis. However, Bayesian MT analysis is computationally more demanding than univariate analysis, and its implementation is therefore often impractical.

Furthermore, implementing conventional MT models (frequentist and Bayesian) is, in general, computationally demanding (Runcie et al. 2021). The fragility of these methods is due to the number of variance–covariance parameters that must be estimated, which increases quadratically with the number of traits (Runcie et al. 2021). The computational demands increase even more dramatically, from cubically to quintically with the number of traits (Zhou and Stephens 2014), because most algorithms require repeated inversion of large covariance matrices. These matrix operations dominate the time required to fit conventional MT models, so models can take days, weeks, or even years to converge (Runcie et al. 2021).
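The quadratic growth is easy to see by counting free parameters: a symmetric, unstructured $n_{T} \times n_{T}$ covariance matrix has $n_{T}(n_{T} + 1)/2$ distinct entries. The Python lines below are a simple illustration, assuming one unstructured genetic and one unstructured residual covariance matrix; the count is ours, not a figure taken from Runcie et al. (2021).

```python
def n_cov_params(n_traits: int) -> int:
    """Distinct entries of a symmetric n_traits x n_traits covariance matrix."""
    return n_traits * (n_traits + 1) // 2

for t in (2, 4, 10, 20):
    one = n_cov_params(t)
    # assumed setting: one genetic plus one residual unstructured covariance matrix
    print(f"{t:>2} traits: {one:>3} per matrix, {2 * one:>3} for genetic + residual")
```

Going from 4 to 20 traits multiplies the number of traits by 5 but the number of parameters per covariance matrix by 21 (from 10 to 210).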

In this study, we propose Bayesian kernel methods for multitrait genome-enabled prediction of multienvironment trials. We applied the proposed methods to three extensive wheat multitrait multienvironment trial datasets and compared the prediction performance of four kernels, namely linear (GBLUP), Gaussian (GK), polynomial (PK), and sigmoid (SK), and of conventional Bayesian multitrait Ridge Regression (BRR) under two scenarios: Scenario 1, in which all traits are missing in the testing set (MT), and Scenario 2, in which only a fraction of the traits are missing in the testing set (MT_P). We also evaluated the prediction performance with and without genotype × environment interaction (G × E) under a multitrait framework. Finally, we provide the R code to implement these methods in conventional Bayesian multitrait software.
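The two scenarios differ only in which test-set entries are hidden from the model. The Python sketch below illustrates this under stated assumptions: phenotypes are held in an $n \times n_{T}$ array, missing values are coded as NaN (the analogue of R's NA), and, for MT_P, the hidden traits are a hypothetical choice of the first two columns (e.g., DTHD and DTMT) while the remaining traits of the test rows stay observed. The function name `mask_test_set` is ours.

```python
import numpy as np

def mask_test_set(Y, test_rows, hidden_traits=None):
    """Return a copy of Y with test-set phenotypes set to NaN (missing).

    hidden_traits=None hides every trait of the test rows (scenario MT);
    a list of column indices hides only those traits (scenario MT_P).
    """
    Y_masked = np.array(Y, dtype=float)  # copy, so the original data stay intact
    rows = np.asarray(test_rows)
    if hidden_traits is None:
        Y_masked[rows, :] = np.nan
    else:
        Y_masked[np.ix_(rows, list(hidden_traits))] = np.nan
    return Y_masked

# Toy data: 4 observations x 4 traits (hypothetical column order: DTHD, DTMT, PH, GY)
Y = np.arange(16, dtype=float).reshape(4, 4)
test = [1, 3]
Y_mt = mask_test_set(Y, test)                          # MT: all traits missing in test rows
Y_mtp = mask_test_set(Y, test, hidden_traits=[0, 1])   # MT_P: only DTHD and DTMT missing
```

Under MT_P, the traits that remain observed in the test rows can inform the hidden ones through the estimated trait covariances, which is presumably why that scenario can predict somewhat better.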

Materials and methods

Bayesian multitrait kernel model

This model is given in (1) as:

$$Y = 1_{n} μ^{T} + X_{E} β_{E} + Z_{L} g + Z_{EL} gE + ϵ \qquad (1)$$

where $Y$ is the $n \times n_{T}$ matrix of phenotypic responses, with $n = JI$, where $J$ and $I$ denote the number of lines and environments, respectively. $Y$ is ordered first by environments and then by lines, and $n_{T}$ denotes the number of traits. $1_{n}$ is a vector of ones of length $n$, $μ = [μ_{1}, \dots, μ_{n_{T}}]^{T}$ is the vector of trait-specific intercepts of length $n_{T}$, and the superscript $T$ denotes the transpose of a vector or matrix. $X_{E}$ is the $n \times I$ design matrix of environments, $β_{E}$ is the $I \times n_{T}$ matrix of environment coefficients, $Z_{L}$ is the $n \times J$ design matrix of lines, and $g$ is the $J \times n_{T}$ matrix of random line effects distributed as $g \sim MN_{J \times n_{T}}(0, K^{l}, Σ_{T})$, that is, a matrix-variate normal distribution with parameters $M = 0$, $U = K^{l}$, and $V = Σ_{T}$. Here $K^{l}$ is the $J \times J$ kernel matrix of type $l$ built from marker data (equivalent to a genomic relationship matrix) that captures linear or nonlinear relationships ($l$ = linear, Gaussian, polynomial, or sigmoid), and $Σ_{T}$ is the $n_{T} \times n_{T}$ variance–covariance matrix of traits.

Note that $Z_{L} g$ contains the BLUPs of lines for the $n_{T}$ traits, repeated in each of the $I$ environments. $Z_{EL}$ is the $n \times JI$ design matrix of the genotype × environment interaction, and $gE$ is the $JI \times n_{T}$ matrix of genotype × environment interaction random effects distributed as $gE \sim MN_{JI \times n_{T}}(0, K^{l} \otimes Σ_{E}, Σ_{T})$, where $Σ_{E}$ is a diagonal $I \times I$ variance–covariance matrix of environments and $K^{l} \otimes Σ_{E}$ is the Kronecker product of the type-$l$ kernel matrix of lines and the environmental relationship matrix. The term $Z_{EL} gE$ thus contains the BLUPs of the genotype × environment interaction terms for the $n_{T}$ traits. $ϵ$ is the $n \times n_{T}$ residual matrix distributed as $ϵ \sim MN_{n \times n_{T}}(0, I_{IJ}, R)$, where $R$ is the $n_{T} \times n_{T}$ residual variance–covariance matrix. These four kernels (linear, Gaussian, polynomial, and sigmoid) were chosen because they are widely used in statistics, and two of them (linear and Gaussian) are also widely used in genomic prediction.
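A matrix-variate normal $MN(0, U, V)$ is equivalent to $\mathrm{vec}(g) \sim N(0, V \otimes U)$, where vec stacks the columns of $g$. The Python sketch below is illustrative only (toy $2 \times 2$ matrices, not values from the study): it draws $g = A Z B^{T}$ with $U = AA^{T}$ and $V = BB^{T}$ obtained by Cholesky factorization and $Z$ filled with independent standard normals, then checks the covariance of vec(g) empirically. The helper name `rmatnorm` is ours.

```python
import numpy as np

def rmatnorm(U, V, size, rng):
    """Draw `size` samples of a J x n_T matrix g ~ MN(0, U, V).

    U (J x J) is the among-row (line) covariance, e.g. a kernel K^l;
    V (n_T x n_T) is the among-column (trait) covariance, e.g. Sigma_T.
    """
    A = np.linalg.cholesky(U)      # U = A A^T
    B = np.linalg.cholesky(V)      # V = B B^T
    Z = rng.standard_normal((size, U.shape[0], V.shape[0]))
    return A @ Z @ B.T             # broadcasts over the `size` draws

rng = np.random.default_rng(2021)
U = np.array([[1.0, 0.5], [0.5, 1.0]])   # toy kernel between two lines
V = np.array([[2.0, 0.8], [0.8, 1.0]])   # toy covariance between two traits
g = rmatnorm(U, V, size=200_000, rng=rng)

# vec() stacks columns (all lines for trait 1, then trait 2, ...)
vec_g = g.transpose(0, 2, 1).reshape(len(g), -1)
emp_cov = np.cov(vec_g, rowvar=False)    # should approach np.kron(V, U)
```

The same construction, with $U = K^{l} \otimes Σ_{E}$, describes the $gE$ term.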

The kernel methods

The linear kernel (LK) was computed as $K(x_{i}, x_{j}) = x_{i}^{T} x_{j}$ (Shawe-Taylor and Cristianini 2004), where $x_{i}^{T}$ and $x_{j}^{T}$ are any two rows of the scaled marker matrix $X$ (of order $J \times p$) divided by the square root of the total number of markers ($p$). With this scaling, the linear kernel is the genomic relationship matrix proposed by VanRaden (2008) that underlies the genomic best linear unbiased predictor (GBLUP). The polynomial kernel (PK) was computed as $K(x_{i}, x_{j}) = (γ x_{i}^{T} x_{j} + a)^{d}$, with real scalar $a = 0$, $γ = 1$, and positive integer degree $d = 3$ (Shawe-Taylor and Cristianini 2004). The sigmoidal kernel (SK) was computed as $K(x_{i}, x_{j}) = \tanh(x_{i}^{T} x_{j} + a)$, where $\tanh$ is the hyperbolic tangent, $\tanh(z) = \sinh(z)/\cosh(z) = \frac{\exp(z) - \exp(-z)}{\exp(z) + \exp(-z)}$ (Shawe-Taylor and Cristianini 2004). The Gaussian kernel (GK), also known as the radial basis function kernel, was computed as $K(x_{i}, x_{j}) = e^{-γ \|x_{i} - x_{j}\|^{2}} = e^{-γ [x_{i}^{T} x_{i} - 2 x_{i}^{T} x_{j} + x_{j}^{T} x_{j}]}$, where $γ$ is a positive real scalar (Shawe-Taylor and Cristianini 2004); in this application, $γ = 1$ was used, assuming that the markers were scaled.
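The four kernels can be computed from a marker matrix in a few lines. The Python/NumPy sketch below is our own illustration, not the study's R code. Assumptions: markers are coded 0/1/2; each marker column is centered and scaled and then divided by $\sqrt{p}$ before every kernel (the text states this scaling explicitly only for the linear kernel); monomorphic markers are dropped; and the sigmoid offset $a$ defaults to 0 because its value is not reported. The values $γ = 1$, $a = 0$, and $d = 3$ for PK and $γ = 1$ for GK follow the text.

```python
import numpy as np

def scale_markers(M):
    """Center and scale each marker column, drop monomorphic markers, divide by sqrt(p)."""
    M = np.asarray(M, dtype=float)
    sd = M.std(axis=0, ddof=1)
    keep = sd > 0                                   # zero-variance markers cannot be scaled
    X = (M[:, keep] - M[:, keep].mean(axis=0)) / sd[keep]
    return X / np.sqrt(X.shape[1])

def linear_kernel(X):                               # GBLUP-type relationship matrix
    return X @ X.T

def polynomial_kernel(X, gamma=1.0, a=0.0, d=3):
    return (gamma * (X @ X.T) + a) ** d

def sigmoid_kernel(X, a=0.0):                       # a = 0 is an assumption
    return np.tanh(X @ X.T + a)

def gaussian_kernel(X, gamma=1.0):
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] - 2.0 * (X @ X.T) + sq[None, :]    # ||x_i - x_j||^2
    return np.exp(-gamma * np.maximum(d2, 0.0))         # clip round-off negatives

rng = np.random.default_rng(1)
M = rng.integers(0, 3, size=(6, 200))     # toy data: 6 lines x 200 markers coded 0/1/2
X = scale_markers(M)
kernels = {
    "linear": linear_kernel(X),
    "polynomial": polynomial_kernel(X),
    "sigmoid": sigmoid_kernel(X),
    "gaussian": gaussian_kernel(X),
}
```

Because each row of $X$ is divided by $\sqrt{p}$, squared distances between unrelated lines stay of order 1 (about 2 on average), so $γ = 1$ yields a Gaussian kernel whose off-diagonal entries are neither all near 0 nor all near 1.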

Computational implementation of the Bayesian multitrait kernel model

Note that when $Σ_{T}$, $Σ_{E}$, and $R$ are diagonal matrices, model (1) is equivalent to fitting a separate univariate linear model to each trait. Also, when a linear kernel is used for $K^{l}$ in model (1), the model is equivalent to a conventional multitrait GBLUP model. The Bayesian multitrait kernel model (1) can be implemented in the BGLR package of de los Campos and Pérez-Rodríguez (2014). The GitHub version of the BGLR R library is available at https://github.com/gdlc/BGLR-R and can be installed directly from the R console by running: install.packages('devtools'); library(devtools); install_github('gdlc/BGLR-R'). The following must be computed first: $X_{E}$, the design matrix of environments; $Z_{L}$, the design matrix of lines; $K^{l}$, any of the four kernels described above ($l$ = linear, Gaussian, polynomial, or sigmoid); $KL = Z_{L} K^{l} Z_{L}^{T}$; $KE = X_{E} X_{E}^{T}$; and $KLE = KL \circ KE$, where $\circ$ denotes the Hadamard (element-wise) product (see Appendix B).
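The expanded matrices can be built from one-hot design matrices. The Python sketch below mirrors the definitions of $KL$, $KE$, and $KLE$ for a hypothetical layout of three lines in two environments, ordered by environment and then line as in model (1); the toy kernel values are arbitrary, the helper `incidence` is ours, and the study's own implementation is the R code referred to above.

```python
import numpy as np

def incidence(labels, levels):
    """One-hot design matrix: row r has a 1 in the column of labels[r]."""
    index = {lev: k for k, lev in enumerate(levels)}
    Z = np.zeros((len(labels), len(levels)))
    for r, lab in enumerate(labels):
        Z[r, index[lab]] = 1.0
    return Z

lines = ["L1", "L2", "L3"]
envs = ["E1", "E2"]
env_of_row = [e for e in envs for _ in lines]     # ordered by environment, then line
line_of_row = [l for _ in envs for l in lines]

X_E = incidence(env_of_row, envs)                 # n x I
Z_L = incidence(line_of_row, lines)               # n x J
K = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.3],
              [0.2, 0.3, 1.0]])                   # toy J x J kernel K^l

KL = Z_L @ K @ Z_L.T     # line similarity, repeated across environments
KE = X_E @ X_E.T         # 1 if two records share an environment, else 0
KLE = KL * KE            # Hadamard product: line similarity within environments only
```

Row 0 is line L1 in E1 and row 3 is L1 in E2, so `KL[0, 3]` is 1 (same line) while `KLE[0, 3]` is 0 (different environments).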

With this version of the BGLR package, model (1) can be fitted as follows:

ETA <- list(Env = list(X = X_E, model = "FIXED"), Line = list(K = KL, model = "RKHS"), LinexEnv = list(K = KLE, model = "RKHS"))

A <- Multitrait(y = Y, ETA = ETA, resCov = list(type = "UN", S0 = S_R, df0 = v_R), nIter = nI, burnIn = nb)

The first argument of the Multitrait function is the response, a phenotype matrix in which each row contains the measurements of the $n_{T}$ traits for one individual. The second argument is the linear predictor, a list in which the first sub-list specifies the design matrix and prior model for the fixed effects in model (1); the second sub-list specifies the distribution of the random genetic effects ($g$), where $KL$ is the expanded genomic relationship matrix that accounts for the similarity between individuals based on marker information; and the third sub-list specifies the distribution of the random genotype × environment effects ($gE$), where $KLE$ is the corresponding relationship matrix that accounts for the similarity between individuals within the same environment. df0 = $v_{T}$ and S0 = $S_{T}$ are the degrees of freedom ($v_{T}$) and the scale matrix ($S_{T}$) of the inverse Wishart prior distribution for $Σ_{T}$, respectively. In the third argument (resCov), S0 and df0 are the scale matrix ($S_{R}$) and the degrees of freedom ($v_{R}$) of the inverse Wishart prior distribution for $R$. The last two arguments are the number of iterations (nI) and the burn-in period (nb) of the Gibbs sampler.

Datasets 1–3: elite wheat yield trial years 2013–2014, 2014–2015, and 2015–2016

These three datasets were collected by the Global Wheat Program (GWP) of the International Maize and Wheat Improvement Center (CIMMYT) and belong to elite yield trials (EYT) established in three different cropping seasons, with four or five environments each. Within a year, the same lines were evaluated in all environments, but different lines were evaluated in different years. EYT dataset 1 was sown in 2013–2014 and contains 767 lines, EYT dataset 2 was established in 2014–2015 and contains 775 lines, and EYT dataset 3 was cultivated in 2015–2016 and contains 964 lines. The experimental design was an alpha-lattice design, and the lines were sown in 39 trials, each covering 28 lines and two checks in six blocks with three replications. In each dataset, several traits were available for some environments and lines. In this study, we included four traits measured for each line in each environment: days to heading (DTHD, number of days from germination to 50% spike emergence), days to maturity (DTMT, number of days from germination to 50% physiological maturity or the loss of the green color in 50% of the spikes), plant height, and grain yield (GY). Full details of the experimental design and how the BLUEs were computed are given in Juliana et al. (2018).

In EYT 2013–2014 (dataset 1), the lines were evaluated in four environments, while in EYT 2014–2015 (dataset 2) and EYT 2015–2016 (dataset 3), they were evaluated in five environments. For EYT dataset 1, the environments were bed planting with five irrigations (Bed5IR), flat planting with five irrigations (Flat5IR), early heat (EHT), and late heat (LHT). For EYT dataset 2, the environments were bed planting with two irrigations (Bed2IR), bed planting with five irrigations (Bed5IR), flat planting with five irrigations (Flat5IR), early heat (EHT), and late heat (LHT). Finally, for EYT dataset 3, the environments were bed planting with two irrigations (Bed2IR), bed planting with five irrigations (Bed5IR), flat planting with five irrigations (Flat5IR), flat planting with drip irrigation (FlatDrip), and late heat (LHT).

Genome-wide markers for the 2506 (767 + 775 + 964) lines in the three datasets were obtained using genotyping-by-sequencing (GBS; Elshire et al. 2011; Poland et al. 2012) at Kansas State University using an Illumina HiSeq2500. After filtering, 2038 markers were retained from an initial set of 34,900 markers. Missing marker data were imputed using LinkImpute (Money et al. 2015), implemented in TASSEL version 5 (Bradbury et al. 2007). Lines with more than 50% missing data were removed, and 2506 lines were used in this study (767 lines in the first dataset, 775 in the second, and 964 in the third). Given the nature of the lines under study, a high level of relatedness (by pedigree or kinship) is also expected between lines within a year of testing and across years of testing.
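The line-level filter can be sketched as follows. This Python snippet is an illustration only: it removes lines with more than 50% missing marker calls, as described above, but then fills the remaining gaps with the marker mean, a much simpler stand-in for the LinkImpute algorithm actually used in the study. The function name `filter_and_impute` is ours.

```python
import numpy as np

def filter_and_impute(M, max_missing=0.5):
    """Drop lines (rows) with more than max_missing missing calls, then mean-impute.

    Mean imputation is a simple stand-in; the study used LinkImpute in TASSEL.
    """
    M = np.asarray(M, dtype=float)
    keep = np.isnan(M).mean(axis=1) <= max_missing   # fraction of missing calls per line
    M = M[keep]                                      # boolean indexing returns a copy
    col_means = np.nanmean(M, axis=0)                # marker means over retained lines
    rows, cols = np.where(np.isnan(M))
    M[rows, cols] = col_means[cols]
    return M, keep

# Toy genotypes: 3 lines x 4 markers (0/1/2 coding, NaN = missing call)
raw = np.array([[0.0, 1.0, np.nan, 2.0],
                [np.nan, np.nan, np.nan, 1.0],    # 75% missing -> removed
                [2.0, 1.0, 0.0, np.nan]])
clean, kept = filter_and_impute(raw)
```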

Evaluation of prediction accuracy with random cross-validation

The prediction accuracy of the Bayesian multitrait kernel model was evaluated with fivefold cross-validation (CV): the original dataset was partitioned into five subsamples of equal size, and each time, four of them were used for training and the remaining one for testing. In fivefold CV, an observation cannot appear in more than one fold. In the design, some lines can be evaluated in some, but not all, target environments, which mimics a prediction problem faced by breeders in incomplete field trials. Our validation strategy is the same as the strategy denoted CV2, proposed and implemented by Jarquín et al. (2014), in which a portion of test lines (genotypes) is predicted in a portion of test environments: some test lines that were evaluated in some environments are assumed to be missing in others.
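A minimal Python sketch of a CV2-style split is shown below, assuming that folds are drawn at random over line-by-environment records (not over lines), so a line held out in one environment can remain in training in another. The layout, fold count, seed, and function name `assign_folds` are illustrative choices, not the study's code.

```python
import numpy as np

def assign_folds(n_obs, k=5, seed=0):
    """Randomly assign each observation (a line in one environment) to exactly one of k folds."""
    rng = np.random.default_rng(seed)
    folds = np.arange(n_obs) % k     # balanced labels 0..k-1 (sizes differ by at most 1)
    rng.shuffle(folds)
    return folds

# Toy layout ordered by environment, then line: 10 lines x 4 environments
J, I = 10, 4
line = np.tile(np.arange(J), I)
env = np.repeat(np.arange(I), J)
folds = assign_folds(J * I)

for f in range(5):
    test = folds == f
    # test lines that still have phenotypes in other environments in the training set (CV2)
    shared = np.intersect1d(line[test], line[~test])
    print(f"fold {f}: {test.sum()} test records, {shared.size} test lines also in training")
```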

Because the response variables are continuous, prediction performance was evaluated with the mean square error of prediction, $MSE = \frac{1}{T} \sum_{i = 1}^{T} (y_{i} - \hat{f}(x_{i}))^{2}$, where $y_{i}$ is the observed value of the $i$th observation, $\hat{f}(x_{i})$ is the prediction that $\hat{f}$ gives for the $i$th observation, and $T$ is the number of observations in the testing set. The MSE was calculated for each environment and trait in each testing set (fold), and the average over all folds was reported as the measure of genome-based prediction performance. The lower the average MSE, the better the prediction performance. All analyses were carried out using the R statistical software (R Core Team 2020).
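The fold-averaged, per-environment MSE can be sketched in Python as below, assuming that observed values, predictions, environment labels, and fold labels for one trait are aligned arrays over the test records of all folds. The toy numbers and the function name `mse_by_environment` are hypothetical.

```python
import numpy as np

def mse(y, y_hat):
    """Mean square error of prediction: (1/T) * sum_i (y_i - f_hat(x_i))^2."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return float(np.mean((y - y_hat) ** 2))

def mse_by_environment(y, y_hat, env, fold):
    """For one trait: MSE within each (environment, fold) test set, then averaged over folds."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    env, fold = np.asarray(env), np.asarray(fold)
    result = {}
    for e in np.unique(env):
        per_fold = [mse(y[(env == e) & (fold == f)], y_hat[(env == e) & (fold == f)])
                    for f in np.unique(fold[env == e])]
        result[str(e)] = float(np.mean(per_fold))
    return result

# Toy test-set predictions for one trait: two environments, two folds
y     = [10.0, 12.0, 20.0, 22.0]
y_hat = [11.0, 12.0, 18.0, 22.0]
env   = ["EHT", "EHT", "LHT", "LHT"]
fold  = [0, 1, 0, 1]
print(mse_by_environment(y, y_hat, env, fold))  # {'EHT': 0.5, 'LHT': 2.0}
```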

Results

The results are given in two sections that correspond to datasets 1 and 2. For each dataset, genome-based prediction performance was assessed without and with G × E interaction, in both cases under two scenarios: (1) all the traits in the testing set are predicted (standard MT method), and (2) only a fraction of the traits in the testing set are predicted (MT_P). Two traits were considered: DTHD and DTMT. For simplicity and clarity, the results for dataset 3 are provided in Appendix A, where genome-based predictions, measured by the MSE of prediction without and with G × E interaction, are described under the two scenarios, MT and MT_P.

Results are presented for each trait, including (I) and ignoring (WI) the G × E interaction, for both scenarios (MT and MT_P), in tables and figures for each environment of each dataset and across environments.

Dataset 1 (EYT 2013–2014)

DTHD (without G × E interaction, WI)

We first compared the prediction performance for trait DTHD in terms of MSE without G × E interaction (Figure 1A, WI, and Table 1) for conventional multitrait Bayesian Ridge Regression (BRR) and four types of kernels [linear GBLUP, Gaussian (GK), polynomial (PK), and sigmoid (SK)], when all traits in the testing set are predicted (MT) and when only a fraction of the traits is predicted (MT_P). Under both scenarios, the best performance in most of the four environments was observed under the multitrait GK, and the worst under the multitrait SK. In environment EHT, predictions under scenario MT_P were considerably better than under scenario MT, while in environment LHT, scenario MT was slightly better than scenario MT_P (Table 1 and Figure 1A, WI).

Figure 1. Dataset 1, DTHD. Prediction performance in terms of mean square error of prediction (MSE) for five methods (BRR, GBLUP, GK, PK, and SK) (A) without G × E interaction (WI) and (B) including G × E interaction (I) for four environments (Bed5IR, EHT, Flat5IR, and LHT) and two scenarios (MT and MT_P).

Table 1.

Dataset 1 EYT 2013–2014

Env.	Scenario	BRR (WI)	GBLUP (WI)	GK (WI)	PK (WI)	SK (WI)	BRR (I)	GBLUP (I)	GK (I)	PK (I)	SK (I)
		DTHD
Bed5IR	MT	14.95	14.94	12.96	13.08	25.12	12.77	12.52	11.36	12.17	25.07
EHT	MT	31.32	31.30	29.51	29.86	42.57	27.77	27.19	23.17	24.43	42.62
Flat5IR	MT	8.68	8.59	8.61	8.62	9.36	5.97	5.92	6.49	7.26	7.85
LHT	MT	6.00	5.99	5.87	5.94	7.71	4.56	4.68	5.36	5.84	7.12
Bed5IR	MT_P	14.57	14.56	13.07	13.26	23.68	12.46	12.21	10.97	11.86	23.58
EHT	MT_P	26.06	26.09	24.63	24.98	34.09	24.50	23.75	20.45	20.89	34.58
Flat5IR	MT_P	9.12	9.09	9.09	9.18	9.96	6.25	6.29	6.75	7.62	8.45
LHT	MT_P	6.97	6.99	6.63	6.71	9.54	5.52	5.63	6.06	6.56	9.23
		DTMT
Bed5IR	MT	11.62	11.58	10.17	10.18	18.88	10.25	9.94	9.07	9.37	18.93
EHT	MT	26.21	26.22	24.72	24.89	35.73	23.81	23.55	19.81	20.35	37.19
Flat5IR	MT	8.92	8.88	9.37	9.45	8.35	6.58	6.58	7.58	8.30	6.64
LHT	MT	7.80	7.77	7.58	7.62	10.83	6.52	6.45	6.44	6.93	10.77
Bed5IR	MT_P	11.47	11.49	10.34	10.43	17.96	10.21	9.91	8.99	9.40	18.05
EHT	MT_P	19.56	19.61	18.38	18.58	26.16	18.94	18.69	15.41	15.38	27.49
Flat5IR	MT_P	9.68	9.66	10.02	10.10	9.55	7.19	7.16	7.96	8.78	7.89
LHT	MT_P	8.42	8.42	8.00	8.11	11.83	7.24	7.20	7.30	7.69	12.13


Average mean squared error (MSE) of prediction for five multitrait multienvironment model-methods: BRR, Bayesian ridge regression; GBLUP, genomic best linear unbiased predictor; GK, Gaussian kernel; PK, polynomial kernel; SK, sigmoidal kernel without G × E (WI) and with G × E (I) for two scenarios (MT and MT_P) for four environments (Bed5IR, EHT, Flat5IR, LHT) and two traits (DTHD, days to heading and DTMT, days to maturity). Boldface indicates model-method with the lowest MSE for the environment.

Across environments, multitrait GK was always better than the other methods under both MT and MT_P (Figure 2A, WI, and Table 2). For the MT predictions, GK outperformed BRR, GBLUP, PK, and SK by 7.012%, 6.76%, 0.928%, and 48.8%, respectively, while for the MT_P predictions, GK outperformed BRR, GBLUP, PK, and SK by 6.17%, 6.19%, 1.32%, and 44.64%, respectively. Scenario 2 (MT_P) gave slightly better genome-based predictions than scenario 1 (MT).
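The reported percentages appear to be relative MSE excesses over GK, $100 \times (MSE_{other} - MSE_{GK}) / MSE_{GK}$. Recomputing them from the rounded values in Table 2, as in the Python sketch below (our check, not the authors' code), gives 7.02%, 6.74%, 0.91%, and 48.81% for DTHD under MT without G × E; the small differences from the reported figures presumably come from rounding of the tabulated MSEs.

```python
def pct_excess_over_gk(mse_other, mse_gk):
    """Percentage by which a method's MSE exceeds the GK MSE."""
    return 100.0 * (mse_other - mse_gk) / mse_gk

# Across-environment MSEs, DTHD, scenario MT, without G x E (Table 2)
table2_dthd_mt_wi = {"BRR": 15.24, "GBLUP": 15.20, "GK": 14.24, "PK": 14.37, "SK": 21.19}
gk = table2_dthd_mt_wi["GK"]
gains = {m: round(pct_excess_over_gk(v, gk), 2)
         for m, v in table2_dthd_mt_wi.items() if m != "GK"}
print(gains)  # {'BRR': 7.02, 'GBLUP': 6.74, 'PK': 0.91, 'SK': 48.81}
```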

Figure 2. Dataset 1, DTHD and DTMT. Prediction performance across environments in terms of mean square error of prediction (MSE) for traits (A) DTHD and (B) DTMT, with (I) and without (WI) the G × E interaction term, for two scenarios (MT and MT_P).

Table 2.

Dataset 1 EYT 2013–2014

Scenario	BRR (WI)	GBLUP (WI)	GK (WI)	PK (WI)	SK (WI)	BRR (I)	GBLUP (I)	GK (I)	PK (I)	SK (I)
DTHD
MT	15.24	15.20	14.24	14.37	21.19	12.77	12.58	11.60	12.43	20.67
MT_P	14.18	14.18	13.36	13.53	19.32	12.18	11.97	11.06	11.73	18.96
DTMT
MT	13.64	13.61	12.96	13.03	18.44	11.79	11.63	10.73	11.24	18.38
MT_P	12.28	12.30	11.68	11.80	16.37	10.90	10.74	9.92	10.31	16.39


Average mean squared error (MSE) prediction across environments for five model-methods: BRR, Bayesian ridge regression; GBLUP, genomic best linear unbiased predictor; GK, Gaussian kernel; PK, polynomial kernel; SK, sigmoidal kernel without G × E (WI) and with G × E (I) for two scenarios (MT and MT_P), four environments (Bed5IR, EHT, Flat5IR, LHT), and two traits (DTHD, days to heading and DTMT, days to maturity). Boldface indicates model-method with the lowest MSE for each scenario.

DTHD (G × E interaction, I)

When the G × E interaction term is taken into account, the worst performance was again observed under SK in both scenarios (MT and MT_P; Figure 1B, I, and Table 1). The best performance was observed under GK in environments Bed5IR and EHT, and under BRR or GBLUP in environments Flat5IR and LHT. Large differences were not observed between the predictions without G × E interaction (Figure 1A, WI) and with G × E interaction (Figure 1B, I).

Across environments (Figure 2A, I, and Table 2), for MT predictions, GK outperformed BRR, GBLUP, PK, and SK by 10.35%, 8.47%, 7.15%, and 78.23%, respectively, while for scenario MT_P, GK outperformed BRR, GBLUP, PK, and SK by 10.18%, 8.25%, 6.08%, and 71.43%, respectively. Genome-based prediction improved (lower MSE) when (1) including G × E (Figure 2A, I) compared to ignoring it (Figure 2A, WI; Table 2) and (2) employing the MT_P scenario.

DTMT (without G × E, WI)

The prediction performance for trait DTMT in terms of MSE is provided for the five methods, conventional multitrait Ridge regression (BRR) and four types of kernels (GBLUP, GK, PK, and SK), under the same two scenarios (MT and MT_P) (Figure 3A, WI, and Table 1). Ignoring the G × E interaction term, under both scenarios, the worst performance was generally observed for SK and the best for GK, except in environment Flat5IR, where SK gave the lowest MSE under both MT (8.35) and MT_P (9.55) (Table 1). Apart from Flat5IR, SK was considerably worse than the other methods under both scenarios (Figure 3A, WI). In environment LHT, scenario MT was slightly better than MT_P (Figure 3A, WI, and Table 1).

Dataset 1—DTMT. Prediction performance in terms of mean square error of prediction (MSE) for five methods (BRR, GBLUP, GK, PK, and SK) (A) without G × E interaction (WI) and (B) including G × E interaction (I) for four environments (Bed5IR, EHT, Flat5IR, and LHT).

Across environments, under scenario MT the GK was better than BRR, GBLUP, PK, and SK by 5.23%, 5.06%, 0.57%, and 42.34%, respectively, while under MT_P the GK outperformed BRR, GBLUP, PK, and SK by 5.10%, 5.23%, 1.02%, and 40.14%, respectively (Figure 2B, WI, and Table 2). Genome-based predictions under MT_P were better than those under MT (Figure 2B, WI, and Table 2).

DTMT (G $\times$ E, I)

Considering the G $\times$ E interaction term, we also see that the worst performance was observed under the SK in both scenarios (MT and MT_P; Figure 3B, I, and Table 1). The best performance was obtained with the GK in environments Bed5IR and EHT, and with BRR and GBLUP in environments Flat5IR and LHT. No large differences were observed between the predictions without G $\times$ E interaction (Figure 3A, WI) and with G $\times$ E interaction (Figure 3B, I).

In the across-environment analyses of trait DTMT that took the G $\times$ E interaction into account, the worst performance under both MT and MT_P was observed for the SK, and in general, scenario MT_P was better than MT (Figure 2B, I, and Table 2). Under MT, the GK outperformed BRR, GBLUP, PK, and SK in genome-enabled prediction accuracy by 9.90%, 8.43%, 4.76%, and 71.37%, respectively, whereas under MT_P, the GK was better than BRR, GBLUP, PK, and SK by 9.98%, 8.31%, 3.97%, and 65.25%, respectively (Figure 2B, I, and Table 2). As for trait DTHD, there was a slight but consistent increase in genome-based prediction accuracy when including G $\times$ E (Figure 2B, I) compared with ignoring it (Figure 2B, WI), and for scenario MT_P over scenario MT (Table 2).

Summary of results for dataset 1

The nonlinear multitrait Gaussian kernel gave the best genome-based prediction accuracy in most environments for both traits, DTHD and DTMT, whereas the sigmoidal kernel (SK) gave the worst predictions. Consistently for the four kernel methods (linear GBLUP, GK, PK, and SK), models including G × E gave lower MSE than models ignoring G × E, and the scenario in which all traits in the testing sets had to be predicted (MT) gave slightly worse prediction accuracy than the scenario in which only a fraction of the traits in the testing sets had to be predicted (MT_P). Although these patterns held in most (but not all) environments, the across-environment analyses (Table 2 and Figure 2) clearly support these conclusions.

Dataset 2 (EYT 2014–2015)

DTHD (without G $\times$ E, WI)

We first compared the prediction performance of the five methods (Figure 4A, WI, and Table 3) under the MT and MT_P scenarios when G $\times$ E was ignored (WI). In general, the best performance was obtained with the GK and the worst with the SK, which was considerably worse than the other methods under both MT and MT_P (Figure 4A, WI). Figure 4A, WI, and Table 3 also show that, under both scenarios, predictions were worst in environment EHT and best in environment Bed2IR. In most environments, MT_P slightly outperformed MT (Figure 4A, WI).

Dataset 2—DTHD. Prediction performance in terms of mean square error of prediction (MSE) for five methods (BRR, GBLUP, GK, PK, and SK) (A) without G × E interaction (WI) and (B) including G × E interaction (I) for five environments (Bed2IR, Bed5IR, EHT, Flat5IR, and LHT) and two scenarios (MT and MT_P).

Table 3.

Dataset 2 EYT 2014–2015

		Without G × E (WI)					With G × E (I)
Env.	Scenario	BRR	GBLUP	GK	PK	SK	BRR	GBLUP	GK	PK	SK
DTHD
Bed2IR	MT	2.66	2.65	2.40	2.43	5.90	2.27	2.26	2.05	2.04	5.98
Bed5IR	MT	7.68	7.67	7.21	7.28	13.23	6.58	6.48	5.66	5.54	13.75
EHT	MT	16.13	16.17	15.55	15.54	22.87	14.34	14.44	11.59	11.76	23.58
Flat5IR	MT	4.34	4.32	4.03	4.05	6.34	3.67	3.62	3.84	3.79	6.39
LHT	MT	4.34	4.30	4.30	4.27	5.25	3.29	3.14	2.67	2.87	4.76
Bed2IR	MT_P	2.58	2.61	2.38	2.41	5.86	2.22	2.22	2.01	2.06	5.92
Bed5IR	MT_P	7.55	7.55	7.04	7.15	12.57	6.42	6.37	5.48	5.46	12.97
EHT	MT_P	16.19	16.16	15.50	15.55	22.72	14.17	14.31	11.43	11.74	23.18
Flat5IR	MT_P	4.34	4.33	4.12	4.14	6.21	3.69	3.64	3.70	3.90	6.23
LHT	MT_P	4.30	4.30	4.29	4.32	5.28	3.32	3.15	2.75	2.94	4.80
DTMT
Bed2IR	MT	4.80	4.79	4.63	4.70	6.56	4.26	4.20	3.90	4.03	6.27
Bed5IR	MT	6.29	6.30	5.98	6.05	9.82	5.33	5.36	4.72	4.77	10.18
EHT	MT	12.87	12.89	12.69	12.75	16.77	11.34	11.44	9.81	10.30	17.12
Flat5IR	MT	5.02	4.98	4.82	4.87	7.24	4.53	4.52	4.65	4.84	7.61
LHT	MT	3.92	3.87	3.90	3.86	4.77	3.13	3.05	2.66	2.79	4.42
Bed2IR	MT_P	4.68	4.70	4.52	4.60	6.54	4.16	4.20	3.90	4.07	6.29
Bed5IR	MT_P	5.93	5.95	5.66	5.75	8.93	5.07	5.10	4.51	4.53	9.19
EHT	MT_P	12.70	12.71	12.45	12.55	16.44	11.08	11.22	9.68	10.20	16.54
Flat5IR	MT_P	5.05	5.05	4.90	4.97	7.06	4.56	4.57	4.65	4.95	7.46
LHT	MT_P	3.74	3.71	3.70	3.72	4.53	3.01	2.88	2.59	2.67	4.26


Average mean squared error (MSE) of prediction for five multitrait multienvironment model-methods: BRR, Bayesian ridge regression; GBLUP, genomic best linear unbiased predictor; GK, Gaussian kernel; PK, polynomial kernel; SK, sigmoidal kernel, without G × E (WI) and with G × E (I), for two scenarios (MT and MT_P), five environments (Bed2IR, Bed5IR, EHT, Flat5IR, LHT), and two traits (DTHD, days to heading; DTMT, days to maturity). Boldface indicates the model-method with the lowest MSE for the environment.

Across environments, scenario MT_P slightly outperformed MT (Figure 5A, WI; Table 4). Under MT, the GK performed better than BRR, GBLUP, PK, and SK by 4.96%, 4.86%, 0.258%, and 59.97%, respectively, while under MT_P, the GK outperformed BRR, GBLUP, PK, and SK by 4.88%, 4.82%, 0.704%, and 57.92%, respectively.
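As a rough check, assuming the percentages are the relative excess MSE of each method over GK (our interpretation, not a formula stated in the text), the DTHD values without G × E from Table 4 can be plugged in directly. The helper name `gain_over` is hypothetical:

```python
# Across-environment MSEs for DTHD without G x E (WI), transcribed from Table 4.
table4_dthd_wi = {
    "MT": {"BRR": 7.03, "GBLUP": 7.02, "GK": 6.70, "PK": 6.72, "SK": 10.72},
    "MT_P": {"BRR": 6.99, "GBLUP": 6.99, "GK": 6.67, "PK": 6.71, "SK": 10.53},
}


def gain_over(mse_by_method, reference="GK"):
    """Percentage by which each method's MSE exceeds that of the reference method.

    Assumes gain = (MSE_m - MSE_ref) / MSE_ref * 100; this formula is our
    interpretation of the reported percentages, not one stated in the text.
    """
    ref = mse_by_method[reference]
    return {
        method: round((mse - ref) / ref * 100, 2)
        for method, mse in mse_by_method.items()
        if method != reference
    }


for scenario, mses in table4_dthd_wi.items():
    print(scenario, gain_over(mses))
```

Because Table 4 rounds MSEs to two decimals, the recomputed gains differ from the reported ones by up to about 0.1 percentage point, most visibly for PK, whose MSE is within 0.04 of GK's.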

Dataset 2—DTHD and DTMT. Prediction performance across environments in terms of mean square error of prediction (MSE) for traits (A) DTHD and (B) DTMT, with (I) and without (WI) the G $\times$ E interaction term, for two scenarios (MT and MT_P).

Table 4.

Dataset 2 EYT 2014–2015.

	Without G × E (WI)					With G × E (I)
Scenario	BRR	GBLUP	GK	PK	SK	BRR	GBLUP	GK	PK	SK
DTHD
MT	7.03	7.02	6.70	6.72	10.72	6.03	5.99	5.16	5.20	10.89
MT_P	6.99	6.99	6.67	6.71	10.53	5.96	5.94	5.08	5.22	10.62
DTMT
MT	6.58	6.57	6.40	6.45	9.03	5.72	5.71	5.15	5.35	9.12
MT_P	6.42	6.43	6.25	6.32	8.70	5.58	5.59	5.07	5.29	8.75


Average mean squared error (MSE) of prediction across environments for five model-methods: BRR, Bayesian ridge regression; GBLUP, genomic best linear unbiased predictor; GK, Gaussian kernel; PK, polynomial kernel; SK, sigmoidal kernel, without G × E (WI) and with G × E (I), for two scenarios (MT and MT_P) and two traits (DTHD, days to heading; DTMT, days to maturity). Boldface indicates the model-method with the lowest MSE for the scenario.
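The text does not say how the across-environment values in Table 4 were obtained. Assuming they are simple (unweighted) averages of the per-environment MSEs in Table 3, the DTHD rows under MT can be checked as follows; the dictionary layout and the helper `max_discrepancy` are illustrative choices of ours:

```python
from statistics import mean

# Per-environment MSEs for DTHD under MT, transcribed from Table 3,
# in environment order Bed2IR, Bed5IR, EHT, Flat5IR, LHT.
table3_dthd_mt = {
    "WI": {
        "BRR": [2.66, 7.68, 16.13, 4.34, 4.34],
        "GBLUP": [2.65, 7.67, 16.17, 4.32, 4.30],
        "GK": [2.40, 7.21, 15.55, 4.03, 4.30],
        "PK": [2.43, 7.28, 15.54, 4.05, 4.27],
        "SK": [5.90, 13.23, 22.87, 6.34, 5.25],
    },
    "I": {
        "BRR": [2.27, 6.58, 14.34, 3.67, 3.29],
        "GBLUP": [2.26, 6.48, 14.44, 3.62, 3.14],
        "GK": [2.05, 5.66, 11.59, 3.84, 2.67],
        "PK": [2.04, 5.54, 11.76, 3.79, 2.87],
        "SK": [5.98, 13.75, 23.58, 6.39, 4.76],
    },
}

# Across-environment MSEs for the same rows, transcribed from Table 4.
table4_dthd_mt = {
    "WI": {"BRR": 7.03, "GBLUP": 7.02, "GK": 6.70, "PK": 6.72, "SK": 10.72},
    "I": {"BRR": 6.03, "GBLUP": 5.99, "GK": 5.16, "PK": 5.20, "SK": 10.89},
}


def max_discrepancy(per_env, across):
    """Largest |mean of per-environment MSEs - reported across-environment MSE|."""
    return max(abs(mean(per_env[m]) - across[m]) for m in across)


for model in ("WI", "I"):
    gap = max_discrepancy(table3_dthd_mt[model], table4_dthd_mt[model])
    print(model, round(gap, 3))
```

For both the WI and I models, every recomputed mean lies within 0.01 of the Table 4 entry, which is consistent with (though does not prove) simple averaging over the five environments.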

DTHD (G $\times$ E, I)

When the G $\times$ E interaction term was taken into account for trait DTHD (Figure 4B, I, and Table 3), the best prediction performance under MT was obtained with the GK, PK, or GBLUP, depending on the environment. The performance of the five methods also differed between environments: predictions were worst in environment EHT and, except for SK (whose lowest MSE was in LHT), best in environment Bed2IR. For this trait, the worst predictions were obtained with the SK. Under MT_P, the GK was the best model in most environments (GBLUP was best only in Flat5IR, and PK was marginally better than GK in Bed5IR).

With the G $\times$ E interaction term included, the sigmoid kernel (SK) was again the worst method under both scenarios. Under MT, the best performance was obtained with the GK in environments LHT and EHT, with PK in Bed5IR and Bed2IR, and with GBLUP in Flat5IR. No large differences were found between predictions without (Figure 4A) and with (Figure 4B) the G $\times$ E interaction term.
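To check environment-level statements like these against Table 3, a short sketch can pick the lowest- and highest-MSE method in each environment. The values below are the DTHD rows with G × E transcribed from Table 3; `best_and_worst` is a hypothetical helper:

```python
METHODS = ("BRR", "GBLUP", "GK", "PK", "SK")

# Per-environment MSEs for DTHD with G x E (I), transcribed from Table 3,
# with values listed in METHODS order.
table3_dthd_i = {
    "MT": {
        "Bed2IR": (2.27, 2.26, 2.05, 2.04, 5.98),
        "Bed5IR": (6.58, 6.48, 5.66, 5.54, 13.75),
        "EHT": (14.34, 14.44, 11.59, 11.76, 23.58),
        "Flat5IR": (3.67, 3.62, 3.84, 3.79, 6.39),
        "LHT": (3.29, 3.14, 2.67, 2.87, 4.76),
    },
    "MT_P": {
        "Bed2IR": (2.22, 2.22, 2.01, 2.06, 5.92),
        "Bed5IR": (6.42, 6.37, 5.48, 5.46, 12.97),
        "EHT": (14.17, 14.31, 11.43, 11.74, 23.18),
        "Flat5IR": (3.69, 3.64, 3.70, 3.90, 6.23),
        "LHT": (3.32, 3.15, 2.75, 2.94, 4.80),
    },
}


def best_and_worst(mses):
    """Return the (lowest-MSE, highest-MSE) method names for one environment."""
    ranked = sorted(zip(mses, METHODS))  # sort by MSE, ascending
    return ranked[0][1], ranked[-1][1]


for scenario, envs in table3_dthd_i.items():
    for env, mses in envs.items():
        best, worst = best_and_worst(mses)
        print(f"{scenario:5} {env:8} best={best:6} worst={worst}")
```

With these values, SK has the highest MSE in every environment. Under MT, the lowest MSE goes to PK in Bed2IR and Bed5IR, to GK in EHT and LHT, and to GBLUP in Flat5IR; under MT_P, GK is lowest in Bed2IR, EHT, and LHT, PK (5.46 versus 5.48 for GK) in Bed5IR, and GBLUP in Flat5IR.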

Across environments, MT_P was slightly better than MT (Figure 5A, I, and Table 4). Under MT, the GK had better prediction accuracy than BRR, GBLUP, PK, and SK by 16.67%, 15.95%, 0.716%, and 110.91%, respectively, while under MT_P, the GK outperformed BRR, GBLUP, PK, and SK by 17.45%, 16.97%, 2.87%, and 109.22%, respectively. As previously found, including G $\times$ E improved genome-based prediction accuracy compared with ignoring the interaction term, and MT_P had better prediction accuracy than MT.
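The effect of adding the G × E term can be quantified as the percentage drop in across-environment MSE from the WI model to the I model. The sketch below uses the DTHD values in Table 4; `gxe_reduction` is our own hypothetical helper, and this measure is an illustration rather than one the text reports:

```python
# Across-environment MSEs for DTHD from Table 4, as (without G x E, with G x E).
table4_dthd = {
    "MT": {"BRR": (7.03, 6.03), "GBLUP": (7.02, 5.99), "GK": (6.70, 5.16),
           "PK": (6.72, 5.20), "SK": (10.72, 10.89)},
    "MT_P": {"BRR": (6.99, 5.96), "GBLUP": (6.99, 5.94), "GK": (6.67, 5.08),
             "PK": (6.71, 5.22), "SK": (10.53, 10.62)},
}


def gxe_reduction(mse_wi, mse_i):
    """Percentage reduction in MSE when G x E is added (negative: MSE went up)."""
    return (mse_wi - mse_i) / mse_wi * 100


for scenario, by_method in table4_dthd.items():
    reductions = {m: round(gxe_reduction(wi, i), 1) for m, (wi, i) in by_method.items()}
    print(scenario, reductions)
```

With these values, adding G × E lowers the across-environment MSE by roughly 14% to 24% for BRR, GBLUP, GK, and PK in both scenarios, whereas the MSE of SK rises slightly (by about 1.6% under MT and 0.9% under MT_P), so the gain from G × E does not extend to SK for this trait.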