Abstract
Developing high yielding rice varieties that are tolerant to drought stress is crucial for the sustainable livelihood of rice farmers in rainfed rice cropping ecosystems. Genomic selection (GS) promises to be an effective breeding option for these complex traits. We evaluated the effectiveness of two rather new options in the implementation of GS: trait and environment-specific marker selection and the use of multi-environment prediction models. A reference population of 280 rainfed lowland accessions endowed with 215k SNP markers data was phenotyped under a favorable and two managed drought environments. Trait-specific SNP subsets (28k) were selected for each trait under each environment, using results of GWAS performed with the complete genotype dataset. Performances of single-environment and multi-environment genomic prediction models were compared using kernel regression based methods (GBLUP and RKHS) under two cross validation scenarios: availability (CV2) or not (CV1) of phenotypic data for the validation set, in one of the environments. Trait-specific marker selection strategy achieved predictive ability (PA) of genomic prediction up to 22% higher than markers selected on the bases of neutral linkage disequilibrium (LD). Tolerance to drought stress was up to 32% better predicted by multi-environment models (especially RKHS based models) under CV2 strategy. Under the less favorable CV1 strategy, the multi-environment models achieved similar PA than the single-environment predictions. We also showed that reasonable PA could be obtained with as few as 3,000 SNP markers, even in a population of low LD extent, provided marker selection is based on pairwise LD. The implications of these findings for breeding for drought tolerance are discussed. The most resource sparing option would be accurate phenotyping of the reference population in a favorable environment and under a managed drought, while the candidate population would be phenotyped only under one of those environments.
Introduction
Drought stress is a major constraint for rice production in rainfed rice growing ecosystems that represent approximately 45% of rice growing areas [1]. It is estimated that 50% of world’s rice production is vulnerable to drought [2]. To reduce yield losses of rice crops in rainfed lowland areas and to increase overall rice production, rice cultivars with improved drought tolerance need to be developed. Early studies reported low selection efficiency for improving grain yield under drought stress [3–4]. Consequently, initial breeding efforts targeted secondary traits such as root architecture, leaf water potential, panicle water potential, osmotic adjustment and relative water content [5–7]. However, these traits do not always have higher broad-sense heritability than grain yield under drought and are not always highly correlated with grain yield [8–9]. Several studies in the last decade have documented the effectiveness of direct selection for grain yield under drought stress [10, 11]. A number of quantitative trait loci (QTL) involved in response of rice grain yield to drought stress have been identified [12] and successfully transferred to elite materiel through marker-assisted selection [13–14]. However, the complex genetic architecture of grain yield under drought hampers the objective of reducing the rice yield loss under drought and achieving higher genetic gain for grain yield under drought.
Genomic selection (GS) has recently emerged as an alternative option to conventional marker-assisted selection targeting mapped QTLs [15–17]. By shifting the plant breeding paradigm from “breeding by design” to a “genome-wide-approach”, GS provides genomic estimated breeding values (GEBV) based on all available marker data, instead of focusing on a limited set of markers that tag putative genes or QTLs. In the last few years, successful proof of GS concept has been reported in maize [18–20], wheat [21–25], barley [26–28] and oats [29]. In rice, moderate to high predictive ability (PA) of GEBV has been reported for a variety of quantitative traits in experiments with the progeny of bi-parental crosses and in diverse germplasm collections [30–35]. In particular, it was shown that rice diversity panels provide accurate genomic predictions for complex traits in the progenies of biparental crosses involving members of the panel [36].
One major feature of all the above-mentioned genomic prediction studies is the use of phenotypic data from only one environment, usually non-stressed. The main reason for such focalization was the lack of appropriate statistical framework that model G×E interactions for the purpose of genomic prediction. A number of such frameworks have recently been proposed. First, the single-trait single-environment genomic best linear unbiased prediction (GBLUP) model was extended to multi-environment context [37]. Then a GBLUP-type model using marker × environment interactions (M×E) was proposed [38]. The M×E based approach was further developed, using a non-linear (Gaussian) kernel and an empirical Bayesian method to model the G×E [39], and extended using Bayesian ridge regression priors that produce shrinkage [40]. The latest models go beyond the extension of single environment models and propose multi-environment prediction models in which the genetic effects are modeled by the Kronecker product of the variance-covariance matrix of genetic correlations between environments with the genomic relationship between lines, using GBLUP or RKHS methods [41]. Application of these multi-environment models to data from multilocal trials of CIMMYT’ maize and wheat breeding programs confirmed their superiority over the single environment models. Multi-environment prediction models were also reported to be better than single environment models when used with rice phenotypic data from managed environments representing two water management systems: continuous flooding and alternate watering and drying [42]. However, the differences between the PA of different categories multi-environment models were not significant. Likewise, it was shown that single-step, best linear unbiased prediction-based reaction norm models using data from non-genotyped and genotyped progenies, can enhance PA in rice recurrent genomic selection [43].
Another feature shared by almost all genomic prediction studies is the use of unsorted markers or markers selected on the base of variety of criteria except association with the target trait. In accordance with the infinitesimal model of the genetic architecture of traits [15], the prediction models are trained and GEBV are computed using the same set of markers for all the phenotypic traits targeted by the breeding programs, whatever their genetic architecture. Zhang et al. [44] computed the PA of different genomic prediction methods trained with a relationship matrix built with markers of equal effect (infinitesimal model) and with the same set of markers with weighted effects. The markers’ weight was derived from the result of a ridge regression best linear unbiased prediction or from BayesB prediction for different phenotypic traits. Genomic prediction with markers of weighted effect had higher PA. Similar improvement of the PA of genomic prediction of complex traits was reported, when a trait specific relationship matrix was built using the results of genome wide association studies (GWAS) available in the literature [45].
Here, we report the results of our investigations on the effectiveness of trait-specific marker selection and of multi-environment prediction models in improving the PA of genomic predictions for drought tolerance in rice. We disposed of a large genotypic dataset of 215 k SNP for the reference population (280 accessions) of the rainfed-lowland rice breeding program of the International Rice Research Institute (IRRI), and of phenotypic data of the same population under one normal and two managed drought growth conditions. Subsets of trait and environment-specific markers (28k SNP) were selected for each trait under each of the three growth environments, using results of GWAS performed with the complete genotypic dataset. Performances of single-environment and multi-environment genomic prediction models were compared using kernel regression based methods (GBLUP and RKHS).
Material and methods
Rice populations studied
The plant material was a diversity panel of 280 accessions (S1 Table), of which 245 belong to the four major subgroups of the indica genetic group and 35 to the aus group [46]. Landraces originating from South and South-East Asia represented the largest share of the panel (215). The remaining 65 accessions were improved lines. The 280 accessions were selected for their potential interest for the IRRI rainfed lowland rice breeding program, among the 3,000 rice accessions recently re-sequenced [46]. Seeds of the plant material for the present study were provided by the IRRI Gene-bank.
Phenotyping and analysis of phenotypic data
Field experiments
The field experiments took place at the experimental station of the International Rice Research Institute (IRRI; 14.18°N, 121.25°E), in the framework of the research activity of Dr. Arvind Kumar, in charge of lowland rice breeding at IRRI, the Philippines. Three independent experiments were conducted in the 2015 dry season, to evaluate the performances of the 280 accessions under three managed environments, E1, E2 and E3. E1 corresponded to the standard lowland rice cultivation, without stress, i.e. transplanting crop establishment in puddled soil and continuous flooding (anaerobic conditions) throughout the crop cycle. E2 corresponded to standard lowland rice cultivation associated with application of drought stress at the reproductive stage (see below). E3 corresponded to standard cultivation of upland rice (crop establishment by direct seeding in well-drained soil with continuous aerobic conditions during the crop cycle) with drought stress applied at the reproductive stage.
Drought stress at the reproductive stage was applied following the procedure described in [13]. Briefly, in the E2 experiment, the field was drained 30 days after transplanting and irrigation was withheld until severe leaf rolling was observed in at least 75% of the accessions. Fields were thereafter re-irrigated by sprinklers and the water was again drained after 24 hours in another cycle of drought stress. This cyclic pattern was continued until harvest. In the E3 experiment, drought stress was started 45 days after sowing, by withholding sprinkler irrigation until the soil water tension fell below –50 kPa at a depth of 30 cm. Thereafter, drought stress was applied in a cycle of sprinkler-irrigation and drainage 24 hours later until harvest.
The experimental design in E1 was an augmented randomized complete block, with 16 checks, in single row plots with rows 5 m in length. The E2 and E3 experiments were planted in an Alpha-lattice design with 2 replications and 16 checks, in two-row plots with rows 5 m and 3 m in length, respectively.
Three traits were measured or computed for each individual plot. Days to flowering (DTF, day) as the number days between sowing and visual estimation of 50% exerted panicles of the plot. Plant height at maturity (PH, cm), was measured as the mean height from the soil surface to the tip of the panicle of the main tiller of three randomly chosen plants. Grain yield (GY, kg/ha), was computed from the grain weight at 12.5% moisture of five plants harvested at physiological maturity.
Analysis of phenotypic data
The drought stress applied in E2 and E3 experiments proved to be too severe for a number of accessions that did not reach the reproductive stage or did not produce any grain. The three traits could consequently only be measured properly in 230 accessions in E2 and in 226 accessions in E3, of which 204 were common to the two experiments. Consequently, data analyses were run for 280 and 204 accessions in E1 and for 204 accessions in the E2 and E3.
The phenotypic data were analyzed using the PROC MIXED procedure of SAS v9.0 (SAS Institute Inc., Cary NC, USA).
For each experiment and trait, the following model was used to adjust phenotypic measurements:
where Yijk is the phenotype of either the ith accession or the jth check in the kth block, μ the overall mean, Cj the checks effect considered as fixed, ßk the block effect considered as random, Ni a covariate that amounted one for accessions and zero for checks entrees in each block, αik the random effect of the ith accession in the kth block and εijk is the residual considered as random. The block effects were estimated using data from the 16 checks.
The variance component was estimated using the restricted maximum likelihood method. Finally, Yadj values were extracted for each genotype and were used in genomic prediction models.
For each trait, broad sense heritability was calculated using the ratio H² = σ2g / σ2p, where σ²g is the genotypic variance obtained from the experimental data and the phenotypic variance σ2p = σ2g + σ2e, where σ2e is the residual variance obtained from the model.
Genotyping and analysis of genotypic data
Genotypic data
The genotypic data for the 280 accessions were obtained from the International Rice Informatics Consortium database for the 3,000 rice genomes project (http://iric.irri.org/). It corresponded to the 962k Core-SNP dataset. Filtering of the 962k loci for missing data (threshold of ≤ 20%), the frequency of the minor allele (MAF, threshold of ≥ 2%), and for the rate of heterozygosity (Ho, threshold of ≤ 5%), resulted in a working set of 215,242 SNP. The average rate of missing data per locus for the 280 accessions was 4.1%. The rate of missing data per accession ranged from 1.5 to 17.3%. Heterozygoty (Ho) per accession was 0.2 to 10.5, average 1.0%. The average minor allele frequency was 14.6%. The missing data were imputed using Beagle v4.0 [47]. The final matrix of 280 accessions and 215,242 SNP is hereafter referred to as the “215k SNP” marker set.
The SNP x accession matrix after imputation is available for download at http://tropgenedb.cirad.fr/tropgene/JSP/interface.jsp?module=RICE as GS-Ruse datasets “GS-Ruse_IRRI-Reference-Population_150K” and “GS-Ruse_IRRI-Reference-Population_28K”.
Characterization of population structure and linkage disequilibrium
The population structure was investigated with 2,859 SNP with no missing data, using the sNMF software [48]. Once the likely number of subpopulations was determined, each accession was assigned to one of those subpopulations, if the proportion of its inferred ancestry derived from a subpopulation was above 0.80%. Otherwise, the accession was considered as admixed. A distance-based clustering method was also implemented using the same genotypic data: First pairwise dissymmetry between genotypes was computed using simple matching information and an unweighted neighbor-joining tree was built using Darwin software v6 [49]. Then the tree topology was visualized using FigTree v1.4.3. [50].
Pairwise linkage disequilibrium (LD) between SNP loci was calculated at the level of the individual chromosome, using the working genotypic datasets and the r² estimator proposed for non-phased genotypic data [51].
Marker selection for genomic prediction
Taking advantage of the large number of markers available, two marker-selection approaches were tested, with the aim of (i) identifying the minimum marker density while maintaining the highest possible PA of genomic predictions and (ii) of analyzing the performances of trait-specific marker selection.
In the first approach, five LD thresholds (r² ≤ 0.25, r² ≤ 0.50, r² ≤ 0.75, r² ≤ 0.90, and r² ≤ 1) combined with three MAF thresholds (≥ 2%, ≥ 5% and ≥ 25%) were considered based on the following procedure. The complete pairwise r² matrix for each chromosome was computed on the 215k SNP dataset. Then for each chromosome, single loci or clusters of loci with pairwise LD with other loci below the threshold were identified. All singletons (LD with other loci below the given threshold) were kept. For the clusters of loci, one locus with the fewest missing data before imputation and the highest MAF was randomly selected to represent the cluster. Lastly, the three thresholds of MAF were applied to each of the five levels of LD, leading to 15 genotypic matrices, the smallest composed of 2,859 SNP (S2 Table). Comparison of PA of genomic predictions obtained with each of these matrices, using phenotypic data from E1 environment, (see below and results section) identified the matrix with 28,091 SNP (r² ≤ 0.5 and MAF ≥ 5%), hereafter referred to as 28k SNP dataset, as the best compromise.
In the second approach, the selection criterion was the degree of association of the SNP loci with the target phenotypic trait, as measured by genome-wide association analysis (GWAS). Taking the results of the first selection into account, marker selection was performed as follows: (i) the 215k SNP dataset was filtered for MAF ≥ 5%. (ii) The resulting set of 148,916 SNP was used for GWAS, for each trait, under each of the three environments (E1, E2 and E3), using the standard mixed linear model (MLM) under Tassel v5.2.48 [52]. MLM uses a genotype based kinship matrix (K) jointly with population structure (Q). We computed K and Q using the dataset of 2,859 SNP corresponding to the LD and MAF thresholds of r² ≤ 0.25 and MAF ≥ 20% respectively. Q corresponded to the estimate of the individual ancestry coefficients for the most likely number of subpopulations. (iii) For each trait, under each drought management, 28,091 SNP with the smallest P-value of trait-marker association were extracted to serve in genomic prediction experiments. All GWAS experiments were implemented with accessions that were not included in the corresponding model training process for the genomic prediction experiments, i.e. with 80% of the total number of accessions. The objective was to avoid the overfitting of the training model. The practical implication was the need to select a new set of 28k SNP for each of the replicates of each genomic prediction experiment (see below). Hereafter, we refer to the first marker selection approach as LD-derived marker selection, and to the second approach as GWAS-derived marker selection.
Statistical methods for genomic prediction
Single environment models
Two kernel regression models were used to predict GEBV in each drought stress experiment: the genomic best linear unbiased prediction (GBLUP) and the reproducing kernel Hilbert spaces regressions, RKHS, [53]. The implementation procedures of the two models is detailed in [42]. Briefly the GBLUP method that hypothesizes a strictly additive determinism of the genetic effects [54] was implemented using the genomic matrix G = M*M’ (M being the genotypic matrix) and the Expectation-Maximization convergence algorithm with the R package KRMM [55]. The RKHS method that captures more complex genetic determinism [56] was also implemented using the KRMM.
Multi-environment models
To predict the GEBV with data from the three environments (E1, E2 and E3), hereafter referred to as multi-environment prediction, we used the statistical models described above with extensions to include environmental effects. In the extended GBLUP model, the effects of environments and markers are separated into two components: the main effect of the markers for all the environments and the effect of markers for each individual environment [38]. For RKHS, we used the extended models incorporating G×E corresponding to the “Empirical Bayesian–Genotype × Environment Interaction Model” proposed in [39].
Implementation of the models
Analyses were performed in the R 3.4.2 environment [57] with the R packages BGLR 1.0.5 [58] and MTM 1.0.0 [59]. For both packages, 25,000 iterations for the Gibbs sampler were used after 5,000 burn-ins were removed. The analyses were supported by the CIRAD—UMR AGAP HPC Data Center of the South Green Bioinformatics platform (http://www.southgreen.fr/).
Assessing predictive ability of genomic prediction
In the single environment model, 80% of the accessions (i.e. 224 and 163 for the population of 280 and 204 accessions, respectively) were used as the training set and the remaining 20% (56 and 41, for the population of 280 and 204 accessions, respectively) was used as the validation set.
The multi-environment models were implemented for the three possible 1x1 combinations of the three environments [E1(E2), E1(E3), E2(E1), E2(E3), E3(E1), and E3(E2)] and the three possible 2x1 combinations, [E1(E2+E3), E2(E1+E3) and E3(E1+E2)]. In each case, the PA was assessed with two different cross-validation schemes. The first method (CV1), which resembled what was done in the single environment, used 80% of the observations as a training set and the remaining 20% as the validation set and assumed that phenotypic observations for the two or three environments were available for the individuals serving as training set, and that no phenotypic data were available for the individuals in the validation set. CV1 corresponds to the situation in which the phenotypes of newly generated individuals have to be predicted based only on their genotypic information [38]. The second method (CV2) also used 80% of the observations as a training set and the remaining 20% as the validation set but assumed that at least one observation in one environment was available for the individuals in both the training set and the validation set [38]. The same proportions of the training set and validation set (i.e. 80% and 20%, respectively) were kept when the prediction model combined data from the three environments.
Under both single and multi-environment models, one hundred replicates were computed for all random partitioning in the training and validation sets. The PA of each partition was calculated as the Pearson correlation coefficient between predictions and phenotypes in the validation set. The resulting estimates of PA were averaged and the associated standard error was calculated. For the multi-environment models, the correlation was calculated within each environment. For each trait (DTF, GY and PH) and each statistical method (GBLUP, RKHS), the same partitions were used to compute the PA. The number of replicates under GWAS-derived marker selection method was limited to 20.
The correlation coefficient data of all prediction experiments were transformed into a Z-statistic using the equation: Z = 0.5{ln[1 + r]−ln[1 − r]} and analyzed as a dependent variable in an analysis of variance. After estimation of confidence limits and means for Z, these were transformed back to r variable. In each case, ANOVA was performed to partition the variance of PA into different sources, with all effects declared as fixed.
Results
Phenotypic characteristics of the populations
Box-plots of the adjusted means of the three phenotypic traits under the three environments are presented in Fig 1. As expected, average DTF increased under the drought environments E2 and E3. Conversely, GY and PH decreased. Drought also affected phenotypic variances, which decreased for GY and PH, and increased for DTF. However, in all cases, the distribution of each trait in each environment was reasonably symmetric. The mean phenotypic performances of the improved lines were not significantly different from the mean phenotypic performances of the landrace varieties.
Correlation between the adjusted means under non-drought (E1) and drought environments (E2 and E3) was above 0.50 and highly significant for DTF and PH, close to zero for GY (Table 1). Correlations between the adjusted means under E2 and E3 ranged between 0.50 for GY and 0.74 for PH. Overall, the pattern of these correlations suggests significant G×E interactions for response to drought (E2 and E3 compared to E1), much less G×E interactions between lowland and upland drought (E2 compared to E3).
Table 1. Sources of phenotypic variation and derived summary statistics of days to flowering (DTF), grain yield (GY) and plant height (PH) under three drought environments, E1, E2 and E3.
Single environment analysis | ||||||||||||
Phenotypic variances | Phenotypic correlation | |||||||||||
Trait | Environment | Mean | SD | Fixed effects | Random effects | Total | H² | SE | E1-E2 | E2-E3 | E1-E3 | |
Genotype | Residual | |||||||||||
DTF | E1 | 86.7 | 9.53 | 0.38 | 70.27 | 23 | 93.64 | 0.75 | 0.05 | 0.71* | ||
E2 | 85.84 | 38.3 | 0 | 211.17 | 67.41 | 278.58 | 0.76 | 0.03 | 0.58* | |||
E3 | 90.19 | 32.44 | 0 | 348.52 | 54.55 | 403.07 | 0.84 | 0.01 | 0.68* | |||
GY | E1 | 4.98 | 1.68 | 4.26 | 3,425.56 | 1,578.10 | 5,007.92 | 0.68 | 0.07 | -0.18* | ||
E2 | 0.18 | 0.19 | 2.14 | 50.22 | 40.15 | 92.51 | 0.54 | 0.05 | 0.50* | |||
E3 | 0.43 | 0.62 | 1.77 | 137.38 | 11.98 | 151.12 | 0.91 | 0.08 | -0.16* | |||
PH | E1 | 106.56 | 21.2 | 0 | 469.94 | 21.71 | 491.65 | 0.96 | 0.01 | 0.64* | ||
E2 | 68.57 | 12.38 | 0.84 | 199.19 | 83.17 | 283.2 | 0.71 | 0.03 | 0.74* | |||
E3 | 73.62 | 13.03 | 0.84 | 194.98 | 45.36 | 241.18 | 0.81 | 0.02 | 0.59* | |||
Combined analysis of the three environments | ||||||||||||
Phenotypic variance | ZValue | ProbZ | ||||||||||
Genotype | Residual(1) | Genotype | Residual | Genotype | Residual | |||||||
DTF | 274.58 | 274.58 | 9.16 | 28.55 | < .0001 | < .0001 | 0.76 | 0.05 | ||||
GY | 208.38 | 208.38 | 7.06 | 28.55 | < .0001 | < .0001 | 0.71 | 0.06 | ||||
PH | 178.74 | 178.74 | 9.6 | 28.55 | < .0001 | < .0001 | 0.96 | 0.01 |
SD: Standard deviation; Fixed effects: Bloc (Rep); H²: Broad sense heritability for single environment analysis; SE: Standard error for H².
*: p-value of correlation <0.01. In the combined; (1): The residual variance includes the genotype by environment interaction.
Partitioning of the observed phenotypic variation into different sources of variation via the mixed model analysis is shown in Table 1. Whatever the trait or drought environment, the genotype contributed significantly to the phenotypic variation. The highest contribution of the genotype effect to the phenotypic variation, relative to residual, was observed for PH under E1 and for DTF under E3. Whatever the environment, the genotype effect contributed the least to GY variations. The broad-sense heritability H² was rather high even for GY. H² ranged from 0.75 to 0.86 for DTF, from 0.68 to 0.91 for GY, and from 0.81 to 0.96 for PH. Given the differences in the experimental design in the three environments, combined analysis of data was performed using accessions’ adjusted mean from each environment. The genotype effect was highly significant for the three traits. This was also the case for the residual variance that included the phenotypic variance due to G×E interactions.
Genotypic characteristics of the population
Within the 215k genotypic dataset, the average marker density along the 12 chromosomes was one marker every 1.95 kb. However, the SNP loci were unevenly distributed (S1 Fig). The 28k genotypic dataset represented an average marker density of one marker every 13.7 kb. However, 426 pairs of adjacent loci had distances above 100 kb, the largest distance being 427 kb.
The distribution of MAF was skewed toward low frequencies in both 215k and 28k datasets. The MAF was below 10% for 50% and 37% of loci in 215k and 28k datasets, respectively.
The average LD was rather low, 0.123 and 0.063 in the 215k and 28k datasets, respectively. The decay of LD along physical distance is presented in the (S2 Fig). For between-marker distances of 0 to 25 kb, the average r² value was 0.145 and 0.103 in the 215k and 28k datasets, respectively. These r² dropped to half their initial level at a mean distance between markers of around 600 kb and 250 kb, in the 215k and 28k datasets, respectively.
Analysis of population structure did not provide a clear-cut indication on the number of sub-populations. We opted for five subpopulations, as the estimate of individual ancestry coefficient derived with K = 5 provided the most congruent assignment of the accessions with the group assignment provided by the International Rice Informatics Consortium (S1 Table). Four subpopulations corresponded to the four subgroups of the indica group and the fifth assembled accessions from aus group. The distance-based analysis of diversity also confirmed the assignment of the accessions to indica and aus groups as well as the presence of subgroups within the indica group (S3 Fig).
Effect of LD and MAF based marker selection on the predictive ability of genomic prediction
The effect of LD and MAF-based marker selection was analyzed using the phenotypic data of the reference population of 280 accessions, under the E1 environment. The 90 cross validation experiments involving five levels of LD, three levels of MAF, two prediction methods and three phenotypic traits yielded mean PA of genomic prediction ranging from 0.417 to 0.586 (Fig 2 and S3A Table). Among these sources of variation, only the effects of phenotypic trait and LD were significant as well as the interaction between those factors (S3B Table). The highest average PA was observed for DTF (r = 0.566) and the lowest for GY (r = 0.427). The Fisher least significant difference (F-LSD) indicated that, whatever the trait, genotypic matrices with LD value of r² ≤ 0.25 led to significantly higher PA than the ones with r² > 0.25, except for DTF. In the latter case, the difference in PA between r² ≤ 0.25 and r² ≤ 0.5 was not significant (S3B Table). Given these results, and the fact that the next steps of our study dealt with phenotypic data from three environments, we took the conservative decision of using the 28k SNP dataset (r² ≤ 0.50, MAF ≥ 5%).
Effect of trait and environment-specific marker selection on the predictive ability of genomic prediction
The effect of trait specific selection of the 28k markers on the predictive ability of genomic prediction was analyzed in the reference population of 280 accessions under the E1 environment. GWAS-derived markers led to significantly higher PA than LD-derived markers for the three traits considered (Tables 2 and S4). Compared to LD-derived markers, the GWAS-derived markers led to average PA gain of 16% for DTF, 26% for GY and 24% for PH.
Table 2. Predictive ability (PA) of GBLUP and RKHS statistical models obtained with 28,091 SNPs selected based on linkage disequilibrium (LD), and based on association with target trait detected by genome wide association study (GWAS).
Environment & size of the population | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|
280 accessions | 204 accessions | |||||||||
E1 | E1 | E2 | E3 | |||||||
LD | GWAS | LD | GWAS | LD | GWAS | LD | GWAS | |||
GBLUP | DTF | PA | 0.540 b | 0.618 a | 0.599 b | 0.653 a | 0.450 b | 0.519 a | 0.579 b | 0.635 a |
STD | 0.084 | 0.067 | 0.082 | 0.071 | 0.129 | 0.108 | 0.098 | 0.097 | ||
GY | PA | 0.389 b | 0.489 a | 0.345 b | 0.398 b | 0.420 b | 0.476 b | 0.672 b | 0.682 a | |
STD | 0.09 | 0.061 | 0.115 | 0.124 | 0.12 | 0.162 | 0.122 | 0.133 | ||
PH | PA | 0.505 b | 0.623 a | 0.558 b | 0.595 a | 0.648 b | 0.679 b | 0.673 b | 0.703 a | |
STD | 0.087 | 0.086 | 0.085 | 0.046 | 0.087 | 0.049 | 0.073 | 0.057 | ||
RKHS | DTF | PA | 0.539 b | 0.633 a | 0.585 b | 0.662 a | 0.466 b | 0.553 a | 0.554 b | 0.628 a |
STD | 0.081 | 0.062 | 0.09 | 0.071 | 0.114 | 0.110 | 0.098 | 0.089 | ||
GY | PA | 0.387 b | 0.490 a | 0.352 b | 0.391 b | 0.435 b | 0.479 b | 0.668 b | 0.695 b | |
STD | 0.089 | 0.054 | 0.106 | 0.127 | 0.113 | 0.683 | 0.149 | 0.132 | ||
PH | PA | 0.501 b | 0.625 a | 0.547 b | 0.595a | 0.646 b | 0.683 a | 0.675 b | 0.711 a | |
STD | 0.09 | 0.088 | 0.084 | 0.046 | 0.078 | 0.051 | 0.075 | 0.060 |
DTF: days to flowering; GY: grain yield PH: plant height (PH); E1: lowland non-stress environment; E2: lowland stressed; E3: upland stressed. AP of GWAS-derived markers predictions that are followed by different letters than their LD-derived counterpart are significantly different at α <0.05.
The effect of trait and environment specific selection of the 28k markers on the predictive ability of genomic prediction was analyzed within the population of 204 accessions phenotyped under three drought environments, E1, E2 and E3. Whatever the phenotypic trait, the environment and the genomic prediction method, GWAS-derived markers resulted in significantly higher PA than their LD-derived counterpart (Fig 3 and Table 2). Significant interactions between the marker selection method and the environment were observed only for the DTF trait (S5 Table). The average gain in PA, over LD derived markers was of 11% under E1, 10% under E2 and 6% under E3 (Fig 3 and Table 2). The F-LSD values indicate that the gains of PA for DTF (12% in average) were significant under each of the three environments. For PH, the PA gains (9% in average) were significant only under E1 and E3. For GY, the PA gains (6% on average) were not significant under any of the three environments. It is noteworthy that, whatever the marker selection method, the standard deviation of genomic predictions for GY in the population of 204 accessions, was almost systematically higher (average STD = 0.149) than the ones observed for DTF (average STD = 0.091) and PH (average STD = 0.071).
The two prediction methods GBLUP and RKHS (overall average PA of 0.560 and 0.562, respectively), performed similarly. Whatever the environment, the size of the population, and the trait, the differences in PA between the two methods did not exceed 2%.
Predictive ability of multi environment genomic prediction
Comparisons of the PA of the multi-environment genomic predictions for the three traits under the two cross validation strategies, three drought environments and two prediction methods are summarized in Fig 4. PA ranged from 0.226 to 0.809. Analysis of the sources of variation of PA revealed a significant effect of all the factors considered, ranked in decreasing importance: the cross-validation strategy, the trait, the environment, the type of model, and the statistical method. Interactions between the factors were also significant (Table 3). Given these interactions, for each trait, each multi-environment prediction was compared to its single environment counterpart, using Student’s test (S6 Table).
Table 3. Analysis of factors that influence the variation in predictive ability of single and multi-environment prediction models.
Analysis with only main effects | |||||
Source | DF | SS | MS | F-value | ProbF |
Model | 10 | 309.728 | 30.973 | 1 773.667 | < 0.0001 |
Error | 32 389 | 565.595 | 0.017 | ||
Corrected Total | 32 399 | 875.323 | |||
Trait | 2 | 145.581 | 72.791 | 7 141.750 | < .0001 |
Model | 1 | 10.730 | 10.730 | 1 052.770 | < .0001 |
Method | 2 | 1.426 | 0.713 | 69.930 | < .0001 |
CV | 1 | 15.993 | 15.993 | 1 569.160 | < .0001 |
Drought environment | 2 | 134.409 | 67.204 | 6 593.650 | < .0001 |
Training environment | 2 | 1.588 | 0.794 | 77.880 | < .0001 |
Analysis with main effects and all first-order interactions | |||||
Model | 51 | 496.946 | 9.744 | 833.032 | < 0.0001 |
Error | 32 348 | 378.377 | 0.012 | ||
Corrected Total | 32 399 | 875.323 | |||
Trait | 2 | 145.581 | 72.791 | 7 141.750 | < .0001 |
Model | 1 | 10.730 | 10.730 | 1 052.770 | < .0001 |
Method | 2 | 1.426 | 0.713 | 69.930 | < .0001 |
CV | 1 | 15.993 | 15.993 | 1 569.160 | < .0001 |
Drought environment | 2 | 134.409 | 67.204 | 6 593.650 | < .0001 |
Training environment | 2 | 1.588 | 0.794 | 77.880 | < .0001 |
Trait × Model | 2 | 4.605 | 2.303 | 225.920 | < .0001 |
Trait × Method | 4 | 1.418 | 0.355 | 34.780 | < .0001 |
Model × Method | 2 | 1.812 | 0.906 | 88.910 | < .0001 |
Trait × CV | 2 | 1.882 | 0.941 | 92.330 | < .0001 |
Model × CV | 1 | 15.993 | 15.993 | 1 569.160 | < .0001 |
Method × CV | 2 | 1.627 | 0.814 | 79.820 | < .0001 |
Trait × Drought | 4 | 152.543 | 38.136 | 3 741.620 | < .0001 |
Model × Drought | 2 | 0.786 | 0.393 | 38.580 | < .0001 |
Method × Drought | 4 | 0.499 | 0.125 | 12.240 | < .0001 |
CV × Drought | 2 | 0.792 | 0.396 | 38.840 | < .0001 |
Trait × Training environment | 4 | 0.957 | 0.239 | 23.470 | < .0001 |
Model × Training environment | 2 | 1.588 | 0.794 | 77.880 | < .0001 |
Method × Training environment | 4 | 0.935 | 0.234 | 22.930 | < .0001 |
CV × Training environment | 2 | 1.782 | 0.891 | 87.440 | < .0001 |
Drought × Training environment | 4 | 1.686 | 0.422 | 41.350 | < .0001 |
DF: Degree of freedom; SS: Sum of squares; MS: Mean square; ×: interaction between the factors.
Whatever the trait or the environment, the multi-environment models combined with the CV2 strategy provided the highest average PA of 0.630, and outperformed single environment models, with an average gain of 22% for DTF, of 6% for GY and 15% for PH. Under CV2, the multi-environment models using interaction between the non-stressed environment E1 and one of the drought environments E2 or E3, to predict the phenotypic performance in E2 or E3 (i.e. E2(E1) or E3(E1)), resulted in similar PA to single environment models. Conversely, the multi-environment model E2(E3) and E3(E2) resulted in PA gains of 22% and 13% respectively compared to single environment model. The multi-environment models predicting phenotypic performances in one of the drought environments, using data from the non-stressed environment and the remaining drought environment (i.e E2(E1+E3) and E3(E1+E2)), resulted in PA gains of 24% over the corresponding single environment model.
Performances of the multi-environment prediction models combined with the CV1 strategy (average PA of 0.408) did not significantly differ from their single environment model counterpart, whatever the trait and the environment and the prediction method considered. Multi-environment models predicting performances in drought stressed environments from E2(E1+E3) and E3(E1+E2) or from E2(E1) and E3(E1) did not result in significantly different PA either.
Among the multi-environment prediction models, RKHS, with average PA of 0.600 performed systematically slightly better than the GBLUP, average PA of 0.566.
Discussion
The drought stress applied in this study was so severe that 25% of the accessions did not reach the flowering stage. However, the stress corresponded to the drought severity that rice crops are often facing in the drought-prone rainfed lowland ecosystem of South Asia [60]. Despite the severity of the drought stress, the broad-sense heritability observed for the three phenotypic traits was reasonably high, and the distribution of the three phenotypic traits in each of the three environments was reasonably symmetric, once the accessions that did not reach the flowering stage were discarded.
Effect of markers selection on the predictive ability of genomic prediction
The promise of GS to improve response to selection in populations that are under artificial selection is often presented as depending on the availability of high-density genotypic data to maximize the number of QTLs in LD with at least one marker [15–17], and a large training population to accurately estimate marker effects. However, high-density SNP genotyping may be too expensive for many plant breeding programs, especially when it comes to genotyping the selection candidates in each breeding cycle. Regarding the size of the population, for a given amount of resources, breeders can choose between evaluating more individuals less accurately or fewer individuals more accurately. Simulation studies have shown the feasibility of using low-density genotypic data, with SNP markers evenly distributed along the genome with no significant decrease in the PA of genomic predictions [61–62]. Empirical studies of PA of genomic prediction in various plant species also showed very little, if any, effect of marker densities, until very low marker densities were used [63]. The results of our analysis of the effects of LD and MAF and the associated marker density confirmed these findings. The highest PA were observed with marker densities below one SNP every 15kb. On the other hand, our results somewhat deviate from the rule of positive relationship between the PA of GEBV and the size of the training set [63–65]. Indeed, we did not observe any significant differences in PA between the populations of 280 and 204 accessions evaluated under the E1 environment. The reasons for this deviation may be that (i) the rule applies mainly in across population predictions whereas we were performing cross-validation within each population; (ii) the reduction in size from 280 to 204 accessions was accompanied by a change in the structure of the populations. Indeed, as all the discarded accessions belonged to the more drought susceptible indica group, the proportion of aus accessions increased in the population of 204 accessions from 12% to 17%.
Compared to LD-derived marker selection, GWAS-derived marker selection led, on average, to 22% higher PA in cross validation experiments within the population of 280 accessions. The gains in PA were smaller (6%, on average) but still statistically significant, in cross validation experiments within the population of 204 accessions. Similar results were reported in a simulation-based comparison of GEBV obtained with the GBLUP method using relationship matrices derived from trait-specific markers or from non-trait-specific markers [43]. The difference in the gains of PA observed in E1 environment between the populations of 280 and 204 accessions (12% on average), suggests that the size of the population plays a determining role in the informativeness of the trait-specific markers. In other words, GWAS performed with 224 accessions (the size of the training set for GWAS in the population of 280 accessions) is more informative than GWAS performed with 163 accessions, the size of the training set in the population of 204 accessions. This is in agreement with the well-known positive relationship between the size of the population and the power of GWAS to accurately evaluate the effect of each marker [63]. These gains of PA with GWAS-derived marker, despite the limited size of our training populations, suggest that more substantial gains of PA could be achieved if more consolidated QTL information was available. Such consolidated QTL information can be built from the large number of publicly available QTL database and SNPs detected in different linkage mapping and GWAS experiments. These QTLs’ information could serve building trait-specific genomic relationship matrices, based on the modified VanRaden genomic relationship matrix, with marker weights for each locus, proposed by [45].
Predictive ability of multi-environment predictions
The PA of genomic predictions obtained with the multi-environment models were, on average, similar to their single environment counterparts under the CV1 cross-validation strategy, and significantly higher under the CV2 cross-validation strategy. Gains in the PA of the multi-environment models combined with CV2 were the highest in the two drought affected environments, E2 and E3. The two multi-environment models tested (GBLUP, RKHS) led to significantly different predictive abilities. Under GBLUP, the overall average gain in PA was 7% and ranged from– 4% for GY to 15% for PH. These gains were significantly lower than the 30% gain reported in [38] comparing multi-environment and single environment GBLUP models for GY in wheat. This was also the case for the gains in PA reported in [42] using rice data from two managed environments, continued flooding and alternate watering and drying. These authors reported gains of up to 29% for DTF compared with 22% in our case. The RKHS multi-environment model enabled gains in PA of up to 32%. These gains are much lower than those of up to 68% reported in [39] in wheat, similar to the ones reported by [42], and in accordance with [41], higher than the GBLUP multi-environment model. The differences in the amplitude of gains of PA observed in our study and the ones reported in [39, 41] are probably due to several factors, including the size of the population (599 wheat lines against 204 accessions in our case), the number of environments (larger number of environments that were grouped in four target sets of environments versus three in our case), and the distinctive features of those environments. While in the case of wheat, data from a multi-local trial across a natural continuum of environments were grouped in four subsets, our data were produced in a managed-environment trial with clear-cut treatments to assess the effect of drought stress on rice development and yield. Overall, these results reinforce the conclusion drawn by [42] that, multi-environment genomic prediction models that account for G×E interactions as evaluated from multi-local trials, are also of interest for breeding for tolerance to abiotic stresses, including drought.
Implications for breeding rice for drought tolerance
Improving resilience to drought during floral development and anthesis is an important target for major cereal crops including rice [1, 66]. It is now widely accepted that the ability to predict complex traits is better when using whole-genome marker prediction than using a few markers that target a few quantitative trait loci [67, 68]. In the case of rice, several empirical studies embedded in ongoing breeding programs have confirmed the potential of GS in accelerating genetic gains [34, 36, 42]. Recently, a new generation of genomic prediction models was developed that further improve the PA of genomic prediction by explicitly modeling G×E interactions in the same way as in multilocal trials that all breeding program resort to [37–41]. The effectiveness of such models in the context of breeding for tolerance to abiotic stresses was first documented in rice [42] using data from managed-environment trials to evaluate the performances of a reference population and a progeny population under two water management systems: alternate watering and drying and conventional continuous flooding. These authors also reported gains in PA of up to 14% compared to direct selection based on phenotypic correlation between the two water management systems. Likewise, they showed that the share of non-phenotyped individuals in the CV2 prediction strategy could be increased up to 40% with no significant negative effect on PA.
In the present study, we showed that (i) reasonably high PA of genomic prediction can be obtained with as few as 3,000 SNP markers, even in a population of limited LD extent, provided the markers are selected on the basis of LD. (ii) Trait-specific markers selection resulted in higher PA of genomic prediction than markers selected on the basis of neutral LD, especially when the size of the training population is large enough to allow the accurate estimation of the effect of the markers. (iii) Tolerance to drought stress was better predicted by multi-environment models (especially RKHS) that accounted for G×E interaction (E being non-stressed lowland and stressed lowland or upland, or both), than their single-environment model counterparts, when associated with the CV2 prediction strategy. (iv) Finally, yet importantly, even under the less favorable CV1 prediction strategy, multi-environment models achieved similar PA to their single environment counterparts.
The final finding suggests the feasibility of a genomic breeding scheme aiming at simultaneous improvement of yield potential and drought tolerance. It requires a training population carefully phenotyped under both favorable environmental conditions and managed drought, while in the first step of selection, the candidate population would be phenotyped only under favorable environments. The selected candidate would be phenotyped under managed drought to ascertain their GEBV and to update the multi-environment prediction model for the next breeding cycle [69–70]. The process can be implemented in the framework of the pedigree-breeding of progeny of biparental crosses between members of the reference population of the breeding program that is used as the training population [36; 42]. In the context of multi-environment model-based genomic predictions, the effectiveness of trait-specific markers merits investigations using a simulation approach. Nevertheless, breeders should consider including a limited share of trait specific markers (especially for the most important target traits) when genotyping their candidate populations.
Supporting information
Data Availability
The genotypic datasets generated and analysed during the current study are available in hapmap format at the following address http://tropgenedb.cirad.fr/tropgene/JSP/interface.jsp?module=RICE, in the tab “STUDIES” then selecting for “Genotypes” in Study type and for “GS-Ruse". All other relevant data, including the phenotypic data, are within the manuscript and Supporting Information files.
Funding Statement
This work was funded by Agropolis Foundation (http://www.agropolis-fondation.fr/) and Cariplo Foundation (http://www.fondazionecariplo.it/), Grant no 1201-006. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
- 1.Serraj R, Kumar A, McNally KL, Slamet-Loedin I, Bruskiewich R, Mauleon R et al. Improvement of drought resistance in rice. Adv Agron. 2009; 103: 41–98. 10.1016/S0065-2113(09)03002-8 [DOI] [Google Scholar]
- 2.Bouman BA M, Humphreys E, Tuong TP, Barker R. Rice and water. Adv. Agron. 2006; 92: 187–237. [Google Scholar]
- 3.Rosielle AA, Hamblin J. Theoretical aspects of selection for yield in stress and non-stress environments. Crop Sci. 1981; 21: 943–946. [Google Scholar]
- 4.Edmeades GO, Bolanos J, Lafitte HR, Rajaram S, Pfeiffer W, Fischer RA. Traditional approaches to breeding for drought resistance in cereals In ‘‘Drought Resistance in Cereals” (Baker F. W. G., Ed.); pp. 27–52. CAB International, Wallingford, Oxon, UK: 1989. [Google Scholar]
- 5.Fukai S, Cooper M. Development of drought-resistant cultivars using physiomorphological traits in rice. Field Crop Res. 1989; 40: 67–86. [Google Scholar]
- 6.Price AH, Young EM, Tomos AD. Quantitative trait loci associated with stomatal conductance, leaf rolling and heading date mapped in upland rice (Oryza sativa). New Phytol. 1997; 137: 83–91. [Google Scholar]
- 7.Jongdee B, Fukai S, Cooper M. Leaf water potential and osmotic adjustment as physiological traits to improve drought tolerance in rice. Field Crop Res. 2002; 76: 153–163. [Google Scholar]
- 8.Atlin G. Improving drought tolerance by selecting for yield In ‘‘Breeding Rice for Drought-Prone Environments” (Fischer KS, Lafitte R, Fukai S, Atlin Gardy B, Eds.), pp. 14–22. International Rice Research Institute, Los Banos, Philippines: 2003. [Google Scholar]
- 9.Bernier J, Atlin GN, Serraj R, Kumar A, Spaner D. Breeding upland rice for drought resistance (a review). J. Sci. Food Agric. 2008; 88: 927–939. [Google Scholar]
- 10.Venuprasad R, Lafitte HR, Atlin GN. Response to direct selection for grain yield under drought stress. Crop Sci. 2007. a; 47: 285–293. [Google Scholar]
- 11.Kumar A, Bernier J, Verulkar S, Lafitte HR Atlin GN. Breeding for drought tolerance: Direct selection for yield, response to selection and use of drought tolerant donors in upland and lowland-adapted populations. Field Crop Res. 2008; 107: 221–231. [Google Scholar]
- 12.Dixit S, Mallikarjuna Swamy BP, Vikram P, Bernier J, Sta Cruz MT, et al. Increased drought tolerance and wider adaptability of qDTY12.1 conferred by its interaction with qDTY2.3 and qDTY3.20. Mol Breed. 2012; 30:1767–1779. [Google Scholar]
- 13.Kumar A, Dixit S, Ram T, Yadaw RB, Mishra KK, Mandal NP. Breeding high-yielding drought-tolerant rice: genetic variations and conventional and molecular approaches. J. Exp. Bot. 2014; 65(21): 6265–6278 10.1093/jxb/eru363 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Dixit S, Singh A, Sandhu N, Bhandari A, Vikram P, Kumar A. Combining drought and submergence tolerance in rice:marker-assisted breeding and QTL combination effects. Mol Breeding. 2017; 37:143 10.1007/s11032-017-0737-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Meuwissen THE, Hayes BJ, Goddard ME. Prediction of Total Genetic Value Using Genome-Wide Dense Marker Maps. Genetics. 2001; 157(4): 1819–1829. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Heffner EL, Sorrells ME, Jannink J-L. Genomic Selection for Crop Improvement. Crop Sci. 2009; 49(1): 1–12. [Google Scholar]
- 17.Lorenz AJ, Chao S, Asoro FG, Heffner EL, Hayashi T, Iwata H et al. Genomic Selection in Plant Breeding. Knowledge and Prospects. Adv. Agron. 2011; 110: 77–123. 10.1016/B978-0-12-385531-2.00002-5 [DOI] [Google Scholar]
- 18.Bernardo R, Yu J. Prospects for Genomewide Selection for Quantitative Traits in Maize. Crop Sci. 2007; 47:1082–1090. 10.2135/cropsci2006.11.0690 [DOI] [Google Scholar]
- 19.Zhao Y, Gowda M, Liu M, Wurschum T, Maurer HP, Longin FH et al. Accuracy of genomic selection in European maize elite breeding populations. Theor Appl Genet. 2012; 124:769–776. 10.1007/s00122-011-1745-y [DOI] [PubMed] [Google Scholar]
- 20.Guo Z, Tucker DM, Wang D, Basten CJ, Ersoz E, Briggs WH, Lu J et al. Accuracy of across-environment genome-wide prediction in maize nested association mapping populations. G3. 2013; 3: 263–272. 10.1534/g3.112.005066 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Heffner EL, Jannink J-L, Sorrells ME. Genomic Selection Accuracy using Multifamily Prediction Models in a Wheat Breeding Program. Plant Genome. 2011; 4(1): 65–75. [Google Scholar]
- 22.Rutkoski J, Benson J, Jia Y, Brown-Guedira G, Jannink JL, Sorrells M. Evaluation of genomic prediction methods for fusarium head blight resistance in wheat. Plant Genome. 2012; 5:51–61 [Google Scholar]
- 23.Poland J, Endelman J, Dawson J, Rutkoski J, Wu S, Manes Y et al. Genomic selection in wheat breeding using genotyping-by-sequencing. Plant Genome. 2012; 5(3): 103–113 [Google Scholar]
- 24.Ornella L, Singh S, Pérez P, Burgueño J, Singh R, Tapia E, et al. Genomic prediction of genetic values for resistance to wheat rusts. Plant Genome. 2012; 5: 136–148 [Google Scholar]
- 25.Bassi FM, Bentley AR, Charmet G, Ortiz R, Crossa J. Breeding schemes for the implementation of genomic selection inwheat (Triticum spp.) Plant Sci. 2015; 242: 23–36. 10.1016/j.plantsci.2015.08.021 [DOI] [PubMed] [Google Scholar]
- 26.Lorenz AJ, Smith KP, Jannink J-L. Potential and optimization of genomic selection for Fusarium head blight resistance in six-row barley. Crop Sci. 2012; 52: 1609–1621. [Google Scholar]
- 27.Sallam AH, Endelman JB, Jannink JL, Smith KP. Assessing genomic selection prediction accuracy in a dynamic barley breeding population. Plant genome. 2015; 8(1): 1–15 [DOI] [PubMed] [Google Scholar]
- 28.Sorrells ME. Genomic Selection in Plants: Empirical Results and Implications for Wheat and Barley Breeding Programs Conference Paper· 2015. Doi: 10.1007/978-4-431-55675-6_45.
- 29.Asoro FG, Newell MA, Beavis WD, Scott MP, Jannink J-L. Accuracy and training population design for genomic selection on quantitative traits in elite North American oats. Plant Genome 2011; 4(2): 132–144 [Google Scholar]
- 30.Guo Z, Tucker DM, Basten CJ, Gandhi H, Ersoz E, Guo B et al. The impact of population structure on genomic prediction in stratified populations. Theor Appl Genet. 2014; 127(3): 749–762 10.1007/s00122-013-2255-x [DOI] [PubMed] [Google Scholar]
- 31.Isidro J, Jannink J-L, Akdemir D, Poland J, Heslot N, Sorrells M. Training set optimization under population structure in genomic selection. Theor Appl Genet. 2015; 128(1): 145–158 10.1007/s00122-014-2418-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Onogi A, Ideta O, Inoshita Y, Ebana K, Yoshioka T, Yamasaki M et al. Exploring the areas of applicability of whole-genome prediction methods for Asian rice (Oryza sativa L.). Theor Appl Genet. 2015; 128(1): 41–53. 10.1007/s00122-014-2411-y [DOI] [PubMed] [Google Scholar]
- 33.Spindel J, Begum H, Akdemir D, Virk P, Collard B, Redoña E et al. Genomic Selection and Association Mapping in Rice (Oryza sativa): Effect of Trait Genetic Architecture, Training Population Composition, Marker Number and Statistical Model on Accuracy of Rice Genomic Selection in Elite, Tropical Rice Breeding Lines. PLoS Genet. 2015; 11(2): e1004982 10.1371/journal.pgen.1004982 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Grenier C, Cao T-V, Ospina Y, Quintero C, Châtel MH, Tohme J et al. Accuracy of genomic selection in a rice synthetic population developed for recurrent selection breeding. PLoS One. 2015; 10(8): e0136594 10.1371/journal.pone.0136594 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Wang X, Li L, Yang Z, Zheng X, Yu S, Xu C et al. Predicting rice hybrid performance using univariate and multivariate GBLUP models based on North Carolina mating design II. Heredity. 2017; 118(3): 302–310 10.1038/hdy.2016.87 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Ben Hassen M., Cao T.V., Bartholomé J., Orasen G., Colombi C. et al. Rice diversity panel provides accurate genomic predictions for complex traits in the progenies of biparental crosses involving members of the panel. Theor Appl Genet. 2017; 131(2): 417–435. 10.1007/s00122-017-3011-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Burgueño J, de los Campos G, Weigel K, Crossa J. Genomic prediction of breeding values when modeling genotype × environment interaction using pedigree and dense molecular markers. Crop Sci. 2012; 52 (2):707–719. [Google Scholar]
- 38.Lopez-Cruz M, Crossa J, Bonnett D, Dreisigacker S, Poland J et al. Increased prediction accuracy in wheat breeding trials using a marker × environment interaction genomic selection model. G3. 2015; 5 (4):569–582. 10.1534/g3.114.016097 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Cuevas J, Crossa J, Soberanis V, Pérez-Elizalde S, Pérez-Rodríguez P et al. Genomic prediction of genotype × environment interaction kernel regression models. Plant Genome. 2016; 9 (3):1–20. 10.3835/plantgenome2016.03.0024 [DOI] [PubMed] [Google Scholar]
- 40.Crossa J, de los Campos G, Maccaferri M, Tuberosa R, Burgueño J et al. Extending the marker × environment interaction model for genomic-enabled prediction and genome-wide association analysis in durum wheat. Crop Sci. 2016; 56:2193–2209. [Google Scholar]
- 41.Cuevas J, Crossa J, Montesinos-López OA, Burgueño J, Pérez-Rodríguez P et al. Bayesian Genomic Prediction with Genotype × Environment Interaction Kernel Models. G3. 2017; 7 (1):41–53. 10.1534/g3.116.035584 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42.Ben Hassen M, Bartholomé J, Valè G, Cao T-V, Ahmadi N. Genomic prediction accounting for genotype by environment interaction offers an effective framework for breeding simultaneously for adaptation to an abiotic stress and performance under normal cropping conditions in rice. G3. 2018; 8(9): 2319–2332. 10.1534/g3.118.200098 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Morais Júnior OP, Duarte JB, Breseghello F, Coelho ASG, Morais OP, Magalhães Júnior AM. Single-Step Reaction Norm Models for Genomic Prediction in Multienvironment Recurrent Selection Trials. Crop Sci. 2018;58(2):592–607. 10.2135/cropsci2017.06.0366 [DOI] [Google Scholar]
- 44.Zhang Z, Liu J, Ding X, Bijma P, de Koning D-J, et al. Best Linear Unbiased Prediction of Genomic Breeding Values Using a Trait-Specific Marker- Derived Relationship Matrix. PLoS One. 2010; 5(9): e12648 10.1371/journal.pone.0012648 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45.Zhang Z, Ober U, Erbe M, Zhang H, Gao N, et al. Improving the Accuracy of Whole Genome Prediction for Complex Traits Using the Results of Genome Wide Association Studies. PLoS One; 2014; 9(3): e93017 10.1371/journal.pone.0093017 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46.Wang W, Mauleon R, Hu Z, Chebotarov D, Tai S, Wu Z, et al. Genomic variation in 3,010 diverse accessions of Asian cultivated rice. Nature. 2018; 557(7703):43–9. 10.1038/s41586-018-0063-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47.Browning Brian L, Browning Sharon R. Genotype Imputation with Millions of Reference Samples. Am. J. Hum. Genet. 2016; 98(1): 116–126 10.1016/j.ajhg.2015.11.020 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48.Frichot E, Mathieu F, Trouillon T, Bouchard G and François O. Fast and efficient estimation of individual ancestry coefficients. Genetics. 2014, 196:973–983. 10.1534/genetics.113.160572 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49.Perrier X, Jacquemoud-Collet JP. DARwin software. Centre de Cooperation Internationale en Recherche Agronomique pour le Developpement, Montpellier, France. 2006. http://darwin.cirad.fr/
- 50.Rambaut A, Drummond A. (2016) http://tree.bio.ed.ac.uk/software/figtree/
- 51.Rogers AR, Huff C. Linkage disequilibrium between loci with unknown phase. Genetics. 2009; 182(3): 839–844. 10.1534/genetics.108.093153 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 52.Pritchard JK, Stephens M, Donnelly P. Inference of population structure using multilocus genotype data. Genetics. 2000; 155: 945±959. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 53.Gianola D, Fernando RL, Stella A. Genomic-assisted prediction of genetic value with semiparametric procedures. Genetics. 2006; 173:1761–1776. 10.1534/genetics.105.049510 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 54.Van Raden PM. Efficient Methods to Compute Genomic Predictions. J. Dairy Sci. 2008; 91 (11):4414–4423 10.3168/jds.2007-0980 [DOI] [PubMed] [Google Scholar]
- 55.Jacquin L, Cao T-V, Ahmadi N. A unified and comprehensible view of parametric and kernel methods for genomic prediction with application to Rice. Front. Genet. 2016; 7:145 10.3389/fgene.2016.00145 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 56.Gianola D, van Kaam JB. Reproducing kernel Hilbert spaces regression methods for genomic assisted prediction of quantitative traits. Genetics. 2008; 178:2289–2303. 10.1534/genetics.107.084285 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 57.R Core Team. A Language and Environment for Statistical Computing. URL http://www.R-project.org/, R Foundation for Statistical Computing, Vienna, Austria: 2017. [Google Scholar]
- 58.Perez P, de Los Campos G. Genome-wide regression and prediction with the BGLR statistical package. Genetics, 2014; 198(2):483±95. 10.1534/genetics.114.164442 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 59.de los Campos G, Grüneberg A, 2018; MTM (Multiple-Trait Model) package: http://quantgen.github.io/MTM/vignette.html
- 60.Bhandari H., Pandey S, Sharan R, Naik D, Hirway I, Taunk SK, Sastri ASR. Economic costs of drought and rice farmers’ drought coping mechanisms in eastern India In: Pandey S., Bhandari H., Hardy B. (Eds.), Economic Costs of Drought and Rice Farmers’ Coping Mechanisms: A Cross-Country Comparative Analysis. pp. 43–111; IRRI, Los Banos, Philippines: 2007. [Google Scholar]
- 61.Habier D, Fernando RL, Dekkers JCM. Genomic Selection Using Low-Density Marker Panels. Genetics 182: 343–353. 10.1534/genetics.108.100289 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 62.Zhong S, Dekkers J, Fernando RL Jannink JL (2009) Factors Affecting Accuracy From Genomic Selection in Populations Derived From Multiple Inbred Lines: A Barley Case Study. Genetics. 2009; 182(1):355–64. 10.1534/genetics.108.098277 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 63.Lorenzana R, Bernardo R. Accuracy of genotypic value predictions for marker-based selection in biparental plant populations. Theor Appl Genet. 2009; 120: 151–161 10.1007/s00122-009-1166-3 [DOI] [PubMed] [Google Scholar]
- 64.Luan T, Woolliams JA, Lien S, Kent M, Svendsen M, Meuwissen THE. The accuracy of genomic selection in Norwegian red cattle assessed by cross validation. Genetics. 2009; 183(3): 1119–1126 10.1534/genetics.109.107391 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 65.VanRaden PM, Van Tassell CP, Wiggans GR, Sonstegard TS, Schnabel RD et al. Invited review: Reliability of genomic predictions for North American Holstein bulls. J. Dairy Sci. 2009; 92: 16–24 10.3168/jds.2008-1514 [DOI] [PubMed] [Google Scholar]
- 66.Witcombe JR, Hollington PA, Howarth CJ, Reader S, Steele KA Breeding for abiotic stresses for sustainable agriculture. Phil Trans R Soc B. 2008; 363: 703–716 10.1098/rstb.2007.2179 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 67.Crossa J, Pérez-Rodríguez P, Cuevas J, Montesinos-López O, et al. Genomic selection in plant breeding: methods, models, and perspectives. Trends Plant Sci. 2017; 22 (11):961–975. 10.1016/j.tplants.2017.08.011 [DOI] [PubMed] [Google Scholar]
- 68.Hickey JM, Chiurugwi T, Mackay I, Powell W. Genomic prediction unifies animal and plant breeding programs to form platforms for biological discovery. Nat Genet. 2017; 49 (9):1297–1303. 10.1038/ng.3920 [DOI] [PubMed] [Google Scholar]
- 69.Heffner EL, Sorrells ME, Jannink J-L. Genomic Selection for Crop Improvement. Crop Sci. 2009; 49(1): 1–12. [Google Scholar]
- 70.Pszczola M, Calus MP. Updating the reference population to achieve constant genomic prediction reliability across generations. Animal. 2016; 10(6):1018–24. 10.1017/S1751731115002785 [DOI] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
Data Availability Statement
The genotypic datasets generated and analysed during the current study are available in hapmap format at the following address http://tropgenedb.cirad.fr/tropgene/JSP/interface.jsp?module=RICE, in the tab “STUDIES” then selecting for “Genotypes” in Study type and for “GS-Ruse". All other relevant data, including the phenotypic data, are within the manuscript and Supporting Information files.