Abstract
Key message
We compare genomic selection methods that use correlated traits to help predict biomass yield in sorghum, and find that trait-assisted genomic selection performs best.
Abstract
Genomic selection (GS) is usually performed on a single trait, but correlated traits can also help predict a focal trait through indirect or multi-trait GS. In this study, we use a pre-breeding population of biomass sorghum to compare strategies that use correlated traits to improve prediction of biomass yield, the focal trait. Correlated traits include moisture, plant height measured at monthly intervals between planting and harvest, and the area under the growth progress curve. In addition to single- and multi-trait direct and indirect GS, we test a new strategy called trait-assisted GS, in which correlated traits are used along with marker data in the validation population to predict a focal trait. Single-trait GS for biomass yield had a prediction accuracy of 0.40. Indirect GS performed best using area under the growth progress curve to predict biomass yield, with a prediction accuracy of 0.37, and did not differ from indirect multi-trait GS that also used moisture information. Multi-trait GS and single-trait GS yielded similar results, indicating that correlated traits did not improve prediction of biomass yield in a standard GS scenario. However, trait-assisted GS increased prediction accuracy by up to 50% when using plant height in both the training and validation populations to help predict yield in the validation population. Coincidence between selected genotypes in phenotypic and genomic selection was also highest in trait-assisted GS. Overall, these results suggest that trait-assisted GS can be an efficient strategy when correlated traits are obtained earlier or more inexpensively than a focal trait.
Electronic supplementary material
The online version of this article (10.1007/s00122-017-3033-y) contains supplementary material, which is available to authorized users.
Introduction
Releasing new varieties usually requires evaluation of progenies in a large number of environments. Because the costs of field experiments are becoming the limiting factor (Gawenda et al. 2015; Heslot et al. 2015), strategies that allow rapid, accurate, and resource-efficient predictions are of increasing interest. Applications of best linear unbiased prediction (BLUP) using pedigree information (Henderson 1975) and, more recently, using molecular markers (GBLUP) (VanRaden 2008; Hayes et al. 2009b) are examples of efforts to meet those goals.
When GBLUP or other GS models are applied, selection is made on genomic estimated breeding values (GEBVs) calculated from molecular markers and using phenotypic information of a training population. GS has been successfully applied in many animal (Vallée et al. 2014; de los Campos et al. 2013) and plant (Heffner et al. 2011; Heslot et al. 2012) breeding programs, and prediction accuracy (r) generally shows a positive correlation with heritability (Hayes et al. 2009a). When a focal trait has low heritability, indirect or multi-trait GS can be applied to take advantage of correlated traits with higher heritability to increase r for the focal trait (Mrode 2014, page 70). Benefits of multi-trait GS over single-trait GS have been reported in simulated (Calus and Veerkamp 2011) and real data (Jia and Jannink 2012; Schulthess et al. 2016).
Sorghum [Sorghum bicolor (L.) Moench] is a multipurpose crop that is grown to produce grain, forage, and most recently biomass for second-generation biofuel production. Some advantages of sorghum as a biomass crop include low implementation cost, short cycle, wide adaptability, mechanized management, and high calorific value in boilers (Vermerris and Saballos 2013; Castro et al. 2015). Biomass yield in sorghum has low heritability (Shiringani and Friedt 2011) and is costly and laborious to phenotype. Correlated traits, including plant height, are much easier and more cost-effective to phenotype and have higher heritability (Monk et al. 1984; Castro et al. 2015; Burks et al. 2015). One previous study applied single-trait GS to predict biomass yield in a diverse photoperiod-sensitive sorghum panel (Yu et al. 2016). Much of the phenotypic variation in biomass yield could be explained in a model including plant height, stalk number, and lodging, and indirect GS using these three traits yielded a prediction accuracy only slightly lower than that of direct GS on biomass yield (0.76). However, the authors did not test multi-trait GS approaches.
In this study, we compare the efficiency of various GS strategies for increasing prediction accuracy of a focal trait, sorghum biomass yield, using information from correlated traits.
Materials and methods
Plant material and field experiments
A panel of 453 diverse photoperiod-sensitive sorghum lines was obtained from the United States National Plant Germplasm System (NPGS) and evaluated in Urbana, IL from 2012 to 2014. Along with the diverse panel, the commercial hybrid “Pacesetter” (Richardson Seeds, Vega, TX, USA) was included as a check in all years. The experimental design in 2012 was a randomized complete block design with two replications of single-row plots with a row length of 7.6 m, 1.5 m alleys and 0.76 m row spacing and a total of 24 rows and 16 columns. Thus, 179 sorghum lines were planted in 2012 and the remaining plots were filled with the commercial hybrid. The experimental design in 2013 and 2014 was an augmented block design with the commercial hybrid included as a check in each block and 24 additional genotypes repeated twice in each year. Each incomplete block consisted of 24 four-row plots with a row length of 3 m, 1.5 m alleys and 0.76 m row spacing and a total of 12 rows and 40 columns. The 480 plots used in 2013 and 2014 were filled with 415 lines, among which 141 lines were also included in 2012. The remaining plots were filled with the check hybrid. The target density in all years was approximately 207,570 plants/ha, though the final density in 2013 was lower due to climatic conditions and planting error. In each year, field experiments were planted in late May and harvested in early October.
Phenotyping
Plant height was measured as plot average from the ground to the whorl, at 30 (H1), 60 (H2), 90 (H3) and 120 (H4) days after planting. Total plot wet weight (kg) was measured with a forage harvester consisting of a John Deere 5830 tractor with a four-row Kemper head and a weigh wagon modified with load cells accurate to within 1 kg. A 0.5 kg chopped subsample was captured from each plot at harvest, then weighed before and after oven drying at C for to determine moisture content: Moisture (M) = (subsample wet weight − subsample dry weight)/subsample wet weight. Biomass yield in dry metric tons per hectare (Y) was calculated as: dry metric tons/ha = (total plot wet weight × (1 − plot moisture)) / (plot area × 1000), with total plot wet weight in kg and plot area in ha.
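The moisture and dry-yield calculations described above can be sketched as follows. This is a minimal illustration rather than the authors' script: the function names, the example weights, and the choice to compute plot area as rows × row length × row spacing (ignoring alleys) are all assumptions made here.

```python
def moisture_fraction(sub_wet_kg, sub_dry_kg):
    """Moisture (M) = (subsample wet weight - subsample dry weight) / subsample wet weight."""
    if sub_wet_kg <= 0:
        raise ValueError("subsample wet weight must be positive")
    return (sub_wet_kg - sub_dry_kg) / sub_wet_kg


def dry_yield_t_per_ha(plot_wet_kg, moisture, plot_area_m2):
    """Convert total plot wet weight (kg) to dry metric tons per hectare."""
    dry_kg = plot_wet_kg * (1.0 - moisture)   # remove the water fraction
    dry_t = dry_kg / 1000.0                   # kg -> metric tons
    area_ha = plot_area_m2 / 10000.0          # m^2 -> ha
    return dry_t / area_ha


# Hypothetical four-row plot: 4 rows x 3 m row length x 0.76 m row spacing.
# Whether alleys count toward plot area is an assumption, not stated in the text.
plot_area_m2 = 4 * 3.0 * 0.76
m = moisture_fraction(0.5, 0.15)              # hypothetical 0.5 kg subsample
y = dry_yield_t_per_ha(40.0, m, plot_area_m2)
print(round(m, 3), round(y, 2))
```

Keeping the unit conversions in separate steps makes it easy to check each factor against the field layout actually used.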
Genotyping
DNA was extracted from dark-grown etiolated seedling tissue in 96-well plates using a CTAB protocol. Illumina libraries were created using two pairs of restriction enzymes: PstI-HF/HinP1I and PstI-HF/BfaI (New England Biolabs, Ipswich, MA). Restriction–ligation was performed in 96-well plates, and unique barcoded adapters were ligated to each DNA sample. 96 DNA samples per library were pooled into a single tube for all subsequent steps including size selection using AMPure beads (Beckman-Coulter, Pasadena, CA, USA), PCR amplification using Phusion polymerase (New England Biolabs), and a second round of a bead-based size selection. Single-end, 100-bp sequencing reads were obtained for all libraries on an Illumina HiSeq2000 instrument following submission protocol to the Keck Center at the University of Illinois. The TASSEL3 GBS pipeline (Glaubitz et al. 2014) was used to identify SNPs, using Bowtie2 (Langmead and Salzberg 2012) for tag alignment. Only reads that perfectly matched a barcode and restriction site overhang were retained. After barcode trimming, a set of “master tags” was generated from the unique 64 bp sequences present at least ten times in the dataset that mapped uniquely to the sorghum genome. SNPs were called by comparing the tags in each individual to the set of master tags at each genomic address. SNPs and individuals with more than missing data as well as SNPs with MAF less than were discarded. Missing data were imputed using BEAGLE4 (Browning and Browning 2011) using a window size and overlap of 500 and 100 SNPs, respectively. The final genotypic dataset consisted of 59264 SNPs with an average MAF of 0.21 and heterozygous genotypes.
Data analysis
Due to the differences in field experimental designs and field heterogeneity across years, as well as for reasons of computational efficiency, a two-stage analysis was performed. In the first stage, a mixed model approach was used to account for spatial variation, generating adjusted means for each genotype in each trial. The most appropriate model for each combination of trait and year was chosen based on the variogram (Gilmour et al. 1997) and the Akaike information criterion (AIC) (Table S1), where the full model is:
$$y_{ij} = \mu + g_i + b_j + e_{ij} \tag{1}$$
Each phenotypic data point ($y_{ij}$) was observed in genotype i, block j; $\mu$ is a constant; $g_i$ is the fixed effect of the ith genotype; $b_j$ is the independent and identically distributed random effect of the jth block, with $b_j \sim N(0, \sigma_b^2)$; and $e_{ij}$ is the random effect of residuals, with $\mathbf{e} \sim N(\mathbf{0}, \boldsymbol{\Sigma}_e)$, where $\boldsymbol{\Sigma}_e$ is a first-order auto-regressive structure applied to row and column for spatial correction. Adjusted means ($\bar{y}_i$) were then calculated as the mean of the scaled values from each year.
In the second stage, a GBLUP model was used to obtain genomic predictions for different traits. In addition to predicting each height measurement individually, the area under the growth progress curve (A) was also calculated from the adjusted values of all height measurements and analyzed as a different trait. Since all height measurements were 30 days apart, this was obtained from the following simplified equation:
$$A = 30 \sum_{i=1}^{m-1} \frac{H_i + H_{i+1}}{2} \tag{2}$$
where m is the number of height measurements, and $H_i$ is the height measured at the ith observation.
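As an illustration of the area under the growth progress curve, the sketch below applies the trapezoidal rule with a constant 30-day interval between measurements. The trapezoidal form is an assumption based on the usual definition of such curves, and the function name and example heights are hypothetical.

```python
def area_under_growth_curve(heights, interval_days=30.0):
    """Trapezoidal area under a height-by-time curve with equally spaced measurements."""
    if len(heights) < 2:
        raise ValueError("at least two height measurements are required")
    # Each pair of consecutive measurements contributes one trapezoid.
    trapezoids = ((a + b) / 2.0 for a, b in zip(heights, heights[1:]))
    return interval_days * sum(trapezoids)


# Hypothetical adjusted heights (m) for H1..H4 of one genotype.
print(area_under_growth_curve([0.4, 1.5, 2.8, 3.2]))
```

Because the interval is constant, the area is simply the interval length times the sum of consecutive pairwise means, which is why a single integrated value can stand in for the four height measurements.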
The model used for single-trait GS was:
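$$\mathbf{y} = \mathbf{1}\mu + \mathbf{Z}\mathbf{g} + \mathbf{e} \tag{3}$$
where $\mathbf{y}$ is the vector of adjusted means from the first stage; $\mu$ is a constant; $\mathbf{g}$ is the vector of random effects of genotypes, with incidence matrix $\mathbf{Z}$, $E(\mathbf{g}) = \mathbf{0}$ and $\mathrm{Var}(\mathbf{g}) = \sigma_a^2 \mathbf{K}$, where $\sigma_a^2$ is the additive genetic variance and $\mathbf{K}$ is the realized additive relationship matrix calculated from the genotypic dataset using the A.mat function from the rrBLUP package (Endelman and Jannink 2012); and $\mathbf{e}$ is the identically and independently distributed residual with $\mathbf{e} \sim N(\mathbf{0}, \sigma_e^2 \mathbf{I})$, where $\sigma_e^2$ is the residual variance. Genomic heritability ($h^2$) was calculated as the ratio of additive to phenotypic variance, $h^2 = \sigma_a^2 / (\sigma_a^2 + \sigma_e^2)$ (de los Campos et al. 2015).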
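To make the two ingredients of the single-trait model concrete, the sketch below builds a marker-based relationship matrix and computes genomic heritability from variance components. It follows a VanRaden (2008)-style formula with markers coded as 0/1/2 allele counts, which is an assumption made here; it is not guaranteed to reproduce the output of A.mat, and the function names and toy values are hypothetical.

```python
import math


def vanraden_relationship(markers):
    """Relationship matrix from a lines x markers table of 0/1/2 allele counts."""
    n, m = len(markers), len(markers[0])
    # Allele frequency of the counted allele at each marker.
    freqs = [sum(row[j] for row in markers) / (2.0 * n) for j in range(m)]
    denom = 2.0 * sum(p * (1.0 - p) for p in freqs)
    if denom == 0.0:
        raise ValueError("all markers are monomorphic")
    # Center each marker by twice its allele frequency.
    centered = [[row[j] - 2.0 * freqs[j] for j in range(m)] for row in markers]
    return [[sum(a * b for a, b in zip(ci, cj)) / denom for cj in centered]
            for ci in centered]


def genomic_heritability(var_additive, var_residual):
    """Ratio of additive variance to phenotypic (additive + residual) variance."""
    return var_additive / (var_additive + var_residual)


# Toy genotypes for three lines at three markers.
K = vanraden_relationship([[0, 1, 2], [2, 1, 0], [1, 1, 1]])
# Hypothetical variance components scaled so that h^2 = 0.26, the value reported for Y.
h2 = genomic_heritability(0.26, 0.74)
print(K[0][0], math.sqrt(h2))
```

The square root of the genomic heritability is the quantity against which prediction accuracies are compared in the Results.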
The model used for multi-trait GS with p variables, following a notation similar to that used by Ferreira (2011, page 331) was:
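$$\mathbf{y}_i = \boldsymbol{\mu} + \mathbf{g}_i + \mathbf{e}_i \tag{4}$$
where $\mathbf{y}_i$ is the vector of multivariate responses associated with genotype i, in which $\mathbf{y}_i = [y_{i1}, \ldots, y_{ip}]'$; $\boldsymbol{\mu}$ is the vector of the constants associated with each trait, with $\boldsymbol{\mu} = [\mu_1, \ldots, \mu_p]'$; $\mathbf{g}_i$ is the vector of random effects of genotype i associated with each trait, in which $\mathbf{g}_i = [g_{i1}, \ldots, g_{ip}]'$ and, stacking over the n genotypes, $\mathbf{g} = [\mathbf{g}_1', \ldots, \mathbf{g}_n']' \sim N(\mathbf{0}, \mathbf{K} \otimes \mathbf{G})$, with $\mathbf{K}$ the realized additive relationship matrix; and $\mathbf{e}_i$ is the vector of random effects of residuals from the multivariate model, $\mathbf{e}_i = [e_{i1}, \ldots, e_{ip}]'$, with $\mathbf{e} = [\mathbf{e}_1', \ldots, \mathbf{e}_n']' \sim N(\mathbf{0}, \mathbf{I} \otimes \mathbf{R})$. The matrices G and R are the variance–covariance matrices (VCOV) for genetic and residual effects, respectively. In both cases, these are assumed to be unstructured, considering correlation for all pairs of traits and specific variances for each trait. The multi-trait model was used in this study for p = 2. Genetic and residual correlations were obtained from the multi-trait analysis and their respective standard errors were estimated by the Delta method, all of which are given as output of ASReml-R (Isik et al. 2017, page 116).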
Cross-validation and prediction accuracy
The prediction accuracy of each model was assessed through five-fold cross-validation, randomly splitting the dataset into five sets and using four of them to predict the remaining set. This process was repeated for each of the five sets, storing all GEBVs before calculating a single Pearson’s correlation between the pooled GEBVs from the five folds and the adjusted means. The whole procedure was repeated 30 times, and the same folds were used to perform cross-validation for the different models. Mean and standard deviation of the correlations were calculated and reported as prediction accuracy and its standard deviation, respectively. Training set and validation set varied according to the model used (Table 1).
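The cross-validation procedure can be sketched as below. This is an illustrative outline only: the `predict` argument is a hypothetical callable standing in for the GBLUP fit (it receives training and validation indices and returns predictions for the validation indices), and all function names are assumptions.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def make_folds(n, k, rng):
    """Randomly split indices 0..n-1 into k folds of near-equal size."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_accuracy(adjusted_means, predict, k=5, repeats=30, seed=1):
    """Mean and SD over repeats of the pooled-fold correlation."""
    rng = random.Random(seed)  # a fixed seed lets different models share the same folds
    n = len(adjusted_means)
    accuracies = []
    for _ in range(repeats):
        gebv = [None] * n
        for fold in make_folds(n, k, rng):
            held_out = set(fold)
            train = [i for i in range(n) if i not in held_out]
            for i, p in zip(fold, predict(train, fold)):
                gebv[i] = p  # store predictions until all folds are done
        # One correlation per repeat, computed across all folds at once.
        accuracies.append(pearson(gebv, adjusted_means))
    mean = sum(accuracies) / len(accuracies)
    sd = math.sqrt(sum((a - mean) ** 2 for a in accuracies) / (len(accuracies) - 1))
    return mean, sd
```

Pooling predictions across folds before computing the correlation, rather than averaging five per-fold correlations, matches the description above and avoids very small validation sets dominating the estimate.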
Table 1.
No. | Model | Training | Validation |
---|---|---|---|
1 | Standard GS | Yield (80%) | Yield (20%) |
2 | Indirect GS | Height (80%) | Height (20%) a |
3 | Multi-trait indirect GS | Height + moisture (80%) | Height + moisture (20%) b |
4 | Multi-trait GS | Yield (80%) + height (80%) | Yield (20%) |
5 | Trait-assisted GS | Yield (80%) + height (100%) | Yield (20%) |
a Prediction accuracies obtained as the correlation between height GEBVs and adjusted means of biomass yield
b Height and moisture GEBVs were scaled and weighted by their genetic correlations with biomass yield
In single-trait, multi-trait, and trait-assisted GS, genomic predictions of biomass yield itself were used to obtain r. In indirect GS, genomic predictions for a correlated trait (e.g., height) were correlated with adjusted means of biomass yield to obtain r. In multi-trait indirect GS, genomic predictions for multiple correlated traits were scaled to have equal mean and variance before the following index was calculated:
$$\mathrm{Index} = \sum_{i} r_{g(i,Y)} \, \mathbf{GEBV}_i \tag{5}$$
where $r_{g(i,Y)}$ is the additive genetic correlation between trait i and biomass yield, and $\mathbf{GEBV}_i$ is the vector of GEBVs for trait i. Prediction accuracy of indirect multi-trait GS was calculated as the correlation between this index and adjusted means of biomass yield. Multi-trait and trait-assisted GS differ only in that the latter uses 100%, rather than 80%, of correlated trait data for prediction of the focal trait. Thus, trait-assisted GS uses more total data points than multi-trait GS, including correlated trait phenotypes in the validation population. These strategies are similar to those used in Burgueño et al. (2012) for a multi-environment GS study. Analogously, predictions in multi-trait GS were entirely based on records of other lines, as in CV1. On the other hand, trait-assisted GS took advantage of correlated traits, similar to what was done in CV2 for correlated environments.
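The index used for multi-trait indirect GS can be illustrated with the short sketch below: GEBVs of each correlated trait are standardized and then summed with weights equal to their genetic correlations with biomass yield. The trait names, GEBVs and correlation values are hypothetical, and standardizing to mean 0 and standard deviation 1 is one way (assumed here) of giving traits equal mean and variance.

```python
import statistics


def scale(values):
    """Standardize values to mean 0 and standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def indirect_index(gebvs_by_trait, genetic_corr_with_yield):
    """Sum of scaled GEBVs weighted by each trait's genetic correlation with yield."""
    traits = list(gebvs_by_trait)
    n = len(gebvs_by_trait[traits[0]])
    index = [0.0] * n
    for trait in traits:
        weight = genetic_corr_with_yield[trait]
        scaled = scale(gebvs_by_trait[trait])
        index = [acc + weight * z for acc, z in zip(index, scaled)]
    return index


# Hypothetical GEBVs for three lines and hypothetical genetic correlations.
gebvs = {"height": [1.0, 2.0, 3.0], "moisture": [0.8, 0.7, 0.6]}
corr = {"height": 0.8, "moisture": -0.5}
print(indirect_index(gebvs, corr))
```

A negative weight, as for moisture here, means that lines predicted to be drier than average receive a higher index value.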
Coincidence between models
Coincidence between adjusted means and GEBVs was calculated for the top and bottom 20% of individuals in each cross-validation run using the following coincidence index (CI) (Hamblin and Zimmermann 1986):
$$\mathrm{CI} = \frac{B - R}{T - R} \tag{6}$$
where B is the number of selected genotypes that are common to both models; T is the total number of selected genotypes; and R is the expected number of genotypes selected by chance. For example, repeated random selection of 20% of genotypes (91 of 453) would yield an expected overlap of 18 genotypes (20% of 91) between random drawings.
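A minimal sketch of the coincidence index is given below, assuming the form CI = (B − R)/(T − R) with R taken as the selected fraction times T (so 20% of 91 in the example above, left unrounded here). The function name and the toy data are hypothetical.

```python
def coincidence_index(values_a, values_b, fraction=0.2, top=True):
    """Coincidence between the top (or bottom) fraction selected on two rankings."""
    n = len(values_a)
    t = round(n * fraction)                    # T: number of genotypes selected

    def select(values):
        order = sorted(range(n), key=lambda i: values[i], reverse=top)
        return set(order[:t])

    b = len(select(values_a) & select(values_b))  # B: genotypes selected by both
    r = t * fraction                               # R: overlap expected by chance
    return (b - r) / (t - r)


# Toy example with ten genotypes: identical rankings versus reversed rankings.
adjusted = [float(i) for i in range(10)]
print(coincidence_index(adjusted, adjusted))
print(coincidence_index(adjusted, adjusted[::-1]))
```

The index equals 1 when both criteria select exactly the same genotypes and 0 when the overlap is what random selection would produce; it becomes negative when the overlap is smaller than expected by chance.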
All statistical analyses were conducted using R 3.0.3 (R Core Team 2014) and the GBLUP model was fitted using the ASReml-R library (Butler et al. 2009). Phenotypic and genotypic information used, as well as scripts for all analyses performed in this paper, can be found at https://github.com/samuelbfernandes/Trait-assisted-GS.
Results
Prediction accuracy of the standard GS model was, in general, proportional to the square root of the genomic heritability for each trait (Fig. 1). The lowest accuracy in this study was obtained for H1 (0.33), followed by the one obtained for Y (0.40). On the other hand, the square root of the genomic heritability (h) for biomass (0.51) was slightly smaller than (0.54). The highest h (0.94) and r (0.68) were obtained for A, with H3 close behind (Fig. 1). The other traits (M, H2 and H4) had similar r and h.
All traits were genetically correlated with biomass yield (Fig. 2). The genetic correlation between biomass yield and moisture was negative, whereas genetic correlations with plant height traits were all positive and increased with each successive plant height measurement. For H2, H3, H4 and A, genetic correlations with Y were greater than residual correlations with Y, suggesting that they could be useful for multi-trait prediction of Y (Schaeffer 1984).
Prediction accuracies of indirect GS models (Fig. 3) were generally proportional to the genetic correlation of a correlated trait with biomass yield (Fig. 2). Prediction accuracy for Y using H3 data was slightly higher than that using H4 data, despite H3 having a lower genetic correlation with Y. The best prediction accuracy from indirect GS (0.37, using A) was nearly as high as that for standard GS (0.40). Multi-trait indirect GS did not show any advantage over single-trait indirect GS.
Using information from correlated traits in the training population (multi-trait GS) did not provide any increase in prediction accuracy over the standard, single-trait GS model (Fig. 4). On the other hand, using information from correlated traits in both the training and validation populations (trait-assisted GS) increased prediction accuracy for biomass regardless of the secondary trait analyzed with Y, with the highest accuracy obtained for YA (0.60) (Fig. 4). Prediction accuracy increases with trait-assisted GS ranged from using YM to 50% with YA, relative to standard single-trait GS. For highly correlated traits (H3, H4, and A), trait-assisted GS models maintained their advantage over standard GS even when the training population was reduced to of the dataset (), though this was not true for moderately correlated traits (M, H1, and H2; Fig. S1). Interestingly, the reduction in variance of GEBVs relative to the adjusted means was also less dramatic for trait-assisted GS than for the other GS models. Whereas adjusted means of biomass yield had a standard deviation of 2.13 tons/ha, single-trait, multi-trait, and trait-assisted GEBVs had standard deviations of 0.85, 0.86, and 1.21 tons/ha, respectively, using A as the correlated trait.
Coincidence indices (CIs) between the top and bottom 20% of adjusted means and GEBVs were compared between single-trait, multi-trait, and trait-assisted GS models. In all cases CIs were below 0.5. However, CIs between trait-assisted GEBVs and adjusted means were higher than those between single- and multi-trait GEBVs and adjusted means when the correlated trait was H2, H3, H4, or A. Higher CIs were observed for the bottom 20% than for the top 20%, likely reflecting the asymmetric distribution of the underlying adjusted means (Table 2).
Table 2.
Trait | Top 20%, multi-trait | Top 20%, trait-assisted | Bottom 20%, multi-trait | Bottom 20%, trait-assisted |
---|---|---|---|---|
Y a | 0.33 ± 0.02 | | 0.34 ± 0.02 | |
YM | 0.32 ± 0.02 | 0.35 ± 0.02 | 0.33 ± 0.02 | 0.37 ± 0.02 |
YH1 | 0.33 ± 0.02 | 0.36 ± 0.02 | 0.34 ± 0.02 | 0.35 ± 0.02 |
YH2 | 0.35 ± 0.02 | 0.40 ± 0.02 | 0.34 ± 0.02 | 0.40 ± 0.02 |
YH3 | 0.33 ± 0.02 | 0.40 ± 0.02 | 0.34 ± 0.02 | 0.44 ± 0.02 |
YH4 | 0.33 ± 0.02 | 0.39 ± 0.02 | 0.35 ± 0.02 | 0.44 ± 0.02 |
YA | 0.30 ± 0.02 | 0.41 ± 0.02 | 0.35 ± 0.02 | 0.46 ± 0.02 |
Results are shown for a selection intensity of 20% (top and bottom) with standard deviations
a Standard GS model is shown for comparison (one value each for top and bottom)
We next compared the expected selection accuracy of multi-trait and trait-assisted GS to phenotypic selection and indirect phenotypic selection, given the heritabilities and genetic correlations observed for the focal trait (Y) and the correlated traits (M, H1, H2, H3, H4, A) in this study. Compared to phenotypic selection, multi-trait GS was always less accurate, whereas trait-assisted GS was more accurate when using H3, H4 or A as correlated traits (Table 3). Compared to indirect phenotypic selection, both multi-trait and trait-assisted GS were more accurate when the correlated trait had a low genetic correlation with the focal trait (M, H1), and both were less accurate when this genetic correlation was high (H2, H3, H4, A).
Table 3.
Traits | MTA/PS, multi-trait | MTA/PS, trait-assisted | MTA/IPS, multi-trait | MTA/IPS, trait-assisted |
---|---|---|---|---|
YM | 0.76 | 0.87 | 1.22 | 1.38 |
YH1 | 0.78 | 0.88 | 1.85 | 2.05 |
YH2 | 0.82 | 0.92 | 0.63 | 0.73 |
YH3 | 0.80 | 1.14 | 0.56 | 0.80 |
YH4 | 0.80 | 1.16 | 0.55 | 0.82 |
YA | 0.73 | 1.18 | 0.47 | 0.76 |
Discussion
In this study, we consider strategies for genomic selection of an expensive, low-heritability focal trait when correlated traits with higher heritability can be measured more easily, cost-effectively, or earlier in the life cycle. These strategies include single- and multi-trait direct and indirect GS, as well as a new approach we call trait-assisted GS.
Single-trait GS
Marker-based prediction relies on good phenotyping, and prediction accuracy generally increases with heritability (Combs and Bernardo 2013). In this study, sorghum biomass yield showed low heritability (0.26) and moderate r (0.40). Similar results have been obtained in other crops such as wheat, where heritability and r of biomass were 0.38 and 0.37, respectively (Combs and Bernardo 2013). In a study conducted by Lehermeier et al. (2014), r for biomass in corn varied from 0.17 in multi-parental to 0.41 in full-sib lines from a dent pool and from 0.30 in multi-parental to 0.48 in full-sib lines of a flint pool. GS offers the potential advantages of increasing selection intensity (Sonesson and Meuwissen 2009; Riedelsheimer et al. 2013) and allowing more selection cycles per unit time, both of which could result in higher genetic gain in comparison with phenotypic selection (Heffner et al. 2010). One previous study performed GS for biomass yield in sorghum (Yu et al. 2016), and found that r ranged from 0.69 using five-fold CV in a training set of 299 lines, to 0.76 in a validation set enriched for predicted-high and predicted-low lines, to 0.56 in an independent panel. The lower value of r in our study perhaps reflects the fact that our panel, while certainly not elite, had been pre-screened to exclude extremes of maturity variation, dwarfism, and lodging.
Height is usually a high-heritability trait (Heffner et al. 2011; Lipka et al. 2014; Burks et al. 2015), and the prediction accuracies of all height measurements except for the first one (H1, at 30 DAP) were higher than that of biomass yield (0.40). Each height measurement was analyzed individually in addition to the area under the growth progress curve (A). The H1 measurement by itself is clearly too early for accurate selection. Interestingly, H3 showed higher heritability and r than H4, possibly due to residual variation in maturity and lodging among genotypes that affected height measurements at the end of the season. The highest heritability and r were obtained for A. Given increasing adoption of high-throughput phenotyping techniques (Araus and Cairns 2014), more work could be done comparing the use of integrated measures such as A with multivariate models that include all individual time points.
Indirect GS
Indirect GS using predictions of H2, H3, H4, or A to predict biomass appears promising, with the A model achieving nearly the prediction accuracy of the standard, direct GS model (0.37 versus 0.40). Assuming that equivalent height heritabilities would be obtained from smaller plots, selection intensity and genetic gain could be increased by selecting on height instead of biomass in a much larger population at equivalent field cost. An additional consideration in biomass sorghum is that measurement of vegetative biomass yield is incompatible with seed production. Indirect GS using an early-season trait such as H2 could potentially allow time for flowering induction and within-season seed production in selected lines, greatly reducing cycle length.
The failure of multi-trait indirect GS to increase prediction accuracy over single-trait indirect GS is very likely a consequence of the limited number of correlated traits measured in this study. Adding moisture information did not improve the ability of height models to predict biomass yield, but it seems likely that lodging, stand count, and a variety of architectural and spectral traits could be tested for improving multi-trait indirect GS models of biomass yield in sorghum.
Multi-trait and trait-assisted GS
An alternative to indirect GS is to include one or more correlated traits along with the focal trait in a multi-trait model. In this strategy, marker effects for biomass yield are influenced by information from higher heritability traits (Mrode 2014, page 70) such as plant height. Multi-trait GS provided no advantage over standard, single-trait GS in this study, in contrast to several previous results using simulated (Guo et al. 2014; Calus and Veerkamp 2011) and real data (Jia and Jannink 2012; Schulthess et al. 2016), and in agreement with one previous study (dos Santos et al. 2016). Similar to what was obtained by Burgueño et al. (2012) in CV1, this result was somewhat expected, since no information is recovered within lines across traits.
Trait-assisted GS is a new strategy in which correlated traits are used along with marker data in the validation panel. In the five-fold cross-validation scheme used in this study, this meant that 80% of the yield data and 100% of the height data were used, along with molecular markers, to predict the remaining 20% of the yield data. Trait-assisted GS yielded dramatic improvements in prediction accuracy over all other GS models, with YA showing an improvement of 50% over the prediction accuracy of Y in single-trait GS. Even and showed a improvement over the standard GS model, which was somewhat surprising given the relatively low genetic correlations of these traits with biomass (Schaeffer 1984; Galesloot et al. 2014). However, models including these traits did not maintain their advantage when the training population was reduced to a size as small as of the dataset (Fig. S1). These results suggest that even traits weakly correlated with a focal trait could be exploited in trait-assisted GS, given a training population of sufficient size.
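The difference between multi-trait and trait-assisted GS within one cross-validation fold comes down to which phenotypes are hidden from the model. The sketch below illustrates that masking step only; the function name, the use of `None` for a hidden record, and the toy values are assumptions, and the multi-trait GBLUP fit itself is not shown.

```python
def mask_phenotypes(yield_values, height_values, validation_ids, strategy):
    """Return (yield, height) lists with None where the model may not see a phenotype."""
    held_out = set(validation_ids)
    # Focal-trait records of validation lines are always hidden.
    masked_yield = [None if i in held_out else y for i, y in enumerate(yield_values)]
    if strategy == "multi-trait":
        # Correlated-trait records of validation lines are hidden as well.
        masked_height = [None if i in held_out else h
                         for i, h in enumerate(height_values)]
    elif strategy == "trait-assisted":
        # Correlated-trait records of validation lines are kept.
        masked_height = list(height_values)
    else:
        raise ValueError("strategy must be 'multi-trait' or 'trait-assisted'")
    return masked_yield, masked_height


# Toy data for five lines; line 4 is the validation fold.
y = [10.2, 12.5, 9.8, 14.1, 11.0]
h = [2.9, 3.3, 2.7, 3.6, 3.0]
print(mask_phenotypes(y, h, [4], "multi-trait"))
print(mask_phenotypes(y, h, [4], "trait-assisted"))
```

With the same bivariate model fitted to both masked datasets, any gain from trait-assisted GS can be attributed to the correlated-trait records retained for the validation lines.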
Two other noteworthy results were obtained using the trait-assisted GS model. First, the standard deviations of the GEBVs were much higher in the trait-assisted models than in other GS models, though still greatly reduced relative to the standard deviation of the adjusted means. Second, the coincidence indices between biomass adjusted means and GEBVs were also highest for the trait-assisted GS models. These results suggest that differentiation of favorable and unfavorable genotypes is enhanced using trait-assisted GS, facilitating selection in a breeding program (Kadarmideen et al. 2003).
Trait-assisted GS has similarities with both multi-trait and indirect GS, as well as indirect phenotypic selection (IPS). Like IPS, selections are made using direct observation of correlated traits in individuals. Like standard GS, however, trait-assisted GS makes use of focal trait phenotypes in a training population, and genotypes in both training and selection populations, to perform selection. Like multi-trait GS, trait-assisted GS borrows information from correlated traits to inform focal trait marker effects. Trait-assisted GS shares all previously mentioned advantages of indirect (single- and multi-trait) GS for biomass sorghum improvement. However, it seems pointless to exclude focal trait data from a prediction model, as in canonical indirect GS and IPS, even if this data is limited in scope compared to the correlated trait data.
Several limitations of this study also deserve mention. First, Table 3 compares the expected selection accuracy of various strategies, but does not take into account possible differences in cycle length and selection intensity between them. Second, this study used a highly structured pre-breeding population and no attempt was made to account for population structure. Therefore, we can expect that prediction accuracies of all GS models might be inflated relative to what might be observed in an elite population. Third, this study used adjusted means calculated across multiple years as input for the trait-assisted GS models. In an actual trait-assisted GS scenario in biomass sorghum, a single year of height data might be collected from a selection population, and used along with molecular markers and multiple years of height and yield data in a training population to perform selection.
Trait-assisted GS is probably intermediate to standard GS and traditional phenotypic selection in both cycle length and selection intensity. In biomass sorghum, for example, trait-assisted GS could reduce cycle length by selecting on correlated traits available prior to flowering (e.g., H1, H2), and could increase selection intensity by reducing plot size for measurement of correlated traits with higher heritabilities (e.g., one-row plots for plant height versus four-row plots for biomass yield).
Conclusion
In this study, we show that phenotypic data on correlated traits in the validation set can be exploited to achieve substantial increases in prediction accuracy in a focal trait. This strategy should be useful whenever correlated traits can be measured earlier or more cheaply than a focal trait. Many plant and animal domesticates take years or decades to mature and allow full evaluation of yield and quality traits, and in these situations trait-assisted GS may allow dramatic increases in prediction accuracy and genetic gain.
Author contribution statement
SBF and KOGD analyzed the data. DFF supported the statistical analysis. SBF and PJB designed the field trials, collected the phenotypic data and wrote the manuscript. All authors read and approved the final manuscript.
Acknowledgements
This research was supported in part by the Office of Science (BER), U.S. Department of Energy, Grant no. DE-SC0012400. SBF was supported by the Brazilian Federal Agency for the Support and Evaluation of Graduate Education (CAPES), Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMG) and Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq). KOGD was supported by FAPESP (Fundação de Amparo à Pesquisa do Estado de São Paulo, Grant 2016/12977-7).
Abbreviations
- NPGS: National plant germplasm system
- GS: Genomic selection
- Y: Biomass yield
- M: Moisture
- DAP: Days after planting
- H1: Height at 30 DAP
- H2: Height at 60 DAP
- H3: Height at 90 DAP
- H4: Height at 120 DAP
- AIC: Akaike information criterion
- GBLUP: Genomic best linear unbiased prediction
- BLUP: Best linear unbiased prediction
- A: Area under the growth progress curve
- VCOV: Variance–covariance matrices
- GEBV: Genomic estimated breeding value
- IPS: Indirect phenotypic selection
- MAF: Minor allele frequency
- CI: Coincidence index
Compliance with ethical standards
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Ethical statement
The experiments were performed according to the current laws of The United States of America.
References
- Araus JL, Cairns JE. Field high-throughput phenotyping: the new crop breeding frontier. Trends Plant Sci. 2014;19(1):52–61. doi: 10.1016/j.tplants.2013.09.008. [DOI] [PubMed] [Google Scholar]
- Browning BL, Browning SR. A fast, powerful method for detecting identity by descent. Am J Hum Genet. 2011;88(2):173–182. doi: 10.1016/j.ajhg.2011.01.010. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Burgueño J, de los Campos G, Weigel K, Crossa J. Genomic prediction of breeding values when modeling genotype + environment interaction using pedigree and dense molecular markers. Crop Sci. 2012;52(2):707–719. doi: 10.2135/cropsci2011.06.0299. [DOI] [Google Scholar]
- Burks PS, Kaiser CM, Hawkins FM, Brown PJ. Genomewide association for sugar yield in sweet sorghum. Crop Sci. 2015;55(5):2138–2148. doi: 10.2135/cropsci2015.01.0057. [DOI] [Google Scholar]
- Butler DG, Cullis BR, Gilmour AR, Gogel BJ (2009) ASReml-r reference manual. Technical report Queensland Department of Primary Industries and Fisheries
- Calus MPL, Veerkamp RF. Accuracy of multi-trait genomic selection using different methods. Genet Sel Evol. 2011 doi: 10.1186/1297-9686-43-26. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Castro FMR, Bruzi AT, Nunes JAR, Parrella RAC, Lombardi GMR, Albuquerque CJB, Lopes M. Agronomic and energetic potential of biomass sorghum genotypes. Am J Plant Sci. 2015;6:1862–1873. doi: 10.4236/ajps.2015.611187.
- Combs E, Bernardo R. Accuracy of genomewide selection for different traits with constant population size, heritability, and number of markers. Plant Genome. 2013;6(1):1–7. doi: 10.3835/plantgenome2012.11.0030.
- de los Campos G, Hickey JM, Pong-Wong R, Daetwyler HD, Calus MPL. Whole-genome regression and prediction methods applied to plant and animal breeding. Genetics. 2013;193(2):327–345. doi: 10.1534/genetics.112.143313.
- de los Campos G, Sorensen D, Gianola D. Genomic heritability: what is it? PLoS Genet. 2015;11(5):1–21. doi: 10.1371/journal.pgen.1005048.
- dos Santos JPR, Vasconcellos RCC, Pires LPM, Balestre M, Von Pinho RG. Inclusion of dominance effects in the multivariate GBLUP model. PLoS One. 2016;11(4):1–21. doi: 10.1371/journal.pone.0152045.
- Endelman JB, Jannink JL. Shrinkage estimation of the realized relationship matrix. G3. 2012;2(11):1405–1413. doi: 10.1534/g3.112.004259.
- Ferreira DF (2011) Estatística Multivariada, 2nd edn. UFLA, Lavras, MG, Brazil
- Isik F, Maltecca C, Holland J. Genetic data analysis for plant and animal breeding. New York: Springer International Publishing; 2017.
- Galesloot TE, Van Steen K, Kiemeney LALM, Janss LL, Vermeulen SH. A comparison of multivariate genome-wide association methods. PLoS One. 2014;9(4):1–8. doi: 10.1371/journal.pone.0095923.
- Gawenda I, Thorwarth P, Günther T, Ordon F, Schmid KJ. Genome-wide association studies in elite varieties of German winter barley using single-marker and haplotype-based methods. Plant Breed. 2015;134(1):28–39. doi: 10.1111/pbr.12237.
- Gilmour AR, Cullis BR, Verbyla AP. Accounting for natural and extraneous variation in the analysis of field experiments. J Agric Biol Environ Stat. 1997;2(3):269–273. doi: 10.2307/1400446.
- Glaubitz JC, Casstevens TM, Lu F, Harriman J, Elshire RJ, Sun Q, Buckler ES. TASSEL-GBS: a high capacity genotyping by sequencing analysis pipeline. PLoS One. 2014;9(2):1–11. doi: 10.1371/journal.pone.0090346.
- Guo G, Zhao F, Wang Y, Zhang Y, Du L, Su G. Comparison of single-trait and multiple-trait genomic prediction models. BMC Genet. 2014;15:30. doi: 10.1186/1471-2156-15-30.
- Hamblin J, Zimmermann MJO. Breeding common bean for yield in mixtures. Plant Breed Rev. 1986;4:245–272.
- Hayes BJ, Bowman PJ, Chamberlain AJ, Goddard ME. Genomic selection in dairy cattle: progress and challenges. J Dairy Sci. 2009;92(2):433–443. doi: 10.3168/jds.2008-1646.
- Hayes BJ, Visscher PM. Increased accuracy of artificial selection by using the realized relationship matrix. Genet Res. 2009;91(1):47–60. doi: 10.1017/S0016672308009981.
- Heffner EL, Lorenz AJ, Jannink JL, Sorrells ME. Plant breeding with genomic selection: gain per unit time and cost. Crop Sci. 2010;50(5):1681. doi: 10.2135/cropsci2009.11.0662.
- Heffner EL, Jannink JL, Sorrells ME. Genomic selection accuracy using multifamily prediction models in a wheat breeding program. Plant Genome. 2011;4(1):65–75. doi: 10.3835/plantgenome2010.12.0029.
- Henderson CR. Best linear unbiased estimation and prediction under a selection model. Biometrics. 1975;31(2):423–447. doi: 10.2307/2529430.
- Heslot N, Yang HP, Sorrells ME, Jannink JL. Genomic selection in plant breeding: a comparison of models. Crop Sci. 2012;52(1):146–160. doi: 10.2135/cropsci2011.06.0297.
- Heslot N, Jannink JL, Sorrells ME. Perspectives for genomic selection applications and research in plants. Crop Sci. 2015;55(1):1–12. doi: 10.2135/cropsci2014.03.0249.
- Jia Y, Jannink JL. Multiple-trait genomic selection methods increase genetic value prediction accuracy. Genetics. 2012;192(4):1513–1522. doi: 10.1534/genetics.112.144246.
- Kadarmideen HN, Thompson R, Coffey MP, Kossaibati MA. Genetic parameters and evaluations from single- and multiple-trait analysis of dairy cow fertility and milk production. Livest Prod Sci. 2003;81(2–3):183–195. doi: 10.1016/S0301-6226(02)00274-9.
- Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9(4):357–360. doi: 10.1038/nmeth.1923.
- Lehermeier C, Kramer N, Bauer E, Bauland C, Camisan C, Campo L, Flament P, Melchinger AE, Menz M, Meyer N, Moreau L, Moreno-Gonzalez J, Ouzunova M, Pausch H, Ranc N, Schipprack W, Schonleben M, Walter H, Charcosset A, Schon CC. Usefulness of multiparental populations of maize (Zea mays L.) for genome-based prediction. Genetics. 2014;198(1):3–16. doi: 10.1534/genetics.114.161943.
- Lipka AE, Lu F, Cherney JH, Buckler ES, Casler MD, Costich DE. Accelerating the switchgrass (Panicum virgatum L.) breeding cycle using genomic selection approaches. PLoS One. 2014;9(11):e112227. doi: 10.1371/journal.pone.0112227.
- Monk RL, Miller FR, McBee GG. Sorghum improvement for energy production. Biomass. 1984;6(1–2):145–153. doi: 10.1016/0144-4565(84)90017-9.
- Mrode RA (2014) Linear models for the prediction of animal breeding values, 3rd edn. CABI, Oxfordshire, UK
- R Core Team (2014) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/. Accessed 4 July 2016
- Riedelsheimer C, Endelman JB, Stange M, Sorrells ME, Jannink JL, Melchinger AE. Genomic predictability of interconnected biparental maize populations. Genetics. 2013;194(2):493–503. doi: 10.1534/genetics.113.150227.
- Schaeffer LR. Sire and cow evaluation under multiple trait models. J Dairy Sci. 1984;67(7):1567–1580. doi: 10.3168/jds.S0022-0302(84)81479-4.
- Schulthess AW, Wang Y, Miedaner T, Wilde P, Reif JC. Multiple-trait- and selection indices-genomic predictions for grain yield and protein content in rye for feeding purposes. Theor Appl Genet. 2016;129(2):273–287. doi: 10.1007/s00122-015-2626-6.
- Shiringani AL, Friedt W. QTL for fibre-related traits in grain sweet sorghum as a tool for the enhancement of sorghum as a biomass crop. Theor Appl Genet. 2011;123(6):999–1011. doi: 10.1007/s00122-011-1642-4.
- Sonesson AK, Meuwissen THE. Testing strategies for genomic selection in aquaculture breeding programs. Genet Sel Evol. 2009;41(37):1–9. doi: 10.1186/1297-9686-41-37.
- Vallée A, van Arendonk JAM, Bovenhuis H. Accuracy of genomic prediction when combining two related crossbred populations. J Anim Sci. 2014;92(10):4342–4348. doi: 10.2527/jas.2014-8109.
- VanRaden PM. Efficient methods to compute genomic predictions. J Dairy Sci. 2008;91:4414–4423. doi: 10.3168/jds.2007-0980.
- Vermerris W, Saballos A. Genetic enhancement of sorghum for biomass utilization. In: Paterson AH, editor. Genomics of the Saccharinae. New York: Springer; 2013. pp. 391–425.
- Yu X, Li X, Guo T, Zhu C, Wu Y, Mitchell SE, Roozeboom KL, Wang D, Wang ML, Pederson GA, Tesso TT, Schnable PS, Bernardo R, Yu J. Global strategy to turbocharge gene banks. Nat Plants. 2016;2(10):1–7. doi: 10.1038/nplants.2016.150.