Heritability in Plant Breeding on a Genotype-Difference Basis

Paul Schmidt; Jens Hartung; Jörn Bennewitz; Hans-Peter Piepho

doi:10.1534/genetics.119.302134

. 2019 Jun 27;212(4):991–1008. doi: 10.1534/genetics.119.302134

Heritability in Plant Breeding on a Genotype-Difference Basis

Paul Schmidt ^*, Jens Hartung ^*, Jörn Bennewitz ^†, Hans-Peter Piepho ^*,¹

PMCID: PMC6707473 PMID: 31248886

Abstract

In plant breeding, heritability is often calculated (i) as a measure of precision of trials and/or (ii) to compute the response to selection. It is usually estimated on an entry-mean basis, since the phenotype is usually an aggregated value, as genotypes are replicated in trials, which stands in contrast with animal breeding and human genetics. When this was first proposed, assumptions such as balanced data and independent genotypic effects were made that are often violated in modern plant breeding trials/analyses. Due to this, multiple alternative methods have been proposed, aiming to generalize heritability on an entry-mean basis. In this study, we propose an extension of the concept for heritability on an entry-mean to an entry-difference basis, which allows for more detailed insight and is more meaningful in the context of selection in plant breeding, because the correlation among entry means can be accounted for. We show that under certain circumstances our method reduces to other popular generalized methods for heritability estimation on an entry-mean basis. The approach is exemplified via four examples that show different levels of complexity, where we compare six methods for heritability estimation on an entry-mean basis to our approach (example codes: https://github.com/PaulSchmidtGit/Heritability). Results suggest that heritability on an entry-difference basis is a well-suited alternative for obtaining an overall heritability estimate, and in addition provides one heritability per genotype as well as one per difference between genotypes.

Keywords: heritability, plant breeding, mixed models

THE idea behind measures of heritability as used in plant breeding is relatively simple: they express the proportion of the total phenotypic variance that is attributable to the average effects of genes, which in turn determines the degree of resemblance between relatives (Falconer and Mackay 2005, chapter 10). Lourenço et al. (2017) phrase it as “the extent to which a phenotype is genetically determined.” A phenotype is the composite of an organism’s observable traits. It results from (i) the expression of the organism’s genotype, (ii) the influence of environmental factors, and (iii) the interactions between both. Thus, heritability investigates the relationship between observed/phenotypic values with phenotypic variance $σ_{p}^{2}$ and their respective underlying true genotypic values $(g)$ with genotypic variance $σ_{g}^{2}$ . We can furthermore dissect $g$ and $σ_{g}^{2}$ into additive, dominance, and epistasis components to extract average effects of alleles and breeding values $(a)$ , with variance $σ_{a}^{2}$ . Depending on whether genotypic values or breeding values are considered, we refer to broad-sense heritability $(H^{2})$ or narrow-sense heritability $(h^{2})$ , respectively (e.g., Xu 2013). Accordingly, there is a clear distinction between $H^{2}$ and $h^{2}$ . However, note that the approach presented in this article is relevant for both measures, which is why we will refer to heritability in general throughout this article, unless it is necessary to refer to one of the two specifically. The true genotypic/breeding values (and their variances) are of course unknown, but can be estimated/predicted from phenotypic data.

Piepho and Möhring (2007) pointed out that heritability was originally proposed in the context of animal breeding where the observations are collected from individuals, and are used in mixed models to estimate genetic parameters or breeding values by utilizing the additive genetic relationship of all individuals. It is important to note that in animal breeding, each individual usually has its own and unique genotype. Hence, an animal and a genotype refer to the same thing: a single individual. Naturally, this means that genotypes/animals cannot be truly replicated in or across environments. Instead, the same genotype/animal may only be measured repeatedly over time (Mrode and Thompson 2014, chapter 1.3.2) and/or via its progeny/relatives (Mrode and Thompson 2014, chapter 3.4). Repeated observations of individuals are either modeled by adding a random uncorrelated permanent environment effect to the mixed model or by treating repeated observations as different traits.

However, in plant breeding, we usually have crop cultivars/lines/varieties that are represented by a large group of individual plants with exactly the same genotype. This is mainly because most crop cultivars are clones, inbred lines, or hybrids. Thus, in contrast to animal breeding, a genotype in plant breeding does not refer to just a single individual, but to all individual plants belonging to the same cultivar and thus sharing the same genotype. In other words, a genotype in animal breeding is a single, genetically unique individual, whereas a genotype in plant breeding is usually a group of genetically identical individuals (though there are exceptions; see the Discussion). This allows for true replication of a genotype in and across multiple environments, even at a single time point and thus without repeated measurements in time. Notice that plant breeders may also decide to have repeated measures over time and/or account for observations of related genotypes, but the salient feature of plant genotypes as opposed to animal genotypes is the availability of true replication.

As a consequence, the multiple observations of the same cultivar are usually aggregated to obtain a single phenotypic value. Levels of aggregation range from individual plants to means of many plants, with the same genotype tested across several locations and years in designed experiments. The need for this aggregation in plant breeding results is an additional step that is not necessary in animal breeding. Since we do not consider analyses of individual plants in this article, the phenotypic value for a genotype will always be assumed to be some sort of mean value. In this context, heritability is referred to as heritability on an entry-mean (i.e., genotype-mean) basis, which stands in contrast with heritability in animal breeding and human genetics. This difference can also be deduced from the level of the heritability, which is usually much lower for complex traits in humans and animals compared to the entry-mean heritability of complex traits in plant breeding. Nowadays, the aggregation of the multiple observations into a mean phenotypic value per genotype is usually either done by best linear unbiased estimation (BLUE) or best linear unbiased prediction (BLUP). Note that the use of BLUPs here is not the same as in the framework of breeding value estimation typical for animal breeders (see Materials and Methods). The choice between BLUE and BLUP in plant breeding depends on the goal of the analysis and both are commonly used in practice [see Piepho et al. (2008), for more information and a review on the decision-making].

In early work on measures of heritability for plant breeding trials (e.g., Hanson and Robinson 1963), it was assumed that (i) the trial design is completely balanced/orthogonal, (ii) genotypic effects are independent, and (iii) variances and covariances are constant, so that the arithmetic mean across all observations for a genotype is the natural choice to aggregate to a single phenotypic value. We will from now on refer to such a scenario as the simple, balanced setting. Only in a simple, balanced setting does heritability have multiple direct interpretations. Specifically, $H^{2}$ is (I) the ratio of genetic variance to phenotypic variance, (II) the slope coefficient of a linear regression of the genotypic values on the phenotypic values $[γ_{a p}]$ , (III) the squared correlation between genotypic values and phenotypic values $[r_{a p}^{2}]$ , and (IV) the ratio of response to selection $[Δ G]$ to selection differential $[S]$ [see Falconer and Mackay (2005), pages 160 and 186]. For the simple, balanced setting, these interpretations can be represented by

H^{2} = \overset{I}{\overset{︷}{\frac{σ_{g}^{2}}{σ_{p}^{2}}}} = \overset{I I}{\overset{︷}{γ_{g p}}} = \overset{I I I}{\overset{︷}{r_{g p}^{2}}} = \overset{I V}{\overset{︷}{\frac{Δ G}{S}}},

(1)

where $Δ G$ is often also referred to as the genetic gain and $S$ is the mean phenotypic value of the selected genotypes, expressed as a deviation from the population mean. Note that when heritability is estimated via (1)-IV, it is referred to as the realized heritability, as it requires $Δ G$ to be known, which means that selection has already occurred and that the offspring have been observed. By rewriting (1)-IV, we obtain the more directly interpretable breeder’s equation:

Δ G = H^{2} S,

(2)

which allows the prediction of the expected response to selection and is the key reason why heritability plays such an important role in plant (and animal) breeding. Hanson and Robinson (1963, page 128) get to the heart of it: “Heritability has value primarily as a method of quantifying the concept of whether progress from selection for a plant character is relatively easy or difficult to make in a breeding program. A plant breeder, through experience, can perhaps rate a series of characters on their response to selection. Heritability gives a numerical description of this concept.”

The standard method of estimating $H^{2}$ for phenotypic mean values is by plugging estimates for $σ_{g}^{2}$ and $σ_{p}^{2}$ into (1)-I. Given the simple, balanced setting, these variances can easily be estimated from mean squares and their respective expected mean squares of a conventional ANOVA (Yan 2014, chapter 1). For a single environment, where $n_{g}$ genotypes are tested in $n_{r}$ replicates in a completely randomized design (CRD), the observed data may be modeled as

y_{i k} = μ + g_{i} + ε_{i k},

(3)

where $y_{i k}$ is the $k$ th observation of the $i$ th genotype, $μ$ is the intercept, $g_{i}$ is the effect for the $i$ th genotype, and $ε_{i k}$ is the plot error effect corresponding to $y_{i k}$ . The phenotypic value of genotype $i$ can be obtained as the arithmetic mean across replicates, ${\bar{y}}_{i \cdot}$ , which is also the BLUE in this simple, balanced setting. Assuming that $g_{i}$ and $ε_{i k}$ are random, with independent distribution, zero mean, and variances $σ_{g}^{2}$ and $σ_{ε}^{2}$ , respectively, we can calculate the phenotypic variance of ${\bar{y}}_{i \cdot}$ as:

σ_{p}^{2} = σ_{g}^{2} + \frac{σ_{ε}^{2}}{n_{r}} .

(4)

When genotypes are tested in a multienvironment trial (MET), where an environment denotes a year-by-location combination and CRDs are used at all environments, the observed data may be modeled as

y_{i k t} = μ + g_{i} + e_{t} + {(g e)}_{i t} + ε_{i k t},

(5)

where $y_{i k t}$ is the $k$ th observation of the $i$ th genotype at the $t$ th environment, $μ$ is the intercept, $g_{i}$ is the main effect for the $i$ th genotype, $e_{t}$ is the main effect for the $t$ th environment, ${(g e)}_{i t}$ is the $i t$ th genotype-by-environment interaction effect, and $ε_{i k t}$ is the plot error effect corresponding to $y_{i k t}$ . Again, since we have a simple, balanced setting, we can obtain phenotypic values as ${\bar{y}}_{i \cdot \cdot} = B L U E (μ + g_{i})$ . Assuming that $g_{i}$ , $e_{t}$ , ${(g e)}_{i t}$ , and $ε_{i k t}$ are random, with independent distributions, zero mean, and variances $σ_{g}^{2}$ , $σ_{e}^{2}$ , $σ_{g e}^{2}$ and $σ_{ε}^{2}$ , respectively, one may then calculate the phenotypic variance as

σ_{p}^{2} = σ_{g}^{2} + \frac{σ_{g e}^{2}}{n_{e}} + \frac{σ_{ε}^{2}}{n_{e} n_{r}},

(6)

where $n_{e}$ is the number of environments (Hallauer et al. 2010, page 59). This approach assumes a single variance for genotype-by-environment interactions $(g \times e)$ , even when multiple locations were tested across multiple years. In the latter case, one may instead model the environmental effects via separate year, and location main and interaction effects as

y_{i k m q} = μ + g_{i} + f_{m} + l_{q} + {(f l)}_{m q} + {(g f)}_{i m} + {(g l)}_{i q} + {(g f l)}_{i m q} + ε_{i k m q},

(7)

where $y_{i k m q}$ is the $k$ th observation of the $i$ th genotype in the $m$ th year and the $q$ th location, $μ$ is the intercept, $g_{i}$ is the main effect of the $i$ th genotype, $f_{m}$ is the main effect for the $m$ th year, $l_{q}$ is the main effect for the $q$ th location, ${(f l)}_{m q}$ is the $m q$ th year-by-location interaction effect, ${(g f)}_{i m}$ is the $i m$ th genotype-by-year interaction effect, ${(g l)}_{i q}$ is the $i q$ th genotype-by-location interaction effect, ${(g f l)}_{i m q}$ is the $i m q$ th genotype-by-year-by-location interaction effect, and $ε_{i k m q}$ is the plot error effect corresponding to $y_{i k m q}$ . Once more, we have the phenotypic values for each genotype as ${\bar{y}}_{i \cdot \cdot \cdot} = B L U E (μ + g_{i})$ . Assuming $g_{i}$ , $f_{m}$ , $l_{q}$ , ${(f l)}_{m q}$ , ${(g f)}_{i m}$ , ${(g l)}_{i q}$ , ${(g f l)}_{i m q}$ , and $ε_{i k m q}$ are random, with independent distribution, zero mean, and variances $σ_{g}^{2}$ , $σ_{f}^{2}$ , $σ_{l}^{2}$ , $σ_{f l}^{2}$ , $σ_{g f}^{2}$ , $σ_{g l}^{2}$ , $σ_{g f l}^{2}$ , and $σ_{ε}^{2}$ , respectively, we can calculate the phenotypic variance as:

σ_{p}^{2} = σ_{g}^{2} + \frac{σ_{g l}^{2}}{n_{l}} + \frac{σ_{g f}^{2}}{n_{f}} + \frac{σ_{g f l}^{2}}{n_{l} n_{f}} + \frac{σ_{ε}^{2}}{n_{l} n_{f} n_{r}},

(8)

where $n_{f}$ and $n_{l}$ are the number of years and locations, respectively (Becker 2011; Yan 2014). Note that when a MET is conducted either at multiple locations within a single year or at a single location but across multiple years, (8) simplifies to (6). We will refer to heritability estimates involving (4), (6), or (8) and (1)-I as the standard heritability $(H_{S t d}^{2})$ .

By examining (4), (6), and (8), it can be seen that the calculation of $σ_{p}^{2}$ always involves the genotypic variance and all $g \times e$ variances. However, it does not incorporate purely environmental variance components, such as $σ_{e}^{2}$ in the context of (6), and $σ_{f}^{2}$ , $σ_{l}^{2}$ , and $σ_{f l}^{2}$ in the context of (8) [but see Yan (2014, page 3)]. Hence, referring to $σ_{p}^{2}$ as the phenotypic variance may be considered misleading, as it neglects the purely environmental effects and therefore is definitely not the variance of phenotypic mean values. However, a key feature of the perspective on heritability put forward in this paper is that $σ_{p}^{2}$ , as defined in (3)–(8), actually coincides with half the variance of a difference between two phenotypic mean values in the simple, balanced setting. Taking (5) as an example, the variance of a genotype mean is

v a r ({\bar{y}}_{i \cdot \cdot}) = σ_{g}^{2} + \frac{σ_{e}^{2}}{n_{e}} + \frac{σ_{g e}^{2}}{n_{e}} + \frac{σ_{ε}^{2}}{n_{e} n_{r}} \neq σ_{p}^{2},

(9)

whereas the variance of a difference between means of genotypes $i$ and $j$ is $v a r ({\bar{y}}_{i \cdot \cdot}) + v a r ({\bar{y}}_{j \cdot \cdot}) - 2 c o v ({\bar{y}}_{i \cdot \cdot}, {\bar{y}}_{j \cdot \cdot})$ , which in this simple, balanced setting reduces to

v a r ({\bar{y}}_{i \cdot \cdot} - {\bar{y}}_{j \cdot \cdot}) = 2 (σ_{g}^{2} + \frac{σ_{g e}^{2}}{n_{e}} + \frac{σ_{ε}^{2}}{n_{e} n_{r}}) = 2 σ_{p}^{2} .

(10)

It can be argued that this focus on genotype “differences” makes more sense than a focus on genotype means or effects themselves, since the goal is to select the best-performing genotype(s) to maximize $Δ G$ and the ranking of genotypes is uniquely determined by all pairwise differences, whereas the individual genotypic effects and the absolute (mean) performance level as such do not inform about the ranking (Searle et al. 1992; Piepho et al. 2008). Accordingly, the correct ranking of genotypes and thus the precision of estimating genotype differences is more relevant than the precision of the genotype effect estimates. Furthermore, we found that especially in some older publications, the definitions of heritability refer to differences: Knight (1948) defines heritability as “the portion of the observed variance for which differences in heredity are responsible,” while Hanson and Robinson (1963, page 125) state that the “concept of heritability originated as an attempt to describe whether differences actually observed between individuals arose from the differences in genetic makeup between the individuals or resulted from different environmental forces.” Finally, it is no coincidence that two published alternative heritability estimation methods also involve the variance of a difference between genotypes: Cullis et al. (2006) proposed to calculate heritability as

H_{C u l l i s}^{2} = 1 - \frac{{\bar{v}}_{Δ \cdot \cdot}^{B L U P}}{2 σ_{g}^{2}},

(11)

where ${\bar{v}}_{Δ \cdot \cdot}^{B L U P}$ is the mean variance of a difference of two BLUPs for the genotypic effect and $σ_{g}^{2}$ is the genotypic variance. The BLUE counterpart was suggested by Piepho and Möhring (2007) as

H_{P i e p h o}^{2} = \frac{σ_{g}^{2}}{σ_{g}^{2} + {\bar{v}}_{Δ \cdot \cdot}^{B L U E} / 2},

(12)

where ${\bar{v}}_{Δ \cdot \cdot}^{B L U E}$ is the mean variance of a difference of two genotypic BLUEs and $σ_{g}^{2}$ is the genotypic variance. Both measures reduce to $H_{S t d}^{2}$ in the simple, balanced setting.

These considerations suggest that heritability is best defined in terms of pairwise comparisons among genotypes, and this is in fact the key perspective taken in this paper. We aim to provide two things. First, an elaboration of how a heritability on an entry-difference basis $(H_{Δ}^{2} / h_{Δ}^{2})$ can be defined. Second, a comparison of how the application of $H_{Δ}^{2} / h_{Δ}^{2}$ compares to other published methods for estimating heritability on an entry-mean basis. To do so, we give a derivation of $H_{Δ}^{2}$ and subsequently apply it alongside six other heritability measures in four example analyses with real data.

Materials and Methods

Mixed model theory

We assume a standard linear mixed model for the observed data vector $y$ , which is of the form

y = Xβ + Zu + ε,

(13)

where $β$ and $u$ are vectors of fixed and random effects, respectively, $X$ and $Z$ are the associated design matrices, and $ε$ is a vector of random residual errors. The random effects $u$ and $ε$ are assumed to be independently distributed as $u \sim M V N (0, G)$ and $ε \sim M V N (0, R)$ , such that $y \sim M V N (Xβ, V)$ , where $M V N (., .)$ denotes the multivariate normal distribution with a mean vector given as the first argument and a variance–covariance matrix as the second. To obtain estimates for $β$ as well as predictions for $u$ , the mixed model equations (Henderson 1986; Searle et al. 1992) can be solved as

[\begin{matrix} B L U E (β) \\ B L U P (u) \end{matrix}] = C [\begin{matrix} X^{'} R^{- 1} y \\ Z^{'} R^{- 1} y \end{matrix}],

(14)

where

C = [\begin{matrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{matrix}] = {[\begin{matrix} X^{'} R^{- 1} X & X^{'} R^{- 1} Z \\ Z^{'} R^{- 1} X & Z^{'} R^{- 1} Z + G^{- 1} \end{matrix}]}^{-} .

(15)

Note that we will explicitly use the words “estimate/estimator” and “predict/predictor” for BLUEs and BLUPs, respectively. In the context of variety trials, genotype effects $g$ may be considered to be fixed or random. The choice between these two options depends on the goal of the analysis and both are commonly used in practice [see Piepho et al. (2008) for a review on decision-making as well as Appendix A]. Accordingly, we consider both as complementing cases in this article.

If $g$ is assumed to be a fixed effect, we obtain genotypic BLUEs $({\hat{g}}^{B L U E})$ as a subset of $B L U E (β)$ with $v a r ({\hat{g}}^{B L U E} | g) = C_{11 (g)}$ as the submatrix of $C_{11}$ associated with $g$ . If $g$ is assumed to be a random effect, we obtain genotypic BLUPs $({\hat{g}}^{B L U P})$ as a subset of $B L U P (u)$ with $v a r ({\hat{g}}^{B L U P} | g) = C_{22 (g)}$ as the submatrix of $C_{22}$ associated with $g$ . Note that in the case where $g$ is the only fixed/random effect in the model, its estimators/predictors and their variance matrices are not subsets of, but equal to, $B L U E (β) / B L U P (u)$ and $C_{11} / C_{22}$ , respectively. Notice further that BLUPs can always be computed from entry means (BLUEs) using a stage-wise approach (see Appendix A). Only in the simple, balanced setting is the correlation of genotypic BLUEs and BLUPs equal to unity, and thus the ranking of genotypes does not change. On a side note, it should be mentioned that one may want to use linear combinations of $B L U E (β)$ , such as, e.g., least squares means (SAS Institute Inc. 2013) or estimated marginal means (Searle et al. 2012) as phenotypic mean values instead. While this certainly results in different estimated values and (co)variances, it does not affect the heritability measures proposed in this article. This is because for all methods that rely on ${\hat{g}}^{B L U E}$ , the focus lies on differences, and any additional variance brought in via the linear combination with nongenetic effects cancels out when computing differences.

Example data sets

To exemplify the approach proposed in this article, we use four examples that show different levels of complexity. In examples 1 and 2, a single-environment data set (Wright 2017) is analyzed, while in example 3 a subset (i.e., a single environment) of the MET data set in example 4 (Hadasch et al. 2016) is analyzed.

Example 1:

The R-package agridat (Wright 2017) provides yield data from an oat trial at a single environment. The trial had 24 genotypes and was laid out as an $α$ -design, with three complete replicates and six incomplete blocks of size four within each replicate. Yet, in this first example, we will analyze the data as if the trial were laid out as CRD, and thus ignore the information about the complete replicates and incomplete blocks. Therefore, the model is simply

y_{i k} = μ + g_{i} + ε_{i k},

(16)

Example 2:

Here, we analyzed the same data set as in example 1, but included effects accounting for the trial’s design in the model

y_{i k o} = μ + g_{i} + r_{k} + b_{k o} + ε_{i k o},

(17)

where $y_{i k o}$ is the observation of the $i$ th genotype in the $o$ th block of the $k$ th replicate, $μ$ is the intercept, $g_{i}$ is the effect for the $i$ th genotype, $r_{k}$ is the effect for the $k$ th replicate, $b_{k o}$ is the effect for the $o$ th block in the $k$ th replicate, and $e_{i k o}$ is the plot error effect corresponding to $y_{i k o}$ with $v a r (ε) = I_{72} σ_{ε}^{2}$ . To recover interblock information, $b_{k o}$ was modeled as random with $v a r (b) = I_{18} σ_{b}^{2}$ , whereas $r_{k}$ was taken as fixed.

Example 3:

This data set was taken from Hadasch et al. (2016). It comprises plot data of 89 lettuce varieties (i.e., “geno1”–“geno89”) tested at three environments (i.e., “env1”–“env3”), each laid out as a randomized complete block design (RCBD). The measured trait was resistance to downy mildew scored on a scale ranging from 0 to 5. We note that despite the acknowledged discreteness of this response variable, residual plots did not indicate an appreciable deviation of residuals from the normal distribution. All 89 varieties were genotyped with a total of 300 markers [i.e., 95 single nucleotide polymorphisms and 205 amplified fragment length polymorphism markers, see Hayes et al. (2014) for details] so that a marker matrix $M_{89 \times 300}$ was provided. The biallelic marker $M_{i w}$ for the $i$ th genotype, and the $w$ th marker with alleles $A_{1}$ (i.e., the reference allele) and $A_{2}$ , was coded as 1 for $A_{1} A_{1}$ , −1 for $A_{2} A_{2}$ , and 0 for $A_{1} A_{2}$ and $A_{2} A_{1}$ . The kinship matrix $K$ was obtained as

K = M M^{'} .

(18)

We note that for our purposes, the scaling of M is largely immaterial, but see Appendix D for different scaling options of M and the associated interpretations of genetic variance. In this example, we analyzed only the second environment (“env2”) and thus have 89 genotypes at a single environment laid out as RCBD with three replicates, which we modeled as

y_{i k} = μ + a_{i} + r_{k} + ε_{i k},

(19)

where $y_{i k}$ is the observation of the $i$ th genotype in the $k$ th replicate, $μ$ is the intercept, $a_{i}$ is the additive effect for the $i$ th genotype, $r_{k}$ is the effect for the $k$ th replicate, and $ε_{i k}$ is the plot error effect corresponding to $y_{i k}$ with $v a r (ε) = I_{267} σ_{ε}^{2}$ . For simplicity, we assume here that there are no genetic effects for dominance or epistasis, but an extension of the model to cover this case is straightforward (Wellmann and Bennewitz 2011; Dias et al. 2018; Viana et al. 2018).

Example 4:

In contrast to example 3, we here analyzed all three environments jointly. The data set is not completely balanced, as (i) environment env1 has two complete replicates, while environments env2 and env3 have three, (ii) there are no observations for genotypes geno38 and geno49 at environment env1, and (iii) there are no observations for genotype geno81 at either env1 or env3. The linear mixed model used for this analysis was

y_{i k t} = μ + a_{i} + e_{t} + {(g e)}_{i t} + r_{k t} + ε_{i k t},

(20)

where $y_{i k t}$ is the observation of the $i$ th genotype in the $k$ th replicate at the $t$ th environment, $μ$ is the intercept, $a_{i}$ is the additive effect for the $i$ th genotype, $e_{t}$ is the effect for the $t$ th environment, ${(g e)}_{i t}$ is the $i t$ th genotype-by-environment interaction, $r_{k t}$ is the effect for the $k$ th replicate at the $t$ th environment, and $ε_{i k t}$ is the plot error effect corresponding to $y_{i k t}$ . We allowed for heterogeneous error variances between environments, assuming $v a r (ε) = \oplus_{t = 1}^{n_{e}} I_{n_{obs (t)}} σ_{ε_{t}}^{2}$ , where $\oplus$ is the direct sum operator, and $n_{o b s (t)}$ and $σ_{ε_{t}}^{2}$ are the number of observations and plot error variance at the $t$ th environment, respectively. Finally, $e_{t}$ and $r_{k}$ were taken as fixed effects, while ${(g e)}_{i t}$ was taken as random with $v a r (ge) = I_{263} σ_{g e}^{2}$ .

Modeling of genotypic main effect in all examples:

Note that some of the heritability measures used in this article require estimates from both the model with a fixed genotype main effect as well as the model with a random genotype main effect. For example, $H_{P i e p h o}^{2}$ in (12) needs an estimate for ${\bar{v}}_{Δ \cdot \cdot}^{B L U E}$ as well as for $σ_{g}^{2}$ . Thus, in this article we always fit both potential models in parallel, and accordingly obtain both ${\hat{g}}^{B L U E}$ and ${\hat{g}}^{B L U P}$ for the same example. For models where $g$ is assumed to be a random effect, we assume $v a r (g) = I_{n_{g}} σ_{g}^{2}$ . Additionally, we make use of the marker information in examples 3 and 4, and thus estimate additive genotypic effects $a$ by assuming $v a r (a) = K σ_{a}^{2}$ .

Further, notice that for each example the variance components in the model with the fixed genotype main effect are fixed to the corresponding estimates in the model where $g$ is assumed to be a random effect. This is done to rule out the potential influence of an estimation error on variance component estimation between the two models (Schmidt et al. 2019).

Pairwise heritability

As shown in (1), heritability can be interpreted and thus estimated in multiple ways. To derive a pairwise heritability or heritability on an entry-difference basis $(H_{Δ}^{2} / h_{Δ}^{2})$ , we can exploit and slightly alter (1)-III so that it applies to phenotypic and genotypic differences instead of their respective mean values. We here first derive $H_{Δ}^{2}$ under the assumption of independent genotypic effects with constant variance and afterward generalize $H_{Δ}^{2}$ to correlated genotypic effects with nonconstant (co)variances. For both scenarios, we differentiate between ${\hat{g}}^{B L U E}$ and ${\hat{g}}^{B L U P}$ .

Independent genotypic effects with constant variance:

Given independent genotypic effects with constant variance $σ_{g}^{2}$ (as in the simple, balanced setting), and irrespective of whether ${\hat{g}}^{B L U E}$ or ${\hat{g}}^{B L U P}$ is used, the variance of the true difference between genotypes $i$ and $j$ is

v a r (g_{i} - g_{j}) = 2 σ_{g}^{2} .

(21)

As stated before, the conditional variance, or prediction error variance, of the predicted BLUPs given the observed data is $v a r ({\hat{g}}^{B L U P} | g) = C_{22 (g)}$ . However, notice that the marginal variance of ${\hat{g}}^{B L U P}$ is $v a r ({\hat{g}}^{B L U P}) = G_{(g)} - C_{22 (g)}$ , where $G_{(g)}$ is the submatrix of $G$ associated with $g$ (see Searle et al. 1992, chapter 7.4.d.). As a result, we find that the variance of a difference between two BLUPs of genotypes $i$ and $j$ is

v a r ({\hat{g}}_{i}^{B L U P} - {\hat{g}}_{j}^{B L U P}) = 2 σ_{g}^{2} - v_{Δ i j}^{B L U P},

(22)

where $v_{Δ i j}^{B L U P}$ is the prediction error variance of a difference between BLUPs of genotypes $i$ and $j$ , and can be obtained via $C_{22 (g)}$ . Furthermore, we find the covariance to be

c o v (g_{i} - g_{j}, {\hat{g}}_{i}^{B L U P} - {\hat{g}}_{j}^{B L U P}) = 2 σ_{g}^{2} - v_{Δ i j}^{B L U P} .

(23)

Hence, making use of (21), (22), and (23), the squared correlation between the true difference and its predictor, which is the heritability of the predictor of the difference $(g_{i} - g_{j})$ , is found to be

H_{Δ i j_{B L U P}}^{2} = {(\frac{2 σ_{g}^{2} - v_{Δ i j}^{B L U P}}{\sqrt{2 σ_{g}^{2} (2 σ_{g}^{2} - v_{Δ i j}^{B L U P})}})}^{2} = \frac{2 σ_{g}^{2} - v_{Δ i j}^{B L U P}}{2 σ_{g}^{2}} .

(24)

This pairwise heritability is a quantity that explicitly accounts for unbalanced data, is analogous to the coefficient of determination (CD) (Piepho 2019), and can be reported in its own right. Yet, as the number of genotypes increases, the number of genotype differences $n_{Δ}$ can quickly get very large $(n_{Δ} = n_{g} [n_{g} - 1] / 2)$ . Thus, it is not a convenient statistic to report. However, the statistic can be averaged per genotype, so that a genotype-specific average heritability is obtained, i.e.

{\bar{H}}_{Δ i \cdot_{B L U P}}^{2} = \frac{1}{n_{g} - 1} \sum_{j \neq i} H_{Δ i j_{B L U P}}^{2} .

(25)

Note that (25) is similar in spirit to the repeatability for an individual genotype’s BLUP as is commonly reported in animal breeding (Laloë 1993; Mrode and Thompson 2014). Finally, if we go further and instead average (24) across all pairs of genotypes, we obtain a single average pairwise heritability as

{\bar{H}}_{Δ \cdot \cdot_{B L U P}}^{2} = \frac{2}{n_{g} (n_{g} - 1)} \sum_{i} \sum_{j < i} H_{Δ i j_{B L U P}}^{2} .

(26)

Hence, ${\bar{H}}_{Δ \cdot \cdot_{B L U P}}^{2}$ has the interpretation of an arithmetic mean of all pairwise heritabilities $H_{Δ i j_{B L U P}}^{2}$ and is (given independent genotypic effects with constant variance) the algebraically equivalent simplification of $H_{C u l l i s}^{2}$ , since the pairwise heritability in (24) has a constant denominator.

If ${\hat{g}}^{B L U E}$ is used, we take the marginal variance of a difference between two BLUEs of genotypes $i$ and $j$ as

v a r ({\hat{g}}_{i}^{B L U E} - {\hat{g}}_{j}^{B L U E}) = 2 σ_{g}^{2} + v_{Δ i j}^{B L U E},

(27)

where $v_{Δ i j}^{B L U E}$ is the variance of a difference between BLUEs of genotypes $i$ and $j$ , and can be obtained via $C_{11 (g)}$ . Moreover,

c o v (g_{i} - g_{j}, {\hat{g}}_{i}^{B L U E} - {\hat{g}}_{j}^{B L U E}) = 2 σ_{g}^{2} .

(28)

Analogous to (24), only this time applying (21), (27), and (28), the squared correlation between the true difference and its estimator, which is the heritability of the estimator of the difference $(g_{i} - g_{j})$ , is found to be

H_{Δ i j_{B L U E}}^{2} = {(\frac{2 σ_{g}^{2}}{\sqrt{2 σ_{g}^{2} (2 σ_{g}^{2} + v_{Δ i j}^{B L U E})}})}^{2} = \frac{2 σ_{g}^{2}}{2 σ_{g}^{2} + v_{Δ i j}^{B L U E}} .

(29)

In accordance with (25) and (26), we could now take the arithmetic mean of $H_{Δ i j_{B L U E}}^{2}$ per genotype and across all pairs. However, the inconvenient difference compared to ${\bar{H}}_{Δ \cdot \cdot_{B L U P}}^{2}$ in (24) is that the pairwise heritabilities in (29) do not have a constant denominator. As a result, taking the arithmetic mean across all $H_{Δ i j_{B L U E}}^{2}$ does not lead to the algebraically equivalent simplification of $H_{P i e p h o}^{2}$ . However, if we take the harmonic mean instead of the arithmetic mean for all pairwise heritabilities as

{\bar{H}}_{Δ \cdot \cdot_{B L U E}}^{2} = \frac{n_{g} (n_{g} - 1)}{2 \sum_{i} \sum_{i < j} \frac{1}{H_{Δ i j_{B L U E}}^{2}}},

(30)

the resulting average does indeed coincide with $H_{P i e p h o}^{2}$ . Correspondingly, we take the harmonic mean per genotype as:

{\bar{H}}_{Δ i \cdot_{B L U E}}^{2} = \frac{n_{g} - 1}{\sum_{i < j} \frac{1}{H_{Δ i j_{B L U E}}^{2}}} .

(31)

Notice that since we here assume independent genotypic effects with constant variance, both ${\bar{H}}_{Δ \cdot \cdot_{B L U P}}^{2}$ and ${\bar{H}}_{Δ \cdot \cdot_{B L U E}}^{2}$ can, in fact, also be obtained by averaging the numerator and denominator in (24) and (29) separately across pairs (see Appendix B).

Generalization to correlated genotypic effects with nonconstant variances:

So far, we have assumed independent genotypic effects with constant variance $σ_{g}^{2}$ . Yet, the approach outlined above naturally extends to the case where effects have nonconstant variance, (i.e., $v a r (g_{i}) = σ_{g (i, i)}^{2}$ ) and/or are correlated (i.e., $c o v (g_{i}, g_{j}) = σ_{g (i, j)}$ ), e.g., due to pedigree- or marker-based kinship. The variance of the true difference between genotypes $i$ and $j$ then becomes

v a r (g_{i} - g_{j}) = σ_{g (i, i)}^{2} + σ_{g (j, j)}^{2} - 2 σ_{g (i, j)} .

(32)

When BLUP is used, the pairwise heritability in (24) accordingly generalizes to

H_{Δ i j_{B L U P}}^{2} = \frac{v a r (g_{i} - g_{j}) - v_{Δ i j}^{B L U P}}{v a r (g_{i} - g_{j})} = \frac{σ_{g (i, i)}^{2} + σ_{g (j, j)}^{2} - 2 σ_{g (i, j)} - v_{Δ i j}^{B L U P}}{σ_{g (i, i)}^{2} + σ_{g (j, j)}^{2} - 2 σ_{g (i, j)}} .

(33)

When BLUE is used, the pairwise heritability in (29) generalizes to

H_{Δ i j_{B L U E}}^{2} = \frac{v a r (g_{i} - g_{j})}{v a r (g_{i} - g_{j}) + v_{Δ i j}^{B L U E}} = \frac{σ_{g (i, i)}^{2} + σ_{g (j, j)}^{2} - 2 σ_{g (i, j)}}{σ_{g (i, i)}^{2} + σ_{g (j, j)}^{2} - 2 σ_{g (i, j)} + v_{Δ i j}^{B L U E}} .

(34)

These pairwise heritabilities can be averaged as before, i.e., as per Equations (25) and (26) for BLUP, and Equations (31) and (30) for BLUE, to obtain $H_{Δ i \cdot}^{2}$ and $H_{Δ \cdot \cdot}^{2}$ . Unfortunately, the average over all pairwise heritabilities cannot generally be simplified algebraically for either BLUE or BLUP, since in both cases neither the denominator nor the numerator of the pairwise heritability is constant. But a simplification is forthcoming if, as an approximation, we average the numerator and denominator of the pairwise heritability separately (see Appendix C).

Other proposals to compute heritability

Besides $H_{S t d}^{2}$ , $H_{C u l l i s}^{2}$ (11), and $H_{P i e p h o}^{2}$ (12), we calculated three additional generalized heritability measures that are in common usage.

H² Oakey:

An alternative measure of heritability, which is sometimes referred to as the generalized heritability, can be traced back to Laloë (1993) and was proposed by Oakey et al. (2006) in the context of plant breeding. It has gained popularity due to its ability to account for heterozygosity/covariances in $G_{(g)}$ (e.g., Mathews et al. 2008; Rodríguez-Álvarez et al. 2018). In this approach, contrasts of the true and predicted genotypic effects are defined as $c^{'} g$ and $c^{'} {\hat{g}}^{B L U P}$ , respectively. Note that, unlike for $H_{Δ}^{2}$ where all differences/contrasts between genotype pairs are considered, the contrast vector $c$ can be any linear combination of genotypic effects where the elements of $c$ sum to 0. This allows different genotypic effects to have different heritabilities, while at the same time reducing to a single scalar quantity. Similar to our derivation, they make use of (1)-III, with the difference that it applies to $c^{'} g$ and $c^{'} {\hat{g}}^{B L U P}$ , and thus represents the squared correlation between the true and predicted genotypic contrasts defined by $c$ . The vector $c$ is then chosen so that this squared correlation (i.e., the heritability) is maximized. Components of the full heritability are defined as

λ = {(\frac{c o v (c^{'} g, c^{'} {\hat{g}}^{B L U P})}{\sqrt{v a r (c^{'} g) v a r (c^{'} {\hat{g}}^{B L U P})}})}^{2} = \frac{c^{'} G_{(g)} (I_{n_{g}} - G_{(g)}^{- 1} C_{22 (g)}) c}{c^{'} G_{(g)} c} = \frac{c^{'} G_{(g)} Dc}{c^{'} G_{(g)} c},

(35)

where $c$ is an eigenvector of the matrix $D = I_{n_{g}} - G_{(g)}^{- 1} C_{22 (g)}$ with constraint $c^{'} G_{(g)} c = 1$ and $λ$ is the associated eigenvalue. The largest among the eigenvalues, $λ_{ζ}$ $(ζ = {1, \dots, n_{λ}})$ , equals the maximized squared correlation of $c^{'} g$ and $c^{'} {\hat{g}}^{B L U P}$ . Note that $n_{λ} = n_{g}$ and due to constraints on ${\hat{g}}^{B L U P}$ , $n_{z} < n_{g}$ eigenvalues will be zero. The generalized heritability is then defined as the mean of all nonzero eigenvalues:

H_{O a k e y}^{2} = \frac{\sum_{ζ = 1}^{n_{g}} λ_{ζ}}{n_{g} - n_{z}} = \frac{\sum_{ζ = n_{z} + 1}^{n_{g}} λ_{ζ}}{n_{g} - n_{z}} .

(36)

In more recent work, Rodríguez-Álvarez et al. (2018) show how random spatial variation in plant breeding experiments can be analyzed using tensor product P-splines, which is a two-dimensional smoothing technique. In this context, they denote the trace of $D$ as the effective dimension of the genotypic effects and reexpress $H_{O a k e y}^{2}$ as:

H_{O a k e y}^{2} = \frac{t r a c e (D)}{n_{g} - n_{z}} .

(37)

They point out that the notion of effective dimension is well known in the smoothing context, where it can be interpreted as a complexity measure for a given model and its component effects.

Simulated:

This method was proposed by Piepho and Möhring (2007), and is also based on the squared correlation between $g$ and ${\hat{g}}^{B L U P}$ in (1)-III. It first defines the variance–covariance matrix of all random effects $u$ and the target genotypic effects $g$ as

v a r (\begin{matrix} g \\ u \end{matrix}) = (\begin{matrix} G_{(g)} & U \\ U^{'} & G \end{matrix}),

(38)

so that $U = c o v (g, u)$ . They then define $Ω$ as the variance–covariance matrix of the joint distribution of the true and predicted genotypic effects:

Ω = v a r (\begin{matrix} g \\ {\hat{g}}^{B L U P} \end{matrix}) = (\begin{matrix} G_{(g)} & F G^{- 1} (G - C_{22}) G^{- 1} F^{'} \\ F G^{- 1} (G - C_{22}) G^{- 1} F^{'} & F G^{- 1} (G - C_{22}) G^{- 1} F^{'} \end{matrix}),

(39)

where $F = (\begin{matrix} G_{(g)} & U \end{matrix})$ . Subsequently, $Ω$ can be decomposed as $Ω = Γ Γ^{'}$ by singular value decomposition or Cholesky decomposition. Finally, values for $g$ and ${\hat{g}}^{B L U P}$ can be simulated (i.e., $g_{sim}$ and ${\hat{g}}_{sim}^{BLUP}$ ) for an experiment, with the same design and genotypic relationships as those underlying the actual data $y$ , as

d_{sim} = (\begin{matrix} g_{sim} \\ {\hat{g}}_{sim}^{BLUP} \end{matrix}) = Γ z_{sim},

(40)

where $z_{sim}$ is a vector with $2 n_{g}$ simulated random independent standard normal deviates. Thus, for each of the $n_{s i m} = {1, \dots, s}$ simulation runs, a new vector $z_{sim}$ of independent standard normal deviates is randomly generated, so that we can obtain $d_{sim}$ , and compute the squared sample correlation $(r_{s}^{2})$ of $g_{sim}$ and ${\hat{g}}_{sim}^{BLUP}$ . By running a large number of simulations, we can obtain the simulated expected squared correlation of predicted and true genotypic effects as

H_{s i m}^{2} = \frac{1}{n_{s i m}} \sum_{s = 1}^{n_{s i m}} r_{s}^{2} .

(41)

Notice that even in the simple, balanced setting, this method is not exactly identical to $H_{S t d}^{2}$ , but it is asymptotically equivalent for an increasing number of genotypes:

H_{s i m}^{2} = E (r_{g, \hat{g}}^{2}) \approx {(\frac{c o v (g, {\hat{g}}^{B L U P})}{\sqrt{v a r (g) v a r ({\hat{g}}^{B L U P})}})}^{2} = H_{S t d}^{2} .

(42)

It can be argued that $H_{s i m}^{2}$ is preferable over, e.g., $H_{P i e p h o}^{2}$ or $H_{C u l l i s}^{2}$ , as it captures the entire variance–covariance structure in the data in a more comprehensive manner (Piepho and Möhring 2007; Schmidt et al. 2019). Further notice that this approach also allows direct simulation of the response to selection (see Appendix E).

Reliability:

In animal breeding and thus mostly in the context of a single observation per genotype, the reliability $(r^{2})$ , also known as the CD, is a popular statistic expressing the squared correlation between predicted and true breeding value, and is closely related to heritability (Laloë et al. 1996; Laloë and Phocas 2003; Kuehn et al. 2007; Piepho 2019). We can estimate the reliability for the (genotypic/breeding value of the) $i$ th entry as

r_{i}^{2} = 1 - \frac{v a r ({\hat{g}}_{i}^{B L U P})}{v a r (g_{i})},

(43)

where $v a r ({\hat{g}}_{i}^{B L U P})$ is the $i$ th diagonal element of the $C_{22 (g)}$ matrix and $v a r (g_{i})$ is $i$ th diagonal element of the $G_{(g)}$ matrix (Mrode and Thompson 2014, chapter 9.3.4.). Accordingly, $r_{i}^{2}$ does not account for the off-diagonal elements of either $C_{22 (g)}$ or $G_{(g)}$ [but see Mrode and Thompson (2014), Appendix D]. We can further obtain the mean reliability as

{\bar{r}}_{\cdot}^{2} = \frac{1}{n_{g}} \sum_{i = 1}^{n_{g}} r_{i}^{2} .

(44)

Computation of heritability measures:

As summarized in Table 1, we computed eight overall heritability measures ( $H_{S t d}^{2}$ , ${\bar{H}}_{Δ \cdot \cdot_{B L U E}}^{2}$ , $H_{P i e p h o}^{2}$ , ${\bar{H}}_{Δ \cdot \cdot_{B L U P}}^{2}$ , $H_{C u l l i s}^{2}$ , $H_{O a k e y}^{2}$ , ${\bar{r}}_{\cdot}^{2}$ , and $H_{S i m}^{2}$ ), three genotype-wise heritability measures ( ${\bar{H}}_{Δ i \cdot_{B L U E}}^{2}$ , ${\bar{H}}_{Δ i \cdot_{B L U P}}^{2}$ , and $r_{i}^{2}$ ), and two pair-wise heritability measures ( $H_{Δ i j_{B L U E}}^{2}$ and $H_{Δ i j_{B L U P}}^{2}$ ) for each of the four examples. When assuming $v a r (a) = K σ_{a}^{2}$ for examples 3 and 4, we are estimating narrow-sense heritabilities, otherwise broad-sense heritabilities were computed. The former is not possible for $H_{S t d}^{2}$ , $H_{C u l l i s}^{2}$ , and $H_{P i e p h o}^{2}$ , however, as these measures implicitly assume a constant genotypic variance and no covariances. We therefore based estimates of these heritability measures on the alternative model that assumes $v a r (g) = I_{n_{g}} σ_{g}^{2}$ and accordingly completely ignores marker information. Finally, we used an ad hoc solution to compute $H_{S t d}^{2}$ via (6) in example 4, where we had slightly unbalanced data and environment-specific error variance estimates. Although only 86 out of 89 genotypes were present at all three environments, and one environment only had two replicates instead of three, we considered $n_{e} = n_{r} = 3.$ Thus, we took the respective maximum value, which is often done in practice. Furthermore, we computed the mean error variance as $σ_{ε}^{2} = \sum σ_{ε_{t}}^{2} / n_{e}$ .

Table 1. Summary of heritability measures.

Heritability measure			Accounts for heteroscedasticity/covariances
Overall	Per genotype	Per pair	In $G_{(g)}$	In $C_{11 (g)}$	In $C_{22 (g)}$
$H_{S t d}^{2}$	—	—	No/no	No/no	—
${\bar{H}}_{Δ \cdot \cdot_{B L U E}}^{2}$	${\bar{H}}_{Δ i \cdot_{B L U E}}^{2}$	$H_{Δ i j_{B L U E}}^{2}$	Yes/yes	Yes/yes	—
$H_{P i e p h o}^{2}$	—	—	No/no	Yes/yes	—
${\bar{H}}_{Δ \cdot \cdot_{B L U P}}^{2}$	${\bar{H}}_{Δ i \cdot_{B L U P}}^{2}$	$H_{Δ i j_{B L U P}}^{2}$	Yes/yes	—	Yes/yes
$H_{C u l l i s}^{2}$	—	—	No/no	—	Yes/yes
$H_{O a k e y}^{2}$	—	—	Yes/yes	—	Yes/yes
${\bar{r}}_{\cdot}^{2}$	$r_{i}^{2}$	—	Yes/no	—	Yes/no
$H_{s i m}^{2}$	—	—	Yes/yes	—	Yes/yes

Open in a new tab

Software:

All statistical analyses were done by REML with the mixed model package ASReml-R version 3.0 (Gilmour et al. 2009) in the R Statistical Computing Environment (R Core Team 2015) and SAS 9.4 (SAS Institute Inc. 2013).

Data availability

The data used in examples 1 and 2 are the “john.alpha” data set provided in the R package agridat (Wright 2017), which is available at https://cran.r-project.org/web/packages/agridat/index.html. Supplemental material and the data used in examples 3 and 4 are available at https://figshare.com/articles/Lettuce_trial_phenotypic_and_marker_data_/8299493. R code is available at https://github.com/PaulSchmidtGit/Heritability.

Results

In this section, we present overall heritability estimates first, followed by genotype-wise and finally pairwise heritability estimates. The results are accompanied by minimal interpretations, whereas an elaboration on general reasons for differences between estimates of the different methods is provided in the Discussion.

Overall heritability

Figure 1 shows estimates for overall heritability obtained via all eight methods for all examples. As expected for the simple, balanced setting of example 1, all estimates are identical, except for the slightly smaller estimates obtained for ${\bar{r}}_{\cdot}^{2}$ and $H_{s i m}^{2}$ . A similar picture is found for example 2, with the exception of $H_{S t d}^{2}$ displaying a notably larger estimate than the rest, which is due to the method’s inability to account for the variance of the random incomplete block effect. Examples 3 and 4 show a more heterogeneous picture, with $h_{O a k e y}^{2}$ displaying estimates that are strikingly lower than the rest.

Genotype-wise heritability

In example 1, estimates for the pairwise heritability measures $H_{Δ i j_{B L U E}}^{2}$ and $H_{Δ i j_{B L U P}}^{2}$ were constant across all pairs (≈0.580), and hence equal to ${\bar{H}}_{Δ i \cdot_{B L U E}}^{2}$ , ${\bar{H}}_{Δ i \cdot_{B L U P}}^{2}$ , ${\bar{H}}_{Δ \cdot \cdot_{B L U E}}^{2}$ , and ${\bar{H}}_{Δ \cdot \cdot_{B L U P}}^{2}$ , which in turn were equal to $H_{S t d}^{2}$ , $H_{C u l l i s}^{2}$ , and $H_{O a k e y}^{2}$ . Analogously, the slightly lower $r_{i}^{2}$ estimates were also constant (≈0.556) across genotypes and, therefore, equal to ${\bar{r}}_{\cdot}^{2}$ .

In example 2, estimates for genotype-wise heritability measures were no longer constant, neither between nor within each method. Instead, each method found two unique, but relatively similar, estimates and each one for half of the genotypes, respectively, i.e., 0.80792 and 0.80801 for ${\bar{H}}_{Δ i \cdot_{B L U E}}^{2}$ , 0.80911 and 0.80916 for ${\bar{H}}_{Δ i \cdot_{B L U P}}^{2}$ , and 0.77537 and 0.77547 for $r_{i}^{2}$ .

Figure 2 details the heritability estimates per genotype for examples 3 and 4. It can be seen that the estimates are much more heterogeneous compared to examples 1 and 2. In fact, all of the 89 genotype-wise estimates per method were different from each other in both example 3 and example 4. In both of the latter, and irrespective of the heritability method, genotypes geno49, geno82, and geno88 showed noticeably lower estimates than the rest. This is related to the fact that the additive variances of these three genotypes stand out with a size of only ≈30% of the average (results not shown). We additionally found relatively low estimates for geno49 and geno81 (i.e., two of the genotypes with missing observations) in example 4 (Figure 2).

Summary plot for genotype-wise heritability estimates for example 3 (left) and example 4 (right). Plots on the bottom left show scatter plots with simple linear regression lines and genotype-labels for outlying estimates. The plots on the diagonal show histograms for the respective column. The upper right corner displays the Pearson correlation estimate. BLUE, best linear unbiased estimation; BLUP, best linear unbiased prediction; Corr, correlation.

In example 3, estimates between the three methods all showed correlations larger than 0.9, with ${\bar{h}}_{Δ i \cdot_{B L U P}}^{2}$ and $r_{i}^{2}$ displaying the highest correlation of ∼0.959. In example 4, the correlation between these two was still high (≈0.932), while the other two correlation estimates dropped below 0.76 (Figure 2).

Pairwise heritability

Figure 3 details the estimated pairwise heritability measures for all examples. As mentioned above, estimates for $H_{Δ i j_{B L U E}}^{2}$ and $H_{Δ i j_{B L U P}}^{2}$ in example 1 were constant across all $n_{Δ} = 276$ genotype pairs (≈0.580). In example 2, estimates for each method split into two clusters, respectively, yet even the overall estimate ranges were relatively small (0.802–0.817 for $H_{Δ i j_{B L U E}}^{2}$ and 0.803–0.818 for $H_{Δ i j_{B L U P}}^{2}$ ). The heterogeneous estimates in example 2 are due to the recovery of interblock information via the random incomplete block effect in (17). It results from the varying number of times two genotypes (or even their neighboring genotypes) appear together in the same incomplete block and thus can be compared directly (John and Williams 1995).

Estimates in example 3 mostly form a single cluster for both methods, respectively, ranging from ∼0.70–0.88 for $H_{Δ i j_{B L U E}}^{2}$ and 0.80–0.92 for $H_{Δ i j_{B L U P}}^{2}$ , so that the cluster is located left of/above the diagonal in Figure 3. There are three exceptionally low estimates for both methods, which are those of the differences geno49–geno82, geno49–geno88, and geno82–geno88, i.e., the three genotypes with relatively small additive variances as mentioned before. Generally speaking, genotype pairs with a lower additive genotypic covariance $c o v (g_{i}, g_{j})$ tend to show higher estimates for both $H_{Δ i j_{B L U E}}^{2}$ and $H_{Δ i j_{B L U P}}^{2}$ (Figure 3).

In most aspects, results from example 4 are similar to those in example 3. The most striking difference is the second cluster of lower pairwise heritability estimates. As highlighted in Figure 3, this cluster consists only of the 88 differences from/to genotype geno81, i.e., the genotype present only at a single of the three environments. Notice that while not highlighted, the differences from/to the two genotypes present at only two of the environments (i.e., geno38 and geno49) also form similar clusters, but they are merely located at the lower left border of the main cluster and thus do not stand out as drastically. Furthermore, the main cluster here lies more or less on the diagonal. Finally, the range of estimates for $c o v (g_{i}, g_{j})$ is smaller than in example 3. This is mostly due to the $g \times e$ variance that for example 4 in (20) could be estimated separately via the genotype-by-environment interaction effect ${(g e)}_{i t}$ as $v a r (ge) = I_{263} σ_{g e}^{2}$ , whereas fo example 3 the $g \times e$ variance is confounded within the variance of the genotype main effect, since there is no genotype-by-environment interaction effect in (19), as data from only a single environment are present.

Discussion

Variations of the mixed model for replicated plant data

A consequence of having multiple individuals represent the same genotype is that the definition of the genetic variance $σ_{g}^{2}$ requires careful consideration; rather unproblematic are genetically completely homogeneous cultivars such as inbred lines, doubled-haploid (DH) lines, clones, or hybrids. Yet, it should be noted that when it comes to genetically heterogeneous population cultivars in rye, for example, the matter is no longer straightforward (Bernal-Vasquez et al. 2017). Note that estimates for ${\bar{h}}_{Δ \cdot \cdot_{B L U E}}^{2}$ , ${\bar{h}}_{Δ \cdot \cdot_{B L U P}}^{2}$ increased with decreasing covariance $c o v (g_{i}, g_{j})$ as seen in the Results section for pairwise heritability. Note further that when we assumed $v a r (a) = K σ_{a}^{2}$ for the (additive) genotype main effect in the analysis of example 4, for simplicity we did not correspondingly make use of $K$ for modeling the genotype–environment interaction effect, but instead assumed $v a r (ge) = I_{263} σ_{g e}^{2}$ . Yet, to account for marker-by-environment interaction, one may instead assume e.g., $v a r (ge) = K \otimes I_{3} σ_{g e}^{2}$ , where $\otimes$ denotes the Kronecker product. Although Schulz-Streeck et al. (2012) found that the improvement of accounting for marker-by-environment interaction can be negligible, the reader may easily decide to include $K$ for random genotype-interaction effects and it may actually be advisable if the variances for those effects are expected to be relatively large (Bernal-Vasquez et al. 2017).

Relatedness of heritability methods

When both $G_{(g)}$ and $C_{22 (g)}$ are proportional to identity matrices, all heritability measures give identical estimates. However, this only holds for a linear random model, since even a single fixed effect [e.g., $μ$ in (16)] results in nonzero off-diagonal elements in $C_{22 (g)}$ . Yet, it should be noted that such a purely random model is not practically relevant in plant (and animal) breeding.

When $G_{(g)} = I_{n_{g}} σ_{g}^{2}$ and $C_{22 (g)}$ have compound symmetry structure (i.e., simple, balanced setting), all methods except $H_{S i m}^{2}$ and ${\bar{r}}_{\cdot}^{2}$ give identical estimates, which we confirmed in the simple, balanced setting of example 1. Furthermore, both $H_{Δ}^{2}$ methods display that same value across all genotypes and pairs. Since we always had to assume $G_{(g)} = I_{n_{g}} σ_{g}^{2}$ for $H_{S t d}^{2}$ , $H_{P i e p h o}^{2}$ , and $H_{C u l l i s}^{2}$ , example 3 technically reduces to a simple, balanced setting for these measures as well, since it involves a single environment laid out as RCBD with balanced data. Accordingly, these three methods yielded identical estimates in that scenario as well (Figure 1). The estimates for ${\bar{r}}_{\cdot}^{2}$ differ, because off-diagonal elements are ignored in its estimation. However, notice that with increasing $n_{g}$ , the off-diagonal elements of $C_{22 (g)}$ approach 0, and thus ${\bar{r}}_{\cdot}^{2}$ approaches the other measures and, as stated before, so does $H_{S i m}^{2}$ .

In the case where $G_{(g)} = I_{n_{g}} σ_{g}^{2}$ and $C_{22 (g)}$ is an unstructured matrix, ${\bar{H}}_{Δ \cdot \cdot_{B L U P}}^{2}$ and $H_{C u l l i s}^{2}$ give identical estimates. This is analogously true for ${\bar{H}}_{Δ \cdot \cdot_{B L U E}}^{2}$ and $H_{P i e p h o}^{2}$ , and for $C_{11}$ (rather than $C_{22 (g)}$ ) being an unstructured matrix. Rodríguez-Álvarez et al. (2018) show a connection between ${\bar{r}}_{\cdot}^{2}$ and $H_{O a k e y}^{2}$ . They point out that given $G_{(g)} = I_{n_{g}} σ_{g}^{2}$ , the mean reliability can be expressed as ${\bar{r}}_{\cdot}^{2} = \frac{1}{n_{g}} \sum_{i = 1}^{n_{g}} r_{i}^{2} = t r a c e (I_{n_{g}} - G_{(g)}^{- 1} C_{22 (g)}) / n_{g} = t r a c e (D) / n_{g}$ and thus as a special case of $H_{O a k e y}^{2}$ , ignoring the number of zero eigenvalues. This is in agreement with Laloë (1993), and our results confirm this relationship for examples 1 and 2 (results not shown). However, we did not find it to be true for examples 3 and 4, where $G_{(g)} \neq I_{n_{g}} σ_{g}^{2}$ . Note that in their article Rodríguez-Álvarez et al. (2018) refer to the mean reliability $({\bar{r}}_{\cdot}^{2})$ as $H_{C}^{2}$ in Cullis et al. (2006), even though the latter method is based on differences [see (11)], whereas the former is based on individual genotypes [see (43)].

Interestingly, though not practically very relevant, $r_{i}^{2}$ and ${\bar{H}}_{Δ i \cdot_{B L U P}}^{2}$ give identical estimates per genotype for diagonal $G_{(g)}$ and $C_{22 (g)}$ .

Finally, when $G_{(g)}$ displays covariances (e.g., as $G_{(g)} = K σ_{g}^{2}$ ) and $C_{22 (g)}$ is an unstructured matrix, none of the five measures that are appropriate for this case (i.e., ${\bar{h}}_{Δ \cdot \cdot_{B L U E}}^{2}$ , ${\bar{h}}_{Δ \cdot \cdot_{B L U P}}^{2}$ , $h_{O a k e y}^{2}$ , ${\bar{r}}_{\cdot}^{2}$ , and $h_{S i m}^{2}$ ) give identical results. Nevertheless, with the exception of $h_{O a k e y}^{2}$ , heritability estimates within examples 3 and 4 were relatively similar (Figure 1). This is true even for the methods that did not include kinship information (i.e., $H_{S t d}^{2}$ , $H_{P i e p h o}^{2}$ , and $H_{C u l l i s}^{2}$ ). Yet, when Ould Estaghvirou et al. (2013) simulated phenotypic and marker data for a single location, they found that these three methods gave lower estimates than the rest.

The most striking results are those for $h_{O a k e y}^{2}$ in examples 3 and 4. Notice that Lourenço et al. (2017) also found outstandingly low estimates for $h_{O a k e y}^{2}$ in models with $G_{(g)} = K σ_{g}^{2}$ for two real data sets and one simulated data set. This raises the question whether $H_{O a k e y}^{2}$ is suited for cases where $G_{(g)} = K σ_{g}^{2}$ . While it is true that $H_{O a k e y}^{2}$ estimates are equal to $H_{S t d}^{2}$ in the simple, balanced setting (of example 1), this is only a necessary, but not a sufficient, condition for $H_{O a k e y}^{2}$ to be generally applicable also with correlated genotypes. It can be argued that $H_{O a k e y}^{2}$ and ${\bar{H}}_{Δ \cdot \cdot}^{2}$ are similar in the sense that both are based on contrasts between genotypes. However, the important difference is that for ${\bar{H}}_{Δ \cdot \cdot}^{2}$ the contrasts of interest (i.e., all genotype pairs) are chosen according to the breeders’ goals, while the contrasts in $H_{O a k e y}^{2}$ are determined by the data set. Laloë (1993) shows that the smallest and largest nonzero eigenvalues, $λ_{ζ}$ , are the lower and upper limit for the heritability of all possible contrasts of genotypes. Accordingly, they are also the lower and upper limit for all $H_{Δ i j}^{2} / h_{Δ i j}^{2}$ , which is confirmed by our results (Figure 4). It may be noted that $H_{O a k e y}^{2}$ is a simple average of “canonical” heritabilities (eigenvalues) corresponding to canonical contrasts and derived from the canonical decomposition of covariance matrices. As such, low heritabilities indicate a low design efficiency, as the smallest eigenvalue appears to be a good indicator of the robustness of the design and a measure of the part of the genetic trend that can be predicted (Laloë and Phocas 2003). In contrast, a pairwise difference is a combination of low and high heritability contrasts that corresponds to the weighted mean of canonical heritabilities. This may go some way toward explaining the observed discrepancies in estimates of $h_{O a k e y}^{2}$ and the alternatives considered here.

Individual estimates and boxplots for $h_{Δ i j_{B L U P}}^{2}$ , *${\bar{h}}_{Δ i \cdot_{B L U P}}^{2}$* , and $λ_{ζ}$ (*i.e.*, eigenvalues needed for $h_{O a k e y}^{2}$ ) from example 4. The zero eigenvalue is denoted by the symbol ×. BLUP, best linear unbiased prediction.

Genotype-specific information gained from $H_{Δ}^{2}$

Figure 2 and Figure 3 are good examples of how $H_{Δ}^{2} / h_{Δ}^{2}$ can give a more in-depth insight into the outcome of a plant breeding trial. More precisely, the results show that investigating genotype-specific, or even genotype-contrast-specific, heritability measures can (i) summarize how variable/ambiguous an overall heritability estimate is and (ii) point out individual genotypes that stand out.

The first point may seem obvious by now, but it must be realized that whenever only a single (overall) heritability estimate is obtained, there is no information whatsoever about the dispersion of that estimate, only about its central tendency. We argue that investigating ${\bar{H}}_{Δ i \cdot}^{2} / {\bar{h}}_{Δ i \cdot}^{2}$ and/or $H_{Δ i j}^{2} / h_{Δ i j}^{2}$ gives more comprehensive insight, since they explicitly show the variability across genotypes and thus, bringing us to the second point, point out exceptional genotypes such as geno38, geno49, geno81, geno82, and geno88 in example 4. Notice that we identified two separate reasons for why these five genotypes were exceptional (low additive variance and missing observations; see Results section). Moreover, looking at the cluster of estimates for geno81 in example 4 in Figure 3, it becomes clear how a single genotype can display notably lower pairwise heritability estimates than the rest. These estimates are obviously decreasing the overall heritability estimate for this example, which would be especially unfavorable in a scenario where the breeder does not have great interest in this particular genotype. In the end, none of this would become apparent by only looking at overall heritability estimates (Figure 1).

What do we really want heritability for and how does $H_{Δ}^{2}$ help?

There are two main reasons why heritability on an entry-mean basis is of interest in plant breeding. On one hand, it is plugged into the breeder’s Equation (2) to predict the response to selection. On the other hand, it is a descriptive measure used to assess the usefulness and precision of results from cultivar evaluation trials. In the simple, balanced setting, heritability is suited for both purposes. However, whenever we depart from the simple, balanced setting, $H_{S t d}^{2}$ is no longer suited for either of the two purposes. Furthermore, any alternative measure can ultimately only aim to generalize heritability as a descriptive measure. Regarding the purpose of heritability as a mean to estimate the response to selection, we reiterate the view of Piepho and Möhring (2007): instead of trying to approximate heritability using some ad hoc measure, one should simulate the response to selection directly. As stated before, this can be done via the same simulation-based approach used to obtain $H_{S i m}^{2}$ and is exemplified in their work (also see Appendix E).

In terms of the descriptive function of heritability, we believe that $H_{Δ}^{2}$ is a valuable extension of heritability on an entry-mean basis, because its genotype-wise and pair-wise estimates give more detailed insight, and ultimately allow for better decision-making by the breeder. Furthermore, we would like to argue that its derivation, calculation, and interpretation is rather intuitive, which makes us optimistic about its acceptance in practice. Our view is further supported by Laloë (1993), who also suggested extending reliability to pairwise contrasts in the animal breeding framework.

Conclusions

Irrespective of whether a breeder ultimately decides to apply $H_{Δ}^{2}$ , we think that understanding the idea behind it alone raises awareness for the mentioned problems with heritability outside the scope of simple, balanced settings and thus can potentially improve the use of heritability in breeding programs. In the end, all heritability measures aim to inform about the same underlying subject matter, and, albeit via different methodologies, they all do. What should therefore be held above all, especially given the mentioned ambiguity problems in a nonsimple, unbalanced setting, is to report how heritability was estimated. Hanson and Robinson (1963, page 612) put it in this crucial context of reproducible science by pointing out that “One must extrapolate in the real world, and one must use estimates of heritability derived by someone else, especially with new crops in new environments. Therefore, it is important to specify exactly how published estimates were obtained in order that others may extrapolate.”

Acknowledgments

We thank Ivan Simko and Ryan J. Hayes (US Department of Agriculture-Agricultural Research Service, Crop Improvement and Protection Research Unit, Salinas, CA) for providing the lettuce data. We also wish to thank two anonymous reviewers for their very helpful comments. This research was funded by the German Research Foundation (Deutsche Forschungsgemeinschaft grant PI 377/18-1).

Appendix

Appendix A: Computing ${\hat{g}}^{BLUP}$ From ${\hat{g}}^{BLUE}$ in a Stage-Wise Approach

Consider the analysis of a series of trials where the aim is to obtain ${\hat{g}}^{B L U P}$ . One may use a single-stage analysis with a random genotype main effect. Alternatively, it is also possible to use a stage-wise analysis, where the first stage obtains ${\hat{g}}^{B L U E}$ from which ${\hat{g}}^{B L U P}$ are predicted in the second stage. Both approaches are equivalent, if variance components are known. In practice, small differences are encountered, as variances need to be estimated (Piepho et al. 2012a).

Single-stage analysis

Assume that we can write the general single-stage model in (13) as

y = X_{A} β_{A} + Z_{A} u_{A} + Z_{B} u_{B} + Z_{W} u_{W} + ε,

(45)

where $β_{A}$ is a vector of fixed across-environment effects (subscript $A$ ), $u_{A}$ is a vector of random across-environment effects with $u_{A} \sim (0, G_{A})$ , $u_{B}$ is a vector of random between-environment effects (subscript $B$ ) with $u_{B} \sim (0, G_{B})$ , $u_{W}$ is a vector of random within-environment effects (subscript $W$ ) with $u_{W} \sim (0, G_{W})$ , $ε$ is a vector of plot errors with $ε \sim (0, R)$ , and $y$ is the observed data vector with $y \sim (0, V)$ , where $V = Z_{A} G_{A} Z_{A}^{T} + Z_{B} G_{B} Z_{B}^{T} + Z_{W} G_{W} Z_{W}^{T} + R$ [see Piepho et al. (2012a)]. Thus, $G$ here is a block diagonal matrix with blocks $G_{A}$ , $G_{B}$ , and $G_{W}$ on its diagonal. Given a MET similar to example 4, $X_{A} β_{A}$ would be a general intercept, $u_{A}$ the random genotype main effect (i.e. $g$ ), $u_{B}$ the random environment main effect and random genotype-by-environment interaction effect, $u_{W}$ the random block effect at each environment, and $ε$ the plot error. Accordingly, $G_{A}$ would correspond to the kinship matrix $K$ , whereas $G_{B}$ and $G_{W}$ would be proportional to identity matrices.

Two-stage analysis

Model (45) has a two-stage representation, with the first stage model given by

y = X_{A 1} β_{A 1} + Z_{B} u_{B} + Z_{W} u_{W} + ε_{1},

(46)

where effects are defined as in (45), while the subscript $1$ denotes that the corresponding parameters are estimated in this first stage and thus $V_{1} = Z_{B} G_{B} Z_{B}^{T} + Z_{W} G_{W} Z_{W}^{T} + R_{1}$ . Notice that in contrast to (45), $Z_{A} u_{A}$ is missing here, since the genotype main effect needs to be taken as fixed at this stage (Piepho et al. 2012a) and is therefore comprised in $X_{A 1} β_{A 1}$ . For simplicity, we can assume that here the genotypic main effects are the only fixed effects so that $β_{A 1}$ are the genotype means. Accordingly, we can obtain genotypic BLUEs as

B L U E (β_{A 1}) = {\hat{g}}^{B L U E} = {(X_{A 1}^{'} V_{1}^{- 1} X_{A 1})}^{- 1} X_{A 1}^{'} V_{1}^{- 1} y,

(47)

with variance

v a r ({\hat{g}}^{B L U E}) = {(X_{A 1}^{'} V_{1}^{- 1} X_{A 1})}^{- 1} = R_{2} .

(48)

In the second stage we then have

{\hat{g}}^{B L U E} = X_{A 2} β_{A 2} + Z_{A 2} u_{A} + ε_{2},

(49)

where $ε_{2} \sim (0, R_{2})$ and just like in (45), the random genotype main effect $g = u_{A} \sim (0, G_{A})$ where $G_{A}$ corresponds to the kinship matrix $K$ . Notice that $X_{A 2} β_{A 2} = 1_{n_{g}} μ$ and $Z_{A 2} = I_{n_{g}}$ . Accordingly, we have $V_{2} = Z_{A 2} G_{A} Z_{A 2}^{T} + R_{2}$ and for the mixed model equations we find

{[\begin{matrix} X_{A 2}^{'} R_{2}^{- 1} X_{A 2}^{'} & X_{A 2}^{'} R_{2}^{- 1} Z_{A 2} \\ Z_{A 2}^{'} R_{2}^{- 1} X_{A 2} & Z_{A 2}^{'} R_{2}^{- 1} Z_{A 2} + G_{A}^{- 1} \end{matrix}]}^{-} [\begin{matrix} {\hat{β}}_{A 2} \\ {\hat{g}}^{B L U P} \end{matrix}] = [\begin{matrix} X_{A 2}^{'} R_{2}^{- 1} {\hat{g}}^{B L U E} \\ Z_{A 2}^{'} R_{2}^{- 1} {\hat{g}}^{B L U E} \end{matrix}] .

(50)

It is important to note that we can write

X_{A} = X_{A 1} X_{A 2}

(51)

and

Z_{A} = X_{A 1} Z_{A 2} .

(52)

Finally, plugging in (48) and (49) into (50), while making use of (51) and (52), it can be seen that the mixed model equation solutions for the second stage are equivalent to those for the single-stage analysis.

Appendix B: An Alternative Estimation for ${\bar{H}}_{Δ \cdot \cdot}^{2}$ in the Case of Independent Genotypic Effects with Constant Variance by Averaging Separately Across Numerator and Denominator

Given independent genotypic effects with constant variance, averaging numerator and denominator of $H_{Δ i j_{B L U P}}^{2}$ in (24) separately first leads to the algebraically equivalent simplification

{\bar{H}}_{Δ \cdot \cdot_{B L U P}}^{2} = \frac{\sum_{i} \sum_{j < i} var (g_{i} - g_{j}) - v_{Δ i j}^{B L U P}}{\sum_{i} \sum_{j < i} var (g_{i} - g_{j})} = \frac{2 σ_{g}^{2} - \frac{2}{n_{g} (n_{g} - 1)} \sum_{i} \sum_{j < i} v_{Δ i j}^{B L U P}}{2 σ_{g}^{2}} = \frac{2 σ_{g}^{2} - {\bar{v}}_{Δ \cdot \cdot}^{B L U P}}{2 σ_{g}^{2}} = 1 - \frac{{\bar{v}}_{Δ \cdot \cdot}^{B L U P}}{2 σ_{g}^{2}} = H_{C u l l i s}^{2} .

(53)

Analogously, we find for $H_{Δ i j_{B L U E}}^{2}$ in (29) that

{\bar{H}}_{Δ \cdot \cdot_{B L U E}}^{2} = \frac{\sum_{i} \sum_{j < i} var (g_{i} - g_{j})}{\sum_{i} \sum_{j < i} var (g_{i} - g_{j}) + v_{Δ i j}^{B L U E}} = \frac{2 σ_{g}^{2}}{2 σ_{g}^{2} + \frac{2}{n_{g} (n_{g} - 1)} \sum_{i} \sum_{j < i} v_{Δ i j}^{B L U E}} = \frac{2 σ_{g}^{2}}{2 σ_{g}^{2} + {\bar{v}}_{Δ \cdot \cdot}^{B L U E}} = \frac{σ_{g}^{2}}{σ_{g}^{2} + {\bar{v}}_{Δ \cdot \cdot}^{B L U E} / 2} = H_{P i e p h o}^{2} .

(54)

Appendix C: An Approximation for ${\bar{H}}_{Δ \cdot \cdot}^{2}$ in the General Case

It can be shown (Ould Estaghvirou et al. 2013, Additional Files 1) that

\sum_{i} \sum_{j < i} v_{Δ i j}^{B L U E} = t r a c e (n_{g} P_{μ} C_{11 (g)}),

(55)

where $P_{μ} = I_{n}_{g} - n_{g}^{- 1} 1_{n_{g}} 1_{n_{g}}^{T}$ with $1_{n_{g}}$ being a vector of ones. Thus, $P_{μ}$ is a matrix that centers for the overall mean, such that ${\bar{v}}_{Δ \cdot \cdot}^{B L U E} = \frac{2}{n_{g} (n_{g} - 1)} t r a c e (n_{g} P_{μ} C_{11 (g)})$ . Analogously, we have

\sum_{i} \sum_{j < i} v_{Δ i j}^{B L U P} = t r a c e (n_{g} P_{μ} C_{22 (g)}),

(56)

such that ${\bar{v}}_{Δ \cdot \cdot}^{B L U P} = \frac{2}{n_{g} (n_{g} - 1)} t r a c e (n_{g} P_{μ} C_{22 (g)})$ and finally

\sum_{i} \sum_{j < i} v a r (g_{i} - g_{j}) = t r a c e (n_{g} P_{μ} G_{(g)}),

(57)

such that ${\bar{v}}_{Δ \cdot \cdot}^{g} = \frac{2}{n_{g} (n_{g} - 1)} t r a c e (n_{g} P_{μ} G_{(g)})$ . By inserting (55), (56), and (57) into (34), we get

{\bar{\bar{H}}}_{Δ \cdot \cdot_{B L U E}}^{2} = \frac{t r a c e (P_{μ} G_{(g)})}{t r a c e (P_{μ} G_{(g)}) + t r a c e (P_{μ} C_{11 (g)})},

(58)

where, loosely speaking, $t r a c e (P_{μ} G_{(g)})$ captures the genotypic variance, $t r a c e (P_{μ} C_{11 (g)})$ captures the environmental variance, and hence $t r a c e (P_{μ} G_{(g)}) + t r a c e (P_{μ} C_{11 (g)})$ captures the phenotypic variance. Notice that (58) coincides with Method 4 of Ould Estaghvirou et al. (2013). Based on (33), we find correspondingly

{\bar{\bar{H}}}_{Δ \cdot \cdot_{B L U P}}^{2} = \frac{t r a c e (P_{μ} G_{(g)}) - t r a c e (P_{μ} C_{22 (g)})}{t r a c e (P_{μ} G_{(g)})} .

(59)

Appendix D: A Standardization of $M$ To Directly Capture the Total Genotypic Variance

As shown in Appendix C, ${\bar{v}}_{Δ \cdot \cdot}^{g}$ captures the average genotypic variance of a difference between two genotypes. In accordance with the notion of this article, we could obtain half the average variance of a difference as an estimate for the total genotypic variance. Notice that this approach is in line with the average semi variance $(σ_{g (a s v)}^{2})$ used by Piepho (2019) and can be defined as

σ_{g (a s v)}^{2} = 0.5 {\bar{v}}_{Δ \cdot \cdot}^{g} = \frac{t r a c e (P_{μ} G_{(g)})}{n_{g} - 1} .

(60)

In the case of $G_{(g)} = M M^{'} σ_{a}^{2} = K σ_{a}^{2}$ , we can do some useful rearrangements:

\begin{array}{l} σ_{g (a s v)}^{2} = \frac{t r a c e (P_{μ} M M^{'}) σ_{g}^{2}}{n_{g} - 1} = \frac{t r a c e (P_{μ} P_{μ} M M^{'}) σ_{g}^{2}}{n_{g} - 1} \\ = \frac{t r a c e (P_{μ} M M^{'} P_{μ}) σ_{g}^{2}}{n_{g} - 1} = \frac{t r a c e (\tilde{M} {\tilde{M}}^{'}) σ_{g}^{2}}{n_{g} - 1}, \end{array}

(61)

where $\tilde{M} = P_{μ} M$ is column mean-centered. Notice that $\tilde{M}$ is equal to what is referred to as $Z$ in VanRaden (2008) (page 4416). Notice further that the standardization presented here works for an arbitrary coding of markers and arbitrary scale. As a result, the trace of ${\tilde{G}}_{(g)} = \tilde{M} {\tilde{M}}^{'} \tilde{σ_{a}^{2}} = \tilde{K} \tilde{σ_{a}^{2}}$ directly returns ${\hat{σ}}_{g (a s v)}^{2}$ . We can standardize further:

\bar{\bar{M}} = \frac{\tilde{M}}{\sqrt{\frac{t r a c e (\tilde{M} {\tilde{M}}^{'})}{n_{g} - 1}}}

(62)

and define ${\bar{\bar{G}}}_{(g)} = \bar{\bar{M}} {\bar{\bar{M}}}^{'} \bar{\bar{σ_{a}^{2}}} = \bar{\bar{K}} \bar{\bar{σ_{a}^{2}}}$ . Now the total genotypic variance simplifies to

σ_{g (a s v)}^{2} = \frac{t r a c e (P_{μ} \bar{\bar{M}} {\bar{\bar{M}}}^{'} \bar{\bar{σ_{a}^{2}}})}{n_{g} - 1} = \bar{\bar{σ_{a}^{2}}},

(63)

which simplifies (58) and (59) to

{\bar{h}}_{Δ B L U E}^{2} = \frac{\bar{\bar{σ_{a}^{2}}}}{\bar{\bar{σ_{a}^{2}}} + t r a c e (n_{g} P_{μ} C_{11 (g)})}

(64)

and

{\bar{h}}_{Δ B L U P}^{2} = \frac{\bar{\bar{σ_{a}^{2}}} - t r a c e (n_{g} P_{μ} C_{22 (g)})}{\bar{\bar{σ_{a}^{2}}}} .

(65)

Thus, if $\bar{\bar{M}}$ is standardized as shown above, the genotypic variance we are trying to capture is conveniently estimated directly as the variance for $a$ . However, it should be noted that both $\tilde{K}$ and $\bar{\bar{K}}$ are singular, and thus may lead to problems depending on the statistical software being used: ASReml-R 3.0 (Gilmour et al. 2009) will show an error message, while PROC MIXED in SAS 9.4 (SAS Institute Inc. 2013) will effectively compute a modified version of the mixed model equations (Piepho et al. 2012b; SAS Institute Inc. 2017).

We would like to point out that neither $\tilde{K}$ nor $\bar{\bar{K}}$ are equal to the genomic relationship matrices proposed in VanRaden (2008). Their first method obtains the general relationship matrix as:

K_{VanRaden 1} = \frac{\tilde{M} {\tilde{M}}^{'}}{\sum_{w = 1}^{n_{m}} 2 p_{w} (1 - p_{w})},

(66)

where $p_{w}$ are the mean frequencies of the second allele across all genotypes at marker $w = {1, \dots, n_{m}}$ and can also be expressed as $p_{w} = \frac{1_{n_{g}}^{T} M}{2 n_{g}}$ . Their second method obtains the matrix as $K_{VanRaden 2} = \tilde{M} L {\tilde{M}}^{'}$ where $L$ is a diagonal with elements

L_{w, w} = \frac{1}{n_{m} [2 p_{w} (1 - p_{w})]} .

(67)

Since we have $\tilde{K} = \tilde{M} {\tilde{M}}^{'}$ , we find that opposed to $\tilde{K}$ and therefore $\bar{\bar{K}}$ , (66) and (67) additionally require a division involving allele frequencies. It may also be pointed out that in a population with Hardy–Weinberg equilibrium, $t r a c e (M M^{'})$ corresponds to the heterozygosity of the markers. However, it should be kept in mind that completely genetically homogeneous cultivars—such as inbred lines, DH-lines, clones, or hybrids—are definitely not in Hardy–Weinberg equilibrium. Finally, note that if we approximate ${\bar{h}}_{Δ \cdot \cdot_{B L U E}}^{2}$ via (59) and implement ${\hat{σ}}_{g (a s v)}^{2}$ into $h_{P i e p h o}^{2}$ , we find that ${\bar{\bar{h}}}_{Δ \cdot \cdot_{B L U E}}^{2} = h_{P i e p h o}^{2}$ even in the general case. Analogously, we have ${\bar{\bar{h}}}_{Δ \cdot \cdot_{B L U P}}^{2} = h_{C u l l i s}^{2}$ .

Appendix E: Directly Simulate Response to Selection

As proposed by Piepho and Möhring (2007), one may directly simulate the response to selection via the same approach used to obtain $H_{S i m}^{2}$ in this article. Since in (40) we simulate $n_{g}$ pairs of true $(g_{sim})$ and predicted $({\hat{g}}_{sim}^{BLUP})$ genotypic values, we can select a subset with the best values for ${\hat{g}}_{sim}^{BLUP}$ and compute the response to selection for the $s$ th simulation run as

R_{s} = \frac{\sum_{i \in ϕ_{s}} g_{sim}}{n_{s e l}},

(68)

where $ϕ_{s}$ is the selected subset of genotypes in the $s$ th simulation run and $n_{s e l}$ is the number of selected genotypes (i.e. $n_{s e l} \leq n_{g}$ ). Analogous to (41), we can then obtain the simulated expected response to selection as

R_{s i m} = n_{s i m}^{- 1} \sum_{s = 1}^{n_{s i m}} R_{s} .

(69)

Note that, in principle, the simulation-based approach could also be adapted to simulate the joint distribution of true $(g_{sim})$ and estimated $({\hat{g}}_{sim}^{BLUE})$ genotypic values. Furthermore, one could then compare the simulated expected response to selection values from both simulation approaches to determine whether to ultimately model the genotypic main effect as fixed or random. Yet, not least because of the ability to account for kinship information, we generally expect an analysis with random genotypic effects to be more useful than one with fixed genotypic effects [see Piepho et al. (2008)].

Footnotes

Supplemental material available at FigShare: https://doi.org/10.25386/genetics.7370168.

Communicating editor: A. Charcosset

Literature Cited

Becker H., 2011. Pflanzenzüchtung. Ed. 2 Ulmer, Stuttgart, Germany. [Google Scholar]
Bernal-Vasquez A.-M., Gordillo A., Schmidt M., Piepho H.-P., 2017. Genomic prediction in early selection stages using multi-year data in a hybrid rye breeding program. BMC Genet. 18: 51 10.1186/s12863-017-0512-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
Cullis B. R., Smith A. B., Coombes N. E., 2006. On the design of early generation variety trials with correlated data. J. Agric. Biol. Environ. Stat. 11: 381–393. 10.1198/108571106X154443 [DOI] [Google Scholar]
Dias K. O. D. G., Gezan S. A., Guimarães C. T., Nazarian A., da Costa L., et al. , 2018. Improving accuracies of genomic predictions for drought tolerance in maize by joint modeling of additive and dominance effects in multi-environment trials. Heredity 121: 24–37. 10.1038/s41437-018-0053-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
Falconer D. S., Mackay T. F. C., 2005. Introduction to Quantitative Genetics, Ed. 4 Pearson Prentice Hall, Harlow. [Google Scholar]
Gilmour A. R., Gogel B. J., Cullis B. R., Thompson R., 2009. ASReml user guide release 3.0 VSN International Ltd., Hemel Hempstead. [Google Scholar]
Hadasch S., Simko I., Hayes R. J., Ogutu J. O., Piepho H.-P., 2016. Comparing the predictive abilities of phenotypic and marker-assisted selection methods in a biparental lettuce population. Plant Genome 9 Available at: https://dl.sciencesocieties.org/publications/tpg/abstracts/9/1/plantgenome2015.03.0014. 10.3835/plantgenome2015.03.0014 [DOI] [PubMed] [Google Scholar]
Hallauer A. R., Carena M. J., Miranda Filho J. B., 2010. Quantitative Genetics in Maize Breeding, Vol. 3 Springer, New York. [Google Scholar]
Hanson W., Robinson H., 1963. Statistical Genetics and Plant Breeding. National Academy of Sciences - National Research Council, Washington, DC. [Google Scholar]
Hayes R. J., Galeano C. H., Luo Y., Antonise R., Simko I., 2014. Inheritance of decay of fresh-cut lettuce in a recombinant inbred line population from Salinas 88 × La Brillante. J. Am. Soc. Hortic. Sci. 139: 388–398. 10.21273/JASHS.139.4.388 [DOI] [Google Scholar]
Henderson C. R., 1986. Statistical methods in animal improvement. Historical overview, pp. 2–14 in Advances in Statistical Methods for Genetic Improvement of Livestock, Vol. 18, edited by Gianola D., Hammond K. Springer, Berlin: 10.1007/978-3-642-74487-7_1 [DOI] [Google Scholar]
John N. A., Williams E. R., 1995. Cyclic and Computer Generated Designs, Ed. 2 Chapman and Hall, London: 10.1007/978-1-4899-7220-0 [DOI] [Google Scholar]
Knight R. L., 1948. Dictionary of Genetics, Chronica Botánica Co., Waltham, Massachusetts.
Kuehn L. A., Lewis R. M., Notter D. R., 2007. Managing the risk of comparing estimated breeding values across flocks or herds through connectedness: a review and application. Genet. Sel. Evol. 39: 225–247. 10.1051/gse:2007001 [DOI] [PMC free article] [PubMed] [Google Scholar]
Laloë D., 1993. Precision and information in linear models of genetic evaluation. Genet. Sel. Evol. 25: 557 10.1186/1297-9686-25-6-557 [DOI] [Google Scholar]
Laloë D., Phocas F., 2003. A proposal of criteria of robustness analysis in genetic evaluation. Livest. Prod. Sci. 80: 241–256. 10.1016/S0301-6226(02)00092-1 [DOI] [Google Scholar]
Laloë D., Phocas F., Menissier F., 1996. Considerations on measures of precision and connectedness in mixed linear models of genetic evaluation. Genet. Sel. Evol. 28: 359 10.1186/1297-9686-28-4-359 [DOI] [Google Scholar]
Lourenço V. M., Rodrigues P. C., Pires A. M., Piepho H.-P., 2017. A robust DF-REML framework for variance components estimation in genetic studies. Bioinformatics 33: 3584–3594. 10.1093/bioinformatics/btx457 [DOI] [PubMed] [Google Scholar]
Mathews K. L., Malosetti M., Chapman S., McIntyre L., Reynolds M., et al. , 2008. Multi-environment QTL mixed models for drought stress adaptation in wheat. Theor. Appl. Genet. 117: 1077–1091. 10.1007/s00122-008-0846-8 [DOI] [PubMed] [Google Scholar]
Mrode R. A., Thompson R., 2014. Linear Models for the Prediction of Animal Breeding Values, Ed. 3 Centre for Agriculture and Bioscience International, Wallingford: 10.1079/9781780643915.0000 [DOI] [Google Scholar]
Oakey H., Verbyla A., Pitchford W., Cullis B., Kuchel H., 2006. Joint modeling of additive and non-additive genetic line effects in single field trials. Theor. Appl. Genet. 113: 809–819. 10.1007/s00122-006-0333-z [DOI] [PubMed] [Google Scholar]
Ould Estaghvirou S. B., Ogutu J. O., Schulz-Streeck T., Knaak C., Ouzunova M., et al. , 2013. Evaluation of approaches for estimating the accuracy of genomic prediction in plant breeding. BMC Genomics 14: 860 10.1186/1471-2164-14-860 [DOI] [PMC free article] [PubMed] [Google Scholar]
Piepho H.-P., 2019. A coefficient of determination (R²) for generalized linear mixed models. Biom. J. 61: 860–872. 10.1002/bimj.201800270 [DOI] [PubMed] [Google Scholar]
Piepho H.-P., Möhring J., 2007. Computing heritability and selection response from unbalanced plant breeding trials. Genetics 177: 1881–1888. 10.1534/genetics.107.074229 [DOI] [PMC free article] [PubMed] [Google Scholar]
Piepho H.-P., Möhring J., Melchinger A. E., Büchse A., 2008. BLUP for phenotypic selection in plant breeding and variety testing. Euphytica 161: 209–228. 10.1007/s10681-007-9449-8 [DOI] [Google Scholar]
Piepho H.-P., Möhring J., Schulz-Streeck T., Ogutu J. O., 2012a A stage-wise approach for the analysis of multi-environment trials. Biom. J. 54: 844–860. 10.1002/bimj.201100219 [DOI] [PubMed] [Google Scholar]
Piepho H.-P., Ogutu J. O., Schulz-Streeck T., Estaghvirou B., Gordillo A., et al. , 2012b Efficient computation of ridge-regression best linear unbiased prediction in genomic selection in plant breeding. Crop Sci. 52: 1093–1104. 10.2135/cropsci2011.11.0592 [DOI] [Google Scholar]
R Core Team , 2015. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna. [Google Scholar]
Rodríguez-Álvarez M. X., Boer M. P., van Eeuwijk F. A., Eilers P. H. C., 2018. Correcting for spatial heterogeneity in plant breeding experiments with P-splines. Spat. Stat. 23: 52–71. 10.1016/j.spasta.2017.10.003 [DOI] [Google Scholar]
SAS Institute Inc , 2013. Base SAS 9.4 Procedures Guide: Statistical Procedures, 2. SAS Institute Inc., Cary, NC. [Google Scholar]
SAS Institute Inc. , 2017. SAS/STAT® 14.3 User’s Guide SAS Institute Inc., Cary, NC. [Google Scholar]
Schmidt P., Hartung J., Rath J., Piepho H.-P., 2019. Estimating broad-sense heritability with unbalanced data from agricultural cultivar trials. Crop Sci. 59: 525–536. 10.2135/cropsci2018.06.0376 [DOI] [Google Scholar]
Schulz-Streeck T., Ogutu J. O., Karaman Z., Knaak C., Piepho H.-P., 2012. Genomic selection using multiple populations. Crop Sci. 52: 2453–2461. 10.2135/cropsci2012.03.0160 [DOI] [Google Scholar]
Searle S. R., Casella G., McCulloch C. E., 1992. Variance Components. Wiley, New York: 10.1002/9780470316856 [DOI] [Google Scholar]
Searle S. R., Speed F. M., Milliken G. A., 2012. Population marginal means in the linear model. An alternative to least squares means. Am. Stat. 34: 216–221. 10.1080/00031305.1980.10483031 [DOI] [Google Scholar]
VanRaden P. M., 2008. Efficient methods to compute genomic predictions. J. Dairy Sci. 91: 4414–4423. 10.3168/jds.2007-0980 [DOI] [PubMed] [Google Scholar]
Viana J. M. S., Pereira H. D., Mundim G. B., Piepho H.-P., Silva F. F. E., 2018. Efficiency of genomic prediction of non-assessed single crosses. Heredity 120: 283–295. 10.1038/s41437-017-0027-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
Wellmann R., Bennewitz J., 2011. The contribution of dominance to the understanding of quantitative genetic variation. Genet. Res. 93: 139–154. 10.1017/S0016672310000649 [DOI] [PubMed] [Google Scholar]
Wright K., 2017. agridat: Agricultural Datasets. R package version 1.13. dataset: john.alpha. https://CRAN.R-project.org/package=agridat.
Xu S., 2013. Principles of Statistical Genomics. Springer, New York: 10.1007/978-0-387-70807-2 [DOI] [Google Scholar]
Yan W., 2014. Crop Variety Trials: Data Management and Analysis. John Wiley & Sons Inc., New York: 10.1002/9781118688571 [DOI] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Availability Statement

[bib1] Becker H., 2011. Pflanzenzüchtung. Ed. 2 Ulmer, Stuttgart, Germany. [Google Scholar]

[bib2] Bernal-Vasquez A.-M., Gordillo A., Schmidt M., Piepho H.-P., 2017. Genomic prediction in early selection stages using multi-year data in a hybrid rye breeding program. BMC Genet. 18: 51 10.1186/s12863-017-0512-8 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib3] Cullis B. R., Smith A. B., Coombes N. E., 2006. On the design of early generation variety trials with correlated data. J. Agric. Biol. Environ. Stat. 11: 381–393. 10.1198/108571106X154443 [DOI] [Google Scholar]

[bib4] Dias K. O. D. G., Gezan S. A., Guimarães C. T., Nazarian A., da Costa L., et al. , 2018. Improving accuracies of genomic predictions for drought tolerance in maize by joint modeling of additive and dominance effects in multi-environment trials. Heredity 121: 24–37. 10.1038/s41437-018-0053-6 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib5] Falconer D. S., Mackay T. F. C., 2005. Introduction to Quantitative Genetics, Ed. 4 Pearson Prentice Hall, Harlow. [Google Scholar]

[bib6] Gilmour A. R., Gogel B. J., Cullis B. R., Thompson R., 2009. ASReml user guide release 3.0 VSN International Ltd., Hemel Hempstead. [Google Scholar]

[bib7] Hadasch S., Simko I., Hayes R. J., Ogutu J. O., Piepho H.-P., 2016. Comparing the predictive abilities of phenotypic and marker-assisted selection methods in a biparental lettuce population. Plant Genome 9 Available at: https://dl.sciencesocieties.org/publications/tpg/abstracts/9/1/plantgenome2015.03.0014. 10.3835/plantgenome2015.03.0014 [DOI] [PubMed] [Google Scholar]

[bib8] Hallauer A. R., Carena M. J., Miranda Filho J. B., 2010. Quantitative Genetics in Maize Breeding, Vol. 3 Springer, New York. [Google Scholar]

[bib9] Hanson W., Robinson H., 1963. Statistical Genetics and Plant Breeding. National Academy of Sciences - National Research Council, Washington, DC. [Google Scholar]

[bib10] Hayes R. J., Galeano C. H., Luo Y., Antonise R., Simko I., 2014. Inheritance of decay of fresh-cut lettuce in a recombinant inbred line population from Salinas 88 × La Brillante. J. Am. Soc. Hortic. Sci. 139: 388–398. 10.21273/JASHS.139.4.388 [DOI] [Google Scholar]

[bib11] Henderson C. R., 1986. Statistical methods in animal improvement. Historical overview, pp. 2–14 in Advances in Statistical Methods for Genetic Improvement of Livestock, Vol. 18, edited by Gianola D., Hammond K. Springer, Berlin: 10.1007/978-3-642-74487-7_1 [DOI] [Google Scholar]

[bib12] John N. A., Williams E. R., 1995. Cyclic and Computer Generated Designs, Ed. 2 Chapman and Hall, London: 10.1007/978-1-4899-7220-0 [DOI] [Google Scholar]

[bib42] Knight R. L., 1948. Dictionary of Genetics, Chronica Botánica Co., Waltham, Massachusetts.

[bib13] Kuehn L. A., Lewis R. M., Notter D. R., 2007. Managing the risk of comparing estimated breeding values across flocks or herds through connectedness: a review and application. Genet. Sel. Evol. 39: 225–247. 10.1051/gse:2007001 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib14] Laloë D., 1993. Precision and information in linear models of genetic evaluation. Genet. Sel. Evol. 25: 557 10.1186/1297-9686-25-6-557 [DOI] [Google Scholar]

[bib15] Laloë D., Phocas F., 2003. A proposal of criteria of robustness analysis in genetic evaluation. Livest. Prod. Sci. 80: 241–256. 10.1016/S0301-6226(02)00092-1 [DOI] [Google Scholar]

[bib16] Laloë D., Phocas F., Menissier F., 1996. Considerations on measures of precision and connectedness in mixed linear models of genetic evaluation. Genet. Sel. Evol. 28: 359 10.1186/1297-9686-28-4-359 [DOI] [Google Scholar]

[bib18] Lourenço V. M., Rodrigues P. C., Pires A. M., Piepho H.-P., 2017. A robust DF-REML framework for variance components estimation in genetic studies. Bioinformatics 33: 3584–3594. 10.1093/bioinformatics/btx457 [DOI] [PubMed] [Google Scholar]

[bib19] Mathews K. L., Malosetti M., Chapman S., McIntyre L., Reynolds M., et al. , 2008. Multi-environment QTL mixed models for drought stress adaptation in wheat. Theor. Appl. Genet. 117: 1077–1091. 10.1007/s00122-008-0846-8 [DOI] [PubMed] [Google Scholar]

[bib20] Mrode R. A., Thompson R., 2014. Linear Models for the Prediction of Animal Breeding Values, Ed. 3 Centre for Agriculture and Bioscience International, Wallingford: 10.1079/9781780643915.0000 [DOI] [Google Scholar]

[bib21] Oakey H., Verbyla A., Pitchford W., Cullis B., Kuchel H., 2006. Joint modeling of additive and non-additive genetic line effects in single field trials. Theor. Appl. Genet. 113: 809–819. 10.1007/s00122-006-0333-z [DOI] [PubMed] [Google Scholar]

[bib22] Ould Estaghvirou S. B., Ogutu J. O., Schulz-Streeck T., Knaak C., Ouzunova M., et al. , 2013. Evaluation of approaches for estimating the accuracy of genomic prediction in plant breeding. BMC Genomics 14: 860 10.1186/1471-2164-14-860 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib23] Piepho H.-P., 2019. A coefficient of determination (R²) for generalized linear mixed models. Biom. J. 61: 860–872. 10.1002/bimj.201800270 [DOI] [PubMed] [Google Scholar]

[bib24] Piepho H.-P., Möhring J., 2007. Computing heritability and selection response from unbalanced plant breeding trials. Genetics 177: 1881–1888. 10.1534/genetics.107.074229 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib25] Piepho H.-P., Möhring J., Melchinger A. E., Büchse A., 2008. BLUP for phenotypic selection in plant breeding and variety testing. Euphytica 161: 209–228. 10.1007/s10681-007-9449-8 [DOI] [Google Scholar]

[bib26] Piepho H.-P., Möhring J., Schulz-Streeck T., Ogutu J. O., 2012a A stage-wise approach for the analysis of multi-environment trials. Biom. J. 54: 844–860. 10.1002/bimj.201100219 [DOI] [PubMed] [Google Scholar]

[bib27] Piepho H.-P., Ogutu J. O., Schulz-Streeck T., Estaghvirou B., Gordillo A., et al. , 2012b Efficient computation of ridge-regression best linear unbiased prediction in genomic selection in plant breeding. Crop Sci. 52: 1093–1104. 10.2135/cropsci2011.11.0592 [DOI] [Google Scholar]

[bib28] R Core Team , 2015. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna. [Google Scholar]

[bib29] Rodríguez-Álvarez M. X., Boer M. P., van Eeuwijk F. A., Eilers P. H. C., 2018. Correcting for spatial heterogeneity in plant breeding experiments with P-splines. Spat. Stat. 23: 52–71. 10.1016/j.spasta.2017.10.003 [DOI] [Google Scholar]

[bib30] SAS Institute Inc , 2013. Base SAS 9.4 Procedures Guide: Statistical Procedures, 2. SAS Institute Inc., Cary, NC. [Google Scholar]

[bib31] SAS Institute Inc. , 2017. SAS/STAT® 14.3 User’s Guide SAS Institute Inc., Cary, NC. [Google Scholar]

[bib32] Schmidt P., Hartung J., Rath J., Piepho H.-P., 2019. Estimating broad-sense heritability with unbalanced data from agricultural cultivar trials. Crop Sci. 59: 525–536. 10.2135/cropsci2018.06.0376 [DOI] [Google Scholar]

[bib33] Schulz-Streeck T., Ogutu J. O., Karaman Z., Knaak C., Piepho H.-P., 2012. Genomic selection using multiple populations. Crop Sci. 52: 2453–2461. 10.2135/cropsci2012.03.0160 [DOI] [Google Scholar]

[bib34] Searle S. R., Casella G., McCulloch C. E., 1992. Variance Components. Wiley, New York: 10.1002/9780470316856 [DOI] [Google Scholar]

[bib35] Searle S. R., Speed F. M., Milliken G. A., 2012. Population marginal means in the linear model. An alternative to least squares means. Am. Stat. 34: 216–221. 10.1080/00031305.1980.10483031 [DOI] [Google Scholar]

[bib36] VanRaden P. M., 2008. Efficient methods to compute genomic predictions. J. Dairy Sci. 91: 4414–4423. 10.3168/jds.2007-0980 [DOI] [PubMed] [Google Scholar]

[bib37] Viana J. M. S., Pereira H. D., Mundim G. B., Piepho H.-P., Silva F. F. E., 2018. Efficiency of genomic prediction of non-assessed single crosses. Heredity 120: 283–295. 10.1038/s41437-017-0027-0 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib38] Wellmann R., Bennewitz J., 2011. The contribution of dominance to the understanding of quantitative genetic variation. Genet. Res. 93: 139–154. 10.1017/S0016672310000649 [DOI] [PubMed] [Google Scholar]

[bib39] Wright K., 2017. agridat: Agricultural Datasets. R package version 1.13. dataset: john.alpha. https://CRAN.R-project.org/package=agridat.

[bib40] Xu S., 2013. Principles of Statistical Genomics. Springer, New York: 10.1007/978-0-387-70807-2 [DOI] [Google Scholar]

[bib41] Yan W., 2014. Crop Variety Trials: Data Management and Analysis. John Wiley & Sons Inc., New York: 10.1002/9781118688571 [DOI] [Google Scholar]

PERMALINK

Heritability in Plant Breeding on a Genotype-Difference Basis

Paul Schmidt

Jens Hartung

Jörn Bennewitz

Hans-Peter Piepho

Abstract

Materials and Methods

Mixed model theory

Example data sets

Example 1:

Example 2:

Example 3:

Example 4:

Modeling of genotypic main effect in all examples:

Pairwise heritability

Independent genotypic effects with constant variance:

Generalization to correlated genotypic effects with nonconstant variances:

Other proposals to compute heritability

H2 Oakey:

Simulated:

Reliability:

Computation of heritability measures:

Table 1. Summary of heritability measures.

Software:

Data availability

Results

Overall heritability

Figure 1.

Genotype-wise heritability

Figure 2.

Pairwise heritability

Figure 3.

Discussion

Variations of the mixed model for replicated plant data

Relatedness of heritability methods

Figure 4.

Genotype-specific information gained from HΔ2

What do we really want heritability for and how does HΔ2 help?

Conclusions

Acknowledgments

Appendix

Appendix A: Computing g^BLUP From g^BLUE in a Stage-Wise Approach

Single-stage analysis

Two-stage analysis

Appendix B: An Alternative Estimation for H¯Δ⋅⋅2 in the Case of Independent Genotypic Effects with Constant Variance by Averaging Separately Across Numerator and Denominator

Appendix C: An Approximation for H¯Δ⋅⋅2 in the General Case

Appendix D: A Standardization of M To Directly Capture the Total Genotypic Variance

Appendix E: Directly Simulate Response to Selection

Footnotes

Literature Cited

Associated Data

Data Availability Statement

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases

H² Oakey:

Genotype-specific information gained from $H_{Δ}^{2}$

What do we really want heritability for and how does $H_{Δ}^{2}$ help?

Appendix A: Computing ${\hat{g}}^{BLUP}$ From ${\hat{g}}^{BLUE}$ in a Stage-Wise Approach

Appendix B: An Alternative Estimation for ${\bar{H}}_{Δ \cdot \cdot}^{2}$ in the Case of Independent Genotypic Effects with Constant Variance by Averaging Separately Across Numerator and Denominator

Appendix C: An Approximation for ${\bar{H}}_{Δ \cdot \cdot}^{2}$ in the General Case

Appendix D: A Standardization of $M$ To Directly Capture the Total Genotypic Variance