Abstract
We consider the problem of interpreting negative maximum likelihood estimates of heritability that sometimes arise from popular statistical models of additive genetic variation. These may result from random noise acting on estimates of genuinely positive heritability, but we argue that they may also arise from misspecification of the standard additive mechanism that is supposed to justify the statistical procedure. Researchers should be open to the possibility that negative heritability estimates could reflect a real physical feature of the biological process from which the data were sampled.
Keywords: Heritability, GREML, Linear mixed model, Epistasis, Model misspecification
The past decade has seen a proliferation of statistical methods for estimating heritability from large genome-wide genetic data sets. In particular, genomic-relatedness-based restricted maximum-likelihood (GREML; Visscher et al. 2006; Yang et al. 2010) has emerged as a standard tool in statistical genetics, along with related procedures such as Haseman–Elston (HE) and LD score regression (Bulik-Sullivan et al. 2015 preprint; Wu and Sankararaman 2018) that are constructed on the same statistical foundation. In many settings, these approaches can be invaluable for demonstrating the existence and approximate level of heritability by aggregating small genetic effects distributed across the genome. This is practically useful for complex traits given that mapping most causal genetic variants remains difficult (Manolio et al. 2009).
Despite its widespread and often indiscriminate application, GREML depends on very strong assumptions that are impossible to verify in detail, are not believed to be literally true, and are rarely subjected to any formal diagnostic or even qualitative critical consideration. In particular, the underlying statistical model assumes strictly additive causal genetic effects. Even if the additive model is accepted as a sensible foundation, nongenetic effects may reshape the appearance of genetic influences. In this paper we take an instrumental approach to GREML and related procedures: heritability measures arise from these calculations regardless of any connection to the additive model, and these need to be interpreted. In particular, we focus on a phenomenon that is typically dismissed as impossible or meaningless: negative heritability.
While the heritability parameter in the GREML model is mathematically compelled to be nonnegative, we explain how a broader view—not a new view, but one close to the root conception of heritability—implies that values of heritability meaningfully extend into the negative range, and hence that negative estimates of heritability can be taken seriously. It is only the extraneous (and not strictly credible) assumptions of GREML’s motivating model that would exclude negative estimates. We buttress this intuition with a biologically plausible story linked to a mathematically coherent model, where negative heritability estimates arising from the standard GREML procedure are meaningful indicators of causal biological processes.
Operational Definitions of Heritability
As Albert Jacquard pointed out decades ago (Jacquard 1983), the narrow-sense heritability of a phenotype, commonly denoted $h^2$, has two distinct conventional meanings:
1. The proportion of total variance attributable to additive genetic effects.
2. The slope of the linear regression of children’s phenotypes on the mean parental phenotypes.
Both meanings appear in the earliest works to give a quantitative operational definition to heritability, in particular Lush (1940). For more on the history of the notion of heritability, see Bell (1977).
The nexus between these two meanings is an additive model, where genetic and nongenetic effects are independent and sum together to produce the phenotype. When we have general genetic relatedness (rather than parental relations with fixed 50% expected relatedness), heritability is analogous to a regression coefficient that relates phenotypic similarity to genotypic similarity.
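The link between the two definitions under an additive model can be checked by simulation. The following is a minimal sketch, not part of the original analysis: the function name `midparent_slope`, the sample size, and the Gaussian form of the genetic and environmental terms are all illustrative assumptions. It simulates parents and children under an additive model with total variance 1 and regresses child phenotype on midparent phenotype.

```python
import numpy as np

def midparent_slope(h2=0.5, n_families=200_000, seed=0):
    """Simulate an additive trait and regress child on midparent phenotype."""
    rng = np.random.default_rng(seed)
    sd_g, sd_e = np.sqrt(h2), np.sqrt(1.0 - h2)   # total variance fixed at 1
    # Additive genetic values and phenotypes of the two parents.
    g_mother = rng.normal(0.0, sd_g, n_families)
    g_father = rng.normal(0.0, sd_g, n_families)
    p_mother = g_mother + rng.normal(0.0, sd_e, n_families)
    p_father = g_father + rng.normal(0.0, sd_e, n_families)
    # Child: mean parental genetic value plus segregation noise with
    # variance h2 / 2, which keeps the genetic variance equal to h2.
    segregation = rng.normal(0.0, sd_g / np.sqrt(2.0), n_families)
    g_child = 0.5 * (g_mother + g_father) + segregation
    p_child = g_child + rng.normal(0.0, sd_e, n_families)
    midparent = 0.5 * (p_mother + p_father)
    # Least-squares slope = Cov(child, midparent) / Var(midparent).
    c = np.cov(p_child, midparent)
    return c[0, 1] / c[1, 1]

slope = midparent_slope()
print(f"regression slope: {slope:.3f} (variance ratio used to simulate: 0.5)")
```

In this idealized setting the slope (definition 2) recovers the variance ratio (definition 1). The agreement depends on the independence and additivity built into the simulation, which is exactly the assumption examined in the rest of the paper.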
We are particularly concerned here with the interpretation of negative estimates of heritability. The appearance of negative estimates for a parameter of crucial scientific interest that is prima facie positive is unusual, as has often been noted. Negative estimates of the heritability parameter are often dismissed as a mathematical abstraction, values in a range that arises purely formally and that may only be reported for formal purposes. For example, Johnston et al. (2010) obtain a point estimate of −0.109 for the heritability of horn length in Soay sheep, which is immediately dismissed with the statement that “it is impossible to have negative heritability.” The inference is drawn that the true heritability must actually be a small positive number toward the upper end of the confidence interval.
One case where negative heritability estimates have been used in practice is for estimating the average heritability across a group of exchangeable phenotypic measurements, like gene expression. In this case, negative estimates are reported under the presumption that this yields a complete ensemble of estimates that are collectively unbiased. We illustrate one such analysis using RNA-sequencing data from the Genetic European Variation in Health and Disease (GEUVADIS) consortium (Lappalainen et al. 2013). One significant contribution of our work is our calculation of the bias imparted to the heritability estimate when negative estimates are suppressed, helping to elucidate the conditions under which this bias may be presumed negligible.
Our fundamental argument is that negative heritability estimates need to be taken more seriously. The confusion, we contend, comes from the overlap between statistical models that operationalize the two different interpretations of heritability described above. The argument for rejecting negative estimates appears compelling just so long as the focus is only on the additive random-effects model in Equation 1 that often motivates definition 1 of heritability. Variance is nonnegative, hence the ratio of two variances cannot be negative.
While “variance attributable to additive genetic effects” is a basic element of the genetic model in (1), it has no place in the statistical algorithms commonly used to fit the model to real data, including GREML. As our later discussion of (1) will make clear, the GREML estimate of heritability is defined to serve as an estimator of a ratio of two variances, where the numerator is a component of the denominator. The ratio is constrained to lie between 0 and 1, so the estimate seems intended to lie between 0 and 1. However, as we shall explain, the GREML estimate is realized under a more general multivariate normal model, where the natural constraint on $h^2$ is weaker: $-1/(\max_i s_i^2 - 1) < h^2 < 1$, where $s_i$ are the singular values of the normalized genotype matrix. If the phenotypes were derived from summing independent additive genetic effects, then the true $h^2$ would have to be nonnegative, but that must be recognized as an additional assumption that would need to be scientifically warranted, as it is not compelled on strictly logical or mathematical grounds.
This discordance between common practice and formal probability theory reflects two distinct roles that alleles play in modern genetic analyses. First, alleles can exercise actual causal influences on traits, or can tag causal genetic influences through linkage, and such contributions cannot generate negative heritability. Second, however, alleles also serve as markers of family and ancestry, markers of relatedness among individuals that may structure historical, behavioral, social, and environmental influences on traits. We argue that there is no reason to assume nonnegative heritability in this latter, more general class of models. As attention expands beyond basic additive genetic models to more complex characterizations of genome-wide genetic architecture, it is important to understand the behavior of $h^2$ beyond its intuitive definition grounded in classical quantitative genetics.
The meaning of negative heritability
Heritability is not a natural, measurable quantity. Heritability is defined only by its role in a model, and the model is inevitably misspecified. The normally distributed random genetic effects have no physical reality, and function instead primarily to justify a model of convenience. In general, the heritability of a trait will vary across populations, measurement devices, choice of scale, and countless environmental factors.
The term “negative heritability” appeared for the first time, so far as we are aware, in a paper by J. B. S. Haldane, written around 1960, but first published posthumously in 1996 (Haldane 1996). Haldane described how the maternal-effect trait of neonatal jaundice could be said to display negative heritability: Because the disease results from maternal antibodies against a fetal antigen, it will not arise in a fetus whose mother herself experienced neonatal jaundice (we thank Jonathan Marchini for pointing out this reference to us). Haldane then calculates a negative heritability from a model that is specialized to the peculiar inheritance structure of this condition.
Once we have accepted the GREML multivariate normal framework, which we will define precisely, we must admit the possibility that the joint distribution of phenotypes and genotypes in a given data set may be best described by an $h^2$ value that is negative. The question this raises is, can such a negative heritability estimate be biologically sensible? The heritability parameter may be identified, in a precise way, with a correlation between genotypic similarity and phenotypic similarity. The model invites us to select an estimate of $h^2$ that best matches the genetic covariance between individuals to the similarity in their traits. Even if we want heritability to be interpreted as a partition of variance, in the sense of definition 1, this will not always be correct. All we have access to from the data is an estimate of something like heritability in the sense of definition 2. High heritability means that individuals with similar genotypes tend to have similar trait values. Zero heritability means that genotypes tell us nothing about similarities in trait values. Negative heritability, then, could be a perfectly sensible description of data where individuals with similar genotypes tend to have more discordant traits than random pairs. In the special case of twin studies, for example, negative heritability simply means that monozygotic twins are less phenotypically similar than dizygotic twins.
Saying that a given set of data might be best described by a negative heritability estimate goes only part of the way toward answering the biological plausibility of negative true heritability. We cannot assume that a small sample of data pairs that are known (for scientific reasons) to be positively correlated will indeed yield a positive empirical correlation. Negative heritability could arise in the same way, as a spurious effect of random fluctuations in data from a system with zero or small positive heritability. The essential question is, could there be a plausible stochastic mechanism that would produce genuine negative heritability, so that as the amount of data generated by the model goes to infinity, the estimate converges to a negative quantity?
GREML is an optimization procedure derived under a Gaussian model, with a heritability parameter that makes good mathematical sense in the negative range. It would be perfectly straightforward to generate data from this model, but it might be difficult to interpret such a procedure in biologically meaningful terms. We seek, then, a negative heritability mechanism that has a similar form to the random-genetic-effects model, but is misspecified in a way that suggests a plausible story. We propose one such mechanism, based on the phenomenon of “phenotypic repulsion.” As with Haldane’s model (which may be understood as a special case), this mechanism has implications that may be implausible or even obviously false in a given real data set. It involves interactions between individuals that are not primarily genetic. However, the point we want to suggest is that as an abstract physical mechanism that could be producing our data it is as mathematically plausible as the linear random-effects model that undergirds GREML. This is only one example of such a mechanism, and the conclusion we advocate is that negative heritability must be acknowledged as a genuine phenomenon for genotype–phenotype data, even if it may be reasonably excluded by the context of some studies. Speculation about the biological settings that could yield negative heritability can also prove an effective guide to understanding when negative heritability estimates may be reliably truncated or ignored.
Our position parallels the advice on “interpretation of negative components of variance” propounded in a very different context by J. A. Nelder in 1954 (Nelder 1954). Nelder considered the problem of ANOVA testing on split-plot experiments, where the error for main plots was found to be smaller than the error for subplots, producing a negative estimate for the residual subplot error. As we have done here, Nelder showed how the apparently negative “variance component” could arise either from sampling error on a positive variance component or from a misspecification of the model, where correlations between measurements have been neglected. “In any particular situation,” Nelder concludes, “it is the statistician’s responsibility to decide which model is more appropriate.”
The GREML model as linear regression
For the remainder of this paper we follow Steinsaltz et al. (2018) in using the letter ψ to represent heritability, to avoid the confusing implication built into the nomenclature $h^2$ that this parameter formally cannot be negative. Our derivations draw on the analysis in that paper, which also discusses criticisms of GREML, such as those in Krishna Kumar et al. (2016).
Underlying GREML, as well as alternative approaches to heritability estimation such as LD score and HE regression, is a random-effects model. Our basic object is a data set consisting of an n × p matrix Z, taken to represent the genotypes of n individuals, measured at p different loci. There is a vector y, representing a scalar trait observation for each of the n individuals. The raw genotypes are counts of alleles taking the values 0, 1, or 2, but the genotype matrix is centered to have mean zero in each column and normalized to have mean square over the whole matrix equal to 1. (Often columns are further standardized to variance 1, but we do not make this assumption.) The model posits the existence of a random vector $u \in \mathbb{R}^p$ of genetic influences from the individual SNPs such that
$$y = Zu + \varepsilon. \tag{1}$$
The vectors u and ε are assumed to have independent Gaussian entries with zero means and equal variances. The variances are determined by two parameters, which are to be estimated: θ represents the precision (reciprocal variance) of the nongenetic noise and ψ represents the heritability, entering the model as the ratio of genetic variance to total variance. We also use the notation $\phi := \psi/(1-\psi)$ in some places, for concision.
The GREML model has been formulated as a random-effects model, but it is equivalent to a multivariate normal model corresponding to the covariance matrix
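$$\theta_0^{-1}\left(I_n + \phi_0 A\right), \tag{2}$$
where $A := ZZ^*/p$ is the genetic relatedness matrix (GRM), $\phi_0 := \psi_0/(1-\psi_0)$, and $\theta_0$ and $\psi_0$ are the true values of the parameters. In this section we describe how the model may also be understood as a linear regression model.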
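The equivalence between the random-effects formulation and the multivariate normal formulation can be checked numerically. The sketch below is illustrative only and rests on a stated reading of the parameters: noise variance $1/\theta$ and per-SNP effect variance $\phi/(\theta p)$ with $\phi = \psi/(1-\psi)$, which gives covariance $\theta^{-1}(I + \phi A)$. The genotype frequencies, dimensions, and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 6, 40                      # tiny dimensions, for illustration only
theta, psi = 1.0, 0.4
phi = psi / (1.0 - psi)           # ratio of genetic to nongenetic variance

# Hypothetical genotypes: allele counts 0/1/2, column-centered, then scaled
# so that the mean square over the whole matrix equals 1.
freqs = rng.uniform(0.1, 0.5, p)
G = rng.binomial(2, freqs, size=(n, p)).astype(float)
Z = G - G.mean(axis=0)
Z /= np.sqrt(np.mean(Z**2))
A = Z @ Z.T / p                   # genetic relatedness matrix (GRM)

# Draw many phenotype vectors y = Z u + eps from the random-effects model.
reps = 100_000
u = rng.normal(0.0, np.sqrt(phi / (theta * p)), size=(reps, p))
eps = rng.normal(0.0, np.sqrt(1.0 / theta), size=(reps, n))
Y = u @ Z.T + eps

empirical = Y.T @ Y / reps                    # Monte Carlo covariance of y
implied = (np.eye(n) + phi * A) / theta       # assumed multivariate normal form
max_abs_diff = np.max(np.abs(empirical - implied))
print(f"mean diagonal of A: {np.trace(A) / n:.3f}")
print(f"largest covariance discrepancy: {max_abs_diff:.4f}")
```

The normalization of Z makes the diagonal of the GRM average to 1, so the genetic share of the average phenotypic variance is $\phi/(1+\phi) = \psi$.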
The initial GCTA paper (Yang et al. 2010) spelled out an analogy between GCTA and a different form of linear regression, regressing squared trait differences between pairs of individuals on corresponding off-diagonal elements of the GRM, with $n(n-1)/2$ points and correlated errors. This is essentially HE regression, which has recently become a popular heritability estimation method due to its speed and robustness to some degree of model misspecification (Chen 2014; Golan et al. 2014). Instead, we draw an approximate comparison between GREML and regression with n points and independent errors.
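Let $p^{-1/2}Z = U\,\mathrm{diag}(s_i)\,V^*$ be the singular-value decomposition of $p^{-1/2}Z$, where $X^*$ indicates the transpose of a matrix $X$, so that the GRM is $ZZ^*/p = U\,\mathrm{diag}(s_i^2)\,U^*$. We rotate the observations to diagonalize the covariance matrix, obtaining
$$z := U^* y.$$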
Because the columns of Z have zero means, one of the singular values is zero and the corresponding column of U is proportional to a vector with all elements equal to 1. Thus every other column of U sums to zero (because its columns are orthogonal), and hence each column defines a contrast between weighted groupings of individuals.
The elements of z are independent centered normal random variables, and $z_i$ has variance $\theta_0^{-1}(1+\phi_0 s_i^2)$. It follows that $\theta_0 z_i^2/(1+\phi_0 s_i^2)$ are independent chi-square random variables each on one degree of freedom and
$$\log z_i^2 = -\log\theta_0 + \log\left(1+\phi_0 s_i^2\right) + e_i,$$
where the $e_i$ are distributed as the logarithms of the independent chi-square variables, long-tailed to the left, with mean ≈ −1.270, SD ≈ 2.221, and skewness ≈ −1.535.
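The moments of the logarithm of a chi-square variable on one degree of freedom have closed forms in terms of polygamma functions evaluated at 1/2. The following sketch computes them with NumPy only and cross-checks by Monte Carlo. The variable names are illustrative, and the series for $\zeta(3)$ is truncated.

```python
import numpy as np

# If X is chi-square on 1 d.f., then log X = log 2 + log G with G ~ Gamma(1/2),
# so the cumulants of log X are polygamma functions evaluated at 1/2:
#   digamma(1/2)      = -euler_gamma - 2 log 2
#   trigamma(1/2)     = pi^2 / 2
#   polygamma(2, 1/2) = -14 * zeta(3)
zeta3 = np.sum(1.0 / np.arange(1, 200_001, dtype=float) ** 3)  # truncated series
mean_w = -np.euler_gamma - np.log(2.0)
sd_w = np.pi / np.sqrt(2.0)
skew_w = -14.0 * zeta3 / sd_w**3

# Monte Carlo cross-check from squared standard normal draws.
rng = np.random.default_rng(0)
w = np.log(rng.standard_normal(1_000_000) ** 2)
mc_mean, mc_sd = w.mean(), w.std()
mc_skew = np.mean((w - mc_mean) ** 3) / mc_sd**3

print(f"closed form: mean {mean_w:.3f}, SD {sd_w:.3f}, skewness {skew_w:.3f}")
print(f"Monte Carlo: mean {mc_mean:.3f}, SD {mc_sd:.3f}, skewness {mc_skew:.3f}")
```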
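Since $1+\phi_0 s_i^2 = \left(1+\psi_0(s_i^2-1)\right)/(1-\psi_0)$ and the mean of $s_i^2$ is 1, when the $\psi_0(s_i^2-1)$ are uniformly small we may approximate our equation by
$$\log z_i^2 \approx -\log\left(\theta_0(1-\psi_0)\right) + \psi_0\,(s_i^2-1) + e_i. \tag{3}$$
Here, $\psi_0$ takes on the role of the true slope for a regression of $\log z_i^2$ on $s_i^2$. It can be estimated by least squares, bearing in mind that the skew of the $e_i$ affects the SE of estimation.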
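The eigenvalue regression can be sketched directly in the rotated coordinates. The example below is a simplified illustration rather than the procedure used in the paper: it skips the genotype matrix and draws hypothetical squared singular values from a gamma distribution with mean 1, then draws each $z_i$ with variance $(1+\phi_0 s_i^2)/\theta_0$. The function name and all numerical settings are assumptions.

```python
import numpy as np

def eigen_regression_slope(z, s2):
    """Least-squares slope of log z_i^2 on s_i^2."""
    x = s2 - s2.mean()
    return np.sum(x * np.log(z**2)) / np.sum(x**2)

rng = np.random.default_rng(2)
n, theta0, psi0 = 200_000, 1.0, 0.3
phi0 = psi0 / (1.0 - psi0)

# Hypothetical squared singular values with mean 1 (not from real genotypes).
s2 = rng.gamma(shape=4.0, scale=0.25, size=n)
s2 /= s2.mean()

# Rotated phenotypes: independent normals with variance (1 + phi0 s_i^2) / theta0.
z = rng.normal(0.0, np.sqrt((1.0 + phi0 * s2) / theta0))

slope = eigen_regression_slope(z, s2)
print(f"least-squares slope: {slope:.3f} (psi0 = {psi0})")
```

Because the linearization holds only when $\psi_0(s_i^2-1)$ is small, the fitted slope is expected to sit slightly off $\psi_0$ when the eigenvalues are as dispersed as they are here.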
Practitioners instead usually estimate ψ via (restricted) maximum likelihood. Obviously, the maximum likelihood estimate (MLE) is optimal when the underlying model assumptions hold. However, formally characterizing the behavior of the MLE is nontrivial, especially under nonindependent genotypes (cf. Jiang et al. 2016) or sparse, nonpolygenic architectures. For this reason, most theoretical mixed model analyses focus on regression-based approaches with simple analytic solutions, like HE regression or the eigenvalue regression in (3). In contrast, we derived an analytic approximation to the GREML estimate in Steinsaltz et al. (2018), which we used to demonstrate several important theoretical properties. First, the MLE has a small negative bias on the order of 1/n, which is negligible in many settings. Second, if only k SNPs are causal, the MLE additionally suffers a nonrandom, nonasymptotically vanishing bias of order 1/k. Finally, population structure tends to make GREML more efficient, at the cost of exposure to potential confounding. In this paper, we apply the same analytical framework to a different question: Are there plausible forms of model misspecification that yield truly negative MLE heritability?
Formally, Steinsaltz et al. (2018) shows how the MLEs can be expressed in terms of quantities and They satisfy
(4) |
Here, Cov is the empirical covariance of vector elements, an operation on vectors defined by , and Var is similarly defined. We also set , and omit the dependence on ψ when helpful. When Ψ0 is not too close to 1 and the variance of the squared singular values is small, the least-squares and MLEs are close to each other.
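To compare the two estimators numerically, one can maximize the Gaussian likelihood in the rotated coordinates and set the result beside the least-squares slope. This is a hedged sketch, not the GREML software or the analytic approximation of Steinsaltz et al. (2018): it uses plain maximum likelihood with $\theta$ profiled out, a grid search over $\psi$, and hypothetical eigenvalues drawn from a narrow uniform distribution. The function names are illustrative.

```python
import numpy as np

def profile_loglik(psi, z2, s2):
    """Log-likelihood for z_i ~ N(0, (1 + phi s_i^2)/theta), maximized over theta."""
    w = 1.0 + psi / (1.0 - psi) * s2
    return -0.5 * np.sum(np.log(w)) - 0.5 * z2.size * np.log(np.mean(z2 / w))

def psi_mle(z, s2, n_grid=400):
    """Grid-search MLE over the psi range where every variance is positive."""
    z2 = z**2
    lower = -1.0 / (s2.max() - 1.0)          # assumes s2.max() > 1
    grid = np.linspace(lower, 0.99, n_grid + 2)[1:-1]
    values = [profile_loglik(v, z2, s2) for v in grid]
    return grid[int(np.argmax(values))]

def psi_least_squares(z, s2):
    """Slope of log z_i^2 on s_i^2."""
    x = s2 - s2.mean()
    return np.sum(x * np.log(z**2)) / np.sum(x**2)

rng = np.random.default_rng(3)
n, theta0, psi0 = 200_000, 2.0, 0.3
phi0 = psi0 / (1.0 - psi0)
s2 = rng.uniform(0.6, 1.4, n)                # hypothetical eigenvalues, small variance
s2 /= s2.mean()
z = rng.normal(0.0, np.sqrt((1.0 + phi0 * s2) / theta0))

psi_hat_ml = psi_mle(z, s2)
psi_hat_ls = psi_least_squares(z, s2)
print(f"MLE: {psi_hat_ml:.3f}, least squares: {psi_hat_ls:.3f}, truth: {psi0}")
```

Note that the search range for $\psi$ extends below zero: the likelihood is well defined wherever all implied variances are positive.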
Suppose, however, that the true variances of the $z_i$ include a phenotypic contribution that varies inversely with the singular values of Z. In the phenotypic repulsion model to be specified shortly, the true slope is, to first order, $\psi_0 - \alpha$ as a function of a repulsion parameter α. When α exceeds $\psi_0$, the true slope turns negative and estimated slopes correctly include negative values. From this regression-based perspective, there is no reason prima facie to assume heritability must be nonnegative.
Bias from rejecting negative heritability estimates
The common practice of truncating the maximum likelihood calculation to nonnegative values introduces bias that is well-known and may be serious for samples of moderate size, both when estimates are truncated at zero and when negative estimates are simply suppressed.
The problem of estimating the probability of negative heritability estimates was studied 50 years ago by Gill and Jensen (1968). We add here a few comments about how the framework described in Steinsaltz et al. (2018) may contribute to understanding the magnitude of the negative heritability estimate problem that arises from sampling noise in settings where the true heritability is understood to be nonnegative, hence where truncation at zero (or rejection of negative estimates) is warranted and guarantees improved estimates in, say, mean squared error. We gain a rough idea of the effect of rejecting negative estimates from a normal approximation where and X has standard normal distribution [see Steinsaltz et al. (2018) for derivation]. Here, ≈ means that the difference between the left-hand and right-hand sides is bounded (in probability) by a constant times , where the constant may depend on Ψ0.
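Truncating estimates where $\hat\psi < 0$ by setting them equal to 0 imposes, under this normal approximation with SE $\sigma_0$, the truncation bias
$$\frac{\sigma_0}{\sqrt{2\pi}}\,e^{-\psi_0^2/(2\sigma_0^2)} - \psi_0\,\Phi\!\left(-\frac{\psi_0}{\sigma_0}\right), \tag{5}$$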
where Φ is the standard normal cumulative distribution function (c.d.f.) Note that by standard inequalities for Φ (Feller 1968) this is positive and bounded by when Ψ0 > 0. This will be very small when nτ2 is even moderately large compared with , which is to be expected except when Ψ0 is zero, or nearly zero. When Ψ0 = 0 we have a nonnegligible positive bias of approximately the same size as the SE σ0, and will thus be highly relevant for any statistical tests of the null hypothesis Ψ0 = 0.
Truncation at zero will at least be recognizable, whereas tacit rejection of negative estimates may leave no trace due to publication bias. If we have an ensemble of estimates that have been selected to be nonnegative, the average has a conditioning bias that is identical to the expression in (5) divided by . In the special case Ψ0 = 0, this doubles the bias relative to truncation.
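The size of these two biases can be tabulated under the normal approximation. The sketch below assumes an estimate distributed exactly as $N(\psi_0, \sigma_0^2)$ and uses the standard formulas for the mean of a censored and of a truncated normal variable. The function names are illustrative, and `selection_bias` is the exact conditional mean shift, which need not have the same algebraic form as the expression used in the text.

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def truncation_bias(psi0, sigma0):
    """E[max(psi_hat, 0)] - psi0 for psi_hat ~ N(psi0, sigma0^2)."""
    r = psi0 / sigma0
    return sigma0 * norm_pdf(r) - psi0 * norm_cdf(-r)

def selection_bias(psi0, sigma0):
    """E[psi_hat | psi_hat > 0] - psi0 for psi_hat ~ N(psi0, sigma0^2)."""
    r = psi0 / sigma0
    return sigma0 * norm_pdf(r) / norm_cdf(r)

sigma0 = 0.1                                   # hypothetical standard error
for psi0 in (0.0, 0.05, 0.1, 0.3):
    print(f"psi0={psi0:.2f}  truncation bias={truncation_bias(psi0, sigma0):.5f}  "
          f"selection bias={selection_bias(psi0, sigma0):.5f}")
```

At $\psi_0 = 0$ the truncation bias equals $\sigma_0/\sqrt{2\pi}$, roughly 40% of the SE, and discarding negative estimates doubles it. Both biases fall off quickly once $\psi_0$ is a few SEs above zero.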
The phenotypic repulsion model
Darwin termed the notion that new species force their way into phenotypic gaps in the existing ecological community the “principle of divergence,” and ecologists have further developed it under the name “phenotypic repulsion” or “phylogenetic repulsion” (Webb et al. 2002). Species living in close proximity, which are often closely related phylogenetically, coexist by separating from each other phenotypically. A similar kind of competitive exclusion has been proposed (Sulloway 2011) on the individual level to explain observed patterns of developmental variation within human families. Social niche formation within families has also been proposed by Conley et al. (2013), without an explicit mathematical model, as the basis for an evaluation of gene–environment interaction based on misclassified twin types.
Phenotypic repulsion has been more commonly described on the level of species differences than within species. Cardillo (2012) has described negative correlation between phylogenetic distance and flowering period difference among fire-killed but not fire-resistant Banksia species in southwestern Australia. A study of Florida oak species found that many traits differed more, between closely related species, than would be expected by chance (Cavender-Bares et al. 2004). We have not found quantitative studies of phenotypic repulsion between individuals within a species, but it seems plausible that local competition for sunlight combined with range-limited seed dispersion would yield an effective phenotypic repulsion between related plants in a forest setting. In human populations anecdotal evidence suggests that monozygotic twins seek to differentiate themselves from their sibling, which may be a stronger force than genetic similarity for traits with a negligible causal genetic basis.
While our focus is on biologically meaningful phenotypic repulsion, as in the examples above, the repulsion may also result from pure experimental artifacts. For example, in mega-analyses across institutes or laboratories, similarity between analytical or experimental procedures may correlate negatively with similarity in genetic ancestry. This induces repulsion in the sense that genetic similarity predicts experimental dissimilarity. Nonetheless, in this situation the resulting repulsion is not connected to a biologically meaningful process and, rather, would disappear under proper experimental protocols and/or correcting for potential technical confounders like laboratory and batch.
The model we propose here is novel, so may be criticized for failing to provide an example of negative heritability in an established ecological model already in use. We would argue that this model does describe a phenomenon of interest in ecology that has not yet been formalized, and so either the behavior it describes should be taken seriously, or it should provoke a better model of the phenomenon.
We propose a model of phenotypic repulsion where individuals that are most closely related genetically strive to avoid each other phenotypically. This starts with a model like the GREML model described above, where individuals have phenotypes determined by normally distributed effect sizes acting on their individual genotypes. We introduce a penalty term to the probability, of the form
where $A_{ij}$ is the (i, j) entry of the GRM, and α ≤ 1 is a parameter measuring the extent to which genetically similar individuals are pushed to have differing phenotypes. Of course, this setup could be generalized to higher-dimensional phenotypes, with $y_i y_j$ replaced by an arbitrary inner product. The penalty term is inspired by the statistical mechanics models that have been applied to geographically structured population dynamics, such as the Contact Process (Liggett 1999), used to model the spread of epidemics.
Combining this specification with (2), we see that the phenotypes will now be multivariate normal with mean zero and covariance matrix
(6) |
It follows that the transformed phenotypes z = U* y are independent normal with mean zero and variance
Suppose the data have come from this phenotypic repulsion model, and we analyze them using the misspecified random-effects model. While it is always possible to get $\hat\psi < 0$ because of random fluctuations, we would like to show that the heritability implied by this model is “really” negative, in the sense that the distribution of $\hat\psi$ converges to a strictly negative value as the number of subjects goes to infinity. This will follow from Proposition 1 (below), when we take the function f in that result to be
(7) |
as long as , since
which is less than 0 for all t ≥ 0.
In other words, to the extent that we say that heritability is defined by the linear model, heritability can be negative if genotypes and phenotypes interact through the environment in a manner like the phenotypic repulsion model. We prove that this is the case—that the heritability to which the estimates converge with increasing population size is negative—in the following Proposition, which is proved in Appendix A.
Proposition 1:
Suppose we have a family of n × n GRMs An for , with eigenvalues , with
(8) |
(9) |
(10) |
We also write .
Let U(n) be the corresponding eigenvector matrix. For each n we have a multivariate normal random vector y(n) with covariance matrix , where is a positive, strictly decreasing, continuously differentiable function. We assume that
(11) |
(We maintain the normalization assumption that .)
Let $\hat\psi_n$ be the MLE for an observation $y^{(n)}$, calculated from the random-effects model with GRM $A_n$. Then defining
(12) |
$\hat\psi_n$ is bounded above in probability by the strictly negative quantity $-\delta$ as $n \to \infty$. That is, the probability of $\hat\psi_n > -\delta$ goes to 0 as $n \to \infty$.
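The qualitative content of the proposition can be illustrated by simulation. The sketch below is built on a loudly stated assumption: the repulsion penalty is taken to add $\alpha\theta s_i^2$ to the precision of each rotated coordinate, so that $z_i$ has precision $\theta\,[1/(1+\phi s_i^2) + \alpha s_i^2]$. This scaling is chosen for illustration and may differ from the exact form in the paper. The eigenvalues are hypothetical, and the estimator is plain maximum likelihood under the misspecified random-effects model, found by grid search.

```python
import numpy as np

def profile_loglik(psi, z2, s2):
    """Random-effects log-likelihood in rotated coordinates, profiled over theta."""
    w = 1.0 + psi / (1.0 - psi) * s2
    return -0.5 * np.sum(np.log(w)) - 0.5 * z2.size * np.log(np.mean(z2 / w))

def psi_mle(z, s2, n_grid=600):
    """Grid-search MLE over the psi range where every implied variance is positive."""
    z2 = z**2
    lower = -1.0 / (s2.max() - 1.0)          # assumes s2.max() > 1
    grid = np.linspace(lower, 0.99, n_grid + 2)[1:-1]
    values = [profile_loglik(v, z2, s2) for v in grid]
    return grid[int(np.argmax(values))]

def simulate_rotated(s2, theta, psi, alpha, rng):
    """ASSUMED repulsion form: precision_i = theta * (1/(1 + phi s_i^2) + alpha s_i^2)."""
    phi = psi / (1.0 - psi)
    precision = theta * (1.0 / (1.0 + phi * s2) + alpha * s2)
    return rng.normal(0.0, 1.0 / np.sqrt(precision))

rng = np.random.default_rng(4)
n = 100_000
s2 = rng.uniform(0.5, 1.5, n)                # hypothetical GRM eigenvalues, mean 1
s2 /= s2.mean()

# alpha = 0 recovers the additive model; alpha > psi0 gives strong repulsion.
psi_hat_additive = psi_mle(simulate_rotated(s2, 1.0, 0.2, 0.0, rng), s2)
psi_hat_repulsion = psi_mle(simulate_rotated(s2, 1.0, 0.2, 0.5, rng), s2)
print(f"additive data:  psi_hat = {psi_hat_additive:.3f}")
print(f"repulsion data: psi_hat = {psi_hat_repulsion:.3f}")
```

With repulsion switched on, larger eigenvalues go with smaller variances, and the fitted heritability settles at a negative value rather than fluctuating around a small positive one.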
Although we focus on GREML in this paper, two other prominent approaches to estimate heritability in unrelated samples are HE regression and LD score regression. In Appendix B, we show that HE regression also converges to a negative heritability estimate under the phenotypic repulsion model. While it is simpler to analyze HE, the proof is similar to the proof of Proposition 1: in both cases the key fact is that larger eigenvalues of the kinship matrix are actually associated with lower phenotypic variance under phenotypic repulsion (Equation A3), which is the essence of negative heritability. Moreover, based on established approximate equivalences between HE regression and LD score regression (Bulik-Sullivan 2015; Zhou 2017), LD score regression likely also converges to negative heritability estimates under phenotypic repulsion.
Broadly, these and other estimates of heritability may be understood as approximations for the same parameter as in the GREML model, and hence may be expected to target a negative value for large n, as the various estimates converge. The key point is that none of these procedures is directly estimating a variance component. Each of them is estimating a covariance, and it is easy to see how these covariances can be negative.
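The point that these procedures estimate a covariance can be made concrete with HE regression. The sketch below uses the cross-product form, regressing $y_i y_j$ on the off-diagonal GRM entries for pairs $i<j$, which is an assumption of this example. With squared differences as the response, the slope changes sign and doubles. Genotypes and phenotypes are simulated, and the dimensions are chosen only to keep the run short.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, psi0 = 2000, 2000, 0.5

# Hypothetical genotypes: allele counts, column-centered, normalized overall.
freqs = rng.uniform(0.1, 0.5, p)
Z = rng.binomial(2, freqs, size=(n, p)).astype(float)
Z -= Z.mean(axis=0)
Z /= np.sqrt(np.mean(Z**2))
A = Z @ Z.T / p

# Phenotypes from the additive model with total variance close to 1.
u = rng.normal(0.0, np.sqrt(psi0 / p), p)
y = Z @ u + rng.normal(0.0, np.sqrt(1.0 - psi0), n)
y = (y - y.mean()) / y.std()

# HE regression: least-squares slope of y_i * y_j on A_ij over pairs i < j.
iu = np.triu_indices(n, k=1)
x = A[iu]
prod = np.outer(y, y)[iu]
xc = x - x.mean()
h2_he = np.sum(xc * (prod - prod.mean())) / np.sum(xc**2)
print(f"HE regression estimate: {h2_he:.3f} (simulated with psi0 = {psi0})")
```

Nothing in this slope is constrained to be nonnegative: if phenotypic products covary negatively with relatedness, the estimate is negative.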
Finally, we note that the ordinary asymptotic SE for GREML are no longer accurate under the phenotypic repulsion model. In Appendix C, we derive the asymptotic behavior of the SE under repulsion using the sandwich estimator. However, there seems to be no simple interpretation of the relationship between the genotype distribution and the SE as there is for the well-specified model.
Transcriptome-wide gene expression heritability
We conclude with a brief example that illustrates the practical significance of negative heritability estimates. Although negative estimates of heritability for a single fixed trait are rarely published, it is common to include negative estimates when profiling heritability across a large number of roughly exchangeable traits (Yang et al. 2013; Wright et al. 2014; Bhatia et al. 2015 preprint; Finucane et al. 2015; Zhu et al. 2015; Gusev et al. 2016; Gymrek et al. 2016; Hernandez et al. 2019). Characterizing such -omic-wide heritability is common in functional genomics, where high-throughput measurements of some genomic feature are made at thousands of genomic positions. The most common measurement is (RNA) gene expression, but other prominent examples include methylation levels, chromatin accessibility, expression response to stimuli, or protein expression.
We analyzed an RNA-sequencing data set from the GEUVADIS consortium (Lappalainen et al. 2013). We aligned the raw transcript reads from the European individuals to the reference hg19 transcriptome with the RSEM software package (Li and Dewey 2011). We removed perfectly correlated genes and genes with low expression mean or variance.
For each i in 375 people and j in 4154 genes, we define the phenotype as where nij is the number of observed RNA reads for gene j measured in person i. We centered and scaled each gene to mean zero and variance one.
Separately for each gene y(j), we estimate its cis-heritability, that is, the heritability in expression levels explained by SNPs near to the gene. We do so by fitting our standard model (1) with a genotype matrix Z(j) whose columns correspond to SNPs located up to 1 megabase upstream or downstream of gene j’s transcription start site. Restricting to SNPs near a gene is a common way to enrich for causal SNPs. We discard rare SNPs, which we define as SNPs with minor allele frequencies below 2.5%. Finally, we remove genes with fewer than 1000 corresponding SNPs, which excludes 35 genes.
The column dimensions (p) of the cis genotype matrices range from 1000 to 20,523 across genes, with a mean of 3027 and median of 2754. We fit each with the maximum likelihood routine from Hernandez et al. (2019), yielding 4119 values reflecting systematic variation across genes in their cis-heritability, within the limits imposed by sampling error.
The distribution of the resulting transcriptome-wide cis-heritability estimates is shown in Figure 1 in the form of a smoothed histogram. Clearly, many of the estimates are negative. The mode is close to zero. Removing negative heritability estimates increases the transcriptome-wide average heritability from 6.2 to 9.0%, and truncating at zero increases it from 6.2 to 6.6%.
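The effect of the two ways of handling negative estimates on an ensemble average can be sketched with synthetic numbers. The example below does not use the GEUVADIS estimates: the distribution of true values, the noise level, and the variable names are all hypothetical, chosen only to show the ordering of the three averages.

```python
import numpy as np

rng = np.random.default_rng(6)
n_traits = 4000

# Hypothetical true heritabilities (mostly small) and noisy, unbiased estimates.
true_psi = rng.beta(0.5, 7.0, n_traits)
psi_hat = true_psi + rng.normal(0.0, 0.08, n_traits)

mean_all = psi_hat.mean()                          # keeps negative estimates
mean_truncated = np.maximum(psi_hat, 0.0).mean()   # negatives set to zero
mean_nonnegative = psi_hat[psi_hat >= 0].mean()    # negatives discarded

print(f"true mean:             {true_psi.mean():.4f}")
print(f"all estimates:         {mean_all:.4f}")
print(f"truncated at zero:     {mean_truncated:.4f}")
print(f"negatives removed:     {mean_nonnegative:.4f}")
```

Whatever the inputs, the average over all estimates is the smallest of the three and the average after discarding negatives is the largest, which is the same ordering as the percentages reported above.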
We repeated the analysis after adjusting for unobserved confounding estimated by 10 probabilistic estimation of expression residuals (PEER) factors (Stegle et al. 2010). This approach, or variants based on gene expression principal components (Alter et al. 2000) or surrogate variables (Leek and Storey 2007), is standard practice in functional genomics (Stegle et al. 2012). The common aim of these approaches is to approximate latent confounding variation, like experimental batch effects, which can often be partially captured by dimensionality reduction. The confounder estimates are treated as known covariates and residualized from the phenotype and genotype data.
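Residualizing covariates from both phenotype and genotype data amounts to a least-squares projection. The following is a minimal sketch under stated assumptions: the factor matrix is a random placeholder rather than output from any factor-estimation software, the helper name `residualize` is hypothetical, and an intercept column is included alongside the covariates.

```python
import numpy as np

def residualize(M, covariates):
    """Remove the least-squares fit on [intercept, covariates] from each column of M."""
    X = np.column_stack([np.ones(covariates.shape[0]), covariates])
    beta, *_ = np.linalg.lstsq(X, M, rcond=None)
    return M - X @ beta

rng = np.random.default_rng(7)
n, k = 375, 10
factors = rng.normal(size=(n, k))                  # placeholder confounder estimates
y = rng.normal(size=(n, 1)) + factors[:, :1]       # phenotype contaminated by factor 1
Z = rng.normal(size=(n, 50)) + factors[:, 1:2]     # genotypes contaminated by factor 2

y_res = residualize(y, factors)
Z_res = residualize(Z, factors)
print("max |factors' y_res|:", np.abs(factors.T @ y_res).max())
print("max |factors' Z_res|:", np.abs(factors.T @ Z_res).max())
```

After this step the residualized genotype matrix is no longer normalized to mean square 1, so it would need to be rescaled before forming a relatedness matrix.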
Correcting for 10 probabilistic estimation of expression residuals factors increases many of the heritability estimates and reduces the incidence of negative estimates, as shown in the green curve in Figure 1. However, many negative estimates clearly remain. Negative estimates are bound to be part of the picture whenever Ψ0 is small and estimated with low precision, two conditions that will likely hold in most functional genomic analyses for at least the near future.
On the question of whether some negative estimates may be meaningful reflections of nongenetic phenotypic structure, it is best to keep an open mind.
Data availability
GEUVADIS data were obtained from the GEUVADIS consortium. We fit GREML heritability estimates using the LMM implementation in the singher R package (Hernandez et al. 2019), available at https://github.com/andywdahl/SingHer.
Discussion
Negative heritability estimates are common results of standard statistical procedures. Linear random-effects models of genetic causality make negative heritability impossible, inviting us to treat the negative parameter estimates as spurious results produced by statistical noise that should be truncated back to zero, the closest meaningful value. However, these generative linear models are not physically validated: it is not in any sense literally true that phenotypes are produced by additive contributions of alleles and independent noise. We have shown here that other biologically plausible stochastic models would indeed generate data in the negative range of heritability parameters. These are misspecified from the point of view of the linear random-effects models, but they are correctly specified from the point of view of the Gaussian likelihood that is used to estimate the heritability parameter. Our phenotypic repulsion example demonstrates that truly negative heritability can convey a biological fact when individuals tend to differentiate themselves from their relatives. Meaningfully negative heritability should not always be ruled out.
There has long been some dispute about whether these “spurious” negative estimates ought to be included for the sake of unbiasedness, so that the whole ensemble of estimates from multiple studies might be properly centered. We use an approximation for the GREML heritability estimate that we previously derived (Steinsaltz et al. 2018) to formally support this argument as well as to quantify the bias from truncation.
More importantly, we also suggest that the problem should be considered with more nuance: The very possibility of negative heritability depends strongly on the nature of the trait, of the population, and of the sampling procedure. True, asymptotically persistent negative heritability requires strong nonlinear contributions, increasingly strong as the negative parameter approaches the true negative lower bound. This suggests that it may be reasonable to replace truncation at zero by an appropriate shrinkage of negative estimates toward zero, perhaps based on context. This would affect not only negative point estimates, but also confidence intervals centered at small positive values. In a Bayesian framework this would correspond, of course, to assigning heritability a prior distribution with small, nonzero weight on negative values. Statistical models of convenience, such as the variance-component model that underlies GREML and many other heritability estimation approaches, should never drive substantive scientific conclusions, such as declaring that negative heritability is impossible.
Acknowledgments
We thank David Siegel for help processing the GEUVADIS data. D.S. is supported by grant ES/N011856/1 from the UK Economic and Social Research Council and by grant BB/S001824/1 from the Biotechnology and Biological Sciences Research Council. A.D. is supported by grant 1U01HG009080-01 from the U.S. National Institutes of Health. K.W.W. is supported by grant 5P30AG012839 from the U.S. National Institute on Aging.
Appendix A: Proof of Proposition 1
We wish to show that . This will follow if every increasing sequence ni has a subsequence such that . Define the empirical measure , where δx represents unit point mass at x. Since the space of probability measures on is compact, given an increasing sequence ni we may find a subsequence such that converges weakly to a measure σ on . Thus, it will suffice to prove the proposition under the assumption that .
We follow the general principle, enunciated by White (1982), that the MLE for the misspecified model will converge to the closest fit in the Kullback–Leibler sense. In other words, the parameter estimate converges in probability to the location of the maximum expected value of the log-likelihood function. The arguments of White (1982) do not apply directly here, because we are not sampling independent and identically distributed (i.i.d.) random variables; however, by Equation (22) of Steinsaltz et al. (2018), the score function may be written
(A1) |
for , where (Xi) are i.i.d. random variables and
Since for n sufficiently large, we may assume without loss of generality that this holds for all n, perhaps after truncating an initial portion of the sequence. It follows that is defined for any , and by (9) that
are both uniformly bounded over all n, and . By a variant of the central result of Yuan (1997), converges uniformly in ψ to the function that is the limit of the expected values
The covariance is understood here to be with respect to S having distribution σ. [This result does not satisfy exactly the conditions of Yuan (1997), so we provide a proof of the result, stated as Lemma 1.]
We need to show that
(A2) |
From this it will follow that , hence
It remains to verify (A2). We note that the definition of δ makes
for any and . Since is a decreasing function of t, for , we have by Harris’s inequality (Boucheron et al. 2013, Theorem 2.15)
(A3) |
We also have
For and we have
Since f is decreasing, we have for the bound
This proves that for .
For is a decreasing function of t, and is increasing, so (again by Harris’s Inequality) , which completes the proof.
We now prove the key uniform convergence result for Gn. The range of s and of ψ in this result may be rescaled arbitrarily, so for convenience of notation we will denote these by [0,1].
Lemma 1.
Let be a continuous function such that for all
are both finite. Let be atomic probability measures on concentrated at n points . We suppose that the measures σn converge weakly to a probability measure on , and that there is a such that
(A4) |
(A5) |
Let be independent random variables with and . Define for
Then converges uniformly in probability to the function defined by
That is,
The condition (A4) may be weakened by replacing by , for δ positive, and equivalently for LS, as long as we have correspondingly stronger moment bounds on Xn,i. We have stated it in this form for simplicity.
Proof.
The sublinearity of the Lipschitz constant implies that the Lipschitz constant of Gn is a random variable bounded by
We have
by the Cauchy–Schwarz Inequality. Also by the Cauchy–Schwarz Inequality, we have
Thus .
We have
(A6) |
Fix any positive integer k. Because of the bounds on the Lipschitz constants of G and Gn, the second term is bounded by
(A7) |
Because of the assumed weak convergence , this converges to as for each fixed k. Since k is arbitrary, the second term in fact converges to 0 as .
To deal with the first term we use the standard method of chaining (cf. Pollard 1990, chapter 3): we define finite skeletons of [0,1], subsets with , defined by
We then proceed by approximating any point by a sequence of nearest neighbors , so that . Since for any continuous function f
we have the basic chaining inequality
(A8) |
where
We have
Now note that
For any collection of random variables we know that
so for
By Minkowski’s Inequality, we have
So finally, by (A8), we have
(A9) |
Applying Markov’s inequality, and combining this with (A7), completes the proof.
Appendix B: HE Regression Under Phenotypic Repulsion
According to Equation (8) of Wu and Sankararaman (2018) the HE regression estimate of genetic variance may be defined by
where tr (·) is the trace of a matrix. As the heritability is the ratio of to a positive estimate of phenotypic variance, the estimated heritability will be negative whenever is negative.
Under our model, y = UTx is the vector of phenotypes, with U an orthogonal matrix, x a vector of i.i.d. standard normal random variables, and T the diagonal matrix with on the diagonal, with
These are the same as , where f is given in Equation 7.
The denominator in the HE estimator of is , which converges to the constant we have called C3−1. The numerator is
where S is the diagonal matrix with si on the diagonal. By the same argument as in the proof of Proposition 1, where we apply Harris’s inequality to show that (Equation A3), we see that . The Weak Law of Large Numbers implies that the numerator remains bounded above (in probability) as by −C1C2. Hence the HE regression estimate targets a negative number smaller than for large n.
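The sign behavior of the HE estimate can be illustrated numerically. The sketch below uses the generic method-of-moments form of HE regression, a least-squares fit of yy^T to sigma_g^2 K + sigma_e^2 I; that this matches the cited Equation (8) exactly is an assumption. The phenotype is not drawn from the phenotypic repulsion model: it is deliberately set to the eigenvector of K with the smallest eigenvalue, a hypothetical extreme case in which phenotypic similarity runs against genetic relatedness.

```python
import numpy as np

def he_regression(y, K):
    """Least-squares fit of y y^T to sigma_g2 * K + sigma_e2 * I (Frobenius norm)."""
    y = np.asarray(y, dtype=float)
    K = np.asarray(K, dtype=float)
    n = len(y)
    # Normal equations of the two-parameter least-squares problem.
    lhs = np.array([[np.sum(K * K), np.trace(K)],   # sum(K*K) equals tr(K^2) for symmetric K
                    [np.trace(K), float(n)]])
    rhs = np.array([y @ K @ y, y @ y])
    sigma_g2, sigma_e2 = np.linalg.solve(lhs, rhs)
    return sigma_g2, sigma_e2

# Stand-in for a standardized genotype matrix (Gaussian entries, an assumption).
rng = np.random.default_rng(3)
n, p = 50, 200
Z = rng.normal(size=(n, p))
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)
K = Z @ Z.T / p                      # relatedness matrix with tr(K) = n

# Phenotype aligned with the direction of least genetic similarity.
eigvals, eigvecs = np.linalg.eigh(K)  # eigenvalues in ascending order
y = eigvecs[:, 0]

sigma_g2, sigma_e2 = he_regression(y, K)
h2_he = sigma_g2 / (sigma_g2 + sigma_e2)
```

Because tr(K) = n here, the solution reduces to (y^T K y - y^T y) / (tr(K^2) - n), which is negative whenever y^T K y is smaller than y^T y, while the estimated phenotypic variance in the denominator of the heritability stays positive.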
Appendix C: SE Under Repulsion
Asymptotically correct SE for the standard genotypic random-effects model may be calculated from the Fisher information. We carry out the calculations here for the parametrization in terms of θ and , to simplify the notation. In the region of ψ negative or close to 0 the variance and covariance of ψ hardly differ from those of ϕ, and in any case the transformation is straightforward.
Starting from the transformed phenotypes , and using the definitions of vi and wi provided above Equation 4, we have the log likelihood
The first two derivatives may be written as
and
where is used to denote the mean of a sequence (ai). Thus .
We immediately have . When samples are drawn from the true model with parameters we have that are independent chi-squared random variables with 1 degree of freedom. Thus, the expected Fisher information is
The covariance matrix for is the inverse
In particular, the asymptotic variance of is . It follows immediately that the asymptotic variance of is .
For the (misspecified) phenotypic repulsion model three changes are needed: First, the expected value of is no longer , but
Second, the Fisher information is evaluated not at the parameters which no longer define the distribution from which the data are sampled, but rather at the best-fit parameters . We still have , but there is no simple representation for , which will solve the equation . The expected Fisher information is
(Here and below β and w are evaluated at .) The inverse is
The third change is that the asymptotic variance for a misspecified model is not given by , but by the sandwich estimator (White 1982) , where V is the covariance matrix of , which is
Footnotes
Communicating editor: S. Browning
Literature Cited
- Alter O., Brown P. O., and Botstein D., 2000. Singular value decomposition for genome-wide expression data processing and modeling. Proc. Natl. Acad. Sci. USA 97: 10101–10106. 10.1073/pnas.97.18.10101
- Bell A. E., 1977. Heritability in retrospect. J. Hered. 68: 297–300. 10.1093/oxfordjournals.jhered.a108840
- Bhatia G., Gusev A., Loh P.-R., Vilhjálmsson B. J., Ripke S. et al., 2015. Haplotypes of common SNPs can explain missing heritability of complex diseases. bioRxiv (Preprint posted July 12, 2015). doi: 10.1101/022418
- Boucheron S., Lugosi G., and Massart P., 2013. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, Oxford. 10.1093/acprof:oso/9780199535255.001.0001
- Bulik-Sullivan B., 2015. Relationship between LD score and Haseman-Elston regression. bioRxiv (Preprint posted April 20, 2015). doi: 10.1101/018283
- Bulik-Sullivan B. K., Loh P.-R., Finucane H. K., Ripke S., Yang J. et al., 2015. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47: 291–295. 10.1038/ng.3211
- Cardillo M., 2012. The phylogenetic signal of species co-occurrence in high-diversity shrublands: different patterns for fire-killed and fire-resistant species. BMC Ecol. 12: 21. 10.1186/1472-6785-12-21
- Cavender-Bares J., Ackerly D. D., Baum D. A., and Bazzaz F. A., 2004. Phylogenetic overdispersion in Floridian oak communities. Am. Nat. 163: 823–843. 10.1086/386375
- Chen G.-B., 2014. Estimating heritability of complex traits from genome-wide association studies using IBS-based Haseman–Elston regression. Front. Genet. 5: 107. 10.3389/fgene.2014.00107
- Conley D., Rauscher E., Dawes C., Magnusson P. K. E., and Siegal M. L., 2013. Heritability and the equal environments assumption: evidence from multiple samples of misclassified twins. Behav. Genet. 43: 415–426. 10.1007/s10519-013-9602-1
- Feller W., 1968. An Introduction to Probability Theory and Its Applications. Vol. 1, Ed. 3. John Wiley & Sons, New York.
- Finucane H. K., Bulik-Sullivan B., Gusev A., Trynka G., Reshef Y. et al., 2015. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47: 1228–1235. 10.1038/ng.3404
- Gill J. L., and Jensen E., 1968. Probability of obtaining negative estimates of heritability. Biometrics 24: 517–526. 10.2307/2528315
- Golan D., Lander E. S., and Rosset S., 2014. Measuring missing heritability: inferring the contribution of common variants. Proc. Natl. Acad. Sci. USA 111: E5272–E5281. 10.1073/pnas.1419064111
- Gusev A., Ko A., Shi H., Bhatia G., Chung W. et al., 2016. Integrative approaches for large-scale transcriptome-wide association studies. Nat. Genet. 48: 245–252. 10.1038/ng.3506
- Gymrek M., Willems T., Guilmatre A., Zeng H., Markus B. et al., 2016. Abundant contribution of short tandem repeats to gene expression variation in humans. Nat. Genet. 48: 22–29. 10.1038/ng.3461
- Haldane J. B. S., 1996. The negative heritability of neonatal jaundice. Ann. Hum. Genet. 60: 3–5. 10.1111/j.1469-1809.1996.tb01165.x
- Hernandez R. D., Uricchio L. H., Hartman K., Ye C., Dahl A. et al., 2019. Ultrarare variants drive substantial cis heritability of human gene expression. Nat. Genet. 51: 1349–1355. 10.1038/s41588-019-0487-7
- Jacquard A., 1983. Heritability: one word, three concepts. Biometrics 39: 465–477. 10.2307/2531017
- Jiang J., Li C., Paul D., Yang C., and Zhao H., 2016. On high-dimensional misspecified mixed model analysis in genome-wide association study. Ann. Stat. 44: 2127–2160. 10.1214/15-AOS1421
- Johnston S. E., Beraldi D., McRae A. F., Pemberton J. M., and Slate J., 2010. Horn type and horn length genes map to the same chromosomal region in Soay sheep. Heredity 104: 196–205. 10.1038/hdy.2009.109
- Krishna Kumar S., Feldman M. W., Rehkopf D. H., and Tuljapurkar S., 2016. Limitations of GCTA as a solution to the missing heritability problem. Proc. Natl. Acad. Sci. USA 113: E61–E70 (erratum: Proc. Natl. Acad. Sci. USA 113: E813). 10.1073/pnas.1520109113
- Lappalainen T., Sammeth M., Friedländer M. R., ’t Hoen P. A., Monlong J. et al., 2013. Transcriptome and genome sequencing uncovers functional variation in humans. Nature 501: 506–511. 10.1038/nature12531
- Leek J. T., and Storey J. D., 2007. Capturing heterogeneity in gene expression studies by Surrogate Variable Analysis. PLoS Genet. 3: e161. 10.1371/journal.pgen.0030161
- Li B., and Dewey C. N., 2011. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12: 323. 10.1186/1471-2105-12-323
- Liggett T., 1999. Stochastic Interacting Systems: Contact, Voter, and Exclusion Processes. Springer Verlag, New York. 10.1007/978-3-662-03990-8
- Lush J. L., 1940. Intra-sire correlations or regressions of offspring on dam as a method of estimating heritability of characteristics. Proceedings of the American Society of Animal Nutrition 1940: 293–301.
- Manolio T. A., Collins F. S., Cox N. J., Goldstein D. B., Hindorff L. A. et al., 2009. Finding the missing heritability of complex diseases. Nature 461: 747–753. 10.1038/nature08494
- Nelder J., 1954. The interpretation of negative components of variance. Biometrika 41: 544–548. 10.1093/biomet/41.3-4.544
- Pollard D., 1990. Empirical Processes: Theory and Applications, Volume 2 of CBMS-NSF Regional Conference Series in Probability and Statistics. Institute of Mathematical Statistics, Hayward, CA.
- Stegle O., Parts L., Durbin R., and Winn J., 2010. A Bayesian framework to account for complex non-genetic factors in gene expression levels greatly increases power in eQTL studies. PLoS Comput. Biol. 6: e1000770. 10.1371/journal.pcbi.1000770
- Stegle O., Parts L., Piipari M., Winn J., and Durbin R., 2012. Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses. Nat. Protoc. 7: 500–507. 10.1038/nprot.2011.457
- Steinsaltz D., Dahl A., and Wachter K. W., 2018. Statistical properties of simple random-effects models for genetic heritability. Electron. J. Stat. 12: 321–358. 10.1214/17-EJS1386
- Sulloway F. J., 2011. Why siblings are like Darwin’s finches: birth order, sibling competition, and adaptive divergence within the family, pp. 87–119 in The Evolution of Personality and Individual Differences, edited by Buss D. M. and Hawley P. H. Oxford University Press, New York.
- Visscher P. M., Medland S. E., Ferreira M. A. R., Morley K. I., Zhu G. et al., 2006. Assumption-free estimation of heritability from genome-wide identity-by-descent sharing between full siblings. PLoS Genet. 2: e41. 10.1371/journal.pgen.0020041
- Webb C. O., Ackerly D. D., McPeek M. A., and Donoghue M. J., 2002. Phylogenies and community ecology. Annu. Rev. Ecol. Syst. 33: 475–505. 10.1146/annurev.ecolsys.33.010802.150448
- White H., 1982. Maximum likelihood estimation of misspecified models. Econometrica 50: 1–25.
- Wright F. A., Sullivan P. F., Brooks A. I., Zou F., Sun W. et al., 2014. Heritability and genomics of gene expression in peripheral blood. Nat. Genet. 46: 430–437. 10.1038/ng.2951
- Wu Y., and Sankararaman S., 2018. A scalable estimator of SNP heritability for biobank-scale data. Bioinformatics 34: i187–i194. 10.1093/bioinformatics/bty253
- Yang J., Benyamin B., McEvoy B. P., Gordon S., Henders A. K. et al., 2010. Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 42: 565–569. 10.1038/ng.608
- Yang J., Lee T., Kim J., Cho M. C., Han B. G. et al., 2013. Ubiquitous polygenicity of human complex traits: genome-wide analysis of 49 traits in Koreans. PLoS Genet. 9: e1003355. 10.1371/journal.pgen.1003355
- Yuan K.-H., 1997. A theorem on uniform convergence of stochastic functions with applications. J. Multivariate Anal. 62: 100–109. 10.1006/jmva.1997.1674
- Zhou X., 2017. A unified framework for variance component estimation with summary statistics in genome-wide association studies. Ann. Appl. Stat. 11: 2027–2051. 10.1214/17-AOAS1052
- Zhu Z., Bakshi A., Vinkhuyzen A. A., Hemani G., Lee S. H. et al., 2015. Dominance genetic variation contributes little to the missing heritability for human complex traits. Am. J. Hum. Genet. 96: 377–385. 10.1016/j.ajhg.2015.01.001