Author manuscript; available in PMC: 2014 Dec 1.
Published in final edited form as: Hum Genet. 2013 Jul 19;132(12):10.1007/s00439-013-1334-z. doi: 10.1007/s00439-013-1334-z

How meaningful are heritability estimates of liability?

Penny H Benchek, Nathan J Morris
PMCID: PMC3843952  NIHMSID: NIHMS507670  PMID: 23867980

Abstract

It is commonly acknowledged that estimates of heritability from classical twin studies have many potential shortcomings. Despite this, in the post-GWAS era, these heritability estimates have come to be a continual source of interest and controversy. While the heritability estimates of a quantitative trait are subject to a number of biases, in this article we will argue that the standard statistical approach to estimating the heritability of a binary trait relies on some additional untestable assumptions which, if violated, can lead to badly biased estimates. The ACE liability threshold model assumes at its heart that each individual has an underlying liability or propensity to acquire the binary trait (e.g., disease), and that this unobservable liability is multivariate normally distributed. We investigated a number of different scenarios violating this assumption such as the existence of a single causal diallelic gene and the existence of a dichotomous exposure. For each scenario, we found that substantial asymptotic biases can occur, which no increase in sample size can remove. Asymptotic biases as much as four times larger than the true value were observed, and numerous cases also showed large negative biases. Additionally, regions of low bias occurred for specific parameter combinations. Using simulations, we also investigated the situation where all of the assumptions of the ACE liability model are met. We found that commonly used sample sizes can lead to biased heritability estimates. Thus, even if we are willing to accept the meaningfulness of the liability construct, heritability estimates under the ACE liability threshold model may not accurately reflect the heritability of this construct. The points made in this paper should be kept in mind when considering the meaningfulness of a reported heritability estimate for any specific disease.

Introduction

Heritability may be defined as the proportion of the total variability in an observed trait attributable to genetic effects in a given population at a particular point in time (Lewontin 1974; Visscher et al. 2008). It serves as a measure of the extent to which a disease or trait is genetically determined. Heritability studies have heavily influenced our understanding of disease etiology by showing us evidence that numerous diseases and traits are largely genetic in nature. However, heritability studies have also been at the center of numerous historical controversies about nature vs. nurture (Plomin and Spinath 2004; Galton 1869; Kendler et al. 1993; Pam et al. 1996; Joseph 2000). Most recently there has been much dismay and some controversy regarding the amount of heritability explained by genome wide association studies (GWAS) (Goldstein 2009; Eichler et al. 2010; Maher 2008; Manolio et al. 2009; Zuk et al. 2012). Still, before we bemoan or applaud our ability to explain heritability, we should be confident about the heritability estimates we are adopting. We may be trying to explain heritability that does not exist or we may believe that we have adequately explained heritability when indeed there is much more. While there are numerous designs for studying heritability, here we focus on the classic twin study design. However, our work is broadly applicable to other study designs that utilize a liability model.

Studying how a trait aggregates in families offers insight regarding the genetic basis of a trait. However, simply observing an increased occurrence of a trait in families does not tell us whether that increase is due to genes or due to some aspect of the family's common environment. Classic twin studies (Boomsma et al. 2002) involving monozygotic (MZ) and dizygotic (DZ) twins are one common approach to disentangling the effects due to genes from those due to the common environment. There are numerous well-known limitations to twin studies. The main limitations stem from the following assumptions: MZ and DZ twin pairs share their environments to the same extent; gene-environment correlations and interactions are minimal for the trait; random mating occurs in the population; and twins are no different from the general population in terms of the trait (Rijsdijk and Sham 2002). Newer research has also raised the possibility that MZ twins are not even truly identical (Baranzini et al. 2010; Bruder et al. 2008). Many researchers are aware of these limitations but still believe that twin studies provide a valuable source of information about heritability. Here, we will assume for the sake of argument that the standard twin study assumptions are reasonable. Unfortunately, when dealing with binary traits (e.g., affected vs. unaffected), there is an additional set of assumptions, which are commonly used. In this article, we show that even moderate deviations from the most fundamental of these additional assumptions may lead to wildly incorrect results.

One of the most well-known approaches to analyzing binary trait twin data is to use a liability threshold model (Falconer 1965). Curnow (1972) has pointed out that the threshold model is mathematically equivalent to a probit-like risk model. Liability threshold models assume that each individual has a hypothetical continuous liability composed of latent genetic and environmental factors. In a liability model, the observed binary trait (Y) is related to the continuous liability (Z) by way of a threshold (τ). That is,

$$Y = \begin{cases} 1 & \text{if } Z > \tau \\ 0 & \text{otherwise.} \end{cases}$$

Thus, the probability that an individual is affected is the probability that their liability exceeded some threshold. Put more mathematically: P(Y = 1) = P(Z > τ), where Z follows some specified liability distribution. Because the liabilities for two twins are dependent, the joint distribution of each twin pair type (i.e., MZ or DZ) must be specified. Even if we are willing to accept the meaningfulness of the liability construct, the validity of heritability estimates from the liability model is dependent on correctly specifying the underlying distribution of the liability. For example, Kidd and Cavalli-Sforza (1973) noted that very different heritability estimates of schizophrenia could be obtained by assuming a single underlying gene or by assuming a polygenic model (Elston 1977).
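The threshold rule can be illustrated with a minimal sketch. It assumes a standard normal liability (the scale the ACE model adopts later in the text); the function names `liability_threshold` and `is_affected` are hypothetical and chosen only for this illustration.

```python
from statistics import NormalDist


def liability_threshold(prevalence: float) -> float:
    """Return tau such that P(Z > tau) = prevalence for Z ~ N(0, 1).

    Assumption for this sketch: the liability is standard normal.
    """
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    return NormalDist().inv_cdf(1.0 - prevalence)


def is_affected(liability: float, tau: float) -> int:
    """Binary trait Y: 1 if the liability exceeds the threshold, else 0."""
    return 1 if liability > tau else 0


# Hypothetical prevalence of 10 %: the threshold sits in the upper tail.
tau = liability_threshold(0.10)
print(round(tau, 3), is_affected(2.0, tau), is_affected(0.0, tau))
```

A lower prevalence pushes the threshold further into the upper tail, so fewer liabilities exceed it.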

The ACE liability model (Neale and Cardon 1992) is the most frequently used liability model. The ACE model holds that the liability is made up of three components: additive genetic factors ‘A’, common environmental factors ‘C’, and random environmental factors ‘E’. It is assumed that the interactions between the components, as well as between factors within the components, are minimal. Each of the factors within each component is believed to have a small and additive effect on the liability; thus, each component is assumed to follow a normal distribution. Because the total liability is equal to the sum of the three normally distributed components, it follows that a single individual's total liability under the ACE model is also normally distributed; that is, $Z \sim N(\mu, \sigma_A^2 + \sigma_C^2 + \sigma_E^2)$, where the total variability in liability is partitioned according to the variability explained by each of the components. Additionally, the joint liability distribution for twin pairs under the ACE model is assumed to be multivariate normal (MVN). The joint twin pair distribution under the ACE model is

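$$\begin{bmatrix} Z_1 \\ Z_2 \end{bmatrix} \sim \mathrm{MVN}\left(\begin{bmatrix} \mu \\ \mu \end{bmatrix}, \begin{bmatrix} 1 & \phi\sigma_A^2 + \sigma_C^2 \\ \phi\sigma_A^2 + \sigma_C^2 & 1 \end{bmatrix}\right),$$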

where the expected proportion of shared genes ($\phi$) is set to 1 for MZ twins and 1/2 for DZ twins, the mean liability ($\mu$) is set to 0, and the total variability in liability ($\sigma_T^2$) is set to 1 (i.e., $\sigma_T^2 = \sigma_A^2 + \sigma_C^2 + \sigma_E^2 = 1$). Assigning values to the mean ($\mu$) and total variance ($\sigma_T^2$) is required to fix the scale for the liability, avoiding the inherent problem of identifiability (Huang 2005). It does not represent a problem because we are only interested in the proportion of the total variability that each component explains. Curnow (1972) points out that there always exists a transformation which would make the underlying liability model marginally normal. Unfortunately, marginal normality does not imply multivariate normality.
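As a worked illustration of the covariance structure above, the off-diagonal term gives the liability correlation for each zygosity, and the two correlations can be inverted algebraically to recover the variance components. The symbols $\rho_{MZ}$ and $\rho_{DZ}$ are introduced here only for this example, and the numerical values are the high-heritability configuration used in the simulations later in the paper.

```latex
\begin{align*}
\rho_{MZ} &= \sigma_A^2 + \sigma_C^2, &
\rho_{DZ} &= \tfrac{1}{2}\sigma_A^2 + \sigma_C^2, \\
\sigma_A^2 &= 2(\rho_{MZ} - \rho_{DZ}), &
\sigma_C^2 &= 2\rho_{DZ} - \rho_{MZ}. \\[4pt]
\text{With } \sigma_A^2 = 0.60,\ \sigma_C^2 = 0.15: \quad
\rho_{MZ} &= 0.75, &
\rho_{DZ} &= 0.45, \\
2(0.75 - 0.45) &= 0.60, &
2(0.45) - 0.75 &= 0.15.
\end{align*}
```

This inversion only holds on the liability scale and only if the joint liability is in fact MVN, which is the assumption under examination.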

We have no assurance that the MVN liability assumption matches reality. We cannot test the assumption of a MVN liability because the liability and its contributing factors are unobservable. Therefore, if we are to believe the heritability estimates derived from the ACE liability model, we should know just how robust the model is to violations of the MVN assumption.

Methods

Heritability estimation assuming the ACE model

Maximum likelihood is one relatively straightforward approach to estimating heritability using MZ and DZ twin data under the ACE liability model (Rijsdijk and Sham 2002). In this approach, an estimate ($\hat{\theta}_{ML}$) is obtained by finding the parameters that maximize the log-likelihood function based on the multinomial distribution,

$$\log L(\theta \mid \hat{p}, n) = \sum_j n_j \sum_i \hat{p}_{ij} \ln\left(p_{ij}(\theta)\right), \qquad (1)$$

where $\theta = (\sigma_A^2, \sigma_C^2, \tau_{MZ}, \tau_{DZ})$ are the parameters to be estimated; $j \in \{MZ, DZ\}$ defines twin zygosity; $i \in \{0, 1, 2\}$ indicates the number of affected individuals in the twin pair; $n_j$ is the number of MZ or DZ twin pairs; $\hat{p}_{ij}$ is the proportion of twin pairs with $i$ affected individuals among the twins with zygosity $j$; and $p_{ij}(\theta)$ is the probability that a pair of twins will have $i$ affected individuals given that they have zygosity $j$. The $p_{ij}(\theta)$ in Eq. (1) is calculated by numerically integrating over the joint twin pair distribution under the ACE model. The threshold values ($\tau_{MZ}$ and $\tau_{DZ}$) are used as upper or lower limits of integration, depending on which $p_{ij}(\theta)$ is being calculated. These thresholds are determined by the prevalence of the trait under the standard normal distribution. Here, we set the prevalence to be the same for both MZ and DZ twins.
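The calculation of the cell probabilities and of Eq. (1) can be sketched as follows. This is only an illustration, not the authors' implementation: the paper uses the R package "mvtnorm" for the bivariate normal integral, whereas this sketch substitutes a one-dimensional Simpson's rule over the conditional normal CDF. The function names `bvn_lower`, `cell_probs`, and `log_likelihood` are hypothetical, and the sketch assumes a correlation strictly between −1 and 1 (i.e., a nonzero random environmental variance).

```python
import math
from statistics import NormalDist

STD = NormalDist()  # liability scale: mean 0, total variance 1


def bvn_lower(a, b, rho, steps=2000):
    """P(Z1 < a, Z2 < b) for a standard bivariate normal with correlation rho.

    Conditions on Z2 = y and integrates phi(y) * Phi((a - rho*y)/sqrt(1 - rho^2))
    over y in [-8, b] with composite Simpson's rule (steps must be even).
    Assumes |rho| < 1.
    """
    lo, hi = -8.0, min(b, 8.0)
    if hi <= lo:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        y = lo + k * h
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += weight * STD.pdf(y) * STD.cdf((a - rho * y) / s)
    return total * h / 3.0


def cell_probs(sigma_a2, sigma_c2, tau, phi):
    """Probabilities of 0, 1, and 2 affected twins for a pair sharing phi."""
    rho = phi * sigma_a2 + sigma_c2          # off-diagonal of the ACE covariance
    below = STD.cdf(tau)                     # P(Z < tau) for one twin
    p0 = bvn_lower(tau, tau, rho)            # both below the threshold
    p2 = 1.0 - 2.0 * below + p0              # both above (inclusion-exclusion)
    p1 = 1.0 - p0 - p2                       # exactly one above
    return (p0, p1, p2)


def log_likelihood(theta, p_hat, n):
    """Eq. (1): sum over zygosity j of n_j * sum_i p_hat_ij * ln p_ij(theta)."""
    sigma_a2, sigma_c2, tau_mz, tau_dz = theta
    settings = {"MZ": (tau_mz, 1.0), "DZ": (tau_dz, 0.5)}
    total = 0.0
    for j, (tau, phi) in settings.items():
        p = cell_probs(sigma_a2, sigma_c2, tau, phi)
        total += n[j] * sum(ph * math.log(pm) for ph, pm in zip(p_hat[j], p))
    return total


# Hypothetical inputs: prevalence 0.10 and 50 pairs of each zygosity.
tau = STD.inv_cdf(1.0 - 0.10)
true_theta = (0.60, 0.15, tau, tau)
p_hat = {"MZ": cell_probs(0.60, 0.15, tau, 1.0),
         "DZ": cell_probs(0.60, 0.15, tau, 0.5)}
n = {"MZ": 50, "DZ": 50}
print(log_likelihood(true_theta, p_hat, n))
```

Passing `log_likelihood` to a numerical optimizer would give the maximum likelihood estimate; that step is omitted here to keep the sketch short.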

Asymptotic Bias in the ACE model estimator when the assumptions are not met

If the ACE liability model is “true,” then, as the sample size becomes large, the maximum likelihood estimate should become very close to the “true” value. Put more mathematically, under standard conditions, the estimate converges in probability to the truth: $\hat{\theta}_{ML} \xrightarrow{P} \theta_{true}$. Of more interest is the question: what happens for large samples when the ACE model is “false”? This question can easily be answered using the tools of theoretical statistics (White 1982) as discussed in the “Appendix”. Using those results, we were able to calculate the large sample bias in the heritability estimates.

Under the ACE model, each of the three components contributing to the liability (i.e., additive genetic, common environmental, and random environmental) is assumed to follow a normal distribution. Thus, if any of the components deviate from the assumed normal distribution, then the MVN assumption is likely to be violated. We investigated the biases of the ACE liability model maximum likelihood estimator under the following hypothetical scenarios:

  1. The true liability consists of common environmental and random effects that are normally distributed; however, the additive genetic effect is due to a single diallelic gene with three distinct genotypes.

  2. The true liability consists of additive genetic and random effects that are normally distributed; however, the common environmental effect follows a Bernoulli distribution due to a single dichotomous common environmental exposure (e.g., secondhand smoke), where both twins are exposed or neither twin is exposed.

  3. The true liability consists of additive genetic and random effects that are normally distributed; however, the common environmental effects follow a t distribution.

  4. The true liability consists of additive genetic and common environmental effects that are normally distributed; however, the random effects follow a mixture of normal distributions, such that the marginal probability distribution of each twin is similar to the distribution of systolic blood pressure (SBP) found in NHANES data (Centers for Disease Control and Prevention (CDC) 2009).

When we refer to the “true” liability above, we are referring to the data generative model as opposed to the data analysis model. The first three scenarios were chosen to represent a wide spectrum of deviations from the component normality assumption. The first two are relatively strong deviations from normality while the third scenario is relatively subtle. The fourth scenario was chosen as an example of a real trait that does not follow a normal distribution. The bias in percentage heritability was calculated as

$$\frac{\sigma_A^2(\hat{\theta}_{ML}) - \sigma_A^2(\theta_{true})}{\sigma_A^2(\theta_{true})} \times 100, \qquad (2)$$

where $\sigma_A^2(\theta_{true})$ represents the true additive genetic variance of the liability and $\sigma_A^2(\hat{\theta}_{ML})$ represents the value to which the maximum likelihood estimate will converge. We calculated the bias in heritability according to Eq. (2) with fixed parameter values $\sigma_A^2$ and $\sigma_C^2$ across varying scenario-specific parameter values (e.g., prevalence, gene frequency, exposure probability and degrees of freedom). Threshold values for trait liability were estimated assuming known population prevalence of the trait.
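Eq. (2) is simple arithmetic, and a short sketch makes its sign convention explicit. The function name `percent_bias` and the input values are hypothetical and are not results from the paper.

```python
def percent_bias(sigma_a2_limit: float, sigma_a2_true: float) -> float:
    """Eq. (2): relative bias of the limiting estimate, in percent.

    sigma_a2_limit is the value the ML estimate converges to;
    sigma_a2_true is the additive genetic variance of the generating model.
    """
    if sigma_a2_true <= 0.0:
        raise ValueError("the true additive genetic variance must be positive")
    return (sigma_a2_limit - sigma_a2_true) / sigma_a2_true * 100.0


# Hypothetical values: a true variance of 0.40 estimated in the limit as 0.30
# is an underestimate, so the percent bias is negative.
print(percent_bias(0.30, 0.40))
```

A positive value therefore means heritability is overestimated, and a value of 100 means the limiting estimate is twice the true value.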

Non-asymptotic bias in the ACE model estimator when the assumptions are met

We further explored bias in the ACE model estimator through simulations derived under the ACE model. Each sample consisted of an equal mix of MZ and DZ twin pairs. For example, a sample size of 100 consists of 50 MZ twin pairs and 50 DZ twin pairs. Here we fixed sample size at 100, 500, 1,000, and 10,000. For each sample size, we fixed trait prevalence at 0.10, 0.20 and 0.30. For each of the combinations of sample size and prevalence, we fixed parameter values $\sigma_A^2$ and $\sigma_C^2$ to reflect high heritability/low common environment and vice versa (i.e., $\sigma_A^2 = 0.60$, $\sigma_C^2 = 0.15$ and $\sigma_A^2 = 0.15$, $\sigma_C^2 = 0.60$). We ran 5,000 simulations based on each of these configurations and calculated bias as in Eq. (2). Here it is not assumed that the population prevalence is known; thus, threshold values are determined by the sample prevalence under each simulated data set.
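The data-generation step of such a simulation might look like the sketch below. It covers only the generation of one sample under the ACE model, not the estimation step, and it is not the authors' code. The function name `simulate_pairs`, the seed, and the construction of the correlated genetic component are assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist


def simulate_pairs(n_pairs, phi, sigma_a2, sigma_c2, prevalence, rng):
    """Simulate twin pairs under the ACE model; return counts of pairs
    with 0, 1, and 2 affected twins.

    The genetic components of the two twins are built from a shared and an
    individual standard normal draw so that their covariance is phi * sigma_a2.
    The common environment is shared by both twins; the random environment
    is drawn separately for each twin.
    """
    sigma_e2 = 1.0 - sigma_a2 - sigma_c2           # total variance fixed at 1
    tau = NormalDist().inv_cdf(1.0 - prevalence)   # population threshold
    counts = [0, 0, 0]
    for _ in range(n_pairs):
        a_shared = rng.gauss(0.0, 1.0)
        c = rng.gauss(0.0, math.sqrt(sigma_c2))
        affected = 0
        for _twin in range(2):
            a_own = rng.gauss(0.0, 1.0)
            a = math.sqrt(sigma_a2) * (
                math.sqrt(phi) * a_shared + math.sqrt(1.0 - phi) * a_own
            )
            e = rng.gauss(0.0, math.sqrt(sigma_e2))
            if a + c + e > tau:
                affected += 1
        counts[affected] += 1
    return counts


# One sample of size 100 (50 MZ and 50 DZ pairs), high-heritability setting.
rng = random.Random(2013)  # arbitrary seed chosen for reproducibility
mz_counts = simulate_pairs(50, 1.0, 0.60, 0.15, 0.10, rng)
dz_counts = simulate_pairs(50, 0.5, 0.60, 0.15, 0.10, rng)
print(mz_counts, dz_counts)
```

With only 50 pairs per zygosity and a prevalence of 0.10, the count of pairs with two affected twins is typically very small, which gives some intuition for why estimates from samples of this size can be unstable.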

Results

Asymptotic bias in the ACE model estimator when a single causal diallelic gene is present

For the scenario where the additive genetic effects do not follow a normal distribution due to the presence of a single causal diallelic gene (Fig. 1), we mostly see that heritability is underestimated. However, under the combination of low gene frequency and low prevalence (Fig. 1a), we see an overestimation of heritability (e.g., for gene frequency and prevalence both at 0.025, the bias is 84 %). We also see that at low gene frequencies, the magnitude of bias is generally the greatest. This is most pronounced at the highest prevalence levels, for example (Fig. 1d), for gene frequency = 0.025 and prevalence = 0.50, the bias is −89 %. The same is true, although to a lesser extent, for low prevalence. The effect of low prevalence is most pronounced at the highest and lowest gene frequencies. Bias is generally absent for the combination of higher prevalence and higher gene frequencies (Fig. 1a–c). However, this unbiased region decreases as the true heritability increases (Fig. 1d).

Fig. 1. Percent bias as a function of trait prevalence and gene frequency (gene freq.). Percent bias was calculated according to Eq. (2), where the true distribution of the additive genetic component of the liability is non-normal due to a single causal diallelic gene, and the true parameter values are: a $\sigma_A^2 = 0.2$, $\sigma_C^2 = 0.6$; b $\sigma_A^2 = 0.4$, $\sigma_C^2 = 0.4$; c $\sigma_A^2 = 0.6$, $\sigma_C^2 = 0.2$; d $\sigma_A^2 = 0.8$, $\sigma_C^2 = 0.1$

Asymptotic bias in the ACE model estimator when a single dichotomous common environmental exposure is present

Consider the scenario where the common environmental effects follow a Bernoulli distribution (Fig. 2). Under the combination of low exposure probability and low prevalence, we see an overestimation of heritability (Fig. 2a, b). For example, for probability and prevalence both at 0.025, the bias is 231 % (Fig. 2a). We also see that for either low probability or low prevalence, the magnitude of bias is greatest. The effect of low prevalence is most pronounced at the highest probabilities and the effect of low probability is most pronounced at the highest prevalence values. For example (Fig. 2a), for probability = 0.025 and prevalence = 0.50, the bias is 381 %. Bias is generally absent for the combination of higher prevalence and higher exposure probabilities (Fig. 2d). However, this unbiased region decreases as the true variability due to common environmental effects increases (Fig. 2a–c).

Fig. 2. Percent bias as a function of trait prevalence and exposure probability. Percent bias was calculated according to Eq. (2), where the true distribution of the common environmental component of the liability is Bernoulli, and the true parameter values are: a $\sigma_A^2 = 0.1$, $\sigma_C^2 = 0.8$; b $\sigma_A^2 = 0.2$, $\sigma_C^2 = 0.6$; c $\sigma_A^2 = 0.4$, $\sigma_C^2 = 0.4$; d $\sigma_A^2 = 0.6$, $\sigma_C^2 = 0.2$

Asymptotic bias in the ACE model estimator when the common environmental effects follow a t distribution

In the scenario where the common environmental effects follow a t distribution (Fig. 3), the magnitude of bias is greatest for the lowest degrees of freedom. This is especially true under the combination of low degrees of freedom and low or high prevalence. As an example of the combination of low degrees of freedom and low prevalence (Fig. 3a), we see that with df = 3 and prevalence = 0.025, the bias is −60 %. As an example of the combination of low degrees of freedom and high prevalence (Fig. 3a), we see that with df = 3 and prevalence = 0.50, the bias is 76 %. Bias is generally absent for the combination of higher prevalence and higher degrees of freedom.

Fig. 3. Percent bias as a function of trait prevalence and degrees of freedom (df). Percent bias was calculated according to Eq. (2), where the true liability has a t distributed common environmental component. The true parameter values are: a $\sigma_A^2 = 0.1$, $\sigma_C^2 = 0.8$; b $\sigma_A^2 = 0.2$, $\sigma_C^2 = 0.6$; c $\sigma_A^2 = 0.4$, $\sigma_C^2 = 0.4$; d $\sigma_A^2 = 0.6$, $\sigma_C^2 = 0.2$

Asymptotic bias in the ACE model estimator when the random environmental effects follow a mixture of normal distributions

We looked at systolic blood pressure (SBP) for males 55+, obtained from the NHANES (2009–2010) datasets (Centers for Disease Control and Prevention (CDC) 2009). Adjustments for treated blood pressure were made by adding 10 mmHg to the corresponding SBP values (Cui et al. 2003). The resulting kernel density plot (Fig. 4a) and QQ plot (Fig. 4b) suggest that the data deviate from normality, particularly in the right tail of the distribution. Therefore, we derived a mixture model of four normal distributions using the R package ‘mixtools’ (Benaglia et al. 2009). Comparing the empirical CDF to the CDF of the mixture model, we see that the mixture model fits the data well (Fig. 4c). Although we used the mixture model, numerous other models would fit the data equally well. We simply wanted to select a distribution that mimics as closely as possible the true distribution of SBP for males 55+. We investigated the situation where the continuous SBP variable was converted to a binary hypertension trait using a cutoff value. If the genetic and common environmental components are normally distributed while the random component involves a mixture of normal distributions, then it is straightforward to calculate the asymptotic bias.

Fig. 4. Systolic blood pressure (SBP) in males 55+. a Density plot of SBP; b QQ plot of expected quantiles if SBP is normally distributed versus observed quantiles; c CDF plot with step function = empirical CDF from SBP data and smooth function = CDF from mixture model; d percent bias as a function of cutoff value. Cutoff value determines hypertension (HTN) status. Percent bias was calculated according to Eq. (2), where the true distribution of the random environment is set to be a mixture of normal distributions. The true parameter values are: continuous lines $\sigma_A^2 = 0.4$, $\sigma_C^2 = 0.1$; dashed lines $\sigma_A^2 = 0.3$, $\sigma_C^2 = 0.2$

For this scenario, where the random environmental effects follow a mixture model distribution, we looked at two fixed combinations for $\sigma_A^2$ and $\sigma_C^2$. The first was with $\sigma_A^2 = 0.4$ and $\sigma_C^2 = 0.1$; the second was with $\sigma_A^2 = 0.3$ and $\sigma_C^2 = 0.2$. These choices of heritability reflect current estimates of the heritability of blood pressure (Cowley et al. 2012). Both fixed parameter combinations showed similar bias (Fig. 4). The magnitude of bias is greatest for the lowest and highest cutoff values (≈60 %). Under high cutoff values, the ACE model estimator tends to underestimate heritability, and under low cutoff values it tends to overestimate heritability. For the first combination, the bias estimates for commonly used cutoff values (120, 140, and 160) were 48, 29, and −14 %, respectively. For the second combination, the corresponding bias estimates were 50, 32, and −11 %. These cutoff values represent the boundaries between prehypertension, stage 1 hypertension, and stage 2 hypertension, respectively, as defined by the American Heart Association (Kurtz et al. 2005).

Non-asymptotic bias in the ACE model estimator when the assumptions are met

For the ACE model simulated data, heritability tends to be underestimated for $\sigma_A^2 = 0.60$ and $\sigma_C^2 = 0.15$ (Fig. 5a) and tends to be overestimated for $\sigma_A^2 = 0.15$ and $\sigma_C^2 = 0.60$ (Fig. 5b). Bias is greatest for the lowest sample size (n = 100) and the lowest prevalence (prevalence = 0.10) (Fig. 5). For $\sigma_A^2 = 0.60$ and $\sigma_C^2 = 0.15$ and sample size 100, the magnitude of bias decreases as trait prevalence increases, though not substantially (i.e., prevalence = 0.10, 0.20, and 0.30 gives bias = −24, −19, and −16 %, respectively). For $\sigma_A^2 = 0.15$ and $\sigma_C^2 = 0.60$, the magnitude of bias is even higher for a sample size of 100 (i.e., 81, 67, and 61 %, respectively). In general, bias decreases as either sample size or prevalence increases. Prevalence at 0.10 shows the highest bias for sample sizes 100, 500, and 1,000. Surprisingly, bias is still at 16 % for $\sigma_A^2 = 0.15$ and $\sigma_C^2 = 0.60$ and sample size 1,000.

Fig. 5. Percent bias as a function of sample size and prevalence. Percent bias was calculated according to Eq. (2), where the true liability distribution follows the ACE model. The true parameter values are: a $\sigma_A^2 = 0.6$, $\sigma_C^2 = 0.15$; b $\sigma_A^2 = 0.15$, $\sigma_C^2 = 0.6$

Discussion

Here we have focused on the ACE liability model to calculate the heritability of binary traits. There have been other recent efforts, such as GCTA (Lee et al. 2011), to calculate heritability of binary traits based on the liability model utilizing GWAS data. However, these efforts have distinctly different goals from those of the ACE liability model. They attempt to estimate the heritability explained by SNPs as opposed to the total heritability. While we have not investigated such methods in this work, we suspect that the concerns we have raised about the ACE liability model will have some relevance to threshold modeling approaches using large pedigrees as well as approaches such as GCTA.

Our intention in selecting the scenarios describing asymptotic bias was to provide potent examples of the consequences of both “extreme” and “subtle” deviations from the MVN assumption. It is clear that a single gene or single common environmental exposure, where many genes and many common environmental exposures are assumed, is representative of an “extreme” deviation. It is also clear that a t distributed common environment, when normality is assumed, is representative of a “subtle” deviation, as the t distribution and the normal distribution are quite similar. While we refer here to “extreme” scenarios, we do not mean that these scenarios are unrealistic. For example, scenario 1 is not only representative of a large number of single gene Mendelian disorders, but it also represents situations with multiple variants inherited together on a single haplotype. Likewise, diseases that are caused by many rare variants will behave similarly to our diallelic model because it is unlikely that any given family will have more than one disease variant segregating within it.

For the first three scenarios examined, we saw that heritability estimates, under many parameter combinations, were highly biased. In each of these scenarios, we also observed regions where heritability was estimated well. Among the regions where bias was substantial, the direction of bias was not consistent across parameter combinations. Under some parameter combinations, heritability was overestimated and for others it was underestimated. Bias increased as the variance of the non-normally distributed component ($\sigma_A^2$ or $\sigma_C^2$) increased. This is intuitive, as the ACE model assumes that each of the three components is normally distributed. Therefore, the greater the proportion of the total variability the non-normally distributed component explains, the greater the deviation from the ACE model's assumption of a MVN distribution, and subsequently the greater the bias.

Under the two exemplary scenarios of “severe” deviations, we saw that absolute bias was highest for the lowest parameter values. Even in the case where the proportion of the variance attributable to the associated component is low, we saw that the combination of low prevalence and low gene frequency or low probability of exposure created substantial bias.

Of the three scenarios reviewed, the t distributed common environment scenario was the subtlest in its deviation from the MVN assumption. Even with this subtlety, for low degrees of freedom, the bias was substantial, particularly under low trait prevalence. This effect was most pronounced in the case where the proportion of the variability explained by the common environment was high, yet was still significant when the proportion explained was low.

The fourth scenario was selected as a realistic example of an observable trait that does not follow a normal distribution. We designed the scenario to accurately reflect the non-normal distribution of SBP found in NHANES data (Centers for Disease Control and Prevention (CDC) 2009). It is not uncommon to define a binary hypertension trait based on a cut off value. Thus, our example is not entirely contrived. For our chosen scenario, biased estimates occurred for most cutoff values, with the magnitude and sign dependent on the cutoff.

The underlying biases found in the ACE model simulated results showed that sample size and prevalence level are a concern even when ACE model assumptions are met. Although these biases can be controlled for through increased sample size, the increase in sample size necessary to eliminate bias is dependent on trait prevalence. In cases where the trait prevalence is very low, the necessary increase in sample size to avoid bias may not be obtainable.

Consequently, individuals are often ascertained based on their trait status, which creates a whole new set of biases to be dealt with. However, for small sample sizes, we would also point out that the confidence intervals for the heritability estimates from the ACE liability model are often quite large in practice and may dwarf these biases.

It should be noted that some of the concerns about bias in the second scenario might be avoided if the environmental factor is measured and adjusted for. However, there are likely to be many diseases for which there are rare environmental factors with large effect sizes. Furthermore, reality may be even more complex, because many “environmental” exposures such as smoking may actually encode some genetic information (Boardman et al. 2010) which we would not necessarily want to adjust out. The scenarios that we have presented are only representative of the numerous possible complexities that are likely to underlie real diseases.

As we have argued in this report, even if we are willing to accept the meaningfulness of the liability construct, the MVN assumption is a serious weakness of the ACE liability threshold model. Many diseases have known highly penetrant alleles and have very strong environmental risk factors such as smoking. Such diseases will clearly be unlikely to have an underlying liability that adequately satisfies the MVN assumption. By definition, the liability is unobservable and thus the MVN assumption cannot be easily checked. The latency of the components combined with the lack of robustness to the MVN assumption should give pause to consider the meaningfulness of heritability estimates obtained under the ACE liability model for any given disease.

Acknowledgments

We thank Dr. Robert Elston for his helpful comments and suggestions. This work was supported by National Cancer Institute (NCI) Grant R25 CA094186 and by National Heart Lung and Blood Institute (NHLBI) Grant T32 HL007567.

Appendix

Consider the function $f(\hat{p}) = \arg\max_{\theta} \{\log L(\theta \mid \hat{p}, n)\}$, which relates the estimated proportions of twin pairs with 0, 1, or 2 affected in each pair to the maximum likelihood parameter estimates (i.e., $\hat{\theta}_{ML} = f(\hat{p})$). As indicated by the notation, $f(\cdot)$ is not a function of $n$ because the model is saturated. By the law of large numbers, the estimated proportions converge in probability to the true proportions: $\hat{p} \xrightarrow{P} p_{true}$. Then, assuming $f$ is continuous at $p_{true}$, by Slutsky's Continuity Theorem (Ferguson 1996) we have $\hat{\theta}_{ML} \xrightarrow{P} f(p_{true})$. Therefore, even in the case where the model is falsely specified, the maximum likelihood estimate will become very close to a function of the true proportions: $f(p_{true})$. If the model is correctly specified, then the function of the true proportions will equal the true parameter values: $f(p_{true}) = \theta_{true}$; otherwise it may be a biased estimate of the parameter values: $f(p_{true}) \neq \theta_{true}$. In all of the scenarios discussed, it is easy to calculate the “true” proportions ($p_{true}$) using numerical integration. It is also possible to calculate the function $f(p_{true})$ using numerical optimization techniques combined with numerical integration. Thus, the large sample properties of the estimator under the ACE liability model may be investigated by statistical theory without the use of simulations. We calculated the bivariate normal integral using the method of Genz and Bretz (2002) as implemented in the R package “mvtnorm,” and we performed numerical integration over the t distribution using the adaptive quadrature (Piessens et al. 1983) “integrate” function in R (R Team 2010). We performed numerical optimization using the “optim” function in R (R Team 2010). Also, we calculated the thresholds using the R function “uniroot” (R Team 2010).

As an example of calculating ptrue, consider the second scenario with a dichotomous common exposure where the exposure has frequency γ. Let E ∈ {0,1} represent the exposure status of a pair. Consider the probability that neither of the twins is affected:

$$p_{0j}^{true} = \gamma P(Z_1 < \tau_j, Z_2 < \tau_j \mid E = 1) + (1 - \gamma) P(Z_1 < \tau_j, Z_2 < \tau_j \mid E = 0). \qquad (3)$$

Similar formulas to (3) can be written for $p_{1j}^{true}$ and $p_{2j}^{true}$, so the probability of 0, 1 and 2 affected in a twin pair can be calculated. If we allow $\beta = \sqrt{\sigma_C^2 / (\gamma(1 - \gamma))}$ to represent the effect size for the exposure, then the variance of the common environment component is $\mathrm{Var}(\beta E) = \sigma_C^2$. Furthermore, conditional on $E$ we have

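$$\begin{bmatrix} Z_1 \\ Z_2 \end{bmatrix} \sim \mathrm{MVN}\left(\begin{bmatrix} \beta E \\ \beta E \end{bmatrix}, \begin{bmatrix} 1 & \phi\sigma_A^2 \\ \phi\sigma_A^2 & 1 \end{bmatrix}\right). \qquad (4)$$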

Thus, Eq. (4) may be used in conjunction with Eq. (3) to calculate ptrue. Very similar equations may be created for the other scenarios discussed in the paper.
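The effect size for the dichotomous exposure follows from the variance of a Bernoulli variable, as the short derivation below shows. The numerical values ($\sigma_C^2 = 0.6$, $\gamma = 0.025$) are one combination from the ranges examined in the second scenario and are used here only as an illustration.

```latex
\begin{align*}
E \sim \mathrm{Bernoulli}(\gamma)
  &\;\Rightarrow\; \mathrm{Var}(E) = \gamma(1-\gamma), \\
\mathrm{Var}(\beta E) = \beta^2 \gamma(1-\gamma) = \sigma_C^2
  &\;\Rightarrow\; \beta = \sqrt{\frac{\sigma_C^2}{\gamma(1-\gamma)}}, \\
\sigma_C^2 = 0.6,\ \gamma = 0.025
  &\;\Rightarrow\; \beta = \sqrt{\frac{0.6}{0.024375}} \approx 4.96 .
\end{align*}
```

A rare exposure therefore needs a large shift in liability to account for a given share of the variance, which is consistent with the strong departure from normality in that region of the parameter space.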

Contributor Information

Penny H. Benchek, Email: pennybenchek@gmail.com.

Nathan J. Morris, Email: njm18@case.edu.

References

  1. Baranzini SE, Mudge J, Van Velkinburgh JC, Khankhanian P, Khrebtukova I, Miller NA, Zhang L, Farmer AD, Bell CJ, Kim RW. Genome, epigenome and RNA sequences of monozygotic twins discordant for multiple sclerosis. Nature. 2010;464:1351–1356. doi: 10.1038/nature08990.
  2. Benaglia T, Chauveau D, Hunter DR, Young DS. mixtools: An R package for analyzing mixture models. J Stat Softw. 2009;32(6):1–29.
  3. Boardman JD, Blalock CL, Pampel FC. Trends in the genetic influences on smoking. J Health Soc Behav. 2010;51:108–123. doi: 10.1177/0022146509361195.
  4. Boomsma D, Busjahn A, Peltonen L. Classical twin studies and beyond. Nat Rev Genet. 2002;3:872–882. doi: 10.1038/nrg932.
  5. Bruder CEG, Piotrowski A, Gijsbers AACJ, Andersson R, Erickson S, Diaz de Ståhl T, Menzel U, Sandgren J, von Tell D, Poplawski A. Phenotypically concordant and discordant monozygotic twins display different DNA copy-number-variation profiles. Am J Human Genet. 2008;82:763–771. doi: 10.1016/j.ajhg.2007.12.011.
  6. National Center for Health Statistics (NCHS). National Health and Nutrition Examination Survey Data. Hyattsville, MD: U.S. Department of Health and Human Services; 2009–2010. Centers for Disease Control and Prevention (CDC)
  7. Cowley AW, Jr, Nadeau JH, Baccarelli A, Berecek K, Fornage M, Gibbons GH, Harrison DG, Liang M, Nathanielsz PW, O'Connor DT. Report of the National Heart, Lung, and Blood Institute Working Group on epigenetics and hypertension. Hypertension. 2012;59:899–905. doi: 10.1161/HYPERTENSIONAHA.111.190116.
  8. Cui JS, Hopper JL, Harrap SB. Antihypertensive treatments obscure familial contributions to blood pressure variation. Hypertension. 2003;41:207–210. doi: 10.1161/01.hyp.0000044938.94050.e3.
  9. Curnow R. The multifactorial model for the inheritance of liability to disease and its implications for relatives at risk. Biometrics. 1972;28:931–946.
  10. Eichler EE, Flint J, Gibson G, Kong A, Leal SM, Moore JH, Nadeau JH. Missing heritability and strategies for finding the underlying causes of complex disease. Nat Rev Genet. 2010;11:446–450. doi: 10.1038/nrg2809.
  11. Elston RC. Query—estimating heritability of a dichotomous trait. Biometrics. 1977;33:231–233.
  12. Falconer DS. Inheritance of liability to certain diseases estimated from incidence among relatives. Ann Hum Genet. 1965;29:51–76.
  13. Ferguson TS. A course in large sample theory. Chapman & Hall/CRC; London: 1996.
  14. Galton F. Hereditary genius: an inquiry into its laws and consequences. Macmillan and co; London: 1869.
  15. Genz A, Bretz F. Comparison of methods for the computation of multivariate t probabilities. J Comput Graph Stat. 2002;11:950–971.
  16. Goldstein DB. Common genetic variation and human traits. N Engl J Med. 2009;360:1696–1698. doi: 10.1056/NEJMp0806284.
  17. Huang GH. Model identifiability. Encycl Stat Behav Sci. 2005;3:1249–1251.
  18. Joseph J. Not in their genes: a critical view of the genetics of attention-deficit hyperactivity disorder. Dev Rev. 2000;20:539–567.
  19. Kendler KS, Neale MC, Kessler RC, Heath AC, Eaves LJ. A test of the equal-environment assumption in twin studies of psychiatric illness. Behav Genet. 1993;23:21–27. doi: 10.1007/BF01067551.
  20. Kidd K, Cavalli-Sforza L. An analysis of the genetics of schizophrenia. Biodemography Soc Biol. 1973;20:254–264. doi: 10.1080/19485565.1973.9988051.
  21. Kurtz TW, Griffin KA, Bidani AK, Davisson RL, Hall JE. Recommendations for blood pressure measurement in humans and experimental animals. Part 2: blood pressure measurement in experimental animals: a statement for professionals from the Subcommittee of Professional and Public Education of the American Heart Association Council on High Blood Pressure Research. Hypertension. 2005;45:299–310. doi: 10.1161/01.HYP.0000150857.39919.cb.
  22. Lee SH, Wray NR, Goddard ME, Visscher PM. Estimating missing heritability for disease from genome-wide association studies. Am J Human Genet. 2011;88:294–305. doi: 10.1016/j.ajhg.2011.02.002.
  23. Lewontin RC. Annotation: the analysis of variance and the analysis of causes. Am J Hum Genet. 1974;26:400.
  24. Maher B. Personal genomes: the case of the missing heritability. Nature. 2008;456:18–21. doi: 10.1038/456018a.
  25. Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, McCarthy MI, Ramos EM, Cardon LR, Chakravarti A, et al. Finding the missing heritability of complex diseases. Nature. 2009;461:747–753. doi: 10.1038/nature08494.
  26. Neale MC, Cardon LR. Methodology for genetic studies of twins and families. Kluwer; Dordrecht: 1992.
  27. Pam A, Kemker SS, Ross CA, Golden R. The “equal environments assumption” in MZ-DZ twin comparisons: an untenable premise of psychiatric genetics? Acta geneticae medicae et gemellologiae. 1996;45:349–360. doi: 10.1017/s0001566000000945.
  28. Piessens R, Doncker-Kapenga D, Uberhuber C, Kahaner D. Quadpack: a subroutine package for automatic integration. Springer series in computational mathematics, vol. 1. Springer-Verlag; Berlin, New York: 1983.
  29. Plomin R, Spinath FM. Intelligence: genetics, genes, and genomics. J Pers Soc Psychol. 2004;86:112–129. doi: 10.1037/0022-3514.86.1.112.
  30. Rijsdijk FV, Sham PC. Analytic approaches to twin data using structural equation models. Brief Bioinforma. 2002;3:119–133. doi: 10.1093/bib/3.2.119.
  31. Team R. R: a language and environment for statistical computing. R foundation for statistical computing; Vienna, Austria: 2010.
  32. Visscher PM, Hill WG, Wray NR. Heritability in the genomics era—concepts and misconceptions. Nat Rev Genet. 2008;9:255–266. doi: 10.1038/nrg2322.
  33. White H. Maximum-likelihood estimation of mis-specified models. Econometrica. 1982;50:1–25.
  34. Zuk O, Hechter E, Sunyaev SR, Lander ES. The mystery of missing heritability: genetic interactions create phantom heritability. P Natl Acad Sci USA. 2012;109:1193–1198. doi: 10.1073/pnas.1119675109.
