Significance
Correlation between genotypes and phenotypes can be produced by genetic nurturing, namely the effect of parents’ genotypes on their offspring’s phenotypes through the parents’ phenotypes. Population subdivision and assortative mating can give rise to correlations between genotypes and phenotypes similar to those due to genetic nurturing. Variances and correlations may not reveal causal relationships in the presence of these complexities. We analyze mechanistic models of genetic nurturing, population subdivision, and assortative mating and compare these with results obtained within the framework of modern causal analysis. Our results clarify statistical signals emanating from correlations between nontransmitted alleles and offspring phenotypes and reveal difficulties with standard linear models in the interpretation of heritability, in particular, the concept of missing heritability.
Keywords: genetic nurturing, cultural transmission, missing heritability, population subdivision and assortative mating, correlation and causality
Abstract
Genetic nurturing, the effect of parents’ genotypes on offspring phenotypes through parental phenotypic transmission, can be modeled in terms of gene–culture interactions. This paper first uses a simple one-locus, two-phenotype gene–culture cotransmission model to compute the effect of genetic nurturing in terms of regression of children’s phenotypes on transmitted and nontransmitted alleles. With genetic nurturing, interpreting heritability and hence the meaning of “missing heritability” becomes problematic. Other factors, for example, population subdivision and assortative mating, generate similar signals to those of genetic nurturing, namely, correlation between parents’ nontransmitted alleles and children’s phenotypes. Corrections must be made for these to isolate the signal of genetic nurturing. Finally, a unified causal framework is constructed for genetic nurturing, population subdivision, and assortative mating. Causal and noncausal paths from transmitted and nontransmitted alleles to children’s phenotypes are identified and investigated in the presence of genetic nurturing, population subdivision, and assortative mating. Using causal analysis, assumptions made in inferring direct and indirect effects are then clarified and evaluated in a broader causal context.
The nature vs. nurture problem is fundamental to most human sciences. History is replete with disastrous human suffering caused by malevolent use of positions taken regarding this problem. In the second half of the 20th century, after the modern evolutionary synthesis, researchers in human behavior started to incorporate simple models from statistical genetics, in the process extending such concepts as heritability beyond their original function. The notion of heritability was first introduced by Lush (1) in 1937 in animal breeding to predict the effectiveness of artificial selection. From the 1960s, researchers such as Arthur Jensen (2) used an estimate known as “broad-sense heritability” as a statistical measure of genetic determination in research into human behavior. Their high heritability estimates for IQ led to a genetically deterministic view of human intelligence (3, 4). These estimates were often based on studies of twins reared together and apart (5) or on other correlations between relatives. However, such estimates of broad-sense heritability cannot disentangle gene-by-environment interactions from purely genetic effects (6–9).
The debates of the 1970s led Cavalli-Sforza and Feldman (10) to construct generative models of cultural and genotypic transmission from which one could assess the contribution of vertical cultural transmission to phenotypic variation. They showed that such cultural transmission could lead to inflated estimates of genetic heritability.
Correlations between relatives continued to be the primary tool for estimation of the genetic contribution to phenotypic variance throughout the 1970s and 1980s (3, 4, 11, 12), although the importance of cultural transmission in inflating such correlations was recognized by several investigators (7, 13–17). Analysis of the model constructed by Cavalli-Sforza and Feldman (10) showed that parents’ phenotypes, if transmitted through vertical cultural transmission, could have strong effects on correlations between relatives. This approach was used, and the results confirmed, by Cloninger et al. (13–15) and also by Morton and coworkers in Hawaii (17).
In the past 10 years, with the growing popularity of genome-wide association studies (GWAS), heritability has been estimated from data by accumulating the variance explained by millions of single-nucleotide polymorphisms (SNPs), and in general SNP heritability is lower than earlier estimates based on correlations between relatives. This inconsistency is known as “missing heritability” (18, 19) and represents an important conundrum in human statistical genetics. Different factors can contribute to this inconsistency, and we divide them into two categories: genetic factors and demographic factors. Genetic factors include too few SNPs (20–22) and complex genetic interactions [linkage disequilibrium, epistasis (23), etc.]. Demographic factors include cultural transmission (19, 22, 24), population subdivision (25), and assortative mating (24). Both categories can contribute to missing heritability, and their relative importance can vary from trait to trait. Historically, barring a few exceptions, researchers have focused primarily on genetic factors.
Recently, Kong et al. (26) found substantial correlation between parents’ nontransmitted alleles and children’s phenotypes. They suggested that this correlation could be generated by the effect of parents’ phenotypes, which are influenced by parents’ genotypes, on children’s phenotypes. This sequence, from parents’ genotypes to parents’ phenotypes to children’s phenotypes, was called “genetic nurturing.” Subsequent studies connected genetic nurturing to the missing heritability problem (27–29). These studies appear to be closely related to the original model of Cavalli-Sforza and Feldman (10) and its generalization by Feldman et al. (8). Here we propose a simple extension of these models and show how vertical cultural transmission can lead to genetic nurturing. This affects the linear regressions that have been used to separate the contributions of nature and nurture to phenotypes, and we show how it affects our understanding of the missing heritability problem. We also explore how population subdivision and assortative mating, which are common for many traits in humans (30, 31), can generate correlation between nontransmitted alleles and children’s phenotypes. A natural next step is to analyze, in a causal framework, how these different factors could produce such a correlation; this can help expose the nature–nurture conundrum (32–35) in the presence of complex population dynamics and possible confounding factors. Thus, we propose a unified causal framework for genetic nurturing, population subdivision, and assortative mating, which extends linear approaches, such as the path analysis used by Kong et al. (26), to nonlinear cases. We also examine the approach they used to estimate direct genetic effects and make corrections for assortative mating.
A Model with Cultural Transmission
Genetic nurturing (26) entails that parents’ genotypes can have an indirect effect on children’s phenotypes by influencing parents’ phenotypes. This implies that children’s phenotypes will be determined by both children’s genotypes and parents’ phenotypes, where the influence of parents’ phenotypes occurs via vertical cultural transmission. Cavalli-Sforza and Feldman (10) proposed a quantitative genetic model incorporating vertical cultural transmission and showed that the heritability could be overestimated if there was vertical cultural transmission. However, for quantitative traits, it is very difficult to analyze the mathematical dynamics of a trait’s distribution. To address this issue and develop some theoretical understanding of the genetic nurturing effect, we construct a discrete trait model based on Cavalli-Sforza and Feldman (10) and show how the genetic nurturing effect can be included in the model.
We consider one diploid locus with two alleles, and . Each genotype can be one of two phenotypic variants represented by bar and nonbar. Thus the six pheno-genotypes can be written as and with . The probability of a child being bar is determined by the parents’ phenotypes and the child’s genotype. To be specific, denote by the phenotypic state of the child, which is 1 for bar and 0 for nonbar; the genotypic state of the child, which is 1 for , 2 for , and 3 for ; the contribution of the child’s genotype when its genotype is ; and the contribution of a parent’s phenotype when the parent’s phenotype is and the child’s genotype is . Then we have
[1] |
where , with the index function, and and stand for the phenotypic states of the father and mother, respectively.
Since we can always move the common part of and into , we reduce the number of parameters by specifying for ,
[2] |
Table 1 describes this model, where
[3] |
We denote the frequency of the bar phenotype with genotype at generation by , the allele frequency of by , and the allele frequency of by . Without selection and assortative mating, the population is in Hardy–Weinberg equilibrium and does not change. The frequencies of the six pheno-genotypes are , respectively.
Table 1.
The first row shows the child’s genotype; the first column represents parents’ pheno-genotypes, and the entries represent the child’s probability of having the bar phenotype conditioned on child’s genotype (first row) and parents’ phenotypes (first column).
Now consider the evolution under random mating. On the one hand, the probability of being bar for in generation is . On the other hand, it equals , where represents the probability that one of the parents (say, the mother) in generation has the bar phenotype given that the child’s pheno-genotype is . Since the child’s genotype is , the mother’s genotype has to be , where allele has probability being 1 and probability being 2. Since the probabilities of being bar for and in generation are and , we have . Thus, . Corresponding recursions for and can be derived, and we have
[4] |
The globally stable equilibrium of Eq. 4 (SI Appendix, section A), denoted by , is with
[5] |
Regression Analysis
We regress the child’s phenotype on alleles transmitted from the parents to the child and alleles not transmitted to the child. Let the number of in parental alleles transmitted to the child be and the number of in parental alleles not transmitted to the child be . Then, since and are independent random variables, we can write the regression equation
[6] |
where , and is the sum of the intercept and the residual. Calculation of these expressions is shown in SI Appendix, section B, and we obtain
[7a] |
[7b] |
[7c] |
[7d] |
Then the coefficients in Eq. 6 are
[8a] |
[8b] |
If the transmission process is purely genetic, that is, s are zero, we have , and , which corresponds to the classical average effect in Falconer and Mackay (36) and the regression slope in Lynch and Walsh (ref. 37, pp. 65–67). With cultural transmission, i.e., nonzero s, still has the familiar form 8a due to independence of transmitted and nontransmitted alleles. However, the frequencies will be complicated expressions in s and s shown in Eq. 5, and will not be zero. We can understand this result more clearly from the way parents’ nontransmitted alleles influence children’s phenotypes. The nontransmitted alleles are included in parents’ genotypes, which affect parents’ phenotypes, and thus influence children’s phenotypes by cultural transmission (s). When the cultural transmission pathway (genetic nurturing pathway) is absent (s), there is no influence of nontransmitted alleles.
Kong et al. (26) assumed that both the transmitted and nontransmitted alleles participate in genetic nurturing, which is also the case in our model. Since the transmitted alleles influence the parents’ phenotypes, as do the nontransmitted alleles, they calculated the direct effect [ in Kong et al.’s (26) paper] by subtracting the regression coefficients of nontransmitted alleles from those of the transmitted alleles. In our model, the corresponding quantity is , which represents the direct genetic effect when the parents’ phenotypes have the same effect on different children’s genotypes. To show this, we assume s are linear, i.e., , and s are the same, i.e., . Using Eqs. 8 and 5 we then have
[9a] |
and
[9b] |
which implies
[9c] |
Although and do not constitute “total” and “indirect” effects due to the confounding of grandparents’ phenotypic effects, their difference is exactly the direct genetic effect. This analysis can be extended to cases with arbitrary s and the same , but it fails when s are different. This is because the genetic nurturing influences the regression in two ways: First, as shown above, it incorporates the nontransmitted alleles. Second, it influences the equilibrium frequency of the bar phenotype for each genotype, which makes it difficult to extract the direct genetic effect, as claimed by Kong et al. (26). (For more detailed analysis, see A Unified Causal Framework for Genetic Nurturing, Population Subdivision, and Assortative Mating and SI Appendix, sections E and J.) We will see in the next part that this observation also leads to an interesting view of the missing heritability problem.
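The regression argument above can be checked numerically. The following is a minimal sketch, not the authors' derivation: it assumes a simplified parametrisation in which the probability of the bar phenotype is `alpha[g] + beta[g] * (number of bar parents)`, with `g` the number of copies of one allele in the child. The names `alpha`, `beta`, `equilibrium`, and `regression_on_alleles` are hypothetical and introduced only for illustration. The code iterates the phenotype frequencies to equilibrium under random mating and then computes the exact regression slopes on transmitted and nontransmitted allele counts by enumeration.

```python
from itertools import product


def equilibrium(alpha, beta, p, iters=2000):
    """Equilibrium P(bar | genotype) for genotypes with 0, 1, 2 copies of A.

    Assumed parametrisation (hypothetical, for illustration only):
    P(child is bar) = alpha[g] + beta[g] * (number of bar parents).
    """
    q = 1.0 - p
    u = list(alpha)
    for _ in range(iters):
        # P(a parent is bar | the allele that parent transmitted); the parent's
        # other allele is A with probability p under random mating.
        sent_a = p * u[1] + q * u[0]
        sent_A = p * u[2] + q * u[1]
        # Average over the two parents for each child genotype.
        parent_bar = [sent_a, 0.5 * (sent_a + sent_A), sent_A]
        u = [alpha[g] + 2.0 * beta[g] * parent_bar[g] for g in range(3)]
    return u


def regression_on_alleles(alpha, beta, p):
    """Exact slopes of child phenotype on transmitted (T) and nontransmitted (NT) counts."""
    u = equilibrium(alpha, beta, p)
    freq = {1: p, 0: 1.0 - p}
    m = dict.fromkeys(("x", "t", "n", "xt", "xn", "tt", "nn"), 0.0)
    # Each parent carries one transmitted and one nontransmitted allele (1 = A, 0 = a).
    for tm, nm, tf, nf in product((0, 1), repeat=4):
        w = freq[tm] * freq[nm] * freq[tf] * freq[nf]
        t, n = tm + tf, nm + nf
        # Expected child phenotype given child genotype and both parents' genotypes.
        x = alpha[t] + beta[t] * (u[tm + nm] + u[tf + nf])
        for key, val in (("x", x), ("t", t), ("n", n), ("xt", x * t),
                         ("xn", x * n), ("tt", t * t), ("nn", n * n)):
            m[key] += w * val
    # T and NT are independent under random mating, so simple slopes suffice.
    delta_t = (m["xt"] - m["x"] * m["t"]) / (m["tt"] - m["t"] ** 2)
    delta_nt = (m["xn"] - m["x"] * m["n"]) / (m["nn"] - m["n"] ** 2)
    return delta_t, delta_nt
```

Under these assumptions, calling `regression_on_alleles([0.1, 0.2, 0.3], [0.2, 0.2, 0.2], 0.3)` gives a nonzero slope on the nontransmitted alleles, and the difference between the two slopes recovers the additive step of 0.1 in `alpha`, in line with the linear case discussed above. Setting every entry of `beta` to zero makes the nontransmitted slope vanish.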
Missing Heritability?
The missing heritability problem (18) arises when the heritability calculated from regression on alleles is lower than the heritability calculated from correlations between relatives. As pointed out by Cavalli-Sforza and Feldman (10), the latter heritability can be overestimated if there is vertical cultural transmission. To quantify this in our model, we regress the child’s phenotype on the parental phenotypes. The parent–offspring phenotype covariance (SI Appendix, section C) can be written
[10] |
where
[11a] |
[11b] |
(Note that for convenience, we use the sum of the parental phenotypic values rather than their average.) Note that the additive genetic variance calculated by regression on transmitted alleles is and is always positive. Thus the missing heritability can be explained qualitatively by . More formally, let , which gives
[12] |
In the parameter region where , the heritability estimated from regression on alleles is much smaller than the heritability estimated from correlations between relatives. This seems to explain the missing heritability problem. However, our analysis raises two problems in interpreting missing heritability, which render it meaningless in the context of cultural transmission. First, the genetic nurturing effect entails that the additive genetic variance computed from GWAS and that computed from correlations between relatives both depend on the equilibrium frequencies (s), which depend on both s and s. This is true even in the case where s are all the same and s are linear. In this case, from Eq. 9a, we have , and calculation of shows that this also depends on and . (For details, see SI Appendix, section D.) Thus the so-called “missing heritability problem” is generated by comparing two quantities, neither of which is purely genetic. Second, although cultural transmission can contribute to missing heritability, the absence of missing heritability does not imply that cultural transmission is weak or absent. There exist parameter regions where cultural transmission is relatively strong but the two heritability estimates are roughly the same. (For detailed analysis and examples, see SI Appendix, section D.) Given these two issues, the dichotomy of nature and nurture simply does not apply in the systems with genetic nurturing and vertical cultural transmission.
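The comparison of the two heritability estimates can be illustrated with a small exact calculation. This is a sketch under stated assumptions rather than a reproduction of Eq. 12: it uses the same hypothetical parametrisation `alpha[g] + beta[g] * (number of bar parents)`, takes the allele-based estimate to be the variance explained by the regression on transmitted alleles, and takes the relatives-based estimate to be twice the parent–offspring covariance divided by the phenotypic variance (a textbook convention assumed here). All function and variable names are illustrative.

```python
from itertools import product


def equilibrium(alpha, beta, p, iters=2000):
    """Equilibrium P(bar | genotype) under the assumed parametrisation."""
    q = 1.0 - p
    u = list(alpha)
    for _ in range(iters):
        sent_a = p * u[1] + q * u[0]   # parent transmitted a
        sent_A = p * u[2] + q * u[1]   # parent transmitted A
        parent_bar = [sent_a, 0.5 * (sent_a + sent_A), sent_A]
        u = [alpha[g] + 2.0 * beta[g] * parent_bar[g] for g in range(3)]
    return u


def heritability_estimates(alpha, beta, p):
    """Return (allele-based, relatives-based) heritability estimates at equilibrium."""
    u = equilibrium(alpha, beta, p)
    freq = {1: p, 0: 1.0 - p}
    ex = et = ett = ext = exm = exxm = 0.0
    # Enumerate parental alleles (transmitted, nontransmitted) and parental phenotypes.
    for tm, nm, tf, nf, xm, xf in product((0, 1), repeat=6):
        gm, gf = tm + nm, tf + nf
        w = freq[tm] * freq[nm] * freq[tf] * freq[nf]
        w *= u[gm] if xm else 1.0 - u[gm]
        w *= u[gf] if xf else 1.0 - u[gf]
        t = tm + tf
        xc = alpha[t] + beta[t] * (xm + xf)  # P(child bar | genotype, parents)
        ex += w * xc
        et += w * t
        ett += w * t * t
        ext += w * xc * t
        exm += w * xm
        exxm += w * xc * xm
    var_x = ex * (1.0 - ex)                 # variance of a 0/1 phenotype
    var_t = ett - et * et
    delta_t = (ext - ex * et) / var_t
    h2_alleles = delta_t ** 2 * var_t / var_x
    h2_relatives = 2.0 * (exxm - ex * exm) / var_x
    return h2_alleles, h2_relatives
```

In this linear sketch the two estimates coincide when `beta` is zero and the relatives-based estimate exceeds the allele-based one when `beta` is positive. The sketch does not explore the nonlinear parameter regions mentioned above, where strong cultural transmission can coexist with similar estimates.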
Population Subdivision and Assortative Mating
Population subdivision and assortative mating can have profound effects on missing heritability (24, 25). It is therefore of interest to compare the statistical signals generated by population subdivision and assortative mating with those of genetic nurturing. Kong et al. (26) made a correction for assortative mating. Here we use a simple one-locus, two-allele purely genetic model to show how population subdivision and assortative mating can generate a signal similar to that of genetic nurturing, namely, the correlation between parents’ nontransmitted alleles and children’s phenotypes. We also examine how Kong et al. (26) corrected for assortative mating.
The Effect of Population Subdivision.
Consider a metapopulation which includes subpopulations, denoted by . The relative sizes of the subpopulations are denoted by , respectively. Again we assume there are two alleles, and , and two phenotypes, bar and nonbar. The probability of being bar is genetically determined and is for , for , and for . Hardy–Weinberg equilibrium is assumed for each subpopulation and the frequencies of and in population are assumed to be and . The frequencies of and in the metapopulation are thus and , respectively. The genotype frequencies for are then , respectively, which we denote by . Set and , which represent, respectively, the frequencies of bar in each subpopulation and the metapopulation. We now study how subpopulation structure may mimic genetic nurturing.
First, we regress the child’s phenotype on alleles transmitted and alleles not transmitted to the child (in the metapopulation). We again denote the child’s phenotypic value by , which is 1 for the bar phenotype and 0 for the nonbar phenotype. Let the numbers of the father’s and mother’s transmitted alleles be and the numbers of the nontransmitted alleles be . The regression is then
[13] |
where and are the regression coefficients and is the sum of the intercept and the residual.
Using the law of total variance we can calculate the covariances among and (SI Appendix, section F) and obtain
[14] |
where and are two different elements of the set . Denote by and by . Then from multivariate regression analysis (SI Appendix, section F), we obtain
[15] |
and
[16] |
Thus, when a population is subdivided, the nontransmitted alleles will also be correlated with the phenotype and the regression coefficient will in general not be zero. This correlation is generated by differences among allele frequencies in subpopulations. Both , the covariance of probabilities being bar among subpopulations, and , the variance of allele frequencies among subpopulations, will be nonzero when such differences exist. This means that even in the purely genetic case, population subdivision may generate correlation between children’s phenotypes and nontransmitted alleles, which is considered by Kong et al. (26) as the signal of genetic nurturing. Thus, a correction must be made if we want to separate the genetic nurturing effect from that of population subdivision.
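The effect of subdivision can be checked by exact enumeration in a toy metapopulation. The sketch below is illustrative only: `weights`, `allele_freqs`, and `bar_prob` are hypothetical inputs (relative subpopulation sizes, allele frequencies per subpopulation, and the probability of bar for 0, 1, or 2 copies of one allele), and the two-regressor fit on the summed transmitted and nontransmitted counts is an assumption about how the regression is set up.

```python
from itertools import product


def subdivision_regression(weights, allele_freqs, bar_prob):
    """Regress a purely genetic 0/1 phenotype on T and NT in a metapopulation.

    Hardy-Weinberg proportions and random mating are assumed within each
    subpopulation; bar_prob[g] is P(bar | g copies of allele A).
    """
    m = dict.fromkeys(("x", "t", "n", "xt", "xn", "tt", "nn", "tn"), 0.0)
    for w_k, p_k in zip(weights, allele_freqs):
        freq = {1: p_k, 0: 1.0 - p_k}
        for tm, nm, tf, nf in product((0, 1), repeat=4):
            w = w_k * freq[tm] * freq[nm] * freq[tf] * freq[nf]
            t, n = tm + tf, nm + nf
            x = bar_prob[t]  # phenotype depends only on the child's own genotype
            for key, val in (("x", x), ("t", t), ("n", n), ("xt", x * t),
                             ("xn", x * n), ("tt", t * t), ("nn", n * n),
                             ("tn", t * n)):
                m[key] += w * val
    cov_xt = m["xt"] - m["x"] * m["t"]
    cov_xn = m["xn"] - m["x"] * m["n"]
    var_t = m["tt"] - m["t"] ** 2
    var_n = m["nn"] - m["n"] ** 2
    cov_tn = m["tn"] - m["t"] * m["n"]
    # Solve the 2x2 normal equations for the two regression coefficients.
    det = var_t * var_n - cov_tn ** 2
    delta_t = (cov_xt * var_n - cov_xn * cov_tn) / det
    delta_nt = (cov_xn * var_t - cov_xt * cov_tn) / det
    return {"cov_x_nt": cov_xn, "delta_t": delta_t, "delta_nt": delta_nt}
```

With two equally sized subpopulations at allele frequencies 0.2 and 0.6 and a dominant map `bar_prob = [0.1, 0.5, 0.5]`, both the covariance between the phenotype and the nontransmitted alleles and the fitted nontransmitted coefficient are nonzero, although no cultural transmission is present. When the allele frequencies are equal across subpopulations, both quantities vanish.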
The Effect of Assortative Mating.
Assortative mating can be viewed as a special kind of population subdivision, namely, division into assorting groups, and an analysis similar to the above can be applied with assortative mating. Again we have alleles and with phenotypes bar and nonbar. The frequencies of and are and . The probabilities of a child being bar if its genotype is are, respectively, . In each generation, we assume a proportion of individuals mate assortatively based on phenotype and the remaining individuals mate randomly. Denote the genotype frequencies of in generation by , respectively. Since the proportion of bar in each genotype is fixed, we need only analyze the genotype frequency dynamics, and since and and do not change, there is only one free variable. Thus, we need only to solve the recursion for one of ; here we focus on the recursion for , namely
[17] |
In SI Appendix, section G, we prove that there is a unique globally stable fixed point for the dynamical system 17. Assuming the equilibrium genotype frequencies are , we can use a regression analysis similar to that used for population subdivision. Let
[18] |
We can then compute the covariances among and (SI Appendix, section H); namely,
[19] |
Comparing these results with those of the population subdivision model, we see that in the assortative mating model,
which, together with Eq. 16, gives the regression coefficients. As a special case of population subdivision, assortative mating can also generate a similar signal to that of genetic nurturing and should be corrected for in computations of correlations. Kong et al. (26) do make the correction and we agree with their conclusion that the effect of assortative mating is relatively weak. However, there may be a problem with their model assumptions. To be specific, when assumption VI in their supplementary material section “Estimating the confounding effects induced by assortative mating” is translated into our model, it amounts to assuming and are zero while , , , , , and are not zero. From our analysis this appears unreasonable since these covariances should all be the same.
A Unified Causal Framework for Genetic Nurturing, Population Subdivision, and Assortative Mating
In previous sections, we proposed a mechanistic model for genetic nurturing and showed how population subdivision and assortative mating could mimic genetic nurturing. We followed Kong et al. (26) and used a linear regression (i.e., path analysis or structural equation model [SEM]) method for mediation analysis. We showed that this method works well in the linear setting but becomes problematic in the nonlinear case. Fortunately, traditional structural mediation analysis and separation of total effect into direct effect and indirect effect can be naturally extended to the nonlinear setting using do-calculus and counterfactuals in causal analysis (38). Further, causal analysis provides powerful techniques for analyzing different causal effects and the influence of confounding factors (39, 40), which can also help frame the mismatch between causality and statistics used in the nature–nurture discussion, such as variance (6) and correlation. We therefore employ causal analysis in constructing a semi-Markovian causal model (Fig. 1) for genetic nurturing, population subdivision, and assortative mating (details in SI Appendix, section J). Using this unified causal framework, we show that the correlations between nontransmitted alleles and the phenotype under genetic nurturing, population subdivision, and assortative mating represent different pathways in the causal diagram. We also analyze different effects of transmitted alleles and clarify when the analysis by Kong et al. (26) of genetic nurturing works and when it fails.
In Fig. 1, and , respectively, represent the parents’ and the child’s genotypes, represents the alleles transmitted from the parents, and represents the alleles not transmitted from the parents. and represent the parents’ and the child’s phenotypes. The parents’ genotypes and phenotypes are confounded by and are thus connected by the bidirectional arc in Fig. 1. We study the effects of transmitted alleles and nontransmitted alleles.
In the causal diagram of Fig. 1, the “total effect” of changing transmitted alleles from to on the child’s phenotype can be represented by , in which case the quantity corresponding to the regression coefficient will be ; the total effect of changing nontransmitted alleles from to on the child’s phenotype can be represented by , in which case the quantity corresponding to the regression coefficient will be . The relationships among these quantities and the path-specific effects corresponding to the “direct effect” and the “indirect effect” in Kong et al.’s (26) paper are analyzed in detail below.
Before doing the analysis, we address a key problem for the identification of different effects in Fig. 1, namely, the definition of . This quantity is not always well defined because the parents’ genotypes are automatically determined when the transmitted alleles and nontransmitted alleles are known. Thus, to use causal analysis, we must define this probability when is not compatible with . If the confounding factor in Fig. 1 is known, then this quantity is just ; otherwise, some prior knowledge of is needed. Table 2 lists the symbols used throughout our application of the causal analysis, while Table 3 lists the analogous symbols used in the cultural transmission model.
Table 2.
Notation | Meaning |
The child’s genotype | |
The mother’s genotype | |
The father’s genotype | |
The parents’ genotypes, denoted by | |
The child’s phenotype | |
The mother’s phenotype | |
The father’s phenotype | |
The parents’ phenotypes, denoted by | |
The mother’s mother’s phenotype | |
The mother’s father’s phenotype | |
The mother’s parents’ phenotypes, denoted by | |
The father’s mother’s phenotype | |
The father’s father’s phenotype | |
The father’s parents’ phenotypes, denoted by | |
Alleles transmitted from the mother to the child | |
Alleles transmitted from the father to the child | |
Alleles transmitted from the parents to the child, denoted by | |
Alleles not transmitted from the mother to the child | |
Alleles not transmitted from the father to the child | |
Alleles not transmitted from the parents to the child, denoted by | |
Table 3.
Notation in cultural transmission model | Corresponding representation in the unified causal model |
Here, for simplicity, we assume X = 1 for the bar phenotype and X = 0 for the nonbar phenotype.
Fortunately, for genetic nurturing, population subdivision, and assortative mating, the required prior knowledge exists. For population subdivision and assortative mating, since the effect is purely genetic, we have . For genetic nurturing, represents the grandparents’ phenotypic influence. Denote the mother’s parents’ phenotypes by and the father’s parents’ phenotypes by . Then
[20] |
which can be calculated if data from the grandparents’ generation are given. If these data are not given (which we assume in this paper), we may assume that the population is at equilibrium under random mating, in which case
[21] |
Each term on the right-hand side of [21] can be calculated using data from the parents’ generation. For example, we may assume , where is a vector of two phenotypes and is a genotype. ( and can be defined similarly.)
Now, since is defined and identifiable, we are now able to calculate the effects of nontransmitted alleles () on the child’s phenotype (). has one front-door (causal) path (we denote this by path 1; for definition of front-door path, see SI Appendix, section J) to ,
and three backdoor (noncausal) paths (denoted by paths 2, 3, and 4; for definition of backdoor path, see SI Appendix, section J) to ,
Following refs. 40–42, in the causal diagram Fig. 1, can be given by
[22] |
Thus the total effect of changing from to given by can be calculated using Eq. 22. In the population subdivision and assortative mating case, since there is no effect of the parental phenotypes, the edge (in Fig. 1) and paths 1, 3, and 4 vanish, so . This automatically gives and , which means there is no causal effect of the nontransmitted alleles on the child’s phenotype. However, because of the existence of the active backdoor path 2, and is not zero in general. To be specific, population subdivision and assortative mating introduce correlation between allele frequencies and (assorting) groups so that contains group information. As a result, and will generally not be the same and will not be zero in general, which explains the correlation between the nontransmitted alleles and the child’s phenotype when there is population subdivision or assortative mating despite there being no causal path between these two quantities.
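The contrast between conditioning on the nontransmitted alleles and intervening on them can be made concrete in a toy version of the subdivision case. This sketch is a simplified stand-in for the causal diagram, with the subpopulation label as the only confounder (an assumption of this example); `weights`, `allele_freqs`, and `bar_prob` are hypothetical inputs with the same meaning as in a purely genetic subdivided population.

```python
from math import comb


def binom2(n, p):
    """P(n copies of A among two independent alleles with frequency p)."""
    return comb(2, n) * p ** n * (1.0 - p) ** (2 - n)


def nontransmitted_contrast(weights, allele_freqs, bar_prob):
    """Return (observational, interventional) contrasts for NT = 2 versus NT = 0."""
    # Bar frequency within each subpopulation; the phenotype depends only on
    # the child's genotype, which is built from the transmitted alleles.
    bar_k = [sum(binom2(t, p) * bar_prob[t] for t in range(3)) for p in allele_freqs]

    def observed(n):
        # P(X = 1 | NT = n): seeing NT changes the posterior over subpopulations.
        joint = [w * binom2(n, p) for w, p in zip(weights, allele_freqs)]
        return sum(j * b for j, b in zip(joint, bar_k)) / sum(joint)

    def intervened(n):
        # P(X = 1 | do(NT = n)): setting NT leaves the subpopulation mix
        # unchanged, so the result does not depend on n.
        return sum(w * b for w, b in zip(weights, bar_k))

    return observed(2) - observed(0), intervened(2) - intervened(0)
```

In this toy setting the observational contrast is nonzero whenever allele frequencies differ between subpopulations, while the interventional contrast is exactly zero, matching the distinction between an active backdoor path and the absence of a causal path.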
For genetic nurturing, and are confounded due to the influence of the grandparents’ phenotypes. In a random mating population, paths 2 and 3 are deactivated because under random mating, . Thus, is influenced by paths 1 and 4, with representing the effect of path 1. This means that in the genetic nurturing case, , which corresponds to in the previous section (Eqs. 6 and 8b), is in general not a suitable representation of the effect of the nontransmitted alleles. The correlation between the nontransmitted alleles and the child’s phenotype is in general not zero, but is a result of both genetic nurturing and confounding between the parents’ genotypes and phenotypes. This confounding, in the genetic nurturing case, is due to the grandparents’ phenotypic influence. [Kong et al. (26) did not take account of the grandparents’ influence or genetic nurturing’s influence on pheno-genotype frequencies; they therefore artificially deactivated path 4, which allowed their regression to work.]
Now we consider the effect of transmitted alleles. It is easy to see that there are four paths from to , namely, two front-door paths (denoted by path and path )
and two backdoor paths (denoted by path and path )
From the analyses in refs. 40–42, is given by
[23] |
Similarly as in the above analysis of nontransmitted alleles, with population subdivision or assortative mating, only path is activated, and With genetic nurturing, path vanishes while paths , , and , which jointly contribute to , are activated. Also represents the joint effect of causal paths and . As for the nontransmitted alleles, because of the confounding of the parents’ genotypes and phenotypes, which introduces a backdoor path , , corresponding to (Eqs. 6 and 8a), is in general not a suitable representation of the total effect of the transmitted alleles. [Again Kong et al. (26) ignore path , which allows their regression method to work.]
Nonetheless, the total effect of the transmitted alleles can be calculated using the do operator. Since there are two causal paths in this case, it is interesting to identify the path-specific effects. In mediation analysis, the path-specific effects of paths and are called the “natural direct effect” and the “natural indirect effect,” respectively. The total effect of changing from to can then be partitioned into these two effects. Following refs. 43 and 44, the natural direct and natural indirect effects can be calculated as follows:
Natural direct effect:
[24] |
Natural indirect effect:
[25] |
Kong et al. (26) also made this separation and called the two parts the direct effect and the indirect effect, respectively. They also used the difference between regression coefficients for the transmitted alleles and the nontransmitted alleles to represent the direct effect of transmitted alleles. However, as we have shown above, their estimates of the total effects of the transmitted and nontransmitted alleles, represented by and , respectively, are biased due to the confounding of the parents’ phenotypes and genotypes. Thus, we need to analyze the relationships among the following three quantities:
1) The difference between the total effects of the transmitted alleles and the nontransmitted alleles, represented by
[26] |
2) Kong et al.’s (26) estimates of direct effect, represented by
[27] |
3) Path 1-specific causal effects, the natural direct effect in our causal diagram (Fig. 1), represented by
[28] |
These three quantities will be the same (for proof, see SI Appendix, section J) if there exist such that for and
[29] |
[30] |
[31] |
[32] |
[33] |
This additivity assumption is exactly what the SEM approach, including that of Kong et al. (26), relies on. However, there are two problems. First, even when this assumption holds and the direct effect is estimated properly, their estimates of the total effect and the indirect effect are biased (SI Appendix, section J). Second, when this assumption is violated, as in the general case of the model defined by Table 1, these three quantities will not be the same. Thus, using the difference between total effects of the transmitted and nontransmitted alleles to estimate the direct effect of the transmitted alleles may be problematic.
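The role of additivity in separating direct and indirect effects can be seen in a generic mediation example. The sketch below uses the standard mediation formulas for a binary exposure and a binary mediator with no confounding; it is not a reproduction of Eqs. 24 and 25, which also account for the confounder in Fig. 1, and the names `y_mean`, `m_prob`, and `mediation_effects` are hypothetical.

```python
def mediation_effects(y_mean, m_prob):
    """Total, natural direct, and natural indirect effects of X: 0 -> 1.

    y_mean[x][m] = E[Y | X = x, M = m] and m_prob[x] = P(M = 1 | X = x),
    for binary X and M with no confounding (an assumption of this sketch).
    """
    pm = [[1.0 - m_prob[x], m_prob[x]] for x in (0, 1)]
    total = sum(y_mean[1][m] * pm[1][m] - y_mean[0][m] * pm[0][m] for m in (0, 1))
    # Natural direct effect: change X while the mediator keeps its X = 0 distribution.
    nde = sum((y_mean[1][m] - y_mean[0][m]) * pm[0][m] for m in (0, 1))
    # Natural indirect effect: hold X = 0 and shift the mediator distribution.
    nie = sum(y_mean[0][m] * (pm[1][m] - pm[0][m]) for m in (0, 1))
    return total, nde, nie


# Additive outcome: E[Y | x, m] = 0.1 + 0.2 x + 0.3 m
additive = mediation_effects([[0.1, 0.4], [0.3, 0.6]], [0.2, 0.6])
# Same outcome with an extra 0.2 x m interaction term
interacting = mediation_effects([[0.1, 0.4], [0.3, 0.8]], [0.2, 0.6])
```

In the additive example the total effect equals the sum of the natural direct and indirect effects, so subtracting one from the total recovers the other. With the interaction term the sum falls short of the total effect, which is the generic reason a difference of total effects need not equal a direct effect once additivity fails.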
Discussion
In this paper, we propose a simple generative model of cultural transmission and study the effects of including population subdivision and assortative mating in the model. Four main issues are addressed. First, the cultural transmission model gives a quantitative explanation of genetic nurturing that can be compared with that of Kong et al. (26). Second, we show how the notion of heritability from traditional statistical genetics is inadequate under this simple setting. Third, we show how population subdivision and assortative mating can generate signals similar to genetic nurturing and examine Kong et al.’s method (26) for correction of assortative mating. Finally, we propose a unified causal diagram for genetic nurturing, population subdivision, and assortative mating. Two statistical signals are used for our analysis: the difference between heritability estimated from GWAS and that estimated from pedigrees and the nonnegligible correlation between parents’ nontransmitted alleles and children’s phenotypes. Here we discuss the meaning of these two signals in a more general context.
For the first signal, represented by the missing heritability, we show in our gene–culture coevolution model that heritability is an ill-defined notion and fails to decompose the total variance into nature and nurture. In addition, this failure, which is reflected in the missing heritability problem, has to be considered in the broader context of other problems, since it can be caused by both genetic and demographic factors, whose importance can vary from trait to trait. For example, having too few SNPs may produce missing heritability in height and body mass index (BMI) (21, 22), while for social and behavioral traits, cultural transmission can be an important factor. However, even for height and BMI, it is not easy to exclude confounding cultural factors, such as diet, that can be vertically transmitted. In our analysis, when genetic nurturing exists, we find parameter regions where cultural transmission is not negligible, but the two estimates of heritability are roughly the same; this can be easily extended to the cases where parents’ phenotypes and genotypes are confounded by factors other than genetic nurturing. Thus, when vertical cultural transmission is potentially involved, heritability will not be a meaningful statistic.
Second, we show that genetic nurturing, population subdivision, and assortative mating can contribute to a second signal, namely, the correlation between parents’ nontransmitted alleles and children’s phenotypes. We also show that the method Kong et al. (26) used to correct for assortative mating entails restrictive assumptions. Using the causal diagram in Fig. 1, we see that the scenario is more complex when genetic nurturing coexists with population subdivision or assortative mating. When genetic nurturing coexists with population subdivision, can be defined similarly for each subpopulation. The difference between this case and genetic nurturing without population subdivision is that paths 2, 3, and will be activated. With genetic nurturing in the case of assortative mating, there is an added complication due to the additional source of confounding between the parents’ genotypes and phenotypes; prior information about this confounding is needed for to be well defined.
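The contribution of population subdivision to this second signal can be illustrated with a minimal simulation in which the phenotype has no genetic basis at all. The sketch assumes two equally sized subpopulations that differ in both allele frequency and mean phenotype; all function names and parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_subdivided(n_per_pop, freqs=(0.2, 0.8), means=(0.0, 1.0), seed=1):
    """Return (subpopulation, NT, Y) triples.

    NT is the count of nontransmitted focal alleles from the two parents.
    Y is drawn independently of every genotype within each subpopulation,
    so any pooled NT-Y correlation comes from subdivision alone.
    """
    rng = random.Random(seed)
    data = []
    for pop, (freq, mean) in enumerate(zip(freqs, means)):
        for _ in range(n_per_pop):
            nt = sum(rng.random() < freq for _ in range(2))
            y = rng.gauss(mean, 1.0)
            data.append((pop, nt, y))
    return data


def nt_phenotype_correlation(data, pop=None):
    """Correlation of NT with Y, pooled (pop=None) or within one subpopulation."""
    subset = [d for d in data if pop is None or d[0] == pop]
    return pearson([d[1] for d in subset], [d[2] for d in subset])


data = simulate_subdivided(n_per_pop=5000)
pooled = nt_phenotype_correlation(data)         # clearly positive
within = nt_phenotype_correlation(data, pop=0)  # near zero
```

The pooled correlation is clearly positive, while the within-subpopulation correlation is near zero. In this toy case, stratifying by subpopulation removes a signal that could otherwise be mistaken for genetic nurturing.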
In our unified causal diagram (Fig. 1), the correlation between the parents’ nontransmitted alleles () and the child’s phenotype () can arise from two different classes of paths: either through (paths 2 and 3) or not through but through (paths 1 and 4). The paths through can be tested and possibly corrected using linkage disequilibrium (LD). The path through , either causal or noncausal, implies the influence of parents’ phenotypes. Here we emphasize two points. First, although in the two-allele, two-phenotype model, we assumed that the parents’ phenotypes and that of the child referred to the same trait, in our causal diagram they could refer to different traits that have a causal relationship. In this case there will be “cultural influence” instead of “cultural transmission,” but the whole causal diagram and all of the analysis will still be valid. Second, although without prior information about confounding the different effects cannot be calculated, confounding factors other than genetic nurturing between the parents’ phenotypes and genotypes can still produce correlation between the nontransmitted alleles and the child’s phenotype if the pathway from to is activated, even when there is no genetic basis for the phenotypes. Path 4 still exists even if , , and do not exist; see Fig. 2. (In principle, we should also include in Fig. 2 since the genotypes and phenotypes are confounded in the parents’ generation. However, adding this will not activate any path, so we neglect it for simplicity.) In this case the correlation generated by the parental phenotypic influence and genotype–phenotype confounding will be an indicator of pure cultural influence or cultural transmission.
Although our causal diagram unifies genetic nurturing, population subdivision, and assortative mating and can be extended qualitatively to more general cases of linkage disequilibrium and parental influences, it does not exhaust all potential mechanisms that could generate correlation between nontransmitted alleles and children’s phenotypes. Recent findings by Mostafavi et al. (45) that polygenic scores, which are computed under additive assumptions on allelic variances, are not portable among population groups with the same apparent genetic ancestry are relevant in the context of our analysis. They attribute their result to factors such as genetic nurturing (i.e., indirect genetic effects) and assortative mating studied here, as well as varying levels of environmental variance and genotype-by-environment interactions.
The correlation between parents’ nontransmitted alleles and children’s phenotypes is a signal of demographic or social influences, which can include parental phenotypic influence (with genetic nurturing as a special case) and linkage disequilibrium (with population subdivision as a special case). (Confounding between and without referring to can also contribute to such correlation, but in reality this will be less likely.) More knowledge of the potential confounding factors is needed if we want to distinguish between these mechanisms more precisely.
Lewontin (ref. 6, p. 409) pointed out the pitfalls in inferring causality from variance statistics, including heritability derived from familial analyses: “The analysis of causes in human genetics is meant to provide us with the basic knowledge we require for correct schemes of environmental modification and intervention. Together with a knowledge of the relative frequencies of different human genotypes, a knowledge of norms of reaction can also predict the demographic and public health consequences of certain massive environmental changes. Analysis of variance can do neither of these because its results are a unique function of the present distribution of environment and genotypes.” Lewontin’s admonition, made 46 y ago, remains pertinent today with respect to SNP heritability derived from GWAS. Inference of causality from the relationships among statistical signals, including correlations and variances, remains extremely difficult.
Acknowledgments
This research was supported in part by the Morrison Institute for Population and Resource Studies at Stanford University and by the Stanford Center for Computational, Evolutionary and Human Genomics. We thank Professors Jonathan Pritchard and Ewart Thomas for their comments on an earlier draft.
Footnotes
The authors declare no competing interest.
This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.2015869117/-/DCSupplemental.
Data Availability
All study data are included in this article and SI Appendix.
Change History
September 1, 2021: Figure 2 has been updated. The text of this article has been updated; please see accompanying correction for details.
References
- 1. Lush J. L., Animal Breeding Plans (Iowa State College Press, ed. 2, 1943).
- 2. Jensen A., How much can we boost IQ and scholastic achievement? Harv. Educ. Rev. 39, 1–123 (1969).
- 3. Bouchard T. J., McGue M., Familial studies of intelligence: A review. Science 212, 1055–1059 (1981).
- 4. Herrnstein R. J., Murray C., The Bell Curve: Intelligence and Class Structure in American Life (Simon and Schuster, 2010).
- 5. Kamin L. J., The Science and Politics of IQ (Routledge, 2012).
- 6. Lewontin R. C., Annotation: The analysis of variance and the analysis of causes. Am. J. Hum. Genet. 26, 400–411 (1974).
- 7. Feldman M. W., Lewontin R. C., The heritability hang-up. Science 190, 1163–1168 (1975).
- 8. Feldman M. W., Christiansen F. B., Otto S. P., Gene-culture co-evolution: Teaching, learning, and correlations between relatives. Isr. J. Ecol. Evol. 59, 72–91 (2013).
- 9. Turkheimer E., Weak genetic explanation 20 years later: Reply to Plomin et al. (2016). Perspect. Psychol. Sci. 11, 24–28 (2016).
- 10. Cavalli-Sforza L. L., Feldman M. W., Cultural versus biological inheritance: Phenotypic transmission from parents to children. (A theory of the effect of parental phenotypes on children’s phenotypes). Am. J. Hum. Genet. 25, 618–637 (1973).
- 11. Rao D. C., Morton N. E., Yee S., Analysis of family resemblance. II. A linear model for familial correlation. Am. J. Hum. Genet. 26, 331–359 (1974).
- 12. Rao D. C., Morton N. E., Yee S., Resolution of cultural and biological inheritance by path analysis. Am. J. Hum. Genet. 28, 228–242 (1976).
- 13. Cloninger C. R., Rice J., Reich T., Multifactorial inheritance with cultural transmission and assortative mating. I. Description and basic properties of the unitary models. Am. J. Hum. Genet. 30, 618–643 (1978).
- 14. Cloninger C. R., Rice J., Reich T., Multifactorial inheritance with cultural transmission and assortative mating. II. A general model of combined polygenic and cultural inheritance. Am. J. Hum. Genet. 31, 176–198 (1979).
- 15. Cloninger C. R., Rice J., Reich T., Multifactorial inheritance with cultural transmission and assortative mating. III. Family structure and the analysis of separation experiments. Am. J. Hum. Genet. 31, 366–388 (1979).
- 16. Cavalli-Sforza L. L., Feldman M. W., The evolution of continuous variation. III. Joint transmission of genotype, phenotype and environment. Genetics 90, 391–425 (1978).
- 17. Rao D. C., Morton N. E., Lalouel J. M., Lew R., Path analysis under generalized assortative mating: II. American IQ. Genet. Res. 39, 187–198 (1982).
- 18. Manolio T. A., et al., Finding the missing heritability of complex diseases. Nature 461, 747–753 (2009).
- 19. Feldman M. W., Ramachandran S., Missing compared to what? Revisiting heritability, genes and culture. Philos. Trans. R. Soc. Lond. B Biol. Sci. 373, 20170064 (2018).
- 20. Nisbett R. E., et al., Intelligence: New findings and theoretical developments. Am. Psychol. 67, 130–159 (2012).
- 21. Wainschtein P., et al., Recovery of trait heritability from whole genome sequence data. BioRxiv:10.1101/588020 (25 March 2019).
- 22. Yang J., et al., Genetic variance estimation with imputed variants finds negligible missing heritability for human height and body mass index. Nat. Genet. 47, 1114–1120 (2015).
- 23. Zuk O., Hechter E., Sunyaev S. R., Lander E. S., The mystery of missing heritability: Genetic interactions create phantom heritability. Proc. Natl. Acad. Sci. U.S.A. 109, 1193–1198 (2012).
- 24. Vinkhuyzen A. A. E., Van Der Sluis S., Maes H. H. M., Posthuma D., Reconsidering the heritability of intelligence in adulthood: Taking assortative mating and cultural transmission into account. Behav. Genet. 42, 187–198 (2012).
- 25. Tropf F. C., et al., Hidden heritability due to heterogeneity across seven populations. Nat. Hum. Behav. 1, 757–765 (2017).
- 26. Kong A., et al., The nature of nurture: Effects of parental genotypes. Science 359, 424–428 (2018).
- 27. Young A. I., et al., Relatedness disequilibrium regression estimates heritability without environmental bias. Nat. Genet. 50, 1304–1310 (2018).
- 28. Young A. I., Solving the missing heritability problem. PLoS Genet. 15, e1008222 (2019).
- 29. Young A. I., Benonisdottir S., Przeworski M., Kong A., Deconstructing the sources of genotype-phenotype associations in humans. Science 365, 1396–1400 (2019).
- 30. Ardlie K. G., Lunetta K. L., Seielstad M., Testing for population subdivision and association in four case-control studies. Am. J. Hum. Genet. 71, 304–311 (2002).
- 31. Robinson M. R., et al., Genetic evidence of assortative mating in humans. Nat. Hum. Behav. 1, 0016 (2017).
- 32. Pingault J., et al., Using genetic data to strengthen causal inference in observational research. Nat. Rev. Genet. 19, 566–580 (2018).
- 33. Briley D. A., Livengood J., Derringer J., Behaviour genetic frameworks of causal reasoning for personality psychology. Eur. J. Pers. 32, 202–220 (2018).
- 34. Conley D., Zhang S., The promise of genes for understanding cause and effect. Proc. Natl. Acad. Sci. U.S.A. 115, 5626–5628 (2018).
- 35. Morris T. T., Davies N. M., Hemani G., Smith G. D., Population phenomena inflate genetic associations of complex social traits. Sci. Adv. 6, eaay0328 (2020).
- 36. Falconer D. S., Mackay T. F. C., Introduction to Quantitative Genetics (Longman, Essex, UK, 1996).
- 37. Lynch M., Walsh B., Genetics and Analysis of Quantitative Traits (Sinauer, Sunderland, MA, 1998).
- 38. Pearl J., Interpretation and identification of causal mediation. Psychol. Methods 19, 459–481 (2014).
- 39. Pearl J., Causality: Models, Reasoning and Inference (Cambridge University Press, ed. 2, 2009).
- 40. Shpitser I., Pearl J., Complete identification methods for the causal hierarchy. J. Mach. Learn. Res. 9, 1941–1979 (2008).
- 41. Tian J., Pearl J., “A general identification condition for causal effects” in Proceedings of the Eighteenth National Conference on Artificial Intelligence, Dechter R., Sutton R., program cochairs (AAAI Press/MIT Press, Menlo Park, CA, 2002), pp. 567–573.
- 42. Shpitser I., Pearl J., “Identification of joint interventional distributions in recursive semi-Markovian causal models” in Proceedings of the National Conference on Artificial Intelligence, Gil Y., Mooney R. J., cochairs (AAAI, Boston, MA, 2006), vol. 2, pp. 1219–1226.
- 43. Avin C., Shpitser I., Pearl J., “Identifiability of path-specific effects” in Proceedings of Nineteenth International Joint Conference on Artificial Intelligence, Kaelbling L. P., program chair (Morgan Kaufmann Publishers, San Francisco, CA, 2005), pp. 357–363.
- 44. Shpitser I., VanderWeele T. J., A complete graphical criterion for the adjustment formula in mediation analysis. Int. J. Biostat. 7, 16 (2011).
- 45. Mostafavi H., et al., Variable prediction accuracy of polygenic scores within an ancestry group. eLife 9, e48376 (2020).