Relatedness disequilibrium regression estimates heritability without environmental bias

Alexander I Young; Michael L Frigge; Daniel F Gudbjartsson; Gudmar Thorleifsson; Gyda Bjornsdottir; Patrick Sulem; Gisli Masson; Unnur Thorsteinsdottir; Kari Stefansson; Augustine Kong

doi:10.1038/s41588-018-0178-9

Author manuscript; available in PMC: 2019 Feb 13.

Published in final edited form as: Nat Genet. 2018 Aug 13;50(9):1304–1310. doi: 10.1038/s41588-018-0178-9

PMCID: PMC6130754 EMSID: EMS78471 PMID: 30104764

Abstract

Heritability measures the proportion of trait variation that is due to genetic inheritance. Measurement of heritability is of importance to the nature-versus-nurture debate. However, existing estimates of heritability could be biased by environmental effects. Here we introduce relatedness disequilibrium regression (RDR), a novel method for estimating heritability. RDR avoids most sources of environmental bias by exploiting variation in relatedness due to random Mendelian segregation. We use a sample of 54,888 Icelanders with both parents genotyped to estimate the heritability of 14 traits, including height (55.4%, S.E. 4.4%) and educational attainment (17.0%, S.E. 9.4%). Our results suggest that some other estimates of heritability could be inflated by environmental effects.

Introduction

Heritability measures the proportion of trait variation in a population that is due to genetic inheritance. Estimation of the relative importance of genetic inheritance (nature) versus environment (including nurture) has generated much controversy¹. Historically, most estimates of heritability for human traits have come from twin studies²,³. Some more recent methods propose to estimate heritability by modelling the effects of genome-wide single nucleotide polymorphisms (SNPs)⁴. We refer to these methods as GREML-SNP methods, referencing inference on genomic relatedness (GR), estimated from SNPs, using restricted maximum likelihood (REML). In order to reduce the influence of non-additive genetic effects and environmental effects, samples are pruned so that no pair is related above some low threshold⁴, typically 0.025 or 0.05.

Instead of modelling the effects of SNPs, heritability can be estimated by examining how phenotypic similarity changes with relatedness. Relatedness is measured by the fraction of the genome a pair shares in segments inherited from a common ancestor, called IBD (identical-by-descent) segments. (We note that what we call ‘relatedness’ here has sometimes been termed ‘realised relatedness’ to distinguish it from expected relatedness given a pedigree⁵.) Sharing of an IBD segment implies sharing of all genetic variants in that segment, except for mutations that occurred since the last common ancestor of the segment. This implies that IBD-based methods can capture nearly all of the heritability of a trait. In contrast, GREML-SNP methods can only capture the fraction of the heritability explained by genotyped SNPs⁴. Another advantage of IBD-based methods over GREML-SNP methods is that they do not make assumptions about the distribution of SNP effect sizes. Violation of these assumptions has been shown to introduce bias to GREML-SNP estimates of heritability⁴,⁶.

An IBD-based method, which we call the ‘Kinship’ method, examines how phenotypic similarity increases with relatedness for all pairs from a population sample⁷. When close relatives have more similar environments than distant relatives, the Kinship method will overestimate heritability, as it is unable to distinguish between similarity due to genetic effects and environmental effects. To reduce environmental bias, modelling of environmental effects shared between close relatives has been suggested⁸,⁹. However, environmental similarity may increase with relatedness across much of the relatedness spectrum: siblings may have more similar environments than cousins, and so on, down to distant relatives. In this case, modelling environmental covariance between close relatives alone will not remove environmental bias from the Kinship method. While an extension to the Kinship method has been developed that models spatially distributed environmental effects¹⁰, most environmental effects do not follow a simple, spatial distribution.

A different IBD-based method, which we call ‘Sib-Regression’, restricts the analysis to sibling pairs⁵. There are two copies of each piece of DNA in each parent. Whether a sibling inherits one or the other copy of a piece of DNA from a parent is like the outcome of a fair coin toss. The coin toss represents the outcome of random Mendelian segregation of DNA in the parent during meiosis. Whether both siblings inherit the same copy of a piece of DNA is like whether two independent tosses of a fair coin both come out the same. Therefore, the siblings inherit the same copy of DNA from a parent half of the time on average. Most of the variation around the average relatedness is due to random segregations in the parents of the siblings. The random segregations are independent of almost all environmental effects. Sib-Regression therefore avoids most sources of environmental bias. However, Sib-Regression requires hundreds of thousands of genotyped sibling pairs to obtain precise heritability estimates, whereas existing applications have used ~20,000 sibling pairs or fewer⁵,¹¹.
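The coin-toss description can be made concrete with a small simulation. The sketch below is illustrative only and is not part of the original analysis: it treats the genome as a hypothetical 70 independently segregating segments (real chromosomes are linked, so the true spread differs), and the function name is ours. For each segment and each parent, it checks whether two siblings received the same parental copy.

```python
import random
import statistics


def sibling_ibd_fraction(n_segments, rng):
    """Fraction of (segment, parent) slots where two siblings got the same copy."""
    shared = 0
    for _ in range(n_segments):
        for _parent in range(2):
            # Each sibling independently receives copy 0 or copy 1 from this parent.
            if rng.randint(0, 1) == rng.randint(0, 1):
                shared += 1
    return shared / (2 * n_segments)


rng = random.Random(0)
# 70 independent segments is an assumption made purely for illustration.
fractions = [sibling_ibd_fraction(70, rng) for _ in range(5000)]
mean_sharing = statistics.mean(fractions)
sd_sharing = statistics.pstdev(fractions)
```

Under these assumptions the mean sharing is close to one half, and the spread around one half is the variation that Sib-Regression exploits.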

Here we introduce a novel method for estimating heritability, relatedness disequilibrium regression (RDR). RDR looks at how much more or less related a pair is than would be expected from the relatedness of the parents. This deviation we call ‘relatedness disequilibrium’. Relatedness disequilibrium is due to random Mendelian segregations in the parents during meiosis, so is independent of almost all environmental effects. Unlike Sib-Regression, RDR can use any pair of individuals, provided there is genetic information on the parents of the pair. By using all pairs from a large sample with both parents genotyped, RDR can obtain precise estimates of heritability with negligible bias due to environment. We apply RDR to estimate heritability for 14 quantitative traits in Iceland.

Results

Defining heritability through random segregation

We first distinguish direct genetic effects and indirect genetic effects: a direct genetic effect is the effect of genetic material in a body on that body, whereas an indirect genetic effect is the effect on another body (Supplementary Note)¹²–¹⁴. For example, if parenting affects the educational attainment of offspring, then there could be indirect genetic effects from parent to offspring, which we term ‘parental genetic nurturing effects’¹⁴. Any allele inherited by the phenotyped individual (proband) was also present in one of its parents, implying the allele can have both direct and parental genetic nurturing effects on the proband. However, parental genetic nurturing effects, and other indirect genetic effects, are environmental effects from the perspective of the individual whose trait is affected. The heritability of the trait is thus defined as the fraction of trait variation in the population that is explained by direct genetic effects alone.

To separate variation due to direct genetic effects (heritability) from variation explained by the environment, we use random segregation during meiosis. This approach is analogous to the transmission disequilibrium test (TDT) for a direct genetic effect of an allele on a phenotype¹⁵–¹⁷. Proband genotype is determined by the genotypes of the proband’s parents and random segregations. The TDT looks for an association between the phenotype and the variation in proband genotype caused by random segregations in the parents. This separates association due to direct genetic effects from association due to environment. Similarly, by using random segregation, phenotypic variation can be decomposed into variation due to direct genetic effects alone and other components. Assuming direct genetic effects are additive and there is no gene-by-environment interaction, the decomposition is (Supplementary Note):

$$\mathrm{Var}(Y) = \nu_{g} + \nu_{e \sim g} + c_{g,e} + \mathrm{Var}(\epsilon); \qquad (1)$$

where ν_g is the variance explained by direct genetic effects, and h² = ν_g/Var(Y) is the heritability; ν_e~g is the variance of the part of the environmental component of the phenotype that is correlated with parental genotype, which includes the variance explained by (additive) parental genetic nurturing effects; c_g,e is the covariance between direct genetic effects and environmental effects; and Var(ϵ) is the variance of the component of the phenotype that is uncorrelated with both proband genotype and parental genotype.

RDR covariance model

The variance decomposition (1) leads to a decomposition of the covariance matrix of a vector of observations of a phenotype, Y. Under certain assumptions (Supplementary Note):

$$\mathrm{Cov}(Y) = \nu_{g} R + \nu_{e \sim g} R_{par} + c_{g,e} R_{o,par} + \mathrm{Cov}(\epsilon), \qquad (2)$$

where [R]_ij is the relatedness of individual i and individual j; [R_par]_ij is the relatedness of the parents of i and the parents of j; and [R_o,par]_ij is the relatedness of i and the parents of j together with that of j and the parents of i (Online Methods). In general, Cov(ϵ) is unknown and can be similar to R. For example, family environment effects that are independent of genetics will cause closely related pairs to be more similar than distantly related pairs. Furthermore, pairs that are more related than the average are more likely to be from the same region and thereby have more similar environments¹⁰.

To fit the RDR covariance model, we make the simplifying assumption that Cov(ϵ) = σ²I. Importantly, violation of the assumption that Cov(ϵ) = σ²I does not introduce bias to RDR estimates of heritability, as we outline below.
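To illustrate how model (2) links phenotypic similarity to the three relatedness matrices, the sketch below regresses products of mean-centred phenotypes on the off-diagonal entries of R, R_par and R_o,par. This moment-based regression is a simplification swapped in for illustration; the fitting procedure actually used is described in the Online Methods and is not reproduced here. All function names are hypothetical, and the off-diagonal elements of Cov(ϵ) are taken to be zero, as under Cov(ϵ) = σ²I.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def phenotype_products(y):
    """Return the matrix of products of mean-centred phenotypes."""
    mean = sum(y) / len(y)
    centred = [value - mean for value in y]
    return [[a * b for b in centred] for a in centred]


def moment_fit(products, matrices):
    """Least-squares fit of products[i][j] (i < j) on matrices[k][i][j], no intercept."""
    n = len(products)
    k = len(matrices)
    xtx = [[0.0] * k for _ in range(k)]
    xty = [0.0] * k
    for i in range(n):
        for j in range(i + 1, n):
            x = [mat[i][j] for mat in matrices]
            for p in range(k):
                xty[p] += x[p] * products[i][j]
                for q in range(k):
                    xtx[p][q] += x[p] * x[q]
    return solve_linear(xtx, xty)
```

Given a phenotype vector `y` and the three matrices, `moment_fit(phenotype_products(y), [R, R_par, R_opar])` returns moment estimates of ν_g, ν_e~g and c_g,e in that order; dividing the first by the sample variance of `y` gives a heritability estimate under this simplified scheme.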

Environmental bias properties of RDR

By using random segregation, both RDR and the TDT separate direct genetic effects from environmental effects. The TDT achieves this by conditioning on parental genotype, whereas RDR achieves this by conditioning on parental relatedness. The expectation of offspring genotype given its parents’ genotypes is one half of the sum of the parents’ genotypes, and any variation around this expectation comes from random segregation. Similarly, the expectation of offspring relatedness, [R]_ij, given parental relatedness, [R_par]_ij, is [R_par]_ij/2, and any variation around this expectation comes from random segregation (Figure 1, Supplementary Figure 1, and Supplementary Note). (Note that this relationship does not hold for pairs where one is the direct ancestor of the other, such as parent-offspring pairs.)

Fig. 1. For all pairs of individuals i, j from 20,000 Icelanders with both parents genotyped, the relatedness of i and j, [R]_ij, is compared to the relatedness of the parents of i and the parents of j, [R_par]_ij. The number of pairs in each hexagonal bin is indicated by shading. Relationships determined by the deCODE Genealogy database are indicated: GP-GC, grandparent-grandchild; P-O, parent-offspring; and sibling. The solid diagonal line indicates the expectation of [R]_ij, which is [R_par]_ij/2, except for pairs where one is a direct ancestor of the other (Supplementary Note). The dashed diagonal line indicates the regression line (excluding parent-offspring and grandparent-grandchild pairs), with intercept -1×10⁻⁴, gradient 0.493, and variance explained 84%. The small deviation of the regression line from the theoretical expectation is likely due to some IBD segments shared between parents being broken up by recombination, resulting in a small fraction of segments in the offspring being too small to detect. Relatedness disequilibrium is the variation in [R]_ij around [R_par]_ij/2. Relatedness disequilibrium is due to independent, random segregations in the parents, except for pairs where one is the direct ancestor of the other.

By fitting R and R_par jointly, RDR uses the variation in [R]_ij around its expectation, [R_par]_ij/2, to estimate heritability. We call this variation relatedness disequilibrium. For a pair, relatedness disequilibrium is caused by random segregations in the parents of the pair, so is independent of sharing of all environmental effects apart from indirect genetic effects between the pair. This insight forms the basis of a mathematical proof that heritability estimates from RDR converge to the true heritability, when the sample excludes pairs that have indirect genetic effects on each other and excludes pairs where one is the direct ancestor of the other (Supplementary Note). If indirect genetic effects are restricted to close relatives, the bias is likely to be small for RDR because close relatives comprise only a small fraction of the pairs in a large population sample. The bias due to indirect genetic effects could be much larger for methods that rely on close relatives, such as Sib-Regression and twin studies.
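As a minimal sketch of the quantity just defined, the function below computes relatedness disequilibrium for each pair as [R]_ij minus [R_par]_ij/2, skipping pairs flagged as direct ancestor and descendant. The function name, the flag matrix and all numerical values are hypothetical. The example takes [R_par]_ij = 1 for a full-sibling pair, an assumption chosen so that [R_par]_ij/2 matches the expected sibling relatedness of one half.

```python
def relatedness_disequilibrium(r, r_par, direct_ancestor=None):
    """Return {(i, j): r[i][j] - r_par[i][j] / 2} for i < j.

    Pairs with direct_ancestor[i][j] set to True are skipped, because the
    expectation r_par / 2 does not hold for them.
    """
    n = len(r)
    out = {}
    for i in range(n):
        for j in range(i + 1, n):
            if direct_ancestor is not None and direct_ancestor[i][j]:
                continue
            out[(i, j)] = r[i][j] - r_par[i][j] / 2
    return out


# Hypothetical values: 0 and 1 are siblings, 2 is a distant relative of 0
# and (for the sake of the example) a direct ancestor of 1.
r = [[1.0, 0.53, 0.02], [0.53, 1.0, 0.25], [0.02, 0.25, 1.0]]
r_par = [[1.0, 1.0, 0.05], [1.0, 1.0, 0.30], [0.05, 0.30, 1.0]]
ancestor = [[False, False, False], [False, False, True], [False, True, False]]
disequilibrium = relatedness_disequilibrium(r, r_par, ancestor)
```

Here the sibling pair shares slightly more than expected (positive disequilibrium), while the distant pair shares slightly less (negative disequilibrium).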

Pairs where one is the direct ancestor of the other can introduce bias because they have an atypical relationship between [R]_ij and [R_par]_ij (Figure 1). However, they will comprise a small fraction of the total pairs in a large population sample, even if multiple generations are genotyped. For our sample, around 30% also have a parent or grandparent in our sample, but parent-offspring and grandparent-grandchild pairs comprise only 0.0014% of all pairs. In simulations, we could not detect bias due to inclusion of parent-offspring and grandparent-grandchild pairs (Online Methods and Supplementary Table 1), so we did not remove individuals from our sample that also have a parent or grandparent in our sample.

Simulation of RDR heritability estimation

We tested RDR for simulated traits in our sample and compared RDR to Sib-Regression, the Kinship method, and the Kinship method allowing for an effect of shared family environment, which we call the ‘Kinship F.E.’ method. We determined whether pairs shared a family environment by whether they shared a mother according to the deCODE Genealogy Database. The modelling of the environment in the Kinship F.E. model is similar to a recently proposed extension of the Kinship model⁸. We randomly selected 10,000 SNPs to act as causal SNPs for our simulations (Online Methods). The SNPs had a minimum minor allele frequency (MAF) of 0.5% and median MAF of 22.8%. We simulated traits in a random subsample of 10,000 individuals with both parents genotyped for all the methods other than Sib-Regression, where we used all 54,888 individuals with both parents genotyped.

We first confirmed that heritability estimates for all the methods were approximately unbiased for traits determined by additive, direct genetic effects and random noise (‘additive’ trait, Table 1, Supplementary Tables 2 and 3).

Table 1. Comparison of heritability estimates for simulated traits.

The mean heritability estimates, expressed as a % of the phenotypic variance, from four different methods (RDR, Kinship, Kinship F.E., Sib-Regression) for different simulated traits. The true (narrow-sense) heritability of each trait was 40%. We simulated 500 replicates of each trait based on Icelandic genetic data from a random subsample of 10,000 individuals with both parents genotyped (Methods), except for Sib-Regression, where we used all 54,888 individuals. Ten thousand SNPs with median minor allele frequency (MAF) 22.8% were given additive effects for all the traits other than the ‘rare SNPs’ trait, for which 2,200 SNPs with MAF between 0.1% and 1% (median 0.26%) were used. To the additive genetic component, only noise was added for the ‘additive’ trait and the ‘rare SNPs’ trait. For the ‘epistatic’ trait, 10% of the phenotypic variance was due to pairwise interactions between SNPs. For the ‘dominance’ trait, 10% of the phenotypic variance was due to dominance effects. For the other traits, effects representing different sources of environmental confounding were added in addition to noise and the additive genetic component. For the ‘regional’ trait, each region of Iceland (sysla) was given an effect; for the ‘maternal environment’ trait, an environmental effect shared between those who share mothers was added; for the ‘genetic nurturing’ trait, the genotypes of the parents were also given effects to simulate ‘parental genetic nurturing’ effects¹⁴. For the ‘regional’ trait, the Kinship and Kinship F.E. methods also included adjustment for 20 genetic principal components.

Trait	RDR Estimate (%)	RDR Standard Error (%)	Kinship Estimate (%)	Kinship Standard Error (%)	Kinship F.E. Estimate (%)	Kinship F.E. Standard Error (%)	Sib-Regression Estimate (%)	Sib-Regression Standard Error (%)
additive	39.3	0.62	40.4	0.15	40.5	0.18	41.2	0.69
genetic nurturing	39.4	0.49	92.7	0.09	82.8	0.14	40.4	0.37
maternal environment	38.9	0.58	76.3	0.17	39.9	0.18	41.1	0.37
regional	38.3	0.60	59.0	0.17	58.3	0.20	32.1	0.63
rare SNPs	35.0	0.64	39.5	0.15	39.4	0.19	39.7	0.67
epistatic	41.3	0.60	44.2	0.16	43.3	0.19	50.1	0.63
dominance	40.5	0.63	42.7	0.15	41.1	0.19	50.5	0.71


We simulated a trait where individuals who shared a mother shared a random environmental effect. We found that the Kinship method greatly overestimated the heritability of this trait (‘maternal environment’ trait, Table 1). However, the Kinship F.E. estimates of heritability were approximately unbiased. Both Sib-Regression and RDR estimates were approximately unbiased.

The results for the ‘maternal environment’ trait show that modelling a family environment effect can remove bias from the Kinship method in certain circumstances. However, when indirect genetic effects from relatives are present, modelling the family environment is ineffective at removing bias. To show this, we simulated a trait determined by direct genetic effects, parental genetic nurturing effects, and random noise (‘genetic nurturing’ trait, Table 1). For the simulated trait, the genetic nurturing effect of each SNP was a fixed fraction of its direct effect, generating a substantial covariance term, c_g,e. The variance components as a percentage of the phenotypic variance were: ν_g = 40%, ν_e~g = 10%, and c_g,e ≈ 28%, bringing the total variance explained by parent and offspring genotype to ~78%.
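The relationship between these three variance components can be checked under simplifying assumptions that are ours rather than stated in the text: random mating, so that at each SNP the summed parental genotype has twice the variance of the offspring genotype and covaries with offspring genotype by exactly that offspring variance; independent SNPs; a common ratio a between nurturing and direct effects; and c_g,e read as twice the covariance between the direct and nurturing components, as required for equation (1) to sum to Var(Y). Then ν_e~g = 2a²ν_g, c_g,e = 2aν_g, and the variance carried by transmitted alleles is (1 + a)²ν_g. The function name is hypothetical.

```python
import math


def nurturing_components(v_g, v_e_g):
    """Back out the nurturing/direct ratio and the implied components.

    Assumes random mating, independent SNPs and a single ratio for all SNPs,
    so that v_e_g = 2 * ratio**2 * v_g.
    """
    ratio = math.sqrt(v_e_g / (2.0 * v_g))
    c_g_e = 2.0 * ratio * v_g                  # twice the direct-nurturing covariance
    transmitted = (1.0 + ratio) ** 2 * v_g     # variance carried by transmitted alleles
    return ratio, c_g_e, transmitted


# Inputs are the percentages quoted in the text for the simulated trait.
ratio, c_g_e, transmitted = nurturing_components(v_g=40.0, v_e_g=10.0)
total_genotypic = 40.0 + 10.0 + c_g_e
```

Under these assumptions, ν_g = 40% and ν_e~g = 10% imply a ratio of about 0.35 and c_g,e of about 28%, in line with the values quoted above.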

We found that the Kinship method greatly overestimated the heritability of the ‘genetic nurturing’ trait. Modelling of the family environment only slightly reduced the bias, with the Kinship F.E. estimates of heritability over twice the true value. The reason for this is that parental genetic nurturing effects induce correlations between all pairs with non-zero parental relatedness, not just those that share a family environment. This leads to an increase in environmental similarity with relatedness across the relatedness spectrum.

We simulated a trait affected by population stratification. For this trait, each region of Iceland had a different mean trait value (Supplementary Note). We found that the Kinship and Kinship F.E. estimates of heritability were upwardly biased when adjusting for 20 genetic principal components (‘regional’ trait, Table 1). Adjusting for 100 principal components, the mean Kinship F.E. heritability estimate was 57.6% (S.E. 0.21%), still considerably larger than the true heritability, 40%. In contrast, RDR estimates were approximately unbiased. This is because relatedness disequilibrium is caused by random segregations so is uncorrelated with regional co-localization.

In some cases, IBD-based methods such as RDR will not capture the phenotypic variance explained by recent mutations, which are rare in the population. To measure how well RDR captures variance from rare variants, we simulated a trait determined by additive, direct effects of SNPs with MAFs between 0.1% and 1%, with median MAF 0.26% (Supplementary Note). RDR captured ~88% of the variance explained by the rare SNPs (mean estimate 35.0% against a true heritability of 40%, Table 1).

We found that RDR estimates were insensitive to non-additive genetic effects. The mean RDR estimates were close to the true narrow-sense heritability (40%) both for the trait influenced by pairwise interactions between SNPs and for the trait influenced by dominance effects (Table 1). In contrast, the mean Sib-Regression estimates were close to the sum of the variance explained by additive and non-additive genetic effects (Table 1).

RDR estimates of heritability for 14 human traits

We estimated the variance components of the RDR covariance model for 14 quantitative traits (Online Methods, Table 2, Supplementary Table 4, and Supplementary Figure 2). For the exact same probands that RDR was applied to, heritability estimates were obtained from the Kinship and Kinship F.E. methods (Online Methods, Table 2, Figure 2). For 11 of the 14 traits, the Kinship F.E. estimate, $h_{kinFE}^{2},$ is larger than the RDR estimate, $h_{RDR}^{2}$ (average $h_{kinFE}^{2} - h_{RDR}^{2} = 8.2\%$ across the 14 traits in Table 2). We found that $h_{kinFE}^{2}$ was statistically significantly higher than $h_{RDR}^{2}$ (p<0.05, one sided test assuming $h_{kinFE}^{2}$ and $h_{RDR}^{2}$ are independent, so p-values represent an upper bound) for educational attainment $(h_{kinFE}^{2} - h_{RDR}^{2} = 35.4\%, p < 2.2 \times 10^{- 4}),$ height $(h_{kinFE}^{2} - h_{RDR}^{2} = 22.6\%, p < 1.3 \times 10^{- 6}),$ body mass index (BMI) $(h_{kinFE}^{2} - h_{RDR}^{2} = 17.8\%, p < 4.3 \times 10^{- 3}),$ and age at first child in women $(h_{kinFE}^{2} - h_{RDR}^{2} = 10.9\%, p < 0.043).$ We found no evidence that differences between $h_{kinFE}^{2}$ and $h_{RDR}^{2}$ were driven by atypical properties of the sample with both parents genotyped or by differences in mean trait values between the regions of Iceland (Supplementary Note and Supplementary Table 5).
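The one-sided test described above can be sketched as a z-test on the difference between two estimates, with the standard error of the difference computed as if the estimates were independent. The function name is hypothetical and the inputs are the rounded values from Table 2, so the resulting p-values only approximate those quoted in the text.

```python
import math


def one_sided_p(est_high, se_high, est_low, se_low):
    """P-value for H1: est_high > est_low, treating the estimates as independent."""
    z = (est_high - est_low) / math.sqrt(se_high ** 2 + se_low ** 2)
    # Upper-tail probability of a standard normal at z.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Kinship F.E. versus RDR, using the rounded Table 2 estimates and standard errors.
p_education = one_sided_p(52.4, 3.7, 17.0, 9.4)
p_height = one_sided_p(78.0, 1.9, 55.4, 4.4)
p_bmi = one_sided_p(46.7, 2.5, 28.9, 6.3)
p_afcw = one_sided_p(33.5, 2.1, 22.6, 6.0)
```

Because both estimates come from the same probands, the independence assumption is a simplification, which is why the text describes the resulting p-values as an upper bound.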

Table 2. Heritability estimates.

For each trait, the sample size used for the RDR, Kinship F.E., RDR-SNP, and RELT-SNP methods is given under ‘n’, and the sample size for Sib-Regression (‘Sib-Reg.’) is given under ‘sib-pairs’. Each heritability estimate is expressed as a percentage of the phenotypic variance and is followed by its standard error. RDR, Kinship F.E., RDR-SNP, and RELT-SNP estimates are from the exact same Icelandic samples with both parents genotyped. Samples were restricted to those born between 1951 and 1997 for BMI and traits measured from blood, and to those born between 1951 and 1995 for height. In order to maximise sample size, Sib-Regression estimates are from all genotyped Icelandic sibling pairs available without year-of-birth restrictions. Twin study estimates are from the Swedish Twin Registry¹⁹, except for education, which is from a meta-analysis of Scandinavian twin studies²³ (Supplementary Table 6). A dash indicates that no estimate or standard error is available. Trait abbreviations: BMI, body mass index; AFCW, age at first child in women; AFCM, age at first child in men; menarche, age at menarche (years); education, educational attainment (years); total chol., total cholesterol; HDL, high density lipoprotein; glucose, fasting glucose; MCH, mean cell haemoglobin; MCHC, mean cell haemoglobin concentration; MCV, mean cell volume.

Trait	n	RDR Est. (%)	RDR S.E. (%)	Kinship F.E. Est. (%)	Kinship F.E. S.E. (%)	RDR-SNP Est. (%)	RDR-SNP S.E. (%)	RELT-SNP Est. (%)	RELT-SNP S.E. (%)	sib-pairs	Sib-Regression Est. (%)	Sib-Regression S.E. (%)	Twin Est. (%)	Twin S.E. (%)
BMI	19,589	28.9	6.3	46.7	2.5	34.2	2.9	36.1	3.4	56,461	38.5	12.0	65	3.8
height	21,802	55.4	4.4	78.0	1.9	44.5	2.3	55.2	4.4	64,847	68.4	9.6	81	-
AFCW	22,367	22.6	6.0	33.5	2.1	11.7	2.6	20.1	2.3	30,582	32.0	17.4	-	-
AFCM	17,117	14.9	7.9	16.3	2.6	11.5	3.4	12.3	2.2	21,729	55.3	21.3	-	-
menarche	11,242	30.9	10.5	41.9	4.0	26.8	5.0	33.9	4.2	16,621	50.6	23.1	75	6.9
education	12,035	17.0	9.4	52.4	3.7	17.3	4.4	29.2	4.4	32,542	39.7	14.8	43	3.6
total chol.	27,320	30.6	5.0	32.2	1.8	23.5	2.3	24.2	2.2	74,271	15.1	12.9	57	3.8
HDL	24,570	44.8	5.3	45.1	2.1	32.0	2.5	29.7	2.7	67,894	50.5	11.4	69	3.1
triglycerides	24,099	24.2	5.7	29.8	2.0	23.8	2.6	25.8	2.4	62,746	35.8	12.1	61	3.7
glucose	19,500	15.9	7.2	23.6	2.3	15.8	3.1	16.8	2.3	36,469	29.6	18.5	59	4.0
creatinine	38,929	22.9	3.7	22.2	1.3	16.9	1.6	17.2	1.6	98,385	4.0	11.1	59	1.5
MCH	43,917	38.5	3.2	36.8	1.2	29.3	1.5	28.7	1.9	107,711	40.3	10.2	-	-
MCHC	43,963	14.9	3.3	18.4	1.1	12.5	1.5	13.0	1.2	107,833	15.8	10.5	-	-
MCV	43,919	39.1	3.1	38.5	1.2	31.1	1.5	29.8	2.0	107,702	35.9	10.2	-	-


Fig. 2. Horizontal intervals show +/-1.96 standard errors for the estimates on the x-axis, and vertical intervals show +/-1.96 standard errors for the estimates on the y-axis. See Table 2 for numerical values. A) Comparison of RDR to ‘Kinship F.E.’. B) Comparison of RDR-SNP to RELT-SNP. C) Comparison of RDR to Sib-Regression⁵ estimates. Intervals for the RDR estimates are not shown to better display Sib-Regression intervals. D) Comparison to published twin study estimates from the Swedish Twin Registry¹⁹, except for education, which is from a meta-analysis of Scandinavian twin studies²³ (Supplementary Table 6). Trait abbreviations: BMI, body mass index; AFCW, age at first child in women; AFCM, age at first child in men; education, educational attainment (years); cholesterol, total cholesterol; HDL, high density lipoprotein; glucose, fasting glucose; MCH, mean cell haemoglobin; MCHC, mean cell haemoglobin concentration; MCV, mean cell volume.

Using Icelandic data, but without limiting to probands with parents genotyped, Sib-Regression estimates of heritability, denoted by $h_{sib}^{2},$ were computed (Online Methods, Table 2 and Figure 2). RDR estimates were more precise than Sib-Regression estimates for every trait, and, on average, the estimated standard errors for $h_{sib}^{2}$ were 2.5 times larger than those for $h_{RDR}^{2},$ implying the effective sample size for RDR is around 6.25 times higher than for Sib-Regression. If a difference between RDR and Sib-Regression exists, it could be a consequence of indirect genetic effects between siblings¹⁸, epistasis, dominance, and/or rare variants. However, the lack of precision in Sib-Regression estimates implies that the power to detect differences is low, and we did not find any statistically significant differences.
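The step from standard errors to effective sample size rests on the usual approximation that sampling variance scales inversely with sample size. Stated explicitly, and only as a rough guide since the two methods use different samples and different kinds of pairs:

$$\frac{n_{\mathrm{eff}}^{RDR}}{n_{\mathrm{eff}}^{sib}} \approx \left( \frac{\mathrm{S.E.}(h_{sib}^{2})}{\mathrm{S.E.}(h_{RDR}^{2})} \right)^{2} \approx 2.5^{2} = 6.25$$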

There are not enough monozygotic twins in the Icelandic data to obtain precise twin estimates of heritability. To compare RDR results with twin studies from a similar population, we took estimates from the Swedish Twin Registry¹⁹ denoted by $h_{twin}^{2},$ which were available for nine of the fourteen traits (Online Methods, Table 2, Figure 2, and Supplementary Table 6). The difference $h_{twin}^{2} - h_{RDR}^{2}$ was above zero and statistically significant (p<0.05) for all nine traits, with an average difference of 33.2%. For Sib-Regression, the average difference $h_{twin}^{2} - h_{sib}^{2}$ was 26.4%. The fact that both RDR and Sib-Regression estimates are substantially lower than twin studies estimates could be due to differences in heritability between our sample and the samples of twins and/or overestimation of heritability by twin studies.

GREML-SNP estimates are biased by genetic nurturing

SNP based methods, such as GREML-SNP, will generally capture a smaller fraction of the full heritability of a trait than IBD based methods, such as RDR, making direct comparison of environmental bias difficult. We therefore introduce RDR-SNP, which uses SNPs to estimate the three relatedness matrices of the RDR covariance model (Online Methods). In other words, R, R_par, and R_o,par are replaced by estimates from a set of SNPs: R^snp, $R_{par}^{snp},$ and $R_{o,par}^{snp} .$ The only difference between RDR-SNP and GREML-SNP is that RDR-SNP also fits $R_{par}^{snp}$ and $R_{o,par}^{snp}$ in addition to R^snp.

In order to compare RDR-SNP to typical GREML-SNP analysis, we simulated traits in a subset of the UK Biobank²⁰ where genotype data on both parents was available (n=973). As in typical GREML-SNP analysis, we pruned the sample so that no pair of individuals had relatedness greater than 0.05, leaving 937 individuals (Supplementary Note).

We randomly sampled 11,771 SNPs to act as causal SNPs, and we calculated R^snp, $R_{par}^{snp},$ and $R_{o,par}^{snp}$ from this set (Supplementary Note). We simulated a trait determined only by additive, direct effects of SNPs and random noise. Both GREML-SNP and RDR-SNP estimated the true heritability, 20%, without detectable bias: mean estimate 19.76% (0.15% S.E.) for GREML-SNP and 19.70% (0.30% S.E.) for RDR-SNP.

Alleles transmitted to offspring are also present in the parents, so have both direct and parental genetic nurturing effects. Let δ be the direct effect of a SNP, and let η be the parental genetic nurturing effect. The effect of the transmitted allele is therefore (δ + η). GREML-SNP uses only transmitted alleles, so is unable to separate the variance from the direct effect alone, proportional to δ², from the variance explained by the combined direct and parental genetic nurturing effects, proportional to (δ + η)². We investigated this theoretically (Supplementary Note) and by simulating a trait with both direct and genetic nurturing effects. We set the genetic nurturing effect of each variant to be one third of its direct effect, similar to the estimated ratio for educational attainment in Iceland¹⁴. The direct effects explained 20% of the phenotypic variance, implying that the total variance explained by transmitted alleles is ${(1 + \frac{1}{3})}^{2} \times 20\% \approx 35.56\%,$ much larger than the heritability, 20%.

The mean GREML-SNP heritability estimate was 35.15% (0.16% S.E.), very close to the total variance explained by combined direct and indirect effects of transmitted alleles, 35.56%, and in close agreement with our theoretical analysis (Supplementary Note). In contrast, RDR-SNP estimated heritability without detectable bias: mean estimate 19.70% (S.E. 0.30%).
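The reason a method that sees only transmitted alleles picks up (δ + η) rather than δ can be illustrated with a single-SNP simulation. This is a toy sketch, not the UK Biobank simulation described above: it assumes random mating, one biallelic SNP, unit-variance noise, and a nurturing effect η attached to every parental allele, and the function name and parameter values are hypothetical. It regresses the phenotype on offspring genotype alone.

```python
import random


def transmitted_allele_slope(delta, eta, freq=0.5, n=20000, seed=0):
    """Slope from regressing a simulated phenotype on offspring genotype at one SNP."""
    rng = random.Random(seed)
    genotypes, phenotypes = [], []
    for _ in range(n):
        father = [int(rng.random() < freq) for _ in range(2)]
        mother = [int(rng.random() < freq) for _ in range(2)]
        # Random Mendelian segregation: one allele drawn from each parent.
        child = rng.choice(father) + rng.choice(mother)
        parental_sum = sum(father) + sum(mother)
        y = delta * child + eta * parental_sum + rng.gauss(0.0, 1.0)
        genotypes.append(child)
        phenotypes.append(y)
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    cov = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotypes))
    var = sum((g - mean_g) ** 2 for g in genotypes)
    return cov / var


slope_with_nurture = transmitted_allele_slope(delta=0.3, eta=0.1)
slope_direct_only = transmitted_allele_slope(delta=0.3, eta=0.0)
```

With nurturing present, the slope sits near δ + η rather than δ, so the variance attributed to the SNP scales with (δ + η)², which is the inflation seen in the GREML-SNP estimate.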

Evidence from Icelandic data for bias in GREML-SNP estimates

For typical GREML-SNP analysis, the sample is pruned so that no pair has relatedness above a low threshold, usually 0.025 or 0.05. When a large fraction of the sample is related to another person in the sample above threshold levels, such as in our Icelandic sample, this approach entails a large loss of sample size. A similar approach that avoids a large loss in sample size is to regress elements of the sample phenotypic covariance matrix onto R^snp only for those pairs whose relatedness is less than the threshold. We call this approach RELT-SNP, with RELT standing for ‘relatedness thresholded’. If the same relatedness threshold is applied, GREML-SNP and RELT-SNP estimates from large samples would be expected to be very similar under most conditions. By applying RELT-SNP to the simulated traits in the UK Biobank, we showed RELT-SNP and GREML-SNP give very similar estimates and exhibit the same bias due to parental genetic nurturing effects (Supplementary Note).
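A minimal sketch of the thresholded regression idea follows. It is an illustration under assumptions, not the authors' implementation (see Online Methods): it thresholds on the SNP relatedness matrix itself, fits an ordinary least-squares line with an intercept, and takes as input a matrix of products of mean-centred phenotypes. The function and argument names are hypothetical.

```python
def relt_slope(products, r_snp, threshold=0.05):
    """OLS slope of products[i][j] on r_snp[i][j] over pairs below the threshold."""
    xs, ys = [], []
    n = len(products)
    for i in range(n):
        for j in range(i + 1, n):
            # Keep only pairs less related than the threshold; no individual is dropped.
            if r_snp[i][j] < threshold:
                xs.append(r_snp[i][j])
                ys.append(products[i][j])
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

Dividing the slope by the phenotypic variance gives a heritability estimate under this scheme. Excluding pairs rather than individuals is what lets the approach keep the full sample.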

In the Icelandic sample, we compared RDR-SNP to RELT-SNP with a relatedness threshold of 0.05 (Online Methods). We first computed RDR-SNP and RELT-SNP estimates for the simulated traits, whose true heritability was 40% (Supplementary Table 7). When using the causal variants to calculate R^snp, $R_{par}^{snp},$ and $R_{o,par}^{snp},$ both RDR-SNP and RELT-SNP gave approximately unbiased estimates of heritability for the ‘additive’, ‘maternal environment’, ‘epistatic’ and ‘dominance’ traits. For the ‘genetic nurturing’ trait, the average RELT-SNP estimate was 74.1% (0.14% S.E.), close to the variance explained by combined direct and genetic nurturing effects, ~73.3%. In contrast, RDR-SNP estimates were approximately unbiased (h² = 40.1%, 0.07% S.E.). When the causal variants differed from the variants used to calculate the relatedness matrices, a bias was introduced to RDR-SNP, RELT-SNP, and GREML-SNP estimates (Supplementary Table 7).

For the real traits, we estimated heritability using relatedness matrices calculated from 605,966 genome-wide SNPs typically found on Illumina genotyping arrays (Online Methods, Table 2 and Supplementary Table 8). We found that $h_{RELT-SNP}^{2}$ was statistically significantly higher than $h_{RDR-SNP}^{2}$ (p<0.05, one sided test assuming $h_{RDR-SNP}^{2}$ and $h_{RELT-SNP}^{2}$ are independent, so p-values represent an upper bound) for height $(\frac{h_{RELT-SNP}^{2}}{h_{RDR-SNP}^{2}} = 1.24, p < 0.015),$ age at first child in women $(\frac{h_{RELT-SNP}^{2}}{h_{RDR-SNP}^{2}} = 1.72, p < 7.6 \times 10^{- 3}),$ and educational attainment (years) $(\frac{h_{RELT-SNP}^{2}}{h_{RDR-SNP}^{2}} = 1.69, p < 0.027)$ (Methods).
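The comparison described above can be illustrated with a minimal Python sketch. It assumes, as stated in the text, that the two estimates are independent, and additionally assumes (for illustration only) that their difference is approximately normally distributed. The function name and the example inputs are hypothetical and are not values from Table 2.

```python
from math import erfc, sqrt

def one_sided_p(h2_relt, se_relt, h2_rdr, se_rdr):
    """Upper-tail p-value for the alternative h2_relt > h2_rdr.

    Treats the two estimates as independent, so the variance of their
    difference is the sum of the squared standard errors.
    """
    z = (h2_relt - h2_rdr) / sqrt(se_relt ** 2 + se_rdr ** 2)
    # P(Z > z) for a standard normal Z, via the complementary error function.
    return 0.5 * erfc(z / sqrt(2.0))

# Hypothetical inputs chosen only to exercise the function.
p_example = one_sided_p(0.60, 0.03, 0.50, 0.04)
```

If the two estimates are positively correlated, as seems plausible when both are computed on the same individuals, the summed variance overstates the variance of the difference, which would explain why the p-values are described as upper bounds.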

Discussion

We introduced RDR, a novel heritability estimation method, and used it to estimate heritability for 14 quantitative traits in Iceland. Through mathematical investigations and simulations, we demonstrated that RDR estimates of heritability have negligible bias due to environment. In contrast, GREML-SNP, the Kinship method, and the Kinship F.E. method showed substantial bias due to indirect genetic effects from relatives. The GREML-SNP simulations show that removing close relatives does not remove bias due to indirect genetic effects from relatives. Our results suggest that GREML-SNP estimates could be interpreted as estimates of the variance explained by the combined direct and indirect effects of transmitted alleles, rather than the heritability.

For educational attainment, there is evidence for a substantial contribution from indirect genetic effects from parents and siblings¹⁴. This implies that educational attainment heritability estimates from GREML-SNP⁸^,²¹, and the recently proposed extension to the Kinship model⁸, are likely to be upwardly biased. We estimated that GREML-SNP estimates of the heritability of educational attainment could be inflated by a factor of around 1.69. This inflation factor is consistent with genetic nurturing effects around 30% the size of direct effects, which is in line with an estimate using a polygenic score¹⁴ and within-family analyses in other populations²².
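One way to see how an inflation factor of 1.69 corresponds to nurturing effects of about 30% is the following back-of-envelope calculation. It rests on a simplifying assumption introduced here for illustration: each transmitted allele has a direct effect δ and an indirect (genetic nurturing) effect η that is a constant fraction of δ across variants, so that the variance explained by transmitted alleles scales with (δ + η)². The symbols δ and η are not used elsewhere in this section.

```latex
\frac{h_{RELT-SNP}^{2}}{h_{RDR-SNP}^{2}} \approx \frac{(\delta + \eta)^{2}}{\delta^{2}} = \left(1 + \frac{\eta}{\delta}\right)^{2},
\qquad
\left(1 + \frac{\eta}{\delta}\right)^{2} = 1.69 \;\Rightarrow\; \frac{\eta}{\delta} = \sqrt{1.69} - 1 = 0.3 .
```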

RDR, like other methods using IBD segments⁷^,⁸, may underestimate the heritability due to rare variants. However, any underestimation due to rare variants will be less than for GREML-SNP methods applied to typical genotyping arrays. By using IBD segments, RDR captures substantially more of the variance from rare variants, around 88% for variants with MAF between 1% and 0.1%. This implies that the underestimation of heritability by RDR will be small unless very rare variants, especially de-novo mutations, explain a large fraction of the phenotypic variance. Furthermore, Sib-Regression captures variance from all variants other than de-novo mutations not shared by siblings. The fact that Sib-Regression estimates are close to RDR estimates on average argues against substantial underestimation of heritability by RDR due to very rare variants.

Heritability estimates from Swedish twin studies were substantially higher than RDR and Sib-Regression estimates for almost all traits. Some of the difference could be due to differences in heritability between our Icelandic sample and the Swedish twin samples. Other possible explanations are overestimation of heritability by twin studies and/or very rare variants, especially de-novo mutations, explaining a substantial fraction of the phenotypic variance.

The RDR method requires parents of probands to be genotyped. Large datasets with this property are currently rare, which is the main reason our current study is limited to the Icelandic population. However, our results argue that large, genotyped samples including close relatives are essential for disentangling nature and nurture. As large population samples become more common, large amounts of family data will inevitably be collected. We therefore expect RDR and related methods to become more widely applied.

Online Methods

Icelandic Sample

All participating subjects donating biological samples signed informed consents and the study was approved by the Data Protection Commission of Iceland (DPC) and the National Bioethics Committee of Iceland. Personal identities of the phenotypes and biological samples were encrypted by a third party system provided by the Icelandic Data Protection Authority.

The Icelandic samples were genotyped using Illumina microarrays as previously described²⁴. The whole genomes of 2,636 Icelanders were sequenced using Illumina technology to a mean depth of at least 10X (median 20X)²⁴. A total of 35.5 million autosomal SNPs and indels were identified using the Genome Analysis Toolkit version 2.3.9.

The deCODE genealogy database is a comprehensive database that includes information on more than 800,000 Icelandic individuals, deceased and living, dating back to the settlement of Iceland 1,200 years ago. The database is constructed from a nationwide census, conducted regularly from the year 1700, church books and other available information, and is particularly complete for the last 200 years. The database includes, when known, information on parents of each individual, gender, year of birth and, if applicable, year of death.

We restricted our analyses to genotyped individuals with both genetic parents genotyped and all four grandparents in the deCODE genealogy database. This left 54,888 individuals. See the Life Sciences Reporting Summary for a summary of sample restrictions and other information.

The individuals and their parents had all been phased and segments shared identical-by-descent (IBD), both within and between individuals, determined by long-range phasing²⁵^,²⁶. To reduce bias due to segments incorrectly called as identical-by-descent, we restricted our analyses to segments of length greater than 5 centi-Morgans. Note that sex-chromosomes were not included.

As a measure of ascertainment bias, we compared years of education between the individuals with both parents genotyped and the full set of individuals with education data. Mean years of education for those with both parents genotyped was 15.07 compared to 13.63 for the whole sample with education data. Part of this is due to the fact that those with both parents genotyped were born later than average, and mean levels of education have increased over time. After regressing out year-of-birth (YOB), YOB², YOB³, the sample with both parents genotyped still had 0.32 years more education on average, compared to a standard deviation of 3.39 years. This suggests that our sample is slightly skewed towards individuals with higher socio-economic status, which, for many traits, is expected to increase heritability²⁷^,²⁸.

Trait measurements

As a measure of educational attainment, we used information on years of schooling, available for 63,508 individuals, that originated from questionnaires administered in deCODE’s various disease projects and from routine assessments of elderly nursing home residents. As the data have been gathered over the years for the purpose of descriptive demographics rather than for phenotype use, the questions were originally not standardized across projects and many of them have categorical responses. For this study, to make the measure as consistent as possible with the educational attainment trait studied in the published meta-analysis²⁹, we mapped the questionnaire responses onto the UNESCO ISCED classification (see URLs). The final quantitative measure used, before sex and year-of-birth adjustments, ranges from a minimum of 10 years to a maximum of 20 years.

Height and body mass index (BMI) information, collected primarily through deCODE’s genetic studies on cardiovascular disease, obesity and cancer, were available for 89,615 and 77,285 adult individuals, respectively³⁰^,³¹. About 20% of the information was self-reported.

Blood measurements were collected from three of the largest laboratories in Iceland: Landspítali - The National University Hospital of Iceland, Reykjavík; The Laboratory in Mjódd, Reykjavík; Akureyri Hospital, The Regional Hospital in North Iceland, Akureyri; in addition to the Icelandic Heart Association. For many individuals, multiple blood samples had been taken at different time points. To aid comparability with other studies that have used one time-point only, we took only the first measurement of each individual.

Information on ‘age at first child’ (AAFC) was extracted from the deCODE genealogy database. Age at menarche was determined by the answer to the question ‘How old were you when your menstruation started?’ as detailed elsewhere³².

Apart from educational attainment, traits were quantile-normalised within each sex. Educational attainment was not quantile-normalised as the measurements fall into discrete categories of years of education. The traits were regressed on sex, year-of-birth (YOB), YOB², YOB³, and the interactions of sex with YOB² and YOB³. The residuals of this regression were then used as the phenotype, Y, when fitting the models described below. To ensure our heritability estimates correspond to the adult but not elderly population, we further restricted our analysis to those born between 1951 and 1995 for height, and between 1951 and 1997 for BMI and the traits measured from blood. (Note that for Sib-Regression, the year of birth restrictions were not applied, to maximise the sample size.)

Identification of siblings

For the Sib-Regression estimator, we obtained the relatedness for all pairs of genotyped individuals who share both parents in the genealogy. To ensure we only used true full-siblings, we clustered the pairs by relatedness into four clusters using k-means clustering: unrelated, half-sibling, full-sibling, and monozygotic twins. This left 127,264 full-sibling pairs, comprising 70,317 unique individuals, whose relatedness distribution had a mean of 0.502 and a standard deviation of 0.0382. To maximise the precision of the Sib-Regression estimator, we did not restrict by year-of-birth or by number of parents genotyped, so the sample used is different to the sample used for the other estimators.
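A minimal sketch of the clustering step, using a one-dimensional version of Lloyd's k-means algorithm written with NumPy. The initial cluster centres (0, 0.25, 0.5 and 1, the expected relatedness of each class) and the simulated relatedness values are assumptions made for illustration; the text does not specify the initialisation or software used.

```python
import numpy as np

def kmeans_1d(x, init_centres, n_iter=100):
    """Lloyd's algorithm in one dimension; returns (labels, centres)."""
    x = np.asarray(x, dtype=float)
    centres = np.asarray(init_centres, dtype=float).copy()
    for _ in range(n_iter):
        # Assign each pair to the nearest centre.
        labels = np.argmin(np.abs(x[:, None] - centres[None, :]), axis=1)
        new_centres = centres.copy()
        for k in range(len(centres)):
            if np.any(labels == k):  # an empty cluster keeps its old centre
                new_centres[k] = x[labels == k].mean()
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels, centres

# Simulated relatedness for pairs recorded as sharing both parents (not real data).
rng = np.random.default_rng(0)
relatedness = np.concatenate([
    rng.normal(0.00, 0.01, 20),    # effectively unrelated (pedigree errors)
    rng.normal(0.25, 0.03, 30),    # half-siblings
    rng.normal(0.50, 0.04, 500),   # full siblings
    rng.normal(1.00, 0.005, 10),   # monozygotic twins
])
labels, centres = kmeans_1d(relatedness, init_centres=[0.0, 0.25, 0.5, 1.0])
full_sib_mask = labels == 2        # the cluster initialised at 0.5
```

Only pairs in the full-sibling cluster would then be passed on to the Sib-Regression estimator.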

Calculation of IBD relatedness matrices

To calculate R, R_par, and R_o,par, we used formulae based on the genetic covariance in a population descending from a finite number of ancestors³³ (Supplementary Note):

{[R]}_{ij} = \frac{1}{2} \sum_{k, l = m, p} ({IBD}_{ij}^{kl} - K_{0}) / (1 - K_{0})

where K₀ is the mean kinship over all pairs in the population, and, for example, ${IBD}_{ij}^{mp}$ is the proportion of the maternally inherited haplotype of i shared identical-by-descent (IBD) with the paternally inherited haplotype of j;

{[R_{par}]}_{ij} = \frac{K_{p (i) p (j)} + K_{p (i) m (j)} + K_{m (i) p (j)} + K_{m (i) m (j)} - 4 K_{0}}{(1 - K_{0})}

where K_p(i)m(j) is the kinship between the father of i and the mother of j;

{[R_{o,par}]}_{ij} = \frac{K_{i p (j)} + K_{i m (j)} + K_{m (i) j} + K_{p (i) j} - 4 K_{0}}{(1 - K_{0})}

where K_im(j) is the kinship between i and the mother of j, etc.
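The two parental matrices can be assembled from a kinship matrix by index lookups. The sketch below assumes a dense kinship matrix `K` covering probands and parents, with self-kinship 0.5 for non-inbred individuals, and integer index arrays giving each proband's row and the rows of their father and mother. These names and the toy pedigree are hypothetical; the sketch does not cover the estimation of kinship from IBD segments.

```python
import numpy as np

def parental_relatedness(K, father, mother, K0):
    """[R_par]_ij: summed kinship between the parents of i and the parents of j."""
    p, m = np.asarray(father), np.asarray(mother)
    total = K[np.ix_(p, p)] + K[np.ix_(p, m)] + K[np.ix_(m, p)] + K[np.ix_(m, m)]
    return (total - 4.0 * K0) / (1.0 - K0)

def offspring_parent_relatedness(K, proband, father, mother, K0):
    """[R_o,par]_ij: kinship of i with j's parents plus kinship of i's parents with j."""
    o, p, m = np.asarray(proband), np.asarray(father), np.asarray(mother)
    total = K[np.ix_(o, p)] + K[np.ix_(o, m)] + K[np.ix_(m, o)] + K[np.ix_(p, o)]
    return (total - 4.0 * K0) / (1.0 - K0)

# Toy pedigree: rows 0 and 1 are unrelated parents, rows 2 and 3 their two children.
K = np.array([
    [0.50, 0.00, 0.25, 0.25],
    [0.00, 0.50, 0.25, 0.25],
    [0.25, 0.25, 0.50, 0.25],
    [0.25, 0.25, 0.25, 0.50],
])
proband, father, mother = [2, 3], [0, 0], [1, 1]
R_par = parental_relatedness(K, father, mother, K0=0.0)
R_o_par = offspring_parent_relatedness(K, proband, father, mother, K0=0.0)
```

In this toy pedigree every entry of both matrices equals 1, reflecting that full siblings share the same parents and therefore cannot be distinguished by parental relatedness.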

Calculation of SNP relatedness matrices

To perform RDR-SNP analysis, we calculated relatedness matrices from SNPs $(R^{snp}, R_{par}^{snp}, R_{o,par}^{snp})$ that are analogous to the IBD relatedness matrices used in RDR (R, R_par, and R_o,par). Consider a sample of n individuals genotyped at l bi-allelic SNPs, where the genotype is expressed as the copy number (0, 1, or 2) of one of the two alleles. Let G be the [n × l] matrix of genotypes standardised to have mean zero and variance 1. The matrix R^snp is equivalent to that used in standard GREML-SNP analysis, and is calculated as R^snp = l⁻¹GG^T.

To calculate $R_{par}^{snp}$ and $R_{o,par}^{snp},$ we first have to form parental genotypes. Let G_m be the [n × l] matrix of genotypes of the mothers of the n individuals in the sample, and let G_p be the [n × l] matrix of the genotypes of the fathers. Then G_par = G_m + G_p is the parental genotype matrix, with entries from {0,1,2,3,4}. We normalised the columns of G_par to have mean zero and variance two. The variance is naturally twice that of the offspring genotype in an outbred population as each entry is the sum of maternal and paternal genotypes. Then

R_{par}^{snp} = {(2 l)}^{- 1} G_{par} G_{par}^{T}; R_{o, par}^{snp} = {(2 l)}^{- 1} (G G_{par}^{T} + G_{par} G^{T}) .

The matrices are calculated in this way to ensure estimates of v_g, v_e~g, and c_g,e are properly calibrated. These equations can be derived from a random effects model (Supplementary Note).
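The construction of the three SNP relatedness matrices can be sketched in NumPy as follows. The function names are hypothetical, the toy genotypes are simulated, and the sketch assumes that monomorphic SNPs have already been removed (otherwise the standardisation divides by zero).

```python
import numpy as np

def standardise(X, target_var=1.0):
    """Centre each column and rescale it to the requested sample variance."""
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)
    return X * np.sqrt(target_var) / X.std(axis=0)

def snp_relatedness(geno, geno_mother, geno_father):
    """Return (R_snp, R_par_snp, R_o_par_snp) from [n x l] allele-count matrices."""
    l = geno.shape[1]
    G = standardise(geno, 1.0)                            # offspring: variance 1
    G_par = standardise(geno_mother + geno_father, 2.0)   # parental sum: variance 2
    R = G @ G.T / l
    R_par = G_par @ G_par.T / (2 * l)
    R_o_par = (G @ G_par.T + G_par @ G.T) / (2 * l)
    return R, R_par, R_o_par

# Simulated trios: parents drawn at random, offspring by Mendelian transmission.
rng = np.random.default_rng(0)
n, l = 50, 200
freqs = rng.uniform(0.2, 0.8, size=l)
g_m = rng.binomial(2, freqs, size=(n, l))
g_p = rng.binomial(2, freqs, size=(n, l))
# Each parent transmits the counted allele with probability (allele count) / 2.
g = rng.binomial(1, g_m / 2) + rng.binomial(1, g_p / 2)
R, R_par, R_o_par = snp_relatedness(g, g_m, g_p)
```

With this scaling the average diagonal element of each matrix is 1 (exactly for the first two, approximately for the third), which matches the diagonal of the IBD-based matrices for non-inbred individuals with unrelated parents.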

For the analysis of the real traits, we computed relatedness matrices from SNPs from the Illumina Framework SNP set. The Illumina Framework SNP set is a set of 611,173 SNPs shared between many of the Illumina genotyping arrays used to genotype the Icelandic sample²⁴. We used this set of SNPs in order to make our analysis comparable to applications of GREML-SNP to data from typical genotyping arrays. Before computing relatedness matrices, we removed SNPs with imputation information below 0.9999 and MAF less than 1%, leaving 605,966 SNPs. For the simulated traits, we also computed relatedness matrices from only the causal SNPs (Supplementary Note).

Computing Relatedness Disequilibrium Regression (RDR) estimates

The relatedness disequilibrium regression (RDR) covariance model is

Cov (Y) = v_{g} R + v_{e \sim g} R_{par} + c_{g, e} R_{o, par} + σ^{2} I .

We investigated fitting this model by least squares regression of the off-diagonal elements of the sample phenotypic covariance matrix on the off-diagonal elements of the relatedness matrices:

(y_{i} - \bar{y}) (y_{j} - \bar{y}) \sim {[R]}_{i j} + {[R_{par}]}_{i j} + {[R_{o,par}]}_{i j},

where y_i is the phenotype observation for individual i, and ȳ is the sample phenotype mean. We excluded both parent-offspring and grandparent-grandchild pairs from the regression, as these pairs violate the relationship between [R]_ij and [R_par]_ij required for removal of environmental bias from estimation of v_g (Figure 1 and Supplementary Note). We also investigated fitting the model by unconstrained restricted maximum likelihood in GCTA³⁴, under the assumption the trait follows a multivariate normal distribution:

Y \sim N (μ, v_{g} R + v_{e \sim g} R_{par} + c_{g, e} R_{o, par} + σ^{2} I) .

For the maximum likelihood method, one can only remove individuals, and all the pairs including that individual, not arbitrary pairs. Around 30% of the sample with both parents genotyped have an ancestor who also has both parents genotyped. We therefore did not exclude individuals so that no parent-offspring and no grandparent-grandchild pairs were present, as this would have resulted in a large loss of sample size.

In our simulations, we found that RDR estimates from maximum likelihood and RDR estimates from least-squares were both approximately unbiased, with no consistent advantage in bias evident from fitting the model by least-squares after excluding parent-offspring and grandparent-grandchild pairs (Supplementary Table 1). However, least-squares estimates were considerably less precise than those from maximum likelihood. We therefore used maximum likelihood without exclusion of parent-offspring and grandparent-grandchild pairs for all analyses in the main text. For the real traits, the results from least-squares were consistent with the results from maximum likelihood, but the least-squares estimates were considerably less precise (Supplementary Table 5).
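The least-squares version of RDR can be sketched as an ordinary regression over unordered pairs. The sketch below is an illustration, not the procedure used for the reported estimates (which were obtained by maximum likelihood in GCTA). It assumes an intercept is included and that excluded pairs are supplied as a boolean matrix; the function names are hypothetical, and the random symmetric matrices are used only to check that known coefficients are recovered.

```python
import numpy as np

def sample_cov_products(y):
    """Matrix with entries (y_i - ybar)(y_j - ybar)."""
    resid = np.asarray(y, dtype=float) - np.mean(y)
    return np.outer(resid, resid)

def offdiag_regression(S, mats, exclude=None):
    """Regress off-diagonal entries of S on the matching entries of each matrix.

    Returns the intercept followed by one coefficient per matrix in `mats`.
    `exclude` is an optional boolean [n x n] matrix marking pairs to drop,
    e.g. parent-offspring and grandparent-grandchild pairs.
    """
    n = S.shape[0]
    iu = np.triu_indices(n, k=1)               # each unordered pair once
    keep = np.ones(iu[0].size, dtype=bool)
    if exclude is not None:
        keep &= ~np.asarray(exclude, dtype=bool)[iu]
    X = np.column_stack([np.ones(keep.sum())] + [M[iu][keep] for M in mats])
    coef, *_ = np.linalg.lstsq(X, S[iu][keep], rcond=None)
    return coef

# Noise-free check with arbitrary symmetric matrices standing in for R, R_par, R_o,par.
rng = np.random.default_rng(0)
n = 30
def random_symmetric():
    B = rng.normal(size=(n, n))
    return (B + B.T) / 2
R, R_par, R_o_par = (random_symmetric() for _ in range(3))
S_exact = 0.1 + 0.4 * R + 0.2 * R_par + 0.1 * R_o_par
coef = offdiag_regression(S_exact, [R, R_par, R_o_par])
```

For real data, `S` would be `sample_cov_products(y)` and the second to fourth coefficients would estimate v_g, v_e~g and c_g,e respectively.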

To obtain RDR-SNP estimates, we fitted the following model by restricted maximum likelihood in GCTA:

Y \sim N (μ, v_{g} R^{snp} + v_{e \sim g} R_{par}^{snp} + c_{g, e} R_{o,par}^{snp} + σ^{2} I) .

Computing Kinship and Kinship F.E. estimates

To obtain heritability estimates from the Kinship method, we fitted the following model for a vector of phenotype observations Y:

Y \sim N (X_{kin} b, v_{g} R + σ^{2} I) .

For the Kinship F.E. model, we added a variance component that modelled shared family environment:

Y \sim N (X_{kin} b, v_{g} R + v_{c} C + σ^{2} I),

where [C]_ij = 1 if i and j shared a mother according to the deCODE genealogy database, otherwise [C]_ij = 0. For all of the simulated traits other than the ‘regional’ trait, X_kin was a constant. For the ‘regional’ trait, it also included the top 20 genetic principal components. In the real trait analysis, X_kin included the top 20 genetic principal components. For both the Kinship and Kinship F.E. methods, we estimated model parameters by unconstrained restricted maximum likelihood in GCTA³⁴.

Computing RELT-SNP estimates

To compute RELT-SNP estimates, we regressed off-diagonal elements of the phenotypic covariance matrix onto elements of R^snp, excluding elements of R^snp greater than 0.05. Let X be a matrix whose first column has every entry equal to one, and whose other columns are covariates and/or positions on genetic principal components. Let b̂ be the least-squares estimate of the vector of regression coefficients of the phenotype on X. Then we formed the sample phenotypic covariance matrix as S = (Y – Xb̂)(Y – Xb̂)^T, where Y is the vector of phenotype observations. Estimates of v_g were computed by regressing off-diagonal elements of S, [S]_ij, on off-diagonal elements of R^snp, [R^snp]_ij, excluding pairs where [R^snp]_ij > 0.05.
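The point estimate described above can be sketched in a few lines of NumPy. This is not the authors' released code: the function name is hypothetical, the inclusion of an intercept in the pairwise regression is an assumption, and the standard-error procedure from the Supplementary Note is not reproduced.

```python
import numpy as np

def relt_snp_vg(y, X, R_snp, threshold=0.05):
    """Thresholded regression estimate of v_g.

    y: phenotype vector; X: covariate matrix whose first column is all ones;
    R_snp: [n x n] SNP relatedness matrix.
    """
    y = np.asarray(y, dtype=float)
    b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares covariate effects
    resid = y - X @ b_hat
    S = np.outer(resid, resid)                      # sample phenotypic covariance
    iu = np.triu_indices(len(y), k=1)               # off-diagonal pairs, each once
    r, s = R_snp[iu], S[iu]
    keep = r <= threshold                           # drop pairs with relatedness > threshold
    design = np.column_stack([np.ones(keep.sum()), r[keep]])
    (intercept, v_g), *_ = np.linalg.lstsq(design, s[keep], rcond=None)
    return v_g
```

Dividing the returned v_g by the variance of the adjusted phenotype would put the estimate on the heritability scale.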

We describe a computational procedure for computing RELT-SNP estimates and their standard errors in the Supplementary Note. This procedure builds on previous work expressing the Haseman-Elston regression as a quadratic form³⁵, which takes into account the dependence between elements of S. We found our standard error estimates to be accurate in simulations, with a mean error of 4.3% across the simulated traits (Supplementary Table 9). The RELT-SNP estimates and standard errors were computed using custom Python code.

For the ‘regional’ trait, RELT-SNP was upwardly biased (h² = 45.7%, 0.23% S.E.) but became approximately unbiased (h² = 39.3%, 0.10% S.E.) when the trait was adjusted for 20 genetic principal components. However, we found that adjustment for 20 genetic principal components resulted in a downward bias for the ‘additive’ trait (h² = 38.7%, 0.09% S.E.). We therefore decided to take an approach where we adjusted for principal components only for those traits that exhibited substantial stratification. For the results in Table 2, we adjusted for 20 principal components only for the traits where the variance explained by the top 20 principal components exceeded 1%: height, age at first child in men and women, and educational attainment. We give results with and without control for principal components for all traits in Supplementary Table 8. The choice of 1% was somewhat arbitrary. Arbitrary decisions about how many principal components to control for are a disadvantage of Kinship, GREML-SNP, and RELT-SNP methods. RDR and RDR-SNP, in contrast, do not require such arbitrary decisions, as they separate genetic and environmental effects in a principled way.

Computing Sib-Regression estimates

To obtain Sib-Regression estimates⁵, we fit the regression model

{(y_{i} - y_{j})}^{2} \sim {[R]}_{i j}

for all i, j such that i and j are full-siblings. We fit the regression model by least-squares using custom R code. The estimate of v_g is then minus one half of the estimated regression coefficient. We compared estimating standard errors by the approximate formula given in the original Sib-Regression paper⁵ (equation 17) and estimating standard errors by treating Sib-Regression as a standard univariate linear regression with uncorrelated observations. For the ‘additive’ simulated trait, both gave almost exactly the same estimated standard error, which underestimated the standard error by approximately 9%. We used standard errors estimated from treating Sib-Regression as a standard univariate linear regression with uncorrelated observations for all other results.
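The factor of minus one half follows because the expected squared sibling difference equals twice the phenotypic variance minus twice the sibling covariance, and the covariance contains the term v_g[R]_ij. A minimal Python sketch of the estimator, with the standard error computed as in an ordinary univariate regression with uncorrelated observations, is given here. The function name is hypothetical, and the sketch stands in for, rather than reproduces, the custom R code mentioned above.

```python
import numpy as np

def sib_regression(y_i, y_j, r_ij):
    """Return (v_g estimate, standard error) from full-sibling pairs.

    y_i, y_j: phenotypes of the two siblings in each pair;
    r_ij: realised IBD relatedness of each pair.
    """
    y_i, y_j, r_ij = (np.asarray(a, dtype=float) for a in (y_i, y_j, r_ij))
    d2 = (y_i - y_j) ** 2
    X = np.column_stack([np.ones_like(r_ij), r_ij])
    coef, *_ = np.linalg.lstsq(X, d2, rcond=None)
    resid = d2 - X @ coef
    sigma2 = resid @ resid / (len(d2) - 2)               # residual variance
    se_slope = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    # v_g is minus one half of the slope; its standard error scales by one half.
    return -0.5 * coef[1], 0.5 * se_slope
```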

Simulations using deCODE data

For all traits other than the ‘rare SNPs’ trait, we used imputed genotypes at 611,173 SNPs from the Illumina Framework SNP set (see above). We filtered the SNPs so that the minimum imputation information was 0.9999, removing around half of the SNPs. Out of the remaining SNPs passing the filter, we randomly sampled 10,000 SNPs to use as the causal SNPs in our simulations. In the 10,000 selected SNPs, the median imputation information was 1.0000, the minimum minor allele frequency (MAF) was 0.52%, and the median MAF was 22.8%. For the ‘rare SNPs’ trait, we randomly sampled SNPs from all imputed SNPs with MAF between 1% and 0.1% and with imputation information at least 0.9999 and p-value for Hardy-Weinberg deviation greater than 0.05. We sampled 100 such SNPs from each chromosome, giving 2,200 SNPs in total.

For each type of trait, we simulated 500 independent replicates. We briefly describe the simulation of the direct, additive genetic component of each trait, which explained 40% of the phenotypic variance. Except for the ‘rare SNPs’ trait, we standardised genotypes so that each SNP’s genotype vector had sample mean zero and sample variance one. Let G represent the matrix of standardised genotypes at the 10,000 causal SNPs. We sampled additive effects of SNPs from a normal distribution. Let β~N(0,I) represent the vector of SNP effects. The additive genetic component, A, was calculated as A = Gβ, and then scaled to explain 40% of the phenotypic variance. For details on simulation of environmental components, see the Supplementary Note.
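A minimal sketch of simulating the direct additive component, using simulated rather than real genotypes. The sample size, number of SNPs and allele frequencies are arbitrary, and the independent normal noise is only a placeholder for the environmental components, whose actual simulation is described in the Supplementary Note.

```python
import numpy as np

rng = np.random.default_rng(1)
n, l, h2 = 1000, 200, 0.4

# Simulated allele counts standing in for the causal SNP genotypes.
freqs = rng.uniform(0.05, 0.5, size=l)
geno = rng.binomial(2, freqs, size=(n, l)).astype(float)

# Standardise each SNP to sample mean zero and sample variance one.
G = (geno - geno.mean(axis=0)) / geno.std(axis=0)

beta = rng.standard_normal(l)        # beta ~ N(0, I)
A = G @ beta                         # additive genetic component
A *= np.sqrt(h2 / A.var())           # rescale so that Var(A) = h2

# Placeholder environment: independent noise with variance 1 - h2 (an assumption).
noise = rng.standard_normal(n)
noise *= np.sqrt((1 - h2) / noise.var())
y = A + noise
```

Because the sample covariance between A and the noise is not exactly zero, A explains approximately, rather than exactly, 40% of the variance of y in any one replicate.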

Simulations in the UK Biobank

To select causal SNPs for phenotype simulation, for each chromosome we randomly sampled 1,500 SNPs then removed those with MAF less than 5% or more than 0.5% missing genotypes. This gave a set of 11,771 SNPs. We mean imputed missing genotypes for both parents and offspring. We simulated 10,000 independent replications of each trait. Let l = 11,771. We standardised offspring genotypes so that the genotypes at each SNP had mean zero and variance 1. Let G be the matrix of standardised offspring genotypes. Here, we describe simulation of the direct, additive genetic component of the traits – for further details, see the Supplementary Note. For each trait, we simulated a normally distributed vector of effects for the l SNPs: β~N(0,0.2l⁻¹I). The additive genetic component of the trait, A, was then calculated as A = Gβ.

Selection of estimates from twin studies

The Swedish Twin Registry¹⁹ is a large sample of twins from a population of similar cultural and genetic composition to Iceland, giving the most precise and valid comparison possible based on published data³⁶^–⁴⁰. The exception is for education, where we used a meta-analysis of Scandinavian twin studies for increased precision²³. For BMI and traits measured from blood, unlike our estimates, the Swedish Twin Registry estimates did not exclude elderly individuals. This is unlikely to account for the higher estimates in the Swedish Twin Registry, as twin correlations and heritability estimates are generally lower in the elderly population².

We took the heritability estimate from the additive-common-environment (ACE) model²^,³ when provided. ACE estimates were not provided for the blood lipid traits, but monozygotic and dizygotic twin correlations were³⁹. We used these to obtain the moment based estimate of the heritability under the ACE model by the formula: 2(r_MZ – r_DZ), where r_MZ is the phenotypic correlation for monozygotic twins, and r_DZ is the phenotypic correlation for dizygotic twins. We took the weighted average of the same-sex and opposite-sex dizygotic twin correlations to estimate r_DZ. For creatinine, the ACE estimate was not provided, and neither were the twin correlations, so we took the published heritability estimate from the ADE model (additive-dominance-environment). The studies used and methods used are summarised in Supplementary Table 6. For height, heritability estimates were only provided for males and females separately, so we took the average estimate. The standard error was not provided. Height and weight estimates were based on self-reported data, whereas our estimates were based on approximately 80% measured and 20% self-reported data. This would be expected to increase our heritability estimates for height and BMI relative to the twin estimates due to a reduction in measurement error. For education, we used a meta-analysis of twin studies in Scandinavian countries, including Sweden, to give a more precise estimate²³. We could not find published estimates based on the Swedish Twin Registry for the haemoglobin traits and for age at first child, so we excluded them from the comparison.
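The moment-based estimate used for the blood lipid traits can be written as a short Python function. The text does not state which weights were used when averaging the same-sex and opposite-sex dizygotic correlations; weighting by the number of twin pairs is an assumption made here, and the function name and inputs are hypothetical.

```python
def ace_moment_h2(r_mz, r_dz_same_sex, n_same_sex, r_dz_opp_sex, n_opp_sex):
    """Moment-based heritability estimate 2 * (r_MZ - r_DZ).

    r_DZ is the average of the same-sex and opposite-sex dizygotic twin
    correlations, weighted by the number of pairs (an assumed weighting).
    """
    r_dz = (n_same_sex * r_dz_same_sex + n_opp_sex * r_dz_opp_sex) / (n_same_sex + n_opp_sex)
    return 2.0 * (r_mz - r_dz)

# Hypothetical correlations chosen only to exercise the function.
h2_example = ace_moment_h2(0.7, 0.4, 100, 0.2, 100)
```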

Acknowledgements

A.I.Y. was supported by a Wellcome Trust Doctoral Studentship (099670/Z/12/Z) for part of this project. A.I.Y. and A.K. were supported by the Li Ka Shing Foundation for part of this project.

Footnotes

Data Availability

The authors declare that the Icelandic data supporting the findings of this study are available within the article, its supplementary information files and upon request. Applications for access to the UK Biobank data can be made on the UK Biobank website: http://www.ukbiobank.ac.uk/register-apply/.

Code Availability

Code used for estimating heritability by RELT-SNP and Sib-Regression is freely available under an MIT license at https://github.com/AlexTISYoung/RDR.

Author Contributions A.I.Y. conceived and designed the study, performed statistical analyses, contributed analysis tools, developed theoretical results, and wrote the paper. M.L.F. performed statistical analyses and contributed analysis tools. D.F.G. contributed analysis tools, processed raw genotype/sequencing data, and collected and processed phenotype data. G.T. contributed analysis tools, and collected and processed phenotype data. G.B. collected and processed phenotype data. P.S. collected and processed phenotype data. G.M. processed raw genotype/sequence data. U.T. supervised generation of genotype/sequence data and phenotype data. K.S. jointly supervised research and wrote the paper. A.K. conceived and designed the study, jointly supervised research, and wrote the paper.

Competing Financial Interests The authors affiliated with deCODE Genetics are/were employed by the company, which is owned by Amgen, Inc: A.I.Y., M.L.F., D.F.G., G.T., G.B., P.S., U.T., K.S., A.K.

URLs

Educational attainment categories: http://uis.unesco.org/en/isced-mappings.

References

1.Sesardic N. Making Sense of Heritability. Cambridge University Press; 2005. [Google Scholar]
2.Polderman TJC, et al. Meta-analysis of the heritability of human traits based on fifty years of twin studies. Nat Genet. 2015;47:702–709. doi: 10.1038/ng.3285. [DOI] [PubMed] [Google Scholar]
3.Boomsma D, Busjahn A, Peltonen L. Classical twin studies and beyond. Nat Rev Genet. 2002;3:872–82. doi: 10.1038/nrg932. [DOI] [PubMed] [Google Scholar]
4.Yang J, Zeng J, Goddard ME, Wray NR, Visscher PM. Concepts, estimation and interpretation of SNP-based heritability. Nat Genet. 2017;49 doi: 10.1038/ng.3941. [DOI] [PubMed] [Google Scholar]
5.Visscher PM, et al. Assumption-free estimation of heritability from genome-wide identity-by-descent sharing between full siblings. PLoS Genet. 2006;2:e41. doi: 10.1371/journal.pgen.0020041. [DOI] [PMC free article] [PubMed] [Google Scholar]
6.Speed D, et al. Reevaluation of SNP heritability in complex human traits. Nat Genet. 2017;49:986–992. doi: 10.1038/ng.3865. [DOI] [PMC free article] [PubMed] [Google Scholar]
7.Zaitlen N, et al. Using Extended Genealogy to Estimate Components of Heritability for 23 Quantitative and Dichotomous Traits. PLoS Genet. 2013;9:e1003520. doi: 10.1371/journal.pgen.1003520. [DOI] [PMC free article] [PubMed] [Google Scholar]
8.Hill WD, et al. Genomic analysis of family data reveals additional genetic effects on intelligence and personality. Mol Psychiatry. 2018;44 doi: 10.1038/s41380-017-0005-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
9.Muñoz M, et al. Evaluating the contribution of genetics and familial shared environment to common disease using the UK Biobank. Nat Genet. 2016;48:980. doi: 10.1038/ng.3618. [DOI] [PMC free article] [PubMed] [Google Scholar]
10.Heckerman D, et al. Linear mixed model for heritability estimation that explicitly addresses environmental variation. Proc Natl Acad Sci. 2016;113:7377–7382. doi: 10.1073/pnas.1510497113. [DOI] [PMC free article] [PubMed] [Google Scholar]
11.Hemani G, et al. Inference of the genetic architecture underlying bmi and height with the use of 20,240 sibling pairs. Am J Hum Genet. 2013;93:865–875. doi: 10.1016/j.ajhg.2013.10.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
12.Baud A, et al. Genetic Variation in the Social Environment Contributes to Health and Disease. PLOS Genet. 2017;13:e1006498. doi: 10.1371/journal.pgen.1006498. [DOI] [PMC free article] [PubMed] [Google Scholar]
13.Bijma P. Estimating indirect genetic effects: Precision of estimates and optimum designs. Genetics. 2010;186:1013–1028. doi: 10.1534/genetics.110.120493. [DOI] [PMC free article] [PubMed] [Google Scholar]
14.Kong A, et al. The nature of nurture: Effects of parental genotypes. Science (80) 2018;359:424–428. doi: 10.1126/science.aan6877. [DOI] [PubMed] [Google Scholar]
15.Spielman RS, McGinnis RE, Ewens WJ. Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM) Am J Hum Genet. 1993;52:506–16. [PMC free article] [PubMed] [Google Scholar]
16.Ewens WJ, Spielman RS. The transmission/disequilibrium test: history, subdivision, and admixture. Am J Hum Genet. 1995;57:455–464. [PMC free article] [PubMed] [Google Scholar]
17.Thomson G. Mapping disease genes: family-based association studies. Am J Hum Genet. 1995;57:487–498.
18.Carey G. Sibling imitation and contrast effects. Behav Genet. 1986;16:319–341. doi: 10.1007/BF01071314.
19.Pedersen NL, Lichtenstein P, Svedberg P. The Swedish Twin Registry in the Third Millennium. Twin Res. 2002;5:427–432. doi: 10.1375/136905202320906219.
20.Bycroft C, et al. Genome-wide genetic data on ~500,000 UK Biobank participants. bioRxiv. 2017. doi: 10.1101/166298.
21.Krapohl E, Plomin R. Genetic link between family socioeconomic status and children’s educational achievement estimated from genome-wide SNPs. Mol Psychiatry. 2016;21:437–443. doi: 10.1038/mp.2015.2.
22.Lee J, et al. Gene discovery and polygenic prediction from a 1.1-million-person GWAS of educational attainment. Nat Genet. (in press). doi: 10.1038/s41588-018-0147-3.
23.Branigan AR, McCallum KJ, Freese J. Variation in the heritability of educational attainment: An international meta-analysis. Soc Forces. 2013;92:109–140.
24.Gudbjartsson DF, et al. Large-scale whole-genome sequencing of the Icelandic population. Nat Genet. 2015;47:435–444. doi: 10.1038/ng.3247.
25.Kong A, et al. Detection of sharing by descent, long-range phasing and haplotype imputation. Nat Genet. 2008;40:1068–75. doi: 10.1038/ng.216.
26.Kong A, et al. Parental origin of sequence variants associated with complex diseases. Nature. 2009;462:868–874. doi: 10.1038/nature08625.
27.Tuvblad C, Grann M, Lichtenstein P. Heritability for adolescent antisocial behavior differs with socioeconomic status: Gene-environment interaction. J Child Psychol Psychiatry Allied Discip. 2006;47:734–743. doi: 10.1111/j.1469-7610.2005.01552.x.
28.Stoolmiller M. Implications of the restricted range of family environments for estimates of heritability and nonshared environment in behavior-genetic adoption studies. Psychol Bull. 1999;125:392. doi: 10.1037/0033-2909.125.4.392.
29.Okbay A, et al. Genome-wide association study identifies 74 loci associated with educational attainment. Nature. 2016. doi: 10.1038/nature17671.
30.Gudbjartsson DF, et al. Many sequence variants affecting diversity of adult human height. Nat Genet. 2008;40:609–15. doi: 10.1038/ng.122.
31.Thorleifsson G, et al. Genome-wide association yields new sequence variants at seven loci that associate with measures of obesity. Nat Genet. 2009;41:18–24. doi: 10.1038/ng.274.
32.Elks CE, et al. Thirty new loci for age at menarche identified by a meta-analysis of genome-wide association studies. Nat Genet. 2010;42:1077–1085. doi: 10.1038/ng.714.
33.Young AI, Durbin R. Estimation of Epistatic Variance Components and Heritability in Founder Populations and Crosses. Genetics. 2014;198:1405–1416. doi: 10.1534/genetics.114.170795.
34.Yang J, Lee SH, Goddard ME, Visscher PM. GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet. 2011;88:76–82. doi: 10.1016/j.ajhg.2010.11.011.
35.Sofer T. Confidence intervals for heritability via Haseman-Elston regression. Stat Appl Genet Mol Biol. 2017;16:259. doi: 10.1515/sagmb-2016-0076.
36.Carlsson S, Ahlbom A, Lichtenstein P, Andersson T. Shared genetic influence of BMI, physical activity and type 2 diabetes: A twin study. Diabetologia. 2013;56:1031–1035. doi: 10.1007/s00125-013-2859-3.
37.Silventoinen K, et al. Heritability of adult body height: a comparative study of twin cohorts in eight countries. Twin Res. 2003;6:399–408. doi: 10.1375/136905203770326402.
38.Baker JH, Thornton LM, Bulik CM, Kendler KS, Lichtenstein P. Shared genetic effects between age at menarche and disordered eating. J Adolesc Health. 2012;51:491–496. doi: 10.1016/j.jadohealth.2012.02.013.
39.Rahman I, et al. Genetic dominance influences blood biomarker levels in a sample of 12,000 Swedish elderly twins. Twin Res Hum Genet. 2009;12:286–294. doi: 10.1375/twin.12.3.286.
40.Arpegård J, et al. Comparison of heritability of Cystatin C- and creatinine-based estimates of kidney function and their relation to heritability of cardiovascular disease. J Am Heart Assoc. 2015;4:e001467. doi: 10.1161/JAHA.114.001467.

Supplementary Materials

Supplementary figures: NIHMS78471-supplement-Supplementary_figures.docx (615.1 KB, docx)

Supplementary information: NIHMS78471-supplement-Supplementary_information.pdf (966.7 KB, pdf)

