Abstract
To further illuminate the potential role of dominant genetic variation in the “missing heritability” debate, we investigated the additive (narrow-sense heritability, h2) and dominant (δ2) genetic variance for 18 human complex traits. Within the same study base (10,682 Swedish twins), we calculated and compared the estimates from the classic twin-based structural equation model with those from the SNP-based genomic-relatedness-matrix restricted maximum likelihood [GREML(d)] method. Contributions of δ2 were evident for 14 traits in twin models (average δ2twin = 0.25, range 0.14–0.49), two of which also displayed significant δ2 in the GREMLd analyses (triglycerides δ2SNP = 0.28 and waist circumference δ2SNP = 0.19). On average, the proportion of h2SNP/h2twin was 70% for ADE-fitted traits (for which the best-fitting model included additive and dominant genetic and unique environmental components) and 31% for AE-fitted traits (for which the best-fitting model included additive genetic and unique environmental components). Independent evidence for contribution from shared environment, also in ADE-fitted traits, was obtained from self-reported within-pair contact frequency and age at separation. We conclude that although additive genetics appears to constitute the bulk of genetic influences for most complex traits, dominant genetic variation might often be masked by shared environment in twin and family studies and might therefore have a more prominent role than what family-based estimates often suggest. The risk of erroneously attributing all inherited genetic influences (additive and dominant) to the h2 in too-small twin studies might also lead to exaggerated “missing heritability” (the proportion of h2 that remains unexplained by SNPs).
Introduction
Heritability is a concept used to denote the relative importance of genetic influences to variability of diseases or complex traits and is loosely defined as the proportion of the phenotypic variance attributed to inherited genetic effects.1 Several methods can be used to estimate heritability. They are based either on modeling of family correlations in related subjects2 (distributions of trait similarities among various types of relatives) or on molecular measurements in related or unrelated subjects.3, 4 The classic twin study, often implemented using structural equation modeling (SEM), is the most commonly used family-based approach. Observed intra-pair correlations among genetically identical, monozygotic (MZ) twins and fraternal, dizygotic (DZ) twins are contrasted in order to partition the phenotypic variance into additive (A) genetic variance—so called narrow-sense heritability (h2), dominant genetic variance (D), and shared (C) and non-shared (E) environmental variance.2, 5 The sum of additive and dominant genetic proportions of variance is often referred to as the broad-sense heritability. As in any family-based modeling, classic twin studies rely on certain important assumptions, the most debated being that MZ and DZ twins share their raising environment to the same extent.
A further complication in the classic twin model is that C and D cannot be estimated simultaneously. This is because the model is under-informed to allow quantification of more than one source of deviance from pure additivity, even if more than one exists. It follows that whenever D is indicated in the twin model, this does lend support to a contribution from D, but the magnitude will represent the total net deviance from a pure additive genetic model. Positive contributions to this deviance will stem from dominance (interactions between alleles within the same locus), epistasis (interactions between different loci), as well as other types of higher-order interactions, whereas negative contributions will arise from shared environmental factors. Thus, contributions from both C and D components might very well coexist but “mask” each other, so that the net effect appears as a contribution from neither.
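The masking argument can be made concrete with the textbook expectations for twin correlations. The block below is a sketch under the usual simplifying assumptions (random mating, no epistasis, equal environments for MZ and DZ pairs); the symbols a^2, d^2, and c^2 for the standardized additive, dominant, and shared environmental variance components are notation introduced here for illustration.

```latex
\begin{align}
r_{MZ} &= a^2 + d^2 + c^2, \\
r_{DZ} &= \tfrac{1}{2}a^2 + \tfrac{1}{4}d^2 + c^2, \\
r_{MZ} - 2\,r_{DZ} &= \tfrac{1}{2}d^2 - c^2 .
\end{align}
```

Under these assumptions the observable deviance from additivity, r_MZ − 2r_DZ, is positive when d^2/2 exceeds c^2 and negative when c^2 exceeds d^2/2. When c^2 = d^2/2 it vanishes, so the data look purely additive even though both components are present.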
Recent methodological developments offer alternatives to estimate heritability via SNPs. When the modeling is restricted to significantly associated loci identified from genome-wide association studies, it typically accounts for only a minute proportion of the h2 estimated from twin or pedigree studies, a phenomenon originally denoted “missing heritability.”6 By extending the models to utilize contributions from all common SNPs, SNP-based methods like the genomic-relatedness-matrix restricted maximum likelihood (GREML) algorithm implemented in genome-wide complex trait analysis (GCTA) can detect considerable shares of the h2 (typically ∼30%–50%).4, 7 The remainder is what now usually is considered to make up the “missing heritability.”
Recently, Zhu et al. estimated dominant genetic variance (δ2) for human complex traits, by applying an extension of GREML algorithm, called GREMLd.8 The authors observed significant contributions of δ2 in subsets of traits and samples and estimated the global average contribution to be 1/5th of the contribution from A and therefore concluded that dominant genetic variation contributes little to the “missing heritability.” Here we investigate the heritability of 18 robustly measured human complex traits including blood biomarkers of cardiovascular disease, kidney function, and diabetes mellitus, as well as three anthropometric reference traits. Our aim was to further illuminate the potential role of dominant genetic variation in the “missing heritability” discussion, by comparing the estimates from both twin-based SEM and SNP-based GREML(d) within the same study base (10,682 Swedish twins).
Material and Methods
Study Population
The subjects of this study (n = 10,682) have all participated in the TwinGene project,9 a Swedish population-based cohort of twins born between 1911 and 1958. Their average age at phenotypic measurements was 65 (±8) years, and the participants had previously taken part in a computer-assisted telephone interview, Screening Across the Lifespan Twin study (SALT), undertaken between 1998 and 2002. Both of these projects were approved by the local ethics committee at Karolinska Institutet, and all participants have given their informed consent. Zygosity was determined by DNA markers (57% of the study sample) or self-reported childhood resemblance. There were 2,499 monozygotic, 4,154 same-sex dizygotic (SSDZ), and 4,029 opposite-sex dizygotic (OSDZ) twins, totalling 5,074 (48%) men and 5,608 women. Descriptive statistics are shown in Table S1.
Trait Measurements
All physically measured quantitative phenotypes available to all participants in the TwinGene project were investigated. Blood was collected after overnight fasting at a local health-care facility in the morning from Monday to Thursday, to ensure that the tubes with serum would be sent to Karolinska Institutet Biobank before the weekend by overnight post. Samples were stored at −80°C awaiting clinical chemistry and immunological assays. Total cholesterol, triglyceride, low- and high-density lipoproteins, apolipoprotein A1 and B, hemoglobin, C-reactive protein, and glucose were measured by routine methods on a semi-automated biochemistry analyzer (Beckman Coulter). Glycosylated hemoglobin A1c was measured by ion exchange chromatography; immunoglobulin A was measured by a reverse-phase protein microarray; cystatin C was measured by particle reinforced immune-turbidimetric analysis using an Architect ci8200 immunoassay analyzer; creatinine was measured by an enzymatic method on Architect c8000 and Architect c16000 analyzers (Abbott); and glomerular filtration rate was calculated as 79.901 × (cystatin C, mg/l)^−1.4389. Height, weight, and waist circumference were measured without shoes and in light clothing. Body mass index (BMI) was calculated as BMI = weight (kg)/height (m)^2. The unit and distribution of each trait in different gender and zygosity subgroups are reported in Tables S2 and S3.
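The two derived traits follow directly from the formulas quoted above. A minimal sketch in Python; the function names are illustrative and not taken from the study's code.

```python
def egfr_from_cystatin_c(cystatin_c_mg_per_l: float) -> float:
    """Estimated GFR as stated in the text: 79.901 * (cystatin C, mg/l) ** -1.4389."""
    if cystatin_c_mg_per_l <= 0:
        raise ValueError("cystatin C must be positive")
    return 79.901 * cystatin_c_mg_per_l ** -1.4389


def body_mass_index(weight_kg: float, height_m: float) -> float:
    """BMI = weight (kg) / height (m) squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2
```

Because the exponent is negative, a higher cystatin C concentration maps to a lower estimated filtration rate.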
Genotyping
For each individual, 7 ml whole blood was collected in an EDTA tube and genomic DNA was extracted with the Puregene extraction kit (Gentra Systems) and subsequently stored at −20°C. Subjects with DNA concentration less than 20 ng/μl, as well as a set of 302 female monozygotic pairs participating in a previous genome-wide genotyping effort, were excluded. DNA from all remaining dizygotic individuals and from one twin within each available monozygotic twin pair (in total, n = 9,896) was sent for genotyping with the Illumina OmniExpress bead chip (700K). Quality control was performed and exclusions of samples and SNPs were done according to the following criteria: genotype missingness > 0.03, individual missingness > 0.03, minor allele frequency < 0.01, Hardy-Weinberg equilibrium p value < 10^−7, sex mismatch, heterozygosity (individuals with an F-statistic larger than five standard deviations from the sample mean), cryptic (unknown) relatedness, or phenotypic information missing on more than five traits. Finally, 9,606 individuals and 644,556 SNPs remained.
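The numeric exclusion thresholds listed above can be written as simple predicates. This is a sketch only: the thresholds come from the text, while the function names and the assumption that per-SNP and per-sample summary statistics are already computed are illustrative. Sex mismatch, relatedness, and phenotype missingness checks are not covered.

```python
from statistics import mean, stdev

MAX_MISSINGNESS = 0.03   # applies to both SNPs and individuals
MIN_MAF = 0.01           # minor allele frequency
MIN_HWE_P = 1e-7         # Hardy-Weinberg equilibrium p value
HET_SD_LIMIT = 5         # allowed distance of F from the sample mean, in SDs


def snp_passes_qc(missing_rate: float, maf: float, hwe_p: float) -> bool:
    """Keep a SNP only if it violates none of the three SNP-level criteria."""
    return (missing_rate <= MAX_MISSINGNESS
            and maf >= MIN_MAF
            and hwe_p >= MIN_HWE_P)


def sample_passes_missingness(missing_rate: float) -> bool:
    """Keep an individual whose genotype missingness does not exceed 0.03."""
    return missing_rate <= MAX_MISSINGNESS


def heterozygosity_outliers(f_stats):
    """Indices of individuals whose F-statistic lies more than 5 SD from the mean."""
    mu = mean(f_stats)
    sd = stdev(f_stats)
    return [i for i, f in enumerate(f_stats) if abs(f - mu) > HET_SD_LIMIT * sd]
```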
Data Handling
Data handling, descriptive statistics, covariate adjustment, and normalization were performed in SAS v.9.4 (SAS Institute). The difference in means between males and females was tested for each trait by t test. Raw values of each trait were adjusted for age, sex, and the first ten principal components based on genotypes (9,617 individuals and 644,556 SNPs that passed genotyping QC) in linear regression models, then residuals from the regression were rank order normalized, resulting in standard normal distributions.
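The final normalization step can be illustrated with a rank-based inverse normal transform. This is a sketch, not the SAS procedure used in the study: the text does not say how ranks were converted to normal scores, so the (rank − 0.5)/n convention and the average-rank handling of ties below are assumptions.

```python
from statistics import NormalDist


def rank_inverse_normal(values, offset=0.5):
    """Map values to standard normal scores via their ranks.

    Ties receive the average of the ranks they span. The offset of 0.5 is an
    assumed convention; other offsets are in common use.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - offset) / n) for r in ranks]
```

In the workflow described above, the input would be the residuals from the covariate regression rather than the raw trait values.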
Twin-Based SEM
Twin order was randomly assigned, singletons and pairs with missing values for more than five traits were removed, and finally 3,870 complete twin pairs (1,088 MZ, 1,443 SSDZ, and 1,339 OSDZ) were used in twin-based analyses. Structural equation modeling of the observed covariance in MZ and DZ twin pairs was performed to find maximum likelihood estimates for additive genetic effects (A; the sum of the effects of individual loci), dominant genetic effects (D; interactions between alleles within the same locus), common/shared environmental effects (C; contributes to the similarities between relatives living together), and unique/non-shared environmental effects (E; specific to individual, contributes to the dissimilarities between family members), contributing to the variance within, and covariance between, individuals for each phenotype. The Akaike information criterion10 was used to compare the goodness of fit of ACE (a model including A, C, and E), ADE (including A, D, and E), and AE (including A and E) models to find the most parsimonious model. The narrow- and broad-sense heritabilities (h2 and H2) were estimated, corresponding to the proportion of phenotypic variance attributable to additive genetic variance, or additive plus dominance genetic variances, respectively. All twin-based analyses were performed with the OpenMx package11 in R.
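The model-selection step can be sketched with the standard definition AIC = −2 ln L + 2k, where k is the number of free parameters. The fit values and parameter counts below are hypothetical placeholders, not results from this study, and the snippet does not reproduce how OpenMx reports its fit statistics.

```python
def aic(minus2_log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion from -2 log-likelihood and parameter count."""
    return minus2_log_likelihood + 2 * n_params


def best_model(fits):
    """Return the model name with the lowest AIC.

    fits maps a model name to a (minus2_log_likelihood, n_params) tuple.
    """
    return min(fits, key=lambda name: aic(*fits[name]))


# Hypothetical fits for one trait (illustrative numbers only).
example_fits = {
    "ACE": (1000.0, 4),
    "ADE": (995.2, 4),
    "AE": (1000.0, 3),
}
selected = best_model(example_fits)
```

In this made-up example the ADE model is selected: dropping D costs more in likelihood than is gained by saving one parameter.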
SNP-Based GREML(d)
GREML(d) was implemented in GCTA to estimate h2 and δ2 via comparison of empirical genetic resemblance of unrelated individuals, based on identity-by-state when all genome-wide common SNPs are fitted as random effects in a mixed linear model. One twin in each complete dizygotic pair (both same-sex and opposite-sex) was randomly removed from the 9,606 individuals with both genotypes and phenotypes available. The remaining 6,812 individuals were used in the first step of generating the pairwise additive genomic relationship matrix. Subsequently, one individual within each more distantly related pair (cut-off value > 0.025, corresponding to a relatedness between second and third cousins) was removed and the dominant genomic relationship matrix was generated based on these 5,779 unrelated individuals (1,185 MZ, 2,316 SSDZ, and 2,278 OSDZ twins). For each phenotype, restricted maximum likelihood was used to estimate the variance explained by all SNPs. Both the SNP-based additive genetic variance, the so-called “chip heritability” (h2SNP), and the dominant genetic variance (δ2SNP) were estimated.
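To make the two relationship matrices concrete, the sketch below computes single off-diagonal entries in plain Python. The formulas are assumptions in the sense that this text does not state them: the additive entry uses the commonly cited GCTA standardization, and the dominance entry uses the 0, 2p, 4p − 2 genotype coding described for GREMLd by Zhu et al. The greedy pruning is illustrative and is not claimed to match how GCTA chooses which individual to drop.

```python
def allele_frequencies(genotypes):
    """Per-SNP frequency of the counted allele.

    genotypes: list of individuals, each a list of 0/1/2 allele counts.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    return [sum(ind[i] for ind in genotypes) / (2 * n) for i in range(m)]


def additive_grm_entry(x_j, x_k, freqs):
    """Off-diagonal additive relationship between individuals j and k.

    Assumes every SNP is polymorphic (0 < p < 1).
    """
    total = 0.0
    for a, b, p in zip(x_j, x_k, freqs):
        total += (a - 2 * p) * (b - 2 * p) / (2 * p * (1 - p))
    return total / len(freqs)


def dominance_code(g, p):
    """Dominance coding: 0, 2p, 4p - 2 for 0, 1, 2 copies of the counted allele."""
    return (0.0, 2 * p, 4 * p - 2)[g]


def dominance_grm_entry(x_j, x_k, freqs):
    """Off-diagonal dominance relationship; coded values have mean 2p^2."""
    total = 0.0
    for a, b, p in zip(x_j, x_k, freqs):
        dev_a = dominance_code(a, p) - 2 * p * p
        dev_b = dominance_code(b, p) - 2 * p * p
        total += dev_a * dev_b / (4 * p * p * (1 - p) ** 2)
    return total / len(freqs)


def prune_related(grm, cutoff=0.025):
    """Greedily keep individuals whose relatedness to all kept ones is <= cutoff."""
    keep = []
    for j in range(len(grm)):
        if all(grm[j][k] <= cutoff for k in keep):
            keep.append(j)
    return keep
```

The default `cutoff=0.025` mirrors the threshold quoted in the text; everything else is a placeholder for what GCTA does internally on the full SNP set.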
Contact Frequency and Age at Separation within Twin Pairs
Self-reported intra-pair contact frequency (the frequency by which the twins in a pair said they met in person) and age at separation from the co-twin constitute independent measures of the degree of shared environment within twin pairs. Such measures were available from the SALT interview for more than 90% of the complete twin pairs. The replies to questions about how often the participants usually met with their co-twin were divided into four levels: (1) less than once a year, (2) on a yearly basis, (3) on a monthly basis, or (4) on a weekly basis; the measure is here used as a continuous variable. When answers were available from both twins, the within-pair average value was calculated and used for each pair. Violation of the equal environment assumption was tested by comparing means (t test) between MZ and DZ twins in contact frequency/age at separation. To test whether the degree of shared environment was related to within-pair trait similarity, the correlation between contact frequency/age at separation and absolute intra-pair difference in adjusted trait levels was examined among MZ twins. MZ intra-pair correlation stratified by level of shared environment was also estimated for each trait. For contact frequency, we divided all pairs into low (defined as ≤3, monthly or less often) and high (defined as >3, more than monthly) groups, and for age at separation we divided the pairs by the median value.
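The pair-level analysis described above can be sketched as follows: average the two contact-frequency replies, take the absolute within-pair trait difference, and correlate the two across MZ pairs with a rank correlation. Function names are illustrative, and falling back to a single reply when only one twin answered is an assumption that the text does not confirm.

```python
from math import sqrt


def pair_contact(reply_twin1, reply_twin2):
    """Within-pair contact frequency (1-4 scale): mean of the available replies."""
    replies = [r for r in (reply_twin1, reply_twin2) if r is not None]
    if not replies:
        raise ValueError("no reply from either twin")
    return sum(replies) / len(replies)


def abs_pair_difference(trait_twin1, trait_twin2):
    """Absolute intra-pair difference in the adjusted trait level."""
    return abs(trait_twin1 - trait_twin2)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)
```

A negative coefficient from `spearman(contact, abs_diff)` would mean that pairs with more contact tend to have smaller trait differences.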
Results
The classic twin model indicated contributions of dominant genetic effects (ADE was the best-fitting model) for 14 out of the 18 traits, with an average δ2twin of 0.25 (Table 1). In two of these traits (triglycerides and waist circumference), we also observed significant dominance in the SNP-based GREMLd model. Notably, the significant δ2SNP of waist circumference observed in the ARIC cohort from the paper of Zhu et al.8 was successfully replicated in our data (δ2SNP = 0.19, 95% CI 0.01–0.37, p = 0.01). The large estimate of δ2SNP observed for triglycerides was not seen in the corresponding δ2twin, possibly due to chance in the sampling and the fact that the GREML(d) and the twin-based models are independent methods since they rely on different contrasts. GREMLd also indicated contributions from dominance for six additional traits, but all were non-significant (Table S4), possibly because the sample size of unrelated individuals was too small for sufficient power.
Table 1.
Twin-Based SEM [a] columns: rMZ, rDZ, Best M, h2twin, and δ2twin with their 95% CIs. SNP-Based GREML(d) [b] columns: h2SNP and δ2SNP with their 95% CIs.
Trait | rMZ | rDZ | Best M | h2twin | 95% CI (h2twin) | δ2twin | 95% CI (δ2twin) | h2SNP | 95% CI (h2SNP) | δ2SNP | 95% CI (δ2SNP) | h2SNP/h2twin | h2SNP/H2twin [c] |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
TC | 0.48 | 0.19 | ADE | 0.28 | (0.13,0.43) | 0.19 | (0.03,0.36) | 0.15 | (0.03,0.27) | 0.00 | (0.00,0.18) | 54% | 32% |
LDL | 0.46 | 0.18 | ADE | 0.23 | (0.08,0.38) | 0.24 | (0.07,0.41) | 0.16 | (0.04,0.28) | 0.00 | (0.00,0.18) | 70% | 34% |
Apolipoprotein B | 0.52 | 0.23 | ADE | 0.39 | (0.25,0.53) | 0.14 | (0.00,0.30) | 0.14 | (0.02,0.26) | 0.00 | (0.00,0.18) | 36% | 26% |
Triglyceride | 0.55 | 0.24 | ADE | 0.42 | (0.27,0.55) | 0.14 | (0.00,0.30) | 0.31 | (0.19,0.43) | 0.28 | (0.10,0.46) [d] | 74% | 55% |
C-reactive protein | 0.42 | 0.19 | ADE | 0.30 | (0.15,0.44) | 0.14 | (0.00,0.31) | 0.37 | (0.25,0.49) | 0.00 | (0.00,0.18) | 123% | 84% |
Glucose | 0.51 | 0.20 | ADE | 0.24 | (0.09,0.38) | 0.30 | (0.15,0.46) | 0.17 | (0.05,0.29) | 0.15 | (0.00,0.33) | 71% | 31% |
HbA1c | 0.69 | 0.28 | ADE | 0.37 | (0.24,0.51) | 0.35 | (0.21,0.49) | 0.20 | (0.08,0.32) | 0.00 | (0.00,0.18) | 54% | 28% |
Hemoglobin | 0.55 | 0.24 | ADE | 0.41 | (0.26,0.55) | 0.15 | (0.00,0.30) [e] | 0.21 | (0.09,0.33) | 0.00 | (0.00,0.18) | 51% | 38% |
Cystatin C | 0.57 | 0.26 | ADE | 0.42 | (0.28,0.56) | 0.18 | (0.03,0.34) | 0.27 | (0.15,0.39) | 0.05 | (0.00,0.23) | 64% | 45% |
Creatinine | 0.58 | 0.24 | ADE | 0.35 | (0.21,0.50) | 0.24 | (0.09,0.40) | 0.18 | (0.06,0.30) | 0.01 | (0.00,0.19) | 51% | 31% |
eGFR | 0.57 | 0.24 | ADE | 0.38 | (0.23,0.52) | 0.21 | (0.05,0.36) | 0.32 | (0.20,0.44) | 0.03 | (0.00,0.21) | 84% | 54% |
Body mass index | 0.68 | 0.24 | ADE | 0.28 | (0.13,0.42) | 0.41 | (0.26,0.56) | 0.21 | (0.09,0.33) | 0.02 | (0.00,0.20) | 75% | 30% |
Weight | 0.73 | 0.27 | ADE | 0.37 | (0.23,0.51) | 0.35 | (0.21,0.50) | 0.26 | (0.14,0.38) | 0.11 | (0.00,0.29) | 70% | 36% |
WC | 0.63 | 0.20 | ADE | 0.15 | (0.01,0.29) | 0.49 | (0.34,0.65) | 0.16 | (0.04,0.28) | 0.19 | (0.01,0.37) [d] | 107% | 25% |
ADE-Average | – | – | – | 0.33 | – | 0.25 | – | 0.22 | – | 0.06 (0.24) [d] | – | 70% | 39% |
HDL | 0.67 | 0.31 | AE | 0.66 | (0.63,0.69) | – | – | 0.24 | (0.12,0.36) | 0.01 | (0.00,0.19) | 36% | – |
Apolipoprotein A1 | 0.65 | 0.34 | AE | 0.66 | (0.63,0.68) | – | – | 0.17 | (0.05,0.29) | 0.09 | (0.00,0.27) | 26% | – |
AE-Average | – | – | – | 0.66 | – | – | – | 0.21 | – | 0.05 | – | 31% | – |
Immunoglobulin A | 0.43 | 0.28 | ACE | 0.40 | (0.29,0.51) | 0.07 [f] | (0.00,0.15) | 0.24 | (0.12,0.36) | 0.00 | (0.00,0.18) | 60% | – |
Height | 0.87 | 0.48 | ACE | 0.77 | (0.71,0.83) | 0.09 [f] | (0.04,0.15) | 0.62 | (0.50,0.74) | 0.00 | (0.00,0.18) | 81% | – |
ACE-Average | – | – | – | 0.59 | – | 0.08 [f] | – | 0.43 | – | 0.00 | – | – | – |
Abbreviations are as follows: TC, total cholesterol; WC, waist circumference; LDL and HDL, low- and high-density lipoproteins; HbA1c, glycosylated hemoglobin A1c; eGFR, estimated glomerular filtration rate (machine-based calculation from cystatin C); rMZ and rDZ, coefficients of intra-pair correlations within monozygotic and dizygotic twin pairs; Best M, the best-fitting model for each trait according to Akaike information criterion; h2twin and δ2twin, additive and dominant genetic variance estimated from twin model; 95% CI, 95% confidence interval; h2SNP and δ2SNP, additive and dominant genetic variation estimated from SNP model.
[a] Estimates from classical twin-based structural equation model (SEM) including 3,870 twin pairs.
[b] Estimates from directly genotyped SNPs of 5,779 unrelated individuals in genomic-relatedness-matrix restricted maximum likelihood [GREML(d)] method.
[c] H2 indicates broad-sense heritability including both additive (h2) and dominant (δ2) genetic variance.
[d] Value in parentheses equals the average of the two significant estimates.
[e] Non-significant.
[f] Shared environmental components estimated in ACE model.
In the classic twin analyses, the AE model was the best-fitting model for high-density lipoprotein and apolipoprotein A1, whereas the ACE model was the best-fitting model for immunoglobulin A and height (Table S5). For the h2SNP estimated from GREML, the means were very similar between the ADE- and AE-fitted traits, equal to 0.22 and 0.21, respectively. This was not supported by results from the twin model, in which the average h2twin of ADE-fitted traits was half as big as the average h2twin of AE-fitted traits.
The proportion of h2SNP/h2twin indicates how much of the A component estimated from the twin-based model is explained by the common SNP-based model. A large proportion, on average 70% of h2twin, was captured by h2SNP for ADE-fitted traits, whereas for the two AE-fitted traits, h2SNP explained only 31% of the h2twin, similar to the proportions generally reported in previous studies. If previous studies were underpowered to identify significant dominance components, they might instead have attributed it to the additive component in a more parsimonious AE model. This is similar to the finding for high-density lipoprotein in the current study; the pattern of the intra-pair correlations (rMZ > 2rDZ) indicated presence of dominance (Table S6), but we appear underpowered to declare it significant. If we mimic this situation and adopt AE models for all the D-influenced traits (i.e., let A be the sum of A and D components), the average value of h2SNP/H2twin would decrease to 39%. For the sake of completeness, we also performed a sex-limitation SEM that estimated the heritability by gender, but by doing so the power to identify dominant effects decreased and AE model became the best-fitted model for several traits (Tables S7 and S8). For creatinine and GFR, there are pronounced differences in variance components estimates between males and females, which is in agreement with a previous report from the same study base.12
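The two ratios discussed above are simple to reproduce from the point estimates in Table 1. A minimal sketch using total cholesterol and waist circumference as examples; the function names are illustrative, and small discrepancies for other traits can arise because the table reports rounded estimates.

```python
def proportion_captured(h2_snp: float, h2_twin: float, d2_twin: float = 0.0) -> float:
    """Share of twin-based heritability captured by common SNPs.

    With d2_twin = 0 this is h2SNP/h2twin; passing the twin-based dominance
    estimate gives h2SNP/H2twin, i.e., the ratio obtained when A absorbs D.
    """
    return h2_snp / (h2_twin + d2_twin)


def missing_heritability(h2_snp: float, h2_twin: float) -> float:
    """Proportion of h2twin left unexplained by SNPs: 1 - h2SNP/h2twin."""
    return 1.0 - proportion_captured(h2_snp, h2_twin)


# Total cholesterol (Table 1): h2SNP = 0.15, h2twin = 0.28, d2twin = 0.19
tc_narrow = round(100 * proportion_captured(0.15, 0.28))        # 54
tc_broad = round(100 * proportion_captured(0.15, 0.28, 0.19))   # 32

# Waist circumference (Table 1): h2SNP = 0.16, h2twin = 0.15, d2twin = 0.49
wc_narrow = round(100 * proportion_captured(0.16, 0.15))        # 107
wc_broad = round(100 * proportion_captured(0.16, 0.15, 0.49))   # 25
```

The drop from 54% to 32% for total cholesterol shows how folding D into A inflates the denominator and, with it, the apparent missing heritability.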
Using self-reported measures of degree of shared environment with MZ co-twin, we found independent evidence for influences from shared environment for a subset of traits. Self-reported contact frequency and the number of years spent together before separation were both significantly higher for MZ than for SSDZ and OSDZ twins (Table S9). Together these results indicate violations of the equal environment assumption as a potential problem. In order to get a sense of the magnitude of bias such a violation might be associated with, we calculated the MZ intra-pair correlations stratified by level of shared environment in a high versus a low group (Table S10). Even though the levels of shared environments were considerably larger in the high group, 1.8 standard deviation (SD) for contact frequency and 1.4 SD for age at separation, the trait level similarity was influenced only modestly with rMZ estimates on average ∼0.05 larger in the high group.
Contact frequency was weakly albeit significantly related to absolute within MZ-pair difference in high-density lipoproteins, body mass index, weight, and waist circumference (Table 2). Similar results were obtained when using age at separation as the indicator of degree of shared environment, with significant correlations observed with absolute within MZ-pair difference in high-density lipoproteins, body mass index, and weight. These evaluations are restricted to MZ twins since the aim is to test the relation between degree of shared environment and intra-pair trait similarity un-confounded by genetic influences. In DZ pairs the genetic sharing will differ between pairs, which means that the correlations are not straightforward to interpret.
Table 2.
Trait | Contact Frequency r | Contact Frequency npair | Contact Frequency p | Separation Age r | Separation Age npair | Separation Age p |
---|---|---|---|---|---|---|
Total cholesterol | 0.004 | 1,066 | 0.90 | 0.014 | 1,027 | 0.66 |
High-density lipoprotein | −0.081* | 1,066 | 0.01 | −0.064* | 1,027 | 0.04 |
Low-density lipoprotein | 0.017 | 1,044 | 0.59 | −0.010 | 1,007 | 0.74 |
Apolipoprotein A1 | −0.060 | 1,065 | 0.05 | −0.052 | 1,026 | 0.09 |
Apolipoprotein B | −0.010 | 1,065 | 0.75 | 0.031 | 1,026 | 0.32 |
Triglyceride | −0.047 | 1,066 | 0.13 | −0.026 | 1,027 | 0.40 |
C-reactive protein | −0.028 | 1,065 | 0.35 | 0.016 | 1,026 | 0.62 |
Glucose | −0.032 | 1,066 | 0.29 | −0.008 | 1,027 | 0.79 |
Glycosylated hemoglobin A1c | −0.052 | 1,065 | 0.09 | −0.036 | 1,026 | 0.25 |
Hemoglobin | 0.005 | 1,063 | 0.87 | −0.026 | 1,024 | 0.40 |
Cystatin C | −0.044 | 1,029 | 0.16 | 0.029 | 993 | 0.37 |
Creatinine | −0.035 | 1,029 | 0.26 | 0.019 | 993 | 0.55 |
Glomerular filtration rate | −0.031 | 1,029 | 0.32 | 0.009 | 993 | 0.79 |
Immunoglobulin A | 0.008 | 1,058 | 0.79 | 0.002 | 1,020 | 0.95 |
Body mass index | −0.071* | 1,061 | 0.02 | −0.089* | 1,022 | <0.01 |
Weight | −0.085* | 1,064 | 0.01 | −0.063* | 1,025 | 0.04 |
Waist circumference | −0.083* | 1,063 | 0.01 | −0.048 | 1,024 | 0.12 |
Height | −0.020 | 1,063 | 0.52 | −0.033 | 1,024 | 0.30 |
Abbreviations are as follows: r, Spearman correlation coefficient for the correlation between absolute intra-pair difference and contact frequency/separation age for each trait; npair, number of twin pairs; p, p values. Significant estimates are marked with an asterisk.
Discussion
Our results from both the classic twin-based and the common SNP-based models lend support to a more prominent role of dominant genetic variation than previous studies generally have reported for similar traits. The large size of the study and advanced age of participants might be contributing factors of importance for this finding. The results also highlight the potential risk of systematic ignorance of deviances from pure additivity in smaller twin studies.
Since heritability by definition is population specific, comparisons of estimates obtained from twin-based and genome-wide common SNP-based models can be achieved reliably only when both types of analyses are performed within the same study base. With the large number of genotyped twin pairs available in TwinGene, it is well suited for such comparisons. We attribute the unusual detection of significant δ2twin for a majority of traits in our sample to the increased power to discriminate A from D that comes with the large sample of twins of older age. This view is supported by findings from other unusually large population-based studies, such as a recent Dutch twin-family study on blood biomarker levels and metabolic syndrome traits, which detected significant D effects that increased with age.13
Because twins represent only a small fraction of the population, the sample size in twin studies is usually limited and most previous studies have included fewer than 1,000 twin pairs, which might provide inadequate power to declare significant contributions from variance components indicating deviance from additivity (i.e., C and D). Instead, in smaller studies the dominant genetic effects or shared environmental effects are typically attributed to additive genetics in more parsimonious AE models. In a recent very large meta-analysis of estimates from twin studies,14 the vast majority of investigated traits were reported to be consistent with a simple AE model in which all twin similarity was attributed to A, with the remaining variance explained by non-shared environmental factors. However, close to 50% of all reported joint rMZ and rDZ estimates actually showed a pattern of deviance supporting D (rMZ > 2rDZ). Still, D was handled as a part of A (A = D + A) in all such traits. This is similar to what we would observe if we were to split the large TwinGene material into several smaller samples; we would then be unable to declare D significant (due to lack of power) and instead report AE as the best-fitting model, and consequently attribute D to the A component.
Twin studies represent the classic design to disentangle the genetic and environmental contributions to familial aggregation/correlation. The relative importance of genetic and environmental effects is estimated by decomposing the total variance into different components (A, C or D, and E). The decomposition relies on important assumptions: MZ share 100% while DZ share on average 50% of their inherited genome, twins within MZ and DZ pairs share the raising environment to the same extent (equal environment assumption), and there is no correlation or interaction between genes and environment.2 For traits in which the familial aggregation is solely due to additive genetic effects, the MZ correlation is expected to be exactly twice the DZ correlation (rMZ = 2rDZ). If there are additional influences from shared environmental effects (C), the additive pattern becomes distorted by making DZ more similar to MZ twins (i.e., rMZ < 2rDZ). When the deviance goes in the other direction (rMZ > 2rDZ), dominant genetic effects (D) are usually assumed to cause the distortion from pure additivity.
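The correlation patterns described in this paragraph translate into simple moment-based estimates. The sketch below uses the textbook expectations for standardized components under the stated assumptions (for ADE: rMZ = a2 + d2 and rDZ = a2/2 + d2/4; for ACE: rMZ = a2 + c2 and rDZ = a2/2 + c2). It is a back-of-the-envelope check, not the maximum likelihood SEM used in the study, and the function names are illustrative.

```python
def indicated_model(r_mz: float, r_dz: float) -> str:
    """Which deviance from additivity the correlation pattern points to."""
    if r_mz > 2 * r_dz:
        return "ADE"
    if r_mz < 2 * r_dz:
        return "ACE"
    return "AE"


def ade_from_correlations(r_mz: float, r_dz: float):
    """Solve the ADE expectations for (a2, d2, e2)."""
    a2 = 4 * r_dz - r_mz
    d2 = 2 * r_mz - 4 * r_dz
    e2 = 1 - r_mz
    return a2, d2, e2


def ace_from_correlations(r_mz: float, r_dz: float):
    """Solve the ACE expectations for (a2, c2, e2)."""
    a2 = 2 * (r_mz - r_dz)
    c2 = 2 * r_dz - r_mz
    e2 = 1 - r_mz
    return a2, c2, e2


# Total cholesterol in Table 1: rMZ = 0.48, rDZ = 0.19
tc_a2, tc_d2, tc_e2 = ade_from_correlations(0.48, 0.19)

# Height in Table 1: rMZ = 0.87, rDZ = 0.48
ht_a2, ht_c2, ht_e2 = ace_from_correlations(0.87, 0.48)
```

For total cholesterol this gives a2 of about 0.28 and d2 of about 0.20, and for height a2 of about 0.78 and c2 of about 0.09, close to the SEM estimates reported in Table 1.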
The inability to estimate C and D simultaneously inherent in the classic twin model means that whenever there is a deviance from a pure additive genetic model, the model will provide support to the existence of either shared environment or dominance deviance; however, significant contribution of one says nothing about absence or presence of the other. This means that contributions from both components might coexist but “mask” each other, so that the net effect appears as contribution from neither. Thus, when C exists simultaneously with D, it will tend to counterbalance and even outweigh deviance contribution from D. However, since the twins in our sample were relatively old (mean age was 65 years), contributions from shared environmental factors might have been attenuated, leaving D more prone to be observed.
Our data provide independent evidence for a simultaneous contribution of C for some traits in which the SEM declares ADE to be the best-fitting model. We conclude this from the fact that there was a significant negative correlation between self-reported contact frequency in MZ pairs and absolute within-pair trait difference for high-density lipoprotein and three weight-related traits (Table 2). Further, the observed relation was negative for 14 out of 18 traits, indicating that a general trend might be present also for other traits. Number of years spent together before separation showed similar relations to absolute within-pair trait difference in MZ twins. Twins staying together longer tend to display more similar trait values. The group-mean differences between MZ and DZ twins in degree of shared environment were 0.58 and 0.39 SD for contact frequency and age at separation, respectively. Thus, the core twin model assumption of equal shared environment between MZ and DZ appears to be violated. One potential consequence of such a violation is that the D component might become inflated in the twin model, while GREML(d) would stay unaffected. This difference could be argued to be one reason for the markedly larger D component as estimated in the twin-based SEM compared to the GREML(d). However, the relation between degree of shared environment and within-pair trait difference was weak: the strongest correlation found for contact frequency was for weight (−0.085), and for separation age it was BMI (−0.089) (Table 2). The weak relation was apparent also from comparisons of trait correlations in strata of the MZ twins sharing most and least environment (Table S10). The mean difference in level of sharing between the two groups is at least three times larger than the difference observed between MZ and DZ twins. Still, the correlations in trait levels were only very modestly different between the high and low groups.
Thus, we consider the bias potentially introduced by the violation of the equal environment assumption to be small and not a prominent reason for the discrepancy between twin-based and GREMLd-based estimation of D.
Another way to obtain independent evidence for contributions of C is to study non-biological relations such as adoptive or step relations. In a previous investigation on military conscription data of BMI at age of 18, significant correlations were observed also among non-biological (step- and adoptive) relatives.15 This indication of C was supported by significantly stronger correlation in maternal compared to paternal half-brothers, arguably reflecting that children most often follow mothers upon divorce in the studied population, or that mothers have a generally stronger impact on the relevant family environment (eating habits, food choices, etc.) as compared to fathers.16
It is clear that the MZ correlation coefficient provides per se an unbiased upper bound of the proportion of variance that genetics (both additive and non-additive) ultimately could explain, but in the contemporary literature there exist different opinions about what should be considered the relevant denominator in the concept of “missing heritability.” If it is the broad-sense heritability, an additive modeling of genotypes should not be expected to explain anything but the additive fraction (i.e., we would have to accept that a portion will remain inaccessible). On the other hand, we and others consider the relevant denominator to be the narrow-sense heritability, and the missing heritability to be the proportion of h2 that remains unexplained by SNPs, equal to 1 − (h2SNP/h2twin).
During the past decade, many explanations of the missing heritability phenomenon have been suggested. Some have focused on using larger numbers of common or rare variants to capture more of the functional genetic variance;17, 18 others have suggested that missing heritability is due to an overestimation of the additive genetic effects because of a cryptic contribution of epistatic interactions between loci (often termed “I”), creating something denoted “phantom heritability.”19, 20 Our results suggest a similar concept, the possibility that h2 tends to be overestimated if there is inadequate power to discriminate A and D components in the twin model. Letting the A component from a more parsimonious AE model represent the h2 will provide a value that will be closer to the broad-sense heritability (variation due to A plus variation due to D). There is also a possibility that epistatic interactions are captured differentially by the classic twin-based and the SNP-based GREML(d) heritability estimation and thereby might be responsible for some of the differences between the two models. However, a recent paper suggested that epistatic effects will contribute little to genetic variance in outbred populations.21
Even though additive genetics most probably constitutes the bulk of genetic influences on most complex traits, our results from both twin-based and SNP-based models lend support to a more prominent role of dominant genetic variation than most earlier studies have indicated. We believe simultaneous contributions from both C and D might be a common situation for many traits. Extended twin-family study designs including more family members (e.g., parents, offspring, and non-twin siblings) might offer improved possibilities to verify the existence of D effects,22, 23 but such materials of adequate size are unfortunately exceptionally rare. Previous elegant simulation studies performed in extended twin-family structures24 lend support to the view presented here that simultaneous presence of C and D might be a common phenomenon despite the fact that classic twin studies rarely find evidence for either. We foresee a future development in which integration of twin, family, and molecular-based methods allows more accurate quantification of additive and non-additive proportions of genetic influences, which in turn might help us to reclaim the remains of the missing heritability.
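Why C and D can hide each other in the classic twin design can be seen from the standard expected correlations for twins reared together, rMZ = A + D + C and rDZ = 0.5A + 0.25D + C, which give rMZ − 2rDZ = 0.5D − C. The sketch below uses hypothetical variance components chosen so that C = 0.5D; the function name is illustrative.

```python
def expected_twin_correlations(a2: float, d2: float, c2: float) -> tuple[float, float]:
    """Expected MZ and DZ correlations when A, D, and C are all present."""
    r_mz = a2 + d2 + c2
    r_dz = 0.5 * a2 + 0.25 * d2 + c2
    return r_mz, r_dz


# Hypothetical components, chosen so that C exactly offsets D (C = 0.5 * D).
a2, d2, c2 = 0.30, 0.25, 0.125

r_mz, r_dz = expected_twin_correlations(a2, d2, c2)
contrast = r_mz - 2.0 * r_dz  # equals 0.5 * d2 - c2

print(f"r_MZ = {r_mz:.4f}, r_DZ = {r_dz:.4f}")
print(f"r_MZ - 2 * r_DZ = {contrast:.4f}")
```

Here rMZ is exactly twice rDZ, the signature of a pure AE model, even though both D and C are present. With only two observed correlations the classic design cannot separate the two components, which is why extended family designs or independent indicators of shared environment are needed.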
Acknowledgments
Karolinska Institutet financially supports the Swedish Twin Registry. The study was supported by Swedish Research Council, Swedish Heart-Lung Foundation, and China Scholarship Council.
Published: November 5, 2015
Footnotes
Supplemental Data include ten tables with details about samples, methods, descriptive statistics, and results and can be found with this article online at http://dx.doi.org/10.1016/j.ajhg.2015.10.004.
Web Resources
The URLs for data presented herein are as follows:
GCTA-GREML(d), http://cnsgenomics.com/software/gcta/download.html
OpenMx - Advanced Structural Equation Modeling, http://openmx.psyc.virginia.edu/
R statistical software, http://www.r-project.org/
References
- 1. Tenesa A., Haley C.S. The heritability of human disease: estimation, uses and abuses. Nat. Rev. Genet. 2013;14:139–149. doi: 10.1038/nrg3377.
- 2. Neale M.C., Maes H.H.M. Kluwer Academic Publishers B.V; Dordrecht, Netherlands: 2004. Methodology for Genetic Studies of Twins and Families; pp. 2–7.
- 3. Visscher P.M., Medland S.E., Ferreira M.A., Morley K.I., Zhu G., Cornes B.K., Montgomery G.W., Martin N.G. Assumption-free estimation of heritability from genome-wide identity-by-descent sharing between full siblings. PLoS Genet. 2006;2:e41. doi: 10.1371/journal.pgen.0020041.
- 4. Yang J., Lee S.H., Goddard M.E., Visscher P.M. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 2011;88:76–82. doi: 10.1016/j.ajhg.2010.11.011.
- 5. van Dongen J., Slagboom P.E., Draisma H.H., Martin N.G., Boomsma D.I. The continuing value of twin studies in the omics era. Nat. Rev. Genet. 2012;13:640–653. doi: 10.1038/nrg3243.
- 6. Maher B. Personal genomes: The case of the missing heritability. Nature. 2008;456:18–21. doi: 10.1038/456018a.
- 7. Yang J., Benyamin B., McEvoy B.P., Gordon S., Henders A.K., Nyholt D.R., Madden P.A., Heath A.C., Martin N.G., Montgomery G.W. Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 2010;42:565–569. doi: 10.1038/ng.608.
- 8. Zhu Z., Bakshi A., Vinkhuyzen A.A., Hemani G., Lee S.H., Nolte I.M., van Vliet-Ostaptchouk J.V., Snieder H., Esko T., Milani L., LifeLines Cohort Study. Dominance genetic variation contributes little to the missing heritability for human complex traits. Am. J. Hum. Genet. 2015;96:377–385. doi: 10.1016/j.ajhg.2015.01.001.
- 9. Magnusson P.K., Almqvist C., Rahman I., Ganna A., Viktorin A., Walum H., Halldner L., Lundström S., Ullén F., Långström N. The Swedish Twin Registry: establishment of a biobank and other recent developments. Twin Res. Hum. Genet. 2013;16:317–329. doi: 10.1017/thg.2012.104.
- 10. DeLeeuw J. Springer; New York: 1992. Breakthroughs in Statistics: Introduction to Akaike (1973) Information Theory and an Extension of the Maximum Likelihood Principle; pp. 599–609.
- 11. Neale M.C., Hunter M.D., Pritikin J.N., Zahery M., Brick T.R., Kirkpatrick R.M., Estabrook R., Bates T.C., Maes H.H., Boker S.M. OpenMx 2.0: extended structural equation and statistical modeling. Psychometrika. 2015 doi: 10.1007/s11336-014-9435-8. Published online January 27, 2015.
- 12. Arpegård J., Viktorin A., Chang Z., de Faire U., Magnusson P.K., Svensson P. Comparison of heritability of cystatin C- and creatinine-based estimates of kidney function and their relation to heritability of cardiovascular disease. J. Am. Heart Assoc. 2015;4:e001467. doi: 10.1161/JAHA.114.001467.
- 13. van Dongen J., Willemsen G., Chen W.M., de Geus E.J., Boomsma D.I. Heritability of metabolic syndrome traits in a large population-based sample. J. Lipid Res. 2013;54:2914–2923. doi: 10.1194/jlr.P041673.
- 14. Polderman T.J., Benyamin B., de Leeuw C.A., Sullivan P.F., van Bochoven A., Visscher P.M., Posthuma D. Meta-analysis of the heritability of human traits based on fifty years of twin studies. Nat. Genet. 2015;47:702–709. doi: 10.1038/ng.3285.
- 15. Clément K., Sørensen T.I.A. Informa Healthcare; New York: 2007. Obesity: Genomics and Postgenomics; pp. 19–37.
- 16. Magnusson P.K., Rasmussen F. Familial resemblance of body mass index and familial risk of high and low body mass index. A study of young men in Sweden. Int. J. Obes. Relat. Metab. Disord. 2002;26:1225–1231. doi: 10.1038/sj.ijo.0802041.
- 17. Zuk O., Schaffner S.F., Samocha K., Do R., Hechter E., Kathiresan S., Daly M.J., Neale B.M., Sunyaev S.R., Lander E.S. Searching for missing heritability: designing rare variant association studies. Proc. Natl. Acad. Sci. USA. 2014;111:E455–E464. doi: 10.1073/pnas.1322563111.
- 18. Golan D., Lander E.S., Rosset S. Measuring missing heritability: inferring the contribution of common variants. Proc. Natl. Acad. Sci. USA. 2014;111:E5272–E5281. doi: 10.1073/pnas.1419064111.
- 19. Zuk O., Hechter E., Sunyaev S.R., Lander E.S. The mystery of missing heritability: genetic interactions create phantom heritability. Proc. Natl. Acad. Sci. USA. 2012;109:1193–1198. doi: 10.1073/pnas.1119675109.
- 20. Ritchie M.D. Finding the epistasis needles in the genome-wide haystack. Methods Mol. Biol. 2015;1253:19–33. doi: 10.1007/978-1-4939-2155-3_2.
- 21. Mäki-Tanila A., Hill W.G. Influence of gene interaction on complex trait variation with multilocus models. Genetics. 2014;198:355–367. doi: 10.1534/genetics.114.165282.
- 22. Rettew D.C., Rebollo-Mesa I., Hudziak J.J., Willemsen G., Boomsma D.I. Non-additive and additive genetic effects on extraversion in 3314 Dutch adolescent twins and their parents. Behav. Genet. 2008;38:223–233. doi: 10.1007/s10519-008-9192-5.
- 23. Keller M.C., Coventry W.L., Heath A.C., Martin N.G. Widespread evidence for non-additive genetic variation in Cloninger’s and Eysenck’s personality dimensions using a twin plus sibling design. Behav. Genet. 2005;35:707–721. doi: 10.1007/s10519-005-6041-7.
- 24. Keller M.C., Medland S.E., Duncan L.E. Are extended twin family designs worth the trouble? A comparison of the bias, precision, and accuracy of parameters estimated in four twin family models. Behav. Genet. 2010;40:377–393. doi: 10.1007/s10519-009-9320-x.