Estimating narrow-sense heritability using family data from admixed populations

Georgios Athanasiadis; Doug Speed; Mette K Andersen; Emil V R Appel; Niels Grarup; Ivan Brandslund; Marit Eika Jørgensen; Christina Viskum Lytken Larsen; Peter Bjerregaard; Torben Hansen; Anders Albrechtsen

doi:10.1038/s41437-020-0311-2

2020 Apr 9;124(6):751–762. doi: 10.1038/s41437-020-0311-2


PMCID: PMC7239878 PMID: 32273574

Abstract

Estimating total narrow-sense heritability in admixed populations remains an open question. In this work, we used extensive simulations to evaluate existing linear mixed-model frameworks for estimating total narrow-sense heritability in two population-based cohorts from Greenland, and compared the results with data from unadmixed individuals from Denmark. When our analysis focused on Greenlandic sib pairs, and under the assumption that shared environment among siblings has a negligible effect, the model with two relationship matrices, one capturing identity by descent and one capturing identity by state, returned heritability estimates close to the true simulated value, while using each of the two matrices alone led to downward biases. When phenotypes correlated with ancestry, heritability estimates were inflated. Based on these observations, we propose a PCA-based adjustment that recovers the true simulated heritability. We use this knowledge to estimate the heritability of ten quantitative traits from the two Greenlandic cohorts, and report differences such as lower heritability for height in Greenlanders compared with Europeans. In conclusion, narrow-sense heritability in admixed populations is best estimated when using a mixture of genetic relationship matrices on individuals with at least one first-degree relative included in the sample.

Subject terms: Genetics, Heritable quantitative trait

Introduction

Heritability is the fraction of phenotypic variance attributed to genetics. More specifically, assuming that the variance $σ_{P}^{2}$ of a phenotype equals the sum of its genetic $σ_{G}^{2}$ and environmental variance $σ_{e}^{2}$ , heritability in its broad sense (H²) is expressed by the ratio $σ_{G}^{2}$ / $σ_{P}^{2}$ (Fisher 1918; Wright 1921). The genetic variance $σ_{G}^{2}$ can be further broken down into its additive ( $σ_{A}^{2}$ ), dominant ( $σ_{D}^{2}$ ), and epistatic ( $σ_{I}^{2}$ ) components (Visscher et al. 2008; Zaitlen and Kraft 2012). Most of the existing literature focuses on the fraction of phenotypic variance $σ_{P}^{2}$ owing to additive effects alone ( $σ_{A}^{2}$ / $σ_{P}^{2}$ )—the so-called narrow-sense heritability (h²).

Heritability is interesting in its own right, but it is also pivotal in quantitative genetic studies with many practical uses. Because, by definition, heritability measures the contribution of genetics to a phenotype, it allows us to gain insights into the genetic architecture of a trait. Moreover, knowledge about the heritability of a trait helps us evaluate the effectiveness of a genome-wide association study (GWAS), as the so-called SNP heritability $h_{g}^{2}$ of a trait informs us about the maximum discovery potential of a given genotyping platform. Similarly, heritability estimates provide an upper bound to the accuracy of polygenic predictions with predictors potentially having higher performance in more heritable traits.

There exist many ways to estimate narrow-sense heritability h² and they usually boil down to estimating the variance owing to additive effects ( $σ_{A}^{2}$ ). Assuming an additive model, a classical approach is to use the phenotypic correlation between related individuals (Fisher 1918; Wright 1921; Yang et al. 2010), which for a pair j and k is

cor (y_{j}, y_{k}) = σ_{A}^{2} K_{causal [j, k]} .

K_causal is an idealized genetic relationship matrix (GRM) that reflects genetic relationships between individuals at an unknown set of causal variants.

Because the set of causal variants is unknown, K_causal has been approximated by the expected relatedness in the pedigree matrix K_PED, which equals twice the kinship matrix Φ

K_{causal} \approx K_{PED} = E (2 Φ) .

The entries in the kinship matrix Φ are known as kinship coefficients. Kinship coefficient φ is the probability that a random allele from subject j is identical by descent (IBD) to an allele at the same locus from subject k. As an example, for a pair of full siblings j and k, this expected probability equals ¼ and therefore K_PED[j,k] = ½. For a design that focuses exclusively on sib pairs and assuming no dominance contribution, an estimate of the additive genetic variance ${\hat{σ}}_{A}^{2}$ is therefore two times the phenotypic correlation r_P of sib pairs

{\hat{σ}}_{A}^{2} = 2 r_{P [sibs]} .
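The sib-pair estimator above can be sketched in a few lines. This is a minimal illustration, assuming phenotypes with unit variance and using the Pearson correlation across pairs; the function names `pearson` and `sib_pair_h2` and the toy values are hypothetical and not taken from the original analysis.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def sib_pair_h2(sib1, sib2):
    """Classical estimate: twice the phenotypic correlation of sib pairs."""
    return 2.0 * pearson(sib1, sib2)

# Toy phenotypes for four sib pairs (illustrative values only).
sib1 = [0.1, -0.5, 1.2, 0.3]
sib2 = [0.4, -0.2, 0.6, -0.1]
h2_hat = sib_pair_h2(sib1, sib2)
```

Note that this estimator is not constrained to the interval [0, 1], so small samples can return values outside it.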

When extended pedigrees are available, the entire K_PED can be leveraged in a linear mixed-model (LMM) framework. In this case, the phenotype vector y is modeled as

y ~ N (μ, σ_{A}^{2} K_{PED} + σ_{e}^{2} I)

and the genetic variance ${\hat{σ}}_{A}^{2}$ is estimated with restricted maximum likelihood (Shaw 1987; Blangero and Almasy 1997; Lange 2002; Kang et al. 2010).

When genetic data are also available, the total IBD fraction of the genome K_IBD[j,k], also referred to as $\hat{π}$ ( $\hat{π}$ = 2φ), can be estimated for j and k and used in the above LMM instead of its expected value from K_PED[j,k]. In other words, instead of being approximated with K_PED, K_causal is now approximated with K_IBD (Visscher et al. 2006, 2007).

The advent of SNP chips resulted in the use of thousands of markers in the computation of the GRM, thus allowing heritability to be estimated in samples without pedigree information (Yang et al. 2010; Lee et al. 2011). In this case, assuming an N × M genotype matrix $\bar{G}$ with zero column mean and unit column variance, K_causal is approximated with an identity-by-state (IBS) genetic covariance matrix

K_{IBS} = \frac{1}{M} \bar{G} {\bar{G}}^{T} .
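As a rough sketch of the formula above, the following computes K_IBS from a matrix of 0/1/2 allele counts. It assumes sample-based column standardization and drops monomorphic columns to avoid division by zero; the function name `ibs_grm` and the simulated genotypes are illustrative assumptions, and dedicated software may standardize differently (e.g., using expected rather than sample variances).

```python
import numpy as np

def ibs_grm(G):
    """Return K_IBS = (1/M) * Gbar @ Gbar.T for an N x M genotype matrix G."""
    G = np.asarray(G, dtype=float)
    G = G[:, G.std(axis=0) > 0]                  # drop monomorphic SNPs
    Gbar = (G - G.mean(axis=0)) / G.std(axis=0)  # zero column mean, unit column variance
    return Gbar @ Gbar.T / G.shape[1]

# Toy data: 50 individuals, 200 SNPs with random allele frequencies.
rng = np.random.default_rng(0)
freqs = rng.uniform(0.1, 0.9, size=200)
G = rng.binomial(2, freqs, size=(50, 200))
K_ibs = ibs_grm(G)
```

With this standardization the diagonal of K_IBS averages exactly 1 and each row sums to zero, which is why off-diagonal entries can be negative.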

Given that the set of M typed SNPs typically does not include all causal variants and/or includes tag SNPs that are in imperfect linkage disequilibrium (LD) with said variants, the use of K_IBS can lead to an underestimation of the total heritability. For unrelated individuals, this estimate reflects only the proportion of phenotypic variance captured directly or indirectly by the typed SNPs, i.e., the so-called SNP heritability $h_{g}^{2}$ (note that SNP heritability $h_{g}^{2}$ is smaller than total heritability h² and is based on unrelated individuals). The GCTA software (Yang et al. 2011) uses K_IBS on unrelated individuals in order to estimate $h_{g}^{2}$ . It has been shown that $h_{g}^{2}$ can vary with minor allele frequency (MAF), LD, and genotype certainty (Speed et al. 2017). The LDAK software (Speed et al. 2012) can be used to accommodate these parameters in the computation of K_IBS.

More recently, it has been shown that K_IBD can be effectively substituted in the LMM by an IBS genetic covariance matrix K_IBS>t, in which all K_IBS entries below a threshold t are set to zero (Zaitlen et al. 2013). Moreover, the same authors introduced a method for the simultaneous estimation of SNP $h_{g}^{2}$ and total heritability h² by jointly fitting an LMM with K_IBD & K_IBS (or K_IBS>t instead of K_IBD). This way, the authors provided heritability estimates with narrower confidence intervals and showed that total heritability estimates under this approach were very similar to those under K_IBD (or K_IBS>t) alone.

In order for existing methods to produce meaningful heritability estimates, no population structure should be present in the studied samples (Zaitlen and Kraft 2012). Population structure can arise when individuals of different ancestry are found in the same sample and/or when individuals are admixed. Individuals from different populations tend to have different minor allele frequencies as well as different environmental exposures (Zaitlen and Kraft 2012). Because population structure correlates with environmental structure, it can inflate heritability estimates. It has been shown that, for SNP heritability ( $h_{g}^{2}$ ) estimates, inclusion of principal components (PCs) as fixed effects cannot fully account for the structure bias (Browning and Browning 2011). Moreover, the differences in ancestral allele frequencies affect the computation of K_IBS, which ceases to be proportional to K_IBD—even for close relationships. To date, there is not a clear strategy for estimating and interpreting heritability in structured/admixed populations.

Nevertheless, an LMM that uses a relationship matrix K_γ based on local ancestry, i.e., the genetic ancestry of an individual at a particular chromosomal location, instead of genotypes has been proposed (Zaitlen et al. 2014). The authors of this method produced accurate estimates of total heritability h² for several phenotypes in admixed African-American samples by fitting an LMM with K_γ and rescaling its regression coefficient accordingly. However, this method also has limitations, as it relies on accurate knowledge of local ancestry and assumes that samples are unrelated.

In light of the above, finding an efficient framework for total narrow-sense heritability estimates in admixed populations with high levels of relatedness remains an open question. Many experimental designs could benefit from a better understanding of how heritability estimates are affected by the simultaneous presence of population and family structure. In this work, we use extensive simulations to evaluate the performance of existing classical and LMM frameworks for estimating total (h²)—but also SNP ( $h_{g}^{2}$ )—heritability in two population-based cohorts from Greenland. The Greenlanders are a population isolate with many unique characteristics, such as small census size, high levels of relatedness, and extensive population structure with ancestry from both the Inuit and Europeans (Moltke et al. 2015; Pedersen et al. 2017). Our goal is to find a way to estimate and understand narrow-sense heritability in such populations, thus gaining valuable insights into the genetic architecture of complex traits in said populations.

Materials and methods

Samples

Greenlanders

The Greenlandic subjects (N = 4659) came from two general health surveys. The first survey (Bjerregaard et al. 2003) consisted of Greenlanders living in Denmark (the BHH cohort, N = 546), recruited during 1998–1999, as well as Greenlanders living in Greenland (the B99 cohort, N = 1328), recruited during 1999–2001 as part of a general population health survey. The second survey (Jørgensen et al. 2013) consisted of Greenlanders living in Greenland (the IHIT cohort, N = 2785), recruited during 2005–2010 as part of a population health survey.

Danes

The population-based Danish sample (N = 5470) was obtained from Inter99 (Glümer et al. 2004), a randomized intervention study collected at the Research Centre for Prevention and Health. In addition, 513 Danish sib pairs (N = 1169 individuals) were identified across several available Danish cohorts, namely the population-based cohorts (i) Inter99 (Jørgensen et al. 2003) (N = 294), (ii) Helbred2006 (Thuesen et al. 2014) (N = 121), (iii) Helbred2008 (Byberg et al. 2012) (N = 38), and (iv) Helbred2010 (Aadahl et al. 2014) (N = 57), all recruited from the Research Centre for Prevention and Health, Glostrup Hospital, Denmark, as well as cohorts collected for the study of type 2 diabetes, namely (v) Vejle Diabetes Biobank (Petersen et al. 2016) (N = 570), recruited at the Vejle Hospital, Denmark, (vi) the ADDITION study (Lauritzen et al. 2000) (N = 8), recruited at the Department of General Practice at the University of Aarhus, Denmark, and (vii) SDC (Andreasen et al. 2008) (N = 81) recruited at the outpatient clinic at Steno Diabetes Center, Denmark.

Genotyping and quality control

Both the Greenlandic and the unrelated Danish samples were typed on Illumina’s Cardio-MetaboChip (Illumina, San Diego, CA, USA). The Cardio-MetaboChip includes 196,725 SNPs selected from genetic studies of cardiovascular, metabolic, and anthropometric traits (Voight et al. 2012). Moreover, the unrelated Danish samples and the Danish sib pairs were typed on Illumina’s Infinium OmniExpress chip, which includes ~710,000 markers. Standard quality control was carried out separately on each dataset with PLINK v1.9 (Chang et al. 2015) and included removing individuals (--mind 0.01) and markers (--geno 0.01) with genotype missingness above 1%. The datasets passing quality control consisted of (i) 4659 Greenlanders typed on 187,181 Cardio-MetaboChip autosomal SNPs, (ii) 5470 unrelated Danes typed on 186,639 Cardio-MetaboChip and 618,037 OmniExpress autosomal SNPs, and (iii) 1169 Danes forming sib pairs typed on 609,605 OmniExpress autosomal SNPs.

Phenotype simulations

We simulated 1000 quantitative phenotypes with true total narrow-sense heritability h² = {0.4, 0.6, 0.8} using real genetic data from 4659 Greenlanders and 6639 (5470 + 1169) Danes. Simulations were carried out separately on each dataset as follows:

First, we defined an N × C causal genotype matrix G_causal by sampling C = 1500 SNPs from a list of all available SNPs.

Second, we sampled SNP effects, represented by a C × 1 effect vector b, from a standard normal distribution N(0,1). In order to model the relationship between the effect size of SNP i and its allele frequency f_i, a genotype matrix G can be standardized according to the formula

{\bar{G}}_{j, i} = (G_{j, i} - 2 f_{i}) {[2 f_{i} (1 - f_{i})]}^{\frac{α}{2}},

(Speed et al. 2017). GCTA assumes a specific inverse relationship between SNP effect size and allele frequency by converting the original matrix G_causal into a zero-column mean and unit-column variance standard score matrix ${\bar{G}}_{causal}$ . This can be seen as a special case of the above standardizing formula for α = −1.

Third, we computed a vector of polygenic score S by multiplying ${\bar{G}}_{causal}$ by the corresponding effect vector b

S = {\bar{G}}_{causal} b,

and its additive genetic variance as

σ_{A}^{2} = var (S) .

Finally, we computed the phenotype vector P by adding an environmental vector ε of i.i.d. error terms to vector S

P = S + ε .

Error terms were sampled from the distribution $N (0, var (S) (\frac{1}{h^{2}} - 1))$ (Yang et al. 2011).
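The simulation steps above can be sketched as follows. This is a minimal illustration on toy genotypes, assuming α = −1 and sample allele frequencies; the function name `simulate_phenotype`, the random genotype matrix, and the reduced number of causal SNPs are assumptions made for the example and do not reproduce the original pipeline.

```python
import numpy as np

def simulate_phenotype(G, h2, n_causal, rng):
    """Simulate P = S + eps with target narrow-sense heritability h2."""
    G = np.asarray(G, dtype=float)
    # Step 1: sample the causal SNPs from all available SNPs.
    causal = rng.choice(G.shape[1], size=n_causal, replace=False)
    Gc = G[:, causal]
    # Standardize with alpha = -1 (assumes no monomorphic causal SNPs).
    f = Gc.mean(axis=0) / 2.0
    Gbar = (Gc - 2 * f) * (2 * f * (1 - f)) ** (-0.5)
    # Step 2: sample effects from N(0, 1).
    b = rng.standard_normal(n_causal)
    # Step 3: polygenic score S; its variance is the additive genetic variance.
    S = Gbar @ b
    # Step 4: add i.i.d. noise with variance var(S) * (1/h2 - 1).
    var_e = S.var() * (1.0 / h2 - 1.0)
    eps = rng.normal(0.0, np.sqrt(var_e), size=G.shape[0])
    return S, S + eps, causal

rng = np.random.default_rng(1)
G = rng.binomial(2, rng.uniform(0.2, 0.8, size=300), size=(5000, 300))
S, P, causal = simulate_phenotype(G, h2=0.6, n_causal=100, rng=rng)
realized_h2 = S.var() / P.var()  # close to 0.6, up to sampling noise
```

The realized ratio var(S)/var(P) fluctuates around the target because the noise is sampled rather than rescaled to an exact variance.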

In some simulations involving the Greenlanders, we also modeled the interaction between environment and ancestry by adding an interaction vector E × Anc to the sum

P = S + E \times Anc + ε .

In particular, if $h_{E \times Anc}^{2}$ is the proportion of phenotypic variance explained by the interaction and θ_Inuit is the vector of proportion of Inuit ancestry, then

E \times Anc = θ_{Inuit} \sqrt{(\frac{h_{E \times Anc}^{2}}{h^{2}}) \frac{var (S)}{var (θ_{Inuit})}} .

As a consequence, the noise terms in ε are now sampled from $N (0, var (S) (\frac{1}{h^{2}} - 1) - var (E \times Anc))$ .
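The scaling of the interaction term can be checked numerically. The sketch below assumes population-style variances (as computed by `np.var`) and uses hypothetical helper names; it verifies that the interaction explains a fraction $h_{E \times Anc}^{2}$ of the phenotypic variance var(S)/h².

```python
import numpy as np

def ancestry_interaction(theta_inuit, S, h2, h2_exanc):
    """Scale the ancestry proportions so that var(E x Anc) = (h2_exanc / h2) * var(S)."""
    theta = np.asarray(theta_inuit, dtype=float)
    scale = np.sqrt((h2_exanc / h2) * np.var(S) / np.var(theta))
    return theta * scale

def residual_noise_variance(S, e_x_anc, h2):
    """Variance left for the i.i.d. noise once the interaction is accounted for."""
    return np.var(S) * (1.0 / h2 - 1.0) - np.var(e_x_anc)

# Toy inputs (illustrative values only).
S = np.array([0.5, -1.0, 1.5, -0.2, 0.7, -1.5])
theta = np.array([0.9, 0.1, 0.5, 0.7, 0.3, 1.0])
e_x_anc = ancestry_interaction(theta, S, h2=0.6, h2_exanc=0.2)
var_noise = residual_noise_variance(S, e_x_anc, h2=0.6)
share = np.var(e_x_anc) / (np.var(S) / 0.6)  # fraction of phenotypic variance
```

The noise variance stays positive only when h² and $h_{E \times Anc}^{2}$ sum to less than 1.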

In other simulations involving the Greenlandic sib pairs, we added a vector E reflecting shared environment—i.e., the household effect (Almasy and Blangero 1998)—between siblings

P = S + E + ε .

In particular, we drew environmental effects from a normal distribution making sure to assign the same value to all individuals belonging to the same sibling cluster. As a consequence, the noise terms in ε are now sampled from $N (0, var (S) (\frac{1}{h^{2}} - 1) - var (E))$ .
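A shared-environment vector of this kind can be sketched by drawing one value per sibling cluster and broadcasting it to the cluster members. The function name `household_effect`, the family identifiers, and the chosen variance are hypothetical; the source does not state which variance was used for the draws.

```python
import numpy as np

def household_effect(family_ids, var_hh, rng):
    """Draw one N(0, var_hh) value per family and assign it to every member."""
    families, idx = np.unique(np.asarray(family_ids), return_inverse=True)
    per_family = rng.normal(0.0, np.sqrt(var_hh), size=families.size)
    return per_family[idx]

rng = np.random.default_rng(2)
family_ids = np.array([1, 1, 2, 2, 3])
E = household_effect(family_ids, var_hh=0.1, rng=rng)
```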

Linear mixed model

In an LMM, the phenotype y is modeled as a mixture of fixed and random effects (i.e., the effects of the causal variants)

y = μ + {\bar{G}}_{causal} b + ε .

Assuming that b ~ N(0, $\frac{σ_{A}^{2}}{C}$ ) and ε ~ Ν(0, $σ_{e}^{2} I$ ) under the GCTA model with α = −1, y follows a multivariate normal distribution with mean μ and variance

\begin{matrix} var (y) & = & var (μ + {\bar{G}}_{causal} b + ε) \\ & = & var ({\bar{G}}_{causal} b) + var (ε) \\ & = & {\bar{G}}_{causal} var (b) {\bar{G}}_{causal}^{T} + var (ε) \\ & = & {\bar{G}}_{causal} \frac{σ_{A}^{2}}{C} {\bar{G}}_{causal}^{T} + σ_{e}^{2} I \\ & = & σ_{A}^{2} K_{causal} + σ_{e}^{2} I \end{matrix},

such that

y ~ N (μ, σ_{A}^{2} K_{causal} + σ_{e}^{2} I) .

Total narrow-sense heritability is then defined as

h^{2} = \frac{σ_{A}^{2}}{σ_{A}^{2} + σ_{e}^{2}} .

Because K_causal is unknown, we approximated it with other GRMs instead, such as K_IBD, K_IBS, and K_IBS>t.

Relationship matrices

We computed the K_IBD matrix for the entire Greenlandic sample from pairwise kinship coefficients ( ${\hat{π}}_{j, k} = 2 φ_{j, k} = \frac{1}{2} k_{1, j, k} + k_{2, j, k}$ ) using RelateAdmix (Moltke and Albrechtsen 2014) or alternatively REAP (Thornton et al. 2012) on a genotype file with MAF cutoff = 0.01. We note that the total genomic IBD estimates are generally robust to the ascertainment scheme of the array used. We subsequently identified 1465 Greenlandic sib pairs by use of empirical thresholding over k₁ (IBD1, 0.3 < k₁ ≤ 0.7) and k₂ (IBD2, 0.1 < k₂ ≤ 0.5) on the RelateAdmix output. We then recomputed the K_IBD matrix for the identified sib pairs using RelateAdmix. K_IBD for the Danish sib pairs was computed with the PLINK --genome flag, using MAF cutoff = 0.01. For both the Greenlandic and Danish sib pairs, we also computed a K_IBD>t matrix, in which all entries below a threshold t = 0.05 were set to zero, and a $K_{IBD}^{0}$ matrix, in which all between-sib-pair values were set to zero.
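The relationship between k₁, k₂, and $\hat{π}$, the sib-pair thresholds, and the K_IBD>t truncation can be sketched as below. The helper names are hypothetical, and the code operates on already-estimated IBD sharing values rather than calling RelateAdmix, REAP, or PLINK.

```python
import numpy as np

def pi_hat(k1, k2):
    """Total IBD fraction: pi_hat = 2 * phi = k1 / 2 + k2."""
    return 0.5 * k1 + k2

def is_sib_pair(k1, k2):
    """Empirical thresholds from the text: 0.3 < k1 <= 0.7 and 0.1 < k2 <= 0.5."""
    return (0.3 < k1 <= 0.7) and (0.1 < k2 <= 0.5)

def threshold_grm(K, t=0.05):
    """Set all entries of a relationship matrix below t to zero."""
    K = np.asarray(K, dtype=float)
    return np.where(K < t, 0.0, K)

# Expected sharing for full siblings: k1 = 0.5, k2 = 0.25, hence pi_hat = 0.5.
K_toy = np.array([[1.0, 0.48, 0.02],
                  [0.48, 1.0, 0.04],
                  [0.02, 0.04, 1.0]])
K_toy_t = threshold_grm(K_toy, t=0.05)
```

A parent-offspring pair (k₁ ≈ 1, k₂ ≈ 0) has the same expected $\hat{π}$ as full siblings but fails both thresholds, which is why the filter is applied to k₁ and k₂ rather than to $\hat{π}$.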

K_IBS and K_IBS>t (t = 0.05) for both Greenlanders and Danes were computed with GCTA using a MAF cutoff = 0.01. Causal variants were removed from the computation of K_IBS and K_IBS>t. Because causal variants are selected from the entire list of available SNPs, we assume that they have an allele frequency distribution similar to the genotyped SNPs. We can explicitly control for this by adding --grm-adj 0 to the GCTA command line; however, this setting had no effect on our estimates and was dropped early (data not shown). For some heritability estimations, we also computed $K_{IBS}^{*}$ after removing not only the causal variants, but also all variants in LD with those (i.e., applying extreme LD pruning in the vicinity of causal variants). In addition, we estimated heritability by use of a $K_{IBS}^{c}$ matrix in which causal variants were included in the computation. Finally, household effects in Greenlandic sib pairs were captured by use of the K_HH matrix, whereby 1’s were assigned to all pairs of an individual with itself and its siblings and 0’s otherwise.
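The K_HH matrix described above can be sketched as an indicator matrix built from sibling-cluster labels. The function name `household_matrix` and the labels are hypothetical, and the sketch assumes each individual belongs to exactly one cluster.

```python
import numpy as np

def household_matrix(family_ids):
    """1 for pairs sharing a sibling cluster (including self-pairs), 0 otherwise."""
    ids = np.asarray(family_ids)
    return (ids[:, None] == ids[None, :]).astype(float)

K_hh = household_matrix(["f1", "f1", "f2", "f2", "f3"])
```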

Heritability estimation

Additive genetic variance ${\hat{σ}}_{A}^{2}$ and, subsequently, total narrow-sense heritability h² were estimated for various GRMs with the GRM-based restricted maximum likelihood (GREML) procedure implemented in GCTA and in LDAK. For the sib pairs in particular, we carried out total narrow-sense heritability estimations using (i) the IBD-based matrices (K_IBD, K_IBD>t, and $K_{IBD}^{0}$ ) alone, (ii) the IBS-based matrices (K_IBS, $K_{IBS}^{*}$ , and $K_{IBS}^{c}$ ) alone, (iii) K_IBS together with K_IBD, K_IBD>t, $K_{IBD}^{0}$ , or K_IBS>t, and (iv) the classical sib-pair approach. When two relationship matrices were used, ${\hat{σ}}_{A}^{2}$ was equal to the sum of the two variance components corresponding to said matrices (Zaitlen et al. 2013). In the particular case of evaluating the household effect, its variance ${\hat{σ}}_{HH}^{2}$ was subtracted from the final estimation. We also estimated ${\hat{σ}}_{A}^{2}$ after adjusting for the first 5, 10, or 20 PCs, or a proportion of Inuit ancestry (where applicable). We note that SNP array ascertainment is not expected to affect total heritability estimates, as those are dependent on robust IBD measures. Conversely, SNP heritability estimates are sensitive to the ascertainment scheme of a given genotyping platform.
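For the two-GRM case, the arithmetic of combining variance components can be sketched as follows. The sketch takes already-estimated components as input (it does not run GREML) and assumes, following our reading of Zaitlen et al. (2013), that the SNP heritability corresponds to the K_IBS component alone; the function name and the example values are hypothetical, and the household-effect case is not covered.

```python
def heritability_from_components(var_ibd, var_ibs, var_e):
    """Combine variance components from a two-GRM model into h2 and SNP h2."""
    var_a = var_ibd + var_ibs  # additive variance: sum of both components
    var_p = var_a + var_e      # total phenotypic variance
    return {"h2": var_a / var_p, "h2_snp": var_ibs / var_p}

# Illustrative variance components (not estimates from the study).
est = heritability_from_components(var_ibd=0.3, var_ibs=0.3, var_e=0.4)
```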

Analysis settings

We ran phenotype simulations and heritability estimates on four groups of Greenlanders: (i) all samples (N = 4659), (ii) sib pairs (N = 1688), (iii) more distantly related individuals (“cousins”; N = 2615), and (iv) unrelated individuals (N = 585), as well as two separate groups of Danes: (i) unrelated individuals (N = 5470), and (ii) sib pairs (N = 1169). In both populations, the $\hat{π}$ threshold for identifying unrelated individuals was 0.0625. Note that we did not merge the unrelated Danes with the Danish sib pairs because, unlike the Greenlanders, they do not come from the same population-based study. We estimated heritability on the above groups using different GRMs, without and with covariates (summarized in Table 1). For the sake of simplicity, we do not show results that are nonsensical (e.g., K_IBD for unrelated individuals).

Table 1.

Overview of the different approaches for estimating heritability from simulated data.

Approach	Unrelated	Sib pairs	“Cousins”	All^a
LMM
K_IBD / K_IBD>t / $K_{IBD}^{0}$	N/A	h²	N/A	N/A
K_IBS / $K_{IBS}^{*}$	$h_{g}^{2}$	h²	N/A	N/A
$K_{IBS}^{c}$	N/A	h²	N/A	N/A
K_IBD / K_IBD>t / $K_{IBD}^{0}$ / K_IBS>t and K_IBS	N/A	h² and $h_{g}^{2}$	h² and $h_{g}^{2}$	h² and $h_{g}^{2}$
Classical sib-pair analysis	N/A	h²	N/A	N/A


The linear mixed models were run with and without covariates. Only the meaningful combinations of methods and sample groups were considered.

N/A not applicable or not tested.

^aAnalysis carried out in Greenlanders only.

Application to real data

We applied the best-performing model to real phenotypic data from the two population-based Greenlandic cohorts. All phenotypes considered were quantitative and consisted of basic anthropometric traits (height, weight, body mass index, hip circumference, waist circumference, and waist-to-hip ratio), as well as serum lipid levels (total cholesterol, HDL cholesterol, LDL cholesterol, and triglycerides). Data were rank-transformed to the quantiles of a standard normal distribution. Age and sex were included as covariates. We also carried out an empirical investigation of the impact of allele frequency and LD weighting—as defined in the LDAK model (Speed et al. 2017)—on heritability estimates. In particular, we estimated total narrow-sense heritability for the ten available traits assuming seven different genotype standardizations by setting LDAK’s parameter α = {−1.25, −1, −0.75, −0.5, −0.25, 0, 0.25}, and accounting for LD weighting. In this context, the model used by GCTA can be seen as a special case of the LDAK model by setting α = −1 and ignoring LD weighting.
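The rank-based transformation to standard normal quantiles can be sketched as below. The offset (rank − 0.5)/n and the handling of ties by order of appearance are assumptions of this example, since the exact variant used is not stated; the function name `rank_to_normal` is hypothetical.

```python
from statistics import NormalDist

def rank_to_normal(values):
    """Map values to standard normal quantiles based on their ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, i in enumerate(order, start=1):
        # (rank - 0.5) / n keeps the probabilities strictly inside (0, 1).
        out[i] = NormalDist().inv_cdf((rank - 0.5) / n)
    return out

z = rank_to_normal([3.2, 1.1, 5.0])
```

The transformation preserves the ordering of the observations while removing skew and outliers, which matters because the LMM assumes normally distributed phenotypes.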

Results

Admixture and relatedness in the Greenlandic and Danish data

Principal component analysis (PCA) and ADMIXTURE (Alexander et al. 2009) analysis of 4659 individuals showed that the general Greenlandic population is the result of admixture between Greenlandic Inuit and European populations, and that there is high variance in the admixture profiles of the Greenlanders (Fig. 1a) (Moltke et al. 2015; Pedersen et al. 2017). We note that assuming K = 2 ancestral components is a simplification of the admixture history of the Greenlanders as it has been previously shown that there are up to three distinct Inuit ancestral components with F_ST values as high as 0.04 (Moltke et al. 2015). Conversely, a sample of 5470 Danish individuals appeared largely unstructured (Fig. 1b), matching previous observations (Athanasiadis et al. 2016). We identified a large number of sib pairs in the Greenlandic sample (1465 pairs, N = 1688 individuals, Fig. S1A). We also confirmed the lack of relatedness in the 5470 unrelated Danish samples (Fig. S1B) and the presence thereof in the 513 Danish sib pairs (Fig. S1C).

Fig. 1 — a Principal component analysis of the entire Greenlandic sample (N = 4659). Individuals were colored according to their admixture proportions as estimated with ADMIXTURE assuming K = 2 ancestral components. Note that Greenlanders with a high proportion of Inuit ancestry (blue points) present further population structure along the second principal component. b Principal component analysis of the unrelated Danish sample (N = 5470). Due to persisting batch effects, the Danish data were pruned with PLINK (window size = 50; step size = 5; r² = 0.1) before the analysis.

Identity by state in the Greenlandic and Danish data

We illustrate the intrinsic differences of IBS between admixed and unadmixed populations by plotting the IBS-based genetic covariance against the IBD-based $\hat{π}$ estimates from the Greenlandic and Danish sib pairs, respectively (Fig. 2). For a given kinship (e.g., full siblings), the corresponding IBS values were far more dispersed in the Greenlanders (Fig. 2a) than in the Danes (Fig. 2b). This is due to the heterogeneous admixture profiles in the Greenlanders (Fig. 1a). In other words, whereas IBS is proportional to IBD in unadmixed individuals, this does not hold for the admixed individuals.

Fig. 2 — a Scatterplot of within-pair (blue) and between-pair (black) IBS-based genetic covariance against IBD-based $\hat{π}$ = 2φ estimates for the Greenlandic (MetaboChip) siblings. IBS was computed with GCTA and IBD was computed with RelateAdmix. b Scatterplot of within-pair (blue) and between-pair (black) IBS-based genetic covariance against IBD-based $\hat{π}$ = 2φ estimates for the Danish (OmniExpress) siblings. IBS was computed with GCTA and IBD was computed with PLINK. A negative IBS value denotes that two individuals are related to each other less than average. A y = x dotted line is shown in red.

Heritability estimates in phenotypes with no population-specific environmental effects

We explored a number of approaches for estimating narrow-sense heritability h² in the admixed Greenlandic and the unadmixed Danish population. We simulated quantitative traits by (i) randomly selecting 1500 causal loci with effect sizes depending on the allele frequency, such that the effect sizes of the standardized genotypes are normally distributed as assumed in the GCTA software, and (ii) adding environmental noise so that the true simulated h² was 0.4, 0.6, or 0.8. In these simulations, all individuals were assigned the same environmental variance regardless of their ancestry. We then estimated h² in an LMM framework for different GRMs, e.g., K_IBD, K_IBS, and K_IBS>t, and combinations thereof (Fig. 3; Figs. S2 and S3; Supplementary file). In the following paragraphs, we first report the results based on one GRM, followed by the results based on two GRMs. As a reminder, for heritability estimates to be interpreted as “total” (h²), the sample must include related individuals.

Fig. 3 — a Mean total heritability estimates and 95% confidence intervals from 1000 simulated phenotypes with true simulated heritability of 0.4 (peach), 0.6 (yellow) and 0.8 (green) in Greenlandic sib pairs. b Mean total heritability estimates and 95% confidence intervals in Danish sib pairs. Genotype scaling parameter α was set to −1 (GCTA’s standard) for both phenotype simulation and heritability estimation. K_IBD: IBD-based GRM; K_IBD>t: IBD-based GRM in which all entries below t = 0.05 were set to zero; K_IBS: IBS-based GRM.

Total heritability estimates in sib pairs using one GRM

The use of K_IBD, which captures the fraction of the genome shared IBD ( $\hat{π}$ ), resulted in underestimates of total heritability both in the Greenlandic and the Danish sib pairs (Fig. 3). Total heritability in the Greenlandic sib pairs was also underestimated when we used K_IBD>t (Fig. 3a) or $K_{IBD}^{0}$ (Fig. S2A). Conversely, using K_IBD>t or $K_{IBD}^{0}$ in the Danish sib pairs did not result in any significant downward biases (Fig. 3b; Fig. S2B). These results were insensitive to the method of IBD inference: both our method of choice (RelateAdmix) and an alternative (REAP) returned similar results (data not shown).

For closely related unadmixed individuals, IBS is proportional to IBD, and therefore estimates based on K_IBS will correspond to the total narrow-sense heritability as well (Hayes et al. 2009). Indeed, true simulated h² was fully recovered in the Danish sib pairs when K_IBS was used (Fig. 3b), but showed consistent downward biases across all simulated h² values in the Greenlandic sib pairs (Fig. 3a). Removing all SNPs with any LD with the causal variants (r² = 0) from the GRM (i.e., using the $K_{IBS}^{*}$ matrix) returned lower yet still comparable h² estimates in both the Greenlandic (Fig. S2A) and Danish sib pairs (Fig. S2B). Conversely, when causal variants were included in the computation of the GRM (i.e., using the $K_{IBS}^{c}$ matrix), then $K_{IBS}^{c} ≅$ K_causal and, consequently, h² was recovered (Fig. S4).

Total heritability estimates in sib pairs using two GRMs

The use of two GRMs (e.g., K_IBD & K_IBS) for heritability estimates is meant to leverage datasets in which both closely and more distantly related individuals are present (Zaitlen et al. 2013). However, when we applied this approach to the entire Greenlandic dataset (N = 4659), we observed a downward bias in total heritability estimates for the K_IBD & K_IBS model, while using K_IBS>t & K_IBS erroneously returned heritability estimates near 1.00 regardless of the true simulated value (Fig. S5). Nevertheless, when we performed the two-GRM analysis on the 1465 Greenlandic sib pairs alone, the true simulated h² was almost perfectly recovered for all GRM combinations, with estimates showing only a minor downward bias (Fig. 3a; Fig. S2A). Interestingly, these models outperform the classical sib-pair analysis as evidenced by their lower root-mean-square deviation (Table S1; Fig. S3). Extending the sample to include more distant relatives (“cousins”; $\hat{π}$ = [0.15, 0.67]; N = 2615) resulted in underestimates of the total heritability (Fig. S6), implying that the K_IBD & K_IBS model performs efficiently only on first-degree relatives when admixture is present. We therefore further examined the K_IBD & K_IBS model in the following section, focusing our attention on the sib pairs.

Total heritability estimates of phenotypes with shared environment between siblings

When we performed the two-GRM analysis of phenotypes that included the effect of shared environment on the 1465 Greenlandic sib pairs, the estimated h² was inflated relative to the true simulated value (squares in Fig. S7). The stronger the household effect, the higher the inflation. Notably, the inclusion of 10 PCs or a proportion of Inuit ancestry did not have any noticeable effect on the estimates (circles and triangles in Fig. S7). However, when we performed the K_IBD & K_IBS & K_HH analysis and subtracted the variance of the household effect, we were able to recover the true simulated total heritability almost perfectly (diamonds in Fig. S7).

SNP heritability estimates in unrelated individuals

As previously mentioned, the use of K_IBS on unrelated individuals yields SNP ( $h_{g}^{2}$ ) rather than total heritability (h²) estimates (Yang et al. 2010). Bearing this in mind, we estimated $h_{g}^{2}$ in both unrelated Greenlanders and unrelated Danes (Fig. S8; Table S2). In all cases, we found that $h_{g}^{2}$ < h², as expected. For the Danish samples in particular, the MetaboChip $h_{g}^{2}$ was smaller than the OmniExpress $h_{g}^{2}$ .

Heritability estimates in phenotypes with population-specific environmental effects

In all of the above simulations, we assumed that the environmental component was independent of ancestry. However, when we added to the simulated phenotypes an environmental component correlating with ancestry, the use of K_IBS in the unrelated Greenlanders led to overestimates of SNP heritability $h_{g}^{2}$ , despite adjusting for population structure (Table S2).

Adjusting the K_IBD model for either the first 10 PCs or a proportion of Inuit ancestry produced a consistent yet uninterpretable pattern along the $\frac{h_{E \times Anc}^{2}}{h^{2}}$ ratio (Fig. S9). On the contrary, adjusting the K_IBD & K_IBS model for the same covariates produced a predictable as well as interpretable pattern across all choices for the $\frac{h_{E \times Anc}^{2}}{h^{2}}$ ratio (Fig. 4). In particular, estimates from the K_IBD & K_IBS model without covariates corresponded to the inflated quantity $\frac{σ_{A}^{2} + σ_{E \times Anc}^{2}}{σ_{A}^{2} + σ_{E \times Anc}^{2} + σ_{e}^{2}}$ (squares in Fig. 4). After adjustment for ancestry, the resulting estimates corresponded to removing the environmental interaction component ( $σ_{E \times Anc}^{2}$ ) from the numerator and denominator of the formula (i.e., $\frac{σ_{A}^{2}}{σ_{A}^{2} + σ_{e}^{2}}$ ; circle and triangle points in Fig. 4). We note that adjustment for 10 PCs was equivalent to adjustment for a proportion of Inuit ancestry (Fig. 4).

Fig. 4. The relationship between the environment-by-ancestry (E × Anc) interaction and the true simulated heritability is quantified by the $h_{E \times Anc}^{2}$ /h² ratio on the x axis. The models with the covariates (circles and triangles) correspond to estimates of conditional total heritability after adjusting for the E × Anc effect (dotted line). Further rescaling of the conditional heritability estimates by 1 − $\frac{{\hat{σ}}_{E \times Anc}^{2}}{σ_{P}^{2}}$ returns the marginal simulated heritability (dashed line). $h_{E \times Anc}^{2}$ : proportion of variance captured by the E × Anc interaction; Inuit adm.: proportion of Inuit admixture. Adjustment for the first 5 or 20 PCs returned virtually identical results (data not shown).

Thanks to the interpretability of the resulting “conditional” estimates, we were able to recover the true simulated heritability $\frac{σ_{A}^{2}}{σ_{A}^{2} + σ_{E \times Anc}^{2} + σ_{e}^{2}}$ , i.e., the “marginal” heritability (Weissbrod et al. 2018) (diamond points in Fig. 4). We achieved this by rescaling the conditional estimates by a factor of $1 - \frac{{\hat{σ}}_{E \times Anc}^{2}}{σ_{P}^{2}}$ , where ${\hat{σ}}_{E \times Anc}^{2}$ is an estimate of the environmental interaction variance computed as ${\hat{σ}}_{E \times Anc}^{2} = σ_{P}^{2} - σ_{\hat{P}}^{2}$ . Here, $\hat{P}$ denotes the phenotype residuals obtained after regressing out the effect of structure captured either by admixture proportions or the first two principal components.
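The rescaling step can be sketched in a few lines. The snippet below is a minimal illustration, not the code used in the study: it assumes that "regressing out the effect of structure" is an ordinary least-squares fit with an intercept, and the function name, argument names and simulated values are hypothetical.

```python
import numpy as np

def rescale_to_marginal(h2_conditional, phenotype, structure_covariates):
    """Rescale a conditional h2 estimate by 1 - var_ExAnc_hat / var_P.

    structure_covariates: admixture proportions or PCs, shape (n,) or (n, k).
    var_ExAnc_hat is taken as var(P) - var(residuals), as in the text.
    """
    y = np.asarray(phenotype, dtype=float)
    covs = np.asarray(structure_covariates, dtype=float)
    X = np.column_stack([np.ones(len(y)), covs])  # intercept + covariates
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS fit (assumption)
    residuals = y - X @ beta
    var_p = y.var(ddof=1)
    var_e_anc_hat = var_p - residuals.var(ddof=1)
    return h2_conditional * (1.0 - var_e_anc_hat / var_p)

if __name__ == "__main__":
    # Toy data: 20% of phenotypic variance tracks ancestry (assumed value).
    rng = np.random.default_rng(1)
    ancestry = rng.uniform(0.0, 1.0, 5000)
    z = (ancestry - ancestry.mean()) / ancestry.std()
    pheno = np.sqrt(0.2) * z + rng.normal(0.0, np.sqrt(0.8), 5000)
    print(rescale_to_marginal(0.5, pheno, ancestry))
```

Because the factor equals the ratio of residual to total phenotypic variance, the rescaled value can never exceed the conditional estimate under this construction.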

Application to real phenotypes

We applied the best model (i.e., K_IBD & K_IBS & 10 PCs) and the follow-up PCA-based adjustment to ten quantitative traits in the 1465 Greenlandic sib pairs (Table 2). Not all phenotypes were equally sensitive to the PCA-based adjustment of their estimated conditional heritability, implying trait-specific environment-by-ancestry interactions. The GCTA model accommodates only one type of genotype standardization (α = −1), resulting in strong assumptions about the distribution of effect sizes. We therefore also used the LDAK model (Speed et al. 2017) and found that the optimal α value for genotype standardization varied across traits with most phenotypes supporting α ≥ −0.5 (Fig. S10; Table S3). Total heritability estimates under the LDAK model (Speed et al. 2017) were generally higher than under GCTA (Table 2; Table S3), with the greatest difference observed for height (0.656 ± 0.042 for GCTA against 0.786 ± 0.041 for LDAK). In addition, heritability estimates in eight out of ten real phenotypes were smaller in the Greenlanders than in their European or Mexican counterparts estimated with similar models.

Table 2.

Total narrow-sense heritability estimates for real phenotypes under the K_IBD & K_IBS & age & sex & 10 PCs model using (i) GCTA; (ii) GCTA followed by PCA-based rescaling; and (iii) LDAK with an optimal trait-specific parameter α and LD weighting, as well as comparison with (iv) published total heritability estimates in Europeans and Mexicans.

	GCTA estimate (mean ± s.e.)	Rescaled GCTA point estimate	Best LDAK estimate (mean ± s.e.)	Estimate from the literature (mean ± s.e.)
Height	0.656 ± 0.042	0.611	0.786 ± 0.041	0.860 ± 0.117 (Visscher et al. 2007)
Weight	0.533 ± 0.044	0.504	0.536 ± 0.049	0.630 (Mamtani et al. 2014)
Body mass index	0.505 ± 0.045	0.494	0.470 ± 0.05	0.340 ± 0.120 (Vattikuti et al. 2012)
Hip circumference	0.494 ± 0.046	0.472	0.534 ± 0.05	0.680 (Mamtani et al. 2014)
Waist circumference	0.464 ± 0.047	0.461	0.479 ± 0.051	0.620 (Mamtani et al. 2014)
Waist-to-hip ratio	0.352 ± 0.050	0.348	0.385 ± 0.054	0.280 ± 0.120 (Vattikuti et al. 2012)
Total cholesterol	0.487 ± 0.047	0.475	0.523 ± 0.05	0.510 (van Dongen et al. 2013)
HDL cholesterol	0.453 ± 0.046	0.431	0.512 ± 0.049	0.480 ± 0.110 (Vattikuti et al. 2012)
LDL cholesterol	0.497 ± 0.050	0.496	0.534 ± 0.052	0.510 (van Dongen et al. 2013)
Triglycerides	0.312 ± 0.051	0.308	0.333 ± 0.054	0.470 ± 0.120 (Vattikuti et al. 2012)
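The point estimates in Table 2 can be tallied programmatically. The sketch below copies the GCTA, rescaled GCTA and literature columns from the table and does two things: it lists the traits whose GCTA estimate falls below the published value, and it back-calculates the variance share that the rescaling would imply, assuming the rescaled column was obtained by multiplying the GCTA estimate by one minus that share. That assumption, the function names and the dictionary layout are introduced here for illustration, and standard errors are ignored.

```python
# (GCTA estimate, rescaled GCTA estimate, literature estimate) from Table 2
TABLE_2 = {
    "Height": (0.656, 0.611, 0.860),
    "Weight": (0.533, 0.504, 0.630),
    "Body mass index": (0.505, 0.494, 0.340),
    "Hip circumference": (0.494, 0.472, 0.680),
    "Waist circumference": (0.464, 0.461, 0.620),
    "Waist-to-hip ratio": (0.352, 0.348, 0.280),
    "Total cholesterol": (0.487, 0.475, 0.510),
    "HDL cholesterol": (0.453, 0.431, 0.480),
    "LDL cholesterol": (0.497, 0.496, 0.510),
    "Triglycerides": (0.312, 0.308, 0.470),
}

def traits_below_literature(table):
    """Traits whose GCTA point estimate is smaller than the published one."""
    return [trait for trait, (gcta, _, lit) in table.items() if gcta < lit]

def implied_interaction_share(table):
    """1 - rescaled / GCTA per trait (assumes a purely multiplicative rescaling)."""
    return {trait: 1.0 - rescaled / gcta
            for trait, (gcta, rescaled, _) in table.items()}

if __name__ == "__main__":
    lower = traits_below_literature(TABLE_2)
    print(len(lower), "of", len(TABLE_2), "traits below the literature value")
    for trait, share in implied_interaction_share(TABLE_2).items():
        print(f"{trait}: {share:.3f}")
```

On these point estimates the tally gives eight of ten traits, with body mass index and waist-to-hip ratio as the exceptions, and height shows the largest implied adjustment.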


Discussion

In this work, we explored the performance of existing methods for heritability estimates in the admixed Greenlandic population. Our goal was to propose a framework for unbiased heritability estimates in datasets where both population and family structure are notably present, as well as a way to interpret the resulting estimates. Even though the main focus is on total narrow-sense heritability (h²), we also report the results for SNP heritability ( $h_{g}^{2}$ ), a quantity that has gained a lot of attention in the past decade due to the availability of GWAS data (Yang et al. 2010; Lee et al. 2011; Browning and Browning 2011).

Through extensive simulations, we observed that all LMMs using one GRM led to downward biases in total heritability estimates when applied to family data from Greenland. Common choices of GRM, such as K_IBD and K_IBS, led to underestimates of total heritability in Greenlandic sib pairs, whereas no such biases were generally observed for the Danish sib pairs, indicating that inheriting DNA from different ancestral populations (i.e., admixture) exerts a biasing effect on both IBD- and IBS-based estimates. Even though this is not surprising for the IBS-based estimates, as the IBS ~ IBD assumption does not hold for the Greenlanders, it is not very clear why IBD-based heritability estimates are also affected by admixture. One possible explanation could be that IBD estimates become less accurate for more distantly related pairs, and therefore including them in the LMM introduces noise as evidenced by the underperformance of the full K_IBD matrix in the Danes.

We also observed that an LMM with two GRMs (K_IBD & K_IBS), a method designed to work on data with notable presence of family structure (Zaitlen et al. 2013), also led to downward biases in total heritability estimates when applied to the entire dataset from Greenland. However, when the same analysis focused on the Greenlandic sib pairs, it returned nearly unbiased heritability estimates. This could be due to the fact that, by restricting the analysis to the sib pairs, we controlled more efficiently for the noise that comes from between-sib-pair IBD estimates (Moltke and Albrechtsen 2014). The K_IBD & K_IBS model performs well under the assumption that shared environment among siblings has a negligible effect. We show that a nonzero household effect can potentially inflate total heritability estimates, but this effect can be accounted for with the inclusion of a shared environment matrix K_HH—at least in the simulation setting. We note that the K_IBD & K_IBS model outperformed the K_IBD model in the Danes, rendering more advisable the use of two GRMs in total narrow-sense heritability estimates in unadmixed populations too.

When there is no environmental correlation with ancestry, the K_IBD & K_IBS (or any other combination of one IBD- and one IBS-based GRM) model provides an accurate estimate of the true heritability matched only by the classical sib-pair analysis. However, we expect environmental structure to exert an inflating effect on heritability estimates due to its correlation with genetic structure. We found that adjusting for structure did not remove the inflation. Nevertheless, we provide a way to interpret the resulting total heritability estimates from the K_IBD & K_IBS & 10 PC models, as well as a way to adjust for the inflation. In particular, this inflated quantity is referred to as “conditional heritability” in a recent paper (Weissbrod et al. 2018), after adjusting for model covariates like in our case. We observed that, under the K_IBD & K_IBS & 10 PC models, the resulting conditional heritability estimate will be inflated by a factor of 1/(1 − $h_{E \times Anc}^{2}$ ), and we propose an adjustment that accounts efficiently for this inflation in order to retrieve the “marginal heritability” (Weissbrod et al. 2018). Finally, we note that the classical sib-pair approach will also produce inflated estimates when there is interaction with the environment, and that adjustment for PCs will not fix the issue.
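The inflation factor follows from the variance decomposition used in the Results. In the short derivation below, $h_{cond}^{2}$ and $h_{marg}^{2}$ are shorthand labels introduced here for the conditional and marginal heritability; the remaining symbols are those of the text, with $σ_{P}^{2}$ taken as the sum of the three variance components.

```latex
\begin{align*}
\sigma_{P}^{2} &= \sigma_{A}^{2} + \sigma_{E \times Anc}^{2} + \sigma_{e}^{2},
\qquad
h_{E \times Anc}^{2} = \frac{\sigma_{E \times Anc}^{2}}{\sigma_{P}^{2}},
\qquad
h_{marg}^{2} = \frac{\sigma_{A}^{2}}{\sigma_{P}^{2}} \\[4pt]
h_{cond}^{2} &= \frac{\sigma_{A}^{2}}{\sigma_{A}^{2} + \sigma_{e}^{2}}
 = \frac{\sigma_{A}^{2}}{\sigma_{P}^{2} - \sigma_{E \times Anc}^{2}}
 = \frac{\sigma_{A}^{2}}{\sigma_{P}^{2}\left(1 - h_{E \times Anc}^{2}\right)}
 = \frac{h_{marg}^{2}}{1 - h_{E \times Anc}^{2}} \\[4pt]
h_{marg}^{2} &= h_{cond}^{2}\left(1 - h_{E \times Anc}^{2}\right)
\end{align*}
```

For example, with hypothetical components $σ_{A}^{2} = 0.4$, $σ_{E \times Anc}^{2} = 0.2$ and $σ_{e}^{2} = 0.4$, the conditional value is 0.4/0.8 = 0.5 and multiplying by 1 − 0.2 returns the marginal value of 0.4.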

As for the total narrow-sense heritability estimates of the real phenotypes obtained with the best model (K_IBD & K_IBS & 10 PCs), we observe that, on some occasions, these are lower for the Greenlandic population than for European populations. A notable example is height, for which total heritability in the Greenlanders was estimated to be 0.656 ± 0.042 (0.611 after the PCA-based adjustment), whereas in unadmixed Europeans it was estimated at 0.860 (Visscher et al. 2007). We believe that this could be due to the reduced genetic diversity observed in the Greenlanders as a consequence of their particular population history, which included an extreme and prolonged bottleneck in recent times (Moltke et al. 2015; Pedersen et al. 2017), even though we did observe a notable increase when LD weighting was included in the estimation model according to LDAK (Speed et al. 2017).

Finally, our SNP heritability ( $h_{g}^{2}$ ) estimates in unrelated Greenlanders could be inflated due to genetic structure as reported previously (Browning and Browning 2011), even though we could not assess the level of inflation. In any case, SNP heritability estimates in the Greenlanders should be interpreted with caution because, as we saw, IBS measures are affected by admixture that can lead to artificially increased levels of LD between causal and typed markers.

It is important to note that this work does not solve all problems of heritability estimates in admixed populations. Our work should be viewed as a first attempt to explore the problem, and therefore the insights and solutions we provide here might not apply in all cases. Additional work is warranted in order to, e.g., model more accurately complex patterns of environmental stratification—similar to the household effect (Almasy and Blangero 1998)—and exposure of the same genetic ancestry to different environmental backgrounds. In addition, even though there are multiple methods for improving heritability estimates using, for example, LD score regression or partitioning SNPs according to allele frequencies (Gazal et al. 2017; Evans et al. 2018), we have not explored them here as they are harder to implement in admixed populations, where LD patterns and allele frequencies can be misspecified.

In summary, we advise against the use of K_IBD or K_IBS alone for total narrow-sense heritability estimates in populations with substantial levels of population and family structure. Instead, K_IBD & K_IBS & 10 PCs on a subset with high relatedness (preferably sib pairs) are advisable, given that K_IBD can now be efficiently computed for admixed populations (Thornton et al. 2012; Moltke and Albrechtsen 2014), with the caveat that the method could be capturing sizeable levels of shared environment among siblings. In any case, the resulting conditional h² estimates should be viewed as potentially inflated by a factor that we estimated at 1/(1 − $h_{E \times Anc}^{2}$ ), and an additional PCA-based adjustment should be carried out in order to recover the marginal total heritability estimate.

Supplementary information

Supplementary File (38.1 KB, xlsx)

Supplementary figure legends (18.9 KB, docx)

Acknowledgements

We would like to thank the staff and participants of the IHIT, B99, and BBH cohorts facilitating this study. The staff and steering committees from Research Centre for Prevention and Health, Glostrup, Denmark, from the ADDITION-DK study, University of Aarhus, Denmark, from Vejle Diabetes Biobank, Vejle Hospital, Denmark, and from Steno Diabetes Center, Gentofte, Denmark are acknowledged for their contribution to collecting and characterizing the Danish cohorts.

Data availability

The Greenlandic MetaboChip data have been submitted to the European Genome-Phenome Archive (https://ega-archive.org/) under accession number EGAS00001002641.

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

Ethics

The two Greenlandic surveys received ethics approval from the Commission for Scientific Research in Greenland (project 2011-13, ref. no. 2011-056978; project 2013-13, ref. no. 2013-090702), and the Danish studies were approved by the local ethics committees (protocol ref. no. H-3-2012-155). All studies were conducted in compliance with the Helsinki Declaration II, and all participants gave their written consent after being informed about the study orally and in writing.

Footnotes

Associate editor: Giorgio Bertorelle

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Georgios Athanasiadis, Email: yorgos.athanasiadis@gmail.com.

Anders Albrechtsen, Email: albrecht@binf.ku.dk.

Supplementary information

The online version of this article (10.1038/s41437-020-0311-2) contains supplementary material, which is available to authorized users.

References

Aadahl M, Linneberg A, Møller TC, Rosenørn S, Dunstan DW, Witte DR, et al. Motivational counseling to reduce sitting time: a community-based randomized controlled trial in adults. Am J Prev Med. 2014;47:576–586. doi: 10.1016/j.amepre.2014.06.020.
Alexander DH, Novembre J, Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 2009;19:1655–1664. doi: 10.1101/gr.094052.109.
Almasy L, Blangero J. Multipoint quantitative-trait linkage analysis in general pedigrees. Am J Hum Genet. 1998;62:1198–1211. doi: 10.1086/301844.
Andreasen CH, Stender-Petersen KL, Mogensen MS, Torekov SS, Wegner L, Andersen G, et al. Low physical activity accentuates the effect of the FTO rs9939609 polymorphism on body fat accumulation. Diabetes. 2008;57:95–101. doi: 10.2337/db07-0910.
Athanasiadis G, Cheng JY, Vilhjálmsson BJ, Jørgensen FG, Als TD, Le Hellard S, et al. Nationwide Genomic Study in Denmark Reveals Remarkable Population Homogeneity. Genetics. 2016;204:711–722. doi: 10.1534/genetics.116.189241.
Bjerregaard P, Curtis T, Borch-Johnsen K, Mulvad G, Becker U, Andersen S, et al. Inuit health in Greenland: a population survey of life style and disease in Greenland and among Inuit living in Denmark. Int J Circumpolar Health. 2003;62(Suppl 1):3–79. doi: 10.3402/ijch.v62i0.18212.
Blangero J, Almasy L. Multipoint oligogenic linkage analysis of quantitative traits. Genet Epidemiol. 1997;14:959–964. doi: 10.1002/(SICI)1098-2272(1997)14:6<959::AID-GEPI66>3.0.CO;2-K.
Browning SR, Browning BL. Population structure can inflate SNP-based heritability estimates. Am J Hum Genet. 2011;89:191–193. doi: 10.1016/j.ajhg.2011.05.025.
Byberg S, Hansen A-LS, Christensen DL, Vistisen D, Aadahl M, Linneberg A, et al. Sleep duration and sleep quality are associated differently with alterations of glucose homeostasis. Diabet Med. 2012;29:e354–e360. doi: 10.1111/j.1464-5491.2012.03711.x.
Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience. 2015;4:7. doi: 10.1186/s13742-015-0047-8.
Evans LM, Tahmasbi R, Vrieze SI, Abecasis GR, Das S, Gazal S, et al. Comparison of methods that use whole genome data to estimate the heritability and genetic architecture of complex traits. Nat Genet. 2018;50:737–745. doi: 10.1038/s41588-018-0108-x.
Fisher RA. The correlation between relatives on the supposition of Mendelian inheritance. Trans Roy Soc Edin. 1918;52:399–433.
Gazal S, Finucane HK, Furlotte NA, Loh P-R, Palamara PF, Liu X, et al. Linkage disequilibrium–dependent architecture of human complex traits shows action of negative selection. Nat Genet. 2017;49:1421–1427. doi: 10.1038/ng.3954.
Glümer C, Carstensen B, Sandbaek A, Lauritzen T, Jørgensen T, Borch-Johnsen K, et al. A Danish diabetes risk score for targeted screening: the Inter99 study. Diabetes Care. 2004;27:727–733. doi: 10.2337/diacare.27.3.727.
Hayes BJ, Visscher PM, Goddard ME. Increased accuracy of artificial selection by using the realized relationship matrix. Genet Res. 2009;91:47–60. doi: 10.1017/S0016672308009981.
Jørgensen ME, Borch-Johnsen K, Stolk R, Bjerregaard P. Fat distribution and glucose intolerance among Greenland Inuit. Diabetes Care. 2013;36:2988–2994. doi: 10.2337/dc12-2703.
Jørgensen T, Borch-Johnsen K, Thomsen TF, Ibsen H, Glümer C, Pisinger C. A randomized non-pharmacological intervention study for prevention of ischaemic heart disease: baseline results Inter99. Eur J Cardiovasc Prev Rehabil. 2003;10:377–386. doi: 10.1097/01.hjr.0000096541.30533.82.
Kang HM, Sul JH, Service SK, Zaitlen NA, Kong S-Y, Freimer NB, et al. Variance component model to account for sample structure in genome-wide association studies. Nat Genet. 2010;42:348–354. doi: 10.1038/ng.548.
Lange K. Mathematical and statistical methods for genetic analysis. 2nd edn. New York: Springer-Verlag; 2002.
Lauritzen T, Griffin S, Borch-Johnsen K, Wareham NJ, Wolffenbuttel BH, Rutten G, et al. The ADDITION study: proposed trial of the cost-effectiveness of an intensive multifactorial intervention on morbidity and mortality among people with Type 2 diabetes detected by screening. Int J Obes Relat Metab Disord. 2000;24(Suppl 3):S6–S11. doi: 10.1038/sj.ijo.0801420.
Lee SH, Wray NR, Goddard ME, Visscher PM. Estimating missing heritability for disease from genome-wide association studies. Am J Hum Genet. 2011;88:294–305. doi: 10.1016/j.ajhg.2011.02.002.
Mamtani M, Kulkarni H, Dyer TD, Almasy L, Mahaney MC, Duggirala R, et al. Waist circumference is genetically correlated with incident Type 2 diabetes in Mexican-American families. Diabet Med. 2014;31:31–35. doi: 10.1111/dme.12266.
Moltke I, Albrechtsen A. RelateAdmix: a software tool for estimating relatedness between admixed individuals. Bioinformatics. 2014;30:1027–1028. doi: 10.1093/bioinformatics/btt652.
Moltke I, Fumagalli M, Korneliussen TS, Crawford JE, Bjerregaard P, Jørgensen ME, et al. Uncovering the genetic history of the present-day Greenlandic population. Am J Hum Genet. 2015;96:54–69. doi: 10.1016/j.ajhg.2014.11.012.
Pedersen C-ET, Lohmueller KE, Grarup N, Bjerregaard P, Hansen T, Siegismund HR, et al. The effect of an extreme and prolonged population bottleneck on patterns of deleterious variation: insights from the Greenlandic Inuit. Genetics. 2017;205:787–801. doi: 10.1534/genetics.116.193821.
Petersen ERB, Nielsen AA, Christensen H, Hansen T, Pedersen O, Christensen CK, et al. Vejle Diabetes Biobank—a resource for studies of the etiologies of diabetes and its comorbidities. Clin Epidemiol. 2016;8:393–413. doi: 10.2147/CLEP.S113419.
Shaw RG. Maximum-likelihood approaches applied to quantitative genetics of natural populations. Evolution. 1987;41:812–826. doi: 10.1111/j.1558-5646.1987.tb05855.x.
Speed D, Cai N, Consortium UCLEB, Johnson MR, Nejentsev S, Balding DJ. Reevaluation of SNP heritability in complex human traits. Nat Genet. 2017;49:986–992. doi: 10.1038/ng.3865.
Speed D, Hemani G, Johnson MR, Balding DJ. Improved heritability estimation from genome-wide SNPs. Am J Hum Genet. 2012;91:1011–1021. doi: 10.1016/j.ajhg.2012.10.010.
Thornton T, Tang H, Hoffmann TJ, Ochs-Balcom HM, Caan BJ, Risch N. Estimating kinship in admixed populations. Am J Hum Genet. 2012;91:122–138. doi: 10.1016/j.ajhg.2012.05.024.
Thuesen BH, Cerqueira C, Aadahl M, Ebstrup JF, Toft U, Thyssen JP, et al. Cohort profile: the Health2006 cohort, research centre for prevention and health. Int J Epidemiol. 2014;43:568–575. doi: 10.1093/ije/dyt009.
van Dongen J, Willemsen G, Chen W-M, de Geus EJC, Boomsma DI. Heritability of metabolic syndrome traits in a large population-based sample. J Lipid Res. 2013;54:2914–2923. doi: 10.1194/jlr.P041673.
Vattikuti S, Guo J, Chow CC. Heritability and genetic correlations explained by common SNPs for metabolic syndrome traits. PLoS Genet. 2012;8:e1002637. doi: 10.1371/journal.pgen.1002637.
Visscher PM, Hill WG, Wray NR. Heritability in the genomics era–concepts and misconceptions. Nat Rev Genet. 2008;9:255–266. doi: 10.1038/nrg2322.
Visscher PM, Macgregor S, Benyamin B, Zhu G, Gordon S, Medland S, et al. Genome partitioning of genetic variation for height from 11,214 sibling pairs. Am J Hum Genet. 2007;81:1104–1110. doi: 10.1086/522934.
Visscher PM, Medland SE, Ferreira MAR, Morley KI, Zhu G, Cornes BK, et al. Assumption-free estimation of heritability from genome-wide identity-by-descent sharing between full siblings. PLoS Genet. 2006;2:e41. doi: 10.1371/journal.pgen.0020041.
Voight BF, Kang HM, Ding J, Palmer CD, Sidore C, Chines PS, et al. The metabochip, a custom genotyping array for genetic studies of metabolic, cardiovascular, and anthropometric traits. PLoS Genet. 2012;8:e1002793. doi: 10.1371/journal.pgen.1002793.
Weissbrod O, Flint J, Rosset S. Estimating SNP-based heritability and genetic correlation in case-control studies directly and with summary statistics. Am J Hum Genet. 2018;103:89–99. doi: 10.1016/j.ajhg.2018.06.002.
Wright S. Systems of mating. I. the biometric relations between parent and offspring. Genetics. 1921;6:111–123. doi: 10.1093/genetics/6.2.111.
Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, Nyholt DR, et al. Common SNPs explain a large proportion of the heritability for human height. Nat Genet. 2010;42:565–569. doi: 10.1038/ng.608.
Yang J, Lee SH, Goddard ME, Visscher PM. GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet. 2011;88:76–82. doi: 10.1016/j.ajhg.2010.11.011.
Zaitlen N, Kraft P. Heritability in the genome-wide association era. Hum Genet. 2012;131:1655–1664. doi: 10.1007/s00439-012-1199-6.
Zaitlen N, Kraft P, Patterson N, Pasaniuc B, Bhatia G, Pollack S, et al. Using extended genealogy to estimate components of heritability for 23 quantitative and dichotomous traits. PLoS Genet. 2013;9:e1003520. doi: 10.1371/journal.pgen.1003520.
Zaitlen N, Pasaniuc B, Sankararaman S, Bhatia G, Zhang J, Gusev A, et al. Leveraging population admixture to characterize the heritability of complex traits. Nat Genet. 2014;46:1356–1362. doi: 10.1038/ng.3139.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary File^{(38.1KB, xlsx)}

Supplementary figure legends^{(18.9KB, docx)}

Data Availability Statement

The Greenlandic MetaboChip data have been submitted to the European Genome-Phenome Archive (https://ega-archive.org/) under accession number EGAS00001002641.

[CR1] Aadahl M, Linneberg A, Møller TC, Rosenørn S, Dunstan DW, Witte DR, et al. Motivational counseling to reduce sitting time: a community-based randomized controlled trial in adults. Am J Prev Med. 2014;47:576–586. doi: 10.1016/j.amepre.2014.06.020. [DOI] [PubMed] [Google Scholar]

[CR2] Alexander DH, Novembre J, Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 2009;19:1655–1664. doi: 10.1101/gr.094052.109. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR3] Almasy L, Blangero J. Multipoint quantitative-trait linkage analysis in general pedigrees. Am J Hum Genet. 1998;62:1198–1211. doi: 10.1086/301844. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR4] Andreasen CH, Stender-Petersen KL, Mogensen MS, Torekov SS, Wegner L, Andersen G, et al. Low physical activity accentuates the effect of the FTO rs9939609 polymorphism on body fat accumulation. Diabetes. 2008;57:95–101. doi: 10.2337/db07-0910. [DOI] [PubMed] [Google Scholar]

[CR5] Athanasiadis G, Cheng JY, Vilhjálmsson BJ, Jørgensen FG, Als TD, Le Hellard S, et al. Nationwide Genomic Study in Denmark Reveals Remarkable Population Homogeneity. Genetics. 2016;204:711–722. doi: 10.1534/genetics.116.189241. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR6] Bjerregaard P, Curtis T, Borch-Johnsen K, Mulvad G, Becker U, Andersen S, et al. Inuit health in Greenland: a population survey of life style and disease in Greenland and among Inuit living in Denmark. Int J Circumpolar Health. 2003;62(Suppl 1):3–79. doi: 10.3402/ijch.v62i0.18212. [DOI] [PubMed] [Google Scholar]

[CR7] Blangero J, Almasy L. Multipoint oligogenic linkage analysis of quantitative traits. Genet Epidemiol. 1997;14:959–964. doi: 10.1002/(SICI)1098-2272(1997)14:6<959::AID-GEPI66>3.0.CO;2-K. [DOI] [PubMed] [Google Scholar]

[CR8] Browning SR, Browning BL. Population structure can inflate SNP-based heritability estimates. Am J Hum Genet. 2011;89:191–193. doi: 10.1016/j.ajhg.2011.05.025. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR9] Byberg S, Hansen A-LS, Christensen DL, Vistisen D, Aadahl M, Linneberg A, et al. Sleep duration and sleep quality are associated differently with alterations of glucose homeostasis. Diabet Med. 2012;29:e354–e360. doi: 10.1111/j.1464-5491.2012.03711.x. [DOI] [PubMed] [Google Scholar]

[CR10] Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience. 2015;4:7. doi: 10.1186/s13742-015-0047-8. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR11] Evans LM, Tahmasbi R, Vrieze SI, Abecasis GR, Das S, Gazal S, et al. Comparison of methods that use whole genome data to estimate the heritability and genetic architecture of complex traits. Nat Genet. 2018;50:737–745. doi: 10.1038/s41588-018-0108-x. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR12] Fisher RA. The correlation between relatives on the supposition of Mendelian inheritance. Trans Roy Soc Edin. 1918;52:399–433. [Google Scholar]

[CR13] Gazal S, Finucane HK, Furlotte NA, Loh P-R, Palamara PF, Liu X, et al. Linkage disequilibrium–dependent architecture of human complex traits shows action of negative selection. Nat Genet. 2017;49:1421–1427. doi: 10.1038/ng.3954. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR14] Glümer C, Carstensen B, Sandbaek A, Lauritzen T, Jørgensen T, Borch-Johnsen K, et al. A Danish diabetes risk score for targeted screening: the Inter99 study. Diabetes Care. 2004;27:727–733. doi: 10.2337/diacare.27.3.727. [DOI] [PubMed] [Google Scholar]

[CR15] Hayes BJ, Visscher PM, Goddard ME. Increased accuracy of artificial selection by using the realized relationship matrix. Genet Res. 2009;91:47–60. doi: 10.1017/S0016672308009981. [DOI] [PubMed] [Google Scholar]

[CR16] Jørgensen ME, Borch-Johnsen K, Stolk R, Bjerregaard P. Fat distribution and glucose intolerance among Greenland Inuit. Diabetes Care. 2013;36:2988–2994. doi: 10.2337/dc12-2703. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR17] Jørgensen T, Borch-Johnsen K, Thomsen TF, Ibsen H, Glümer C, Pisinger C. A randomized non-pharmacological intervention study for prevention of ischaemic heart disease: baseline results Inter99. Eur J Cardiovasc Prev Rehabil. 2003;10:377–386. doi: 10.1097/01.hjr.0000096541.30533.82. [DOI] [PubMed] [Google Scholar]

[CR18] Kang HM, Sul JH, Service SK, Zaitlen NA, Kong S-Y, Freimer NB, et al. Variance component model to account for sample structure in genome-wide association studies. Nat Genet. 2010;42:348–354. doi: 10.1038/ng.548. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR19] Lange K. Mathematical and statistical methods for genetic analysis. 2nd edn. New York: Springer-Verlag; 2002. [Google Scholar]

[CR20] Lauritzen T, Griffin S, Borch-Johnsen K, Wareham NJ, Wolffenbuttel BH, Rutten G, et al. The ADDITION study: proposed trial of the cost-effectiveness of an intensive multifactorial intervention on morbidity and mortality among people with Type 2 diabetes detected by screening. Int J Obes Relat Metab Disord. 2000;24(Suppl 3):S6–S11. doi: 10.1038/sj.ijo.0801420. [DOI] [PubMed] [Google Scholar]

[CR21] Lee SH, Wray NR, Goddard ME, Visscher PM. Estimating missing heritability for disease from genome-wide association studies. Am J Hum Genet. 2011;88:294–305. doi: 10.1016/j.ajhg.2011.02.002. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR42] Mamtani M, Kulkarni H, Dyer TD, Almasy L, Mahaney MC, Duggirala R, et al. Waist circumference is genetically correlated with incident Type 2 diabetes in Mexican-American families. Diabet Med. 2014;31:31–35. doi: 10.1111/dme.12266. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR22] Moltke I, Albrechtsen A. RelateAdmix: a software tool for estimating relatedness between admixed individuals. Bioinformatics. 2014;30:1027–1028. doi: 10.1093/bioinformatics/btt652. [DOI] [PubMed] [Google Scholar]

[CR23] Moltke I, Fumagalli M, Korneliussen TS, Crawford JE, Bjerregaard P, Jørgensen ME, et al. Uncovering the genetic history of the present-day Greenlandic population. Am J Hum Genet. 2015;96:54–69. doi: 10.1016/j.ajhg.2014.11.012. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR24] Pedersen C-ET, Lohmueller KE, Grarup N, Bjerregaard P, Hansen T, Siegismund HR, et al. The effect of an extreme and prolonged population bottleneck on patterns of deleterious variation: insights from the Greenlandic Inuit. Genetics. 2017;205:787–801. doi: 10.1534/genetics.116.193821. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR25] Petersen ERB, Nielsen AA, Christensen H, Hansen T, Pedersen O, Christensen CK, et al. Vejle Diabetes Biobank—a resource for studies of the etiologies of diabetes and its comorbidities. Clin Epidemiol. 2016;8:393–413. doi: 10.2147/CLEP.S113419. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR26] Shaw RG. Maximum-likelihood approaches applied to quanti- tative genetics of natural populations. Evolution. 1987;41:812–826. doi: 10.1111/j.1558-5646.1987.tb05855.x. [DOI] [PubMed] [Google Scholar]

[CR27] Speed D, Cai N, Consortium UCLEB, Johnson MR, Nejentsev S, Balding DJ. Reevaluation of SNP heritability in complex human traits. Nat Genet. 2017;49:986–992. doi: 10.1038/ng.3865. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR28] Speed D, Hemani G, Johnson MR, Balding DJ. Improved heritability estimation from genome-wide SNPs. Am J Hum Genet. 2012;91:1011–1021. doi: 10.1016/j.ajhg.2012.10.010. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR29] Thornton T, Tang H, Hoffmann TJ, Ochs-Balcom HM, Caan BJ, Risch N. Estimating kinship in admixed populations. Am J Hum Genet. 2012;91:122–138. doi: 10.1016/j.ajhg.2012.05.024. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR30] Thuesen BH, Cerqueira C, Aadahl M, Ebstrup JF, Toft U, Thyssen JP, et al. Cohort profile: the Health2006 cohort, research centre for prevention and health. Int J Epidemiol. 2014;43:568–575. doi: 10.1093/ije/dyt009.

[CR43] van Dongen J, Willemsen G, Chen W-M, de Geus EJC, Boomsma DI. Heritability of metabolic syndrome traits in a large population-based sample. J Lipid Res. 2013;54:2914–2923. doi: 10.1194/jlr.P041673.

[CR44] Vattikuti S, Guo J, Chow CC. Heritability and genetic correlations explained by common SNPs for metabolic syndrome traits. PLoS Genet. 2012;8:e1002637. doi: 10.1371/journal.pgen.1002637.

[CR31] Visscher PM, Hill WG, Wray NR. Heritability in the genomics era: concepts and misconceptions. Nat Rev Genet. 2008;9:255–266. doi: 10.1038/nrg2322.

[CR32] Visscher PM, Macgregor S, Benyamin B, Zhu G, Gordon S, Medland S, et al. Genome partitioning of genetic variation for height from 11,214 sibling pairs. Am J Hum Genet. 2007;81:1104–1110. doi: 10.1086/522934.

[CR33] Visscher PM, Medland SE, Ferreira MAR, Morley KI, Zhu G, Cornes BK, et al. Assumption-free estimation of heritability from genome-wide identity-by-descent sharing between full siblings. PLoS Genet. 2006;2:e41. doi: 10.1371/journal.pgen.0020041.

[CR34] Voight BF, Kang HM, Ding J, Palmer CD, Sidore C, Chines PS, et al. The metabochip, a custom genotyping array for genetic studies of metabolic, cardiovascular, and anthropometric traits. PLoS Genet. 2012;8:e1002793. doi: 10.1371/journal.pgen.1002793.

[CR35] Weissbrod O, Flint J, Rosset S. Estimating SNP-based heritability and genetic correlation in case-control studies directly and with summary statistics. Am J Hum Genet. 2018;103:89–99. doi: 10.1016/j.ajhg.2018.06.002.

[CR36] Wright S. Systems of mating. I. The biometric relations between parent and offspring. Genetics. 1921;6:111–123. doi: 10.1093/genetics/6.2.111.

[CR37] Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, Nyholt DR, et al. Common SNPs explain a large proportion of the heritability for human height. Nat Genet. 2010;42:565–569. doi: 10.1038/ng.608.

[CR38] Yang J, Lee SH, Goddard ME, Visscher PM. GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet. 2011;88:76–82. doi: 10.1016/j.ajhg.2010.11.011.

[CR39] Zaitlen N, Kraft P. Heritability in the genome-wide association era. Hum Genet. 2012;131:1655–1664. doi: 10.1007/s00439-012-1199-6.

[CR40] Zaitlen N, Kraft P, Patterson N, Pasaniuc B, Bhatia G, Pollack S, et al. Using extended genealogy to estimate components of heritability for 23 quantitative and dichotomous traits. PLoS Genet. 2013;9:e1003520. doi: 10.1371/journal.pgen.1003520.

[CR41] Zaitlen N, Pasaniuc B, Sankararaman S, Bhatia G, Zhang J, Gusev A, et al. Leveraging population admixture to characterize the heritability of complex traits. Nat Genet. 2014;46:1356–1362. doi: 10.1038/ng.3139.
