Estimating heritability explained by local ancestry and evaluating stratification bias in admixture mapping from summary statistics

Tsz Fung Chan; Xinyue Rui; David V Conti; Myriam Fornage; Mariaelisa Graff; Jeffrey Haessler; Christopher Haiman; Heather M Highland; Su Yon Jung; Eimear E Kenny; Charles Kooperberg; Loic Le Marchand; Kari E North; Ran Tao; Genevieve Wojcik; Christopher R Gignoux; PAGE Consortium; Charleston WK Chiang; Nicholas Mancuso

doi:10.1016/j.ajhg.2023.09.012

2023 Oct 23;110(11):1853–1862. doi: 10.1016/j.ajhg.2023.09.012

PMCID: PMC10645552 PMID: 37875120

Summary

The heritability explained by local ancestry markers in an admixed population ( $h_{γ}^{2}$ ) provides crucial insight into the genetic architecture of a complex disease or trait. Estimation of $h_{γ}^{2}$ can be susceptible to biases due to population structure in ancestral populations. Here, we present heritability estimation from admixture mapping summary statistics (HAMSTA), an approach that uses summary statistics from admixture mapping to infer heritability explained by local ancestry while adjusting for biases due to ancestral stratification. Through extensive simulations, we demonstrate that HAMSTA $h_{γ}^{2}$ estimates are approximately unbiased and are robust to ancestral stratification compared to existing approaches. In the presence of ancestral stratification, we show a HAMSTA-derived sampling scheme provides a calibrated family-wise error rate (FWER) of ∼5% for admixture mapping, unlike existing FWER estimation approaches. We apply HAMSTA to 20 quantitative phenotypes of up to 15,988 self-reported African American individuals in the Population Architecture using Genomics and Epidemiology (PAGE) study. We observe ${\hat{h}}_{γ}^{2}$ in the 20 phenotypes range from 0.0025 to 0.033 (mean ${\hat{h}}_{γ}^{2}$ = 0.012 ± 9.2 × 10⁻⁴), which translates to ${\hat{h}}^{2}$ ranging from 0.062 to 0.85 (mean ${\hat{h}}^{2}$ = 0.30 ± 0.023). Across these phenotypes we find little evidence of inflation due to ancestral population stratification in current admixture mapping studies (mean inflation factor of 0.99 ± 0.001). Overall, HAMSTA provides a fast and powerful approach to estimate genome-wide heritability and evaluate biases in test statistics of admixture mapping studies.

Keywords: admixture mapping, population structure, heritability, summary statistics, genetic admixture, local ancestry, family-wise error rate, genome-wide association

This study reports a method to estimate heritability explained by local ancestry using admixture mapping summary statistics and evaluates potential biases in admixture mapping. The method provides a strategy for summary statistic-based heritability estimation in admixed populations and controlling false positives due to ancestral population stratification in admixture mapping studies.

Introduction

Admixture mapping (AM) aims to identify genomic regions associated with a disease or quantitative trait in recently admixed populations¹,²,³,⁴,⁵,⁶,⁷ by leveraging the differences in allele frequencies between local ancestries.⁸ AM provides a powerful approach to complement genome-wide association studies (GWASs) in admixed populations because local ancestry information better tags uncommon or poorly imputed causal variants⁵ and spans larger genomic regions, thus reducing the multiple testing burden⁹ and enabling discoveries with relatively smaller sample sizes.³,¹⁰ Similarly, recent work¹¹ demonstrated that local ancestry information, which is summarized by heritability explained by local ancestry $h_{γ}^{2}$ , can be leveraged to estimate narrow-sense heritability h² in admixed populations, unlike the genotype-based lower bounds (i.e., $h_{g}^{2}$ ). Multiple works have shown that population structure can bias association tests and estimates of $h_{g}^{2}$ .¹²,¹³ However, it is less understood how similar demographic phenomena bias AM and $h_{γ}^{2}$ inference in admixed populations.

Admixed populations are typically modeled as a mixture of multiple continental ancestries (e.g., African, European, or Native American) with finer-scale structure within ancestral populations left unmodeled. Nevertheless, human populations are often structured across both space and time. For example, European ancestry individuals can be modeled as a mixture of at least three ancient populations,¹⁴ and Native American ancestry components found in Latinos can also be derived from multiple subpopulations spread across Latin America.¹⁵ This unmodeled fine-scale structure could lead to potential biases in downstream association testing. Indeed, this phenomenon has been demonstrated in European populations,¹⁶,¹⁷ and could similarly impact inference in admixed populations when it is not fully accounted for.¹⁸ When estimating $h_{g}^{2}$ using SNP data of large sample size, an approach that is robust to population stratification is to estimate h² and test statistic inflation simultaneously.¹⁹ Examples of this approach include linkage disequilibrium score regression (LDSC)¹³ and cov-LDSC.¹² While these methods are designed for SNP data, it remains unclear how applicable they are to estimating $h_{γ}^{2}$ using summary statistics from admixture mapping studies.

In this study we propose HAMSTA (heritability estimation from admixture mapping summary statistics), a likelihood-based approach to infer $h_{γ}^{2}$ from admixture mapping summary statistics. To achieve robust and efficient computation, HAMSTA transforms the correlated test statistics using a truncated singular value decomposition (tSVD) and performs maximum-likelihood inference while accounting for residual inflation due to stratification within ancestral populations. We perform extensive simulations and demonstrate that HAMSTA provides approximately unbiased estimates of $h_{γ}^{2}$ and outperforms existing approaches to detect evidence of stratification bias. We demonstrate that estimates from HAMSTA can be leveraged to efficiently compute well-calibrated family-wise error rates for admixture mapping, particularly in the presence of ancestral stratification which previous approaches do not consider.²⁰ Next, we apply HAMSTA to admixture mapping summary statistics for 20 traits from 15,988 self-identified African American individuals in the Population Architecture using Genomics and Epidemiology (PAGE) study.²¹ We find the h² estimates of 0.85 (standard error: 0.085) and 0.42 (SE: 0.086) for height and BMI, respectively. Compared with LDSC on admixture mapping summary statistics, HAMSTA offers more precise estimates for $h_{γ}^{2}$ and better quantifies the inflation in the test statistics due to unknown confounding biases. Overall, we demonstrate that HAMSTA provides a fast and powerful way to estimate genome-wide heritability that controls biases using summary statistics from admixture mapping studies.

Material and methods

Model for complex trait and ancestral stratification

We consider a two-way admixed population, with ancestral populations pop1 and pop2, the latter of which is recently structured into pop2a and pop2b (Figure S1). This demographic model mimics the African and European admixture in African Americans and the finer-scale structure in their ancestral European population. We let α, δ, and −δ denote the population mean phenotype values of pop1, pop2a, and pop2b. We denote A_i,k as the centered and standardized local ancestry call for individual i at marker k, such that its sample mean is zero and sample variance is 1. We denote indexing over n individuals at the k^th marker as A_.,k and indexing over M markers for the i^th individual as A_i,.. We define the phenotype y_i of an admixed individual i as

$y_{i} = A_{i, .} β + π_{i} α + d_{i} δ + ϵ_{i},$

where β is the M × 1 vector of local ancestry effects, $π_{i}$ is defined as the global ancestry proportion due to pop1, $d_{i} = {π_{i}}^{(2 a)} - {π_{i}}^{(2 b)}$ is the difference between the global ancestry proportions of pop2a and pop2b, and $ϵ_{i} \sim N (0, σ_{ϵ}^{2})$ is residual environmental noise. Furthermore, we assume that $β_{k} \sim N (0, \frac{h_{γ}^{2}}{M})$ , where $h_{γ}^{2}$ is defined as the heritability explained by local ancestry.¹¹ Lastly, we define $\frac{α^{2}}{n} π^{'} π$ as the phenotypic variance explained (PVE) by global ancestry and $\frac{δ^{2}}{n} d^{'} d$ as PVE by ancestral stratification.

Test statistics for admixture mapping

We model the marginal association statistics from an admixture mapping study where only global ancestry proportions $π_{i}$ (and not d_i) are estimated beforehand. If the stratification term is not adjusted for, the test statistic for marker k will be $Z_{k} = {s_{R}}^{- 1} {({A_{., k}}^{'} P A_{., k})}^{- 1 / 2} ({A_{., k}}^{'} P y)$ , where ${s_{R}}^{2}$ is the residual variance after the global ancestry $π$ is projected out by the matrix $P = I - π {(π^{'} π)}^{- 1} π^{'}$ . Extending this to all M markers, we have $Z = {s_{R}}^{- 1} D^{- 1 / 2} (A^{'} P y)$ , where D is the diagonal matrix containing the diagonal elements of $A^{'} P A$ . Given this and distributional assumptions regarding y, we can derive the expectation and covariance of Z as

$E [Z] = {s_{R}}^{- 1} D^{- 1 / 2} A^{'} P d δ$

$\mathrm{Cov} [Z] = {s_{R}}^{- 2} [\frac{h_{γ}^{2}}{M} D^{- \frac{1}{2}} {(A^{'} P A)}^{2} D^{- \frac{1}{2}} + D^{- \frac{1}{2}} (A^{'} P A) D^{- \frac{1}{2}} σ_{ϵ}^{2}] .$

The matrix $D^{- 1 / 2} (A^{'} P A) D^{- 1 / 2}$ is the local ancestry disequilibrium (LAD) matrix, analogous to the LD matrix. When the sample size n is large, the test statistics Z are well approximated by a multivariate normal distribution. The mean reflects the bias due to correlation between local ancestry and ancestral stratification conditional on the global ancestry. In the covariance, the first term is related to the heritability explained by local ancestry and the LAD score matrix. The second term in the covariance is related to the LAD matrix and nongenetic effects. In the null scenario, where $h_{γ}^{2} = 0$ and $δ = 0$ , Z has mean zero and covariance equal to the LAD matrix.

We then use singular value decomposition (SVD) to decorrelate the association statistics. We let the SVD of $A^{'} P$ be $U S V^{'}$ , so that $A^{'} P A = U S^{2} U^{'}$ and ${(A^{'} P A)}^{2} = U S^{4} U^{'}$ . We define $Z^{*} = S^{- 1} s_{R} U^{'} D^{1 / 2} Z$ , which follows $Z^{*} \sim N (V^{'} d δ, \frac{h_{γ}^{2}}{M} S^{2} + I σ_{ϵ}^{2})$ , where the components are uncorrelated. Since the mean of $Z^{*}$ reflects the bias in association statistics induced by the unknown difference in sub-continental ancestries, we then assume $V^{'} d δ$ to be random and to follow a normal distribution $N (0, C^{*})$ such that $Z^{*} \sim N (0, \frac{h_{γ}^{2}}{M} S^{2} + (I σ_{ϵ}^{2} + C^{*}))$ . The parameters to be inferred are $h_{γ}^{2}$ and $C = I σ_{ϵ}^{2} + C^{*}$ . We refer to the parameter C as the “intercept” because it is analogous to the LDSC intercept. To allow heterogeneous C across the elements of $Z^{*}$ , we allow C to take a different value for every 500 elements, i.e., $C = \mathrm{diag} (\underbrace{c_{1}, \dots, c_{1}}_{\times 500}, \underbrace{c_{2}, \dots, c_{2}}_{\times 500}, \dots)$ . Test statistics from different chromosomes are rotated separately and do not share elements in C.
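The rotation above can be checked numerically. The following NumPy sketch uses toy data (the sizes, the random seed, the variable names, and the `n - 1` denominator for the residual variance are all assumptions made for illustration, not part of the HAMSTA software) to compute Z, the LAD matrix, and the rotated statistics, and to confirm that the rotated statistics reduce to $V^{'} y$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, M = 200, 30  # toy sample size and marker count (illustrative only)

# Toy "local ancestry" matrix, centered and scaled per marker.
A = rng.normal(size=(n, M))
A = (A - A.mean(axis=0)) / A.std(axis=0)

# Global ancestry covariate and projection P = I - pi (pi' pi)^-1 pi'.
pi = rng.uniform(0.6, 0.9, size=(n, 1))
P = np.eye(n) - pi @ np.linalg.inv(pi.T @ pi) @ pi.T

y = rng.normal(size=n)  # toy phenotype

APA = A.T @ P @ A
D = np.diag(APA)                          # diagonal elements of A'PA
s_R = np.sqrt(y @ P @ y / (n - 1))        # residual SD (denominator is an assumption)
Z = (A.T @ P @ y) / (s_R * np.sqrt(D))    # marginal admixture mapping statistics

# LAD matrix D^-1/2 (A'PA) D^-1/2, with unit diagonal.
lad = APA / np.sqrt(np.outer(D, D))

# Thin SVD of A'P (an M x n matrix): A'P = U S V'.
U, S, Vt = np.linalg.svd(A.T @ P, full_matrices=False)

# Rotation Z* = S^-1 s_R U' D^(1/2) Z.
Z_star = (s_R / S) * (U.T @ (np.sqrt(D) * Z))

# Algebraically, Z* = S^-1 U' (U S V') y = V' y.
print(np.allclose(Z_star, Vt @ y))
```

This sketch keeps every singular component. A truncated SVD would additionally drop components with small singular values; the truncation rule is not specified in this section, so it is left out here.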

Inferring $h_{γ}^{2}$ and biases using HAMSTA

HAMSTA first applies $Z^{*} = S^{- 1} s_{R} U^{'} D^{1 / 2} Z$ to obtain the rotated Z scores and then finds the estimates for $h_{γ}^{2}$ and C that maximize the likelihood given $Z^{*} \sim N (0, \frac{h_{γ}^{2}}{M} S^{2} + C)$ . Parameters $h_{γ}^{2}$ and C were log-transformed to ensure positivity during optimization. First, we test for ancestral stratification using a likelihood ratio test between the multiple-intercept model and the single-intercept model, in which C is a scalar shared by all elements in $Z^{*}$ . If the test is significant with p < 0.05, we determine the maximum likelihood estimates ${\hat{h}}_{γ}^{2}$ and $\hat{C}$ under the multiple-intercept model. Otherwise, we find ${\hat{h}}_{γ}^{2}$ and $\hat{C}$ under the single-intercept model. To test for the significance of ${\hat{h}}_{γ}^{2}$ , we use a likelihood ratio test of the null hypothesis $h_{γ}^{2} = 0$ . The standard errors of the estimates were determined using the jackknife method over 10 blocks.
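To make the objective concrete, here is a minimal sketch of the diagonal Gaussian log-likelihood of the rotated scores, a log-scale wrapper of the kind an optimizer could minimize, and a likelihood ratio p value. The function names are hypothetical, and the use of a chi-square reference distribution with 1 degree of freedom is an assumption of this sketch; the section does not state the reference distribution, and a comparison of multiple against single intercepts would need more degrees of freedom.

```python
import math

def log_lik(z_star, s2, h2, c, M):
    """Log-likelihood of rotated scores, Z*_k ~ N(0, h2 / M * s2_k + c_k).

    z_star: rotated statistics; s2: squared singular values;
    c: per-element intercepts (repeat one value for a single-intercept model).
    """
    total = 0.0
    for z, s, ck in zip(z_star, s2, c):
        var = h2 / M * s + ck
        total -= 0.5 * (math.log(2.0 * math.pi * var) + z * z / var)
    return total

def neg_log_lik_logparams(log_h2, log_c, z_star, s2, M):
    """Objective on the log scale, so h2 and every intercept stay positive."""
    h2 = math.exp(log_h2)
    c = [math.exp(v) for v in log_c]
    return -log_lik(z_star, s2, h2, c, M)

def lrt_pvalue_df1(ll_alt, ll_null):
    """Likelihood ratio p value, assuming a chi-square(1) reference distribution."""
    stat = max(0.0, 2.0 * (ll_alt - ll_null))
    return math.erfc(math.sqrt(stat / 2.0))  # P(chi2_1 > stat)
```

Any general-purpose numerical optimizer could then be applied to `neg_log_lik_logparams`; the choice of optimizer is not specified in this section.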

Estimating h² from $h_{γ}^{2}$

Previous work¹¹ demonstrated a relationship between narrow-sense heritability h² and $h_{γ}^{2}$ as $h_{γ}^{2} = 2 F_{STC} θ (1 - θ) h^{2}$ , where θ is the average admixture proportion. The $h_{γ}^{2}$ was formulated as the variance of the expected phenotype conditioned on local ancestries, assuming only the genotypes are dependent on local ancestry. Assuming a distribution of genotypic effect size with respect to the ancestral allele frequencies, F_STC is defined as the average genetic distance between the ancestral populations at causal loci weighted by the squared genotypic effect sizes. At each site, the genetic distance is computed as $\frac{{(f_{1} - f_{2})}^{2}}{2 f (1 - f)}$ , where f₁, f₂, and f are the allele frequencies in the two ancestral populations and the admixed population, respectively. We provided h² estimates based on (1) F_STC = 0.1692 reported in the original study,¹¹ which was estimated from the HapMap 3 dataset, and (2) F_STC = 0.1152 estimated in this study using a subset of individuals of African and European descent from the 1000 Genomes and HGDP subset in gnomAD v.3.1,²² assuming common variants explain 90% of h². The average admixture proportion θ was observed to be 78% African ancestry.
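As a worked example of this conversion, the sketch below inverts the relationship to obtain h² from $h_{γ}^{2}$ . The function name is hypothetical; the inputs are the rounded values quoted in the text (F_STC = 0.1152, admixture proportion 0.78, and the height estimate of 0.033).

```python
def h2_from_h2_gamma(h2_gamma, f_stc, theta):
    """Invert h2_gamma = 2 * F_STC * theta * (1 - theta) * h2."""
    return h2_gamma / (2.0 * f_stc * theta * (1.0 - theta))

# Height in PAGE, using the rounded values quoted in the text.
h2_height = h2_from_h2_gamma(0.033, f_stc=0.1152, theta=0.78)
print(round(h2_height, 3))  # about 0.835
```

The result is close to the reported 0.85; the small difference is presumably due to rounding of the inputs.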

Simulation design

To validate and assess the performance of HAMSTA, we performed simulations using realistic demographic scenarios. Specifically, we simulated ancestral populations pop1 and pop2 mirroring African and European populations in the Out-of-Africa demographic model.²³ We additionally introduced structure into pop2 by subdividing it into two subpopulations (denoted by pop2a and pop2b below, Figure S1). We set pop2a and pop2b to have diverged 200 generations ago with a migration rate of 10⁻³. These parameters were selected to result in a genetic differentiation similar to that within European populations ( $F_{ST} \approx 0.003$ ) estimated from the HGDP and 1000 Genomes subsets in gnomAD.²² We simulated this demography for a 250 Mb region with a uniform recombination rate of 10⁻⁸ per bp using msprime.²⁴ Using the true genealogies from simulations, we extracted the true local ancestry of each individual by tracing their lineage to each ancestral population (pop1, pop2a, or pop2b). Global ancestries were computed from local ancestry information by computing the total proportion of the 250 Mb region that is inherited from an ancestral population. We sampled 50,000 admixed individuals and 20,000 local ancestry markers according to the demographic model.

Next, we simulated phenotypes according to our phenotype model $y = A β + π α + d δ + ϵ$ . Given a sparsity level α (the proportion of causal markers; this α is distinct from the pop1 mean phenotype α in the model), we drew the effect of a local ancestry marker $β_{k}$ from $N (0, \frac{h_{γ}^{2}}{α M})$ with probability α, setting it to zero otherwise, and drew $ϵ$ from $N (0, σ_{ϵ}^{2})$ . Then we set the true $h_{γ}^{2}$ , PVE by global ancestry, PVE by ancestral stratification, and $σ_{ϵ}^{2}$ by varying the values of the population mean parameters (the pop1 mean α and δ). Finally, test statistics were computed using linear regression adjusting for π using PLINK 2.0.²⁵
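The sparse effect-size draw described above can be sketched as follows. The function name, the seed, and the choice to set non-causal effects exactly to zero are assumptions of this illustration; the marker count and $h_{γ}^{2}$ value match the simulation settings quoted in the text.

```python
import random

def draw_effects(M, h2_gamma, sparsity, rng):
    """Draw M local ancestry effects: N(0, h2_gamma / (sparsity * M)) with
    probability `sparsity`, and zero otherwise (assumed for non-causal markers)."""
    if not 0.0 < sparsity <= 1.0:
        raise ValueError("sparsity must be in (0, 1]")
    sd = (h2_gamma / (sparsity * M)) ** 0.5
    return [rng.gauss(0.0, sd) if rng.random() < sparsity else 0.0
            for _ in range(M)]

rng = random.Random(1)
beta = draw_effects(20000, 0.03, 0.5, rng)

n_causal = sum(1 for b in beta if b != 0.0)
total_var = sum(b * b for b in beta)  # expected to be close to h2_gamma
```

Scaling the per-marker variance by the sparsity keeps the expected total variance explained at $h_{γ}^{2}$ regardless of how many markers are causal.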

Estimate $h_{γ}^{2}$ with other approaches

To compare HAMSTA with existing methods in $h_{γ}^{2}$ estimation, we applied BOLT-REML,²⁶ GCTA,²⁷ LD score regression (LDSC),¹³ and cov-LDSC¹² to the simulated and real-world data. In GCTA, the same set of covariates included in the admixture mapping were used in $h_{γ}^{2}$ estimation. Following previous studies, we compute the genetic relatedness matrix using local ancestry in place of genotypes.¹¹ In LDSC and cov-LDSC, we define the “local ancestry linkage disequilibrium” (LAD) score for marker i as $l_{i} = \sum_{j \in W} r_{i, j}^{2}$ with r being the local ancestry correlation between marker i and marker j within W, the set of markers in a given window size. In cov-LDSC, the correlation is conditioned on the global ancestry. Window sizes of 1 cM and 20 cM were used. The LAD scores were used as the regressors and weights in LDSC and cov-LDSC.
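For illustration, the LAD score definition above can be computed directly from a local ancestry correlation matrix. In this sketch the function name is hypothetical, and two details are assumptions: the window is taken as symmetric around each marker, and the marker itself (r² = 1) is included in its own score, as is conventional for LD scores.

```python
def lad_scores(r, pos_cm, window_cm):
    """LAD score l_i = sum of r[i][j]**2 over markers j within window_cm of marker i.

    r: square matrix (list of lists) of local ancestry correlations;
    pos_cm: genetic positions of the markers in centimorgans.
    """
    m = len(pos_cm)
    return [
        sum(r[i][j] ** 2 for j in range(m)
            if abs(pos_cm[i] - pos_cm[j]) <= window_cm)
        for i in range(m)
    ]

# Toy example with three markers.
r = [[1.0, 0.5, 0.1],
     [0.5, 1.0, 0.5],
     [0.1, 0.5, 1.0]]
pos_cm = [0.0, 0.5, 1.5]

scores_1cm = lad_scores(r, pos_cm, window_cm=1.0)    # [1.25, 1.5, 1.25]
scores_20cm = lad_scores(r, pos_cm, window_cm=20.0)  # every marker sees all others
```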

Significance threshold estimation

To determine the significance threshold for a given admixture mapping study, we randomly generated test statistics $Z = s_{R}^{- 1} D^{- 1 / 2} U S Q$ , where Q is a vector of random variables sampled from $N (0, σ_{q}^{2})$ . We set $σ_{q}^{2}$ to the maximum intercept if the test for multiple intercepts is significant, and to the inferred single intercept if the test is not significant. We repeated the sampling procedure 2,000 times to determine the critical value as the 95th percentile of $\max (Z^{2})$ . The significance threshold was determined as the tail probability of a chi-square distribution (degree of freedom = 1) at the critical value. To determine the threshold for multiple chromosomes, we estimate the threshold for each chromosome separately and then combine the thresholds by summing up the effective testing burden, i.e., $\mathrm{thres}_{\mathrm{combined}} = 0.05 / \sum_{i = 1}^{22} (0.05 / \mathrm{thres}_{\mathrm{chromosome}\ i})$ . For comparison, we also estimated the significance threshold using STEAM,²⁰ which samples from $Z \sim \mathrm{MVN} (0, Σ)$ , where $Σ$ is a local ancestry correlation matrix based on genetic distance and admixture parameters. Family-wise error rates (FWERs) were computed as the percentage of times at least one significant signal is identified out of 500 null simulations.
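The threshold arithmetic in this section can be sketched with the standard library alone. The function names and the empirical quantile rule are assumptions of this illustration; the sampling of Z itself is omitted, so the functions take already-sampled maxima or p values as input.

```python
import math

def chi2_1_tail(x):
    """Tail probability P(chi2_1 > x) of a chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def threshold_from_max_stats(max_z2, fwer=0.05):
    """p value threshold from sampled max(Z^2): the tail probability at the
    empirical (1 - fwer) quantile (the quantile rule is an assumption)."""
    ordered = sorted(max_z2)
    idx = min(len(ordered) - 1, math.ceil((1.0 - fwer) * len(ordered)) - 1)
    return chi2_1_tail(ordered[idx])

def combine_thresholds(per_chrom, fwer=0.05):
    """Combine per-chromosome thresholds by summing the effective testing
    burden fwer / thres_i over chromosomes."""
    burden = sum(fwer / t for t in per_chrom)
    return fwer / burden

def fwer_estimate(min_pvalues, threshold):
    """Fraction of null replicates with at least one p value below threshold,
    given the minimum p value of each replicate."""
    return sum(1 for p in min_pvalues if p < threshold) / len(min_pvalues)

# 22 chromosomes with equal per-chromosome thresholds of 1e-3.
combined = combine_thresholds([1e-3] * 22)
print(combined)  # close to 1e-3 / 22
```

With equal per-chromosome thresholds, the combined threshold reduces to the per-chromosome threshold divided by the number of chromosomes, which matches the intuition of adding up independent testing burdens.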

Local ancestry inference and genome-wide mapping for admixed individuals in PAGE cohort

We obtained phenotypes and genotyping data measured on Multi-Ethnic Genotyping Array (MEGA) from the PAGE study.²¹ The complete dataset included 17,299 participants who self-identified as African American. Our analysis included 20 quantitative phenotypes: body mass index (BMI), height, waist-to-hip ratio, diastolic blood pressure, systolic blood pressure, PR interval, QRS interval, QT interval, fasting glucose, fasting insulin, C-reactive protein, mean corpuscular hemoglobin concentration, platelet count, estimated glomerular filtration rate, cigarettes per day, cups of coffee per day, high-density lipoprotein (HDL), low-density lipoprotein (LDL), triglycerides, and total cholesterol. Filters and transformations were applied, and covariates were selected according to the original PAGE analysis within the African American subset (Table S1).²¹

To infer the local ancestry, a subset of African and European genomes from the 1000 Genome and HGDP subset in gnomAD were used as reference individuals.²² After filtering out SNPs with missingness >10%, lifting over, and merging, 516,731 SNPs were used in the local ancestry inference, resulting in 101,292 local ancestry markers. The genotypes of PAGE and reference individuals were re-phased together using EAGLE,²⁸ and the ancestry probabilities were inferred as the local ancestry of the haplotype in a region using RFMIX2.²⁹ The global ancestry of an individual was computed by taking the average of all predicted local ancestries. We analyzed up to 15,988 individuals who have >5% of one of the inferred ancestries and have no missing values in the covariates in the 20 quantitative phenotypes. Admixture mapping was performed using linear regression adjusting for the study center, inferred global ancestry, and phenotype-specific covariates used in PAGE. The average estimate of $h_{γ}^{2}$ across phenotypes was calculated by weighting the estimate of each phenotype by the inverse of the squared standard error. The run time was measured on a machine with an Intel Xeon 4116 processor and 48 GB memory.
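The inverse-variance weighting used for the cross-phenotype average can be written in a few lines. The function name is hypothetical, and the standard error of the weighted mean shown here is the usual fixed-effect formula, which is an assumption since the text does not state how that standard error was obtained.

```python
def inverse_variance_mean(estimates, ses):
    """Weighted mean with weights 1 / SE^2, plus the fixed-effect SE of the mean."""
    weights = [1.0 / (se * se) for se in ses]
    wsum = sum(weights)
    mean = sum(w * e for w, e in zip(weights, estimates)) / wsum
    return mean, (1.0 / wsum) ** 0.5

# Toy example: the more precise estimate dominates the average.
mean, se = inverse_variance_mean([1.0, 2.0], [1.0, 2.0])
print(mean, se)
```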

Results

HAMSTA provides unbiased estimates of $h_{γ}^{2}$ under ancestral stratification

To evaluate the accuracy of $h_{γ}^{2}$ estimates under various scenarios, we performed simulation studies using local ancestry data simulated under a population demographic model that mirrors African American admixture history with an addition of recent population structure in one of the ancestral populations (see material and methods). In brief, we simulated phenotypes without stratification effects where we varied $h_{γ}^{2}$ from 0 to 0.05 (corresponding to h² from 0 to 1 according to Zaitlen et al.¹¹), which reflects $h_{γ}^{2}$ estimates reported in previous African American samples,³⁰ and performed admixture mapping to compute summary statistics. Overall, we found HAMSTA produced approximately unbiased estimates of $h_{γ}^{2}$ (Figure 1A), irrespective of the sparsity of causal markers (Figure S2). The jackknife standard errors for $h_{γ}^{2}$ were also insensitive to the choice of jackknife blocks. For example, in a simulation of $h_{γ}^{2}$ = 0.03, the average standard error was 0.00535, 0.00531, and 0.00513 when using 10, 20, and 50 blocks, respectively. We observed that the summary statistics-based estimates from HAMSTA were highly correlated with those computed from individual-level data using BOLT-REML (Figure 1B), suggesting that when stratification bias is not present, there is no loss in accuracy across data settings. Next, to compare our method with existing summary statistics-based methods, we applied LD score regression (LDSC; see material and methods) and cov-LDSC and observed both methods produced biased estimates exhibiting large standard errors (Figure S3). Importantly, we found LDSC estimates remained biased after re-estimating “LAD scores” using a larger window size of 20 cM (Figure S3). Next, we varied effect of global ancestry while fixing the $h_{γ}^{2}$ and PVE by ancestral stratification and found that HAMSTA $h_{γ}^{2}$ estimates remained unbiased (Figure 1C). 
Together, our results suggest that when stratification does not inflate summary statistics, HAMSTA provides unbiased estimates of $h_{γ}^{2}$ , unlike existing summary-based approaches.
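The block jackknife standard errors discussed above can be sketched generically. In this illustration the function name is hypothetical, the data are split into contiguous blocks, and `estimator` stands in for any re-fitting step (for HAMSTA this would be the likelihood fit on the retained rotated statistics, which is not reproduced here).

```python
def block_jackknife_se(values, n_blocks, estimator):
    """Delete-one-block jackknife standard error of estimator(values).

    values: list of observations; estimator: callable mapping a list to a float.
    """
    n = len(values)
    bounds = [round(b * n / n_blocks) for b in range(n_blocks + 1)]
    leave_one_out = []
    for b in range(n_blocks):
        kept = values[:bounds[b]] + values[bounds[b + 1]:]  # drop block b
        leave_one_out.append(estimator(kept))
    mean_loo = sum(leave_one_out) / n_blocks
    var = (n_blocks - 1) / n_blocks * sum((t - mean_loo) ** 2 for t in leave_one_out)
    return var ** 0.5

def mean(xs):
    return sum(xs) / len(xs)

# With one observation per block and the mean as estimator, the jackknife
# reproduces the usual standard error of the mean.
se = block_jackknife_se([float(v) for v in range(10)], 10, mean)
print(se)
```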

Figure 1. Simulation results from 50,000 admixed individuals and phenotypes under different levels of variance explained by local ancestry, global ancestry, and ancestral stratification

The boxplots show the range and quartiles of the estimates.

(A) Results of $h_{γ}^{2}$ estimation when varying true $h_{γ}^{2}$ . Phenotypic variance explained (PVE) by global ancestry and ancestral stratification were set to 0. A gray identity line is plotted.

(B) Comparison of $h_{γ}^{2}$ estimates between HAMSTA and BOLT-REML applied to simulation data when true $h_{γ}^{2} \in \{0.01, 0.02, 0.03, 0.05\}$ in (A).

(C) Results when varying the PVE by global ancestry, setting $h_{γ}^{2} = 0.03$ (horizontal line) and PVE by ancestral stratification = 0.

(D) Comparison of $h_{γ}^{2}$ estimates between HAMSTA and BOLT-REML under various levels of ancestral stratification. True $h_{γ}^{2}$ was fixed at 0.03 (horizontal line). The extremes are capped at 1.5 times the interquartile range away from the lower and upper quartiles. Estimates beyond the extremes are represented by diamonds.

Next, we sought to evaluate HAMSTA in the presence of ancestral stratification. We determined that the $h_{γ}^{2}$ estimates from our method were more robust to the presence of unadjusted ancestral stratification (Figure 1D). In contrast, BOLT-REML, whose inference model is not aware of ancestral stratification, produced biased results and elevated variance as the PVE by ancestral stratification increased.

Further, we demonstrate that our method remains robust to other scenarios of structure in the ancestral populations (Figure S4). We explored the cases where (1) both ancestral populations are structured, (2) the proportions of ancestry from the subpopulations are unequal in the admixed population, and (3) the subpopulations are introduced to the admixture event at different times. In all the scenarios, the unbiasedness of our estimator is not affected by the ancestral stratification.

Overall, we demonstrated HAMSTA provides unbiased estimates of $h_{γ}^{2}$ under various levels of effects from local ancestry, global ancestry, and stratification in ancestral populations.

HAMSTA estimates inflation in admixture mapping statistics due to stratification

Having established the unbiasedness of the $h_{γ}^{2}$ estimates, we next sought to evaluate the ability of HAMSTA to identify inflation in admixture mapping statistics due to ancestral population stratification. Specifically, intercepts estimated by HAMSTA, which signify test statistic inflation and are analogous to LDSC intercepts, can be tested against the null (i.e., 1) to evaluate stratification bias. Overall, we observed that HAMSTA produced estimates greater than 1 as the PVE by ancestral stratification increased (Figure 2A), demonstrating the ability of HAMSTA-inferred intercepts to capture stratification-induced inflation. Although we noted similar trends in other measures of inflation, including mean $χ^{2}$ and genomic inflation factor $λ_{GC}$ , their inability to distinguish between polygenicity and confounding prevents their use for complex disease analyses.¹³ Next, we evaluated the ability of LDSC to identify stratification in admixture mapping statistics through its intercept estimates and observed biased results with large variability (Figure S5). We observed that HAMSTA is well calibrated (Figure 2B) and significantly more powerful in detecting stratification bias than LDSC (Figure 2C). For example, HAMSTA has 80% power when stratification explains 10% of PVE, compared with 5% power for LDSC. These relative differences in performance held when we increased the LAD score window size for LDSC (Figure S5). Overall, HAMSTA provides unbiased estimates of inflation in admixture mapping statistics due to ancestral bias and has greater power to reject its null compared to alternative approaches.

Figure 2. Evaluating ancestral stratification by HAMSTA in 500 simulation replicates of 50,000 admixed individuals and phenotypes under $h_{γ}^{2} = 0$ and various levels of variance explained by local ancestral stratification

(A) Boxplots of measures of test statistic inflation reflecting ancestral stratification. The average estimates of HAMSTA’s intercepts are labeled.

(B) Quantile-quantile plot of p values for the test for ancestral stratification.

(C) Power comparison between HAMSTA and LDSC in detecting ancestral stratification. The p value cutoff for each approach was determined such that the significance level = 0.05 in null simulation.

(D) Family-wise error rate before and after correcting p value cutoff in admixture mapping using the estimated intercepts.

HAMSTA improves estimation of p value thresholds to control family-wise error rate

The number of approximately independent ancestry blocks depends on the demographic history of the population being studied, so there is no universal threshold to determine genome-wide significance in admixture mapping studies. Admixture mapping often relies on permutation-based approaches to estimate the FWER, but these approaches can be computationally intractable for large datasets. Although a recently developed summary-statistic sampling scheme (STEAM) bypasses the need for individual-level permutations and speeds up the FWER estimation,²⁰ its assumption that there is no inflation in the test statistics may be unmet in the presence of population structure and polygenicity.

Here, we demonstrated that inferences from HAMSTA can be leveraged to produce significance thresholds for association testing that achieve better-calibrated FWERs than STEAM. First, when PVE due to stratification is zero, we found that STEAM and HAMSTA estimated similar significance thresholds (HAMSTA mean: 1.12 × 10⁻⁴; STEAM: 1.57 × 10⁻⁴), yielding comparable FWERs at ∼5% (Figure 2D), which suggests that HAMSTA-based FWER estimates do not deflate overall power despite increased model complexity. Importantly, in the presence of ancestral stratification, we found that HAMSTA estimates resulted in approximately calibrated FWERs, unlike STEAM, which produced a considerable number of false positive associations (Figures 2D and S6). For example, when PVE due to stratification is 0.25, HAMSTA estimates resulted in an FWER of 8% compared to the FWER of 34% from STEAM. Together, these findings demonstrate that intercepts estimated by HAMSTA can be incorporated into significance threshold estimation, producing better-calibrated FWERs and thereby reducing false positive findings.

Application to African Americans in the PAGE study

To illustrate the ability of HAMSTA to estimate $h_{γ}^{2}$ from summary data, we applied it to admixture mapping summary statistics of 20 quantitative phenotypes computed from the African American participants in the PAGE study²¹ (mean n = 8,383, SD n = 3,901; see material and methods). In brief, we performed admixture mapping using 101,292 markers adjusting for the study center, global ancestry, and phenotype-specific covariates. The average genomic inflation factor $λ_{GC}$ across phenotypes is 1.53 (SD = 0.64). Next, we applied HAMSTA to the generated summary statistics to infer $h_{γ}^{2}$ and evaluate potential stratification biases. To estimate h² from $h_{γ}^{2}$ , we estimated the average African ancestry to be 78% and F_STC = 0.12 from the admixed individuals in PAGE and reference individuals from HGDP and 1000 Genomes.

The estimated $h_{γ}^{2}$ ranged from 0.0025 for systolic blood pressure to 0.033 for height (mean $h_{γ}^{2}$ = 0.012; SE = 9.2 × 10⁻⁴) across the 20 phenotypes, of which 13/20 were individually significantly different from 0 (nominal p value < 0.05 in Table S2). Translating $h_{γ}^{2}$ to estimates of h², we observed h² ranging from 0.062 for systolic blood pressure to 0.85 for height (mean h² = 0.30; SE = 0.023), of which 13/20 were individually significant. We found these results were robust to different values of F_STC (see Table S2).

Consistent with the simulation results, HAMSTA estimates were correlated more strongly with BOLT-REML estimates (r = 0.99, Figure 3) than those computed from LDSC (r = 0.44) (Figure S7). This was largely attributable to statistical precision, with standard errors in HAMSTA estimates (range from 0.0023 to 0.014, mean = 0.0058) being slightly, but not significantly (paired t test: p = 0.051), greater than those from BOLT-REML (range from 0.0021 to 0.0076, mean = 0.0042), and noticeably lower than those computed from LDSC (range from 0.0064 to 0.021, mean = 0.012). Since 5/20 phenotypes had limited sample sizes (n < 5,000), which is known to impact the performance of BOLT,²⁶ we also estimated $h_{γ}^{2}$ via GCTA. Of the 16 estimates computed by GCTA that converged, we observed they were in general bounded by the estimates by HAMSTA and BOLT-REML (Figure S8). Overall, we find that HAMSTA estimates of $h_{γ}^{2}$ are consistent with those computed from individual-level approaches in real data, while requiring much less computation time: HAMSTA takes 1–29 min for SVD of each chromosome and 49 s for the inference, but GCTA requires 1 h to compute the relatedness matrix and 1 h for the inference step.

Figure 3. Comparison of ${\hat{h}}_{γ}^{2}$ -based ${\hat{h}}^{2}$ between HAMSTA and BOLT-REML for the 20 quantitative traits in African Americans in PAGE

Each point shows the ${\hat{h}}^{2}$ , and the lengths of the error bars represent the standard errors.

To substantiate the translated h² estimates computed from HAMSTA, we compared them with previous h² estimates reported from admixed individuals¹¹ as well as those from twin studies. Overall, we found our h² estimates are significantly correlated with the previously reported $h_{γ}^{2}$ -based estimates¹¹ (r = 0.84, p = 0.03). Focusing on height and BMI, HAMSTA estimated $h_{γ}^{2}$ = 0.033 (SE: 3.4 × 10⁻³) and $h_{γ}^{2}$ = 0.017 (SE: 3.4 × 10⁻³), respectively, corresponding to h² of 0.85 (SE: 0.085) and 0.42 (SE: 0.086). The estimated height h² was similar to the h² = 0.68–0.84 in twin studies,³¹ whereas the estimated BMI h² was smaller than the h² = 0.57–0.77 in twin studies³² and higher than the h² = 0.30 estimated from whole-genome sequence data in European ancestry populations.³³

HAMSTA-estimated intercepts suggested limited evidence for inflated summary statistics due to ancestral stratification in the admixture mapping (range from 0.97 to 1.01, average = 0.99; Table S2), with 0/20 phenotypes differing significantly from the expectation of 1. LDSC likewise suggested no significant deviation of intercepts from 1 (range from 0.18 to 1.95, average = 1.07), but its individual intercepts were far less precise (mean SE = 0.34) than those computed under HAMSTA (mean SE = 5.6 × 10⁻³) (Table S2).

Since we demonstrated in simulation that the significance threshold for admixture mapping corresponding to an FWER of 5% is sensitive to ancestral stratification, we estimated the thresholds based on the HAMSTA intercepts. Under no ancestral stratification (i.e., intercept = 1), HAMSTA estimated the required significance threshold to be 2.80 × 10⁻⁵, which is comparable to the threshold of 2.10 × 10⁻⁵ reported by STEAM for African Americans.²⁰ Based on the estimated intercepts in HAMSTA for the 20 phenotypes, the estimated thresholds range from 2.70 × 10⁻⁵ to 3.52 × 10⁻⁵. To conclude, HAMSTA found no evidence of inflation in admixture mapping statistics and provided estimates of $h_{γ}^{2}$ , and hence h², for the complex traits of African Americans in the PAGE study.
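To illustrate why an FWER threshold depends on both the correlation among local ancestry markers and on inflation, the following is a minimal Monte Carlo sketch. It is not the HAMSTA sampling scheme: the AR(1) correlation model, the single uniform `intercept` applied to all statistics, and every parameter value (`n_markers`, `rho`, `n_reps`) are hypothetical choices made for illustration only.

```python
import math
import random


def min_p_threshold(n_markers=200, rho=0.99, intercept=1.0,
                    n_reps=2000, fwer=0.05, seed=0):
    """Estimate the nominal p-value threshold giving the target FWER.

    Null z-scores along a chromosome are simulated as a stationary AR(1)
    process (a stand-in for long-range local ancestry correlation), then
    scaled by sqrt(intercept) to mimic uniformly inflated chi-square
    statistics. The threshold is the `fwer` quantile of the per-replicate
    minimum p-value.
    """
    rng = random.Random(seed)
    innov_sd = math.sqrt(1.0 - rho * rho)
    scale = math.sqrt(intercept)
    min_pvals = []
    for _ in range(n_reps):
        z = rng.gauss(0.0, 1.0)
        max_abs = abs(z)
        for _ in range(n_markers - 1):
            z = rho * z + innov_sd * rng.gauss(0.0, 1.0)
            max_abs = max(max_abs, abs(z))
        # Two-sided normal p-value of the most extreme (inflated) statistic.
        min_pvals.append(math.erfc(scale * max_abs / math.sqrt(2.0)))
    min_pvals.sort()
    return min_pvals[max(0, int(fwer * n_reps) - 1)]


threshold_null = min_p_threshold(intercept=1.0)
threshold_inflated = min_p_threshold(intercept=1.2)
print(f"Bonferroni: {0.05 / 200:.2e}")
print(f"intercept 1.0: {threshold_null:.2e}")
print(f"intercept 1.2: {threshold_inflated:.2e}")
```

In this toy setting, strong correlation makes the threshold less stringent than a Bonferroni correction over all markers, while an intercept above 1 pushes it toward smaller p values. The same direction is seen in the reported range, where the estimated thresholds vary with the estimated intercepts.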

Discussion

In this study, we demonstrated the use of summary statistics from admixture mapping to quantify the contribution of genetic variation to a trait. We developed a tool, HAMSTA, that provides approximately unbiased estimates of $h_{γ}^{2}$ under various trait architectures, including in the presence of unknown population stratification in ancestral populations. Using the summary statistic-based approach, HAMSTA distinguishes the effect tagged by local ancestry on test statistics from unknown confounding biases. We also demonstrated that the estimated biases can be used to correct the significance threshold such that FWERs are better controlled. Lastly, we applied HAMSTA to real-world data, showing that it can recover $h_{γ}^{2}$ , and hence h², from admixture mapping summary statistics.

Our method addresses several limitations of existing approaches for estimating $h_{γ}^{2}$ . First, because of the long-range correlations between local ancestry markers, LDSC requires a large window size to capture correlations with distant effect markers. Also, regression weights may not be sufficient to solve the problem of correlated $χ^{2}$ statistics, which could lead to inefficient estimation.³⁴ Our analysis shows that the efficiency can be improved when admixture mapping test statistics are rotated to an independent space. Second, although REML could provide an unbiased estimate, we showed in simulation that it is susceptible to ancestral stratification. It also becomes computationally expensive as the sample size increases. In the real data analysis, the REML approach in GCTA failed to converge for waist-to-hip ratio, QT interval, cigarettes per day, and HDL. In contrast, we showed that HAMSTA is more robust to ancestral stratification and had no convergence problems in our analysis. Finally, existing methods assume uniform inflation of test statistics, although it has been shown that this assumption could be inaccurate.³⁵^,³⁶ HAMSTA relaxes this assumption by allowing multiple intercepts to represent non-uniform inflation. Overall, HAMSTA offers advantages over existing methods in the above aspects.
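The benefit of rotating the statistics can be seen in a short derivation. This is a sketch under simplifying assumptions rather than the exact HAMSTA likelihood: the symbols $A$ (an $N × M$ matrix of standardized local ancestry), $y$ (a standardized phenotype), $z$ (the marginal admixture mapping z-scores), and the i.i.d. effect-size model are notation introduced here for illustration. Using the thin SVD of $A$ and keeping only components with non-zero singular values,

$$
A = U S V^{\top}, \qquad z = \frac{1}{\sqrt{N}} A^{\top} y, \qquad z^{*} = \sqrt{N}\, S^{-1} V^{\top} z = U^{\top} y .
$$

If $y = A\beta + \varepsilon$ with $\beta \sim \mathcal{N}(0, \tfrac{h_{γ}^{2}}{M} I)$ and $\varepsilon \sim \mathcal{N}(0, (1 - h_{γ}^{2}) I)$, then

$$
\operatorname{Cov}(y) = \frac{h_{γ}^{2}}{M} U S^{2} U^{\top} + (1 - h_{γ}^{2}) I, \qquad \operatorname{Cov}(z^{*}) = \frac{h_{γ}^{2}}{M} S^{2} + (1 - h_{γ}^{2}) I .
$$

The covariance of the rotated statistics is diagonal, so each $z_{k}^{*}$ is uncorrelated with the others and has expected square $\tfrac{h_{γ}^{2}}{M} s_{k}^{2} + (1 - h_{γ}^{2})$, where $s_{k}$ is the $k$-th singular value. In this simplified picture, the slope with respect to $s_{k}^{2}$ carries the heritability signal, while confounding would appear as a departure of the constant term from its expected value, which plays the role of an intercept.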

We are aware of several limitations of HAMSTA. First, HAMSTA provides estimates of heritability explained by local ancestry only in two-way admixtures, which may limit the use of the method in admixed populations with more than two ancestral populations. Currently, the relationship between $h_{γ}^{2}$ and h² is established only in two-way admixed populations such as African Americans, and models for $h_{γ}^{2}$ in multi-way admixture have not yet been proposed. Incorporating the contribution of multiple ancestries in $h_{γ}^{2}$ and h² is a possible future extension. Second, the standard error of HAMSTA $h_{γ}^{2}$ is larger than that from methods that use individual-level data, such as BOLT-REML (mean SE = 0.0058 in HAMSTA versus mean SE = 0.0042 in BOLT-REML). Nevertheless, HAMSTA $h_{γ}^{2}$ is robust to ancestral stratification, unlike BOLT-REML, which shows upward biases in the $h_{γ}^{2}$ estimates (Figure 1D). Third, HAMSTA models only summary statistics computed from linear regression on quantitative traits; modeling binary traits is beyond the scope of this study. Future work can explore phenotypes under the liability-scale model and evaluate the use of summary statistics from logistic regression models. Lastly, since HAMSTA relies on an accurate LAD, factors that the LAD depends on, such as global ancestries, could potentially impact the accuracy of the estimates. These factors need to be adjusted for when estimating the LAD.

Similar to previous summary statistic-based methods in GWASs such as LDSC, HAMSTA requires admixture mapping statistics and LAD information from individual-level local ancestry data. In LDSC or cov-LDSC, the LD scores need to be computed from individual-level genotype data in which the LD is consistent with the study sample before proceeding with the downstream inference. Other summary-based methods such as h2-GRE³⁷ also use in-sample LD estimates before estimating heritability. Likewise, HAMSTA requires the SVD results of individual-level local ancestry data to capture the LAD information. In a cohort involving multiple phenotypes, the LAD captured in the SVD results can be reused across phenotypes for fast and robust summary statistics analysis in admixture mapping studies.

In summary, our work opens a new direction for summary statistics analysis in admixture mapping studies. Our method will facilitate studies of genetic architecture in large cohorts of admixed populations.

Data and code availability

The code for HAMSTA is available at https://github.com/tszfungc/HAMSTA.

Acknowledgments

This work was funded in part by the National Institutes of Health (NIH) under awards R01HG012133 and R35GM142783.

Declaration of interests

N.M. is a member of the HGG Advances (a sister journal of AJHG) editorial board.

Published: October 23, 2023

Footnotes

Supplemental information can be found online at https://doi.org/10.1016/j.ajhg.2023.09.012.

Web resources

BOLT-REML, https://alkesgroup.broadinstitute.org/BOLT-LMM/BOLT-LMM_manual.html
GCTA, https://cnsgenomics.com/software/gcta/
GNOMAD HGDP and 1KG subsets, https://gnomad.broadinstitute.org/downloads#v3-hgdp-1kg
LDSC, https://github.com/bulik/ldsc
MSPRIME, https://github.com/tskit-dev/msprime
PLINK, https://www.cog-genomics.org/plink/
RFMIX2, https://github.com/slowkoni/rfmix
STEAM, https://github.com/kegrinde/STEAM

Supplemental information

Document S1. Figures S1–S8 and Table S1 (mmc1.pdf, 827.9 KB)

Table S2. Heritability estimates for the 20 quantitative phenotypes in the PAGE study (mmc2.xlsx, 14.2 KB)

Table S3. Members of the PAGE Consortium (mmc3.xlsx, 26.7 KB)

Document S2. Article plus supplemental information (mmc4.pdf, 3.5 MB)

References

1. Ziyatdinov A., Parker M.M., Vaysse A., Beaty T.H., Kraft P., Cho M.H., Aschard H. Mixed-model admixture mapping identifies smoking-dependent loci of lung function in African Americans. Eur. J. Hum. Genet. 2020;28:656–668. doi: 10.1038/s41431-019-0545-8.
2. Nalls M.A., Wilson J.G., Patterson N.J., Tandon A., Zmuda J.M., Huntsman S., Garcia M., Hu D., Li R., Beamer B.A., et al. Admixture mapping of white cell count: genetic locus responsible for lower white blood cell count in the Health ABC and Jackson Heart studies. Am. J. Hum. Genet. 2008;82:81–87. doi: 10.1016/j.ajhg.2007.09.003.
3. Freedman M.L., Haiman C.A., Patterson N., McDonald G.J., Tandon A., Waliszewska A., Penney K., Steen R.G., Ardlie K., John E.M., et al. Admixture mapping identifies 8q24 as a prostate cancer risk locus in African-American men. Proc. Natl. Acad. Sci. USA. 2006;103:14068–14073. doi: 10.1073/pnas.0605832103.
4. Sofer T., Baier L.J., Browning S.R., Thornton T.A., Talavera G.A., Wassertheil-Smoller S., Daviglus M.L., Hanson R., Kobes S., Cooper R.S., et al. Admixture mapping in the Hispanic Community Health Study/Study of Latinos reveals regions of genetic associations with blood pressure traits. PLoS One. 2017;12 doi: 10.1371/journal.pone.0188400.
5. Galanter J.M., Gignoux C.R., Torgerson D.G., Roth L.A., Eng C., Oh S.S., Nguyen E.A., Drake K.A., Huntsman S., Hu D., et al. Genome-wide association study and admixture mapping identify different asthma-associated loci in Latinos: The Genes-environments & Admixture in Latino Americans study. J. Allergy Clin. Immunol. 2014;134:295–305. doi: 10.1016/j.jaci.2013.08.055.
6. Pino-Yanes M., Gignoux C.R., Galanter J.M., Levin A.M., Campbell C.D., Eng C., Huntsman S., Nishimura K.K., Gourraud P.-A., Mohajeri K., et al. Genome-wide association study and admixture mapping reveal new loci associated with total IgE levels in Latinos. J. Allergy Clin. Immunol. 2015;135:1502–1510. doi: 10.1016/j.jaci.2014.10.033.
7. Sun H., Lin M., Russell E.M., Minster R.L., Chan T.F., Dinh B.L., Naseri T., Reupena M.S., Lum-Jones A., Samoan Obesity, Lifestyle, and Genetic Adaptations (OLaGA) Study Group, et al. The impact of global and local Polynesian genetic ancestry on complex traits in Native Hawaiians. PLoS Genet. 2021;17 doi: 10.1371/journal.pgen.1009273.
8. Shriner D. Overview of Admixture Mapping. Curr. Protoc. Hum. Genet. 2017;94:1.23.1–1.23.8. doi: 10.1002/cphg.44.
9. Shriner D., Adeyemo A., Rotimi C.N. Joint ancestry and association testing in admixed individuals. PLoS Comput. Biol. 2011;7 doi: 10.1371/journal.pcbi.1002325.
10. Horimoto A.R.V.R., Xue D., Thornton T.A., Blue E.E. Admixture mapping reveals the association between Native American ancestry at 3q13.11 and reduced risk of Alzheimer’s disease in Caribbean Hispanics. Alzheimer's Res. Ther. 2021;13:122. doi: 10.1186/s13195-021-00866-9.
11. Zaitlen N., Pasaniuc B., Sankararaman S., Bhatia G., Zhang J., Gusev A., Young T., Tandon A., Pollack S., Vilhjálmsson B.J., et al. Leveraging population admixture to characterize the heritability of complex traits. Nat. Genet. 2014;46:1356–1362. doi: 10.1038/ng.3139.
12. Luo Y., Li X., Wang X., Gazal S., Mercader J.M., 23andMe Research Team, SIGMA Type 2 Diabetes Consortium, Neale B.M., Florez J.C., Auton A., et al. Estimating heritability and its enrichment in tissue-specific gene sets in admixed populations. Hum. Mol. Genet. 2021;30:1521–1534. doi: 10.1093/hmg/ddab130.
13. Bulik-Sullivan B.K., Loh P.-R., Finucane H.K., Ripke S., Yang J., Schizophrenia Working Group of the Psychiatric Genomics Consortium, Patterson N., Daly M.J., Price A.L., Neale B.M. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 2015;47:291–295. doi: 10.1038/ng.3211.
14. Haak W., Lazaridis I., Patterson N., Rohland N., Mallick S., Llamas B., Brandt G., Nordenfelt S., Harney E., Stewardson K., et al. Massive migration from the steppe was a source for Indo-European languages in Europe. Nature. 2015;522:207–211. doi: 10.1038/nature14317.
15. Browning S.R., Grinde K., Plantinga A., Gogarten S.M., Stilp A.M., Kaplan R.C., Avilés-Santa M.L., Browning B.L., Laurie C.C. Local Ancestry Inference in a Large US-Based Hispanic/Latino Study: Hispanic Community Health Study/Study of Latinos (HCHS/SOL). G3. 2016;6:1525–1534. doi: 10.1534/g3.116.028779.
16. Berg J.J., Harpak A., Sinnott-Armstrong N., Joergensen A.M., Mostafavi H., Field Y., Boyle E.A., Zhang X., Racimo F., Pritchard J.K., et al. Reduced signal for polygenic adaptation of height in UK Biobank. Elife. 2019;8:e39725. doi: 10.7554/eLife.39725.
17. Sohail M., Maier R.M., Ganna A., Bloemendal A., Martin A.R., Turchin M.C., Chiang C.W., Hirschhorn J., Daly M.J., Patterson N., et al. Polygenic adaptation on height is overestimated due to uncorrected stratification in genome-wide association studies. Elife. 2019;8 doi: 10.7554/eLife.39702.
18. Gopalan S., Smith S.P., Korunes K., Hamid I., Ramachandran S., Goldberg A. Human genetic admixture through the lens of population genomics. Philos. Trans. R. Soc. Lond. B Biol. Sci. 2022;377 doi: 10.1098/rstb.2020.0410.
19. Speed D., Balding D.J. SumHer better estimates the SNP heritability of complex traits from summary statistics. Nat. Genet. 2019;51:277–284. doi: 10.1038/s41588-018-0279-5.
20. Grinde K.E., Brown L.A., Reiner A.P., Thornton T.A., Browning S.R. Genome-wide Significance Thresholds for Admixture Mapping Studies. Am. J. Hum. Genet. 2019;104:454–465. doi: 10.1016/j.ajhg.2019.01.008.
21. Wojcik G.L., Graff M., Nishimura K.K., Tao R., Haessler J., Gignoux C.R., Highland H.M., Patel Y.M., Sorokin E.P., Avery C.L., et al. Genetic analyses of diverse populations improves discovery for complex traits. Nature. 2019;570:514–518. doi: 10.1038/s41586-019-1310-4.
22. Karczewski K.J., Francioli L.C., Tiao G., Cummings B.B., Alföldi J., Wang Q., Collins R.L., Laricchia K.M., Ganna A., Birnbaum D.P., et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2020;581:434–443. doi: 10.1038/s41586-020-2308-7.
23. Gravel S., Henn B.M., Gutenkunst R.N., Indap A.R., Marth G.T., Clark A.G., Yu F., Gibbs R.A., 1000 Genomes Project, Bustamante C.D. Demographic history and rare allele sharing among human populations. Proc. Natl. Acad. Sci. USA. 2011;108:11983–11988. doi: 10.1073/pnas.1019276108.
24. Kelleher J., Etheridge A.M., McVean G. Efficient Coalescent Simulation and Genealogical Analysis for Large Sample Sizes. PLoS Comput. Biol. 2016;12 doi: 10.1371/journal.pcbi.1004842.
25. Chang C.C., Chow C.C., Tellier L.C., Vattikuti S., Purcell S.M., Lee J.J. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience. 2015;4:7. doi: 10.1186/s13742-015-0047-8.
26. Loh P.-R., Bhatia G., Gusev A., Finucane H.K., Bulik-Sullivan B.K., Pollack S.J., Schizophrenia Working Group of Psychiatric Genomics Consortium, de Candia T.R., Lee S.H., Wray N.R., et al. Contrasting genetic architectures of schizophrenia and other complex diseases using fast variance-components analysis. Nat. Genet. 2015;47:1385–1392. doi: 10.1038/ng.3431.
27. Yang J., Lee S.H., Goddard M.E., Visscher P.M. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 2011;88:76–82. doi: 10.1016/j.ajhg.2010.11.011.
28. Loh P.-R., Danecek P., Palamara P.F., Fuchsberger C., Reshef Y.A., Finucane H.K., Schoenherr S., Forer L., McCarthy S., Abecasis G.R., et al. Reference-based phasing using the Haplotype Reference Consortium panel. Nat. Genet. 2016;48:1443–1448. doi: 10.1038/ng.3679.
29. Maples B.K., Gravel S., Kenny E.E., Bustamante C.D. RFMix: a discriminative modeling approach for rapid and robust local-ancestry inference. Am. J. Hum. Genet. 2013;93:278–288. doi: 10.1016/j.ajhg.2013.06.020.
30. Shriner D., Bentley A.R., Doumatey A.P., Chen G., Zhou J., Adeyemo A., Rotimi C.N. Phenotypic variance explained by local ancestry in admixed African Americans. Front. Genet. 2015;6:324. doi: 10.3389/fgene.2015.00324.
31. Silventoinen K., Sammalisto S., Perola M., Boomsma D.I., Cornes B.K., Davis C., Dunkel L., De Lange M., Harris J.R., Hjelmborg J.V.B., et al. Heritability of adult body height: a comparative study of twin cohorts in eight countries. Twin Res. 2003;6:399–408. doi: 10.1375/136905203770326402.
32. Silventoinen K., Jelenkovic A., Sund R., Yokoyama Y., Hur Y.-M., Cozen W., Hwang A.E., Mack T.M., Honda C., Inui F., et al. Differences in genetic and environmental variation in adult BMI by sex, age, time period, and region: an individual-based pooled analysis of 40 twin cohorts. Am. J. Clin. Nutr. 2017;106:457–466. doi: 10.3945/ajcn.117.153643.
33. Wainschtein P., Jain D., Zheng Z., TOPMed Anthropometry Working Group, NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium, Cupples L.A., Shadyab A.H., McKnight B., Shoemaker B.M., Mitchell B.D., et al. Assessing the contribution of rare variants to complex trait heritability from whole-genome sequence data. Nat. Genet. 2022;54:263–273. doi: 10.1038/s41588-021-00997-7.
34. Song S., Jiang W., Zhang Y., Hou L., Zhao H. Leveraging LD eigenvalue regression to improve the estimation of SNP heritability and confounding inflation. Am. J. Hum. Genet. 2022;109:802–811. doi: 10.1016/j.ajhg.2022.03.013.
35. Speed D., Kaphle A., Balding D.J. SNP-based heritability and selection analyses: Improved models and new results. Bioessays. 2022;44 doi: 10.1002/bies.202100170.
36. Mathieson I., McVean G. Differential confounding of rare and common variants in spatially structured populations. Nat. Genet. 2012;44:243–246. doi: 10.1038/ng.1074.
37. Hou K., Burch K.S., Majumdar A., Shi H., Mancuso N., Wu Y., Sankararaman S., Pasaniuc B. Accurate estimation of SNP-heritability from biobank-scale data irrespective of genetic architecture. Nat. Genet. 2019;51:1244–1251. doi: 10.1038/s41588-019-0465-0.
