eLife. 2020 Dec 29;9:e56029. doi: 10.7554/eLife.56029

Recent shifts in the genomic ancestry of Mexican Americans may alter the genetic architecture of biomedical traits

Melissa L Spear 1,2,3,4, Alex Diaz-Papkovich 3,5, Elad Ziv 6,7,8,9, Joseph M Yracheta 10,11, Simon Gravel 3,4, Dara G Torgerson 3,4,12, Ryan D Hernandez 2,3,4,8,13,14
Editors: Mashaal Sohail15, Patricia J Wittkopp16
PMCID: PMC7771964  PMID: 33372659

Abstract

People in the Americas represent a diverse continuum of populations with varying degrees of admixture among African, European, and Amerindigenous ancestries. In the United States, populations with non-European ancestry remain understudied, and thus little is known about the genetic architecture of phenotypic variation in these populations. Using genotype data from the Hispanic Community Health Study/Study of Latinos, we find that Amerindigenous ancestry increased by an average of ~20% spanning 1940s-1990s in Mexican Americans. These patterns result from complex interactions between several population and cultural factors which shaped patterns of genetic variation and influenced the genetic architecture of complex traits in Mexican Americans. We show for height how polygenic risk scores based on summary statistics from a European-based genome-wide association study perform poorly in Mexican Americans. Our findings reveal temporal changes in population structure within Hispanics/Latinos that may influence biomedical traits, demonstrating a need to improve our understanding of admixed populations.

Research organism: Human

Introduction

The United States Census Bureau refers to the Hispanic/Latino ethnicity as a category for individuals who self-identify as ‘a person of Cuban, Mexican, Puerto Rican, South or Central American, or other Spanish culture or origin regardless of race’ (United States Government, Executive Office of the President, Office of Management and Budget, Office of Information and Regulatory Affairs, 1997). As such, this broad ethnic group living in the United States is a culturally, phenotypically, and genetically diverse continuum of populations. Individuals who identify as Hispanic/Latino have varying proportions of Amerindigenous, African, and European genetic ancestries, each with their own unique continental demographic history. Demographic forces such as population bottlenecks, expansions, and migration as well as adaptation to novel environments resulted in observable differences in continental patterns of genetic variation (Nelson et al., 2008; Abecasis et al., 2012; Auton et al., 2015). These differing patterns were shaped by many historical events of migration which included the founding of the Americas by Amerindigenous populations, the colonization by Europeans, and the African slave trade (Gravel et al., 2013; Homburger et al., 2015; Moreno-Estrada et al., 2014; Moreno-Estrada et al., 2013; Reich et al., 2012; Bryc et al., 2015; Conomos et al., 2016; Han et al., 2017; Baharian et al., 2016; Jordan et al., 2019; Micheletti et al., 2020). However, additional complexities surrounding these events remain highly understudied.

Demographic history has shaped the genetic architecture of modern human phenotypic variation (Agarwala et al., 2013; Eyre-Walker, 2010; Maher et al., 2013; Simons et al., 2014; Uricchio et al., 2016; Yang et al., 2015), and is critical to consider in the search for the genetic basis of complex diseases. The demography of the United States has changed drastically over the 20th century, and by 2044 is predicted to become a ‘minority-majority’ country whereby no one racial/ethnic group comprises more than 50% of the population. By 2060 Hispanics/Latinos are projected to make up 29% of the US population or 119 million individuals (Colby and Ortman, 2015). However, to date, population-based medical genomics research [and its subsequent benefits, including polygenic risk score (PRS) profiling] has been disproportionately focused on individuals of European descent, with the findings primarily benefiting European populations (Bustamante et al., 2011; Martin et al., 2019). Despite the increases in sample sizes, rates of discovery, and traits studied, Hispanic or Latin American participation in genome-wide association studies (GWAS) has continued to hover around 1% (Popejoy and Fullerton, 2016; Mills and Rahal, 2019). This trend, along with factors ranging from research abuse and community mistrust to community superstition and apathy, has led to a situation where these populations (and other non-European populations) are particularly vulnerable to falling behind in receiving the benefits of the precision medicine revolution (Martin et al., 2019; Popejoy and Fullerton, 2016).

In this study, we utilize the largest genetic study of Hispanics/Latinos in the U.S. to date -- the Hispanic Community Health Study/Study of Latinos (HCHS/SOL) (Conomos et al., 2016) -- to understand how patterns of genetic variation in Hispanic/Latino populations in the United States have changed over the last century, and evaluate the impact such changes may be having on complex traits.

Results

Global ancestry proportions among HCHS/SOL Hispanic/Latino Populations

Using the subset of sites that overlapped with our African, European, and Amerindigenous reference panels, we called 3-way global ancestry estimates for 10,268 unrelated HCHS/SOL individuals (see Materials and methods). Figure 1A summarizes the global ancestry proportions shaded by admixture estimates in a ternary plot, recapitulating the original HCHS/SOL analysis of continental ancestry (Conomos et al., 2016). However, while several population groups appear to have overlapping ancestry proportions (Figure 1B), this analysis masks more subtle structure in subcontinental ancestry. To investigate subtle population structure across these self-identified population groups, we performed UMAP on the top three principal components (see Materials and methods and Figure 1—figure supplement 1C), and find substantial structure across self-identified groups (Figure 1C–D). We find that Dominicans, who have the highest average proportions of African ancestry, are in the middle, with Puerto Ricans and Cubans diverging in opposite directions (Figure 1D) with clines of increasing European ancestry proportions (Figure 1C). Further, while self-identified Mexican, Central, and South American groups appear to have overlapping ancestry proportions in Figure 1A–B, UMAP represents the Mexican Americans and Central/South American groups as large, separate wings that diverge from self-identified Cubans and Dominicans, with both clusters diverging with clines of increasing ancestry toward different Amerindigenous (AI) populations (Figure 1C–D and Figure 1—figure supplement 1B). When we included multiple European and African reference populations in our analyses as well as without reference populations, UMAP maintained the representation of separate clusters for each of the HCHS/SOL populations (Figure 1—figure supplement 1C–G). These clusters with varying AI ancestries are consistent with Conomos et al., 2016; however, the UMAP embedding consolidates the signal present in the top three PCs into a succinct two-dimensional visualization.

Figure 1. Genomic ancestry and population structure in HCHS/SOL.

(A) Ternary plot of HCHS/SOL (n = 10,268) colored by admixture proportions. (B) Ternary plot of global ancestry proportions colored by population for 10,268 HCHS/SOL individuals. (C) Uniform Manifold Approximation and Projection (UMAP) plot depicting the genetic diversity of HCHS/SOL and the reference panel (n = 10,591) using three principal components, colored by admixture proportions. Within the legend, AFR, EUR, and AI refer to African, European, and Amerindigenous global ancestries, respectively. (D) UMAP plot of HCHS/SOL and the reference panel (n = 10,591) using three principal components, colored by HCHS/SOL population.

Figure 1—figure supplement 1. Ancestral diversity of HCHS/SOL populations.

(A) Scree plot of principal component analysis of HCHS/SOL and the reference panel (n = 10,591). (B) Uniform Manifold Approximation and Projection (UMAP) plot of HCHS/SOL and the reference panel (n = 10,591) using three principal components, colored by reference population. (C) UMAP plot of HCHS/SOL only (n = 10,591) using three principal components, colored by population. (D) UMAP plot of HCHS/SOL and the larger reference panel (n = 11,567) using three principal components, colored by HCHS/SOL population. (E) UMAP plot of HCHS/SOL and the larger reference panel (n = 11,567) using three principal components, colored by African population. (F) UMAP plot of HCHS/SOL and the larger reference panel (n = 11,567) using three principal components, colored by Amerindigenous population. (G) UMAP plot of HCHS/SOL and the larger reference panel (n = 11,567) using three principal components, colored by European population.

Dynamic global ancestry proportions in Mexican Americans

For each of the HCHS/SOL populations, we evaluated differences in global ancestry estimates over time while accounting for the sampling method (referred to as ‘sampling weight’, see Materials and methods) used for the design of the HCHS/SOL study (Sorlie et al., 2010). We found that in all populations, the effect size for AI ancestry on birth year is positive, though only statistically significant after multiple testing in the Mexican American (β=0.0023; 95% CI:0.0021–0.0025, p=3.58E-22; Figure 2A–B) and Central American (β=0.0013; 95% CI:0.0009–0.0017, p=0.0013) cohorts (Supplementary file 1). Due to the larger sample size, magnitude of the effect, and statistical significance, we shift our focus to Mexican Americans. In Mexican Americans, the increase in AI global ancestry over time was consistent across multiple data stratifications including recruitment region, US-born or not US-born, educational attainment, and gender (Table 1 and Supplementary file 2), and was robust to alternative methods for estimating global ancestry proportions (e.g. based on the summation of RFMix local ancestry estimates; Figure 2—figure supplements 1 and 2). We identified significant differences in AI ancestry between recruitment regions (t-test, 95% CI:0.12–0.15, p<2.2E-16), between US-born and non-US-born individuals (t-test, 95% CI:0.06–0.09, p<2.2E-16), and across levels of educational attainment, which can be considered a proxy for socioeconomic status (one-way ANOVA, p<2E-16). In order to further assess changes in global ancestry distributions over time, we performed bootstrap resampling over individuals (n = 1000) of global AI ancestry for the Mexican Americans. We observed a consistent increase in AI ancestry with fitted locally estimated scatterplot smoothing (LOESS; Figure 2B) when individuals were binned by birth year decades (Figure 2—figure supplement 3). On average, global AI ancestry has increased ~20% over the 50-year period for Mexican Americans born from 1940 to 1990.
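The regression described above (AI ancestry ~ birth year + sampling weight) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic, the helper `ols`, the centering of birth year at 1965, the noise level, and the range of the hypothetical sampling weights are not taken from the study, and the reported slope is used only to generate toy data.

```python
import random

def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * v for r, v in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [u - f * w for u, w in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]

random.seed(1)
TRUE_SLOPE = 0.0023  # per-year effect quoted in the text; used only to simulate
rows, ancestry = [], []
for _ in range(2000):
    year = random.randint(1934, 1993)
    weight = random.uniform(0.5, 2.0)        # hypothetical sampling weight
    rows.append([1.0, year - 1965, weight])  # intercept, centered birth year, weight
    ancestry.append(0.45 + TRUE_SLOPE * (year - 1965) + random.gauss(0, 0.1))

intercept, slope, weight_effect = ols(rows, ancestry)
print(f"estimated change in AI ancestry per birth year: {slope:.4f}")
```

Centering birth year keeps the normal equations well conditioned; it changes the intercept but not the slope.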

Figure 2. Amerindigenous ancestry has increased over time in Mexican Americans.

(A) Global Amerindigenous ancestry proportions plotted by birth year for Mexican Americans (n = 3,622). Fitted line is multiple regression of Amerindigenous ~ birth year + sampling weight. Bars represent 95% confidence intervals for individuals grouped by decade. (B) Bootstrap resampling (n = 1000 iterations) of Amerindigenous global ancestry for the Mexican American individuals with a fitted LOESS curve for each iteration. Dashed lines represent the 95% quantile range of LOESS curves and the blue line represents the fitted regression line from A.

Figure 2—figure supplement 1. Concordance of ADMIXTURE and RFMix global ancestry estimates.

(A) Amerindigenous ancestry, (B) African ancestry, and (C) European ancestry.

Figure 2—figure supplement 2. Amerindigenous ancestry has increased over time in Mexican Americans.

RFMix inferred Amerindigenous (AI) global ancestry proportions plotted by birth year for Mexican Americans (n = 3,622). Fitted line is multiple regression of AI global ancestry ~ birth year + sampling weight (β=0.0022; SE = 0.0002, p<2E-16). Bars represent 95% confidence intervals for individuals grouped by decade.

Figure 2—figure supplement 3. Distributions of Amerindigenous global ancestry means for HCHS/SOL Mexican Americans (n = 3622) generated by 1000 bootstrap resampling iterations within each decade of binned birth years.

Figure 2—figure supplement 4. Replication in the Health and Retirement Study for 705 self-identified Mexican Americans.

(A) Ancestry over time. (B) Distribution of regression slopes after 1000 bootstrap resampling iterations. (C) Distribution of bootstrap regression p-values. (D) ECDF of bootstrap regression p-values.

Figure 2—figure supplement 5. The increase in estimated AI ancestry over time is conditional on the number of US-born parents.

AI ancestry vs birth year with interaction between birth year and number of US-born parents for 634 HCHS/SOL Mexicans.

Table 1. Relationship of Amerindigenous global ancestry and birth year for Mexican Americans stratified by recruitment region, US-born vs non-US-born status, gender and educational attainment.

For recruitment region, data stratification was limited to Chicago and San Diego as sample size for the Bronx and Miami was limited: 124 and 25 individuals, respectively. Educational attainment was categorized as either less than a high school diploma or equivalent degree (<HS), equal to a high school diploma or equivalent degree (=HS), or post-secondary education (>HS). The significance threshold was set at 0.006 using Bonferroni correction for multiple testing (0.05/9).

Category      N     Mean   Median  R2     Effect  Std. err.  p
All           3622  0.489  0.468   0.027  0.0023  0.0002     3.58E-22
Chicago       1310  0.562  0.550   0.017  0.0016  0.0005     0.0006
San Diego     2163  0.428  0.422   0.012  0.0012  0.0002     4.29E-07
US-born       634   0.427  0.418   0.063  0.0027  0.0004     1.77E-10
Non-US-born   2987  0.502  0.481   0.050  0.0032  0.0003     1.38E-30
Male          1500  0.494  0.475   0.038  0.0028  0.0004     3.83E-14
Female        2122  0.485  0.462   0.022  0.0019  0.0003     3.07E-10
<HS           1518  0.520  0.500   0.045  0.0026  0.0004     1.39E-12
=HS           960   0.501  0.479   0.022  0.0018  0.0005     0.0003
>HS           1140  0.436  0.422   0.045  0.0027  0.0004     6.53E-13

We replicated the increase in global AI ancestry over time in a smaller, independent cohort of self-identified Mexican Americans (n = 705) from the Health and Retirement Study (HRS) (Fisher and Ryan, 2018). The HRS Mexican Americans are older compared to the HCHS/SOL Mexican Americans (birth year distribution: 1915–1981; mean = 1943, median = 1942) and have lower levels of global AI ancestry on average (mean = 0.29), but we still observed an increase in global AI ancestry over time (β=0.00082; 95% CI: 0.0005–0.0012; p=0.02; Figure 2—figure supplement 4A). We performed 1000 bootstrap resampling iterations of the linear regression model (global AI ancestry ~ birth year) fitted to the data. From these resampling iterations, 98.2% of the tests had a slope >0 and 61.5% of the regression p-values were less than 0.05 (Figure 2—figure supplement 4B–4D).
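The bootstrap procedure described above (resample individuals with replacement, refit global AI ancestry ~ birth year, and count how often the slope is positive) can be sketched as follows. This is a minimal illustration on synthetic data; the helper names, the noise level, and the seed are assumptions, and the reported cohort size, mean, and slope are used only to generate toy values.

```python
import random

def slope(x, y):
    """Slope of the simple least-squares line y ~ x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def bootstrap_slopes(x, y, iterations, rng):
    """Refit the regression on resampled individuals (with replacement)."""
    n = len(x)
    out = []
    for _ in range(iterations):
        idx = [rng.randrange(n) for _ in range(n)]
        out.append(slope([x[i] for i in idx], [y[i] for i in idx]))
    return out

rng = random.Random(7)
# Synthetic stand-in for a cohort of 705 individuals born 1915-1981.
years = [rng.randint(1915, 1981) for _ in range(705)]
anc = [0.29 + 0.00082 * (t - 1943) + rng.gauss(0, 0.1) for t in years]

slopes = bootstrap_slopes(years, anc, 1000, rng)
frac_positive = sum(s > 0 for s in slopes) / len(slopes)
print(f"fraction of bootstrap slopes > 0: {frac_positive:.3f}")
```

The fraction of positive slopes plays the same role as the 98.2% figure quoted in the text, although the value obtained from toy data is not comparable to it.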

A previous study (Baharian et al., 2016) identified ancestry-biased migration in African Americans where individuals with higher proportions of European ancestry migrated first out of the South during the Great Migration followed by individuals with higher proportions of African ancestry. We hypothesized that a similar process occurred in US Hispanic/Latino populations, whereby earlier immigrants to the US had higher proportions of European ancestry followed by recent immigrants having higher proportions of global AI ancestry. In our non-US-born individuals (N = 2987), we evaluated differences in ancestry estimates over time while accounting for years in the US and sampling weight and identified a significant effect of years in the US (β=−0.0009; 95% CI: −0.0012, −0.0006; p=0.0006) suggesting that individuals who arrived earlier to the US had less AI ancestry. However, accounting for this did not change the effect of birth year on the proportion of global AI ancestry (β=0.0028; 95% CI: 0.0025–0.0031; p<2E-16) suggesting that ancestry-biased migration does not fully explain the dynamic AI ancestry patterns we have inferred.

For US-born individuals we assessed whether parental birthplace could explain the increases in global AI ancestry. Of the 634 US-born individuals, 385 had parents both born outside of the US, 149 had one parent born outside of the US, and 97 had both parents born within the US. We tested a model with an interaction between estimated birth year and the number of parents born in the United States. We found a strong positive relationship between estimated birth year and increase in AI ancestry for those with both parents born outside the US, who formed the baseline group in this model (β=0.004; 95% CI: 0.0034-0.0046; p=4.85e-12) (Figure 2—figure supplement 5). The relationship between estimated birth year and AI ancestry for those with one parent born in the US was still positive but smaller when the effect size was added to the baseline mean (β=-0.0034; 95% CI: -0.0043, -0.0025; p=0.000123) and for those with both parents born in the US the relationship was overall negative (β=-0.0049; 95% CI: -0.006, -0.0038; p=1.04e-5).
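The statements that the slope is "still positive but smaller" for one US-born parent and "overall negative" for two follow from adding each interaction coefficient to the baseline slope. The short sketch below does that arithmetic with the coefficients quoted above; treating the two interaction terms as categorical offsets from the baseline group is how the text describes them, and the variable names are illustrative.

```python
# Baseline slope: both parents born outside the US (per birth year).
baseline_slope = 0.004

# Interaction coefficients quoted in the text, keyed by number of US-born parents.
interaction = {1: -0.0034, 2: -0.0049}

# Implied slope of AI ancestry on birth year within each group.
group_slope = {0: baseline_slope}
for n_parents, offset in interaction.items():
    group_slope[n_parents] = baseline_slope + offset

for n_parents in sorted(group_slope):
    print(f"{n_parents} US-born parents: slope = {group_slope[n_parents]:+.4f}")
```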

Little evidence for subcontinental population structure

We explored whether the increase in global AI ancestry over time could occur in tandem with local changes in the specific subcontinental AI ancestries over time. If it were the case, then we would expect subtle signals of genetic divergence in AI ancestry tracts over time. To investigate this, we calculated FST within AI ancestry tracts between all pairs of birth-decades (see Materials and methods). Figure 3—figure supplement 1 shows all pairwise comparisons among birth-decades, and demonstrates that while the estimates of FST are negligible (with many estimates below 0), there is a subtle trend of increasing FST as birth-decade differences increase (though individuals born in the 1980s and 1990s show a conflicting pattern).
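Estimates of FST below 0 can look surprising, so a small sketch may help. The estimator used in the study is described in its Materials and methods and is not reproduced in this section; the code below swaps in Hudson's estimator (computed as a ratio of averages across sites) purely as an illustration, with made-up allele frequencies and haplotype counts. The sample-size correction in the numerator is what allows negative values when two groups have nearly identical frequencies.

```python
def hudson_fst(p1, n1, p2, n2):
    """Hudson's FST from per-site allele frequencies in two groups.

    p1, p2: allele frequencies per site; n1, n2: number of sampled haplotypes.
    Numerators and denominators are summed across sites before dividing.
    """
    num = 0.0
    den = 0.0
    for a, b in zip(p1, p2):
        # Squared frequency difference, corrected for finite sample size.
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        # Probability that two alleles, one from each group, differ.
        den += a * (1 - b) + b * (1 - a)
    return num / den

# Two hypothetical birth-decade groups with identical frequencies: slightly negative.
same = hudson_fst([0.5, 0.3], 20, [0.5, 0.3], 20)
# Strongly diverged hypothetical groups: clearly positive.
diverged = hudson_fst([0.9, 0.8], 20, [0.1, 0.2], 20)
print(same, diverged)
```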

We further investigated patterns of subcontinental population structure using genetic diversity, π, in AI ancestry tracts for each birth-decade (see Materials and methods). We hypothesized that if there were increased migration from multiple AI source populations (coupled with rapid population growth in Mexican American communities), then genetic diversity should be increasing over time. We found the opposite: Figure 3A shows a subtle decrease in genetic diversity (π) over time from the 1930s to the 1980s in non-US-born Mexican Americans, and a subtle decrease in US-born Mexican Americans from the 1970s to the 1990s (while remaining roughly constant from the 1930s to the 1970s).
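Genetic diversity π is the average number of differences per site between pairs of sequences. The sketch below computes it for a toy set of haplotypes; it is an assumption-laden illustration that ignores the masking to AI ancestry tracts and the per-decade pairing described in the study, and the function name and input encoding (strings of 0/1 alleles) are hypothetical.

```python
from itertools import combinations

def nucleotide_diversity(haplotypes):
    """Mean pairwise differences per site across all pairs of haplotypes."""
    n = len(haplotypes)
    length = len(haplotypes[0])
    total_diffs = sum(
        sum(a != b for a, b in zip(h1, h2))
        for h1, h2 in combinations(haplotypes, 2)
    )
    n_pairs = n * (n - 1) / 2
    return total_diffs / n_pairs / length

# Three toy haplotypes over four sites: pairwise differences are 1, 2 and 1.
toy = ["0000", "0001", "0011"]
pi = nucleotide_diversity(toy)
print(pi)
```

A population in which haplotypes become more similar over time, for example through increased background relatedness, would show a lower value of this statistic.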

Figure 3. Architecture of genetic diversity in Mexican American Genomes.

(A) Genetic diversity (π) in Amerindigenous ancestry tracts stratified by US-born/not US-born status, and calculated between pairs of individuals born within each decade (with shaded envelopes showing 95% confidence intervals for each group). (B) Proportion of total Amerindigenous (AI) ancestral tracts in the HCHS/SOL Mexican American population by decade. (C) Variation in ROH by birth year. Solid lines show LOESS of the proportion of the genome with AI ancestry that overlap ROH of different lengths, while dotted lines show LOESS of the proportion of the genome with European ancestry that overlap ROH of different lengths. (D) Scatter plot of parents’ inferred global Amerindigenous (AI) ancestries using ANCESTOR.

Figure 3—figure supplement 1. FST within Amerindigenous ancestral tracts.

FST estimates calculated between each decade group. Bars represent the 95% CI.

Figure 3—figure supplement 2. Admixture mapping in HCHS/SOL Mexicans (n = 3622) for Amerindigenous ancestry and (A) birth year and (B) generation.

Ancestry association testing was performed at 211,151 markers using (A) linear regression and (B) logistic regression, both including global Amerindigenous ancestry, sampling weight and center as covariates.

Figure 3—figure supplement 3. Runs of homozygosity (ROH) in HCHS/SOL Mexican Americans.

(A) ROH (summed per person) across all ancestries separated by ROH class. (B) ROH (summed per person) overlapping Amerindigenous (AI) haplotypes separated by ROH class. (C) Proportion of the genome covered by total AI ROH separated by ROH class.

Figure 3—figure supplement 4. Ancestry-related assortative mating in HCHS/SOL Mexican Americans.

Each distribution represents the difference in inferred parental Amerindigenous (AI) ancestry for each decade for (A) all, (B) US-born, and (C) non-US-born individuals. Within each segment is the correlation of the parents’ inferred AI ancestry. Parental ancestry was inferred using ANCESTOR.

Figure 3—figure supplement 5. Standard neutral model simulations result in no change in ancestry proportions over time.

Blue lines show forward simulations while gray lines reproduce the LOESS curves from the observed data shown in Figure 2B.

Figure 3—figure supplement 6. Population growth does not affect the mean ancestry proportions in a population.

(A) The exponential growth rates evaluated. (B–E) The effect of increasing growth rates G={0, 0.1, 0.5, 1} on ancestry proportions.

Figure 3—figure supplement 7. Ancestry-based assortative mating does not change mean ancestry proportions, though variance in ancestry proportions can increase.

(A–B) Low effect of assortative mating, (C–D) moderate effect (with similar correlation to that seen in HCHS Mexican Americans, see main text), and (E–F) extreme assortative mating.

Figure 3—figure supplement 8. Ancestry-based fecundity differences can induce systematic changes in ancestry proportions in a population.

(A) We model the probability of reproducing (‘Prob Reprod’) using a Beta distribution over the ranked ancestry proportions in the population using parameter FAI. (B–E) As ancestry-based fecundity increases, the mean ancestry proportion in the population increases. (F–I) Ancestry-based assortative mating magnifies the effects of ancestry-based fecundity differences (here AM = 0.75, see Figure 3—figure supplement 7).

Figure 3—figure supplement 9. The ancestry proportions in the migrant population are modeled as a Beta distribution, with mean given by a weighted average between the domestic population mean and one, with weight mAI.

When mAI=0, migrants have the same distribution of ancestry proportions as the domestic population. When mAI=1, all migrants have 100% Amerindigenous ancestry.

Figure 3—figure supplement 10. Simulating the effects of migration on changing ancestry proportions.

We show how the ancestry proportions in the domestic population change as we increase M (the probability that a new individual is migrant) and mAI (the parameter that governs the ancestry proportions in the migrant population, see Figure 3—figure supplement 9).

Figure 3—figure supplement 11. Similar to Figure 3—figure supplement 10, but adding assortative mating (AM=0.75, consistent with our data) and ancestry-based fecundity differences (FAI=0.1, see Figure 3—figure supplement 8A).

Figure 3—figure supplement 12. Similar to Figure 3—figure supplement 10, but adding assortative mating (AM=0.75, consistent with our data) and ancestry-based fecundity differences (FAI=0.2, see Figure 3—figure supplement 8A).

Figure 3—figure supplement 13. Similar to Figure 3—figure supplement 10, but adding assortative mating (AM=0.75, consistent with our data) and ancestry-based fecundity differences (FAI=0.4, see Figure 3—figure supplement 8A).

Figure 3—figure supplement 14. Similar to Figure 3—figure supplement 10, but adding assortative mating (AM=0.75, consistent with our data) and ancestry-based fecundity differences (FAI=0.8, see Figure 3—figure supplement 8A).

AI ancestry tract lengths have not changed, but runs of homozygosity (ROH) have increased

If there was a rapid increase in the migration of individuals with high AI ancestry, we would expect to see an increase in long AI tracts over time. To test this, we calculated the length of each RFMix inferred local ancestry tract in each Mexican American individual and tested for differences in the distribution of tract lengths across birth-decades using a multiple linear regression model (see Materials and methods). We found no significant associations between the decade bin and the proportion of AI ancestral tracts at various lengths (Figure 3B; β = 0.04, CI = (−0.019–0.099); p=0.19), even when testing for violations of model assumptions (e.g. normalizing the tracts per bin by the number of individuals, or excluding the 1930s and/or 1990s individuals due to the small sample size in each bin).

While there are no statistical differences in the length of admixture tracts, it is possible that local ancestry tracts have accumulated in specific regions of the genome to drive the increased global ancestry proportions over time. We used local ancestry estimates generated across the genome to perform admixture mapping in HCHS/SOL Mexican Americans to determine if younger individuals harbored excess AI ancestry in certain regions of the genome. Although we tried two different models (see Materials and methods), we did not find any loci to be significantly associated with birth year across the genome (Figure 3—figure supplement 2).

We find that there are no changes in AI ancestry tract lengths over time nor any regions of the genome that seem to be accumulating AI ancestry at disproportionate rates, yet genetic diversity has decreased over time in the AI ancestry tracts of Mexican Americans despite rapid growth of the census population size. We therefore investigated whether this population has experienced increased haplotype homozygosity over time by exploring runs of homozygosity (ROH) across the genomes of each of the 3622 Mexican Americans. We classified ROH into three categories: short, medium, and long, based on the length distribution in the population. Generally, short ROH are tens of kilobases in length and likely reflect the homozygosity of old haplotypes; medium ROH are hundreds of kilobases in length and likely reflect background relatedness in the population; and long ROH are hundreds of kilobases to several megabases in length and are likely the result of recent parental relatedness. Overall, we find a significant positive correlation between birth year and the total ROH (summed across size classes; τ = 0.0449, p=6.12e-5, Kendall’s rank correlation), and this signal becomes stronger when we restrict our analysis to ROH calls that overlap AI ancestry tracts (τ=0.065, p=7.39e-9). Figure 3C shows a fitted LOESS curve to the proportion of the genome with AI (or European) ancestry covered by ROH across the genomes of Mexican Americans as a function of their birth year, broken down by ROH size class (see Figure 3—figure supplement 3 for the distribution of ROH by length classes and ancestry). When stratified by size class and normalized by AI global ancestry, the associations (all Kendall’s rank correlation) in AI ROH were primarily driven by the short (τ=0.097, p<2.2E-16) and medium (τ=0.084, p=1.27E-13) size classes (while long ROH was not significant after multiple testing due to the small number of long ROH across individuals; τ = −0.032, p=0.004). We observed the opposite pattern when ROH were restricted to European ancestry segments of the genome: there is a significant negative correlation between birth year and the total ROH that overlap European ancestry tracts (τ=−0.089, p=1.82E-14).
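Kendall's rank correlation, used for all of the ROH associations above, compares every pair of individuals and asks whether the pair is ordered the same way by both variables. The sketch below implements the tie-corrected variant (tau-b) in plain Python; which variant and software the study used is not stated in this section, so this is an illustrative assumption, and the birth years and ROH totals are invented.

```python
from math import sqrt

def kendall_tau_b(x, y):
    """Kendall's tau-b: (concordant - discordant) pairs with a tie correction."""
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue            # tied on both variables: contributes nothing
            elif dx == 0:
                ties_x += 1         # tied on x only
            elif dy == 0:
                ties_y += 1         # tied on y only
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    return (concordant - discordant) / sqrt((untied + ties_x) * (untied + ties_y))

# Invented example: birth years and total ROH length (Mb) for six individuals.
birth_year = [1941, 1955, 1955, 1968, 1979, 1990]
total_roh_mb = [10.2, 9.8, 11.5, 12.1, 11.9, 13.4]
tau = kendall_tau_b(birth_year, total_roh_mb)
print(f"tau-b = {tau:.3f}")
```

Because the statistic depends only on ranks, it is insensitive to the heavy right tail that summed ROH lengths typically show.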

Strong ancestry-related assortative mating in HCHS/SOL Mexicans

Given that short and medium length ROH have increased over time, it appears that background relatedness within AI ancestry in Mexican Americans has increased over time (but not an increase in recent parental relatedness). One way for this to occur is if individuals with similar ancestry patterns tend to mate with one another more often than expected under a model of random mating (i.e. assortative mating). To measure assortative mating, we estimated the ancestral proportions of the biological parents of each HCHS/SOL Mexican American (see Materials and methods). With individuals from all decades pooled together, we found the inferred biological parental AI ancestries to be significantly correlated (Figure 3D, r = 0.708, 95% CI:0.69–0.72, p<2.2E-16, Pearson correlation). When stratified by decade, the distributions of the difference in parental AI overlap each other and the correlation in inferred parental AI global ancestry ranged from 0.65 to 0.74 (Figure 3—figure supplement 4), but were not statistically different from each other. This shows a consistent pattern of strong parental ancestry correlations among Mexican Americans over different generations. This signature of assortative mating is not due to recent parental relatedness, because there is no trend in long ROH with birth year (and an overall low rate of long ROH among Mexican Americans).
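As a rough check on the reported parental correlation, a confidence interval for a Pearson correlation can be approximated with the Fisher z-transformation. How the interval in the text was actually computed is not stated here, so the method is an assumption, as is taking n = 3622 (one inferred parent pair per individual).

```python
import math

def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for a Pearson correlation via the Fisher z-transform."""
    z = math.atanh(r)               # transform r to an approximately normal scale
    se = 1.0 / math.sqrt(n - 3)     # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

lower, upper = fisher_ci(0.708, 3622)
print(f"approximate 95% CI: {lower:.3f} to {upper:.3f}")
```

Under these assumptions the interval lands close to the 0.69–0.72 range quoted above, which suggests the reported interval is of the expected width for a sample of this size.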

Population genetic factors affecting changes in ancestry proportions over time

We developed a Moran model (Moran, 1958) style simulator to evaluate how migration, assortative mating, population growth, and variance in reproduction affect ancestry proportions over the timescale shown in Figure 2 (for details, see Materials and methods). Briefly, a Moran model is a forward simulation approach whereby, in each iteration, a single individual is replaced by another individual through a process of choosing parents. For a population of size N individuals, it takes N steps to simulate a single generation, and as such the Moran model is commonly used to represent overlapping generations.
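The replacement process described above can be sketched in a few lines. This is not the authors' simulator: it covers only the neutral case (random mating, constant size), and it assumes that an offspring's ancestry proportion is the average of its two parents' proportions. The population size, starting distribution, and seed are arbitrary choices for illustration.

```python
import random

def moran_generation(pop, rng):
    """One generation = N single-individual replacement steps."""
    n = len(pop)
    for _ in range(n):
        mother, father = rng.sample(range(n), 2)      # two distinct random parents
        child = 0.5 * (pop[mother] + pop[father])     # assumed midparent ancestry
        pop[rng.randrange(n)] = child                 # replace a random individual

rng = random.Random(42)
# Starting ancestry proportions with mean 2.1 / (2.1 + 2.9) = 0.42 (shape is arbitrary).
pop = [rng.betavariate(2.1, 2.9) for _ in range(2000)]
mean_start = sum(pop) / len(pop)

for _ in range(2):  # two generations, roughly the 50-year window discussed
    moran_generation(pop, rng)
mean_end = sum(pop) / len(pop)

print(f"mean AI ancestry: start {mean_start:.3f}, after two generations {mean_end:.3f}")
```

Because parents and replaced individuals are chosen without regard to ancestry, the mean moves only through random drift, which is the behaviour expected under the neutral model.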

In our simulations, the initial mean AI ancestry proportion was set at 0.42, and two generations were simulated (assuming ~26 years/generation) (Moorjani et al., 2016). Each iteration incorporated population growth, assortative mating, ancestry-based fecundity differences and migration. Simulating a standard neutral model of random mating and constant population size showed no change in ancestry proportions over time (Figure 3—figure supplement 5).

Population growth affects diversity by increasing the number and proportion of variants that are rare, and decreasing the rate of genetic drift. While population growth can intensify the strength of natural selection (causing deleterious alleles to decrease in frequency and adaptive alleles to increase in frequency), population growth does not cause systematic changes in the frequency of segregating neutral alleles. Similarly, including population growth did not affect the mean ancestry proportion in a population (Figure 3—figure supplement 6).

In our simulations, we specified the assortative mating parameter to range from 0 (random mating) to 1 (parents are chosen as nearest neighbors when sorted by ancestry proportions). Ancestry-based assortative mating can lead to increased ROH and decreased genetic diversity (see Figure 3), but because mating occurs from individuals proportionally across the ancestry spectrum, ancestry-based assortative mating does not induce any changes in mean ancestry proportions in the population (albeit with a slight increase in variance in ancestry proportions over time, Figure 3—figure supplement 7). Note that AM=0.75 results in a correlation in parental ancestry proportions similar to our observed data.

There are many social and cultural properties that result in variance in fecundity within and between populations. Some of these factors may be correlated with genomic ancestry proportions. We tested whether ancestry-based fecundity differences could induce changes in mean ancestry proportions, and how strong the fecundity differences had to be to induce an effect similar to what we see in the data. To simulate this process, we sampled individuals to reproduce based on their ancestry proportion using a Beta(1, 1(1+FAI)) distribution, where FAI=0 induces a uniform distribution (i.e. no ancestry-based fecundity differences) and FAI=1 induces a strong ancestry-based fecundity difference (Figure 3—figure supplement 8A). Ancestry-based fecundity differences can induce systematic changes in ancestry proportions in the population (Figure 3—figure supplement 8B–E), but we are unaware of estimates of this effect in Mexican Americans. Further, ancestry-based assortative mating can magnify the effects of ancestry-based fecundity differences (Figure 3—figure supplement 8F–I). The joint effects of strong ancestry-related assortative mating (AM=0.75) and fecundity differences (FAI=0.8) result in a change in ancestry proportions over time similar to our observed data (Figure 3—figure supplement 8I).

While migration cannot explain all the changes in ancestry proportions we report, it is clearly a contributor. To model migration, we specified two parameters: mAI, a parameter affecting migrant Amerindigenous ancestry proportions (Figure 3—figure supplement 9), and M, the probability that a new individual is a migrant. We simulated the joint effects of these parameters (Figure 3—figure supplement 10) and added the effects of ancestry-related assortative mating (AM=0.75) and increasing degrees of ancestry-related differences in fecundity (Figure 3—figure supplements 11–14: FAI={0.1, 0.2, 0.4, 0.8}, respectively). We find a large number of parameter combinations that are consistent with our observed ancestry trends in Mexican Americans.
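
The role of the migration probability can be sketched as follows. This is an illustration under stated assumptions, not the study's simulator: migrant ancestry is drawn from a Beta distribution with a chosen mean (a stand-in for the migrant-ancestry parameter, whose exact form is not reproduced here), local births take the midpoint of two random parents, and assortative mating and fecundity differences are left out. All names and default values are hypothetical.

```python
import random


def simulate_with_migration(n=1000, generations=2, m=0.2,
                            resident_mean=0.42, migrant_mean=0.60, seed=1):
    """Moran-style replacement where each new individual is a migrant with
    probability `m`; otherwise it is the child of two random residents.

    Assumption: migrant ancestry is Beta-distributed with mean `migrant_mean`.
    """
    rng = random.Random(seed)

    def draw(mean, concentration=10):
        return rng.betavariate(concentration * mean, concentration * (1 - mean))

    pop = [draw(resident_mean) for _ in range(n)]
    for _ in range(generations * n):
        if rng.random() < m:
            newcomer = draw(migrant_mean)              # migrant arrives
        else:
            p1, p2 = rng.sample(range(n), 2)
            newcomer = (pop[p1] + pop[p2]) / 2         # local birth
        pop[rng.randrange(n)] = newcomer               # replaces one individual
    return sum(pop) / n


for m in (0.0, 0.2, 0.5):
    print(f"M={m}: mean AI ancestry after two generations = "
          f"{simulate_with_migration(m=m):.3f}")
```

With migrants carrying higher average ancestry than residents, raising the migration probability pulls the population mean toward the migrant mean, which is the qualitative behavior the migration parameters are meant to capture.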

Genetic association of global AI ancestry with biomedical traits

We have shown that genetic variation patterns changed over time in the Mexican American population, with AI ancestry increasing over a short period of time (combined with decreased genetic diversity and increased short and medium length ROH within AI ancestry tracts). These features may have implications for the genetic architecture of complex traits within Mexican Americans, a topic that is understudied and poorly understood. To further our understanding of the genetic architecture of complex traits in Mexican Americans, we investigated the relationship between AI ancestry and 69 biomedical phenotypes (while controlling for several environmental and other factors; see Materials and methods). As illustrated in Figure 4A, 18 of these traits (26%) are significantly associated with AI ancestry (Bonferroni correction p<6.6E-5) after adjusting for several factors including birth year, center, gender, sampling weight, educational attainment, US-born status, and number of US-born parents. While this suggests that genetic ancestry has an effect on several traits, other unmodeled socio-economic variables that are correlated with AI ancestry may also be contributing to these patterns (though AI ancestry has among the strongest effects on a range of biomedical traits, comparable to the effects of gender; Figure 4—figure supplement 1). Regardless, these findings highlight the need for increased investigation into the role of AI genetic ancestry in admixed populations such as Mexican Americans.

Figure 4. Global Amerindigenous ancestry and biomedical traits in HCHS/SOL Mexican Americans.

(A) The effect size of global AI ancestry on each of 69 quantile normalized traits (see Materials and methods) while controlling for birth year, center, gender, sampling weight, educational attainment, US-born status, and number of US-born parents. (B–C) The relationship between (B) Birth year and height and (C) Height and polygenic height score (PHS). The black line indicates the fitted linear model for all individuals. Each color represents a different quartile of Amerindigenous global ancestry. Polygenic height scores were assessed utilizing UKBB summary statistics for 1,078 SNPs.

Figure 4—figure supplement 1. Distribution of variable effects associated with quantile normalized traits.

For 69 biomedical traits we used a multiple linear regression model to analyze the effects of global AI ancestry on each trait while controlling for birth year, center, gender, sampling weight, educational attainment, US-born status, and number of US-born parents. Variables significantly associated with the traits (Bonferroni correction p<6.6E-5) are highlighted in red.
Figure 4—figure supplement 2. Comparison of allele frequencies used in polygenic height score calculations.

Plotted are the allele frequencies of the non-reference allele in UKBB vs. 1000 Genomes Americas (AMR) population for the 1078 SNPs used to calculate the polygenic height score for the HCHS/SOL Mexican Americans. Colors indicate whether the non-reference allele has a positive or negative effect.
Figure 4—figure supplement 3. Polygenic height scores over time.

The relationship between birth year and polygenic height score; the black line indicates the fitted linear model for all individuals. Each color represents a different quartile of Amerindigenous global ancestry. Polygenic height scores were assessed utilizing UKBB summary statistics for 1,078 SNPs.
Figure 4—figure supplement 4. Correlation of 69 p-values for Amerindigenous effect sizes of untransformed vs quantile normalized traits.

Assessing the genetic contribution of AI ancestry to height

Among the traits we tested for association with global AI ancestry, height had the strongest effect. Further, our regression model indicated that height also had a strong positive relationship with birth year (Supplementary file 3). Globally, populations have grown taller over time due to a variety of non-genetic, environmental factors (NCD Risk Factor Collaboration (NCD-RisC), 2016). We find a similar trend in the HCHS/SOL Mexican Americans (β=0.096, 95% CI:0.077–0.114; p=5.95E-23) (Figure 4B and Supplementary file 4). Indeed, when we stratified individuals by quartiles of global AI ancestry, we see that all quartiles have increased in height by a similar amount over the period investigated (though individuals with lower AI ancestry were taller on average). The rates of change in height between AI quartiles were all positive and significant (p<5e-6). The largest was for the quartile with the highest AI ancestry, but the rates did not change monotonically with respect to AI ancestry across quartiles. The estimates for the quartiles with their 95% CIs are: β=0.135 (CI:0.097–0.173) for AI >0.58; β=0.124 (CI:0.089–0.160) for 0.46 <= AI <= 0.58; β=0.083 (CI:0.047–0.119) for 0.37 <= AI <= 0.46; and β=0.113 (CI:0.074–0.151) for AI <0.37 (Supplementary file 4).

Height is one of the most highly studied complex traits, with GWAS sample sizes numbering in the hundreds of thousands (Yengo et al., 2018). Results for many of these studies have been made readily available on public databases as summary association statistics that can be leveraged to build genetic predictions through polygenic risk scores (PRS) (Pasaniuc and Price, 2017). In Europeans, PRS have been shown to have great predictive power for several traits, including breast cancer, prostate cancer, and type 1 diabetes (Maas et al., 2016; Sharp et al., 2019; Schumacher et al., 2018). PRS are most effective in populations of European descent as GWAS studies have been primarily performed in these populations (Bustamante et al., 2011; Martin et al., 2019; Popejoy and Fullerton, 2016) and are expected to be biased when applied to other populations due to differences in the genetic architecture of traits across diverse populations (Martin et al., 2017). Since Mexican Americans have some fraction of European ancestry, we sought to determine whether PRS calculated utilizing GWAS summary statistics from European populations could still provide useful insight.

To evaluate the effectiveness of a PRS for height calculated based on 1078 genome-wide SNPs selected from the UKBB GWAS of height (i.e. the polygenic height score, or PHS, see Materials and methods), we first tested whether there was an association between the observed height and the predicted height estimates while controlling for sampling weight, gender, recruitment center, educational attainment, US-born status and number of US-born parents (see Materials and methods). Allele frequencies for these SNPs between the 1000 Genomes Americas Superpopulation and UKBB showed good concordance (Figure 4—figure supplement 2, r = 0.93, 95% CI:0.92–0.94, p<2.2E-16, Pearson correlation). We identified a significant association between observed height and predicted height for the population as a whole (β=0.004, 95% CI: 0.0023–0.005; p=9.91E-8; Figure 4C, Supplementary file 5). However, when we stratified by quartiles of AI global ancestry, the association only remained for the individuals in the lower two quartiles of global AI ancestry proportions (AI < 0.37: β=0.005, 95% CI:0.0022–0.0076; p=4.42E-4 and 0.36 < AI < 0.46: β=0.006, 95% CI: 0.0032–0.009; p=4.38E-5, Supplementary file 5). The association between predicted height and observed height was no longer significant for individuals in the upper two quartiles of global AI ancestry proportions (0.46 < AI < 0.58: β=0.0007, p=0.6 and 0.58 < AI: β=0.003, p=0.08, Supplementary file 5).

As we found global AI ancestry to be increasing over time (and there is a strong association between observed height and both AI as well as birth year), we hypothesized that there would be a change in PHS over time as well. However, we did not find a significant effect of birth year on PHS (Figure 4—figure supplement 3; p=0.14) even when we stratified by the quartiles of global AI ancestry.

Discussion

The United States is a dynamic, rapidly changing population, and this will continue to occur as the population size grows (Colby and Ortman, 2015). Hispanics/Latinos are the largest and fastest growing minority group, and are projected to comprise ~29% of the US population by 2060. They are a genetically and phenotypically diverse population as a result of extensive admixture between Amerindigenous populations and immigrants from multiple geographic locations around the world. In this study, we identified additional population substructure complexities that may contribute to phenotypic variation within Hispanics/Latinos.

Specifically, we demonstrated how the admixture composition of Mexican Americans has changed over time, resulting in an increase of ~20% Amerindigenous ancestry on average over the 50-year period studied. This change in ancestry is equivalent to a mean increase in Amerindigenous ancestry of ~0.4% per year. While the effect sizes vary to some extent, we replicate the underlying pattern across multiple data stratifications (two metropolitan cities, US-born and non-US-born) and also replicate this feature in an independent cohort of Mexican Americans. Further, we find that a similar trend holds across multiple self-identified Hispanic/Latino populations in the US (and is statistically significant in Central Americans). This effect does not appear to have a simple explanation: we do not see any statistically significant increases in Amerindigenous ancestry at individual loci, we do not see more than a negligible degree of population differentiation over time, and this increase cannot be entirely explained by very recent migration based on our analyses of non-US-born individuals.

What could be driving the increased Amerindigenous ancestry in Mexican Americans? We hypothesize that several population, cultural, and environmental factors operating in unison have altered the genetic architecture of Mexican Americans. First, we identify strong ancestry-based assortative mating. However, while assortative mating could explain the increased ROH and decreased genetic diversity we inferred over time, ancestry-based assortative mating alone should not result in mean changes in global ancestry proportions (since a proportional number of offspring should derive from high- versus low-Amerindigenous ancestry parents, see simulations in Figure 3—figure supplement 7). Second, we do infer a subtle increase in Amerindigenous ancestry among individuals who migrated to the US more recently than individuals who migrated earlier. Independent analyses have shown that migration from Mexico to the US has shifted over the years from states with less Amerindigenous ancestry to states with higher Amerindigenous ancestry (Moreno-Estrada et al., 2014; Terrazas, 2010). However, these subtle shifts in recent migration cannot fully explain the changes in Amerindigenous ancestry we infer, and taking them into account in our statistical model did not change the effect size that birth year has on Amerindigenous ancestry over time. Third, from US census data, we know that Hispanic/Latino is the fastest growing ethnicity (with Mexican Americans constituting the plurality). However, similar to assortative mating, population size changes alone should not drive mean changes in global ancestry proportions (Figure 3—figure supplement 6). While none of these factors alone can adequately explain the temporal dynamics of Amerindigenous ancestry we have observed, simulations of the joint effects of all of these factors operating in unison can indeed drive substantial changes in global ancestry patterns (Figure 3—figure supplements 5–14). However, more research is necessary to understand which parameters are consistent with the continuum of Mexican American populations across the US.

Regardless of the underlying mechanisms driving increased Amerindigenous ancestry in Mexican Americans, this additional source of temporal substructure within this population has substantial consequences for phenotypic variation in biomedical traits. We identify several biomedical traits that are associated with Amerindigenous ancestry, with effects comparable to the high effects of gender, and show that in the case of height, there are both ancestry and temporal effects. While we do see differences in mean height based on percentage of AI ancestry, height increases over time in all groups at similar rates. Individuals with lower percentages of AI ancestry were taller on average than individuals with higher AI ancestry pointing to the role of AI ancestry on the trait. Further study is necessary to understand whether other biomedical traits are also changing over time as a result of the change in genomic ancestry proportions, and the degree to which other socio-economic factors independently drive both ancestry patterns as well as biomedical traits.

In our study, we bring specific attention to the biases that continue to exist with using European GWAS summary statistics to calculate polygenic risk scores in admixed populations such as Mexican Americans that are comprised of European, Amerindigenous, and African genetic ancestries. In particular, in the case of height, we found that the polygenic height score (PHS) correlated with observed height only in the subset of individuals with the lowest levels of Amerindigenous ancestry (i.e. the subset of individuals with highest European ancestry). As the population dynamics of the US continue to change, it is imperative that we study diverse populations, or we risk exacerbating the health disparities that currently exist. To date, population-based medical genomics research (and its subsequent benefits) have been disproportionately focused on populations of European ancestry. In order to improve the design and implementation of medical genetics studies for the ethnically diverse U.S. population, we need detailed insights into the population history of diverse U.S. populations. This includes characterizing the admixture dynamics of Hispanic/Latino populations, as well as the evolutionary forces that shaped patterns of genetic variation of the ancestral populations that contributed to modern day Hispanic/Latino populations.

The genetic variation of the Hispanic community in the United States belies categorization under a single label (Conomos et al., 2016). The events that have shaped and continue to shape this genetic diversity are complex, numerous, and nuanced, and the social history of such a diverse population is intrinsic to any genetic study. Mexico’s society was largely defined by an established social caste system based on ancestry, which disappeared after Mexico’s independence in 1821 (Lisker et al., 1990). Even so, social inequalities persist today with skin color having a significant effect on wealth and education (Martinez, 2017). A multitude of factors within and outside Mexico, whether related to trade, immigration policies, or armed conflicts, acted to influence who immigrated to the United States, and the impact of each of these fluctuates over time (Contreras, 2014; Verea and Verea, 2014; Fernández-Kelly and Massey, 2007). These changes shift the demographics of immigration, which is inherently related to the genetic ancestry of the population.

Consequently, this shapes the genetic architecture of complex traits. Diverse populations are at risk not only from underrepresentation in research, but because of poor understanding of the temporal and spatial dynamics at play in genetic variation. The promise of equitable precision medicine — one of the ultimate goals of medical genomics — cannot be kept without understanding this interplay. Health disparities in the United States are fed by structural inequalities. For example, studies that use modern Artificial Intelligence techniques have already been shown to inflate existing disparities between Black Americans and White Americans (Obermeyer et al., 2019). Such biases, whether from algorithms, study designs, or misunderstandings of subtleties in data, feed into the larger systemic pressures faced by minority populations in the United States.

While we have shown a dramatic shift in ancestry proportions in US Hispanic/Latinos, one of the caveats of this study is that the HCHS/SOL cohort is not representative of all US Hispanics/Latinos. HCHS/SOL participants were recruited at four primary centers: Bronx, Chicago, Miami, and San Diego. There may be additional genetic diversity that has not been captured by this dataset and trends exhibited in this dataset may not translate to Hispanic/Latino populations living in other regions of the US (though the temporal increase in Amerindigenous ancestry was replicated in an independent sample of Mexican Americans). Further, we have only assembled a reference panel with limited numbers of individuals with various Amerindigenous, European, and African ancestry. With better population genetic modeling and a deeper understanding of the social and historical aspects of Hispanic/Latino populations, we will be able to improve our understanding of the genetic and phenotypic diversity across these populations, and subsequently improve our ability to understand genetic contributions to complex traits and disease. These insights will lead to optimization of population sampling for the design of future medical genetic studies, the identification of disease risk variants, and ultimately, precision medicine for all.

Materials and methods

Study dataset and initial quality control

The HCHS/SOL study is a community-based cohort study of self-identified Hispanic/Latino individuals from four US metropolitan areas with the general goal of identifying risk and protective factors for various medical conditions including cardiovascular disease, diabetes, pulmonary disease, and sleep disorders (Sorlie et al., 2010). The sample survey design for HCHS/SOL has been described previously (LaVange et al., 2010). Briefly, census block groups were selected in defined communities near each of the four recruitment centers, and households were sampled within census block groups. Households with Hispanic/Latino surnames and individuals as well as residents over 45 years old were oversampled in order to increase representation of the Hispanic/Latino target population and achieve a uniform age distribution. Sampling weights were calculated for each individual to reflect the probability of sampling (Conomos et al., 2016). 12,434 participants with birth year estimates between 1934 and 1993 who self-identified as being of Cuban, Dominican, Puerto Rican, Mexican, Central American, or South American background consented to genetics studies and posting of their genetic and phenotype data on the publicly available Database of Genotypes and Phenotypes (dbGaP) through Study Accession phs000810.v1.p1. Samples were genotyped on an Illumina custom array, SoL HCHS Custom 15041502 array (annotation B3, genome build 37), consisting of the Illumina Omni 2.5M array and 148,353 custom single nucleotide polymorphisms (SNPs) (Conomos et al., 2016). Data posted to dbGaP had passed initial sample quality control filters, including removing samples with differences in reported vs. genetic sex, call rates > 95%, and evidence for sample contamination (e.g. heterozygosity and sample call rates).
For initial SNP quality control, we filtered out SNPs that were monomorphic, positional duplicates, or Illumina technical failures, as well as SNPs that had cluster separation <= 0.3, call rate <= 2%, >2 discordant calls in 291 duplicate samples, >3 Mendelian errors in parent-offspring pairs/trios, Hardy-Weinberg Equilibrium combined p-value <10^-5, and sex differences in allele frequency ≥0.2. Our filtering resulted in 1,763,935 genotyped SNPs with minor allele frequency (MAF) >0.01.

Additional sample quality control performed in the HCHS/SOL dataset included filtering out samples with (1) large chromosomal anomalies, (2) substantial Asian ancestry as previously identified in HCHS/SOL (Conomos et al., 2016) and (3) individuals with up to third degree genetic relatedness in the dataset as inferred by REAP (Thornton et al., 2012). For genetic relatedness filtering, individuals from pairs were kept to maximize representation of the birth year distribution, which resulted in 10,268 unrelated remaining individuals.

From the original HCHS/SOL analysis, individuals were classified into genetic-analysis groups, similar to self-identified background groups in that they share cultural and environmental characteristics, but are also more genetically homogenous (Conomos et al., 2016).

Birth year for all individuals was estimated by subtracting age from the date of the first clinic visit for the baseline examination (Sorlie et al., 2010). Year of arrival was estimated by subtracting years in the US from the date of the first clinic visit for the baseline examination.

Global, local, and parental ancestry inference

All ancestry analyses were restricted to the 211,152 autosomal SNP markers that overlapped between the study and reference panel genotyping array. For the HCHS/SOL dataset, global African, European, and Amerindigenous ancestries were inferred with ADMIXTURE (Alexander et al., 2009) in an unsupervised manner, with K = 3. Amerindigenous ancestry refers to estimates of Indigenous genetic ancestry from the Americas. For some analyses, HCHS/SOL individuals with greater than 95% of a single ancestry (e.g. African, European, or Amerindigenous) were filtered out, resulting in 9913 individuals: 1,099 Central American, 1,536 Cuban, 954 Dominican, 3,622 Mexican, 1,783 Puerto Rican, 652 South American and 267 ‘Other’ individuals.

Ancestral tracts, known as ‘local’ ancestry, along the genome for all HCHS/SOL individuals were inferred using RFMix (Maples et al., 2013) and a three population reference panel, comprised of 315 individuals: 104 HapMap phase 3 CEU (European) and 107 YRI (African) individuals (Altshuler et al., 2010) and 112 Amerindigenous individuals from throughout Latin America (Reich et al., 2012). The reference panel was limited to individuals with 99% continental ancestry as inferred by unsupervised ADMIXTURE (Alexander et al., 2009). Prior to local ancestry inference, HCHS/SOL individuals were merged with the reference panels and then phased using SHAPEIT2 (Delaneau et al., 2013). For all HCHS/SOL Mexican American individuals, parental genomic ancestry was inferred with ANCESTOR (Zou et al., 2015) using the local ancestry estimates generated by RFMix.

Bootstrap analyses (Figure 2B and Figure 2—figure supplement 3) were performed by calculating relevant statistics based on repeated resampling of individuals with replacement. Bootstrap resampling results in an estimate of the variance of the statistics that we are calculating in our data, and allows us to assess the impact of outliers (who are only resampled in a subset of iterations).
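
The resampling idea can be sketched in a few lines. The statistic here is the mean, the values are made up rather than HCHS/SOL data, and the function name and number of replicates are our own choices for illustration.

```python
import random
import statistics


def bootstrap_mean(values, n_boot=1000, seed=0):
    """Bootstrap the mean by resampling individuals with replacement.

    Returns the list of replicate means; their spread estimates the sampling
    variance of the statistic.
    """
    rng = random.Random(seed)
    n = len(values)
    return [statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)]


# Toy ancestry proportions (made-up values), including one outlier at 0.95.
toy = [0.31, 0.44, 0.52, 0.39, 0.47, 0.61, 0.35, 0.95, 0.42, 0.50]
reps = sorted(bootstrap_mean(toy))
lo, hi = reps[int(0.025 * len(reps))], reps[int(0.975 * len(reps)) - 1]
print(f"bootstrap SE = {statistics.stdev(reps):.3f}, "
      f"approximate 95% interval = ({lo:.3f}, {hi:.3f})")
```

Because the outlier is absent from some replicates and duplicated in others, the spread of the replicate means reflects how much a single individual can move the estimate.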

Uniform manifold approximation and projection (UMAP)

Principal components for HCHS/SOL and the reference panel were computed using smartPCA (Patterson et al., 2006). UMAP (version 0.3.8) was run using the Python script freely available at (https://github.com/diazale/gt-dimred; Diaz-Papkovich, 2019) with parameter specification set at 15 nearest neighbors and a minimum distance between points of 0.5.

For further analyses of HCHS/SOL population structure, a larger reference panel was assembled comprising additional European and African populations from the Human Genome Diversity Project (HGDP) (Rosenberg et al., 2002; Reich et al., 2012) and 1000 Genomes Project (Auton et al., 2015). For the European reference panel, 24 Basque, 28 French, 12 Italian, 25 Russian, and 28 Sardinian individuals from HGDP and 90 GBR, 107 IBS, 99 FIN, and 107 TSI individuals from 1000 Genomes were included with the original 104 CEU individuals. For the African reference panel, 9 BantuKenya, 8 BantuSouthAfrica, 22 Mandenka, 26 Mozabite from HGDP and 99 ESN, 113 GWD, 97 LWK, and 82 MSL from 1000 Genomes were included with the original 107 YRI individuals. Combined with the 112 Amerindigenous and 10,268 HCHS/SOL individuals, the larger additional analyses comprised 11,567 individuals in total.

Admixture mapping

Local ancestry estimates for 211,151 SNPs across the genome were used to perform admixture mapping in HCHS/SOL Mexican Americans to determine if younger individuals harbored excess Amerindigenous ancestry in certain regions of the genome. Admixture mapping was performed applying two different models: (1) a linear regression model with age as the dependent variable adjusting for global Amerindigenous ancestry, sampling weight and center and (2) a logistic regression model dividing the HCHS/SOL Mexican cohort into an older vs younger generation with 1965 set as the dividing point while also adjusting for global Amerindigenous ancestry, sampling weight, and center. The threshold for genome-wide significance, 1.38 × 10^-4, was calculated using the empirical autoregression framework with the package coda in R to estimate the total number of ancestral blocks (Sobota et al., 2015; Plummer et al., 2012).

Tract lengths

The multiple regression model log(f) = β0 + β1 T + β2 A + β3 TA + ε, where f is a matrix containing the proportion of lengths of all ancestral tracts across the genome for all 3622 Mexican American individuals, T is the tract length bin, and A is the decade of birth year bin, was used to test for an effect of birth-decade on the proportion of Amerindigenous ancestral tract lengths. For assessment between the fraction of ancestry tracts in an individual’s genome and birth year, long tract cutoffs were chosen based on tract separation between the birth year decades in Figure 3B.

Diversity calculations

Subcontinental ancestry was assessed using the diversity measurements π and FST. π was calculated as the average number of pairwise genetic differences among all pairs of overlapping Amerindigenous ancestry tracts across individuals. FST was calculated as:

FST = (HT − HS)/HT, where HT is the average heterozygosity when all individuals are pooled across decades and HS is the average heterozygosity within each decade of individuals.
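
The FST definition above can be worked through on toy numbers. The sketch assumes that heterozygosity is the expected value 2p(1 − p) at biallelic sites and that HS is weighted by the number of individuals per decade; both are our assumptions for illustration, and the function names and allele frequencies are made up.

```python
def heterozygosity(freqs):
    """Average expected heterozygosity 2p(1-p) across biallelic sites."""
    return sum(2 * p * (1 - p) for p in freqs) / len(freqs)


def fst_across_decades(freqs_by_decade, sizes):
    """FST = (HT - HS) / HT for allele frequencies grouped by birth decade.

    freqs_by_decade: one list of per-site allele frequencies per decade.
    sizes: number of individuals in each decade (used as weights).
    """
    total = sum(sizes)
    n_sites = len(freqs_by_decade[0])
    # HS: size-weighted average of the within-decade heterozygosities.
    hs = sum(n * heterozygosity(f) for f, n in zip(freqs_by_decade, sizes)) / total
    # HT: heterozygosity of the pooled sample, from pooled allele frequencies.
    pooled = [
        sum(n * f[s] for f, n in zip(freqs_by_decade, sizes)) / total
        for s in range(n_sites)
    ]
    ht = heterozygosity(pooled)
    return (ht - hs) / ht


# Two toy decades with made-up allele frequencies at three sites.
print(fst_across_decades([[0.20, 0.50, 0.10], [0.25, 0.45, 0.12]], sizes=[400, 600]))
```

When allele frequencies are identical across decades, HT equals HS and FST is zero; larger frequency differences between decades raise HT relative to HS.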

Inference of runs of homozygosity

ROH were called using the program GARLIC v1.1.4 (Szpiech et al., 2017) on 211,152 sites for the Mexican American individuals. An analysis window size of 50 SNPs and an overlap fraction of 0.25 were both chosen using GARLIC’s rule of thumb parameter estimation. GARLIC chose a LOD score cutoff of 0. Using a three-component Gaussian mixture, GARLIC determined class A/B (short/medium) and class B/C (medium/long) size boundaries as 845,097 bp and 2,501,750 bp, respectively.

Simulating ancestry proportions over time

Our Moran model simulator includes population growth (exponential), migration (with adjustable levels of migration and ancestry patterns), ancestry-based assortative mating, and ancestry-based variability in fecundity (see https://github.com/mlspear09/hchs-sol; Spear, 2020). Our simulations are modeled after the data shown in Figure 2. Initial ancestry proportion in the population was set to 0.42. Previous studies have estimated the human generation time at ~26–30 years per generation (Moorjani et al., 2016). As such, the data analyzed correspond to ~2 human generations. We therefore begin our simulations with a random sample of ancestry proportions with mean 0.42, and simulate two generations (corresponding to 2N steps in our simulator). In all simulations, we start with N = 1000. The general idea is to model population parameters (such as average ancestry proportions in the population), which are less sensitive to the actual population size used.

Imputation

Imputation for HCHS/SOL was performed locally using IMPUTE2 (Howie et al., 2009) with the 1000 Genomes Project Phase 3 haplotypes (Auton et al., 2015) used as a reference panel. After filtering on an info score cutoff of 0.3, this resulted in 33,041,084 SNPs.

Analyzing biomedical traits

We analyzed a total of 69 biomedical traits contained in the HCHS/SOL phenotypic dataset. We used a multiple linear regression model to analyze the effects of global AI ancestry on each trait while controlling for birth year (a proxy for age), center, gender, sampling weight, educational attainment, US-born status, and number of US-born parents. In Figure 4A, we show the effect size (β) for AI ancestry after quantile normalizing each trait (Bolstad et al., 2003; Qiu et al., 2013). Quantile normalization forces each phenotype to be rank-transformed to a Standard Normal distribution. While quantile normalization is a common approach to transforming data to conform to the Normal distribution assumption inherent in linear regression (and provides the benefit of effect sizes that are readily comparable across traits), this procedure can result in a modest reduction in statistical power compared to untransformed data (Qiu et al., 2013). We find that the p-values for the AI effect sizes are highly correlated when phenotypes are untransformed vs quantile normalized (Spearman ρ=0.943; p<2.2e-16) (Figure 4—figure supplement 4) with no statistical evidence for a difference in their distribution (Mann-Whitney U test p=0.912).
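
The rank-based transformation to a Standard Normal distribution can be sketched as follows. The exact rank-to-probability offset used in the study is not stated here, so the code assumes (rank − 0.5)/n with average ranks for ties; the function name and toy values are ours.

```python
from statistics import NormalDist


def quantile_normalize(values):
    """Rank-transform values to Standard Normal quantiles.

    Assumptions: ranks are converted to probabilities with (rank - 0.5) / n,
    and ties receive their average rank. Other offsets are also in common use.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1          # 1-based average rank of the run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - 0.5) / n) for r in ranks]


# Toy skewed trait values (made up); the output keeps their rank order.
print([round(z, 3) for z in quantile_normalize([1.2, 15.0, 2.3, 2.9, 110.0])])
```

Because only ranks are used, the extreme value 110.0 ends up no further from the center than the largest value of any other trait with the same sample size, which is what makes effect sizes comparable across traits.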

Polygenic risk score calculations

Polygenic risk scores for height were calculated using the publicly available UK Biobank (UKBB) GWAS Round 2 Summary Statistics retrieved from http://www.nealelab.is/uk-biobank. Briefly, for sample quality control, sample inclusion was limited to unrelated samples who passed the sex chromosome aneuploidy filter. British ancestry was determined using the first six PCs; individuals more than seven standard deviations away from the first six PCs were excluded. Further filtering included limiting to self-reported 'white-British' / 'Irish' / 'White', resulting in a QCed sample count of 361,194 individuals as described in (https://github.com/Nealelab/UK_Biobank_GWAS#imputed-v3-sample-qc; Neale Lab, 2018). An imputation panel of ~90 million SNPs from HRC, UK10K and 1KG was used to impute genotypes. 13.7 million autosomal and X-chromosome SNPs passed quality control thresholds including Info score >0.8, MAF >0.0001, and HWE p-value >1e-10. For the phenotype, a linear regression model in Hail was run for all individuals (both sexes) adjusting by the first 20 PCs + sex + age + age^2 + (sex*age) + (sex*age)^2. For height, there was complete phenotype information for 360,388 individuals.

Risk scores were calculated by extracting the overlapping genome-wide significant hits initially discovered in the UKBB GWAS of height and selecting the SNP with the lowest p-value in each 1 Mb window across the genome. Prior to extraction there were a total of 227,794 genome-wide significant SNPs in the UKBB GWAS of height. This resulted in a dataset of 1078 overlapping SNPs for the PRS calculation that were present in our dataset of genotyped and imputed SNPs.
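The selection and scoring steps can be sketched in Python as follows. This is an illustrative sketch rather than the authors' pipeline. Several details are assumptions made for the example: the `Hit` record and function names are hypothetical, windows are treated as fixed, non-overlapping 1 Mb bins per chromosome, the genome-wide significance threshold is taken to be the conventional 5e-8 (the text does not state it), and the score is the sum of effect size times allele dosage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One GWAS summary-statistic row (hypothetical layout for this sketch)."""
    snp: str
    chrom: str
    pos: int      # base-pair position
    beta: float   # per-allele effect on height
    pval: float


def lead_snps(hits, available, window=1_000_000, alpha=5e-8):
    """Keep the lowest-p-value significant SNP in each window.

    Only SNPs present in the target dataset (`available`) are considered,
    mirroring the "overlapping" restriction described in the text.
    """
    best = {}
    for h in hits:
        if h.pval >= alpha or h.snp not in available:
            continue
        key = (h.chrom, h.pos // window)  # fixed, non-overlapping bins (assumed)
        if key not in best or h.pval < best[key].pval:
            best[key] = h
    return sorted(best.values(), key=lambda h: (h.chrom, h.pos))


def polygenic_score(dosages, leads):
    """Sum of beta * effect-allele dosage over the selected SNPs."""
    return sum(h.beta * dosages[h.snp] for h in leads)


# Toy data: rs1 and rs2 share a window, rs4 is not genome-wide significant.
hits = [
    Hit("rs1", "1", 100_000, 0.02, 1e-10),
    Hit("rs2", "1", 900_000, -0.01, 1e-12),
    Hit("rs3", "1", 1_500_000, 0.03, 1e-9),
    Hit("rs4", "2", 200_000, 0.05, 1e-3),
]
leads = lead_snps(hits, available={"rs1", "rs2", "rs3", "rs4"})
score = polygenic_score({"rs2": 2, "rs3": 1}, leads)
```

Restricting to the target dataset before choosing the lead SNP means a window is still represented when its top UKBB hit was not genotyped or imputed in the target samples.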

Health and Retirement Study (HRS)

For replication, we used genotype data from 705 self-identified Mexican Americans from the Health and Retirement Study (HRS) (Fisher and Ryan, 2018), genotyped on the Illumina Human Omni 2.5M platform. HRS data was made available under IRB Study No. A11-E91-13B - The apportionment of genetic diversity within the United States. Estimated global ancestry proportions for the Mexican American population in the HRS were calculated as in Baharian et al., 2016, which used an alternative reference panel and alternative ancestry inference approach. Briefly, RFMix was used to infer local ancestry estimates across the genome utilizing CHS, YRI, and CEU individuals from the 1000 Genomes Project as reference populations for Amerindigenous/Asian, African, and European ancestries, respectively. Global ancestry estimates were calculated using the summed RFMix calls.
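As a rough illustration of how global ancestry can be obtained by summing local ancestry calls, the Python sketch below counts the calls assigned to each ancestry across both haplotypes of one individual and divides by the total. It is a simplified sketch: the input layout (one list of labels per haplotype), the function name `global_ancestry`, and the equal weighting of every call are assumptions for the example and do not reproduce RFMix's actual output format.

```python
from collections import Counter


def global_ancestry(haplotype_calls, labels=("AI", "AFR", "EUR")):
    """Fraction of local-ancestry calls assigned to each ancestry label.

    `haplotype_calls` is a list of haplotypes, each a list of per-window
    ancestry labels (hypothetical layout). Every call has equal weight.
    """
    counts = Counter()
    for calls in haplotype_calls:
        counts.update(calls)
    total = sum(counts[label] for label in labels)
    return {label: counts[label] / total for label in labels}


# Toy individual with four ancestry windows on each of two haplotypes.
proportions = global_ancestry([
    ["AI", "AI", "EUR", "AFR"],
    ["EUR", "EUR", "AI", "EUR"],
])
```

If windows differ substantially in genetic or physical length, weighting each call by its window length would be a natural refinement of this equal-weight sum.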

Statistical analyses and plots

Statistical analyses and plot generation were performed within RStudio version 1.1.463 using R version 3.5.3. The ternary and ggridges/ggplot2 packages were used to create the simplex and ridgeline plots.

For each of the HCHS/SOL populations, we evaluated differences in global ancestry estimates over time while accounting for the sampling method (referred to as ‘sampling weight’, see Materials and methods) used for the design of the HCHS/SOL study.

To test for differences in each ancestry over time for each HCHS/SOL population, we ran a linear regression model of Ancestry = β₀ + β₁ BY + β₂ SW + ε, where BY = birth year and SW = log(sampling weight). Within the Mexican Americans, we ran this model stratified by gender, recruitment center (Chicago and San Diego), US-born status (born in the US versus outside the US), and education attainment. For recruitment centers, stratification was limited to Chicago and San Diego because the sample sizes for the Bronx and Miami were small: 124 and 25 individuals, respectively. Education attainment was categorized as either less than a high school diploma or equivalent degree (<HS), equal to a high school diploma or equivalent degree (=HS), or post-secondary education (>HS).
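The regression of ancestry on birth year and log sampling weight can be sketched in plain Python as an ordinary least-squares fit. This is a minimal sketch, not the authors' R code: the function names are hypothetical, the fit uses the normal equations solved by Gaussian elimination, and birth year is centered at its mean for numerical stability. Centering is an assumption of this example; it leaves the slopes for birth year and log sampling weight unchanged and only shifts the intercept so that it refers to the mean birth year.

```python
import math


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ancestry_model(ancestry, birth_year, sampling_weight):
    """Least-squares fit of ancestry on birth year and log(sampling weight).

    Birth year is mean-centered (an assumption of this sketch), so the
    returned intercept is the fitted ancestry at the mean birth year
    when log(sampling weight) is zero.
    """
    mean_by = sum(birth_year) / len(birth_year)
    rows = [[1.0, by - mean_by, math.log(sw)]
            for by, sw in zip(birth_year, sampling_weight)]
    k = 3
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ancestry)) for i in range(k)]
    b0, b1, b2 = solve_linear_system(xtx, xty)
    return {"intercept": b0, "birth_year": b1, "log_sampling_weight": b2}


# Toy data generated from known coefficients, so the fit should recover them.
years = [1940, 1950, 1960, 1970, 1980, 1990]
weights = [1.0, math.e, 1.0, math.e ** 2, math.e, 1.0]
ancestry = [0.4 + 0.004 * (by - 1965) + 0.01 * math.log(sw)
            for by, sw in zip(years, weights)]
coefficients = fit_ancestry_model(ancestry, years, weights)
```

The coefficient on birth year is the quantity of interest here: a positive value corresponds to higher ancestry proportions in individuals born later. Standard errors and p-values are omitted from the sketch.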

To test differences in mean Amerindigenous ancestry by group, we ran t-tests. The data were split and compared by gender, recruitment center, born in the US versus outside the US, and educational attainment levels.
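For a two-group comparison of mean ancestry, the test statistic can be sketched as below. The text does not state which t-test variant was used, so this example assumes Welch's unequal-variance form; the function name `welch_t` is hypothetical, and the sketch stops at the statistic and its approximate degrees of freedom rather than computing a p-value.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    # Squared standard error contributed by each group.
    se2_a = variance(group_a) / n_a
    se2_b = variance(group_b) / n_b
    t_stat = (mean(group_a) - mean(group_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t_stat, df


# Toy example with two small groups of equal variance.
t_stat, df = welch_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

With more than two levels, as for educational attainment, this comparison would be applied to pairs of groups.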

For the height and polygenic height score analyses, 3604 Mexican Americans were included based on complete information for height, gender, recruitment center, sampling weight, education attainment, born in the US versus outside the US, and number of US-born parents.

Acknowledgements

We thank many colleagues who commented on our preprint prior to submission, particularly Reed Cartwright for suggestions on terminology. MLS was supported through the National Human Genome Research Institute (NHGRI) of the National Institutes of Health (NIH) under Award Number F31HG010104. ADP and SG were supported, in part, thanks to funding from the Canada Research Chairs program and CIHR grant MOP-136855. EZ was supported, in part, by NIH grants R0184545 and K24CA169004. RDH was supported, in part, by NHGRI grant R01HG007644 and the Canada Research Chairs program.

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

Melissa L Spear, Email: mlspear09@gmail.com.

Ryan D Hernandez, Email: ryan.hernandez@me.com.

Mashaal Sohail, Brigham and Women's Hospital and Harvard Medical School, United States.

Patricia J Wittkopp, University of Michigan, United States.

Funding Information

This paper was supported by the following grants:

  • National Institutes of Health R01HG007644 to Ryan D Hernandez.

  • National Institutes of Health F31HG010104 to Melissa L Spear.

  • Canadian Institutes of Health Research MOP-136855 to Simon Gravel.

  • National Cancer Institute R01184545 to Elad Ziv.

  • National Cancer Institute K24169004 to Elad Ziv.

Additional information

Competing interests

No competing interests declared.

Author contributions

Conceptualization, Resources, Data curation, Software, Formal analysis, Funding acquisition, Investigation, Visualization, Methodology, Writing - original draft.

Resources, Formal analysis, Visualization, Writing - review and editing.

Conceptualization, Resources, Data curation, Formal analysis, Visualization, Methodology, Writing - original draft.

Formal analysis, Visualization, Writing - review and editing.

Resources, Supervision, Writing - review and editing.

Resources, Supervision, Writing - review and editing.

Conceptualization, Resources, Supervision, Methodology, Writing - review and editing.

Additional files

Supplementary file 1. Association of global ancestries and birth year for all HCHS/SOL individuals.

For each population, we tested for an association between global ancestry and birth year while accounting for the sampling design. AI, AFR, and EUR refer to Amerindigenous, African, and European ancestry respectively. The significance threshold was set at 0.003 using Bonferroni correction for multiple testing (0.05/18).

elife-56029-supp1.xlsx (9.6KB, xlsx)
Supplementary file 2. Frequency table of 3622 HCHS/SOL Mexican Americans stratified by recruitment region, US-born vs non-US-born status, gender and educational attainment.

Recruitment was performed at four regions: Bronx, Chicago, Miami and San Diego. Education attainment was categorized as either less than a high school diploma or equivalent degree (<HS), equal to a high school diploma or equivalent degree (=HS), or post-secondary education (>HS).

elife-56029-supp2.xlsx (9.2KB, xlsx)
Supplementary file 3. Association of quantitative traits and Amerindigenous ancestry in HCHS/SOL Mexican Americans.

Each trait as a function of AI ancestry adjusted by birth year, center, gender, sampling weight, educational attainment, US-born status, and number of US-born parents. Results are shown for both the raw data and quantile normalized data.

elife-56029-supp3.xlsx (105.3KB, xlsx)
Supplementary file 4. Height over time.

Height (cm) as a function of birth year adjusting by center, gender, sampling weight, educational attainment, US-born status, and number of US-born parents for 3604 Mexican Americans stratified by the quartiles of global Amerindigenous ancestry (AI).

elife-56029-supp4.xlsx (9.3KB, xlsx)
Supplementary file 5. Predicted height vs. observed height.

Predicted height (cm) as a function of observed height (cm) adjusting by center, gender, sampling weight, educational attainment, US-born status, and number of US-born parents for 3604 Mexican Americans stratified by Amerindigenous ancestry (AI).

elife-56029-supp5.xlsx (9.4KB, xlsx)
Transparent reporting form

Data availability

All data used in this manuscript were downloaded from publicly available sources (dbGaP). No new data were created.

The following previously published datasets were used:

Conomos MP, Laurie CA, Stilp AM, Gogarten SM, McHugh CP, Nelson SC. 2016. Genetic Diversity and Association Studies in US Hispanic/Latino Populations: Applications in the Hispanic Community Health Study/Study of Latinos. phs000810.v1.p1

Fisher GG, Ryan LH. 2018. Overview of the Health and Retirement Study and Introduction to the Special Issue. A11-E91-13B

References

  1. Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, Handsaker RE, Kang HM, Marth GT, McVean GA, 1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature. 2012;491:56–65. doi: 10.1038/nature11632.
  2. Agarwala V, Flannick J, Sunyaev S, Altshuler D, GoT2D Consortium. Evaluating empirical bounds on complex disease genetic architecture. Nature Genetics. 2013;45:1418–1427. doi: 10.1038/ng.2804.
  3. Alexander DH, Novembre J, Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Research. 2009;19:1655–1664. doi: 10.1101/gr.094052.109.
  4. Altshuler DM, Gibbs RA, Peltonen L, Altshuler DM, Gibbs RA, Peltonen L, Dermitzakis E, Schaffner SF, Yu F, Peltonen L, Dermitzakis E, Bonnen PE, Altshuler DM, Gibbs RA, de Bakker PI, Deloukas P, Gabriel SB, Gwilliam R, Hunt S, Inouye M, Jia X, Palotie A, Parkin M, Whittaker P, Yu F, Chang K, Hawes A, Lewis LR, Ren Y, Wheeler D, Gibbs RA, Muzny DM, Barnes C, Darvishi K, Hurles M, Korn JM, Kristiansson K, Lee C, McCarrol SA, Nemesh J, Dermitzakis E, Keinan A, Montgomery SB, Pollack S, Price AL, Soranzo N, Bonnen PE, Gibbs RA, Gonzaga-Jauregui C, Keinan A, Price AL, Yu F, Anttila V, Brodeur W, Daly MJ, Leslie S, McVean G, Moutsianas L, Nguyen H, Schaffner SF, Zhang Q, Ghori MJ, McGinnis R, McLaren W, Pollack S, Price AL, Schaffner SF, Takeuchi F, Grossman SR, Shlyakhter I, Hostetter EB, Sabeti PC, Adebamowo CA, Foster MW, Gordon DR, Licinio J, Manca MC, Marshall PA, Matsuda I, Ngare D, Wang VO, Reddy D, Rotimi CN, Royal CD, Sharp RR, Zeng C, Brooks LD, McEwen JE, International HapMap 3 Consortium. Integrating common and rare genetic variation in diverse human populations. Nature. 2010;467:52–58. doi: 10.1038/nature09298.
  5. Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, Marchini JL, McCarthy S, McVean GA, Abecasis GR, 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature. 2015;526:68–74. doi: 10.1038/nature15393.
  6. Baharian S, Barakatt M, Gignoux CR, Shringarpure S, Errington J, Blot WJ, Bustamante CD, Kenny EE, Williams SM, Aldrich MC, Gravel S. The great migration and African-American genomic diversity. PLOS Genetics. 2016;12:e1006059. doi: 10.1371/journal.pgen.1006059.
  7. Bolstad BM, Irizarry RA, Astrand M, Speed TP. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics. 2003;19:185–193. doi: 10.1093/bioinformatics/19.2.185.
  8. Bryc K, Durand EY, Macpherson JM, Reich D, Mountain JL. The genetic ancestry of African Americans, Latinos, and European Americans across the United States. The American Journal of Human Genetics. 2015;96:37–53. doi: 10.1016/j.ajhg.2014.11.010.
  9. Bustamante CD, Burchard EG, De la Vega FM. Genomics for the world. Nature. 2011;475:163–165. doi: 10.1038/475163a.
  10. Colby SL, Ortman JM. Projections of the size and composition of the US population: 2014 to 2060. U.S. Census Bureau, U.S. Department of Commerce; 2015.
  11. Conomos MP, Laurie CA, Stilp AM, Gogarten SM, McHugh CP, Nelson SC, Sofer T, Fernández-Rhodes L, Justice AE, Graff M, Young KL, Seyerle AA, Avery CL, Taylor KD, Rotter JI, Talavera GA, Daviglus ML, Wassertheil-Smoller S, Schneiderman N, Heiss G, Kaplan RC, Franceschini N, Reiner AP, Shaffer JR, Barr RG, Kerr KF, Browning SR, Browning BL, Weir BS, Avilés-Santa ML, Papanicolaou GJ, Lumley T, Szpiro AA, North KE, Rice K, Thornton TA, Laurie CC. Genetic diversity and association studies in US Hispanic/Latino populations: applications in the Hispanic Community Health Study/Study of Latinos. The American Journal of Human Genetics. 2016;98:165–184. doi: 10.1016/j.ajhg.2015.12.001.
  12. Contreras VR. The role of drug-related violence and extortion in promoting Mexican migration: unexpected consequences of a drug war. Latin American Research Review. 2014;49:199–217. doi: 10.1353/lar.2014.0038.
  13. Delaneau O, Zagury JF, Marchini J. Improved whole-chromosome phasing for disease and population genetic studies. Nature Methods. 2013;10:5–6. doi: 10.1038/nmeth.2307.
  14. Diaz-Papkovich A. Genotype dimension reduction research. 1KGP. GitHub Repository. 2019. https://github.com/diazale/gt-dimred
  15. Eyre-Walker A. Evolution in health and medicine Sackler colloquium: genetic architecture of a complex trait and its implications for fitness and genome-wide association studies. PNAS. 2010;107:1752–1756. doi: 10.1073/pnas.0906182107.
  16. Fernández-Kelly P, Massey DS. Borders for whom? The role of NAFTA in Mexico-U.S. migration. The ANNALS of the American Academy of Political and Social Science. 2007;610:98–118. doi: 10.1177/0002716206297449.
  17. Fisher GG, Ryan LH. Overview of the health and retirement study and introduction to the special issue. Work, Aging and Retirement. 2018;4:1–9. doi: 10.1093/workar/wax032.
  18. Gravel S, Zakharia F, Moreno-Estrada A, Byrnes JK, Muzzio M, Rodriguez-Flores JL, Kenny EE, Gignoux CR, Maples BK, Guiblet W, Dutil J, Via M, Sandoval K, Bedoya G, Oleksyk TK, Ruiz-Linares A, Burchard EG, Martinez-Cruzado JC, Bustamante CD, 1000 Genomes Project. Reconstructing Native American migrations from whole-genome and whole-exome data. PLOS Genetics. 2013;9:e1004023. doi: 10.1371/journal.pgen.1004023.
  19. Han E, Carbonetto P, Curtis RE, Wang Y, Granka JM, Byrnes J, Noto K, Kermany AR, Myres NM, Barber MJ, Rand KA, Song S, Roman T, Battat E, Elyashiv E, Guturu H, Hong EL, Chahine KG, Ball CA. Clustering of 770,000 genomes reveals post-colonial population structure of North America. Nature Communications. 2017;8:14238. doi: 10.1038/ncomms14238.
  20. Homburger JR, Moreno-Estrada A, Gignoux CR, Nelson D, Sanchez E, Ortiz-Tello P, Pons-Estel BA, Acevedo-Vasquez E, Miranda P, Langefeld CD, Gravel S, Alarcón-Riquelme ME, Bustamante CD. Genomic insights into the ancestry and demographic history of South America. PLOS Genetics. 2015;11:e1005602. doi: 10.1371/journal.pgen.1005602.
  21. Howie BN, Donnelly P, Marchini J. A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLOS Genetics. 2009;5:e1000529. doi: 10.1371/journal.pgen.1000529.
  22. Jordan IK, Rishishwar L, Conley AB. Native American admixture recapitulates population-specific migration and settlement of the continental United States. PLOS Genetics. 2019;15:e1008225. doi: 10.1371/journal.pgen.1008225.
  23. LaVange LM, Kalsbeek WD, Sorlie PD, Avilés-Santa LM, Kaplan RC, Barnhart J, Liu K, Giachello A, Lee DJ, Ryan J, Criqui MH, Elder JP. Sample Design and Cohort Selection in the Hispanic Community Health Study/Study of Latinos. Annals of Epidemiology. 2010;20:642–649. doi: 10.1016/j.annepidem.2010.05.006.
  24. Lisker R, Ramirez E, Briceno RP, Granados J, Babinsky V. Gene frequencies and admixture estimates in four Mexican urban centers. Human Biology. 1990;62:791–801.
  25. Maas P, Barrdahl M, Joshi AD, Auer PL, Gaudet MM, Milne RL, Schumacher FR, Anderson WF, Check D, Chattopadhyay S, Baglietto L, Berg CD, Chanock SJ, Cox DG, Figueroa JD, Gail MH, Graubard BI, Haiman CA, Hankinson SE, Hoover RN, Isaacs C, Kolonel LN, Le Marchand L, Lee I-M, Lindström S, Overvad K, Romieu I, Sanchez M-J, Southey MC, Stram DO, Tumino R, VanderWeele TJ, Willett WC, Zhang S, Buring JE, Canzian F, Gapstur SM, Henderson BE, Hunter DJ, Giles GG, Prentice RL, Ziegler RG, Kraft P, Garcia-Closas M, Chatterjee N. Breast Cancer Risk From Modifiable and Nonmodifiable Risk Factors Among White Women in the United States. JAMA Oncology. 2016;2:1295–1302. doi: 10.1001/jamaoncol.2016.1025.
  26. Maher MC, Uricchio LH, Torgerson DG, Hernandez RD. Population genetics of rare variants and complex diseases. Human Heredity. 2013;74:118–128. doi: 10.1159/000346826.
  27. Maples BK, Gravel S, Kenny EE, Bustamante CD. RFMix: a discriminative modeling approach for rapid and robust local-ancestry inference. The American Journal of Human Genetics. 2013;93:278–288. doi: 10.1016/j.ajhg.2013.06.020.
  28. Martin AR, Gignoux CR, Walters RK, Wojcik GL, Neale BM, Gravel S, Daly MJ, Bustamante CD, Kenny EE. Human Demographic History Impacts Genetic Risk Prediction across Diverse Populations. The American Journal of Human Genetics. 2017;100:635–649. doi: 10.1016/j.ajhg.2017.03.004.
  29. Martin AR, Kanai M, Kamatani Y, Okada Y, Neale BM, Daly MJ. Clinical use of current polygenic risk scores may exacerbate health disparities. Nature Genetics. 2019;51:584–591. doi: 10.1038/s41588-019-0379-x.
  30. Martinez DZ. Is Mexico a Post-Racial Country? Inequality and Skin Tone Across the Americas. AmericasBarometer Insights Series by the Latin American Public Opinion Project (LAPOP); 2017. https://www.vanderbilt.edu/lapop/insights/ITB031en.pdf
  31. Micheletti SJ, Bryc K, Ancona Esselmann SG, Freyman WA, Moreno ME, Poznik GD, Shastri AJ, Beleza S, Mountain JL, 23andMe Research Team. Genetic consequences of the transatlantic slave trade in the Americas. The American Journal of Human Genetics. 2020;107:265–277. doi: 10.1016/j.ajhg.2020.06.012.
  32. Mills MC, Rahal C. A scientometric review of genome-wide association studies. Communications Biology. 2019;2:9. doi: 10.1038/s42003-018-0261-x.
  33. Moorjani P, Sankararaman S, Fu Q, Przeworski M, Patterson N, Reich D. A genetic method for dating ancient genomes provides a direct estimate of human generation interval in the last 45,000 years. PNAS. 2016;113:5652–5657. doi: 10.1073/pnas.1514696113.
  34. Moran PAP. Random processes in genetics. Mathematical Proceedings of the Cambridge Philosophical Society. 1958;54:60–71. doi: 10.1017/S0305004100033193.
  35. Moreno-Estrada A, Gravel S, Zakharia F, McCauley JL, Byrnes JK, Gignoux CR, Ortiz-Tello PA, Martínez RJ, Hedges DJ, Morris RW, Eng C, Sandoval K, Acevedo-Acevedo S, Norman PJ, Layrisse Z, Parham P, Martínez-Cruzado JC, Burchard EG, Cuccaro ML, Martin ER, Bustamante CD. Reconstructing the Population Genetic History of the Caribbean. PLOS Genetics. 2013;9:e1003925. doi: 10.1371/journal.pgen.1003925.
  36. Moreno-Estrada A, Gignoux CR, Fernández-López JC, Zakharia F, Sikora M, Contreras AV, Acuña-Alonzo V, Sandoval K, Eng C, Romero-Hidalgo S, Ortiz-Tello P, Robles V, Kenny EE, Nuño-Arana I, Barquera-Lozano R, Macín-Pérez G, Granados-Arriola J, Huntsman S, Galanter JM, Via M, Ford JG, Chapela R, Rodriguez-Cintron W, Rodríguez-Santana JR, Romieu I, Sienra-Monge JJ, del Rio Navarro B, London SJ, Ruiz-Linares A, Garcia-Herrera R, Estrada K, Hidalgo-Miranda A, Jimenez-Sanchez G, Carnevale A, Soberón X, Canizales-Quinteros S, Rangel-Villalobos H, Silva-Zolezzi I, Burchard EG, Bustamante CD. Human genetics. The genetics of Mexico recapitulates Native American substructure and affects biomedical traits. Science. 2014;344:1280–1285. doi: 10.1126/science.1251688.
  37. NCD Risk Factor Collaboration (NCD-RisC). A century of trends in adult human height. eLife. 2016;5:e13410. doi: 10.7554/eLife.13410.
  38. Neale Lab. GWAS Results. Nealelab. 2018. http://www.nealelab.is/uk-biobank/
  39. Nelson MR, Bryc K, King KS, Indap A, Boyko AR, Novembre J, Briley LP, Maruyama Y, Waterworth DM, Waeber G, Vollenweider P, Oksenberg JR, Hauser SL, Stirnadel HA, Kooner JS, Chambers JC, Jones B, Mooser V, Bustamante CD, Roses AD, Burns DK, Ehm MG, Lai EH. The population reference sample, POPRES: a resource for population, disease, and pharmacological genetics research. The American Journal of Human Genetics. 2008;83:347–358. doi: 10.1016/j.ajhg.2008.08.005.
  40. Obermeyer Z, Powers B, Vogeli C, Mullainathan S. Dissecting racial bias in an algorithm used to manage the health of populations. Science. 2019;366:447–453. doi: 10.1126/science.aax2342.
  41. Pasaniuc B, Price AL. Dissecting the genetics of complex traits using summary association statistics. Nature Reviews Genetics. 2017;18:117–127. doi: 10.1038/nrg.2016.142.
  42. Patterson N, Price AL, Reich D. Population structure and eigenanalysis. PLOS Genetics. 2006;2:e190. doi: 10.1371/journal.pgen.0020190.
  43. Plummer MBN, Cowles K, Vines K. CODA: convergence diagnosis and output analysis for MCMC. R News. 2012;6:7–11.
  44. Popejoy AB, Fullerton SM. Genomics is failing on diversity. Nature. 2016;538:161–164. doi: 10.1038/538161a.
  45. Qiu X, Wu H, Hu R. The impact of quantile and rank normalization procedures on the testing power of gene differential expression analysis. BMC Bioinformatics. 2013;14:124. doi: 10.1186/1471-2105-14-124.
  46. Reich D, Patterson N, Campbell D, Tandon A, Mazieres S, Ray N, Parra MV, Rojas W, Duque C, Mesa N, García LF, Triana O, Blair S, Maestre A, Dib JC, Bravi CM, Bailliet G, Corach D, Hünemeier T, Bortolini MC, Salzano FM, Petzl-Erler ML, Acuña-Alonzo V, Aguilar-Salinas C, Canizales-Quinteros S, Tusié-Luna T, Riba L, Rodríguez-Cruz M, Lopez-Alarcón M, Coral-Vazquez R, Canto-Cetina T, Silva-Zolezzi I, Fernandez-Lopez JC, Contreras AV, Jimenez-Sanchez G, Gómez-Vázquez MJ, Molina J, Carracedo A, Salas A, Gallo C, Poletti G, Witonsky DB, Alkorta-Aranburu G, Sukernik RI, Osipova L, Fedorova SA, Vasquez R, Villena M, Moreau C, Barrantes R, Pauls D, Excoffier L, Bedoya G, Rothhammer F, Dugoujon JM, Larrouy G, Klitz W, Labuda D, Kidd J, Kidd K, Di Rienzo A, Freimer NB, Price AL, Ruiz-Linares A. Reconstructing Native American population history. Nature. 2012;488:370–374. doi: 10.1038/nature11258.
  47. Rosenberg NA, Pritchard JK, Weber JL, Cann HM, Kidd KK, Zhivotovsky LA, Feldman MW. Genetic structure of human populations. Science. 2002;298:2381–2385. doi: 10.1126/science.1078311.
  48. Schumacher FR, Al Olama AA, Berndt SI, Benlloch S, Ahmed M, Saunders EJ, Dadaev T, Leongamornlert D, Anokian E, Cieza-Borrella C, Goh C, Brook MN, Sheng X, Fachal L, Dennis J, Tyrer J, Muir K, Lophatananon A, Stevens VL, Gapstur SM, Carter BD, Tangen CM, Goodman PJ, Thompson IM, Batra J, Chambers S, Moya L, Clements J, Horvath L, Tilley W, Risbridger GP, Gronberg H, Aly M, Nordström T, Pharoah P, Pashayan N, Schleutker J, Tammela TLJ, Sipeky C, Auvinen A, Albanes D, Weinstein S, Wolk A, Håkansson N, West CML, Dunning AM, Burnet N, Mucci LA, Giovannucci E, Andriole GL, Cussenot O, Cancel-Tassin G, Koutros S, Beane Freeman LE, Sorensen KD, Orntoft TF, Borre M, Maehle L, Grindedal EM, Neal DE, Donovan JL, Hamdy FC, Martin RM, Travis RC, Key TJ, Hamilton RJ, Fleshner NE, Finelli A, Ingles SA, Stern MC, Rosenstein BS, Kerns SL, Ostrer H, Lu YJ, Zhang HW, Feng N, Mao X, Guo X, Wang G, Sun Z, Giles GG, Southey MC, MacInnis RJ, FitzGerald LM, Kibel AS, Drake BF, Vega A, Gómez-Caamaño A, Szulkin R, Eklund M, Kogevinas M, Llorca J, Castaño-Vinyals G, Penney KL, Stampfer M, Park JY, Sellers TA, Lin HY, Stanford JL, Cybulski C, Wokolorczyk D, Lubinski J, Ostrander EA, Geybels MS, Nordestgaard BG, Nielsen SF, Weischer M, Bisbjerg R, Røder MA, Iversen P, Brenner H, Cuk K, Holleczek B, Maier C, Luedeke M, Schnoeller T, Kim J, Logothetis CJ, John EM, Teixeira MR, Paulo P, Cardoso M, Neuhausen SL, Steele L, Ding YC, De Ruyck K, De Meerleer G, Ost P, Razack A, Lim J, Teo SH, Lin DW, Newcomb LF, Lessel D, Gamulin M, Kulis T, Kaneva R, Usmani N, Singhal S, Slavov C, Mitev V, Parliament M, Claessens F, Joniau S, Van den Broeck T, Larkin S, Townsend PA, Aukim-Hastie C, Gago-Dominguez M, Castelao JE, Martinez ME, Roobol MJ, Jenster G, van Schaik RHN, Menegaux F, Truong T, Koudou YA, Xu J, Khaw KT, Cannon-Albright L, Pandha H, Michael A, Thibodeau SN, McDonnell SK, Schaid DJ, Lindstrom S, Turman C, Ma J, Hunter DJ, Riboli E, Siddiq A, Canzian F, Kolonel LN, Le Marchand L, Hoover RN, Machiela MJ, Cui Z, Kraft P, Amos CI, Conti DV, Easton DF, Wiklund F, Chanock SJ, Henderson BE, Kote-Jarai Z, Haiman CA, Eeles RA, Profile Study. Australian Prostate Cancer BioResource (APCB) IMPACT Study. Canary PASS Investigators. Breast and Prostate Cancer Cohort Consortium (BPC3) PRACTICAL (Prostate Cancer Association Group to Investigate Cancer-Associated Alterations in the Genome) Consortium. Cancer of the Prostate in Sweden (CAPS) Prostate Cancer Genome-wide Association Study of Uncommon Susceptibility Loci (PEGASUS) Genetic Associations and Mechanisms in Oncology (GAME-ON)/Elucidating Loci Involved in Prostate Cancer Susceptibility (ELLIPSE) Consortium. Association analyses of more than 140,000 men identify 63 new prostate cancer susceptibility loci. Nature Genetics. 2018;50:928–936. doi: 10.1038/s41588-018-0142-8.
  49. Sharp SA, Rich SS, Wood AR, Jones SE, Beaumont RN, Harrison JW, Schneider DA, Locke JM, Tyrrell J, Weedon MN, Hagopian WA, Oram RA. Development and standardization of an improved type 1 diabetes genetic risk score for use in newborn screening and incident diagnosis. Diabetes Care. 2019;42:200–207. doi: 10.2337/dc18-1785.
  50. Simons YB, Turchin MC, Pritchard JK, Sella G. The deleterious mutation load is insensitive to recent population history. Nature Genetics. 2014;46:220–224. doi: 10.1038/ng.2896.
  51. Sobota RS, Shriner D, Kodaman N, Goodloe R, Zheng W, Gao YT, Edwards TL, Amos CI, Williams SM. Addressing population-specific multiple testing burdens in genetic association studies. Annals of Human Genetics. 2015;79:136–147. doi: 10.1111/ahg.12095.
  52. Sorlie PD, Avilés-Santa LM, Wassertheil-Smoller S, Kaplan RC, Daviglus ML, Giachello AL, Schneiderman N, Raij L, Talavera G, Allison M, Lavange L, Chambless LE, Heiss G. Design and implementation of the Hispanic Community Health Study/Study of Latinos. Annals of Epidemiology. 2010;20:629–641. doi: 10.1016/j.annepidem.2010.03.015.
  53. Spear ML. Mexican American research project. v1.p1. GitHub Repository. 2020. https://github.com/mlspear09/hchs-sol
  54. Szpiech ZA, Blant A, Pemberton TJ. GARLIC: genomic autozygosity regions likelihood-based inference and classification. Bioinformatics. 2017;33:2059–2062. doi: 10.1093/bioinformatics/btx102.
  55. Terrazas A. Migration Information Source; 2010. https://www.migrationpolicy.org/article/mexican-immigrants-united-states-2
  56. Thornton T, Tang H, Hoffmann TJ, Ochs-Balcom HM, Caan BJ, Risch N. Estimating Kinship in Admixed Populations. The American Journal of Human Genetics. 2012;91:122–138. doi: 10.1016/j.ajhg.2012.05.024.
  57. United States Government, Executive Office of the President, Office of Management and Budget, Office of Information and Regulatory Affairs. Revisions to the standards for the classification of federal data on race and ethnicity. [December 23, 2020];1997 https://www.govinfo.gov/content/pkg/FR-1997-10-30/pdf/97-28653.pdf
  58. Uricchio LH, Zaitlen NA, Ye CJ, Witte JS, Hernandez RD. Selection and explosive growth alter genetic architecture and hamper the detection of causal rare variants. Genome Research. 2016;26:863–873. doi: 10.1101/gr.202440.115.
  59. Verea M. Immigration trends after 20 years of NAFTA. Norteamérica. 2014;9:109–143. doi: 10.20999/nam.2014.b005.
  60. Yang J, Bakshi A, Zhu Z, Hemani G, Vinkhuyzen AA, Lee SH, Robinson MR, Perry JR, Nolte IM, van Vliet-Ostaptchouk JV, Snieder H, Esko T, Milani L, Mägi R, Metspalu A, Hamsten A, Magnusson PK, Pedersen NL, Ingelsson E, Soranzo N, Keller MC, Wray NR, Goddard ME, Visscher PM, LifeLines Cohort Study. Genetic variance estimation with imputed variants finds negligible missing heritability for human height and body mass index. Nature Genetics. 2015;47:1114–1120. doi: 10.1038/ng.3390.
  61. Yengo L, Sidorenko J, Kemper KE, Zheng Z, Wood AR, Weedon MN, Frayling TM, Hirschhorn J, Yang J, Visscher PM, GIANT Consortium. Meta-analysis of genome-wide association studies for height and body mass index in ∼700000 individuals of European ancestry. Human Molecular Genetics. 2018;27:3641–3649. doi: 10.1093/hmg/ddy271.
  62. Zou JY, Halperin E, Burchard E, Sankararaman S. Inferring parental genomic ancestries using pooled semi-Markov processes. Bioinformatics. 2015;31:i190–i196. doi: 10.1093/bioinformatics/btv239.

Decision letter

Editor: Mashaal Sohail
Reviewed by: Mashaal Sohail, Genevieve L Wojcik

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

This article studies genetic and complex trait variation in individuals in the United States with origins in Mexico. The authors find a pattern of increasing Amerindigenous ancestry with birth year, which they investigate using simulations and attribute to the likely combination of several cultural and historical factors. They find Amerindigenous ancestry to be associated with trait variation for several complex traits, and highlight the importance of further work in medical and population genomics across human diversity.

Decision letter after peer review:

Thank you for submitting your article "Recent fluctuations in Mexican American genomes have altered the genetic architecture of biomedical traits" for consideration by eLife. Your article has been reviewed by three peer reviewers, including Mashaal Sohail as the Reviewing Editor and Reviewer #1, and the evaluation has been overseen by Patricia Wittkopp as the Senior Editor. The following individual involved in review of your submission had agreed to reveal their identity: Genevieve L Wojcik.

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

The editors have judged that your manuscript is of interest but, as described below, some conclusions and analyses need to be revised in light of our comments before it is published. We would like to draw your attention to changes in our revision policy that we have made in response to COVID-19 (https://elifesciences.org/articles/57162). First, because many researchers have temporarily lost access to the labs, we will give authors as much time as they need to submit revised manuscripts. We are also offering, if you choose, to post the manuscript to bioRxiv (if it is not already there) along with this decision letter and a formal designation that the manuscript is “in revision at eLife”. Please let us know if you would like to pursue this option. (If your work is more suitable for medRxiv, you will need to post the preprint yourself, as the mechanisms for us to do so are still in development.)

Summary:

This paper examines population structure in a Hispanic/Latino (H/L) cohort, first highlighting genetic diversity and fine-scale population substructure via UMAP. The paper highlights an interesting temporal aspect to population substructure in H/L groups, demonstrating increasing proportions of Amerindigenous ancestry, particularly in Mexican Americans, over time. They also provide other analyses that may help interpret or follow from this observation, involving genetic diversity, assortative mating and runs of homozygosity. They show a correlation between amerindigenous ancestry and complex traits, and show that the behavior of UKB height GWAS polygenic scores in Mexican-Americans depends on the proportion of Amerindigenous ancestry.

Essential revisions:

This paper's major finding is an observation of an increase in Amerindigenous ancestry in Mexican Americans in the 1940-1990 period. The manuscript is interesting and in theory suitable for publication in eLife, but only after a number of points are addressed. First, the authors need to articulate clearly the factors that may have caused this primary observation, and what may be the most likely explanations outlined below. Second, they need to address that the primary explanation for the ROH increase is likely the amerindigenous ancestry increase, and in that sense, determine and clearly articulate the place of the assortative mating observation in the manuscript. Lastly, they need to clearly admit in the manuscript the importance of the un-modelled socio-economic variable in the correlation between amerindigenous ancestry and complex traits, and only present this analysis after controlling for appropriate covariates. It is a time when we are finally seeing some population genetic studies of understudied populations as they relate to complex trait variation (to which this manuscript can be an important contribution), and so the bar to be as rigorous as possible in considering alternative explanations and model all possible covariates should be set extremely high.

1) The main finding is the increase in amerindigenous ancestry in the 1940 – 1990 period in Mexican-Americans in the United States.

a) The authors state: "In our non-US born individuals (N=2987), we evaluated differences in ancestry estimates over time while accounting for years in the US and sampling weight and identified a significant effect of years in the US (𝛽=-0.0009; P=0.0006; SE=0.0003)." If I understand correctly, this regression is amerindigenous ancestry against time and other covariates. It would be helpful if the authors add to this sentence something along the lines of, "implying that individuals who arrived earlier to the US from Mexico had more European ancestry."

b) Given the above analysis, and independent migration analyses (see, for example, https://www.migrationpolicy.org/article/mexican-immigrants-united-states-2), it seems that migration from Mexico to the US shifted over the years from states with less amerindigenous ancestry to states with higher amerindigenous ancestry (Chiapas, Oaxaca, Veracruz) in the South and South-east of the country. This seems a highly likely explanation for the pattern of increasing amerindigenous ancestry that they see, and should be stated as such in the manuscript. This seems especially likely given that the signal comes primarily from non-US born individuals, or US-born individuals with parents born outside the US.

c) The authors state that "It is possible that the increase in global Amerindigenous ancestry over time could be biased by changes in the specific subcontinental Amerindigenous ancestries over time (though such an effect is not visible in our UMAP analysis, Figure 1B)." – It is not clear what is meant by this sentence – please re-phrase and articulate more clearly. If it alludes to the difference in migration sources over time I mention above, I don't think their analyses of Fst and genetic diversity rule that explanation out.

d) Assortative mating (Figure 2D and Supplementary figure 9). This argument is puzzling because if there is assortative mating along indigenous ancestry as they suggest, then would this not mean that there is also assortative mating along the collinear European ancestry? If this is the case, why would amerindigenous ancestry be increasing in particular? The authors do state that assortative mating would not cause an increase in one ancestry. In that case, the paper overall does not provide an explanation for why the amerindigenous ancestry is actually increasing – is the migration sources explanation the most likely explanation? Along with that individuals with higher amerindigenous ancestry must be reproducing more? The likely explanations of the primary result should be made very clear for the reader.

e) The authors state: "and this increase cannot be entirely explained by very recent migration." What is the evidence backing this claim?

f) "sampling weight" is not actually defined in the Materials and methods. Can the authors clarify how this is defined and used to weight the major analysis?

g) Please describe the bootstrap resampling performed – is the bootstrapping performed over individuals or segments of the genome? Please justify the strategy picked. This should be described in the manuscript.

2) The pattern of ROH change over time observed.

a) What is the number of individuals in each decade? Please show this in the manuscript. If there are more individuals in later birth decades (as may be expected), you would see an increase in the ROH summed over each genome with time, simply because you are summing over more individuals at later time periods. It is not clear if the analyses for Figure 2C are normalized by the number of individuals in each decade – if not, this would be important to do, and only the normalized results should be reported.

b) The ROHsum increasing with time could simply be due to the amerindigenous ancestry increasing with time, as amerindigenous ancestry carries more short ROH segments than European ancestry (see for example Ceballos et al. Nature Reviews Genetics 2018). The authors should explicitly describe this, as this simplest explanation would not require assortative mating to be invoked either.

3) Correlation of amerindigenous ancestry and complex traits

Many of the traits studied would also be affected by socioeconomic status (for example, height, cholesterol). Do the authors have this variable available? If yes, it should be included in the multiple regression. If not, it should be clearly mentioned that they are not able to account for this likely important effect, leaving their estimates confounded by socio-economic differences that likely correlate with amerindigenous ancestry. For Figure 3, we don't think it is fair to show tau between only amerindigenous ancestry and traits as this analysis does not account for important covariates, and would like to see Supplementary file 3 instead to replace Figure 3 (in a figure form as the authors prefer) such that only the multiple regression effect sizes are reported in the manuscript that account for covariates.

Why do a first pass of this analysis without covariates included, and then re-run with covariates in the Bonferroni significant subset of traits only? (Given that there will be confounding between Amerindigenous ancestry and socioeconomic, environmental and other non-genetic factors, and especially age). Furthermore, looking in the Materials and methods section, we cannot seem to find the full description of how this analysis was performed i.e. what models were run, and how/if phenotypic measures used were cleaned and normalized etc. Please provide this.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Thank you for the revised submission of your manuscript, now called "Recent shifts in the genomic ancestry of Mexican Americans may alter the genetic architecture of biomedical traits." for consideration by eLife.

The Reviewing Editor has drafted this decision to help you prepare a revised submission.

We would like to draw your attention to changes in our revision policy that we have made in response to COVID-19 (https://elifesciences.org/articles/57162). Specifically, when editors judge that a submitted work as a whole belongs in eLife but that some conclusions require a modest amount of additional new data, as they do with your paper, we are asking that the manuscript be revised to either limit claims to those supported by data in hand, or to explicitly state that the relevant conclusions require additional supporting data.

Our expectation is that the authors will eventually carry out the additional experiments and report on how they affect the relevant conclusions either in a preprint on bioRxiv or medRxiv, or if appropriate, as a Research Advance in eLife, either of which would be linked to the original paper.

Summary:

The manuscript is significantly improved, and the addition of the new simulation analyses greatly helps in the interpretation of the trends that they see. The authors have sufficiently addressed concerns about two of their main results: (1) Interpretation of ancestry change over time and (2) Interpretation of ROH change over time. I still have the following points I would ask them to consider regarding their third main result (3) the potential effects of ancestry and ancestry change over time on the genetic architecture of complex traits, and to make appropriate additions/revisions to their analyses and text before submitting a revised manuscript. Beyond that, I see that the new simulation results don't appear in the manuscript until the Discussion. I would consider them a result and would like to see the authors try to integrate the main simulation results in the Results section, when they report their empirical observed patterns.

Revisions for this paper:

The authors state in the manuscript that, "As illustrated in Figure 4A, 20 of these traits (29%) are significantly correlated after Bonferroni correction (P<0.000145), highlighting the need for increased investigation into the role of AI genetic ancestry and other unmodelled socio-economic variables in admixed populations such as Mexican Americans." As written, it understates the correlation between AI ancestry and unmodelled socio-economic variables in the United States which we know exists on the basis of historical and social science research. Given this, I would like to see the text revised to say that it is not clear what the correlation with AI ancestry implies, and while it could be reflecting genetic effects, it can also be reflecting socio-economic variables that are correlated with AI ancestry and that AI ancestry could be serving as a proxy for. The authors themselves show that AI ancestry is correlated with educational attainment levels which they state is a proxy for socio-economic status. I would like to see their model for testing the effect of ancestry on traits include as covariates: (1) educational attainment as a proxy for socioeconomic status, (2) whether they are US born or not, and whether their parents are US born or not, to help capture the effects of different environments they would have been born in, and that their parents would have created for them on various levels. I would also encourage the authors to make their model as rich as possible by adding other environmental variables they could obtain on their recruitment sites (altitude, latitude, longitude, population density, average obesity rates to name some that are likely relevant). 
Ideally, this kind of analysis would be done in a mixed model framework as well, correcting for the full genomic relationship matrix, and adding a random effect to account for unaccounted for environmental factors, but they should at least add covariates to their multiple regression framework that they have access to, or could access. They should also consider issues of collinearity as they may affect their estimates and study and report the Variance Inflation Factors (VIF) of the different variables. They should report in the manuscript, the results for the full model, giving coefficients and p-values for not only AI ancestry but also the other test variables, and should describe these in the results as well, and report and discuss the contribution of other test variables relative to AI ancestry as estimated from their model.
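As a rough illustration of the collinearity check requested here, the sketch below computes a Variance Inflation Factor for the simplest case of two predictors, where VIF reduces to 1 / (1 - r^2) with r the Pearson correlation between them. The variable names and toy values (`ai_ancestry`, `education_years`) are hypothetical and are not taken from HCHS/SOL; a full analysis with more covariates would instead regress each predictor on all of the others.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF for either predictor in a model with exactly two predictors."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)

# Hypothetical toy values, for illustration only (not study data).
ai_ancestry = [0.20, 0.35, 0.50, 0.65, 0.80]
education_years = [16, 14, 12, 12, 9]

vif = vif_two_predictors(ai_ancestry, education_years)
print(f"VIF = {vif:.2f}")
```

A VIF near 1 indicates little shared variance between the two predictors; larger values mean the coefficient estimates become less stable.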

Further, the authors' results show three observations that I would like to see described in the Results and discussed in the Discussion, as the implications are important. The authors observe an increase in height in Mexican Americans (at roughly the same rate in all amerindigenous ancestry stratifications, see note below) with birth year. First, they do not see any trend of the polygenic height score with birth year. This suggests that while the genetic predisposition of the trait remains the same, the trait has changed significantly due to non-genetic environmental factors. Second, even though amerindigenous ancestry is negatively correlated with height, and amerindigenous ancestry is increasing over time, the trait value/height increases rather than decreases over time. This also points to the effects of non-genetic factors playing an important role in values of the trait.

Lastly, if amerindigenous ancestry is negatively correlated with height due to genetic reasons, then shouldn't we expect to see the polygenic height score decrease with birth year, as amerindigenous ancestry increases with birth year? How do the authors interpret this meta pattern across their analyses, and what implications does it have for how temporal change in ancestry can alter the genetic architecture of traits? Overall, this could mean that (1) the correlation of amerindigenous ancestry with height is at least partly due to genetic reasons. Given that, while ancestry trends (AI ancestry increasing), combined with correlations of ancestry with traits, may make you predict one thing with respect to the genetic architecture of traits (height will decrease with birth year), the way heredity interacts with the environment through the randomness of development makes the trait move in the opposite direction than the model would predict. Or (2) the correlation of amerindigenous ancestry with height is fully picking a signal of height being lower in individuals with higher amerindigenous ancestry due to primarily environmental reasons, and therefore, as environment changes, the correlation is not meaningful for prediction. I would like to see the authors consider the above, and state these patterns as they stand across analyses, and discuss their implications for their overall thesis (as it relates to height and traits in general).

Note: They say in the study, "We find a similar trend in the HCHS/SOL Mexican Americans (Figure 4B). Indeed, when we stratified individuals by quartiles of global AI ancestry, we see that all quartiles have increased in height by a similar amount over the period investigated." Can the authors make this statement more specific in the manuscript – what are the rates of change in the different quartiles? Are they higher in quartiles with higher indigenous ancestry? Please integrate this into the Discussion above as well.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Recent shifts in the genomic ancestry of Mexican Americans may alter the genetic architecture of biomedical traits" for further consideration by eLife. Your revised article has been evaluated by Patricia Wittkopp (Senior Editor) and a Reviewing Editor.

The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below:

To be able to appreciate the effects of different factors affecting complex trait variation, can the authors add the effect sizes of the important covariates to Figure 4A? Alternatively, I would like to see these as supplemental figures. This is to put the AI ancestry effect in context, and see its magnitude relative to the effect of other non-genetic factors that have been modelled. While the authors have reported these in Supplementary file 3, figures will help more with being able to compare and parse the results. Further, I'd like to see a few sentences added to the Results and Discussion to summarize and discuss these results.

The authors have the following sentence in their Discussion "While height increases across all groups at a similar rate, illustrating the effects of non-genetic factors having an important role in the values of the trait, we do see differences based on percentage of AI ancestry." Given their new estimates of the rates of change in different groups the first part of this sentence needs to be revised.

eLife. 2020 Dec 29;9:e56029. doi: 10.7554/eLife.56029.sa2

Author response


Essential revisions:

This paper's major finding is an observation of an increase in Amerindigenous ancestry in Mexican Americans in the 1940-1990 period. The manuscript is interesting and in theory suitable for publication in eLife, but only after a number of points are addressed. First, the authors need to articulate clearly the factors that may have caused this primary observation, and what may be the most likely explanations outlined below.

We developed a simulation framework to investigate the evolutionary forces that can/cannot contribute to changes in ancestry proportions over such a short period of time. More details are included below.

Second, they need to address that the primary explanation for the ROH increase is likely the amerindigenous ancestry increase, and in that sense, determine and clearly articulate the place of the assortative mating observation in the manuscript.

We have redone the ROH analysis to control for global Amerindigenous ancestry as suggested and found that the pattern remains: ROH increases over time at a rate that exceeds the increase in Amerindigenous ancestry. In contrast, we find the opposite pattern when we perform the analogous analysis with European ancestry.

Last, they need to clearly admit in the manuscript the importance of the un-modelled socio-economic variable in the correlation between amerindigenous ancestry and complex traits, and only present this analysis after controlling for appropriate covariates.

We have removed the previous correlative analyses, and replaced them with a more thorough statistical analysis of AI ancestry with biomedical traits while controlling for several covariates. We now further discuss unmodelled socio-economic variables in the Results as well as Discussion sections.

It is a time when we are finally seeing some population genetic studies of understudied populations as they relate to complex trait variation (to which this manuscript can be an important contribution), and so the bar to be as rigorous as possible in considering alternative explanations and model all possible covariates should be set extremely high.

We completely agree that extreme care must be taken when studying marginalized populations, lest more harm result.

1) The main finding is the increase in amerindigenous ancestry in the 1940 – 1990 period in Mexican-Americans in the United States.

a) The authors state: "In our non-US born individuals (N=2987), we evaluated differences in ancestry estimates over time while accounting for years in the US and sampling weight and identified a significant effect of years in the US (𝛽=-0.0009; P=0.0006; SE=0.0003)." If I understand correctly, this regression is amerindigenous ancestry against time and other covariates. It would be helpful if the authors add to this sentence something along the lines of, "implying that individuals who arrived earlier to the US from Mexico had more European ancestry."

The reviewers understand correctly, and we have changed the sentence to reflect this: “In our non-US born individuals (N=2987), we evaluated differences in ancestry estimates over time while accounting for years in the US and sampling weight and identified a significant effect of years in the US (𝛽=-0.0009; P=0.0006; SE=0.0003) suggesting that individuals who arrived earlier to the US had less AI ancestry.”

b) Given the above analysis, and independent migration analyses (see, for example, https://www.migrationpolicy.org/article/mexican-immigrants-united-states-2), it seems that migration from Mexico to the US shifted over the years from states with less amerindigenous ancestry to states with higher amerindigenous ancestry (Chiapas, Oaxaca, Veracruz) in the South and South-east of the country. This seems a highly likely explanation for the pattern of increasing amerindigenous ancestry that they see, and should be stated as such in the manuscript. This seems especially likely given that the signal comes primarily from non-US born individuals, or US-born individuals with parents born outside the US.

We have added to the conclusion “Independent analyses have shown that migration from Mexico to the US has shifted over the years from states with less Amerindigenous ancestry to states with higher Amerindigenous ancestry” and have cited the suggested reference.

c) The authors state that "It is possible that the increase in global Amerindigenous ancestry over time could be biased by changes in the specific subcontinental Amerindigenous ancestries over time (though such an effect is not visible in our UMAP analysis, Figure 1B)." – It is not clear what is meant by this sentence – please re-phrase and articulate more clearly. If it alludes to the difference in migration sources over time I mention above, I don't think their analyses of Fst and genetic diversity rule that explanation out.

This sentence has been simplified to “We next explored whether the increase in global Amerindigenous ancestry over time could be biased by local changes in the specific subcontinental Amerindigenous ancestries over time.”

d) Assortative mating (Figure 2D and Supplementary figure 9). This argument is puzzling because if there is assortative mating along indigenous ancestry as they suggest, then would this not mean that there is also assortative mating along the collinear European ancestry? If this is the case, why would amerindigenous ancestry be increasing in particular? The authors do state that assortative mating would not cause an increase in one ancestry. In that case, the paper overall does not provide an explanation for why the amerindigenous ancestry is actually increasing – is the migration sources explanation the most likely explanation? Along with that individuals with higher amerindigenous ancestry must be reproducing more? The likely explanations of the primary result should be made very clear for the reader.

Within Appendix 1, we have added new simulations that demonstrate how different factors such as population growth, migration, fecundity and assortative mating can shape differences in global ancestry patterns. According to the simulations, migration can have a large effect on shaping patterns of Amerindigenous ancestry, but other factors can shape these patterns as well. In particular, while assortative mating does not lead to differences in Amerindigenous ancestry over time on its own, assortative mating can amplify other factors (such as ancestry-related differences in fecundity). Given these simulations and the data we have analyzed, we argue that there is no single cause for the increased Amerindigenous ancestry over time. Rather, this increase is a result of all factors: migration, ancestry-related fecundity differences, ancestry-biased assortative mating, and population growth.

e) The authors state: "and this increase cannot be entirely explained by very recent migration." What is the evidence backing this claim?

This is based on our analyses discussed in the “Dynamic Global Ancestry Proportions in Mexican Americans” section. Specifically, starting with “In our non-US born individuals (N=2987), we evaluated differences in ancestry estimates over time while accounting for years in the US and sampling weight and identified a significant effect of years in the US (𝛽=-0.0009; P=0.0006; SE=0.0003) suggesting that individuals who arrived earlier to the US had less Amerindigenous ancestry. However, this did not change the effect of birth year on the proportion of global Amerindigenous ancestry (𝛽 = 0.0028; P<2e-16, SE=0.0003).” To clarify the conclusion, we have rephrased the sentence: “and this increase cannot be entirely explained by very recent migration based on our analyses of non-US born individuals.”

f) "Sampling weight" is not actually defined in the Materials and methods. Can the authors clarify how this is defined and used to weight the major analysis?

We have added the definition of the sampling weight to the “Study dataset and initial quality control” section within the Materials and methods and have cited the paper. Specifically, “The sample survey design for HCHS/SOL has been described previously. Briefly, census block groups were selected in defined communities near each of the four recruitment centers, and households were sampled within census block groups. Households with Hispanic/Latino surnames and individuals as well as residents over 45 years old were oversampled in order to increase representation of the Hispanic/Latino target population and achieve a uniform age distribution. Sampling weights were calculated for each individual to reflect the probability of sampling.”

g) Please describe the bootstrap resampling performed – is the bootstrapping performed over individuals or segments of the genome? Please justify the strategy picked. This should be described in the manuscript.

We performed bootstrap resampling over individuals, and this has been further described within the manuscript. Specifically, we now say “Bootstrap analyses (Figure 2B and Figure 2—figure supplement 3) were performed by calculating relevant statistics based on repeated resampling of individuals with replacement. Bootstrap resampling results in an estimate of the variance of the statistics that we are calculating in our data, and allows us to assess the impact of outliers (who are only resampled in a subset of iterations).”
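The resampling scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the statistic (an ordinary least-squares slope of ancestry on birth year), the number of replicates, and the synthetic data are all chosen here for demonstration only.

```python
import random
import statistics

def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

def bootstrap_slopes(birth_year, ancestry, n_boot=1000, seed=0):
    """Resample individuals with replacement and recompute the slope each time."""
    rng = random.Random(seed)
    n = len(birth_year)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # one bootstrap sample of individuals
        xs = [birth_year[i] for i in idx]
        ys = [ancestry[i] for i in idx]
        if len(set(xs)) < 2:  # slope undefined if every resampled x is identical
            continue
        slopes.append(ols_slope(xs, ys))
    return slopes

# Synthetic data for illustration only (not HCHS/SOL values).
data_rng = random.Random(1)
years = [data_rng.randint(1940, 1990) for _ in range(200)]
ancestry = [0.40 + 0.003 * (y - 1940) + data_rng.gauss(0, 0.05) for y in years]

slopes = bootstrap_slopes(years, ancestry)
cuts = statistics.quantiles(slopes, n=40)  # 39 cut points at 2.5% steps
ci_low, ci_high = cuts[0], cuts[-1]        # approximate 95% percentile interval
print(f"mean slope {statistics.fmean(slopes):.4f}, 95% CI ({ci_low:.4f}, {ci_high:.4f})")
```

Because each individual appears in only some replicates, the spread of `slopes` also indicates how sensitive the estimate is to any single outlying individual.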

2) The pattern of ROH change over time observed.

a) What is the number of individuals in each decade? Please show this in the manuscript. If there are more individuals in later birth decades (as may be expected), you would see an increase in the ROH summed over each genome with time, simply because you are summing over more individuals at later time periods. It is not clear if the analyses for Figure 2C are normalized by the number of individuals in each decade – if not, this would be important to do, and only the normalized results should be reported.

The number of individuals in each birth decade is shown in Supplementary file 2. We have clarified within the figure captions that the ROH sums are ROH sums per person, but to address the comment below as well, we show the normalized data in Figure 3C.

b) The ROHsum increasing with time could simply be due to the amerindigenous ancestry increasing with time, as amerindigenous ancestry carries more short ROH segments than European ancestry (see for example Ceballos et al. Nature Reviews Genetics 2018). The authors should explicitly describe this, as this simplest explanation would not require assortative mating to be invoked either.

We thank the reviewers for drawing our attention to this point that we missed in our original analysis. We have redone this analysis by normalizing the ROH sums per person by their global Amerindigenous ancestry. We redid the analyses with the normalized data and this is reflected now within the “Increased runs of homozygosity over time” Results section including Figure 3C and Figure 3—figure supplement 3. Our prior conclusion remains. ROH increases at a rate faster than the increase in Amerindigenous ancestry.

3) Correlation of amerindigenous ancestry and complex traits

Many of the traits studied would also be affected by socioeconomic status (for example, height, cholesterol). Do the authors have this variable available? If yes, it should be included in the multiple regression. If not, it should be clearly mentioned that they are not able to account for this likely important effect, leaving their estimates confounded by socio-economic differences that likely correlate with amerindigenous ancestry. For Figure 3, we don't think it is fair to show tau between only amerindigenous ancestry and traits as this analysis does not account for important covariates, and would like to see supplementary file 3 instead to replace Figure 3 (in a figure form as the authors prefer) such that only the multiple regression effect sizes are reported in the manuscript that account for covariates.

We agree with the reviewers: our correlation analysis was too simplistic. As suggested, we have replaced this analysis with our multiple regression model that accounts for birth year, center, sex, and sampling weight (and included all regression statistics in Supplementary file 3). While some of the particular traits that are significant after Bonferroni correction changed slightly (we are now controlling for 5*69=345 tests instead of just 69), the overall conclusion remains: nearly 1/3 of the traits are correlated with genomic Amerindigenous ancestry. While we agree that socio-economic factors can have a direct impact on biomedical traits (and can also be correlated with Amerindigenous ancestry), HCHS/SOL did not collect this variable so we cannot include it in our analysis.
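The test count above can be connected to the Bonferroni threshold of P<0.000145 quoted elsewhere in this correspondence with a short calculation. The family-wise level of 0.05 is an assumption inferred from those two numbers, and reading the factor of 5 as model terms tested per trait is likewise an assumption; neither is stated explicitly in the text.

```python
n_traits = 69          # traits analysed, as stated in the response
terms_per_trait = 5    # factor quoted in the response; its meaning is assumed here
alpha = 0.05           # assumed family-wise error rate (not stated in the text)

n_tests = n_traits * terms_per_trait
bonferroni_threshold = alpha / n_tests

print(n_tests)                         # 345
print(f"{bonferroni_threshold:.6f}")   # 0.000145
```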

Why do a first pass of this analysis without covariates included, and then re-run with covariates in the Bonferroni significant subset of traits only? (Given that there will be confounding between Amerindigenous ancestry and socioeconomic, environmental and other non-genetic factors, and especially age). Furthermore, looking in the Materials and methods section, we cannot seem to find the full description of how this analysis was performed i.e. what models were run, and how/if phenotypic measures used were cleaned and normalized etc. Please provide this.

The reviewers are entirely correct: the first-pass correlative analysis was unwarranted. As discussed above, we replaced this analysis with the multiple regression model that accounts for birth year (i.e. age), center, sex, and sampling weight. To compare the effect of AI ancestry across traits, we quantile normalized all traits, and include a justification for our use of quantile normalization in the main text. We also compared the p-values for Amerindigenous ancestry effects across traits when the data were untransformed vs quantile normalized, and found a strong correlation (rho=0.944; p<2.2e-16) with no statistical evidence for a difference in the distributions of p-values (Mann-Whitney U test p-value=0.857).
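To illustrate what quantile normalization of a trait can look like, the sketch below applies a rank-based inverse normal transform, which maps each value to the standard normal quantile of its rank. This is one common form of quantile normalization and is assumed here for illustration; the response does not specify the exact transform used, and the toy trait values are hypothetical.

```python
from statistics import NormalDist

def rank_inverse_normal(values):
    """Map values to standard normal quantiles of their (tie-averaged) ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j across a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank within the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    # The (rank - 0.5) / n offset keeps probabilities strictly inside (0, 1).
    return [std_normal.inv_cdf((r - 0.5) / n) for r in ranks]

# Hypothetical trait values with one extreme outlier.
trait = [3.1, 150.0, 2.0, 2.5, 2.7]
z = rank_inverse_normal(trait)
print([round(v, 3) for v in z])
```

After the transform every trait is on the same scale and the outlier no longer dominates, which is what makes effect sizes comparable across traits.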

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Summary:

The manuscript is significantly improved, and the addition of the new simulation analyses greatly helps in the interpretation of the trends that they see. The authors have sufficiently addressed concerns about two of their main results: (1) Interpretation of ancestry change over time and (2) Interpretation of ROH change over time. I still have the following points I would ask them to consider regarding their third main result (3) the potential effects of ancestry and ancestry change over time on the genetic architecture of complex traits, and to make appropriate additions/revisions to their analyses and text before submitting a revised manuscript. Beyond that, I see that the new simulation results don't appear in the manuscript until the Discussion. I would consider them a result and would like to see the authors try to integrate the main simulation results in the Results section, when they report their empirical observed patterns.

We appreciate the overall positive sentiment of our revision, and the focus on what additional steps would further improve our manuscript. We have now moved the simulations from Appendix 1 to the main Results section. They are now discussed after the “Strong ancestry related assortative mating in HCHS/SOL Mexicans” section and before the “Genetic association of global AI ancestry with biomedical traits” section. We hope our revisions on the potential effects of ancestry and ancestry change over time on the genetic architecture of complex traits section sufficiently address the requested revisions, as we describe further below.

Revisions for this paper:

The authors state in the manuscript that, "As illustrated in Figure 4A, 20 of these traits (29%) are significantly correlated after Bonferroni correction (P<0.000145), highlighting the need for increased investigation into the role of AI genetic ancestry and other unmodelled socio-economic variables in admixed populations such as Mexican Americans." As written, it understates the correlation between AI ancestry and unmodelled socio-economic variables in the United States which we know exists on the basis of historical and social science research. Given this, I would like to see the text revised to say that it is not clear what the correlation with AI ancestry implies, and while it could be reflecting genetic effects, it can also be reflecting socio-economic variables that are correlated with AI ancestry and that AI ancestry could be serving as a proxy for. The authors themselves show that AI ancestry is correlated with educational attainment levels which they state is a proxy for socio-economic status. I would like to see their model for testing the effect of ancestry on traits include as covariates: (1) educational attainment as a proxy for socioeconomic status, (2) whether they are US born or not, and whether their parents are US born or not, to help capture the effects of different environments they would have been born in, and that their parents would have created for them on various levels. I would also encourage the authors to make their model as rich as possible by adding other environmental variables they could obtain on their recruitment sites (altitude, latitude, longitude, population density, average obesity rates to name some that are likely relevant). 
Ideally, this kind of analysis would be done in a mixed model framework as well, correcting for the full genomic relationship matrix, and adding a random effect to account for unaccounted for environmental factors, but they should at least add covariates to their multiple regression framework that they have access to, or could access. They should also consider issues of collinearity as they may affect their estimates and study and report the Variance Inflation Factors (VIF) of the different variables. They should report in the manuscript, the results for the full model, giving coefficients and p-values for not only AI ancestry but also the other test variables, and should describe these in the results as well, and report and discuss the contribution of other test variables relative to AI ancestry as estimated from their model.
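
The collinearity check requested above can be sketched as follows. The Variance Inflation Factor for predictor j is 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all the other predictors. This is a pure-Python illustration under stated assumptions: the helper names, the variable names, and the toy values are hypothetical and are not taken from the HCHS/SOL data or the authors' code.

```python
def ols_r_squared(X, y):
    """R^2 from an ordinary least-squares fit of y on the rows of X.

    An intercept column is added. The normal equations are solved by
    Gauss-Jordan elimination with partial pivoting.
    """
    n = len(y)
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented matrix [A^T A | A^T y].
    M = [
        [sum(rows[k][i] * rows[k][j] for k in range(n)) for j in range(p)]
        + [sum(rows[k][i] * y[k] for k in range(n))]
        for i in range(p)
    ]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(p):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    beta = [M[i][p] / M[i][i] for i in range(p)]
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def variance_inflation_factors(columns):
    """VIF_j = 1 / (1 - R_j^2) for each named predictor column."""
    names = list(columns)
    n = len(columns[names[0]])
    result = {}
    for name in names:
        others = [[columns[o][k] for o in names if o != name] for k in range(n)]
        result[name] = 1.0 / (1.0 - ols_r_squared(others, columns[name]))
    return result


# Hypothetical toy predictors (not study data): ancestry and education are
# constructed to be strongly correlated; birth year is only weakly related.
predictors = {
    "ai_ancestry": [0.30, 0.35, 0.42, 0.47, 0.50, 0.55, 0.60, 0.66],
    "education_years": [16, 14, 15, 12, 13, 10, 11, 9],
    "birth_year_offset": [10, 45, 22, 1, 50, 33, 18, 41],  # years since 1940
}
vifs = variance_inflation_factors(predictors)
```

In this toy setup the two deliberately correlated predictors receive large VIFs while the weakly related one stays near 1, which is the pattern a collinearity report is meant to expose.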

We appreciate the reviewers’ focus on improving our statistical model. As suggested, we added covariates for educational attainment, US born status, and number of US born parents in our multiple regression as these were the variables that we had access to. These results have been updated in Figure 4A, Supplementary file 3 and Figure 4—figure supplement 3. Supplementary file 3 specifically includes the coefficients and p-values for all test variables for each trait. Here we have included a figure (Author response image 1) to illustrate the differences in the effect size of AI ancestry before and after the additional adjustments for educational attainment, US born status and number of US born parents, which were made in addition to the original adjustments for birth year, gender, center, and sampling weight. Even after accounting for these additional variables, the effect sizes of AI ancestry were largely unchanged (Pearson correlation coefficient = 0.984, P<2.2E-16). There was one trait (% Immature granulocytes) that changed from a negative association with AI ancestry to a positive association with AI ancestry, but the AI effect on this trait was not statistically significant before or after the addition of additional covariates.

Author response image 1.

We modified the above reviewer-quoted statement to “As illustrated in Figure 4A, 18 of these traits (26%) are significantly associated with AI ancestry (Bonferroni correction P<9.1E-5) after adjusting for several factors including birth year, educational attainment, US-born, and number of US-born parents. While this suggests that genetic ancestry has an effect on several traits, other unmodeled socio-economic variables that are correlated with AI ancestry may also be contributing to these patterns. Regardless, these findings highlight the need for increased investigation into the role of AI genetic ancestry in admixed populations such as Mexican Americans.”

We have included Supplementary file 3, which includes results for all phenotypes (raw and quantile normalized) with the effect size, SE, and P-value for each covariate.

Further, the authors' results show three observations that I would like to see described in the Results and discussed in the Discussion, as the implications are important. The authors observe an increase in height in Mexican Americans (at roughly the same rate in all Amerindigenous ancestry stratifications, see note below) with birth year. First, they do not see any trend of the polygenic height score with birth year. This suggests that while the genetic predisposition of the trait remains the same, the trait has changed significantly due to non-genetic environmental factors.

We have elaborated on this further below.

Second, even though amerindigenous ancestry is negatively correlated with height, and amerindigenous ancestry is increasing over time, the trait value/height increases rather than decreases over time. This also points to the effects of non-genetic factors playing an important role in values of the trait.

This is true. Both genetic and non-genetic factors drive variation in height. Height is estimated to have a broad-sense heritability of 80% in Northern European populations, suggesting that environmental factors explain ~20% of the variation in height in these populations. It is unclear how these estimates of heritability translate to Mexican Americans. Within the Discussion, we have elaborated further on this. We have added, “While height increases across all groups at a similar rate, illustrating the effects of nongenetic factors having an important role in the values of the trait, we do see differences based on percentage of AI ancestry. Individuals with lower percentages of AI ancestry were taller on average than individuals with higher AI ancestry, pointing to the role of AI ancestry on the trait.”

Last, if amerindigenous ancestry is negatively correlated with height due to genetic reasons, then shouldn't we expect to see the polygenic height score decrease with birth year, as amerindigenous ancestry increases with birth year? How do the authors interpret this meta pattern across their analyses, and what implications does it have for how temporal change in ancestry can alter the genetic architecture of traits? Overall, this could mean that (1) the correlation of amerindigenous ancestry with height is at least partly due to genetic reasons. Given that, while ancestry trends (AI ancestry increasing), combined with correlations of ancestry with traits, may make you predict one thing with respect to the genetic architecture of traits (height will decrease with birth year), the way heredity interacts with the environment through the randomness of development makes the trait move in the opposite direction than the model would predict. Or (2) the correlation of amerindigenous ancestry with height is fully picking a signal of height being lower in individuals with higher amerindigenous ancestry due to primarily environmental reasons, and therefore, as environment changes, the correlation is not meaningful for prediction. I would like to see the authors consider the above, and state these patterns as they stand across analyses, and discuss their implications for their overall thesis (as it relates to height and traits in general).

This is a very complex issue, and we have attempted to be conservative in the way we describe these patterns. For example, Figure 4—figure supplement 2 shows that there is indeed a slight negative overall trend for polygenic height score and birth year when we accounted for additional environmental variables including educational attainment, US born status and number of US born parents. The slope is not significant (P=0.14), so the approach we take is to not draw conclusions upon it. As a further point of clarification, PHS is only correlated with observed height in the bottom two quartiles of AI ancestry (i.e., only the Mexican Americans with highest European ancestry). As AI ancestry increases over time, we expect the performance of PHS to decrease. Such a decrease in accuracy could also manifest as an elimination of signal with birth-year.

It is possible we did not see a significant trend because of the way the polygenic height score is calculated as a metric. As we know, the majority of GWASs have been performed in populations with primarily European ancestry, and these studies have provided most of our current insight into the genetic architecture of height. However, due to the exclusion of diverse populations, we are still limited in our full understanding of the genetics of height.

A recent study of Peruvians (1) demonstrated the role of population-specific variants and their contributions to differences in height. As we do not fully understand the genetics of height in Amerindigenous or admixed populations, it is possible there may be ancestry-specific variants within the AI component in Mexicans. These variants may have a significant effect on the trait, but they would not have been captured in the PHS because the GWAS results were derived from an analysis of European individuals.

Individuals with higher AI ancestry may harbor variants that may contribute more to differences in height that would not have been detected in a European GWAS. Even for Mexican individuals with higher European ancestry, they still may have variants within the AI component that could have a significant impact on the trait. However, these analyses exceed the limitations of the data available and are therefore outside the scope of this manuscript.
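
The point about uncaptured variants can be illustrated with a minimal sketch of a standard additive polygenic score, in which each variant contributes its GWAS effect size multiplied by the number of effect alleles carried. This section does not describe how the PHS was computed, so the additive form is an assumption, and the function name, variant identifiers, and effect sizes are hypothetical.

```python
def polygenic_score(dosages, effect_sizes):
    """Additive score: sum of effect size x effect-allele dosage.

    Variants absent from the GWAS summary statistics contribute nothing,
    regardless of their true effect on the trait.
    """
    return sum(
        effect_sizes[variant] * dosage
        for variant, dosage in dosages.items()
        if variant in effect_sizes
    )


# Hypothetical summary statistics from a European-ancestry GWAS.
gwas_effects = {"rs_a": 0.30, "rs_b": -0.20}

# Hypothetical individual who also carries an ancestry-specific variant
# ("rs_private") that the GWAS never tested.
person = {"rs_a": 2, "rs_b": 1, "rs_private": 2}
score = polygenic_score(person, gwas_effects)

# Changing the dosage of the untested variant leaves the score unchanged.
person_without = {"rs_a": 2, "rs_b": 1, "rs_private": 0}
score_without = polygenic_score(person_without, gwas_effects)
```

Under this form, any height-associated variant that is specific to the AI component and missing from the European summary statistics is invisible to the score, which is consistent with the reduced predictive performance described above.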

Note: They say in the study, "We find a similar trend in the HCHS/SOL Mexican Americans (Figure 4B). Indeed, when we stratified individuals by quartiles of global AI ancestry, we see that all quartiles have increased in height by a similar amount over the period investigated." Can the authors make this statement more specific in the manuscript – what are the rates of change in the different quartiles? Are they higher in quartiles with higher indigenous ancestry? Please integrate this into the Discussion above as well.

We have elaborated this further in the manuscript by specifically adding to the results, “The rates of change in height between AI quartiles were all positive and significant (P<5E-6). The largest was for the quartile with the highest AI ancestry, but the rates did not change monotonically with respect to AI ancestry across quartiles. The estimates for the quartiles with their 95% CIs are: 𝛽=0.135 (CI:0.097-0.173) for AI>0.58; 𝛽=0.124 (CI:0.089-0.160) for 0.46<=AI<=0.58; 𝛽=0.083 (CI:0.047-0.119) for 0.37<=AI<=0.46; and 𝛽=0.113 (CI:0.074-0.151) for AI<0.37.”
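
As a rough reading aid for the estimates quoted above, the sketch below back-calculates approximate standard errors from the reported 95% CIs and checks whether the intervals overlap pairwise. It assumes symmetric normal-approximation intervals (z = 1.96). Overlap is only an informal indication that the rates are similar and is not a formal test of slope differences. The helper names are hypothetical.

```python
# Slope of height on birth year and 95% CI per AI ancestry quartile,
# copied from the estimates quoted in the response above.
quartiles = {
    "AI>0.58": (0.135, 0.097, 0.173),
    "0.46<=AI<=0.58": (0.124, 0.089, 0.160),
    "0.37<=AI<=0.46": (0.083, 0.047, 0.119),
    "AI<0.37": (0.113, 0.074, 0.151),
}


def approx_se(lower, upper, z=1.96):
    """Approximate standard error implied by a symmetric 95% CI."""
    return (upper - lower) / (2 * z)


def intervals_overlap(a, b):
    """True if the CIs of two (estimate, lower, upper) tuples overlap."""
    return a[1] <= b[2] and b[1] <= a[2]


labels = list(quartiles)
all_overlap = all(
    intervals_overlap(quartiles[a], quartiles[b])
    for i, a in enumerate(labels)
    for b in labels[i + 1:]
)
standard_errors = {
    label: approx_se(lower, upper)
    for label, (_, lower, upper) in quartiles.items()
}
```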

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below:

To be able to appreciate the effects of different factors affecting complex trait variation, can the authors add the effect sizes of the important covariates to Figure 4A. Or, I would like to see these as supplemental figures. This is to put the AI ancestry effect in context, and see its magnitude relative to the effect of other non-genetic factors that have been modelled. While the authors have reported these in Supplementary file 3, figures will help more with being able to compare and parse the results. Further, I'd like to see a few sentences added to the Results and Discussion to summarize and discuss these results.

We agree with the suggestion of a new figure and how the effects of all of the different factors can be better appreciated in addition to Supplementary file 3. Instead of adding to the main Figure 4A, we added a supplementary figure (now Figure 4—figure supplement 1) due to the high number of total points (traits x covariates). Within the Results section we have included, “While this suggests that genetic ancestry has an effect on several traits, other unmodeled socio-economic variables that are correlated with AI ancestry may also be contributing to these patterns (though AI ancestry has among the strongest effects on a range of biomedical traits, comparable to the effects of gender; Figure 4—figure supplement 1).”

For context the paragraph now reads as, “As illustrated in Figure 4A, 18 of these traits (26%) are significantly associated with AI ancestry (Bonferroni correction P<6.6E-5) after adjusting for several factors including birth year, center, gender, sampling weight, educational attainment, US-born status, and number of US-born parents. While this suggests that genetic ancestry has an effect on several traits, other unmodeled socio-economic variables that are correlated with AI ancestry may also be contributing to these patterns (though AI ancestry has among the strongest effects on a range of biomedical traits, comparable to the effects of gender; Figure 4—figure supplement 1). Regardless, these findings highlight the need for increased investigation into the role of AI genetic ancestry in admixed populations such as Mexican Americans.”

Within the Discussion, we have rephrased one of the sentences, “We identify several biomedical traits that are associated with Amerindigenous ancestry, and show that in the case of height, there are both ancestry and temporal effects” to “We identify several biomedical traits that are associated with Amerindigenous ancestry, with effects comparable to the high effects of gender, and show that in the case of height, there are both ancestry and temporal effects.”

We kept the new minor additions simple as we believe the sections in the Discussion that we had previously written about the importance of studying diverse populations are solid.

The authors have the following sentence in their Discussion "While height increases across all groups at a similar rate, illustrating the effects of non-genetic factors having an important role in the values of the trait, we do see differences based on percentage of AI ancestry." Given their new estimates of the rates of change in different groups the first part of this sentence needs to be revised.

We have reworded the sentence to “While we do see differences in mean height based on percentage of AI ancestry, height increases over time in all groups at similar rates.” We hope this clarifies that the “differences based on percentage of AI ancestry” were referring to the mean heights for each group rather than the rates.

References

1) Asgari S, Luo Y, Akbari A, Belbin GM, Li X, Harris DN, et al. A positively selected FBN1 missense variant reduces height in Peruvian individuals. Nature. 2020;582(7811):234-9.

Associated Data

    Data Citations

    1. Conomos MP, Laurie CA, Stilp AM, Gogarten SM, McHugh CP, Nelson SC. 2016. Genetic Diversity and Association Studies in US Hispanic/Latino Populations: Applications in the Hispanic Community Health Study/Study of Latinos. phs000810.v1.p1
    2. Fisher GG, Ryan LH. 2018. Overview of the Health and Retirement Study and Introduction to the Special Issue. A11-E91-13B

    Supplementary Materials

    Supplementary file 1. Association of global ancestries and birth year for all HCHS/SOL individuals.

    For each population, we tested for an association between global ancestry and birth year while accounting for the sampling design. AI, AFR, and EUR refer to Amerindigenous, African, and European ancestry respectively. The significance threshold was set at 0.003 using Bonferroni correction for multiple testing (0.05/18).

    elife-56029-supp1.xlsx (9.6KB, xlsx)
    Supplementary file 2. Frequency table of 3622 HCHS/SOL Mexican Americans stratified by recruitment region, US-born vs non-US-born status, gender and educational attainment.

    Recruitment was performed at four regions: Bronx, Chicago, Miami and San Diego. Education attainment was categorized as either less than a high school diploma or equivalent degree (<HS), equal to a high school diploma or equivalent degree (=HS), or post-secondary education (>HS).

    elife-56029-supp2.xlsx (9.2KB, xlsx)
    Supplementary file 3. Association of quantitative traits and Amerindigenous ancestry in HCHS/SOL Mexican Americans.

    Each trait as a function of AI ancestry adjusted by birth year, center, gender, sampling weight, educational attainment, US-born status, and number of US-born parents. Results are shown for both the raw data and quantile normalized data.

    elife-56029-supp3.xlsx (105.3KB, xlsx)
    Supplementary file 4. Height over time.

    Height (cm) as a function of birth year adjusting by center, gender, sampling weight, educational attainment, US-born status, and number of US-born parents for 3604 Mexican Americans stratified by the quartiles of global Amerindigenous ancestry (AI).

    elife-56029-supp4.xlsx (9.3KB, xlsx)
    Supplementary file 5. Predicted height vs. observed height.

    Predicted height (cm) as a function of observed height (cm) adjusting by center, gender, sampling weight, educational attainment, US-born status, and number of US-born parents for 3604 Mexican Americans stratified by Amerindigenous ancestry (AI).

    elife-56029-supp5.xlsx (9.4KB, xlsx)
    Transparent reporting form

    Data Availability Statement

    All data used in this manuscript were downloaded from publicly available sources (dbGaP). No new data were created.

    The following previously published datasets were used:

    Conomos MP, Laurie CA, Stilp AM, Gogarten SM, McHugh CP, Nelson SC. 2016. Genetic Diversity and Association Studies in US Hispanic/Latino Populations: Applications in the Hispanic Community Health Study/Study of Latinos. phs000810.v1.p1

    Fisher GG, Ryan LH. 2018. Overview of the Health and Retirement Study and Introduction to the Special Issue. A11-E91-13B


    Articles from eLife are provided here courtesy of eLife Sciences Publications, Ltd
