American Journal of Human Genetics. 2023 Sep 18;110(10):1804–1816. doi: 10.1016/j.ajhg.2023.08.015

Demographic modeling of admixed Latin American populations from whole genomes

Santiago G Medina-Muñoz 1, Diego Ortega-Del Vecchyo 2, Luis Pablo Cruz-Hervert 3, Leticia Ferreyra-Reyes 3, Lourdes García-García 3, Andrés Moreno-Estrada 1,, Aaron P Ragsdale 1,4,∗∗
PMCID: PMC10577084  PMID: 37725976

Summary

Demographic models of Latin American populations often fail to fully capture their complex evolutionary history, which has been shaped by both recent admixture and deeper-in-time demographic events. To address this gap, we used high-coverage whole-genome data from Indigenous American ancestries in present-day Mexico and existing genomes from across Latin America to infer multiple demographic models that capture the impact of different timescales on genetic diversity. Our approach, which combines analyses of allele frequencies and ancestry tract length distributions, represents a significant improvement over current models in predicting patterns of genetic variation in admixed Latin American populations. We jointly modeled the contribution of European, African, East Asian, and Indigenous American ancestries into present-day Latin American populations. We infer that the ancestors of Indigenous Americans and East Asians diverged ∼30 thousand years ago, and we characterize genetic contributions of recent migrations from East and Southeast Asia to Peru and Mexico. Our inferred demographic histories are consistent across different genomic regions and annotations, suggesting that our inferences are robust to the potential effects of linked selection. In conjunction with published distributions of fitness effects for new nonsynonymous mutations in humans, we show in large-scale simulations that our models recover important features of both neutral and deleterious variation. By providing a more realistic framework for understanding the evolutionary history of Latin American populations, our models can help address the historical under-representation of admixed groups in genomics research and can be a valuable resource for future studies of populations with complex admixture and demographic histories.

Keywords: demographic inference, Latin America, site frequency spectrum, local ancestry, admixture, population genetics


Drawing from whole-genome data, we infer detailed demographic models for Latin American populations. Our models connect ancient demographic events with recent admixtures, highlighting the often-overlooked Asian migrations in Latin America. These models provide greatly improved accuracy for genomic studies of admixed populations, especially those based on simulations.

Introduction

Genetic diversity among human populations has been shaped by recurrent periods of migration and admixture.1 Relatively recent admixture across Latin America has resulted in a rich and complex history, shaped by the mixing of ancestries from multiple regions, including Indigenous American, European, and African populations. Each of these populations has its own unique demographic history, both prior to and following their arrival to the continent.2,3,4 As a result, the genetic composition of Latin American populations depends on both the shared broad-scale history of global human expansion and distinct regional admixture events, leading to patterns of genetic diversity that can vary markedly among Latin American populations and between individuals. However, there is currently a lack of historical models that accurately capture the demographic history and genetic composition of Latin American populations.

Accurate demographic models are essential for investigating a range of evolutionary, epidemiological, and ecological questions, as well as for simulating realistic genetic data.5,6 But despite the importance of understanding the demographic dynamics of Latin American populations, current models often fail to fully capture the heterogeneity of their genetic diversity. One limitation is that such models do not incorporate the diverse ancestries that have contributed to the genetic makeup of Latin American populations. For example, previous studies have used East Asian populations as a proxy for Indigenous American ancestries,7,8 despite the genetic divergence between present-day Indigenous American and East Asian individuals due to many thousands of years of isolation and serial founder effects during the peopling of the Americas.9,10,11,12

Current models are also limited by simplistic admixture histories, which do not accurately reflect the complexity of admixture in Latin American populations.8,13 Admixture histories in the Americas vary across geographical regions,4,14 and Latin American populations have experienced admixture events that involved additional populations beyond those typically considered in demographic models. For example, admixture events in Mexico have also involved individuals who descended from East and Southeast Asian populations.15 This highlights the need for more nuanced models to fully capture the diverse ancestral contributions to the genetic makeup of present-day populations.

The demographic histories of Latin American populations comprise both recent and deeper-in-time processes that shape patterns of genetic diversity.16,17,18 Inferring demographic models for these cohorts is challenging, as it requires considering multiple historical timescales. At the most recent timescale, admixture between source populations with different present-day genetic ancestries has resulted in a mosaic of genetic ancestry in Latin America. In turn, variation within contributed ancestries has been shaped by the demographic histories of each source population. This deeper-in-time history therefore also contributes to genetic diversity among Latin Americans.19 To fully capture the intricate histories of these populations, it is therefore essential to consider both of these timescales jointly in demographic inference.

Here, we infer more comprehensive demographic models for multiple cohorts in Latin America that better capture the complex and multifaceted history of these populations. To do this, we employed high-coverage whole-genome sequences from worldwide populations and from across Latin America, including Colombia, Mexico, Peru, and Puerto Rico (Figure 1). We also included 50 recently sequenced individuals of Indigenous ancestries from Mexico, which provide increased resolution to distinguish East Asian and Indigenous American ancestries.3,20 We integrated demographic inferences at different timescales using the distributions of allele frequencies within and between populations and the numbers and lengths of ancestry tracts within individuals. Using extensive genetic simulations, we show that the resulting models accurately reflect multiple features of genetic variation observed in Latin American cohorts.

Figure 1.

Populations used for demographic reconstruction

(A) Map showing the approximate sampling locations and the population names and codes. Numbers in parentheses denote the number of sampled individuals. Latin American cohorts are marked with a cross.

(B) We used ADMIXTURE (with K = 4) to visualize ancestry proportions of source populations and those from Latin America. The plot reveals substantial variation in ancestry proportions both across populations and among individuals within populations.

Our inferred demographic models can be coupled with models of selection or quantitative traits to explore the joint effects of demography and selection in shaping genomic variation. We demonstrate this application by recovering patterns of variation among missense and nonsense mutations in protein-coding genes. This approach can be extended to model the genetic basis of disease susceptibility and other traits, and it should serve as a valuable resource for researchers studying evolutionary and medical genomics in Latin American cohorts and other recently admixed populations.

Subjects and Methods

Whole-genome data and quality control

We used two high-coverage whole-genome datasets to investigate the demographic dynamics of Latin American populations. The first dataset was the 1000 Genomes Project (1KGP),21 which provided 30× high-coverage genomic data from the New York Genome Center (NYGC), mapped to the GRCh38 reference genome. We selected reference samples for Africa (AFR), Europe (EUR), and East Asia (EAS) from this dataset; specifically, we used 108 Yorubans from Ibadan, Nigeria, 107 Iberians from Spain, and 103 Han Chinese from Beijing, respectively. In addition to these reference samples, we also used data from admixed cohorts within the 1KGP, including 60 Mexicans from Los Angeles (MXL), 104 Puerto Ricans from Puerto Rico (PUR), 94 Colombians from Medellin (CLM), and 85 Peruvians from Lima (PEL), to infer admixture history. The second dataset was from the MX Biobank (MXB) project3,20 and included 50 high-coverage whole genomes from Indigenous American (AME) descendants in present-day Mexico.

To combine the datasets from the 1KGP and MXB projects, we used CrossMap22 to lift over the MXB genomes from the GRCh37 reference genome to the GRCh38 reference genome. Next, we applied two rounds of filters to remove variants that were likely affected by genotyping error or batch effects. The first filter, applied to each source population (AFR, EUR, EAS, and AME) separately, removed variants that deviated from Hardy-Weinberg equilibrium at a p value threshold of 10⁻⁴. The second filter, with a stricter Hardy-Weinberg p value threshold of 0.05, was applied to remove potential batch effects. We applied this second filter to the 50 MXB genomes combined with a set of 29 individuals from the 1KGP (2 MXL and 27 PEL) with the highest proportion of Indigenous American ancestries. In addition to these filters, we also removed variants that were located in masked regions of the genome. After applying these filters, we obtained a combined dataset that was used for all downstream analyses in the study.
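The Hardy-Weinberg filter described above can be illustrated with a small sketch. The text does not say which test statistic was used, so the chi-square test with one degree of freedom below is an assumption made for illustration (exact tests are common in practice), and the function names are hypothetical.

```python
import math


def hwe_chi2_pvalue(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-square (1 d.f.) p value for Hardy-Weinberg proportions at one biallelic site.

    Illustrative stand-in only: the test actually used by the authors is not stated.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic site: nothing to test
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 d.f.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_hwe(genotype_counts, threshold):
    """Keep a variant if its HWE p value is at or above the threshold."""
    return hwe_chi2_pvalue(*genotype_counts) >= threshold


# First-round filter (per source population) and second-round filter (batch effects).
print(passes_hwe((25, 50, 25), 1e-4))  # genotype counts in exact HWE proportions
print(passes_hwe((50, 0, 50), 0.05))   # no heterozygotes at all
```

Raising the threshold from 10⁻⁴ to 0.05 removes more variants, which is why the second round is the stricter of the two.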

Admixture analysis

We used ADMIXTURE (v.1.3)23 to estimate individual ancestry proportions based on genome-wide single-nucleotide polymorphism (SNP) data. We excluded SNPs with a minor allele frequency below 0.05 and SNPs with a Hardy-Weinberg equilibrium test p value below 0.001. In addition, we pruned SNPs in linkage disequilibrium with each other using a threshold of r² = 0.1. We ran ADMIXTURE with default parameters, with the number of ancestral populations (K) ranging from 2 to 6. We evaluated the cross-validation error across K values and found that K = 4 yielded the lowest error.

Demographic inference from allele frequencies

We excluded variants located in CpG sites, which have highly variable mutation rates that could bias demographic inference.24,25 From the remaining variants, we calculated the folded joint site frequency spectrum (jSFS) for intronic, intergenic, and synonymous variants. The jSFS is the distribution of allele frequencies across multiple populations and can be used to infer demographic parameters such as effective population sizes, divergence times, and migration rates.7
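As a minimal illustration of what "folded" means, the sketch below builds a folded spectrum for a single population from per-site alternate-allele counts. The function name and input layout are hypothetical. The joint spectrum used in the study applies the same idea across several populations at once, which is not shown here.

```python
def folded_sfs(alt_counts, n):
    """Folded site frequency spectrum for one population.

    alt_counts: alternate-allele count at each site (0..n)
    n: number of sampled haploid copies
    Entry k of the result is the number of sites whose minor allele is seen k times.
    """
    sfs = [0] * (n // 2 + 1)
    for k in alt_counts:
        if not 0 <= k <= n:
            raise ValueError("allele count outside 0..n")
        sfs[min(k, n - k)] += 1  # fold: count the rarer of the two alleles
    return sfs


# Five sites sampled in four haploid copies.
print(folded_sfs([1, 3, 2, 4, 0], 4))
```

Folding removes the need to know which allele is ancestral, at the cost of merging frequency classes k and n − k.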

We used data from the Yoruba in Ibadan, Nigeria (YRI, as a proxy for African ancestries, AFR), the Iberian population in Spain (IBS, as a proxy for European ancestries, EUR), the Han Chinese in Beijing, China (CHB, as a proxy for East Asian ancestries, EAS), and the Indigenous American population in Mexico (as a proxy for American ancestries, AME) (Figure 1). These or related populations contributed to the genetic makeup of present-day Latin Americans,19 and by using the jSFS, we aimed to identify major demographic events that have influenced their genetic diversity. We classified the variants according to their functional effect with VEP (v.102).26 We projected the SFS to 40 haploid copies per population, resulting in a jSFS with four dimensions, one for each of the four populations (AFR, EUR, EAS, and AME). Projecting the jSFS to a smaller sample size retains information from the full dataset but reduces the effective sample size so that inference is computationally tractable.
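Projection to a smaller sample size is usually done by hypergeometric subsampling: a site with i copies of an allele among n sampled copies contributes to every count j that a random subsample of m copies could show. The sketch below shows this for a single-population, unfolded spectrum. It is an illustration under that simplification, not the routine used in the study, and the function name is hypothetical.

```python
from math import comb


def project_sfs(sfs, m):
    """Project an unfolded one-population SFS from n = len(sfs) - 1 copies down to m copies.

    Each site with i derived copies is spread over counts j with hypergeometric
    weights C(i, j) * C(n - i, m - j) / C(n, m).
    """
    n = len(sfs) - 1
    if not 0 < m <= n:
        raise ValueError("target sample size must satisfy 0 < m <= n")
    projected = [0.0] * (m + 1)
    denom = comb(n, m)
    for i, count in enumerate(sfs):
        if count == 0:
            continue
        for j in range(max(0, m - (n - i)), min(i, m) + 1):
            projected[j] += count * comb(i, j) * comb(n - i, m - j) / denom
    return projected


# Project a spectrum observed in 4 copies down to 2 copies.
small = project_sfs([0, 10, 4, 2, 0], 2)
print(small, sum(small))
```

The total number of sites is preserved, but some sites that were variable in the full sample land in the monomorphic bins of the projected spectrum.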

To infer demographic parameters, we used moments as our inference engine because it can handle inference for more than three populations using a diffusion approximation.27 We adopted a two-step approach. In the first step, we inferred a three-population out-of-Africa model (AFR, EUR, and EAS),7,27 which included 14 parameters such as effective population sizes, split times, and migration rates. In the second step, we included the Indigenous American population as branching out from the ancestors of the East Asian population and inferred three additional parameters: the AME split time from EAS, the bottleneck size, and the population expansion rate (Figure 2A). We performed this same inference for each class of putatively neutral jSFSs (intronic, intergenic, and synonymous). To account for uncertainty in the parameters, we divided the genome into non-overlapping windows of 10 Mb and calculated the jSFS and scaled mutation rates for each window. We generated bootstrap replicates from these windows (n = 288) by sampling with replacement and used a Godambe Information Matrix (GIM) approach28 to estimate parameter uncertainties. Reported confidence intervals from the GIM account for uncertainty due to sampling variance and a finite genome. Uncertainty in estimated mutation rates or average generation time is not accounted for, so true uncertainty is likely larger than that reported using the GIM. However, we expect model and parameter uncertainty to be larger than that of mutation rates and average generation times.
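The window-based bootstrap can be sketched as follows: each replicate resamples the 10 Mb windows with replacement and sums their spectra and scaled mutation rates. The data layout (a flat list of counts plus one mutation-rate value per window) and the function name are assumptions for illustration, and the GIM step that consumes the replicates is not shown.

```python
import random


def bootstrap_windows(windows, n_replicates, seed=0):
    """Resample genomic windows with replacement.

    windows: list of (sfs_counts, mu_L) pairs, one per window, where sfs_counts is
             a flat list of spectrum entries and mu_L the window's scaled mutation rate
    Returns a list of (summed_sfs, summed_mu_L) pairs, one per bootstrap replicate.
    """
    rng = random.Random(seed)
    n_windows = len(windows)
    n_bins = len(windows[0][0])
    replicates = []
    for _ in range(n_replicates):
        total_sfs = [0] * n_bins
        total_mu = 0.0
        for _ in range(n_windows):
            sfs, mu = windows[rng.randrange(n_windows)]
            for b, count in enumerate(sfs):
                total_sfs[b] += count
            total_mu += mu
        replicates.append((total_sfs, total_mu))
    return replicates


# Toy input: three windows with made-up counts and mutation rates.
toy = [([5, 2, 1], 0.10), ([7, 3, 0], 0.12), ([4, 4, 2], 0.09)]
for sfs, mu in bootstrap_windows(toy, n_replicates=2, seed=1):
    print(sfs, round(mu, 3))
```

Resampling whole windows rather than single sites keeps linked sites together, so the replicates reflect the correlation among nearby variants.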

Figure 2.

Inference of continental population history from allele frequencies

(A) Scheme of the inferred model. Widths are proportional to the effective population size, and fonts in italics show the name of the inferred parameters. Abbreviations: anatomically modern humans (AMH) and out-of-Africa (OOA).

(B) Comparison between the folded intronic SFS for the data (blue) and the model’s prediction (orange). Background lines (gray) show the model’s prediction in the rest of the populations.

(C) Comparison of estimated parameters with different sets of SNPs (synonymous, intergenic, and intronic). The error bar denotes the bootstrap confidence intervals (±2 standard deviations). The 95% confidence intervals overlap between annotations for the vast majority of inferred parameters.

Estimation of mutation rates

To convert from genetic to physical units (i.e., event times in generations and effective rather than relative population sizes), we required estimates of total mutation rates in retained genomic regions. To estimate this scaled mutation rate, μL, for non-coding (intronic and intergenic) and coding (missense, synonymous, and loss-of-function) regions of the genome, we first subset the genomic coordinates for these regions. For intergenic and intronic regions, we counted the occurrence of each triplet context (excluding CpG contexts) and used the corresponding total mutation rates for each possible mutation for each triplet context, obtained from the gnomAD mutation model,24 to weight the counts. We then summed the weighted counts for all triplets to estimate the scaled mutation rate for non-coding regions. For coding regions, we generated a file with every possible mutation (as bi-allelic SNPs) in the coding regions of the genome and used the VEP (v.102) tool to predict the consequence of each variant.26 For each consequence (synonymous, missense, or nonsense), we counted the occurrence of each triplet and used the same approach as for non-coding regions to estimate the scaled mutation rate.
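The triplet-weighting step for non-coding regions can be sketched as a count-and-sum over sequence contexts. The rate table in the example is made up for illustration and is not the gnomAD mutation model. The function names are hypothetical, and each rate is assumed to be the total per-generation rate of the middle base summed over its three possible substitutions.

```python
from collections import Counter


def is_cpg(triplet):
    """True if the middle base of the triplet sits in a CpG dinucleotide on either strand."""
    return triplet[1:3] == "CG" or triplet[0:2] == "CG"


def scaled_mutation_rate(sequence, context_rates):
    """Sum per-site mutation rates over a sequence, weighting triplet counts by context rate.

    context_rates: dict mapping a triplet to the total mutation rate of its middle base
                   (hypothetical values in the example; missing contexts count as zero)
    """
    seq = sequence.upper()
    counts = Counter()
    for i in range(1, len(seq) - 1):
        triplet = seq[i - 1:i + 2]
        if set(triplet) <= set("ACGT") and not is_cpg(triplet):
            counts[triplet] += 1
    return sum(n * context_rates.get(triplet, 0.0) for triplet, n in counts.items())


# Toy sequence: the ACG and CGT contexts are skipped because they are CpG contexts.
rates = {"AAC": 1.0e-8, "GTT": 2.0e-8}
print(scaled_mutation_rate("AACGTT", rates))
```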

Accounting for background selection

Within each of the non-overlapping 10 Mb genomic windows described above, we used estimates of the strength of background selection proposed by McVicker et al.29 to partition data. Within each 10 Mb interval, we computed the weighted mean of McVicker’s B value over that interval. We then stratified windows into quartiles based on the estimated B values and repeated the inference process described above within each quartile.
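A minimal sketch of this partitioning is shown below. The text does not state which weights enter the weighted mean, so weighting each B-value segment by its length in base pairs is an assumption, as are the function names and the rank-based assignment to quartiles.

```python
def weighted_mean_b(segments):
    """Length-weighted mean B value for one window.

    segments: iterable of (length_bp, b_value) pairs covering the window
    (weighting by segment length is an assumption made for this sketch).
    """
    segments = list(segments)
    total = sum(length for length, _ in segments)
    return sum(length * b for length, b in segments) / total


def assign_quartiles(window_b):
    """Quartile index per window by rank: 0 = lowest B (strongest BGS), 3 = highest B."""
    n = len(window_b)
    order = sorted(range(n), key=lambda i: window_b[i])
    quartile = [0] * n
    for rank, i in enumerate(order):
        quartile[i] = rank * 4 // n
    return quartile


print(weighted_mean_b([(100, 0.5), (300, 0.9)]))
print(assign_quartiles([0.9, 0.1, 0.5, 0.7, 0.3, 0.8, 0.2, 0.6]))
```

Lower B values indicate a stronger expected reduction in diversity from background selection, so quartile 0 in this sketch holds the most affected windows.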

Local ancestry inference

We used Gnomix30 to infer the local ancestry of genomic segments within the Latin American genomes. We first constructed a reference panel consisting of individuals with known African, European, and Indigenous American ancestries. We selected the Yoruba population (YRI = 108) to represent West African ancestries, the Iberian population in Spain (IBS = 107) and the British in England and Scotland (GBR = 91) for European ancestries, and a combination of MXB Indigenous American samples (n = 50) with 30 1KGP individuals with high Indigenous American ancestry proportions from Peru (PEL = 28) and Mexico (MXL = 2) for Indigenous American ancestries. In addition, we generated a second reference panel that also included East Asian ancestry, using the Han Chinese population (CHB = 103) as a reference. We then trained two Gnomix models using these two reference panels, using the default settings, which are optimal for whole-genome data, and setting the phase parameter to True in order to re-phase the genomes using the predicted local ancestry. Finally, we used these models to predict the local ancestry in each of the Latin American genomes.

Inference of admixture history

To infer the admixture history, we used the ancestry tract length distribution obtained from the Gnomix output, which we collapsed into haploid ancestry tracts. We analyzed the admixture dynamics independently in each Latin American cohort (MXL, PEL, CLM, and PUR). We evaluated five different admixture models with tracts31: ppx_xxp, ppx_xxp_pxx, ccx_xxp, ppx_ccx_xxp, and ppc. The position of each letter in these models corresponds to the order of the source ancestries (AME, EUR, and AFR), with an underscore separating distinct migration events. A “p” represents an ancestry pulse (e.g., xxp is a pulse from AFR ancestry), “c” indicates continuous migration, and “x” indicates no input. We selected these models for analysis because they cover a range of plausible admixture history scenarios and have been previously tested in similar cohorts.32,33,34,35
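The naming scheme for these models can be made concrete with a small decoder that turns a model string into a list of migration events. This is only a reading aid for the notation described above. The function name and output format are hypothetical and are not part of tracts.

```python
SOURCES = ("AME", "EUR", "AFR")  # letter positions, in the order given in the text
KINDS = {"p": "pulse", "c": "continuous"}


def decode_model(name, sources=SOURCES):
    """Decode a model name such as 'ppx_ccx_xxp' into (event_index, source, kind) tuples.

    Blocks separated by underscores are distinct migration events, listed in the
    order they appear in the name; 'x' means no input from that source.
    """
    events = []
    for step, block in enumerate(name.split("_")):
        if len(block) != len(sources):
            raise ValueError(f"block {block!r} does not match {len(sources)} sources")
        for source, letter in zip(sources, block):
            if letter == "x":
                continue
            if letter not in KINDS:
                raise ValueError(f"unknown letter {letter!r}")
            events.append((step, source, KINDS[letter]))
    return events


for event in decode_model("ppx_ccx_xxp"):
    print(event)

# Four-source variant, with EAS as the fourth letter position.
print(decode_model("ppxx_ccxx_xxpp", sources=SOURCES + ("EAS",)))
```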

To account for the uncertainty in estimated parameters, we generated 100 bootstrap replicates of the ancestry tract distribution by randomly sampling individuals with replacement. For each replicate, we calculated the Bayesian Information Criterion (BIC) to determine the best-fit model. The selected model for each population was based on the BIC as well as agreement with historical evidence when more than one model provided comparable fits to the data. Local ancestry inference identified 0.5% East Asian (EAS) ancestry in the MXL and 1% in the PEL cohorts, but only 0.2% in the PUR and CLM cohorts. We therefore explored four-source models for both the MXL and PEL. For the MXL population, we used our best-fit model from the three-population inference (the ppx_ccx_xxp model) and added a pulse of East Asian ancestry occurring simultaneously with the African pulse.15 We optimized this model by exploring the parameter space to find the best-fit parameters (modeled as ppxx_ccxx_xxpp, where the fourth population index accounts for EAS-associated migration). For the PEL population, we adapted our best-fit three-population model (ppx_ccx_xxp) to allow for a more recent EAS-associated pulse (modeled as ppxx_ccxx_xxpx_xxxp).
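Model comparison by BIC can be sketched with the standard definition, BIC = k ln(n) − 2 ln(L), where lower values are preferred. The log-likelihoods, parameter counts, and number of observations in the example are invented for illustration. How the number of observations is defined for tract length data is not stated in the text and is left as an input here.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: k * ln(n) - 2 * ln(L). Lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def best_model(fits, n_obs):
    """Pick the model name with the lowest BIC.

    fits: dict mapping model name -> (log_likelihood, n_params)
    """
    return min(fits, key=lambda name: bic(fits[name][0], fits[name][1], n_obs))


# Hypothetical fits for one bootstrap replicate (numbers are not from the study).
fits = {"ppx_xxp": (-1000.0, 4), "ppx_ccx_xxp": (-990.0, 6)}
for name, (loglik, k) in fits.items():
    print(name, round(bic(loglik, k, n_obs=100), 2))
print(best_model(fits, n_obs=100))
```

The k ln(n) term penalizes extra parameters, so a richer model such as one with continuous migration is only selected when its likelihood gain outweighs that penalty.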

Combining tracts and moments inferences

We integrated our demographic inferences from allele frequencies and tracts into a unified model. To do so, we used demes36 to combine the parameters. To convert times from generations to years, we assumed a mean generation time of 29 years.37 This enabled us to express the parameters from tracts and moments (e.g., the timing of divergence and admixture events) in years in the past rather than in generations.
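The unit conversion is a simple multiplication, sketched below. The reference year of 2010 used to map "generations ago" onto calendar dates is an assumption: it is chosen only because it reproduces the dates quoted in the Figure 3 caption (14 generations ago corresponding to c. 1604), and the function names are hypothetical.

```python
GENERATION_TIME_YEARS = 29   # mean generation time used in the text
REFERENCE_YEAR = 2010        # assumption: approximate "present" for the sampled cohorts


def generations_to_years(generations, generation_time=GENERATION_TIME_YEARS):
    """Convert a time in generations to years."""
    return generations * generation_time


def calendar_year(generations_ago, reference_year=REFERENCE_YEAR,
                  generation_time=GENERATION_TIME_YEARS):
    """Approximate calendar year of an event that happened `generations_ago`."""
    return reference_year - generations_to_years(generations_ago, generation_time)


print(generations_to_years(14))   # years before the reference year
print(calendar_year(14))          # calendar year under the assumed reference year
print(int(calendar_year(15.9)))
```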

Neutral simulations

To ensure the accuracy of our demographic model, we conducted coalescent simulations of neutral genetic sequence data. We used msprime38,39 to implement the classical coalescent with recombination model (Hudson’s algorithm) for the majority of the simulation, and for the most recent 20 generations we employed the discrete-time Wright-Fisher model,40 which better accounts for large sample sizes and genome lengths and provides more accurate patterns of identity-by-descent and ancestry tract variation in admixed populations. To further verify the correctness of our model, we used multiple simulation engines to compare to observed data, including msprime, fwdpy11, and moments. We used tskit for downstream analysis of the simulated genomes.

PCA analysis

We performed a principal component analysis (PCA) on three datasets using the scikit-allel software.41 The three datasets were (1) observed genotypes, (2) simulated genotypes generated by our model, and (3) simulated genotypes generated by the Browning et al. model.8 For the PCA, we excluded all singleton variants. We also performed linkage disequilibrium (LD) pruning by removing variants in high LD (r² > 0.1). The PCA was conducted separately for each dataset.

Computing linkage disequilibrium

We computed linkage disequilibrium (LD) to compare LD decay between data and model predictions. LD was computed as $\sigma_d^2 = E[D^2] / E[p(1-p)q(1-q)]$, where $D$ is the standard covariance measure of LD, and $p$ and $q$ are allele frequencies at the two loci.42 $\sigma_d^2$ is closely related to the squared correlation $r^2$ but has the advantage that model predictions can be directly computed.43,44 We binned pairs of intronic and intergenic variants by their recombination distance, using the HapMap genetic map lifted over to GRCh38 coordinates,45,46 with 18 bins roughly logarithmically spaced between 0 and 1 cM. For each population (YRI, CHB, IBS, AME, and MXL), we computed three LD decay curves: (1) from the real data using software that returns unbiased LD estimates from unphased data,47 (2) neutral expectations computed directly from the model parameterization,44 and (3) from individual-based simulations including background selection.48 These approaches for computing $\sigma_d^2$ (instead of $r^2$) allow for unbiased comparisons when sample sizes differ between populations and avoid any biases due to phasing errors.
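To make the LD statistic concrete, the sketch below computes the ratio of the average squared covariance D² to the average of p(1 − p)q(1 − q) over a set of locus pairs, using phased 0/1 haplotypes. This is a naive plug-in estimate for illustration only. It is not the unbiased estimator for unphased genotypes that the text describes, and the function names are hypothetical.

```python
def ld_terms(hap_a, hap_b):
    """Return (D^2, p(1-p)q(1-q)) for two loci given 0/1 alleles on the same haplotypes."""
    n = len(hap_a)
    p = sum(hap_a) / n                                       # allele frequency at locus A
    q = sum(hap_b) / n                                       # allele frequency at locus B
    p_ab = sum(a & b for a, b in zip(hap_a, hap_b)) / n      # frequency of the 1-1 haplotype
    d = p_ab - p * q                                         # covariance measure of LD
    return d * d, p * (1 - p) * q * (1 - q)


def sigma_d2(pairs):
    """Ratio of averages over locus pairs (for example, all pairs in one distance bin)."""
    numerator = denominator = 0.0
    for hap_a, hap_b in pairs:
        d2, pi2 = ld_terms(hap_a, hap_b)
        numerator += d2
        denominator += pi2
    return numerator / denominator


perfect = ([0, 0, 1, 1], [0, 0, 1, 1])       # alleles always co-occur
independent = ([0, 0, 1, 1], [0, 1, 0, 1])   # no association
print(sigma_d2([perfect]), sigma_d2([independent]), sigma_d2([perfect, independent]))
```

Because the statistic is a ratio of averages rather than an average of ratios, each bin of recombination distance yields a single value pooled over all of its locus pairs.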

Forward-in-time simulations

We performed forward simulations using the software fwdpy11 (v.0.18.3).48,49 Our simulations incorporated our inferred demographic model. To ensure the realism of our simulations, we divided the human genome (autosomes only) into windows of 1 Mb and randomly selected 350 of these regions for simulation. We then analyzed the functional annotation of each of these regions, including intergenic, intronic, and coding sequences, and calculated the corresponding mutation rates. To accurately capture the effects of recombination on evolutionary dynamics, we incorporated a human recombination map for GRCh38 obtained from https://github.com/odelaneau/shapeit4/tree/master/maps into our simulations.45,46 We incorporated distributions of fitness effects (DFEs) for missense and nonsense mutations, in which nonsense mutations were inferred to be more strongly selected against.50 To do this, we employed a multiplicative fitness model that allowed us to examine the influence of selective pressures on the genetic diversity of our simulated populations (Figure S8). In comparing between simulations and data, we aggregated data across the 1 Mb regions of the genome that were simulated. The simulation was run for 152,471 generations, corresponding to the recent history of the five modeled populations. Finally, we sampled individuals from each of the five populations at the present time, with sample sizes matching those of our data.
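Under a multiplicative fitness model, an individual's fitness is the product of per-site contributions. The sketch below uses one common convention as an assumption: a heterozygous site contributes 1 + hs and a homozygous site contributes 1 + s, with negative s for deleterious mutations. The scaling and dominance settings used in the study's fwdpy11 simulations are not given in the text, and the function name is hypothetical.

```python
def multiplicative_fitness(sites, dominance=0.5):
    """Fitness of one diploid individual under multiplicative effects across sites.

    sites: iterable of (s, copies), where s is the selection coefficient
           (negative if deleterious) and copies is 0, 1, or 2 mutant alleles
    dominance: h, the fraction of s expressed in heterozygotes (assumed value)
    """
    fitness = 1.0
    for s, copies in sites:
        if copies == 1:
            fitness *= 1.0 + dominance * s
        elif copies == 2:
            fitness *= 1.0 + s
        elif copies != 0:
            raise ValueError("copies must be 0, 1, or 2")
    return fitness


# One homozygous strongly deleterious site and one heterozygous mildly deleterious site.
print(multiplicative_fitness([(-0.1, 2), (-0.02, 1)]))
```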

Results

Inferring the deep demographic history through allele frequencies

We used the joint site-frequency spectrum (jSFS) to infer a demographic model for the broad-scale human expansions that resulted in populations representing sources of ancestry of recent admixture in Latin America. Our inferred four-population demographic model builds upon previously inferred models for the out-of-Africa expansion involving AFR, EUR, and EAS populations.7,17,27 To include the Indigenous American population, we considered models in which their ancestors branched from those of EAS, introducing three additional parameters: the AME split time from EAS, the bottleneck size, and the population expansion rate (Figure 2A). We used moments27 to fit our parameterized models to the observed jSFS for three putative neutral mutation classes—intergenic, intronic, and synonymous variants—which were fit separately.

Our inferred models for deep population history were consistent with the observed SFS for all three mutation classes (Figures 2B, 2C, and S1; Table S1). The estimated 95% confidence intervals for most parameters overlapped across the three sets of SNPs (Figure 2C; Table S1). This consistency across annotations suggests that our inferred models are robust to differences in selective effects (either direct or linked) between mutation classes. We also evaluated alternative models of population size changes in the AME branch, but these models did not provide a better fit to the data (Figure S2). We found that the inferred AME split time was consistent across all tested models, with values ranging from 29,000 to 33,000 years ago.

Because background selection (BGS) is known to bias demographic inference,51,52 we evaluated the robustness of our inferred demographic model across genomic regions with different inferred strengths of BGS. We partitioned the genome using McVicker’s B values,29 recalculated the jSFS from each subset, and reinferred the same parameterized model using data from regions of low, medium, and high estimates of BGS. As expected, the SFS in each population is skewed toward rare variants in regions experiencing higher BGS (Figure S3). This did not translate to a perceptible difference in our estimated model parameters, except for the inferred ancestral effective population size (Figures S3D and S4). Ancestral Ne (effective population size) was smaller in subsets with stronger BGS, mirroring the decrease in pairwise diversity across subsets (Figure S3C). The consistency of other demographic parameters across B values suggests that our inferred models are not strongly affected by BGS, supporting previous work showing that SFS-based demographic inference is robust to linked selection at levels expected from the genomic architecture in humans.53

While previously reported models have excluded either the Indigenous American population8,17 or one or more of the other populations (European, African, and East Asian)7,12 due to methodological limitations, we reconstructed a broad-scale demographic model for global human dispersal that includes the peopling of the Americas. By including all four populations in our analysis, we were able to provide a model that better predicts allele frequencies in each of these groups (Figures 2A and 2B).

Inferring admixture dynamics in Latin American populations

The recent history of Latin American populations involves multiple periods of immigration and extensive gene flow among different Indigenous American, European, and African populations.4,16,18 Population genetic methods that rely on the jSFS may not be appropriate for resolving such admixture dynamics, as they typically assume a straightforward admixture process involving only a few ancestral populations and lack the necessary resolution to accurately infer the timing of admixture events that occurred relatively recently. However, by analyzing the ancestry tract length distribution (i.e., the distribution of lengths of continuous ancestry segments from each source population in admixed genomes), we inferred admixture models that include up to four ancestral populations and complex migration events.31

Using the distribution of ancestry tract lengths, we inferred admixture histories for four cohorts: Mexico (MXL), Puerto Rico (PUR), Colombia (CLM), and Peru (PEL). To calculate the ancestry tract length distribution, we constructed a reference panel of African, European, East Asian, and Indigenous American ancestries to perform local ancestry inference30 (Figure 3A). We treated each population independently, as admixture histories vary between regions in Latin America,4,14,16 leading to diverse ancestry patterns (Figure 3B). In each population, we used tracts31 to test five admixture models including three source populations: AFR, EUR, and AME (Figure S5 and subjects and methods). These models allowed us to explore a variety of admixture scenarios including single pulses of gene flow, multiple pulses, and continuous migration.

Figure 3.

Inference of admixture history from ancestry tract length distributions

(A) Karyogram with inferred local ancestry tracts (K = 3) in a Mexican individual (MXL).

(B) Ternary plot of ancestry fractions for African, European, and Indigenous American ancestries inferred with ADMIXTURE. Points in blue correspond to the individuals from the population shown, with other populations shown as gray points.

(C) Ancestry tract length distribution of data and best-fit model predictions. Plotted points show the aggregate tract length counts, lines show the maximum-likelihood best-fit tract length distributions for the best model in each population, and shading shows one standard deviation confidence interval, assuming a Poisson distribution of counts per bin.

(D) Scheme of inferred admixture models. Top: pie chart sizes indicate the approximate proportion of migrants in each generation, and slices show the fraction of migrants of each origin in a given generation. The dashed lines connecting the smaller circles denote continuous migration. Bottom: the y axis shows the expected fraction of ancestry in the population over time, and the x axis shows time in generations ago (ga); 15.9 ga corresponds to c. 1548 and 14 ga to c. 1604 (1 generation = 29 years).

We used a Bayesian Information Criterion approach (BIC) to compare the fit of these models and select the best-fit model (Figures 3C, 3D, and S6). For Colombia, two of the three-population models provided a good fit to the data based on the BIC: one with a recent discrete admixture event ∼10 generations ago (early 1700s, assuming mean generation times of 29 years), and another that included continuous migration from Indigenous American and European ancestries after an initial admixture event ∼14 generations ago (c. 1600). This second model more closely aligns with known historical events,54,55 leading us to prefer the model with post-admixture continuous migration. Overall, the models that were selected are consistent with previous studies where admixture histories in Latin America have been explored.32,33,34

Previous work has identified genetic connections between Mexico and Asia that originated during the Manila galleon trade between the colonial Spanish Philippines and the Pacific port of Acapulco.15 We identified ∼1% (0.5–1) EAS ancestry among individuals in the MXL cohort using both ADMIXTURE and local ancestry inference (Figure S7). To incorporate this fourth source ancestry, we added an East Asian component to our admixture model for Mexico. We adapted our best-fitting three-population model to include an additional migration pulse from East Asia, occurring at the same time as the African pulse. This choice was motivated by the historical timing of the Manila galleon trade, which coincided with the period of the African slave trade.56,57 We inferred that this pulse occurred 13 generations ago (early 1600s, Figures 3C and 3D), with initial pulse proportions 0.7% from EAS and 11% from AFR, which is consistent with published estimates.15

We also identified East Asian-related ancestry in the Peruvian cohort using both ADMIXTURE (0.8%) and local ancestry inference (0.9%). From this, we inferred a four-population admixture model that included an additional migration pulse from EAS. Our findings suggest that an EAS-related migration occurred ∼5 generations ago (mid-1800s) with initial proportion 1.1%. This is considerably more recent than the AFR-related migration, which occurred 8 generations ago (mid- to late 1700s) with initial proportion 5.2%. We did not explore four-source population models for the other cohorts (CLM and PUR), as we observed lower levels of East Asian ancestry in these populations (Figure S7).
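The calendar dates quoted for these admixture events follow from a simple conversion of generations ago into years. The sketch below uses the 29-year generation time stated in the text; the reference year of 2010 is an assumption, chosen because it maps 14 generations ago to c. 1604 as in the Figure 3 legend.

```python
GENERATION_TIME_YEARS = 29  # mean generation time used in the text
REFERENCE_YEAR = 2010       # assumption: approximate "present" for the samples


def generations_ago_to_year(generations_ago,
                            generation_time=GENERATION_TIME_YEARS,
                            reference_year=REFERENCE_YEAR):
    """Convert a time in generations before present to an approximate calendar year."""
    return reference_year - generations_ago * generation_time


# Admixture times (in generations ago) mentioned in this section.
event_times = {
    "Colombia, discrete admixture": 10,
    "Mexico, AFR and EAS pulse": 13,
    "Peru, AFR pulse": 8,
    "Peru, EAS pulse": 5,
}
event_years = {name: generations_ago_to_year(t) for name, t in event_times.items()}
```

Because both the generation time and the reference year are approximate, the resulting years are best read as rough periods rather than precise dates.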

Integrating allele frequencies and ancestry tracts: A joint modeling approach for demographic reconstruction in Latin America

We combined our inferences from allele frequencies (Figure 2) and ancestry tracts (Figure 3) to provide a model spanning multiple timescales that shaped genetic diversity in Latin America (Figure 4A). Information from different inference engines (tracts and moments) was integrated using demes,36 a standard format for demographic models in population genetics. In our combined model, we made several assumptions to simplify the analysis and produce a more tractable model. We assumed that the Latin American cohorts (MXL, CLM, PEL, and PUR) are independent of one another and have not experienced recent gene flow between them. We also fixed the effective population sizes (Ne) in admixed populations to a constant value of 20,000 that roughly reflects the present-day Ne in Latin American populations.8 We did not infer Ne in these populations because the ancestry tract distribution does not contain information about Ne, and the admixture histories in Latin America are very recent on an evolutionary timescale so that allele frequencies did not fluctuate significantly due to genetic drift.

Figure 4.

Model incorporating both time scales of population history and continental history and admixture dynamics

(A) Visualization of the inferred demographic model. We combined the inferences from allele frequencies and ancestry tracts into a single demographic model. The parameters in red correspond to the parameters estimated from ancestry tracts and the parameters in black were inferred from allele frequencies. The admixed population represents the Mexican population (MXL).

(B) PCA decomposition of genetic data. Left: data from chromosome 22; middle: simulation using our inferred model (A); right: simulation using the Browning et al. model. For both models, we simulated a genome of length 50 Mb with msprime. In the data (bottom left), the Indigenous American similarity component is captured by PC3. This variation is captured in our inferred model (A) but is not present in the simulated genomes from the Browning et al. model,8 which uses the East Asian population as a proxy for Indigenous American ancestry.

To validate the accuracy of our joint demographic model, we compared simulated and observed data. We used the coalescent simulator msprime38,39 to simulate samples for each population under the reconstructed demography (subjects and methods). We then compared a principal component analysis (PCA) of the simulated data with a PCA of observed genotypes on chromosome 22 (Figure 4B). The PCA of simulated individuals closely mirrors that of the observed data (Figure S8), indicating that our model accurately captures broad-scale patterns of genetic diversity in Latin American populations.

PCA uses observed allele frequencies within and between populations, which were also used in the inference of the model. We therefore also examined how well our model captures the decay of linkage disequilibrium (LD) within each population, which was not used in the inference process. Recently admixed populations are expected to show increased LD at distant pairs of loci, while a larger long-term Ne should result in decreased LD. This pattern is observed in the data, with higher long-range LD in the MXL population and lower LD at all distances in the YRI. This pattern is also seen in data simulated under our demographic model, which captured the magnitude of short-range LD and the overall patterns of LD decay (Figure S9A). However, LD decayed faster in our model compared to the data by roughly a factor of 2. This discrepancy does not appear to be due to linked purifying selection, as simulations that included deleterious mutations closely matched neutral model expectations (Figure S9B). Instead, patterns of LD may be more sensitive than allele frequencies to residual population structure within populations, or possible inconsistencies between mutation and recombination clocks could account for the more rapid LD decay in the model.58,59
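The expectation that recent admixture inflates long-range LD can be illustrated with a textbook two-source approximation. This is a deliberate simplification and not the LD statistic computed in this study: it assumes a single admixture pulse, no LD within the source populations, and random mating afterwards. Under these assumptions, admixture creates covariance D0 = m(1 − m)δAδB between two loci, where m is the admixture proportion and δA, δB are the allele frequency differences between the sources, and D then decays by a factor (1 − r) per generation for recombination fraction r. All parameter values below are hypothetical.

```python
def admixture_ld(m, delta_a, delta_b, r, t):
    """Expected LD coefficient D, t generations after a single admixture pulse.

    m:        admixture proportion from the first source (0..1).
    delta_a:  allele frequency difference between sources at locus A.
    delta_b:  allele frequency difference between sources at locus B.
    r:        recombination fraction between the two loci (0..0.5).
    t:        generations since admixture.
    """
    initial_d = m * (1.0 - m) * delta_a * delta_b
    return initial_d * (1.0 - r) ** t


# Hypothetical example: a pulse 14 generations ago with equal contributions.
recombination_fractions = [0.0001, 0.001, 0.01, 0.1]
decay_curve = [admixture_ld(0.5, 0.4, 0.4, r, t=14) for r in recombination_fractions]
```

With only 14 generations since admixture, loci separated by a recombination fraction of 0.01 still retain most of the initial admixture LD in this approximation, which is why recently admixed cohorts such as MXL show elevated LD at long distances.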

To compare to existing models for admixed Latin American populations, we also simulated data under the Browning et al. model of American admixture.6,8 Using the same simulation and PCA approach, we compared a PCA between this model and real data, finding that our model (Figure 4B) more accurately recapitulates important axes of genetic diversity in Latin American populations. In particular, our model reproduces the variation in Indigenous American ancestries among individuals, as shown by the strong association between principal component 3 (PC3) and Indigenous American ancestries. In contrast, the Browning et al. model shows no association between PC3 and Indigenous American ancestries, as it uses East Asian ancestry as a proxy for Indigenous American ancestry.

Simulation of complex demography and selected functional variation

To evaluate the ability of our inferred demographic model to explain patterns of genetic variation under selection, we conducted forward-in-time simulations of non-coding (intergenic and intronic), synonymous, missense, and loss-of-function variants using the fwdpy11 simulation engine.48,49 We assessed the ability of our demographic model, when combined with inferred models of selection, to explain patterns of genetic variation in different functional categories of the genome.

Missense and loss-of-function (nonsense) mutations are often subject to negative selection because they can have significant impacts on protein function. While non-coding and synonymous variants may be under direct selection due to changes in regulatory elements and codon usage bias,60 respectively, here we assumed that they were neutral with respect to selection.51,61 To simulate selection on nonsynonymous mutations, we incorporated distributions of fitness effects (DFEs) for the two classes of nonsynonymous mutations50 (Figure S10; subjects and methods). We simulated a total of 350 Mb of genomic data using our inferred demographic model and spanning various regions of the human genome by sampling from existing recombination maps and genomic annotations (subjects and methods). Due to the computational burden of running forward-in-time simulations, we limited the simulation to only one Latin American population (MXL) jointly with the other source populations (AFR, EUR, EAS, and AME).
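As a rough illustration of how a DFE enters such simulations, the sketch below draws selection coefficients for new deleterious mutations from a gamma distribution, a form commonly used for DFEs. The gamma form, the shape and mean values, and the class labels are assumptions made for this example; the DFEs actually used are those referenced above (Figure S10; subjects and methods).

```python
import random


def sample_selection_coefficients(n, shape, mean_strength, seed=0):
    """Draw n deleterious selection coefficients s <= 0 from a gamma DFE.

    shape:         gamma shape parameter (hypothetical value in this example).
    mean_strength: mean of |s|, so the gamma scale is mean_strength / shape.
    Coefficients are capped at s = -1 (lethal).
    """
    rng = random.Random(seed)
    scale = mean_strength / shape  # the mean of a gamma is shape * scale
    return [-min(rng.gammavariate(shape, scale), 1.0) for _ in range(n)]


# Hypothetical parameters for the two nonsynonymous classes.
dfe_parameters = {
    "missense": {"shape": 0.2, "mean_strength": 0.01},
    "loss_of_function": {"shape": 0.4, "mean_strength": 0.1},
}
draws = {
    label: sample_selection_coefficients(10_000, **params)
    for label, params in dfe_parameters.items()
}
```

A small shape parameter concentrates most draws near neutrality while leaving a long tail of strongly deleterious mutations, which is the qualitative behavior usually assumed for nonsynonymous variation.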

We evaluated how well our joint modeling approach matched data from selected mutation classes and assessed any biases within putatively neutral classes due to linked selection. We found that the SFS from simulated data accurately reproduced the SFS from the data for both neutral and selected genetic variation (Figures 5A and S11). Specifically, the SFS fits well for non-coding, synonymous, and missense variants, for which many variants are observed. While there are fewer observed loss-of-function variants, leading to noisier estimates of the SFS, simulated data are largely unbiased across frequency bins (Figure S11). In addition to analyzing the SFS within single populations, we calculated pairwise FST between populations. We observed that the simulated FST values accurately reproduced those from the observed data (Figure 5B). Together, this modeling approach was able to capture observed patterns of allele frequency variation within and between populations, supporting the utility of our model as a tool for studying complex evolutionary processes in these cohorts through genetic simulations.
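The two summary statistics compared here can be sketched in a few lines. The example below folds per-site alternate allele counts into a folded SFS and computes FST with Hudson's ratio-of-averages estimator. The choice of Hudson's estimator is an assumption made for illustration, since this section does not state which estimator was used, and the input values are toy data.

```python
def folded_sfs(alt_counts, n_chromosomes):
    """Folded SFS: entry i is the number of sites with minor allele count i."""
    sfs = [0] * (n_chromosomes // 2 + 1)
    for count in alt_counts:
        if not 0 <= count <= n_chromosomes:
            raise ValueError("allele count out of range")
        # Folding: the minor allele is whichever allele is rarer in the sample.
        sfs[min(count, n_chromosomes - count)] += 1
    return sfs


def hudson_fst(freqs_1, freqs_2, n_1, n_2):
    """Hudson's FST as a ratio of averages across sites.

    freqs_1, freqs_2: per-site sample allele frequencies in each population.
    n_1, n_2:         number of sampled chromosomes in each population.
    """
    numerator = 0.0
    denominator = 0.0
    for p1, p2 in zip(freqs_1, freqs_2):
        numerator += ((p1 - p2) ** 2
                      - p1 * (1.0 - p1) / (n_1 - 1)
                      - p2 * (1.0 - p2) / (n_2 - 1))
        denominator += p1 * (1.0 - p2) + p2 * (1.0 - p1)
    return numerator / denominator


# Toy data: alternate allele counts at eight sites in a sample of 10 chromosomes.
toy_sfs = folded_sfs([1, 9, 5, 2, 8, 10, 0, 3], n_chromosomes=10)
```

Applying the same functions to observed and simulated genotypes, region by region, gives the kind of paired comparison summarized in Figure 5.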

Figure 5.

Forward-in-time simulations of functional genetic variation

We simulated a total of 350 independent regions, 1 Mb each, of the human genome by sampling coding annotations and inferred recombination rates.

(A) Comparison between the folded SFS for the data (blue) and the simulation (orange) in different functional categories, summed across all regions. The SFS corresponds to the admixed Mexican population (MXL). Additional populations are shown in Figure S11A.

(B) Box plot of pairwise FST values comparing data (n = 350 regions) and simulation (corresponding n = 350 regions). Missense and loss-of-function mutations were combined into a single non-synonymous category.

Discussion

Our study provides comprehensive demographic models for populations in Latin America inferred from high-coverage whole-genome data. These models allow us to examine population dynamics at different time scales, incorporating multiple epochs of demographic history. By inferring a four-population Out-of-Africa model for the deeper demographic history (Figure 2), and using ancestry tract length distributions to infer the recent admixture history in four Latin American cohorts (Figure 3), we were able to create a single model (Figure 4) that jointly captures both allele frequencies and the distribution of ancestry tract lengths. Through extensive simulations, we demonstrated that this model accurately reflects the patterns of genetic diversity present in Latin American populations (Figure 5).

Our inferred demographic models for populations in Latin America represent a significant advance in our ability to accurately account for their evolutionary history in genomic studies. By capturing different aspects of population dynamics, our model provides a powerful tool for studying Latin American populations and understanding the genetic basis of disease in these underrepresented groups.62 Populations and individuals with recent genetic admixture are routinely excluded from genomic studies due to concerns over population structure.63 This is due in part to the lack of methods and pipelines to effectively account for shared ancestries and complex demographic histories,64,65,66 which our study takes steps to address. Past demographic history can also pose a significant challenge to detecting selection, as it can create patterns of genetic diversity that mimic the signatures of selection.67,68,69 Detailed demographic models are important for controlling for such confounding, allowing for more robust inference of recent selection in Latin American populations. Our models therefore provide a resource for developing and testing methods and pipelines that are specifically tailored to studying admixed populations, improving the accuracy and validity of results obtained from such studies.

Our results reveal a dynamic and multifaceted history. Of note, the inferred AME-EAS split time ranges from 29,000 to 33,000 years ago and it is consistent across all tested models. This suggests that population structure between ancestral groups was already established by this point. This may support the idea of an early migration to the Americas, predating the commonly used time frame of ∼15,000 years ago, or reflect population structure in Siberia, Beringia, and East Asia prior to the expansion into the Americas.70,71 Importantly, this divergence time does not necessarily pinpoint the exact period of entry into the Americas. Rather, it dates the divergence between the ancestors of Indigenous Americans in Northeast Asia and those of present-day East Asians, and our inferred divergence time aligns with previous estimates.72

Our results also show that the ancestry proportions in Latin American populations have changed over time, with evidence of continuous migration of European and Indigenous ancestries in Mexico, Peru, and Colombia. Similar patterns of changes in ancestry proportions over time have been observed in other admixed populations in the United States.73 This is likely due to both ongoing migrations from Europe to the Americas and migration within the Americas, such as the movement of people from rural to urban areas.74,75

Recent migration from Asia to the Americas has often been overlooked in genetic studies of Latin American populations. By including data from both East Asian and Indigenous American individuals, we can distinguish between those ancestries in admixed individuals and fit migration models including both sources. We are able to corroborate a previous study that found connections between Southeast Asia and Mexico in the 17th and 18th centuries, associated with the Manila galleon trade.15 We also find a genetic connection between East Asia and Peru, with admixture occurring in the mid- to late 19th century (Figure 3D). This corresponds with the documented migration of Chinese laborers to Peru during this time period.76 Despite high mortality and male-biased migration,77 this migration event had lasting contributions to both the cultural and genetic diversity in present-day Peru. Each of these points highlights the importance of considering the heterogeneity of population histories and the historical and cultural interactions of the Americas to understand the demographic history of Latin American populations.

Our models provide valuable insights into the larger-scale demographic history of Latin American populations, but they do not capture all aspects of population history. For example, the arrival of the Spanish in central Mexico led to a significant decline in population size,78 which can impact genetic diversity but is not fully captured by our model. With the increasing availability of larger datasets, it will be possible to study more detailed population structure and dynamics using the framework from this study. As we continue to increase the representation of Latin American genomic data, including more whole genomes, we can expect to gain a deeper understanding of the complexity of Latin American population history and to resolve historical processes at finer scales.

Demographic models

The inferred demographic models are provided as supplementary material and in the GitHub repo: https://github.com/santiago1234/mxb-genomes. The model files are in the demes36 format.

  • Model 1 (m1-out-of-africa.yml): four populations out of Africa

  • Model 2 (m2-Mexico-admixture.yml): admixture in Mexico

  • Model 3 (m3-Colombia-admixture.yml): admixture in Colombia

  • Model 4 (m4-Peru-admixture.yml): admixture in Peru

  • Model 5 (m5-PuertoRico-admixture.yml): admixture in Puerto Rico

  • Model 6 (m6-All-admixture.yml): admixture in Latin American populations, combined across models 2–6

Acknowledgments

This work was supported by “The Mexican Biobank Project: Building Capacity for Big Data Science in Medical Genomics in Admixed Populations,” a bi-national initiative between Mexico and the UK co-funded equally by CONACYT (FONCICYT/50/2016) and The Newton Fund through The Medical Research Council (MR/N028937/1) awarded to A.M.-E. Both A.P.R. and S.G.M.-M. were financially supported by CONACYT with funds from the MX Biobank Project and a graduate program scholarship, respectively. We also thank Carmina Barberena Jonas, Juan Esteban Rodríguez, and Ram Gonzales for their feedback throughout the project and Jacob Cervantes for IT support. We also would like to acknowledge Kevin Thornton for his assistance with setting up the forward-in-time simulations using fwdpy11.

Author contributions

A.P.R. and A.M.-E. conceived the study and provided overall supervision. S.G.M.-M. and A.P.R. carried out the analyses and were responsible for writing the initial manuscript. D.O.-D.V. and A.M.-E. provided feedback on the manuscript. L.G.-G., L.P.C.-H., and L.F.-R. contributed to the acquisition of the data. All authors read and approved the final manuscript.

Declaration of interests

The authors declare no competing interests.

Published: September 18, 2023

Footnotes

Supplemental information can be found online at https://doi.org/10.1016/j.ajhg.2023.08.015.

Contributor Information

Andrés Moreno-Estrada, Email: andres.moreno@cinvestav.mx.

Aaron P. Ragsdale, Email: apragsdale@wisc.edu.

Supplemental information

Document S1. Figures S1–S12 and Tables S1–S5
mmc1.pdf (1.6MB, pdf)
Data S1. Demographic models
mmc2.zip (6.2KB, zip)
Document S2. Article plus supplemental information
mmc3.pdf (5.1MB, pdf)

Data and code availability

The 1KGP data21,79 are publicly available and accessible without restriction. The 50 genomes from the MX-Biobank3,20 project are deposited in the European Genome-phenome Archive (EGA) repository, accession number EGAD00001008354.

References

  • 1. Korunes K.L., Goldberg A. Human genetic admixture. PLoS Genet. 2021;17 doi: 10.1371/journal.pgen.1009374.
  • 2. Hellenthal G., Busby G.B.J., Band G., Wilson J.F., Capelli C., Falush D., Myers S. A genetic atlas of human admixture history. Science. 2014;343:747–751. doi: 10.1126/science.1243518.
  • 3. Sohail M., Chong A.Y., Quinto-Cortes C.D., Palma-Martínez M.J., Ragsdale A., Medina-Muñoz S.G., Barberena-Jonas C., Delgado-Sánchez G., Cruz-Hervert L.P., Ferreyra-Reyes L., et al. Nationwide genomic biobank in mexico unravels demographic history and complex trait architecture from 6,057 individuals. Preprint at bioRxiv. 2022 doi: 10.1101/2022.07.11.499652.
  • 4. Ruiz-Linares A., Adhikari K., Acuña-Alonzo V., Quinto-Sanchez M., Jaramillo C., Arias W., Fuentes M., Pizarro M., Everardo P., de Avila F., et al. Admixture in latin america: geographic structure, phenotypic diversity and self-perception of ancestry based on 7,342 individuals. PLoS Genet. 2014;10 doi: 10.1371/journal.pgen.1004572.
  • 5. Hoban S., Bertorelle G., Gaggiotti O.E. Computer simulations: tools for population and evolutionary genetics. Nat. Rev. Genet. 2012;13:110–122. doi: 10.1038/nrg3130.
  • 6. Adrion J.R., Cole C.B., Dukler N., Galloway J.G., Gladstein A.L., Gower G., Kyriazis C.C., Ragsdale A.P., Tsambos G., Baumdicker F., et al. A community-maintained standard library of population genetic models. Elife. 2020;9 doi: 10.7554/eLife.54967.
  • 7. Gutenkunst R.N., Hernandez R.D., Williamson S.H., Bustamante C.D. Inferring the joint demographic history of multiple populations from multidimensional snp frequency data. PLoS Genet. 2009;5 doi: 10.1371/journal.pgen.1000695.
  • 8. Browning S.R., Browning B.L., Daviglus M.L., Durazo-Arvizu R.A., Schneiderman N., Kaplan R.C., Laurie C.C. Ancestry-specific recent effective population size in the americas. PLoS Genet. 2018;14 doi: 10.1371/journal.pgen.1007385.
  • 9. Reich D., Patterson N., Campbell D., Tandon A., Mazieres S., Ray N., Parra M.V., Rojas W., Duque C., Mesa N., et al. Reconstructing native american population history. Nature. 2012;488:370–374. doi: 10.1038/nature11258.
  • 10. Posth C., Nakatsuka N., Lazaridis I., Skoglund P., Mallick S., Lamnidis T.C., Rohland N., Nägele K., Adamski N., Bertolini E., et al. Reconstructing the deep population history of central and south america. Cell. 2018;175:1185–1197.e22. doi: 10.1016/j.cell.2018.10.027.
  • 11. Moreno-Mayar J.V., Vinner L., de Barros Damgaard P., de la Fuente C., Chan J., Spence J.P., Allentoft M.E., Vimala T., Racimo F., Pinotti T., et al. Early human dispersals within the americas. Science. 2018;362:eaav2621. doi: 10.1126/science.aav2621.
  • 12. Ávila-Arcos M.C., McManus K.F., Sandoval K., Rodríguez-Rodríguez J.E., Villa-Islas V., Martin A.R., Luisi P., Peñaloza-Espinosa R.I., Eng C., Huntsman S., et al. Population history and gene divergence in native mexicans inferred from 76 human exomes. Mol. Biol. Evol. 2020;37:994–1006. doi: 10.1093/molbev/msz282.
  • 13. Bryc K., Durand E.Y., Macpherson J.M., Reich D., Mountain J.L. The genetic ancestry of african americans, latinos, and european americans across the united states. Am. J. Hum. Genet. 2015;96:37–53. doi: 10.1016/j.ajhg.2014.11.010.
  • 14. Wang S., Ray N., Rojas W., Parra M.V., Bedoya G., Gallo C., Poletti G., Mazzotti G., Hill K., Hurtado A.M., et al. Geographic patterns of genome admixture in latin american mestizos. PLoS Genet. 2008;4 doi: 10.1371/journal.pgen.1000037.
  • 15. Rodríguez-Rodríguez J.E., Ioannidis A.G., Medina-Muñoz S.G., Barberena-Jonas C., Blanco-Portillo J., Quinto-Cortés C.D., Moreno-Estrada A. The genetic legacy of the manila galleon trade in mexico. Philos. Trans. R. Soc. Lond. B Biol. Sci. 2022;377 doi: 10.1098/rstb.2020.0419.
  • 16. Sans M. Admixture studies in latin america: from the 20th to the 21st century. Hum. Biol. 2000;72:155–177.
  • 17. Gravel S., Henn B.M., Gutenkunst R.N., Indap A.R., Marth G.T., Clark A.G., Yu F., Gibbs R.A., 1000 Genomes Project. Bustamante C.D. Demographic history and rare allele sharing among human populations. Proc. Natl. Acad. Sci. USA. 2011;108:11983–11988. doi: 10.1073/pnas.1019276108.
  • 18. Adhikari K., Mendoza-Revilla J., Chacón-Duque J.C., Fuentes-Guajardo M., Ruiz-Linares A. Admixture in latin america. Curr. Opin. Genet. Dev. 2016;41:106–114. doi: 10.1016/j.gde.2016.09.003.
  • 19. Nielsen R., Akey J.M., Jakobsson M., Pritchard J.K., Tishkoff S., Willerslev E. Tracing the peopling of the world through genomics. Nature. 2017;541:302–310. doi: 10.1038/nature21347.
  • 20. Jiménez-Kaufmann A., Chong A.Y., Cortés A., Quinto-Cortés C.D., Fernandez-Valverde S.L., Ferreyra-Reyes L., Cruz-Hervert L.P., Medina-Muñoz S.G., Sohail M., Palma-Martinez M.J., et al. Imputation performance in latin american populations: Improving rare variants representation with the inclusion of native american genomes. Front. Genet. 2021;12:719791. doi: 10.3389/fgene.2021.719791.
  • 21. Byrska-Bishop M., Evani U.S., Zhao X., Basile A.O., Abel H.J., Regier A.A., Corvelo A., Clarke W.E., Musunuri R., Nagulapalli K., et al. High-coverage whole-genome sequencing of the expanded 1000 genomes project cohort including 602 trios. Cell. 2022;185:3426–3440.e19. doi: 10.1016/j.cell.2022.08.004.
  • 22. Zhao H., Sun Z., Wang J., Huang H., Kocher J.P., Wang L. Crossmap: a versatile tool for coordinate conversion between genome assemblies. Bioinformatics. 2014;30:1006–1007. doi: 10.1093/bioinformatics/btt730.
  • 23. Alexander D.H., Novembre J., Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 2009;19:1655–1664. doi: 10.1101/gr.094052.109.
  • 24. Karczewski K.J., Francioli L.C., Tiao G., Cummings B.B., Alföldi J., Wang Q., Collins R.L., Laricchia K.M., Ganna A., Birnbaum D.P., et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2020;581:434–443. doi: 10.1038/s41586-020-2308-7.
  • 25. Chen S., Francioli L.C., Goodrich J.K., Collins R.L., Kanai M., Wang Q., Alföldi J., Watts N.A., Vittal C., Gauthier L.D., et al. A genome-wide mutational constraint map quantified from variation in 76,156 human genomes. Preprint at bioRxiv. 2022 doi: 10.1101/2022.03.20.485034.
  • 26. McLaren W., Gil L., Hunt S.E., Riat H.S., Ritchie G.R.S., Thormann A., Flicek P., Cunningham F. The ensembl variant effect predictor. Genome Biol. 2016;17:122. doi: 10.1186/s13059-016-0974-4.
  • 27. Jouganous J., Long W., Ragsdale A.P., Gravel S. Inferring the joint demographic history of multiple populations: beyond the diffusion approximation. Genetics. 2017;206:1549–1567. doi: 10.1534/genetics.117.200493.
  • 28. Coffman A.J., Hsieh P.H., Gravel S., Gutenkunst R.N. Computationally efficient composite likelihood statistics for demographic inference. Mol. Biol. Evol. 2016;33:591–593. doi: 10.1093/molbev/msv255.
  • 29. McVicker G., Gordon D., Davis C., Green P. Widespread genomic signatures of natural selection in hominid evolution. PLoS Genet. 2009;5 doi: 10.1371/journal.pgen.1000471.
  • 30. Hilmarsson H., Kumar A.S., Rastogi R., Bustamante C.D., Montserrat D.M., Ioannidis A.G. High resolution ancestry deconvolution for next generation genomic data. Preprint at bioRxiv. 2021 doi: 10.1101/2021.09.19.460980.
  • 31. Gravel S. Population genetics models of local ancestry. Genetics. 2012;191:607–619. doi: 10.1534/genetics.112.139808.
  • 32. Kidd J.M., Gravel S., Byrnes J., Moreno-Estrada A., Musharoff S., Bryc K., Degenhardt J.D., Brisbin A., Sheth V., Chen R., et al. Population genetic inference from personal genome data: impact of ancestry and admixture on human genomic variation. Am. J. Hum. Genet. 2012;91:660–671. doi: 10.1016/j.ajhg.2012.08.025.
  • 33. Moreno-Estrada A., Gravel S., Zakharia F., McCauley J.L., Byrnes J.K., Gignoux C.R., Ortiz-Tello P.A., Martínez R.J., Hedges D.J., Morris R.W., et al. Reconstructing the population genetic history of the caribbean. PLoS Genet. 2013;9 doi: 10.1371/journal.pgen.1003925.
  • 34. Gravel S., Zakharia F., Moreno-Estrada A., Byrnes J.K., Muzzio M., Rodriguez-Flores J.L., Kenny E.E., Gignoux C.R., Maples B.K., Guiblet W., et al. Reconstructing native american migrations from whole-genome and whole-exome data. PLoS Genet. 2013;9 doi: 10.1371/journal.pgen.1004023.
  • 35. Martin A.R., Gignoux C.R., Walters R.K., Wojcik G.L., Neale B.M., Gravel S., Daly M.J., Bustamante C.D., Kenny E.E. Human demographic history impacts genetic risk prediction across diverse populations. Am. J. Hum. Genet. 2017;100:635–649. doi: 10.1016/j.ajhg.2017.03.004.
  • 36. Gower G., Ragsdale A.P., Bisschop G., Gutenkunst R.N., Hartfield M., Noskova E., Schiffels S., Struck T.J., Kelleher J., Thornton K.R. Demes: a standard format for demographic models. Genetics. 2022;222:iyac131. doi: 10.1093/genetics/iyac131.
  • 37. Fenner J.N. Cross-cultural estimation of the human generation interval for use in genetics-based population divergence studies. Am. J. Phys. Anthropol. 2005;128:415–423. doi: 10.1002/ajpa.20188.
  • 38. Kelleher J., Etheridge A.M., McVean G. Efficient coalescent simulation and genealogical analysis for large sample sizes. PLoS Comput. Biol. 2016;12 doi: 10.1371/journal.pcbi.1004842.
  • 39. Baumdicker F., Bisschop G., Goldstein D., Gower G., Ragsdale A.P., Tsambos G., Zhu S., Eldon B., Ellerman E.C., Galloway J.G., et al. Efficient ancestry and mutation simulation with msprime 1.0. Genetics. 2022;220:iyab229. doi: 10.1093/genetics/iyab229.
  • 40. Nelson D., Kelleher J., Ragsdale A.P., Moreau C., McVean G., Gravel S. Accounting for long-range correlations in genome-wide simulations of large cohorts. PLoS Genet. 2020;16 doi: 10.1371/journal.pgen.1008619.
  • 41. Miles A., Murillo R., Ralph P., Harding N., Pisupati R., Rae S., Millar T. cggh/scikit-allel: v1.3.3. Zenodo; 2021.
  • 42. Ohta T., Kimura M. Linkage disequilibrium at steady state determined by random genetic drift and recurrent mutation. Genetics. 1969;63:229–238. doi: 10.1093/genetics/63.1.229.
  • 43. Hill W.G., Robertson A. Linkage disequilibrium in finite populations. Theor. Appl. Genet. 1968;38:226–231. doi: 10.1007/BF01245622.
  • 44. Ragsdale A.P., Gravel S. Models of archaic admixture and recent history from two-locus statistics. PLoS Genet. 2019;15 doi: 10.1371/journal.pgen.1008204.
  • 45. International HapMap Consortium A haplotype map of the human genome. Nature. 2005;437:1299–1320. doi: 10.1038/nature04226.
  • 46. Delaneau O., Zagury J.-F., Robinson M.R., Marchini J.L., Dermitzakis E.T. Accurate, scalable and integrative haplotype estimation. Nat. Commun. 2019;10:5436. doi: 10.1038/s41467-019-13225-y.
  • 47. Ragsdale A.P., Gravel S. Unbiased estimation of linkage disequilibrium from unphased data. Mol. Biol. Evol. 2020;37:923–932. doi: 10.1093/molbev/msz265.
  • 48. Thornton K.R. Polygenic adaptation to an environmental shift: temporal dynamics of variation under gaussian stabilizing selection and additive effects on a single trait. Genetics. 2019;213:1513–1530. doi: 10.1534/genetics.119.302662.
  • 49. Thornton K.R. A c++ template library for efficient forward-time population genetic simulation of large populations. Genetics. 2014;198:157–166. doi: 10.1534/genetics.114.165019.
  • 50. Ragsdale A.P. Local fitness and epistatic effects lead to distinct patterns of linkage disequilibrium in protein-coding genes. Genetics. 2022;221:iyac097. doi: 10.1093/genetics/iyac097.
  • 51. Charlesworth B., Charlesworth D. Some evolutionary consequences of deleterious mutations. Genetica. 1998;102–103:3–19.
  • 52. Johri P., Riall K., Becher H., Excoffier L., Charlesworth B., Jensen J.D. The impact of purifying and background selection on the inference of population history: problems and prospects. Mol. Biol. Evol. 2021;38:2986–3003. doi: 10.1093/molbev/msab050.
  • 53. Schrider D.R., Shanku A.G., Kern A.D. Effects of linked selective sweeps on demographic inference and model selection. Genetics. 2016;204:1207–1223. doi: 10.1534/genetics.116.190223.
  • 54. Butzer K.W. The americas before and after 1492: An introduction to current geographical research. Ann. Assoc. Am. Geogr. 1992;82:345–368.
  • 55. Larsen C.S. In the wake of columbus: Native population biology in the postcontact americas. Am. J. Phys. Anthropol. 1994;37:109–154.
  • 56. Aguirre Beltrán G. Tierra firme; 1972. La población negra de méxico: estudio etnohistórico.
  • 57. Seijas T. Vol. 100. Cambridge University Press; 2014. (Asian Slaves in Colonial Mexico: From Chinos to Indians).
  • 58. Moorjani P., Gao Z., Przeworski M. Human germline mutation and the erratic evolutionary clock. PLoS Biol. 2016;14 doi: 10.1371/journal.pbio.2000744.
  • 59. Moorjani P., Sankararaman S., Fu Q., Przeworski M., Patterson N., Reich D. A genetic method for dating ancient genomes provides a direct estimate of human generation interval in the last 45,000 years. Proc. Natl. Acad. Sci. USA. 2016;113:5652–5657. doi: 10.1073/pnas.1514696113.
  • 60. Medina-Muñoz S.G., Kushawah G., Castellano L.A., Diez M., DeVore M.L., Salazar M.J.B., Bazzini A.A. Crosstalk between codon optimality and cis-regulatory elements dictates mrna stability. Genome biology. 2021;22:1–23. doi: 10.1186/s13059-020-02251-5.
  • 61. Orr H.A. The distribution of fitness effects among beneficial mutations. Genetics. 2003;163:1519–1526. doi: 10.1093/genetics/163.4.1519.
  • 62. Sirugo G., Williams S.M., Tishkoff S.A. The missing diversity in human genetic studies. Cell. 2019;177:26–31. doi: 10.1016/j.cell.2019.02.048.
  • 63. Atkinson E.G., Maihofer A.X., Kanai M., Martin A.R., Karczewski K.J., Santoro M.L., Ulirsch J.C., Kamatani Y., Okada Y., Finucane H.K., et al. Tractor uses local ancestry to enable the inclusion of admixed individuals in gwas and to boost power. Nat. Genet. 2021;53:195–204. doi: 10.1038/s41588-020-00766-y.
  • 64. Sul J.H., Martin L.S., Eskin E. Population structure in genetic studies: Confounding factors and mixed models. PLoS Genet. 2018;14 doi: 10.1371/journal.pgen.1007309.
  • 65. Sohail M., Maier R.M., Ganna A., Bloemendal A., Martin A.R., Turchin M.C., Chiang C.W., Hirschhorn J., Daly M.J., Patterson N., et al. Polygenic adaptation on height is overestimated due to uncorrected stratification in genome-wide association studies. Elife. 2019;8 doi: 10.7554/eLife.39702.
  • 66. Lander E.S., Schork N.J. Genetic dissection of complex traits. Science. 1994;265:2037–2048. doi: 10.1126/science.8091226.
  • 67.De A., Durrett R. Stepping-stone spatial structure causes slow decay of linkage disequilibrium and shifts the site frequency spectrum. Genetics. 2007;176:969–981. doi: 10.1534/genetics.107.071464. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 68.Nielsen R., Williamson S., Kim Y., Hubisz M.J., Clark A.G., Bustamante C. Genomic scans for selective sweeps using snp data. Genome Res. 2005;15:1566–1575. doi: 10.1101/gr.4252305. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69.Koropoulis A., Alachiotis N., Pavlidis P. Statistical Population Genomics. Humana; 2020. Detecting positive selection in populations using genetic data; pp. 87–123. [DOI] [PubMed] [Google Scholar]
  • 70.Skoglund P., Reich D. A genomic view of the peopling of the americas. Curr. Opin. Genet. Dev. 2016;41:27–35. doi: 10.1016/j.gde.2016.06.016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71.Willerslev E., Meltzer D.J. Peopling of the americas as inferred from ancient genomics. Nature. 2021;594:356–364. doi: 10.1038/s41586-021-03499-y. [DOI] [PubMed] [Google Scholar]
  • 72.Sikora M., Pitulko V.V., Sousa V.C., Allentoft M.E., Vinner L., Rasmussen S., Margaryan A., de Barros Damgaard P., de la Fuente C., Renaud G., et al. The population history of northeastern siberia since the pleistocene. Nature. 2019;570:182–188. doi: 10.1038/s41586-019-1279-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 73.Spear M.L., Diaz-Papkovich A., Ziv E., Yracheta J.M., Gravel S., Torgerson D.G., Hernandez R.D. Recent shifts in the genomic ancestry of mexican americans may alter the genetic architecture of biomedical traits. Elife. 2020;9 doi: 10.7554/eLife.56029. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 74.Salas M.E.N. La migración a la ciudad de méxico: un proceso multifacético. Estud. Demográficos Urbanos. 1990:641–654. [PubMed] [Google Scholar]
  • 75.Sobrino J. Migración interna y tamaño de localidad en méxico. Estud. Demográficos Urbanos. 2014;29:443–480. [Google Scholar]
  • 76.Lausent-Herrera I. Tusans (tusheng) and the changing chinese community in peru. J. Chin. Overseas. 2009;5:115–152. [Google Scholar]
  • 77.Gonzales M.J. Chinese plantation workers and social conflict in peru in the late nineteenth century. J. Lat. Am. Stud. 1989;21:385–424. [Google Scholar]
  • 78.Mann C.C. Alfred a Knopf Incorporated; 2005. 1491: New Revelations of the Americas before Columbus. [Google Scholar]
  • 79.1000 Genomes Project Consortium. Auton A., Brooks L.D., Durbin R.M., Garrison E.P., Kang H.M., Korbel J.O., Marchini J.L., McCarthy S., McVean G.A., Abecasis G.R. A global reference for human genetic variation. Nature. 2015;526:68–74. doi: 10.1038/nature15393. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

Supplementary Materials

Document S1. Figures S1–S12 and Tables S1–S5
mmc1.pdf (1.6MB, pdf)
Data S1. Demographic models
mmc2.zip (6.2KB, zip)
Document S2. Article plus supplemental information
mmc3.pdf (5.1MB, pdf)

Data Availability Statement

The 1KGP data (refs. 21, 79) are publicly available and accessible without restriction. The 50 genomes from the MX-Biobank project (refs. 3, 20) are deposited in the European Genome-phenome Archive (EGA) repository under accession number EGAD00001008354.


Articles from American Journal of Human Genetics are provided here courtesy of American Society of Human Genetics