Author manuscript; available in PMC: 2024 Feb 16.
Published in final edited form as: Annu Rev Biomed Data Sci. 2023 Apr 26;6:105–127. doi: 10.1146/annurev-biodatasci-020722-014310

Strategies for the Genomic Analysis of Admixed Populations

Taotao Tan 1, Elizabeth G Atkinson 1
PMCID: PMC10871708  NIHMSID: NIHMS1964232  PMID: 37127050

Abstract

Admixed populations comprise a large portion of global human genetic diversity, yet they are often left out of genomics analyses. This exclusion is problematic, as it leads to disparities in the understanding of the genetic structure and history of diverse cohorts and the performance of genomic medicine across populations. Admixed populations have particular statistical challenges, as they inherit genomic segments from multiple source populations—the primary reason they have historically been excluded from genetic studies. In recent years, however, an increasing number of statistical methods and software tools have been developed to account for and leverage admixture in the context of genomics analyses. Here, we provide a survey of such computational strategies for the informed consideration of admixture to allow for the well-calibrated inclusion of mixed ancestry populations in large-scale genomics studies, and we detail persisting gaps in existing tools.

Keywords: admixture, statistical genetics, complex traits, genomics, methods, ancestry

1. INTRODUCTION

Genetic studies offer a promising basis for understanding pathophysiology and identifying new molecular targets for medicine. Recent landmark papers have made major strides in our understanding of the genetic architecture of complex disorders. However, research thus far has been based overwhelmingly on subjects of European ancestry (1, 2), the results of which have often been demonstrated not to fully generalize to other populations (3, 4). In particular, admixed populations of mixed ancestry comprise only a minute proportion of published genomics studies of complex traits, as they are systematically removed from many large-scale and mixed collections. Disparities in sample recruitment have fed into this problem, but even within collected cohorts, admixed individuals have historically been excluded. This is due in large part to the lack of methods and pipelines to effectively account for admixed ancestry. Specifically, if standard pipelines designed for homogeneous cohorts are used, there is a concern that population substructure can distort analyses and bias results (5–12). Thankfully, many large ongoing recruitment efforts are now working to collect genetic data from more diverse groups that contain higher amounts of admixture (13–18). For instance, the All of Us Research Program recently released nearly 100,000 whole-genome sequences (WGSs), of which about 50% are from individuals that are part of a historically underrepresented race or ethnicity group (19). The increasing diversity in this and other resources is a fantastic step forward for the field and will help ensure that health-related research applies to a more inclusive set of individuals. The increased inclusion and resultant increased complexity will, however, require additional careful considerations for analysis.
To ensure that growing genomics resources are utilized to the fullest and that genomic discoveries are relevant to individuals of all ancestries, it is imperative to design appropriate analysis strategies that facilitate the well-calibrated inclusion of diverse and admixed individuals in statistical genomics. Here, we provide a comprehensive survey of existing and emerging strategies for analyzing admixed individuals in the context of genomics.

Generally, the analysis of admixed populations shares similar conceptual principles with the analysis of multiancestry homogeneous cohorts, including a primary concern of properly controlling for population stratification. But there are also unique challenges, opportunities, and techniques that apply specifically to admixed populations. Numerous methodological advancements relevant to analyzing admixed samples have been shaping the field of population genetics and statistical genetics in the last two decades, such that it is no longer necessary or prudent to drop all admixed samples from analyses. In this review article, we survey important methodologies for inclusion of admixed samples in (a) global ancestry inference (GAI) and local ancestry inference (LAI), (b) disease mapping, (c) downstream analysis of gene discovery [e.g., polygenic risk scores (PRS), fine-mapping, and meta-analysis], and (d) demographic inference.

2. AN INTRODUCTION TO ADMIXTURE

Admixed populations are defined as groups that originated from migration events between historically isolated ancestral populations. In the broad sense, all human beings are admixed to at least some degree, but in this article, we use the term “admixture/admixed population” to imply recent admixture (typically within 30 generations) at the scale of continents, although it should be appreciated that ancestry is a spectrum and fine-scale structure within continental regions certainly exists.

Admixture has primarily occurred following the many large-scale migration events of recent human history, creating individuals that inherit segments of their genomes from multiple source populations (20). To illustrate this concept, we can model admixture in an idealized data-generative process: As humans are diploid and most variants are biallelic, one can model the genotype at each variant as a draw from a binomial distribution, $g_j \sim \mathrm{Binom}(2, f_j)$, where $f_j$ is a predefined allele frequency for variant j. Different populations have different allele frequency spectra, and the allele frequency divergence between two populations can be measured with the parameter FST, a commonly used population genetic statistic that summarizes the overall genetic differentiation between two groups. The values of FST range from 0 to 1, where a higher FST implies a larger degree of genetic differentiation between the groups. Cross-continental human populations thus typically have larger FST (e.g., African and European populations have FST ≈ 0.15), whereas within-continental populations have smaller FST (e.g., Finnish and Italian European populations have FST ≈ 0.02) (21). A common practice for generating population-specific allele frequencies for two populations is to draw allele frequencies twice from a beta distribution parameterized by FST according to the Balding–Nichols model: $f_{j,\text{pop1}}, f_{j,\text{pop2}} \sim \mathrm{Beta}\left(\frac{1 - F_{ST}}{F_{ST}} f_j,\ \frac{1 - F_{ST}}{F_{ST}} (1 - f_j)\right)$ (22). Admixed individuals are formed as the progeny of the two homogeneous populations. The first generation of admixed individuals will thus inherit one copy of each chromosome from each parental population. Starting with the second generation, recombination events can occur and will break up the contiguous homogeneous chromosomes, which are then passed on to the offspring (Figure 1).
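The generative process described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a published simulator: the function names (`balding_nichols`, `simulate_f1`), the uniform range used for the ancestral frequencies, and the seed are all hypothetical choices made for the example. It draws two population-specific frequencies per variant from the Balding–Nichols beta distribution and then builds a first-generation admixed genotype from one haplotype of each parental population.

```python
import random

def balding_nichols(f_anc, fst, rng):
    """Draw one population-specific allele frequency around the ancestral frequency f_anc."""
    scale = (1.0 - fst) / fst
    return rng.betavariate(scale * f_anc, scale * (1.0 - f_anc))

def simulate_f1(n_variants, fst, seed=0):
    """Simulate frequencies for two populations and one first-generation admixed genotype."""
    rng = random.Random(seed)
    # Ancestral frequencies: an arbitrary illustrative choice, kept away from 0 and 1.
    f_anc = [rng.uniform(0.05, 0.95) for _ in range(n_variants)]
    f_pop1 = [balding_nichols(f, fst, rng) for f in f_anc]
    f_pop2 = [balding_nichols(f, fst, rng) for f in f_anc]
    # One haplotype from each parental population; the genotype is the allele count (0, 1, or 2).
    genotype = [int(rng.random() < a) + int(rng.random() < b)
                for a, b in zip(f_pop1, f_pop2)]
    return f_pop1, f_pop2, genotype

f_pop1, f_pop2, genotype = simulate_f1(n_variants=1000, fst=0.15)
```

Later generations would additionally require a recombination model to break up the parental haplotypes, which this sketch deliberately omits.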

Figure 1.


Local ancestry deconvolution and its use in gene mapping efforts. (a) Illustration of the disruption of local ancestry tracts over time through recombination. Local ancestry can be used in a variety of statistical genomics analyses, including (i) the dating of admixture events based on tract lengths, (ii) admixture mapping to identify gene regions associated with disease, and (iii) various statistical models to assist in adjusting for covariates. (b) An ultrasimplified schematic of local ancestry inference: Using a reference panel, we learn parameters such as ancestry-specific allele and haplotype frequencies. Combined with other information like recombination rates, we can make predictions about the local ancestry background (LA) of the query genotype (G). Abbreviations: P, parent generation; Fn, n-th filial generation.

There are a variety of ways in which admixture can occur; two commonly used admixture generation models are the hybrid-isolation model (HI) and continuous-gene-flow model (CGF). The HI model assumes a single pulse of admixture occurs at the first generation, whereas the CGF model allows for continued input from the parental ancestries (23), although realistically admixture is more likely formed as a hybrid of HI and CGF. The more generations that have passed since the admixture event, the more likely that recombination events have occurred between two loci and the shorter the contiguous ancestry segments (sometimes referred to as tracts). This mosaic of genomic segments present within admixed genomes can provide unique opportunities to understand various aspects of human genetics from migration history to disease mapping.

3. ANCESTRY INFERENCE

Since admixed populations inherit genomic segments from multiple parental populations, we can conceptually think of their genomes as mosaics painted with their local ancestry, where each color represents a source ancestry. The color of an ancestry tract is not directly measurable but can be inferred from genotype information due to allele frequency and haplotype differences among reference source populations. The process of painting an individual’s chromosome is formally called LAI. It is also possible to infer one’s overall proportion of ancestry from each source population using genotype data, a procedure called GAI. We briefly discuss the modeling and the usage of GAI and LAI in this section.

3.1. Global Ancestry Inference

There are several motivations for conducting GAI. A primary use involves attempting to control for population stratification in association testing, which we discuss in detail in Section 4. Another usage is to assign admixed individuals into their major population or ancestry group for the purposes of a joint analysis among genetically similar individuals. Numerous methods have been developed to estimate global admixture proportions, which can be broadly categorized into model-based and algorithmic-based approaches (24).

3.1.1. Model-based approaches.

Model-based approaches seek to output a vector θ for each individual, the fraction of the genome that is inherited from each parental population. One of the initial efforts of model-based GAI approaches was done by STRUCTURE and achieved great success and popularity (25). STRUCTURE employs a Bayesian framework to model the distribution of parental ancestry proportions and allele frequencies given the observed genotype, as well as Markov chain Monte Carlo (MCMC) to sample from the posterior distribution of the global admixture proportion. A subsequent software program, ADMIXTURE, models admixed populations via a frequentist approach: The log-likelihood function is explicitly expressed as a function of global ancestry proportion, which can then be swiftly optimized to find the most probable global ancestry for each individual with a quasi-Newton method (26).
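As a rough sketch of the likelihood that such model-based approaches work with, the binomial log-likelihood can be written as a function of the ancestry proportions and the ancestry-specific allele frequencies. The function below only evaluates the log-likelihood, up to an additive constant; it does not reproduce the optimizer of any particular program, and the name `admixture_loglik` and the argument layout are assumptions made for illustration.

```python
import math

def admixture_loglik(genotypes, q, f):
    """Binomial log-likelihood of genotypes under an admixture model.

    genotypes[i][j]: allele count (0, 1, 2) for individual i at variant j
    q[i][k]:         ancestry proportion of individual i from source population k
    f[k][j]:         allele frequency of variant j in source population k
    """
    total = 0.0
    for g_i, q_i in zip(genotypes, q):
        for j, g in enumerate(g_i):
            # Individual-specific allele frequency: ancestry-weighted mixture of source frequencies.
            p = sum(q_ik * f_k[j] for q_ik, f_k in zip(q_i, f))
            total += g * math.log(p) + (2 - g) * math.log(1.0 - p)
    return total

f = [[0.9, 0.1], [0.1, 0.9]]   # two source populations, two variants
genotype = [[2, 0]]            # one individual
ll_pop1 = admixture_loglik(genotype, [[1.0, 0.0]], f)
ll_pop2 = admixture_loglik(genotype, [[0.0, 1.0]], f)
```

In this toy example the genotype is far more likely under full ancestry from the first population, which is the kind of comparison an optimizer exploits when searching over the proportions.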

Both STRUCTURE and ADMIXTURE require careful consideration of the parameter K, the number of source populations for the admixed cohort, so that one does not overinterpret the results (27). However, this information is typically unknown a priori. ADMIXTURE implements a fivefold cross-validation to select the K that produces the lowest error as visualized in a scree plot (28). STRUCTURE, on the other hand, directly incorporates the K into the model as a prior distribution. However, the authors acknowledged that K estimates from the software are merely an ad hoc guide (25).

3.1.2. Algorithm-based approaches.

Algorithmic approaches, on the other hand, rely heavily on the properties of principal components analysis (PCA) (29). In contrast to model-based approaches, principal components (PC)-based approaches do not produce an intuitive admixture proportion output (bounded between 0 and 1) but instead project samples into a PC space that captures the largest variation. The advantage is that PCA is much faster and easier to implement than model-based approaches. Prior to PC calculation, one typically needs to remove related samples, restrict to uncorrelated common variants [linkage disequilibrium (LD) pruning, minor allele frequency (MAF) > 5%], and standardize the genotype matrix (mean = 0, variance = 1) to ensure reproducibility (30).

Intuitively, samples that are genetically similar will appear close together in the projected PC space. Therefore, individuals from the same ancestry group will often form clusters. In human genetics studies, top PCs have been shown to be highly correlated with admixture proportion and geographical location, empirically and theoretically (29, 31–34), such that admixed samples form a cline between parental populations in PC space (31). This property allows PCA to be used as a diagnostic for the presence of admixture in a cohort and allows for heuristic assignment of samples to their major parental population. Since related individuals are more genetically similar to each other, the inclusion of relatives can distort PCA (30). Therefore, for collections with high degrees of relatedness, it is recommended first to take subsets of unrelated samples representing all ancestries, then to calculate their PCs, and subsequently to project related samples back into this PC space (30). This approach has been implemented in the PC-AiR function of the R package GENESIS (35) and appears to be more effective than traditional PCA and multidimensional scaling approaches.

3.2. Local Ancestry Inference

GAI methods provide us a handy way to view a person’s overall admixture proportions, but they fail to provide us information about the fine-scale patterns of ancestry across the genome. Two individuals with similar global admixture proportions may have very different local ancestry composition, which would not be revealed without LAI. The information provided by LAI has numerous utilities: One can leverage local ancestry to genetically map diseases (36), estimate ancestry-specific allele frequencies (37), estimate recombination rates (38), characterize demographic history (39–41), and more (Figure 1). The ability to utilize local ancestry is perhaps the most unique factor that distinguishes the analysis of admixed populations from homogeneous groups.

Initially, LAI was conducted using only markers that have significant allele frequency difference across ancestries, or ancestry-informative markers (AIMs). Using AIMs, researchers can reasonably well guess an individual’s local ancestry given the individual’s allele. An extreme example of this is the Duffy antigen, where the allele frequency of the null allele is close to 100% in some African populations but close to 0% in most European populations (42). For an admixed individual who carries the null Duffy antigen allele, the probability this variant is inherited from African ancestry will be very high and methods will assign the local ancestry of this variant as African.

Methodologically, hidden Markov models (HMMs) are a natural option for LAI. In a data-generative perspective, local ancestry produces alleles according to the ancestry-specific allele frequency. The probability of a local ancestry state switch along a chromosome is a function of the local recombination rate and global ancestry proportions. Given ancestry-specific allele frequency (emission probability), recombination rate (hidden state transition probability), and global ancestry proportion (initial probability), one can infer the local ancestry (hidden state) using approaches like the Viterbi algorithm (43). For a more comprehensive discussion of local ancestry deconvolution from the mathematical perspective, we refer the readers to two existing excellent review articles (43, 44).
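To make the HMM formulation concrete, the following is a toy Viterbi decoder for a single haplotype. It is a simplified sketch, not the algorithm of any specific LAI program: it assumes known ancestry-specific allele frequencies strictly between 0 and 1, and it uses one constant `switch` probability between adjacent markers as a hypothetical stand-in for a recombination-derived switch rate. After a switch, the new ancestry is drawn according to the global ancestry proportions `pi`.

```python
import math

def viterbi_lai(alleles, freqs, pi, switch):
    """Most probable local ancestry path for one haplotype.

    alleles[j]:  observed allele (0 or 1) at marker j
    freqs[k][j]: frequency of allele 1 at marker j in ancestry k (emission)
    pi[k]:       global ancestry proportion of ancestry k (initial state)
    switch:      probability of an ancestry switch between adjacent markers (transition)
    """
    n_anc = len(pi)

    def emit(k, j):
        p = freqs[k][j]
        return math.log(p if alleles[j] == 1 else 1.0 - p)

    def trans(a, b):
        return math.log((1.0 - switch) * (a == b) + switch * pi[b])

    score = [math.log(pi[k]) + emit(k, 0) for k in range(n_anc)]
    back = []
    for j in range(1, len(alleles)):
        prev, score, ptr = score, [], []
        for b in range(n_anc):
            best = max(range(n_anc), key=lambda a: prev[a] + trans(a, b))
            ptr.append(best)
            score.append(prev[best] + trans(best, b) + emit(b, j))
        back.append(ptr)

    # Trace back from the best final state.
    state = max(range(n_anc), key=lambda k: score[k])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]

freqs = [[0.95] * 6, [0.05] * 6]  # highly differentiated markers, AIM-like
path = viterbi_lai([1, 1, 1, 0, 0, 0], freqs, pi=[0.5, 0.5], switch=0.1)
```

With markers this differentiated, the decoder places a single ancestry switch in the middle of the haplotype, whereas one isolated discordant allele would be absorbed as noise rather than called as a short tract.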

In 2004, Patterson and colleagues (45, 46) proposed an LAI method called ANCESTRYMAP. It utilized HMMs to not only infer hidden states but also account for uncertainties of parental allele frequency estimates with MCMC. Accompanying the release of ANCESTRYMAP was a database containing 3,011 AIMs to facilitate LAI (46). This database was widely used in the early era of admixture mapping and LAI but became less utilized with the advent of denser genotype arrays and sequencing technology. Denser genotyping necessitated the consideration of LD, and methods built on standard HMM were no longer sufficient. To account for LD, Price et al. (47) proposed HAPMIX, a method that utilizes blocks of linked markers instead of individual independent markers to help prediction. The approach allows haploid transitions within a population and between populations, thereby effectively accounting for LD.

The conditional random field (CRF) is a technique that has been widely applied in sequence prediction scenarios such as natural language processing and has gained popularity for LAI in recent years thanks to its flexibility in modeling dependencies between markers and local ancestries. In contrast to an HMM, which assumes a generative process for the observed alleles and infers the hidden state by applying Bayes’ rule [using ancestry-specific allele frequencies P(G| LA) to infer local ancestries P(LA| G)], a CRF directly specifies the relationship between local ancestry and markers in a discriminative model. The program RFMix, currently considered a state-of-the-art method for LAI, employs this approach, capturing dependencies of local ancestry and genotype through a random forest classifier (48), and has been seen to perform at an impressive true positive rate of ~98% in African American cohorts (49). More recently developed software such as Gnomix has further optimized the algorithm to facilitate efficient LAI with WGS data (50). Since the average of local ancestry across the genome can be viewed as global ancestry, it is recommended to compare the concordance of the proportions estimated by local and global (model-based) ancestry inference to assess LAI performance. Typically, the outputs from LAI and GAI should be very consistent, although one study has found that calculating global ancestry by averaging across local ancestry tracts can achieve higher accuracy than global ancestry algorithms (51).
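The concordance check between local and global ancestry amounts to a length-weighted average of the local ancestry calls. A minimal sketch, assuming tracts are available as (length, ancestry index) pairs pooled over both haplotypes; the input layout and the name `global_from_local` are hypothetical and do not correspond to the output format of any particular LAI program:

```python
def global_from_local(tracts, n_anc):
    """Global ancestry proportions as the length-weighted share of each local ancestry.

    tracts: list of (length, ancestry_index) pairs pooled over both haplotypes,
            with length in any consistent unit (e.g., cM or bp).
    """
    totals = [0.0] * n_anc
    for length, anc in tracts:
        totals[anc] += length
    genome = sum(totals)
    return [t / genome for t in totals]

proportions = global_from_local([(30, 0), (10, 1), (20, 0), (20, 1)], n_anc=2)
```

The resulting vector can then be compared directly with the proportions reported by a model-based GAI method for the same individual.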

Despite great methodological advancements in LAI achieved in the past decade, challenges remain in several settings. Firstly, the more genetically similar component ancestries are (i.e., the lower the FST between component ancestries), the more challenging it is to deconvolve them. This is especially pronounced in subcontinental-level LAI due to smaller genetic divergences. Secondly, LAI methods typically rely on reference panels. As such, comprehensive and well-matched reference panels are vital for accurate local ancestry deconvolution. Reference panels are also typically preferred to contain homogeneous samples of the representative ancestry for optimal performance, although some software provides options for handling admixture in the reference itself (48). It is worth noting that some understudied ancestries are currently only seen in an admixed context in existing reference panels. Finally, performing multiway LAI is usually more challenging than two-way LAI, resulting in reduced accuracy for populations such as Hispanics/Latinos, who are typically modeled as three-way admixed with African, European, and Amerindigenous ancestry components (24).

4. DISEASE MAPPING

Admixed populations are often excluded in GWAS efforts due to the concern of population structure and long-range admixture LD, which may lead to increased false positive hits. The removal of diverse samples will inevitably lead to reduced generalizability across ancestries (since the results may not apply to everyone) and loss of statistical power (due to the reduction of sample size). In this section, we argue that false positive rates can be well controlled among admixed cohorts with existing methods, and that studying admixture can provide unique opportunities to deepen our understanding of human genetics and diseases.

4.1. Quality Control

Quality control (QC) of genomic and sample data prior to analysis is a key part of all genomics pipelines. Importantly, the appropriate GWAS QC metrics to use for admixed populations differ from those for homogeneous populations, and QC criteria for variants might need to be adjusted to accommodate allele frequency differences between populations when working with admixed cohorts. For instance, variant QC based on Hardy–Weinberg equilibrium is intended to identify sequencing artifacts bearing unusual genotype frequencies but can remove biologically valid variants in admixed/multiancestry cohorts due to the Wahlund effect. As a general principle, QC criteria and considerations that can be applied for multiancestry studies are appropriate for admixed populations, as described more fully in an excellent review paper by Peterson et al. (52). However, a standardized QC pipeline specifically for admixed populations is still lacking.
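The Wahlund effect can be illustrated with simple arithmetic. The sketch below assumes the simplest case, a cohort that pools two groups that are each in Hardy–Weinberg equilibrium; the function name `wahlund` is hypothetical, and the allele frequencies are borrowed from the Figure 2 simulation purely as example values.

```python
def wahlund(p1, p2, w=0.5):
    """Expected vs. observed heterozygosity when pooling two groups in HWE.

    p1, p2: allele frequency in each group
    w:      fraction of the pooled sample that comes from group 1
    """
    p_bar = w * p1 + (1 - w) * p2
    # Heterozygosity a Hardy-Weinberg test expects from the pooled allele frequency.
    expected_het = 2 * p_bar * (1 - p_bar)
    # Heterozygosity actually present: the weighted average of the within-group values.
    observed_het = w * 2 * p1 * (1 - p1) + (1 - w) * 2 * p2 * (1 - p2)
    return expected_het, observed_het

expected_het, observed_het = wahlund(0.75, 0.09)
```

For these frequencies the pooled sample shows a heterozygosity of about 0.27 where a Hardy–Weinberg test expects about 0.49, so a perfectly valid variant could be flagged and removed.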

4.2. Imputation

For genotype array datasets, it is a common practice to impute genomic data before GWAS to increase the number of markers to test, which improves the variant overlap of discovery and target datasets for PRS calculation and can help localize association signals. There is still a need for best practices for imputation in admixed populations to be firmly established (53). Of note, imputation quality varies across ancestral segments among admixed populations, which is concerning since it can lead to biases in GWAS and other analyses (54). Multiple benchmarking studies have been conducted to assess the performance of imputation under different reference panels and algorithms (55, 56). The general guideline is to use a large reference panel that matches well with the admixed cohort in terms of ancestry composition. Currently, the largest available diverse reference panel is TOPMed (Trans-Omics Precision Medicine), which has 97,256 deeply sequenced human genomes and is available for use in phasing and imputation via the Michigan Imputation Server, although individual-level data are not freely released (57, 58). Studies have shown that using TOPMed as a reference panel for admixed groups can outperform reference panels from the 1000 Genomes Project phase 3 and the Haplotype Reference Consortium (59, 60). More recently, population-specific imputation panels, including a reference panel designated specifically for African American cohorts, have been released (60).

On the methodological side, at least two imputation methods have been designed specifically for admixed populations: Pasaniuc et al. (61) proposed an approach to leverage local ancestry information for imputation, and Liu et al. (62) proposed MaCH-Admix, a piecewise reference selection method that improves imputation accuracy for admixed populations.

4.3. Admixture Linkage Disequilibrium

Another unique feature of admixed populations is the presence of admixture LD. Admixture LD can occur between a pair of markers that are independent within each source population, as long as the allele frequencies of each marker differ markedly across populations (63). Mathematically, if loci X and Y are originally unlinked in source population k, we can express the statement in terms of conditional covariance: $D_{\text{pop}\,k} = \operatorname{cov}(X, Y \mid \text{pop} = k) = 0$. However, a conditional covariance of 0 does not imply that the overall covariance is 0; in general, $D = \operatorname{cov}(X, Y) \neq 0$ (64). Said differently, the process of admixture itself can introduce covariation between formerly unlinked markers. This is because if particular haplotypes across the genome are more common in one population, they may be inherited together such that they are now in (admixture) LD with each other in the admixed daughter population, even if they appeared unlinked in the source population from which they derived. In contrast to traditional LD, whose effect is restricted to a small region of the genome (typically less than a few hundred kilobases) (65), admixture LD can span large segments of the genome (tens of megabases) and can even cross chromosomes (23, 36). Since recombination events break down LD during meiosis, admixture LD decays rapidly over generations, following the formula $D_t = D_0(1 - r)^t$ (66), where $D_0$ is the initial admixture LD, r is the recombination rate, and t is the number of generations since admixture.
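Both statements can be checked numerically. The sketch below assumes a single-pulse mixture in which a fraction w of haplotypes comes from population 1 and the two loci are independent within each source population; in that setting the initial covariance is w(1 − w)(pX1 − pX2)(pY1 − pY2), a standard result that we state here as an assumption of the example. The function names and the allele frequencies are hypothetical.

```python
def initial_admixture_ld(w, px, py):
    """Initial LD D_0 between loci X and Y created purely by mixing two populations.

    w:  fraction of haplotypes contributed by population 1
    px: (frequency in pop 1, frequency in pop 2) of the tracked allele at locus X
    py: same for locus Y; within each population the loci are assumed independent
    """
    return w * (1 - w) * (px[0] - px[1]) * (py[0] - py[1])

def decayed_ld(d0, r, t):
    """LD remaining after t generations of random mating with recombination rate r."""
    return d0 * (1 - r) ** t

d0 = initial_admixture_ld(0.5, px=(0.8, 0.2), py=(0.7, 0.1))
unlinked = [decayed_ld(d0, r=0.5, t=t) for t in range(6)]   # loci on different chromosomes
linked = [decayed_ld(d0, r=0.01, t=t) for t in range(6)]    # loci roughly 1 cM apart
```

Comparing the two lists shows why cross-chromosomal admixture LD halves every generation and fades quickly, while admixture LD between nearby loci persists for many generations.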

In GWAS, admixture LD can potentially cause issues: A noncausal variant tens of centimorgans away from the causal variant can show a significant association signal due to admixture LD tagging (67), which harms fine-mapping analysis by widening the credible set. The concern of admixture LD can be addressed by including local ancestry as a covariate in a multiple regression, which we discuss below in Section 4.5.

4.4. Admixture Mapping

Admixture provides a unique opportunity to understand diseases that have different prevalences/incidences in different populations. This phenomenon was first exploited in the context of gene discovery via admixture mapping, the first association strategy designed specifically for admixed populations. Admixture mapping was met with much enthusiasm at its introduction, as it can be more powerful than GWAS in cases when the causal variant is untyped and exhibits different cross-population allele frequencies. This is due to a reduced Bonferroni correction, as there are fewer hypothesis tests conducted in admixture mapping since we use local ancestry tracts as explanatory variables instead of genetic markers (45, 68). The technique was especially attractive prior to the WGS era, as the test only requires a few thousand highly differentiated markers, which is very economical (36). In admixture mapping, we hypothesize that variants that have different frequencies among ancestral groups explain observed prevalence differences in a trait. For a causal variant with large FST, we can roughly view local ancestry as a proxy for the causal variant, and the association between trait and local ancestry persists. Below we discuss two commonly used admixture mapping techniques, namely case-only and case–control.

In case-only studies (Figure 2a), the local ancestral proportion is assumed to be largely evenly distributed across the genome. However, disease-causing variants, which correlate with local ancestry tracts, should exhibit enrichment in the population with higher prevalence (45). In their paper, Patterson et al. (45) discussed the possibility of comparing each locus with the rest of the genome and proposed a Bayesian likelihood ratio test to formally test unusual distributions of local ancestry. An obvious advantage to this approach is that it does not require collecting samples from unaffected individuals and therefore is more economical (36). But in practice, a visual comparison between case and control admixture proportions is highly recommended to determine if confounding factors exist.

Figure 2.


A comparison of association outcomes from common ancestry-informed gene discovery methods: (a) case-only admixture mapping, (b) case–control admixture mapping, (c) genome-wide association study (GWAS), and (d) GWAS conditioned on local ancestry (LA). In all panels we simulated an admixed population’s LA tracts with a Markov model, where individual genotypes were generated using independent ancestry-specific allele frequency according to the Balding–Nichols model without specific consideration of linkage disequilibrium (LD). The phenotype is generated via logistic regression, with the explanatory variable being a marker with large allele frequency divergence (AFpop1=0.75, AFpop2=0.09) and constant effect size across ancestries. For case-only admixture mapping (a), we took a subset of individuals with the disease and plotted the average proportion of LA across individuals. Note that the admixture proportion is relatively constant across the genome, but shows an unusual peak at the causal variant. For case–control admixture mapping (b), we regressed the phenotype on LA and drew a classic Manhattan plot; note that while we have good statistical power, the association signal peak is very wide. For GWAS (c), we ran a logistic regression on all markers and observed very high statistical power, but we also observed admixture LD resulting in an associated region that is somewhat wide. For GWAS + LA (d), we ran logistic regression on markers while conditioning on LA; note that we achieved better signal localization with the cost of losing statistical power. We note that for causal markers with no allele frequency difference among ancestries, the inclusion of LA as a covariate has a smaller impact on GWAS outcomes.

In the case–control approach (Figure 2b), as in most association tests, we fit a logistic regression with disease status as the dependent variable and local ancestry as the independent variable. The null hypothesis is that local ancestry is not associated with the phenotype; the p-value is significant when the odds ratio between local ancestry and phenotype deviates significantly from 1. The advantage of using logistic regression instead of contingency table–based approaches such as the χ² test and Fisher’s exact test is that it allows for confounder adjustment (69). Compared to case-only admixture mapping, case–control admixture mapping makes fewer assumptions and is typically more robust.
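A case–control test at a single locus can be sketched as a one-predictor logistic regression of disease status on local ancestry dosage (0, 1, or 2 copies of one ancestry). The example below is a bare-bones Newton–Raphson fit written for illustration only; the function name and the toy counts are hypothetical, and a real analysis would add covariates such as global ancestry and use an established statistics package.

```python
import math

def logistic_fit(x, y, n_iter=25):
    """Fit logit P(y = 1) = b0 + b1 * x by Newton-Raphson; return (b1, standard error of b1)."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p                 # gradient of the log-likelihood
            g1 += (yi - p) * xi
            h00 += w                     # entries of the information matrix X'WX
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b1, math.sqrt(h00 / det)

# Toy cohort: (local ancestry dosage, controls, cases) at one locus.
cells = [(0, 40, 10), (1, 25, 25), (2, 10, 40)]
x = [d for d, n0, n1 in cells for _ in range(n0 + n1)]
y = [s for d, n0, n1 in cells for s in [0] * n0 + [1] * n1]

b1, se = logistic_fit(x, y)
odds_ratio = math.exp(b1)   # per-copy odds ratio for local ancestry
z = b1 / se                 # Wald statistic; large |z| suggests association
```

In this toy cohort each additional copy of the ancestry multiplies the odds of disease by about 4, which is the kind of deviation from an odds ratio of 1 that the test is designed to detect.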

Similar to GWAS, where p-value significance should be adjusted to control for false positives, one should also apply multiple test correction for admixture mapping. However, there is not yet a consensus on the optimal p-value threshold for admixture mapping, and different thresholds have been used in different studies (70–72). Grinde et al. (73) developed a theoretical framework to characterize the optimal threshold based on ancestry, generations, and population structure and found $2.1 \times 10^{-5}$ for African American and $4.5 \times 10^{-6}$ for Hispanic populations of the WHI SHARe [Women’s Health Initiative SNP (single-nucleotide polymorphism) Health Association Resource] dataset. However, the optimal threshold may vary from study to study due to different demographic composition of the admixed cohort. Despite its success in finding plausible loci for many complex traits such as asthma and cardiovascular disease (74–76), admixture mapping is not without its limitations. First, good performance in admixture mapping requires accurate local ancestry calls; a biased LAI can lead to an increased false-positive rate (see Section 3.2) (36). Second, admixture mapping assumes that causal variant MAF differences explain the differences in prevalence among source populations. If the causal variant has similar MAFs across populations, one cannot detect signal through admixture mapping (in both case-only and case–control studies). This is because local ancestry is linked to disease through causal variants, and if local ancestry is independent of the causal variant [because the allele frequencies are similar across ancestries, or p (G|LA) = p (G)], then knowing the local ancestry does not provide information about disease status (Figure 3b). Therefore, admixture mapping works better for diseases that have a different incidence/prevalence across populations due to a higher chance of divergent causal variant frequencies (36).
Third, admixture mapping implicitly leverages admixture LD (discussed in Section 4.3) to perform disease mapping, and therefore requires admixture LD strength to be moderate but not cross-chromosomal. Ideal cohorts should thus have at least two generations since admixture events to reduce long-range admixture LD (36). Finally, admixture mapping has a limited ability to localize causal variants, since the chromosomal regions identified through this technique are substantially larger than those identified in GWAS (42). This can make downstream experimental validation more difficult. Overall, with the arrival of new sequencing technologies with higher coverage of the genome at diminishing costs, admixture mapping has had decreasing usage over time.

Figure 3.


Causal diagrams illustrating the impact of ancestry on gene discovery. (a) Population stratification: When ancestry is associated with both allele frequency and the prevalence of the disease, an association between the testing marker and the disease can thus be due to the uncontrolled ancestry states rather than a true causal association of the marker to the phenotype. Population stratification can be fixed/alleviated by conditioning on ancestry states. (b) Admixture mapping: Local ancestry influences the frequency of the causal variant, which induces a difference in disease prevalence across populations. Admixture mapping uses this fact to detect associations between regions with enrichment of a local ancestry and disease. However, if the causal variant does not have different frequencies across populations (i.e., local ancestry is independent of the causal variant), then the association between local ancestry and disease vanishes/disappears. (c) Usage of local ancestry in genome-wide association studies (GWAS): The correlation between two independent markers can be induced by admixture linkage disequilibrium (ALD) since the causal variant transfers its signal to tagging variants through linkage disequilibrium and ALD. By conditioning on local ancestry, researchers can effectively remove long-range ALD signal with the cost of lost statistical power. Dashed lines are the association tests that researchers typically perform. Solid lines are data generating processes.

4.5. Population Stratification/Confounder Adjustments

Population stratification is the phenomenon whereby spurious associations arise between a phenotype and the tested markers due to the confounding effect of ancestry. It occurs because populations can differ both in allele frequencies and in disease incidence or prevalence. To limit false positives, one should always take relevant confounders into account; the available solutions can be broadly categorized into global and local ancestry adjustments. We illustrate population stratification with a causal directed acyclic graph in Figure 3a.

4.5.1. Global ancestry adjustment

One strategy to control for population stratification is to use inferred global ancestry. A well-known and standard approach in GWAS is to incorporate top PCs in the multiple linear/logistic regression. Top PCs are generally highly correlated with ancestry, and therefore conditioning on PCs can effectively reduce false positives (29). However, the appropriate number of PCs to include in the model is not uniform and the number selected for a given analysis can be somewhat arbitrary. Similarly, it is also possible to directly include the admixture proportion(s) as covariates (e.g., outputs from STRUCTURE or ADMIXTURE), and the same intuition applies. A related but distinct approach is to compute a genetic relationship matrix (GRM) for use in a mixed-effect model, which we further discuss in Section 4.6.
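As a hedged illustration of why conditioning on global ancestry reduces false positives, the minimal simulation below uses a known global ancestry proportion `q` as a stand-in for top PCs or ADMIXTURE output. It is not taken from any cited method: the allele frequencies, sample size, effect of ancestry on the phenotype, and the helper `ols_slope` are all hypothetical choices for illustration. The simulated marker has no causal effect, so any nonzero slope in the unadjusted model is spurious.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Assumed (hypothetical) setup: q is each individual's global ancestry proportion.
q = rng.uniform(0.0, 1.0, size=n)
# The marker's allele frequency differs by ancestry (0.6 vs 0.1), but it is NOT causal.
freq = q * 0.6 + (1.0 - q) * 0.1
g = rng.binomial(2, freq)
# The phenotype depends on ancestry only (e.g., via environment), not on the marker.
y = 1.0 * q + rng.normal(size=n)

def ols_slope(y, covariates):
    """Return the OLS coefficient of the first covariate (intercept included)."""
    X = np.column_stack([np.ones(len(y))] + list(covariates))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

naive = ols_slope(y, [g])        # no ancestry adjustment: confounded
adjusted = ols_slope(y, [g, q])  # global ancestry included as a covariate

print(f"unadjusted marker effect: {naive:.3f}")
print(f"ancestry-adjusted effect: {adjusted:.3f}")
```

In this toy setting the unadjusted estimate is clearly nonzero while the adjusted estimate is close to zero. Real analyses would typically use several PCs rather than a single known proportion.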

Although conditioning on global ancestry with fixed effects or random effects is theoretically appealing and easy to implement, studies have often reported insufficient control of false positives in realistic data (10). This is likely explained by subpopulation structure beyond the global ancestry proportions or excessive differences in ancestry composition between case and control cohorts. Therefore, we expect that the search for optimal approaches to control for false positives for multiancestry/admixed cohorts will remain an active research area in statistical genetics.

4.5.2. Local ancestry adjustment

A less recognized approach to control for confounders is to adjust for local ancestry: Qin et al. (77) proposed a local ancestry PC approach to control for population stratification. Directly using local ancestry as a covariate to control for confounding was proposed soon thereafter (78). By conditioning on local ancestries in GWAS analysis, genetic markers tagged to causal variants by admixture LD will no longer exhibit significant spurious associations (Figure 3c).

Extensive theoretical and simulation work has demonstrated that global ancestry adjustment is typically required for controlling population stratification in admixed populations, given that phenotypes may vary with global ancestry proportions, whereas local ancestry adjustment can be helpful if the admixture LD (Figure 2d) is the main concern, with the cost of potential loss in statistical power (67). The effect of adjusting for local ancestry is especially pronounced on variants whose frequencies greatly diverge across ancestries. Another simulation demonstrated that adjusting for local ancestry can increase statistical power when admixture LD is in the opposite direction of LD, while decreasing statistical power when admixture LD is in the same direction as LD due to overcorrection (79).

4.6. Genetic Relationship Matrices and Mixed-Effect Models

As an alternative approach for GWAS, mixed-effect model approaches have gained popularity in the last decade due to their versatility. Compared to fixed-effect models, mixed-effect models can effectively control for population stratification, model sample relatedness, aggregate small effect sizes to fill the gap of missing heritability (80) [even under the context of admixture (81)], and sometimes boost statistical power by absorbing phenotypical variance attributed to relatedness. In this section, we discuss the usage of GRMs in the context of multiancestry/admixed populations.

4.6.1. Controlling for population stratification

As an alternative to PCs, population stratification can be controlled by using a GRM (30). GRMs are computed using sample-wise correlations of LD-pruned markers given by $\Phi = GG^{T}/p$, where the genotype matrix $G$ is standardized (mean = 0, variance = 1) and $p$ is the number of markers. For a homogeneous population, two individuals should have $\Phi_{ij} \approx 0$, indicating no relatedness. However, for multiancestry cohorts or admixed cohorts, two individuals from the same population typically look genetically more similar than two individuals from different populations due to population-specific allele frequency spectra, which is reflected in the estimated GRM: $\Phi_{i_{\mathrm{pop1}},\, j_{\mathrm{pop1}}} > 0$ and $\Phi_{i_{\mathrm{pop1}},\, j_{\mathrm{pop2}}} < 0$. Including a GRM in a (generalized) linear mixed model can thus account for population-level relatedness.
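The sign pattern described above can be sketched numerically. The example below is a minimal simulation under assumed, hypothetical settings (two unrelated populations of 50 individuals, 2,000 independent markers, and made-up allele frequency differences); it computes the GRM as the standardized genotype matrix times its transpose, divided by the number of markers.

```python
import numpy as np

rng = np.random.default_rng(1)
n_per_pop, p = 50, 2000

# Hypothetical population-specific allele frequencies for independent markers.
f1 = rng.uniform(0.1, 0.9, size=p)
f2 = np.clip(f1 + rng.normal(0.0, 0.15, size=p), 0.05, 0.95)

G1 = rng.binomial(2, f1, size=(n_per_pop, p))
G2 = rng.binomial(2, f2, size=(n_per_pop, p))
G = np.vstack([G1, G2]).astype(float)

# Drop monomorphic markers, then standardize each marker (mean 0, variance 1).
G = G[:, G.std(axis=0) > 0]
Z = (G - G.mean(axis=0)) / G.std(axis=0)

# GRM: sample-by-sample matrix, Z Z^T / (number of markers).
grm = Z @ Z.T / Z.shape[1]

off_diag = ~np.eye(n_per_pop, dtype=bool)
within = grm[:n_per_pop, :n_per_pop][off_diag].mean()  # pairs within population 1
between = grm[:n_per_pop, n_per_pop:].mean()           # pairs across populations

print(f"mean within-population entry:  {within:.3f}")
print(f"mean between-population entry: {between:.3f}")
```

Even though no simulated individuals are related, within-population entries average above zero and between-population entries below zero, which is the structure a mixed model can absorb.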

Compared to PCs, GRMs avoid the somewhat arbitrary choice of how many PCs to include and are robust under population stratification. The relative performance of PCA versus GRM approaches remains debated, but more evidence has suggested that GRMs are generally more efficient (82–85). As an aside, PCs are obtained by performing an eigen-decomposition of the GRM (82); therefore, conditioning on top PCs somewhat approximates the full GRM. The downside of the mixed-effect model is its high computational cost: Fitting a linear/logistic mixed model is significantly more computationally expensive than standard GWAS. Luckily, recent methodological advancements, such as SAIGE and GMMAT, have optimized the computation and made mixed-model GWAS possible on biobank-scale cohorts (86, 87).
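The link between PCs and the GRM can be checked directly. The short sketch below, on simulated genotypes with arbitrary (hypothetical) dimensions, compares the eigen-decomposition of the GRM with the singular value decomposition of the standardized genotype matrix, which is one common way to compute sample PCs.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 40, 500  # hypothetical sample and marker counts

G = rng.binomial(2, 0.3, size=(n, p)).astype(float)
G = G[:, G.std(axis=0) > 0]                 # drop monomorphic markers
Z = (G - G.mean(axis=0)) / G.std(axis=0)    # standardize each marker
p_eff = Z.shape[1]

# Route 1: eigen-decomposition of the GRM.
grm = Z @ Z.T / p_eff
eigvals, eigvecs = np.linalg.eigh(grm)      # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Route 2: SVD of the standardized genotype matrix.
U, s, _ = np.linalg.svd(Z, full_matrices=False)

top_pc_from_grm = eigvecs[:, 0]
top_pc_from_svd = U[:, 0]

# Eigenvalues of the GRM equal squared singular values divided by p;
# eigenvectors agree up to an arbitrary sign flip.
print(np.allclose(eigvals[:5], (s ** 2 / p_eff)[:5]))
print(abs(top_pc_from_grm @ top_pc_from_svd))
```

Keeping only the leading eigenvectors as fixed-effect covariates therefore uses a low-rank summary of the same matrix that a mixed model uses in full.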

4.6.2. Modeling sample relatedness

Mixed-effect models were initially applied in animal and plant breeding, where the kinship matrix can be easily obtained from a known pedigree (88, 89). The intuition is that if two individuals share 50% of their genetic material, then half of their phenotypic correlation is attributed to genetic similarity. However, pedigree-based kinship specification is often infeasible in large collections of human samples, given that family information is typically hard to collect or unknown and there can be cryptic relatedness. Thus, a common practice is to infer the kinship matrix from the genotype matrix. However, inferring the kinship matrix is not straightforward in multiancestry and admixed populations due to the entangled population structure and family structure. Using a homogeneous GRM estimator, $\Phi = GG^{T}/p$, can therefore result in biases in sample relatedness (i.e., samples from the same population can be incorrectly inferred to be relatives).

In 2010, a method called KING-robust was developed for estimating sample relatedness among multiancestry cohorts (90), but the method did not explicitly consider admixture. Subsequently, Thornton et al. (91) proposed a method called REAP that effectively takes population structure into account while producing unbiased estimates of family structure for admixed populations. This method helped find cryptic relatives in large biobanks that had not yet been reported. The downside is that it requires ancestry-specific allele frequencies as input and a good estimation of parental ancestry composition, which limits its practical usage. To overcome this potential issue, Conomos et al. (92) developed PC-Relate, a method that can accurately estimate admixed sample relatedness using PC-based approaches. The combination of PC-Air (described in Section 3.1) and PC-Relate, implemented in the R package GENESIS, is conceptually appealing for its application to multiancestry/admixed GWAS: Population structure and family structure can be disentangled so that PCs control population structure and the kinship matrix controls family structure (35, 92). However, simulations found that in practice GENESIS produces p-values very similar to GMMAT (93), where standard GRM is used as a random-effect term.

4.7. Jointly Modeling Local Ancestry and Genetic Markers

Multiple researchers have argued that GWAS can provide better statistical power for admixed populations than it can for homogeneous populations (67, 94). This is likely due to the fact that phenotypical variance is larger within admixed populations than within the parental populations, as well as due to MAF divergence across populations. For instance, due to genetic drift, a variant may exist at extremely different allele frequencies or even be fixed in two ancestral populations. Performing GWAS on homogeneous populations thus provides little statistical power at this locus. However, larger phenotypic variance and moderate allele frequencies among admixed populations may provide better statistical power (94).
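A short worked example may make the allele frequency argument concrete. It rests on simplifying assumptions that are ours rather than the source's: Hardy–Weinberg proportions, an additive effect $\beta$ shared across ancestries, and a hypothetical variant with frequency $f_A = 0.01$ in one source population and $f_B = 0.99$ in the other. The phenotypic variance contributed by the variant is

$$
\mathrm{Var}(\beta g) = 2f(1-f)\,\beta^{2}.
$$

In either source population, $2f(1-f) = 2(0.01)(0.99) \approx 0.02$. In a population with equal contributions from both sources, the frequency is $f = 0.5\,f_A + 0.5\,f_B = 0.5$, so $2f(1-f) = 0.5$, roughly 25 times larger. Under these assumptions the same effect size explains far more variance in the admixed cohort, which is the intuition behind the potential gain in power.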

Several efforts have been made to jointly model local ancestry and genetic markers. Pasaniuc et al. (61) developed a 1-degree-of-freedom model that combines case-only admixture mapping with genetic marker association testing and found 8% improved power. Shriner et al. (95) developed BMIX, which leverages the reduced multiple-testing burden of admixture mapping by sequentially performing admixture mapping and association mapping. Skotte et al. (96) developed a method to generate ancestry-specific effect size estimates in admixed populations by leveraging ancestry-specific allele frequencies in GWAS. Atkinson et al. (49) modeled ancestry-specific genotype counts with a logistic or linear regression conditioning on local ancestry, outputting ancestry-specific effect size estimates, and found improved discovery power when there is cross-ancestry effect size heterogeneity.
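To illustrate the general idea of ancestry-specific effect size estimation, the sketch below fits a linear regression on ancestry-specific allele counts while conditioning on local ancestry. It is written in the spirit of the approach described for Atkinson et al. (49) but is not their software or exact model: the two-way admixture setup, the allele frequencies, the effect sizes, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20000
beta_a, beta_b = 0.5, 0.1   # hypothetical ancestry-specific effect sizes
freq_a, freq_b = 0.3, 0.5   # hypothetical ancestry-specific allele frequencies

# Local ancestry of each of the two haplotypes at one locus (1 = ancestry A, 0 = B).
anc = rng.binomial(1, 0.5, size=(n, 2))
# Alleles are drawn with the frequency of the haplotype's ancestry.
allele = rng.binomial(1, np.where(anc == 1, freq_a, freq_b))

la = anc.sum(axis=1)                    # copies of ancestry A at the locus (0, 1, 2)
x_a = (allele * anc).sum(axis=1)        # risk alleles carried on ancestry-A haplotypes
x_b = (allele * (1 - anc)).sum(axis=1)  # risk alleles carried on ancestry-B haplotypes

y = beta_a * x_a + beta_b * x_b + rng.normal(size=n)

# Linear model: intercept + local ancestry + ancestry-specific allele counts.
X = np.column_stack([np.ones(n), la, x_a, x_b])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
est_a, est_b = coef[2], coef[3]

print(f"estimated effect on ancestry-A background: {est_a:.3f}")
print(f"estimated effect on ancestry-B background: {est_b:.3f}")
```

A standard GWAS on the total allele count would instead return a single blended estimate, which is why this kind of decomposition can help when effect sizes differ by ancestry.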

Overall, there are numerous well-performing solutions to control for population stratification and to increase statistical power such that admixed individuals should not be excluded from GWAS efforts. By including more diverse and historically understudied populations in gene discovery efforts, we can find novel and meaningful associations relevant to individuals of all ancestries but that may have been undiscoverable in previously studied populations.

4.8. Effect Size Heterogeneity

Marginal effect sizes, the effect sizes estimated from GWAS, do not always reflect the causal effect size. This is because the vast majority of GWAS loci are not themselves causal but rather pinpoint loci of interest by tagging nearby causal variants via LD. In homogeneous populations, the marginal effect size is the product of the LD matrix and causal effect size (when the genotype matrix and phenotypes are standardized). However, in admixed populations, the marginal effect size becomes hard to quantify, since both LD patterns and causal effect sizes can vary across ancestries. Nevertheless, a good understanding of the extent of allelic effect heterogeneity is essential for many downstream analyses such as PRS and meta-analysis.
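For the homogeneous case, the relationship between marginal and causal effects can be written out explicitly. The notation below is assumed for illustration: $X$ is the $n \times p$ standardized genotype matrix, $y = X\beta + \varepsilon$ is the standardized phenotype with causal effects $\beta$, and $R = X^{T}X/n$ is the LD (correlation) matrix.

$$
\hat{\beta}^{\,\mathrm{marg}}_{j} = \frac{1}{n}\,x_{j}^{T}y,
\qquad
E\!\left[\hat{\beta}^{\,\mathrm{marg}}\right] = \frac{1}{n}X^{T}X\,\beta = R\,\beta,
\qquad
\beta^{\,\mathrm{marg}}_{j} = \sum_{k=1}^{p} r_{jk}\,\beta_{k}.
$$

A non-causal variant ($\beta_j = 0$) therefore still shows a nonzero marginal effect whenever it is correlated ($r_{jk} \neq 0$) with a causal variant $k$. In an admixed cohort, both $R$ and $\beta$ may differ by ancestry background, so no single product of this form summarizes the marginal effect.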

Multiple studies have demonstrated that marginal effect sizes can be different across ancestries for complex traits (97, 98). Several studies have also attempted to infer the extent of causal variant heterogeneity across populations and admixture: Shi et al. (99) introduced S-LDXR to quantify the correlation of effect sizes across East Asian and European populations and found the average trans-ancestry correlation is 0.85, implying some degree of causal effect size heterogeneity across populations. Patel et al. (100) found allelic heterogeneity among tagging variants between European Americans and European genomic regions of African Americans (controlled LD difference), which implies that causal allelic heterogeneity can be attributed to interactions. More recent work indicates that causal effect size correlations for common variants across local ancestry tracts within admixed cohorts are generally larger than are observed for cross-ancestry cohorts, likely due to the absence of gene–environment interactions (G×E) (101). Since admixed populations have multiple ancestry components within the same individual, they offer unique opportunities for assessing genetic effect sizes while controlling for the environment.

4.9. Testing for Interactions

Gene–gene interaction (G×G, aka epistasis) and G×E can play a role in phenotype formation and can introduce effect size heterogeneity among cross-ancestry or admixed cohorts (100). While not widely used, methods have been proposed to leverage the unique genetic makeup of admixed populations to test such interactions: Park et al. (102) proposed to test G×E by leveraging degrees of admixture. They posited that admixture proportions can be viewed as a proxy for environmental factors, such that the product of the testing marker and admixture proportion can be viewed as a G×E term. Aschard et al. (103) proposed testing G×G using local ancestries instead of genetic markers, thus reducing the number of required tests to $5 \times 10^{5}$, thereby reducing both the computational burden and the severity of the Bonferroni correction.
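The product-term idea can be sketched as an ordinary regression with an interaction covariate. The simulation below is a toy illustration only, not the method of Park et al. (102): the admixture proportion `q` simply plays the role of the environmental proxy, and the effect sizes, allele frequency, and sample size are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20000

q = rng.uniform(0.0, 1.0, size=n)   # admixture proportion (proxy for environment)
g = rng.binomial(2, 0.3, size=n)    # genotype at the tested marker

# Hypothetical truth: main effects plus a marker-by-admixture interaction of 0.4.
y = 0.2 * g + 0.5 * q + 0.4 * g * q + rng.normal(size=n)

# Model: intercept + marker + admixture proportion + (marker x admixture proportion).
X = np.column_stack([np.ones(n), g, q, g * q])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
interaction = coef[3]

print(f"estimated interaction term: {interaction:.3f}")
```

Testing whether the last coefficient differs from zero is the interaction test in this simplified setting. A real analysis would also need to account for the fact that admixture proportion can correlate with true genetic background, not only with environment.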

4.10. Rare Variant Association Testing

Rare variant association testing has been largely unexplored among admixed populations. Like in GWAS, one should pay extra attention to population stratification to avoid spurious associations. Notably, it has been demonstrated that rare variants are more subject to population stratification than common variants since rare variants are typically younger and more ancestry specific (104, 105) and PCs that are constructed using common variants are often insufficient (104, 106, 107). However, systematic theoretical and simulation work is still needed to evaluate the performance of rare variant association tests among admixed populations.

5. DOWNSTREAM ANALYSIS AFTER GENOME-WIDE ASSOCIATION STUDIES

Following GWAS, subsequent analyses are often conducted to summarize results genome wide (e.g., conducting gene set enrichment analyses or constructing PRSs), compare or combine results from the study at hand with those of other studies (e.g., meta-analysis), or localize GWAS peaks down to putative causal variants (e.g., in statistical fine-mapping). Following up on GWAS results helps provide more concrete biological interpretations of association signals and assists in the identification of patients at elevated risk of developing complex diseases.

5.1. Meta-Analysis

To combine results from GWAS runs on multiple cohorts or populations, one can apply meta-analysis to aggregate GWAS summary statistics, which benefits effect size estimates and statistical power by effectively increasing the total sample size. Meta-analysis is often followed by other techniques such as fine-mapping (108) and PRS (109) and has been shown to improve their performance in some situations.

One commonly used meta-analysis technique is the inverse-variance-weighted (IVW) fixed-effect model, where one assumes constant effect size across cohorts and weights them under the assumption that better powered GWAS are more reliable and have smaller variance. Cross-ancestry meta-analysis can be complicated by factors like causal/marginal effect size heterogeneity and LD and allele frequency differences, such that an IVW estimator may not be appropriate (52). Several methods have been designed for cross-ancestry meta-analysis that allow for allelic heterogeneity: MANTRA models effect size heterogeneity based on population distance using a Bayesian partition model (110, 111). More specifically, cohorts are discretely clustered based on genetic distance; samples within the same cluster have smaller effect size heterogeneity, whereas samples in different clusters can have larger effect size heterogeneity. MR-MEGA models effect size heterogeneity as a continuum of genetic variation, and it has a smaller computational cost than MANTRA. However, the approach can lose power for admixed populations due to the correlation between allelic effect heterogeneity and admixture proportion (112). More recently, Turley et al. (113) developed the MAMA (Multi-Ancestry Meta-Analysis) method, which models allele frequency and LD differences across populations and accounts for heterogeneity of marginal effect sizes across GWAS to produce unbiased summary statistics estimates. These methods were all designed with cross-ancestry meta-analyses in mind; however, a method specifically for admixed groups does not yet exist.
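For reference, the IVW fixed-effect estimator for a single variant can be sketched in a few lines. The function name `ivw_fixed_effect` and the input values are hypothetical; Cochran's Q is included as a standard heterogeneity statistic, since heterogeneity is the main concern raised above for cross-ancestry settings.

```python
import numpy as np

def ivw_fixed_effect(betas, ses):
    """Inverse-variance-weighted fixed-effect meta-analysis for one variant.

    betas, ses: per-cohort effect estimates and their standard errors.
    Returns the pooled estimate, its standard error, and Cochran's Q.
    """
    betas = np.asarray(betas, dtype=float)
    ses = np.asarray(ses, dtype=float)
    weights = 1.0 / ses ** 2                        # precise cohorts get more weight
    beta = np.sum(weights * betas) / np.sum(weights)
    se = np.sqrt(1.0 / np.sum(weights))
    q_stat = np.sum(weights * (betas - beta) ** 2)  # large Q suggests heterogeneity
    return beta, se, q_stat

# Two hypothetical cohorts: the first is better powered (smaller standard error).
beta_meta, se_meta, q_stat = ivw_fixed_effect([0.10, 0.20], [0.05, 0.10])
print(beta_meta, se_meta, q_stat)
```

The pooled estimate sits closer to the better-powered cohort, and the pooled standard error is smaller than either input, which reflects the gain in effective sample size. The estimator assumes one shared effect size, which is exactly the assumption that can fail across ancestries.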

5.2. Polygenic Scores

For complex traits, once summary statistics from GWAS are obtained, one can use these results in aggregate to predict a patient’s risk of developing a disease due to their inherited genetic factors. This is accomplished by summing over all risk alleles weighted by their effect sizes as estimated from GWAS: $\mathrm{PGS}_i = \sum_{j=1}^{p} \hat{\beta}_j g_{ij}$, where $g_{ij}$ is the risk allele count of individual $i$ at variant $j$ and $\hat{\beta}_j$ is the estimated effect size of that variant. This allows for the prospective identification of individuals at elevated risk of diseases, potentially allowing for earlier screening or interventions (114). Although sound progress has been made in the development of PGS models, PGS have extremely poor transferability across ancestries (3, 4). PGS are typically constructed with European discovery data, as standard practice advises building the model on the largest discovery GWAS available, which typically is of European individuals given the current Eurocentric bias of genetic research (1, 4). For PGS built on such discovery data, prediction accuracy with standard methods drops by half for admixed Latin American individuals and is up to fivefold less predictive in individuals of African descent (4). This drop in prediction accuracy has been shown to follow genetic divergence between the discovery and target cohorts. Further, for PGS prediction on African American admixed populations, studies have found that the transferability (using European summary statistics) decreases with the increased global proportion of African ancestry (97, 115). This decreased prediction accuracy across populations is thought to be due, on the genetic side, to differences in the patterns of LD, minor allele frequencies, and genetic architecture of traits across ancestries (4, 97). Differential phenotyping, environments, and effect size heterogeneity due to G×E can also contribute to the reduced transferability of PGS.
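The weighted sum that defines a PGS is a single matrix-vector product. The toy genotypes and effect sizes below are made up for illustration.

```python
import numpy as np

# Rows are individuals, columns are variants; entries are risk allele counts (0, 1, 2).
genotypes = np.array([[0, 1, 2],
                      [2, 0, 1]])
# Hypothetical GWAS effect size estimates, one per variant.
beta_hat = np.array([0.2, -0.1, 0.05])

# PGS for each individual: sum over variants of (effect size x allele count).
pgs = genotypes @ beta_hat
print(pgs)
```

Because the weights come from the discovery GWAS, any mismatch in LD, allele frequency, or effect size between the discovery and target populations carries straight through to the score.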

The most commonly used PGS approach is pruning and thresholding, where we perform LD clumping first to remove variants that are in high LD with each other, and then provide a threshold for variants to include in the model based on their GWAS p-values. More sophisticated PGS approaches that attempt to select or weight variants in a more principled way include effect size regularization methods such as Lasso regression and Bayesian approaches directly considering LD, such as LDpred and PRS-CS (116, 117). For a more comprehensive list of PGS software, we recommend the review by Wang et al. (118).
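The pruning-and-thresholding selection step can be sketched as a greedy loop. This is a simplified illustration, not the procedure of any particular software: the function name `clump_and_threshold`, the thresholds, and the toy inputs are hypothetical, and real implementations usually also restrict clumping to a physical distance window.

```python
import numpy as np

def clump_and_threshold(pvals, ld_r2, p_threshold=5e-3, r2_threshold=0.1):
    """Greedy clumping followed by p-value thresholding.

    pvals: GWAS p-values, one per variant.
    ld_r2: square matrix of pairwise r^2 between variants.
    Returns indices of the variants kept for the score.
    """
    pvals = np.asarray(pvals, dtype=float)
    ld_r2 = np.asarray(ld_r2, dtype=float)
    removed = np.zeros(len(pvals), dtype=bool)
    selected = []
    for idx in np.argsort(pvals):          # most significant variant first
        if pvals[idx] > p_threshold:
            break                          # all remaining variants fail the threshold
        if removed[idx]:
            continue                       # already tagged by a stronger variant
        selected.append(int(idx))
        removed |= ld_r2[idx] > r2_threshold  # drop variants in LD with this index variant
    return selected

# Toy example: variants 0 and 1 are in high LD; variant 3 is not significant.
pvals = [1e-8, 1e-6, 1e-4, 0.2]
ld_r2 = np.array([[1.0, 0.8, 0.0, 0.0],
                  [0.8, 1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0, 1.0]])
selected = clump_and_threshold(pvals, ld_r2)
print(selected)
```

The LD matrix is an input here, which highlights a practical issue for admixed cohorts: the choice of LD reference determines which variants are treated as redundant.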

In the multiancestry space, meta-analyzing summary statistics from multiple populations can be beneficial in generating more portable summary statistics (109). A new method, PRS-CSx, shows increases in prediction accuracy and transferability across populations by placing a shared continuous shrinkage prior distribution to model correlated effect sizes from multiple discovery summary statistics (119). This method may also lead to improved performance in an admixed setting, although this has yet to be thoroughly evaluated. PolyPred improves PGS transferability across populations by incorporating causal effects through fine-mapping and shows increased accuracy in South Asians and Africans (120). Using population-specific effect sizes combined with LAI can also improve PRS performance for admixed populations (97, 121) by more appropriately weighting variants’ contribution to the phenotype. The design of PRS that perform better for diverse and admixed populations is an area of active research and innovation.

5.3. Fine-Mapping

After running GWAS, a common follow-up step is to statistically fine-map top loci to identify the putative causal variant(s) residing in the peak region or at least reduce the credible set of variants going into functional validation. Multiple studies have demonstrated that applying fine-mapping across diverse populations decreases the credible set by leveraging different LD patterns across populations (122–125). Several methods have been developed for trans-ancestry fine-mapping; for example, Kichaev et al. (126) developed PAINTOR, which allows for causal effect size heterogeneity across ancestries and integrates functional annotations to prioritize the most likely causal variants. More recently, MsCAVIAR has been developed to integrate multiple cross-population studies under the random effect framework, using summary statistics and LD as inputs (127). For a detailed discussion on fine-mapping techniques, we refer readers to an excellent review paper by Schaid et al. (111).

On the admixed front, as admixture LD can convolute GWAS signals, summary statistics that are obtained by conditioning on local ancestries may help fine-mapping in admixed populations (Figure 2d). However, current research in the fine-mapping space on admixed populations specifically is still limited.

6. A DEMOGRAPHIC PERSPECTIVE OF ADMIXED POPULATIONS

In addition to the uses in translational genetics described above, the ability to paint local ancestry within admixed populations provides a window into the migratory, evolutionary, and demographic processes that have shaped the patterns of genetic variation observed in modern-day populations.

6.1. Gene Flow

In 2012, Patterson et al. introduced a suite of F-statistics (e.g., $F_2$, $F_3$, $F_4$) that allow for an examination of population history to detect past instances of admixture (128). To test if population C is admixed between populations A and B, one can compute $F_3(C; A, B) = E[(c' - a')(c' - b')]$, where $a'$, $b'$, and $c'$ are allele frequencies of populations A, B, and C, respectively. The value of $F_3(C; A, B)$ is expected to be negative if population C is indeed admixed from populations A and B. Intuitively, this is testing if the allele frequencies of population C tend to lie between those of populations A and B; if so, we have confidence that population C is mixed from populations A and B. Deeper discussion of admixed populations in the population genetic perspective is beyond the scope of this review, but we refer readers to a review paper (129).
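The sign behavior of the statistic can be sketched with a naive plug-in estimator. The example below is a simplification under stated assumptions: it uses made-up population allele frequencies directly, treats population C as an exact 50/50 mixture with no subsequent drift, and omits the finite-sample corrections that practical estimators apply. The function name `f3` is our own.

```python
import numpy as np

def f3(c, a, b):
    """Naive plug-in F3(C; A, B): mean over markers of (c - a) * (c - b)."""
    c, a, b = (np.asarray(x, dtype=float) for x in (c, a, b))
    return float(np.mean((c - a) * (c - b)))

rng = np.random.default_rng(5)
n_markers = 10000

# Hypothetical allele frequencies in two source populations.
a = rng.uniform(0.05, 0.95, size=n_markers)
b = rng.uniform(0.05, 0.95, size=n_markers)
# Population C modeled as an exact 50/50 mixture of A and B.
c = 0.5 * a + 0.5 * b

f3_admixed = f3(c, a, b)    # C as the target: frequencies lie between A and B
f3_unadmixed = f3(a, b, c)  # A as the target: A is not a mixture of B and C

print(f"F3(C; A, B) = {f3_admixed:.4f}")
print(f"F3(A; B, C) = {f3_unadmixed:.4f}")
```

With C exactly midway between A and B, each term equals $-(a' - b')^2/4$, so the statistic is negative. Drift in C after admixture adds a positive contribution, which is why a negative value is evidence for admixture while a positive value does not rule it out.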

6.2. Demographic Inference From Local Ancestry

Before making its way into statistical genetics, LAI was born in the field of population genetics and biological anthropology. Using features of the genomic ancestry tracts present in modern admixed populations, one can trace the timing of past admixture events and the origin of the component ancestries. For example, Moreno-Estrada et al. (40) performed LAI on a Caribbean cohort and developed a method to conduct ancestry-specific PCA to project only the component ancestries of interest onto PC space of the relevant continental reference panel. This allows for finer-scale investigation of the geographic regions from which ancestry components of admixed modern populations are derived. The primary novelty of their method as an extension to classic PCA is its ability to handle the large amounts of missing data introduced by restricting to only local ancestry segments from one ancestry background.

The length of tracts can also be used for inferring the timing of admixture events (130–132). As recombination can be thought of as a molecular clock, the length of haplotype tracts reflects how many generations have occurred since the original pulse of admixture, as each meiotic episode will chop up tracts. As such, modern day local ancestry tract lengths can be used to infer the demographic history of the population, as well as the ancestry proportions contributed from various ancestry populations at different points in the past, and whether admixture occurred in pulses or continuously (130). Another conceptually different approach to timing admixture events is to leverage the decaying nature of admixture LD over generations, as implemented in the package ALDER (41).
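The molecular clock intuition can be illustrated with a deliberately crude model. The sketch below assumes, purely for illustration and not as the model used by the cited methods, that tract lengths in Morgans are exponentially distributed with a rate equal to the number of generations since a single admixture pulse. It ignores ancestry proportions, chromosome ends, and LAI error, all of which matter in practice.

```python
import numpy as np

rng = np.random.default_rng(6)

true_generations = 8  # hypothetical time since a single admixture pulse

# Simplifying assumption: tract lengths (in Morgans) ~ Exponential(rate = generations),
# i.e., mean tract length is 1 / generations.
tract_lengths = rng.exponential(scale=1.0 / true_generations, size=5000)

# Under this toy model, the estimate of the rate is 1 / (mean tract length).
estimated_generations = 1.0 / tract_lengths.mean()

print(f"mean tract length (cM): {100 * tract_lengths.mean():.1f}")
print(f"estimated generations since admixture: {estimated_generations:.1f}")
```

Shorter average tracts imply an older admixture event. Distinguishing a single pulse from continuous gene flow requires looking at the full shape of the tract length distribution rather than just its mean.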

Further uses of local ancestry in the population and evolutionary genetics domains include calculating ancestry-specific allele frequencies (37, 39), constructing recombination maps based on LAI switch point locations (38), and characterizing natural selection based on regions with elevated local ancestries (133). Local ancestry is thus informative about historical demographic and evolutionary processes shaping patterns of modern-day genetic variation, which can influence medically relevant traits. However, the population genetic and statistical/medical genomics fields have historically remained fairly distinct.

7. DISCUSSION AND PERSPECTIVE

Historically, admixed individuals have been systematically removed from large-scale genomics studies due to concerns that they would bias results via population stratification. As we have discussed throughout this review, there are numerous options and software packages allowing for proper handling of admixture in various statistical genetics settings such that there is typically no longer a need to exclude admixed populations from study. Methodological development for the proper consideration of admixture is also an active ongoing area of study, and in the future we anticipate many additional statistical, software, and genetic reference panel resources. Such tools will prove to be vital, as large-scale datasets and biobanks are increasingly recruiting more representative samples, and the global population continues to become more mixed (134).

It is worth highlighting that diverse cohorts also offer the opportunity to expand and accelerate gene discovery findings relevant for individuals of all ancestries. As the majority of causal variants appear to be shared across populations (123, 124, 135–140), the inclusion of previously understudied populations, with both new variants and a different frequency spectrum for shared variants, opens the door for the identification of new genome-wide-significant loci. Such loci are meaningful to everyone’s health, but they may have been statistically undiscoverable in Europe although discoverable elsewhere in the world (52, 94, 98). Expanding the diversity of research participants also helps gene discovery efforts in other ways, as we have discussed here, notably through fine-mapping improvements from different linkage patterns across populations, as well as by offering deeper insight into gene–gene/gene–environment interactions and past historical and evolutionary events.

In general, many of the same considerations for the study of mixed-ancestry datasets pertain to the study of admixed samples. Such features include the need for informed choices of QC metrics specific to the population(s) at hand (Section 4.1), selection of the most appropriate reference panel for phasing/imputation (Section 4.2), and consideration of LD, which differs across genetic backgrounds. In the admixed setting, a unique type of LD is also present, admixture LD (Section 4.3), which can induce long-range associations in gene discovery settings if not controlled for. Another genetic feature exclusive to admixed samples is the ability to account for different genetic backgrounds within samples via local ancestry deconvolution, which is a core principle underlying many of the statistical methods designed with admixed populations in mind, chronologically beginning with admixture mapping (Section 4.4), and subsequently integrated into GWAS (Section 4.7). As in multiancestry GWAS, population structure can induce inflated type 1 errors if not addressed properly (Section 4.5). The issue can be largely fixed by including PCs or global admixture proportions as covariates, as well as local ancestry, particularly if the primary concern is long-range admixture LD. Methods for accounting for familial relatedness with GRMs have also been extended to mixed cohorts and admixed settings (Section 4.6), as have tests for gene–gene and gene–environment interactions (Section 4.9). Admixed populations offer researchers the rare ability to control for the external environment in individuals with multiple ancestries, thus allowing for deconvolution of genetic versus environmental impacts on estimated GWAS effect sizes. Several areas of study do not have admixture-specific tools, but have seen the recent pioneering of novel methods for multiancestry studies that are relevant for admixed populations. These areas include many of the post-GWAS activities: meta-analysis (Section 5.1), construction of more portable PRS (Section 5.2), and statistical fine-mapping (Section 5.3). By painting local ancestry tracts, one also has a window into the history of recombination and demography for the population, elucidating past migration, evolutionary, and admixture events (Section 6). In sum, admixed populations are a growing proportion of the global population, motivating their timely incorporation into statistical genomics study, as well as providing a unique and invaluable lens to better understand the genomic architecture of complex traits. We hope that this review demystifies the strategies for the study of admixed populations and facilitates their inclusion in future genomics efforts.

ACKNOWLEDGMENTS

This manuscript was supported by K01MH121659 from the NIH/NIMH, the Caroline Wiess Law Fund for Research in Molecular Medicine, and the ARCO Foundation Young Teacher-Investigator Fund at Baylor College of Medicine.

Footnotes

DISCLOSURE STATEMENT

The authors are not aware of any affiliations, memberships, funding, or financial holdings that might be perceived as affecting the objectivity of this review.

LITERATURE CITED

  • 1.Sirugo G, Williams SM, Tishkoff SA. 2019. The missing diversity in human genetic studies. Cell 177(1):26–31 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Popejoy AB, Fullerton SM. 2016. Genomics is failing on diversity. Nature 538:161–64 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Martin AR, Gignoux CR, Walters RK, Wojcik GL, Neale BM, et al. 2017. Human demographic history impacts genetic risk prediction across diverse populations. Am. J. Hum. Genet 100(4):635–49 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Martin AR, Kanai M, Kamatani Y, Okada Y, Neale BM, Daly MJ. 2019. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat. Genet 51(4):584–91 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Berg JJ, Harpak A, Sinnott-Armstrong N, Joergensen AM, Mostafavi H, et al. 2019. Reduced signal for polygenic adaptation of height in UK Biobank. eLife 8:e39725. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Coram MA, Fang H, Candille SI, Assimes TL, Tang H. 2017. Leveraging multi-ethnic evidence for risk assessment of quantitative traits in minority populations. Am. J. Hum. Genet 101(2):218–26 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Huang H, Peloso GM, Howrigan D, Rakitsch B, Simon-Gabriel CJ, et al. 2016. Bootstrat: population informed bootstrapping for rare variant tests. bioRxiv 068999. 10.1101/068999 [DOI] [Google Scholar]
  • 8.Lander ES, Schork NJ. 1994. Genetic dissection of complex traits. Science 265(5181):2037–48 [DOI] [PubMed] [Google Scholar]
  • 9.Martin ER, Tunc I, Liu Z, Slifer SH, Beecham AH, Beecham GW. 2018. Properties of global-and local-ancestry adjustments in genetic association tests in admixed populations. Genet. Epidemiol 42(2):214–29 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Sohail M, Maier RM, Ganna A, Bloemendal A, Martin AR, et al. 2019. Polygenic adaptation on height is overestimated due to uncorrected stratification in genome-wide association studies. eLife 8:e39702. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Sul JH, Martin LS, Eskin E. 2018. Population structure in genetic studies: confounding factors and mixed models. PLOS Genet 14(12):e1007309. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Walters RK, Polimanti R, Johnson EC, McClintick JN, Adams MJ, et al. 2018. Transancestral GWAS of alcohol dependence reveals common genetic underpinnings with psychiatric disorders. Nat. Neurosci 21(12):1656–69 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.TOPMed (Trans-Omics Precis. Med.). 2021. TOPMed Whole Genome Sequencing Project—freeze 5b, phases 1 and 2. Tech. Rep., TOPMed, Natl. Heart Lung Blood Inst., Bethesda, MD, updated Oct. 28. https://topmed.nhlbi.nih.gov/topmed-whole-genome-sequencingproject-freeze-5b-phases-1-and-2 [Google Scholar]
  • 14.Rotimi C, Abayomi A, Abimiku A, Adabayeri VM, Adebamowo C, et al. 2014. Research capacity. Enabling the genomic revolution in Africa. Science 344(6190):1346–48 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Stevenson A, Akena D, Stroud RE, Atwoli L, Campbell MM, et al. 2019. Neuropsychiatric Genetics of African Populations-Psychosis (NeuroGAP-Psychosis): a case-control study protocol and GWAS in Ethiopia, Kenya, South Africa and Uganda. BMJ Open 9(2):e025469. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Bien SA, Wojcik GL, Hodonsky CJ, Gignoux CR, Cheng I, et al. 2019. The future of genomic studies must be globally representative: perspectives from PAGE. Annu. Rev. Genomics Hum. Genet 20:181–200 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Logue MW, Amstadter AB, Baker DG, Duncan L, Koenen KC, et al. 2015. The Psychiatric Genomics Consortium Posttraumatic Stress Disorder Workgroup: Posttraumatic stress disorder enters the age of large-scale genomic collaboration. Neuropsychopharmacology 40(10):2287–97 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Precis. Med. Initiat. (PMI) Work. Group. 2015. The Precision Medicine Initiative Cohort Program—building a research foundation for 21st century medicine. Tech. Rep., PMI Work. Group., Sept. 17. https://acd.od.nih.gov/documents/reports/DRAFT-PMI-WG-Report-9-112015-508.pdf [Google Scholar]
  • 19.Ramirez AH, Sulieman L, Schlueter DJ, Halvorson A, Qian J, et al. 2022. The All of Us Research Program: data quality, utility, and diversity. Patterns 3(8):100570. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Korunes KL, Goldberg A. 2021. Human genetic admixture. PLOS Genet 17(3):e1009374. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Nelis M, Esko T, Mägi R, Zimprich F, Toncheva D, et al. 2009. Genetic structure of Europeans: a view from the North-East. PLOS ONE 4(5):e5472. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Balding DJ, Nichols RA. 1995. A method for quantifying differentiation between populations at multi-allelic loci and its implications for investigating identity and paternity. Genetica 96(1–2):3–12 [DOI] [PubMed] [Google Scholar]
  • 23.Pfaff CL, Parra EJ, Bonilla C, Hiester K, McKeigue PM, et al. 2001. Population structure in admixed populations: effect of admixture dynamics on the pattern of linkage disequilibrium. Am. J. Hum. Genet 68(1):198–207 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Padhukasahasram B. 2014. Inferring ancestry from population genomic data and its applications. Front. Genet 5:204. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Pritchard JK, Stephens M, Donnelly P. 2000. Inference of population structure using multilocus genotype data. Genetics 155(2):945–59 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Alexander DH, Novembre J, Lange K. 2009. Fast model-based estimation of ancestry in unrelated individuals. Genome Res 19(9):1655–64 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Lawson DJ, van Dorp L, Falush D. 2018. A tutorial on how not to over-interpret STRUCTURE and ADMIXTURE bar plots. Nat. Commun 9:3258. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Alexander DH, Lange K. 2011. Enhancements to the ADMIXTURE algorithm for individual ancestry estimation. BMC Bioinform 12:246. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. 2006. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet 38(8):904–9 [DOI] [PubMed] [Google Scholar]
  • 30.Price AL, Zaitlen NA, Reich D, Patterson N. 2010. New approaches to population stratification in genome-wide association studies. Nat. Rev. Genet 11:459–63 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Patterson N, Price AL, Reich D. 2006. Population structure and eigenanalysis. PLOS Genet 2(12):2074–93 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.McVean G. 2009. A genealogical interpretation of principal components analysis. PLOS Genet 5(10):e1000686. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Zheng X, Weir BS. 2016. Eigenanalysis of SNP data with an identity by descent interpretation. Theor. Popul. Biol 107:65–76 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Novembre J, Johnson T, Bryc K, Kutalik Z, Boyko AR, et al. 2008. Genes mirror geography within Europe. Nature 456(7218):98–101 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Conomos MP, Miller MB, Thornton TA. 2015. Robust inference of population structure for ancestry prediction and correction of stratification in the presence of relatedness. Genet. Epidemiol 39(4):276–93 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Smith MW, O’Brien SJ. 2005. Mapping by admixture linkage disequilibrium: advances, limitations and guidelines. Nat. Rev. Genet 6(8):623–32 [DOI] [PubMed] [Google Scholar]
  • 37.Zhang QS, Browning BL, Browning SR. 2016. ASAFE: ancestry-specific allele frequency estimation. Bioinformatics 32(14):2227–29 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Wegmann D, Kessner DE, Veeramah KR, Mathias RA, Nicolae DL, et al. 2011. Recombination rates in admixed individuals identified by ancestry-based inference. Nat. Genet 43(9):847–53 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Gravel S, Zakharia F, Moreno-Estrada A, Byrnes JK, Muzzio M, et al. 2013. Reconstructing Native American migrations from whole-genome and whole-exome data. PLOS Genet 9(12):e1004023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Moreno-Estrada A, Gravel S, Zakharia F, McCauley JL, Byrnes JK, et al. 2013. Reconstructing the population genetic history of the Caribbean. PLOS Genet 9(11):e1003925. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Loh PR, Lipson M, Patterson N, Moorjani P, Pickrell JK, et al. 2013. Inferring admixture histories of human populations using linkage disequilibrium. Genetics 193(4):1233–54 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Shriner D. 2013. Overview of admixture mapping. Curr. Protoc. Hum. Genet 76:1.23.1–1.23.8 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Wu J, Liu Y, Zhao Y. 2021. Systematic review on local ancestor inference from a mathematical and algorithmic perspective. Front. Genet 12:639877. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Geza E, Mugo J, Mulder NJ, Wonkam A, Chimusa ER, Mazandu GK. 2019. A comprehensive survey of models for dissecting local ancestry deconvolution in human genome. Brief. Bioinform 20(5):1709–24 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Patterson N, Hattangadi N, Lane B, Lohmueller KE, Hafler DA, et al. 2004. Methods for high-density admixture mapping of disease genes. Am. J. Hum. Genet 74(5):979–1000 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Smith MW, Patterson N, Lautenberger JA, Truelove AL, McDonald GJ, et al. 2004. A high-density admixture map for disease gene discovery in African Americans. Am. J. Hum. Genet 74(5):1001–13 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Price AL, Tandon A, Patterson N, Barnes KC, Rafaels N, et al. 2009. Sensitive detection of chromosomal segments of distinct ancestry in admixed populations. PLOS Genet 5(6):e1000519. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Maples BK, Gravel S, Kenny EE, Bustamante CD. 2013. RFMix: a discriminative modeling approach for rapid and robust local-ancestry inference. Am. J. Hum. Genet 93(2):278–88 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Atkinson EG, Maihofer AX, Kanai M, Martin AR, Karczewski KJ, et al. 2021. Tractor uses local ancestry to enable the inclusion of admixed individuals in GWAS and to boost power. Nat. Genet 53(2):195–204 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Hilmarsson H, Kumar AS, Rastogi R, Bustamante CD, Montserrat M, Ioannidis AG. 2021. High resolution ancestry deconvolution for next generation genomic data. bioRxiv 2021.09.19.460980. 10.1101/2021.09.19.460980 [DOI] [Google Scholar]
  • 51.Uren C, Hoal EG, Möller M. 2020. Putting RFMix and ADMIXTURE to the test in a complex admixed population. BMC Genet 21:40. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Peterson RE, Kuchenbaecker K, Walters RK, Chen CY, Popejoy AB, et al. 2019. Genome-wide association studies in ancestrally diverse populations: opportunities, methods, pitfalls, and recommendations. Cell 179(3):589–603 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Rosenberg NA, Huang L, Jewett EM, Szpiech ZA, Jankovic I, Boehnke M. 2010. Genome-wide association studies in diverse populations. Nat. Rev. Genet 11(5):356–66 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Seldin MF, Pasaniuc B, Price AL. 2011. New approaches to disease mapping in admixed populations. Nat. Rev. Genet 12(8):523–28 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Sariya S, Lee JH, Mayeux R, Vardarajan BN, Reyes-Dumeyer D, et al. 2019. Rare variants imputation in admixed populations: comparison across reference panels and bioinformatics tools. Front. Genet 10:239. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56.Schurz H, Müller SJ, van Helden PD, Tromp G, Hoal EG, et al. 2019. Evaluating the accuracy of imputation methods in a five-way admixed population. Front. Genet 10:34. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.Das S, Forer L, Schönherr S, Sidore C, Locke AE, et al. 2016. Next-generation genotype imputation service and methods. Nat. Genet 48(10):1284–87 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.Taliun D, Harris DN, Kessler MD, Carlson J, Szpiech ZA, et al. 2021. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature 590(7845):290–99 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59.Kowalski MH, Qian H, Hou Z, Rosen JD, Tapia AL, et al. 2019. Use of >100,000 NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium whole genome sequences improves imputation quality and detection of rare variant associations in admixed African and Hispanic/Latino populations. PLOS Genet 15(12):e1008500. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60.O’Connell J, Yun T, Moreno M, Li H, Litterman N, et al. 2021. A population-specific reference panel for improved genotype imputation in African Americans. Commun. Biol 4:1269. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61.Pasaniuc B, Zaitlen N, Lettre G, Chen GK, Tandon A, et al. 2011. Enhanced statistical tests for GWAS in admixed populations: assessment using African Americans from CARe and a breast cancer consortium. PLOS Genet 7(4):e1001371. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62.Liu EY, Li M, Wang W, Li Y. 2013. MaCH-Admix: genotype imputation for admixed populations. Genet. Epidemiol 37(1):25–37 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63.Rybicki BA, Iyengar SK, Harris T, Liptak R, Elston RC, et al. 2002. Prospects of admixture linkage disequilibrium mapping in the African-American genome. Cytometry 47(1):63–65 [DOI] [PubMed] [Google Scholar]
  • 64.Zaidi A. 2021. Why does admixture create LD between unlinked loci? Arslan Zaidi Personal Blog, Oct. 9. https://www.arslanzaidi.com/post/why-does-admixture-create-ld-betweenunlinked-loci/ [Google Scholar]
  • 65.Reich D, Cargill M, Bolk S, Ireland J, Sabeti P, et al. 2001. Linkage disequilibrium in the human genome. Nature 411(6834):199–204 [DOI] [PubMed] [Google Scholar]
  • 66.Chakraborty R, Weiss KM. 1988. Admixture as a tool for finding linked genes and detecting that difference from allelic association between loci (linkage disequilibrium/genetic epidemiology). PNAS 85(23):9119–23 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67.Zhang J, Stram DO. 2014. The role of local ancestry adjustment in association studies using admixed populations. Genet. Epidemiol 38(6):502–15 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 68.Hoggart CJ, Shriver MD, Kittles RA, Clayton DG, McKeigue PM. 2004. Design and analysis of admixture mapping studies. Am. J. Hum. Genet 74(5):965–78 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69.Redden DT, Divers J, Vaughan LK, Tiwari HK, Beasley TM, et al. 2006. Regional admixture mapping and structured association testing: conceptual unification and an extensible general linear model. PLOS Genet 2(8):1254–64 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 70.Carty CL, Johnson NA, Hutter CM, Reiner AP, Peters U, et al. 2012. Genome-wide association study of body height in African Americans: The Women’s Health Initiative SNP Health Association Resource (SHARe). Hum. Mol. Genet 21(3):711–20 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71.Coram MA, Duan Q, Hoffmann TJ, Thornton T, Knowles JW, et al. 2013. Genome-wide characterization of shared and distinct genetic components that influence blood lipid levels in ethnically diverse human populations. Am. J. Hum. Genet 92(6):904–16 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 72.Reiner AP, Beleza S, Franceschini N, Auer PL, Robinson JG, et al. 2012. Genome-wide association and population genetic analysis of C-reactive protein in African American and Hispanic American women. Am. J. Hum. Genet 91(3):502–12 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 73.Grinde KE, Brown LA, Reiner AP, Thornton TA, Browning SR. 2019. Genome-wide significance thresholds for admixture mapping studies. Am. J. Hum. Genet 104(3):454–65 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 74.Gignoux CR, Torgerson DG, Pino-Yanes M, Uricchio LH, Galanter J, et al. 2019. An admixture mapping meta-analysis implicates genetic variation at 18q21 with asthma susceptibility in Latinos. J. Allergy Clin. Immunol 143(3):957–69 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 75.Shetty PB, Tang H, Feng T, Tayo B, Morrison AC, et al. 2015. Variants for HDL-C, LDL-C, and triglycerides identified from admixture mapping and fine-mapping analysis in African American families. Circ. Cardiovasc. Genet 8(1):106–13 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 76.Spear ML, Hu D, Pino-Yanes M, Huntsman S, Eng C, et al. 2019. A genome-wide association and admixture mapping study of bronchodilator drug response in African Americans with asthma. Pharmacogenom. J 19(3):249–59 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 77.Qin H, Morris N, Kang SJ, Li M, Tayo B, et al. 2010. Interrogating local population structure for fine mapping in genome-wide association studies. Bioinformatics 26(23):2961–68 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 78.Wang X, Zhu X, Qin H, Cooper RS, Ewens WJ, et al. 2011. Adjustment for local ancestry in genetic association analysis of admixed populations. Bioinformatics 27(5):670–77 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 79.Liu J, Lewinger JP, Gilliland FD, Gauderman WJ, Conti DV. 2013. Confounding and heterogeneity in genetic association studies with admixed populations. Am. J. Epidemiol 177(4):351–60 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 80.Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, et al. 2010. Common SNPs explain a large proportion of the heritability for human height. Nat. Genet 42(7):565–69 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 81.Zaitlen N, Pasaniuc B, Sankararaman S, Bhatia G, Zhang J, et al. 2014. Leveraging population admixture to characterize the heritability of complex traits. Nat. Genet 46(12):1356–62 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 82.Yao Y, Ochoa A. 2022. Limitations of principal components in quantitative genetic association models for human studies. bioRxiv 2022.03.25.485885. 10.1101/2022.03.25.485885 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 83.Zhang Y, Pan W. 2015. Principal component regression and linear mixed model in association analysis of structured samples: competitors or complements? Genet. Epidemiol 39(3):149–55 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 84.Wang K, Hu X, Peng Y. 2013. An analytical comparison of the principal component method and the mixed effects model for association studies in the presence of cryptic relatedness and population stratification. Hum. Hered 76(1):1–9 [DOI] [PubMed] [Google Scholar]
  • 85.Shin J, Lee C. 2015. A mixed model reduces spurious genetic associations produced by population stratification in genome-wide association studies. Genomics 105(4):191–96 [DOI] [PubMed] [Google Scholar]
  • 86.Zhou W, Nielsen JB, Fritsche LG, Dey R, Gabrielsen ME, et al. 2018. Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. Nat. Genet 50(9):1335–41 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 87.Chen H, Wang C, Conomos MP, Stilp AM, Li Z, et al. 2016. Control for population structure and relatedness for binary traits in genetic association studies via logistic mixed models. Am. J. Hum. Genet 98(4):653–66 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 88.Yu J, Pressoir G, Briggs WH, Bi IV, Yamasaki M, et al. 2006. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat. Genet 38(2):203–8 [DOI] [PubMed] [Google Scholar]
  • 89.Zhao K, Aranzana MJ, Kim S, Lister C, Shindo C, et al. 2007. An Arabidopsis example of association mapping in structured samples. PLOS Genet 3(1):71–82 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 90.Manichaikul A, Mychaleckyj JC, Rich SS, Daly K, Sale M, Chen WM. 2010. Robust relationship inference in genome-wide association studies. Bioinformatics 26(22):2867–73 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 91.Thornton T, Tang H, Hoffmann TJ, Ochs-Balcom HM, Caan BJ, Risch N. 2012. Estimating kinship in admixed populations. Am. J. Hum. Genet 91(1):122–38 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 92.Conomos MP, Reiner AP, Weir BS, Thornton TA. 2016. Model-free estimation of recent genetic relatedness. Am. J. Hum. Genet 98(1):127–48 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 93.Gogarten SM, Sofer T, Chen H, Yu C, Brody JA, et al. 2019. Genetic association testing using the GENESIS R/Bioconductor package. Bioinformatics 35(24):5346–48 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 94.Lin M, Park DS, Zaitlen NA, Henn BM, Gignoux CR. 2021. Admixed populations improve power for variant discovery and portability in genome-wide association studies. Front. Genet 12:673167. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 95.Shriner D, Adeyemo A, Rotimi CN. 2011. Joint ancestry and association testing in admixed individuals. PLOS Comput. Biol 7(12):e1002325. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 96.Skotte L, Jørsboe E, Korneliussen TS, Moltke I, Albrechtsen A. 2019. Ancestry-specific association mapping in admixed populations. Genet. Epidemiol 43(5):506–21 [DOI] [PubMed] [Google Scholar]
  • 97.Bitarello BD, Mathieson I. 2020. Polygenic scores for height in admixed populations. G3 10(11):4027–36 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 98.Wojcik GL, Graff M, Nishimura KK, Tao R, Haessler J, et al. 2019. Genetic analyses of diverse populations improves discovery for complex traits. Nature 570(7762):514–18 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 99.Shi H, Gazal S, Kanai M, Koch EM, Schoech AP, et al. 2021. Population-specific causal disease effect sizes in functionally important regions impacted by selection. Nat. Commun 12:1098. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 100.Patel RA, Musharoff SA, Spence JP, Pimentel H, Tcheandjieu C, et al. 2022. Genetic interactions drive heterogeneity in causal variant effect sizes for gene expression and complex traits. Am. J. Hum. Genet 109(7):1286–97 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 101.Hou K, Ding Y, Xu Z, Wu Y, Bhattacharya A, et al. 2022. Causal effects on complex traits are similar across segments of different continental ancestries within admixed individuals. medRxiv 2022.08.16.22278868. 10.1101/2022.08.16.22278868 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 102.Park DS, Eskin I, Kang EY, Gamazon ER, Eng C, et al. 2018. An ancestry-based approach for detecting interactions. Genet. Epidemiol 42(1):49–63 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 103.Aschard H, Gusev A, Brown R, Pasaniuc B. 2015. Leveraging local ancestry to detect gene-gene interactions in genome-wide data. BMC Genet 16:124. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 104.Mathieson I, McVean G. 2012. Differential confounding of rare and common variants in spatially structured populations. Nat. Genet 44(3):243–46 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 105.Mathieson I, McVean G. 2014. Demography and the age of rare variants. PLOS Genet 10(8):e1004528. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 106.O’Connor TD, Kiezun A, Bamshad M, Rich SS, Smith JD, et al. 2013. Fine-scale patterns of population stratification confound rare variant association tests. PLOS ONE 8(7):e65834. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 107.Zaidi AA, Mathieson I. 2020. Demographic history mediates the effect of stratification on polygenic scores. eLife 9:e61548. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 108.Cannon ME, Duan Q, Wu Y, Zeynalzadeh M, Xu Z, et al. 2017. Trans-ancestry fine mapping and molecular assays identify regulatory variants at the ANGPTL8 HDL-C GWAS locus. G3 7(9):3217–27 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 109.Grinde KE, Qi Q, Thornton TA, Liu S, Shadyab AH, et al. 2019. Generalizing polygenic risk scores from Europeans to Hispanics/Latinos. Genet. Epidemiol 43(1):50–62 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 110.Morris AP. 2011. Transethnic meta-analysis of genome-wide association studies. Genet. Epidemiol 35(8):809–22 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 111.Schaid DJ, Chen W, Larson NB. 2018. From genome-wide associations to candidate causal variants by statistical fine-mapping. Nat. Rev. Genet 19:491–504 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 112.Mägi R, Horikoshi M, Sofer T, Mahajan A, Kitajima H, et al. 2017. Trans-ethnic meta-regression of genome-wide association studies accounting for ancestry increases power for discovery and improves fine-mapping resolution. Hum. Mol. Genet 26(18):3639–50 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 113.Turley P, Martin AR, Goldman G, Li H, Kanai M, et al. 2021. Multi-Ancestry Meta-Analysis yields novel genetic discoveries and ancestry-specific associations. bioRxiv 2021.04.23.441003. 10.1101/2021.04.23.441003 [DOI] [Google Scholar]
  • 114.Lewis CM, Vassos E. 2020. Polygenic risk scores: from research tools to clinical instruments. Genome Med 12:44. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 115.Cavazos TB, Witte JS. 2021. Inclusion of variants discovered from diverse populations improves polygenic risk score transferability. Hum. Genet. Genom. Adv 2(1):100017. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 116.Ge T, Chen CY, Ni Y, Feng YCA, Smoller JW. 2019. Polygenic prediction via Bayesian regression and continuous shrinkage priors. Nat. Commun 10:1776. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 117.Vilhjálmsson BJ, Yang J, Finucane HK, Gusev A, Lindström S, et al. 2015. Modeling linkage disequilibrium increases accuracy of polygenic risk scores. Am. J. Hum. Genet 97(4):576–92 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 118.Wang Y, Tsuo K, Kanai M, Neale BM, Martin AR. 2022. Challenges and opportunities for developing more generalizable polygenic risk scores. Annu. Rev. Biomed. Data Sci 5:293–320 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 119.Ruan Y, Lin Y-F, Feng Y-CA, Chen C-Y, Lam M, et al. 2022. Improving polygenic prediction in ancestrally diverse populations. Nat. Genet 54(5):573–80 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 120.Weissbrod O, Kanai M, Shi H, Gazal S, Peyrot WJ, et al. 2022. Leveraging fine-mapping and multipopulation training data to improve cross-population polygenic risk scores. Nat. Genet 54(4):450–58 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 121.Marnetto D, Pärna K, Läll K, Molinaro L, Montinaro F, et al. 2020. Ancestry deconvolution and partial polygenic score can improve susceptibility predictions in recently admixed individuals. Nat. Commun 11:1628. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 122.Zaitlen N, Paşaniuc B, Gur T, Ziv E, Halperin E. 2010. Leveraging genetic variability across populations for the identification of causal variants. Am. J. Hum. Genet 86(1):23–33 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 123.Li YR, Keating BJ. 2014. Trans-ethnic genome-wide association studies: advantages and challenges of mapping in diverse populations. Genome Med 6:91. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 124.Mahajan A, Go MJ, Zhang W, Below JE, Gaulton KJ, et al. 2014. Genome-wide trans-ancestry meta-analysis provides insight into the genetic architecture of type 2 diabetes susceptibility. Nat. Genet 46(3):234–44 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 125.Spain SL, Barrett JC. 2015. Strategies for fine-mapping complex traits. Hum. Mol. Genet 24(R1):R111–19 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 126.Kichaev G, Pasaniuc B. 2015. Leveraging functional-annotation data in trans-ethnic fine-mapping studies. Am. J. Hum. Genet 97(2):260–71 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 127.LaPierre N, Taraszka K, Huang H, He R, Hormozdiari F, Eskin E. 2021. Identifying causal variants by fine mapping across multiple studies. PLOS Genet 17(9):e1009733. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 128.Patterson N, Moorjani P, Luo Y, Mallick S, Rohland N, et al. 2012. Ancient admixture in human history. Genetics 192(3):1065–93 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 129.Peter BM. 2016. Admixture, population structure, and F-statistics. Genetics 202(4):1485–501 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 130.Gravel S. 2012. Population genetics models of local ancestry. Genetics 191(2):607–19 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 131.Pool JE, Nielsen R. 2009. Inference of historical changes in migration rate from the lengths of migrant tracts. Genetics 181(2):711–19 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 132.Pugach I, Matveyev R, Wollstein A, Kayser M, Stoneking M. 2011. Dating the age of admixture via wavelet transform analysis of genome-wide data. Genome Biol 12(2):R19. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 133.Hamid I, Korunes K, Beleza S, Goldberg A. 2021. Rapid adaptation to malaria facilitated by admixture in the human population of Cabo Verde. eLife 10:e63177. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 134.Parker K, Horowitz JM, Morin R, Lopez MH. 2015. Multiracial in America: proud, diverse and growing in numbers. Rep., Pew Res. Cent., Washington, DC [Google Scholar]
  • 135.Lam M, Chen CY, Li Z, Martin AR, Bryois J, et al. 2019. Comparative genetic architectures of schizophrenia in East Asian and European populations. Nat. Genet 51(12):1670–78 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 136.Carlson CS, Matise TC, North KE, Haiman CA, Fesinmeyer MD, et al. 2013. Generalization and dilution of association results from European GWAS in populations of non-European ancestry: the PAGE study. PLOS Biol 11(9):e1001661. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 137.Liu JZ, van Sommeren S, Huang H, Ng SC, Alberts R, et al. 2015. Association analyses identify 38 susceptibility loci for inflammatory bowel disease and highlight shared genetic risk across populations. Nat. Genet 47(9):979–86 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 138.Marigorta UM, Navarro A. 2013. High trans-ethnic replicability of GWAS results implies common causal variants. PLOS Genet 9(6):e1003566. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 139.Kuchenbaecker K, Telkar N, Reiker T, Walters RG, Lin K, et al. 2019. The transferability of lipid loci across African, Asian and European cohorts. Nat. Commun 10:4330. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 140.Waters KM, Stram DO, Hassanein MT, Le Marchand L, Wilkens LR, et al. 2010. Consistent association of type 2 diabetes risk variants found in Europeans in diverse racial and ethnic groups. PLOS Genet 6(8):e1001078. [DOI] [PMC free article] [PubMed] [Google Scholar]

RESOURCES