Abstract
Admixture mapping studies have become more common in recent years, due in part to technological advances and growing international efforts to increase the diversity of genetic studies. However, many open questions remain about appropriate implementation of admixture mapping studies, including how best to control for multiple testing, particularly in the presence of population structure. In this study, we develop a theoretical framework to characterize the correlation of local ancestry and admixture mapping test statistics in admixed populations with contributions from any number of ancestral populations and arbitrary population structure. Based on this framework, we develop an analytical approach for obtaining genome-wide significance thresholds for admixture mapping studies. We validate our approach via analysis of simulated traits with real genotype data for 8,064 unrelated African American and 3,425 Hispanic/Latina women from the Women’s Health Initiative SNP Health Association Resource (WHI SHARe). In an application to these WHI SHARe data, our approach yields genome-wide p value thresholds of 2.1 × 10−5 and 4.5 × 10−6 for admixture mapping studies in the African American and Hispanic/Latina cohorts, respectively. Compared to other commonly used multiple testing correction procedures, our method is fast, easy to implement (using our publicly available R package), and controls the family-wise error rate even in structured populations. Importantly, we note that the appropriate admixture mapping significance threshold depends on the number of ancestral populations, generations since admixture, and population structure of the sample; as a result, significance thresholds are not, in general, transferable across studies.
Keywords: family-wise error rate, genetic admixture, genome-wide association, multiple testing, population structure
Introduction
Understanding the genetic causes of human diseases and traits has long been of interest in the scientific community. However, the large majority of the research in this area has been conducted in populations of European descent.1, 2, 3 Admixed populations, such as African Americans and Hispanics/Latinos, are historically underrepresented in genetic studies, yet their mixed and diverse ancestry presents unique opportunities for detecting genetic variants associated with complex traits and diseases.
Because recombination breaks up ancestral chromosomes in the generations following admixture, the genomes of admixed individuals are a mosaic of segments with different ancestral origins (Figure 1). This mosaic pattern of locus-specific ancestry, or local ancestry, varies considerably across individuals within an admixed population and is useful for identifying causal genetic variants via admixture mapping. Admixture mapping studies scan the genomes of admixed individuals for associations between local ancestry and a trait of interest.4, 5, 6 Disease prevalence and trait values often differ across ancestral groups (e.g., asthma,7 prostate cancer,8 blood pressure9), due to a combination of genetic and environmental causes. By looking for associations between a trait and local ancestry, admixture mapping seeks to identify the genetic variants that differ in frequency across these ancestral groups and drive the observed phenotypic differences. In recent years, admixture mapping has become more widely used and has proven to be a powerful approach for localizing causal genetic variants.8, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30
A single genome-wide admixture mapping study will typically involve hundreds of thousands or millions of hypothesis tests, and multiple testing correction procedures are needed to control the overall type I error rate. Perhaps the best-known multiple testing correction procedure involves a Bonferroni correction on the total number of hypothesis tests. Although easy to implement, this approach is widely criticized for yielding conservative significance thresholds in the presence of correlated tests. A related approach involves a Bonferroni correction on the estimated effective number of independent tests;13, 19, 31, 32 however, a number of authors33, 34, 35 have suggested that this approach does not always guarantee family-wise error rate control in genome-wide association studies. Permutation-based22, 24, 36, 37 and simulation-based20, 38, 39 multiple testing correction procedures are often considered to be the gold standard for genetic association studies but can be very computationally intensive. Alternatives to these procedures, based on the multivariate normal distribution, have been suggested to speed up computation time.35, 40, 41
A promising alternative to the above-mentioned approaches involves an analytic multiple testing correction.36, 42, 43 In particular, Siegmund and Yakir43 derived the correlation of admixture mapping test statistics and used that theoretical result to provide an analytic approximation to the appropriate significance threshold for admixture mapping studies in admixed populations with two ancestral populations and equal admixture proportions across individuals. However, many admixed populations have more than two ancestral populations (e.g., Hispanics/Latinos) and/or unequal admixture proportions across individuals in the population,44, 45, 46, 47, 48 the latter being a consequence of population structure.49
In this paper, we develop a theoretical framework which applies to admixed populations with any number of ancestral populations or distribution of admixture proportions, and then use that theoretical framework to develop multiple testing correction procedures for admixture mapping studies in admixed populations with population structure. We apply our proposed procedures to genotype data for samples of African American and Hispanic/Latina ancestry from the Women’s Health Initiative SNP Health Association Resource (WHI SHARe). We also perform a simulation study using these WHI SHARe genotype data and simulated traits to validate our theoretical work and evaluate the performance of our approach relative to other commonly used multiple testing correction procedures.
Material and Methods
Admixture Mapping Model
Following previous studies,6, 39, 50 we use marginal regression to perform admixture mapping in samples of unrelated individuals, regressing the trait of interest on inferred local ancestry at each observed locus across the genome. At each locus, we quantify local ancestry as the number of alleles (0, 1, or 2) inherited from each ancestral population. In an admixed population with K ancestral populations, we characterize the local ancestry for admixed individual i at locus j via the vector $\mathbf{a}_{ij} = (a_{ij1}, \ldots, a_{ijK})^\top$, where $\sum_{k=1}^{K} a_{ijk} = 2$ and the kth component of this vector, $a_{ijk}$, denotes the number of alleles inherited by individual i from ancestral population k at locus j. Similarly, we represent the admixture proportions for each individual via the vector $\boldsymbol{\pi}_i = (\pi_{i1}, \ldots, \pi_{iK})^\top$, where $\sum_{k=1}^{K} \pi_{ik} = 1$ and the components of this vector represent the overall (genome-wide) proportion of genetic material inherited by individual i from each ancestral population. To perform admixture mapping, we regress the trait of interest, $\mathbf{y} = (y_1, \ldots, y_n)^\top$, on each component of the local ancestry vector at each locus using the marginal regression model

$$y_i = \alpha_{jk} + \beta_{jk}\,a_{ijk} + \boldsymbol{\gamma}_{jk}^{\top}\tilde{\boldsymbol{\pi}}_i + \varepsilon_{ijk} \qquad \text{(Equation 1)}$$

where $\tilde{\boldsymbol{\pi}}_i = (\pi_{i1}, \ldots, \pi_{i,K-1})^\top$ includes the first K − 1 components of the vector of admixture proportions (the Kth component is omitted because the proportions sum to 1). We fit separate regression models for each ancestral group in order to investigate which ancestral population(s) drive the association between the trait and local ancestry at each locus, and we adjust for estimated admixture proportions in all models to account for potential population structure.5, 50 We test for association between the trait and local ancestry using a Wald test of $\beta_{jk} = 0$, where the test statistic $Z_{jk} = \hat{\beta}_{jk} / \widehat{\mathrm{SE}}(\hat{\beta}_{jk})$ is the ratio of the estimated regression coefficient for the local ancestry term to its standard error, with one test statistic per locus and ancestral component.
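To make Equation 1 concrete, here is a minimal Python sketch of the Wald test at a single locus and ancestral component, fit by ordinary least squares. It is not the PLINK or STEAM implementation; the function names (`solve`, `wald_z`) and the simulated inputs are hypothetical, and it assumes a non-degenerate design matrix.

```python
import math
import random


def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[r]] for r, row in enumerate(A)]  # augmented copy of [A | b]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # pivot row for column c
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def wald_z(y, a_jk, pi_tilde):
    """Wald statistic for beta_jk in y_i = alpha + beta_jk * a_ijk + gamma^T pi_tilde_i + e_i."""
    X = [[1.0, float(a)] + list(p) for a, p in zip(a_jk, pi_tilde)]
    n, q = len(X), len(X[0])
    XtX = [[sum(row[r] * row[c] for row in X) for c in range(q)] for r in range(q)]
    Xty = [sum(row[r] * yi for row, yi in zip(X, y)) for r in range(q)]
    coef = solve(XtX, Xty)
    resid = [yi - sum(xc * bc for xc, bc in zip(row, coef)) for row, yi in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / (n - q)  # residual variance estimate
    e1 = [1.0 if c == 1 else 0.0 for c in range(q)]
    var_beta = sigma2 * solve(XtX, e1)[1]  # sigma^2 * [(X^T X)^{-1}] entry for beta_jk
    return coef[1] / math.sqrt(var_beta)


# Hypothetical K = 2 example: local ancestry counts drawn given each person's proportion.
rng = random.Random(1)
n = 500
pi1 = [rng.random() for _ in range(n)]
a1 = [sum(rng.random() < p for _ in range(2)) for p in pi1]
y_assoc = [0.5 * a + rng.gauss(0.0, 1.0) for a in a1]  # trait depends on a_ij1
y_null = [2.0 * p + rng.gauss(0.0, 1.0) for p in pi1]  # trait depends only on pi_i1
z_assoc = wald_z(y_assoc, a1, [[p] for p in pi1])
z_null = wald_z(y_null, a1, [[p] for p in pi1])
print(f"Z (associated) = {z_assoc:.2f}, Z (null, confounded by pi) = {z_null:.2f}")
```

The second statistic illustrates why admixture proportions enter as covariates: the trait depends only on $\pi_{i1}$, which is correlated with local ancestry, yet after adjustment the statistic stays near its null distribution.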
Theoretical Framework: Joint Distribution of Admixture Mapping Test Statistics
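Our goal is to derive a significance threshold that controls the family-wise error rate, or the probability of making at least one type I error, for a genome-wide admixture mapping study. In other words, we wish to find the genome-wide test statistic threshold $z^*$ such that

$$P\left(\max_{j,k} |Z_{jk}| > z^{*} \;\middle|\; H_0\right) \le \alpha$$

for some pre-specified level α (e.g., 0.05), where $H_0$ is the universal null hypothesis that the trait is not associated with local ancestry at any locus ($\beta_{jk} = 0$ for all j and k). To derive this threshold, we must understand the asymptotic joint distribution of our admixture mapping test statistics $\{Z_{jk}\}$.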
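The first step is to characterize the correlation of local ancestry vectors at pairs of loci across the genome. For an admixed population with any number of ancestral populations, generations since admixture, or distribution of admixture proportions across the population, we can show that the correlation of local ancestry vectors at two loci j and j' depends on the recombination fraction between the loci (θ), the number of generations since admixture (g), and the population mean (E), variance (V), and covariance (Cov) of the admixture proportions:

$$\mathrm{Cor}(a_{jk}, a_{j'k'}) = \begin{cases} \dfrac{(1-\theta)^{g}\,E[\pi_k(1-\pi_k)] + 2V[\pi_k]}{E[\pi_k(1-\pi_k)] + 2V[\pi_k]}, & k = k', \\[2ex] \dfrac{-(1-\theta)^{g}\,E[\pi_k\pi_{k'}] + 2\,\mathrm{Cov}(\pi_k, \pi_{k'})}{\sqrt{\big(E[\pi_k(1-\pi_k)] + 2V[\pi_k]\big)\big(E[\pi_{k'}(1-\pi_{k'})] + 2V[\pi_{k'}]\big)}}, & k \neq k'. \end{cases}$$

We refer to this result as Lemma 1, and a proof is available in Appendix A. Note that if all individuals in the population have the same admixture proportions (i.e., $\boldsymbol{\pi}_i = \boldsymbol{\pi}$ for all i, so that $V[\pi_k] = 0$ and $\mathrm{Cov}(\pi_k, \pi_{k'}) = 0$), then the local ancestry correlation is simply $(1-\theta)^g$ when $k = k'$, as had been shown previously in the context of admixed populations with two ancestral populations.43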
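The following Python sketch evaluates Lemma 1 for a given pair of ancestral components. It uses the empirical distribution of a sample's admixture proportions as a stand-in for the population distribution F, an assumption made only for illustration; `la_corr` and the example proportions are hypothetical.

```python
import math
from statistics import fmean, pvariance


def _spread(x):
    """E[pi(1 - pi)] + 2 V[pi] for one ancestry component (denominator term in Lemma 1)."""
    return fmean([v * (1.0 - v) for v in x]) + 2.0 * pvariance(x)


def la_corr(props, k, kp, theta, g):
    """Lemma 1: Cor(a_jk, a_j'k') for loci separated by recombination fraction theta.

    props is a list of per-individual admixture-proportion vectors; its empirical
    distribution stands in for the population distribution F.
    """
    pk = [p[k] for p in props]
    decay = (1.0 - theta) ** g
    if k == kp:
        e = fmean([v * (1.0 - v) for v in pk])
        return (decay * e + 2.0 * pvariance(pk)) / _spread(pk)
    pkp = [p[kp] for p in props]
    e_cross = fmean([u * v for u, v in zip(pk, pkp)])  # E[pi_k pi_k']
    cov = e_cross - fmean(pk) * fmean(pkp)  # Cov(pi_k, pi_k')
    return (-decay * e_cross + 2.0 * cov) / math.sqrt(_spread(pk) * _spread(pkp))


# Hypothetical examples
same = [[0.8, 0.2]] * 10  # no population structure
mixed = [[0.9, 0.1], [0.5, 0.5], [0.7, 0.3]]  # heterogeneous proportions
r_same = la_corr(same, 0, 0, theta=0.05, g=6)  # equals (1 - 0.05)**6
r_cross = la_corr(same, 0, 1, theta=0.05, g=6)  # equals -(1 - 0.05)**6 when K = 2
r_far = la_corr(mixed, 0, 0, theta=0.5, g=10)  # close to the plateau 2V / (E + 2V) > 0
print(r_same, r_cross, r_far)
```

Without population structure the correlation decays toward zero as $(1-\theta)^g$ shrinks; with heterogeneous proportions it levels off at $2V[\pi_k] / (E[\pi_k(1-\pi_k)] + 2V[\pi_k])$ instead.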
Using Lemma 1, it is straightforward to derive the asymptotic joint distribution of our collection of admixture mapping test statistics $\{Z_{jk}\}$. For an admixed population with any number of ancestral populations, generations since admixture, or distribution of admixture proportions across the population, we can show that the asymptotic joint distribution of the test statistics can be approximated by a mean-zero Gaussian process with covariance (and correlation) given by

$$\mathrm{Cov}(Z_{jk}, Z_{j'k'}) \approx \begin{cases} (1-\theta_{jj'})^{g}, & k = k', \\[1ex] -(1-\theta_{jj'})^{g}\sqrt{\dfrac{\mu_k\,\mu_{k'}}{(1-\mu_k)(1-\mu_{k'})}}, & k \neq k', \end{cases}$$

where the recombination fraction $\theta_{jj'}$, generations since admixture (g), and population mean admixture proportions $\mu_k = E[\pi_k]$ are defined as above. We refer to this result as Theorem 1, and a proof is available in Appendix A. Note that the covariance of test statistics simplifies conveniently when the admixed population has only two ancestral populations (K = 2):

$$\mathrm{Cov}(Z_{j1}, Z_{j'1}) \approx (1-\theta_{jj'})^{g} \approx e^{-g\delta_{jj'}/100},$$

where $\delta_{jj'}$ is the genetic distance, in centimorgans (cM), between loci j and j'; it follows that the distribution of admixture mapping test statistics can then be approximated by an Ornstein-Uhlenbeck process. (With K = 2, $a_{ij2} = 2 - a_{ij1}$, so $Z_{j2} = -Z_{j1}$ and only one ancestral component needs to be tested.)
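The approximation $(1-\theta)^g \approx e^{-g\delta/100}$ used in the K = 2 case can be checked numerically. This sketch assumes Haldane's map function to convert genetic distance into a recombination fraction, which the text does not specify; the function names and the value of g are hypothetical.

```python
import math


def haldane_theta(delta_cm):
    """Recombination fraction for a genetic distance in cM under Haldane's map function."""
    return 0.5 * (1.0 - math.exp(-2.0 * delta_cm / 100.0))


def corr_exact(delta_cm, g):
    """(1 - theta)^g, with theta from Haldane's map function (an assumption)."""
    return (1.0 - haldane_theta(delta_cm)) ** g


def corr_ou(delta_cm, g):
    """Ornstein-Uhlenbeck approximation exp(-g * delta / 100)."""
    return math.exp(-g * delta_cm / 100.0)


g = 6  # hypothetical number of generations since admixture
for d in (0.1, 1.0, 10.0, 50.0):
    print(f"delta = {d:5.1f} cM: exact {corr_exact(d, g):.4f}  OU {corr_ou(d, g):.4f}")
```

The two agree closely at the short distances that dominate the correlation between neighboring markers. Under Haldane's map, $1 - \theta = (1 + e^{-2\delta/100})/2 \ge e^{-\delta/100}$ by the AM-GM inequality, so the exponential form slightly understates the correlation at larger distances.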
Multiple Testing Correction Procedures
We propose two multiple testing correction procedures which use the asymptotic joint distribution of admixture mapping test statistics provided by Theorem 1 to derive a genome-wide significance threshold that will control the family-wise error rate in admixture mapping studies. Both approaches are implemented in our R package STEAM (Significance Threshold Estimation for Admixture Mapping).
Simulation-Based Approach
To estimate the appropriate genome-wide test statistic threshold for an admixture mapping study, we simulate test statistics from their asymptotic joint distribution (Theorem 1) and choose the threshold that controls the empirical family-wise error rate at the desired level. This approach differs from traditional simulation-based multiple testing approaches in that we simulate test statistics directly, rather than simulating traits and re-calculating test statistics at each locus for each simulation replicate. By simulating test statistics directly, computation time for our multiple testing correction procedure is considerably reduced and, importantly, is independent of sample size. To simulate admixture mapping test statistics from this distribution, we have developed a fast algorithm that requires only the genetic distances between loci, the estimated admixture proportions for individuals in the sample, and an estimate of the parameter g (see Appendix B for details).
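For the two-way (K = 2) case only, the Ornstein-Uhlenbeck structure makes it easy to simulate test statistics directly: along each chromosome, the statistics form a Gaussian AR(1) sequence whose lag-one correlation between adjacent markers is $e^{-g\delta/100}$. This Python sketch is not the algorithm of Appendix B or the STEAM code; the marker map, g, and function names are hypothetical, and chromosomes are treated as independent.

```python
import math
import random


def max_abs_z_one_rep(chromosomes, g, rng):
    """One replicate of max |Z_j| for K = 2, simulating each chromosome as an OU process.

    chromosomes: list of sorted marker positions (cM), one list per chromosome;
    statistics on different chromosomes are simulated independently.
    """
    overall = 0.0
    for positions in chromosomes:
        z = rng.gauss(0.0, 1.0)
        overall = max(overall, abs(z))
        for prev, cur in zip(positions, positions[1:]):
            rho = math.exp(-g * (cur - prev) / 100.0)  # correlation of adjacent statistics
            z = rho * z + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
            overall = max(overall, abs(z))
    return overall


def simstat_threshold(chromosomes, g, n_reps=10_000, alpha=0.05, seed=1):
    """Empirical (1 - alpha) quantile of max |Z| and the matching two-sided p value."""
    rng = random.Random(seed)
    maxima = sorted(max_abs_z_one_rep(chromosomes, g, rng) for _ in range(n_reps))
    z_star = maxima[math.ceil((1.0 - alpha) * n_reps) - 1]
    p_star = math.erfc(z_star / math.sqrt(2.0))  # 2 * (1 - Phi(z_star))
    return z_star, p_star


# Hypothetical inputs: a single locus, and two chromosomes with 200 markers 0.5 cM apart.
z_one, p_one = simstat_threshold([[0.0]], g=6, n_reps=20_000)
chroms = [[0.5 * i for i in range(200)] for _ in range(2)]
z_many, p_many = simstat_threshold(chroms, g=6, n_reps=2_000)
print(z_one, p_one, z_many, p_many)
```

As with the approach described above, the cost grows with the number of markers and replicates but not with the number of individuals in the sample.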
Analytic Approximation Approach
An alternative approach for deriving genome-wide significance thresholds in the special case of admixed populations with two ancestral populations was developed previously.43 Siegmund and Yakir43 showed that, under some assumptions, the asymptotic joint distribution of admixture mapping test statistics can be approximated by an Ornstein-Uhlenbeck process, and then used that result to provide an analytic approximation to the family-wise error rate:43
$$P\left(\max_{j} |Z_{j}| > z\right) \approx 1 - \exp\left\{-2C\left[1-\Phi(z)\right] - 2\beta L z\,\phi(z)\,\nu\!\left(z\sqrt{2\beta\Delta}\right)\right\} \qquad \text{(Equation 2)}$$

where C is the number of chromosomes analyzed, having total genetic length L cM; Δ is the spacing between consecutive markers (in cM), so that smaller Δ corresponds to greater marker density; Φ and φ are the cumulative distribution and density functions, respectively, of the standard normal distribution; β = 0.01g; and the function ν(·) is an infinite sum which can be approximated by $\nu(x) \approx \frac{(2/x)\left[\Phi(x/2) - 0.5\right]}{(x/2)\,\Phi(x/2) + \phi(x/2)}$. Although this analytic approximation was initially proposed for admixture mapping studies in populations with equal admixture proportions across individuals,43 our work (i.e., Theorem 1) shows that it is also applicable to populations with heterogeneous admixture proportions, provided that the admixture proportions are included as covariates in the regression analysis. As a result, we can use this analytic approximation to find the admixture mapping test statistic threshold that will control the family-wise error rate at the desired level in an admixed population with two ancestral populations and any distribution of admixture proportions: we simply find the value z that sets the right-hand side of Equation 2 equal to α. This involves a one-dimensional root-finding step that can be quickly solved using existing tools (e.g., uniroot in R51). Simulation is not required for this approach, so the significance threshold can be derived in a matter of seconds.
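A minimal Python sketch of the analytic approach, assuming the two-sided form of Equation 2 as written above, with bisection standing in for R's uniroot. The genome length, marker spacing, and g in the example are hypothetical round numbers, not WHI SHARe estimates, and the function names are illustrative.

```python
import math


def norm_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def nu(x):
    """Approximation to Siegmund and Yakir's nu(x); nu(0) = 1 (continuous-process limit)."""
    if x == 0.0:
        return 1.0
    h = x / 2.0
    return (2.0 / x) * (norm_cdf(h) - 0.5) / (h * norm_cdf(h) + norm_pdf(h))


def fwer_eq2(z, n_chrom, length_cm, spacing_cm, g):
    """Right-hand side of Equation 2 (two-sided statistics, beta = 0.01 g)."""
    beta = 0.01 * g
    upper_tail = 0.5 * math.erfc(z / math.sqrt(2.0))  # 1 - Phi(z)
    rate = (2.0 * n_chrom * upper_tail
            + 2.0 * beta * length_cm * z * norm_pdf(z)
            * nu(z * math.sqrt(2.0 * beta * spacing_cm)))
    return 1.0 - math.exp(-rate)


def analytic_threshold(n_chrom, length_cm, spacing_cm, g, alpha=0.05, lo=1.0, hi=10.0):
    """Bisection for the z that sets Equation 2 equal to alpha (stand-in for uniroot)."""
    for _ in range(200):  # the FWER approximation is decreasing in z on [lo, hi]
        mid = 0.5 * (lo + hi)
        if fwer_eq2(mid, n_chrom, length_cm, spacing_cm, g) > alpha:
            lo = mid
        else:
            hi = mid
    z_star = 0.5 * (lo + hi)
    return z_star, math.erfc(z_star / math.sqrt(2.0))  # (z*, two-sided p value)


# Hypothetical genome: 22 chromosomes, 3,500 cM, markers every 0.0064 cM, g = 6.
z_star, p_star = analytic_threshold(22, 3500.0, 0.0064, g=6)
print(f"z* = {z_star:.3f}, p* = {p_star:.2e}")
```

With L = 0 the β term vanishes and the threshold reduces to a single two-sided test per chromosome, which is a convenient sanity check of the root-finding.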
Estimating the Number of Generations since Admixture
Both the analytic approximation and simulation-based multiple testing correction approaches rely on the number of generations since admixture. We can estimate the number of generations since admixture (g) using the observed pattern of local ancestry correlation in our sample, since g determines the rate of decay of this correlation (see Lemma 1). We propose an approach similar to that of Hellenthal et al.,52 where we use non-linear least-squares to find the value of g that provides the best fit to the observed local ancestry correlation curves. We implement this approach in our R package STEAM, along with our multiple testing correction procedures.
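The non-linear least-squares step can be sketched as a one-parameter fit. This Python example is an illustration under stated assumptions, not the STEAM implementation: it fits the k = k' case of Lemma 1 with the weight w treated as known, replaces $(1-\theta)^g$ by $e^{-g\delta/100}$, and minimizes the squared error by golden-section search. `fit_g` and the example curve are hypothetical.

```python
import math


def fit_g(distances_cm, observed_corr, w=1.0, g_lo=0.5, g_hi=50.0, tol=1e-8):
    """Least-squares estimate of g from a local ancestry correlation curve.

    Assumed model (a sketch): corr(delta) = w * exp(-g * delta / 100) + (1 - w),
    where w = E[pi(1-pi)] / (E[pi(1-pi)] + 2 V[pi]) from Lemma 1 (w = 1 when
    admixture proportions do not vary). Assumes the squared error is unimodal in g.
    """
    def sse(g):
        return sum((obs - (w * math.exp(-g * dist / 100.0) + 1.0 - w)) ** 2
                   for dist, obs in zip(distances_cm, observed_corr))

    r = (math.sqrt(5.0) - 1.0) / 2.0  # golden-section ratio
    a, b = g_lo, g_hi
    c, d = b - r * (b - a), a + r * (b - a)
    fc, fd = sse(c), sse(d)
    while b - a > tol:
        if fc < fd:  # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - r * (b - a)
            fc = sse(c)
        else:  # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + r * (b - a)
            fd = sse(d)
    return 0.5 * (a + b)


# Hypothetical noiseless curve with g = 6 and w = 0.8
dists = [0.5 * i for i in range(1, 61)]
curve = [0.8 * math.exp(-6.0 * x / 100.0) + 0.2 for x in dists]
g_hat = fit_g(dists, curve, w=0.8)
print(round(g_hat, 4))
```

In practice the observed curve would be computed from inferred local ancestry at pairs of loci binned by genetic distance, and the fit would absorb noise rather than recover g exactly.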
Analysis of WHI SHARe Data
We applied our multiple testing correction procedures to two cohorts of admixed individuals with African American and Hispanic/Latina ancestry from the Women’s Health Initiative SNP Health Association Resource (WHI SHARe), and also used these data to perform simulation studies comparing the performance of our proposed multiple testing correction procedures to competing approaches.
The Data
The WHI is a long-term health study of postmenopausal women in the United States. A total of 161,808 postmenopausal women aged 50–79 years were recruited, including 12,151 self-identified African Americans (AA) and 5,469 self-identified Hispanic Americans (HA) who had consented to genetic research. Study design details and cohort characteristics are described elsewhere.53 A subsample of these women was selected for genotyping using the Affymetrix Genome-Wide Human SNP Array 6.0, which contains 906,000 single-nucleotide polymorphisms (SNPs) and more than 946,000 probes for the detection of copy-number variants. In these analyses, we focus only on the SNPs. The genotype data were processed for quality control, including call rate, concordance rates for blinded and unblinded duplicates, and sex discrepancy, leaving 871,309 unflagged SNPs with a genotyping rate of 99.8% and 12,008 (8,421 AA and 3,587 HA) women used in the current analysis.14
Local Ancestry Inference
To implement and evaluate our proposed multiple testing correction procedures in the WHI SHARe data, we first needed to infer local ancestry. First, we formed reference panels for local ancestry inference using individuals of European, African, and Native American descent from the International HapMap Project54 (HapMap) and the Human Genome Diversity Project55 (HGDP). In particular, the reference panels for both the AA and HA cohorts included 165 individuals of European descent (HapMap CEU, Utah residents with Northern and Western European ancestry) and 203 individuals of African descent (HapMap YRI, Yoruba in Nigeria), and the HA reference panel additionally included 63 individuals of Native American descent from HGDP. We identified a set of 551,025 and 536,374 SNPs common to the reference panels and the WHI AA and HA samples, respectively. Second, we used an iterative procedure suggested by Conomos et al.56 to identify sets of 8,064 and 3,425 mutually unrelated AA and HA individuals, respectively. Third, we performed phasing and imputation of sporadic missing genotypes using Beagle57 version 3. Genetic distances were estimated using the publicly available HapMap genetic map.58 After these pre-processing steps, we performed local ancestry inference using RFMix59 to estimate the number of alleles inherited from each ancestral population at each locus across the genome.
Application of Multiple Testing Correction Procedures
We implemented the analytic approximation approach in the AA cohort and our test statistic simulation-based approach (with 10,000 replications) in both the AA and HA cohorts. Both approaches require the number of generations since admixture, which we estimated from the observed pattern of local ancestry correlation in these samples using our non-linear least-squares approach described above. Our simulation-based approach additionally requires admixture proportions, which we estimated for each individual using the genome-wide average inferred local ancestry.
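Estimating admixture proportions from inferred local ancestry amounts to a simple average. The sketch below assumes an unweighted average over loci (weighting by genetic length would be an alternative); the function name and input layout are hypothetical.

```python
def admixture_proportions(local_ancestry):
    """Estimate one individual's admixture proportions as genome-wide average local ancestry.

    local_ancestry: list over loci of K-vectors of allele counts (each summing to 2).
    Returns pi_hat_k = (1 / (2J)) * sum_j a_jk, an unweighted average over the J loci.
    """
    n_loci = len(local_ancestry)
    k_pops = len(local_ancestry[0])
    return [sum(a[k] for a in local_ancestry) / (2.0 * n_loci) for k in range(k_pops)]


# Hypothetical local ancestry calls at four loci for an individual with K = 3
calls = [[2, 0, 0], [1, 0, 1], [1, 1, 0], [0, 1, 1]]
print(admixture_proportions(calls))  # → [0.5, 0.25, 0.25]
```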
Simulation Study Using WHI SHARe Genotypes
To evaluate the performance of our proposed methods, we simulated 10,000 sets of null traits (traits with no association with local ancestry at any locus) for the individuals in each cohort. We used PLINK60 v.1.9 to perform admixture mapping in each cohort, adjusting for estimated admixture proportions. We calculated the observed correlation of the resulting test statistics across simulation replicates to compare to our theoretical result (Theorem 1) and evaluated the empirical family-wise error rate of our methods across the 10,000 simulation replicates. Finally, we compared our approaches to two competing methods: a Bonferroni correction on the total number of hypothesis tests and the trait simulation approach (with 10,000 replicates).
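The evaluation metrics in this simulation study reduce to operations on the minimum p value from each null replicate. In this hypothetical Python sketch, the minimum p values come from 50 independent uniform p values per replicate rather than from a genome-wide admixture mapping scan, so the numbers are illustrative only; the function names are also hypothetical.

```python
import math
import random


def trait_sim_threshold(min_pvalues, alpha=0.05):
    """p value threshold read off the per-replicate minimum p values (trait simulation)."""
    ranked = sorted(min_pvalues)
    return ranked[math.floor(alpha * len(ranked))]


def empirical_fwer(min_pvalues, p_threshold):
    """Fraction of null replicates with at least one p value below the threshold."""
    return sum(p < p_threshold for p in min_pvalues) / len(min_pvalues)


# Hypothetical stand-in for 10,000 null replicates: minimum p over 50 independent tests
rng = random.Random(2)
min_p = [min(rng.random() for _ in range(50)) for _ in range(10_000)]
thr = trait_sim_threshold(min_p)
print(thr, empirical_fwer(min_p, thr), empirical_fwer(min_p, 0.05 / 50))
```

When the trait simulation threshold is derived from the same replicates used to evaluate it, its empirical family-wise error rate equals α by construction (absent ties), which is consistent with the exact control reported in the Results.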
Results
Population Structure and Validation of Theoretical Results in WHI SHARe
The WHI SHARe African American (AA) and Hispanic American (HA) cohorts exhibit considerable heterogeneity in estimated admixture proportions (Figure 2), indicating that the theoretical work of previous authors43 would not be applicable to these samples, even in the case of the AA cohort with just two ancestral populations. However, we do observe that the patterns of local ancestry and test statistic correlation in the WHI SHARe samples are consistent with our new theoretical results (Figure 3). Furthermore, a non-linear least-squares fit to the observed local ancestry correlation curves yields estimates of the generations since admixture for each cohort that are consistent with previously published studies.49, 61, 62, 63, 64
Comparison of Multiple Testing Correction Procedures in WHI SHARe
In the African American cohort, our multiple testing correction procedures yield genome-wide p value thresholds of 2.1 × 10−5 and 2.0 × 10−5 for the test statistic simulation and analytic approximation approaches, respectively. Both thresholds are consistent with the threshold given by the trait simulation approach (see Table 1) and are three orders of magnitude less stringent than the Bonferroni threshold. The empirical family-wise error rate for each approach from a simulation study using simulated traits is reported in Table 2. As expected, the trait simulation approach controls the empirical family-wise error rate exactly at the nominal level 0.05. Our proposed correction procedures also control the family-wise error rate at the nominal level, while the Bonferroni correction, as expected, is very conservative.
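Table 1. Genome-wide p value thresholds for admixture mapping in the WHI SHARe African American (AA) and Hispanic American (HA) cohorts

Cohort | Bonferroni (# Tests) | Simulation: Traits (10,000 reps) | Simulation: Test Statistics (10,000 reps) | Analytic Approximation
---|---|---|---|---
AA | | | 2.1 × 10−5 | 2.0 × 10−5
AA (95% CI) | | | (1.9 × 10−5, 2.2 × 10−5) |
HA | | | 4.5 × 10−6 | n/a
HA (95% CI) | | | |

For simulation-based approaches, we also provide a 95% bootstrap confidence interval. Both simulation-based approaches used 10,000 replications. The nominal genome-wide type I error rate (α) is 0.05.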
Table 2. Empirical family-wise error rate of each multiple testing correction procedure in the WHI SHARe simulation study

Cohort | Bonferroni (# Tests) | Simulation: Traits (10,000 reps) | Simulation: Test Statistics (10,000 reps) | Analytic Approximation
---|---|---|---|---
AA | | | |
HA | | | | n/a

Empirical family-wise error rate was calculated across 10,000 replications of a simulated null trait. The nominal genome-wide type I error rate (α) is 0.05.
The derived significance thresholds for the Hispanic American cohort are more stringent than those in the African American cohort, reflecting the differences between the two cohorts in terms of the number of ancestral populations, number of generations since admixture, and distribution of admixture proportions. Our test statistic simulation procedure yields a p value threshold of 4.5 × 10−6, which is again consistent with the trait simulation threshold and controls the empirical family-wise error rate at the nominal level (see Tables 1 and 2). As in the African American cohort, the Bonferroni correction yields a significance threshold that is orders of magnitude too conservative.
Computation Time
Computation time differs considerably across the four approaches. The Bonferroni correction can be used to compute the significance threshold nearly instantaneously. The analytic approximation approach is also very quick, taking under half a second on a 12-core 2.4 GHz computer with Intel Xeon E5-2630Lv2 processors and 128 GB of memory. The slowest is the trait simulation approach: for our WHI SHARe analyses, each replicate (which involved running a genome-wide admixture mapping study) took approximately 5 min on the same computer, for a total of more than 800 h of computation time to run all 10,000 replicates. In comparison, our test statistic simulation approach took only a fraction of a second per replicate, amounting to less than 10 min to run all 10,000 replicates in the African American and Hispanic American cohorts.
Discussion
We have developed a theoretical framework to characterize the correlation of local ancestry vectors and admixture mapping test statistics in admixed populations with any number of ancestral populations and distribution of admixture proportions. Our application to data from the Women’s Health Initiative SNP Health Association Resource highlights the importance of this extension, as both the African American and Hispanic American samples display considerable heterogeneity in admixture proportions (Figure 2). Based on these new theoretical results, we show that an existing analytic approximation43 can be used to derive significance thresholds for admixture mapping studies in admixed populations with two ancestral populations, even in the presence of population structure, as long as the admixture mapping model adjusts for admixture proportions. For admixed populations with any number of ancestral populations, we propose an approach that simulates test statistics directly from their asymptotic joint distribution, saving considerable computation time relative to the trait simulation approach, while still yielding an appropriate significance threshold that controls the family-wise error rate.
Our multiple testing correction procedures are based on theoretical work that explicitly models the correlation of admixture mapping test statistics, and so, unlike the commonly used Bonferroni correction, they are not conservative; this will translate to gains in power in genome-wide admixture mapping studies. Compared to the trait simulation approach, our correction procedures yield comparably appropriate significance thresholds but are far less computationally intensive, and we provide an R package for easy implementation. Furthermore, because our simulation-based multiple testing correction simulates test statistics directly from their asymptotic distribution, its computation time does not increase with sample size, which will prove useful as future studies recruit increasingly large numbers of individuals. We believe that our approaches provide an attractive alternative for researchers looking to control for multiple testing in genome-wide admixture mapping studies, particularly in admixed populations with population structure.
In this paper, our theoretical work and data analyses have focused on genome-wide admixture mapping studies with quantitative traits and unrelated individuals. However, preliminary analyses indicate that our theoretical work extends easily to binary traits (see Figure S1). In the case of quantitative traits that are heavily skewed (or otherwise depart considerably from normality), larger sample sizes may be needed for the test statistics to achieve asymptotic normality; to address this problem, transformations such as rank normalization65, 66 could be considered. The presence of relatedness, accounted for by use of a mixed model,67 should not change the marginal distribution of admixture mapping test statistics, but would likely change their correlation structure. We expect that this will not have a large impact on the appropriate significance threshold, but further investigation is needed to confirm this hypothesis.
Our multiple testing correction procedures require estimates of the admixture proportions for each admixed individual, the number of generations since admixture, and the genetic distance between consecutive loci. To assess sensitivity to the choice of genetic map used to produce these pairwise genetic distances, we implemented STEAM in the WHI African American cohort using both the HapMap genetic map and an African American-specific genetic map.68 Although these maps are quite different in some regions of the genome, we found that they still produce similar estimates of the number of generations since admixture (HapMap: 5.9, African American map: 5.7) and the genome-wide p value threshold (HapMap: 2.1 × 10−5 [95% CI: 1.9 × 10−5, 2.2 × 10−5]; African American map: 2.0 × 10−5 [1.8 × 10−5, 2.3 × 10−5]). Our estimates of the number of generations since admixture (g) may be sensitive to assortative mating or departures from the assumption of a single instantaneous admixture event. Assortative mating can lead to increased variability in admixture proportions across a population,49, 69 which our approach accounts for by allowing these proportions to vary, and may additionally change the pattern of local ancestry correlation in the sample,69 which will impact our estimate of the number of generations since admixture. However, in application to real admixed populations (e.g., WHI SHARe) where departures from the assumption of a single instantaneous admixture event and/or random mating (e.g., due to geographic constraints) are likely, we find that our approach still works well. By estimating the parameter g from the observed data with our proposed method, we appropriately capture the correlation structure of admixture mapping test statistics in the sample, which is what matters for estimating an appropriate genome-wide significance threshold.
The p value threshold 5 × 10−8 has become quite widely adopted as a control for multiple testing in genome-wide association studies,38, 70, 71, 72 but there is no such “established” threshold for admixture mapping studies. Even in the specific context of the WHI SHARe genotype data, at least four different genome-wide p value thresholds have been used in published admixture mapping analyses in the African American cohort (including 7 × 10−6 in refs. 14 and 17, 1 × 10−5 in ref. 15, 1.5 × 10−5 in ref. 16, and 1.82 × 10−5 in ref. 22), demonstrating the lack of consensus up to this point (even across analyses of the same dataset) on how best to derive significance thresholds for genome-wide admixture mapping studies. In practice, many admixture mapping studies cite the work of other studies (e.g., Tang et al.36) as the basis for their chosen significance threshold. However, our theoretical work and analysis of the African American and Hispanic WHI SHARe cohorts demonstrate that admixture mapping significance thresholds are not necessarily transferable across studies. In particular, the appropriate significance threshold depends on the number of ancestral populations, generations since admixture (to which it is particularly sensitive), population structure (through the distribution of admixture proportions), and density of markers tested, all of which often differ from one study to another. We encourage investigators to take this important point into consideration when choosing a significance threshold for their own genome-wide admixture mapping study.
Declaration of Interests
The authors declare no competing interests.
Acknowledgments
The WHI program is funded by the National Heart, Lung, and Blood Institute, National Institutes of Health, U.S. Department of Health and Human Services through contracts HHSN268201100046C, HHSN268201100001C, HHSN268201100002C, HHSN268201100003C, HHSN268201100004C, and HHSN271201100004C. Funding for WHI SNP Health Association Resource (WHI-SHARe) genotyping was provided by NHLBI contract N02-HL-64278. The authors thank the WHI investigators and staff for their dedication, and the study participants for making the program possible. A listing of WHI investigators can be found at http://www.whi.org/researchers/Documents%20%20Write%20a%20Paper/WHI%20Investigator%20Short%20List.pdf. K.E.G. was supported by the National Science Foundation Graduate Research Fellowship Program under grant no. DGE-1256082. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Published: February 14, 2019
Footnotes
Supplemental Data include one figure and can be found with this article online at https://doi.org/10.1016/j.ajhg.2019.01.008.
Appendix A: Proofs of Theoretical Results
Lemma 1: Local Ancestry Correlation
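Consider an admixed population with K ancestral populations, g generations since admixture, and admixture proportions distributed according to $\boldsymbol{\pi} \sim F$, where F has finite first and second moments. Then, the correlation of local ancestry vectors at two loci j and j' separated by recombination fraction θ is given by:

$$\mathrm{Cor}(a_{jk}, a_{j'k'}) = \begin{cases} \dfrac{(1-\theta)^{g}\,E[\pi_k(1-\pi_k)] + 2V[\pi_k]}{E[\pi_k(1-\pi_k)] + 2V[\pi_k]}, & k = k', \\[2ex] \dfrac{-(1-\theta)^{g}\,E[\pi_k\pi_{k'}] + 2\,\mathrm{Cov}(\pi_k, \pi_{k'})}{\sqrt{\big(E[\pi_k(1-\pi_k)] + 2V[\pi_k]\big)\big(E[\pi_{k'}(1-\pi_{k'})] + 2V[\pi_{k'}]\big)}}, & k \neq k'. \end{cases}$$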
Proof
Consider an admixed population with K ancestral populations and g generations since admixture. Denote the admixture proportions by the vector $\boldsymbol{\pi} = (\pi_1, \ldots, \pi_K)^\top$ and let that vector be drawn from a distribution F with finite first and second moments. Let $\mathbf{a}_j = (a_{j1}, \ldots, a_{jK})^\top$ be the local ancestry vector, where $\sum_{k} a_{jk} = 2$ and $a_{jk}$ denotes the number of alleles inherited from ancestral population k at locus j. Similarly, let $\mathbf{b}_j^{(p)} = (b_{j1}^{(p)}, \ldots, b_{jK}^{(p)})^\top$ be the parental local ancestry vector, where now $\sum_{k} b_{jk}^{(p)} = 1$ and $b_{jk}^{(p)}$ denotes the number of alleles inherited from parent p (p = 1, 2 for mother and father, respectively) that are derived from ancestral population k at locus j. Consider two loci j and j' separated by recombination fraction θ or, equivalently, genetic distance δ cM. We wish to derive the correlation of local ancestry vectors at these loci, $\mathrm{Cor}(\mathbf{a}_j, \mathbf{a}_{j'})$, but first we will consider the correlation of the parental local ancestry vectors, $\mathrm{Cor}(\mathbf{b}_j^{(p)}, \mathbf{b}_{j'}^{(p)})$.
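Note that for the parental local ancestry vector $\mathbf{b}_j^{(p)}$, exactly one of the components takes the value 1 and the other K − 1 components must take the value 0, so that $E[b_{jk}^{(p)} \mid \boldsymbol{\pi}] = P(b_{jk}^{(p)} = 1 \mid \boldsymbol{\pi}) = \pi_k$. Then, conditional on the admixture proportions $\boldsymbol{\pi}$, the covariance of components of the parental local ancestry vectors at loci j and j' is:

$$\mathrm{Cov}\big(b_{jk}^{(p)}, b_{j'k'}^{(p)} \mid \boldsymbol{\pi}\big) = P\big(b_{jk}^{(p)} = 1,\ b_{j'k'}^{(p)} = 1 \mid \boldsymbol{\pi}\big) - \pi_k\,\pi_{k'}.$$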
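To reduce this further, we must consider two cases: $k = k'$ and $k \neq k'$. Under the model, the two loci trace back to the same ancestral chromosome with probability $(1-\theta)^g$, the probability that no recombination has occurred between them in the g generations since admixture; otherwise, the ancestry at locus j' is an independent draw with probabilities $\boldsymbol{\pi}$. First, suppose that $k = k'$. Then, $P\big(b_{jk}^{(p)} = 1, b_{j'k}^{(p)} = 1 \mid \boldsymbol{\pi}\big) = (1-\theta)^g\,\pi_k + \big[1-(1-\theta)^g\big]\pi_k^2$. Second, suppose that $k \neq k'$. Now, $P\big(b_{jk}^{(p)} = 1, b_{j'k'}^{(p)} = 1 \mid \boldsymbol{\pi}\big) = \big[1-(1-\theta)^g\big]\pi_k\,\pi_{k'}$. After simplifying, it follows that

$$\mathrm{Cov}\big(b_{jk}^{(p)}, b_{j'k'}^{(p)} \mid \boldsymbol{\pi}\big) = \begin{cases} (1-\theta)^{g}\,\pi_k(1-\pi_k), & k = k', \\ -(1-\theta)^{g}\,\pi_k\,\pi_{k'}, & k \neq k'. \end{cases}$$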
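At each locus, we can write the local ancestry vector as the sum of the two parental local ancestry vectors, $\mathbf{a}_j = \mathbf{b}_j^{(1)} + \mathbf{b}_j^{(2)}$. The parental local ancestry vectors are conditionally independent given $\boldsymbol{\pi}$. Thus, $\mathrm{Cov}(a_{jk}, a_{j'k'} \mid \boldsymbol{\pi}) = 2\,\mathrm{Cov}\big(b_{jk}^{(p)}, b_{j'k'}^{(p)} \mid \boldsymbol{\pi}\big)$ and $\mathrm{Var}(a_{jk} \mid \boldsymbol{\pi}) = 2\pi_k(1-\pi_k)$, so the conditional correlation of components of the local ancestry vectors is the same as that of the parental vectors:

$$\mathrm{Cor}(a_{jk}, a_{j'k'} \mid \boldsymbol{\pi}) = \begin{cases} (1-\theta)^{g}, & k = k', \\ -(1-\theta)^{g}\sqrt{\dfrac{\pi_k\,\pi_{k'}}{(1-\pi_k)(1-\pi_{k'})}}, & k \neq k'. \end{cases}$$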
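We use the laws of total expectation, variance, and covariance to find the marginal correlation of local ancestry vectors:
$$\begin{aligned}
\mathrm{E}(a_{jk}) &= 2\,\mathrm{E}(\pi_k),\\
\mathrm{Var}(a_{jk}) &= \mathrm{E}\{\mathrm{Var}(a_{jk} \mid \pi)\} + \mathrm{Var}\{\mathrm{E}(a_{jk} \mid \pi)\} = 2\{\mathrm{E}(\pi_k) - \mathrm{E}(\pi_k^2)\} + 4\,\mathrm{Var}(\pi_k),\\
\mathrm{Cov}(a_{jk}, a_{j'k'}) &= \mathrm{E}\{\mathrm{Cov}(a_{jk}, a_{j'k'} \mid \pi)\} + \mathrm{Cov}\{\mathrm{E}(a_{jk} \mid \pi), \mathrm{E}(a_{j'k'} \mid \pi)\}\\
&= 2(1-\theta)^g\{\delta_{kk'}\mathrm{E}(\pi_k) - \mathrm{E}(\pi_k\pi_{k'})\} + 4\,\mathrm{Cov}(\pi_k, \pi_{k'}),
\end{aligned}$$
where $\delta_{kk'} = 1$ if $k = k'$ and 0 otherwise.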
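With a bit of algebra (excluded from this proof for the sake of brevity), it follows that the marginal correlation of components of the local ancestry vectors at loci $j$ and $j'$ is
$$\mathrm{Cor}(a_{jk}, a_{j'k'}) = \frac{(1-\theta)^g\{\delta_{kk'}\mathrm{E}(\pi_k) - \mathrm{E}(\pi_k\pi_{k'})\} + 2\,\mathrm{Cov}(\pi_k, \pi_{k'})}{\sqrt{\{\mathrm{E}(\pi_k) - \mathrm{E}(\pi_k^2) + 2\,\mathrm{Var}(\pi_k)\}\{\mathrm{E}(\pi_{k'}) - \mathrm{E}(\pi_{k'}^2) + 2\,\mathrm{Var}(\pi_{k'})\}}},$$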
as desired.
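Lemma 1 can be evaluated numerically by plugging in sample moments of estimated admixture proportions, in the same spirit as the moment estimates used in Appendix B. The sketch below is a direct transcription of the final expression; the helper names and the four example proportion vectors are hypothetical. It also checks two limiting cases implied by the formula: at $\theta = 0$ a component is perfectly correlated with itself, and with $K = 2$ the two components are perfectly negatively correlated because $a_{j1} + a_{j2} = 2$.

```python
import math

def moments(props):
    """Sample E(pi_k) and E(pi_k pi_k') from per-individual proportion vectors."""
    n, K = len(props), len(props[0])
    mean = [sum(p[k] for p in props) / n for k in range(K)]
    second = [[sum(p[k] * p[kp] for p in props) / n for kp in range(K)]
              for k in range(K)]
    return mean, second

def lemma1_corr(props, k, kp, theta, g):
    """Marginal Cor(a_jk, a_j'k') from Lemma 1, with population moments
    replaced by sample moments of the admixture proportions."""
    mean, second = moments(props)
    r = (1.0 - theta) ** g
    delta = 1.0 if k == kp else 0.0
    cov_pi = second[k][kp] - mean[k] * mean[kp]
    num = r * (delta * mean[k] - second[k][kp]) + 2.0 * cov_pi

    def half_var(c):
        # E(pi_c) - E(pi_c^2) + 2 Var(pi_c), which equals Var(a_jc) / 2
        return mean[c] - second[c][c] + 2.0 * (second[c][c] - mean[c] ** 2)

    return num / math.sqrt(half_var(k) * half_var(kp))

# Illustrative two-way admixture proportions for four individuals.
props = [[0.8, 0.2], [0.7, 0.3], [0.9, 0.1], [0.75, 0.25]]
print(lemma1_corr(props, 0, 0, 0.0, 6))   # perfectly linked loci, same component
print(lemma1_corr(props, 0, 1, 0.0, 6))   # K = 2, opposite components
print(lemma1_corr(props, 0, 0, 0.05, 6))  # correlation decays as theta grows
```

As $\theta$ grows the first term vanishes, and the correlation does not drop to zero but to $2\,\mathrm{Var}(\pi_k)/\{\mathrm{E}(\pi_k) - \mathrm{E}(\pi_k^2) + 2\,\mathrm{Var}(\pi_k)\}$, the long-range correlation induced by variation in global ancestry across individuals.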
Theorem 1: Test Statistic Correlation
Consider an admixed population with K ancestral populations, g generations since admixture, and admixture proportions distributed according to $\pi \sim F$, where F has finite first and second moments. For loci $j, j'$ and ancestry components $k, k'$, define test statistics $Z_{jk}$ and $Z_{j'k'}$ based on the regression model in Equation 1. Then, under the universal null hypothesis that local ancestry is not associated with the trait at any locus or for any ancestry component, the collection of test statistics $\{Z_{jk}\}$ has an asymptotic multivariate normal distribution with mean $0$ and covariance (and correlation) given by
where $\theta_{jj'}$ is the recombination fraction between loci $j$ and $j'$, and $d_{jj'}$ is the genetic distance (cM) between those loci.
Proof
Consider an admixed population with K ancestral populations and g generations since admixture. Denote the admixture proportions by the vector $\pi = (\pi_1, \ldots, \pi_K)^\top$ and let that vector be drawn from a distribution $F$ with finite first and second moments. Let $a_j = (a_{j1}, \ldots, a_{jK})^\top$ be the local ancestry vector, where $a_{jk} \in \{0, 1, 2\}$ denotes the number of alleles inherited from ancestral population $k$ at locus $j$. Define Wald test statistics $Z_{jk}$ based on the marginal linear regression model (Model 1). Suppose that the universal null hypothesis holds, such that local ancestry is not associated with the trait at any locus or for any ancestry component. We must show that the collection of test statistics $\{Z_{jk}\}$ is asymptotically multivariate normal with mean $0$ and covariance as defined above.
It is straightforward to show (e.g., by using the asymptotic equivalence between Wald tests and score tests, combined with existing results about the asymptotic distribution of score tests from such a model35, 40) that the test statistics are asymptotically multivariate normal with mean $0$ and a covariance matrix with elements $\mathrm{Cov}(Z_{jk}, Z_{j'k'})$. Furthermore, we can show (e.g., as in Joo et al.73) that test statistics from the unadjusted admixture mapping model (Model 1 without covariates) have covariance $\mathrm{Cov}(Z_{jk}, Z_{j'k'}) = \mathrm{Cor}(a_{jk}, a_{j'k'})$. From the adjusted model (Model 1 with covariates $X$), the covariance of test statistics is simply the correlation of local ancestry conditioned on the covariates:35 $\mathrm{Cov}(Z_{jk}, Z_{j'k'}) = \mathrm{Cor}(a_{jk}, a_{j'k'} \mid X)$. Combining these results with Lemma 1 and the approximation from Siegmund and Yakir,43 we see that asymptotically the test statistics will have the covariance stated in Theorem 1, as desired.
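Lemma 1 is written in terms of the recombination fraction $\theta$, whereas Theorem 1 is stated in terms of genetic distance. The sketch below compares $(1-\theta)^g$ with the exponential form $e^{-gd/100}$ ($d$ in cM), which we assume here is the relevant form of the Siegmund and Yakir approximation. Converting $d$ to $\theta$ with the Haldane map function is also an assumption, and $g = 7$ is an illustrative value only.

```python
import math

def haldane_theta(d_cm):
    """Recombination fraction for genetic distance d (cM) under the Haldane map (assumed)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))

def exact_r(d_cm, g):
    """(1 - theta)^g: probability of no recombination between two loci in g generations."""
    return (1.0 - haldane_theta(d_cm)) ** g

def approx_r(d_cm, g):
    """Exponential approximation exp(-g d / 100), with d in cM."""
    return math.exp(-g * d_cm / 100.0)

g = 7  # illustrative number of generations since admixture
for d in (0.1, 1.0, 5.0, 20.0):
    print(f"d = {d:5.1f} cM   exact = {exact_r(d, g):.4f}   approx = {approx_r(d, g):.4f}")
```

Under the Haldane map, $1 - \theta = (1 + e^{-2d/100})/2 \geq e^{-d/100}$, so the exponential form never exceeds $(1-\theta)^g$; the two agree closely at the short inter-marker distances where test statistics are most strongly correlated, and the gap widens as $d$ grows.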
Corollary
When , since , so the covariance of test statistics simplifies nicely:
and the distribution of test statistics can then be approximated by an Ornstein-Uhlenbeck process, as shown by Siegmund and Yakir.43
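Because an Ornstein-Uhlenbeck process is Markov, its values at ordered loci can be generated exactly with a first-order autoregressive recursion, $Z_j = \rho_j Z_{j-1} + \sqrt{1-\rho_j^2}\,\varepsilon_j$ with $\varepsilon_j$ standard normal. The sketch assumes the two-way ($K = 2$) case, with null correlation $e^{-gd/100}$ between loci $d$ cM apart; the map positions, $g$, and function names are hypothetical. The empirical correlation between the first and last loci should match $e^{-g(d_1 + d_2)/100}$, the product of the two one-step correlations.

```python
import math
import random

def simulate_ou(positions_cm, g, rng):
    """Null K = 2 statistics at ordered loci as a discretely sampled OU process
    (assumed correlation exp(-g d / 100), d in cM), via the exact AR(1) recursion."""
    z = [rng.gauss(0.0, 1.0)]
    for j in range(1, len(positions_cm)):
        rho = math.exp(-g * (positions_cm[j] - positions_cm[j - 1]) / 100.0)
        z.append(rho * z[-1] + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0))
    return z

def corr(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(2)
positions, g = [0.0, 3.0, 10.0], 7  # illustrative map (cM) and generations
reps = [simulate_ou(positions, g, rng) for _ in range(20_000)]
emp_02 = corr([r[0] for r in reps], [r[2] for r in reps])
theory_02 = math.exp(-g * (positions[2] - positions[0]) / 100.0)
print(round(emp_02, 3), round(theory_02, 3))
```

The recursion keeps every marginal variance at 1 and needs only the spacing between consecutive loci, which is why simulation cost grows linearly with the number of loci.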
Appendix B: Recursive Simulation Algorithm
We propose a simulation-based multiple testing correction approach that simulates admixture mapping test statistics from their asymptotic joint distribution (Theorem 1). To do so, we use the following recursive algorithm:
Here, $K$ is the number of ancestral populations, and the recursion runs over the loci in map order. The matrix in the recursion depends on the first and second moments of the distribution of admixture proportions in the population (which we can estimate using their sample equivalents), and the scalar coefficients depend on the number of generations since admixture ($g$) and the genetic distance (in cM) between consecutive loci.
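The following is a sketch of a recursive simulator of this kind under stated assumptions; it is not the STEAM implementation. It assumes (1) that the null covariance of test statistics factorizes as $e^{-g d_{jj'}/100}\,\Sigma_{kk'}$, and (2) that $\Sigma_{kk'} = \{\delta_{kk'}\mathrm{E}(\pi_k) - \mathrm{E}(\pi_k\pi_{k'})\}/\sqrt{M_{kk}M_{k'k'}}$ with $M_{kk} = \mathrm{E}(\pi_k) - \mathrm{E}(\pi_k^2)$, which is what averaging the conditional covariance from the proof of Lemma 1 over $\pi$ gives. Each replicate is generated locus by locus as $Z_j = \rho_j Z_{j-1} + \sqrt{1-\rho_j^2}\,L\varepsilon_j$ with $LL^\top = \Sigma$, and the threshold is taken as the $\alpha$ quantile of each replicate's smallest two-sided p value. All function names, the simulated proportions, the map, and $g = 6$ are hypothetical.

```python
import math
import random

def sigma_from_props(props):
    """ASSUMED K x K matrix from sample moments of admixture proportions:
    Sigma[k][k'] = M[k][k'] / sqrt(M[k][k] M[k'][k']), M[k][k'] = delta E(pi_k) - E(pi_k pi_k')."""
    n, K = len(props), len(props[0])
    mean = [sum(p[k] for p in props) / n for k in range(K)]
    M = [[(mean[k] if k == kp else 0.0) - sum(p[k] * p[kp] for p in props) / n
          for kp in range(K)] for k in range(K)]
    return [[M[k][kp] / math.sqrt(M[k][k] * M[kp][kp]) for kp in range(K)]
            for k in range(K)]

def psd_cholesky(S):
    """Lower-triangular L with L L^T = S. Sigma is singular (local ancestry components
    sum to 2), so a pivot that rounds to zero or below is clamped to zero."""
    K = len(S)
    L = [[0.0] * K for _ in range(K)]
    for i in range(K):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][t] * L[j][t] for t in range(j))
            if i == j:
                L[i][i] = math.sqrt(max(s, 0.0))
            else:
                L[i][j] = s / L[j][j] if L[j][j] > 1e-12 else 0.0
    return L

def simulate_replicate(L, positions_cm, g, rng):
    """One null replicate: a list (one entry per locus) of K-vectors of statistics."""
    K = len(L)

    def draw():  # one N(0, Sigma) vector
        e = [rng.gauss(0.0, 1.0) for _ in range(K)]
        return [sum(L[k][t] * e[t] for t in range(k + 1)) for k in range(K)]

    Z = [draw()]
    for j in range(1, len(positions_cm)):
        rho = math.exp(-g * (positions_cm[j] - positions_cm[j - 1]) / 100.0)
        scale = math.sqrt(1.0 - rho * rho)
        Z.append([rho * z + scale * e for z, e in zip(Z[-1], draw())])
    return Z

def simulated_threshold(props, positions_cm, g, n_reps=1000, alpha=0.05, seed=1):
    """p value threshold with FWER about alpha: the alpha quantile of per-replicate minimum p."""
    rng = random.Random(seed)
    L = psd_cholesky(sigma_from_props(props))
    min_p = []
    for _ in range(n_reps):
        Z = simulate_replicate(L, positions_cm, g, rng)
        z_max = max(abs(z) for locus in Z for z in locus)
        min_p.append(math.erfc(z_max / math.sqrt(2.0)))  # two-sided p value of max |Z|
    min_p.sort()
    return min_p[int(alpha * n_reps)]

# Hypothetical three-way admixture proportions and a 100 cM map with 1 cM spacing.
rng = random.Random(0)
props = []
for _ in range(300):
    w = [rng.gammavariate(a, 1.0) for a in (2.0, 5.0, 1.0)]
    props.append([x / sum(w) for x in w])
positions = [float(j) for j in range(100)]
thr = simulated_threshold(props, positions, g=6)
print(f"{thr:.2e}")
```

Note that the sample size enters only through `sigma_from_props`; the replicate loop depends on the number of loci and $K$, consistent with the run-time behavior described in the text.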
Others35, 73 have proposed multiple-testing correction procedures that similarly utilize knowledge of the asymptotic distribution of test statistics; however, our approach takes advantage of the specific, convenient form of this distribution for admixture mapping studies to speed up computation time. Note that this algorithm scales linearly with the number of loci, but run time does not depend on the number of samples (except through calculation of the first and second moments of the sample admixture proportions). Computation time can be drastically reduced by considering just a single locus per unique ancestry block (if applicable; not all local ancestry inference programs perform calling within windows). Run time does increase slightly, but not drastically, with increasing number of ancestral populations $K$. Running 10,000 replicates on the WHI SHARe data took approximately 8 and 9 min for the African American ($K = 2$) and Hispanic American ($K = 3$) samples, respectively. We have implemented this algorithm in our R package STEAM (Significance Threshold Estimation for Admixture Mapping), which is available on GitHub.
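The single-locus-per-block reduction can be sketched as a one-pass scan that keeps a locus only when its local ancestry calls differ, in at least one individual, from those at the previously kept locus. Loci dropped this way would yield test statistics identical to the retained locus of their block, so the genome-wide maximum statistic is unchanged. The input layout (one tuple of per-individual calls per locus, in map order) and the function name are hypothetical.

```python
def collapse_ancestry_blocks(calls):
    """Indices of loci to keep: the first locus of each run of consecutive loci
    whose local ancestry calls are identical in every individual.

    calls[j] is a comparable record (e.g., a tuple) of the calls at locus j for
    all individuals, with loci in map order.
    """
    keep = []
    for j, c in enumerate(calls):
        if not keep or c != calls[keep[-1]]:
            keep.append(j)
    return keep

# Hypothetical calls (copies of one ancestry) for three individuals at five loci.
calls = [(0, 1, 2), (0, 1, 2), (0, 2, 2), (0, 2, 2), (0, 1, 2)]
positions_cm = [0.0, 0.1, 0.2, 0.35, 0.5]
keep = collapse_ancestry_blocks(calls)
reduced_positions = [positions_cm[j] for j in keep]
print(keep, reduced_positions)  # [0, 2, 4] [0.0, 0.2, 0.5]
```

Only consecutive loci are merged: the fifth locus repeats the first locus's calls but is kept, because it sits at a different map position and is separated from it by a block with different calls.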
Web Resources
Supplemental Data
References
- 1. Need A.C., Goldstein D.B. Next generation disparities in human genomics: concerns and remedies. Trends Genet. 2009;25:489–494. doi: 10.1016/j.tig.2009.09.012.
- 2. Bustamante C.D., Burchard E.G., De la Vega F.M. Genomics for the world. Nature. 2011;475:163–165. doi: 10.1038/475163a.
- 3. Popejoy A.B., Fullerton S.M. Genomics is failing on diversity. Nature. 2016;538:161–164. doi: 10.1038/538161a.
- 4. Rife D.C. Populations of hybrid origin as source material for the detection of linkage. Am. J. Hum. Genet. 1954;6:26–33.
- 5. McKeigue P.M. Mapping genes that underlie ethnic differences in disease risk: methods for detecting linkage in admixed populations, by conditioning on parental admixture. Am. J. Hum. Genet. 1998;63:241–251. doi: 10.1086/301908.
- 6. Shriner D. Overview of admixture mapping. Curr. Protoc. Hum. Genet. 2013;76:1.23.1–1.23.8. doi: 10.1002/0471142905.hg0123s76.
- 7. Forno E., Celedon J.C. Asthma and ethnic minorities: socioeconomic status and beyond. Curr. Opin. Allergy Clin. Immunol. 2009;9:154–160. doi: 10.1097/aci.0b013e3283292207.
- 8. Freedman M.L., Haiman C.A., Patterson N., McDonald G.J., Tandon A., Waliszewska A., Penney K., Steen R.G., Ardlie K., John E.M. Admixture mapping identifies 8q24 as a prostate cancer risk locus in African-American men. Proc. Natl. Acad. Sci. USA. 2006;103:14068–14073. doi: 10.1073/pnas.0605832103.
- 9. Chobanian A.V., Bakris G.L., Black H.R., Cushman W.C., Green L.A., Izzo J.L., Jr., Jones D.W., Materson B.J., Oparil S., Wright J.T., Jr., Roccella E.J., Joint National Committee on Prevention, Detection, Evaluation, and Treatment of High Blood Pressure. National Heart, Lung, and Blood Institute. National High Blood Pressure Education Program Coordinating Committee. Seventh report of the Joint National Committee on Prevention, Detection, Evaluation, and Treatment of High Blood Pressure. Hypertension. 2003;42:1206–1252. doi: 10.1161/01.HYP.0000107251.49515.c2.
- 10. Zhu X., Luke A., Cooper R.S., Quertermous T., Hanis C., Mosley T., Gu C.C., Tang H., Rao D.C., Risch N., Weder A. Admixture mapping for hypertension loci with genome-scan markers. Nat. Genet. 2005;37:177–181. doi: 10.1038/ng1510.
- 11. Reich D., Patterson N., Ramesh V., De Jager P.L., McDonald G.J., Tandon A., Choy E., Hu D., Tamraz B., Pawlikowska L., Health, Aging and Body Composition (Health ABC) Study. Admixture mapping of an allele affecting interleukin 6 soluble receptor and interleukin 6 levels. Am. J. Hum. Genet. 2007;80:716–726. doi: 10.1086/513206.
- 12. Winkler C.A., Nelson G.W., Smith M.W. Admixture mapping comes of age. Annu. Rev. Genomics Hum. Genet. 2010;11:65–89. doi: 10.1146/annurev-genom-082509-141523.
- 13. Zhu X., Young J.H., Fox E., Keating B.J., Franceschini N., Kang S., Tayo B., Adeyemo A., Sun Y.V., Li Y. Combined admixture mapping and association analysis identifies a novel blood pressure genetic locus on 5p13: contributions from the CARe consortium. Hum. Mol. Genet. 2011;20:2285–2295. doi: 10.1093/hmg/ddr113.
- 14. Reiner A.P., Beleza S., Franceschini N., Auer P.L., Robinson J.G., Kooperberg C., Peters U., Tang H. Genome-wide association and population genetic analysis of C-reactive protein in African American and Hispanic American women. Am. J. Hum. Genet. 2012;91:502–512. doi: 10.1016/j.ajhg.2012.07.023.
- 15. Carty C.L., Johnson N.A., Hutter C.M., Reiner A.P., Peters U., Tang H., Kooperberg C. Genome-wide association study of body height in African Americans: the Women’s Health Initiative SNP Health Association Resource (SHARe). Hum. Mol. Genet. 2012;21:711–720. doi: 10.1093/hmg/ddr489.
- 16. Ochs-Balcom H.M., Preus L., Wactawski-Wende J., Nie J., Johnson N.A., Zakharia F., Tang H., Carlson C., Carty C., Chen Z. Association of DXA-derived bone mineral density and fat mass with African ancestry. J. Clin. Endocrinol. Metab. 2013;98:E713–E717. doi: 10.1210/jc.2012-3921.
- 17. Coram M.A., Duan Q., Hoffmann T.J., Thornton T., Knowles J.W., Johnson N.A., Ochs-Balcom H.M., Donlon T.A., Martin L.W., Eaton C.B. Genome-wide characterization of shared and distinct genetic components that influence blood lipid levels in ethnically diverse human populations. Am. J. Hum. Genet. 2013;92:904–916. doi: 10.1016/j.ajhg.2013.04.025.
- 18. Galanter J.M., Gignoux C.R., Torgerson D.G., Roth L.A., Eng C., Oh S.S., Nguyen E.A., Drake K.A., Huntsman S., Hu D. Genome-wide association study and admixture mapping identify different asthma-associated loci in Latinos: the Genes-environments & Admixture in Latino Americans study. J. Allergy Clin. Immunol. 2014;134:295–305. doi: 10.1016/j.jaci.2013.08.055.
- 19. Gomez F., Wang L., Abel H., Zhang Q., Province M.A., Borecki I.B. Admixture mapping of coronary artery calcification in African Americans from the NHLBI family heart study. BMC Genet. 2015;16:42. doi: 10.1186/s12863-015-0196-x.
- 20. Schick U.M., Jain D., Hodonsky C.J., Morrison J.V., Davis J.P., Brown L., Sofer T., Conomos M.P., Schurmann C., McHugh C.P. Genome-wide association study of platelet count identifies ancestry-specific loci in Hispanic/Latino Americans. Am. J. Hum. Genet. 2016;98:229–242. doi: 10.1016/j.ajhg.2015.12.003.
- 21. Brown L.A., Sofer T., Stilp A.M., Baier L.J., Kramer H.J., Masindova I., Levy D., Hanson R.L., Moncrieft A.E., Redline S. Admixture mapping identifies an Amerindian ancestry locus associated with albuminuria in Hispanics in the United States. J. Am. Soc. Nephrol. 2017;28:2211–2220. doi: 10.1681/ASN.2016091010.
- 22. Giri A., Hartmann K.E., Aldrich M.C., Ward R.M., Wu J.M., Park A.J., Graff M., Qi L., Nassir R., Wallace R.B. Admixture mapping of pelvic organ prolapse in African Americans from the Women’s Health Initiative Hormone Therapy trial. PLoS ONE. 2017;12:e0178839. doi: 10.1371/journal.pone.0178839.
- 23. Sofer T., Baier L.J., Browning S.R., Thornton T.A., Talavera G.A., Wassertheil-Smoller S., Daviglus M.L., Hanson R., Kobes S., Cooper R.S. Admixture mapping in the Hispanic Community Health Study/Study of Latinos reveals regions of genetic associations with blood pressure traits. PLoS ONE. 2017;12:e0188400. doi: 10.1371/journal.pone.0188400.
- 24. Giri A., Edwards T.L., Hartmann K.E., Torstenson E.S., Wellons M., Schreiner P.J., Velez Edwards D.R. African genetic ancestry interacts with body mass index to modify risk for uterine fibroids. PLoS Genet. 2017;13:e1006871. doi: 10.1371/journal.pgen.1006871.
- 25. Cyr D.D., Allen A.S., Du G.J., Ruffin F., Adams C., Thaden J.T., Maskarinec S.A., Souli M., Guo S., Dykxhoorn D.M. Evaluating genetic susceptibility to Staphylococcus aureus bacteremia in African Americans using admixture mapping. Genes Immun. 2017;18:95–99. doi: 10.1038/gene.2017.6.
- 26. Parra E.J., Mazurek A., Gignoux C.R., Sockell A., Agostino M., Morris A.P., Petty L.E., Hanis C.L., Cox N.J., Valladares-Salgado A. Admixture mapping in two Mexican samples identifies significant associations of locus ancestry with triglyceride levels in the BUD13/ZNF259/APOA5 region and fine mapping points to rs964184 as the main driver of the association signal. PLoS ONE. 2017;12:e0172880. doi: 10.1371/journal.pone.0172880.
- 27. Uribe-Salazar J.M., Palmer J.R., Haddad S.A., Rosenberg L., Ruiz-Narváez E.A. Admixture mapping and fine-mapping of type 2 diabetes susceptibility loci in African American women. J. Hum. Genet. 2018;63:1109–1117. doi: 10.1038/s10038-018-0503-2.
- 28. Wang H., Cade B.E., Sofer T., Sands S.A., Chen H., Browning S., Stilp A.M., Louie T.L., Thornton T.A., Craig Johnson W. Admixture mapping identifies novel loci for obstructive sleep apnea in Hispanic/Latino Americans. Hum. Mol. Genet. 2018. doi: 10.1093/hmg/ddy387. Published online November 7, 2018.
- 29. Gignoux C.R., Torgerson D.G., Pino-Yanes M., Uricchio L.H., Galanter J., Roth L.A., Eng C., Hu D., Nguyen E.A., Huntsman S. An admixture mapping meta-analysis implicates genetic variation at 18q21 with asthma susceptibility in Latinos. J. Allergy Clin. Immunol. 2018. doi: 10.1016/j.jaci.2016.08.057. S0091-6749(18)31274-0.
- 30. Spear M.L., Hu D., Pino-Yanes M., Huntsman S., Eng C., Levin A.M., Ortega V.E., White M.J., McGarry M.E., Thakur N. A genome-wide association and admixture mapping study of bronchodilator drug response in African Americans with asthma. Pharmacogenomics J. 2018. doi: 10.1038/s41397-018-0042-4. Published online September 12, 2018.
- 31. Li J., Ji L. Adjusting multiple testing in multilocus analyses using the eigenvalues of a correlation matrix. Heredity (Edinb). 2005;95:221–227. doi: 10.1038/sj.hdy.6800717.
- 32. Shriner D., Adeyemo A., Rotimi C.N. Joint ancestry and association testing in admixed individuals. PLoS Comput. Biol. 2011;7:e1002325. doi: 10.1371/journal.pcbi.1002325.
- 33. Dudbridge F., Koeleman B.P. Efficient computation of significance levels for multiple associations in large studies of correlated data, including genomewide association studies. Am. J. Hum. Genet. 2004;75:424–435. doi: 10.1086/423738.
- 34. Salyakina D., Seaman S.R., Browning B.L., Dudbridge F., Muller-Myhsok B. Evaluation of Nyholt’s procedure for multiple testing correction. Hum. Hered. 2005;60:19–25, discussion 61–62. doi: 10.1159/000087540.
- 35. Conneely K.N., Boehnke M. So many correlated tests, so little time! Rapid adjustment of P values for multiple correlated tests. Am. J. Hum. Genet. 2007;81:1158–1168. doi: 10.1086/522036.
- 36. Tang H., Siegmund D.O., Johnson N.A., Romieu I., London S.J. Joint testing of genotype and ancestry association in admixed families. Genet. Epidemiol. 2010;34:783–791. doi: 10.1002/gepi.20520.
- 37. Qin H., Zhu X. Power comparison of admixture mapping and direct association analysis in genome-wide association studies. Genet. Epidemiol. 2012;36:235–243. doi: 10.1002/gepi.21616.
- 38. Pe’er I., Yelensky R., Altshuler D., Daly M.J. Estimation of the multiple testing burden for genomewide association studies of nearly all common variants. Genet. Epidemiol. 2008;32:381–385. doi: 10.1002/gepi.20303.
- 39. Brown L. University of Washington; 2016. Statistical Methods in Admixture Mapping: Mixed Model Based Testing and Genome-wide Significance Thresholds. PhD Thesis.
- 40. Seaman S.R., Müller-Myhsok B. Rapid simulation of P values for product methods and multiple-testing adjustment in association studies. Am. J. Hum. Genet. 2005;76:399–408. doi: 10.1086/428140.
- 41. Han B., Kang H.M., Eskin E. Rapid and accurate multiple testing correction and power estimation for millions of correlated markers. PLoS Genet. 2009;5:e1000456. doi: 10.1371/journal.pgen.1000456.
- 42. Sha Q., Zhang X., Zhu X., Zhang S. Analytical correction for multiple testing in admixture mapping. Hum. Hered. 2006;62:55–63. doi: 10.1159/000096094.
- 43. Siegmund D., Yakir B. Springer; New York: 2007. The Statistics of Gene Mapping.
- 44. Bryc K., Auton A., Nelson M.R., Oksenberg J.R., Hauser S.L., Williams S., Froment A., Bodo J.M., Wambebe C., Tishkoff S.A., Bustamante C.D. Genome-wide patterns of population structure and admixture in West Africans and African Americans. Proc. Natl. Acad. Sci. USA. 2010;107:786–791. doi: 10.1073/pnas.0909559107.
- 45. Tishkoff S.A., Reed F.A., Friedlaender F.R., Ehret C., Ranciaro A., Froment A., Hirbo J.B., Awomoyi A.A., Bodo J.M., Doumbo O. The genetic structure and history of Africans and African Americans. Science. 2009;324:1035–1044. doi: 10.1126/science.1172257.
- 46. Conomos M.P., Laurie C.A., Stilp A.M., Gogarten S.M., McHugh C.P., Nelson S.C., Sofer T., Fernández-Rhodes L., Justice A.E., Graff M. Genetic diversity and association studies in US Hispanic/Latino populations: applications in the Hispanic Community Health Study/Study of Latinos. Am. J. Hum. Genet. 2016;98:165–184. doi: 10.1016/j.ajhg.2015.12.001.
- 47. Parra E.J., Marcini A., Akey J., Martinson J., Batzer M.A., Cooper R., Forrester T., Allison D.B., Deka R., Ferrell R.E., Shriver M.D. Estimating African American admixture proportions by use of population-specific alleles. Am. J. Hum. Genet. 1998;63:1839–1851. doi: 10.1086/302148.
- 48. Bryc K., Velez C., Karafet T., Moreno-Estrada A., Reynolds A., Auton A., Hammer M., Bustamante C.D., Ostrer H. Colloquium paper: genome-wide patterns of population structure and admixture among Hispanic/Latino populations. Proc. Natl. Acad. Sci. USA. 2010;107(Suppl 2):8954–8961. doi: 10.1073/pnas.0914618107.
- 49. Bryc K., Durand E.Y., Macpherson J.M., Reich D., Mountain J.L. The genetic ancestry of African Americans, Latinos, and European Americans across the United States. Am. J. Hum. Genet. 2015;96:37–53. doi: 10.1016/j.ajhg.2014.11.010.
- 50. Redden D.T., Divers J., Vaughan L.K., Tiwari H.K., Beasley T.M., Fernández J.R., Kimberly R.P., Feng R., Padilla M.A., Liu N. Regional admixture mapping and structured association testing: conceptual unification and an extensible general linear model. PLoS Genet. 2006;2:e137. doi: 10.1371/journal.pgen.0020137.
- 51. R Core Team. R Foundation for Statistical Computing; Vienna, Austria: 2018. R: A language and environment for statistical computing.
- 52. Hellenthal G., Busby G.B.J., Band G., Wilson J.F., Capelli C., Falush D., Myers S. A genetic atlas of human admixture history. Science. 2014;343:747–751. doi: 10.1126/science.1243518.
- 53. Hays J., Hunt J.R., Hubbell F.A., Anderson G.L., Limacher M., Allen C., Rossouw J.E. The Women’s Health Initiative recruitment methods and results. Ann. Epidemiol. 2003;13(9, Suppl):S18–S77. doi: 10.1016/s1047-2797(03)00042-5.
- 54. Altshuler D.M., Gibbs R.A., Peltonen L., Dermitzakis E., Schaffner S.F., Yu F., International HapMap 3 Consortium. Integrating common and rare genetic variation in diverse human populations. Nature. 2010;467:52–58. doi: 10.1038/nature09298.
- 55. Cann H.M., de Toma C., Cazes L., Legrand M.F., Morel V., Piouffre L., Bodmer J., Bodmer W.F., Bonne-Tamir B., Cambon-Thomsen A. A human genome diversity cell line panel. Science. 2002;296:261–262. doi: 10.1126/science.296.5566.261b.
- 56. Conomos M.P., Reiner A.P., Weir B.S., Thornton T.A. Model-free estimation of recent genetic relatedness. Am. J. Hum. Genet. 2016;98:127–148. doi: 10.1016/j.ajhg.2015.11.022.
- 57. Browning S.R., Browning B.L. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am. J. Hum. Genet. 2007;81:1084–1097. doi: 10.1086/521987.
- 58. International HapMap Consortium. A haplotype map of the human genome. Nature. 2005;437:1299–1320. doi: 10.1038/nature04226.
- 59. Maples B.K., Gravel S., Kenny E.E., Bustamante C.D. RFMix: a discriminative modeling approach for rapid and robust local-ancestry inference. Am. J. Hum. Genet. 2013;93:278–288. doi: 10.1016/j.ajhg.2013.06.020.
- 60. Purcell S., Neale B., Todd-Brown K., Thomas L., Ferreira M.A., Bender D., Maller J., Sklar P., de Bakker P.I., Daly M.J., Sham P.C. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 2007;81:559–575. doi: 10.1086/519795.
- 61. Smith M.W., Patterson N., Lautenberger J.A., Truelove A.L., McDonald G.J., Waliszewska A., Kessing B.D., Malasky M.J., Scafe C., Le E. A high-density admixture map for disease gene discovery in African Americans. Am. J. Hum. Genet. 2004;74:1001–1013. doi: 10.1086/420856.
- 62. Hoggart C.J., Shriver M.D., Kittles R.A., Clayton D.G., McKeigue P.M. Design and analysis of admixture mapping studies. Am. J. Hum. Genet. 2004;74:965–978. doi: 10.1086/420855.
- 63. Price A.L., Patterson N., Yu F., Cox D.R., Waliszewska A., McDonald G.J., Tandon A., Schirmer C., Neubauer J., Bedoya G. A genomewide admixture map for Latino populations. Am. J. Hum. Genet. 2007;80:1024–1036. doi: 10.1086/518313.
- 64. Price A.L., Tandon A., Patterson N., Barnes K.C., Rafaels N., Ruczinski I., Beaty T.H., Mathias R., Reich D., Myers S. Sensitive detection of chromosomal segments of distinct ancestry in admixed populations. PLoS Genet. 2009;5:e1000519. doi: 10.1371/journal.pgen.1000519.
- 65. Beasley T.M., Erickson S., Allison D.B. Rank-based inverse normal transformations are increasingly used, but are they merited? Behav. Genet. 2009;39:580–595. doi: 10.1007/s10519-009-9281-0.
- 66. Sofer T., Zheng X., Gogarten S.M., Laurie C.A., Grinde K., Shaffer J.R., Shungin D., O’Connell J.R., Durazo-Arvizo R.A., Raffield L. A fully-adjusted two-stage procedure for rank normalization in genetic association studies. Genet. Epidemiol. 2019:1–13. doi: 10.1002/gepi.22188.
- 67. Yu J., Pressoir G., Briggs W.H., Vroh Bi I., Yamasaki M., Doebley J.F., McMullen M.D., Gaut B.S., Nielsen D.M., Holland J.B. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat. Genet. 2006;38:203–208. doi: 10.1038/ng1702.
- 68. Hinch A.G., Tandon A., Patterson N., Song Y., Rohland N., Palmer C.D., Chen G.K., Wang K., Buxbaum S.G., Akylbekova E.L. The landscape of recombination in African Americans. Nature. 2011;476:170–175. doi: 10.1038/nature10336.
- 69. Zaitlen N., Huntsman S., Hu D., Spear M., Eng C., Oh S.S., White M.J., Mak A., Davis A., Meade K. The effects of migration and assortative mating on admixture linkage disequilibrium. Genetics. 2017;205:375–383. doi: 10.1534/genetics.116.192138.
- 70. Risch N., Merikangas K. The future of genetic studies of complex human diseases. Science. 1996;273:1516–1517. doi: 10.1126/science.273.5281.1516.
- 71. McCarthy M.I., Abecasis G.R., Cardon L.R., Goldstein D.B., Little J., Ioannidis J.P., Hirschhorn J.N. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat. Rev. Genet. 2008;9:356–369. doi: 10.1038/nrg2344.
- 72. Jannot A.S., Ehret G., Perneger T. P < 5 × 10−8 has emerged as a standard of statistical significance for genome-wide association studies. J. Clin. Epidemiol. 2015;68:460–465. doi: 10.1016/j.jclinepi.2015.01.001.
- 73. Joo J.W., Hormozdiari F., Han B., Eskin E. Multiple testing correction in linear mixed models. Genome Biol. 2016;17:62. doi: 10.1186/s13059-016-0903-6.