Molecular Biology and Evolution. 2011 Mar 2;28(8):2231–2237. doi: 10.1093/molbev/msr049

Genetic Variation in Native Americans, Inferred from Latino SNP and Resequencing Data

Jeffrey D Wall 1,*, Rong Jiang 1, Christopher Gignoux 2, Gary K Chen 3, Celeste Eng 2, Scott Huntsman 2, Paul Marjoram 3
PMCID: PMC3144384  PMID: 21368315

Abstract

Analyses of genetic polymorphism data have the potential to be highly informative about the demographic history of Native American populations, but due to a combination of historical and political factors, there are essentially no autosomal sequence polymorphism data from any Native American group. However, there are many resequencing studies involving Latinos, whose genomes contain segments inherited from their Native American ancestors. In this study, we introduce a new method for estimating local ancestry across the genomes of admixed individuals and show how this method, along with dense genotyping and targeted resequencing, can be used to assay genetic variation in ancestral Native American groups. We analyze roughly 6 Mb of resequencing data from 22 Mexican Americans to provide the first large-scale view of sequence level variation in Native Americans. We observe low levels of diversity and high levels of linkage disequilibrium in the Native American–derived sequences, consistent with a recent severe population bottleneck associated with the initial peopling of the Americas. Using two different computational approaches, one novel, we estimate that this bottleneck occurred roughly 12.5 Kya; when uncertainty in the estimation process is taken into account, our results are consistent with archeological estimates for the colonization of the Americas.

Keywords: admixture, human evolution, demographic inference

Introduction

Evolutionary geneticists have long used genetic polymorphism data to make inferences about human demographic history, utilizing restriction site polymorphism surveys (e.g., Cann et al. 1987), microsatellite data (e.g., Rosenberg et al. 2002), single nucleotide polymorphism (SNP) data (e.g., Conrad et al. 2006; International HapMap Consortium 2007), and resequencing data (e.g., Vigilant et al. 1991; Harding et al. 1997; Kaessmann et al. 1999). Resequencing studies, where all sampled individuals are fully sequenced across the target regions, are more informative than SNP or microsatellite-based studies because they provide an unbiased and complete snapshot of both rare and common variants in the study sample. With recent advances in molecular sequencing technology, we now have resequencing data from more than 1,000 genetic regions spanning more than 20 Mb of sequence (e.g., Reich et al. 2001; Crawford et al. 2004; Livingston et al. 2004; Voight et al. 2005; ENCODE Project Consortium 2007; Wall et al. 2008). Although Old World continental groups (i.e., Europeans, Asians, and Africans) are well sampled in these studies, populations indigenous to the Americas are generally not included. In fact, although large-scale SNP (Jakobsson et al. 2008; Li et al. 2008) and microsatellite (Wang et al. 2007) studies have been performed with Native American samples, the amount of autosomal resequencing data generated (from Native American samples) is almost negligible (see, e.g., Hey 2005), and the insights gained from the analyses of Native American resequencing data have been limited. In this paper, we address this knowledge gap by collecting and analyzing genetic data from admixed “Latinos,” who have partial Native American ancestry.

Latinos (also called Hispanics) are considered an ethnic group with a shared cultural heritage spread out over most of the Americas, without regard to race or ancestry. Latinos encompass a mix of European, Native American, and African ancestries, and the relative contributions of these three ancestral continental groups can vary substantially between self-identified Latino subgroups (e.g., between Mexican Americans and Puerto Ricans) and among individuals within the same subgroup (e.g., Salari et al. 2005; Choudhry et al. 2006; Bryc et al. 2010). For example, the estimated proportion of European ancestry in a sample of 181 Mexican controls varied from ∼0% to ∼100% (Choudhry et al. 2006). This heterogeneity is a problem both for evolutionary studies and for genetic association studies in Latinos unless genetic ancestry can be measured and accounted for.

Recently, several methods have been developed to estimate local genetic ancestry in admixed individuals from dense genotype data (e.g., Falush et al. 2003; Tang et al. 2006; Sankararaman et al. 2008; Price et al. 2009; Bryc et al. 2010). Specifically, at each position in the genome, these methods estimate how many copies (0, 1, or 2) were inherited from prespecified ancestral populations (see fig. 1). If the mixing between ancestral populations is recent (e.g., within the last 500 years), then the size of chromosomal “chunks” inherited from one of the ancestral populations is still relatively large (e.g., several megabases long on average) and the methods tend to work reasonably well.

FIG. 1.

Schematic showing a pair of chromosomes from an admixed individual with ancestry from different continental populations (shown in black and red). Local ancestry can be inferred by estimating the number of copies inherited from each ancestral population at each location across the genome.

In this paper, we integrate the estimation of local ancestry in admixed individuals with targeted resequencing to obtain sequences directly inherited from the ancestral populations. Specifically, we first use dense genotype data (e.g., from commercially available SNP chips) to estimate continent of origin across the genomes of admixed Mexican American individuals. Then, we analyze resequencing data from parts of the admixed genomes inferred to have been inherited from Native American ancestors. The result is a data set of diploid sequences, all of which were inherited from Native American ancestors within the past 500 years. We focused our study on 22 Mexican Americans from Los Angeles that are part of the NIGMS Human Variation Collection; these individuals have already been sequenced at several hundred genes as part of the ongoing NIEHS SNPs project (Livingston et al. 2004). In total, we analyze roughly 6 Mb of sequence data from 244 genes, roughly 100 times more Native American resequencing data than currently exist in the public domain. We use this data set to address a longstanding question about the demographic history of Native American populations—the timing of the initial founding of the Americas over the Bering land bridge. We use two different computational methods for estimating demographic parameters: a composite likelihood approach that has previously been used to analyze subsets of the NIEHS SNPs data (Plagnol and Wall 2006; Wall et al. 2009) and a novel summary likelihood method that is roughly an order of magnitude faster than the other approach. Although these data are not ideal for demographic inference due to the potential effects of direct or linked selection on patterns of genetic variation, we are analyzing the largest publicly available resequencing data set from Latino individuals, and the NIEHS SNP project data allow us to make a direct comparison with patterns of genetic variation in other ethnic groups.

Materials and Methods

Genotyping

Twenty-two samples from the NIGMS human variation panel of Mexican Americans (Coriell Catalog ID HD100MEX) were genotyped using Affymetrix 6.0 arrays. Genotype calls were made with the birdseed v2 algorithm using default parameters. An additional 40 Latino samples genotyped for a separate project were temporarily included to improve the performance of the base-calling algorithm but were removed prior to all other analyses. A list of the sample IDs used is given in supplementary table S2 (Supplementary Material online).

Estimating Local Ancestry

We assume there were two ancestral populations, corresponding to Europeans and Native Americans, and utilize a sliding-window composite likelihood approach. At each location across the genome, there are four possible ancestral configurations, corresponding to European versus Native American assignment for the maternal and paternal alleles. One configuration corresponds to the inheritance of two European alleles, another to the inheritance of two Native American alleles, and the remaining two configurations correspond to the inheritance of one European and one Native American allele. In sliding windows of 2 cM, we calculated the likelihood of each ancestral configuration (for each individual separately), assuming

  • i) no change in ancestral configuration across the window

  • ii) each SNP is independent

  • iii) allele frequencies in the ancestral populations can be estimated from publicly available genotype data from European Americans (International HapMap Consortium 2007) and Native Americans (Mesoamerican samples from Mao et al. 2007).

We then tabulated the ancestral configuration with the maximum (composite) likelihood for each window and used majority rule over all windows containing a particular marker to make each ancestry call. For step iii, we implemented the quality control filters suggested by Mao et al. (2007), excluding SNPs with >20% missing data or Hardy–Weinberg equilibrium (HWE) test P values <0.05.
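The window-level calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the coding of genotypes as reference-allele counts (0, 1, or 2), and the small floor `eps` that guards against taking the log of zero are all hypothetical. With unphased genotypes the two mixed configurations have identical likelihoods, so the sketch tracks three distinct states, indexed by the number of Native American copies.

```python
import math
from collections import Counter

# Distinct configurations for unphased data, indexed by the number of
# Native American copies: 0 = two European, 1 = mixed, 2 = two Native American.
CONFIGS = (0, 1, 2)

def genotype_prob(g, p_a, p_b):
    """P(genotype g) when the two chromosomes carry the reference allele
    with frequencies p_a and p_b, respectively."""
    if g == 0:
        return (1 - p_a) * (1 - p_b)
    if g == 1:
        return p_a * (1 - p_b) + (1 - p_a) * p_b
    return p_a * p_b

def window_log_likelihood(genos, freq_eur, freq_nat, config, eps=1e-6):
    """Composite log-likelihood of one configuration in a window, assuming
    constant ancestry across the window and independent SNPs."""
    total = 0.0
    for g, pe, pn in zip(genos, freq_eur, freq_nat):
        # Allele frequencies seen by the two chromosomes under this configuration
        pops = [pn] * config + [pe] * (2 - config)
        total += math.log(max(genotype_prob(g, pops[0], pops[1]), eps))
    return total

def best_config(genos, freq_eur, freq_nat):
    """Configuration with the maximum composite likelihood in the window."""
    return max(CONFIGS,
               key=lambda c: window_log_likelihood(genos, freq_eur, freq_nat, c))

def majority_call(window_calls):
    """Majority rule over the calls of all windows that contain a marker."""
    return Counter(window_calls).most_common(1)[0][0]
```

In use, `best_config` would be applied to each 2 cM window in turn, and `majority_call` to the list of window-level calls covering a given marker.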

For each sequenced gene in the NIEHS SNP database, we then calculated the ancestral configuration for each of the 22 Mexican American sequences, excluding those where the inferred configuration changes from one end of the gene to the other. To exclude individuals with potential African ancestry, we tabulated for each gene a list of all polymorphisms present in the Yoruba + African American samples but absent in the European + East Asian samples. These “African-specific” SNPs can be used to identify individuals with African ancestry at a particular gene. Specifically, we added up the frequencies of the African-specific SNPs to obtain a rough estimate of the total number of African-specific alleles expected in a sequence with African ancestry—if an African-specific allele is at frequency k, then a randomly sampled African sequence would have a probability of k of having the allele. If there are multiple African-specific SNPs with frequencies k1, … kj, then the expected number of African-specific alleles in a random haploid sequence is k1 + … + kj. For each Latino individual, we excluded the (diploid) sequence at a particular gene if the number of African-specific alleles was greater than 50% of the expectation (for a haploid sequence) calculated above (i.e., closer to the expectation of an individual with one African sequence than to those with no African sequences).
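As a rough illustration of the African-ancestry filter just described, the sketch below sums the African-specific allele frequencies at a gene and applies the 50% threshold. The function names and the example frequencies are hypothetical; only the rule itself comes from the text.

```python
def expected_african_specific(freqs):
    """Expected number of African-specific alleles on one haploid sequence
    of African ancestry: the sum of the allele frequencies k1 + ... + kj."""
    return sum(freqs)

def exclude_sequence(observed_count, freqs):
    """True if a diploid sequence carries more African-specific alleles
    than 50% of the haploid expectation."""
    return observed_count > 0.5 * expected_african_specific(freqs)

# Hypothetical gene with three African-specific SNPs
freqs = [0.2, 0.3, 0.5]
print(expected_african_specific(freqs))
print(exclude_sequence(1, freqs), exclude_sequence(0, freqs))
```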

A complete list of the loci used, and the ancestral assignments for each locus, is given in supplementary table S3 (Supplementary Material online). Despite the potential problems of the independence across SNPs assumption, the method performs quite well on simulated data sets—substantially better than Structure (Falush et al. 2003) or LAMP (Sankararaman et al. 2008) and comparable to Hapmix (see table 1).

Table 1.

Comparison of Different Methods for Estimating Local Ancestry.

Method                        δ      Marker-specific accuracy (%)
Our method (unphased data)    0.2    91.0
Our method (unphased data)    0.05   92.3
Our method (phased data)      0.2    93.4
Our method (phased data)      0.05   94.5
Hapmix                        0.2    96.1
Hapmix                        0.05   98.0
LAMP                          0.2    84.1
Structure                     0.4    72.0

We used only those SNPs with ancestral allele frequencies that differed by at least δ in the two ancestral populations. We calculated the average accuracy of the marker-specific ancestry calls for each method. Note that different methods make different assumptions about phased versus unphased data. See text for further details.

Estimating Local Ancestry with Phase-Known Data

The method described above assumes that phase is unknown in both the ancestral and admixed genotypes. To facilitate comparisons with Hapmix (Price et al. 2009), we also implemented a version of our ancestry estimation algorithm that assumes that phase is known in the admixed individuals’ genotypes. In this alternate implementation, we estimate the local ancestry of each chromosome using the same sliding-window composite likelihood approach but with only two possible ancestral states, corresponding to ancestry from each of the two ancestral populations. Diploid ancestry calls are obtained by a post hoc “adding” of the ancestry calls from each of an individual’s pair of chromosomes.

Comparison Across Methods

We used a standard coalescent simulator (Hudson 2002) to generate five small chromosomes’ worth of sequence data appropriate for multiple continental populations (ms command line: ms 1600 1 -t 2500. -r 30000. 10000000 -I 2 800 800 0. -ej .06 1 2). These simulated data sets had SNP densities comparable to extant genotyping arrays such as the Affymetrix 6.0, and levels of population differentiation similar to what is found between Europeans and Native Americans. We then used the following algorithm to simulate a chromosome with y% inherited from the first population and instantaneous admixture x generations ago:

  1. Choose a random ancestral chromosome (y% probability from the first population, (100 − y)% probability from the second population)

  2. Copy this ancestral chromosome for an exponentially distributed distance with mean 100/x centimorgans

  3. Switch to a different ancestral chromosome, chosen as in step 1

  4. Repeat steps 2 and 3 until the end of the chromosome is reached
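The four steps above can be sketched as a short routine that returns the mosaic of ancestral segments. This is an illustrative reading, not the authors' code: the function name, the segment representation, and the use of chromosome indices in place of actual haplotypes are assumptions, and "a different ancestral chromosome" in step 3 is interpreted here as redrawing until the source differs from the previous one.

```python
import random

def simulate_admixed_chromosome(length_cm, x, y, n_pop1, n_pop2, rng=None):
    """Return a list of (start_cM, end_cM, population, source_index) segments
    for one admixed chromosome, with admixture x generations ago and y%
    of ancestry expected from population 1."""
    if rng is None:
        rng = random.Random()
    if n_pop1 < 2 or n_pop2 < 2:
        raise ValueError("need at least two chromosomes per ancestral panel")
    segments = []
    pos = 0.0
    previous = None
    while pos < length_cm:
        # Steps 1 and 3: draw a source chromosome, redrawing if it repeats
        while True:
            pop = 1 if rng.random() < y / 100.0 else 2
            idx = rng.randrange(n_pop1 if pop == 1 else n_pop2)
            if (pop, idx) != previous:
                break
        # Step 2: copy for an exponential distance with mean 100/x cM
        end = min(pos + rng.expovariate(x / 100.0), length_cm)
        segments.append((pos, end, pop, idx))
        previous = (pop, idx)
        pos = end  # Step 4: continue until the chromosome end is reached
    return segments

# Example: x = 10 generations, y = 25% from population 1, 800 sources each
segs = simulate_admixed_chromosome(100.0, 10, 25, 800, 800, random.Random(1))
print(len(segs), segs[0])
```

With x = 10 the mean segment length is 10 cM, so a 100 cM chromosome is expected to contain roughly ten segments.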

We generated 400 admixed chromosomes with x = 10 and y = 25 and 50 (200 for each value of y) and randomly paired chromosomes with the same y value to form diploid “individuals.” We then used 50 (diploid) individuals from each of the ancestral populations to estimate ancestral allele frequencies and used each of the four methods to estimate local ancestry across the remaining individuals. For each SNP, the methods estimated the number of copies (i.e., 0, 1, or 2) inherited from population 1. Due to the slow speed and model assumptions of Structure, we further thinned the data (δ, the difference in allele frequency in the two ancestral populations, was required to be ≥0.4) to only include the most informative SNPs. We tabulated the proportion of ancestry calls that were correct across each method.
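The two bookkeeping steps in this comparison, thinning SNPs by the frequency difference δ and scoring marker-specific accuracy, can be sketched as below. The function names and toy inputs are hypothetical and serve only to make the definitions concrete.

```python
def informative_snps(freq_pop1, freq_pop2, delta):
    """Indices of SNPs whose allele frequencies differ by at least delta
    between the two ancestral populations."""
    return [i for i, (p1, p2) in enumerate(zip(freq_pop1, freq_pop2))
            if abs(p1 - p2) >= delta]

def marker_accuracy(true_copies, called_copies):
    """Proportion of markers where the called number of population 1
    copies (0, 1, or 2) equals the true number."""
    correct = sum(1 for t, c in zip(true_copies, called_copies) if t == c)
    return correct / len(true_copies)

# Toy example with three SNPs and four markers
print(informative_snps([0.1, 0.5, 0.9], [0.2, 0.95, 0.1], 0.4))
print(marker_accuracy([0, 1, 2, 2], [0, 1, 1, 2]))
```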

We also performed a similar comparison using actual genotype data from Chromosome 2 (from Affymetrix 6.0 arrays) from 88 Native Americans and 112 Europeans (Shriver M, unpublished data). We phased the data using BEAGLE (Browning SR and Browning BL 2007), constructed “admixed” individuals using the same algorithm as above, and estimated the accuracy of local ancestry calls using Hapmix (Price et al. 2009) and our composite likelihood method. Our results were similar to the accuracies estimated from simulated data (table 1). For δ = 0.2, Hapmix and our haplotype-based approach had accuracies of 96% and 94%, respectively, whereas our genotype-based approach had an accuracy of 91%.

Population Genetic Analyses

We downloaded all loci using sample population panel 2 from the NIEHS SNPs Web site (http://egp.gs.washington.edu) in November 2009. A total of 244 genes were accessed (supplementary table S3, Supplementary Material online), and we utilized all biallelic polymorphisms (both SNPs and short indels) for our analyses. θW (Watterson 1975) and π (Tajima 1983) were calculated across each locus, adjusting for different sample sizes and missing data. ρ (Hudson 2001) and FST (Hudson et al. 1992) were estimated for each gene with more than ten polymorphisms and averaged across loci. One hundred and sixty-three of the 244 loci had six or more individuals with two Native American–inferred sequences. To construct the 163-locus data set, we sampled the six individuals with the lowest individual numbers as labeled in supplementary table S2 (Supplementary Material online). In addition, we included six Europeans (Coriell IDs NA11882, NA11994, NA11995, NA12815, NA12891, and NA12892), six East Asians (Coriell IDs NA18526, NA18545, NA18562, NA18566, NA18609, and NA18621), and six West Africans (Coriell IDs NA18502, NA18504, NA18870, NA19153, NA19201, and NA19223) to ensure equal sampling from each continental region.
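For readers unfamiliar with the two diversity estimators, the sketch below computes θW and π for a single locus from the standard definitions. It is a simplified illustration that assumes a fixed sample size with no missing data, so it omits the sample-size and missing-data adjustments mentioned above; function names are hypothetical, and per-site values would be obtained by dividing by the number of sequenced base pairs.

```python
def watterson_theta(num_segregating, n_chromosomes):
    """Watterson's estimator: S divided by the harmonic number a_n,
    where a_n = sum_{i=1}^{n-1} 1/i."""
    a_n = sum(1.0 / i for i in range(1, n_chromosomes))
    return num_segregating / a_n

def nucleotide_diversity(allele_counts, n_chromosomes):
    """Average number of pairwise differences. allele_counts holds, for each
    biallelic site, the number of chromosomes carrying one of the alleles."""
    n = n_chromosomes
    pairs = n * (n - 1) / 2.0
    return sum(c * (n - c) for c in allele_counts) / pairs

# Twelve chromosomes (six diploid individuals), hypothetical allele counts
counts = [1, 1, 3, 6]
print(watterson_theta(len(counts), 12), nucleotide_diversity(counts, 12))
```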

Estimation of Demographic Parameters

We used two different likelihood-based approaches for estimating demographic parameters from the Native American–inferred sequences. The first is a composite likelihood method that has been used before in other contexts (Plagnol and Wall 2006; Wall et al. 2009). We started with a simple demographic model (fig. 2) roughly appropriate for the history of the East Asian and Native American samples: a panmictic ancestral population splits at time T into two daughter populations. One daughter population experiences a 1,000-year long population bottleneck, leading to a b-fold reduction in population size, ending at time tb. Then, at time tg (≤tb), that population experiences exponential growth, leading to a 100-fold increase in population size at the present.

FIG. 2.

Diagram of the demographic model used, showing T, the time when the two populations split; tg, the time of onset of population growth; tb, the time since the end of the population bottleneck; and b, the strength of the bottleneck. Parameter estimates, along with approximate 95% confidence intervals in parentheses, are given to the right of the figure. Method 1 is the composite likelihood method described in Plagnol and Wall (2006), and method 2 is a summary likelihood method described in the Materials and Methods.

To estimate the model parameters, we summarized the data using several summary statistics and then calculated the (composite) likelihood of the summarized data on a grid of parameter values. The composite likelihood was estimated using modifications of the ancestral recombination graph (ARG) simulator ms (Hudson 2002). See Plagnol and Wall (2006) for further details.

Summary statistics were divided into two categories. The first category of summary statistics divided SNPs at a locus into four categories: private SNPs in population 1, private SNPs in population 2, shared SNPs with minor allele frequency (MAF) in the total sample ≤0.1, and shared SNPs with MAF >0.1. We label these summaries s1, s2, s3, and s4, respectively. For each branch of the ARG, all mutations on this branch will belong to a single category, so we can estimate probabilities f1, f2, f3, f4 that a particular SNP will fall into one of the four categories defined above. Our likelihoods here condition on the total number of SNPs s (= s1 + s2 + s3 + s4) at a locus. Conditional on the ARG and s, the distribution of (s1, s2, s3, s4) is multinomial and can be estimated explicitly by averaging over the computed probabilities for each simulated ARG. The second category included Tajima’s (1989) D from each population, Fu and Li’s (1993) D* in population 2, and FST (Hudson et al. 1992) between the two populations. Both D and D* are measures of the frequency spectrum, whereas FST measures the level of divergence between populations. For each parameter combination, we estimated the joint likelihood of these statistics by fitting the data to a multivariate normal distribution. Coalescent simulations were used to estimate the vector of means and the covariance matrix.
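The first category of summaries lends itself to a compact sketch: for each simulated ARG, the category probabilities (f1, f2, f3, f4) define a multinomial distribution for (s1, s2, s3, s4), and the likelihood is the average over ARGs. The code below assumes the per-ARG probabilities have already been computed (for example, from branch lengths); the function names are hypothetical, and this is not the authors' modified version of ms.

```python
import math

def multinomial_pmf(counts, probs):
    """Multinomial probability of the category counts given probabilities."""
    n = sum(counts)
    log_p = math.lgamma(n + 1)
    for c, f in zip(counts, probs):
        if c > 0 and f == 0.0:
            return 0.0  # an observed category that this ARG cannot produce
        log_p -= math.lgamma(c + 1)
        if c > 0:
            log_p += c * math.log(f)
    return math.exp(log_p)

def category_likelihood(counts, per_arg_probs):
    """Likelihood of (s1, s2, s3, s4), conditional on their total s,
    averaged over the probabilities from each simulated ARG."""
    return sum(multinomial_pmf(counts, f) for f in per_arg_probs) / len(per_arg_probs)

# Hypothetical locus with s = 10 SNPs and two simulated ARGs
print(category_likelihood((4, 2, 1, 3),
                          [(0.4, 0.2, 0.1, 0.3), (0.25, 0.25, 0.25, 0.25)]))
```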

Even though these two sets of summary statistics are correlated, we cannot estimate their joint distribution. So, we estimated a composite likelihood approximation by assuming that the two categories of summary statistics are independent. We calculated composite likelihoods separately for each locus and then multiplied them together to obtain the overall (composite) likelihood of the data. We calculated point estimates for each parameter value, as well as approximate 95% confidence intervals, with a log-likelihood cutoff of 2.8 estimated from simulations (results not shown).
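A minimal sketch of how the per-locus pieces might be combined and turned into an interval is given below. It assumes that the two categories contribute additive log-likelihood terms at each locus (the independence approximation) and that the approximate 95% interval consists of grid values within 2.8 log-likelihood units of the maximum; that reading of the cutoff, the function names, and the example numbers are all assumptions.

```python
def composite_log_likelihood(loglik_category1, loglik_category2):
    """Sum per-locus log-likelihoods of both summary categories, treating
    the categories and the loci as independent."""
    return sum(a + b for a, b in zip(loglik_category1, loglik_category2))

def interval_from_grid(grid, logliks, cutoff=2.8):
    """Point estimate and approximate interval for one parameter: grid values
    whose log-likelihood lies within `cutoff` of the maximum."""
    best = max(logliks)
    mle = grid[logliks.index(best)]
    kept = [g for g, ll in zip(grid, logliks) if best - ll <= cutoff]
    return mle, min(kept), max(kept)

# Hypothetical log-likelihood profile over a grid of times (in Kya)
grid = [0.0, 2.5, 5.0, 7.5, 10.0]
profile = [-10.0, -3.0, -1.0, -3.5, -9.0]
print(interval_from_grid(grid, profile))
```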

We also implemented a much quicker summary likelihood approach for estimating demographic parameters. We utilized the same demographic model as before (fig. 2) and used common summary statistics θW (Watterson 1975), D (Tajima 1989), FST (Hudson et al. 1992), and ρ̂ (Hudson 2001) to estimate the four model parameters. Specifically, we ran coalescent simulations (Hudson 2002) and a rejection sampling algorithm to estimate the likelihood of obtaining the observed mean values (across loci) of θW, D, FST, and ρ̂, as a function of the model parameters Θ = {T, b, tb, tg}. We then obtained a composite likelihood by assuming that the summary statistics used are independent of each other.

We assumed an average generation time of 25 years. For each parameter combination Θ = {T, b, tb, tg}, we ran 32,600 coalescent simulations, comprising 200 simulations with the same number of base pairs sequenced and total distance (from one end of the sequence to the other) for each of the 163 actual loci. We considered increments of 2.5 thousand years for T, tb, and tg and increments of 5–10 for b (5 if b ≤ 70, 10 otherwise). θ and ρ per base pair (for each simulation) were drawn from gamma distributions with parameters (8, 14700) and (0.5, 1850), respectively. These distributions, though ad hoc, reproduce the observed means and variances of θW, D, and ρ̂ in the East Asian sample. We then calculated θW, D, FST, and ρ̂ for each simulation, repeatedly subsampled 163 simulated loci and estimated Pr(|sample mean − actual mean| < 0.01 × actual mean | Θ) for each summary. Note that b, tb, and tg depend exclusively on θW, D, and ρ̂, respectively, in the Native American (simulated or real) data. This simplifies some of the calculations.
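The rejection step can be illustrated with a short sketch. It assumes that each resample draws one simulated replicate per locus and that a resample is accepted when its mean lies within 1% of the observed mean; this reading of "repeatedly subsampled", the function names, and the default number of resamples are assumptions rather than details given in the text.

```python
import random

def match_probability(sims_by_locus, observed_mean, n_resamples=1000,
                      tol=0.01, rng=None):
    """Estimate Pr(|sample mean - observed mean| < tol * |observed mean|)
    for one summary statistic under one parameter combination.
    sims_by_locus[i] holds the simulated values of the statistic at locus i."""
    if rng is None:
        rng = random.Random()
    n_loci = len(sims_by_locus)
    hits = 0
    for _ in range(n_resamples):
        # One simulated replicate per locus, then the mean across loci
        mean = sum(rng.choice(values) for values in sims_by_locus) / n_loci
        if abs(mean - observed_mean) < tol * abs(observed_mean):
            hits += 1
    return hits / n_resamples

def composite_likelihood(probabilities):
    """Product of the per-statistic probabilities, treating the summary
    statistics as independent."""
    result = 1.0
    for p in probabilities:
        result *= p
    return result
```

The absolute value around the observed mean matters for statistics such as Tajima's D, whose mean can be negative.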

For individual parameters, we used profile likelihood curves to calculate approximate 95% confidence intervals. Final calculations for the maximum likelihood estimate and confidence intervals were obtained using five times more simulations than described above for particular combinations of Θ.

Results and Discussion

First, we genotyped the 22 samples using the Affymetrix 6.0 platform. We then used this genotype data to estimate the continent of origin along the chromosomes of each genome in our sample. We assumed there were two ancestral populations, corresponding to Europeans and Native Americans, and estimated allele frequencies in the ancestral populations from publicly available genotype data (International HapMap Consortium 2007; Mao et al. 2007). For each marker, we used a composite likelihood approach (see Materials and Methods) to estimate the most likely ancestral configuration (i.e., two European alleles, one European and one Native American allele, or two Native American alleles). This approach runs quickly (several minutes to estimate local ancestry across the whole genome of an admixed individual on a standard desktop computer), and simulations suggest that it is substantially more accurate for estimating local ancestry than two commonly used programs that accept unphased data, Structure (Falush et al. 2003) and LAMP (Sankararaman et al. 2008). If phase is known, the accuracy of our composite likelihood method is slightly worse than that of Hapmix (Price et al. 2009) (e.g., 93.4% for our method vs. 96.1% for Hapmix with δ = 0.2; cf. table 1). Because genotypic phase is generally not experimentally determined, the results across methods are not directly comparable. Structure, LAMP, and the genotype version of our method use unphased genotype data from the ancestral and admixed populations, whereas the Hapmix runs used phased data from the ancestral populations and unphased data from the admixed population, and the haplotype version of our method uses phased data from the admixed population and unphased data from the ancestral populations.

For the remainder of our analyses, we stayed away from using local ancestry programs that require phased data (i.e., Hapmix and our haplotype-based local ancestry estimation) to avoid compounding ancestry estimation error with phasing error. For each of 244 genes sequenced as part of the NIEHS SNPs project, we tabulated the estimated continental ancestry of each diploid sequence, excluding all sequences with evidence of African ancestry or with ambiguous ancestral assignments (see Materials and Methods). We then analyzed subsets of the data consisting of sequences with the same ancestral configuration.

To test the accuracy of our ancestry inference, we compared patterns of genetic variation in European individuals and Mexican American individuals inferred to have two European-derived sequences. The two sets of samples show similar levels of genetic variation (Watterson 1975; Tajima 1983) and linkage disequilibrium (LD) (Hudson 2001) (fig. 3). In addition, there were no systematic differences in allele frequencies (mean FST = 0.001) between the two sets of samples, consistent with observed levels of population structure in different European populations (e.g., Novembre et al. 2008). From these and other observations, we conclude that the European-inferred sequences really were derived from European ancestors within the last several hundred years.

FIG. 3.

Plot of diversity θ (= 4Nμ, where N is the effective population size and μ is the mutation rate per base pair per generation) versus estimated recombination rate ρ (= 4Nr, where r is the recombination rate per base pair per generation) for individuals with different continental ancestries. The two blue diamonds refer to a European sample and a Mexican American sample with two European-derived sequences.

Next, we examined the relative numbers of individuals assigned to each of the three possible ancestral configurations for each gene. If mating were random with respect to genetic ancestry, the relative proportions would be expected to be in HWE. Instead, we observe a significant deficit (16% less than expected) of individuals with mixed continental ancestry (i.e., one European and one Native American allele). This could be a result of assortative mating with a trait that correlates with ancestry estimates, such as physical appearance or socioeconomic status, or a sign of ongoing immigration from a source population with a different average genetic ancestry from the current Latino population in Los Angeles. To explore the two potential explanations further, we estimated local ancestry in 23 pairs of Mexican American parents from HapMap phase 3 trio data. We found a significant correlation (P < 0.05) between the estimated Native American ancestry of the father and the estimated Native American ancestry of the mother (supplementary fig. S1, Supplementary Material online), suggesting that assortative mating is a significant factor in our observed deficit of individuals with mixed continental ancestry.
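The HWE expectation referred to here follows from the usual p², 2pq, q² proportions applied to ancestry rather than alleles. The sketch below computes the expected counts and the relative deficit of mixed-ancestry individuals; the function names are hypothetical and the counts are invented for illustration, not taken from the study.

```python
def hwe_expected(n_ee, n_en, n_nn):
    """Expected counts of (two European, mixed, two Native American)
    configurations under random mating, given the observed counts."""
    total = n_ee + n_en + n_nn
    p = (2 * n_nn + n_en) / (2.0 * total)  # Native American ancestry fraction
    return ((1 - p) ** 2 * total, 2 * p * (1 - p) * total, p ** 2 * total)

def mixed_ancestry_deficit(n_ee, n_en, n_nn):
    """Relative shortfall of mixed-ancestry individuals versus HWE."""
    expected_mixed = hwe_expected(n_ee, n_en, n_nn)[1]
    return 1.0 - n_en / expected_mixed

# Invented counts: 30 European/European, 40 mixed, 30 Native/Native
print(mixed_ancestry_deficit(30, 40, 30))
```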

We then compared levels of genetic variation and LD in the NIEHS SNPs database for ethnic groups defined either by self-identity or our inference method (fig. 3). As with previous studies of human sequence variation (e.g., Voight et al. 2005; Wall et al. 2008), we find that sub-Saharan Africans have substantially more variation and less LD than do non-African populations. Additionally, for non-admixed populations, we observe a trend of decreasing diversity and increasing LD with increasing distance away from Africa, consistent with the serial bottleneck model of recent human evolution (Ramachandran et al. 2005).

To control for any possible biases associated with sample size, we reanalyzed a subset of our data consisting of six (inferred) Native American individuals, six East Asian individuals, six European individuals, and six West African individuals from 163 of the 244 loci. (The remaining loci had fewer than six individuals with both gene copies inferred to be of Native American ancestry.) We observed the same trends as before with increasing LD and decreasing diversity for the European, Asian, and Native American sequences, respectively. Interestingly, all four population samples show comparable numbers of polymorphisms shared across multiple continental regions, and the differences in overall levels of diversity are mostly explained by differences in the number of private alleles in each continental sample (see supplementary table S1, Supplementary Material online).

We then used two different likelihood-based methods on the 163 locus data set to estimate historical demographic parameters for the inferred Native American sequences (fig. 2). Previous archeological and linguistic studies suggest that humans first entered the Americas across the Bering land bridge and then migrated southwards to North and South America (e.g., Greenberg et al. 1986; Goebel 1999). It is likely that there was a significant population bottleneck associated with the initial founding of the Americas, though the timing of this bottleneck is disputed (e.g., Nichols 1990; Nettle 1999; Fiedel 2000; Hey 2005). Our main interest is in using the patterns of genetic variation to estimate the timing and strength of this bottleneck. Both methods estimate that the bottleneck ended roughly 12.5 Kya (tb, fig. 2), roughly consistent with the age (∼14 Kya) of the oldest undisputed New World archaeological site at Monte Verde, Chile (Meltzer 1997; Fiedel 2000). The estimated 95% confidence intervals for tb are 3–16 and 0–36 Kya for the two methods (see fig. 2 and Materials and Methods). The former suggests that an early occupation of the Americas (>30 Kya, cf. Nichols 1990) is unlikely.

In general, the first method (cf., Plagnol and Wall 2006 and Materials and Methods) has tighter confidence intervals and estimates a stronger bottleneck and a more recent split time than the second method does. We speculate that the difficulties in precisely estimating parameter values (in both methods) are due to the small sample sizes from each population or to heterogeneity within the Native American–inferred sequences (i.e., population structure within the Native American ancestors of our Latino samples). The demographic model considered is obviously a simplification of the truth, and additional studies with more Latino samples will be needed to obtain more precise parameter estimates or to address more complex questions, such as the number of different major migrations from North Asia into the Americas or the degree of structure within populations from the Americas.

Supplementary Material

Supplementary tables S1–S3 and supplementary figure S1 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).


Acknowledgments

This work was funded by National Institutes of Health grant 1R01HG004049-01A2 to J.D.W. and P.M.

References

  1. Browning SR, Browning BL. Rapid and accurate haplotype phasing and missing data inference for whole genome association studies using localized haplotype clustering. Am J Hum Genet. 2007;81:1084–1097. doi: 10.1086/521987.
  2. Bryc K, Velez C, Karafet T, Moreno-Estrada A, Reynolds A, Auton A, Hammer M, Bustamante CD, Ostrer H. Genome-wide patterns of population structure and admixture among Hispanic/Latino populations. Proc Natl Acad Sci U S A. 2010;107:8954–8961. doi: 10.1073/pnas.0914618107.
  3. Cann RL, Stoneking M, Wilson AC. Mitochondrial DNA and human evolution. Nature. 1987;325:31–36. doi: 10.1038/325031a0.
  4. Choudhry S, Coyle NE, Tang H, et al. (26 co-authors) Population stratification confounds genetic association studies among Latinos. Hum Genet. 2006;118:652–654. doi: 10.1007/s00439-005-0071-3.
  5. Conrad DF, Jakobsson M, Coop G, Wen X, Wall JD, Rosenberg NA, Pritchard JK. A worldwide survey of haplotype variation and linkage disequilibrium in the human genome. Nat Genet. 2006;38:1251–1260. doi: 10.1038/ng1911.
  6. Crawford DC, Bhangale T, Li N, Hellenthal G, Rieder MJ, Nickerson DA, Stephens M. Evidence for substantial fine-scale variation in recombination rates across the human genome. Nat Genet. 2004;36:700–706. doi: 10.1038/ng1376.
  7. ENCODE Project Consortium. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature. 2007;447:799–816. doi: 10.1038/nature05874.
  8. Falush D, Stephens M, Pritchard JK. Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics. 2003;164:1567–1587. doi: 10.1093/genetics/164.4.1567.
  9. Fiedel SJ. The peopling of the New World: present evidence, new theories and future directions. J Archaeol Res. 2000;8:39–103.
  10. Fu YX, Li WH. Statistical tests of neutrality of mutations. Genetics. 1993;133:693–709. doi: 10.1093/genetics/133.3.693.
  11. Goebel T. Pleistocene human colonization of Siberia and peopling of the Americas: an ecological approach. Evol Anthropol. 1999;8:208–227.
  12. Greenberg JH, Turner CG, Zegura SL. The settlement of the Americas: a comparison of the linguistic, dental and genetic evidence. Curr Anthropol. 1986;27:477–497.
  13. Harding RM, Fullerton SM, Griffiths RC, Bond J, Cox MJ, Schneider JA, Moulin DS, Clegg JB. Archaic African and Asian lineages in the genetic ancestry of modern humans. Am J Hum Genet. 1997;60:772–789.
  14. Hey J. On the number of New World founders: a population genetic portrait of the peopling of the Americas. PLoS Biol. 2005;3:e193. doi: 10.1371/journal.pbio.0030193.
  15. Hudson RR. Two-locus sampling distributions and their application. Genetics. 2001;159:1805–1817. doi: 10.1093/genetics/159.4.1805.
  16. Hudson RR. Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics. 2002;18:337–338. doi: 10.1093/bioinformatics/18.2.337.
  17. Hudson RR, Slatkin M, Maddison WP. Estimation of levels of gene flow from DNA sequence data. Genetics. 1992;132:583–589. doi: 10.1093/genetics/132.2.583.
  18. International HapMap Consortium. A second generation human haplotype map of over 3.1 million SNPs. Nature. 2007;449:851–861. doi: 10.1038/nature06258.
  19. Jakobsson M, Scholz SW, Scheet P, et al. (24 co-authors) Genotype, haplotype and copy-number variation in worldwide human populations. Nature. 2008;451:998–1003. doi: 10.1038/nature06742.
  20. Kaessmann H, Heissig F, von Haeseler A, Pääbo S. DNA sequence variation in a non-coding region of low recombination on the human X chromosome. Nat Genet. 1999;22:78–81. doi: 10.1038/8785.
  21. Li JZ, Absher DM, Tang H, et al. (11 co-authors) Worldwide human relationships inferred from genome-wide patterns of variation. Science. 2008;319:1100–1104. doi: 10.1126/science.1153717.
  22. Livingston RJ, von Niederhausern A, Jegga AG, Crawford DC, Carlson CS, Rieder MJ, Gowrisankar S, Aronow BJ, Weiss RB, Nickerson DA. Pattern of sequence variation across 213 environmental response genes. Genome Res. 2004;14:1821–1831. doi: 10.1101/gr.2730004.
  23. Mao X, Bigham AW, Mei R, et al. (12 co-authors) A genomewide admixture mapping panel for Hispanic/Latino populations. Am J Hum Genet. 2007;80:1171–1178. doi: 10.1086/518564.
  24. Meltzer DJ. Monte Verde and the Pleistocene peopling of the Americas. Science. 1997;276:754–755.
  25. Nettle D. Linguistic diversity of the Americas can be reconciled with a recent colonization. Proc Natl Acad Sci U S A. 1999;96:3325–3329. doi: 10.1073/pnas.96.6.3325.
  26. Nichols J. Linguistic diversity and the first settlement of the New World. Language. 1990;66:475–521.
  27. Novembre J, Johnson T, Bryc K, et al. (12 co-authors) Genes mirror geography within Europe. Nature. 2008;456:98–101. doi: 10.1038/nature07331.
  28. Plagnol V, Wall JD. Possible ancestral structure in human populations. PLoS Genet. 2006;2:e105. doi: 10.1371/journal.pgen.0020105.
  29. Price AL, Tandon A, Patterson N, Barnes KC, Rafaels N, Ruczinski I, Beaty TH, Mathias R, Reich D, Myers S. Sensitive detection of chromosomal segments of distinct ancestry in admixed populations. PLoS Genet. 2009;5:e1000519. doi: 10.1371/journal.pgen.1000519.
  30. Ramachandran S, Deshpande O, Roseman CC, Rosenberg NA, Feldman MW, Cavalli-Sforza LL. Support from the relationship of genetic and geographic distance in human populations for a serial founder effect originating in Africa. Proc Natl Acad Sci U S A. 2005;102:15942–15947. doi: 10.1073/pnas.0507611102.
  31. Reich DE, Cargill M, Bolk S, et al. (11 co-authors) Linkage disequilibrium in the human genome. Nature. 2001;411:199–204. doi: 10.1038/35075590.
  32. Rosenberg NA, Pritchard JK, Weber JL, Cann HM, Kidd KK, Zhivotovsky LA, Feldman MW. Genetic structure of human populations. Science. 2002;298:2381–2385. doi: 10.1126/science.1078311.
  33. Salari K, Choudhry S, Tang H, et al. (23 co-authors) Genetic admixture and asthma-related phenotypes in Mexican American and Puerto Rican asthmatics. Genet Epidemiol. 2005;29:76–86. doi: 10.1002/gepi.20079.
  34. Sankararaman S, Sridhar S, Kimmel G, Halperin E. Estimating local ancestry in admixed populations. Am J Hum Genet. 2008;82:290–303. doi: 10.1016/j.ajhg.2007.09.022.
  35. Tajima F. Evolutionary relationship of DNA sequences in finite populations. Genetics. 1983;105:437–460. doi: 10.1093/genetics/105.2.437.
  36. Tajima F. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics. 1989;123:585–595. doi: 10.1093/genetics/123.3.585.
  37. Tang H, Coram M, Wang P, Zhu X, Risch N. Reconstructing genetic ancestry blocks in admixed individuals. Am J Hum Genet. 2006;79:1–12. doi: 10.1086/504302.
  38. Vigilant L, Stoneking M, Harpending H, Hawkes K, Wilson AC. African populations and the evolution of human mitochondrial DNA. Science. 1991;253:1503–1507. doi: 10.1126/science.1840702.
  39. Voight BF, Adams AM, Frisse LA, Qian Y, Hudson RR, Di Rienzo A. Interrogating multiple aspects of variation in a full resequencing data set to infer human population size changes. Proc Natl Acad Sci U S A. 2005;102:18508–18513. doi: 10.1073/pnas.0507325102.
  40. Wall JD, Cox MP, Mendez FL, Woerner A, Severson T, Hammer MF. A novel DNA sequence database for analyzing human demographic history. Genome Res. 2008;18:1354–1361. doi: 10.1101/gr.075630.107.
  41. Wall JD, Lohmueller KE, Plagnol V. Detecting ancient admixture and estimating demographic parameters in multiple human populations. Mol Biol Evol. 2009;26:1823–1827. doi: 10.1093/molbev/msp096.
  42. Wang S, Lewis CM, Jakobsson M, et al. Genetic variation and population structure in Native Americans. PLoS Genet. 2007;3:e185. doi: 10.1371/journal.pgen.0030185.
  43. Watterson GA. On the number of segregating sites in genetical models without recombination. Theor Popul Biol. 1975;7:256–276. doi: 10.1016/0040-5809(75)90020-9.

Articles from Molecular Biology and Evolution are provided here courtesy of Oxford University Press