American Journal of Human Genetics. 2014 Oct 2;95(4):394–407. doi: 10.1016/j.ajhg.2014.09.002

Widespread Signals of Convergent Adaptation to High Altitude in Asia and America

Matthieu Foll 1,2,5, Oscar E Gaggiotti 3,4, Josephine T Daub 1,2, Alexandra Vatsiou 4, Laurent Excoffier 1,2
PMCID: PMC4185124  PMID: 25262650

Abstract

Living at high altitude is one of the most difficult challenges that humans had to cope with during their evolution. Whereas several genomic studies have revealed some of the genetic bases of adaptations in Tibetan, Andean, and Ethiopian populations, relatively little evidence of convergent evolution to altitude in different continents has accumulated. This lack of evidence can be due to truly different evolutionary responses, but it can also be due to the low power of former studies that have mainly focused on populations from a single geographical region or performed separate analyses on multiple pairs of populations to avoid problems linked to shared histories between some populations. We introduce here a hierarchical Bayesian method to detect local adaptation that can deal with complex demographic histories. Our method can identify selection occurring at different scales, as well as convergent adaptation in different regions. We apply our approach to the analysis of a large SNP data set from low- and high-altitude human populations from America and Asia. The simultaneous analysis of these two geographic areas allows us to identify several candidate genome regions for altitudinal selection, and we show that convergent evolution among continents has been quite common. In addition to identifying several genes and biological processes involved in high-altitude adaptation, we identify two specific biological pathways that could have evolved in both continents to counter toxic effects induced by hypoxia.

Introduction

Distinguishing between neutral and selected molecular variation has been a long-standing interest of population geneticists. This interest was fostered by the publication of Kimura’s seminal paper1 on the neutral theory of molecular evolution. Although the controversy rests mainly on the relative importance of genetic drift and selection as explanatory processes for the observed biodiversity patterns, another important question concerns the prevalent form of natural selection. Kimura1 argued that the main selective mechanism was negative selection against deleterious mutations. However, an alternative point of view emphasizes the prevalence of positive selection, the mechanism that can lead to local adaptation and eventually to speciation.2,3

A powerful approach to uncover positive selection is the study of mechanisms underlying convergent evolution. When different populations or evolutionary lineages are exposed to similar environments, positive selection should indeed lead to similar phenotypic features. Convergent evolution can be achieved through similar genetic changes (sometimes called “parallel evolution”) at different levels: the same mutation appearing independently in different populations, the same existing mutation being recruited by selection in different populations, or the involvement of different mutations in the same genes or the same biological pathways in separate populations.4 However, existing statistical genetic methods are not well adapted to the study of convergent evolution when data sets consist of multiple contrasts of populations living in different environments.5 The current strategy is to carry out independent genome scans in each geographic region and to look for overlaps between loci or pathways that are identified as outliers in different regions.6 Furthermore, studies are often split into a series of pairwise analyses that consider sets of populations inhabiting different environments. Whereas this strategy has the advantage of not requiring the modeling of complex demographic histories,7,8 it often ignores the correlation in gene frequencies between geographical regions when correcting for multiple tests.9 As a consequence, current approaches are restricted to the comparison of lists of candidate SNPs or genomic regions obtained from multiple pairwise comparisons. This suboptimal approach might also result in a global loss of power as compared to a single global analysis and thus to a possible underestimation of the genome-wide prevalence of convergent adaptation.

One particularly important example where this type of problem arises is in the study of local adaptation to high altitude in humans. Human populations living at high altitude need to cope with one of the most stressful environments in the world, to which they are likely to have developed specific adaptations. The harsh conditions associated with high altitude include not only low oxygen partial pressure, referred to as high-altitude hypoxia, but also other factors like low temperatures, arid climate, high solar radiation, and low soil quality. While some of these stresses can be buffered by behavioral and cultural adjustments, important physiological changes have been identified in populations living at high altitude (see below). Recently, genomic advances have unveiled the first genetic bases of these physiological changes in Tibetan, Andean, and Ethiopian populations.10–19 The study of convergent or independent adaptation to altitude is of primary interest,11,19,20 but this problem has been superficially addressed so far, because most studies focused on populations from a single geographical region.10,13,14,16–19

Several candidate genes for adaptation to altitude have nevertheless been clearly identified,21,22 the most prominent ones being involved in the hypoxia inducible factor (HIF) pathway, which plays a major role in response to hypoxia.23 In Andeans, VEGFA (vascular endothelial growth factor A, MIM 192240), PRKAA1 (protein kinase, AMP-activated, alpha 1 catalytic subunit, MIM 602739), and NOS2A (nitric oxide synthase 2A, MIM 163730) are the best-supported candidates, as well as EGLN1 (egl-9 family hypoxia-inducible factor 1, MIM 606425), a downregulator of some HIF targets.12,24 In Tibetans,10,11,13,14,16,25 the HIF pathway gene EPAS1 (endothelial PAS domain protein 1, MIM 603349) and EGLN1 have been repeatedly identified.22 Recently, three similar studies that focused on Ethiopian highlanders17–19 suggested the involvement of HIF genes other than those identified in Tibetans and Andeans, with BHLHE41 (MIM 606200), THRB (MIM 190160), RORA (MIM 600825), and ARNT2 (MIM 606036) being the most prominent candidates.

However, there is little overlap in the list of significant genes in these three regions,18,19 with perhaps the exception of alcohol dehydrogenase genes identified in two out of the three analyses. Another exception is EGLN1: a comparative analysis of Tibetan and Andean populations12 concluded that “the Tibetan and Andean patterns of genetic adaptation are largely distinct from one another,” identifying a single gene (EGLN1) under convergent evolution, but with both populations exhibiting a distinct dominant haplotype around this gene. This limited convergence does not contradict available physiological data, as Tibetans exhibit some phenotypic traits that are not found in Andeans.26 For example, Tibetan populations show lower hemoglobin concentration and oxygen saturation than Andean populations at the same altitude.27 Andeans and Tibetans also differ in their hypoxic ventilatory response, birth weight, and pulmonary hypertension.28 Finally, EGLN1 has also been identified as a candidate gene in Kubachians, a high altitude (∼2,000 m above sea level) Daghestani population from the Caucasus,15 as well as in Indians.29

Nevertheless, it is still possible that the small number of genes under convergent evolution is due to a lack of power of genome scan methods done on separate pairs of populations. In order to overcome these difficulties, we introduce here a Bayesian genome scan method that (1) extends the F-model30,31 to the case of a hierarchically subdivided population consisting of several migrant pools, and (2) explicitly includes a convergent selection model. We apply this approach to find genes, genomic regions, and biological pathways that have responded to convergent selection in the Himalayas and in the Andes.

Material and methods

Hierarchical Bayesian Model

One of the most widely used statistics for the comparison of allele frequencies among populations is FST,32,33 and most studies cited in the introduction used it to compare low- and high-altitude populations within a given geographical region (Tibet, the Andes, or Ethiopia). Several methods have been proposed to detect loci under selection from FST, and one of the most powerful approaches is based on the F-model (reviewed by Gaggiotti and Foll34). However, this approach assumes a simple island model where populations exchange individuals through a unique pool of migrants. This assumption is strongly violated when dealing with replicated pairs of populations across different regions, which can lead to a high rate of false positives.35

In order to relax the rather unrealistic assumption of a unique and common pool of migrants for all sampled populations, we extended the genome scan method first introduced by Beaumont and Balding30 and later improved by Foll and Gaggiotti.31 More precisely, we posit that our data come from $G$ groups (migrant pools or geographic regions), each group $g$ containing $J_g$ populations. We then describe the genetic structure by an F-model that assumes that allele frequencies at locus $i$ in population $j$ from group $g$, $p_{ijg} = \{p_{ijg1}, p_{ijg2}, \ldots, p_{ijgK_i}\}$ (where $K_i$ is the number of distinct alleles at locus $i$), follow a Dirichlet distribution parameterized with group-specific allele frequencies $p_{ig} = \{p_{ig1}, p_{ig2}, \ldots, p_{igK_i}\}$ and with $F_{SC}^{ijg}$ coefficients measuring the extent of genetic differentiation of population $j$ relative to group $g$ at locus $i$. Similarly, at a higher group level, we consider an additional F-model where allele frequencies $p_{ig}$ follow a Dirichlet distribution parameterized with meta-population allele frequencies $p_i = \{p_{i1}, p_{i2}, \ldots, p_{iK_i}\}$ and with $F_{CT}^{ig}$ coefficients measuring the extent of genetic differentiation of group $g$ relative to the meta-population as a whole at locus $i$. Figure S1 (available online) shows the hierarchical structure of our model in the case of three groups ($G = 3$) and four populations per group ($J_1 = J_2 = J_3 = 4$) and Figure S2 shows the corresponding nonhierarchical F-model for the same number of populations. All the parameters of the hierarchical model can be estimated by further assuming that alleles in each population $j$ are sampled from a multinomial distribution.36 These assumptions lead to an expression for the probability of observed allele counts $a_{ijg} = \{a_{ijg1}, a_{ijg2}, \ldots, a_{ijgK_i}\}$:

$$\Pr(a_{ijg} \mid p_{ijg}, p_{ig}, p_i, \theta_{ijg}, \phi_{ig}) = \Pr(a_{ijg} \mid p_{ijg}) \, \Pr(p_{ijg} \mid p_{ig}, \theta_{ijg}) \, \Pr(p_{ig} \mid p_i, \phi_{ig})$$

where $\Pr(a_{ijg} \mid p_{ijg})$ is the multinomial likelihood, $\Pr(p_{ijg} \mid p_{ig}, \theta_{ijg})$ and $\Pr(p_{ig} \mid p_i, \phi_{ig})$ are Dirichlet prior distributions, $\theta_{ijg} = 1/F_{SC}^{ijg} - 1$, and $\phi_{ig} = 1/F_{CT}^{ig} - 1$. This expression can be simplified by integrating over $p_{ijg}$ so as to obtain:

$$\Pr(a_{ijg} \mid p_{ig}, p_i, \theta_{ijg}, \phi_{ig}) = \Pr(a_{ijg} \mid p_{ig}, \theta_{ijg}) \, \Pr(p_{ig} \mid p_i, \phi_{ig})$$

where $\Pr(a_{ijg} \mid p_{ig}, \theta_{ijg})$ is the multinomial-Dirichlet distribution.34 The likelihood is obtained by multiplying across loci, regions, and populations:

$$L(p_{ig}, p_i, \theta_{ijg}, \phi_{ig}) = \prod_{i=1}^{I} \prod_{g=1}^{G} \prod_{j=1}^{J_g} \Pr(a_{ijg} \mid p_{ig}, p_i, \theta_{ijg}, \phi_{ig}).$$
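As an illustration of the population-level factor of this likelihood, the following minimal Python sketch evaluates the log of the multinomial-Dirichlet probability of the allele counts in one population, using θ = 1/F − 1 as defined above. The function names are hypothetical and are not taken from the authors' software, and the group-level Dirichlet term Pr(pig | pi, ϕig) is deliberately left out.

```python
from math import lgamma


def theta_from_f(f):
    """Concentration parameter theta = 1/F - 1 for a coefficient 0 < F < 1."""
    if not 0.0 < f < 1.0:
        raise ValueError("F must lie strictly between 0 and 1")
    return 1.0 / f - 1.0


def log_multinomial_dirichlet(counts, freqs, theta):
    """Log probability of allele counts given group frequencies and theta.

    counts: observed allele counts in one population (one entry per allele)
    freqs:  group allele frequencies (strictly positive, summing to 1)
    """
    n = sum(counts)
    # Multinomial coefficient n! / prod(a_k!)
    logp = lgamma(n + 1) - sum(lgamma(a + 1) for a in counts)
    # Gamma-function ratios left after integrating out the population frequencies
    logp += lgamma(theta) - lgamma(n + theta)
    for a, p in zip(counts, freqs):
        logp += lgamma(a + theta * p) - lgamma(theta * p)
    return logp


def log_likelihood_locus(counts_by_pop, freqs, f_sc_by_pop):
    """Sum the log probabilities over the populations of one group at one locus."""
    return sum(
        log_multinomial_dirichlet(c, freqs, theta_from_f(f))
        for c, f in zip(counts_by_pop, f_sc_by_pop)
    )
```

Working on the log scale with `lgamma` avoids overflow for realistic sample sizes; summing such terms over loci, groups, and populations gives the log of the product written above, up to the group-level term.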

Using this model, we incorporate potential deviation from the genome wide F-statistics at each locus as in Beaumont and Balding.30 The genetic differentiation within each group g is:

$$\log\left(\frac{F_{SC}^{ijg}}{1 - F_{SC}^{ijg}}\right) = \alpha_{ig} + \beta_{jg}$$ (Equation 1)

where $\alpha_{ig}$ is a locus-specific component of $F_{SC}^{ijg}$ shared by all populations in group $g$, and $\beta_{jg}$ is a population-specific component shared by all loci. Similarly, we decompose the genetic differentiation at the group level under a logistic model as:

$$\log\left(\frac{F_{CT}^{ig}}{1 - F_{CT}^{ig}}\right) = A_i + B_g$$ (Equation 2)

where $A_i$ is a locus-specific component of $F_{CT}^{ig}$ shared by all groups in the meta-population, and $B_g$ is a group-specific component shared by all loci.
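To make the logistic decomposition concrete, the short Python sketch below inverts Equations 1 and 2 to recover an F coefficient from its two additive effects, and shows that the corresponding Dirichlet concentration 1/F − 1 reduces to the exponential of minus the summed effects. The function names are illustrative only.

```python
from math import exp, log


def f_from_effects(locus_effect, shared_effect):
    """Invert log(F / (1 - F)) = locus_effect + shared_effect."""
    return 1.0 / (1.0 + exp(-(locus_effect + shared_effect)))


def theta_from_effects(locus_effect, shared_effect):
    """theta = 1/F - 1 simplifies to exp(-(locus_effect + shared_effect))."""
    return exp(-(locus_effect + shared_effect))


# Shared effect that gives F = 0.02 at a neutral locus (locus effect of 0)
beta = log(0.02 / (1.0 - 0.02))
f_neutral = f_from_effects(0.0, beta)
# The same population at a locus with a positive locus-specific effect of 3
f_selected = f_from_effects(3.0, beta)
```

A positive locus-specific effect raises F above its genome-wide value, which is how the model encodes a locus that is more differentiated than expected under neutrality.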

By doing this, our model also eliminates the ambiguity of having a single $\alpha_i$ parameter for more than two populations, because we now have (1) different selection parameters in each geographic region ($\alpha_{ig}$ are group specific) and (2) separate parameters sensitive to adaptation among regions at the larger scale ($A_i$). We use the likelihood function and the logistic decomposition to derive the full Bayesian posterior:

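$$\Pr(p_{ig}, p_i, \alpha_{ig}, \beta_{jg}, A_i, B_g \mid \mathbf{a}) \propto L(p_{ig}, p_i, \theta_{ijg}, \phi_{ig}) \, \Pr(p_{ig} \mid p_i, A_i, B_g) \, \Pr(p_i) \, \Pr(\alpha_{ig}) \, \Pr(\beta_{jg}) \, \Pr(A_i) \, \Pr(B_g)$$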

where the prior for $p_i$ is a noninformative Dirichlet distribution, the priors for $\alpha_{ig}$ and $A_i$ are Gaussian with mean 0 and variance 1, and the priors for $\beta_{jg}$ and $B_g$ are Gaussian with mean −1 and variance 1. Note that priors on $\beta_{jg}$ and $B_g$ have practically no influence on the posteriors because these parameters use the huge amount of information coming from all loci.

Parameter Estimation

We extend the Reversible Jump Markov Chain Monte Carlo (RJ-MCMC) approach proposed by Foll and Gaggiotti31 to identify selection both within groups and among groups. For each locus and in each group separately, we consider a neutral model where $\alpha_{ig} = 0$ and a model with selection where the locus-specific effect $\alpha_{ig} \neq 0$. Similarly, we consider two models at the group level for each locus where $A_i = 0$ for the neutral model and $A_i \neq 0$ for the model with selection. In order to tailor our approach to study convergent adaptation, we also consider the case where different groups share a unique locus-specific component $\alpha_i$ (see Figure 1 for an example of such a model with two groups of two populations). At each iteration of the MCMC algorithm, we update $A_i$ and $\alpha_{ig}$ in a randomly chosen group $g$ for all loci. As described in Foll and Gaggiotti,31 we propose to remove $\alpha_{ig}$ from the model if it is currently present or to add it if it is not, and we do the same for $A_i$. We also add a specific Reversible Jump proposal for convergent adaptation: if all groups but one are currently in the selection model ($\alpha_{ig} \neq 0$ for all $g$ but one), we propose with a probability 0.5 to move to the convergent evolution model (where we replace all $\alpha_{ig}$ by a single selection parameter $\alpha_i$ shared by all groups), and with a probability 0.5 we perform a standard jump as described above.
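The choice between the two kinds of jump described above can be sketched as follows. This Python fragment covers only the selection of the proposal type for one locus; it does not compute acceptance ratios or handle the move back out of the convergent model, neither of which is specified in this section. The function name and return labels are hypothetical.

```python
import random


def choose_move(alpha_in_model, rng=random):
    """Pick the type of reversible-jump proposal for one locus.

    alpha_in_model: one boolean per group, True when that group's
    locus-specific effect is currently included (selection model).
    rng: any object with a random() method returning a float in [0, 1).
    """
    n_groups = len(alpha_in_model)
    n_selected = sum(1 for included in alpha_in_model if included)
    # Convergent jump is only considered when all groups but one are under selection
    if n_selected == n_groups - 1 and rng.random() < 0.5:
        return "propose_convergent"
    # Otherwise add or remove the effect in a randomly chosen group
    return "standard_jump"
```

Passing the random source as an argument keeps the sketch easy to test with a fixed value in place of a real generator.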

Figure 1.


Hierarchical F-Model for the High-Altitude Data Analyzed

Directed acyclic graph describing the Bayesian formulation of the hierarchical F-model at a given locus i. Square nodes represent data and circles represent model parameters to be estimated. Dashed circles represent population allele frequencies, which are analytically integrated using a Dirichlet-multinomial distribution (see Material and Methods description). Lines between the nodes represent direct stochastic relationships within the model. With the exception of Figure 4, we use the same color codes in all figures, with blue for Asia, red for America, and yellow for convergent adaptation.

Genomic Data Set

In order to improve our understanding of the genetic bases of adaptation to altitude, we have applied our hierarchical Bayesian method to the data set from Bigham et al.12 This data set consists of 906,600 SNPs genotyped in four populations using the Affymetrix Genome-Wide Human SNP Array 6.0 platform (see Web Resources). These four populations consist of two populations living at high altitude in the Andes (49 individuals) and in Tibet (49 individuals), as well as two related lowland populations from Central America (39 Mesoamericans) and East Asia (90 individuals from the International HapMap Project37). Thus, we compared four alternative models for each locus at the population level: (1) a neutral model ($\alpha_{i1} = \alpha_{i2} = 0$), (2) a model with selection acting only in Tibetans ($\alpha_{i2} = 0$), (3) a model with selection acting only in Andeans ($\alpha_{i1} = 0$), and (4) a convergent adaptation model with selection acting in both Tibetans and Andeans ($\alpha_{i1} = \alpha_{i2} = \alpha_i$). We estimate the posterior probability that a locus is under selection by summing up the posterior probabilities of the three nonneutral models (2, 3, and 4) and we control for False Discovery Rate (FDR) by calculating associated q values,38–40 which are Bayesian analogs of p values that take multiple testing into account. For a given SNP, a q value corresponds to the expected FDR if its posterior probability is used as a significance threshold. We do not pay any particular attention to the $A_i$ parameter here, as it can be interpreted as a potential adaptation at the continental level in Asians and Native Americans, which is not directly relevant in the context of adaptation to high altitude (but see Discussion).
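The two quantities defined in this paragraph can be computed as in the following Python sketch. It assumes that the expected FDR at a given threshold is the average posterior probability of neutrality among all SNPs whose posterior probability of selection is at least as high as the threshold; this is one common way to obtain Bayesian q values and is stated here as an assumption rather than as the exact procedure used. Function names are illustrative.

```python
def posterior_selection(p_tibet_only, p_andes_only, p_convergent):
    """Posterior probability of selection: sum over the three nonneutral models."""
    return p_tibet_only + p_andes_only + p_convergent


def q_values(post_probs):
    """Expected FDR when each SNP's posterior probability is used as threshold."""
    order = sorted(range(len(post_probs)), key=lambda i: post_probs[i], reverse=True)
    q = [0.0] * len(post_probs)
    running_error = 0.0
    for rank, idx in enumerate(order, start=1):
        # 1 - posterior is the probability that this SNP is a false discovery
        running_error += 1.0 - post_probs[idx]
        q[idx] = running_error / rank
    # SNPs with tied posteriors share the threshold, so they share the q value
    for k in range(len(order) - 2, -1, -1):
        if post_probs[order[k]] == post_probs[order[k + 1]]:
            q[order[k]] = q[order[k + 1]]
    return q
```

A SNP is then declared significant at the 1% FDR level when its q value is below 0.01.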

We excluded SNPs with a global minor allele frequency below 5% to avoid potential biases due to uninformative polymorphisms.41 This left us with 632,344 SNPs that were analyzed using the hierarchical F-model described above. We identified genomic regions potentially involved in high-altitude adaptation by using a sliding-window approach. We considered windows of 500 kb, with a shifting increment of 25 kb at each step. The average number of SNPs per window over the whole genome was 121.4 (sd = 44.6), after discarding any window containing fewer than 50 SNPs. We considered a window as a candidate target for selection if the 5% quantile of the q values in the window was lower than 0.01, and we merged overlapping significant windows into larger regions.
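The window scan described above can be sketched for a single chromosome as follows. The window size, step, minimum SNP count, and thresholds are those given in the text; the quantile definition (linear interpolation between order statistics) and the function names are assumptions, since the text does not specify them.

```python
from bisect import bisect_left


def quantile(values, prob):
    """Quantile by linear interpolation between order statistics (assumed)."""
    xs = sorted(values)
    pos = prob * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def candidate_regions(positions, qvals, chrom_length, window=500_000,
                      step=25_000, min_snps=50, prob=0.05, threshold=0.01):
    """Return merged candidate regions as (start, end) tuples in base pairs."""
    pairs = sorted(zip(positions, qvals))
    pos_sorted = [p for p, _ in pairs]
    q_sorted = [q for _, q in pairs]
    regions = []
    start = 0
    while start < chrom_length:
        end = start + window
        # SNPs with start <= position < end
        in_window = q_sorted[bisect_left(pos_sorted, start):bisect_left(pos_sorted, end)]
        if len(in_window) >= min_snps and quantile(in_window, prob) < threshold:
            if regions and start < regions[-1][1]:
                regions[-1][1] = end  # overlaps the previous significant window
            else:
                regions.append([start, end])
        start += step
    return [tuple(r) for r in regions]
```

Because consecutive windows overlap by 475 kb, a single strong signal typically makes several adjacent windows significant, which is why merging is needed to report one region per signal.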

Detecting Polygenic Convergent Adaptation

We first used SNPs identified as being under convergent adaptation to perform classical enrichment tests for pathways using Panther (see Web Resources)42 and Gene Ontology (GO)43 using String 9.1 (see Web Resources).44 More specifically, we extracted the list of 302 genes within 10 kb of all SNPs assigned to the convergent adaptation model and showing a q value below 10%, to serve as input for these tests.

These two approaches have limited power to detect selection acting on polygenic traits, for which adaptive events might have arisen from standing variation rather than from new mutations.3,25 In order to detect polygenic adaptation, we used a recent gene set enrichment approach,45 which tests whether the distribution of a statistic computed across genes of a given gene set is significantly different from the rest of the genome. As opposed to the classical enrichment tests, this method does not rely on an arbitrary threshold to define the top outliers and it uses all genes that include at least one tested SNP. In short, we tested more than 1,300 gene sets listed in the Biosystems database46 for a significant shift in their distribution of selection scores relative to the baseline genome-wide distribution. In our case, the selection score of each SNP is its associated q value of convergent selection. As previously done,45 we calculated the sum of gene scores for each gene set and compared it to a null distribution of random sets (N = 500,000) to infer its significance (see “Gene set enrichment analysis method” section in the Appendix). In order to avoid any redundancy between gene sets, we iteratively removed genes belonging to the most significant gene sets from the less significant gene sets before testing them again in a process called “pruning.” This process leads to a list of gene sets whose significance is obtained by testing completely nonoverlapping lists of genes. See the Appendix for a more detailed description of the method.
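The core of such a test, comparing the summed gene scores of a set against random sets of the same size, can be sketched as below. This is a simplified illustration, not the method of the Appendix: it assumes each gene already has a single score oriented so that larger values indicate stronger evidence of convergent selection, and all names are hypothetical.

```python
import random


def set_enrichment_p(gene_scores, gene_set, n_random=10_000, seed=0):
    """Empirical p value of the summed score of a gene set against random sets."""
    rng = random.Random(seed)
    members = [g for g in gene_set if g in gene_scores]
    observed = sum(gene_scores[g] for g in members)
    all_genes = list(gene_scores)
    hits = 0
    for _ in range(n_random):
        sample = rng.sample(all_genes, len(members))
        if sum(gene_scores[g] for g in sample) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive
    return (hits + 1) / (n_random + 1)


def prune(gene_sets, top_set_name):
    """Remove the genes of the most significant set from every other set."""
    top = set(gene_sets[top_set_name])
    return {
        name: list(genes) if name == top_set_name
        else [g for g in genes if g not in top]
        for name, genes in gene_sets.items()
    }
```

After each call to `prune`, the remaining sets would be tested again, so that the final significant sets share no genes.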

Independent SNP Simulations

In order to evaluate the performance of our hierarchical method, we simulated data with features similar to the genomic data set analyzed here under our hierarchical F-model. Our simulated scenario thus includes two groups of two populations made of 50 diploids each, with $F_{SC} = 0.02$ for all four populations and $F_{CT} = 0.08$ for both groups. Note that these F-statistics correspond to those measured on the genomic data set we have analyzed here. In each group, a fraction of loci are under selective pressure in one of the two populations only. We simulated a total of 100,000 independent SNPs among which (1) 2,500 are under weak convergent evolution with $\alpha_i = 3$, (2) 2,500 are under stronger convergent evolution with $\alpha_i = 5$, (3) 2,500 are under weak selection in the first group with $\alpha_{i1} = 3$ and neutral ($\alpha_{i2} = 0$) in the second group, (4) 2,500 are under stronger selection in the first group with $\alpha_{i1} = 5$ and neutral ($\alpha_{i2} = 0$) in the second group, and (5) 90,000 remaining SNPs that are completely neutral ($\alpha_{i1} = \alpha_{i2} = 0$). As in the real data, we conditioned the SNPs to have a global minor allele frequency above 5%. We analyzed this simulated data set by using three different approaches: (1) the hierarchical F-model introduced above, (2) two separate pairwise analyses (one for each group) containing two populations using the original F-model implemented in BayeScan31 (see Web Resources), and (3) a single analysis containing the four populations using the original F-model implemented in BayeScan31 ignoring the hierarchical structure of the populations. In our hierarchical model, the best selection model for each SNP was identified as described above using a q value < 0.01. When analyzing data in separate pairs of populations, we considered a SNP to be under convergent adaptation when it had a q value < 0.01 in the two regions.
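A minimal Python sketch of how one biallelic SNP could be drawn under this hierarchical F-model is given below. Several choices are assumptions made for illustration and are not stated in the text: the meta-population frequency is drawn uniformly, the locus-specific effect is applied to the second population of each group only, and frequencies are clamped away from 0 and 1 for numerical safety.

```python
import random
from math import exp, log


def beta_draw(rng, mean, f, eps=1e-9):
    """Biallelic Dirichlet (beta) draw with mean `mean` and theta = 1/F - 1."""
    theta = 1.0 / f - 1.0
    p = rng.betavariate(theta * mean, theta * (1.0 - mean))
    return min(max(p, eps), 1.0 - eps)


def simulate_snp(rng, alphas=(0.0, 0.0), f_sc=0.02, f_ct=0.08,
                 n_chrom=100, min_maf=0.05):
    """Return allele counts [[low, high], [low, high]] for two groups.

    alphas: locus-specific effect in each group (0 means neutral).
    n_chrom: sampled chromosomes per population (50 diploids -> 100).
    """
    beta = log(f_sc / (1.0 - f_sc))  # shared effect giving the neutral F_SC
    while True:
        p_meta = rng.uniform(0.01, 0.99)  # assumed ancestral frequency prior
        counts = []
        for alpha in alphas:
            p_group = beta_draw(rng, p_meta, f_ct)
            # Assumption: selection inflates F_SC in the second population only
            f_selected = 1.0 / (1.0 + exp(-(alpha + beta)))
            group_counts = []
            for f in (f_sc, f_selected):
                p_pop = beta_draw(rng, p_group, f)
                group_counts.append(sum(rng.random() < p_pop for _ in range(n_chrom)))
            counts.append(group_counts)
        freq = sum(sum(g) for g in counts) / (2 * len(alphas) * n_chrom)
        if min(freq, 1.0 - freq) > min_maf:  # condition on global MAF above 5%
            return counts
```

For example, `simulate_snp(random.Random(1), alphas=(3.0, 3.0))` would mimic a SNP under weak convergent selection, and `alphas=(5.0, 0.0)` a SNP under stronger selection in the first group only.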

Haplotype-Based Simulations and Statistics

Several alternative methods exist to detect natural selection. In particular, methods based on haplotype structure47–51 have been widely applied to identify local adaptation to high altitude in humans (including the data set from Bigham et al.12 we are using here). In order to compare the performance of our approach with haplotype-based methods (see below), we have simulated haplotypic data sets with features similar to the genomic data set analyzed here. We used the SimuPop package for Python52 (see Web Resources) and considered a scenario where an ancestral population gives rise to two descendant populations, which after 600 generations (15,000 years) undergo separate splits into two populations, one at sea level and the other at high altitude. After the second split, populations evolve for 200 generations (5,000 years) until the present time. This evolutionary scenario is supposed to approximate the divergence of Asian and Amerindian populations followed by a subsequent divergence of highland and lowland populations in Asia and in America, even though this history might have been more complex.53,54 We assume that there is no migration between populations, and we adjusted population sizes so that $F_{SC} = 0.02$ for all four populations and $F_{CT} = 0.08$ for both groups, to have F-statistics values comparable to the observed data set. More precisely, we used Ne = 10,000 for the ancestral population, Ne = 4,000 for the two descendant populations and Ne = 3,500 for each of the four populations after the second split. Recombination rate was set to $10^{-8}$ (= 1 cM/Mb) and the mutation rate to $1.2 \times 10^{-8}$.55 We considered a strong selection scenario (Ns = 100) and a moderate selection scenario (Ns = 10), with positive selection operating only in high-altitude populations right after the second split.
We simulated 1,500 genomic regions each with 101 SNPs spaced every 4 kb, of which (1) 1,000 were neutral, (2) 250 were under moderate convergent evolution (Ns = 10 in the two high-altitude populations), and (3) 250 were under strong convergent evolution (Ns = 100 in the two high-altitude populations). For selected regions, selection operates on the SNP located at the center of the genomic region (i.e., SNP 50). We generated data sets that differed in the initial allele frequency (IAF) of the selected variant: (1) IAF = 0.001, (2) IAF = 0.01, and (3) IAF = 0.1. At the end of the simulations, we sampled 50 individuals from each population and analyzed the resulting data set using different approaches. We used two commonly used statistics describing the pattern of long-range homozygosity: the integrated haplotype score iHS based on the decay of haplotype homozygosity with recombination distance48 and the cross-population extended haplotype homozygosity (XP-EHH), which contrasts the evidence of positive selection between two populations,49 and which is therefore particularly well suited to our case. Overall, we thus compared four different approaches: (1) the hierarchical F-model introduced above, (2) two separate pairwise analyses (one for each group) containing two populations using the original F-model implemented in BayeScan, (3) two separate pairwise analyses (one for each group) containing two populations using XP-EHH, and (4) two separate analyses of the high-altitude populations using iHS. We used receiver operating characteristic (ROC) curves and the area under the curve (AUC), as implemented in the R package pROC,56 to compare the performance of the four approaches. Except for the hierarchical F-model introduced above, none of these approaches can explicitly model convergent evolution, and convergent adaptation is only inferred after separate analyses when significance is reached in the two regions at the same time.
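The AUC used for this comparison has a simple interpretation that the following Python sketch makes explicit: it is the probability that a randomly chosen selected region receives a higher score than a randomly chosen neutral region, with ties counted as one half. This is a plain illustration of the quantity, not the pROC implementation, and the function name is hypothetical.

```python
def auc(scores_selected, scores_neutral):
    """Area under the ROC curve from scores of selected and neutral regions.

    Scores must be oriented so that larger values mean stronger evidence
    of selection (for example, a posterior probability or an absolute
    XP-EHH value).
    """
    wins = 0.0
    for s in scores_selected:
        for n in scores_neutral:
            if s > n:
                wins += 1.0
            elif s == n:
                wins += 0.5
    return wins / (len(scores_selected) * len(scores_neutral))
```

An AUC of 0.5 corresponds to a method that ranks regions at random, and an AUC of 1 to a method that ranks every selected region above every neutral one.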

Results

Patterns of Selection at the SNP Level

Using our hierarchical Bayesian analysis, we identified 1,159 SNPs potentially under selection at the 1% FDR level (q value < 0.01). For each SNP, we identified the model of selection (selection only in Asia, selection only in South America, or convergent selection; see Material and Methods) with the highest posterior probability. With this procedure, 362 SNPs (31%) were found under convergent adaptation, whereas 611 SNPs (53%) were found under selection only in Asia, and 186 SNPs (16%) under selection only in South America. These results suggest that convergent adaptation is more common than previously thought,5,24,57 even at the SNP level, and they are consistent with the results of a recent literature meta-analysis over several species.5

In order to evaluate the additional power gained with the simultaneous analysis of the four populations, we performed separate analyses in the two continents using the nonhierarchical F-model.31 These two pairwise comparisons identified 160 SNPs under selection in the Andes, and 940 in Tibet. The overlap in significant SNPs between these two separate analyses and that under the hierarchical model is shown in Figure 2A. Interestingly, only 6 SNPs are found under selection in both regions when performing separate analyses in Asians and Amerindians. This very limited overlap persists even if we strongly relax the FDR in both pairwise analyses: at the 10% FDR level, only 13 SNPs are found under selection in both continents. These results are consistent with those of Bigham et al.,12 who analyzed both continents separately with a different statistical method based on FST, and who found only 22 significant SNPs in common between the two geographic regions. It suggests that the use of intersecting sets of SNPs found significant in separate analyses is a suboptimal strategy to study the genome-wide importance of convergent adaptation. Interestingly, 15% of the SNPs (162 SNPs, see Figure 2A) identified as under selection by our method are not identified by any separate analyses, suggesting a net gain in power for our method to detect genes under selection (as confirmed by our simulation studies below).

Figure 2.


Overlap of Candidate SNPs under Selection in Asia and in America

Venn diagrams showing the overlap of SNPs potentially under selection in Asia and in America at a 1% FDR.

(A) Overlap between all SNPs found under any type of selection using our hierarchical model introduced here (green) with those found in separate analyses performed in Asia (blue) and in America (red).

(B) Overlap between SNPs found under convergent selection using our hierarchical model (yellow) with those found in separate analyses performed in Asia (blue) and in America (red).

We examined in more detail the 362 SNPs identified as under convergent adaptation. The overlap of these SNPs (the yellow circle) with those identified by the two separate analyses is shown in Figure 2B. As expected, the 6 SNPs identified under selection in both regions by the two separate analyses are part of the convergent adaptation set. However, we note that 272 of the SNPs in the convergent adaptation set (75%) are identified as being under selection in only one of the two regions by the separate analyses. This suggests that although natural selection might be operating similarly in both regions, limited sample size might prevent its detection in one of the two continents.

Genomic Regions under Selection

Using a sliding window approach, we find 25 candidate regions with length ranging from 500 kb to 2 Mb (Figure 3; Table S1). Among these, 18 regions contain at least one significant SNP assigned to the convergent adaptation model, and 11 regions contained at least one 500 kb window where the convergent adaptation model was the most frequently assigned selection model among significant SNPs (Figure 3). Contrastingly, Bigham et al.12 identified 14 and 37 candidate 1 Mb regions for selection in Tibetans and Andeans, respectively, but none of these 1 Mb regions were shared between Asians and Amerindians. Moreover, only two of the regions previously found under positive selection in South America and only four in Asia overlap with our 25 significant regions.

Figure 3.


Manhattan Plot of q Values for Loci Potentially under Altitudinal Selection in Asian and Amerindian Populations

Each dot represents the 5% quantile of the SNP q values in a 500 kb window. Windows are shifted in increments of 25 kb and considered as a candidate target for selection if the 5% quantile is lower than 0.01 (horizontal dashed line). Overlapping significant windows are merged into 25 larger regions (indicated by gray vertical bars, see Table S2). Significant windows are colored in yellow when they contain at least one significant SNP for convergent adaptation. Otherwise they are colored according to the most represented model of selection identified among the SNPs they contain: blue for selection only in Asia and red for selection only in America. We also report the names of genes discussed in the text.

As noted above, the only gene showing signs of convergent evolution found by Bigham et al.12 is EGLN1, which has also been identified in several other studies (see Table 1 in Simonson et al.22 for a review). EGLN1 is also present in one of our 25 regions where three out of eight significant SNPs are assigned to the convergent adaptation model. We note that the significant SNPs in this region are not found in EGLN1 directly but in two genes surrounding it (TRIM67 [MIM 610584] and DISC1 [MIM 605210]), as reported earlier.14,58 The HIF pathway gene EPAS1, which is the top candidate in many studies,22 is also present in one of our 25 regions, where 28 of the 80 significant SNPs are assigned to the convergent adaptation model. Recently a particular 5-SNP EPAS1 haplotype has been identified in Tibetans as being the result of introgression from Denisovans.54 Unfortunately none of the five SNPs of interest identified in this study are present in our data set, and additional sequencing will be required to check whether this haplotype is also present in Andeans.

Table 1.

Result of the FST-Based Simulated Data Analyses

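SNP Category | Selection Parameters | Number of SNPs | Hierarchical Model: Neutral | Hierarchical Model: Convergent | Hierarchical Model: Group 1 | Hierarchical Model: Group 2 | Two Separate BayeScan Analyses: Neutral | Two Separate BayeScan Analyses: Convergent | Two Separate BayeScan Analyses: Group 1 | Two Separate BayeScan Analyses: Group 2 | Single BayeScan: Neutral | Single BayeScan: Selection
Convergent | $\alpha_i = 3$ | 2,500 | 1,824 | 306 | 148 | 222 | 1,891 | 47 | 235 | 327 | 2,127 | 373
Convergent | $\alpha_i = 5$ | 2,500 | 550 | 1,515 | 188 | 247 | 634 | 643 | 554 | 669 | 1,048 | 1,452
Group 1 | $\alpha_{i1} = 3$, $\alpha_{i2} = 0$ | 2,500 | 2,206 | 19 | 275 | 0 | 2,214 | 0 | 285 | 1 | 2,367 | 133
Group 1 | $\alpha_{i1} = 5$, $\alpha_{i2} = 0$ | 2,500 | 1,308 | 65 | 1,127 | 0 | 1,307 | 0 | 1,192 | 1 | 1,860 | 640
Neutral | $\alpha_{i1} = \alpha_{i2} = 0$ | 90,000 | 89,971 | 0 | 4 | 25 | 89,970 | 0 | 4 | 26 | 88,861 | 1,139
All | | 100,000 | 95,859 | 1,905 | 1,742 | 494 | 96,016 | 690 | 2,270 | 1,024 | 96,263 | 3,737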

We report for different methods and selection strengths the number of SNPs found to be neutral or under selection at a FDR of 1%.

Out of the 1,159 SNPs we identified above as being under one model of selection, 312 are located within our 25 regions (Table S1) where 120 of them are identified as under convergent adaptation (out of a total of 362 SNPs identified as under convergent adaptation in the whole data set, see Figure 2B). Almost all the 18 regions containing at least one significant SNP assigned to the convergent adaptation model also contain SNPs where the best-supported model is selection only in Asia, or selection only in America. However it is hard to distinguish if this reflects both convergent adaptation and region specific adaptation in the same genomic region or simply different statistical power.

Polygenic Convergent Adaptation

We identified three pathways significantly enriched for genes involved in convergent adaptation using Panther42 after Bonferroni correction at the 5% level. These are the “metabotropic glutamate receptor group I” pathway, the “muscarinic acetylcholine receptor 1 and 3” signaling pathway, and the “epidermal growth factor receptor” (EGFR, MIM 131550) signaling pathway. Using the String 9.1 database,44 two GO terms were significantly enriched for these genes when controlling for a 5% FDR: “ethanol oxidation” (GO:0006069) and “positive regulation of transmission of nerve impulse” (GO:0051971). Using a recent and more powerful gene set enrichment approach,45 we first identified 25 gene sets with an associated q value below 5% (Table S2). An enrichment map showing these sets and their overlap is presented in Figure 4. There are two big clusters of overlapping gene sets, one related to fatty acid oxidation with “fatty acid omega oxidation” as the most significant set and another immune system related cluster with “interferon gamma signaling” as the most significant gene set. After pruning, only these two above-mentioned gene sets are left with a q value below 5%. It is worth noting that the “fatty acid omega oxidation” pathway, which is the most significant gene set (q value < 10⁻⁶), contains many top scoring genes for convergent selection, including several alcohol and aldehyde dehydrogenases, as listed in Table S3. Interestingly, the GO term “ethanol oxidation” is no longer significant after excluding the genes involved in the “fatty acid omega oxidation” pathway. Out of the 362 SNPs identified under convergent adaptation above, only four are located in genes (±50 kb) belonging to the fatty acid omega oxidation pathway (rs3805322, rs2051428, rs4767944, rs4346023), and only seven SNPs are found in genes belonging to the interferon gamma signaling pathway (rs12531711, rs7105122, rs4237544, rs10742805, rs17198158, rs4147730, rs3115628).
This apparent lack of significant SNPs in candidate pathways is expected, because our gene set enrichment approach does not rely on an arbitrary threshold to define the top outliers and is thus more suited to detect lower levels of selection acting synergistically on polygenic traits.

Figure 4.

Gene Sets Enriched for Signals of Convergent Adaptation

The 25 nodes represent gene sets with q value < 0.05. The size of a node is proportional to the number of genes in a gene set. The node color scale represents gene set p values. Edges represent mutual overlap: nodes are connected if one of the sets has at least 33% of its genes in common with the other gene set. The widths of the edges scale with the similarity between nodes.
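The edge rule of this enrichment map can be written down in a few lines. The sketch below is illustrative only: the function name `overlap_edges` is hypothetical, and using the larger of the two overlap fractions as the edge weight is an assumption, since the legend does not say which similarity measure sets the edge width.

```python
from itertools import combinations

def overlap_edges(gene_sets, min_fraction=0.33):
    """Connect two gene sets when at least one of them shares
    min_fraction of its genes with the other.

    gene_sets maps a set name to a Python set of gene identifiers.
    Returns (name_a, name_b, weight) tuples; the weight is the larger
    overlap fraction (an assumption of this sketch).
    """
    edges = []
    for (name_a, a), (name_b, b) in combinations(gene_sets.items(), 2):
        shared = len(a & b)
        if shared == 0:
            continue
        # Fraction of each set that lies in the other; keep the larger one.
        fraction = max(shared / len(a), shared / len(b))
        if fraction >= min_fraction:
            edges.append((name_a, name_b, fraction))
    return edges
```

With this rule, a small gene set nested inside a large one is still linked to it, because the criterion is satisfied as soon as either set reaches the 33% overlap.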

Power of the Hierarchical F-Model

Our simulations show a net increase in power to detect selection using the global hierarchical approach as compared to using two separate pairwise analyses (Table 1; Figures 5 and 6). For the 2,500 SNPs simulated under the weak convergent selection model (αi=3), the hierarchical model detects 6.5 times more SNPs than the two separate analyses (306 versus 47). This difference can be explained by the smaller amount of information used when doing separate analyses instead of a single one. The power greatly increases when selection is stronger, and among the 2,500 SNPs simulated with αi=5, 1,515 are correctly classified using our hierarchical model, as compared to only 643 using separate analyses. Similarly to what we found with the real altitude data, the two separate analyses often wrongly classify the convergent SNPs correctly identified as such by our hierarchical method as being under selection in only one of the two groups, but sometimes also as completely neutral (64 such SNPs when αi=3 and 76 when αi=5, see Figures 5B and 5D). We note that the hierarchical model is also more powerful at detecting selected loci regardless of whether or not the SNPs are correctly assigned to the convergent evolution set. Indeed, our method identifies 2,626 SNPs as being under any model of selection (i.e., convergent evolution or in only one of the two regions) among the 5,000 simulated under convergent selection, whereas the separate analysis detects only 2,475 SNPs. When selection is present only in one of the two groups (αi1=3 or 5 and αi2=0), the power of the hierarchical model is comparable with the separate analysis in the corresponding group, implying that there is no penalty in using the hierarchical model even in the presence of group-specific selection. A few of the group-specific selected SNPs are wrongly classified in the convergent adaptation model with a false positive rate of 1.7% (84 SNPs out of 5,000).
Overall, the false discovery rate is well calibrated using our q value threshold of 0.01 in both cases, with 29 false positives out of 4,141 significant SNPs (FDR = 0.70%) for our hierarchical model, and 30 false positives out of 3,984 significant SNPs (FDR = 0.75%) for the two separate analyses. Finally, when the four populations are analyzed together without accounting for the hierarchical structure, a large number of false positives appears (Table 1; Figure 6C) in keeping with previous studies.35 Under this island model, 1,139 neutral SNPs are indeed identified as being under selection among the 90,000 simulated neutral SNPs (versus 29 and 30 using the hierarchical method or two separate analyses, respectively). The nonhierarchical approach does not allow one to distinguish different models of selection, but among the 10,000 SNPs simulated under different types of selection, only 2,598 are significant. This shows that the nonhierarchical analysis leads to both a reduced power, and a very large false discovery rate (FDR = 30.4%) in presence of a hierarchical genetic structure.
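The summary figures quoted in this section follow directly from the counts in Table 1. The short script below recomputes them as a check; the dictionary keys and function names are labels chosen for this illustration, and only the counts come from the table.

```python
# Counts from Table 1, ordered (neutral, convergent, group 1, group 2).
HIERARCHICAL = {
    "convergent_a3": (1824, 306, 148, 222),
    "convergent_a5": (550, 1515, 188, 247),
    "group1_a3": (2206, 19, 275, 0),
    "group1_a5": (1308, 65, 1127, 0),
    "neutral": (89971, 0, 4, 25),
}
SEPARATE = {
    "convergent_a3": (1891, 47, 235, 327),
    "convergent_a5": (634, 643, 554, 669),
    "group1_a3": (2214, 0, 285, 1),
    "group1_a5": (1307, 0, 1192, 1),
    "neutral": (89970, 0, 4, 26),
}
# Single nonhierarchical BayeScan run: (neutral, selection).
SINGLE = {
    "convergent_a3": (2127, 373),
    "convergent_a5": (1048, 1452),
    "group1_a3": (2367, 133),
    "group1_a5": (1860, 640),
    "neutral": (88861, 1139),
}

def n_significant(row):
    """Number of SNPs assigned to any selection model in one table row."""
    return sum(row[1:])

def summarize(table):
    """False positives, total significant SNPs, and realized FDR."""
    false_pos = n_significant(table["neutral"])
    significant = sum(n_significant(row) for row in table.values())
    return {
        "false_positives": false_pos,
        "significant": significant,
        "fdr": false_pos / significant,
        # SNPs simulated under convergent selection and detected under any model
        "detected_convergent": n_significant(table["convergent_a3"])
        + n_significant(table["convergent_a5"]),
    }

for name, table in [("hierarchical", HIERARCHICAL),
                    ("separate", SEPARATE), ("single", SINGLE)]:
    print(name, summarize(table))

# Group-specific SNPs wrongly assigned to the convergent model (hierarchical).
misassigned = HIERARCHICAL["group1_a3"][1] + HIERARCHICAL["group1_a5"][1]
print("misassigned:", misassigned, "rate:", misassigned / 5000)
```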

Figure 5.

Overlap of Significant SNPs for Simulated Convergent Evolution

Venn diagrams showing the overlap of SNPs simulated under a convergent evolution model and identified under selection at a 1% FDR.

(A and C) Overlap between SNPs found under any type of selection using our hierarchical model introduced here (green) with those found in separate analyses performed in group 1 (blue) and in group 2 (red).

(B and D) Overlap between SNPs found under convergent using our hierarchical model (yellow) with those found in separate analyses performed in group 1 (blue) and in group 2 (red). In (A) and (B), 2,500 SNPs are simulated under weak convergent selection (αi=3), whereas in (C) and (D) 2,500 SNPs are simulated under stronger convergent selection (αi=5).

Figure 6.

Power to Detect Loci under Selection as a Function of Their Effect on Population Differentiation

For simulated SNPs, we plot the best selection model inferred (A) under our hierarchical F-model, (B) using two separate analyses of pairs of populations, and (C) under a nonhierarchical F model performed on four populations, thus ignoring the underlying hierarchical population structure. The colors indicate the inferred model: convergent evolution (yellow), selection only in the first group (blue), selection only in the second group (red), and no selection (black). Note that we use purple in (C), as this approach does not allow one to distinguish between different models of selection. For better visualization, we only plot 10,000 neutral loci among the 90,000 simulated, but the missing data show a very similar pattern.

Our haplotype-based simulations also show that our hierarchical model has generally a much higher performance than iHS and XP-EHH to detect convergent adaptation (Table 2). iHS has very low power to detect selection in all scenarios tested here, whereas XP-EHH performs well (AUC = 0.75) when IAF = 0.1 and selection is moderate (Ns = 10), and very well (AUC = 0.94) when IAF = 0.001 and selection is strong (Ns = 100). ROC curves for these two cases are presented in Figure 7, and for the four other cases in Figure S3. In both of these cases, however, the hierarchical model has a higher performance (AUC = 0.92 and AUC = 0.999, respectively), and it also shows a high performance (AUC = 0.94 and AUC = 0.92) in the two other scenarios with strong selection (Ns = 100) where XP-EHH has almost no power to detect convergent adaptation (AUC = 0.57 and AUC = 0.50, respectively, for IAF = 0.01 and IAF = 0.1; see Table 2). Interestingly, for the case where IAF = 0.1 and Ns = 10, XP-EHH has a slightly higher performance to detect selection in each region individually than the F-model as implemented in BayeScan (AUC = 0.75 versus AUC = 0.71, respectively), but our hierarchical model outperforms these two approaches drastically (AUC = 0.92, Figure 7A). Note that XP-EHH performs somewhat better than the other methods in one scenario (Ns = 10, IAF = 0.01), but its performance (AUC = 0.60) is not particularly good and all methods seem to have difficulty detecting convergent adaptation in this case. Overall, our analyses confirm that the use of separate analyses results in reduced power to detect convergent adaptation, which explains the difference between results obtained using our and previous methods when detecting high-altitude adaptation in humans. The ROC analysis also shows that using a less stringent cutoff in separate analyses is far from being as powerful as our hierarchical model.

Table 2.

Performance of Different Methods to Detect Convergent Adaptation in the Case of Haplotype-Based Simulated Data Sets

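| Selection Strength | Initial Allele Frequency (IAF) | AUC: iHS | AUC: XP-EHH | AUC: BayeScan | AUC: Hierarchical Model |
|---|---|---|---|---|---|
| Ns = 10 (moderate) | 0.001 | 0.52 | 0.54 | 0.57 | 0.59 |
| Ns = 10 (moderate) | 0.01 | 0.52 | 0.60 | 0.55 | 0.54 |
| Ns = 10 (moderate) | 0.1 | 0.54 | 0.75 | 0.71 | 0.92 |
| Ns = 100 (strong) | 0.001 | not computed | 0.94 | 0.996 | 0.999 |
| Ns = 100 (strong) | 0.01 | 0.58 | 0.57 | 0.88 | 0.94 |
| Ns = 100 (strong) | 0.1 | 0.51 | 0.50 | 0.90 | 0.92 |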

For the case Ns = 100 and IAF = 0.001, iHS could not be computed. AUC, area under the ROC curve (see Figure 7).

Figure 7.

Haplotype-Based Simulation ROC Curves

ROC curves summarizing the relative performance of our hierarchical model, BayeScan, and XP-EHH to detect convergent adaptation for simulated scenarios when (A) IAF = 0.1 and Ns = 10 and (B) IAF = 0.001 and Ns = 100 (see also Table 2 for overall scores).
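For readers unfamiliar with the AUC values reported in Table 2, the sketch below shows one standard way to compute the area under a ROC curve from per-SNP scores, using the rank (Mann-Whitney) formulation. It assumes that a higher score means stronger evidence for selection; the function name `auc` and the way scores are paired with simulated selected and neutral SNPs are assumptions of this example, not a description of the authors' pipeline.

```python
import numpy as np

def auc(scores_selected, scores_neutral):
    """Area under the ROC curve via the Mann-Whitney statistic.

    Equals the probability that a randomly drawn selected SNP scores
    higher than a randomly drawn neutral SNP, counting ties as one half.
    Builds an all-pairs comparison, so it suits moderate sample sizes.
    """
    pos = np.asarray(scores_selected, dtype=float)
    neg = np.asarray(scores_neutral, dtype=float)
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)
```

An AUC of 0.5 corresponds to a method that ranks selected and neutral SNPs no better than chance, which is why values such as 0.50 to 0.57 in Table 2 indicate almost no power.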

Discussion

Convergent Adaptation to High Altitude in Asia and America Is Not Rare

Our hierarchical F-model reveals that convergent adaptation to high altitude is more frequent than previously described in Tibetans and Andeans. Indeed, 31% (362/1,159) of all SNPs found to be potentially under selection at a FDR of 1% can be considered as under convergent adaptation in Asia and America. This is in sharp contrast with a previous analysis of the same data set where only a single gene was found to be responding to altitudinal selection in both Asians and Amerindians.12 Our model confirms the selection of EGLN1 in both Tibetans and Andeans. We also show that some genes already known to be involved in adaptation to high altitude in Tibetans, like EPAS1, might also have the same function in Andeans. Finally, we identified genomic regions, pathways, and GO terms potentially linked to convergent adaptation to high altitude in Tibetans and Andeans that have not been previously reported. Our approach thus seems more powerful than previous pairwise analyses, which is confirmed by our simulation studies. It suggests that data sets analyzed by previous studies that tried to uncover convergent adaptation by confronting lists of significant SNPs in separate pairwise analyses59–63 would benefit from being reanalyzed with our method. We note that more complex demography could lead to a false-positive rate higher than the nominal value. On the basis of the simple scenario of the divergence of four populations that we have simulated, we found that our method is robust to the assumed demographic model, but this might not always be the case, and significant SNPs have to be considered only as candidates for further investigations.

Polygenic and Convergent Adaptation in the Omega Oxidation Pathway

Our top significant GO term is linked to alcohol metabolism, in keeping with a recent study of a high-altitude population in Ethiopia.18,19 Indeed, one of the 25 regions identified in the present study includes several alcohol dehydrogenase (ADH) genes (ADH1A [MIM 103700], ADH1B [MIM 103720], ADH1C [MIM 103730], ADH4 [MIM 103740], ADH5 [MIM 103710], ADH6 [MIM 103735], ADH7 [MIM 600086]) located in a 370 kb segment of chromosome 4 (Figure 3), and another significant 2 Mb segment of chromosome 12 includes ALDH2 (acetaldehyde dehydrogenase, MIM 100650). Some evidence of positive selection in ADH1B and ALDH2 had been reported in East Asian populations, but without any clear selective forces identified.64

Interestingly, our gene set enrichment analysis suggests a potential evolutionary adaptation of this group of genes, because they all belong to our most significant pathway, namely “fatty acid omega oxidation” (Table S3). Omega oxidation is an alternative to the beta-oxidation pathway involved in fatty-acid degradation and energy production in the mitochondrion. Degradation of fatty acids into sugar by omega oxidation is usually a minor metabolic pathway, which becomes more important when beta oxidation is defective,18,19,65 or in case of hypoxia.66 It is, however, unclear whether omega oxidation is a more efficient alternative to beta oxidation at high altitude, or if it would rather contribute to the degradation of fatty acids accumulating when beta oxidation is defective. The detoxifying role of this pathway is supported by the fact that it is usually mainly active in the liver and in the kidney.65 The fact that Ethiopians also show signals of adaptations in ADH and ALDH genes19 suggests that convergent adaptation in the omega-oxidation pathway could have occurred on three different continents in humans.

Response to Hypoxia-Induced Neuronal Damage

Hypoxia leads to neuronal damage through over-stimulation of glutamate receptors.67 Two out of our three significant pathways found with Panther (“metabotropic glutamate receptor group I” and “muscarinic acetylcholine receptor 1 and 3”) for convergent adaptation are involved with neurotransmitter receptors. The metabotropic glutamate receptor group I increases N-methyl-D-aspartate (NMDA) receptor activity, and this excitotoxicity is a major mechanism of neuronal damage and apoptosis.68 Consistently, the only significant GO term after excluding the genes involved in omega oxidation is also related to neurotransmission (“positive regulation of transmission of nerve impulse”) and contains two significant glutamate receptor genes (GRIK2 [MIM 138244] and GRIN2B [MIM 138252]), as well as IL6 (MIM 147620).

One of our top candidate regions for convergent adaptation includes 19 significant SNPs assigned to the convergent adaptation model, which are spread in a 100 kb region on chromosome 7 around IL6 (Figure 3), the gene encoding interleukin-6 (IL-6), an important cytokine. Interestingly, it has been shown that IL-6 plasma levels increase significantly when sea-level resident individuals are exposed to high altitude (4,300 m),69 and IL-6 has been shown to have a neuroprotective effect against glutamate- or NMDA-induced excitotoxicity.70 Consistently, the “metabotropic glutamate receptor group III” pathway seems to have responded to selection in Ethiopian highlanders.17 Together, these results suggest a genetic basis for an adaptive response to neuronal excitotoxicity induced by high-altitude hypoxia in humans.

Versatility of the Hierarchical Bayesian Model to Uncover Selection

Our statistical model is very flexible and can cope with a variety of sampling strategies to identify adaptation. For example, Pagani et al.15 used a different sampling scheme to uncover high-altitude adaptation genes in North-Caucasian highlanders. They sampled Daghestani from three closely related populations (Avars, Kubachians, and Laks) living at high altitude that they compared with two lowland European populations. Here again, our strategy would allow the incorporation of these five populations into a single analysis. A first group would correspond to the Daghestan region, containing the three populations and a second group containing the two lowland populations. However, in that case, it is the decomposition of FCT in Equation 2 that would allow the identification of loci overly differentiated between Daghestani (“group 1”) and European (“group 2”) populations.

Our approach could also be very useful in the context of genome-wide association studies (GWASs) meta-analysis. For example, Scherag et al.71 combined two GWAS on French and German samples to identify SNPs associated with extreme obesity in children. These two data sets could be combined and a single analysis could be performed under our hierarchical framework, explicitly taking into account the population structure. Our two “groups” in Figure 1 would correspond respectively to French and German individuals. In each group the two “populations” would correspond respectively to cases (obese children) and controls (children with normal weight). Like in the present study, the decomposition of FSC and the use of a convergent evolution model would allow the identification of loci associated with obesity in both populations. Additionally, a potential hidden genetic structure between cases and controls and any shared ancestry between French and Germans would be dealt with by the βjg and Bg coefficients in Equations 1 and 2, respectively.

We have introduced here a flexible hierarchical Bayesian model that can deal with complex population structure, and which allows the simultaneous analysis of populations living in different environments in several distinct geographic regions. Our model can be used to specifically test for convergent adaptation, and this approach is shown to be more powerful than previous methods that analyze pairs of populations separately. The application of our method to the detection of loci and pathways under selection reveals that many genes are under convergent selection in the American and Tibetan highlanders. Interestingly, we find that two specific pathways could have evolved to counter the toxic effects of hypoxia, which adds to previous evidence (e.g., EPAS1 and EGLN1; see ref. 22) suggesting that human populations living at high altitude might have mainly evolved ways to limit the negative effects of normal physiological responses to hypoxia and might not have had enough time yet to develop more elaborate adaptations to this harsh environment.

Acknowledgments

We thank Abigail Bigham for making the genetic data analyzed here available. This work has been made possible by Swiss National Science Foundation grants No. 3100A0-126074, 31003A-143393, and CRSII3_141940 to L.E. O.E.G. was supported by French ANR grant No 09-GENM-017-001 and by the Marine Alliance for Science and Technology for Scotland (MASTS). The program BayeScan3 used to analyze the data is available from M.F. upon request.

Appendix A: Gene Set Enrichment Analysis Method

To find signals of selection at the pathway level, we applied a gene set enrichment approach as described by Daub et al.45 This method tests whether the genes in a gene set show a shift in the distribution of a selection score. In our case, we take as selection score sconv = 1-qconv, where qconv is the q value of a SNP computed from the probability of convergent selection. For the enrichment test, we needed one sconv value per gene, and we therefore transformed the SNP-based scores to gene-based scores. We first downloaded 19,683 protein coding human genes, located on the autosomes and on the X chromosome, from the NCBI Entrez Gene website72 (see Web Resources). Next we converted the SNPs to hg19 coordinates; 670 SNPs could not be mapped, resulting in 631,674 remaining SNPs. These SNPs were assigned to genes: if a SNP was located within a gene transcript, it was assigned to that gene; otherwise, it was assigned to the closest gene within 50 kb distance. For each gene, we selected the highest sconv value of all SNPs assigned to this gene. After removing 2,411 genes with no SNPs assigned, a list of 17,272 genes remained.
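The SNP-to-gene mapping described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the tuple layouts, and the example gene labels are hypothetical, genes are represented by a single start and end coordinate, and when a SNP lies inside several overlapping genes the first one listed is kept, a case the text does not specify.

```python
def assign_snp(pos, genes, max_dist=50_000):
    """Return the gene a SNP is assigned to, or None.

    genes is a list of (name, start, end) tuples on the SNP's chromosome.
    A SNP inside a gene goes to that gene; otherwise it goes to the
    closest gene within max_dist base pairs.
    """
    best, best_dist = None, None
    for name, start, end in genes:
        if start <= pos <= end:
            return name
        dist = start - pos if pos < start else pos - end
        if dist <= max_dist and (best_dist is None or dist < best_dist):
            best, best_dist = name, dist
    return best

def gene_scores(snps, genes_by_chrom):
    """Keep, for each gene, the highest s_conv = 1 - q_conv of its SNPs.

    snps is an iterable of (chromosome, position, q_conv) tuples.
    Genes with no assigned SNP are simply absent from the result.
    """
    scores = {}
    for chrom, pos, q_conv in snps:
        gene = assign_snp(pos, genes_by_chrom.get(chrom, []))
        if gene is None:
            continue
        s_conv = 1.0 - q_conv
        if gene not in scores or s_conv > scores[gene]:
            scores[gene] = s_conv
    return scores
```

Taking the maximum score per gene is what creates the dependence on SNP density that the bin-matched null distribution is designed to correct.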

We downloaded 2,402 gene sets from the NCBI Biosystems database46 (see Web Resources). After discarding genes that were not part of the aforementioned gene list, removing gene sets with fewer than ten genes, and pooling (nearly) identical gene sets, 1,339 sets remained that served as input in our enrichment tests.

We computed the SUMSTAT73 score for each set, which is the sum of the sconv values of all genes in a gene set. Gene sets with a high SUMSTAT score are likely candidates for convergent selection. To assess the significance of each tested gene set, we compared its SUMSTAT score with a null distribution of SUMSTAT scores from random gene sets (N = 500,000) of the same size. We could not approximate the null distribution with a normal distribution as applied in Daub et al.,45 because random gene sets of small to moderate size produced a skewed SUMSTAT distribution. Taking the highest sconv score among SNPs near a gene can induce a bias, because genes with many SNPs are more likely to have an extreme value assigned. To correct for this possible bias, we placed each gene in a bin containing all genes with approximately the same number of SNPs and constructed the random gene sets in the null distribution in such a way that they were composed of the same number of genes from each bin as the gene set being tested. To remove overlap among the candidate gene sets, we applied a pruning method where we assigned iteratively overlapping genes to the highest scoring gene set. Because these tests are not independent anymore, we empirically estimated the q value of these pruned sets. All sets that scored a q value < 5% (before and after pruning) were reported.
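The SUMSTAT test with a bin-matched null distribution can be illustrated with the sketch below. It is not the authors' implementation: the function names, the `bin_of` mapping from gene to SNP-count bin, the default of 10,000 random sets (the text uses 500,000), and the add-one correction in the empirical p value are all assumptions made for this example. Every gene in the tested set is assumed to have a score.

```python
import random
from collections import Counter, defaultdict

def sumstat(gene_set, scores):
    """SUMSTAT: sum of the per-gene selection scores in a gene set."""
    return sum(scores[g] for g in gene_set)

def empirical_p(gene_set, scores, bin_of, n_random=10_000, seed=0):
    """Empirical p value of a gene set against bin-matched random sets.

    Each random set draws, from every SNP-count bin, as many genes as the
    tested set contains in that bin, so that genes with many SNPs are
    compared with genes of similar SNP density.
    """
    rng = random.Random(seed)
    genes_in_bin = defaultdict(list)
    for gene in scores:
        genes_in_bin[bin_of[gene]].append(gene)
    needed = Counter(bin_of[g] for g in gene_set)
    observed = sumstat(gene_set, scores)

    hits = 0
    for _ in range(n_random):
        null = 0.0
        for b, k in needed.items():
            null += sum(scores[g] for g in rng.sample(genes_in_bin[b], k))
        if null >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_random + 1)
```

The resulting p values would then be converted to q values across all tested gene sets; that step, and the pruning of overlapping sets, are not shown here.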

Supplemental Data

Document S1. Figures S1–S3 and Table S2
Document S2. Table S1
Document S3. Table S3
Document S4. Article plus Supplemental Data

Web Resources

The URLs for data presented herein are as follows:

References

  • 1.Kimura M. Evolutionary rate at the molecular level. Nature. 1968;217:624–626. doi: 10.1038/217624a0. [DOI] [PubMed] [Google Scholar]
  • 2.Nosil P., Funk D.J., Ortiz-Barrientos D. Divergent selection and heterogeneous genomic divergence. Mol. Ecol. 2009;18:375–402. doi: 10.1111/j.1365-294X.2008.03946.x. [DOI] [PubMed] [Google Scholar]
  • 3.Pritchard J.K., Pickrell J.K., Coop G. The genetics of human adaptation: hard sweeps, soft sweeps, and polygenic adaptation. Curr. Biol. 2010;20:R208–R215. doi: 10.1016/j.cub.2009.11.055. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Tenaillon O., Rodríguez-Verdugo A., Gaut R.L., McDonald P., Bennett A.F., Long A.D., Gaut B.S. The molecular diversity of adaptive convergence. Science. 2012;335:457–461. doi: 10.1126/science.1212986. [DOI] [PubMed] [Google Scholar]
  • 5.Conte G.L., Arnegard M.E., Peichel C.L., Schluter D. The probability of genetic parallelism and convergence in natural populations. Proc. Biol. Sci. 2012;279:5039–5047. doi: 10.1098/rspb.2012.2146. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Tennessen J.A., Akey J.M. Parallel adaptive divergence among geographically diverse human populations. PLoS Genet. 2011;7:e1002127. doi: 10.1371/journal.pgen.1002127. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Crisci J.L., Poh Y.P., Bean A., Simkin A., Jensen J.D. Recent progress in polymorphism-based population genetic inference. J. Hered. 2012;103:287–296. doi: 10.1093/jhered/esr128. [DOI] [PubMed] [Google Scholar]
  • 8.Li J., Li H., Jakobsson M., Li S., Sjödin P., Lascoux M. Joint analysis of demography and selection in population genetics: where do we stand and where could we go? Mol. Ecol. 2012;21:28–44. doi: 10.1111/j.1365-294X.2011.05308.x. [DOI] [PubMed] [Google Scholar]
  • 9.Begum F., Ghosh D., Tseng G.C., Feingold E. Comprehensive literature review and statistical considerations for GWAS meta-analysis. Nucleic Acids Res. 2012;40:3777–3784. doi: 10.1093/nar/gkr1255. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Simonson T.S., Yang Y., Huff C.D., Yun H., Qin G., Witherspoon D.J., Bai Z., Lorenzo F.R., Xing J., Jorde L.B. Genetic evidence for high-altitude adaptation in Tibet. Science. 2010;329:72–75. doi: 10.1126/science.1189406. [DOI] [PubMed] [Google Scholar]
  • 11.Beall C.M., Cavalleri G.L., Deng L., Elston R.C., Gao Y., Knight J., Li C., Li J.C., Liang Y., McCormack M. Natural selection on EPAS1 (HIF2alpha) associated with low hemoglobin concentration in Tibetan highlanders. Proc. Natl. Acad. Sci. USA. 2010;107:11459–11464. doi: 10.1073/pnas.1002443107. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Bigham A., Bauchet M., Pinto D., Mao X.Y., Akey J.M., Mei R., Scherer S.W., Julian C.G., Wilson M.J., López Herráez D. Identifying signatures of natural selection in Tibetan and Andean populations using dense genome scan data. PLoS Genet. 2010;6:e1001116. doi: 10.1371/journal.pgen.1001116. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Yi X., Liang Y., Huerta-Sanchez E., Jin X., Cuo Z.X., Pool J.E., Xu X., Jiang H., Vinckenbosch N., Korneliussen T.S. Sequencing of 50 human exomes reveals adaptation to high altitude. Science. 2010;329:75–78. doi: 10.1126/science.1190371. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Xu S., Li S., Yang Y., Tan J., Lou H., Jin W., Yang L., Pan X., Wang J., Shen Y. A genome-wide search for signals of high-altitude adaptation in Tibetans. Mol. Biol. Evol. 2011;28:1003–1011. doi: 10.1093/molbev/msq277. [DOI] [PubMed] [Google Scholar]
  • 15.Pagani L., Ayub Q., MacArthur D.G., Xue Y., Baillie J.K., Chen Y., Kozarewa I., Turner D.J., Tofanelli S., Bulayeva K. High altitude adaptation in Daghestani populations from the Caucasus. Hum. Genet. 2012;131:423–433. doi: 10.1007/s00439-011-1084-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Peng Y., Yang Z., Zhang H., Cui C., Qi X., Luo X., Tao X., Wu T., Ouzhuluobu, Basang Genetic variations in Tibetan populations and high-altitude adaptation at the Himalayas. Mol. Biol. Evol. 2011;28:1075–1081. doi: 10.1093/molbev/msq290. [DOI] [PubMed] [Google Scholar]
  • 17.Scheinfeldt L.B., Soi S., Thompson S., Ranciaro A., Woldemeskel D., Beggs W., Lambert C., Jarvis J.P., Abate D., Belay G., Tishkoff S.A. Genetic adaptation to high altitude in the Ethiopian highlands. Genome Biol. 2012;13:R1. doi: 10.1186/gb-2012-13-1-r1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Alkorta-Aranburu G., Beall C.M., Witonsky D.B., Gebremedhin A., Pritchard J.K., Di Rienzo A. The genetic architecture of adaptations to high altitude in Ethiopia. PLoS Genet. 2012;8:e1003110. doi: 10.1371/journal.pgen.1003110. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Huerta-Sánchez E., Degiorgio M., Pagani L., Tarekegn A., Ekong R., Antao T., Cardona A., Montgomery H.E., Cavalleri G.L., Robbins P.A. Genetic signatures reveal high-altitude adaptation in a set of ethiopian populations. Mol. Biol. Evol. 2013;30:1877–1888. doi: 10.1093/molbev/mst089. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Losos J.B. Convergence, adaptation, and constraint. Evolution. 2011;65:1827–1840. doi: 10.1111/j.1558-5646.2011.01289.x. [DOI] [PubMed] [Google Scholar]
  • 21.Scheinfeldt L.B., Tishkoff S.A. Living the high life: high-altitude adaptation. Genome Biol. 2010;11:133. doi: 10.1186/gb-2010-11-9-133. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Simonson T.S., McClain D.A., Jorde L.B., Prchal J.T. Genetic determinants of Tibetan high-altitude adaptation. Hum. Genet. 2012;131:527–533. doi: 10.1007/s00439-011-1109-3. [DOI] [PubMed] [Google Scholar]
  • 23.Guillemin K., Krasnow M.A. The hypoxic response: huffing and HIFing. Cell. 1997;89:9–12. doi: 10.1016/s0092-8674(00)80176-2. [DOI] [PubMed] [Google Scholar]
  • 24.Bigham A.W., Mao X., Mei R., Brutsaert T., Wilson M.J., Julian C.G., Parra E.J., Akey J.M., Moore L.G., Shriver M.D. Identifying positive selection candidate loci for high-altitude adaptation in Andean populations. Hum. Genomics. 2009;4:79–90. doi: 10.1186/1479-7364-4-2-79. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Jeong C., Alkorta-Aranburu G., Basnyat B., Neupane M., Witonsky D.B., Pritchard J.K., Beall C.M., Di Rienzo A. Admixture facilitates genetic adaptations to high altitude in Tibet. Nat Commun. 2014;5:3281. doi: 10.1038/ncomms4281. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Bigham A.W., Wilson M.J., Julian C.G., Kiyamu M., Vargas E., Leon-Velarde F., Rivera-Chira M., Rodriquez C., Browne V.A., Parra E. Andean and Tibetan patterns of adaptation to high altitude. Am. J. Hum. Biol. 2013;25:190–197. doi: 10.1002/ajhb.22358. [DOI] [PubMed] [Google Scholar]
  • 27.Beall C.M. Andean, Tibetan, and Ethiopian patterns of adaptation to high-altitude hypoxia. Integr. Comp. Biol. 2006;46:18–24. doi: 10.1093/icb/icj004. [DOI] [PubMed] [Google Scholar]
  • 28.Hornbein T.F., Schoene R.B. Marcel Dekker, Inc.; New York: 2001. High altitude: an exploration of human adaptation. [Google Scholar]
  • 29.Aggarwal S., Negi S., Jha P., Singh P.K., Stobdan T., Pasha M.A.Q., Ghosh S., Agrawal A., Prasher B., Mukerji M., Indian Genome Variation Consortium EGLN1 involvement in high-altitude adaptation revealed through genetic analysis of extreme constitution types defined in Ayurveda. Proc. Natl. Acad. Sci. USA. 2010;107:18961–18966. doi: 10.1073/pnas.1006108107. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Beaumont M.A., Balding D.J. Identifying adaptive genetic divergence among populations from genome scans. Mol. Ecol. 2004;13:969–980. doi: 10.1111/j.1365-294x.2004.02125.x. [DOI] [PubMed] [Google Scholar]
  • 31.Foll M., Gaggiotti O. A genome-scan method to identify selected loci appropriate for both dominant and codominant markers: a Bayesian perspective. Genetics. 2008;180:977–993. doi: 10.1534/genetics.108.092221. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Wright S. The genetical structure of populations. Ann. Eugen. 1951;15:323–354. doi: 10.1111/j.1469-1809.1949.tb02451.x. [DOI] [PubMed] [Google Scholar]
  • 33.Wright S. The interpretation of population structure by F-statistics with special regard to systems of mating. Evolution. 1965;19:395–420. [Google Scholar]
  • 34.Gaggiotti O.E., Foll M. Quantifying population structure using the F-model. Mol Ecol Resour. 2010;10:821–830. doi: 10.1111/j.1755-0998.2010.02873.x. [DOI] [PubMed] [Google Scholar]
  • 35.Excoffier L., Hofer T., Foll M. Detecting loci under selection in a hierarchically structured population. Heredity (Edinb) 2009;103:285–298. doi: 10.1038/hdy.2009.74. [DOI] [PubMed] [Google Scholar]
  • 36.Rannala B., Hartigan J.A. Estimating gene flow in island populations. Genet. Res. 1996;67:147–158. doi: 10.1017/s0016672300033607. [DOI] [PubMed] [Google Scholar]
  • 37.International HapMap Consortium A haplotype map of the human genome. Nature. 2005;437:1299–1320. doi: 10.1038/nature04226. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Benjamini Y., Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc., B. 1995;57:289–300. [Google Scholar]
  • 39.Storey J.D. The positive false discovery rate: A Bayesian interpretation and the q-value. Ann. Stat. 2003;31:2013–2035. [Google Scholar]
  • 40.Fischer M.C., Foll M., Excoffier L., Heckel G. Enhanced AFLP genome scans detect local adaptation in high-altitude populations of a small rodent (Microtus arvalis). Mol. Ecol. 2011;20:1450–1462. doi: 10.1111/j.1365-294X.2011.05015.x. [DOI] [PubMed] [Google Scholar]
  • 41.Roesti M., Salzburger W., Berner D. Uninformative polymorphisms bias genome scans for signatures of selection. BMC Evol. Biol. 2012;12:94. doi: 10.1186/1471-2148-12-94. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Mi H., Muruganujan A., Thomas P.D. PANTHER in 2013: modeling the evolution of gene function, and other gene attributes, in the context of phylogenetic trees. Nucleic Acids Res. 2013;41(Database issue):D377–D386. doi: 10.1093/nar/gks1118. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Ashburner M., Ball C.A., Blake J.A., Botstein D., Butler H., Cherry J.M., Davis A.P., Dolinski K., Dwight S.S., Eppig J.T., The Gene Ontology Consortium. Gene ontology: tool for the unification of biology. Nat. Genet. 2000;25:25–29. doi: 10.1038/75556. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Franceschini A., Szklarczyk D., Frankild S., Kuhn M., Simonovic M., Roth A., Lin J., Minguez P., Bork P., von Mering C., Jensen L.J. STRING v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Res. 2013;41(Database issue):D808–D815. doi: 10.1093/nar/gks1094. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Daub J.T., Hofer T., Cutivet E., Dupanloup I., Quintana-Murci L., Robinson-Rechavi M., Excoffier L. Evidence for polygenic adaptation to pathogens in the human genome. Mol. Biol. Evol. 2013;30:1544–1558. doi: 10.1093/molbev/mst080. [DOI] [PubMed] [Google Scholar]
  • 46.Geer L.Y., Marchler-Bauer A., Geer R.C., Han L., He J., He S., Liu C., Shi W., Bryant S.H. The NCBI BioSystems database. Nucleic Acids Res. 2010;38(Database issue):D492–D496. doi: 10.1093/nar/gkp858. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Sabeti P.C., Reich D.E., Higgins J.M., Levine H.Z.P., Richter D.J., Schaffner S.F., Gabriel S.B., Platko J.V., Patterson N.J., McDonald G.J. Detecting recent positive selection in the human genome from haplotype structure. Nature. 2002;419:832–837. doi: 10.1038/nature01140. [DOI] [PubMed] [Google Scholar]
  • 48.Voight B.F., Kudaravalli S., Wen X., Pritchard J.K. A map of recent positive selection in the human genome. PLoS Biol. 2006;4:e72. doi: 10.1371/journal.pbio.0040072. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Sabeti P.C., Varilly P., Fry B., Lohmueller J., Hostetter E., Cotsapas C., Xie X., Byrne E.H., McCarroll S.A., Gaudet R., International HapMap Consortium. Genome-wide detection and characterization of positive selection in human populations. Nature. 2007;449:913–918. doi: 10.1038/nature06250. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Liu X., Ong R.T.-H., Pillai E.N., Elzein A.M., Small K.S., Clark T.G., Kwiatkowski D.P., Teo Y.-Y. Detecting and characterizing genomic signatures of positive selection in global populations. Am. J. Hum. Genet. 2013;92:866–881. doi: 10.1016/j.ajhg.2013.04.021. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Ferrer-Admetlla A., Liang M., Korneliussen T., Nielsen R. On detecting incomplete soft or hard selective sweeps using haplotype structure. Mol. Biol. Evol. 2014;31:1275–1291. doi: 10.1093/molbev/msu077. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Peng B., Kimmel M. simuPOP: a forward-time population genetics simulation environment. Bioinformatics. 2005;21:3686–3687. doi: 10.1093/bioinformatics/bti584. [DOI] [PubMed] [Google Scholar]
  • 53.Reich D., Patterson N., Campbell D., Tandon A., Mazieres S., Ray N., Parra M.V., Rojas W., Duque C., Mesa N. Reconstructing Native American population history. Nature. 2012;488:370–374. doi: 10.1038/nature11258. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Huerta-Sánchez E., Jin X., Asan, Bianba Z., Peter B.M., Vinckenbosch N., Liang Y., Yi X., He M., Somel M. Altitude adaptation in Tibetans caused by introgression of Denisovan-like DNA. Nature. 2014;512:194–197. doi: 10.1038/nature13408. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Scally A., Durbin R. Revising the human mutation rate: implications for understanding human evolution. Nat. Rev. Genet. 2012;13:745–753. doi: 10.1038/nrg3295. [DOI] [PubMed] [Google Scholar]
  • 56.Robin X., Turck N., Hainard A., Tiberti N., Lisacek F., Sanchez J.-C., Müller M. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics. 2011;12:77. doi: 10.1186/1471-2105-12-77. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.Lachance J., Tishkoff S.A. Population Genomics of Human Adaptations. Annu. Rev. Ecol. Evol. Syst. 2012;44:123–143. doi: 10.1146/annurev-ecolsys-110512-135833. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.Ji L.-D., Qiu Y.-Q., Xu J., Irwin D.M., Tam S.-C., Tang N.L., Zhang Y.-P. Genetic adaptation of the hypoxia-inducible factor pathway to oxygen pressure among Eurasian human populations. Mol. Biol. Evol. 2012;29:3359–3370. doi: 10.1093/molbev/mss144. [DOI] [PubMed] [Google Scholar]
  • 59.Campbell D., Bernatchez L. Generic scan using AFLP markers as a means to assess the role of directional selection in the divergence of sympatric whitefish ecotypes. Mol. Biol. Evol. 2004;21:945–956. doi: 10.1093/molbev/msh101. [DOI] [PubMed] [Google Scholar]
  • 60.Egan S.P., Nosil P., Funk D.J. Selection and genomic differentiation during ecological speciation: isolating the contributions of host association via a comparative genome scan of Neochlamisus bebbianae leaf beetles. Evolution. 2008;62:1162–1181. doi: 10.1111/j.1558-5646.2008.00352.x. [DOI] [PubMed] [Google Scholar]
  • 61.Nosil P., Egan S.P., Funk D.J. Heterogeneous genomic differentiation between walking-stick ecotypes: “isolation by adaptation” and multiple roles for divergent selection. Evolution. 2008;62:316–336. doi: 10.1111/j.1558-5646.2007.00299.x. [DOI] [PubMed] [Google Scholar]
  • 62.Hohenlohe P.A., Bassham S., Etter P.D., Stiffler N., Johnson E.A., Cresko W.A. Population genomics of parallel adaptation in threespine stickleback using sequenced RAD tags. PLoS Genet. 2010;6:e1000862. doi: 10.1371/journal.pgen.1000862. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63.Bradbury I.R., Hubert S., Higgins B., Borza T., Bowman S., Paterson I.G., Snelgrove P.V.R., Morris C.J., Gregory R.S., Hardie D.C. Parallel adaptive evolution of Atlantic cod on both sides of the Atlantic Ocean in response to temperature. Proc. Biol. Sci. 2010;277:3725–3734. doi: 10.1098/rspb.2010.0985. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64.Li H., Gu S., Cai X., Speed W.C., Pakstis A.J., Golub E.I., Kidd J.R., Kidd K.K. Ethnic related selection for an ADH Class I variant within East Asia. PLoS One. 2008;3:e1881. doi: 10.1371/journal.pone.0001881. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65.Wanders R.J.A., Komen J., Kemp S. Fatty acid omega-oxidation as a rescue pathway for fatty acid oxidation disorders in humans. FEBS J. 2011;278:182–194. doi: 10.1111/j.1742-4658.2010.07947.x. [DOI] [PubMed] [Google Scholar]
  • 66.Bhatnagar A. Surviving hypoxia: the importance of rafts, anchors, and fluidity. Circ. Res. 2003;92:821–823. doi: 10.1161/01.RES.0000071524.93731.34. [DOI] [PubMed] [Google Scholar]
  • 67.Banasiak K.J., Xia Y., Haddad G.G. Mechanisms underlying hypoxia-induced neuronal apoptosis. Prog. Neurobiol. 2000;62:215–249. doi: 10.1016/s0301-0082(00)00011-3. [DOI] [PubMed] [Google Scholar]
  • 68.Skeberdis V.A., Lan J., Opitz T., Zheng X., Bennett M.V., Zukin R.S. mGluR1-mediated potentiation of NMDA receptors involves a rise in intracellular calcium and activation of protein kinase C. Neuropharmacology. 2001;40:856–865. doi: 10.1016/s0028-3908(01)00005-3. [DOI] [PubMed] [Google Scholar]
  • 69.Mazzeo R.S., Donovan D., Fleshner M., Butterfield G.E., Zamudio S., Wolfel E.E., Moore L.G. Interleukin-6 response to exercise and high-altitude exposure: influence of alpha-adrenergic blockade. J. Appl. Physiol. 2001;91:2143–2149. doi: 10.1152/jappl.2001.91.5.2143. [DOI] [PubMed] [Google Scholar]
  • 70.Fang X.-X., Jiang X.-L., Han X.-H., Peng Y.-P., Qiu Y.-H. Neuroprotection of interleukin-6 against NMDA-induced neurotoxicity is mediated by JAK/STAT3, MAPK/ERK, and PI3K/AKT signaling pathways. Cell. Mol. Neurobiol. 2013;33:241–251. doi: 10.1007/s10571-012-9891-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71.Scherag A., Dina C., Hinney A., Vatin V., Scherag S., Vogel C.I., Müller T.D., Grallert H., Wichmann H.E., Balkau B. Two new loci for body-weight regulation identified in a joint analysis of genome-wide association studies for early-onset extreme obesity in French and German study groups. PLoS Genet. 2010;6:e1000916. doi: 10.1371/journal.pgen.1000916. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 72.Maglott D., Ostell J., Pruitt K.D., Tatusova T. Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res. 2011;39(Database issue):D52–D57. doi: 10.1093/nar/gkq1237. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 73.Tintle N.L., Borchers B., Brown M., Bekmetjev A. Comparing gene set analysis methods on single-nucleotide polymorphism data from Genetic Analysis Workshop 16. BMC Proc. 2009;3(Suppl 7):S96. doi: 10.1186/1753-6561-3-s7-s96. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

Supplementary Materials

Document S1. Figures S1–S3 and Table S2
mmc1.pdf (611.2KB, pdf)
Document S2. Table S1
mmc2.xlsx (48.1KB, xlsx)
Document S3. Table S3
mmc3.xlsx (31KB, xlsx)
Document S4. Article plus Supplemental Data
mmc4.pdf (17.9MB, pdf)

Articles from American Journal of Human Genetics are provided here courtesy of American Society of Human Genetics
