Abstract
A powerful way to detect selection in a population is by modeling local allele frequency changes in a particular region of the genome under scenarios of selection and neutrality and finding which model is most compatible with the data. A previous method based on a cross-population composite likelihood ratio (XP-CLR) uses an outgroup population to detect departures from neutrality that could be compatible with hard or soft sweeps, at linked sites near a beneficial allele. However, this method is most sensitive to recent selection and may miss selective events that happened a long time ago. To overcome this, we developed an extension of XP-CLR that jointly models the behavior of a selected allele in a three-population tree. Our method, called "3-population composite likelihood ratio" (3P-CLR), outperforms XP-CLR when testing for selection that occurred before two populations split from each other and can distinguish between those events and events that occurred specifically in each of the populations after the split. We applied our new test to population genomic data from the 1000 Genomes Project, to search for selective sweeps that occurred before the split of Yoruba and Eurasians, but after their split from Neanderthals, and that could have led to the spread of modern-human-specific phenotypes. We also searched for sweep events that occurred in East Asians, Europeans, and the ancestors of both populations, after their split from Yoruba. In both cases, we are able to confirm a number of regions identified by previous methods and find several new candidates for selection in recent and ancient times. For some of these, we also find suggestive functional mutations that may have driven the selective events.
Keywords: composite likelihood, Denisova, Neanderthal, population differentiation, positive selection
GENETIC hitchhiking will distort allele frequency patterns at regions of the genome linked to a beneficial allele that is rising in frequency (Smith and Haigh 1974). This is known as a selective sweep. If the sweep is restricted to a particular population and does not affect other closely related populations, one can detect such an event by looking for extreme patterns of localized population differentiation, like high values of $F_{ST}$ at a specific locus (Lewontin and Krakauer 1973). This and other related statistics have been used to scan the genomes of present-day humans from different populations, to detect signals of recent positive selection (Akey et al. 2002; Weir et al. 2005; Oleksyk et al. 2008; Yi et al. 2010).
Once it became possible to sequence entire genomes of archaic humans (like Neanderthals) (Green et al. 2010; Meyer et al. 2012; Prüfer et al. 2014), researchers also began to search for selective sweeps that occurred in the ancestral population of all present-day humans. For example, Green et al. (2010) searched for genomic regions with a depletion of derived alleles in a low-coverage Neanderthal genome, relative to what would be expected given the derived allele frequency in present-day humans. This is a pattern that would be consistent with a sweep in present-day humans. Later on, Prüfer et al. (2014) developed a hidden Markov model (HMM) that could identify regions where Neanderthals fall outside of all present-day human variation (also called “external regions”) and are therefore likely to have been affected by ancient sweeps in early modern humans. They applied their method to a high-coverage Neanderthal genome. Then, they ranked these regions by their genetic length, to find segments that were extremely long and therefore highly compatible with a selective sweep. Finally, Racimo et al. (2014) used summary statistics calculated in the neighborhood of sites that were ancestral in archaic humans but fixed derived in all or almost all present-day humans, to test whether any of these sites could be compatible with a selective sweep model. While these methods harnessed different summaries of the patterns of differentiation left by sweeps, they did not attempt to explicitly model the process by which these patterns are generated over time.
Chen et al. (2010) developed a test called "cross-population composite likelihood ratio" (XP-CLR), which is designed to test for selection in one population after its split from a second (outgroup) population $t$ generations ago. It does so by modeling the evolutionary trajectory of an allele under linked selection and under neutrality and then comparing the likelihood of the data for each of the two models. The method detects local allele frequency differences that are compatible with the linked selection model (Smith and Haigh 1974), along windows of the genome.
XP-CLR is a powerful test for detecting selective events restricted to one population. However, it provides little information about when these events happened, as it models all sweeps as if they had immediately occurred in the present generation. Additionally, if one is interested in selective sweeps that took place before two populations a and b split from each other, one would have to run XP-CLR separately on each population, with a third outgroup population c that split from the ancestor of a and b before a and b split from each other. Then, one would need to check that the signal of selection appears in both tests. This may miss important information about correlated allele frequency changes shared by a and b, but not by c, limiting the power to detect ancient events.
To overcome this, we developed an extension of XP-CLR that jointly models the behavior of an allele in all three populations, to detect selective events that occurred before or after the closest two populations split from each other. Below we briefly review the modeling framework of XP-CLR and describe our new test, which we call the "3-population composite likelihood ratio" (3P-CLR). In Results, we show that this method outperforms XP-CLR when testing for selection that occurred before the split of two populations and that, unlike XP-CLR, it can distinguish between those events and events that occurred after the split. We then apply the method to population genomic data from the 1000 Genomes Project (Abecasis et al. 2012), to search for selective sweep events that occurred before the split of Yoruba and Eurasians, but after their split from Neanderthals. We also use it to search for selective sweeps that occurred in the Eurasian ancestral population and to distinguish those from events that occurred specifically in East Asians or specifically in Europeans.
Materials and Methods
XP-CLR
First, we review the procedure used by XP-CLR to model the evolution of allele frequency changes of two populations a and b that split from each other $t$ generations ago (Figure 1A). For neutral SNPs, Chen et al. (2010) use an approximation to the Wright–Fisher diffusion dynamics (Nicholson et al. 2002). Namely, the frequency of a SNP in population a ($p_A$) in the present is treated as a random variable governed by a normal distribution with mean equal to the frequency in the ancestral population (β) and variance proportional to the drift time ω from the ancestral to the present population,
$$f(p_A \mid \beta) \sim N\big(\beta,\ \omega\,\beta(1-\beta)\big) \tag{1}$$
where $\omega = t/(2N_A)$ and $N_A$ is the effective size of population a.
This is a Brownian motion approximation to the Wright–Fisher model, as the drift increment to variance is constant across generations. If a SNP is segregating in both populations—i.e., has not hit the boundaries of fixation or extinction—this process is time reversible. Thus, one can model the frequency of the SNP in population a with a normal distribution having mean equal to the frequency in population b and variance proportional to the sum of the drift time (ω) between a and the ancestral population and the drift time between b and the ancestral population (ψ):
$$f(p_A \mid p_B) \sim N\big(p_B,\ (\omega+\psi)\,p_B(1-p_B)\big) \tag{2}$$
For SNPs that are linked to a beneficial allele that has produced a sweep in population a only, Chen et al. (2010) model the allele as evolving neutrally until the present and then apply a transformation to the normal distribution that depends on the distance to the selected allele r and the strength of selection s (Fay and Wu 2000; Durrett and Schweinsberg 2004). Let $c = 1 - q_0^{r/s}$, where $q_0$ is the frequency of the beneficial allele in population a before the sweep begins. The frequency of a neutral allele is expected to increase from p to $1 - c + cp$ if the allele is linked to the beneficial allele, and this occurs with probability equal to the frequency of the neutral allele (p) before the sweep begins. Otherwise, the frequency of the neutral allele is expected to decrease from p to $cp$. This leads to the following transformation of the normal distribution,
$$f(p_A \mid p_B, r, s, \omega, \psi) = \frac{1}{c}\,\frac{p_A-(1-c)}{c}\,\phi\!\left(\frac{p_A-(1-c)}{c};\ p_B,\ (\omega+\psi)p_B(1-p_B)\right) I_{[1-c,\,1]}(p_A) + \frac{1}{c}\left(1-\frac{p_A}{c}\right)\phi\!\left(\frac{p_A}{c};\ p_B,\ (\omega+\psi)p_B(1-p_B)\right) I_{[0,\,c]}(p_A) \tag{3}$$
where $\phi(x; \mu, \sigma^2)$ denotes the density at x of a normal distribution with mean μ and variance σ², and $I_{[x,y]}(z) = 1$ if z lies in the interval $[x, y]$ and 0 otherwise.
When $s \to 0$, or when r is large relative to s, $c \to 1$ and this distribution converges to the neutral case. Let $\mathbf{v}$ be the vector of all drift times that are relevant to the scenario we are studying. In this case, it will be equal to $\{\omega, \psi\}$, but in more complex cases below, it may include additional drift times. Let $\mathbf{r}$ be the vector of recombination fractions between the beneficial allele and each of the SNPs within a window of arbitrary size. We can then calculate the product of likelihoods over all k SNPs in that window for either the neutral or the linked selection model, after binomial sampling of alleles from the population frequency and conditioning on the event that the allele is segregating in the population:
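$$\mathrm{CL}(\mathbf{m}_A \mid \mathbf{p}_B, \mathbf{r}, \mathbf{v}, s) = \prod_{j=1}^{k} \frac{\int_{0}^{1} \binom{n_A}{m_{A,j}}\, p_A^{m_{A,j}} (1-p_A)^{n_A-m_{A,j}}\, f(p_A \mid p_{B,j}, r_j, \mathbf{v}, s)\, dp_A}{\int_{0^+}^{1^-} f(p_A \mid p_{B,j}, r_j, \mathbf{v}, s)\, dp_A} \tag{4}$$
where $m_{A,j}$ is the number of derived alleles observed at SNP j in a sample of $n_A$ chromosomes from population a, $p_{B,j}$ is the frequency of SNP j in population b, and the neutral model corresponds to $s = 0$ (Equation 2).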
This is a composite likelihood (Lindsay 1988; Varin et al., 2011), because we are ignoring the correlation in frequencies produced by linkage among SNPs that is not strictly due to proximity to the beneficial SNP. We note that the denominator in the above equation is not explicitly stated in Chen et al. (2010) for ease of notation, but appears in the published online implementation of the method.
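To make Equations 2 and 3 concrete, the Python sketch below evaluates the neutral density and its sweep-transformed version at a single point. It is an illustration under stated assumptions, not the XP-CLR or 3P-CLR code: the function names are hypothetical, the pre-sweep beneficial-allele frequency `q0 = 1e-4` is an assumed default, and r must be positive so that c is nonzero. Because the transformation only moves probability mass within the unit interval, the total mass in (0, 1) should match the neutral case. A simple midpoint-rule integral checks this.

```python
import math


def normal_pdf(x, mean, var):
    """Density at x of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def neutral_density(p_a, p_b, omega, psi):
    """Equation 2: p_A ~ N(p_B, (omega + psi) * p_B * (1 - p_B))."""
    return normal_pdf(p_a, p_b, (omega + psi) * p_b * (1.0 - p_b))


def sweep_density(p_a, p_b, omega, psi, r, s, q0=1e-4):
    """Equation 3: the neutral density transformed by a sweep at distance r > 0."""
    c = 1.0 - q0 ** (r / s)  # c -> 1 (the neutral limit) as r / s grows
    var = (omega + psi) * p_b * (1.0 - p_b)
    density = 0.0
    if 1.0 - c <= p_a <= 1.0:
        # With probability p the neutral allele rides the sweep: p -> 1 - c + c p.
        x = (p_a - (1.0 - c)) / c
        density += (x / c) * normal_pdf(x, p_b, var)
    if 0.0 <= p_a <= c:
        # Otherwise it stays on the non-beneficial background: p -> c p.
        x = p_a / c
        density += ((1.0 - x) / c) * normal_pdf(x, p_b, var)
    return density


def integrate_unit_interval(f, steps=4000):
    """Midpoint-rule integral of f over (0, 1)."""
    h = 1.0 / steps
    return h * sum(f((i + 0.5) * h) for i in range(steps))


neutral_mass = integrate_unit_interval(lambda y: neutral_density(y, 0.3, 0.05, 0.05))
sweep_mass = integrate_unit_interval(lambda y: sweep_density(y, 0.3, 0.05, 0.05, 0.001, 0.05))
print(round(neutral_mass, 3), round(sweep_mass, 3))
```

With r = 0.001 and s = 0.05, c is about 0.17. The swept density is therefore zero at intermediate values such as $p_A = 0.5$ and piles up near 0 and 1.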
Finally, we obtain a composite-likelihood-ratio statistic of the hypothesis of linked selection over the hypothesis of neutrality:
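$$\mathrm{CLR} = 2\log\left(\frac{\sup_{s}\ \mathrm{CL}(\mathbf{m}_A \mid \mathbf{p}_B, \mathbf{r}, \mathbf{v}, s)}{\mathrm{CL}(\mathbf{m}_A \mid \mathbf{p}_B, \mathbf{r}, \mathbf{v}, s=0)}\right) \tag{5}$$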
For ease of computation, Chen et al. (2010) assume that $\mathbf{r}$ is given (via a recombination map) instead of maximizing the likelihood with respect to it, and we do the same. Furthermore, they empirically estimate $\mathbf{v}$ using $F$-statistics (Patterson et al., 2012) calculated over the whole genome and assume that selection is not strong or frequent enough to affect their genome-wide values. Therefore, the likelihoods in the above equation are maximized only with respect to the selection coefficient, using a grid of coefficients on a logarithmic scale.
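The full XP-CLR-style computation (Equations 2 to 5) can be sketched as follows. Each window factor uses binomial sampling of $m_A$ derived alleles. Each factor is conditioned on $0 < p_A < 1$, and the ratio is maximized over a log-spaced grid of s. This is a minimal, slow, pure-Python illustration assuming midpoint-rule integration, `q0 = 1e-4`, and made-up allele counts. The helper names are hypothetical and do not correspond to the published program.

```python
import math

GRID_SIZE = 1000
GRID = [(i + 0.5) / GRID_SIZE for i in range(GRID_SIZE)]  # midpoints of (0, 1)


def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def density(p_a, p_b, omega, psi, r, s, q0=1e-4):
    """Equation 3 for s > 0; the neutral Equation 2 when s == 0."""
    var = (omega + psi) * p_b * (1.0 - p_b)
    if s == 0.0:
        return normal_pdf(p_a, p_b, var)
    c = 1.0 - q0 ** (r / s)
    total = 0.0
    if p_a >= 1.0 - c:
        x = (p_a - (1.0 - c)) / c
        total += (x / c) * normal_pdf(x, p_b, var)
    if p_a <= c:
        x = p_a / c
        total += ((1.0 - x) / c) * normal_pdf(x, p_b, var)
    return total


def snp_likelihood(m_a, n_a, p_b, omega, psi, r, s):
    """One factor of Equation 4; the grid spacing cancels in the ratio."""
    coeff = float(math.comb(n_a, m_a))
    numerator = denominator = 0.0
    for y in GRID:
        f = density(y, p_b, omega, psi, r, s)
        numerator += coeff * y ** m_a * (1.0 - y) ** (n_a - m_a) * f
        denominator += f  # conditions on the SNP segregating in population a
    return numerator / denominator


def log_cl(snps, omega, psi, s):
    """Log of Equation 4; each SNP is a tuple (m_a, n_a, p_b, r)."""
    return sum(math.log(snp_likelihood(m, n, pb, omega, psi, r, s)) for m, n, pb, r in snps)


def clr(snps, omega, psi, s_grid):
    """Equation 5, with the supremum taken over a fixed grid of s values."""
    neutral = log_cl(snps, omega, psi, 0.0)
    best = max(log_cl(snps, omega, psi, s) for s in s_grid)
    return 2.0 * (best - neutral)


S_GRID = [10.0 ** e for e in range(-6, 0)]  # 1e-6 ... 1e-1 on a log scale
sweep_like = [(98, 100, 0.3, r) for r in (0.0005, 0.001, 0.002)]
neutral_like = [(30, 100, 0.3, r) for r in (0.0005, 0.001, 0.002)]
sweep_score = clr(sweep_like, 0.05, 0.05, S_GRID)
neutral_score = clr(neutral_like, 0.05, 0.05, S_GRID)
print(sweep_score, neutral_score)
```

Because the smallest grid value makes c numerically equal to 1, the selection model can always reproduce the neutral fit. The statistic is therefore never meaningfully below zero.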
3P-CLR
We are interested in the case where a selective event occurred more anciently than the split of two populations (a and b) from each other, but more recently than their split from a third population c (Figure 1B). We begin by modeling $p_A$ and $p_B$ as evolving from an unknown common ancestral frequency β:
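$$f(p_A \mid \beta) \sim N\big(\beta,\ \omega\,\beta(1-\beta)\big) \tag{6}$$
$$f(p_B \mid \beta) \sim N\big(\beta,\ \kappa\,\beta(1-\beta)\big) \tag{7}$$
where ω and κ are the drift times separating populations a and b, respectively, from their most recent common ancestor.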
Let χ be the drift time separating the most recent common ancestor of a and b from the most recent common ancestor of a, b, and c. Additionally, let ν be the drift time separating population c in the present from the most recent common ancestor of a, b, and c. Given these parameters, we can treat β as an additional random variable that either evolves neutrally or is linked to a selected allele that completed its sweep immediately before the split of a and b. In both cases, the distribution of β will depend on the frequency of the allele in population c ($p_C$) in the present. In the neutral case,
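$$f(\beta \mid p_C) \sim N\big(p_C,\ (\chi+\nu)\,p_C(1-p_C)\big) \tag{8}$$
In the linked selection case,
$$f(\beta \mid p_C, r, s, \chi, \nu) = \frac{1}{c}\,\frac{\beta-(1-c)}{c}\,\phi\!\left(\frac{\beta-(1-c)}{c};\ p_C,\ (\chi+\nu)p_C(1-p_C)\right) I_{[1-c,\,1]}(\beta) + \frac{1}{c}\left(1-\frac{\beta}{c}\right)\phi\!\left(\frac{\beta}{c};\ p_C,\ (\chi+\nu)p_C(1-p_C)\right) I_{[0,\,c]}(\beta) \tag{9}$$
where φ and I are defined as in Equation 3, $c = 1 - q_0^{r/s}$, and $q_0$ is the frequency of the beneficial allele in the ancestral population of a and b before the sweep begins.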
The frequencies in a and b given the frequency in c can be obtained by integrating β out. This leads to a density function that models selection in the ancestral population of a and b:
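$$f(p_A, p_B \mid p_C, r, \mathbf{v}, s) = \int_0^1 f(p_A \mid \beta)\, f(p_B \mid \beta)\, f(\beta \mid p_C, r, s, \chi, \nu)\, d\beta \tag{10}$$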
Additionally, Equation 10 can be modified to test for selection that occurred specifically in one of the terminal branches that lead to a or b (Figure 1, C and D), rather than in the ancestral population of a and b. For example, the density of frequencies for a scenario of selection in the branch leading to a can be written as
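$$f(p_A, p_B \mid p_C, r, \mathbf{v}, s) = \int_0^1 f(p_A \mid \beta, r, s, \omega)\, f(p_B \mid \beta)\, f(\beta \mid p_C)\, d\beta \tag{11}$$
where $f(p_A \mid \beta, r, s, \omega)$ has the form of Equation 3 with mean β and variance $\omega\beta(1-\beta)$, and $f(\beta \mid p_C)$ is the neutral density of Equation 8.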
We henceforth refer to the version of 3P-CLR that is tailored to detect selection in the internal branch that is ancestral to a and b as 3P-CLR(Int). In turn, the versions of 3P-CLR that are designed to detect selection in each of the daughter populations a and b are designated as 3P-CLR(A) and 3P-CLR(B), respectively.
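We can now calculate the probability density of specific allele frequencies in populations a and b, given that we observe $m_C$ derived alleles in a sample of size $n_C$ from population c,
$$f(p_A, p_B \mid m_C, n_C, r, \mathbf{v}, s) = \int_0^1 f(p_A, p_B \mid p_C, r, \mathbf{v}, s)\, f(p_C \mid m_C, n_C)\, dp_C \tag{12}$$
and
$$f(p_C \mid m_C, n_C) = \frac{p_C^{\,m_C-1}(1-p_C)^{\,n_C-m_C}}{B(m_C,\ n_C-m_C+1)} \tag{13}$$
where B(x, y) is the Beta function. We note that Equation 13 assumes that the unconditioned density function for the population derived allele frequency comes from the neutral infinite-sites model at equilibrium and is therefore equal to the product of a constant and $1/p_C$ (Ewens 2012).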
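Equation 13 follows from multiplying the infinite-sites prior, proportional to $1/p_C$, by the binomial likelihood $p_C^{m_C}(1-p_C)^{n_C-m_C}$. This yields a Beta($m_C$, $n_C - m_C + 1$) density. The short sketch below evaluates that density on the log scale for numerical stability and checks numerically that it integrates to 1. It also checks that its mean is $m_C/(n_C+1)$. The function name and the example counts are illustrative assumptions, and the formula requires $m_C \ge 1$.

```python
import math


def outgroup_frequency_density(p_c, m_c, n_c):
    """Equation 13: Beta(m_C, n_C - m_C + 1) density, valid for 1 <= m_C <= n_C."""
    log_norm = math.lgamma(m_c) + math.lgamma(n_c - m_c + 1) - math.lgamma(n_c + 1)
    log_kernel = (m_c - 1) * math.log(p_c) + (n_c - m_c) * math.log1p(-p_c)
    return math.exp(log_kernel - log_norm)


# Midpoint-rule checks for m_C = 3 derived alleles out of n_C = 10.
steps = 20000
grid = [(i + 0.5) / steps for i in range(steps)]
mass = sum(outgroup_frequency_density(p, 3, 10) for p in grid) / steps
mean = sum(p * outgroup_frequency_density(p, 3, 10) for p in grid) / steps
print(round(mass, 4), round(mean, 4), round(3 / 11, 4))
```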
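Conditioning on the event that the site is segregating in the population, we can then calculate the probability of observing $m_A$ and $m_B$ derived alleles in a sample of size $n_A$ from population a and a sample of size $n_B$ from population b, respectively, given that we observe $m_C$ derived alleles in a sample of size $n_C$ from population c, using binomial sampling,
$$P(m_A, m_B \mid n_A, n_B, m_C, n_C, r, \mathbf{v}, s) = \frac{\int_0^1\!\int_0^1 P(m_A \mid n_A, p_A)\, P(m_B \mid n_B, p_B)\, f(p_A, p_B \mid m_C, n_C, r, \mathbf{v}, s)\, dp_A\, dp_B}{\int_{0^+}^{1^-}\!\int_{0^+}^{1^-} f(p_A, p_B \mid m_C, n_C, r, \mathbf{v}, s)\, dp_A\, dp_B} \tag{14}$$
where
$$P(m_A \mid n_A, p_A) = \binom{n_A}{m_A} p_A^{m_A}(1-p_A)^{n_A-m_A} \tag{15}$$
and
$$P(m_B \mid n_B, p_B) = \binom{n_B}{m_B} p_B^{m_B}(1-p_B)^{n_B-m_B} \tag{16}$$
This allows us to calculate a composite likelihood of the derived allele counts in a and b given the derived allele counts in c:
$$\mathrm{CL}(\mathbf{m}_A, \mathbf{m}_B \mid \mathbf{n}_A, \mathbf{n}_B, \mathbf{m}_C, \mathbf{n}_C, \mathbf{r}, \mathbf{v}, s) = \prod_{i=1}^{k} P(m_{A,i}, m_{B,i} \mid n_{A,i}, n_{B,i}, m_{C,i}, n_{C,i}, r_i, \mathbf{v}, s) \tag{17}$$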
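The sketch below puts Equations 10, 11, and 14 to 17 together in Python with NumPy, using a grid-based midpoint rule in place of adaptive cubature. It makes several assumptions:

- κ denotes the drift time on the branch leading to b.
- χ and ν are passed as a single sum, since they enter Equations 8 and 9 only through χ + ν.
- $p_C$ is fixed at a given value, mirroring the simplification described in the next paragraph.
- The pre-sweep beneficial frequency is `q0 = 1e-4`.

All function names are hypothetical. The key design choice is that $p_A$ and $p_B$ are independent given β. As a result, each double integral in Equation 14 factorizes into two one-dimensional integrals nested inside the β integral.

```python
import math

import numpy as np

G = 400
GRID = (np.arange(G) + 0.5) / G  # midpoints of (0, 1), shared by beta, p_A and p_B


def normal_pdf(x, mean, var):
    return np.exp(-((x - mean) ** 2) / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)


def transition(x, start, drift, r=0.0, s=0.0, q0=1e-4):
    """Density of frequency x after `drift` units of drift from frequency `start`.

    Neutral normal approximation when s == 0; otherwise the sweep-transformed
    density of Equations 3 and 9 with c = 1 - q0 ** (r / s).
    """
    var = drift * start * (1.0 - start)
    if s == 0.0:
        return normal_pdf(x, start, var)
    c = 1.0 - q0 ** (r / s)
    up = (x - (1.0 - c)) / c  # pre-sweep frequency if the allele hitchhiked
    down = x / c              # pre-sweep frequency if it stayed on the other background
    return (np.where(x >= 1.0 - c, up / c * normal_pdf(up, start, var), 0.0)
            + np.where(x <= c, (1.0 - down) / c * normal_pdf(down, start, var), 0.0))


def snp_probability(m_a, n_a, m_b, n_b, p_c, drifts, branch="int", r=0.0, s=0.0):
    """Equation 14 for one SNP, with p_C treated as known.

    drifts = (omega, kappa, chi_plus_nu); branch places the sweep on the
    internal branch ("int", Equation 10) or on branch "a" (Equation 11) or "b".
    """
    omega, kappa, chi_nu = drifts
    f_beta = transition(GRID, p_c, chi_nu, r, s if branch == "int" else 0.0)
    # Rows index beta, columns index p_A (or p_B).
    k_a = transition(GRID[None, :], GRID[:, None], omega, r, s if branch == "a" else 0.0)
    k_b = transition(GRID[None, :], GRID[:, None], kappa, r, s if branch == "b" else 0.0)
    bin_a = float(math.comb(n_a, m_a)) * GRID ** m_a * (1.0 - GRID) ** (n_a - m_a)
    bin_b = float(math.comb(n_b, m_b)) * GRID ** m_b * (1.0 - GRID) ** (n_b - m_b)
    # Midpoint weights are identical in numerator and denominator, so they cancel.
    numerator = np.sum(f_beta * (k_a @ bin_a) * (k_b @ bin_b))
    denominator = np.sum(f_beta * k_a.sum(axis=1) * k_b.sum(axis=1))
    return float(numerator / denominator)


def log_cl(snps, drifts, branch, s):
    """Log of Equation 17; each SNP is (m_a, n_a, m_b, n_b, p_c, r)."""
    return sum(math.log(snp_probability(ma, na, mb, nb, pc, drifts, branch, r, s))
               for ma, na, mb, nb, pc, r in snps)


drifts = (0.05, 0.05, 0.1)
shared = [(95, 100, 95, 100, 0.3, r) for r in (0.0005, 0.001)]
neutral_ll = log_cl(shared, drifts, "int", 0.0)
internal_ll = log_cl(shared, drifts, "int", 0.1)
terminal_a_ll = log_cl(shared, drifts, "a", 0.1)
print(internal_ll - neutral_ll, terminal_a_ll - neutral_ll)
```

In this toy example, both a and b carry the derived allele at high frequency while c does not. The internal-branch model should fit better than a sweep restricted to a, in line with the specificity argument made in Results.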
As before, we can use this composite likelihood to produce a composite-likelihood-ratio statistic that can be calculated over regions of the genome to test the hypothesis of linked selection centered on a particular locus against the hypothesis of neutrality. Due to the computational cost of numerical integration, we skip the sampling step for population c (Equation 13) in our implementation of 3P-CLR. In other words, we assume that the population frequency in c equals its sample frequency, $p_C = m_C/n_C$; this is also assumed in XP-CLR when computing its corresponding outgroup frequency. To perform the numerical integrations, we used the package Cubature (v.1.0.2). We implemented our method in a freely available C++ program that can be downloaded from https://github.com/ferracimo. The program requires the neutral drift parameters of the population tree to be specified as input. These can be obtained using $F$-statistics (Felsenstein 1981; Patterson et al. 2012), which have previously been implemented in programs like MixMapper (Lipson et al. 2013). For example, the drift time ω on the branch leading to a can be obtained via $F_3(A; B, C)$, while the drift time χ + ν separating c from the ancestor of a and b can be obtained via $F_3(C; A, B)$. When computing these statistics, we use only sites where population c is polymorphic, and so we correct for this ascertainment in the calculation. Another way of calculating these drift times is via ∂a∂i (Gutenkunst et al. 2009). Focusing on two populations at a time, we can fix one population's size and allow the split time and the other population's size to be estimated by the program, in this case using all polymorphic sites, regardless of which population they are segregating in. We then obtain the two drift times by scaling the inferred split time by the two different population sizes. We provide scripts on our GitHub page for the user to obtain these drift parameters in both of the above ways.
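The scaling step in the last approach is simple arithmetic. The sketch below assumes that drift time is generations divided by twice the effective size. It also assumes ∂a∂i's usual conventions: times in units of $2N_{\mathrm{ref}}$ generations and population sizes relative to $N_{\mathrm{ref}}$. Under these conventions, $t/(2N) = (2N_{\mathrm{ref}}T)/(2N_{\mathrm{ref}}\nu) = T/\nu$ for each population. The helper names are hypothetical and are not the scripts distributed with 3P-CLR.

```python
def drift_time(generations, effective_size):
    """Drift time t / (2 N) for a branch of `generations` length in a population of size N."""
    return generations / (2.0 * effective_size)


def drift_times_from_dadi(split_time, nu_fixed, nu_free):
    """Convert a dadi-style split time T (units of 2 * N_ref generations) into two drift times.

    nu_fixed and nu_free are population sizes relative to N_ref; each drift time is T / nu.
    """
    return split_time / nu_fixed, split_time / nu_free


# Table 1, model C: a and b split 2,000 generations ago with N_e = 10,000.
print(drift_time(2000, 10000))
print(drift_times_from_dadi(0.1, 1.0, 0.5))
```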
Results
Simulations
We generated simulations in SLiM (Messer 2013) to test the performance of XP-CLR and 3P-CLR in a three-population scenario. We first focused on the performance of 3P-CLR(Int) in detecting ancient selective events that occurred in the ancestral branch of two sister populations. We assumed that the population history had been correctly estimated (i.e., the drift parameters and population topology were known). First, we simulated scenarios in which a beneficial mutation arose in the ancestor of populations a and b, before their split from each other but after their split from c (Table 1). Although both XP-CLR and 3P-CLR are sensitive to partial or soft sweeps [as they do not rely on extended patterns of haplotype homozygosity (Chen et al., 2010)], we required the beneficial allele to have fixed before the split (at time $t_{AB}$) to ensure that the allele had not been lost by then and also to ensure that the sweep was restricted to the internal branch of the tree. We fixed the effective size of all three populations at $N_e = 10{,}000$. Each simulation consisted of a 5-cM region and the beneficial mutation occurred in the center of this region. The mutation rate was set at per generation and the recombination rate between adjacent nucleotides was set at per generation.
Table 1. Description of models tested.
Model | Population where selection occurred | $t_{AB}$ | $t_{ABC}$ | $t_M$ | s | $N_e$ |
---|---|---|---|---|---|---|
A | Ancestral population | 500 | 2,000 | 1,800 | 0.1 | 10,000 |
B | Ancestral population | 1,000 | 4,000 | 2,500 | 0.1 | 10,000 |
C | Ancestral population | 2,000 | 4,000 | 3,500 | 0.1 | 10,000 |
D | Ancestral population | 3,000 | 8,000 | 5,000 | 0.1 | 10,000 |
E | Ancestral population | 2,000 | 16,000 | 8,000 | 0.1 | 10,000 |
F | Ancestral population | 4,000 | 16,000 | 8,000 | 0.1 | 10,000 |
I | Daughter population a | 2,000 | 4,000 | 1,000 | 0.1 | 10,000 |
J | Daughter population a | 3,000 | 8,000 | 2,000 | 0.1 | 10,000 |
All times are in generations. Selection in the "ancestral population" refers to a selective sweep where the beneficial mutation and fixation occurred before the split time of the two most closely related populations. Selection in "daughter population a" refers to a selective sweep that occurred in one of the two most closely related populations (a), after their split from each other. $t_{AB}$, split time (in generations ago) of populations a and b; $t_{ABC}$, split time of population c and the ancestral population of a and b; $t_M$, time at which the selected mutation is introduced; s, selection coefficient; $N_e$, effective population size.
To make a fair comparison to 3P-CLR(Int), and given that XP-CLR is a two-population test, we applied XP-CLR in two ways. First, we pretended population b was not sampled, and so the "test" panel consisted of individuals from a only, while the "outgroup" consisted of individuals from c. In the second implementation (which we call "XP-CLR-avg"), we used the same outgroup panel, but pooled the individuals from a and b into a single panel, and this pooled panel was the test. The window size was set at 0.5 cM and the spacing between the central SNPs of consecutive windows was set at 600 SNPs (this number is large because it includes SNPs that are not segregating in the outgroup, which are later discarded). To speed up computation, and because we are largely interested in comparing the relative performance of the three tests under different scenarios, we used only 20 randomly chosen SNPs per window in all tests. We note, however, that the performance of all of these tests can be improved by using more SNPs per window.
Figure 2 shows receiver operating characteristic (ROC) curves comparing the sensitivity and specificity of 3P-CLR(Int), 3P-CLR(A), XP-CLR, and XP-CLR-avg in the first six demographic scenarios described in Table 1. Each ROC curve was made from 100 simulations under selection (with s = 0.1 for the central mutation) and 100 simulations under neutrality (with s = 0 and no fixation required). In each simulation, 100 haploid individuals (or 50 diploids) were sampled from population a, 100 individuals from population b, and 100 individuals from the outgroup population c. For each simulation, we took the maximum score among the windows in the neighborhood of the central mutation (±0.5 cM) and used those values to compute ROC curves under the two models.
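The ROC summary described above can be sketched in a few lines of Python. The approach has three parts. First, take the maximum window score within ±0.5 cM of the simulated selected site. Then sweep a threshold over the pooled scores. Finally, read off the sensitivity at a fixed specificity. The function names and toy scores are illustrative assumptions, not part of the published pipeline.

```python
def max_score_near(positions_cm, scores, center_cm, half_width_cm=0.5):
    """Maximum window score within +/- half_width_cm of the true selected site."""
    return max(s for x, s in zip(positions_cm, scores) if abs(x - center_cm) <= half_width_cm)


def roc_curve(selected, neutral):
    """(false positive rate, true positive rate) pairs over all score thresholds."""
    points = [(0.0, 0.0)]
    for t in sorted(set(selected) | set(neutral), reverse=True):
        fpr = sum(x >= t for x in neutral) / len(neutral)
        tpr = sum(x >= t for x in selected) / len(selected)
        points.append((fpr, tpr))
    return points


def sensitivity_at_specificity(selected, neutral, specificity):
    """Fraction of selection runs scoring above the lowest cutoff meeting the specificity."""
    max_fpr = 1.0 - specificity + 1e-12  # tolerance for floating-point rounding
    for t in sorted(neutral):
        if sum(x > t for x in neutral) / len(neutral) <= max_fpr:
            return sum(x > t for x in selected) / len(selected)
    return 0.0


# Toy scores: 100 neutral and 100 selection simulations.
neutral_scores = [float(i) for i in range(100)]
selected_scores = [i + 50.0 for i in range(100)]
curve = roc_curve(selected_scores, neutral_scores)
print(sensitivity_at_specificity(selected_scores, neutral_scores, 0.95))
```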
When the split times are recent or moderately ancient (models A–D), 3P-CLR(Int) outperforms the two versions of XP-CLR. Furthermore, 3P-CLR(A) is the test that is least sensitive to selection in the internal branch, as it is meant to detect selection only in the terminal branch leading to population a. When the split times are very ancient (models E and F), none of the tests perform well. The root mean-squared error (RMSE) of the genetic distance between the true selected site and the highest scored window is comparable across tests in all six scenarios (Supporting Information, Figure S5). 3P-CLR(Int) is the best test at finding the true location of the selected site in almost all demographic scenarios. We observe that we lose almost all power if we simulate demographic scenarios where the population size is 10 times smaller ($N_e = 1{,}000$) (Figure S1). Additionally, we observe that the power and specificity of 3P-CLR decrease as the selection coefficient decreases (Figure S2).
We also simulated a situation in which only a few individuals are sequenced from the outgroup, while large numbers of sequences are available from the tests. Figure S3 and Figure S6 show the ROC curves and RMSE plots, respectively, for a scenario in which 100 individuals were sampled from the test populations but only 10 individuals (5 diploids) were sampled from the outgroup. Unsurprisingly, all tests have less power to detect selection when the split times and the selection events are recent to moderately ancient (models A–D). Interestingly though, when the split times and the selective events are very ancient (models E and F), both 3P-CLR and XP-CLR perform better when using a small outgroup panel (Figure S3) than when using a large outgroup panel (Figure 2). This is due to the Brownian motion approximation that these methods utilize. Under the Wright–Fisher model, the variance of the drift increment at generation t is proportional to p(t)×(1 − p(t)), where p(t) is the derived allele frequency. The magnitude of the derivative of this function gets smaller the closer p(t) is to 0.5 (and is exactly 0 at that point). Small outgroup panels serve to filter out loci with allele frequencies far from 0.5, and so small changes in allele frequency will not affect the drift increment much, making Brownian motion a good approximation to the Wright–Fisher model. Indeed, when running 3P-CLR(Int) in a demographic scenario with very ancient split times (model E) and a large outgroup panel (100 sequences) but restricting only to sites that are at intermediate frequencies in the outgroup, we find that performance is much improved relative to the case when we use all sites that are segregating in the outgroup (Figure S4).
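The argument about intermediate frequencies can be checked with a small calculation. It measures how much the drift-variance factor p(1 − p) changes when the allele frequency shifts by a fixed amount. The 0.05 shift below is an arbitrary illustrative choice.

```python
def drift_variance_factor(p):
    """Wright-Fisher drift variance per generation is proportional to p * (1 - p)."""
    return p * (1.0 - p)


def relative_change(p, delta):
    """Relative change in p * (1 - p) when the frequency moves from p to p + delta."""
    before = drift_variance_factor(p)
    return abs(drift_variance_factor(p + delta) - before) / before


for p in (0.1, 0.3, 0.5):
    print(p, round(relative_change(p, 0.05), 3))
```

A shift of 0.05 changes p(1 − p) by about 42% when p = 0.1, but by only 1% when p = 0.5. This is why a constant-variance (Brownian) model is most accurate for intermediate-frequency sites.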
Importantly, the usefulness of 3P-CLR(Int) resides not just in its performance at detecting selective sweeps in the ancestral population, but in its specific sensitivity to that particular type of events. Because the test relies on correlated allele frequency differences in both population a and population b (relative to the outgroup), selective sweeps that are specific to only one of the populations will not lead to high 3P-CLR(Int) scores, but will instead lead to high 3P-CLR(A) scores or 3P-CLR(B) scores, depending on where selection took place. Figure 3 shows ROC curves in two scenarios in which a selective sweep occurred only in population a (models I and J in Table 1), using 100 sampled individuals from each of the three populations. Here, XP-CLR performs well, but is outperformed by 3P-CLR(A). Furthermore, 3P-CLR(Int) shows almost no sensitivity to the recent sweep. For example, in model I, at a specificity of , 3P-CLR(A) and XP-CLR(A) have and sensitivity, respectively, while at the same specificity, 3P-CLR(Int) has only sensitivity. One can compare this to the same demographic scenario but with selection occurring in the ancestral population of a and b (model C, Figure 2), where at specificity, 3P-CLR(A) and XP-CLR(A) have and sensitivity, respectively, while 3P-CLR(Int) has sensitivity. We also observe that 3P-CLR(A) is the best test at finding the true location of the selected site when selection occurs in the terminal branch leading to population a (Figure S7).
Finally, we tested the behavior of 3P-CLR under selective scenarios that we did not explicitly model. First, we simulated a selective sweep in the outgroup population. We find that all three types of 3P-CLR statistics [3P-CLR(Int), 3P-CLR(A), and 3P-CLR(B)] are largely insensitive to this type of event, although 3P-CLR(Int) is relatively more sensitive than the other two. Second, we simulated two independent selective sweeps in populations a and b (convergent evolution). This results in elevated 3P-CLR(A) and 3P-CLR(B) statistics, but 3P-CLR(Int) remains largely insensitive (Figure S8). We note that 3P-CLR should not be used to detect selective events that occurred before the split of all three populations (i.e., before the split of c and the ancestor of a and b), as it relies on allele frequency differences between the populations.
Selection in Eurasians
We first applied 3P-CLR to modern human data from phase 1 of the 1000 Genomes Project (Abecasis et al. 2012). We used the African–American recombination map (Hinch et al. 2011) to convert physical distances into genetic distances. We focused on Europeans - including Utah residents with European ancestry (CEU), Finnish (FIN), British (GBR), Spanish (IBS) and Toscani (TSI) - and East Asians - including Han Chinese (CHB), Southern Han Chinese (CHS) and Japanese (JPT) - as the two sister populations, using Yoruba (YRI) as the outgroup population (Figure S9A). We randomly sampled 100 individuals from each population and obtained sample derived allele frequencies every 10 SNPs in the genome. We then calculated likelihood-ratio statistics by a sliding-window approach, where we sampled a “central SNP” once every 10 SNPs. The central SNP in each window was the candidate beneficial SNP for that window. We set the window size to 0.25 cM and randomly sampled 100 SNPs from each window, centered around the candidate beneficial SNP. In each window, we calculated 3P-CLR to test for selection at three different branches of the population tree: the terminal branch leading to Europeans (3P-CLR Europe), the terminal branch leading to East Asians (3P-CLR East Asia), and the ancestral branch of Europeans and East Asians (3P-CLR Eurasia). Results are shown in Figure S10. For each scan, we selected the windows in the top quantile of scores and merged them together if their corresponding central SNPs were contiguous, effectively resulting in overlapping windows being merged. Table S1, Table S2, and Table S3 show the top hits for Europeans, East Asians, and the ancestral Eurasian branch, respectively, while Table 2 shows the 10 strongest candidate regions for each population.
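The merging rule above can be sketched as follows. Windows are kept in the order of their central SNPs. Two selected windows are merged when their indices are consecutive, and the merged region spans from the first window's start to the last window's end. The input layout `(start_bp, end_bp, score)` and the quantile helper are hypothetical choices made for illustration.

```python
def quantile_cutoff(scores, top_fraction):
    """Smallest score that is still within the top `top_fraction` of all scores."""
    ranked = sorted(scores, reverse=True)
    keep = max(1, int(len(ranked) * top_fraction))
    return ranked[keep - 1]


def merge_top_windows(windows, cutoff):
    """Merge top-scoring windows whose central SNPs are contiguous.

    windows: list of (start_bp, end_bp, score), ordered by central SNP, so two
    windows are contiguous when their list indices differ by one.
    """
    regions = []
    previous_index = None
    for index, (start, end, score) in enumerate(windows):
        if score < cutoff:
            continue
        if regions and previous_index == index - 1:
            regions[-1][1] = max(regions[-1][1], end)  # extend the current region
        else:
            regions.append([start, end])               # start a new region
        previous_index = index
    return [tuple(region) for region in regions]


windows = [(0, 100, 1), (10, 110, 9), (20, 120, 8), (30, 130, 0), (40, 140, 7), (50, 150, 0)]
cutoff = quantile_cutoff([w[2] for w in windows], 0.5)
print(cutoff, merge_top_windows(windows, cutoff))
```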
Table 2. Genes from top 10 candidate regions for each of the branches on which 3P-CLR was run for the Eurasian population tree.
Branch | Position (hg19) | Genes |
---|---|---|
European | Chr9:125,424,000–126,089,000 | ZBTB26, RABGAP1, GPR21, STRBP, OR1L1, OR1L3, OR1L4, OR1L6, OR5C1, PDCL, OR1K1, RC3H2, ZBTB6 |
European | Chr22:35,528,100–35,754,100 | HMGXB4, TOM1 |
European | Chr8:52,361,800–52,932,100 | PXDNL, PCMTD1 |
European | Chr2:74,450,100–74,972,700 | INO80B, WBP1, MOGS, MRPL53, CCDC142, TTC31, LBX2, PCGF1, TLX2, DQX1, AUP1, HTRA2, LOXL3, DOK1, M1AP, SEMA4F, SLC4A5, DCTN1, WDR54, RTKN |
European | Chr1:35,382,000–36,592,200 | DLGAP3, ZMYM6NB, ZMYM6, ZMYM1, SFPQ, ZMYM4, KIAA0319L, NCDN, TFAP2E, PSMB2, C1orf216, CLSPN, AGO4, AGO1, AGO3, TEKT2, ADPRHL2, COL8A2 |
European | Chr15:29,248,000–29,338,300 | APBA2 |
European | Chr12:111,747,000–113,030,000 | BRAP, ACAD10, ALDH2, MAPKAPK5, TMEM116, ERP29, NAA25, TRAFD1, RPL6, PTPN11, RPH3A, CUX2, FAM109A, SH2B3, ATXN2 |
European | Chr9:90,909,300–91,210,000 | SPIN1, NXNL2 |
European | Chr19:33,504,200–33,705,700 | RHPN2, GPATCH1, WDR88, LRP3, SLC7A10 |
European | Chr9:30,085,400–31,031,600 | — |
East Asian | Chr15:63,693,900–64,188,300 | USP3, FBXL22, HERC1 |
East Asian | Chr10:94,830,500–95,093,900 | CYP26A1, MYOF |
East Asian | Chr2:72,353,500–73,170,800 | CYP26B1, EXOC6B, SPR, EMX1, SFXN5 |
East Asian | Chr2:72,353,500–73,170,800 | PCDH15 |
East Asian | Chr1:234,209,000–234,396,000 | SLC35F3 |
East Asian | Chr5:117,344,000–117,714,000 | — |
East Asian | Chr17:60,907,300–61,547,900 | TANC2, CYB561 |
East Asian | Chr2:44,101,400–44,315,200 | ABCG8, LRPPRC |
East Asian | Chr11:6,028,090–6,191,240 | OR56A1, OR56B4, OR52B2 |
East Asian | Chr2:108,905,000–109,629,000 | LIMS1, RANBP2, CCDC138, EDAR, SULT1C2, SULT1C4, GCC2 |
Eurasian | Chr2:72,353,500–73,170,800 | CYP26B1, EXOC6B, SPR, EMX1, SFXN5 |
Eurasian | Chr20:53,876,700–54,056,200 | — |
Eurasian | Chr10:22,309,300–22,799,200 | EBLN1, COMMD3, COMMD3-BMI1, BMI1, SPAG6 |
Eurasian | Chr3:25,726,300–26,012,000 | NGLY1, OXSM |
Eurasian | Chr18:67,523,300–67,910,500 | CD226, RTTN |
Eurasian | Chr10:65,794,400–66,339,100 | — |
Eurasian | Chr11:39,587,400–39,934,300 | — |
Eurasian | Chr7:138,806,000–139,141,000 | TTC26, UBN2, C7orf55, C7orf55-LUC7L2, LUC7L2, KLRG2 |
Eurasian | Chr9:90,909,300–91,202,200 | SPIN1, NXNL2 |
Eurasian | Chr4:41,454,200–42,195,300 | LIMCH1, PHOX2B, TMEM33, DCAF4L1, SLC30A9, BEND4 |
All positions were rounded to the nearest 100 bp. Windows were merged together if the central SNPs that define them were contiguous.
We observe several genes that were identified in previous selection scans. In the East Asian branch, one of the top hits is EDAR. Figure 4A shows that this gene appears to be under selection exclusively in this population branch. It codes for a protein involved in hair thickness and incisor tooth morphology (Fujimoto et al. 2008; Kimura et al. 2009) and has been repeatedly identified as a candidate for a sweep in East Asians (Sabeti et al. 2007; Grossman et al. 2010).
Furthermore, 3P-CLR allows us to narrow down the specific time at which selection for previously found candidates occurred in the history of particular populations. For example, Chen et al. (2010) performed a scan of the genomes of East Asians, using XP-CLR with Yoruba as the outgroup, and identified a number of candidate genes. 3P-CLR confirms several of their loci when looking specifically at the East Asian branch: OR56A1, OR56B4, OR52B2, SLC30A9, BBX, EPHB1, ACTN1, and XKR6. However, when applied to the ancestral Eurasian branch, 3P-CLR finds some genes that were previously found in the XP-CLR analysis of East Asians, but that are not among the top hits in 3P-CLR applied to the East Asian branch: COMMD3, BMI1, SPAG6, NGLY1, OXSM, CD226, ABCC12, ABCC11, LONP2, SIAH1, PPARA, PKDREJ, GTSE1, TRMU, and CELSR1. This suggests selection in these regions occurred earlier, i.e., before the European–East Asian split. Figure 4B shows a comparison between the 3P-CLR scores for the three branches in the region containing genes BMI1 [a proto-oncogene (Siddique and Saleem 2012)] and SPAG6 [involved in sperm motility (Sapiro et al. 2002)]. Here, the signal of Eurasia-specific selection is clearly stronger than the other two signals. Finally, we also find some candidates from Chen et al. (2010) that appear to be under selection in both the ancestral Eurasian branch and the East Asian daughter branch: SFXN5, EMX1, SPR, and CYP26B1. Interestingly, both CYP26B1 and CYP26A1 are very strong candidates for selection in the East Asian branch. These two genes lie on two different chromosomes, so they are not part of a gene cluster, but they both code for proteins that hydrolyze retinoic acid, an important signaling molecule (White et al. 2000; Topletz et al. 2012).
Other selective events that 3P-CLR infers to have occurred in Eurasians include the region containing HERC2 and OCA2, which are major determinants of eye color (Eiberg et al. 2008; Han et al. 2008; Branicki et al. 2009). There is also evidence that these genes underwent selection more recently in the history of Europeans (Mathieson et al. 2015), which could suggest an extended period of selection—perhaps influenced by migrations between Asia and Europe—or repeated selective events at the same locus.
When running 3P-CLR to look for selection specific to Europe, we find that TYRP1, which plays a role in human skin pigmentation (Halaban and Moellmann 1990), is among the top hits. This gene has been previously found to be under strong selection in Europe (Voight et al. 2006), using a statistic called iHS, which measures extended patterns of haplotype homozygosity that are characteristic of selective sweeps. Interestingly, a change in the gene TYRP1 has also been found to cause a blonde hair phenotype in Melanesians (Kenny et al. 2012). Another of our top hits is the region containing SH2B3, which was identified previously as a candidate for selection in Europe based on both and (Pickrell et al. 2009). This gene contains a nonsynonymous SNP (rs3184504) segregating in Europeans. One of its alleles (the one in the selected haplotype) has been associated with celiac disease and type 1 diabetes (Todd et al. 2007; Hunt et al. 2008) but is also protective against bacterial infection (Zhernakova et al. 2010).
We used Gowinda (v1.12) (Kofler and Schlötterer 2012) to find Gene Ontology (GO) categories enriched among the regions above the 99.5% quantile of each branch's score distribution, relative to the rest of the genome [P , false discovery rate (FDR) ]. The significantly enriched categories are listed in Table S4. In the East Asian branch, these include categories related to alcohol catabolism, retinol binding, vitamin metabolism, and epidermis development. In the European branch, cuticle development and hydrogen peroxide metabolic process are enriched. No categories in the Eurasian branch pass the above cutoffs.
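Selecting the regions above the 99.5% quantile amounts to keeping the top 0.5% of window scores. Below is a minimal Python sketch that assumes one score per window and ignores ties. `top_windows` is a hypothetical helper and is not part of Gowinda's or 3P-CLR's interface.

```python
def top_windows(scores, top_fraction=0.005):
    """Return indices of windows in the highest `top_fraction` of scores.

    top_fraction=0.005 keeps the windows above the 99.5% quantile.
    At least one window is always returned.
    """
    n_keep = max(1, round(len(scores) * top_fraction))
    # Rank window indices by score, highest first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Return the kept indices in genomic (index) order.
    return sorted(ranked[:n_keep])
```

For example, with 1,000 windows the function keeps five.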
Selection in ancestral modern humans
We applied 3P-CLR to modern human data combined with recently sequenced archaic human data, to find selective events that occurred in modern humans after their split from archaic groups. We used the combined Neanderthal and Denisovan high-coverage genomes (Meyer et al. 2012; Prüfer et al. 2014) as the outgroup population. As our two test populations, we used Eurasians (CEU, FIN, GBR, IBS, TSI, CHB, CHS, and JPT) and YRI, again from phase 1 of the 1000 Genomes Project (Abecasis et al. 2012) (Figure S9B). As before, we randomly sampled 100 genomes from each of the two daughter populations at each site. We then tested for selective events that occurred earlier than the split of Yoruba and Eurasians, but later than the split from Neanderthals. Figure S11 shows an ROC curve for a simulated scenario under these conditions, based on the history of population size changes inferred by the Pairwise Sequentially Markovian Coalescent (PSMC) model (Li and Durbin 2011; Prüfer et al. 2014). It suggests that we should have power to detect strong (s = 0.1) selective events on the ancestral branch of present-day humans. At these timescales, 3P-CLR(Int) has similar power to XP-CLR and XP-CLR-avg but is less prone to also detect recent (postsplit) events, making it more specific to ancestral sweeps.
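Subsampling a fixed number of chromosomes at each site presumably keeps the sample size constant across sites with different numbers of called genotypes. The Python sketch below assumes that "100 genomes" means 100 haploid chromosomes drawn without replacement from the chromosomes called at a site. That draw is equivalent to a hypergeometric draw. The function name and interface are hypothetical.

```python
import random

def downsample_derived(n_derived, n_total, n_sample=100, rng=random):
    """Count derived alleles among n_sample chromosomes drawn without replacement.

    n_derived: derived-allele count among the n_total chromosomes called
    at the site. Sites with fewer than n_sample chromosomes are rejected.
    """
    if n_total < n_sample:
        raise ValueError("fewer chromosomes than the requested sample size")
    # Label chromosomes 0..n_derived-1 as carrying the derived allele.
    drawn = rng.sample(range(n_total), n_sample)
    return sum(1 for c in drawn if c < n_derived)
```

Passing a seeded `random.Random` instance as `rng` makes the subsampling reproducible.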
We ran 3P-CLR using 0.25-cM windows as above (Figure S13). As before, we selected the top windows and merged them if their central SNPs were contiguous (Table S5). The top 20 regions are listed in Table 3. Figure S13 shows that the outliers in the genome-wide distribution of scores are not strong. To check whether the score distribution was robust to the choice of window size, we also used a larger window size (1 cM). This produced a distribution with slightly more extreme outliers (Figure S12 and Figure S13). For that reason, we also show the top hits from this large-window run (Table S6 and Table 3). To limit computation time, this run used a lower SNP density (200 SNPs per 1 cM rather than 100 SNPs per 0.25 cM). To find putative candidates for the beneficial variants in each region, we overlapped our regions with catalogs of modern-human-specific high-frequency or fixed derived changes that are ancestral in the Neanderthal and/or Denisova genomes (Castellano et al. 2014; Prüfer et al. 2014).
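Merging top windows whose central SNPs are contiguous amounts to grouping runs of consecutive SNP indices. Below is a hypothetical Python sketch for a single chromosome. It reports each merged region by the positions of its first and last central SNPs. That choice is an assumption, since the published coordinates may instead span the full windows.

```python
def merge_contiguous(top_indices, positions):
    """Merge top-scoring windows whose central SNPs are adjacent.

    top_indices: indices (into the ordered SNP list of one chromosome) of
    the central SNPs of top-scoring windows.
    positions: base-pair position of every SNP, in the same order.
    Returns a list of (start_bp, end_bp) tuples, one per merged run.
    """
    regions = []
    run_start = run_end = None
    for i in sorted(set(top_indices)):
        if run_end is not None and i == run_end + 1:
            run_end = i  # next SNP is adjacent: extend the current run
        else:
            if run_start is not None:
                regions.append((positions[run_start], positions[run_end]))
            run_start = run_end = i  # gap: start a new run
    if run_start is not None:
        regions.append((positions[run_start], positions[run_end]))
    return regions
```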
Table 3. Genes from top 20 candidate regions for the modern human ancestral branch.
| Window size | Position (hg19) | Genes |
|---|---|---|
| 0.25 cM (100 SNPs) | Chr2:95,561,200–96,793,700 | ZNF514, ZNF2, PROM2, KCNIP3, FAHD2A, TRIM43, GPAT2, ADRA2B, ASTL, MAL, MRPS5 |
| | Chr5:86,463,700–87,101,400 | RASA1, CCNH |
| | Chr17:60,910,700–61,557,700 | TANC2, CYB561, ACE |
| | Chr14:71,649,200–72,283,600 | SIPA1L1 |
| | Chr18:15,012,100–19,548,600 | ROCK1, GREB1L, ESCO1, SNRPD1, ABHD3, MIB1 |
| | Chr3:110,513,000–110,932,000 | PVRL3 |
| | Chr2:37,917,900–38,024,200 | CDC42EP3 |
| | Chr3:36,836,900–37,517,500 | TRANK1, EPM2AIP1, MLH1, LRRFIP2, GOLGA4, C3orf35, ITGA9 |
| | Chr7:106,642,000–107,310,000 | PRKAR2B, HBP1, COG5, GPR22, DUS4L, BCAP29, SLC26A4 |
| | Chr12:96,823,000–97,411,500 | NEDD1 |
| | Chr2:200,639,000–201,340,000 | C2orf69, TYW5, C2orf47, SPATS2L |
| | Chr1:66,772,600–66,952,600 | PDE4B |
| | Chr10:37,165,100–38,978,800 | ANKRD30A, MTRNR2L7, ZNF248, ZNF25, ZNF33A, ZNF37A |
| | Chr2:155,639,000–156,767,000 | KCNJ3 |
| | Chr17:56,379,200–57,404,800 | BZRAP1, SUPT4H1, RNF43, HSF5, MTMR4, SEPT4, C17orf47, TEX14, RAD51C, PPM1E, TRIM37, SKA2, PRR11, SMG8, GDPD1 |
| | Chr5:18,493,900–18,793,500 | None |
| | Chr2:61,050,900–61,891,900 | REL, PUS10, PEX13, KIAA1841, AHSA2, USP34, XPO1 |
| | Chr22:40,360,300–41,213,400 | GRAP2, FAM83F, TNRC6B, ADSL, SGSM3, MKL1, MCHR1, SLC25A17 |
| | Chr2:98,996,400–99,383,400 | CNGA3, INPP4A, COA5, UNC50, MGAT4A |
| | Chr4:13,137,000–13,533,100 | RAB28 |
| 1 cM (200 SNPs) | Chr14:71,349,200–72,490,300 | PCNX, SIPA1L1, RGS6 |
| | Chr4:145,023,000–146,522,000 | GYPB, GYPA, HHIP, ANAPC10, ABCE1, OTUD4, SMAD1 |
| | Chr2:155,391,000–156,992,000 | KCNJ3 |
| | Chr5:92,415,600–94,128,600 | NR2F1, FAM172A, POU5F2, KIAA0825, ANKRD32, MCTP1 |
| | Chr7:106,401,000–107,461,000 | PIK3CG, PRKAR2B, HBP1, COG5, GPR22, DUS4L, BCAP29, SLC26A4, CBLL1, SLC26A3 |
| | Chr7:151,651,000–152,286,000 | GALNTL5, GALNT11, KMT2C |
| | Chr2:144,393,000–145,305,000 | ARHGAP15, GTDC1, ZEB2 |
| | Chr19:16,387,600–16,994,000 | KLF2, EPS15L1, CALR3, C19orf44, CHERP, SLC35E1, MED26, SMIM7, TMEM38A, NWD1, SIN3B |
| | Chr2:37,730,400–38,054,600 | CDC42EP3 |
| | Chr2:62,639,800–64,698,300 | TMEM17, EHBP1, OTX1, WDPCP, MDH1, UGP2, VPS54, PELI1, LGALSL |
| | Chr10:36,651,400–44,014,800 | ANKRD30A, MTRNR2L7, ZNF248, ZNF25, ZNF33A, ZNF37A, ZNF33B, BMS1, RET, CSGALNACT2, RASGEF1A, FXYD4, HNRNPF |
| | Chr1:26,703,800–27,886,000 | LIN28A, DHDDS, HMGN2, RPS6KA1, ARID1A, PIGV, ZDHHC18, SFN, GPN2, GPATCH3, NUDC, NR0B2, C1orf172, TRNP1, FAM46B, SLC9A1, WDTC1, TMEM222, SYTL1, MAP3K6, FCN3, CD164L2, GPR3, WASF2, AHDC1 |
| | Chr12:102,308,000–103,125,000 | DRAM1, CCDC53, NUP37, PARPBP, PMCH, IGF1 |
| | Chr2:132,628,000–133,270,000 | GPR39 |
| | Chr15:42,284,300–45,101,400 | PLA2G4E, PLA2G4D, PLA2G4F, VPS39, TMEM87A, GANC, CAPN3, ZNF106, SNAP23, LRRC57, HAUS2, STARD9, CDAN1, TTBK2, UBR1, EPB42, TMEM62, CCNDBP1, TGM5, TGM7, LCMT2, ADAL, ZSCAN29, TUBGCP4, TP53BP1, MAP1A, PPIP5K1, CKMT1B, STRC, CATSPER2, CKMT1A, PDIA3, ELL3, SERF2, SERINC4, HYPK, MFAP1, WDR76, FRMD5, CASC4, CTDSPL2, EIF3J, SPG11, PATL2, B2M, TRIM69 |
| | Chr2:73,178,500–74,194,400 | SFXN5, RAB11FIP5, NOTO, SMYD5, PRADC1, CCT7, FBXO41, EGR4, ALMS1, NAT8, TPRKB, DUSP11, C2orf78, STAMBP, ACTG2, DGUOK |
| | Chr5:54,193,000–55,422,100 | ESM1, GZMK, GZMA, CDC20B, GPX8, MCIDAS, CCNO, DHX29, SKIV2L2, PPAP2A, SLC38A9, DDX4, IL31RA, IL6ST, ANKRD55 |
| | Chr3:50,184,000–53,602,300 | SEMA3F, GNAT1, GNAI2, LSMEM2, IFRD2, HYAL3, NAT6, HYAL1, HYAL2, TUSC2, RASSF1, ZMYND10, NPRL2, CYB561D2, TMEM115, CACNA2D2, C3orf18, HEMK1, CISH, MAPKAPK3, DOCK3, MANF, RBM15B, RAD54L2, TEX264, GRM2, IQCF6, IQCF3, IQCF2, IQCF5, IQCF1, RRP9, PARP3, GPR62, PCBP4, ABHD14B, ABHD14A, ACY1, RPL29, DUSP7, POC1A, ALAS1, TLR9, TWF2, PPM1M, WDR82, GLYCTK, DNAH1, BAP1, PHF7, SEMA3G, TNNC1, NISCH, STAB1, NT5DC2, SMIM4, PBRM1, GNL3, GLT8D1, SPCS1, NEK4, ITIH1, ITIH3, ITIH4, MUSTN1, TMEM110-MUSTN1, TMEM110, SFMBT1, RFT1, PRKCD, TKT, CACNA1D |
| | Chr13:96,038,900–97,500,100 | CLDN10, DZIP1, DNAJC3, UGGT2, HS6ST3 |
| | Chr18:14,517,500–19,962,400 | POTEC, ANKRD30B, ROCK1, GREB1L, ESCO1, SNRPD1, ABHD3, MIB1, GATA6 |
All positions were rounded to the nearest 100 bp. Windows were merged together if the central SNPs that define them were contiguous.
We found several genes identified by previous studies that looked for selection in modern humans after their split from archaic groups (Green et al. 2010; Prüfer et al. 2014), including SIPA1L1, ANAPC10, ABCE1, RASA1, CCNH, KCNJ3, HBP1, COG5, CADPS2, FAM172A, POU5F2, FGF7, RABGAP1, SMURF1, GABRA2, ALMS1, PVRL3, EHBP1, VPS54, OTX1, UGP2, GTDC1, ZEB2, and OIT3. One of our strongest candidates among these is SIPA1L1 (Figure 5A). It lies in the highest-ranking region when using 1-cM windows and in the fourth-highest-ranking region when using 0.25-cM windows. The protein encoded by this gene (E6TP1) is involved in actin cytoskeleton organization and controls neural morphology (UniProt, inferred by similarity). Interestingly, it is also targeted for degradation by the E6 oncoproteins of high-risk papillomaviruses (Gao et al. 1999).
Another candidate gene is ANAPC10 (Figure 5B). This gene codes for a core subunit of the cyclosome, which is involved in progression through the cell cycle (Pravtcheva and Wise 2001). It may also play a role in oocyte maturation and human T-lymphotropic virus infection [KEGG pathway (Kanehisa and Goto 2000)]. ANAPC10 is noteworthy because it is significantly differentially expressed in humans compared to great apes and macaques: it is upregulated in the testes (Brawand et al. 2011). The gene also contains two intronic changes that are fixed derived in modern humans and ancestral in both Neanderthals and Denisovans. Both changes are predicted to be highly disruptive, based on a composite score that combines conservation and regulatory data [PHRED-scaled C scores >11 (Kircher et al. 2014; Prüfer et al. 2014)]. However, neither change appears to lie in an obvious regulatory region (Rosenbloom et al. 2011; Dunham et al. 2012).
We also find ADSL among the list of candidates. This gene is known to contain a nonsynonymous change that is fixed in all present-day humans but homozygous ancestral in the Neanderthal genome, the Denisova genome, and two Neanderthal exomes (Castellano et al. 2014) (Figure 6A). Racimo et al. (2014) previously found strong support for positive selection in modern humans in the region containing it, using summary statistics in an approximate Bayesian computation (ABC) framework. The gene is interesting because it belongs to the Human Phenotype Ontology category “aggression/hyperactivity”. That category is enriched for nonsynonymous changes that occurred in the modern human lineage after the split from archaic humans (Robinson et al. 2008; Castellano et al. 2014). ADSL codes for adenylosuccinase, an enzyme involved in purine metabolism (Van Keuren et al. 1987). Adenylosuccinase deficiency can lead to apraxia, speech deficits, developmental delays, and abnormal behavioral features, like hyperactivity and excessive laughter (Gitiaux et al. 2009). The nonsynonymous mutation (A429V) is in the C-terminal domain of the protein (Figure 6B) and lies at a highly conserved position [primate PhastCons = 0.953; GERP score = 5.67 (Siepel et al. 2005; Cooper et al. 2010; Kircher et al. 2014)]. The ancestral amino acid is conserved across the tetrapod phylogeny. The mutation is also only three residues away from the most common causative SNP for severe adenylosuccinase deficiency (Maaswinkel-Mooij et al. 1997; Marie et al. 1999; Kmoch et al. 2000; Race et al. 2000; Edery et al. 2003). Of all the nonsynonymous modern-human-specific changes in the top-scoring regions, this one has the highest predicted probability of disrupting protein function (C score = 17.69). While ADSL is an interesting candidate and lies in the center of the inferred selected region (Figure 6A), there are other genes in the region too, including TNRC6B and MKL1. TNRC6B may be involved in miRNA-guided gene silencing (Meister et al. 2005). MKL1 may play a role in smooth muscle differentiation (Du et al. 2004) and has been associated with acute megakaryocytic leukemia (Mercher et al. 2001).
RASA1 was also a top hit in a previous scan for selection (Green et al. 2010), and Racimo et al. (2014) also found evidence in favor of selection at this gene. The gene codes for a protein involved in the control of cellular differentiation (Trahey et al. 1988) and carries a modern-human-specific fixed nonsynonymous change (G70E). Human diseases associated with RASA1 include basal cell carcinoma (Friedman et al. 1993) and arteriovenous malformation (Eerola et al. 2003; Hershkovitz et al. 2008).
The GABAA gene cluster on chromosome 4p12 is also among the top regions. The gene within the putatively selected region codes for a subunit (GABRA2) of the GABAA receptor. This receptor is a ligand-gated ion channel that plays a key role in synaptic inhibition in the central nervous system (see review by Whiting et al. 1999). GABRA2 is significantly associated with risk of alcohol dependence in humans (Edenberg et al. 2004), perception of pain (Knabl et al. 2008), and asthma (Xiang et al. 2007).
Two other candidate genes that may be involved in brain development are FOXG1 and CADPS2. FOXG1 was not identified in any of the previous selection scans and codes for a protein called forkhead box G1, which plays an important role during brain development. Mutations in this gene are associated with a slowdown in brain growth during childhood, resulting in microcephaly, which in turn causes various intellectual disabilities (Ariani et al. 2008; Mencarelli et al. 2010). CADPS2 was identified in Green et al. (2010) as a candidate for selection and has been associated with autism (Sadakata and Furuichi 2010). The gene has been suggested to be specifically important in the evolution of all modern humans, as it was not found to be selected earlier in great apes or later in particular modern human populations (Crisci et al. 2011).
Finally, we find a signal of selection in a region containing the genes EHBP1 and OTX1. This region was identified in both previous scans for modern human selection (Green et al. 2010; Prüfer et al. 2014). EHBP1 codes for a protein involved in endocytic trafficking (Guilherme et al. 2004) and has been associated with prostate cancer (Gudmundsson et al. 2008). OTX1 is a homeobox family gene that may play a role in brain development (Gong et al. 2003). Interestingly, EHBP1 contains a single-nucleotide intronic change (chr2:63206488) that is nearly fixed in present-day humans and homozygous ancestral in Neanderthal and Denisova (Prüfer et al. 2014). This change is also predicted to be highly disruptive (C score = 13.1). It lies at a position that is extremely conserved across primates (PhastCons = 0.942), mammals (PhastCons = 1), and vertebrates (PhastCons = 1). The change is 18 bp away from the nearest splice site and overlaps a VISTA conserved enhancer region (element 1874) (Pennacchio et al. 2006), suggesting that it may play a regulatory role.
We again used Gowinda (Kofler and Schlötterer 2012) to find enriched GO categories among the regions with high 3P-CLR scores in the modern human branch. The significantly enriched categories (P , FDR ) are listed in Table S4. We find several GO terms related to the regulation of the cell cycle, T-cell migration, and intracellular transport.
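Gowinda estimates enrichment by permutation. The toy Python version below is not Gowinda's implementation. It samples random SNP sets of the same size as the candidate set, maps them to genes, and asks how often a GO category is hit at least as often as in the candidate regions. Sampling SNPs rather than genes means that longer genes, which contain more SNPs, are also favored under the null. This is the gene-length bias that Gowinda was designed to account for. The function name and data layout are assumptions.

```python
import random

def permutation_enrichment(snp_to_gene, candidate_snps, category_genes,
                           n_perm=1000, seed=0):
    """Toy SNP-level permutation test for a single GO category.

    snp_to_gene: gene assigned to each SNP (None if intergenic).
    candidate_snps: indices of SNPs lying in the top-scoring regions.
    category_genes: set of genes annotated with the GO category.
    Returns a one-sided empirical P-value.
    """
    def n_category_genes(snps):
        # Count distinct category genes hit by this SNP set.
        genes = {snp_to_gene[s] for s in snps}
        return len(genes & category_genes)

    observed = n_category_genes(candidate_snps)
    rng = random.Random(seed)
    k = len(candidate_snps)
    all_snps = range(len(snp_to_gene))
    # Null: random SNP sets of the same size as the candidate set.
    n_extreme = sum(
        n_category_genes(rng.sample(all_snps, k)) >= observed
        for _ in range(n_perm)
    )
    return (n_extreme + 1) / (n_perm + 1)
```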
We overlapped the genome-wide association studies (GWAS) database (Li et al. 2011; Welter et al. 2014) with the list of fixed or high-frequency modern-human-specific changes that are ancestral in archaic humans (Prüfer et al. 2014) and that lie within our top putatively selected regions in modern humans (see Table S7 and Table S8 for the 0.25- and 1-cM scans, respectively). None of the resulting SNPs is fixed for the derived allele, because GWAS can detect associations only at segregating sites. We find several SNPs in the RAB28 gene (Rosenbloom et al. 2011; Dunham et al. 2012) that are significantly associated with obesity (Paternoster et al. 2011). We also find a SNP with a high C score (rs10171434) associated with urinary metabolites (Suhre et al. 2011) and suicidal behavior in patients with mood disorders (Perlis et al. 2010). This SNP is located in an enhancer regulatory feature (Rosenbloom et al. 2011; Dunham et al. 2012) between genes PELI1 and VPS54, in the same putatively selected region as genes EHBP1 and OTX1 (see above). Finally, there is a SNP with a high C score (rs731108) that is associated with renal cell carcinoma (Henrion et al. 2013). This SNP is also located in an enhancer regulatory feature (Rosenbloom et al. 2011; Dunham et al. 2012), in an intron of ZEB2. In this last case, though, only the Neanderthal genome has the ancestral state, while the Denisova genome carries the modern human variant.
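Overlapping GWAS SNPs with the putatively selected regions is an interval-membership query. Below is a minimal Python sketch that uses binary search over sorted, non-overlapping regions on each chromosome. The tuple layouts and the bare chromosome labels ("2", "14") are assumptions. The example uses two 1-cM regions from Table 3 and the EHBP1 intronic change at chr2:63206488 described above.

```python
from bisect import bisect_right

def snps_in_regions(snps, regions):
    """Return IDs of SNPs that fall inside any candidate region.

    snps: (chrom, pos, snp_id) tuples.
    regions: (chrom, start, end) tuples with inclusive coordinates;
    regions on the same chromosome must not overlap.
    """
    by_chrom = {}
    for chrom, start, end in regions:
        by_chrom.setdefault(chrom, []).append((start, end))
    for intervals in by_chrom.values():
        intervals.sort()
    hits = []
    for chrom, pos, snp_id in snps:
        intervals = by_chrom.get(chrom, [])
        # Index of the rightmost region starting at or before pos.
        i = bisect_right(intervals, (pos, float("inf"))) - 1
        if i >= 0 and intervals[i][0] <= pos <= intervals[i][1]:
            hits.append(snp_id)
    return hits

regions = [("2", 62_639_800, 64_698_300), ("2", 155_391_000, 156_992_000)]
snps = [("2", 63_206_488, "EHBP1 intronic change"), ("2", 10, "outside")]
print(snps_in_regions(snps, regions))
```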
Discussion
We have developed a new method, 3P-CLR, for detecting positive selection along the genome. The method is based on an earlier test [XP-CLR (Chen et al. 2010)] that uses linked allele frequency differences between two populations to detect population-specific selection. Unlike XP-CLR, however, 3P-CLR can distinguish between selective events that occurred before and after the split of two populations. Our method has some similarities to an earlier method developed by Schlebusch et al. (2012), which used an -like score to detect selection ancestral to two populations. In that case, though, the authors used summary statistics and did not explicitly model the process leading to allele frequency differentiation. It is also similar to a more recent method (Fariello et al. 2013) that models differences in haplotype frequencies between populations while accounting for population structure.
We used our method to confirm previously found candidate genes in particular human populations, like EDAR, TYRP1, and CYP26B1, and find some novel candidates too (Table S1, Table S2, and Table S3). Additionally, we can infer that certain genes, which were previously known to have been under selection in East Asians (like SPAG6), are more likely to have undergone a sweep in the population ancestral to both Europeans and East Asians than in East Asians only. We find that genes involved in epidermis development and alcohol catabolism are particularly enriched among the East Asian candidate regions, while genes involved in peroxide catabolism and cuticle development are enriched in the European branch. This suggests these biological functions may have been subject to positive selection in recent times.
We also used 3P-CLR to detect selective events that occurred in the ancestors of modern humans, after their split from Neanderthals and Denisovans (Table S5). These events could perhaps have led to the spread of phenotypes that set modern humans apart from other hominin groups. We find several interesting candidates, like SIPA1L1, ADSL, RASA1, OTX1, EHBP1, FOXG1, RAB28, and ANAPC10, some of which were previously detected using other types of methods (Green et al. 2010; Prüfer et al. 2014; Racimo et al. 2014). We also find an enrichment for GO categories related to cell cycle regulation and T-cell migration among the candidate regions, suggesting that these biological processes might have been affected by positive selection after the split from archaic humans.
An advantage of differentiation-based tests like XP-CLR and 3P-CLR is that the patterns they are tailored to find are regional allele frequency differences between populations. In contrast, the patterns detected by some other tests of neutrality [like extended haplotype homozygosity (Sabeti et al. 2002)] are exclusive to hard sweeps. Regional allele frequency differences can also be produced by soft sweeps from standing variation or by partial sweeps (Chen et al. 2010). There is some evidence that these latter phenomena may have been more important than classic sweeps during human evolutionary history (Hernandez et al. 2011).
Another advantage of both XP-CLR and 3P-CLR is that they do not rely on an arbitrary division of genomic space. Some other methods partition the genome into small windows of fixed size. Our composite likelihood ratios, in contrast, can in principle be computed over windows as large as a whole chromosome, while switching only the central candidate site in each window. This is because the likelihood ratios use the genetic distance to the central SNP as input. SNPs that are very far from the central SNP have nearly the same likelihood under the neutral and the selection models and therefore contribute little to the ratio, while SNPs close to it contribute the most. In the interest of speed, our implementation heuristically limits the window size and uses fewer SNPs when calculating likelihoods over larger windows. Nevertheless, users can adjust these parameters as needed if enough computing resources are available. Using genetic distance in the likelihood function also lets us exploit the spatial distribution of SNPs as an additional source of information, rather than relying only on patterns of population differentiation at tightly linked SNPs.
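The toy Python model below illustrates why distant SNPs barely affect the score. Each SNP has a probability q = exp(-d/scale) of having hitchhiked, where d is its genetic distance (in cM) to the central SNP. Under selection, a hitchhiking SNP's frequency follows a density concentrated near 1. Otherwise the SNP follows the same Gaussian drift model as under neutrality. These densities and the exponential decay are assumptions chosen for illustration; they are not the likelihoods used by XP-CLR or 3P-CLR. As q approaches 0, both models assign a SNP the same likelihood, so its log-ratio approaches 0.

```python
import math

def normal_logpdf(x, mean, var):
    return -((x - mean) ** 2) / (2 * var) - 0.5 * math.log(2 * math.pi * var)

def per_snp_llr(freq, anc_freq, dist_cm, drift_var, scale_cm, sweep_var=0.001):
    """Toy log-likelihood ratio (selection vs. neutrality) for one SNP.

    Neutral: test-population frequency ~ Normal(anc_freq, drift_var).
    Selection: with probability q = exp(-dist_cm / scale_cm) the SNP
    hitchhiked and its frequency lies near 1; otherwise it is neutral.
    """
    q = math.exp(-dist_cm / scale_cm)
    if q == 0.0:
        return 0.0  # too far to have hitchhiked: both models agree
    log_neutral = normal_logpdf(freq, anc_freq, drift_var)
    log_hitch = normal_logpdf(freq, 1.0, sweep_var)
    if q == 1.0:
        return log_hitch - log_neutral
    # Log of the mixture q*hitch + (1-q)*neutral, via log-sum-exp.
    a = math.log(q) + log_hitch
    b = math.log1p(-q) + log_neutral
    m = max(a, b)
    log_selected = m + math.log(math.exp(a - m) + math.exp(b - m))
    return log_selected - log_neutral

def composite_llr(snps, drift_var, scale_cm):
    """Sum per-SNP ratios around one central SNP, treating SNPs as independent.

    snps: (freq, anc_freq, dist_cm) tuples.
    """
    return sum(per_snp_llr(f, a, d, drift_var, scale_cm) for f, a, d in snps)
```

Adding a SNP 10 cM away leaves the score unchanged in this toy model. That is why large windows cost computation time but change the score very little.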
3P-CLR also has an advantage over HMM-based selection methods, like the one implemented in Prüfer et al. (2014). The likelihood-ratio scores obtained from 3P-CLR can provide an idea of how credible a selection model is for a particular region, relative to the rest of the genome. The HMM-based method previously used to scan for selection in modern humans (Prüfer et al. 2014) can rank putatively selected regions only by genetic distance, but cannot output a statistical measure that may indicate how likely each region is to have been under selection in ancient times. In contrast, 3P-CLR provides a composite-likelihood-ratio score, which allows for a statistically rigorous way to compare the neutral model and a specific selection model (for example, recent or ancient selection).
The outliers from Figure S10 have much higher scores (relative to the rest of the genome) than the outliers from Figure S13. This may be due to both the difference in timescales in the two sets of tests and the uncertainty that comes from estimating outgroup allele frequencies using only two archaic genomes. This pattern can also be observed in Figure S14, where the densities of the scores looking for patterns of ancient selection (3P-CLR modern human and 3P-CLR Eurasia) have much shorter tails than the densities of scores looking for patterns of recent selection (3P-CLR Europe and 3P-CLR East Asia). Simulations show that 3P-CLR(Int) score distributions are naturally shorter than 3P-CLR(A) scores (Figure S15), which could explain the short tail of the 3P-CLR Eurasia distribution. Additionally, the even shorter tail in the distribution of 3P-CLR modern human scores may be a consequence of the fact that the split times of the demographic history in that case are older than the split times in the Eurasian tree, as simulations show that ancient split times tend to further shorten the tail of the 3P-CLR score distribution (Figure S15). We note, though, that using a larger window size produces a larger number of strong outliers (Figure S12).
A limitation of composite-likelihood-ratio tests is that the composite likelihood for each model is a product of individual likelihoods at each site. It therefore does not account for the correlation between SNPs due to linkage (Lindsay 1988; Chen et al. 2010; Pace et al. 2011; Varin et al. 2011). One way to partially mitigate this problem is to use corrective weights based on linkage disequilibrium (LD) statistics calculated in the outgroup population (Chen et al. 2010). Our implementation of 3P-CLR allows the user to incorporate such weights if appropriate LD statistics are available for the outgroup. However, when these statistics are unreliable, such a correction may not be possible. This is the case, for example, when only a few unphased genomes are available, as with the Neanderthal and Denisova genomes.
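Below is a minimal Python sketch of LD-based down-weighting. Each SNP's log-likelihood ratio is divided by the number of SNPs in the window (itself included) with which it is in strong LD. This particular scheme and the r² cutoff of 0.95 are assumptions for illustration. Chen et al. (2010) describe their own weighting, which may differ.

```python
def ld_weights(r2, cutoff=0.95):
    """Weight each SNP by 1 / (number of SNPs, itself included, in strong LD).

    r2: square list-of-lists of pairwise r^2 among the window's SNPs,
    e.g. estimated in the outgroup. The cutoff is a hypothetical default.
    """
    n = len(r2)
    return [1.0 / sum(1 for j in range(n) if j == i or r2[i][j] >= cutoff)
            for i in range(n)]

def weighted_composite_llr(per_snp_llrs, weights):
    """Composite score in which tightly linked SNPs share a single vote."""
    return sum(w * llr for w, llr in zip(weights, per_snp_llrs))
```

With two perfectly linked SNPs and one unlinked SNP, the linked pair each get weight 0.5. Together they count as a single observation.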
While 3P-CLR relies on integrating over the possible allele frequencies in the ancestors of populations a and b (Equation 10), one could envision using ancient DNA to avoid this step. If enough genomes could be sampled from that ancestral population, one could use their sample frequency as a proxy for the ancestral population frequency. This may soon be possible, as several early modern human genomes have already been sequenced in recent years (Fu et al. 2014; Lazaridis et al. 2014; Seguin-Orlando et al. 2014).
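A toy Gaussian model can contrast integrating over the unknown ancestral frequency with plugging in an ancient-DNA estimate. This is not Equation 10. The model assumes that the two daughter frequencies are Normal around the ancestral frequency y and that y is Normal around the outgroup frequency. It integrates y over [0, 1] with the trapezoidal rule. All function names and variances are hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def integrated_likelihood(xa, xb, xc, va, vb, vc, n_grid=2001):
    """Integrate over the unobserved ancestral frequency y on [0, 1].

    Toy model: xa ~ Normal(y, va), xb ~ Normal(y, vb) given the ancestor,
    and y ~ Normal(xc, vc) given the outgroup frequency xc.
    """
    h = 1.0 / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        y = k * h
        f = normal_pdf(xa, y, va) * normal_pdf(xb, y, vb) * normal_pdf(y, xc, vc)
        total += f * (0.5 if k in (0, n_grid - 1) else 1.0)  # trapezoid weights
    return total * h

def plug_in_likelihood(xa, xb, va, vb, y_ancient):
    """Replace the integral by conditioning on an ancient-sample frequency."""
    return normal_pdf(xa, y_ancient, va) * normal_pdf(xb, y_ancient, vb)
```

The plug-in version needs one evaluation per site instead of a full numerical integral. Its accuracy, however, depends on how well a small ancient sample estimates the ancestral frequency.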
Although we have focused on a three-population model in this article, it should be straightforward to expand our method to a larger number of populations, albeit with additional costs in terms of speed and memory. 3P-CLR relies on a similar framework to that of the demographic inference method implemented in TreeMix (Pickrell and Pritchard 2012), which can estimate population trees that include migration events, using genome-wide data. With a more complex modeling framework, it may be possible to estimate the time and strength of selective events with better resolution and using more populations and also to incorporate additional demographic forces, like continuous migration between populations or pulses of admixture.
Acknowledgments
We thank Montgomery Slatkin, Rasmus Nielsen, Joshua Schraiber, Nicolas Duforet-Frebourg, Emilia Huerta-Sánchez, Hua Chen, Benjamin Peter, Nick Patterson, David Reich, Joachim Hermisson, Graham Coop, and members of the Slatkin and Nielsen laboratories for helpful advice and discussions. We also thank two anonymous reviewers for their helpful comments. This work was supported by National Institutes of Health grant R01-GM40282 to Montgomery Slatkin.
Footnotes
Communicating editor: N. A. Rosenberg
Supporting information is available online at www.genetics.org/lookup/suppl/doi:10.1534/genetics.115.178095/-/DC1.
Literature Cited
- Abecasis G. R., Auton A., Brooks L. D., DePristo M. A., Durbin R. M., et al. , 2012. An integrated map of genetic variation from 1,092 human genomes. Nature 491(7422): 56–65. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Akey J. M., Zhang G., Zhang K., Jin L., Shriver M. D., 2002. Interrogating a high-density SNP map for signatures of natural selection. Genome Res. 12(12): 1805–1814. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ariani F., Hayek G., Rondinella D., Artuso R., Mencarelli M. A., et al. , 2008. Foxg1 is responsible for the congenital variant of rett syndrome. Am. J. Hum. Genet. 83(1): 89–93. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Branicki W., Brudnik U., Wojas-Pelc A., 2009. Interactions between herc2, oca2 and mc1r may influence human pigmentation phenotype. Ann. Hum. Genet. 73(2): 160–170. [DOI] [PubMed] [Google Scholar]
- Brawand D., Soumillon M., Necsulea A., Julien P., Csárdi G., et al. , 2011. The evolution of gene expression levels in mammalian organs. Nature 478(7369): 343–348. [DOI] [PubMed] [Google Scholar]
- Castellano S., Parra G., Sánchez-Quinto F. A., Racimo F., Kuhlwilm M., et al. , 2014. Patterns of coding variation in the complete exomes of three Neandertals. Proc. Natl. Acad. Sci. USA 111(18): 6666–6671. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chen H., Patterson N., Reich D., 2010. Population differentiation as a test for selective sweeps. Genome Res. 20(3): 393–402. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cooper G. M., Goode D. L., Ng S. B., Sidow A., Bamshad M. J., et al. , 2010. Single-nucleotide evolutionary constraint scores highlight disease-causing mutations. Nat. Methods 7(4): 250–251. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Crisci J. L., Wong A., Good J. M., Jensen J. D., 2011. On characterizing adaptive events unique to modern humans. Genome Biol. Evol. 3: 791–798. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Du K. L., Chen M., Li J., Lepore J. J., Mericko P., et al. , 2004. Megakaryoblastic leukemia factor-1 transduces cytoskeletal signals and induces smooth muscle cell differentiation from undifferentiated embryonic stem cells. J. Biol. Chem. 279(17): 17578–17586. [DOI] [PubMed] [Google Scholar]
- Dunham I., Kundaje A., Aldred S. F., Collins P. J., Davis C., et al. , 2012. An integrated encyclopedia of DNA elements in the human genome. Nature 489(7414): 57–74. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Durrett R., and J. Schweinsberg, 2004. Approximating selective sweeps. Theor. Popul. Biol. 66(2): 129–138. [DOI] [PubMed] [Google Scholar]
- Edenberg H. J., Dick D. M., Xuei X., Tian H., Almasy L., et al. , 2004. Variations in gabra2, encoding the α2 subunit of the gaba a receptor, are associated with alcohol dependence and with brain oscillations. Am. J. Hum. Genet. 74(4): 705–714. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Edery P., Chabrier S., Ceballos-Picot I., Marie S., Vincent M.-F., et al. , 2003. Intrafamilial variability in the phenotypic expression of adenylosuccinate lyase deficiency: a report on three patients. Am. J. Med. Genet. A. 120(2): 185–190. [DOI] [PubMed] [Google Scholar]
- Eerola I., Boon L. M., Mulliken J. B., Burrows P. E., Dompmartin A., et al. , 2003. Capillary malformation–arteriovenous malformation, a new clinical and genetic disorder caused by rasa1 mutations. Am. J. Hum. Genet. 73(6): 1240–1249. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Eiberg H., Troelsen J., Nielsen M., Mikkelsen A., Mengel-From J., et al. , 2008. Blue eye color in humans may be caused by a perfectly associated founder mutation in a regulatory element located within the herc2 gene inhibiting oca2 expression. Hum. Genet. 123(2): 177–187. [DOI] [PubMed] [Google Scholar]
- Ewens W. J., 2012. Mathematical Population Genetics 1: Theoretical Introduction, Vol27 Springer Science & Business Media, New York, NY. [Google Scholar]
- Fariello M. I., Boitard S., Naya H., SanCristobal M., Servin B., 2013. Detecting signatures of selection through haplotype differentiation among hierarchically structured populations. Genetics 193: 929–941. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Fay J. C., Wu C.-I., 2000. Hitchhiking under positive Darwinian selection. Genetics 155: 1405–1413. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Felsenstein J., 1981. Evolutionary trees from gene frequencies and quantitative characters: finding maximum likelihood estimates. Evolution 35: 1229–1242. [DOI] [PubMed] [Google Scholar]
- Friedman E., Gejman P. V., Martin G. A., McCormick F., 1993. Nonsense mutations in the c–terminal sh2 region of the gtpase activating protein (gap) gene in human tumours. Nat. Genet. 5(3): 242–247. [DOI] [PubMed] [Google Scholar]
- Fu Q., Li H., Moorjani P., Jay F., Slepchenko S. M., et al., 2014. Genome sequence of a 45,000-year-old modern human from western Siberia. Nature 514(7523): 445–449.
- Fujimoto A., Kimura R., Ohashi J., Omi K., Yuliwulandari R., et al., 2008. A scan for genetic determinants of human hair morphology: EDAR is associated with Asian hair thickness. Hum. Mol. Genet. 17(6): 835–843.
- Gao Q., Srinivasan S., Boyer S. N., Wazer D. E., Band V., 1999. The E6 oncoproteins of high-risk papillomaviruses bind to a novel putative GAP protein, E6TP1, and target it for degradation. Mol. Cell. Biol. 19(1): 733–744.
- Gitiaux C., Ceballos-Picot I., Marie S., Valayannopoulos V., Rio M., et al., 2009. Misleading behavioural phenotype with adenylosuccinate lyase deficiency. Eur. J. Hum. Genet. 17(1): 133–136.
- Gong S., Zheng C., Doughty M. L., Losos K., Didkovsky N., et al., 2003. A gene expression atlas of the central nervous system based on bacterial artificial chromosomes. Nature 425(6961): 917–925.
- Green R. E., Krause J., Briggs A. W., Maricic T., Stenzel U., et al., 2010. A draft sequence of the Neandertal genome. Science 328(5979): 710–722.
- Grossman S. R., Shylakhter I., Karlsson E. K., Byrne E. H., Morales S., et al., 2010. A composite of multiple signals distinguishes causal variants in regions of positive selection. Science 327(5967): 883–886.
- Gudmundsson J., Sulem P., Rafnar T., Bergthorsson J. T., Manolescu A., et al., 2008. Common sequence variants on 2p15 and Xp11.22 confer susceptibility to prostate cancer. Nat. Genet. 40(3): 281–283.
- Guilherme A., Soriano N. A., Furcinitti P. S., Czech M. P., 2004. Role of EHD1 and EHBP1 in perinuclear sorting and insulin-regulated GLUT4 recycling in 3T3-L1 adipocytes. J. Biol. Chem. 279(38): 40062–40075.
- Gutenkunst R. N., Hernandez R. D., Williamson S. H., Bustamante C. D., 2009. Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data. PLoS Genet. 5(10): e1000695.
- Halaban R., Moellmann G., 1990. Murine and human b locus pigmentation genes encode a glycoprotein (gp75) with catalase activity. Proc. Natl. Acad. Sci. USA 87(12): 4809–4813.
- Han J., Kraft P., Nan H., Guo Q., Chen C., et al., 2008. A genome-wide association study identifies novel alleles associated with hair color and skin pigmentation. PLoS Genet. 4(5): e1000074.
- Henrion M., Frampton M., Scelo G., Purdue M., Ye Y., et al., 2013. Common variation at 2q22.3 (ZEB2) influences the risk of renal cancer. Hum. Mol. Genet. 22(4): 825–831.
- Hernandez R. D., Kelley J. L., Elyashiv E., Melton S. C., Auton A., et al., 2011. Classic selective sweeps were rare in recent human evolution. Science 331(6019): 920–924.
- Hershkovitz D., Bercovich D., Sprecher E., Lapidot M., 2008. RASA1 mutations may cause hereditary capillary malformations without arteriovenous malformations. Br. J. Dermatol. 158(5): 1035–1040.
- Hinch A. G., Tandon A., Patterson N., Song Y., Rohland N., et al., 2011. The landscape of recombination in African Americans. Nature 476(7359): 170–175.
- Hunt K. A., Zhernakova A., Turner G., Heap G. A. R., Franke L., et al., 2008. Newly identified genetic risk variants for celiac disease related to the immune response. Nat. Genet. 40(4): 395–402.
- Kanehisa M., Goto S., 2000. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 28(1): 27–30.
- Kenny E. E., Timpson N. J., Sikora M., Yee M.-C., Moreno-Estrada A., et al., 2012. Melanesian blond hair is caused by an amino acid change in TYRP1. Science 336(6081): 554.
- Kimura R., Yamaguchi T., Takeda M., Kondo O., Toma T., et al., 2009. A common variation in EDAR is a genetic determinant of shovel-shaped incisors. Am. J. Hum. Genet. 85(4): 528–535.
- Kircher M., Witten D. M., Jain P., O’Roak B. J., Cooper G. M., et al., 2014. A general framework for estimating the relative pathogenicity of human genetic variants. Nat. Genet. 46(3): 310–315.
- Kmoch S., Hartmannová H., Stibůrková B., Krijt J., Zikánová M., et al., 2000. Human adenylosuccinate lyase (ADSL), cloning and characterization of full-length cDNA and its isoform, gene structure and molecular basis for ADSL deficiency in six patients. Hum. Mol. Genet. 9(10): 1501–1513.
- Knabl J., Witschi R., Hösl K., Reinold H., Zeilhofer U. B., et al., 2008. Reversal of pathological pain through specific spinal GABAA receptor subtypes. Nature 451(7176): 330–334.
- Kofler R., Schlötterer C., 2012. Gowinda: unbiased analysis of gene set enrichment for genome-wide association studies. Bioinformatics 28(15): 2084–2085.
- Lazaridis I., Patterson N., Mittnik A., Renaud G., Mallick S., et al., 2014. Ancient human genomes suggest three ancestral populations for present-day Europeans. Nature 513(7518): 409–413.
- Lewontin R. C., Krakauer J., 1973. Distribution of gene frequency as a test of the theory of the selective neutrality of polymorphisms. Genetics 74: 175–195.
- Li H., Durbin R., 2011. Inference of human population history from individual whole-genome sequences. Nature 475(7357): 493–496.
- Li M. J., Wang P., Liu X., Lim E. L., Wang Z., et al., 2011. GWASdb: a database for human genetic variants identified by genome-wide association studies. Nucleic Acids Res. 40: D1047–D1054.
- Lindsay B. G., 1988. Composite likelihood methods. Contemp. Math. 80(1): 221–239.
- Lipson M., Loh P.-R., Levin A., Reich D., Patterson N., et al., 2013. Efficient moment-based inference of admixture parameters and sources of gene flow. Mol. Biol. Evol. 30(8): 1788–1802.
- Maaswinkel-Mooij P. D., Laan L. A. E. M., Onkenhout W., Brouwer O. F., Jaeken J., et al., 1997. Adenylosuccinase deficiency presenting with epilepsy in early infancy. J. Inherit. Metab. Dis. 20(4): 606–607.
- Marie S., Cuppens H., Heuterspreute M., Jaspers M., Tola E. Z., et al., 1999. Mutation analysis in adenylosuccinate lyase deficiency: eight novel mutations in the re-evaluated full ADSL coding sequence. Hum. Mutat. 13(3): 197–202.
- Mathieson I., Lazaridis I., Rohland N., Mallick S., Llamas B., et al., 2015. Eight thousand years of natural selection in Europe. bioRxiv: 016477.
- Meister G., Landthaler M., Peters L., Chen P. Y., Urlaub H., et al., 2005. Identification of novel argonaute-associated proteins. Curr. Biol. 15(23): 2149–2155.
- Mencarelli M. A., Spanhol-Rosseto A., Artuso R., Rondinella D., De Filippis R., et al., 2010. Novel FOXG1 mutations associated with the congenital variant of Rett syndrome. J. Med. Genet. 47(1): 49–53.
- Mercher T., Busson-Le Coniat M., Monni R., Mauchauffé M., Khac F. N., et al., 2001. Involvement of a human gene related to the Drosophila spen gene in the recurrent t(1;22) translocation of acute megakaryocytic leukemia. Proc. Natl. Acad. Sci. USA 98(10): 5776–5779.
- Messer P. W., 2013. SLiM: simulating evolution with selection and linkage. Genetics 194: 1037–1039.
- Meyer M., Kircher M., Gansauge M.-T., Li H., Racimo F., et al., 2012. A high-coverage genome sequence from an archaic Denisovan individual. Science 338(6104): 222–226.
- Nicholson G., Smith A. V., Jónsson F., Gústafsson Ó., Stefánsson K., et al., 2002. Assessing population differentiation and isolation from single-nucleotide polymorphism data. J. R. Stat. Soc. Ser. B Stat. Methodol. 64(4): 695–715.
- Oleksyk T. K., Zhao K., Francisco M., Gilbert D. A., O’Brien S. J., et al., 2008. Identifying selected regions from heterozygosity and divergence using a light-coverage genomic dataset from two human populations. PLoS ONE 3(3): e1712.
- Pace L., Salvan A., Sartori N., 2011. Adjusting composite likelihood ratio statistics. Stat. Sin. 21(1): 129.
- Paternoster L., Evans D. M., Nohr E. A., Holst C., Gaborieau V., et al., 2011. Genome-wide population-based association study of extremely overweight young adults–the GOYA study. PLoS ONE 6(9): e24303.
- Patterson N., Moorjani P., Luo Y., Mallick S., Rohland N., et al., 2012. Ancient admixture in human history. Genetics 192: 1065–1093.
- Pennacchio L. A., Ahituv N., Moses A. M., Prabhakar S., Nobrega M. A., et al., 2006. In vivo enhancer analysis of human conserved non-coding sequences. Nature 444(7118): 499–502.
- Perlis R. H., Huang J., Purcell S., Fava M., Rush A. J., et al., 2010. Genome-wide association study of suicide attempts in mood disorder patients. Am. J. Psychiatry 167(12): 1499–1507.
- Pickrell J. K., Pritchard J. K., 2012. Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genet. 8(11): e1002967.
- Pickrell J. K., Coop G., Novembre J., Kudaravalli S., Li J. Z., et al., 2009. Signals of recent positive selection in a worldwide sample of human populations. Genome Res. 19(5): 826–837.
- Pravtcheva D. D., Wise T. L., 2001. Disruption of Apc10/Doc1 in three alleles of oligosyndactylism. Genomics 72(1): 78–87.
- Prüfer K., Racimo F., Patterson N., Jay F., Sankararaman S., et al., 2014. The complete genome sequence of a Neanderthal from the Altai mountains. Nature 505(7481): 43–49.
- Race V., Marie S., Vincent M.-F., Van den Berghe G., 2000. Clinical, biochemical and molecular genetic correlations in adenylosuccinate lyase deficiency. Hum. Mol. Genet. 9(14): 2159–2165.
- Racimo F., Kuhlwilm M., Slatkin M., 2014. A test for ancient selective sweeps and an application to candidate sites in modern humans. Mol. Biol. Evol. 31(12): 3344–3358.
- Robinson P. N., Köhler S., Bauer S., Seelow D., Horn D., et al., 2008. The Human Phenotype Ontology: a tool for annotating and analyzing human hereditary disease. Am. J. Hum. Genet. 83(5): 610–615.
- Rosenbloom K. R., Dreszer T. R., Long J. C., Malladi V. S., Sloan C. A., et al., 2011. ENCODE whole-genome data in the UCSC Genome Browser: update 2012. Nucleic Acids Res. 40: D912–D917.
- Sabeti P. C., Reich D. E., Higgins J. M., Levine H. Z. P., Richter D. J., et al., 2002. Detecting recent positive selection in the human genome from haplotype structure. Nature 419(6909): 832–837.
- Sabeti P. C., Varilly P., Fry B., Lohmueller J., Hostetter E., et al., 2007. Genome-wide detection and characterization of positive selection in human populations. Nature 449(7164): 913–918.
- Sadakata T., Furuichi T., 2010. Ca2+-dependent activator protein for secretion 2 and autistic-like phenotypes. Neurosci. Res. 67(3): 197–202.
- Sapiro R., Kostetskii I., Olds-Clarke P., Gerton G. L., Radice G. L., et al., 2002. Male infertility, impaired sperm motility, and hydrocephalus in mice deficient in sperm-associated antigen 6. Mol. Cell. Biol. 22(17): 6298–6305.
- Schlebusch C. M., Skoglund P., Sjödin P., Gattepaille L. M., Hernandez D., et al., 2012. Genomic variation in seven Khoe-San groups reveals adaptation and complex African history. Science 338(6105): 374–379.
- Seguin-Orlando A., Korneliussen T. S., Sikora M., Malaspinas A.-S., Manica A., et al., 2014. Genomic structure in Europeans dating back at least 36,200 years. Science 346(6213): 1113–1118.
- Siddique H. R., Saleem M., 2012. Role of BMI1, a stem cell factor, in cancer recurrence and chemoresistance: preclinical and clinical evidences. Stem Cells 30(3): 372–378.
- Siepel A., Bejerano G., Pedersen J. S., Hinrichs A. S., Hou M., et al., 2005. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15(8): 1034–1050.
- Smith J. M., Haigh J., 1974. The hitch-hiking effect of a favourable gene. Genet. Res. 23(1): 23–35.
- Suhre K., Wallaschofski H., Raffler J., Friedrich N., Haring R., et al., 2011. A genome-wide association study of metabolic traits in human urine. Nat. Genet. 43(6): 565–569.
- Todd J. A., Walker N. M., Cooper J. D., Smyth D. J., Downes K., et al., 2007. Robust associations of four new chromosome regions from genome-wide analyses of type 1 diabetes. Nat. Genet. 39(7): 857–864.
- Topletz A. R., Thatcher J. E., Zelter A., Lutz J. D., Su Tay W. L., 2012. Comparison of the function and expression of CYP26A1 and CYP26B1, the two retinoic acid hydroxylases. Biochem. Pharmacol. 83(1): 149–163.
- Trahey M., Wong G., Halenbeck R., Rubinfeld B., Martin G. A., et al., 1988. Molecular cloning of two types of GAP complementary DNA from human placenta. Science 242(4886): 1697–1700.
- Van Keuren M. L., Hart I. M., Kao F.-T., Neve R. L., Bruns G. A. P., et al., 1987. A somatic cell hybrid with a single human chromosome 22 corrects the defect in the CHO mutant (Ade-I) lacking adenylosuccinase activity. Cytogenet. Genome Res. 44(2–3): 142–147.
- Varin C., Reid N., Firth D., 2011. An overview of composite likelihood methods. Stat. Sin. 21(1): 5–42.
- Voight B. F., Kudaravalli S., Wen X., Pritchard J. K., 2006. A map of recent positive selection in the human genome. PLoS Biol. 4(3): e72.
- Weir B. S., Cardon L. R., Anderson A. D., Nielsen M. D., Hill W. G., 2005. Measures of human population structure show heterogeneity among genomic regions. Genome Res. 15(11): 1468–1476.
- Welter D., MacArthur J., Morales J., Burdett T., Hall P., et al., 2014. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res. 42(D1): D1001–D1006.
- White J. A., Ramshaw H., Taimi M., Stangle W., Zhang A., et al., 2000. Identification of the human cytochrome P450, P450RAI-2, which is predominantly expressed in the adult cerebellum and is responsible for all-trans-retinoic acid metabolism. Proc. Natl. Acad. Sci. USA 97(12): 6403–6408.
- Whiting P. J., Bonnert T. P., McKernan R. M., Farrar S., Le Bourdelles B., et al., 1999. Molecular and functional diversity of the expanding GABA-A receptor gene family. Ann. N. Y. Acad. Sci. 868(1): 645–653.
- Xiang Y.-Y., Wang S., Liu M., Hirota J. A., Li J., et al., 2007. A GABAergic system in airway epithelium is essential for mucus overproduction in asthma. Nat. Med. 13(7): 862–867.
- Yi X., Liang Y., Huerta-Sanchez E., Jin X., Ping Cuo Z. X., et al., 2010. Sequencing of 50 human exomes reveals adaptation to high altitude. Science 329(5987): 75–78.
- Zhernakova A., Elbers C. C., Ferwerda B., Romanos J., Trynka G., et al., 2010. Evolutionary and functional analysis of celiac risk loci reveals SH2B3 as a protective factor against bacterial infection. Am. J. Hum. Genet. 86(6): 970–977.