Abstract
Genetic evidence has revealed that the ancestors of modern human populations outside Africa and their hominin sister groups, notably Neanderthals, exchanged genetic material in the past. The distribution of these introgressed sequence-tracts along modern-day human genomes provides insight into the selective forces acting on them and the role of introgression in the evolutionary history of hominins. Studying introgression patterns on the X-chromosome is of particular interest, as sex chromosomes are thought to play a special role in speciation. Recent studies have developed methods to localize introgressed ancestries, reporting long regions that are depleted of Neanderthal introgression and enriched in genes, suggesting negative selection against the Neanderthal variants. On the other hand, enriched Neanderthal ancestry in hair- and skin-related genes suggests that some introgressed variants facilitated adaptation to new environments. Here, we present a model-based introgression detection method called diCal-admix. We demonstrate its efficiency and accuracy through extensive simulations, and apply it to detect tracts of Neanderthal introgression in modern human individuals from the 1000 Genomes Project. Our findings are largely concordant with previous studies, consistent with weak selection against Neanderthal ancestry. We find evidence that selection against Neanderthal ancestry was due to higher genetic load in Neanderthals resulting from small effective population size, rather than widespread Dobzhansky-Müller incompatibilities (DMIs) that could contribute to reproductive isolation. Moreover, we confirm the previously reported low level of introgression on the X-chromosome, but find little evidence that DMIs contributed to this pattern.
1. Introduction
In recent years, researchers have gathered an increasing amount of high-quality genomic sequencing data from human individuals that lived thousands of years ago (Mathieson et al., 2015) and individuals of extinct hominin sister groups (Meyer et al., 2012; Prüfer et al., 2014; Prüfer et al., 2017). These ancient samples provide unprecedented opportunities to elucidate the evolution of modern human populations and their relation to other hominins. Previous genetic evidence revealed that the ancestors of non-African humans exchanged genetic material with Neanderthal individuals after emerging out of Africa. Traces of this introgression can still be found in the genomes of modern-day humans. The emerging high-quality genomic sequence data for ancient hominins not only confirm these findings, but also allow detection of the exact location of these introgressed sequence fragments in the genomes of modern human individuals and a better understanding of their functional relevance.
Recent studies reported long regions in modern non-African individuals that are depleted of Neanderthal ancestry and enriched in genes, suggesting general negative selection against Neanderthal variants in genes (Sankararaman et al., 2014; Vernot and Akey, 2014). Harris and Nielsen (2016) and Juric et al. (2016) provided further evidence that natural selection has acted to remove these introgressed segments, but there is some debate about the precise cause of this selective pressure. Dobzhansky-Müller incompatibilities (DMIs) are a classic explanation for selection acting against introgressed alleles and play an important role in the evolution of reproductive isolation during speciation. DMIs involve alleles that have arisen separately in each population and are neutral or adaptive in isolation, but are deleterious when brought together in individuals of hybrid ancestry (Dobzhansky, 1936; Orr, 1995). DMIs have been observed in the hybrids of other species, e.g., Drosophila simulans and D. melanogaster (Brideau et al., 2006); Mimulus guttatus and M. nasutus (Fishman and Willis, 2001); and Ambystoma californiense and A. tigrinum mavortium (Fitzpatrick, 2008). They were also hypothesized to be the cause (particularly in male hybrids) of selection against Neanderthal ancestry in modern humans (Sankararaman et al., 2014). This hypothesis was motivated by finding significant enrichment of testes-expressed genes in regions of low Neanderthal ancestry and a substantial reduction of Neanderthal ancestry on the X-chromosome. Indeed, sex chromosomes are thought to play a special role in speciation, and it has been observed, for example in Drosophila, that loci contributing to reduced male fertility in hybrids are concentrated on the X-chromosome (Presgraves, 2008).
Sankararaman et al. (2014) and Vernot and Akey (2014) reported some allelic variants associated with genetic diseases in genome-wide association studies (GWAS) that might have originated from the Neanderthal population. The studies also reported enriched Neanderthal ancestry in hair and skin related genes (keratin pathways), which suggests that these introgressed variants could have helped modern non-African populations to adapt to their local environments. All in all, the availability of Neanderthal introgression maps in modern humans has led to numerous follow-up studies investigating the various functional, evolutionary and medical implications of archaic hominin introgression into modern humans (Dannemann et al., 2017; Gittelman et al., 2016; Rogers, 2015; Simonti et al., 2016; Schumer et al., 2017).
Methods to detect Neanderthal introgression tracts include a machine-learning based approach (Sankararaman et al., 2014) that operates on suitably chosen “features” of the genetic data, and an approach based on sequence identity and divergence (Vernot and Akey, 2014). These methods have been extended to jointly detect Neanderthal and Denisovan introgression (Sankararaman et al., 2016; Vernot et al., 2016); the latter is found to be more prevalent in Oceania and Southeast-Asia.
In this article, we present a modification of the method diCal 2.0, previously developed by Steinrücken et al. (2015) for the inference of complex demographic histories. This modification, which we call diCal-admix, can be used to efficiently detect tracts of introgressed Neanderthal DNA. It is based on a hidden Markov model (HMM) approach that explicitly accounts for the underlying demographic history relating modern human and Neanderthal populations, including the introgression event. We first present our model-based method for detecting Neanderthal introgression and demonstrate through extensive simulations that our method is able to efficiently and accurately detect introgression in simulated data. We then apply our method to sequence data of modern humans from the 1000 Genomes Project and a high coverage genome from a Neanderthal individual from the Altai mountains (Prüfer et al., 2014). Our results are in general agreement with previously obtained results, and we discuss similarities, differences, and their functional implications. In particular, we do not find evidence to support Sankararaman et al.’s hypothesis that DMIs played a role in shaping the pattern of introgression either genome-wide or specifically on the X-chromosome.
2. Materials and Methods
2.1. Overview of our method
The method to detect Neanderthal introgression that we present here accounts explicitly for the underlying demographic history relating modern humans and Neanderthals. Therefore we briefly present some of the key features of this demographic model. Researchers have studied various aspects of the ancestral relations between modern African and non-African individuals using different methodologies. Furthermore, many studies have investigated the divergence of Neanderthals and modern humans. These studies resulted in several different, albeit largely consistent, estimates for the relevant demographic parameters, and here we closely follow Sankararaman et al. (2014) (specifically Figure SI2.1 of that paper) for consistency. However, note that Sankararaman et al. used a mutation rate of 2.5 × 10⁻⁸ per-site per-generation, whereas we use 1.25 × 10⁻⁸; thus, some of the numbers need to be adjusted.
The demographic model we use is depicted in Figure 1(a). The size of the most ancestral population and the size of the population ancestral to modern humans before the expansion are set to N = 11,000. The size of the population ancestral to modern humans after the expansion and the size of the African population are set to N = 23,000. These sizes are consistent with the estimates provided by Jouganous et al. (2017). The size of the non-African population after the split is set to N = 2,000. The size of the Neanderthal population is set to N = 2,000 as well, since it has been shown by Prüfer et al. (2014) that the Neanderthal population size declined rapidly as the population neared extinction. The more recent small size will have a stronger impact on genetic variation than the larger ancestral size. Several studies (Jouganous et al., 2017) report a strong population bottleneck in European and Asian populations after the out-of-Africa event, followed by rapid exponential population growth, and Sankararaman et al. (2014) incorporate this into their demographic model. As we will detail later, our method considers each non-African haplotype one at a time. The genetic processes along a single ancestral lineage are not affected by the exact population size history, and thus we do not explicitly include these details in the demographic model used here.
Following Prüfer et al. (2014), we set the time of divergence between modern humans and Neanderthals, Tnean, to 26,000 generations ago, which corresponds to 650 kya, assuming a generation time of 25 years, and the time that the population ancestral to modern humans expands is set to 12,000 generations ago (Jouganous et al., 2017). The split between Africans and non-Africans, Tdiv, is set to 4,000 generations ago, or 100 kya (Scally and Durbin, 2012), and the time of the introgression or admixture event, Tadmix, is set to 2,000 generations ago (Sankararaman et al., 2012), which corresponds to 50 kya (Prüfer et al., 2014). Finally, the introgression coefficient is set to 3%, that is, a non-African individual at the time of introgression had a 3% chance that its parent was a Neanderthal individual. This is consistent with previous estimates of this quantity obtained by Green et al. (2010) and Juric et al. (2016). In Section 2.2, we will demonstrate the robustness of our method to misspecification of key parameters. It has been debated whether all non-African populations received genetic material from Neanderthals in a single pulse of introgression or several pulses. Here we assume that there has been only one pulse, and will defer disentangling such models to future work.
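The time parameters above can be collected in one place and converted from generations to calendar time. The following is a minimal sketch for checking the arithmetic only; the dictionary keys (including the hypothetical name `T_expand`) and the helper function are illustrative and are not part of diCal-admix.

```python
# Demographic parameters as stated in the text (times in generations).
# The key names are illustrative labels, not identifiers used by diCal-admix.
GENERATION_TIME_YEARS = 25  # generation time assumed in the text

PARAMS = {
    "T_nean": 26_000,    # modern human / Neanderthal divergence
    "T_expand": 12_000,  # expansion of the population ancestral to modern humans
    "T_div": 4_000,      # African / non-African split
    "T_admix": 2_000,    # Neanderthal introgression pulse
    "admix": 0.03,       # introgression coefficient
}


def generations_to_kya(generations, generation_time=GENERATION_TIME_YEARS):
    """Convert a time in generations to thousands of years ago (kya)."""
    return generations * generation_time / 1000.0


for name in ("T_nean", "T_expand", "T_div", "T_admix"):
    kya = generations_to_kya(PARAMS[name])
    print(f"{name}: {PARAMS[name]} generations = {kya:.0f} kya")
```

With a 25-year generation time this reproduces the 650 kya, 100 kya, and 50 kya figures quoted above.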
Before we describe our method to detect tracts of Neanderthal introgression, we provide a brief overview of the methods developed by Sankararaman et al. (2014) and Vernot and Akey (2014) for comparison. In Sankararaman et al. (2014), the authors employ a machine learning approach based on Conditional Random Fields, discriminative analogs of HMMs. To apply this framework, the authors represent the genotype data of the reference Neanderthal, the reference African population, and the focal non-African population in terms of “features” that are informative to distinguish between introgressed and non-introgressed sequence tracts. The authors chose three different classes of features: the distribution of alleles at informative SNPs; a measure of sequence divergence between the focal individual and both reference populations; and a feature to match the length distribution of the observed tracts to be consistent with the expectation from an introgression event 37–86 kya. The authors train the model on data simulated under a demographic scenario similar to Figure 1(a), and apply the trained model to detect introgression tracts in individuals from the 1000 Genomes dataset (The 1000 Genomes Project Consortium, 2012). A modified version of this methodology was also applied in Sankararaman et al. (2016) to detect Denisovan ancestry in Southeast-Asians.
Vernot and Akey (2014) developed a two-stage procedure to detect introgression tracts. In the first stage, the authors computed S* statistics (Plagnol and Wall, 2006) in sliding windows along the genome. This statistic is sensitive to increased levels of diversity in high linkage disequilibrium, indicating a more ancient TMRCA of a given region and also considers the tract length to detect archaic introgression. Notably, this stage does not require a reference sequence for an archaic individual. In the second stage, the authors compare the identified segments to a reference Neanderthal genome to reliably identify Neanderthal introgression. The same two-stage approach was applied by Vernot et al. (2016) to identify Denisovan introgression.
Here, we apply a modified version of the method diCal 2.0, developed by Steinrücken et al. (2015) for inference of ancient demographies, to detect sequence tracts of Neanderthal DNA introgressed into modern humans. The method is based on the conditional sampling distribution (CSD) (Paul and Song, 2010; Paul et al., 2011; Steinrücken et al., 2013), which is similar to the copying model of Li and Stephens (2003), and is depicted in Figure 1(b). The CSD describes the distribution of sampling an additional focal genome or haplotype, conditional on having already observed a certain set of haplotypes. Steinrücken et al. (2015) introduce a version of the CSD that can be applied to haplotypes sampled from several subpopulations, accounting explicitly for the underlying demographic history. Under this model, the unknown genealogy relating the already observed haplotypes is approximated by a trunk genealogy of unchanging ancestral lineages extending infinitely into the past. At each locus, the ancestral lineage of the additional haplotype is absorbed into a lineage of the trunk. The dynamics of absorption depends on the underlying demographic history. In brief, lineages in different subpopulations cannot coalesce, unless continuous or point migration is possible at given rates, and the likelihood of coalescence is larger in small populations, but decreases in large populations. If an ancestral recombination event separates two loci, the haplotype of absorption and the time of absorption can change, thus different genomic segments can be copied from different haplotypes in the trunk, and the additional haplotype is realized as a mosaic of the observed haplotypes. The CSD can be implemented as an HMM along the genome of the additional haplotype, where the hidden state is the trunk haplotype that the genetic material is currently copied from, and a time of absorption. 
This absorption time is proportional to the likelihood that mutations can alter the genetic type at a given locus. Steinrücken et al. (2015) derived the emission and transition probabilities for the underlying HMM under general demographic models.
This CSD can be applied to detect introgressed tracts of Neanderthal ancestry in modern humans as follows. First, fix the underlying demography given in Figure 1(a) that relates modern African populations, modern non-African populations, and the Neanderthal population. The introgression event is modeled as a point migration. As depicted in Figure 1(b), the haplotypes sampled in the African population and a Neanderthal haplotype are used as the trunk haplotypes in their respective sub-population. Then, each non-African sample is, in turn, used as the additional haplotype in the non-African sub-population. Computing the forward and backward algorithm under the HMM for this CSD yields a marginal posterior distribution over the hidden states at each locus. Recall that these hidden states consist of both an absorption time and an absorbing haplotype. Marginalizing over the absorption time results in a posterior distribution over the trunk haplotypes, and, grouping haplotypes by sub-population, this gives a probability at each locus that the locus in the non-African haplotype is obtained from either an African (modern human) ancestor, or from a Neanderthal ancestor through introgression. Note that using an explicit time for the introgression event in the demographic model implicitly specifies a prior distribution for the length of the introgressed tracts. The software implementation of this method, diCal-admix, is available at http://dical-admix.sourceforge.net/.
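The posterior decoding step described above can be illustrated with a generic forward-backward pass on a toy HMM. This is only a sketch under simplifying assumptions: the initial, transition, and emission probabilities below are made-up numbers, not the demography-aware quantities derived by Steinrücken et al. (2015), and the function names are hypothetical. Each hidden state stands for a (trunk haplotype, absorption time) pair, and summing the posterior over all states whose trunk haplotype is Neanderthal both marginalizes over time and groups by sub-population.

```python
def forward_backward(init, trans, emit):
    """Posterior state marginals of a discrete HMM (scaled forward-backward).

    init[s]     : initial probability of hidden state s
    trans[s][t] : transition probability from state s to state t
    emit[i][s]  : probability of the data at locus i given state s
    """
    n_loci, n_states = len(emit), len(init)
    states = range(n_states)
    fwd, scale = [], []
    for i in range(n_loci):
        if i == 0:
            row = [init[s] * emit[0][s] for s in states]
        else:
            row = [emit[i][t] * sum(fwd[-1][s] * trans[s][t] for s in states)
                   for t in states]
        c = sum(row)  # scaling constant, avoids numerical underflow
        scale.append(c)
        fwd.append([x / c for x in row])
    bwd = [[1.0] * n_states for _ in range(n_loci)]
    for i in range(n_loci - 2, -1, -1):
        bwd[i] = [sum(trans[s][t] * emit[i + 1][t] * bwd[i + 1][t] for t in states)
                  / scale[i + 1] for s in states]
    post = []
    for f, b in zip(fwd, bwd):
        w = [x * y for x, y in zip(f, b)]
        z = sum(w)
        post.append([x / z for x in w])
    return post


def introgression_posterior(post, state_population, target="NEA"):
    """Sum the posterior over all hidden states copied from the target population."""
    return [sum(p for p, pop in zip(row, state_population) if pop == target)
            for row in post]


# Toy example: two time bins for an African trunk haplotype, two for a Neanderthal one.
STATE_POP = ["AFR", "AFR", "NEA", "NEA"]
init = [0.485, 0.485, 0.015, 0.015]          # 3% prior mass on Neanderthal states
trans = [[0.97 if s == t else 0.01 for t in range(4)] for s in range(4)]
afr_like = [0.9, 0.9, 0.1, 0.1]              # locus resembling the African trunk
nea_like = [0.001, 0.001, 0.9, 0.9]          # locus resembling the Neanderthal trunk
emit = [afr_like, afr_like, nea_like, nea_like, afr_like, afr_like]

posterior = introgression_posterior(forward_backward(init, trans, emit), STATE_POP)
print([round(p, 3) for p in posterior])
```

In this toy input the two middle loci receive a high introgression posterior and the flanking loci a low one, which is the qualitative behavior the decoding is meant to capture.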
2.2. Simulation study
To demonstrate that diCal-admix can be used to accurately and efficiently identify tracts of Neanderthal introgression, we performed an extensive simulation study. To this end, we used the coalescent simulator msprime (Kelleher et al., 2016) to simulate sequence data under the demographic model given in Figure 1(a). Specifically, we simulated 176 haplotypes in the African population, one haplotype in the Neanderthal population, and 20 in the focal non-African population. For each such dataset we simulated 20 Mbp of sequence data, using a per generation population scaled mutation and recombination rate of 0.0005 per base (corresponding to a per-base per-generation rate of 1.25 × 10⁻⁸). We simulated 50 replicates in each scenario, and estimated the introgression tracts on each focal haplotype.
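The relation between the population-scaled rate and the per-generation rate can be checked with a short calculation. The sketch below assumes the conventional diploid scaling 4 × N_ref × rate with a reference size of N_ref = 10,000; this reference size is not stated in the text and is chosen here only because it reproduces the quoted value of 0.0005.

```python
# ASSUMPTION: reference population size used for scaling; not stated in the text.
N_REF = 10_000
PER_GENERATION_RATE = 1.25e-8  # per-base per-generation rate from the text


def population_scaled_rate(per_generation_rate, n_ref=N_REF):
    """Conventional diploid coalescent scaling: 4 * N_ref * rate."""
    return 4 * n_ref * per_generation_rate


scaled = population_scaled_rate(PER_GENERATION_RATE)
print(f"population-scaled rate per base: {scaled:.4g}")
```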
To investigate how robust our method is to misspecification of the demographic model used for inference, our simulation study was two-fold. First, we simulated data under the demographic model given in Figure 1(a), and varied the demographic model used for the analysis. We kept all parameters fixed and only varied one focal parameter at a time. We varied the divergence time between modern humans and Neanderthal, using 0.5 and 2 times Tnean = 26,000 generations, and the divergence time between Africans and non-Africans, using 0.8 and 2 times Tdiv = 4,000 generations. Moreover, we varied the fraction of introgressed Neanderthal individuals, using 0.5 and 2 times admix = 3%, and the time of the introgression event, using 0.8 and 1.25 times Tadmix = 2,000 generations. We also considered the effect of simulating under spatially heterogeneous recombination rates, taking 20 Mbp from the recombination maps inferred in Myers et al. (2005); we call this model the “HapMap” model. Finally, we considered four more complicated models. These models are the same as that given in Figure 1(a) except in the ways that we highlight below. While some of these models may not be indicative of any hypothesized demographic events, they show the robustness of diCal-admix and highlight its potential utility in other scenarios. In the “CEU ghost” model, a non-African “ghost” population diverges from a basal CEU population 2,300 generations ago, and 500 generations ago the two populations admix with 30% of their ancestry deriving from the ghost population and the remaining 70% coming from the basal CEU population. In the “YRI ghost” model, an ancient hominid ghost population diverges from the Neanderthal lineage 17,000 generations ago, and then there is a 3% pulse of admixture from this ghost population into YRI 1,700 generations ago. In the “Nean into YRI” model, there is a 3% pulse of admixture into both YRI and CEU 2,000 generations ago. Lastly, in the “Mig” model, YRI and CEU exchange migrants at a rate of 0.001% of the population per generation.
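The one-parameter-at-a-time design described above can be enumerated programmatically. The following sketch builds the eight misspecified analysis models from the multipliers given in the text; the dictionary keys and the function name are illustrative and do not correspond to diCal-admix configuration options.

```python
# "True" model parameters and the multipliers applied to each, as given in the text.
TRUE_MODEL = {"T_nean": 26_000, "T_div": 4_000, "T_admix": 2_000, "admix": 0.03}
FACTORS = {
    "T_nean": (0.5, 2.0),
    "T_div": (0.8, 2.0),
    "admix": (0.5, 2.0),
    "T_admix": (0.8, 1.25),
}


def one_at_a_time_variants(true_model, factors):
    """Return (label, model) pairs in which exactly one parameter is rescaled."""
    variants = []
    for name, multipliers in factors.items():
        for m in multipliers:
            model = dict(true_model)          # copy so other parameters stay fixed
            model[name] = true_model[name] * m
            variants.append((f"{name} x {m}", model))
    return variants


for label, model in one_at_a_time_variants(TRUE_MODEL, FACTORS):
    print(label, model)
```

Note that every variant keeps the ordering Tadmix < Tdiv < Tnean, so each misspecified model remains a valid demography.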
For simulated data, the true introgression status at each locus is known. Running diCal-admix to detect introgressed tracts in the simulated non-African individuals yields a posterior probability of introgression at each locus. Using different thresholds on these posterior probabilities for calling a locus introgressed, we can assess the true and false positive rate and the precision to generate receiver operating characteristic (ROC) and precision-recall curves for each analysis. Figure 2(a) shows these curves when the data simulated under the “true” model (Figure 1(a)) are analyzed using the different demographic models. We observe an overall good performance and robustness against misspecification of the parameters for analyzing the data. From these plots, we determined that a threshold of 0.42 yields a good balance of the different performance metrics. We indicated this threshold along the curves by a red cross. Note that it is possible to increase the true positive rate by lowering the threshold; however, the decrease in precision would be more severe. Thus, we used this threshold in the remainder to call introgression tracts in the 1000 Genomes dataset. The ROC and precision-recall curves for the four more complicated scenarios and the HapMap model are provided in Section S2.1 of the Supporting Information.
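The evaluation described above amounts to sweeping a threshold over the per-locus posterior and comparing the resulting calls to the known truth. A minimal sketch follows; the function names are hypothetical, and calling a locus introgressed when the posterior is greater than or equal to the threshold (rather than strictly greater) is an assumption.

```python
def classification_rates(posterior, truth, threshold):
    """Return (true positive rate, false positive rate, precision) at one threshold.

    posterior[i] : posterior introgression probability at locus i
    truth[i]     : True if locus i is truly introgressed in the simulation
    """
    tp = fp = fn = tn = 0
    for p, t in zip(posterior, truth):
        called = p >= threshold  # ASSUMPTION: inclusive threshold
        if called and t:
            tp += 1
        elif called and not t:
            fp += 1
        elif t:
            fn += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    precision = tp / (tp + fp) if tp + fp else 1.0  # convention when nothing is called
    return tpr, fpr, precision


def curves(posterior, truth, thresholds):
    """ROC points (fpr, tpr) and precision-recall points (tpr, precision)."""
    roc, pr = [], []
    for th in thresholds:
        tpr, fpr, precision = classification_rates(posterior, truth, th)
        roc.append((fpr, tpr))
        pr.append((tpr, precision))
    return roc, pr


# Toy data: four loci, the first two truly introgressed.
toy_posterior = [0.9, 0.5, 0.4, 0.1]
toy_truth = [True, True, False, False]
print(classification_rates(toy_posterior, toy_truth, 0.42))
```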
For the second part of the simulation study, we simulated data under the different demographic models. We then analyzed each dataset using the same parameters as used for the simulation on the one hand, and using the parameters of the “true” model (Figure 1(a)) on the other hand. The ROC and the Precision-recall curves for varying the introgression percentage are depicted in Figure 2(b) and Figure 2(c), and the curves for the remaining scenarios are given in Section S2.1 of the Supporting Information. Again, we observe that misspecifying the parameters of the analysis does not affect the performance substantially. The ROC curves demonstrate a good performance in terms of true and false positive rate in most scenarios, however, the precision-recall curves exhibit a poorer performance in some scenarios. This is to be expected, as in some scenarios, the Neanderthal and African population are genetically closer, for example, when the divergence time between Africans and non-Africans is increased, or the divergence time between Neanderthal and modern humans is decreased. If the populations are more closely related, it becomes more difficult to distinguish introgressed variation from variation shared between modern human populations. Lastly, we examine the detection power of our method with varying tract length in Section S2.2 of the Supporting Information, and observe good performance across all sizes.
3. Results
3.1. Neanderthal introgression in the 1000 Genomes data
We applied diCal-admix to detect tracts of Neanderthal introgression in non-African individuals from the Phase I dataset of the 1000 Genomes Project (The 1000 Genomes Project Consortium, 2012), focusing on Europeans (CEU) and East Asians (CHB and CHS) in particular. These data were collected using low-coverage short-read sequencing, potentially affecting the quality of the genotype calls. However, we believe that this should not introduce substantial bias into our results. Moreover, we applied the strict mappability mask (The 1000 Genomes Project Consortium, 2012) to exclude from our analysis the genomic regions where no confident genotype calls were made in the 1000 Genomes dataset. We used the 88 YRI individuals (176 haplotypes) from this dataset as reference African haplotypes, assumed to have no introgressed genetic material from Neanderthals, that serve essentially as a modern human reference panel. We applied diCal-admix to compute the marginal posterior introgression probability along the genomic sequences of each of the 85 CEU individuals (170 haplotypes), 97 CHB individuals (194 haplotypes), and 100 CHS individuals (200 haplotypes) in turn. We used a high-coverage genomic sequence from an Altai Neanderthal individual (Prüfer et al., 2014) as a Neanderthal reference. Prüfer et al. (2014) presented different genome alignability filters (Prüfer et al., 2014, SI 5b), and we used the map35_50%-filter, since this filter was suggested by the authors to be most appropriate for population genomic analyses.
diCal-admix requires that the genomic data be phased into haplotype sequences. The 1000 Genomes dataset is computationally phased, so we could use these data as provided; however, the diploid sequence of the Neanderthal individual cannot be phased using standard statistical methods. We instead used an additional pre-processing step to obtain a pseudo-haplotype sequence. As noted by Prüfer et al. (2014), the Altai Neanderthal individual exhibits only a sixth of the heterozygosity of modern non-African individuals. Thus, the number of ambiguous sites that require phasing is small. We tested three different methods to obtain a haplotype allele for these remaining sites: choosing an allele uniformly at random, using the ancestral allele only, and using the derived allele only, where the ancestral states at each locus were determined using a six-primate consensus (Paten et al., 2008). We observed little difference in our results between the different approaches, and thus we only present the results using the derived-allele approach. We used a mutation rate of 1.25 × 10⁻⁸ per-site per-generation (Scally and Durbin, 2012) and chromosome-specific recombination rates obtained by averaging the fine-scale rates provided by Kong et al. (2010). Note that we made the simplifying assumption that the recombination rate is constant within each chromosome. Due to computational considerations, we did not compute the posterior at every genomic site, but rather grouped sites together into 500 bp windows; the details of this procedure are provided in Steinrücken et al. (2015). Furthermore, we applied a moving average filter of length 15 kbp to the raw posterior in a post-processing step, to smooth sudden changes. We empirically observed that this filtering step improves detection.
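The smoothing step can be sketched as follows. With 500 bp windows, a 15 kbp filter spans 30 consecutive windows. The exact alignment of the filter and its treatment of chromosome ends are not specified in the text, so the roughly centered window and the truncation at the boundaries used here are assumptions.

```python
WINDOW_BP = 500
FILTER_BP = 15_000
FILTER_WIDTH = FILTER_BP // WINDOW_BP  # 30 windows of 500 bp


def moving_average(values, width=FILTER_WIDTH):
    """Smooth per-window posteriors with a moving average.

    ASSUMPTION: the filter covers windows [i - width//2, i + width - width//2)
    and is truncated at the ends of the sequence.
    """
    half = width // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + width - half)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


# A single isolated spike in the raw posterior is spread over 30 windows.
raw = [0.0] * 100
raw[50] = 1.0
smooth = moving_average(raw)
print(max(smooth))
```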
Moreover, we obtained from Sankararaman et al. (2014) the likelihoods of Neanderthal introgression that they computed for the same individuals. To compare these calls obtained at the SNPs to the ones obtained using diCal-admix, we interpolated the Sankararaman et al. likelihoods at the position in the middle of the 500 bp windows employed by diCal-admix. As advised by Sankararaman et al., we used a threshold of 0.89 to call Neanderthal introgression tracts. Vernot and Akey (2014) also identified tracts of Neanderthal ancestry in the individuals from the 1000 Genomes dataset, excluding the X-chromosome. We downloaded the population summaries from http://akeylab.princeton.edu/downloads.html and compared them to the results obtained with diCal-admix, when possible.
We computed the average of the marginal posterior introgression probability obtained at each locus using diCal-admix across each chromosome and across all CEU individuals, and separately across all CHB+CHS individuals. We performed the same averaging for the posterior probabilities obtained by Sankararaman et al. (2014). Figure 3(a) shows the results for each chromosome in the CEU population, while Figure 3(b) shows the results for the CHB+CHS population. We find an average introgression probability of 1.48% in the CEU autosomes, and 1.80% in the CHB+CHS autosomes, whereas Sankararaman et al. report 1.13% and 1.35%, respectively. However, the average amount of introgression detected using diCal-admix varies, from as low as 0.57% on chromosome 17 in the CEU population, to as high as 3.29% on chromosome 9 in the CHB+CHS population. Compared to the autosomes, Sankararaman et al. previously reported a lower amount of introgression on the X-chromosome in CEU (0.18%) as well as in CHB+CHS (0.24%). Similarly, we observed a roughly four-fold decrease on the X-chromosome when compared to the autosomes in CEU (0.38%) and CHB+CHS (0.54%). The general patterns of introgression stratified by chromosome are in good agreement with Sankararaman et al. However, diCal-admix detects on average 30% more introgression, which might be attributable to the fact that it detects more short tracts (see also Figure 7). Note that most Neanderthal ancestry proportions are lower than the 3% that was assumed in our model, which indicates that Neanderthal ancestry has been preferentially removed. In Section 3.3, we discuss in more detail possible mechanisms for this purging of Neanderthal ancestry.
The posterior distributions along the chromosomes allow for a more detailed view of Neanderthal introgression into modern humans as it varies along the genome. We determined whether a given locus is admixed on a particular haplotype by thresholding the posterior generated using diCal-admix at 0.42 and thresholding the posterior from Sankararaman et al. (2014) at 0.89. We then averaged these calls across 1 Mbp windows and across the individuals in the respective populations, and plotted the result as piece-wise constant functions. The skyline plots in Figure 4 show the percentage of Neanderthal introgression along chromosome 4 and the X-chromosome in the CEU population. (In Section S3 of the Supporting Information, we provide skyline plots for all chromosomes in the CEU and the CHB+CHS population.) In addition, we indicated the regions on the autosomes that were identified in Vernot and Akey (2014) to be introgressed. As mentioned earlier, the X-chromosome was excluded in their study. We see good agreement between the calls made using diCal-admix and the calls from Sankararaman et al. (2014). Furthermore, the regions of introgression detected by Vernot and Akey (2014) cluster in regions where the skyline plots indicate introgressed genetic material.
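The construction of the skyline values can be sketched as below: per-haplotype posteriors on 500 bp windows are thresholded, and the resulting calls are averaged over 1 Mbp bins (2,000 windows) and over all haplotypes. The function name and the input layout are hypothetical, and the inclusive threshold is an assumption.

```python
def skyline(posteriors, threshold=0.42, windows_per_bin=2000):
    """Fraction of introgressed calls per bin, averaged over haplotypes.

    posteriors[h][w] : posterior for haplotype h in 500 bp window w
    windows_per_bin  : 2000 windows of 500 bp correspond to 1 Mbp
    """
    n_haplotypes = len(posteriors)
    n_windows = len(posteriors[0])
    values = []
    for start in range(0, n_windows, windows_per_bin):
        end = min(start + windows_per_bin, n_windows)
        called = sum(1 for hap in posteriors
                     for p in hap[start:end] if p >= threshold)
        values.append(called / (n_haplotypes * (end - start)))
    return values


# Toy input: two haplotypes, four windows, bins of two windows each.
toy = [[0.9, 0.9, 0.1, 0.1],
       [0.1, 0.9, 0.1, 0.1]]
print(skyline(toy, windows_per_bin=2))
```

The same function applies to the Sankararaman et al. (2014) posteriors by passing `threshold=0.89`.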
To investigate the shared features and differences between the introgression call-sets, we generated Venn diagrams. For the diCal-admix posterior and the posterior from Sankararaman et al. (2014), we used the aforementioned thresholds to call introgression tracts in the CEU and CHB+CHS individuals. For each individual, we assessed at each locus whether either method, both methods, or no method detected Neanderthal introgression, and averaged these indicators to get population-wide percentages for the whole genome. Figures 5(a) and 5(b) show the Venn diagrams for the different call-sets in the CEU population, for the autosomes and the X-chromosome, respectively. Figures 5(c) and 5(d) depict the results for the CHB+CHS population, on the autosomes and the X-chromosome, respectively. We observe a large overlap between the calls based on diCal-admix and Sankararaman et al. (2014) on the autosomes, but less agreement on the X-chromosome. We also generated population-wide introgression maps in each population, called a tiling path by Sankararaman et al. (2014). To this end, we identified those regions on the chromosomes where introgression was called for at least one individual in the respective population. We then compared these population-wide introgression maps with the population-level introgression maps published by Vernot and Akey (2014). We generated three-way Venn diagrams for the autosomes in the CEU population and the CHB+CHS population, shown in Figure 6, both in units of percentage of the whole autosome. Again, we observe a large overlap between diCal-admix and Sankararaman et al. (2014), but less so with Vernot and Akey (2014). This discordance might be explained to some degree by the fact that in the two-stage procedure of Vernot and Akey (2014) the first step does not use sequence information from the Neanderthals as in the other methods. Thus, regions of high sequence identity between modern non-Africans and Neanderthals might be missed in this first step.
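The per-locus comparison behind the two-way Venn diagrams can be sketched as follows. The function takes two aligned lists of boolean calls for one haplotype and returns the fraction of loci in each category; averaging these fractions over individuals would give the population-wide percentages. The names are illustrative only.

```python
def overlap_fractions(calls_a, calls_b):
    """Fractions of loci called by method A only, B only, both, or neither."""
    n = len(calls_a)
    counts = {"only_a": 0, "only_b": 0, "both": 0, "neither": 0}
    for a, b in zip(calls_a, calls_b):
        if a and b:
            counts["both"] += 1
        elif a:
            counts["only_a"] += 1
        elif b:
            counts["only_b"] += 1
        else:
            counts["neither"] += 1
    return {key: value / n for key, value in counts.items()}


# Toy calls for one haplotype from two methods.
method_a = [True, True, False, False, True]
method_b = [True, False, True, False, True]
print(overlap_fractions(method_a, method_b))
```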
We also investigated the distribution of fragment lengths that were detected by the different methods. For all individuals from a given population, we counted the number of times an introgression tract of a specific length was detected. Figures 7(a) and 7(c) depict the distributions of the absolute frequencies in the autosomes of the individuals in the CEU population and the CHB+CHS population, respectively. Figures 7(b) and 7(d) show the same distributions for the X-chromosomes. In addition to the empirical tract length distribution obtained from the 1000 Genomes individuals, we plotted the neutral expectation of the absolute frequencies. This neutral expectation of the tract length distribution is computed under the following simple model. Approximating the chromosome as continuous, and considering the introgression tract in an individual at present, the distance between recombination breakpoints is exponentially distributed with parameter g × r, where r is the per generation per base-pair recombination probability and g is the number of generations since the introgression event, because in each generation, there is a chance that recombination breaks down the introgression tract. Here we used g = 2,000 and r = 1.19 × 10⁻⁸ per-base per-generation for the autosomes and per-base per-generation for the X-chromosome. The exponential rate g × r can also be used to obtain the expected number of sequence tracts in a genome of a certain size, 3% of which are introgressed from an ancestral Neanderthal individual, which yields the expected absolute frequency.
This simple model for the neutral expectation is certainly oversimplified, but it serves as a first approximation. It is not discernible in these plots whether deviation from the neutral expectation is due to incorrect detection of the tracts, or the true underlying tracts actually being subject to non-neutral evolution. The fact that both methods deviate from the neutral expectation in qualitatively similar ways suggests both factors may be playing a role. However, it is surprising that, for the autosomes, both methods detect more long fragments than expected under the simple neutral model. This may be due to recombination rate variation along the genome or due to our slightly lower power to detect shorter fragments (see S2.2 of the Supporting Information). It could also suggest that either there was an additional introgression event that happened more recently than 2,000 generations ago, or that some form of selection is acting that favors longer fragments.
In general, diCal-admix detects more short fragments and fewer long fragments than reported by Sankararaman et al. (2014). Moreover, the empirical distribution of diCal-admix is closer to the neutral model. This and the other statistics of the empirical distribution of the introgressed Neanderthal tracts presented in this section suggest that there is merit in applying different methodologies for the detection of introgression. While all methods perform reasonably well on simulated data, they seem to be sensitive to slightly different features of the introgression tracts. Thus we suggest using the consensus of the three methods for highly confident introgression calls, and using regions unique to only some of the methods for more exploratory research.
3.2. Functional implications of Neanderthal introgression
To explore the functional implications of Neanderthal introgression, we performed a gene ontology (GO) analysis using GOrilla (Eden et al., 2007, 2009), which looks for overrepresentation of GO terms at the top of a ranked list of genes. For each population, we ranked genes by their mean posterior probability of introgression (as determined by the diCal-admix posterior decoding) and looked for GO terms associated with either a lack of Neanderthal introgression or an enrichment of introgression. We restricted our analyses to the 500 bp resolution introgression calls where no more than half of the bases are masked by the 1000 Genomes strict mappability mask (The 1000 Genomes Project Consortium, 2012). Furthermore, we only included genes where less than 10% of sites were masked. The results (shown in Section S1 of the Supporting Information) are broadly concordant between populations. Like Sankararaman et al. (2014), we find that genes associated with keratin are more likely to be introgressed than other genes, which hints at the possibility of adaptive introgression for these genes. Intriguingly, sensory perception, and olfaction in particular, included both genes more likely to be introgressed and genes less likely to be introgressed. It is possible that adaptive introgression has played a role for some of these genes, for example by helping to adapt to local environments, and that selection has removed introgression at other olfaction-related genes. It is also possible, however, that this is an artifact of either the introgression calls (e.g. due to the high amount of polymorphism in olfactory genes (Malnic et al., 2004)) or high variance in mean introgression rate (e.g. due to these genes being smaller than other genes, or being spatially clustered in the genome).
We also investigated whether SNPs associated with particular phenotypes are more or less likely to be introgressed on average. To this end, we downloaded the results of 2,419 GWASs that were performed on data from the UK Biobank (Sudlow et al., 2015; Global Biobank Engine, 2017) and extracted all of the SNPs that were significant at a genome-wide significance level of 5.0 × 10⁻⁸ for each GWAS. We then tested whether the mean posterior probability of introgression at these SNPs was significantly higher or lower than expected using our bootstrap-like test (described below). Perhaps due to the large number of tests performed, we found only one statistically significant result: loci associated with being treated with desloratadine, a drug used to treat allergies, were significantly more likely to be introgressed (Bonferroni-corrected p < 0.025 in CEU and CHB+CHS, two-sided bootstrap-like test). The possibility that Neanderthal introgression may play a role in immune disorders such as allergic diseases is a tantalizing direction for future research.
Meanwhile, a number of tests were nominally significant at the p = 0.05 level in both populations and may be of interest for future research. In particular, we found that loci associated with sodium in urine may be less likely to be introgressed in both populations (nominal p = 0.013 in CEU, nominal p = 0.0035 in CHB+CHS), as are loci associated with the mean time to correctly identify matches, a test in which subjects attempted to quickly determine whether abstract symbols matched (Miller et al., 2016) (nominal p = 0.0061 in CEU, nominal p = 0.0229 in CHB+CHS). Some sets of loci may be more likely to be introgressed in both populations: loci associated with vaginal/uterine prolapse (nominal p = 0.0049 in CEU, nominal p = 0.174 in CHB+CHS); loci associated with being treated with glyceryl trinitrate, a drug used to treat heart disease (nominal p = 0.0224 in CEU, nominal p = 0.0017 in CHB+CHS); loci associated with being treated with budesonide, a steroid used to treat asthma, COPD, allergies, and Crohn’s disease (nominal p = 0.0118 in CEU, nominal p = 0.0161 in CHB+CHS); loci associated with being treated with cardioplen, another drug used to treat heart disease (nominal p = 0.0205 in CEU, nominal p = 0.0082 in CHB+CHS); loci associated with being treated with diclomax, a drug used to treat inflammation, for example in rheumatoid arthritis (nominal p = 0.0186 in CEU, nominal p = 0.0106 in CHB+CHS); and loci associated with living in small towns (nominal p = 0.011 in CEU, nominal p = 0.0233 in CHB+CHS). We again urge caution in interpreting these results: once the multiple testing burden is taken into account, none of them is statistically significant.
3.3. Causes of selection against Neanderthal introgression
As mentioned in the Introduction, Sankararaman et al. (2014) hypothesized DMIs (particularly in male hybrids) to be a cause of selection against Neanderthal ancestry in modern humans. This hypothesis was motivated by noting significant depletion of introgression in testes-expressed genes, as well as a substantial reduction of Neanderthal ancestry on the X-chromosome, which had been associated with DMIs in Drosophila (Presgraves, 2008). Our results obtained using diCal-admix also show a reduction in Neanderthal ancestry on the X-chromosome, but this reduction can be explained without appealing to DMIs, as we will discuss in more detail in Section 3.4. Here, we focus on global features of the genome and how they relate to potential DMIs.
Juric et al. (2016) and Harris and Nielsen (2016) recently independently proposed that selection acts against Neanderthal ancestry due to a higher mutational load in Neanderthals rather than DMIs. In their study, Juric et al. developed a likelihood method to explicitly infer the strength of selection against Neanderthal introgression based on the introgression maps obtained by Sankararaman et al. (2014). The authors estimated a selection coefficient for deleterious exonic Neanderthal alleles around −3 × 10⁻⁴ for the autosomes. Since the estimated coefficient is on the order of the inverse of the effective population size in humans, they hypothesized that the deleterious alleles could have accumulated as a result of the small long-term effective population size in Neanderthals (Prüfer et al., 2014), which reduced the efficacy of selection in this population. When these deleterious alleles entered the larger human population through introgression, they were subjected to more efficient selection, and this led to the observed widespread selection against Neanderthal alleles. The authors used simulations to confirm that the population size history of Neanderthals could have indeed allowed for the accumulation of deleterious alleles on the observed order of magnitude. Harris and Nielsen (2016) arrived at similar conclusions while studying the strength of selection against Neanderthal introgression using forward simulations of autosomal genetic material.
In an attempt to disentangle the DMI and mutational load hypotheses, we performed a number of statistical tests. As detailed below, we found evidence that there was selection against Neanderthal ancestry, but could not find statistically significant evidence that the selection was due to DMIs. We interpret this as evidence in favor of the mutational load hypothesis, although it is possible that more data or more powerful statistical tests may show evidence in favor of the DMI hypothesis. In particular, our tests are underpowered to detect individual DMIs and would not be able to detect if there were a small number of strong DMIs. On the other hand, if there are relatively few DMIs, then they would not be able to explain the reduction in Neanderthal ancestry across the whole genome.
We repeatedly used a test similar to bootstrapping, which we describe presently and refer to subsequently as the bootstrap-like test. When performing hypothesis tests, we must account for the spatial correlation of both our introgression calls and many genomic features of interest (e.g. gene locations or local recombination rates). We would also like to account for uncertainty in the introgression calls themselves. To this end, we left the genomic features of interest in place, and then sampled new introgression calls for each chromosome by drawing, with replacement, non-overlapping 5 Mb segments of our original introgression calls from the same chromosome. We then recalculated our test statistic using this resampled set of introgression calls. Repeating this resampling procedure many times provided an approximate empirical distribution of our test statistic under the null hypothesis of no association between our introgression calls and the genomic feature of interest. We then used this distribution to compute approximate p-values. For all of the tests presented below, we again restricted our analyses to the 500 bp windows where no more than half of the bases were masked by the 1000 Genomes strict mappability mask (The 1000 Genomes Project Consortium, 2012).
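The resampling scheme can be sketched for a single chromosome as follows. This is a minimal illustration and not the original implementation: the test statistic (mean introgression at feature windows), the two-sided p-value convention, the add-one correction, and all function names are assumptions made for the example. With 500 bp windows, a 5 Mb segment would correspond to `block_len = 10000`.

```python
import random


def resample_blocks(calls, block_len, rng):
    """Rebuild a chromosome's calls from non-overlapping blocks drawn with replacement."""
    blocks = [calls[i:i + block_len] for i in range(0, len(calls), block_len)]
    out = []
    while len(out) < len(calls):
        out.extend(rng.choice(blocks))
    return out[:len(calls)]


def mean_at_feature(calls, is_feature):
    """Example statistic: mean introgression over windows that carry the feature."""
    vals = [c for c, f in zip(calls, is_feature) if f]
    return sum(vals) / len(vals)


def bootstrap_like_pvalue(calls, is_feature, block_len, n_rep=1000, seed=0):
    """Two-sided approximate p-value; the features stay fixed, the calls are resampled."""
    rng = random.Random(seed)
    observed = mean_at_feature(calls, is_feature)
    null = [mean_at_feature(resample_blocks(calls, block_len, rng), is_feature)
            for _ in range(n_rep)]
    center = sum(null) / len(null)
    extreme = sum(abs(x - center) >= abs(observed - center) for x in null)
    return observed, (extreme + 1) / (n_rep + 1)


# Toy chromosome (hypothetical): the feature sits entirely in un-introgressed windows.
toy_calls = [0.0] * 50 + [1.0] * 50
toy_feature = [True] * 50 + [False] * 50
toy_obs, toy_p = bootstrap_like_pvalue(toy_calls, toy_feature, block_len=10)
```

Because whole blocks are moved together, the local correlation structure of the calls within a block is preserved while their alignment with the fixed features is broken.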
To begin, we looked into whether selection against Neanderthal ancestry has occurred. First, note that the admixture proportion has previously been estimated as 3% (Green et al., 2010; Juric et al., 2016). If there was no subsequent “dilution” of Neanderthal ancestry (Sankararaman et al. (2016), but see Vernot and Akey (2015), and see Slatkin and Racimo (2016) for a comprehensive review), then under neutrality we would expect about 3% ancestry on average in present-day populations, and we would expect about half of chromosomes to have more than 3% Neanderthal introgression on average and about half of chromosomes to have less than 3% introgression. Yet, no chromosome in either CEU or CHB+CHS has, on average, more than 3.29% introgression, and most chromosomes have less than 1.5% average Neanderthal ancestry. Thus, under this simple null model we can reject neutrality (p = 2.4 × 10⁻⁷ in CEU, p = 5.7 × 10⁻⁶ in CHB+CHS, two-sided sign test, n = 23) in both CEU and CHB+CHS. While the above test assumed that the admixture proportion was 3%, we would be able to reject the null hypothesis of neutrality for any admixture proportion greater than 1.48% in CEU or 1.96% in CHB+CHS, both of which are much lower than the findings of the previous studies discussed above. Indeed, Juric et al. report a 95% confidence interval of [3.22%, 3.52%] for the admixture proportion in CEU and [3.45%, 3.86%] for the admixture proportion in CHB+CHS (Juric et al., 2016). Using the estimates of Juric et al. to infer that selection has occurred is a somewhat circular argument because Juric et al. used the assumption of selection against Neanderthal ancestry to infer their admixture proportions. Yet, an ancient European individual has been found with greater than 6% Neanderthal ancestry, and a number of other ancient Europeans have been found with greater than 3% Neanderthal ancestry without the assumption of selection against Neanderthal ancestry (Fu et al., 2015). These ancient genomes together with our findings suggest that there has been a significant reduction in Neanderthal ancestry.
To explore whether this reduction in Neanderthal ancestry is more pronounced in genic regions, we compared the mean (across individuals and loci) frequency of introgression in regions marked as exons in the RefSeq annotation (O’Leary et al., 2016) to the chromosome-wide mean. The results were largely concordant when we considered transcripts or coding sequences instead of exons, so we present below only results based on exons. For CEU, we found that in 13 out of the 23 chromosomes there is less introgression in genic regions than in the rest of the chromosome, which is not statistically significant (p = 0.678, two-sided sign test, n = 23). Likewise, for CHB+CHS, we again found that 13 chromosomes have less introgression in genic regions, which is also not statistically significant (p = 0.678, two-sided sign test, n = 23). Furthermore, only 9 chromosomes showing a reduction in introgression at exons were shared between the two populations, which is not statistically significant (p = 0.09, two-sided permutation test). We also performed our bootstrap-like test to see if there was any significant decrease in genic regions on any chromosome, but we did not find any significant results (unadjusted p > 0.05) on any chromosome in either population, except for the X-chromosome in CHB+CHS (unadjusted p = 0.02), which can be explained simply by the multiple testing burden. We interpret these results as indicating that selection against Neanderthal introgression either is fairly weak if it is acting on most or all genes, or has only acted on a subset of genes. It is also possible that selection against Neanderthal introgression has acted on genomic elements other than exons, such as regulatory elements.
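The two-sided sign tests with n = 23 reported in this section can be checked with an exact binomial computation. The sketch below assumes the standard definition (twice the smaller tail of a Binomial(n, 1/2), capped at 1). The count of 13 out of 23 is stated in the text; the counts of 0 and 1 chromosomes above 3% are inferred here from the reported p-values and are not given explicitly.

```python
from math import comb


def sign_test_two_sided(k, n):
    """Exact two-sided sign test p-value for k 'successes' out of n under p = 1/2."""
    larger = max(k, n - k)
    tail = sum(comb(n, i) for i in range(larger, n + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Assumed counts of chromosomes above the 3% admixture proportion (inferred).
p_zero_above = sign_test_two_sided(0, 23)  # about 2.4e-7
p_one_above = sign_test_two_sided(1, 23)   # about 5.7e-6
# Stated in the text: 13 of 23 chromosomes show less introgression at exons.
p_exons = sign_test_two_sided(13, 23)      # about 0.678
```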
Meanwhile, we found evidence that a measure of conservation, phastCons (Siepel et al., 2005), was significantly negatively correlated with mean introgression at a given locus (Spearman’s ρ = −0.029, p = 0.003 in CEU and ρ = −0.024, p = 0.043 in CHB+CHS, bootstrap-like test), which indicates that selection was more likely to remove Neanderthal ancestry at highly conserved loci. We also tested whether the proportion of Neanderthal ancestry was positively correlated with local population-scaled recombination rate (as inferred by Myers et al. (2005)), which would be suggestive of selection against Neanderthal ancestry because regions of high recombination would be more likely to separate neutral regions of Neanderthal ancestry from linked deleterious regions, an idea recently explored elegantly and in more detail by Schumer et al. (2017). Similar to Schumer et al., we found a positive association between local population-scaled recombination rate and frequency of introgression (Spearman’s ρ = 0.055, p < 0.001 in CEU and ρ = 0.054, p < 0.001 in CHB+CHS, bootstrap-like test), lending further credence to the hypothesis that selection is acting against certain regions of Neanderthal ancestry.
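For readers unfamiliar with the statistic, Spearman’s ρ is the Pearson correlation of the ranks of the two variables, with tied values assigned their average rank. The following is a minimal pure-Python sketch of that definition; it is illustrative only, and it does not compute p-values, which in the analyses above come from the bootstrap-like test.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5
```

In practice one would pass, for example, per-locus conservation scores as `x` and mean introgression frequencies as `y`.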
If DMIs were the cause of selection against introgression, then we would expect that genes that code for proteins with more binding partners would be less likely to be introgressed; each protein-protein interaction (PPI) can be thought of as a possible DMI. To test this hypothesis, we used the PICKLE2.0 PPI network (Klapa et al., 2013; Gioutlakis et al., 2017) and associated a number of binding partners to each gene by counting the number of PPIs in which the protein coded by that gene participates. We found no significant correlation between number of binding partners and mean frequency of introgression in either population (Spearman’s ρ = −0.016, p = 0.314 in CEU, ρ = −0.003, p = 0.858 in CHB+CHS, two-sided bootstrap-like test), which provides weak indirect evidence against the DMI hypothesis.
As a more direct test of the DMI hypothesis, we tested whether proteins that interact (according to the PICKLE2.0 PPI network) are more likely to be co-introgressed. In particular, for each gene in the PPI network, we say that that gene is introgressed if any part of any of its exons is in a called introgression tract. For each individual we then assign a weight to each edge in the PPI network as follows. Let gene A and gene B be the genes that code for the proteins involved in the interaction corresponding to the edge of interest. If each copy of gene A and gene B (i.e. on autosomes, we assume there are two copies of each gene and on the X-chromosome males have only one copy) has the same ancestry, the edge is assigned a weight of one – in this individual, this interaction is always between proteins from genes of the same ancestry. Meanwhile, if all of the copies of gene A are of one type of ancestry and all of the copies of gene B are of the other type of ancestry, then the edge is assigned a weight of zero – this interaction is never between proteins of the same ancestry. Finally, if either gene has mixed ancestry (i.e. one copy from one ancestry and the other copy from the other ancestry) the edge is assigned a weight of 0.5 – in this case it can be shown that if one randomly selects a copy of gene A and a copy of gene B, then the proteins produced by those copies will have the same ancestry 50% of the time. Thus, these edge weights are the probabilities that compatible proteins interact, assuming that both ancestry types at each locus produce the same amount of protein and the probability that a given protein is involved in a particular interaction does not depend on its ancestry. We then averaged these weights across individuals and across edges in the PPI network to obtain a test statistic. 
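The three edge-weight cases follow from a single expression. If a and b denote the fractions of Neanderthal copies of gene A and gene B in an individual, the probability that a randomly chosen copy of each has the same ancestry is a·b + (1 − a)(1 − b), which equals 1, 0, or 0.5 in the cases described above. A minimal sketch, assuming each gene's copies are encoded as a list of 0 (modern human) and 1 (Neanderthal); the encoding and function names are illustrative choices:

```python
def edge_weight(copies_a, copies_b):
    """Probability that random copies of gene A and gene B share the same ancestry."""
    a = sum(copies_a) / len(copies_a)  # fraction of Neanderthal copies of gene A
    b = sum(copies_b) / len(copies_b)  # fraction of Neanderthal copies of gene B
    return a * b + (1 - a) * (1 - b)


def mean_edge_weight(edges, ancestry):
    """Average weight over PPI edges for one individual.

    edges: list of (gene_a, gene_b) pairs; ancestry: dict gene -> list of 0/1 copies.
    """
    weights = [edge_weight(ancestry[ga], ancestry[gb]) for ga, gb in edges]
    return sum(weights) / len(weights)
```

A male X-linked gene is represented by a single-element list, so its fraction is always 0 or 1, consistent with males carrying one copy. Averaging `mean_edge_weight` across individuals gives a statistic of the kind described above.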
Using this test, we failed to obtain a significant result in either population (p = 0.117 in CEU, p = 0.647 in CHB+CHS, one-sided permutation test), which again provides some evidence against the DMI hypothesis. If DMIs are not widespread, then this test would likely not have power to detect them, so we are unable to rule out the possibility that there are a small number of DMIs. Yet, a small number of DMIs would be unable to explain the overall reduction in Neanderthal ancestry across the genome.
Taken as a whole, we find that while there has been selection against Neanderthal introgression, for example at highly conserved loci, it seems that the negative selection is likely not due to widespread DMIs, which lends more credence to the mutational load hypothesis. We also note that in contrast to the broad findings presented here, a small number of specific loci have been found to have experienced positive selection for archaic introgression (Racimo et al., 2015; Sams et al., 2016; Racimo et al., 2017).
3.4. Patterns of introgression on the X-chromosome
To further explore whether selection against introgressed Neanderthal variation differed between autosomes and the X-chromosome, we performed forward simulations in a Wright-Fisher model, focusing on the dynamics of Neanderthal alleles in the modern human population after the introgression event 2,000 generations ago. For the autosomes, we modeled each diploid individual as comprising two chromosomes. Each chromosome consisted of 5,000 loci, and recombination could act between these loci. The recombination rate was calibrated such that this corresponds to a 150 Mbp chromosome with a recombination rate of 1.25 × 10⁻⁸ per generation per base-pair. Similar to Harris and Nielsen (2016), the population size was set to N = 1,860 for the first 900 generations, followed by an instantaneous decrease to N = 1,032 with subsequent exponential growth at 0.38% per generation. For computational reasons, we limited the population size to N = 10,000, which is reached roughly 400 generations before present. In the generation immediately after the introgression event, 97% of the individuals in the population carry modern human alleles at all 5,000 loci on both chromosomes, and 3% carry Neanderthal alleles, representing the introgressed individuals. In the subsequent generations, the fitness of an individual is (1 − s)^D, with selection coefficient s, and D denoting the number of Neanderthal alleles that a diploid individual carries. Figure 8(a) depicts the amount of Neanderthal introgression measured in the autosomes in the CEU and the CHB+CHS population, as well as the amount of Neanderthal introgression in a sample of individuals at present from populations simulated with different values for s, repeated 16 times. These simulations suggest that a selection coefficient on the order of s = −2 × 10⁻⁵ is more than sufficient to explain the observed reduction in Neanderthal ancestry from the initial proportion of 3% on the autosomes.
These results are largely consistent with the estimates of selection against introgression provided in Juric et al. (2016, Table 1), where −3 × 10⁻⁸ is estimated for the effective selection strength per exonic site. In our simulations, each locus corresponds to 30,000 sites. According to the annotation from the UCSC genome browser (https://genome.ucsc.edu/), the genome-wide density of exonic sites is 2.8%, and thus a simulated locus contains roughly 840 exonic sites. The simulated selection strength of s = −2 × 10⁻⁵ then corresponds to a selection strength of −2.4 × 10⁻⁸ per exonic site.
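The conversion from a per-locus to a per-exonic-site selection strength is simple arithmetic, reproduced here as a check. All input values are taken from the text; only the function and variable names are introduced for the example.

```python
def per_exonic_site_selection(chrom_len_bp, n_loci, exonic_density, s_locus):
    """Convert a per-locus selection coefficient to a per-exonic-site value."""
    sites_per_locus = chrom_len_bp / n_loci          # 150 Mbp / 5,000 loci
    exonic_sites = sites_per_locus * exonic_density  # 2.8% of sites are exonic
    return sites_per_locus, exonic_sites, s_locus / exonic_sites


sites, exonic, s_site = per_exonic_site_selection(150e6, 5000, 0.028, -2e-5)
# sites is 30,000, exonic is about 840, s_site is about -2.4e-8
```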
We also performed simulations for the X-chromosome to see if the strength of selection is similar on the autosomes and X-chromosome. In our simulation, the fitness of males was determined solely by their single X-chromosome and their Y-chromosome was modeled as selectively neutral. In females, to model X-inactivation, we chose one chromosome randomly to determine the fitness. There are several methods to calibrate the selection coefficient, s. Here we chose to calibrate s such that, in both females and males, carrying only Neanderthal variants at a certain locus has the same effect on fitness for the X-chromosome as for an autosome. Consequently, to determine the exponent for the fitness in females, the number of Neanderthal alleles on the active chromosome is multiplied by two. This calibration effectively corresponds to an additive selection model in females. In males, the number of Neanderthal alleles on the single X-chromosome is also multiplied by two to obtain the exponent. Note that Juric et al. (2016) used a different calibration, where selection in males is half as strong. Figure 8(b) shows the Neanderthal proportions on the X-chromosome in both populations, and the results of the simulations for different values of s. We observe that using this calibration for the selection coefficient, for the same strength of selection, the amount of Neanderthal ancestry in the population at present is reduced on the X-chromosome compared to the autosomes. While this might seem at odds with the reduced effective population size for the X-chromosome, and hence a lower efficacy of selection, it can be explained by the fact that genetic variants in males have a stronger impact than they would in a diploid autosomal population of reduced effective size. Moreover, we observe in Figure 8(b) that a selection coefficient not much stronger than s = −2 × 10⁻⁵ can result in the reduction of Neanderthal introgression observed on the X-chromosomes in the 1000 Genomes data.
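The calibration of s on the X-chromosome can be made concrete with a small sketch of the three fitness functions. This is an illustration under stated assumptions rather than the simulation code used for Figure 8: the argument `s` is treated here as a positive magnitude so that (1 − s)^D decreases with the number of Neanderthal alleles (the text reports s with a negative sign), and the function names are hypothetical.

```python
import random


def autosomal_fitness(d, s):
    """Fitness (1 - s)^D, with D the Neanderthal allele count over both copies."""
    return (1.0 - s) ** d


def male_x_fitness(c, s):
    """Males: Neanderthal allele count on the single X is doubled in the exponent."""
    return (1.0 - s) ** (2 * c)


def female_x_fitness(c1, c2, s, rng=random):
    """Females: one X is chosen at random as active; its allele count is doubled."""
    active = rng.choice((c1, c2))
    return (1.0 - s) ** (2 * active)


# Carrying only Neanderthal variants at one locus has the same effect everywhere.
s_mag = 2e-5  # assumed magnitude of the selection coefficient
same_effect = (
    autosomal_fitness(2, s_mag)
    == male_x_fitness(1, s_mag)
    == female_x_fitness(1, 1, s_mag)
)
```

For a heterozygous female, the exponent is either 0 or twice the count on the active chromosome, each with probability one half.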
To explore the possibility of hybrid male infertility, we modified the simulations for the X-chromosome as follows. In addition to the global selection with coefficient s against Neanderthal alleles at all loci, we designated 0.5% of the 5,000 loci to be incompatibility loci. These incompatibility loci only affect fitness in male individuals. The fitness is multiplied by a factor that depends on s_I, M, and C, where s_I is the selection coefficient against incompatibility, M is the total number of incompatibility loci, and C is the number of Neanderthal alleles a male individual carries at these loci. The exponent of this factor is proportional to the number of incompatible pairs. Thus, if an individual carries only modern human or only Neanderthal alleles at these loci, the exponent is zero, and the fitness is not affected. The exponent attains its maximal value when C = M/2, that is, when half of the incompatibility loci carry the Neanderthal allele and the other half carry the human allele. Figure 9 depicts the results of the simulations for s = −10⁻⁵ and different incompatibility selection coefficients s_I. Note that the order of magnitude of s_I is higher than that of s, because it is acting on fewer loci. The simulations show that a mechanism like this type of hybrid incompatibility could indeed decrease the introgression further than global weak selection against Neanderthal variants by itself, although such an explanation is not necessary to fit the observed levels of introgression.
4. Discussion
In this paper, we introduced a modification of the method diCal 2.0, which was developed by Steinrücken et al. (2015) to infer complex demographic histories from full-genomic sequence data. We applied this modification (diCal-admix) to detect tracts of genetic material in modern non-African individuals that introgressed into the population when non-Africans and Neanderthals exchanged genetic material about 2,000 generations ago. We demonstrated in an extensive simulation study that diCal-admix can accurately and efficiently detect tracts of Neanderthal introgression. Furthermore, we applied diCal-admix to detect introgression in the individuals sampled from the CEU, CHB, and CHS populations as part of the 1000 Genomes Project (The 1000 Genomes Project Consortium, 2012). We exhibited some of the methodological and empirical differences between diCal-admix and previous results reported by Sankararaman et al. (2014) and by Vernot and Akey (2014). While they are generally in good agreement, we observed some differences. This highlights the importance of developing different methodologies to generate a consensus. We also reported some of the functional implications of introgression, confirming previously reported findings of widespread selection against introgression, enrichment of Neanderthal introgression in certain classes of genes, and a general signal of depleted Neanderthal introgression in conserved regions of the genome.
However, the role of the X-chromosome remains intriguing. As in previous studies, we observe a substantially lower amount of Neanderthal introgression on the X-chromosome compared to the autosomes. Sankararaman et al. (2014) hypothesized that this reduction is due to DMIs reducing male fertility, further supported by significantly reduced Neanderthal introgression in genes expressed in testes. However, we do not find significant evidence for DMIs, and the GO-term enrichment analyses did not reveal any patterns of enrichment or depletion of Neanderthal ancestry in genes related to spermatogenesis, the testes, or infertility more broadly.
Additionally, compared to modern humans and Neanderthals (separated by tens of thousands of generations), the species in which DMIs have been observed are substantially more diverged: for example, about 20 million generations between D. simulans and D. melanogaster (Li et al., 1999); 200 to 500 thousand generations between M. guttatus and M. nasutus (Brandvain et al., 2014); and 750 thousand to 5 million generations between A. californiense and A. tigrinum mavortium (Shaffer and McKnight, 1996; Fitzpatrick et al., 2010). The divergence time separating modern humans and Neanderthals is only about a factor of two older than the divergence time between the most diverged human populations (e.g., as inferred by Veeramah et al. (2012) and Gronau et al. (2011) using an updated mutation rate as discussed in Scally and Durbin (2012)), and no DMIs are known to occur in admixtures of modern human populations, raising the question of how such incompatibilities between modern humans and Neanderthals could have arisen so quickly.
Other studies (Juric et al., 2016; Harris and Nielsen, 2016) and our simulations suggest that only a moderate strength of selection is required to explain the observed reduction on the X-chromosome. However, the evidence that has been collected to date does not seem to be sufficient to fully characterize the importance of the X-chromosome. Resolving these questions will require a more comprehensive analysis of larger samples of contemporary genetic data like the Simons Genome Diversity Project (Mallick et al., 2016) and the individuals from Phase III of the 1000 Genomes Project (The 1000 Genomes Project Consortium, 2012). Moreover, additional high-quality data for hominin sister groups (Meyer et al., 2012; Prüfer et al., 2017) will improve the detection of introgression. Detecting introgression in genetic samples of ancient modern humans (Mathieson et al., 2015) will also allow resolution of the evolutionary trajectory of introgressed genetic material over time. Incorporating the distribution of tracts on an individual level and different models of diploid selection into the inference frameworks will improve the inference of the strength of selection and allow reliable testing of different models. Additionally, a better understanding of the evolution of incompatibilities, and more careful investigation of the gene content on the X-chromosome will help shed light on the role of the X-chromosome in the Neanderthal introgression landscape. In general, maps of introgressed Neanderthal and Denisovan ancestry will facilitate the interpretation of patterns of human genomic variation and further the understanding of how archaic introgression influenced the trajectory of human evolution.
Acknowledgments
We thank Sriram Sankararaman for providing us with his introgression calls, and the Rivas lab for making the Global Biobank Engine resource available. Furthermore, we thank Cathy Pfister for helpful comments and suggestions. This research is supported in part by a National Institutes of Health grant R01-GM094402, and a Packard Fellowship for Science and Engineering. Y.S.S. is a Chan Zuckerberg Biohub investigator.
Footnotes
Data Accessibility
The marginal posterior probabilities of Neanderthal introgression for the CEU, CHB, and CHS haplotypes computed using diCal-admix are available at http://dical-admix.sourceforge.net/
Supporting Information
Additional supporting information may be found in the online version of this article.
Gene Ontology Analysis Results, ROC and Precision-Recall Curves from Simulated Data, Fine-scale Population Average Introgression
References
- Brandvain Y, Kenney AM, Flagel L, Coop G, and Sweigart AL (2014). Speciation and introgression between Mimulus nasutus and Mimulus guttatus. PLOS Genetics, 10(6), 1–15.
- Brideau NJ, Flores HA, Wang J, Maheshwari S, Wang X, and Barbash DA (2006). Two Dobzhansky-Muller genes interact to cause hybrid lethality in Drosophila. Science, 314(5803), 1292–1295.
- Dannemann M, Prüfer K, and Kelso J (2017). Functional implications of Neandertal introgression in modern humans. Genome Biology, 18(1), 61.
- Dobzhansky T (1936). Studies on hybrid sterility. II. Localization of sterility factors in Drosophila pseudoobscura hybrids. Genetics, 21(2), 113–135.
- Eden E, Lipson D, Yogev S, and Yakhini Z (2007). Discovering motifs in ranked lists of DNA sequences. PLOS Computational Biology, 3(3), 1–15.
- Eden E, Navon R, Steinfeld I, Lipson D, and Yakhini Z (2009). GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists. BMC Bioinformatics, 10(1), 48.
- Fishman L and Willis JH (2001). Evidence for Dobzhansky-Muller incompatibilites contributing to the sterility of hybrids between Mimulus guttatus and M. nasutus. Evolution, 55(10), 1932–1942.
- Fitzpatrick BM (2008). Dobzhansky–Muller model of hybrid dysfunction supported by poor burst-speed performance in hybrid tiger salamanders. Journal of Evolutionary Biology, 21(1), 342–351.
- Fitzpatrick BM, Johnson JR, Kump DK, Smith JJ, Voss SR, and Shaffer HB (2010). Rapid spread of invasive genes into a threatened native species. Proceedings of the National Academy of Sciences, 107(8), 3606–3610.
- Fu Q, Hajdinjak M, Moldovan OT, Constantin S, Mallick S, Skoglund P, Patterson N, Rohland N, Lazaridis I, Nickel B, Viola B, Prüfer K, Meyer M, Kelso J, Reich D, and Pääbo S (2015). An early modern human from Romania with a recent Neanderthal ancestor. Nature, 524, 216–219.
- Gioutlakis A, Klapa MI, and Moschonas NK (2017). PICKLE 2.0: A human protein-protein interaction meta-database employing data integration via genetic information ontology. PLOS ONE, 12(10), 1–17.
- Gittelman RM, Schraiber JG, Vernot B, Mikacenic C, Wurfel MM, and Akey JM (2016). Archaic hominin admixture facilitated adaptation to out-of-Africa environments. Current Biology, 26(24), 3375–3382.
- Global Biobank Engine (2017). URL http://gbe.stanford.edu/.
- Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N, Li H, Zhai W, Fritz MH-Y, Hansen NF, Durand EY, Malaspinas A-S, Jensen JD, Marques-Bonet T, Alkan C, Prüfer K, Meyer M, Burbano HA, Good JM, Schultz R, Aximu-Petri A, Butthof A, Höber B, Höffner B, Siegemund M, Weihmann A, Nusbaum C, Lander ES, Russ C, Novod N, Affourtit J, Egholm M, Verna C, Rudan P, Brajkovic D, Kucan Ž, Gušic I, Doronichev VB, Golovanova LV, Lalueza-Fox C, de la Rasilla M, Fortea J, Rosas A, Schmitz RW, Johnson PLF, Eichler EE, Falush D, Birney E, Mullikin JC, Slatkin M, Nielsen R, Kelso J, Lachmann M, Reich D, and Pääbo S (2010). A draft sequence of the Neandertal genome. Science, 328(5979), 710–722.
- Gronau I, Hubisz MJ, Gulko B, Danko CG, and Siepel A (2011). Bayesian inference of ancient human demography from individual genome sequences. Nature Genetics, 43, 1031.
- Harris K and Nielsen R (2016). The genetic cost of Neanderthal introgression. Genetics, 203(2), 881–891.
- Jouganous J, Long W, Ragsdale AP, and Gravel S (2017). Inferring the joint demographic history of multiple populations: Beyond the diffusion approximation. Genetics, 206(3), 1549–1567.
- Juric I, Aeschbacher S, and Coop G (2016). The strength of selection against Neanderthal introgression. PLOS Genet, 12(11), e1006340.
- Kelleher J, Etheridge AM, and McVean G (2016). Efficient coalescent simulation and genealogical analysis for large sample sizes. PLOS Comput. Biol, 12(5), e1004842.
- Klapa MI, Tsafou K, Theodoridis E, Tsakalidis A, and Moschonas NK (2013). Reconstruction of the experimentally supported human protein interactome: what can we learn? BMC Systems Biology, 7(1), 96.
- Kong A, Thorleifsson G, Gudbjartsson DF, Masson G, Sigurdsson A, Jonasdottir A, Walters GB, Jonasdottir A, Gylfason A, Kristinsson KT, Gudjonsson SA, Frigge ML, Helgason A, Thorsteinsdottir U, and Stefansson K (2010). Fine-scale recombination rate differences between sexes, populations and individuals. Nature, 467(7319), 1099–1103.
- Li N and Stephens M (2003). Modeling linkage disequilibrium and identifying recombination hotspots using single-nucleotide polymorphism data. Genetics, 165(4), 2213–2233.
- Li Y-J, Satta Y, and Takahata N (1999). Paleo-demography of the Drosophila melanogaster subgroup: application of the maximum likelihood method. Genes & Genetic Systems, 74(4), 117–127.
- Mallick S, Li H, Lipson M, Mathieson I, Gymrek M, Racimo F, Zhao M, Chennagiri N, Nordenfelt S, Tandon A, Skoglund P, Lazaridis I, Sankararaman S, Fu Q, Rohland N, Renaud G, Erlich Y, Willems T, Gallo C, Spence JP, Song YS, Poletti G, Balloux F, van Driem G, de Knijff P, Romero IG, Jha AR, Behar DM, Bravi CM, Capelli C, Hervig T, Moreno-Estrada A, Posukh OL, Balanovska E, Balanovsky O, Karachanak-Yankova S, Sahakyan H, Toncheva D, Yepiskoposyan L, Tyler-Smith C, Xue Y, Abdullah MS, Ruiz-Linares A, Beall CM, Di Rienzo A, Jeong C, Starikovskaya EB, Metspalu E, Parik J, Villems R, Henn BM, Hodoglugil U, Mahley R, Sajantila A, Stamatoyannopoulos G, Wee JTS, Khusainova R, Khusnutdinova E, Litvinov S, Ayodo G, Comas D, Hammer MF, Kivisild T, Klitz W, Winkler CA, Labuda D, Bamshad M, Jorde LB, Tishkoff SA, Watkins WS, Metspalu M, Dryomov S, Sukernik R, Singh L, Thangaraj K, Pääbo S, Kelso J, Patterson N, and Reich D (2016). The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature, 538(7624), 201–206.
- Malnic B, Godfrey PA, and Buck LB (2004). The human olfactory receptor gene family. Proceedings of the National Academy of Sciences of the United States of America, 101(8), 2584–2589.
- Mathieson I, Lazaridis I, Rohland N, Mallick S, Patterson N, Roodenberg SA, Harney E, Stewardson K, Fernandes D, Novak M, Sirak K, Gamba C, Jones ER, Llamas B, Dryomov S, Pickrell J, Arsuaga JL, de Castro JB, Carbonell E, Gerritsen F, Khokhlov A, Kuznetsov P, Lozano M, Meller H, Mochalov O, Moiseyev V, Guerra MAR, Roodenberg J, Vergès JM, Krause J, Cooper A, Alt KW, Brown D, Anthony D, Lalueza-Fox C, Haak W, Pinhasi R, and Reich D (2015). Genome-wide patterns of selection in 230 ancient Eurasians. Nature, 528(7583), 499–503.
- Meyer M, Kircher M, Gansauge M-T, Li H, Racimo F, Mallick S, Schraiber JG, Jay F, Prüfer K, de Filippo C, Sudmant PH, Alkan C, Fu Q, Do R, Rohland N, Tandon A, Siebauer M, Green RE, Bryc K, Briggs AW, Stenzel U, Dabney J, Shendure J, Kitzman J, Hammer MF, Shunkov MV, Derevianko AP, Patterson N, Andrés AM, Eichler EE, Slatkin M, Reich D, Kelso J, and Pääbo S (2012). A high-coverage genome sequence from an archaic Denisovan individual. Science, 338(6104), 222–226.
- Miller KL, Alfaro-Almagro F, Bangerter NK, Thomas DL, Yacoub E, Xu J, Bartsch AJ, Jbabdi S, Sotiropoulos SN, Andersson JLR, Griffanti L, Douaud G, Okell TW, Weale P, Dragonu I, Garratt S, Hudson S, Collins R, Jenkinson M, Matthews PM, and Smith SM (2016). Multimodal population brain imaging in the UK Biobank prospective epidemiological study. Nature Neuroscience, 19, 1523–1536.
- Myers S, Bottolo L, Freeman C, McVean G, and Donnelly P (2005). A fine-scale map of recombination rates and hotspots across the human genome. Science, 310(5746), 321–324.
- O’Leary NA, Wright MW, Brister JR, Ciufo S, Haddad D, McVeigh R, Rajput B, Robbertse B, Smith-White B, Ako-Adjei D, Astashyn A, Badretdin A, Bao Y, Blinkova O, Brover V, Chetvernin V, Choi J, Cox E, Ermolaeva O, Farrell CM, Goldfarb T, Gupta T, Haft D, Hatcher E, Hlavina W, Joardar VS, Kodali VK, Li W, Maglott D, Masterson P, McGarvey KM, Murphy MR, O’Neill K, Pujar S, Rangwala SH, Rausch D, Riddick LD, Schoch C, Shkeda A, Storz SS, Sun H, Thibaud-Nissen F, Tolstoy I, Tully RE, Vatsan AR, Wallin C, Webb D, Wu W, Landrum MJ, Kimchi A, Tatusova T, DiCuccio M, Kitts P, Murphy TD, and Pruitt KD (2016). Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Research, 44(D1), D733–D745.
- Orr HA (1995). The population genetics of speciation: the evolution of hybrid incompatibilities. Genetics, 139(4), 1805–1813.
- Paten B, Herrero J, Beal K, Fitzgerald S, and Birney E (2008). Enredo and Pecan: Genome-wide mammalian consistency-based multiple alignment with paralogs. Genome Res, 18(11), 1814–1828.
- Paul JS and Song YS (2010). A principled approach to deriving approximate conditional sampling distributions in population genetics models with recombination. Genetics, 186(1), 321–338.
- Paul JS, Steinrücken M, and Song YS (2011). An accurate sequentially Markov conditional sampling distribution for the coalescent with recombination. Genetics, 187(4), 1115–1128.
- Plagnol V and Wall JD (2006). Possible ancestral structure in human populations. PLOS Genet, 2(7), e105.
- Presgraves DC (2008). Sex chromosomes and speciation in Drosophila. Trends Genet, 24(7), 336–343.
- Prüfer K, Racimo F, Patterson N, Jay F, Sankararaman S, Sawyer S, Heinze A, Renaud G, Sudmant PH, de Filippo C, Li H, Mallick S, Dannemann M, Fu Q, Kircher M, Kuhlwilm M, Lachmann M, Meyer M, Ongyerth M, Siebauer MF, Theunert C, Tandon A, Moorjani P, Pickrell J, Mullikin JC, Vohr SH, Green RE, Hellmann I, Johnson PLF, Blanche H, Cann H, Kitzman JO, Shendure J, Eichler EE, Lein ES, Bakken TE, Golovanova LV, Doronichev VB, Shunkov MV, Derevianko AP, Viola B, Slatkin M, Reich D, Kelso J, and Pääbo S (2014). The complete genome sequence of a Neanderthal from the Altai Mountains. Nature, 505(7481), 43–49.
- Prüfer K, de Filippo C, Grote S, Mafessoni F, Korlević P, Hajdinjak M, Vernot B, Skov L, Hsieh P, Peyrégne S, Reher D, Hopfe C, Nagel S, Maricic T, Fu Q, Theunert C, Rogers R, Skoglund P, Chintalapati M, Dannemann M, Nelson BJ, Key FM, Rudan P, Kućan Ž, Gušić I, Golovanova LV, Doronichev VB, Patterson N, Reich D, Eichler EE, Slatkin M, Schierup MH, Andrés A, Kelso J, Meyer M, and Pääbo S (2017). A high-coverage Neandertal genome from Vindija Cave in Croatia. Science, aao1887.
- Racimo F, Sankararaman S, Nielsen R, and Huerta-Sánchez E (2015). Evidence for archaic adaptive introgression in humans. Nature Reviews Genetics, 16, 359.
- Racimo F, Marnetto D, and Huerta-Sánchez E (2017). Signatures of archaic adaptive introgression in present-day human populations. Molecular Biology and Evolution, 34(2), 296–317.
- Rogers RL (2015). Chromosomal rearrangements as barriers to genetic homogenization between archaic and modern humans. Molecular Biology and Evolution, 32(12), 3064–3078.
- Sams AJ, Dumaine A, Nédélec Y, Yotova V, Alfieri C, Tanner JE, Messer PW, and Barreiro LB (2016). Adaptively introgressed Neandertal haplotype at the OAS locus functionally impacts innate immune responses in humans. Genome Biology, 17(1), 246.
- Sankararaman S, Patterson N, Li H, Pääbo S, and Reich D (2012). The date of interbreeding between Neandertals and modern humans. PLOS Genet, 8(10), e1002947.
- Sankararaman S, Mallick S, Dannemann M, Prüfer K, Kelso J, Pääbo S, Patterson N, and Reich D (2014). The genomic landscape of Neanderthal ancestry in present-day humans. Nature, 507(7492), 354–357.
- Sankararaman S, Mallick S, Patterson N, and Reich D (2016). The combined landscape of Denisovan and Neanderthal ancestry in present-day humans. Curr. Biol, 26(9), 1241–1247.
- Scally A and Durbin R (2012). Revising the human mutation rate: implications for understanding human evolution. Nat Rev Genet, 13(10), 745–753.
- Schumer M, Xu C, Powell D, Durvasula A, Skov L, Holland C, Sankararaman S, Andolfatto P, Rosenthal G, and Przeworski M (2017). Natural selection interacts with the local recombination rate to shape the evolution of hybrid genomes. bioRxiv.
- Shaffer HB and McKnight ML (1996). The polytypic species revisited: Genetic differentiation and molecular phylogenetics of the tiger salamander Ambystoma tigrinum (Amphibia: Caudata) complex. Evolution, 50(1), 417–433.
- Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, Weinstock GM, Wilson RK, Gibbs RA, Kent WJ, Miller W, and Haussler D (2005). Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Research, 15(8), 1034–1050.
- Simonti CN, Vernot B, Bastarache L, Bottinger E, Carrell DS, Chisholm RL, Crosslin DR, Hebbring SJ, Jarvik GP, Kullo IJ, Li R, Pathak J, Ritchie MD, Roden DM, Verma SS, Tromp G, Prato JD, Bush WS, Akey JM, Denny JC, and Capra JA (2016). The phenotypic legacy of admixture between modern humans and Neandertals. Science, 351(6274), 737–741.
- Slatkin M and Racimo F (2016). Ancient DNA and human history. Proceedings of the National Academy of Sciences, 113(23), 6380–6387.
- Steinrücken M, Kamm J, and Song Y (2015). Inference of complex population histories using whole-genome sequences from multiple populations. Preprint at: 10.1101/026591.
- Steinrücken M, Paul JS, and Song YS (2013). A sequentially Markov conditional sampling distribution for structured populations with migration and recombination. Theor. Popul. Biol, 87, 51–61.
- Sudlow C, Gallacher J, Allen N, Beral V, Burton P, Danesh J, Downey P, Elliott P, Green J, Landray M, Liu B, Matthews P, Ong G, Pell J, Silman A, Young A, Sprosen T, Peakman T, and Collins R (2015). UK Biobank: An open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLOS Medicine, 12(3), 1–10.
- The 1000 Genomes Project Consortium (2012). An integrated map of genetic variation from 1,092 human genomes. Nature, 491(7422), 56–65.
- Veeramah KR, Wegmann D, Woerner A, Mendez FL, Watkins JC, Destro-Bisol G, Soodyall H, Louie L, and Hammer MF (2012). An early divergence of Khoesan ancestors from those of other modern humans is supported by an ABC-based analysis of autosomal resequencing data. Molecular Biology and Evolution, 29(2), 617–630.
- Vernot B and Akey JM (2014). Resurrecting surviving Neandertal lineages from modern human genomes. Science, 343(6174), 1017–1021.
- Vernot B and Akey JM (2015). Complex history of admixture between modern humans and Neandertals. The American Journal of Human Genetics, 96(3), 448–453.
- Vernot B, Tucci S, Kelso J, Schraiber JG, Wolf AB, Gittelman RM, Dannemann M, Grote S, McCoy RC, Norton H, Scheinfeldt LB, Merriwether DA, Koki G, Friedlaender JS, Wakefield J, Pääbo S, and Akey JM (2016). Excavating Neandertal and Denisovan DNA from the genomes of Melanesian individuals. Science, aad9416.