Abstract
Distinguishing between hybridization and population structure in the ancestral species is a key challenge in our understanding of how permeable species boundaries are to gene flow. The doubly conditioned frequency spectrum (dcfs) has been argued to be a powerful metric to discriminate between these two explanations, and it was used to argue for hybridization between Neandertal and anatomically modern humans. The shape of the observed dcfs for these two species cannot be reproduced by a model that represents ancient population structure in Africa with two populations, while adding hybridization produces realistic shapes. In this letter, we show that this result is a consequence of the spatial coarseness of the demographic model and that a spatially structured stepping stone model can generate realistic dcfs without hybridization. This result highlights how inferences on hybridization between recently diverged species can be strongly affected by the choice of how population structure is represented in the underlying demographic model. We also conclude that the dcfs has limited power in distinguishing between the signals left by hybridization and ancient structure.
Keywords: hybridization, Neandertal, demography, population structure
Hybridization between different species can play a major role in evolution, both by bringing novel adaptations into species and by acting as a barrier to their divergence (Seehausen 2004; Abbott et al. 2013). However, detecting hybridization from genetic data can be challenging, as it requires distinguishing actual gene flow after the species split from shared variation that was present in the ancestral species (Abbott et al. 2013; Smith and Kronforst 2013; Sousa and Hey 2013). This problem is particularly challenging when considering hybridization among recently diverged species, where past population structure in the ancestral species can leave genetic signatures that are almost identical to those left by hybridization (Green et al. 2010; Eriksson and Manica 2012; Lowery et al. 2013).
The challenges of distinguishing between actual hybridization and ancient population structure have been highlighted by the recent publication of Neandertal genomes (Green et al. 2010; Prüfer et al. 2013). The main finding coming out of the first analysis of the draft sequence of the Neandertal genome (Green et al. 2010) was that populations of anatomically modern humans (AMHs) differed in genetic similarity to Neandertal. Specifically, modern Europeans and Asians were significantly more genetically similar to this hominin than Africans (Green et al. 2010). Patterson’s D statistic (SOM 15 in Green et al. 2010) is arguably the best-known approach to quantify this pattern. This statistic is based on a panel of four individuals and focuses on biallelic sites where either the Eurasian or the African matches the Neandertal (but not both) and where the Neandertal is different from the chimp. D is calculated as the fraction of such sites where the Eurasian genome matches the Neandertal minus the fraction where the African genome matches the Neandertal. In a simple four-population model without hybridization, we expect Eurasian and African genomes to have the same probability of matching the Neandertal through incomplete lineage sorting, but hybridization between Neandertal and one of the modern human populations would give rise to an imbalance (Green et al. 2010). An analysis using Patterson’s D revealed that the observed values for Neandertal were more extreme than expected by chance, and these values were taken as evidence for hybridization (Green et al. 2010). This test has been used in a number of other taxa, such as primates (Prüfer et al. 2012), flycatchers (Rheindt et al. 2013), and Heliconius butterflies (Martin et al. 2013). However, a problem in interpreting Patterson’s D is that ancestral population structure can produce patterns indistinguishable from hybridization (Durand et al. 2011).
In the case of Neandertal, a spatially structured model with realistic demographic parameters can produce D values identical to the ones measured from real genomes, even in the absence of hybridization (Eriksson and Manica 2012).
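As an illustration of how Patterson’s D is defined, the calculation can be sketched in a few lines of Python. This is a minimal sketch, not the pipeline used in any of the cited studies: the function name `pattersons_d` and the input format (one allele per site for a single African, Eurasian, Neandertal, and chimp genome, all biallelic) are assumptions made for the example.

```python
def pattersons_d(african, eurasian, neandertal, chimp):
    """Patterson's D from one genome per group (hypothetical input format).

    Each argument is a sequence of alleles, one per biallelic site.
    Only sites where the Neandertal differs from the chimp, and where
    exactly one of the two modern genomes matches the Neandertal, count.
    """
    n_eurasian_match = 0  # Eurasian shares the Neandertal allele, African does not
    n_african_match = 0   # African shares the Neandertal allele, Eurasian does not
    for a, e, n, c in zip(african, eurasian, neandertal, chimp):
        if n == c:
            continue  # Neandertal carries the chimp allele: not informative
        eurasian_match = (e == n)
        african_match = (a == n)
        if eurasian_match == african_match:
            continue  # both or neither match: not informative
        if eurasian_match:
            n_eurasian_match += 1
        else:
            n_african_match += 1
    informative = n_eurasian_match + n_african_match
    if informative == 0:
        raise ValueError("no informative sites")
    # Difference between the two fractions of informative sites
    return (n_eurasian_match - n_african_match) / informative
```

Under this definition, D is zero when the two modern genomes match the Neandertal equally often and positive when the Eurasian genome matches more often. Assessing whether an observed D is more extreme than expected by chance requires additional steps that are not shown here.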
In an attempt to increase the power to detect hybridization, Yang et al. (2012) focused on the frequency distribution of Neandertal alleles in Eurasian populations at biallelic loci where the Neandertal differs from the chimpanzee reference genome and modern-day Africans have the chimp allele. These loci have been called “doubly conditioned,” as they need to have the same allele in a modern African genome and the chimp genome (first condition) but to differ between chimp and Neandertal genomes (second condition; see fig. 1a for a schematic representation). Such loci should, in principle, be enriched for mutations that occurred in the Neandertal line and subsequently entered the human line through hybridization, and their relative frequency (the doubly conditioned frequency spectrum, dcfs, shown in fig. 1b) should be an informative measure of the strength of hybridization. Yang et al. (2012) showed that a population genetics model that represents ancient structure in Africa with two populations (see fig. 2a and b for a graphical representation of this model) predicts a deficit of rare doubly conditioned alleles (e.g., of frequency one in the sample) compared to the frequencies estimated from real data. Adding hybridization to such a model, however, restored the appropriate shape of the doubly conditioned allele frequency spectrum. Thus, the dcfs seems to be an informative metric to distinguish between hybridization and ancient population structure, and this result has been taken as a confirmation of hybridization between Neandertal and AMHs (e.g., Sankararaman et al. 2012).
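The two conditions that define a doubly conditioned locus can be made concrete with a short Python sketch. The function name `dcfs`, the haplotype-string input format, and the choice to drop sites where no sampled Eurasian chromosome carries the Neandertal allele are assumptions made for illustration; they are not taken from Yang et al. (2012).

```python
from collections import Counter


def dcfs(eurasian_haplotypes, african, neandertal, chimp):
    """Doubly conditioned frequency spectrum (illustrative sketch).

    eurasian_haplotypes: list of equal-length allele sequences, one per
    sampled Eurasian chromosome (hypothetical input format).
    african, neandertal, chimp: one allele per site.
    Returns the proportion of doubly conditioned sites at which exactly
    k Eurasian chromosomes carry the Neandertal allele, for k = 1..n.
    """
    n = len(eurasian_haplotypes)
    counts = Counter()
    for site, (a, nea, c) in enumerate(zip(african, neandertal, chimp)):
        if a != c:
            continue  # first condition: African carries the chimp allele
        if nea == c:
            continue  # second condition: Neandertal differs from chimp
        k = sum(1 for hap in eurasian_haplotypes if hap[site] == nea)
        if k > 0:  # assumption: absent alleles are not part of the spectrum
            counts[k] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no doubly conditioned sites in the sample")
    return [counts[k] / total for k in range(1, n + 1)]
```

The first entry of the returned list is the proportion of doubly conditioned alleles seen exactly once in the sample, which is the class of rare alleles whose deficit or excess is at issue in the comparison between models.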
However, it remains to be determined whether the dcfs can distinguish between hybridization and ancient structure when a spatially structured model with multiple populations is used instead of Yang et al.’s representation of ancient structure in the whole African continent with only two populations. Such spatially structured models better capture the global genetic clines in within-population genetic diversity observed in AMHs (Prugnolle et al. 2005; Ramachandran et al. 2005). Here we use the same spatially structured stepping stone model as previously presented in Eriksson and Manica (2012) to explore the properties of the dcfs with a fine-scale representation of ancient structure (fig. 2c, see supplementary material S1, Supplementary Material online, for details). Realistic demographic parameters were obtained by fitting the stepping stone model to match worldwide patterns of spatial differentiation among modern populations and were further subsetted to focus on parameter combinations that predicted D between Africans and Europeans to be within 0.0020 units of the observed value 0.0457. This simple spatial model, which does not include any hybridization, predicts frequency spectra of doubly conditioned alleles (the dcfs) that are in line with observed values (gray lines and shaded ranges in fig. 3a), matching closely the empirical proportion of rare alleles (giving R² = 99.2% for the best fit). Some demographic parameter combinations give rise to a slight excess of very common alleles, but there are a large number of combinations that fit the observed dcfs almost perfectly (ten examples are shown as gray lines in fig. 3a; see SOM for details). This spatially explicit model (which has eight free parameters) provides a fit that is comparable (R² = 99.2% vs. R² = 99.7%) to the admixture model in Yang et al. (2012) (which has nine free parameters; blue line in fig. 3a).
It is also considerably better than the best model fit for ancient population structure presented in Yang et al. (2012), which has an R² of 93.7% (green line in fig. 3a). The large proportion of rare doubly conditioned alleles in our spatially structured model is a consequence of deep splits in gene genealogies, with old, relatively rare lineages being preserved by the fine-grained spatial structure in the model (fig. 3b). In other words, the presence of multiple (spatially structured) populations within Africa prevents lineages from coalescing too quickly, thereby allowing for a few European lineages to merge back with Neandertal before meeting any African lineage. In many cases, such lineages are only represented by one or two individuals, giving an excess of rare doubly conditioned loci.
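The selection of parameter combinations by their predicted D, followed by a goodness-of-fit comparison of the dcfs, can be sketched as follows. This is only an illustration under stated assumptions: the observed D (0.0457) and the tolerance (0.0020) are the values quoted above, but the dictionary layout of a simulation result, the function names, and the use of the standard coefficient of determination for R² are assumptions, since the exact definition of R² is not given in the text.

```python
OBSERVED_D = 0.0457   # observed D between Africans and Europeans (from the text)
D_TOLERANCE = 0.0020  # maximum accepted deviation of the predicted D (from the text)


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot


def select_fits(simulations, observed_dcfs):
    """Keep simulations whose D is close to the observed value, ranked by R².

    simulations: list of dicts with hypothetical keys "label", "D", "dcfs".
    Returns (R², label) pairs, best fit first.
    """
    kept = []
    for sim in simulations:
        if abs(sim["D"] - OBSERVED_D) <= D_TOLERANCE:
            kept.append((r_squared(observed_dcfs, sim["dcfs"]), sim["label"]))
    return sorted(kept, reverse=True)
```

Filtering on D first and scoring the dcfs afterwards mirrors the order described above: the dcfs comparison is restricted to demographic scenarios that already reproduce the observed D without hybridization.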
It is beyond the scope of this short letter to provide a formal test for alternative hybridization scenarios with Neandertal. Population structure affects a number of aspects of the similarities between Eurasians and Neandertal. For example, the degree of matching between ancient and derived SNPs in candidate regions for hybridization (SOM 17 in Green et al. [2010]) can be reproduced by a spatial model analogous to the one presented in this letter, without any hybridization (Eriksson and Manica 2012). A number of studies, including the first analyses of two new Neandertal genomes (Prüfer et al. 2013), provide an intricate picture of possible hybridization events among a number of hominins. Possibly, the clearest analysis pointing to hybridization is the dating of the Neandertal gene flow into modern humans based on linkage disequilibrium patterns (Sankararaman et al. 2012). However, such dates are based on the same demographic representation used in Yang et al. (2012). Thus, it will be interesting to see whether linkage disequilibrium patterns are affected by different spatial representations of population structure.
In general, the very different results obtained by a model that represents genetic structure in Africa with two populations (Yang et al. 2012) versus our spatially structured model highlight the importance of the coarseness at which space is described. When investigating hybridization, especially in the case of recently diverged species, metrics have been devised to focus the power of the analysis on the key signals that would be expected from hybridization. However, spatial structuring of populations can easily mimic such signals. No matter how sophisticated the metrics are, the properties of different demographic models should be explored, in particular how robust the analysis is to the spatial scale of demographic processes.
Supplementary Material
Acknowledgments
This work was supported by the Leverhulme Trust and the Biotechnology and Biological Sciences Research Council (grant BB/H005854/1). D. Daversa, C. Jiggins, and two anonymous referees provided useful comments on the manuscript.
References
- Abbott R, Albach D, Ansell S, Arntzen JW, Baird SJ, Bierne N, Boughman J, Brelsford A, Buerkle CA, Buggs R, et al. Hybridization and speciation. J Evol Biol. 2013;26:229–246. doi: 10.1111/j.1420-9101.2012.02599.x.
- Durand EY, Patterson N, Reich D, Slatkin M. Testing for ancient admixture between closely related populations. Mol Biol Evol. 2011;28:2239–2252. doi: 10.1093/molbev/msr048.
- Eriksson A, Manica A. Effect of ancient population structure on the degree of polymorphism shared between modern human populations and ancient hominins. Proc Natl Acad Sci U S A. 2012;109:13956–13960. doi: 10.1073/pnas.1200567109.
- Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N, Li H, Zhai W, Fritz MH, et al. A draft sequence of the Neandertal genome. Science. 2010;328:710–722. doi: 10.1126/science.1188021.
- Lowery RK, Uribe G, Jimenez EB, Weiss MA, Herrera KJ, Regueiro M, Herrera RJ. Neanderthal and Denisova genetic affinities with contemporary humans: introgression versus common ancestral polymorphisms. Gene. 2013;530:83–94. doi: 10.1016/j.gene.2013.06.005.
- Martin SH, Dasmahapatra KK, Nadeau NJ, Salazar C, Walters JR, Simpson F, Blaxter M, Manica A, Mallet J, Jiggins CD. Genome-wide evidence for speciation with gene flow in Heliconius butterflies. Genome Res. 2013;23:1817–1828. doi: 10.1101/gr.159426.113.
- Prüfer K, Munch K, Hellmann I, Akagi K, Miller JR, Walenz B, Koren S, Sutton G, Kodira C, Winer R, et al. The bonobo genome compared with the chimpanzee and human genomes. Nature. 2012;486:527–531. doi: 10.1038/nature11128.
- Prüfer K, Racimo F, Patterson N, Jay F, Sankararaman S, Sawyer S, Heinze A, Renaud G, Sudmant PH, de Filippo C, et al. The complete genome sequence of a Neanderthal from the Altai Mountains. Nature. 2013;505:43–49. doi: 10.1038/nature12886.
- Prugnolle F, Manica A, Balloux F. Geography predicts neutral genetic diversity of human populations. Curr Biol. 2005;15:R159–R160. doi: 10.1016/j.cub.2005.02.038.
- Ramachandran S, Deshpande O, Roseman CC, Rosenberg NA, Feldman MW, Cavalli-Sforza LL. Support from the relationship of genetic and geographic distance in human populations for a serial founder effect originating in Africa. Proc Natl Acad Sci U S A. 2005;102:15942–15947. doi: 10.1073/pnas.0507611102.
- Rheindt FE, Fujita MK, Wilton PR, Edwards SV. Introgression and phenotypic assimilation in Zimmerius flycatchers (Tyrannidae): population genetic and phylogenetic inferences from genome-wide SNPs. Syst Biol. 2013;63:134–152. doi: 10.1093/sysbio/syt070.
- Sankararaman S, Patterson N, Li H, Pääbo S, Reich D. The date of interbreeding between Neandertals and modern humans. PLoS Genet. 2012;8:e1002947. doi: 10.1371/journal.pgen.1002947.
- Seehausen O. Hybridization and adaptive radiation. Trends Ecol Evol. 2004;19:198–207. doi: 10.1016/j.tree.2004.01.003.
- Smith J, Kronforst MR. Do Heliconius butterfly species exchange mimicry alleles? Biol Lett. 2013;9:20130503. doi: 10.1098/rsbl.2013.0503.
- Sousa V, Hey J. Understanding the origin of species with genome-scale data: modelling gene flow. Nat Rev Genet. 2013;14:404–414. doi: 10.1038/nrg3446.
- Yang MA, Malaspinas A-S, Durand EY, Slatkin M. Ancient structure in Africa unlikely to explain Neanderthal and non-African genetic similarity. Mol Biol Evol. 2012;29:2987–2995. doi: 10.1093/molbev/mss117.