Significance
The relative contribution of random genetic drift and natural selection to the change in allele frequencies through time is a long-standing question in evolutionary biology. We show through theory and simulation how genomic time series—such as ancient DNA datasets—can be used to decompose the genome-wide contributions of selection, gene flow, and genetic drift to allele frequency change. We apply these methods to two time series from ancient Europeans and show that gene flow accounts for most allele frequency change over the last few thousand years, with genetic drift and not selection making up much of the rest of the contribution to genome-wide evolutionary change.
Keywords: linked selection, gene flow, time series, ancient DNA, human evolution
Abstract
Genomic time series from experimental evolution studies and ancient DNA datasets offer us a chance to directly observe the interplay of various evolutionary forces. We show how the genome-wide variance in allele frequency change between two time points can be decomposed into the contributions of gene flow, genetic drift, and linked selection. In closed populations, the contribution of linked selection is identifiable because it creates covariances between time intervals, and genetic drift does not. However, repeated gene flow between populations can also produce directionality in allele frequency change, creating covariances. We show how to accurately separate the fraction of variance in allele frequency change due to admixture and linked selection in a population receiving gene flow. We use two human ancient DNA datasets, spanning around 5,000 y, as time transects to quantify the contributions to the genome-wide variance in allele frequency change. We find that a large fraction of genome-wide change is due to gene flow. In both cases, after correcting for known major gene flow events, we do not observe a signal of genome-wide linked selection. Thus despite the known role of selection in shaping long-term polymorphism levels, and an increasing number of examples of strong selection on single loci and polygenic scores from ancient DNA, it appears to be gene flow and drift, and not selection, that are the main determinants of recent genome-wide allele frequency change. Our approach should be applicable to the growing number of contemporary and ancient temporal population genomics datasets.
There is a long-standing debate about the role of genetic drift versus selection in evolutionary change (1–7). While this debate has sometimes been contentious, the answers to these questions are quantitative, describing the relative contributions of basic evolutionary forces to allele frequency change and how this differs across species and different functional categories. Estimating these contributions is complicated, in part because selection can have direct and indirect effects, where the indirect effects include “linked selection,” the impact of correlated selection at nearby selected sites (7–12). The problem is made more difficult as we often rely on a single snapshot of contemporary genomes to tease apart multiple interacting processes (gene flow, demography, hard or soft sweeps, background selection, selective interference, etc).
Genomic time series, from museum collections and ancient DNA, offer a potent reservoir of temporal genetic data to track the changes in allele frequencies, identify selected loci, and understand the impact of other evolutionary forces (e.g., refs. 13–17). Ancient human DNA has already revolutionized our view of human history, revealing that large-scale population movement and gene flow are pervasive, with complex patterns of allele frequency change driven by population turnover.
Unlike genetic drift, allele frequency change due to either selection or gene flow is expected to be sustained and directional. Recent investigations have highlighted the role of gene flow and population structure in confounding our interpretation of genetic signals of selection in humans (18–21). Methods to look at selection at single loci and on polygenic scores in human ancient DNA now often account for the confounding effects of gene flow. These approaches have revealed persuasive signals of selection (13, 21–26). However, these methods only capture outlier signals, and so cannot give us a full picture of how gene flow, selection, and genetic drift have driven genome-wide change. Linked selection is thought to have a critical role in shaping patterns of genetic diversity and divergence in humans on long time scales, with an autosomal reduction in diversity of upward of 20% from background selection alone (27, 28). Some authors have also argued for a pervasive role of selective sweeps in shaping genome-wide patterns of diversity (29, 30). Thus, it is an open question how much of allele frequency change genome-wide is driven by selection in humans.
We set out to decompose the total variance in allele frequency change into the contributions of linked selection, gene flow, and drift. Unlike genetic drift, ongoing selection creates covariance in allele frequency change between nonoverlapping time intervals. The use of genome-wide allele frequency change covariances in time series data has recently been proposed to identify the proportion of genome-wide change due to selection in closed populations (31). In a panmictic population, a genome-wide positive covariance between a pair of time periods indicates the compounding of allele frequency change across generations, while negative covariance can potentially result from selection pressures in opposite directions. Many different modes of selection are expected to generate these covariance patterns (32, 33). This approach has been applied to experimental evolution datasets (33, 34) and to natural populations where temporal sampling is available (in Mimulus, oaks and cod; 35–37). These applications, along with related methods (38), have revealed that a reasonable proportion of total allele frequency change, especially in artificial selection experiments, can be attributed to widespread selection beyond just a few outliers. However, applying these methods when population structure and migration are present will give biased results, as sustained gene flow across time periods can also drive temporal covariance in allele frequency change.
Here, we develop theory and simulations to show how the effect of gene flow can be accounted for using an admixture model from a known set of sources to recover the genome-wide contribution of gene flow and linked selection. We demonstrate this approach using two European human ancient DNA time series from the United Kingdom and the Bohemian region of Central Europe to quantify the contributions of linked selection and gene flow to the total variance in allele frequency change between the Neolithic and a modern or recent time point. In both cases, we find a major contribution of gene flow to allele frequency change. After correcting for known gene flow in these time transects, we do not observe any signal of genome-wide linked selection. However, we detect a weak signal of linked selection in levels of temporal allele frequency variance in regions of the genome with low recombination and high gene density.
Results
Model.
We consider a model of data from a population sampled at multiple discrete time points () from an arbitrary geographic region. This population receives gene flow, modeled as single pulses of admixture, from other known source groups through time. We follow the population allele frequency over time, , and use , defined as , to denote the change in allele frequency between adjacent time points. The focal population has mean ancestry fractions () from source populations that will change over time due to gene flow. We assume that allele frequencies in the source populations are constant and that good proxy samples for these sources are available, and discuss the implications of those assumptions later.
Given sample frequencies at some large set of SNPs, we can calculate the empirical variance–covariance matrix of allele frequency change for our time series, , for all combinations of time intervals and , averaged over SNPs. In calculating these covariances we include adjustments for biases in the variance and covariance estimates due to shared sampling of an intermediate time point (see Materials and Methods, Calculating the Covariance Matrix, and Appendix A).
Under our model, the total variance in allele frequency change between the first sampled time point (0) and any following time point () in the time series can be decomposed into sums over time intervals of the contributions due to drift, selection and admixture:
[1] |
For simplicity, here we omit an interaction between drift in one time period that admixture in later time periods subsequently erases, that adds an additional term to covariances (Appendix C) that we account for.
The expected variance and covariance of allele frequency change due to admixture follow from the expected allele frequency change in ancestry proportions through time. Specifically, if in the th time interval admixture changes the ancestry proportion from the source from to (), then the expected change in frequency due to admixture is , where is the allele frequency in the source population . Thus, the admixture contribution to covariance in allele frequency change between time periods and can be expressed as
[2] |
As we only have sample allele frequencies from proxies of the sources of admixture, this matrix is corrected for sampling noise biases in (Appendix B). With this admixture covariance in hand, we can now calculate the contribution of gene flow to allele frequency change, and adjust for the contribution of admixture when looking for covariances induced by selection.
We express the estimated proportion of total variance in allele frequency change attributable to gene flow (admixture) up to time () as
[3] |
where the terms in the numerator are given by Eq. 2. Note that might be an under-estimate as it excludes the contribution of gene flow from unmodeled sources, as well as gene flow events that leave the admixture proportions relatively unchanged.
The proportion of total variance in allele frequency change between and due to linked selection, , is defined as the ratio of the total covariance due to linked selection over the total variance (31). Under our model, can be estimated by correcting the empirical covariance by the estimated covariance term due to admixture:
[4] |
We also report an estimate of not controlling for admixture (). Note that is a lower bound on the proportion as it does not account for the contribution of linked selection to the variance in allele frequency change within time periods. We attribute the residual proportion of the total allele frequency variance to drift-like allele frequency change. This proportion of the temporal variance is consistent with genetic drift as it excludes covariances between time periods, which can not come from drift, and the contribution of known gene flow.
Simulations.
To illustrate our decomposition of genome-wide allele frequency change, we simulated a simple scenario where a population receives pulses of admixture (arrows in Fig. 1A and SI Appendix, Fig. S1). This repeated admixture results in positive covariances between time intervals due to the admixture-driven allele frequency change in generations 160 to 100 (measured before present) being in the same direction as those in generations 60 to 0 (Fig. 1 B and C). We can remove the covariance due to admixture Eq. 2, resulting in covariances close to zero (Fig. 1 B and D).
We can calculate the total contribution of gene flow to allele frequency change in our simulated time series (Eq. 3) and see that gene flow accounts for much of the allele frequency change. Because the repeated gene flow creates positive covariance in allele frequency change, failing to account for this gene flow generates a spurious signal of linked selection (large nonzero ’s, dashed black line Fig. 1E). However, when we account for gene flow, the signal of linked selection is almost completely removed from our neutral simulations (, black line, Fig. 1E). The remaining slightly nonzero value in our final time period (Fig. 1E) results from a slight over-correction for the covariance due to the interaction of drift and gene flow (SI Appendix, Figs. S2 and S3).
To illustrate the effects of selection on the covariance, we extended the above admixture simulations to include a set of loci underlying traits under stabilizing selection around a moving optimum (Materials and Methods, Simulations). In these simulations, we can see the proportion of neutral allele frequency change being due to selection increasing as covariances build up over time (black line Fig. 1D and SI Appendix, Fig. S3). The effect of selection on the covariances in neutral allele frequencies is also well illustrated in a model without any gene flow (SI Appendix, Figs. S4 and S5).
Ancient DNA Time Transects.
We investigated two time transects of allele frequencies in ancient humans in restricted geographical regions, the first one in the United Kingdom (ref. 39, the England and Wales samples), and the second in the Central European region of Bohemia (ref. 40, samples from the current Czech Republic) spanning periods back to 5,600 y ago. Both these time series cover major migrations of people, where an initial mixture of early-farmer-like ancestry (EEF-like) and Western hunter-gatherer-like ancestry (WHG-like) is partially replaced by Steppe pastoralist-like ancestry (Steppe-like). This large turnover due to Steppe-like migration into Central and Western Europe was followed by a progressive increase in EEF-like ancestry over a longer time period.
We combined the Patterson et al. (39) UK data from 793 ancient individuals with the present-day GBR 1,000 genomes samples (“people of European ancestry from Great Britain (GBR)”), to form a time series that runs from 5,500 y ago to the present day. Following Patterson et al. (39), we broke the samples into seven time periods corresponding to transitions in ancestry proportions (Fig. 2C). We used the individual ancestry proportions of Patterson et al. (39) inferred from a qpAdm three population model. These ancestry proportions are calculated to reflect genetic similarity to a set of samples that are pre-specified proxies for sources of ancestry. Note that an increase in a particular ancestry likely does not reflect gene flow directly from that source but rather from more nearby groups who themselves were already mixtures. In turn, each of the three major putative source ancestries was a product of admixture in the past.
The covariance matrix of allele frequency changes between time periods is shown in Fig. 2A. The UK time transect shows several large negative covariances between allele frequency change in the first time period (, 5,424 to 4,005 y ago) and subsequent time periods (see also black points in Fig. 3A). This negative covariance largely reflects that the initial large population turnover due to Steppe-like migration (during the first time period), was followed by an increase in EEF-like ancestry Fig. 2C, (39) generating allele frequency changes in the opposite direction. After correction by admixture, covariances are strongly reduced with only a small subset differing significantly from 0 (Fig. 3C). In the UK time transect, most of the total variance in allele frequency change across time is due to admixture-based changes, with 60% of allele frequency change being driven by the Steppe-like gene flow, only for the contribution of admixture to drop gradually as the EEF-like ancestry increases due to subsequent migration(s) (Fig. 3A, E, and G). If we do not adjust for admixture, our estimate of the contribution of linked selection () is negative (Fig. 3E, ), reflecting the negative covariances induced by admixture. After adjusting for admixture there is no signal of a long-term contribution of linked selection, with not departing from zero, as there is no consistent pattern of residual positive covariances. Our empirical covariance results match those produced by a UK-like neutral simulation (41) model with UK matched admixture pulses (SI Appendix, Figs. S16 and S17). Finally, we checked for the ascertainment bias effects on our and estimates making use of the fact that the genotyping array consists of SNP sets discovered in different ascertainment panels. While using subsets of differently ascertained loci increases our uncertainty in our estimates, particularly for panels from more genetically distant samples, we find that our results are robust to the ascertainment scheme (SI Appendix, Fig. S8).
We investigated another time transect from the Bohemia region (40) spanning from 5,606 to 3,037 y ago, which we also split into seven time periods following the original paper (Fig. 2D). The largest covariance is negative, and an order of magnitude larger than in the UK dataset (Fig. 3B), and again seems to be due to the large influx from a Steppe-like source between 3,880 and 3,120 y ago (between time points 3 and 4, Fig. 2D) and subsequent recovery of EEF-like ancestry (between and , Fig. 2D). Again this large covariance due to admixture can be corrected for (Fig. 3D). While the older time points have a large amount of uncertainty, due to small sample sizes (Materials and Methods, Ancient DNA Analyses), nearly all of the variance in allele frequency change across the full time period is accounted for by admixture () and we see little evidence of allele frequency change attributable to linked selection in this Bohemian time series (Fig. 3F, a result that holds over SNP ascertainment scheme SI Appendix, Fig. S8).
In sum, we see little evidence, in either transect, of linked selection in the covariances in allele frequency change between time intervals, suggesting that having accounted for admixture, much of the residual change across time intervals is due to drift-like sampling processes. One caveat is that if selection operates over short time scales, e.g., selection pressures fluctuate or deleterious alleles are quickly lost, selection could generate substantial allele frequency change (variance) within time intervals but little to no covariance between the time intervals we consider.
To address the concern about the time intervals, we first reran our covariance analysis on the larger UK dataset splitting each time period in half. With this finer dissection of short-term covariances, we still see no evidence of covariance due to linked selection (SI Appendix, Fig. S12). To further explore the effect of time intervals we extended our simulations of Gaussian stabilizing selection and found that the sum of covariances decreases with the length of the time interval studied (Fig. 4A), however, this effect is only pronounced when the recombination rate is low (SI Appendix, Fig. S6). Temporal covariances are also generated under models of background selection (Fig. 4B, SI Appendix, Fig. S7, and ref. 33) and, while somewhat diminished, these also persist with longer sampling periods. Thus our simulations suggest that while temporal binning of ancient DNA samples will lead to lower covariances, the signal of linked selection should still be detectable.
One further prediction is that linked selection is expected to have larger effects in low recombination regions than high recombination regions, and in regions with a greater density of functional sites patterns that are seen in human polymorphism datasets, (27, 42, 43). In simulations, we can see this effect, with greater temporal covariances in regions of lower recombination (Fig. 4A) and larger variances in allele frequency changes with lower recombination (Fig. 4C). While the temporal covariances decrease with longer time interval, linked selection makes a greater contribution to the variance of allele frequency change within time intervals, so the overall signal of linked selection can be retained in the correlation of allele frequency temporal variances with recombination. To empirically examine this effect of selection we binned SNPs by their local recombination rate and a measure of the potential strength of linked selection, the B-value, which at each location in the genome combines the information of recombination rates and density of coding sites (27, 28). In both time transects, we do not observe any significant variation in the and statistics recalculated in bins of recombination rate (44) or the B-value (SI Appendix, Figs. S10 and S14). However, in the UK transect, we do see a significant increase in the total variance in allele frequency change in the lowest B-value bin (corresponding to the largest decrease in effective population size due to background selection, Fig. 5A, first bin mean is above the genome-wide 95% CI) and in the variances of change within some of the time intervals (Fig. 5B). Noise in the smaller Bohemia time transect precludes seeing such effects (SI Appendix, Figs. S13–S15). The allele frequency temporal variance increase in the United Kingdom for the lowest quintile B-value compared to the genome-wide mean is of 14.8%, suggesting a fifth of this, 3%, is an estimate of the genome-wide contribution of linked selection to allele frequency change.
Discussion
Here, we have shown how ancient DNA data can be used to decompose the contribution of gene flow, linked selection, and drift to genome-wide allele frequency change. Using two ancient DNA time transects, our results demonstrate that gene flow is the dominant force changing allele frequencies in the recent history of European human populations, and that selection-driven change is not common across the genome. This does not necessarily contradict the number of signals of temporal selection found to date, as a small fraction of loci could be subject to strong selection (e.g., refs. 13, 14, 23–26). Indeed, some of these methods apply similar admixture adjustments as ours, but look for genome-wide outliers and so only detect strong selection on single loci, e.g., Mathieson and Terhorst (13) expect to be able to detect selection coefficients . Another set of approaches looks for ancient selection on polygenic scores constructed from genome-wide association studies (13, 14). These approaches account for admixture and can detect subtle shifts at loci in ancient DNA, but rely on the fraction of genetic variation for specific traits captured in modern-day samples (45). Thus, our genome-wide method is complementary to both time series outlier approaches as well as phenotype-motivated approaches.
The large contribution of gene flow to evolutionary change in the past few thousand years is not surprising given the dynamic picture of population movement that has emerged from ancient DNA. Our results provide additional evidence that allele frequency changes are well fit by relatively simple admixture models and strengthen the view that multiple migrations events throughout the history of European Human populations have played a preponderant role in the composition of modern populations. The lack of a substantial contribution of linked selection is perhaps more surprising. Linked selection has been estimated to account for upward of 20% reduction in long-term patterns of human diversity under models of background selection (27, 28), and so we should expect a similar portion of the variance in allele frequencies to come from linked selection. Much of this effect should manifest itself in the compounding of positive covariances between allele frequency changes across the generations. While this effect has been seen empirically in selection experiments and in some natural populations, we currently do not see any evidence of this in humans. One possibility is that the time periods we consider are not long enough for strong covariances to build up, as the long-term patterns of linked selection reflect dynamics over coalescent time scales of hundreds of thousands of years. In contrast, the other possibility is that negative selection generating background selection is fast enough that it does not contribute to covariances among the time periods used here. However, under this latter interpretation, we should see higher allele frequency variances in regions predisposed to stronger linked selection, but we see this effect only weakly when partitioning loci by B-value. Larger collections of ancient DNA will allow better temporal resolution of allele frequency covariances, which could be combined with more individual-level approaches to avoid the need for sample lumping in time periods. It is also possible that some signals of linked selection may be washed out at the fine geographic scale of our time series, as our time series approach may partially be picking up ephemeral change which may average out over the much larger meta-population within which our time series are embedded.
Our approach uses ancestry proportions from ancient DNA for the three major inferred waves of gene flow into Europe. The sparsity of ancient DNA means that we rely on the ancestry proportions of relatively small samples of ancient individuals to be representative of people living in the past. However, the periods that we divide our samples into reflect reasonably well-established periods in the peopling of the regions. We also rely on allele frequencies in a set of samples as proxies for sources of gene flow. As we discuss below, the misspecification of the sources of gene flow may appear as evolutionary change within our focal time series. One future extension might be to use principal components analysis to learn about major axes of population structure involving samples in a time series and then to regress these PCAs out of our genotypes to account for variation in ancestry composition in a more model-free manner.
Our admixture correction seems to perform well on time intervals involving the large ancestry shift in Steppe-like and EEF-like ancestry (compare black points between Fig. 3A and C). However, we see several negative covariances that remain after adjustment for admixture (Fig. 3 A and C). In principle, these could reflect fluctuating selection, but that seems unlikely given the general lack of other evidence of selection. Rather, these covariances could reflect that our proxies for gene flow sources only capture part of the allele frequency change driven by gene flow. Indeed, the increase of EEF-like ancestry in the UK population is driven by subsequent migration(s) from populations similar to the UK but with a higher proportion of EEF-like ancestry, probably from mainland Europe. Therefore, modeling the increase of EEF-derived alleles with the ancestral EEF allele frequencies might not fully account for the impact of migration. More detailed modeling with admixture graphs and tree sequences could help better resolve the sources of gene flow in time series (e.g., refs. 25, 41, and 46, although such inferences may currently not be fully robust, ref. 47).
We attribute residual variance after accounting for gene flow and temporal covariances to drift-like processes. Genetic drift from the compounded sampling of parents to form each generation in our geographic area will obviously contribute to this. However, as our focal geographic areas are not homogeneous populations, small changes in ancestry composition over time not captured by larger-scale admixture analyses might be captured as drift-like processes. Finally, as we take a fixed sample to reflect the allele frequencies in the sources of gene flow, change in the actual groups contributing to gene flow can also contribute to the signal of drift; e.g., if the allele frequencies in the source of EEF-like ancestry early differs from those contributing EEF-like ancestry later in the time series, that will appear as drift-like change in our focal time series. Our drift-like change in allele frequency is small, corresponding to relatively large estimates of temporal effective population sizes (Materials and Methods, Ancient DNA Analyses). However, further work is needed to separate the long-term effect of drift and the combined contribution of other drift-like processes to our estimates of allele frequency change.
Extensions of our approach to larger geographical areas would allow the contributions of local genetic drift and migration among regions to be more fully explored. Such analyses would also pose an interesting set of modeling challenges to measure evolutionary change across spatially spread populations experiencing both local migration and more long-range gene flow events.
Finally, while a large body of ancient DNA work has focused on humans, ancient DNA and museum datasets for a wide range of other organisms are also being generated (e.g., dogs, (48); horses, (49); sticklebacks, (50); chipmunks, (51); Amaranthus, (52)). The spread of ancient DNA and museum DNA research as well as more widespread usage of genome-wide sequencing to temporally monitor contemporary natural populations will generate a rich set of resources of time series data. This offers the chance for comparative studies to decompose the contribution of different forces to genome-wide evolutionary change across systems, time scales, and ecological and selection regimes.
Materials and Methods
Calculating the Covariance Matrix.
We bin our samples into a set of discrete time points and then calculate the allele frequency change at SNP between adjacent time points, and , . We then calculate the empirical variance–covariance matrix of these allele frequency changes for all time points averaged across SNPs. We denote the raw covariance matrix by . We wish to quantify the expected contribution of admixture to this matrix, but in doing so we also have to correct for sampling noise in both the time series allele frequencies and sources of admixture. The corrected covariance matrix is given by
[5] |
is the expected matrix of biases from using sample frequencies in calculating the empirical covariance matrix (Eq. 6). Our admixture adjustment, , is the expected admixture variance–covariance matrix (Eq. 2), where proxy samples are used as references for the admixture sources. The matrix is the expected bias in the admixture matrix due to the sample noise from using sample frequencies in our admixture correlation (Eq. 7). Finally, is the expected drift/admixture interaction matrix (Appendix C).
Here we calculate the sampling biases in the specific case of pseudohaploid data in line with the ancient DNA datasets considered in this paper (Appendix A), using for the haploid sample size at time point and the sample frequency at SNP . The sampling noise from taking a small sample of individuals inflates the variance of allele frequency change and shared sample between adjacent time points creates covariance
[6] |
with all other terms in the matrix set to zero [Appendix A and ref. 31). These matrices are calculated as an average over all our SNPs. Second, sampling noise is also present in frequencies of the samples used as proxies for admixture, and so this biases the admixture expectation as the same reference samples are used for multiple time points (Appendix B):
[7] |
following Eq. B.3 with the admixture proportion from reference population at time , and the empirical allele frequency in the reference population .
Finally, the simple admixture covariance expectation is missing a term due to shared drift variances between time intervals. This can be estimated as shown in Appendix C and requires the assumption that only one parental population is contributing to gene flow during each time interval. is given by Eq. C.9 and is dependent on the estimated drift variance terms in parental populations and admixture proportions at each time step common between two time points.
Simulations.
We used the Demes format to write inter-operable demographic scenarios (53). This allowed us to run the same model with either msprime for neutral simulations (54) or SLiM (v3.7) for simulations including selection (55). Results were recorded as tree sequences and analyzed in Python using tskit (56). All results are based on 100 replicates of each scenario. The simulations pipeline was built with snakemake (57) and can be found in the zenodo archive https://doi.org/10.5281/zenodo.8093105 and includes the version of all software used.
In the main text simple scenario, an ancestral population splits into the source populations 1,500 generations before present (BP). All populations are kept at a constant size of 10,000 diploid individuals. 200 generations BP our focal population that receives the admixture pulses is created from the first parental population (pop0). Pulses of admixture from pop1 happen at regular 20 generations interval starting at 150 and finishing at 10 generations BP. We sample 30 individuals 10 generations before and after pulses in our focal population. For our admixture sources, we sample 30 individuals from each parental population at 200 generations BP for allele frequency computations. Samples are rendered pseudohaploid to mimic ancient DNA results (though no missing data was inserted). A census event of all populations is performed in the source populations when the admixed population is created to allow us to compute the admixture proportions of all descendants.
We simulated a chromosome 100 Mbp in length with a mutation rate of per bp and per generation and a uniform recombination rate of .
To simulate linked-selection in SLiM (55), we considered three independent polygenic traits with alleles having a random effect size of evolving under a model of stabilizing selection around an initial optimum of 0 for each trait. The fitness landscape is a Gaussian function centered on the optimum with a variance, , of 1. The optimum is gradually shifted from 0 to 3 SDs between 140 generations BP to the present similarly in all extant populations (by steps of shift/time). The ancestral population has a burn-in of 0.1N generations in SLiM and the complete ancestral history has been recapitated with pyslim (58). Mutations under selection are not used in the downstream analyses and neutral mutations are added a posteriori with msprime. We note that these simulations are not intended to mimic a particular selection scenario, as the density of loci underlying different traits per chromosome is unknown. Rather the parameters were chosen to generate results where both admixture and selection made comparable contributions for illustration purposes.
Finally, to investigate the role of sampling intervals and recombination rates we repeated the above Gaussian stabilizing selection simulations with varying recombination rates [, , , , ] approximately spanning the range of human recombination rates, and analyzing the outputted tree sequences at different time sampling intervals [2, 5, 10, 15, 20]. We also simulated background selection using a model similar to Buffalo and Coop (33) in SLiM where we use several deleterious mutations per haploid genome per generation and a negative selection coefficient . The rest of the model and analysis pipeline is similar to the Gaussian stabilizing selection case above.
Ancient DNA Analyses.
We used two datasets from Patterson et al. (39) and Papac et al. (40) for ancient DNA time transects in the United Kingdom and the Bohemian region respectively. Data from those papers were downloaded from the indicated sources and merged with a set of parental population proxies and modern samples from the AADR v50.0 1240k dataset (59) using data from refs. 23, 48, 60–85. Modern samples were used to provide a modern time point in the UK time transect. The data analysis snakemake pipeline can be found in the zenodo archive https://doi.org/10.5281/zenodo.8093107 and includes the version of all software used.
Individuals from each time period defined in the original analyses are pooled together to compute allele frequencies and the mean estimated age is taken as the time point date. For the UK dataset (39), we merged the published data with AADR v50.0 (providing modern samples and parental population proxies), and with data from Fowler et al. (86) to access 10 individuals missing from the other datasets. Only loci with more than 10 genotypes in each time point grouping and more than five genotypes in reference populations were kept, resulting in 474,554 SNPs kept over the initial 1,135,618. We used combined filters 0 and 1 from Table S5 of Patterson et al. (39) as our quality and relevance filtering. This resulted in sample sizes of [37, 69, 26, 23, 273, 38, 62] for periods labeled [“Neolithic,” “Chalcolithic/EBA,” “Middle Bronze Age,” “Late Bronze Age,” “Iron Age,” “Post Iron Age,” “Modern”] and mean nonmissing genotypes across all SNPs of [25.5, 46.3, 19.6, 14.8, 208.2, 25.5, 59.9]. Mean sample dates for those periods are [5424, 4005, 3326, 2929, 2215, 1180, 0] B.P. Reference sample sizes are [18, 21, 18] for groups labeled [“WHGA,” “Balkan_N,” “OldSteppe”] in the dataset interpreted as WHG-like, EEF-like, and Steppe-like. Those reference samples have mean nonmissing genotypes across all SNPs of [7.4, 15.4, 12.4] respectively.
For the Papac et al. (40) dataset, only loci with more than two genotypes in each time point grouping and more than two genotypes in reference samples were kept, resulting in 461,844 SNPs kept over the initial 1,150,639. Sample sizes in this dataset are [3, 5, 29, 14, 48, 59, 84] for periods labeled [“Neolithic,” “Proto-Eneolithic,” “Early Eneolithic,” “Middle Eneolithic,” “Corded Ware,” “Bell Beaker,” “Unetice”] and mean nonmissing genotypes across all SNPs of [3., 3.7, 23.4, 10.9, 33.1, 41.9, 55.1]. Mean sample dates for those periods are [5607, 5253, 4229, 3880, 3726, 3120, 3037] B.P. Reference sample sizes are [4, 17, 15] for groups labeled [“WHG,” “Anatolia_Neolithic,” “Yamnaya”] in the dataset interpreted as WHG-like, EEF-like, and Steppe-like. Those reference samples have mean nonmissing genotypes across all SNPs of [3.5, 13.1, 9.0], respectively.
We used admixture measures from both published papers produced by the qpAdm method, extracted from the Table S5 from Patterson et al. (39) and Table S9 from Papac et al. (40). In concordance with the literature on European Human demographic history during the last 5,000 y, we consider the simplest three-way admixture between populations genetically most similar to European early farmers (EEF-like, early migrants from Anatolia), Western hunter-gatherers (WHG-like) and individuals associated to the Steppe pastoralists Yamnaya culture (Steppe-like).
We computed CIs around estimates by block bootstrap sampling of windows of 1,000 SNPs along the genome. Statistics computed for each window were re-sampled times with replacement and a 95% CI was computed by the pivot method as in Buffalo and Coop (33). Statistics were computed through a weighted average to account for variability in the number of SNPs in each window (windows at the end of chromosomes often do not contain the required number of SNPs). When dealing with ratio statistics (like ), we computed separately the numerators and denominators and used the ratio of the weighted averages for the final values.
Each dataset was transformed from the eigenstrat to the sgkit format through a plink (87) conversion step. Sex chromosomes were removed from the datasets. To investigate the correlation of our statistics with recombination or background selection, we incorporated in the dataset recombination rates (44, sex-averaged version) and B-values (28) for each SNP—by using the value of the window the SNP was in. We split all SNPs into five quantile bins and computed and proportions for each one, as well as the variance and covariances.
We can compute a simple estimate of the diploid effective population size, , by equating the expected variance due to drift after generations,
[8] |
with the residual variances for each time period in the studied datasets (having adjusted for the variance due to admixture and sampling). Using a generation time of 30 y, and using the number of generations between the mid-points of each time interval for the UK dataset we obtain [351,472; 3,204,693; 2,827,982; 2,041,971; 361,192; 203,114] for each time interval. For the Bohemia dataset time intervals, we get [768,209; 2,576,065; 1,392,381; 1,181,146; 1,275,489; 1,167,632]. We note that these effective population size estimates are only approximate as they do not account for the more continuous distribution of sampling times present in the data.
Supplementary Material
Acknowledgments
We thank members of the Coop lab, Vince Buffalo, and Joshua Schraiber for helpful discussions. We also thank the editor and reviewers for helpful comments during the review process. A.S. and G.C. were supported by the National Institute of General Medical Sciences of the NIH (NIH R35 GM136290 to G.C.).
Author contributions
A.S. and G.C. designed research; performed research; analyzed data; and wrote the paper.
Competing interests
The authors declare no competing interest.
Appendix
A. Pseudohaploid Sampling.
Following Buffalo and Coop (33), the observed variance in allele frequency at time can be decomposed with the law of total variance:
[A.1] |
[A.2] |
This gives us a way to correct the observed variance for sampling noise.
Pseudohaploid representation is common in ancient DNA data to avoid errors when calling heterozygotes. Most often, one read (and therefore allele) is selected randomly among the mapped reads for each individual at a given position. Pseudohaploid calling can be modeled as a binomial sampling. We consider sampling individuals in a population with frequency of the alternate allele at time . Pseudohaploid calling is equivalent to each individual drawing one allele from the pool of alleles. We define the number of alternate alleles sampled and . Then the sampling noise is
[A.3] |
[A.4] |
Correction of the variance is carried out by subtracting both sampling variances and . As in Buffalo and Coop (33), the covariances between two overlapping time intervals, are negatively biased by the shared sampling noise in , and this needs to be corrected by adding the shared time point sampling variance back in. For these corrections, we need an unbiased estimator of the half heterozygosity. We define the sample heterozygosity as , then
[A.5] |
Therefore
[A.6] |
Similarly, if needed, we can compute the diploid sampling bias estimator:
[A.7] |
B. Pseudohaploid Sampling Noise in Reference Populations.
Let’s consider the allele frequency of reference population
[B.1] |
The observed allele frequency is equal to the true allele frequency () plus sampling noise.
Therefore
[B.2] |
Decomposing as and remembering that , we end up with .
Therefore
[B.3] |
with , the variance of sampling noise in the pseudohaploid case, equal to (similar to Appendix A).
C. Accounting for Drift in the Admixture Correction.
We consider a simple model where only the focal admixed population experiences drift and parental populations from which gene flow occurs are not. This is in line with our use of a single proxy sample for the sources of admixture. Under this model drift happening in any of our contributing populations is absorbed into the drift observed in the focal population. Drift that occurs in one time interval can partially be erased by admixture in subsequent time intervals. This interaction between drift and gene flow generates additional covariance that needs to be accounted for.
Let be the frequencies of parental populations for a particular SNP. At time 0, an admixed population of frequency is established with ancestry proportions from any of the populations. Subsequently, between time points and , this admixed population can receive a migration pulse from any of the populations, where a proportion of individuals in the focal population are replaced by migrants. Between time intervals and , drift happens changing the frequency by . is the allele frequency at time of our admixed population. are the ancestry proportions at time of this admixed population.
We define the proportion of individuals replaced by admixture as
[C.1] |
The change in allele frequency can then be written as
[C.2] |
[C.3] |
[C.4] |
We can expand this out in terms of the change in allele frequency due to admixture and drift in the preceding time periods:
[C.5] |
We can express the ancestry fraction from source at time in terms of the change due to admixture in previous time periods:
[C.6] |
[C.7] |
As a constraint, if at each time step there is only one , then it simplifies to:
[C.8] |
allowing us to compute the admixture fraction from the ancestry proportions.
When computing the covariance between two time intervals and (), a composite drift term depending on the admixture pulses will be shared between all time intervals between times to between the two and . This expected admixture-drift term can be computed as
[C.9] |
with
[C.10] |
can then be subtracted from the empirical covariance to remove this effect.
To compute , individual time interval drift variances need to be estimated (). We can use the fact that the variances at each time interval can be decomposed as a linear combination of drift variances to solve for them. Solving is only possible when considering that only one parental population is contributing to gene flow at a given time to be able to estimate the values of individual Eq. C.8. The system for is of the form:
[C.11] |
Footnotes
This article is a PNAS Direct Submission.
Data, Materials, and Software Availability
No new data were produced for this work. Analyses pipelines are available at https://doi.org/10.5281/zenodo.8093105 for simulations and at https://doi.org/10.5281/zenodo.8093107 for the ancient DNA data. Those analyses rely on a custom helper python module available at https://doi.org/10.5281/zenodo.8093101. Previously published data were used for this work (39, 40).
Supporting Information
References
- 1.Kimura M., Evolutionary rate at the molecular level. Nature 217, 624–626 (1968). [DOI] [PubMed] [Google Scholar]
- 2.J. H. Gillespie, M. Kimura, The Status of the Neutral Theory: The Neutral Theory of Molecular Evolution. Motoo Kimura. Cambridge University Press, New York, 1983. xvi, 367 pp., illus. \$69.50 Science 224, 732–733 (1984). [DOI] [PubMed]
- 3.Kreitman M., The neutral theory is dead. Long live the neutral theory. BioEssays 18, 678–683 (1996). [DOI] [PubMed] [Google Scholar]
- 4.Kern A. D., Hahn M. W., The neutral theory in light of natural selection. Mol. Biol. Evol. 35, 1366–1371 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Jensen J. D., et al. , The importance of the Neutral Theory in 1968 and 50 years on: A response to Kern and Hahn 2018. Evolution 73, 111–114 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Buffalo V., Quantifying the relationship between genetic diversity and population size suggests natural selection cannot explain Lewontin’s Paradox. Elife 10, e67509 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Sella G., Petrov D. A., Przeworski M., Andolfatto P., Pervasive natural selection in the Drosophila genome? PLoS Genet. 5, e1000495 (2009). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Maynard-Smith J., Haigh J., The hitch-hiking effect of a favourable gene. Genet. Res. 23, 23–35 (1974). [PubMed] [Google Scholar]
- 9.Kaplan N. L., Hudson R. R., Langley C. H., The “hitchhiking effect’’ revisited. Genetics 123, 887–899 (1989). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Charlesworth B., Morgan M. T., Charlesworth D., The effect of deleterious mutations on neutral molecular variation. Genetics 134, 1289–1303 (1993). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Barton N. H., Genetic hitchhiking. Philos. Trans. R. Soc. Lond. B Biol. Sci. 355, 1553–1562 (2000). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.G. Coop, Does linked selection explain the narrow range of genetic diversity across species? bioRxiv [Preprint] (2016). 10.1101/042598 (Accessed 7 June 2023). [DOI]
- 13.Mathieson I., Terhorst J., Direct detection of natural selection in Bronze Age Britain. Genome Res. 32, 2057–2067 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.M. K. Le et al., 1,000 ancient genomes uncover 10,000 years of natural selection in Europe. arXiv [Preprint] (2022). 10.1101/2022.08.24.505188 (Accessed 29 August 2022). [DOI]
- 15.Z. He, X. Dai, W. Lyu, M. Beaumont, F. Yu, Estimating temporally variable selection intensity from ancient DNA Data. Mol. Biol. Evol. 40, msad008 (2023). [DOI] [PMC free article] [PubMed]
- 16.Bergland A. O., Behrman E. L., O’Brien K. R., Schmidt P. S., Petrov D. A., Genomic evidence of rapid and stable adaptive oscillations over seasonal time scales in Drosophila. PLoS Genet. 10, e1004775 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Machado H. E., et al. , Broad geographic sampling reveals the shared basis and environmental correlates of seasonal adaptation in Drosophila. Elife 10, e67577 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Petr M., Pääbo S., Kelso J., Vernot B., Limits of long-term selection against Neandertal introgression. Proc. Natl. Acad. Sci. U.S.A. 116, 1639–1644 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Sohail M., et al. , Polygenic adaptation on height is overestimated due to uncorrected stratification in genome-wide association studies. Elife 8, e39702 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Berg J. J., et al. , Reduced signal for polygenic adaptation of height in UK Biobank. eLife 8, e39725 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Souilmi Y., et al. , Admixture has obscured signals of historical hard sweeps in humans. Nat. Ecol. Evol. 6, 2003–2015 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Field Y., et al. , Detection of human adaptation during the past 2000 years. Science 354, 760–764 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Mathieson I., et al. , Genome-wide patterns of selection in 230 ancient Eurasians. Nature 528, 499–503 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Wilde S., et al. , Direct evidence for positive selection of skin, hair, and eye pigmentation in Europeans during the last 5,000 y. Proc. Natl. Acad. Sci. U.S.A. 111, 4832–4837 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.E. K. Irving-Pease et al., The selection landscape and genetic legacy of ancient Eurasians. bioRxiv [Preprint] (2022). https://www.biorxiv.org/content/10.1101/2022.09.22.509027v1 (Accessed 23 September 2023).
- 26.Dan J., Mathieson I., The evolution of skin pigmentation-associated variation in West Eurasia. Proc. Natl. Acad. Sci. U.S.A. 118, e2009227118 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.McVicker G., Gordon D., Davis C., Green P., Widespread genomic signatures of natural selection in hominid evolution. PLoS Genet. 5, e1000471 (2009). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.D. Murphy, E. Elyashiv, G. Amster, G. Sella, Broad-scale variation in human genetic diversity levels is predicted by purifying selection on coding and non-coding elements (2021). [DOI] [PMC free article] [PubMed]
- 29.Enard D., Messer P. W., Petrov D. A., Genome-wide signals of positive selection in human evolution. Genome Res. 24, 885–895 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Schrider D. R., Kern A. D., Soft sweeps are the dominant mode of adaptation in the human genome. Mol. Biol. Evol. 34, 1863–1877 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Buffalo V., Coop G., The linked selection signature of rapid adaptation in temporal genomic data. Genetics 213, 1007–1045 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Santiago E., Caballero A., Effective size and polymorphism of linked neutral loci in populations under directional selection. Genetics 149, 2105–2117 (1998). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Buffalo V., Coop G., Estimating the genome-wide contribution of selection to temporal allele frequency change. Proc. Natl. Acad. Sci. U.S.A. 117, 20672–20680 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Brennan R. S., et al. , Experimental evolution reveals the synergistic genomic mechanisms of adaptation to ocean warming and acidification in a marine copepod. Proc. Natl. Acad. Sci. U.S.A. 119, e2201521119 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Saleh D., et al. , Genome-wide evolutionary response of European oaks during the Anthropocene. Evol. Lett. 6, 4–20 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Kelly J. K., The genomic scale of fluctuating selection in a natural plant population. Evol. Lett. 6, 506–521 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Reid B. N., Star B., Pinsky M. L., Detecting parallel polygenic adaptation to novel evolutionary pressure in wild populations: A case study in Atlantic cod (Gadus morhua). Philos. Trans. Roy. Soc. B: Biol. Sci. 378, 20220190 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Bertram J., Allele frequency divergence reveals ubiquitous influence of positive selection in Drosophila. PLoS Genet. 17, e1009833 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Patterson N., et al. , Large-scale migration into Britain during the Middle to Late Bronze Age. Nature 601, 588–594 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40.Papac L., et al. , Dynamic changes in genomic and social structures in third millennium BCE central Europe. Sci. Adv. 7, eabi6941 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41.A. Pearson, R. Durbin, Local ancestry inference for complex population histories. bioRxiv [Preprint] (2023). 10.1101/2023.03.06.529121 (Accessed 18 May 2023). [DOI]
- 42.Cai J. J., Michael Macpherson J., Sella G., Petrov D. A., Pervasive hitchhiking at coding and regulatory sites in humans. PLoS Genet. 5, e1000336 (2009). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Hernandez R. D., et al. , Classic selective sweeps were rare in recent human evolution. Science 331, 920–924 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.Bhérer C., Campbell C. L., Auton A., Refined genetic maps reveal sexual dimorphism in human meiotic recombination at multiple scales. Nat. Commun. 8, 14994 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45.Yair S., Coop G., Population differentiation of polygenic score predictions under stabilizing selection. Philos. Trans. Roy. Soc. B: Biol. Sci. 377, 20200416 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46.M. E. Allentoft et al., Population genomics of stone age Eurasia. bioRxiv [Preprint] (2022). https://www.biorxiv.org/content/10.1101/2022.05.04.490594v1 (Accessed 6 June 2023).
- 47.Maier R., et al. , On the limits of fitting complex models of population history to f-statistics. Elife 12, e85492 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48.Bergström A., et al. , Origins and genetic legacy of prehistoric dogs. Science 370, 557–564 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49.Orlando L., The evolutionary and historical foundation of the modern horse: Lessons from ancient genomics. Annu. Rev. Genet. 54, 563–581 (2020). [DOI] [PubMed] [Google Scholar]
- 50.Kirch M., Anders Romundset M., Gilbert T. P., Jones F. C., Foote A. D., Ancient and modern stickleback genomes reveal the demographic constraints on adaptation. Curr. Biol. 31, 2027–2036.e8 (2021). [DOI] [PubMed] [Google Scholar]
- 51.Bi K., et al. , Temporal genomic contrasts reveal rapid evolutionary responses in an alpine mammal during recent climate change. PLoS Genet. 15, e1008119 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 52.Kreiner J. M., et al. , Rapid weed adaptation and range expansion in response to agriculture over the past two centuries. Science 378, 1079–1085 (2022). [DOI] [PubMed] [Google Scholar]
- 53.G. Gower et al., Demes: A standard format for demographic models. Genetics 222, iyac131 (2022). [DOI] [PMC free article] [PubMed]
- 54.F. Baumdicker et al., Efficient ancestry and mutation simulation with msprime 1.0. Genetics 220, iyab229 (2022). [DOI] [PMC free article] [PubMed]
- 55.Haller B. C., Messer P. W., SLiM 3: Forward genetic simulations beyond the Wright–Fisher model. Mol. Biol. Evol. 36, 632–637 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 56.Kelleher J., Thornton K. R., Ashander J., Ralph P. L., Efficient pedigree recording for fast population genetics simulation. PLoS Comput. Biol. 14, e1006581 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 57.F. Mölder et al., Sustainable data analysis with Snakemake [version 2; peer review: 2 approved]. F1000Research 10 (2021). [DOI] [PMC free article] [PubMed]
- 58.P. Ralph et al., tskit-dev/pyslim: 1.0.3. Zenodo. https://zenodo.org/records/8068030.
- 59.S. Mallick et al., The Allen Ancient DNA Resource (AADR): A curated compendium of ancient human genomes. arXiv [Preprint] (2023). 10.1101/2023.04.06.535797 (Accessed 4 July 2023). [DOI] [PMC free article] [PubMed]
- 60.W. Haak et al., Massive migration from the steppe was a source for Indo-European languages in Europe. Nature 522, 207–211 (2015). [DOI] [PMC free article] [PubMed]
- 61.Genomes Project Consortium , A global reference for human genetic variation. Nature 526, 68–74 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 62.Patterson N., et al. , Ancient admixture in human history. Genetics 192, 1065–1093 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 63.Mallick S., et al. , The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature 538, 201–206 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 64.Olalde I., et al. , The Beaker phenomenon and the genomic transformation of northwest Europe. Nature 555, 190–196 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 65.Lamnidis T. C., et al. , Ancient Fennoscandian genomes reveal origin and spread of Siberian ancestry in Europe. Nat. Commun. 9, 5018 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 66.Wang C.-C., et al. , Ancient human genome-wide data from a 3000-year interval in the Caucasus corresponds with eco-geographic regions. Nat. Commun. 10, 590 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 67.M. Rivollat et al., Ancient genome-wide DNA from France highlights the complexity of interactions between Mesolithic hunter-gatherers and Neolithic farmers. Sci. Adv. 6, eaaz5344 (2020). [DOI] [PMC free article] [PubMed]
- 68.Lazaridis I., et al. , Genetic origins of the Minoans and Mycenaeans. Nature 548, 214–218 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 69.V. M. Narasimhan et al., The formation of human populations in South and Central Asia. Science 365, eaat7487 (2019). [DOI] [PMC free article] [PubMed]
- 70.Villalba-Mouco V., et al. , Survival of Late Pleistocene hunter-gatherer ancestry in the Iberian Peninsula. Curr. Biol. 29, 1169–1177.e7 (2019). [DOI] [PubMed] [Google Scholar]
- 71.Margaryan A., et al. , Population genomics of the Viking world. Nature 585, 390–396 (2020). [DOI] [PubMed] [Google Scholar]
- 72.Scheib C. L., et al. , East Anglian early Neolithic monument burial linked to contemporary Megaliths. Ann. Hum. Biol. 46, 145–149 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 73.Qiaomei F., et al. , The genetic history of Ice Age Europe. Nature 534, 200–205 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 74.Lipson M., et al. , Parallel palaeogenomic transects reveal complex genetic history of early European farmers. Nature 551, 368–372 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 75.Martiniano R., et al. , Genomic signals of migration and continuity in Britain before the Anglo-Saxons. Nat. Commun. 7, 10326 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 76.Brace S., et al. , Ancient genomes indicate population replacement in Early Neolithic Britain. Nat. Ecol. Evol. 3, 765–771 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 77.Mathieson I., et al. , The genomic history of southeastern Europe. Nature 555, 197–203 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 78.Allentoft M. E., et al. , Population genomics of Bronze Age Eurasia. Nature 522, 167–172 (2015). [DOI] [PubMed] [Google Scholar]
- 79.Skoglund P., et al. , Genetic evidence for two founding populations of the Americas. Nature 525, 104–108 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 80.Raghavan M., et al. , Upper Palaeolithic Siberian genome reveals dual ancestry of Native Americans. Nature 505, 87–91 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 81.Lazaridis I., et al. , Genomic insights into the origin of farming in the ancient Near East. Nature 536, 419–424 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 82.Schroeder H., et al. , Unraveling ancestry, kinship, and violence in a Late Neolithic mass grave. Proc. Natl. Acad. Sci. U.S.A. 116, 10705–10710 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 83.Jones E. R., et al. , Upper Palaeolithic genomes reveal deep roots of modern Eurasians. Nat. Commun. 6, 8912 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 84.Mittnik A., et al. , The genetic prehistory of the Baltic Sea region. Nat. Commun. 9, 442 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 85.Coutinho A., et al. , The Neolithic Pitted Ware culture foragers were culturally but not genetically influenced by the Battle Axe culture herders. Am. J. Phys. Anthropol. 172, 638–649 (2020). [DOI] [PubMed] [Google Scholar]
- 86.Fowler C., et al. , A high-resolution picture of kinship practices in an Early Neolithic tomb. Nature 601, 584–587 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 87.C. C. Chang et al., Second-generation PLINK: Rising to the challenge of larger and richer datasets. GigaScience 4, s13742–015–0047–8 (2015). [DOI] [PMC free article] [PubMed]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
Data Availability Statement
No new data were produced for this work. Analyses pipelines are available at https://doi.org/10.5281/zenodo.8093105 for simulations and at https://doi.org/10.5281/zenodo.8093107 for the ancient DNA data. Those analyses rely on a custom helper python module available at https://doi.org/10.5281/zenodo.8093101. Previously published data were used for this work (39, 40).