Abstract
Demographic processes directly affect patterns of genetic variation within contemporary populations as well as future generations, allowing for demographic inference from patterns of both present-day and past genetic variation. Advances in laboratory procedures, sequencing and genotyping technologies in the past decades have resulted in massive increases in high-quality genome-wide genetic data from present-day populations and allowed retrieval of genetic data from archaeological material, also known as ancient DNA. This has resulted in an explosion of work exploring past changes in population size, structure, continuity and movement. However, as genetic processes are highly stochastic, patterns of genetic variation only indirectly reflect demographic histories. As a result, past demographic processes need to be reconstructed using an inferential approach. This usually involves comparing observed patterns of variation with model expectations from theoretical population genetics. A large number of approaches have been developed based on different population genetic models that each come with assumptions about the data and underlying demography. In this article I review some of the key models and assumptions underlying the most commonly used approaches for past demographic inference and their consequences for our ability to link the inferred demographic processes to the archaeological and climate records.
This article is part of the theme issue ‘Cross-disciplinary approaches to prehistoric demography’.
Keywords: population genetics, statistical modelling, demographic modelling, ancient DNA, population history, archaeology
1. Introduction
Genetic information from present-day individuals has been used for past demographic and evolutionary inference for decades. However, recent advances in sequencing and genotyping technologies have substantially reduced the cost of generating genetic data, allowing large, high-quality population-wide datasets to be produced and used to address questions about past demographic and evolutionary processes, ranging from the origin, timing and genetic consequences of past long-range migrations [1] to mate choice patterns and the structure of genetic variation within present-day populations [2]. This reduction in cost per base pair sequenced, combined with advances in specialized [3,4] laboratory protocols for degraded genetic material [5–7], has also allowed large-scale sequencing of genetic data from archaeological material, also known as ancient DNA, and resulted in an explosion of work exploring the evolutionary, demographic and environmental past of humans, other animals and plants. For example, ancient DNA has been used to establish biological sex [8], to provide species identification (e.g. [9,10]) and to infer familial genetic relationships within archaeological communities [11–13], as well as to reconstruct past diets and environments (e.g. [14]). Ancient DNA has also been used to calibrate molecular clocks that measure the rates of evolution (e.g. [15–17]) and are thereby used for dating the inferred past demographic events and for generating date estimates for previously undated specimens. Ancient DNA can also be used for inferring or estimating past phenotypes (e.g. lactase persistence [18,19], skin pigmentation [19–21] and height [19,22,23]) but has found most frequent use in the inference of past demography: i.e. past population structure, continuity and admixture as well as long-range migrations and regional movement of individuals.
Recent applications of ancient DNA to questions about past demography, and their results, have been reviewed in [24–28].
While some aspects of past demography have simple predictions for genetic patterns of variation, such as the biological sex [8] of a past individual or familial genetic relationships between individuals [29], most demographic processes leave behind a more convoluted signature. In fact, the patterns of genetic variation across individual genomes are a result of the accumulated effects of different past demographic processes combined with the stochasticity of inheritance. As a result, demographic processes need to be inferred by comparing patterns of observed genetic variation to analytical or (in the case of more complicated demographic scenarios) simulated model predictions, both relying on population genetics theory with its roots in the work by R. Fisher, S. Wright, J. B. S. Haldane and others in the 1920s and 1930s [30].
All inferential approaches require assumptions about the data and the underlying demographic processes, which in some methods need to be made explicitly whereas in others they are implicit. These assumptions, and the level to which they are met, can heavily affect the suitability of different frameworks for testing competing demographic hypotheses as well as the interpretation of such demographic modelling. In this article I will review the assumptions underlying the most commonly used approaches for past demographic inference as well as their consequences, especially in relation to our ability to link the inferred demographic processes to the archaeological and climate records.
2. Pattern-based approaches to demographic inference
Phylogeographic inference is an approach to reconstruct population histories that was especially popular in the early days of past demographic inference using genetic data (e.g. [31,32] but also [33]). Here, a phylogenetic tree is constructed based on the substitutions (mutations) in a single locus (usually a non-recombining part of the genome like the mitochondrial DNA or the Y chromosome). All sampled individuals are assigned to haplogroups and haplotypes, which are the branches and sub-branches, or lineages, in the inferred tree. The root of the tree corresponds to the most recent common ancestor of all the samples. When the mutation rate is known (or can be estimated), it is possible to estimate the dates of each branching point of the tree, using the accumulation of mutations along branches in the tree as a ‘molecular clock’. Inferences about the past are based on the phylogenetic relationship between different haplotypes or haplogroups, their estimated splitting times and their distribution in space and time.
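The molecular clock reasoning above can be illustrated with a minimal sketch. It assumes a strict clock (a constant, known substitution rate), two contemporaneous sequences and no recurrent mutation. The function name and all numerical values are hypothetical and chosen purely for illustration.

```python
def divergence_time(n_differences, n_sites, rate_per_site_per_year):
    """Estimate the time (in years) to the common ancestor of two sequences.

    Assumes a strict molecular clock. Differences accumulate along both
    lineages leading from the common ancestor, hence the factor of 2.
    """
    if n_sites <= 0 or rate_per_site_per_year <= 0:
        raise ValueError("n_sites and the substitution rate must be positive")
    per_site_divergence = n_differences / n_sites
    return per_site_divergence / (2.0 * rate_per_site_per_year)


# Hypothetical illustration values, not taken from any study
t = divergence_time(n_differences=40, n_sites=16000, rate_per_site_per_year=2.5e-8)
print(f"Estimated time to common ancestor: {t:.0f} years")
```

The estimate scales inversely with the assumed rate, so any uncertainty in the rate carries over directly to the inferred dates.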
The key challenge of inferring the past from such trees lies in that events in a phylogenetic tree, based on a single locus, do not generally directly correspond to population-level events because they are stochastic outcomes of a given population history. As a result, different demographic scenarios can give rise to qualitatively similar gene trees and distribution of haplotypes (i.e. are equifinal) [34]. Although studies relying on data from a single locus often lack the statistical power to test complex population histories, especially when samples are exclusively from modern populations [35–37], robust inference can in principle be achieved when using sufficient data (e.g. samples from before, during and after the demographic event of interest) and explicit statistical modelling that accounts for the randomness of individual loci and allows formal consideration of the likelihood of different alternative demographic scenarios (e.g. [38,39]). Although ancient DNA alone already provides additional resolution that often enables researchers to exclude less likely scenarios (e.g. [40–42]), without explicit demographic modelling the conclusions can be easily steered by the subjective biases of a particular researcher [37].
The latter also applies to neighbour joining trees based on many more loci or even whole genomes. Such algorithms are designed to join samples based on genetic similarity without explicit consideration of demographic processes such as gene flow, genetic drift, isolation and identity by descent, let alone more complex scenarios combining these processes, that may have caused the observed similarities and differences between populations. Therefore, the resulting tree does not in itself provide sufficient evidence to conclude that the studied populations have split like the tree suggests. Nevertheless, these approaches are extremely useful for identifying issues with data quality, such as sequencing batch effects and other artefacts resulting from data generation, as well as for generating hypotheses or demographic scenarios, to be formally tested within a hypothesis testing framework.
(a). Descriptive methods for inferring population structure
Principal component analysis (PCA) is a commonly used technique for assessing the genetic similarity between individuals and the extent to which populations form distinct clusters. In a PCA of genetic data, each locus is treated as an independent variable (dimension) and genetic variation among samples is reduced into main axes of variation, the principal components (PCs), using the covariance matrix. PCA of modern human populations has been shown to reflect geographical relationships between populations on different scales: on a global scale, populations from the same sub-continent tend to group together [43], and across Europe the genetic locations of samples in the PCA show striking similarity to their geographical locations [44].
In the ancient DNA literature, modern populations are typically used to define the PCs and ancient individuals are projected onto them. As a general rule, only the first few axes of variation (PCs) are presented, representing a small proportion (typically only a few per cent) of the total variation present in modern populations. However, the older the sample, or the more distantly related it otherwise is to the modern populations used in the analysis, the more likely it is that more of its variation ends up on an axis orthogonal to the ones presented or used for inference. This problem can be exacerbated by the projection method, as individuals with more missing data are more likely to end up closer to the average of the modern data points. In principle, it would be possible to overcome this problem by turning the analysis around: using ancient samples to define PCs and projecting modern samples onto this variation, in order to investigate the genetic similarity between modern and ancient populations. However, this would require a representative high-quality ancient DNA sample from each time slice and geographical region of interest.
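The effect of missing data on projected samples can be sketched as follows. This is a toy illustration that assumes a naive projection in which missing genotypes are set to the reference mean (and therefore contribute nothing after centring). Dedicated software may handle missing data differently. The loadings, genotypes and helper names are hypothetical.

```python
import math


def project(genotypes, means, loadings):
    """Project one sample onto a single PC.

    Missing genotypes (None) are treated as equal to the reference mean,
    so they contribute zero to the score after centring.
    """
    score = 0.0
    for g, m, w in zip(genotypes, means, loadings):
        if g is not None:
            score += (g - m) * w
    return score


def mask(sample, fraction_missing):
    """Return a copy of the sample with the first fraction of loci set to missing."""
    n_missing = int(len(sample) * fraction_missing)
    return [None] * n_missing + sample[n_missing:]


n_loci = 1000
means = [1.0] * n_loci                          # mean genotype in the reference panel
loadings = [1.0 / math.sqrt(n_loci)] * n_loci   # hypothetical unit-length PC
complete = [2] * n_loci                         # a sample far from the mean on this PC

full_score = project(complete, means, loadings)
half_score = project(mask(complete, 0.5), means, loadings)
sparse_score = project(mask(complete, 0.9), means, loadings)
print(full_score, half_score, sparse_score)
```

Under this assumption the projected score shrinks towards the origin in proportion to the fraction of missing loci, even though the underlying genotypes are unchanged.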
The main problem with inferring population history from PCA is that it lacks an underlying population genetic model, and several different scenarios can result in similar distributions of samples on PCs [45]. For example, populations can form clusters further away from each other on a PCA because they have been separated for a long time period, or because a recent population bottleneck has caused extensive drift in allele frequencies within one or more of them [46]. Conversely, individuals can appear similar on a PCA plot either because of an extended period of shared history or because of recent, extensive homogenizing gene flow (e.g. through admixture). Therefore, it is not possible to directly relate the inferred distances between samples on PCs to the demographic processes behind the observed variation.
Similar criticism applies to tools commonly used in demographic inference using modern and ancient DNA, such as STRUCTURE [47], ADMIXTURE [48], Finestructure, and Chromopainter [49]. These tools vary in their technical implementations but are all designed to identify major genetic clusters and express each sample as a mixture of these clusters. For example, people living in Central America today can be modelled as admixture between present-day Native American, European and African populations [50]. The underlying model of such approaches assumes that (present-day) individuals are a product of admixture of distinct ‘source’ groups that existed in the past. Problems with such approaches may arise when this cannot be assumed a priori and the inferred statistical clusters are misleadingly taken as evidence of the existence of ‘ancestral’ or ‘source’ populations when, in reality, such clusters could be explained by multiple distinct demographic histories [51]. Unless more complex demographic scenarios are explicitly considered, it is inherently not possible to identify which demographic scenarios have resulted in such clusters using clustering approaches alone. However, sometimes it is possible to use such clustering tools to test competing demographic scenarios, provided that it is clear that the tested scenarios are expected to produce distinct clustering patterns (e.g. [52]).
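The mixture model that underlies allele-frequency-based clustering can be sketched in a few lines. This is an assumption-laden toy version: each genotype is treated as a binomial draw of two alleles with frequency equal to the ancestry-weighted average of the cluster frequencies, loci are independent, and the cluster frequencies are taken as known. It is not the implementation of any of the tools named above, and all values are hypothetical.

```python
import math


def log_likelihood(genotypes, q, cluster_freqs):
    """Binomial log-likelihood of one individual's genotypes.

    genotypes: allele counts (0, 1 or 2) at each locus
    q: ancestry proportions of the individual, one per cluster (summing to 1)
    cluster_freqs: per-cluster allele frequencies, one list per cluster
    """
    ll = 0.0
    for j, g in enumerate(genotypes):
        # Allele frequency expected for this individual at locus j
        p = sum(q_k * freqs[j] for q_k, freqs in zip(q, cluster_freqs))
        ll += math.log(math.comb(2, g)) + g * math.log(p) + (2 - g) * math.log(1.0 - p)
    return ll


# Hypothetical allele frequencies in two inferred clusters at four loci
cluster_freqs = [
    [0.9, 0.8, 0.1, 0.2],
    [0.1, 0.2, 0.9, 0.7],
]
genotypes = [1, 1, 1, 1]  # an individual heterozygous at every locus

ll_mixed = log_likelihood(genotypes, [0.5, 0.5], cluster_freqs)
ll_single = log_likelihood(genotypes, [1.0, 0.0], cluster_freqs)
print(ll_mixed, ll_single)
```

The mixed assignment fits these genotypes better, but the likelihood only describes how well a mixture of clusters reproduces the data. It carries no information about which demographic history produced the pattern.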
In summary, statistical approaches such as PCA and neighbour joining trees (regardless of whether they are based on a single locus or whole genome data), as well as clustering tools such as ADMIXTURE, can be very useful for summarizing and visualizing complex population genetic data (e.g. for generating hypotheses or identifying problems with data quality). However, the translation of such patterns into explicit demographic scenarios is less than straightforward, as such tools lack a formal demographic model as well as a hypothesis testing component. This results in inference that can be easily steered by the subjective interpretation of individual researchers, unless the demographic scenarios of interest can be expected to produce vastly different patterns of variation.
(b). Methods for inferring spatial barriers to past mobility
From theoretical population genetics and ecology, it is known that reduced mobility, for example caused by geographical, ecological or cultural barriers, can increase differences in allele frequencies between populations. For pairs of populations, this has traditionally been captured by FST, the ratio of the variance of allele frequencies between sub-populations to the variance in the whole population [53]. In an ideal setting, such as Wright's infinite island model [54] where populations are located on an infinite square lattice, FST is directly related to the number M of migrants per generation between neighbouring populations in the lattice [FST = 1/(1 + 2M)], such that no migrants yield FST = 1 and ‘infinite’ migration rate (i.e. a panmictic population) yields FST = 0.
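The two quantities in this paragraph can be written down as a short sketch. The first function simply evaluates the relation quoted above. The second computes the variance ratio for a single biallelic locus, assuming equally sized sub-populations and known allele frequencies (no correction for sampling error). Both function names are hypothetical.

```python
def fst_from_migrants(m):
    """F_ST expected for M migrants per generation, using F_ST = 1 / (1 + 2M)."""
    if m < 0:
        raise ValueError("the number of migrants cannot be negative")
    return 1.0 / (1.0 + 2.0 * m)


def fst_from_frequencies(freqs):
    """Variance-ratio F_ST at one biallelic locus.

    freqs: allele frequency in each sub-population (assumed equally sized).
    Returns Var(p) / (p_bar * (1 - p_bar)).
    """
    k = len(freqs)
    p_bar = sum(freqs) / k
    if p_bar <= 0.0 or p_bar >= 1.0:
        raise ValueError("the locus is not variable in the total population")
    var_between = sum((p - p_bar) ** 2 for p in freqs) / k
    return var_between / (p_bar * (1.0 - p_bar))


print(fst_from_migrants(0))              # no migrants
print(fst_from_migrants(0.5))            # one migrant every two generations
print(fst_from_frequencies([0.0, 1.0]))  # sub-populations fixed for different alleles
print(fst_from_frequencies([0.5, 0.5]))  # identical allele frequencies
```

Empirical estimators applied to real data additionally correct for finite sample sizes and average over many loci, which this sketch deliberately leaves out.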
Several tools have been developed to extend this principle. CircuitScape [55,56] is commonly used in landscape population genetics to test the effect of geographical features on population connectivity. It approximates the effect of gene flow as current through an electric circuit, where FST values between pairs of populations correspond to pairwise voltage measurements in the circuit. By solving the corresponding electrostatic equations, the program estimates gene flow along all paths. Recently, this tool was used to estimate different routes during the initial colonization of Australia [57].
Pagani et al. [1] developed an analytical technique, based on Gaussian kernel interpolation, to study past barriers to gene flow by quantifying spatial gradients in allele frequencies. They applied it to human whole genome data across Eurasia and found evidence of major mountain ranges and deserts acting as barriers to gene flow. Petkova et al. [58] introduced EEMS (estimated effective migration surfaces), a tool based on computational geometry designed to deal with spatially irregular patterns of data. In this framework space is represented as a polygon subdivided into triangles. Each of the triangles is associated with a local movement rate that is constant within the triangle. Finally, the method uses classical population genetics theory to find movement rates that correspond to observed pairwise differences in allele frequencies among samples.
Mathieson et al. [59] applied the EEMS approach to 116 samples dated to be older than 7000 BP in order to investigate population structure in European hunter–gatherers. These methods rely on the assumption that observed genetic similarities and differences between samples are a function of physical isolation between them, so that genetic differences between samples of different age (resulting from genetic drift and mutations) can cause spurious geographical barriers to be inferred in temporally heterogeneous datasets. However, this issue can be mitigated by making sure that the temporal genetic differences are small enough compared to the geographical differences considered.
(c). Sampling bias
Bias in sampling is an issue that can potentially affect all population genetic inference from pattern-based qualitative descriptions of data to explicit demographic modelling. As a general rule, approaches for past demographic inference from genetic data rest on the assumption that individuals are sampled randomly from a population so that the observed differences between samples are representative of the whole population. This can pose a significant problem for demographic analyses based on individuals from museum collections or archaeological specimens where sampling is largely influenced by a number of non-random factors such as preservation, as well as excavation and sampling locations and periods that are not always spatially or temporally randomly distributed.
The issue of non-uniform sampling can be dealt with using approaches that explicitly consider spatial and temporal location of samples. For example, Loog et al. [60] developed an extension to the FST measure that unifies spatial and temporal distances into a single metric. They showed that this metric directly informs on past levels of mobility and applied it to genome-wide data from a set of spatially and temporally sparsely distributed ancient Europeans.
However, even such methods can give misleading results if the sampled individuals are closely genetically related. For example, consider a situation where related individuals are more likely to be subjected to similar funerary practices or be buried in a similar location. If these practices or locations were more likely to be excavated or result in better preservation, sampled populations could end up appearing less genetically diverse, leading also to biased downstream demographic analyses. Although familial relatives can be identified and are usually removed from subsequent demographic inference, more subtle sampling biases can be very difficult to detect and account for in the analyses.
3. Methods for inferring demographic histories from explicit models
As discussed above, approaches such as PCA and ADMIXTURE are descriptive, and narratives based purely on such results do not constitute a formal analysis of population history. In order to achieve robust demographic inference, alternative hypotheses need to be formulated as different demographic scenarios. The likelihood of different demographic scenarios can then be formally compared by calculating the probability of observing the data given each scenario. Here the level of detail described in each demographic scenario is heavily constrained by the available data so that more information-rich data can be used to discriminate between increasingly more nuanced demographic scenarios.
For example, having information from before, during and after an evolutionary event of interest provides greatly increased power to distinguish between different past demographic scenarios compared to data from just a single time slice (e.g. modern individuals). Although no model can capture the full complexity of the real world, formal comparison of models that represent different demographic scenarios allows formal assessment of the importance of different processes in the demographic history of populations.
(a). Tree-like population history models and the history
In order to understand the properties of modern methods for inference of past population dynamics—and some of their weaknesses—it is useful to review the concept of population in population genetics. Conceptually, either directly or indirectly, most population genetic models treat populations as lineages (or taxa) within a phylogenetic framework. That is, populations are distinct, homogeneous entities with a history that can be represented as a tree, where branch points represent splits between ancient populations and leaves of the tree are extant or archaeological populations for which samples are available.
Several methods of varying degrees of complexity exist for constructing the phylogenetic tree representing the joint history of populations. Here, I will review the approaches most commonly used in the inferences of past demography, as an exhaustive enumeration of all population genetic modelling methods would be beyond the scope of this paper.
Several computationally intensive methods have been developed to estimate past population sizes, gene flow between different branches in the population tree and, if the mutation rate is known, date of divergence between populations from sequence data (e.g. IMa [61], fastSIMCOAL2 [62] and G-PhoCS [63]) or allele frequency data (e.g. ∂a∂i [64]). For example, Freedman et al. [65] used G-PhoCS and seven high-quality genomes from present-day wolves and dogs to formally compare demographic models involving different divergence times, ancestral population sizes and rates of post-divergence gene flow between the different branches of the Canid phylogenetic tree (each sample representing a different branch).
The approaches based on the phylogenetic framework rely on the assumption that the relationship between populations can be represented as, essentially, a phylogenetic tree, i.e. as abrupt splits between different branches of the tree, followed by independent evolution with potential for subsequent episodes of gene flow between them. As a result, such approaches are not well suited to testing past demographic scenarios where the relationship between populations has historically been more complex than can be represented in a splitting tree model (e.g. populations in close geographical proximity).
A different but related issue is that such approaches do not scale well to large histories with complex historical interactions because they require all potential demographic events and interactions to be explicitly defined. As a result, potentially many a priori modelling decisions are required about a demographic history which is, as a general rule, unknown, making it challenging to apply such approaches to most archaeological and anthropological questions.
This has created a need for inference methods that require fewer a priori assumptions and which, as a result, are sometimes informally referred to as ‘data driven’ or ‘model free’ approaches. Although these approaches do not require a predefined demographic model, they make a number of assumptions about the possible genetic histories of samples within and between populations. As I will illustrate with examples below, such approaches are far from model free.
(b). Population genetic inference for estimating past population sizes
One such approach has been developed for estimating changes in past population sizes. Coalescent theory predicts that gene lineages coalesce with a rate dependent on the effective population size (the size of the population contributing offspring to the next generation). This provides a unique relationship between the size of a population through time and the distribution of time to the most recent common ancestor for pairs of sequences. Li & Durbin [66] developed the PSMC (pairwise sequentially Markovian coalescent) approach to implement this principle and reconstruct a detailed history of population size from a single high-quality (i.e. with sufficient sequencing depth to resolve heterozygotes) genome sequence. They applied PSMC to high-quality genomes from Africa, East Asia and Europe and found a severe reduction in population size for all non-African individuals around 10–60 thousand years ago, which they attributed to an out-of-Africa population bottleneck.
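The relationship between population size and pairwise coalescence times can be illustrated with a small simulation. This is a sketch of the underlying principle only, not of the PSMC algorithm itself: it assumes a single panmictic population of diploid individuals whose size is piecewise constant, so that two lineages coalesce at rate 1/(2N) per generation within each epoch. The function names and parameter values are hypothetical.

```python
import random


def sample_tmrca(epochs, rng):
    """Draw one pairwise coalescence time (in generations before present).

    epochs: list of (start_generation, diploid_size) tuples, ordered by
    start time going backwards, with the first epoch starting at 0.
    """
    if not epochs:
        raise ValueError("at least one epoch is required")
    for i, (start, size) in enumerate(epochs):
        end = epochs[i + 1][0] if i + 1 < len(epochs) else float("inf")
        # Waiting time is exponential with rate 1/(2N) while in this epoch
        wait = rng.expovariate(1.0 / (2.0 * size))
        if start + wait < end:
            return start + wait
    raise RuntimeError("unreachable: the last epoch has no upper bound")


def fraction_between(epochs, lo, hi, reps=20000, seed=1):
    """Fraction of simulated coalescence times falling in [lo, hi)."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(reps) if lo <= sample_tmrca(epochs, rng) < hi)
    return hits / reps


constant = [(0, 10000)]
bottleneck = [(0, 10000), (1000, 1000), (2000, 10000)]

frac_constant = fraction_between(constant, 1000, 2000)
frac_bottleneck = fraction_between(bottleneck, 1000, 2000)
print(frac_constant, frac_bottleneck)
```

In this toy setting, a tenfold reduction in size between 1000 and 2000 generations ago concentrates coalescence events in that interval. Methods of this kind use an excess of this sort to infer a smaller population size at that time.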
Extensions of this method have subsequently been developed to additionally handle multiple samples from a single population for increased resolution [67,68] and to infer the history of gene flow between pairs of populations [68]. The large computational cost of these methods has limited their application to a relatively small number of genomes, but recently developed tools such as Relate [69] and tsinfer [70] offer the possibility of analysing the population size history using thousands of samples from a single population (e.g. using genetic sequences found in large biobanks or generated by case–control studies).
In these approaches the inference of effective population size changes through time rests on two key assumptions: (i) that mutation and recombination rates are known and (ii) that the population from which the sample was drawn has been panmictic throughout history, i.e. that all contemporary pairs of individuals within that population are equally likely to mate, regardless of geographical separation or other barriers. Uncertainty in both mutation and recombination rate estimates results in large confidence intervals for estimated population sizes and time scales of inferred changes when it is taken into account; often, however, only single mutation and recombination rate values are considered.
Deviations from the assumption of panmictic populations can have more obscure consequences for the inferred histories. For example, Mazet et al. [71] showed that when PSMC is applied to samples from spatially structured populations, it tends to infer changes in the effective population size with time even when the true population size has been stable. This can be understood by considering the case of local sub-populations where each sub-population is panmictic and connected to its closest neighbours by migration (gene flow). For short time scales, the ancestors of an individual sampled from one of the sub-populations will tend to belong to the same sub-population, and the inferred effective population size will reflect the size of this sub-population. On longer time scales, migration will cause the ancestors to be spread out over a larger geographical area, with increasingly rare contacts (and, as a consequence, lower coalescent rates). From this reduced coalescence rate the PSMC will infer an increasing effective population size, even when there has been no change in effective population size. In this case, the detailed shape of the inferred population size history will depend on the size and number of sub-populations as well as the migration rate between sub-populations [71]. Although the magnitude of this effect will largely depend on the level of historic structure present in populations, Mazet et al. [71] suggested that estimates of population size that ignore population structure should be interpreted as estimates of past population size and population structure as it is not clear which aspect is being captured for any given demographic history.
(c). f-Statistics and admixture graphs
Another class of methods that do not require a detailed relationship between populations to be explicitly modelled is built on so-called f-statistics [72] (reviewed in [73] in further detail); these have been used extensively in demographic inference using ancient DNA data to test hypotheses about the relationship between ancient and modern populations. These statistics also use the classical phylogenetic view of a population (described above) and rely on the assumption that genetic drift occurs independently in each population, i.e. on each branch of the tree. Pairs of samples inferred to share more genetic drift are expected to share more of their demographic history. The f2 statistic measures the amount of drift along both lineages since the divergence from a shared ancestral population in the tree. The f3 statistic measures the amount of drift shared by two samples relative to an outgroup population. As a result, the f3 statistic is often used to quantify the level of shared drift between modern-day populations and archaeological (ancient DNA) samples, and to identify the closest present-day relatives to an archaeological sample.
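In terms of allele frequencies, the f2 and f3 statistics are simple averages over loci, as the following sketch shows. It assumes that population allele frequencies are known exactly, so it omits the corrections for finite sample size that are applied to real data. The frequencies are hypothetical.

```python
def f2(a, b):
    """Mean squared allele frequency difference between populations A and B."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def f3(outgroup, a, b):
    """Outgroup f3: drift shared by A and B on the path from the outgroup."""
    return sum((o - x) * (o - y) for o, x, y in zip(outgroup, a, b)) / len(a)


# Hypothetical allele frequencies at three loci
pop_a = [0.1, 0.5, 0.9]
pop_b = [0.2, 0.5, 0.7]
outgroup = [0.5, 0.5, 0.5]

print(f2(pop_a, pop_b))
print(f3(outgroup, pop_a, pop_b))

# f3 can be rewritten in terms of f2, which makes the 'shared drift'
# interpretation explicit: the drift from the outgroup to A plus the drift
# from the outgroup to B, minus the drift separating A and B, halved.
shared = (f2(outgroup, pop_a) + f2(outgroup, pop_b) - f2(pop_a, pop_b)) / 2.0
print(shared)
```

Ranking present-day populations by their outgroup f3 value with an ancient sample is the comparison described in the text. The ranking is informative about shared drift, not directly about time.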
For example, Rasmussen et al. [74] used the f3 statistic and a sample from an ancient North American individual directly associated with the Clovis tools to test the two competing hypotheses about the origins of the Clovis culture. They found that present-day Native Americans from Central and South America share the greatest amount of genetic drift (highest f3) with the ancient Clovis individual, followed by individuals from Central Eurasia and only then individuals from Europe. This suggests that the demographic history of people associated with Clovis culture involved a more recent split from Asian populations than from European populations, and consequently favours the hypothesis that people practising Clovis culture arrived in the Americas from Asia (via Beringia) over the European origins (Solutrean) hypothesis.
The f4 statistic estimates the amount of shared genetic drift between pairs of populations, and can be used to statistically test hypotheses of gene flow between sub-populations [72], or hybridization between populations of different species [75]: if the shared drift is zero, the two pairs of populations must belong to different parts of the tree (i.e. separated by a more ancestral node in the tree). Conversely, if the f4 statistic is significantly different from zero, the null hypothesis of complete isolation between populations or species is formally rejected, and it is concluded that gene flow or hybridization occurred. The related D statistic (the f4 statistic divided by a positive scale factor that makes the D statistic have a range between −1 and 1) is often used in these tests, referred to as ABBA–BABA tests, measuring the proportion of two gene-tree configurations across the genome.
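The f4 and D statistics can be sketched in the same way. The version below uses one common allele-frequency form of D, in which the f4 numerator is divided by a positive normalising term. It assumes known population allele frequencies and does not include the significance test (typically a block jackknife across the genome) that is needed before any conclusion about gene flow is drawn. All frequencies are hypothetical.

```python
def f4(a, b, c, d):
    """Mean over loci of (a - b) * (c - d)."""
    return sum((w - x) * (y - z) for w, x, y, z in zip(a, b, c, d)) / len(a)


def d_statistic(a, b, c, d):
    """Normalised version of f4, bounded between -1 and 1."""
    numerator = sum((w - x) * (y - z) for w, x, y, z in zip(a, b, c, d))
    denominator = sum(
        (w + x - 2.0 * w * x) * (y + z - 2.0 * y * z)
        for w, x, y, z in zip(a, b, c, d)
    )
    if denominator == 0.0:
        raise ValueError("no informative loci")
    return numerator / denominator


# Hypothetical allele frequencies at two loci in four populations
pop_a = [0.2, 0.6]
pop_b = [0.4, 0.3]
pop_c = [0.5, 0.9]
pop_d = [0.1, 0.2]

f4_value = f4(pop_a, pop_b, pop_c, pop_d)
d_value = d_statistic(pop_a, pop_b, pop_c, pop_d)
print(f4_value, d_value)

# If A and B have identical frequencies, the statistic is exactly zero
print(f4(pop_a, pop_a, pop_c, pop_d))
```

With real data the expectation of zero holds only on average across the genome, which is why the deviation from zero has to be assessed against its standard error.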
The f4 statistic can also be used to statistically test whether different ancestral tree topologies are compatible with the observed data. Frameworks such as TreeMix [47], AdmixtureGraph [73] and qpGraph [76] build on this principle to construct taxonomic trees, using genome-wide data and bootstrap analysis to assess the significance of population splits.
Because of their conceptual simplicity and ease of use, the f-statistics have been hugely popular and have become an integral part of a ‘standard package’ of tools usually applied to novel ancient DNA data. They have been successfully employed in answering many long-standing archaeological and anthropological questions and have led to a number of important new insights (e.g. Neanderthal admixture into the present-day human gene pool [75] or the demographic origins of modern European populations [77]). However, care needs to be taken when interpreting such results, as the analysis can be confounded by population structure (i.e. owing to geography) that has not been explicitly accounted for. For example, Eriksson & Manica [78] showed, using a spatially structured model of the demographic history of humans and Neanderthals with no gene flow post-dating their divergence, that population structure in the ancestral population shared by humans and Neanderthals, in combination with incomplete lineage sorting in the human lineage, produces f4 patterns consistent with post-divergence admixture between humans and Neanderthals (because modern humans within Africa, prior to the expansion out of Africa, were not equally related to the populations ancestral to Neanderthals).
(d). Population genetic inference for testing population continuity through time
A common question that can be addressed with ancient DNA is whether a present population is directly descended from a past population or whether it has experienced admixture from external populations. Testing this is not straightforward because alleles present in any generation represent a randomly drawn sample of alleles in the previous generation and, as a consequence, allele frequencies can change considerably even under a scenario of complete population continuity (no gene flow from an external source), a phenomenon usually referred to as ‘genetic drift’ in the population genetic literature. What is more, the expected level of drift greatly depends on the population size: because of the law of large numbers, genetic drift is smaller in large populations and larger in small ones.
Formal tests of continuity are usually (directly or indirectly) based on comparing the observed allele frequency differences between ancient and modern populations with the distribution of differences expected under a null hypothesis of population continuity at a given size (or sizes) (e.g. [74,79–82]). The difficulty lies in specifying a meaningful null model, as very few population histories involve complete isolation from neighbouring groups. As demonstrated by Silva et al. [83], the threshold level above which continuity is rejected largely depends on the expected level of gene flow with neighbouring populations. In particular, a high level of continuous gene flow from an external source can leave a genetic signal indistinguishable from that of a more sudden population replacement.
One way to overcome this problem is to use inference methods that allow explicit modelling of gene flow (e.g. [38,39,74,83]). External lines of evidence, such as archaeological, linguistic and geographical information, can be used to inform the expected levels of gene flow and so build a more realistic null model for testing continuity. Additionally, dense serial sampling of a geographical location can help to distinguish models with more continuous gene flow over longer time periods from models that involve a more instantaneous replacement.
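The logic of such a test can be sketched for a single locus. This is an illustrative toy under stated assumptions, not the procedure of any of the cited studies: the null model is an isolated Wright–Fisher population of constant size, the ancient sample frequency is used as the starting population frequency, and all counts, sizes and the function name `continuity_p_value` are hypothetical.

```python
import random


def binomial(n, p, rng):
    """Number of successes in n Bernoulli(p) trials."""
    return sum(1 for _ in range(n) if rng.random() < p)


def continuity_p_value(anc_count, anc_n, mod_count, mod_n,
                       n_diploid, generations, sims=200, seed=7):
    """Share of null simulations whose ancient-modern frequency difference
    is at least as large as the observed one."""
    rng = random.Random(seed)
    p_start = anc_count / anc_n          # assumption: sample estimate as starting frequency
    observed = abs(mod_count / mod_n - p_start)
    copies = 2 * n_diploid
    extreme = 0
    for _ in range(sims):
        p = p_start
        for _ in range(generations):     # drift under continuity, no gene flow
            p = binomial(copies, p, rng) / copies
        sim_anc = binomial(anc_n, p_start, rng) / anc_n   # ancient sampling noise
        sim_mod = binomial(mod_n, p, rng) / mod_n         # modern sampling noise
        if abs(sim_mod - sim_anc) >= observed:
            extreme += 1
    return (extreme + 1) / (sims + 1)


# Hypothetical data: 10 of 40 ancient and 60 of 100 modern gene copies carry the allele.
p_large = continuity_p_value(10, 40, 60, 100, n_diploid=500, generations=10)
p_small = continuity_p_value(10, 40, 60, 100, n_diploid=20, generations=10)
print(f"assumed N = 500: p = {p_large:.3f}")
print(f"assumed N = 20:  p = {p_small:.3f}")
```

The same observed difference is far less surprising when the assumed population is small, so the outcome of the test depends directly on the null model chosen. Adding gene flow to the null would widen the expected distribution in a comparable way.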
4. Linking demographic processes to the archaeological and climate records
Ascribing demographic events inferred from the genetic data to well-defined geographical areas and/or time periods is important for investigating ecological, cultural or climatic drivers of demographic processes and for testing spatially and temporally explicit archaeological and historical hypotheses. However, population genetic approaches differ substantially in the extent to which they enable this.
(a). Timing of demographic events
The methods discussed above, including the frameworks that build on the f-statistics (such as TreeMix, AdmixtureGraph and qpGraph), use genetic drift to determine branch lengths in the inferred population phylogeny. The amount of drift depends on both time and population size (genetic drift is stronger in small populations than in large ones). This means that there is no absolute time scale associated with any node in the tree (or graph), and nodes in different sub-trees have no well-defined temporal order. Thus, in these trees (or graphs), only the leaves (that represent samples or populations) have known dates, while the internal nodes of the tree (or graph), corresponding to ancestral populations and associated demographic events, are abstract in time, making it challenging to link them to archaeological or environmental information.
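A short calculation shows why drift alone cannot date a node. The sketch assumes the standard Wright–Fisher result that expected heterozygosity declines by a factor of (1 − 1/(2N)) per generation, so that the drift accumulated along a branch is roughly t/(2N); the function name and the numbers are illustrative only, and individual methods may scale branch lengths differently.

```python
def expected_drift(generations, n_diploid):
    """Expected proportional loss of heterozygosity: 1 - (1 - 1/(2N))**t."""
    return 1.0 - (1.0 - 1.0 / (2 * n_diploid)) ** generations


# A short branch in a small population versus a long branch in a large one.
short_small = expected_drift(generations=100, n_diploid=1_000)
long_large = expected_drift(generations=1_000, n_diploid=10_000)
print(f"t = 100,  N = 1000:  drift = {short_small:.5f}")
print(f"t = 1000, N = 10000: drift = {long_large:.5f}")
```

Both branches carry almost the same amount of drift although one is ten times longer in generations, so a drift-scaled branch length cannot be converted into time without independent information on population size.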
Approaches such as TreeMix, AdmixtureGraph and qpGraph model admixture by introducing directed gene flow between pairs of branches in the inferred population tree. Sometimes this is not sufficient to explain the pattern of genetic variation in the data, and it is necessary to introduce hypothetical ‘ghost’ populations that contributed to some past admixture events. Similarly to the internal nodes of the tree, dates and the geographical locations of these ‘ghost’ populations are unknown.
Dated ancient samples can help in anchoring the internal structure of the tree and provide a way to infer approximate times of the internal nodes and subsequently add some temporal resolution to the inference. However, the extent to which this is effective depends on the availability of samples in close temporal and geographical proximity to the nodes and ancestral populations of interest.
Explicit modelling of past populations offers a way around this problem. Rasmussen et al. [74] introduced a maximum-likelihood method for testing whether an ancient sample can be considered directly ancestral to a given modern sample. This test is based on coalescent theory and does not require explicit modelling of population size changes. However, it does assume an instantaneous split between the population of the ancient sample and the population ancestral to the modern sample, with no gene flow between the two groups following the separation. Posth et al. [38] created a temporally explicit coalescent model of the history of late Pleistocene populations in Europe based on the mitochondrial tree of directly dated individuals. The combination of a powerful maximum-likelihood framework and samples from both before and after the last glacial maximum allowed them to test the hypothesis of population turnover during this time period using mitochondrial DNA alone.
(b). Linking demographic processes to space
Even with temporally explicit modelling, a key obstacle for interpreting the genetic history of past populations is that the inferred ancestral populations, as well as the ‘ghost populations’ discussed above, lack well-defined geographical locations. Thus, even though approximate temporal boundaries for these populations can sometimes be informally inferred using genetic data from closely related dated ancient samples, geographical areas remain largely unknown when there is not sufficient geographical coverage of ancient samples to inform about the boundaries of these populations. As a result, it is often challenging to compare the inferred demographic history to the geographically explicit archaeological record.
Overcoming this problem calls for demographic models to be spatially as well as temporally explicit, with geographically well-defined subpopulations through time. Usually this is done by defining demographic processes in terms of local population changes and migrations, where past and present populations are represented as a network of subpopulations (demes) with explicit locations in space. Such explicit simulation methods allow modelling of complex population history outside of the phylogenetic framework. In these spatially explicit frameworks, complex spatial patterns can emerge from simple local demographic processes, making it easier to model the effects of population structure and other more subtle or emergent patterns of genetic variation. Crucially for archaeological inference, such simulation modelling can incorporate information from various sources (e.g. archaeological, anthropological, demographic or linguistic data and/or climatic and geographical information). For example, estimated population densities from radiocarbon [84,85] or paleoclimate data [86,87] can be used to inform the relative population densities through time, and the presence or absence of different fossils and/or material cultures can be used to constrain possible geographical ranges. However, such simulation methods require the assumptions underlying the demographic model, e.g. the connectivity of different demes, to be defined explicitly. In particular, the explicit nature of such models means that each represents a very specific demographic scenario; care must therefore be taken that a wide range of plausible scenarios is represented in the analysis. Here it is especially important to make sure that the considered models capture the key patterns of variation observed in the data, for example by applying the descriptive statistics and other tools described above to simulated data from these models.
Despite their ability to explicitly represent a wide range of demographic processes, so far relatively few studies have used spatially explicit models to reconstruct demographic history. This is probably because such simulation approaches can be complicated to set up and computationally demanding to perform. Early studies used linear stepping-stone models to represent founder effects during the expansion of anatomically modern humans out of Africa [88,89] and the levels of shared genetic variation between humans and Neanderthals [78]. Warmuth et al. [90] used a spatially explicit model of Eurasia to infer the origin and timing of horse domestication. The SPLATCHE2 framework [91] uses a two-dimensional spatially explicit model, which has been used to study interbreeding between humans and Neanderthals [92]. Eriksson et al. [86] used a global spatial model to link late Pleistocene human demography to climate; this model was later used to test whether Eskimo–Inuit populations in the Arctic derive from the same population as the original founders of Native American populations [93]. Loog et al. [39] used a spatiotemporally explicit framework to reconstruct wolf demography in the past 50 000 years using mitochondrial genomes. In the past, researchers have relied on in-house custom simulation code that is usually difficult to adapt to different demographic models and/or data. However, recent years have seen the development of simulation tools that are flexible, powerful and easy to use. Tools such as msprime [94] (for coalescent demographic modelling) and SLiM [95] (for forward-in-time demographic modelling), which can accommodate very complicated demographic scenarios (including genetic selection and ecological interactions in the case of SLiM [95]), are now publicly available for population geneticists and other researchers to use.
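The idea of a network of demes can be sketched with the simplest possible layout: a linear chain in which each deme exchanges migrants only with its immediate neighbours. This is an illustrative toy, not the model used in any of the studies cited here; the migration rate `m`, the deme size and the function names are assumptions, and all demes are taken to be of equal, constant size.

```python
import random


def migrate(freqs, m):
    """One round of symmetric nearest-neighbour migration along a linear chain.
    Each deme receives a fraction m/2 of its gene copies from each neighbour."""
    n = len(freqs)
    new = []
    for i, f in enumerate(freqs):
        neighbours = [freqs[j] for j in (i - 1, i + 1) if 0 <= j < n]
        stay = 1.0 - (m / 2) * len(neighbours)
        new.append(stay * f + (m / 2) * sum(neighbours))
    return new


def drift(freqs, n_diploid, rng):
    """Binomial resampling of 2N gene copies within every deme."""
    copies = 2 * n_diploid
    return [sum(1 for _ in range(copies) if rng.random() < f) / copies
            for f in freqs]


def simulate_chain(freqs, m, n_diploid, generations, seed=3):
    """Alternate migration and drift for a number of generations."""
    rng = random.Random(seed)
    for _ in range(generations):
        freqs = drift(migrate(freqs, m), n_diploid, rng)
    return freqs


# Two demes fixed for an allele next to two demes that lack it.
start = [1.0, 1.0, 0.0, 0.0]
after_migration = migrate(start, m=0.1)
final = simulate_chain(start, m=0.1, n_diploid=100, generations=50)
print("after one migration step:", after_migration)
print("after 50 generations:    ", final)
```

The connectivity between demes is written directly into `migrate`, which makes the point raised above tangible: changing which demes exchange migrants, or at what rate, defines a different demographic scenario that has to be justified and tested in its own right.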
The complex relationships between populations in such models typically mean that formal likelihoods cannot be calculated analytically, but they can be approximated by comparing descriptive statistics of simulated and observed data using, for example, approximate Bayesian computation (ABC) [35,96]. Unless parameters can be constrained using independent data (e.g. historical, anthropological or archaeological information), it is usually necessary to consider a very large number of combinations of parameter values in order to make reliable inferences about the past, adding to the computational cost. Another potential disadvantage of an explicit modelling approach is that it is often time-consuming to set up and calibrate (especially for testing more involved demographic scenarios) compared with using descriptive approaches or summary statistics for demographic inference.
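The simplest form of ABC, rejection sampling, can be sketched as follows. The example is a toy under stated assumptions: the only unknown parameter is a constant diploid population size, the summary statistic is the variance of allele frequencies across independent loci after a known number of generations of drift, and the observed value (0.02), the prior range and the tolerance are hypothetical numbers chosen for illustration.

```python
import random
import statistics


def simulate_summary(n_diploid, generations, n_loci, rng, p0=0.5):
    """Variance across loci of the allele frequency after Wright-Fisher drift."""
    copies = 2 * n_diploid
    finals = []
    for _ in range(n_loci):
        p = p0
        for _ in range(generations):
            p = sum(1 for _ in range(copies) if rng.random() < p) / copies
        finals.append(p)
    return statistics.pvariance(finals)


def abc_rejection(observed, generations, n_loci, prior_range,
                  n_draws, tolerance, seed=11):
    """Keep the prior draws whose simulated summary lies within tolerance of the observed one."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        n_diploid = rng.randint(*prior_range)      # draw from a uniform prior
        simulated = simulate_summary(n_diploid, generations, n_loci, rng)
        if abs(simulated - observed) <= tolerance:
            accepted.append(n_diploid)
    return accepted


accepted = abc_rejection(observed=0.02, generations=5, n_loci=20,
                         prior_range=(10, 200), n_draws=200, tolerance=0.005)
print(f"accepted {len(accepted)} of 200 draws")
print(f"median accepted population size: {statistics.median(accepted)}")
```

The accepted values approximate the posterior distribution of the population size. Even in this one-parameter toy most simulations are discarded, which illustrates why the computational cost grows quickly once several parameters have to be explored jointly.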
5. Discussion and conclusion
Demographic processes directly affect patterns of genetic variation between and within populations. The large body of population genetic theory and mathematical modelling developed to describe these patterns has allowed researchers to take advantage of genetic information from both present and past populations for powerful past demographic inference.
The starting point of this inference is the quantification of patterns of genetic variation. These patterns can provide some insights and be used for formulating hypotheses about past demographic processes. However, care must be taken that the samples, and their relatedness to each other, are representative of all populations of interest. This holds for both ancient and present-day samples: for example, it is common to use sets of present-day genetic data (such as the 1000 Genomes Project [97,98]) that focus on a small number of homogeneous (but well sampled) populations for representing large and diverse geographical areas (e.g. Africa) when testing hypotheses about past demography (e.g. alongside ancient samples). However, in order to reconstruct past demographic processes across genetically diverse regions (such as long-range migrations and admixtures) it is important to have a comprehensive representation of genetic variation across the relevant geographical areas. Such data have recently become available: several projects have generated high-quality whole genome sequence data from a large number of ethnically diverse populations around the world, including the Simons Genome Diversity Project (SGDP) [99], the Estonian Biocentre Human Genome Diversity Panel (EGDP) [1] and the Human Genome Diversity Project (HGDP) [100].
Patterns of genetic variation, especially when ancient DNA data are included, can provide valuable insights. Although it might be tempting to weave a compelling story based on striking patterns alone, it is important to formally quantify the likelihood of different demographic scenarios, as different scenarios can result in similar spatio-temporal patterns of genetic variation. Population genetic frameworks that allow formal testing of alternative demographic scenarios usually represent demographic history as a phylogenetic tree, where populations are represented as independently evolving branches (lineages), sometimes connected by gene flow. Certain methods require additional assumptions (such as the relationships between populations, the timing of changes in population sizes and the direction(s) of gene flow) to be explicitly defined. But as these parameters are often unknown, the application of such approaches is limited to more tractable demographic histories. A number of non-parametric approaches have been developed that, on the one hand, allow more flexibility but, on the other, come with a number of implicit assumptions that need to be taken into consideration when interpreting the results.
In general, population genetic tools that allow formal testing of competing demographic scenarios range from very simple models, with many generalizing assumptions, to very complex ones that require explicit modelling decisions. As a general rule, the ability of formal analyses and hypothesis testing to distinguish between complex scenarios is heavily constrained by the amount and type of genetic data available. Very simple models may lack key aspects of demography or not provide sufficient resolution, missing out on important phenomena or leaving the details to be filled by a post hoc narrative. Complicated models, on the other hand, require much more information (data) to robustly distinguish between different demographic scenarios, or external lines of evidence (such as climate or archaeological information) to guide parameter ranges in the model. Thus, the right tool for the job is determined not only by the questions but also by the available data.
For archaeological samples, this depends on the degree of DNA preservation, which affects the data quality (i.e. DNA damage leading to sequencing errors), the data quantity (i.e. the amount of endogenous DNA available in each sample) and ultimately the distribution of samples across space and time [28,101]. Methods that rely on allele frequencies for analyses, such as most pattern-based methods and tree-based methods, are relatively robust against challenges to do with data quality. By contrast, methods that require accurate genotype or haplotype information (such as those for inferring past population sizes) can only be reliably used for well-preserved samples with low error rates and high coverage. As a result, the latter have not yet been extensively used in combination with ancient DNA data.
The distribution of samples in space and time also has a significant effect on the power to differentiate between the fit of competing scenarios to the genetic data. In general, samples that provide information about past genetic variation spanning the time periods and geographical regions of interest substantially increase the potential resolution of different models, allowing for more powerful inference for the same sample size. Sparseness and unevenness of samples across space and time mean that genetic patterns often cannot be directly interpreted in terms of demographic change, but this challenge can be overcome using formal methods that take sampling times and locations into account.
Currently very few population genetic approaches allow formal inclusion of dates and geographical locations of ancient samples at the hypothesis testing phase of the inference, not only missing out on the opportunity to gain additional power to detect more subtle demographic changes, but also leaving the timing and the locations of the inferred demographic events approximate at best. However, recent developments in simulation (e.g. SLiM [95] and msprime [94]) and analytical tools (e.g. [19,60,102]) will not only make linking past demographic events to the archaeological, historical and climate records more straightforward and robust but will also allow direct inclusion of data from these lines of evidence.
Acknowledgements
I am grateful to Anders Eriksson (University of Tartu), Mark Thomas (UCL), and Andrea Manica (University of Cambridge) for valuable discussions on population genetic modelling using ancient DNA data and Anders Eriksson (University of Tartu) for comments on this manuscript.
Data accessibility
This article has no additional data.
Competing interests
I declare I have no competing interests.
Funding
I am grateful to the Herchel Smith Trust (University of Cambridge) for supporting this work.
References
- 1. Pagani L, et al. 2016. Genomic analyses inform on migration events during the peopling of Eurasia. Nature 538, 238–242. (doi:10.1038/nature19792)
- 2. Robinson MR, et al. 2017. Genetic evidence of assortative mating in humans. Nat. Hum. Behav. 1, 1–13. (doi:10.1038/s41562-016-0016)
- 3. Gamba C, et al. 2014. Genome flux and stasis in a five millennium transect of European prehistory. Nat. Commun. 5, 5257. (doi:10.1038/ncomms6257)
- 4. Hansen HB, Damgaard PB, Margaryan A, Stenderup J, Lynnerup N, Willerslev E, Allentoft ME. 2017. Comparing ancient DNA preservation in petrous bone and tooth cementum. PLOS ONE 12, e0170940. (doi:10.1371/journal.pone.0170940)
- 5. Yang DY, Eng B, Waye JS, Dudar JC, Saunders SR. 1998. Improved DNA extraction from ancient bones using silica-based spin columns. Am. J. Phys. Anthropol. 105, 539–543.
- 6. Rohland N, Hofreiter M. 2007. Ancient DNA extraction from bones and teeth. Nat. Protoc. 2, 1756–1762. (doi:10.1038/nprot.2007.247)
- 7. Dabney J, et al. 2013. Complete mitochondrial genome sequence of a Middle Pleistocene cave bear reconstructed from ultrashort DNA fragments. Proc. Natl Acad. Sci. USA 110, 15 758–15 763. (doi:10.1073/pnas.1314445110)
- 8. Skoglund P, Storå J, Götherström A, Jakobsson M. 2013. Accurate sex identification of ancient human remains using DNA shotgun sequencing. J. Archaeol. Sci. 40, 4477–4482. (doi:10.1016/j.jas.2013.07.004)
- 9. Horsburgh KA. 2008. Wild or domesticated? An ancient DNA approach to canid species identification in South Africa's Western Cape Province. J. Archaeol. Sci. 35, 1474–1480. (doi:10.1016/j.jas.2007.10.012)
- 10. Dalén L, et al. 2017. Identifying bird remains using ancient DNA barcoding. Genes 8, 169. (doi:10.3390/genes8060169)
- 11. Amorim CEG, et al. 2018. Understanding 6th-century barbarian social organization and migration through paleogenomics. Nat. Commun. 9, 1–11. (doi:10.1038/s41467-018-06024-4)
- 12. Schroeder H, et al. 2019. Unraveling ancestry, kinship, and violence in a Late Neolithic mass grave. Proc. Natl Acad. Sci. USA 116, 10 705–10 710. (doi:10.1073/pnas.1820210116)
- 13. Mittnik A, et al. 2019. Kinship-based social inequality in Bronze Age Europe. Science 366, 731–734. (doi:10.1126/science.aax6219)
- 14. Willerslev E, et al. 2007. Ancient biomolecules from deep ice cores reveal a Forested Southern Greenland. Science 317, 111–114. (doi:10.1126/science.1141758)
- 15. Rieux A, Eriksson A, Li M, Sobkowiak B, Weinert LA, Warmuth V, Ruiz-Linares A, Manica A, Balloux F. 2014. Improved calibration of the human mitochondrial clock using ancient genomes. Mol. Biol. Evol. 31, 2780–2792. (doi:10.1093/molbev/msu222)
- 16. Skoglund P, Ersmark E, Palkopoulou E, Dalén L. 2015. Ancient wolf genome reveals an early divergence of domestic dog ancestors and admixture into high-latitude breeds. Curr. Biol. 25, 1515–1519. (doi:10.1016/j.cub.2015.04.019)
- 17. Palkopoulou E, et al. 2015. Complete genomes reveal signatures of demographic and genetic declines in the woolly mammoth. Curr. Biol. 25, 1395–1400. (doi:10.1016/j.cub.2015.04.007)
- 18. Burger J, Kirchner M, Bramanti B, Haak W, Thomas MG. 2007. Absence of the lactase-persistence-associated allele in early Neolithic Europeans. Proc. Natl Acad. Sci. USA 104, 3736–3741. (doi:10.1073/pnas.0607187104)
- 19. Mathieson I, et al. 2015. Genome-wide patterns of selection in 230 ancient Eurasians. Nature 528, 499–503. (doi:10.1038/nature16152)
- 20. Wilde S, et al. 2014. Direct evidence for positive selection of skin, hair, and eye pigmentation in Europeans during the last 5,000 y. Proc. Natl Acad. Sci. USA 111, 4832–4837. (doi:10.1073/pnas.1316513111)
- 21. Olalde I, et al. 2014. Derived immune and ancestral pigmentation alleles in a 7,000-year-old Mesolithic European. Nature 507, 225–228. (doi:10.1038/nature12960)
- 22. Cox SL, Ruff CB, Maier RM, Mathieson I. 2019. Genetic contributions to variation in human stature in prehistoric Europe. Proc. Natl Acad. Sci. USA 116, 21 484–21 492. (doi:10.1073/pnas.1910606116)
- 23. Martiniano R, et al. 2017. The population genomics of archaeological transition in west Iberia: investigation of ancient substructure using imputation and haplotype-based methods. PLoS Genet. 13, e1006852. (doi:10.1371/journal.pgen.1006852)
- 24. Leonardi M, et al. 2017. Evolutionary patterns and processes: lessons from ancient DNA. Syst. Biol. 66, e1–e29. (doi:10.1093/sysbio/syw059)
- 25. Marciniak S, Perry GH. 2017. Harnessing ancient genomes to study the history of human adaptation. Nat. Rev. Genet. 18, 659–674. (doi:10.1038/nrg.2017.65)
- 26. Skoglund P, Mathieson I. 2018. Ancient genomics of modern humans: the first decade. Annu. Rev. Genomics Hum. Genet. 19, 381–404. (doi:10.1146/annurev-genom-083117-021749)
- 27. Frantz LAF, Bradley DG, Larson G, Orlando L. 2020. Animal domestication in the era of ancient genomics. Nat. Rev. Genet. 21, 449–460. (doi:10.1038/s41576-020-0225-0)
- 28. Loog L, Larson G. 2020. Ancient DNA. In Archaeological science: an introduction (eds Richards M, Britton K), pp. 13–34. Cambridge, UK: Cambridge University Press. (doi:10.1017/9781139013826.002)
- 29. Kuhn JMM, Jakobsson M, Günther T. 2018. Estimating genetic kin relationships in prehistoric populations. PLoS ONE 13, e0195491. (doi:10.1371/journal.pone.0195491)
- 30. Hartl DL, Clark AG. 1997. Principles of population genetics, Vol. 116. Sunderland, MA: Sinauer Associates.
- 31. Cann RL, Stoneking M, Wilson AC. 1987. Mitochondrial DNA and human evolution. Nature 325, 31–36. (doi:10.1038/325031a0)
- 32. Underhill PA, Passarino G, Lin AA, Shen P, Lahr MM, Foley RA, Oefner PJ, Cavalli-Sforza LL. 2001. The phylogeography of Y chromosome binary haplotypes and the origins of modern human populations. Ann. Hum. Genet. 65, 43–62. (doi:10.1046/j.1469-1809.2001.6510043.x)
- 33. Chan EKF, et al. 2019. Human origins in a southern African palaeo-wetland and first migrations. Nature 575, 185–189. (doi:10.1038/s41586-019-1714-1)
- 34. Gerbault P, et al. 2014. Storytelling and story testing in domestication. Proc. Natl Acad. Sci. USA 111, 6159–6164. (doi:10.1073/pnas.1400425111)
- 35. Rosenberg NA, Nordborg M. 2002. Genealogical trees, coalescent theory and the analysis of genetic polymorphisms. Nat. Rev. Genet. 3, 380–390. (doi:10.1038/nrg795)
- 36. Ballard JWO, Whitlock MC. 2004. The incomplete natural history of mitochondria. Mol. Ecol. 13, 729–744. (doi:10.1046/j.1365-294X.2003.02063.x)
- 37. Nielsen R, Beaumont MA. 2009. Statistical inferences in phylogeography. Mol. Ecol. 18, 1034–1047. (doi:10.1111/j.1365-294X.2008.04059.x)
- 38. Posth C, et al. 2016. Pleistocene mitochondrial genomes suggest a single major dispersal of non-Africans and a Late Glacial population turnover in Europe. Curr. Biol. 26, 827–833. (doi:10.1016/j.cub.2016.01.037)
- 39. Loog L, et al. 2019. Ancient DNA suggests modern wolves trace their origin to a Late Pleistocene expansion from Beringia. Mol. Ecol. 29, 1596–1610. (doi:10.1111/mec.15329)
- 40. Larson G, et al. 2007. Ancient DNA, pig domestication, and the spread of the Neolithic into Europe. Proc. Natl Acad. Sci. USA 104, 15 276–15 281. (doi:10.1073/pnas.0703411104)
- 41. Valdiosera CE, et al. 2007. Staying out in the cold: glacial refugia and mitochondrial DNA phylogeography in ancient European brown bears. Mol. Ecol. 16, 5140–5148. (doi:10.1111/j.1365-294X.2007.03590.x)
- 42. Thalmann O, et al. 2013. Complete mitochondrial genomes of ancient canids suggest a European origin of domestic dogs. Science 342, 871–874. (doi:10.1126/science.1243650)
- 43. Li JZ, et al. 2008. Worldwide human relationships inferred from genome-wide patterns of variation. Science 319, 1100–1104. (doi:10.1126/science.1153717)
- 44. Novembre J, et al. 2008. Genes mirror geography within Europe. Nature 456, 98–101. (doi:10.1038/nature07331)
- 45. McVean G. 2009. A genealogical interpretation of principal components analysis. PLoS Genet. 5, e1000686. (doi:10.1371/journal.pgen.1000686)
- 46. Skoglund P, Sjödin P, Skoglund T, Lascoux M, Jakobsson M. 2014. Investigating population history using temporal genetic differentiation. Mol. Biol. Evol. 31, 2516–2527. (doi:10.1093/molbev/msu192)
- 47. Pritchard JK, Stephens M, Donnelly P. 2000. Inference of population structure using multilocus genotype data. Genetics 155, 945–959.
- 48. Alexander DH, Novembre J, Lange K. 2009. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664. (doi:10.1101/gr.094052.109)
- 49. Lawson DJ, Hellenthal G, Myers S, Falush D. 2012. Inference of population structure using dense haplotype data. PLoS Genet. 8, e1002453. (doi:10.1371/journal.pgen.1002453)
- 50. Wang S, et al. 2008. Geographic patterns of genome admixture in Latin American Mestizos. PLoS Genet. 4, e1000037. (doi:10.1371/journal.pgen.1000037)
- 51. Lawson DJ, van Dorp L, Falush D. 2018. A tutorial on how not to over-interpret STRUCTURE and ADMIXTURE bar plots. Nat. Commun. 9, 1–11. (doi:10.1038/s41467-018-05257-7)
- 52. van Dorp L, et al. 2015. Evidence for a common origin of blacksmiths and cultivators in the Ethiopian Ari within the last 4500 years: lessons for clustering-based inference. PLoS Genet. 11, e1005397. (doi:10.1371/journal.pgen.1005397)
- 53. Wright S. 1949. The genetical structure of populations. Ann. Eugen. 15, 323–354. (doi:10.1111/j.1469-1809.1949.tb02451.x)
- 54. Wright S. 1931. Evolution in Mendelian populations. Genetics 16, 97–159.
- 55. McRae BH, Beier P. 2007. Circuit theory predicts gene flow in plant and animal populations. Proc. Natl Acad. Sci. USA 104, 19 885–19 890. (doi:10.1073/pnas.0706568104)
- 56. McRae BH, Dickson BG, Keitt TH, Shah VB. 2008. Using circuit theory to model connectivity in ecology, evolution, and conservation. Ecology 89, 2712–2724. (doi:10.1890/07-1861.1)
- 57. Malaspinas A-S. 2016. Methods to characterize selective sweeps using time serial samples: an ancient DNA perspective. Mol. Ecol. 25, 24–41. (doi:10.1111/mec.13492)
- 58. Petkova D, Novembre J, Stephens M. 2016. Visualizing spatial population structure with estimated effective migration surfaces. Nat. Genet. 48, 94–100. (doi:10.1038/ng.3464)
- 59. Mathieson I, et al. 2018. The genomic history of southeastern Europe. Nature 555, 197–203. (doi:10.1038/nature25778)
- 60. Loog L, Lahr MM, Kovacevic M, Manica A, Eriksson A, Thomas MG. 2017. Estimating mobility using sparse data: application to human genetic variation. Proc. Natl Acad. Sci. USA 114, 12 213–12 218. (doi:10.1073/pnas.1703642114)
- 61.Hey J, Nielsen R. 2007. Integration within the Felsenstein equation for improved Markov chain Monte Carlo methods in population genetics. Proc. Natl Acad. Sci. USA 104, 2785–2790. ( 10.1073/pnas.0611164104) [DOI] [PMC free article] [PubMed] [Google Scholar]
- 62.Excoffier L, Dupanloup I, Huerta-Sánchez E, Sousa VC, Foll M. 2013. Robust demographic inference from genomic and SNP data. PLoS Genet. 9, e1003905 ( 10.1371/journal.pgen.1003905) [DOI] [PMC free article] [PubMed] [Google Scholar]
- 63. Gronau I, Hubisz MJ, Gulko B, Danko CG, Siepel A. 2011. Bayesian inference of ancient human demography from individual genome sequences. Nat. Genet. 43, 1031–1034. (doi:10.1038/ng.937)
- 64. Gutenkunst RN, Hernandez RD, Williamson SH, Bustamante CD. 2009. Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data. PLoS Genet. 5, e1000695 (doi:10.1371/journal.pgen.1000695)
- 65. Freedman AH, et al. 2014. Genome sequencing highlights the dynamic early history of dogs. PLoS Genet. 10, e1004016 (doi:10.1371/journal.pgen.1004016)
- 66. Li H, Durbin R. 2011. Inference of human population history from individual whole-genome sequences. Nature 475, 493–496. (doi:10.1038/nature10231)
- 67. Sheehan S, Harris K, Song YS. 2013. Estimating variable effective population sizes from multiple genomes: a sequentially Markov conditional sampling distribution approach. Genetics 194, 647–662. (doi:10.1534/genetics.112.149096)
- 68. Schiffels S, Durbin R. 2014. Inferring human population size and separation history from multiple genome sequences. Nat. Genet. 46, 919–925. (doi:10.1038/ng.3015)
- 69. Speidel L, Forest M, Shi S, Myers SR. 2019. A method for genome-wide genealogy estimation for thousands of samples. Nat. Genet. 51, 1321–1329. (doi:10.1038/s41588-019-0484-x)
- 70. Kelleher J, Wong Y, Wohns AW, Fadil C, Albers PK, McVean G. 2019. Inferring whole-genome histories in large population datasets. Nat. Genet. 51, 1330–1338. (doi:10.1038/s41588-019-0483-y)
- 71. Mazet O, Rodríguez W, Grusea S, Boitard S, Chikhi L. 2016. On the importance of being structured: instantaneous coalescence rates and human evolution—lessons for ancestral population size inference? Heredity 116, 362–371. (doi:10.1038/hdy.2015.104)
- 72. Reich D, Thangaraj K, Patterson N, Price AL, Singh L. 2009. Reconstructing Indian population history. Nature 461, 489–494. (doi:10.1038/nature08365)
- 73. Patterson N, Moorjani P, Luo Y, Mallick S, Rohland N, Zhan Y, Genschoreck T, Webster T, Reich D. 2012. Ancient admixture in human history. Genetics 192, 1065–1093. (doi:10.1534/genetics.112.145037)
- 74. Rasmussen M, et al. 2014. The genome of a Late Pleistocene human from a Clovis burial site in western Montana. Nature 506, 225–229. (doi:10.1038/nature13025)
- 75. Green RE, et al. 2010. A draft sequence of the Neandertal genome. Science 328, 710–722. (doi:10.1126/science.1188021)
- 76. Castelo R, Roverato A. 2012. Inference of regulatory networks from microarray data with R and the bioconductor package qpgraph. Methods Mol. Biol. Clifton NJ 802, 215–233. (doi:10.1007/978-1-61779-400-1_14)
- 77. Lazaridis I, et al. 2014. Ancient human genomes suggest three ancestral populations for present-day Europeans. Nature 513, 409–413. (doi:10.1038/nature13673)
- 78. Eriksson A, Manica A. 2012. Effect of ancient population structure on the degree of polymorphism shared between modern human populations and ancient hominins. Proc. Natl Acad. Sci. USA 109, 13 956–13 960. (doi:10.1073/pnas.1200567109)
- 79. Malmström H, et al. 2009. Ancient DNA reveals lack of continuity between Neolithic hunter-gatherers and contemporary Scandinavians. Curr. Biol. 19, 1758–1762. (doi:10.1016/j.cub.2009.09.017)
- 80. Bramanti B, et al. 2009. Genetic discontinuity between local hunter-gatherers and central Europe's first farmers. Science 326, 137–140. (doi:10.1126/science.1176869)
- 81. Hofmanová Z, et al. 2016. Early farmers from across Europe directly descended from Neolithic Aegeans. Proc. Natl Acad. Sci. USA 113, 6886–6891. (doi:10.1073/pnas.1523951113)
- 82. Saag L, et al. 2019. The arrival of Siberian ancestry connecting the Eastern Baltic to Uralic speakers further east. Curr. Biol. 29, 1701–1711. (doi:10.1016/j.cub.2019.04.026)
- 83. Silva NM, Rio J, Currat M. 2017. Investigating population continuity with ancient DNA under a spatially explicit simulation framework. BMC Genet. 18, 114 (doi:10.1186/s12863-017-0575-6)
- 84. Shennan S, Downey SS, Timpson A, Edinborough K, Colledge S, Kerig T, Manning K, Thomas MG. 2013. Regional population collapse followed initial agriculture booms in mid-Holocene Europe. Nat. Commun. 4, 2486 (doi:10.1038/ncomms3486)
- 85. Bevan A, Colledge S, Fuller D, Fyfe R, Shennan S, Stevens C. 2017. Holocene fluctuations in human population demonstrate repeated links to food production and climate. Proc. Natl Acad. Sci. USA 114, E10524–E10531. (doi:10.1073/pnas.1709190114)
- 86. Eriksson A, Betti L, Friend AD, Lycett SJ, Singarayer JS, von Cramon-Taubadel N, Valdes PJ, Balloux F, Manica A. 2012. Late Pleistocene climate change and the global expansion of anatomically modern humans. Proc. Natl Acad. Sci. USA 109, 16 089–16 094. (doi:10.1073/pnas.1209494109)
- 87. Timmermann A, Friedrich T. 2016. Late Pleistocene climate drivers of early human migration. Nature 538, 92–95. (doi:10.1038/nature19365)
- 88. Ramachandran S, Deshpande O, Roseman CC, Rosenberg NA, Feldman MW, Cavalli-Sforza LL. 2005. Support from the relationship of genetic and geographic distance in human populations for a serial founder effect originating in Africa. Proc. Natl Acad. Sci. USA 102, 15 942–15 947. (doi:10.1073/pnas.0507611102)
- 89. Liu H, Prugnolle F, Manica A, Balloux F. 2006. A geographically explicit genetic model of worldwide human-settlement history. Am. J. Hum. Genet. 79, 230–237. (doi:10.1086/505436)
- 90. Warmuth V, et al. 2012. Reconstructing the origin and spread of horse domestication in the Eurasian steppe. Proc. Natl Acad. Sci. USA 109, 8202–8206. (doi:10.1073/pnas.1111122109)
- 91. Ray N, Currat M, Foll M, Excoffier L. 2010. SPLATCHE2: a spatially explicit simulation framework for complex demography, genetic admixture and recombination. Bioinformatics 26, 2993–2994. (doi:10.1093/bioinformatics/btq579)
- 92. Currat M, Excoffier L. 2011. Strong reproductive isolation between humans and Neanderthals inferred from observed patterns of introgression. Proc. Natl Acad. Sci. USA 108, 15 129–15 134. (doi:10.1073/pnas.1107450108)
- 93. Raghavan M, et al. 2015. Genomic evidence for the Pleistocene and recent population history of Native Americans. Science 349, aab3884 (doi:10.1126/science.aab3884)
- 94. Kelleher J, Etheridge AM, McVean G. 2016. Efficient coalescent simulation and genealogical analysis for large sample sizes. PLoS Comput. Biol. 12, e1004842 (doi:10.1371/journal.pcbi.1004842)
- 95. Haller BC, Messer PW. 2019. SLiM 3: Forward genetic simulations beyond the Wright–Fisher model. Mol. Biol. Evol. 36, 632–637.
- 96. Sunnåker M, Busetto AG, Numminen E, Corander J, Foll M, Dessimoz C. 2013. Approximate Bayesian computation. PLoS Comput. Biol. 9, e1002803 (doi:10.1371/journal.pcbi.1002803)
- 97. Fairley S, Lowy-Gallego E, Perry E, Flicek P. 2020. The International Genome Sample Resource (IGSR) collection of open human genomic variation resources. Nucleic Acids Res. 48, D941–D947. (doi:10.1093/nar/gkz836)
- 98. The 1000 Genomes Project Consortium. 2015. A global reference for human genetic variation. Nature 526, 68–74. (doi:10.1038/nature15393)
- 99. Mallick S, et al. 2016. The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature 538, 201–206. (doi:10.1038/nature18964)
- 100. Bergström A, et al. 2020. Insights into human genetic variation and population history from 929 diverse genomes. Science 367, aay5012 (doi:10.1126/science.aay5012)
- 101. Llamas B, Valverde G, Fehren-Schmitz L, Weyrich LS, Cooper A, Haak W. 2017. From the field to the laboratory: controlling DNA contamination in human ancient DNA research in the high-throughput sequencing era. STAR Sci. Technol. Archaeol. Res. 3, 1–14. (doi:10.1080/20548923.2016.1258824)
- 102. Racimo F, Woodbridge J, Fyfe RM, Sikora M, Sjögren K-G, Kristiansen K, Linden MV. 2020. The spatiotemporal spread of human migrations during the European Holocene. Proc. Natl Acad. Sci. USA 117, 8989–9000. (doi:10.1073/pnas.1920051117)
Data Availability Statement
This article has no additional data.