Genome Biology. 2025 Jul 16;26:206. doi: 10.1186/s13059-025-03664-w

The genomic footprints of migration: how ancient DNA reveals our history of mobility

Matthew P Williams 1, Christian D Huber 1
PMCID: PMC12265385  PMID: 40671036

Abstract

Ancient DNA has emerged as a powerful tool for studying human migration through the detection of admixture signatures. Here, we present the theoretical principles and methodologies for admixture analysis, with an emphasis on f-statistics and qpAdm. We review case studies from the literature demonstrating how these methods uncover patterns of human mobility, and discuss how data quality, demographic complexity, and sample representativeness challenge admixture and migration inferences. Finally, we highlight promising advancements in admixture analysis and underscore the importance of integrating genetic, archaeological, and historical data to achieve a more interdisciplinary and nuanced reconstruction of human history.

Supplementary Information

The online version contains supplementary material available at 10.1186/s13059-025-03664-w.

Background

The advent of ancient DNA technology has ushered in a new era in the study of human history, giving rise to the fields of paleogenomics and archaeogenomics (loosely, the study of prehistoric and historical genomes, respectively). These subdisciplines have revealed that human populations, including Neanderthals and Denisovans, are not neatly delineated into isolated groups. Rather, we are all related in a complex tapestry of genetic threads, where gene flow is the rule and isolation the exception [1–6]. However, the canonization of ancient DNA’s new human history [7] into the broader field of archaeology has not been without controversy [8–15]. Since the 1960s, archaeology had been moving away from migration-based explanations for changes in material culture [16], adopting the principle that cultural artifacts are not inherently linked to specific populations. This shift in archaeological thinking, encapsulated in the phrase “pots don’t equal people” [17], established an interpretative framework that would become conceptually antagonistic to early ancient DNA analyses published during the mid to late 2010s revealing signatures suggestive of widespread population movements. As such, much of the resulting friction between some geneticists and archaeologists was fueled by divergent perspectives on the significance of migration as an explanation for historical events, a tendency of geneticists to treat archaeological and cultural groupings as well-defined categorical entities, as well as mutual misunderstandings and oversimplifications [15, 18–20].

Perhaps unsurprisingly, these early misunderstandings exemplify the difficulties of collaboration between disciplines that possess distinct methodological approaches and epistemological foundations. Although archaeology and paleo/archaeogenomics both study the human past, they operate within separate scientific domains, sometimes employing the same language to describe different, albeit related, concepts. For instance, whilst there is no single consensus on a definition of “migration” in archaeology [21], broadly the concept refers to individuals’ movements occurring within or between various geographic locations [22–24]. Conversely, in population genetics, migration refers specifically to the proportion of individuals who have immigrated into a population in the past, with only those migration events that result in integration through mating being informative. It is commonly quantified as the backward migration rate or admixture proportion, representing the probability that a randomly selected individual in the current generation originated from a different population in the previous generation.

The study of human migration necessitates a nuanced understanding of the limitations inherent in both genetic and archaeological methodologies. Genetic data, while informative about biological relationships, lack intrinsic geographical and cultural context, rendering them only indirectly informative about migration patterns. In contrast, archaeological evidence, such as stable isotope signatures, can elucidate recent coarse geographical origins but is incapable of providing insights into genetic relatedness or inheritance. In addition, whilst historical documents provide invaluable information regarding dynamic cultural and ethnic identity, they are often produced by the elite class (scribes, officials, and religious functionaries), which may limit their broader ability to inform migration histories. As such, each discipline’s unique analytical approaches and data types necessitate a cautious approach to inference-making, ensuring that conclusions derived from genetic data do not inappropriately inform archaeological and historical interpretations, and vice versa. So-called domain-specific inferences [25] are crucial for maintaining the integrity of interdisciplinary research.

In light of these challenges, this paper begins by providing an introduction to the principles underlying genetic admixture and methods commonly employed to identify its signatures in ancient DNA. We then discuss some of the limitations of these methods and their implications for inferring admixture and migration patterns. Subsequently, we review case studies from the ancient DNA literature that highlight different approaches in admixture inference to study historical mobility. Finally, we highlight computational and interpretative advancements poised to enhance our resolution in detecting signatures of admixture and foster a more nuanced, interdisciplinary understanding of past human mobility.

The admixture process

In population genetics, admixed populations are conceptualized as a linear combination of their distinct sources [26]. This model posits that in the generation following admixture, allele frequencies at any locus in a randomly mating admixed population are weighted averages of the corresponding frequencies in parental populations, with admixture weights determined by their relative parental contributions. Genetic drift in the admixed and source populations following admixture causes random deviations at individual loci, yet on average this relationship persists, highlighting the importance of analyzing numerous independent loci in admixture analysis (readers less familiar with population genetic concepts are directed to Supplementary Note 1, which provides a glossary of key terminology used throughout this manuscript). Under a simplified model of neutrality and admixture confined to a single founding generation, the expected genetic contribution from each source population is solely defined by the initial mixing parameters [27] and remains, on average, unchanged across all subsequent generations as genetic drift is agnostic to the alleles’ ancestral source. Whilst often viewed from the perspective of population-level processes, at the individual level, admixed offspring inherit recombined parental haploid chromosomes that may themselves reflect diverse grandparental origins [28, 29]. The genome-wide admixture fraction thus refers to the proportion of an individual’s genome that traces back to source populations [27, 30].
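The linear-combination model can be illustrated with a small simulation. The sketch below is not taken from the paper: the parameter values (α = 0.3, 1000 loci, 2N = 200 allele copies, 10 generations) and the least-squares estimator `estimate_alpha` are illustrative assumptions. It shows that drift perturbs individual loci while the average relationship across many loci still recovers the mixing proportion.

```python
import random


def drift(p, two_n, generations, rng):
    """Wright-Fisher drift: redraw 2N allele copies each generation."""
    for _ in range(generations):
        p = sum(rng.random() < p for _ in range(two_n)) / two_n
    return p


def estimate_alpha(p_x, p_1, p_2):
    """Least-squares alpha for p_x ~ alpha * p_1 + (1 - alpha) * p_2."""
    num = sum((x - b) * (a - b) for x, a, b in zip(p_x, p_1, p_2))
    den = sum((a - b) ** 2 for a, b in zip(p_1, p_2))
    return num / den


rng = random.Random(1)        # fixed seed, illustrative choice
alpha, n_loci = 0.3, 1000     # hypothetical parameters
p1 = [rng.uniform(0.05, 0.95) for _ in range(n_loci)]
p2 = [rng.uniform(0.05, 0.95) for _ in range(n_loci)]

# Generation of admixture: each locus is a weighted average of the sources.
px_initial = [alpha * a + (1 - alpha) * b for a, b in zip(p1, p2)]

# Ten generations of drift in the admixed population (2N = 200 copies).
px_drifted = [drift(p, 200, 10, rng) for p in px_initial]

alpha_no_drift = estimate_alpha(px_initial, p1, p2)
alpha_with_drift = estimate_alpha(px_drifted, p1, p2)
print(f"no drift: {alpha_no_drift:.3f}, after drift: {alpha_with_drift:.3f}")
```

With a single locus the drifted estimate would be very noisy; pooling many independent loci is what makes it stable.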

While these idealized concepts provide a clear theoretical framework for understanding admixture, underlying them is the fundamental question of what constitutes a “population”: is it a real biological entity and, if so, how is it defined, bounded, and identified, especially in the context of the sparse and often non-contemporaneous sampling characteristic of ancient DNA research? Delineating human groups amid continuous genetic variation increasingly concerns both population genetics [28, 31–34] and humanities research [35–37]. It has been argued that biological populations are best understood as research constructs or as statistical modeling tools that simplify reality [31, 36]. Genetic terminology and population labeling present related challenges, with recent recommendations advocating phrases like “genetically similar to” over definitive ancestry labels, and cautioning against conflating genealogical ancestry, genetic ancestry, and genetic similarity [28, 31]. These challenges are undoubtedly amplified in ancient DNA contexts [38].

Principles behind testing for – and quantifying – admixture

Whilst initially inhibited by its inherent limitations of scarcity and degradation [39], ancient DNA is now routinely used both to test for admixture and to estimate admixture proportions in ancient individuals and populations. Here, we present an accessible overview of the fundamental principles and some frequently employed methodologies for detecting admixture in ancient populations with ancient DNA.

Testing admixture models and estimating proportions with D- and f-statistics

A suite of methods has gained popularity in the ancient DNA community that both formally test for admixture and estimate the contributing proportions from putative source populations (described below). Whilst these methods differ in their approach, assumptions, and complexity of the demographic history they attempt to infer, most leverage covariances in allele frequency differences between populations estimated from Patterson’s F-statistics (hereafter, f-statistics) [40–42]. To this end, f-statistics have become a foundational tool for researching the admixture history of ancient populations. Within the family of f-statistics, the subscript denotes the number of populations included, with f2 = E[(p1 − p2)²], f3 = E[(pX − p1)(pX − p2)], and f4 = E[(p1 − p2)(p3 − p4)] analyzing two, three, and four populations, respectively. Below we present an introduction to the theoretical foundations of f-statistics, and their practical use in deciphering admixture histories. To guide intuition, throughout we reference a simplified demographic model introduced in Fig. 1A (hereafter Model 1).
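As a minimal sketch, the three definitions above can be written as averages over loci of population allele frequencies. The function names and toy frequencies are illustrative assumptions. These are naive plug-in averages: they omit the finite-sample bias corrections that estimators applied to real genotype data require.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)


def f2(p1, p2):
    """f2 = E[(p1 - p2)^2], averaged over loci."""
    return mean((a - b) ** 2 for a, b in zip(p1, p2))


def f3(px, p1, p2):
    """f3(X; 1, 2) = E[(pX - p1)(pX - p2)], averaged over loci."""
    return mean((x - a) * (x - b) for x, a, b in zip(px, p1, p2))


def f4(p1, p2, p3, p4):
    """f4(1, 2; 3, 4) = E[(p1 - p2)(p3 - p4)], averaged over loci."""
    return mean((a - b) * (c - d) for a, b, c, d in zip(p1, p2, p3, p4))


# Toy allele frequencies at three loci for four hypothetical populations.
pa = [0.10, 0.50, 0.90]
pb = [0.30, 0.40, 0.70]
pc = [0.20, 0.80, 0.60]
pd = [0.25, 0.60, 0.50]

print("f2(A, B)       =", round(f2(pa, pb), 5))
print("f3(C; A, B)    =", round(f3(pc, pa, pb), 5))
print("f4(A, B; C, D) =", round(f4(pa, pb, pc, pd), 5))
```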

Fig. 1.

A Demographic model used throughout to communicate principles underlying admixture analysis and methods. The population labels in our model are defined as follows: PANC represents the ancestral population from which all sampled populations (PX, PO, and P1-P6) descend; PO functions as an outgroup population, exhibiting approximately equal genetic distance to all other sampled populations; and PX denotes an admixed population whose genetic ancestry derives from source populations P1 and P2 with admixture proportions α and (1 − α), respectively. B Illustrative expectations of commonly used admixture inference methods with respect to the above (A) demographic model

The f2-statistic

The most fundamental of the f-statistics, the f2-statistic, quantifies the amount of genetic drift separating two sampled populations as a measure of the average squared difference in their allele frequencies. As demonstrated in Fig. 2A, the f2-statistic serves as a quantitative measure of population divergence, with its value increasing in proportion to the amount of genetic drift (a function of generations of separation and effective population size) experienced by the populations. Importantly, independent genetic drift as estimated by the f2-statistic can be partitioned along branches of a phylogeny (otherwise known as the additivity principle), a feature that distinguishes it from other measures of population differentiation, such as FST. As such, for two populations that solely split from a shared ancestor, such as P1 and P3, the genetic drift separating them is equal to the sum of genetic drift along their connected branches: f2(P1, P3) = f2(P1, P13) + f2(P13, P3) (see Fig. 2A).
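The additivity principle can be checked numerically. In the sketch below, independent zero-mean Gaussian perturbations stand in for genetic drift along each branch. That is a simplifying assumption rather than a population-genetic simulation, and the branch magnitudes are arbitrary.

```python
import random


def f2(p, q):
    """Average squared allele frequency difference across loci."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)


rng = random.Random(7)   # fixed seed, illustrative choice
n_loci = 20000

# Ancestral population P13, then independent "drift" on each daughter branch.
p13 = [rng.uniform(0.3, 0.7) for _ in range(n_loci)]
p1 = [p + rng.gauss(0, 0.05) for p in p13]   # branch P13 -> P1
p3 = [p + rng.gauss(0, 0.03) for p in p13]   # branch P13 -> P3

direct = f2(p1, p3)
summed = f2(p1, p13) + f2(p13, p3)
print(f"f2(P1, P3) = {direct:.5f}")
print(f"f2(P1, P13) + f2(P13, P3) = {summed:.5f}")
```

The two quantities agree up to sampling noise because the drift on the two branches is independent, so the cross term averages to zero.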

Fig. 2.

The impact of varying admixture and split-time parameters on the f3-statistic. A A no-admixture model with the y-axis showing the f-statistic estimate for the f3-statistic (blue bars) computed as the combination of f2-statistics (green, orange, and yellow line plots). The genetic drift path associated with each f2-statistic test pair is depicted in the demography figure with the corresponding line color. B An admixture model with increasing generations separating the time of admixture from P5 to P3 and the split of populations P3 and P1 (x-axis). The three plots correspond to three proportions of admixture (α = 0.05, 0.25, and 0.5) from P5 to P3. Plots of the simulated demography (A and B) were generated by demesdraw 0.4.0 [43]. C The same f-statistic calculations as in A and B but under a fixed split-time demography (P3, P1 split = 45 gen) with varying proportions of admixture [0, 1] from population P5 to P3 (x-axis). All simulations (A–C) were computed in msprime 1.3.0 [44] with the first 20 generations simulated with the DiscreteTimeWrightFisher model and the remaining generations with the StandardCoalescent using the following genome parameters: genome length (L) = 46,709,983 and recombination rate = 1.72e-08 taken from human chromosome 21 under the stdpopsim HomSap model ID [45, 46], Ne = 1000, sample size = 20, with msprime num_replicates = 50. All f-statistic calculations were computed with tskit 0.5.6 mode = 'branch'. D Expected f3-statistic values computed as E[(p3 − p1)(p3 − p5)] where p represents an allele frequency across the range [0, 1] for populations P1 (x-axis) and P5 (y-axis) for eight different fixed p3 values (0.01, 0.05, 0.10, 0.20, 0.25, 0.35, 0.45, 0.5) in population P3

Notably, the additivity principle assumes populations are exclusively related through a tree-like population history. Its value lies in its capacity to infer non-tree-like population relationships and, in turn, admixture, by identifying significant deviations between expected and observed f2-statistics when assuming a tree-like structure. This is because if there are admixture events in the population’s history, genetic relationships cannot be represented by simple tree structures as they harbor divergent histories of genetic drift that trace their own unique paths through the phylogeny. For example, the tree-like evolutionary relationship between the Model 1 outgroup population PO and one of the admixing source populations, P1, exemplifies additivity (see Fig. 1). Here, the genetic drift separating them equals the cumulative drift along their connecting branches, namely f2(PO, P1) = f2(P1, P13) + f2(P13, P513) + f2(P513, PANC) + f2(PANC, PO). However, the admixture process systematically reduces the f2-statistic, resulting in the admixed population PX exhibiting allele frequencies that more closely approximate ancestral frequencies (PO) than either of its source populations (P1 and P2). As a consequence, although PX derives its entire ancestry from P1 and P2 and all have the same population size, its f2-statistic with PO is reduced below those of both sources (f2(PO, PX) < f2(PO, P1) and f2(PO, P2)), with the maximum reduction occurring at equal source (P1, P2) admixture contributions.
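The reduction of f2(PO, PX) under admixture can be made explicit. If pX = αp1 + (1 − α)p2 and post-admixture drift is ignored, expanding the square gives f2(PO, PX) = αf2(PO, P1) + (1 − α)f2(PO, P2) − α(1 − α)f2(P1, P2). This identity is derived here for illustration, not quoted from the paper. The sketch below checks it on arbitrary frequencies and shows that the subtracted term is largest at α = 0.5.

```python
import random


def f2(p, q):
    """Average squared allele frequency difference across loci."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)


rng = random.Random(3)   # fixed seed, illustrative choice
n_loci = 5000

# Arbitrary frequencies: the identity is algebraic, so any values will do.
p_o = [rng.random() for _ in range(n_loci)]
p_1 = [rng.random() for _ in range(n_loci)]
p_2 = [rng.random() for _ in range(n_loci)]


def f2_direct(alpha):
    """f2(PO, PX) computed from the admixed frequencies themselves."""
    p_x = [alpha * a + (1 - alpha) * b for a, b in zip(p_1, p_2)]
    return f2(p_o, p_x)


def f2_predicted(alpha):
    """Weighted source f2 values minus the admixture reduction term."""
    weighted = alpha * f2(p_o, p_1) + (1 - alpha) * f2(p_o, p_2)
    return weighted - alpha * (1 - alpha) * f2(p_1, p_2)


for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    reduction = alpha * (1 - alpha) * f2(p_1, p_2)
    print(f"alpha={alpha:.2f}  f2(PO, PX)={f2_direct(alpha):.5f}  "
          f"reduction={reduction:.5f}")
```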

The f3-statistic

Within the broader family of f-statistics, the f3 can be represented as an algebraic arrangement of the f2-statistic: f3(P1; P2, P3) = ½ (f2(P1, P2) + f2(P1, P3) − f2(P2, P3)) [42]. Two common applications of the f3-statistic in archaeogenetics are to quantify the amount of genetic drift separating populations, and to test if a target population is admixed from two putative sources [41]. Under a no-admixture scenario from Model 1, the f3-statistic f3(P3; P1, P5) estimates how much genetic drift has occurred along the terminal branch leading to P3 (Fig. 2A). Viewing the f3-statistic as a combination of f2-statistics, i.e., f3(P3; P1, P5) = ½ (f2(P3, P1) + f2(P3, P5) − f2(P1, P5)), provides a complementary perspective from which to see that the genetic drift separating P1 and P5 is subtracted from their shared drift with P3 to reveal the genetic drift that has occurred exclusively since the split of population P3 from its ancestral population P13 (i.e., equal to f2(P13, P3)). Importantly, since f3 corresponds to a terminal branch length under the specific case of a no-admixture scenario, it cannot be negative (i.e., f3(P3; P1, P5) = f2(P3, P13) ≥ 0).
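The two views of the f3-statistic can be compared in a toy version of the no-admixture tree. As before, Gaussian perturbations are an assumed stand-in for drift, and the node names follow Model 1 only loosely. The algebraic identity with f2 holds exactly, while the match to the terminal branch f2(P13, P3) holds only up to sampling noise.

```python
import random


def f2(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)


def f3(px, p1, p2):
    return sum((x - a) * (x - b) for x, a, b in zip(px, p1, p2)) / len(px)


rng = random.Random(11)   # fixed seed, illustrative choice
n_loci = 20000


def perturb(freqs, sd=0.05):
    """Add independent zero-mean noise as a crude stand-in for drift."""
    return [p + rng.gauss(0, sd) for p in freqs]


# Tree without admixture: P513 -> (P5, P13), then P13 -> (P1, P3).
p513 = [rng.uniform(0.3, 0.7) for _ in range(n_loci)]
p5 = perturb(p513)
p13 = perturb(p513)
p1 = perturb(p13)
p3 = perturb(p13)

f3_direct = f3(p3, p1, p5)
f3_from_f2 = 0.5 * (f2(p3, p1) + f2(p3, p5) - f2(p1, p5))
terminal_branch = f2(p13, p3)

print(f"f3(P3; P1, P5) as covariance   = {f3_direct:.5f}")
print(f"f3(P3; P1, P5) from f2 values  = {f3_from_f2:.5f}")
print(f"f2(P13, P3), terminal branch   = {terminal_branch:.5f}")
```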

To illustrate how the f3-statistic can identify admixture, imagine individuals from population P5 migrating and intermixing with the individuals from population P3 at some time in the past (Fig. 2B). We first note that genetic drift unique to P5, when introduced to P3 through admixture, decreases the value of the statistic f2(P3, P5) and breaks the symmetry of P3 and P1 in their relationship with P5 (as f2(P3, P5) < f2(P1, P5)) (Fig. 2C). This has the effect of decreasing the value of f3(P3; P1, P5), such that significantly negative values provide strong evidence that P3 is the product of admixture between sources closely related to P1 and P5. The impact of admixture from P5 to P3 on the f3-statistic can also be seen by recognizing that admixture from P5 to P3 will drive the expected allele frequency in P3 (p3) to become intermediate between P5 (p5) and P1 (p1) (Fig. 2D). As such, when viewing the f3-statistic as originally defined in Reich et al. [40], as a covariance of two allele frequency differences f3(P3; P1, P5) = E[(p3 − p1)(p3 − p5)], the covariance becomes negative when p3 values are intermediate between p1 and p5, since f3-statistic negativity implies that a positive (p3 − p1) is linked to a negative (p3 − p5), or vice versa. Importantly, we note that previous work [40–42, 47] has identified specific demographic conditions, such as post-admixture drift, that can result in a positive f3-statistic even in the presence of admixture (i.e., a false negative), which we discuss in subsequent sections below.
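A short sketch makes both the negative signal and the false-negative case concrete. For a target that is an exact mixture, pX = αp1 + (1 − α)p2, the product (pX − p1)(pX − p2) equals −α(1 − α)(p1 − p2)² at every locus. Independent drift in the target then adds a positive term. The Gaussian noise model and the two drift magnitudes below are illustrative assumptions.

```python
import random


def f2(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)


def f3(px, p1, p2):
    return sum((x - a) * (x - b) for x, a, b in zip(px, p1, p2)) / len(px)


rng = random.Random(5)   # fixed seed, illustrative choice
n_loci = 20000
alpha = 0.5              # hypothetical admixture proportion

anc = [rng.uniform(0.3, 0.7) for _ in range(n_loci)]
p1 = [p + rng.gauss(0, 0.05) for p in anc]
p2 = [p + rng.gauss(0, 0.05) for p in anc]
mix = [alpha * a + (1 - alpha) * b for a, b in zip(p1, p2)]


def f3_after_drift(drift_sd):
    """f3 of the admixed target after adding post-admixture 'drift'."""
    px = [m + rng.gauss(0, drift_sd) for m in mix]
    return f3(px, p1, p2)


f3_no_drift = f3(mix, p1, p2)
f3_expected = -alpha * (1 - alpha) * f2(p1, p2)
f3_small_drift = f3_after_drift(0.01)
f3_large_drift = f3_after_drift(0.06)

print(f"no drift:    {f3_no_drift:.5f} (expected {f3_expected:.5f})")
print(f"small drift: {f3_small_drift:.5f}")
print(f"large drift: {f3_large_drift:.5f}")
```

In this toy setting the statistic turns positive once the target's own drift exceeds α(1 − α)f2(P1, P2), even though the target is admixed by construction.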

The f4-statistic

The f4-statistic, initially referred to as the four-population test [40], is another widely used approach for detecting admixture in ancient DNA research. Closely related is the D-statistic, originally developed to identify introgression between Neanderthals and extant humans [48], with the f4- and D-statistic being the same up to a normalization factor (outlined in Patterson et al. [41]). One of the functions of the f4- and D-statistic in identifying admixture is to conduct a “treeness” test, which determines if four populations have a strictly cladal relationship with at most one internal branch separating them, a condition referred to as the “four-point condition.” The D-statistic, colloquially known as “ABBA BABA,” tests the null hypothesis that two sets of populations, such as (P1, P3) and (P2, P4), form clades relative to each other by quantifying the difference between two allele sharing patterns: (ABBA), which pairs populations two and three together based on their shared allele, and (BABA), where the inverse pattern is observed, standardized by the sum of those patterns (when applied to single nucleotide polymorphism (SNP) data, as is typically used in ancient DNA, these patterns are computed on allele frequencies [49]). Under the null hypothesis of no admixture, the two patterns are solely found due to incomplete lineage sorting in the ancestral population of P1, P2, and P3 (i.e., PANC from Model 1). Since coalescence of the P1, P2, and P3 lineages in this ancestral population is random, the frequencies of the two ABBA/BABA patterns at bi-allelic sites are expected to be equal, leading to a D/f4-statistic of zero.
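The relationship between the two statistics can be sketched from allele frequencies. The sign convention used here, (BABA − ABBA) in the numerator, is an assumption chosen so that the numerator matches f4(P1, P2; P3, P4) term by term. Published conventions differ in sign and population ordering, and the toy frequencies are arbitrary.

```python
def abba_baba(p1, p2, p3, p4):
    """Expected ABBA and BABA pattern counts summed over bi-allelic sites."""
    abba = baba = 0.0
    for a, b, c, d in zip(p1, p2, p3, p4):
        # ABBA: populations 2 and 3 share one allele, 1 and 4 the other.
        abba += (1 - a) * b * c * (1 - d) + a * (1 - b) * (1 - c) * d
        # BABA: populations 1 and 3 share one allele, 2 and 4 the other.
        baba += a * (1 - b) * c * (1 - d) + (1 - a) * b * (1 - c) * d
    return abba, baba


def d_statistic(p1, p2, p3, p4):
    """Pattern difference standardized by the sum of both patterns."""
    abba, baba = abba_baba(p1, p2, p3, p4)
    return (baba - abba) / (baba + abba)


def f4_sum(p1, p2, p3, p4):
    """Unnormalized f4 numerator: sum over sites of (p1 - p2)(p3 - p4)."""
    return sum((a - b) * (c - d) for a, b, c, d in zip(p1, p2, p3, p4))


# Toy allele frequencies at three sites (hypothetical values).
pa = [0.10, 0.50, 0.90]
pb = [0.30, 0.40, 0.70]
pc = [0.20, 0.80, 0.60]
pd = [0.25, 0.60, 0.50]

abba, baba = abba_baba(pa, pb, pc, pd)
print("BABA - ABBA =", round(baba - abba, 6))
print("f4 sum      =", round(f4_sum(pa, pb, pc, pd), 6))
print("D           =", round(d_statistic(pa, pb, pc, pd), 6))
```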

To gain intuition into how the D/f4-statistic tests for the presence of admixture, we now imagine an admixed population, PX, formed as a mixture of P1 and P2 sources, as shown in Model 1. The configuration of the f4-statistic “treeness” test as f4(P1, PX; P2, P4) will test the null hypothesis that (P1, PX) and (P2, P4) form distinct phylogenetic clades with the null hypothesis expectation f4(P1, PX; P2, P4) = E[(p1 − pX)(p2 − p4)] = 0. However, admixture from P2 to PX pulls PX’s allele frequency closer to P2’s, such that the allele frequency difference between P1 and PX is no longer independent of the allele frequency difference between P2 and P4. As a result, the covariance in allele frequency differences between (P1, PX) and (P2, P4) will become negative, and thus the null hypothesis of cladality will be rejected such that f4(P1, PX; P2, P4) < 0. We also note that the f2 formulation of the f4-statistic in our above example, f4(P1, PX; P2, P4) = ½ (f2(P1, P4) + f2(PX, P2) − f2(P1, P2) − f2(PX, P4)), reveals that admixture from P2 to PX reduces their allele frequency differentiation such that f2(PX, P2) < f2(PX, P4), which in turn, under favorable demographic conditions discussed further below, results in a negative f4-statistic and an overrepresentation of ABBA patterns relative to BABA patterns in the D-statistic formulation (see Supplementary Note 2 for the algebraic derivation of the f4-statistic as a combination of f2-statistics).
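The treeness test described above can be sketched on toy data. Gaussian perturbations again stand in for drift (an assumption), and the unadmixed PX is modeled as a sister population of P1. The f4-statistic is close to zero for the tree, becomes negative once PX receives ancestry from P2, and matches its f2 formulation exactly.

```python
import random


def f2(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)


def f4(p1, p2, p3, p4):
    return sum((a - b) * (c - d)
               for a, b, c, d in zip(p1, p2, p3, p4)) / len(p1)


rng = random.Random(13)   # fixed seed, illustrative choice
n_loci = 20000
alpha = 0.5               # share of PX ancestry from the P1-related lineage


def perturb(freqs, sd=0.05):
    """Add independent zero-mean noise as a crude stand-in for drift."""
    return [p + rng.gauss(0, sd) for p in freqs]


anc = [rng.uniform(0.3, 0.7) for _ in range(n_loci)]
left = perturb(anc)       # ancestor of P1 and the unadmixed PX
right = perturb(anc)      # ancestor of P2 and P4
p1, px_tree = perturb(left), perturb(left)
p2, p4 = perturb(right), perturb(right)

# Admixture: PX draws a fraction (1 - alpha) of its ancestry from P2.
px_admixed = [alpha * x + (1 - alpha) * b for x, b in zip(px_tree, p2)]

f4_tree = f4(p1, px_tree, p2, p4)
f4_admixed = f4(p1, px_admixed, p2, p4)
f4_from_f2 = 0.5 * (f2(p1, p4) + f2(px_admixed, p2)
                    - f2(p1, p2) - f2(px_admixed, p4))

print(f"f4(P1, PX; P2, P4), tree:     {f4_tree:.5f}")
print(f"f4(P1, PX; P2, P4), admixed:  {f4_admixed:.5f}")
print(f"same statistic from f2 terms: {f4_from_f2:.5f}")
```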

Importantly, in the absence of admixture at least one of the D/f4-statistic values will be zero (i.e., f4(P1, P3; P2, P4) = 0), and the others will have absolute values different from zero (i.e., f4(P3, P2; P1, P4) > 0 = |f4(P2, P3; P1, P4)| = |f4(P3, P2; P4, P1)|) [50]. In practice, an outgroup population (e.g., PO in Model 1) is typically included to polarize the test statistic and help identify the populations contributing to its significance [41]. Complex evolutionary history, however, complicates simplistic interpretations of admixture history based solely on D- and f4-statistics, as these values function as point estimates representing the average across diverse phylogenetic pathways within a population’s demographic history [51].

Estimating admixture proportions with the f4-ratio statistic

After a population of interest is confirmed to be admixed from putative sources, the next phase of analysis involves estimating its admixture proportions with respect to a set of candidate source populations. The f4-ratio statistic, first described as the f4-ancestry estimation by Reich et al. [40], offers a robust approach for determining the ancestral proportions present in a target population or individual that can be traced back to the source population (Fig. 1B). A comparable methodology was utilized by Green et al. [48] to estimate Neanderthal genetic contributions in extant non-African populations.

To elucidate the relationship between ratios of f4-statistics (i.e., the f4-ratio) and admixture proportions, we recall that in randomly mating admixed populations, expected allele frequencies are linear combinations of source population frequencies under the assumption of neutrality, and that we can ignore post-admixture drift. The coefficients in these combinations represent the admixture fractions, thus establishing a conceptual link between observed allele frequencies and the underlying admixture process. We now return to Model 1 (Fig. 1A) in which the allele frequency (pX) in the admixed population (PX) is expressed as a linear combination of the source population (P1 and P2) frequencies (p1 and p2); ignoring post-admixture genetic drift and assuming neutrality. The admixture proportion (α) serves as the weighting factor, such that: pX = αp1 + (1 − α)p2. From this, the f4-statistic involving population PX is derived exactly such that:

f4(PO, P5; PX, P4) = f4(pO, p5; pX, p4) = f4(pO, p5; (αp1 + (1 − α)p2), p4) = αf4(pO, p5; p1, p4) + (1 − α)f4(pO, p5; p2, p4), revealing that the f4-statistic of an admixed population is equivalent to the f4-statistics of its sources, weighted by their admixture coefficients [52]. Consequently, it also follows that f4(PO, P5; PX, P2) is equivalent to αf4(PO, P5; P1, P2), since f4(PO, P5; P2, P2) is by definition zero. From here, we can use the ratio of two f4-statistics to algebraically derive the admixture proportion (α) of P1 to PX, in the form α = f4(PO, P5; PX, P2)/f4(PO, P5; P1, P2), revealing the link between allele frequencies, f4-statistics, and admixture proportions (Fig. 1B).
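The f4-ratio can be tried on a toy version of Model 1. In the sketch below, Gaussian perturbations stand in for drift and the branch magnitudes are arbitrary, both of which are assumptions. P5 and P1 share a branch (via P513) that P2 lacks, which is what makes the denominator non-zero. A small amount of post-admixture drift in PX is included to show that it does not bias the ratio in this setting.

```python
import random


def f4(p1, p2, p3, p4):
    return sum((a - b) * (c - d)
               for a, b, c, d in zip(p1, p2, p3, p4)) / len(p1)


rng = random.Random(17)   # fixed seed, illustrative choice
n_loci = 50000
alpha = 0.3               # true P1-related proportion in PX (hypothetical)


def perturb(freqs, sd):
    """Add independent zero-mean noise as a crude stand-in for drift."""
    return [p + rng.gauss(0, sd) for p in freqs]


anc = [rng.uniform(0.4, 0.6) for _ in range(n_loci)]
p_o = perturb(anc, 0.04)          # outgroup
p513 = perturb(anc, 0.08)         # branch shared by P5 and P1 only
p5 = perturb(p513, 0.04)
p1 = perturb(p513, 0.04)
p2 = perturb(anc, 0.04)

# PX = alpha * P1 + (1 - alpha) * P2, plus some post-admixture drift.
px = [alpha * a + (1 - alpha) * b + rng.gauss(0, 0.03)
      for a, b in zip(p1, p2)]

numerator = f4(p_o, p5, px, p2)
denominator = f4(p_o, p5, p1, p2)
alpha_hat = numerator / denominator
print(f"f4-ratio estimate of alpha: {alpha_hat:.3f} (true value {alpha})")
```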

Testing admixture models and estimating proportions with qpAdm

First introduced in Haak et al. [52], qpAdm leverages the principles behind both the f4-ratio and the f4-statistic to statistically test proposed models of admixture and provide estimated contributions from proposed sources to a target population of interest (Fig. 1B). Whilst the complex phylogenetic relationships between the target, source, and reference populations are not explicitly modeled in qpAdm, this in turn provides significant flexibility in practice. By using f-statistics to capture patterns of shared genetic drift between the target, source, and reference populations, qpAdm is able to formally test the hypothesis that the putative source populations are the sole contributors to the target population’s ancestry (i.e., none of the reference populations contribute additional gene flow to the target that is not captured in the sources) without an explicit knowledge of the phylogenetic relationship between the source and reference populations. Although a detailed explanation of qpAdm is beyond the scope of this paper (see SI. 9 and 10 of Haak et al. [52], in addition to SI qpAdm User Guide and Supplementary Materials 2 of Harney et al. [53]), we provide a brief summary of how qpAdm employs principles from both the f4-statistic and f4-ratio to perform admixture analyses (also see Supplementary Note 3 for an introductory overview on the qpAdm theory and implementation).

When running qpAdm, a user will input three key population variables: a target population of interest, a set of putative source populations, and a set of reference populations selected to capture the genetic diversity of the region and time period. From the perspective of their positions in the f4-statistic, the target and source populations are commonly referred to as the “left-set” and the reference populations the “right-set”. With the left and right population sets, qpAdm generates matrices of f4-statistics which serve as the foundation for computing the model p-value and estimating admixture proportions from each of the candidate sources. Recall from the section on the f4-statistic above that if two sets of populations form clades, e.g., (P1, P3) and (P2, P4), then f4(P1, P3; P2, P4) has expectation zero, indicating that only a single branch connects the clades. In contrast, under an admixture model, f4(P1, PX; P2, P4) < 0, indicating that more than one branch connects the two population sets (P1, PX) and (P2, P4) due to PX sharing unique genetic drift with P2.

In fact, the number of branches connecting the left and right population sets underlies the principle of the f4-statistic matrix rank first introduced in Reich et al. [54] and Moorjani et al. [55] and is the basis of the qpWave software. The f4-statistic matrices used by qpAdm are formed by fixing one population in the left population list (in practice the target population) and one population from the reference population set (in practice the population with the best genetic coverage), forming a matrix of dimensions (nL − 1) × (nR − 1). It was shown in Reich et al. [54] and Moorjani et al. [55] that when fixing one population from the left set and one from the right set, the maximum rank of the matrix is nL − 1. The rank in this instance refers to the minimum number of branches connecting the two sets of populations. When the rank of the f4-statistic matrix encompassing both sources and target in the left set does not exceed that of the source-only matrix, it indicates that the sources’ genetic diversity fully captures the target’s ancestry, within the resolution of the right group.
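The rank argument can be illustrated with a deliberately simplified sketch. This is not the qpWave or qpAdm implementation, which assesses rank statistically rather than with a fixed numerical tolerance. The populations are hypothetical, Gaussian perturbations stand in for drift, and `matrix_rank` is a small helper written for this example. A target that is an exact mixture of the two sources leaves the 2 × 3 matrix of f4(T, Si; R0, Rj) rank-deficient. Adding ancestry from an unmodeled "ghost" population related to R3 raises the rank.

```python
import random


def f4(p1, p2, p3, p4):
    return sum((a - b) * (c - d)
               for a, b, c, d in zip(p1, p2, p3, p4)) / len(p1)


def matrix_rank(rows, tol=1e-10):
    """Numerical rank by Gaussian elimination (hypothetical helper)."""
    rows = [list(r) for r in rows]
    rank = 0
    for col in range(len(rows[0])):
        if rank == len(rows):
            break
        pivot = max(range(rank, len(rows)), key=lambda r: abs(rows[r][col]))
        if abs(rows[pivot][col]) < tol:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            factor = rows[r][col] / rows[rank][col]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return rank


rng = random.Random(19)   # fixed seed, illustrative choice
n_loci, alpha = 5000, 0.4


def perturb(freqs, sd=0.05):
    return [p + rng.gauss(0, sd) for p in freqs]


anc = [rng.uniform(0.3, 0.7) for _ in range(n_loci)]
r0 = perturb(anc)                                  # fixed right population
clades = [perturb(anc) for _ in range(3)]          # three deep lineages
s1, s2, ghost = (perturb(c) for c in clades)       # left-side relatives
r1, r2, r3 = (perturb(c) for c in clades)          # right-side relatives

target = [alpha * a + (1 - alpha) * b for a, b in zip(s1, s2)]
target_ghost = [0.5 * t + 0.5 * g for t, g in zip(target, ghost)]


def f4_matrix(t):
    """Rows: sources; columns: right populations; target and R0 fixed."""
    return [[f4(t, s, r0, r) for r in (r1, r2, r3)] for s in (s1, s2)]


rank_valid = matrix_rank(f4_matrix(target))
rank_ghost = matrix_rank(f4_matrix(target_ghost))
print("rank with adequate sources:", rank_valid)
print("rank with unmodeled ghost ancestry:", rank_ghost)
```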

To account for linkage disequilibrium (i.e., the correlation in genealogical history for SNPs located in close physical proximity along a chromosome), qpAdm implements a block jackknife resampling approach, dividing the genome into blocks and sequentially recalculating f4-statistics while omitting one block at a time. This method both accounts for the non-independence of demographic histories along the genome and quantifies the errors and uncertainty associated with each individual f4-statistic. In addition, qpAdm implements a likelihood ratio test to statistically test whether the target population’s ancestry is solely explained by the proposed admixture sources or requires a more complex model incorporating additional independent admixture from a population in the right-group. A significant p-value (typically p < 0.05 or 0.01) indicates that the null hypothesis (i.e., the simpler model of the target solely explained by the proposed admixture sources) is rejected in favor of the more complex model. Importantly then, a large p-value indicates not that a plausible qpAdm model is statistically significant, but rather that it cannot be rejected based on the current data. In addition to a p-value, qpAdm will also generate admixture proportions, which are typically only interpreted for plausible models, with weights outside of the biologically relevant range [0, 1] being one criterion used to reject proposed admixture models.
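A delete-one-block jackknife for an f4-statistic can be sketched as follows. The equal-sized blocks of consecutive loci and the unweighted variance formula are simplifying assumptions; real tools define blocks by genomic position and account for unequal block sizes. The toy data have no linkage, so the sketch shows only the mechanics of the resampling. Gaussian perturbations again stand in for drift.

```python
import math
import random


def block_jackknife_f4(p1, p2, p3, p4, n_blocks):
    """Return (f4 estimate, jackknife standard error, Z-score)."""
    products = [(a - b) * (c - d) for a, b, c, d in zip(p1, p2, p3, p4)]
    size = len(products) // n_blocks
    products = products[:size * n_blocks]          # drop any remainder
    total = sum(products)
    # Recompute the statistic with each block left out in turn.
    leave_one_out = []
    for k in range(n_blocks):
        block_sum = sum(products[k * size:(k + 1) * size])
        leave_one_out.append((total - block_sum) / (size * (n_blocks - 1)))
    mean_loo = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum(
        (est - mean_loo) ** 2 for est in leave_one_out)
    estimate = total / len(products)
    std_err = math.sqrt(variance)
    return estimate, std_err, estimate / std_err


rng = random.Random(29)   # fixed seed, illustrative choice
n_loci = 20000


def perturb(freqs, sd=0.05):
    return [p + rng.gauss(0, sd) for p in freqs]


anc = [rng.uniform(0.3, 0.7) for _ in range(n_loci)]
left, right = perturb(anc), perturb(anc)
p1, px_tree = perturb(left), perturb(left)
p2, p4 = perturb(right), perturb(right)
px_admixed = [0.5 * x + 0.5 * b for x, b in zip(px_tree, p2)]

est_tree, se_tree, z_tree = block_jackknife_f4(p1, px_tree, p2, p4, 100)
est_adm, se_adm, z_adm = block_jackknife_f4(p1, px_admixed, p2, p4, 100)
print(f"tree:    f4 = {est_tree:.5f}, SE = {se_tree:.5f}, Z = {z_tree:.2f}")
print(f"admixed: f4 = {est_adm:.5f}, SE = {se_adm:.5f}, Z = {z_adm:.2f}")
```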

To estimate the admixture proportion, qpAdm extends the principles of the f4-ratio outlined above, whereby if an admixed target population of interest has ancestry related to N different source populations, then the f4-statistic of the admixed target population can be expressed as a weighted linear combination of the f4-statistics of its sources:

Σ_{i=1}^{N} α_i f4(T, S_i; R1, R2) = f4(T, T; R1, R2) = 0

(equation modified from page 129 of Haak et al. [52], where T represents the target population, R1 and R2 the qpAdm right populations, and S_i the qpAdm source populations). Thus, the qpAdm admixture proportion estimates emerge from the requirement that the f4-statistics involving the source populations need to be weighted and added up in a way such that they equal the f4-statistics involving the target.
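For two sources, the constraint above reduces to finding a single weight α such that αf4(T, S1; R0, Rj) + (1 − α)f4(T, S2; R0, Rj) is as close to zero as possible for every right population Rj. The sketch below solves this by ordinary least squares. This is a simplification and not the qpAdm algorithm, which weights the fit by the jackknife covariance of the f4-statistics. The populations, branch magnitudes, and Gaussian drift model are illustrative assumptions.

```python
import random


def f4(p1, p2, p3, p4):
    return sum((a - b) * (c - d)
               for a, b, c, d in zip(p1, p2, p3, p4)) / len(p1)


rng = random.Random(23)   # fixed seed, illustrative choice
n_loci = 20000
alpha = 0.4               # true weight of source S1 (hypothetical)


def perturb(freqs, sd):
    """Add independent zero-mean noise as a crude stand-in for drift."""
    return [p + rng.gauss(0, sd) for p in freqs]


anc = [rng.uniform(0.4, 0.6) for _ in range(n_loci)]
r0 = perturb(anc, 0.05)                       # fixed right population
clade1 = perturb(anc, 0.08)                   # lineage of S1 and R1
clade2 = perturb(anc, 0.08)                   # lineage of S2 and R2
s1, r1 = perturb(clade1, 0.05), perturb(clade1, 0.05)
s2, r2 = perturb(clade2, 0.05), perturb(clade2, 0.05)

# Target: mixture of the two sources plus post-admixture drift.
target = [alpha * a + (1 - alpha) * b + rng.gauss(0, 0.03)
          for a, b in zip(s1, s2)]

rights = [r1, r2]
a = [f4(target, s1, r0, r) for r in rights]   # f4(T, S1; R0, Rj)
b = [f4(target, s2, r0, r) for r in rights]   # f4(T, S2; R0, Rj)

# Minimize sum_j (alpha * a_j + (1 - alpha) * b_j)^2 over alpha.
num = sum(-bj * (aj - bj) for aj, bj in zip(a, b))
den = sum((aj - bj) ** 2 for aj, bj in zip(a, b))
alpha_hat = num / den
residuals = [alpha_hat * aj + (1 - alpha_hat) * bj for aj, bj in zip(a, b)]

print(f"estimated weights: S1 = {alpha_hat:.3f}, S2 = {1 - alpha_hat:.3f}")
print("residual f4 values:", [round(x, 6) for x in residuals])
```

The estimate depends on each source being differentially related to at least one right population. If R1 and R2 were equally related to both sources, the denominator would approach zero and the weights would be poorly determined.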

Modeling complex admixture histories with graphs

While qpAdm’s minimal model phylogeny allows for more flexibility, the absence of explicitly modeled relationships among all populations can limit the interpretability of the findings. In contrast, the ADMIXTUREGRAPH (AG) (Fig. 1B) framework [40] attempts to provide a comprehensive model of the phylogenetic and admixture relationships within a set of populations. An AG structure consists of an ordering of population splits, positions of admixture events, branch length parameters, and mixture proportions. However, the AG framework remains a reduced representation of a complete demographic model [56], as it includes neither estimates of population sizes nor population split or admixture times. Given the branch lengths and ordering of population splits and admixture events defined in the AG, admixture proportions (α) form part of a more explicit model of the populations’ demographic history. Various software packages are available for AG analysis, which differ in their level of automation and in their approaches to defining the initial AG configuration, exploring alternative models, and assessing the model fit to empirical allele frequency data. However, the f-statistics framework remains central to all AG tools (with the exception of TreeMix).

Challenges in admixture analysis and interpretation using ancient DNA

Whilst ancient DNA has revolutionized our understanding of the human past, reconstructing the admixture history of ancient populations is not without its challenges. Both low-quality genetic data and aspects of demography can limit the statistical power of population genetic inferences from ancient DNA. Moreover, interpreting admixture signatures within the context of historical processes and mechanisms of migration and mating presents a significant challenge to unifying archaeological, historical, and genetic analyses. In the following discussion, we provide a concise overview of some of the key challenges in ancient DNA research for reconstructing admixture histories and inferring their underlying mechanisms, and discuss current approaches in the field for identifying signatures of human mobility from admixture signatures.

DNA degradation

One of the most fundamental limitations in ancient DNA research is overcoming the impacts of post-mortem DNA damage on the quantity and quality of DNA [57]. Both time and the environment contribute to the breakdown of DNA molecules, resulting in highly fragmented and short DNA reads (~ 10–150 bp) [58], as well as characteristic post-mortem deamination of the base-pair sequence (C → T misincorporations are elevated at the 5′ end of DNA fragments [59]). The relationship between allele frequency and admixture proportion described above underscores the importance of obtaining unbiased allele frequency estimates for accurate admixture inference. This presents significant challenges throughout the admixture inference process, as the distinctive characteristics of ancient DNA can introduce bias at various stages of analysis. For instance, Harney et al. [53] demonstrated the impact of ancient DNA damage on admixture inference through qpAdm, showing that biases in the estimated admixture proportions can occur when data with damage patterns (ancient DNA) are co-analyzed with data without them (present-day), and recommending against this strategy.

The scarcity of endogenous DNA in ancient remains has necessitated the development of targeted “capture” technologies for pre-selected (ascertained) SNPs to increase the proportion of sequenced endogenous DNA [41, 52, 60, 61]. Due to the bias towards higher-frequency alleles introduced by the ascertainment process, “captured” ancient DNA datasets do not allow for the utilization of powerful demographic methods that depend on an unbiased site frequency spectrum (SFS) produced by whole-genome sequencing data. As a result, ancient DNA admixture analyses are typically reliant on methods that evaluate the less informative covariance in allele frequency differences (D/f-statistics) described above. In addition, “low-coverage” specimens, i.e., those which retain only a small fraction of sequenced ascertained SNPs, result in increased standard errors and decreased power to reject false and suboptimal admixture models in qpAdm analyses [47, 53]. Importantly, however, while lower coverage reduces qpAdm power, it does not appear to inherently bias admixture estimates [53]. An additional challenge in working with captured data is that hybridization technologies can introduce potential biases when co-analyzing genomes generated from different platforms and capture-free “shotgun” DNA sequencing approaches [62, 63]. Whilst various strategies permitting the co-analysis of ancient DNA across capture technologies have been developed [62, 63], and some technologies show no observable technical bias [63, 64], researchers should exercise caution when integrating ancient DNA data from different technological sources as SNP capture has demonstrated biases in co-analyses across human populations, particularly when modeling multiple sub-Saharan African populations or archaic human groups like Neanderthals and Denisovans [65].

Due to the fragmentation process, short DNA molecules pose significant challenges in and of themselves for accurate mapping to reference genomes [66]. One of the primary challenges in mapping short DNA molecules is overcoming reference bias, where DNA fragments carrying the reference allele are more likely to map successfully to the human reference genome, with the magnitude of the bias shown to be inversely proportional to read length [67]. Because of the heterogeneous composition of the human reference genome, this could result in both genome-wide and local ancestry bias [68], with varying degrees of impact on different admixture inference and genotype calling methodologies [69]. Importantly, of the commonly used ancient DNA admixture analysis pipelines, pseudohaploid genotype calling and qpAdm appear to be among the most robust to reference bias [69]. Future advancements, such as the adoption of variation graphs over linear references in mapping [68], the use of algorithms to mask out sites vulnerable to characteristic ancient DNA damage from genotyping [70], or the correction of genotype likelihoods based on empirical data [69], are promising approaches for mitigating reference bias.
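As a rough illustration of the pseudohaploid genotype calling mentioned above, the following Python sketch draws one read at random at each SNP site and records its allele as the haploid call. The function name, the representation of reads as lists of allele characters, and the use of None for uncovered sites are assumptions made for illustration; they are not taken from any specific pipeline.

```python
import random

def pseudohaploid_calls(reads_per_site, seed=None):
    """Return one randomly sampled allele per site, or None when no read covers it."""
    rng = random.Random(seed)
    calls = []
    for alleles in reads_per_site:
        # Sample a single read so that every individual is treated as haploid,
        # regardless of how many reads happen to cover the site.
        calls.append(rng.choice(alleles) if alleles else None)
    return calls

# Hypothetical pileup: alleles observed in reads at three SNP sites
pileup = [["A", "A", "G"], [], ["T"]]
calls = pseudohaploid_calls(pileup, seed=1)
```

Because only one read is sampled per site, a site covered by a single read is treated the same way as a deeply covered one, which is why this style of calling is commonly paired with low-coverage data.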

Demographic history

Demographic histories that result in minimally differentiated populations present additional challenges to admixture reconstruction, compounding the DNA preservation limitations outlined above and effectively reducing admixture detection power — a topic we explore below.

Demography and f-statistic power

As described above, when populations diverge from a common ancestor, they undergo independent genetic drift, causing their allele frequencies to diverge over time. The rate of differentiation is shaped by biological and demographic factors, including the timing of population splits, effective population sizes, and mutation rates. Given the use of allele frequencies in calculating f-statistics, their power for inferring demographic history and admixture events is thus directly modulated by these factors. For instance, the f3-statistic has its greatest power for detecting admixture under the conditions of near-equal contributions from source populations, a large number of generations separating the sources from each other at the admixture event, and limited genetic drift in the admixed population [40, 42]. Importantly for the study of complex demographic history from ancient DNA, Williams et al. [47] demonstrate that both admixture between the sources following their split, and the use of diverged proxy sources — scenarios undoubtedly encountered in empirical human archaeogenetic studies — constrain the range of demographic parameters that can result in a negative f3-statistic, possibly leading to false negative results in ancient DNA analyses. Similarly, the use of the f4-statistic to detect admixture also faces statistical limitations under certain demographic scenarios. In Model 1, the covariance in the allele frequency differences between (PX and P2) from (P1 and PO) is determined by the admixture proportion (1 − α) and the evolutionary distance separating P2 from the shared ancestor of P1 and P2 (γ) [50] (Fig. 3A). As the value of this compound parameter (1 − α)γ decreases, the statistic f4(PO, P2; P1, PX) approaches zero, reducing the signal to detect admixture from P2 to PX (Fig. 3B). 
Under conditions where γ is small — indicating minimal independent drift between source populations (P1, P2) — the resulting decrease in the compound parameter reduces the value of the f4-statistic, such that estimates not significantly different from zero suggest P1 and PX form a clade relative to P2, even in the presence of substantial admixture from P2 to PX (Fig. 3C).
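The dependence of this f4-statistic on the compound parameter (1 − α)γ can be checked numerically. The sketch below is a toy calculation, not the msprime simulation used for Fig. 3: allele frequencies are an ancestral value plus independent Gaussian noise standing in for drift, PX is formed as an exact mixture of P1 and P2 with no subsequent drift, and all names and parameter values are assumptions chosen for illustration.

```python
import random

def f4(a, b, c, d):
    """Mean over SNPs of (a - b) * (c - d) for four allele-frequency lists."""
    return sum((w - x) * (y - z) for w, x, y, z in zip(a, b, c, d)) / len(a)

def mix(p1, p2, alpha):
    """Frequencies of a population drawing alpha from p1 and 1 - alpha from p2."""
    return [alpha * u + (1 - alpha) * v for u, v in zip(p1, p2)]

rng = random.Random(0)
n_snps = 5000
# Toy frequencies: a shared ancestral value plus independent "drift" noise
anc = [rng.uniform(0.3, 0.7) for _ in range(n_snps)]
p_o = [a + rng.gauss(0, 0.05) for a in anc]
p_1 = [a + rng.gauss(0, 0.03) for a in anc]
p_2 = [a + rng.gauss(0, 0.03) for a in anc]

# With PX replaced by P2 itself, the statistic estimates the drift specific to P2
baseline = f4(p_o, p_2, p_1, p_2)
for alpha in (0.0, 0.5, 0.95):
    p_x = mix(p_1, p_2, alpha)
    # f4(PO, P2; P1, PX) shrinks linearly with (1 - alpha)
    print(alpha, f4(p_o, p_2, p_1, p_x), (1 - alpha) * baseline)
```

In this toy setting the two printed values agree for every α, because P1 − PX equals (1 − α)(P1 − P2) at each SNP. When either (1 − α) or the P2-specific drift is small, the statistic approaches zero and the admixture signal becomes hard to distinguish from sampling noise.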

Fig. 3.

The impact of varying admixture and split-time parameters on the f4-statistic. All simulations (B, C) were computed in msprime 1.3.0 [44] with the first 20 generations simulated with the DiscreteTimeWrightFisher model and the remaining generations with the StandardCoalescent using the following genome parameters: genome length (L) = 46,709,983 and recombination rate = 1.72e−08 taken from human chromosome 21 under the stdpopsim HomSap model ID [45, 46], Ne = 1,000e2, and sample size = 20 with msprime num_replicates = 50. B The value for the f4-statistic (blue bars) computed as the combination of f2-statistics (yellow, pink, green and orange line plots) under a fixed split-time demography (P1P2, PO split = 90 gens) with varying proportions of admixture [0, 1] from population P1 to PX (x-axis). C The same f-statistic calculations as B but for three fixed admixture parameters (0.25, 0.5, and 0.95) with the y-axis showing the f-statistic value for the f4-statistic (blue bars) computed as the combination of f2-statistics with increasing generations separating the time of PX formation and the split of populations P2 and P1 (x-axis). The genetic drift path associated with each f2-statistic test pair is depicted in the demography figure (A) with the corresponding color. Plots of the simulated demography (A) were generated by demesdraw 0.4.0 [43]

Because of its reliance on f-statistics, qpAdm is in theory similarly impacted by the demographic and biological constraints described above. Extensive simulations of qpAdm on admixture-graph-like models (similar to that displayed in Fig. 1A) confirm that population divergence is indeed the primary factor limiting its statistical power [47]. Rotating qpAdm approaches [53] (where populations are utilized as both sources and right-groups) for studying admixture in ancient DNA are most effective in identifying a minimal number of plausible models closely related to the true model when population divergence is greater than that of the Eurasian Bronze Age (i.e., median pairwise FST > ~ 0.008), with a marked decline in performance towards the lower end of human population differentiation (i.e., median pairwise FST < ~ 0.003) [47]. Despite challenges in studying historical admixture dynamics among minimally differentiated populations, broad inferences about potential admixing sources remain feasible, as sources, or ancestries, overrepresented in plausible models tend to be more closely related to the true admixing source, regardless of data quality [47, 53]. For high-coverage ancient DNA data, qpAdm p-value rankings may provide a means of differentiating between multiple plausible models, with simulations demonstrating that, among plausible models, larger p-values indicate a model closer to the simulated truth [47]. This feature has been utilized in recent research [71, 72], where qpWave p-values serve as a distance measure for individual-based clustering.
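The bookkeeping behind a rotating scheme can be sketched in a few lines of Python. This is only an enumeration of model definitions under stated assumptions: the candidate pool, the single fixed outgroup, and the cap of two sources are hypothetical, and no qpAdm fitting is performed.

```python
from itertools import combinations

def rotating_models(candidates, fixed_right, max_sources=2):
    """Yield (sources, right_groups) pairs for a rotating scheme."""
    for k in range(1, max_sources + 1):
        for sources in combinations(candidates, k):
            # Candidates not chosen as sources are rotated into the right-groups
            rotated_out = [p for p in candidates if p not in sources]
            yield list(sources), list(fixed_right) + rotated_out

candidates = ["P1", "P2", "P3", "P4"]  # hypothetical candidate pool
fixed_right = ["PO"]                   # hypothetical outgroup kept on the right
models = list(rotating_models(candidates, fixed_right))
```

Even this small pool yields ten one- and two-source models per target, which illustrates how quickly the number of tested models grows as candidate populations are added.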

Importantly, so long as qpAdm model assumptions are met (we discuss the impact of model violations below), the approach is remarkably effective in accurately detecting and quantifying admixture, with its primary limitations being proportional to the precision of f-statistic estimation. In simulations, the use of whole-genome branch-length f2-statistics to compute qpAdm’s f4-statistic matrix, i.e., a hypothetical scenario where f2-statistics could be calculated without estimation error, has been shown to significantly enhance its power [47]. This finding suggests that more accurate f2-statistic estimates from empirical data could further increase qpAdm’s power in resolving historical admixture questions, even among very closely related populations. This raises the prospect of future empirical advancements (e.g., making more efficient use of the data using tree sequence methods), a concept we will revisit in a subsequent section.

Continuous migration landscapes

Thus far, our discussion of admixture histories has been restricted to single-pulse events between independently evolving populations. However, much of our recent human demographic history appears to look more like a complex network of ancient communities connected through an evolving web of migration and isolation [7, 72–74]. Indeed, ancient DNA research has illuminated aspects of this complexity, particularly in Eurasia, revealing that the observed correlation between increasing distance from east Africa and decrease in present-day genetic diversity is not simply due to a serial founder effect following an out-of-Africa migration, but rather results from numerous localized admixture events in the Holocene [5].

In and of themselves, f-statistics do not make assumptions regarding the migration mechanisms responsible for correlations in allele frequency differences between populations. To this end, Peter [42] demonstrated that the expected values of f-statistics differ depending on the underlying model of population structure (including island, serial founder, and stepping stone models, as well as their respective hierarchical variants). For instance, a negative f4(P1, P2; P3, P4) prima facie suggests the rejection of the null-model of cladality between (P1, P2) and (P3, P4) due to unique shared genetic drift between populations (P2 and P3) and/or (P1 and P4). Indeed, under a one-dimensional stepping-stone model [75] where populations P1, P2, P3, and P4 are arranged linearly with one-way migration rates (M →), the expectation of f4(P1, P2; P3, P4) = -87M [42] is negative, asymptotically approaching zero as migration approaches one (Fig. S1). However, if populations P1-P4 are in fact each composed of sub-demes with one-way migration between them, described as a hierarchical stepping-stone model, the expectation of f4(P1, P2; P3, P4) = 1455M [42] becomes positive (decreasing exponentially as M approaches one), with the prima facie interpretation of a positive f4(P1, P2; P3, P4) being unique shared genetic drift between (P2 and P4) and/or (P1 and P3) causing the rejection of the null-model of (P1, P2) (P3, P4) as clades (Fig. S1). Moreover, under the hierarchical stepping-stone model the value of the statistic f3(P2; P1, P3) = -0.06M [42] begins negative and asymptotically increases towards zero as M approaches one, counterintuitively suggesting that under low levels of migration population P2 can be modeled as originating from an admixture of populations P1 and P3 (Fig. S1). Together, these examples demonstrate the challenges of interpreting demographic and admixture histories from f-statistics alone under stepping-stone landscapes.

Given its use of f4-statistics, qpAdm can also be challenged in accurately modeling the admixture history of populations connected by stepping-stone landscapes. In particular, these models make it difficult to avoid violating the qpAdm assumption of no right-to-left or left-to-right gene flow events following the formation of the target lineage [53, 76, 77]. Using Model 1, we can illustrate the impact of admixture from a right-group to the target population between its formation and sampling on the f4-statistic — and by extension, qpAdm. One of the statistics in the qpAdm f4-statistic matrix modeling the genetic history of PX as a mixture of P1 and P2 will be f4(PX, P2; PO, P6). Under Model 1, the shared drift between P2 and P6 will result in a positive value. However, admixture from P6 to PX following its formation and prior to sampling can result in f4(PX, P2; PO, P6) being negative. Importantly, admixture from PX to P6 prior to the sampling of P6 will result in the same negative value of f4(PX, P2; PO, P6), highlighting the absence of admixture directionality information in a single f4-statistic.

Whilst the precise range of demographic parameters that will result in statistically significant model violations has yet to be explored (see Flegontova et al. [76] for the most comprehensive study using simulations of both two-dimensional stepping-stone landscapes and admixture-graph-like models), violations such as these can result in an increase of the f4-statistic matrix rank (from rank = 1 under no model violations to rank = 2 with admixture from P6 to PX and vice versa; see the above section “Testing admixture models and estimating proportions with qpAdm” and Supplementary Note 3), and thus rejection of the “simpler” qpAdm (P1, P2) model. However, the implications of the qpAdm model violations illustrated above for admixture inference are nuanced. In expectation, admixture from P6 to PX prior to its sampling will yield a statistically plausible qpAdm 3-source model (P1, P2, P6). How accurately a researcher considers the resulting model to depict the genetic history of PX will then depend on which aspect of that history is of interest: the period corresponding to its formation (as a mixture of P1 and P2), or its cumulative ancestry at the time of sampling (as a mixture of P1, P2, and P6). Similarly, when admixture occurs from PX to P6, the same three-source model is expected to be classified as statistically plausible (i.e., the expectation of a non-significant p-value for an increase in qpAdm matrix rank), yet this results in an incorrect interpretation of PX’s genetic history if qpAdm yields a positive non-zero admixture weight from P6 to PX.
As such, we propose classifying model violations into two camps: those resulting in the plausible inclusion of populations that in reality did not contribute ancestry are “false-inference” model violations, while those that would result in the inclusion of sources not involved in the admixture period of interest (for instance, a plausible three-source model when only interested in the origin of PX) are “misleading-inference” model violations.

Evaluations of the performance of qpAdm under simulated one-dimensional stepping-stone models [75] were conducted by Harney et al. [53] who modeled populations (named P1–P5) under a continuous symmetric migration model and showed that under a very high migration rate (M = 0.01; equivalent to 1% in each generation in a chosen deme coming from each neighboring deme) the “middle” population, i.e., P2, will frequently be plausibly modeled as a 50% mixture of its adjacent populations (P1 and P3). However, under lower migration rates (M = 0.001 and 0.0001), the qpAdm model of P2 with P1 and P3 as sources is consistently rejected, whilst the admixture estimate remained roughly 50%. Encouragingly, when simulating effectively the same one-dimensional stepping-stone demographic model as Harney et al. [53], Speidel et al. [77], using f-statistics estimated from genealogies with a generational time cut-off (discussed further below), demonstrate an increased power to accurately model admixture under continuous migration at rates M = 0.001 and 0.005.
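To make the migration parameter concrete, the following Python sketch applies one generation of symmetric nearest-neighbour migration to allele frequencies along a linear chain of demes, so that M = 0.01 means 1% of each deme is drawn from each neighbour. It is a deterministic toy that ignores genetic drift and sampling, and the function name and starting frequencies are assumptions for illustration; it is not the simulation setup of Harney et al. [53].

```python
def migrate_once(freqs, m):
    """One generation of symmetric nearest-neighbour migration on a linear chain."""
    n = len(freqs)
    updated = []
    for i, p in enumerate(freqs):
        # Edge demes have one neighbour, interior demes have two
        neighbours = [freqs[j] for j in (i - 1, i + 1) if 0 <= j < n]
        incoming = m * sum(neighbours)
        updated.append((1 - m * len(neighbours)) * p + incoming)
    return updated

# Hypothetical allele frequencies at one SNP in demes P1..P5
freqs = [0.9, 0.5, 0.1, 0.5, 0.9]
for _ in range(100):
    freqs = migrate_once(freqs, m=0.01)
```

Under this update each interior deme is continually pulled towards the average of its two neighbours, which gives some intuition for why a middle deme can resemble an even mixture of the adjacent demes when migration is high.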

Flegontova et al. [76] extended the evaluation of qpAdm using a demographic model of multiple panmictic demes evolving on a roughly circular two-dimensional landscape over ~ 2500 generations with non-uniform bidirectional gene flows. Their study reveals a disconcerting mismatch between qpAdm p-values and the optimal model consisting of the closest sources to the target population. Consequently, on their simulated complex stepping-stone landscape that was sampled either randomly or systematically, right-to-left and left-to-right gene flow events resulted in qpAdm’s tendency to favor geographically distant sources and reject nearby ones, leading to a false impression of long-distance migration. Their study also raised concerns regarding low pre-study odds when exploring all possible source combinations on a stepping-stone landscape, wherein the proportion of models tested is dominated by non-optimal models (including sources distant from the target and positioned at small angular distances from each other), effectively creating a systematic bias against qpAdm success. Although qpAdm rejects most non-optimal models, a fraction inevitably remains unrejected, driving false discovery rates above 50% across substantial portions of the parameter space. Consequently, qpAdm protocols that evaluate hundreds to thousands of models per target while aiming to identify only a few “fitting” ones may produce unreliable results.
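The effect of low pre-study odds on the false discovery rate can be shown with simple arithmetic. In the Python sketch below, every number is hypothetical and chosen only to illustrate the mechanism; none is taken from Flegontova et al. [76].

```python
def false_discovery_rate(n_optimal, n_non_optimal, power, false_positive_rate):
    """Expected share of non-rejected models that are non-optimal."""
    true_pos = n_optimal * power                    # optimal models left unrejected
    false_pos = n_non_optimal * false_positive_rate # non-optimal models left unrejected
    return false_pos / (true_pos + false_pos)

# Hypothetical protocol: 5 optimal models among 1,000 tested per target
fdr = false_discovery_rate(n_optimal=5, n_non_optimal=995,
                           power=0.8, false_positive_rate=0.01)
```

With these assumed values, only 1% of non-optimal models escape rejection, yet they still outnumber the correctly retained optimal models, so the false discovery rate exceeds 50%.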

Case studies in ancient DNA admixture analysis and migration inference

Notwithstanding the challenges outlined above, the utility of ancient DNA in studying human mobility hinges on the basic assumption that migration leads to detectable changes in population genetic ancestry. As such, when discussing examples of migration inferred from admixture signatures, it is important to recognize that this approach can miss “hidden” migrations that fail to alter ancestry detectably, either due to genetic similarities between migrant and recipient populations or because migrants did not reproduce in their new location (or remain unsampled). In the following sections, we review a selection of ancient DNA literature that demonstrates a range of approaches of using admixture to detect population migrations and shifts in mobility.

Detecting mobility from signatures of genetic discontinuity

Among the earliest and most comprehensively studied uses of ancient DNA for identifying human migration is the expansion of Neolithic farmers from the Near East to Europe. Whilst the Neolithic agricultural expansion from Southwest Asia to Europe was known to archaeologists as early as 1925 [78], its genetic impact on the local European populations remained unclear until a suite of whole-genome ancient DNA studies published between 2012 and 2018 [52, 60, 74, 79–88] unveiled a complex interplay of expansion, admixture, and coexistence between Near Eastern farmers and European hunter-gatherers. Early evidence of complex migration interactions between European hunter-gatherers and farmers came from Skoglund et al. [80], who analyzed ancient DNA from three key samples: a post-agricultural Scandinavian hunter-gatherer (Ajvide58), a pre-agricultural Iberian hunter-gatherer (LaBrana1), and an early Scandinavian farmer (Gökhem2). Through D-statistics, they showed that while the ancestors of Neolithic European farmers had admixed with hunter-gatherers (D(Yoruba, Ajvide58; Gökhem2, Sardinian) Z-score < − 6.66), European hunter-gatherer ancestors did not admix with European farmers (|D(Yoruba, Gökhem2; Ajvide58, LaBrana1) Z-score| < 1.5), despite at least 40 generations of cohabitation in Scandinavia. In addition, early evidence of distinct migration paths into Europe from the Near East was revealed by Hofmanová et al. [86] using ancient DNA from Aegean Neolithic farmers and various European farming communities. Among other signatures, the f4-statistic (f4(Central European-farmers, Spanish Farmers; Aegean Neolithic, ‡Khomani San) Z-score < − 3), which revealed that Aegean Neolithic individuals shared more genetic drift with Spanish farmers than with Central European farmers, supported the hypothesis of independent migration routes from the Aegean to Southwestern and Central Europe.
Notably, the study’s authors contend that their data delivered the “coup de grâce” to the hypothesis that agriculture spread into and across Europe solely, or primarily, through ideological diffusion and without significant human movement, a perspective we reevaluate in a subsequent discussion.

Detecting mobility from genetic outliers

While the 25–30,000-year divergence between European hunter-gatherers and Near Eastern farmers [85, 89] yielded genetically distinct populations distinguishable through D/f-statistics, one may question the utility of ancient DNA in detecting migration between less diverged populations of relatively stable structure, as observed in Europe and the Mediterranean over the last three millennia [52, 82]. In their analysis of British Bronze and Iron Age genetic history, Patterson et al. [90] developed an approach to detecting periods of increased migration rates based, in part, on the identification of an elevated proportion of genetic outliers. Using qpAdm to quantify the proportion of Early European Neolithic farmer (EEF), Western European Mesolithic hunter-gatherer (WHG), and Yamnaya Steppe pastoralist (Steppe) ancestry in English and Welsh individuals dating from 2450 BC to 43 AD, Patterson et al. [90] identified 17% of individuals as EEF-ancestry outliers in both the Chalcolithic/Early Bronze Age (C/EBA, 2450–1800 BC) and Middle to Late Bronze Age (M-LBA, 1300–750 BC) periods.

The use of genetic outliers as a signature of periods of high mobility among genetically similar populations was also employed by Antonio et al. [72] in their study of historical mobility and population structure across Europe, the Mediterranean, and sub-regions of Southwest Asia. In studying three sub-periods from 10,000 BCE to 1950 CE, the authors revealed a striking degree of ancestry heterogeneity, particularly in the regions of Sardinia, the Levant and Egypt, Eastern Europe and the Steppe, and Italy. Employing a qpAdm approach slightly different from that of Patterson et al. [90], they estimate 11% of individuals as outliers and 7% as putative first-generation migrants, culminating in a detailed admixture network map connecting Europe, the Mediterranean, North Africa, the Levant, and the Caucasus.

Detecting mobility from measures of population genetic diversity and divergence

In addition to examining ancestry shifts and genetic outliers, summary statistics measuring genetic diversity (such as FST) and genetic distance (such as 1 − outgroup f3-statistic) have also been utilized to identify periods of mobility and admixture. In their study of historical mobility across Europe, the Mediterranean, and sub-regions of Southwest Asia described above, Antonio et al. [72] document a striking observation: despite their detection of high mobility, when computing FST across groups on a sliding spatial grid for each historical period and relating it to mean geographic distance, it remained stable from around the Bronze Age (~ 2300 BCE) onwards, suggesting relatively unchanged genetic differentiation across space and time down to the present day. Through simulations of Wright-Fisher populations evolving neutrally in continuous space, the authors confirmed that even with long-range dispersal as low as 4%, FST decreases over 120 generations, a result inconsistent with their qpAdm outlier-based estimation of ~ 7–11% long-range historical migration. To explain these seemingly incompatible findings, the authors proposed a “transient mobility” hypothesis, which suggests that whilst technological and political developments during the historical period facilitated increased migration, this was not followed by a sustained integration into the local population, resulting in a separation of movement and reproduction compared to prehistoric eras. While emphasizing the need for additional ancient DNA data from urban and rural contexts throughout the historical period to fully assess their hypothesis, the authors also highlight the intricacy of human migration, where migration and mating may not necessarily be random, leaving uncertainty regarding the demographic processes maintaining spatial genetic structure — a concept we revisit below.
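For orientation, the two summary statistics named above can be written directly in terms of population allele frequencies. The Python sketch below is a minimal version under stated assumptions: it uses a ratio-of-averages FST without finite-sample corrections and the plain product form of the outgroup f3-statistic, and the function names and input frequencies are hypothetical rather than those used in the cited studies.

```python
def fst_ratio_of_averages(p1, p2):
    """FST from two allele-frequency lists, without finite-sample correction."""
    num = sum((a - b) ** 2 for a, b in zip(p1, p2))
    den = sum(a * (1 - b) + b * (1 - a) for a, b in zip(p1, p2))
    return num / den

def outgroup_f3(p_out, p_a, p_b):
    """Mean over SNPs of (O - A) * (O - B): drift shared by A and B since O."""
    return sum((o - a) * (o - b) for o, a, b in zip(p_out, p_a, p_b)) / len(p_out)

# Hypothetical frequencies at two SNPs
p_out = [0.5, 0.5]
p_a = [0.7, 0.3]
p_b = [0.6, 0.4]
distance = 1 - outgroup_f3(p_out, p_a, p_b)  # larger shared drift gives a smaller distance
fst = fst_ratio_of_averages(p_a, p_b)
```

The two measures respond to different parts of the history: FST compares the two populations' frequencies directly, whereas the outgroup f3-statistic depends only on the drift they share after their split from the outgroup.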

When researching the Holocene spatiotemporal dynamics of human genetic diversity and inter-regional mobility across the Eastern Mediterranean and a wider span of Southwest Asia, Koptekin et al. [91] observe since approximately 6000 BP a pattern of decreasing inter-regional FST extending to the present day coupled with increasing inter-regional genetic distance (1 − outgroup f3). To explain these observations, the authors propose an expanding-mobility model consisting of two sequential historical processes. First is that the observed decrease in inter-regional FST can be attributed to heightened mobility within Southwest Asia and the Eastern Mediterranean in the wake of the Neolithic expansion. However, from approximately 6000–4000 BP onwards, populations in Southwest Asia and the East Mediterranean experienced varying degrees of gene flow from areas beyond their borders, exemplified by the influx of Steppe-related ancestry in the Aegean, South Caucasus, and the Levant, as well as South-Asian-related and West-Siberian-related ancestry on the Iranian plateau.

Detecting mobility through modeling shifts in spatio-temporal genetic ancestry

Given adequate sample sizes, the extraction of DNA from skeletons with precise spatial–temporal data promises the possibility of creating a multi-dimensional map illuminating shifts in genetic ancestry over space and time. Genetic differences between populations are expected to correlate with both temporal and spatial distances, with the strength of correlation dependent on mobility levels and spatial population structure. Loog et al. [92] argued that the strength of correlation between genetic differences and spatial or temporal factors will depend on mobility levels: low mobility (strong spatial structure) would lead to stronger space-based correlations, while high mobility would cause time to explain a larger proportion of differences due to its homogenizing effects across space. Leveraging this relationship, Loog et al. [92] developed a scaling factor value (Smax) that can identify relative periods of high mobility by maximizing the correlation between a matrix of genetic differences and a Euclidean matrix of spatial and temporal distances. Using measures of pairwise heterozygosity between approximately 300 genome-wide data covering a time period from the beginning of the Upper Paleolithic to the Iron Age, Loog et al. [92] found at least three distinct stages of high mobility. Whilst identifying relative high mobility as a result of the Neolithic expansion, consistent with previously described ancient DNA data and archaeological findings, their analysis revealed that mobility among Holocene farmers in Europe significantly exceeded that of European hunter-gatherers both before and after the Last Glacial Maximum, suggesting that hunter-gatherers moved either less frequently or over shorter distances compared to farmers.
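The idea of choosing a scaling factor that maximizes the correlation between genetic differences and a combined space-time distance can be sketched as a simple grid search. The Python code below is an illustration under assumptions, not the procedure of Loog et al. [92]: the combined distance is taken as the Euclidean norm of spatial distance and scaled temporal distance, the correlation is Pearson's, and the data and grid are invented.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def best_scaling(genetic, space, time, grid):
    """Return the grid value s maximizing corr(genetic, sqrt(space^2 + (s*time)^2))."""
    def score(s):
        combined = [math.hypot(d, s * t) for d, t in zip(space, time)]
        return pearson(genetic, combined)
    return max(grid, key=score)

# Hypothetical pairwise distances between samples (arbitrary units)
space = [1.0, 0.0, 1.0, 2.0, 1.0]
time = [0.0, 1.0, 1.0, 1.0, 3.0]
# Toy genetic differences constructed with a true scaling factor of 2
genetic = [math.hypot(d, 2.0 * t) for d, t in zip(space, time)]
s_max = best_scaling(genetic, space, time, grid=[0.5, 1.0, 2.0, 4.0])
```

In this construction a larger best-fitting scaling factor means that temporal separation carries more weight relative to spatial separation in explaining genetic differences, which is the pattern the text associates with higher mobility.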

Leveraging the recent rapid increase in genome-wide ancient DNA, Schmid and Schiffels [93] created an interpolated spatiotemporal ancestry field from which they can then study individual-level mobility through time. Applying their method to 3,138 ancient DNA samples from Western Eurasia, they first use multidimensional scaling (MDS) to capture the two axes that explain the greatest genetic diversity within their sample set and model these as dependent variables in a Gaussian Process Regression analysis with input variables describing the position of each sample in space and time. As the genetic axes stem from MDS analysis, the “ancestry” field in this context does not refer to an admixture coefficient, but rather to orthogonal ancestry components captured in the MDS analysis. From this, they derive individual-wise mobility on a large scale by generating a probabilistic similarity search algorithm to determine the likelihood of observing a sample’s MDS coordinates at a certain point in space and time, and the relative spatial distribution of similarity probabilities in a given timeframe.

In addition to capturing previously known major Holocene migration events, such as the Neolithic demographic expansion and the arrival of Steppe ancestry in the third millennium, by considering the spatiotemporal nature of genetic ancestry, the approach of Schmid and Schiffels [93] highlights the fluidity of ancestry, dispelling notions of the geographically imputable nature of one’s genetic legacy. For example, tracing the spatiotemporal ancestry of Stuttgart, an early Neolithic sample from Central Europe, reveals a progression of highest similarity from the Levant (~ 7500 BC) to Anatolia (post–7000 BC), and then to western Anatolia (6750 BC), reflecting early population structure and migration in the Near East following the Last Glacial Maximum [89]. By 5250 BC, approximately the time of Stuttgart’s death, the peak similarity area includes its burial location in Central Europe, confirming the arrival of Near Eastern ancestry in the region.

Beyond genetic and spatial–temporal information, the incorporation of climatic and environmental data may yield additional insights when interpreting patterns of past mobility. To this end, Racimo et al. [94] developed a spatiotemporally explicit hierarchical Bayesian model to better understand the relationships between changes in climate, ancestry, and paleovegetation. Their model incorporated an interpolated map of major European ancestral components (Mesolithic hunter-gatherers, Neolithic farmers, and Yamnaya Steppe peoples) derived from Ohana [95], simulation-based Holocene paleoclimate reconstructions from PaleoClim [96], and paleovegetation records documenting various land cover types. Notably, they found that the Neolithic farmer migration, characterized by a two-pronged wavefront, did not strongly correlate with vegetational landscape alterations, whereas the faster Early Bronze Age Steppe migration coincided with substantial changes in vegetation, such as the reduction of broad-leaf forests and expansion of pasture lands throughout Europe.

The question of sample representativeness

Analyses of ancient DNA often rely on small sample sizes, especially in harsh environments that rapidly deteriorate DNA, or when attempting to reconstruct fine-scale spatio-temporal admixture histories where the density of archaeological samples is low. While Reich et al. [40] developed an unbiased estimator to address the challenges of computing the f2-statistic from small samples, the question remains as to how accurately a single, or small number of samples, can recover the admixture history of an entire ancient population or accurately represent true ancestral admixing sources. This is especially pertinent given that the true admixing sources are highly likely to originate from entirely different ancient communities than those excavated and sequenced. Importantly, the relevance of sample representation is context-dependent, varying with the specific research questions. For instance, Li and Durbin [97] demonstrated that even a single human diploid genome can offer valuable insights into the timing of population divergence, underscoring the breadth of genetic history contained within a single genome due to the processes of recombination and inheritance. In the context of ancient DNA analysis, benchmarks from simulated pulse-admixture and random mating demographic models demonstrate that while limited sample sizes diminish qpAdm’s power, they do not generate bias in the estimation of population admixture [53]. However, under the specific condition of genetic asymmetry between the tested and true sources in two-source qpAdm models, there does appear to be a slight upward bias towards the test population most closely related to the true admixing population [47].
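The small-sample correction mentioned above can be illustrated with a minimal sketch. It assumes the standard form of the unbiased f2 estimator, in which the squared difference in sample allele frequencies is reduced by a sampling-variance term p(1 − p)/(n − 1) for each population, where n is the number of sampled chromosomes. The function names, the simulated allele frequencies, and the sample sizes are illustrative assumptions and are not taken from the cited studies.

```python
import numpy as np

def f2_naive(count_a, n_a, count_b, n_b):
    """Plug-in f2: mean squared difference of sample allele frequencies."""
    pa = count_a / n_a
    pb = count_b / n_b
    return np.mean((pa - pb) ** 2)

def f2_unbiased(count_a, n_a, count_b, n_b):
    """f2 with the finite-sample correction p(1 - p) / (n - 1) per population.

    n_a and n_b are numbers of sampled chromosomes and must be at least 2.
    """
    pa = count_a / n_a
    pb = count_b / n_b
    corr_a = pa * (1 - pa) / (n_a - 1)
    corr_b = pb * (1 - pb) / (n_b - 1)
    return np.mean((pa - pb) ** 2 - corr_a - corr_b)

# Toy data (assumed): two closely related populations, one diploid sample each.
rng = np.random.default_rng(1)
n_snps = 200_000
true_a = rng.uniform(0.05, 0.95, n_snps)
true_b = np.clip(true_a + rng.normal(0, 0.05, n_snps), 0.01, 0.99)
true_f2 = np.mean((true_a - true_b) ** 2)

n_a = n_b = 2  # two chromosomes per population
count_a = rng.binomial(n_a, true_a)
count_b = rng.binomial(n_b, true_b)

naive = f2_naive(count_a, n_a, count_b, n_b)
unbiased = f2_unbiased(count_a, n_a, count_b, n_b)
print(f"true f2 = {true_f2:.4f}, naive = {naive:.4f}, unbiased = {unbiased:.4f}")
```

In this toy setting the naive estimate is dominated by sampling noise, whereas the corrected estimate stays close to the true value even with a single diploid individual per population. The correction removes bias only; it does not make one individual representative of a structured population.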

Questions regarding sample representativeness also extend to how comprehensively geographical regions are sampled and subsequently incorporated as candidate sources when investigating a target population’s ancestry through qpAdm analyses. Addressing this issue by progressively testing more complex qpAdm models until finding a non-rejected model with feasible admixture proportions (a common approach in the literature), Flegontova et al. [76] found that when the landscape in the vicinity of the target population is poorly sampled, analyses conclude with more complex models (four-way versus two-way). This outcome leads to misleading interpretations because only symmetrically arranged demes yield non-rejected two-way models, while complex models often face rejection due to uncertainty in estimating admixture fractions from similar sources.
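The model-search protocol described above can be sketched as a simple loop over source combinations of increasing size. This is a schematic illustration only: `fit` stands in for a user-supplied wrapper around an actual qpAdm run, and `toy_fit`, the population labels, and the 0.05 threshold are hypothetical placeholders rather than part of any real software interface.

```python
from itertools import combinations

def first_feasible_models(target, sources, fit, max_sources=4, p_min=0.05):
    """Return the smallest model size with non-rejected, feasible models.

    `fit(target, combo)` is assumed to return (p_value, weights) for one model.
    A model is kept if it is not rejected (p >= p_min) and all admixture
    weights lie in [0, 1].
    """
    for k in range(1, max_sources + 1):
        accepted = []
        for combo in combinations(sources, k):
            p_value, weights = fit(target, combo)
            feasible = all(0.0 <= w <= 1.0 for w in weights)
            if p_value >= p_min and feasible:
                accepted.append((combo, p_value, tuple(weights)))
        if accepted:
            return k, accepted  # stop at the simplest level that works
    return None, []

def toy_fit(target, combo):
    """Hypothetical stand-in for a qpAdm run: only A + B fits the target."""
    if set(combo) == {"A", "B"}:
        return 0.50, (0.6, 0.4)
    return 0.001, tuple(1.0 / len(combo) for _ in combo)

size, models = first_feasible_models("Target", ["A", "B", "C"], toy_fit)
print(size, models)
```

Because the search stops at the first model size that yields a non-rejected model, its outcome depends on which candidate sources are available, which is one way to see why sparse sampling around the target can push the procedure toward more complex models.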

The representativeness of ancient DNA samples is further challenged by the extent to which inferences about migration from a limited number of individuals can be reliably extrapolated to broader archaeological contexts, geographical regions or time periods. Indeed, this sentiment is captured by Sikora et al. [83], commenting on the early discovery of a genetic affinity between present-day Sardinians and a 5300-year-old mummy (the Tyrolean Iceman) found near the Austrian-Italian border [98], stating that “… this finding was hindered by the question of how representative this single individual really was for the Central Alpine region”. These challenges are not limited to inferences that cross millennia. In researching the purported “Sea-Peoples” migration hypothesis using ancient DNA extracted from the ancient Levantine archaeological site of Ashkelon, Feldman et al. [99] discovered a shift in genetic ancestry between the Iron Age I (ASH_IA1) and II (ASH_IA2) periods, whereby f4-statistics of the form f4(ASH_IA2, ASH_IA1; Test, Mbuti) are all significantly negative (Z-score ≤ −3) when Test populations are from either Europe or Anatolia. In explaining the transient nature of the European-related genetic affinity, Feldman et al. [99] state that it could have been diluted by admixture, either with the local Ashkelon population or through gene flow from a closely related population outside of Ashkelon, as f4-statistics of the form f4(ASH_IA2, ASH_LBA; Test, Mbuti) (|Z-score| < 2.8) confirmed the genetic symmetry between the Iron Age II and Late Bronze Age Ashkelon cohorts. Given the distinct archaeological contexts of the Iron Age I samples (infant burials under house floors on the central mound) and the Iron Age II samples (cemetery adjacent to the city wall), it is also plausible that these represent coexisting (yet temporally unsampled) ancestries in Ashkelon, obviating the need for a dilution/migration explanation to account for differences in European-related ancestry. Acknowledging the crucial role of comprehensive genetic sampling in historical inquiries, the authors caution that insufficient sampling can lead to erroneous conclusions.
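To make the logic of such f4 symmetry tests concrete, the following sketch computes f4(A, B; C, D) as the mean product of allele frequency differences and obtains a Z-score from a delete-one-block jackknife. It is a simplified illustration under stated assumptions: the allele frequencies are simulated, the blocks are equal-sized runs of independent SNPs rather than genomic windows, and the unweighted jackknife used here is a simplification of what production software typically implements. All function and variable names are illustrative.

```python
import numpy as np

def f2(p, q):
    """Plug-in f2 between two vectors of population allele frequencies."""
    return np.mean((p - q) ** 2)

def f4_block_jackknife(pa, pb, pc, pd, n_blocks=50):
    """Return (f4 estimate, Z-score) for f4(A, B; C, D)."""
    terms = (pa - pb) * (pc - pd)           # per-SNP contribution
    blocks = np.array_split(terms, n_blocks)
    sums = np.array([b.sum() for b in blocks])
    sizes = np.array([b.size for b in blocks])
    total, n = terms.sum(), terms.size
    estimate = total / n
    loo = (total - sums) / (n - sizes)      # leave-one-block-out estimates
    var = (n_blocks - 1) / n_blocks * np.sum((loo - loo.mean()) ** 2)
    return estimate, estimate / np.sqrt(var)

# Toy tree ((A, B), (C, D)) with drift modelled as Gaussian noise (assumed).
rng = np.random.default_rng(7)
n_snps = 100_000
root = rng.uniform(0.2, 0.8, n_snps)

def drift(p):
    return np.clip(p + rng.normal(0, 0.03, n_snps), 0, 1)

ab, cd = drift(root), drift(root)
a, b, c, d = drift(ab), drift(ab), drift(cd), drift(cd)
a_mixed = 0.8 * a + 0.2 * c                 # A receives 20% ancestry from C

f4_tree, z_tree = f4_block_jackknife(a, b, c, d)
f4_mix, z_mix = f4_block_jackknife(a_mixed, b, c, d)
print(f"tree-like:  f4 = {f4_tree:.2e}, Z = {z_tree:.2f}")
print(f"with flow:  f4 = {f4_mix:.2e}, Z = {z_mix:.2f}")
```

Under the tree-like history the statistic is expected to be close to zero, whereas gene flow from C into A shifts it away from zero in the direction of the population sharing excess alleles with C. The same per-SNP quantity can be written as a combination of f2-statistics, f4(A, B; C, D) = ½[f2(A, D) + f2(B, C) − f2(A, C) − f2(B, D)], which the `f2` helper allows checking numerically.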

The prospect of contemporaneous groups residing in the same ancient settlement whilst possessing distinct ancestries accentuates the impact of sample partiality on admixture estimation and interpretation in archaeogenetic research. Sociological barriers to human interaction, whether they be economic, class, religious, linguistic, or ethnic, result in deviations from random mating and lead to population structure. Under random mating, admixed genomic segments quickly distribute evenly throughout the population, such that even single samples taken five or so generations after admixture provide a close estimate of the true population ancestry proportion (Fig. 4). Conversely, non-random mating, particularly positive assortative mating for ancestry-associated traits at the time of admixture [100], maintains a wider spectrum of individual ancestry proportions for longer in a population due to increased pairings between individuals with similar ancestral backgrounds (Fig. 4). The heightened variance in ancestry increases the likelihood of small samples producing estimates that deviate strongly from the population-wide average (Fig. 4). Sampling multiple individuals can alleviate but not completely eliminate this issue (Fig. 4), particularly if archaeological sampling is not completely random and ancestry estimation is uncertain due to low data quality. Although, to our knowledge, there are no systematic studies of assortative mating in ancient populations, historical evidence indicates the presence of segregated communities within ancient societies [101, 102]. In addition, historical records document the presence of external forces shaping societal mobility, such as the Assyrian Empire, which is thought to have deported approximately 4.5 million people between 830 and 640 BC, leading to significant linguistic, social, and cultural changes beyond their initial political and economic goals [103, 104]. While identifying any association between ancient societal stratification and ancestry will first require a site-by-site analysis, the impact of burial practices and conditions on DNA preservation [105–107] offers one conceivable mechanism through which societal stratification could distort the perception of population ancestry and migration interpretation; for example, socioeconomic status, expressed through kinship structures [108–110], could be linked to sample survivorship bias. More broadly, skeletal preservation may be differentially impacted by certain funerary practices, such as sub-aerial exposure, resulting in systematic sampling biases that distort the picture of ancient community structure.

Fig. 4.

The impact of deviations from random mating and sample sizes on admixture proportions. This SLiM model simulates assortative mating in a non-Wright-Fisher population with a constant size of 2000 individuals, focusing on genomic ancestry over 50 generations. The model represents a full human genome with a recombination rate of 1 cM/Mb. At the outset, a proportion of individual genomes are tagged with ancestry markers, reflecting immigrant ancestry from a single admixture pulse. Mating is then based on ranked individual ancestry proportions, creating assortative mating according to a specified ancestry correlation between mates [111]. We simulate two scenarios: one with random mating (correlation = 0) and another with strong assortative mating (correlation = 0.9). The simulation tracks changes in individual ancestry proportions across generations.
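The qualitative behaviour summarized in the caption can be reproduced with a much simpler toy model than the SLiM simulation. The sketch below is not the authors' code. It assumes that each offspring's ancestry proportion equals the average of its parents' proportions (ignoring recombination and segregation variance), that mates are paired through rank-based normal scores with a chosen correlation, and that 30% of the founding population carries immigrant ancestry. These choices, and all names used, are illustrative assumptions.

```python
import numpy as np
from statistics import NormalDist

def simulate_ancestry(n=2000, generations=50, migrant_frac=0.3,
                      mate_corr=0.0, seed=0):
    """Return an array of shape (generations + 1, n) of ancestry proportions."""
    rng = np.random.default_rng(seed)
    anc = np.zeros(n)
    anc[: int(round(migrant_frac * n))] = 1.0   # single admixture pulse
    inv_cdf = NormalDist().inv_cdf
    # Normal score attached to each ancestry rank (lowest to highest).
    scores = np.array([inv_cdf((k + 0.5) / n) for k in range(n)])
    history = [anc.copy()]
    for _ in range(generations):
        # Rank individuals by ancestry, breaking ties at random.
        perm = rng.permutation(n)
        order = perm[np.argsort(anc[perm], kind="stable")]
        mother_rank = rng.integers(0, n, size=n)
        noise = rng.standard_normal(n)
        # Father's target score is correlated with the mother's score.
        target = mate_corr * scores[mother_rank] \
            + np.sqrt(1 - mate_corr ** 2) * noise
        father_rank = np.clip(np.searchsorted(scores, target), 0, n - 1)
        # Offspring ancestry is the mid-parent value (toy assumption).
        anc = 0.5 * (anc[order[mother_rank]] + anc[order[father_rank]])
        history.append(anc.copy())
    return np.array(history)

random_mating = simulate_ancestry(mate_corr=0.0, generations=10)
assortative = simulate_ancestry(mate_corr=0.9, generations=10)
for g in (0, 5, 10):
    print(g, round(random_mating[g].std(), 3), round(assortative[g].std(), 3))
```

The standard deviation of individual ancestry at a given generation approximates the typical error made when a single sampled individual is used to estimate the population-wide proportion. In this toy model it shrinks quickly under random mating and far more slowly under strong assortative mating, in line with the pattern described for Fig. 4.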

Prospects for the future use of ancient DNA to study human admixture and mobility

Ancient DNA research has already revolutionized our understanding of human mobility, but its full potential is yet to be realized. Future advancements in sampling density (both temporally and geographically), increasing data quality, new analytical techniques, and innovative interdisciplinary collaborations promise to provide a more comprehensive and holistic picture of ancient societal structures and evolution.

Data advancements

The availability of low-coverage whole-genome sequencing (WGS) imputation methods [112–115], in addition to the growing research into the accuracy and performance of imputation and phasing of low-coverage ancient genomes [116–122], is enabling the application of powerful demographic inference techniques, including methods based on the site frequency spectrum (SFS), methods using complete genotype-haplotype information, and recombination-aware methods, to ancient samples with coverage as low as 0.5×, provided an adequate reference panel exists. Indeed, examples of innovative approaches and methods utilizing WGS-imputed ancient genomes are beginning to emerge [89, 123–127]. A notable example is the newly developed method twigstats [77], which leverages f2-statistics estimated from temporal subsets of inferred genome-wide genealogies [128] to enhance qpAdm’s power and has demonstrated promise for uncovering fine-scale population structure in ancient populations.

Recent methodological advancements (ancIBD, Ringbauer et al. [126]; IBDSEQ, Browning and Browning [129]) have taken advantage of genome imputation and phasing, making it possible to identify long stretches of the genome shared by pairs of ancient individuals that result from recent genealogical ancestry, so-called identity-by-descent (IBD) segments. Because IBD segments are shared stretches of DNA preserved without intervening recombination events, they exhibit an inverse relationship between their length and generational distance from common ancestors, with longer segments signifying recent shared ancestry while shorter fragments reflect more ancient genealogical connections. IBD assessments provide an important complementary approach to allele frequency-based methods in migration studies by detecting direct shared genealogical ancestry between individuals from different archaeological sites. As such, they capture direct signatures of migration-induced relationships between people from distinct homelands, rather than modeling migration using ancient populations as putative proxy sources only indirectly related to the true admixing populations by way of allele frequency similarities [130].
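The inverse relationship between IBD segment length and genealogical distance can be made explicit with a short calculation. The sketch assumes the standard approximation in which a segment separated by m meioses has an exponentially distributed length with mean 100/m cM, and that two contemporaneous individuals sharing an ancestor g generations back are separated by 2g meioses. The function names are illustrative, and the approximation ignores complications such as individuals sampled at different times.

```python
import math

def expected_ibd_length_cm(meioses):
    """Mean length (cM) of an IBD segment separated by `meioses` meioses."""
    return 100.0 / meioses

def prob_segment_longer_than(length_cm, meioses):
    """P(segment length > length_cm) under the exponential approximation."""
    return math.exp(-meioses * length_cm / 100.0)

for generations_back in (2, 5, 10, 50):
    m = 2 * generations_back  # two lineages leading back to the ancestor
    print(
        f"ancestor {generations_back:>2} generations back: "
        f"mean {expected_ibd_length_cm(m):5.1f} cM, "
        f"P(> 12 cM) = {prob_segment_longer_than(12, m):.3g}"
    )
```

Under these assumptions, long segments become rapidly less likely as the shared ancestor moves further back in time, which is why long shared segments between individuals from different sites are informative about recent genealogical connections.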

Moreover, with advancements in methods developed to harness inferred genome-wide genealogies for mapping the geographical distribution of genomic ancestors [131, 132], the prospective incorporation of ancient DNA into these approaches promises to dramatically increase our spatiotemporal resolution of genetic ancestry. However, given the prevalence of hybridization capture in ancient DNA generation, evaluating the potential downstream biases of imputation on capture data [116, 117, 126] from diverse geographical origins and underrepresented ancestries in reference panels will be crucial, both for comparative analyses with existing datasets and for samples where economic or technical constraints preclude shotgun sequencing at coverage levels adequate for accurate imputation.

Interdisciplinary developments

The disparity between archaeological approaches to researching migration, which emphasize the movement of individuals, and ancient DNA approaches, which utilize genetic measures expected from idealized theoretical populations, underscores a fundamental challenge in synthesizing their inferences for describing past demographic events. Moreover, diverse migration scenarios, including single large pulse events, continuous small-scale migration, population size fluctuations, subsequent dilution by different migrant groups, and differential reproductive success due to cultural or genetic advantages, can all potentially affect the ancestry proportions within a population, underscoring the complexity of inferring specific historical demographic processes from ancient DNA data alone. Despite the capacity to detect major genetic shifts indicative of “mass-migrations,” ancient DNA methods cannot translate these findings into specific census counts of individuals who migrated and integrated into new societal contexts. As such, the use of admixture proportions to infer the underlying generative demographic processes presents a fundamental challenge in archaeogenetic research, prompting inquiries into the utility of these estimations for informing our understanding of the societal mechanisms underlying demographic history.

The value of integrating multiple lines of evidence to elucidate human mobility patterns has been demonstrated by a number of paleo/archaeogenomic analyses [133–136]. In particular, the co-analysis of ancient DNA with coarse geographically informative markers, such as stable isotopes, has proven insightful for interrogating signatures of migration covering multiple temporal scales. Examples include revealing the complex interplay between decreasing generational mobility and the formation of a mixed Anatolian, Levantine, and Iranian/Caucasus ancestry profile in the Anatolian Pre-Pottery Neolithic samples from Nevalı Çori [133], and the inference of state-sponsored resettlement in the Chincha Valley during the pre-Colonial Andes, revealed through the co-analysis of textual sources, textile analysis, strontium isotope data, ceramic evidence, and ancient DNA [135]. In addition, the co-analysis of IBD networks and archaeological status markers (such as belt sets for males and coat clasps for females) revealed that female mobility between communities was the main driver of genetic connectivity, with distinct marriage networks maintaining genetic barriers between sites. Individuals with high-status grave goods showed a 1.27–2.55 times higher probability of having genetic connections to others, demonstrating a direct correlation between social status and biological relatedness [130].

Spatio-temporal simulations have emerged as a crucial tool in developing population genetic inference [137–143]. In particular, agent-based spatio-temporal modeling incorporating interdisciplinary data has emerged as a powerful approach for unraveling the complex demographic dynamics underlying migration patterns. A compelling example of this methodology is the study by LaPolice et al. [144], who modeled the Neolithic expansion to investigate the interplay between demic and cultural mechanisms in the spread of cultural practices. Their study revealed a counterintuitive relationship between cultural transmission rates and genetic ancestry patterns, demonstrating that a wide range of low but sufficient learning rates can result in predominantly demic diffusion of culture without causing a turnover in genetic ancestry. This finding challenges simplistic interpretations of genetic data alone and highlights the importance of considering multiple factors when studying ancient population movements. Such modeling approaches offer a promising avenue for deepening our understanding of the sociocultural dynamics that shape human migration and genetic diversity.

The emergence of paleo/archaeogenomics is catalyzing a paradigm shift in our understanding of human migration. This interdisciplinary field, which integrates insights from population genetics, archaeology, ancient history, and other disciplines, is not only enhancing our knowledge of past population movements but also paving the way for new theoretical frameworks, such as new conceptual models of intergenerational material cultural transmission [145], and furthering our conceptions of the relationship between material cultural nomenclature and genetic clustering of ancient communities [38]. Looking forward, the synthesis of advancing data-generation techniques, analytical methods, and transdisciplinary integration holds the key to unlocking the true promise of paleo/archaeogenomics in painting a nuanced picture of societal dynamics throughout human history.

Supplementary Information

13059_2025_3664_MOESM1_ESM.pdf (801.9KB, pdf)

Additional file 1: Supplementary Note 1. Introduction to population genetic concepts. Supplementary Note 2. Mathematical derivation of the f4-statistic as a combination of f2-statistics. Supplementary Note 3. Implementation of qpAdm in studying ancient human genetic admixture. Fig. S1. Expected f-statistics under one-dimensional stepping stone and hierarchical stepping stone models.

Acknowledgements

We thank Troy LaPolice and Abigail Sequeira for their contribution to the SLiM assortative mating simulation code. We are grateful to Dr. Hannah Moots, Dr. Thomas Booth, Dr. Iain Mathieson and Dr. Pavel Flegontov for their careful reading of an earlier draft and helpful comments. We are also grateful to Dr. Iosif Lazaridis for comments on the qpAdm software.

Peer review information

Andrew Cosgrove was the primary editor of this article and managed its editorial process and peer review in collaboration with the rest of the editorial team. The peer-review history is available in the online version of this article.

Authors’ contributions

M.P.W. planned the review, read the literature and drafted the text. M.P.W. drafted Figs. 1, 2 and 3 and C.D.H. drafted Fig. 4. All authors contributed edits and improvements. All authors read and approved the final manuscript.

Funding

C.D.H. and M.P.W. were funded by the National Institutes of Health under award number R35GM146886.

Data availability

All scripts used for simulation generation and plot creation are publicly accessible through our GitHub and Zenodo repositories [146, 147] and distributed under the Creative Commons Attribution 4.0 International licence.

Declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Footnotes

The original version of this article was revised: The order of citations and references in this article have been corrected.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Change history

10/20/2025

A Correction to this paper has been published: 10.1186/s13059-025-03706-3

Contributor Information

Matthew P. Williams, Email: mkw5910@psu.edu

Christian D. Huber, Email: cdh5313@psu.edu

References

  • 1. Pickrell JK, Reich D. Toward a new history and geography of human genes informed by ancient DNA. Trends Genet. 2014;30:377–89.
  • 2. Stoneking M, Krause J. Learning about human population history from ancient and modern genomes. Nat Rev Genet. 2011;12:603–14.
  • 3. Williams M, Teixeira J. A genetic perspective on human origins. Biochem. 2020;42:6–10.
  • 4. Slatkin M, Racimo F. Ancient DNA and human history. Proc Natl Acad Sci. 2016;113:6380–7.
  • 5. Skoglund P, Mathieson I. Ancient Genomics of Modern Humans: The First Decade. Annu Rev Genom Hum Genet. 2018;19:381–404.
  • 6. Yang MA, Fu Q. Insights into Modern Human Prehistory Using Ancient Genomes. Trends Genet. 2018;34:184–96.
  • 7. Haber M, Mezzavilla M, Xue Y, Tyler-Smith C. Ancient DNA and the rewriting of human history: be sparing with Occam’s razor. Genome Biol. 2016;17:1.
  • 8. Hofmann D. What Have Genetics Ever Done for Us? The Implications of aDNA Data for Interpreting Identity in Early Neolithic Central Europe. Eur J Archaeol. 2015;18:454–76.
  • 9. Linden MV. Population history in third-millennium-BC Europe: assessing the contribution of genetics. World Archaeol. 2016;48:714–28.
  • 10. Furholt M. De-contaminating the aDNA – archaeology dialogue on mobility and migration: discussing the culture-historical legacy. Current Swedish Archaeology. 2019;27:53–68.
  • 11. Furholt M. Massive Migrations? The Impact of Recent aDNA Studies on our View of Third Millennium Europe. Eur J Archaeol. 2018;21:159–91.
  • 12. Heyd V. Kossinna’s smile. Antiquity. 2017;91:348–59.
  • 13. Ion A. How interdisciplinary is interdisciplinarity? Revisiting the impact of aDNA research for the archaeology of human remains. Current Swedish Archaeology. 2017;25:177–98.
  • 14. Klejn LS, Haak W, Lazaridis I, Patterson N, Reich D, Kristiansen K, et al. Discussion: Are the Origins of Indo-European Languages Explained by the Migration of the Yamnaya Culture to the West? Eur J Archaeol. 2018;21:3–17.
  • 15. Anthony DW. Ancient DNA and migrations: New understandings and misunderstandings. J Anthr Archaeol. 2023;70:101508.
  • 16. Adams WY, Gerven DPV, Levy RS. The Retreat from Migrationism. Annu Rev Anthr. 1978;7:483–532.
  • 17. Kramer C. Pots and Peoples. In: Levine LD, Young TC, editors. Mountains and Lowlands: Essays in the Archaeology of Greater Mesopotamia. Malibu: Undena Publications; 1977. p. 91–112.
  • 18. Booth TJ. A stranger in a strange land: a perspective on archaeological responses to the palaeogenetic revolution from an archaeologist working amongst palaeogeneticists. World Archaeol. 2019;51:586–601.
  • 19. Lalueza-Fox C. Agreements and Misunderstandings among Three Scientific Fields: Paleogenomics, Archaeology, and Human Paleontology. Curr Anthr. 2013;54:S214–20.
  • 20. Scally A. Roots of misunderstanding. In: Meller H, Krause J, Haak W, Risch R, editors. Kinship, Sex, and Biological Relatedness: The contribution of archaeogenetics to the understanding of social and biological relations. Tagungen des Landesmuseums für Vorgeschichte Halle. Halle (Saale); Heidelberg: Propylaeum; 2023. p. 61–4.
  • 21. Burmeister S. Archaeology and Migration: Approaches to an Archaeological Proof of Migration. Curr Anthr. 2000;41:539–67.
  • 22. Ames NP. Migration in Historical Archaeology. Encyclopedia of Global Archaeology. Springer, Cham. 2020:1–15.
  • 23. Han P. Soziologie der Migration: Erklärungsmodelle, Fakten, politische Konsequenzen, Perspektiven. UTB; 2016;2118.
  • 24. Albrecht G. Soziologie der geographischen Mobilität: zugleich ein Beitrag zur Soziologie des sozialen Wandels. Ferdinand Enke; 1972.
  • 25. MacLeod M. What makes interdisciplinarity difficult? Some consequences of domain specificity in interdisciplinary practice. Synthese. 2018;195:697–720.
  • 26. Long JC. The genetic structure of admixed populations. Genetics. 1991;127:417–28.
  • 27. Verdu P, Rosenberg NA. A General Mechanistic Model for Admixture Histories of Hybrid Populations. Genetics. 2011;189:1413–26.
  • 28. Mathieson I, Scally A. What is ancestry? PLoS Genet. 2020;16:e1008624.
  • 29. Gopalan S, Smith SP, Korunes K, Hamid I, Ramachandran S, Goldberg A. Human genetic admixture through the lens of population genomics. Philos Trans R Soc B. 2022;377:20200410.
  • 30. Liang M, Nielsen R. Understanding Admixture Fractions. bioRxiv. 2014:008078. 10.1101/008078.
  • 31. Coop G. Genetic similarity versus genetic ancestry groups as sample descriptors in human genetics. arXiv preprint arXiv:220711595. 2022.
  • 32. Bradburd GS, Coop GM, Ralph PL. Inferring continuous and discrete population genetic structure across space. Genetics. 2018;210:33–52.
  • 33. Miller JM, Cullingham CI, Peery RM. The influence of a priori grouping on inference of genetic clusters: simulation study and literature review of the DAPC method. Heredity. 2020;125:269–80.
  • 34. Foster MW, Sharp RR. Beyond race: towards a whole-genome perspective on human populations and genetic variation. Nat Rev Genet. 2004;5:790–6.
  • 35. Brodwin P. Genetics, identity and the anthropology of essentialism. In: Ifekwunigwe JO, editor. “Mixed Race” Studies. London: Routledge; 2004. p. 116–22.
  • 36. Gannett L. Making populations: Bounding genes in space and in time. Philosophy of Science. 2003;70:989–1001.
  • 37. Drost T. The genetic history and diversity of humanity-History, identity and meaning in the Human Genome Diversity Project and the Genographic Project. Utrecht University MS thesis. 2011.
  • 38. Eisenmann S, Bánffy E, van Dommelen P, Hofmann KP, Maran J, Lazaridis I, et al. Reconciling material cultures in archaeology with genetic data: The nomenclature of clusters emerging from archaeogenomic analysis. Sci Rep. 2018;8:13003.
  • 39. Orlando L, Allaby R, Skoglund P, Sarkissian CD, Stockhammer PW, Ávila-Arcos MC, et al. Ancient DNA analysis. Nat Rev Methods Prim. 2021;1:14.
  • 40. Reich D, Thangaraj K, Patterson N, Price AL, Singh L. Reconstructing Indian population history. Nature. 2009;461:489–94.
  • 41. Patterson N, Moorjani P, Luo Y, Mallick S, Rohland N, Zhan Y, et al. Ancient Admixture in Human History. Genetics. 2012;192:1065–93.
  • 42. Peter BM. Admixture, Population Structure, and F-Statistics. Genetics. 2016;202:1485–501.
  • 43. Gower G, Ragsdale AP, Bisschop G, Gutenkunst RN, Hartfield M, Noskova E, et al. Demes: a standard format for demographic models. Genetics. 2022;222:iyac131.
  • 44. Baumdicker F, Bisschop G, Goldstein D, Gower G, Ragsdale AP, Tsambos G, et al. Efficient ancestry and mutation simulation with msprime 1.0. Genetics. 2021;220:iyab229.
  • 45. Lauterbur ME, Cavassim MIA, Gladstein AL, Gower G, Pope NS, Tsambos G, et al. Expanding the stdpopsim species catalog, and lessons learned for realistic genome simulations. eLife. 2023;12:RP84874.
  • 46. Adrion JR, Cole CB, Dukler N, Galloway JG, Gladstein AL, Gower G, et al. A community-maintained standard library of population genetic models. eLife. 2020;9:e54967.
  • 47. Williams MP, Flegontov P, Maier R, Huber CD. Testing Times: Disentangling Admixture Histories in Recent and Complex Demographies using ancient DNA. Genetics. 2024:iyae110.
  • 48.Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, et al. A Draft Sequence of the Neandertal Genome. Science. 2010;328:710–22. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Durand EY, Patterson N, Reich D, Slatkin M. Testing for Ancient Admixture between Closely Related Populations. Mol Biol Evol. 2011;28:2239–52. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Lipson M. Applying f4-statistics and admixture graphs: Theory and examples. Mol Ecol Resour. 2020;20:1658–67. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Atağ G, Somel M. An explanation for the neighbour repulsion phenomenon in Patterson’s f-statistics. Bio Rxiv. 2024;02(17):580509. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Haak W, Lazaridis I, Patterson N, Rohland N, Mallick S, Llamas B, et al. Massive migration from the steppe was a source for Indo-European languages in Europe. Nature. 2015;522:207–11. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Harney É, Patterson N, Reich D, Wakeley J. Assessing the Performance of qpAdm: A Statistical Tool for Studying Population Admixture. Genetics. 2021;217:iyaa045. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Reich D, Patterson N, Campbell D, Tandon A, Mazieres S, Ray N, et al. Reconstructing native American population history. Nature. 2012;488:370–4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Moorjani P, Thangaraj K, Patterson N, Lipson M, Loh P-R, Govindaraj P, et al. Genetic evidence for recent population mixture in India. The American Journal of Human Genetics. 2013;93:422–38. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56.Scally A. Complexity in Human Ancestral Demography JASs. 2021;99:179–82. [DOI] [PubMed] [Google Scholar]
  • 57.Briggs AW, Stenzel U, Johnson PLF, Green RE, Kelso J, Prüfer K, et al. Patterns of damage in genomic DNA sequences from a Neandertal. Proc Natl Acad Sci. 2007;104:14616–21. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.Sawyer S, Krause J, Guschanski K, Savolainen V, Pääbo S. Temporal Patterns of Nucleotide Misincorporations and DNA Fragmentation in Ancient DNA. PLoS ONE. 2012;7:e34131. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59.Dabney J, Meyer M, Pääbo S. Ancient DNA Damage. Cold Spring Harb Perspect Biol. 2013;5:a012567. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60.Mathieson I, Lazaridis I, Rohland N, Mallick S, Patterson N, Roodenberg SA, et al. Genome-wide patterns of selection in 230 ancient Eurasians. Nature. 2015;528:499–503. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61.Fu Q, Posth C, Hajdinjak M, Petr M, Mallick S, Fernandes D, et al. The genetic history of Ice Age Europe. Nature. 2016;534:200–5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62.Davidson R, Williams MP, Roca-Rada X, Kassadjikova K, Tobler R, Fehren-Schmitz L, et al. Allelic bias when performing in-solution enrichment of ancient human DNA. Mol Ecol Resour. 2023;23:1823–40. [DOI] [PubMed] [Google Scholar]
  • 63.Rohland N, Mallick S, Mah M, Maier R, Patterson N, Reich D. Three Reagents for in-Solution Enrichment of Ancient Human DNA at More than a Million SNPs. bioRxiv. 2022;01(13):476259. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64.Davidson R, Roca-Rada X, Ravishankar S, Taufik L, Haarkötter C, Collen E, et al. Optimised in-solution enrichment of over a million ancient human SNPs. Bio Rxiv. 2024;05(16):594432. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65.Flegontov P, Işıldak U, Maier R, Yüncü E, Changmai P, Reich D. Modeling of African population history using f-statistics is biased when applying all previously proposed SNP ascertainment schemes. PLOS Genet. 2023;19:e1010931. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66.Günther T, Nettelblad C. The presence and impact of reference bias on population genomic studies of prehistoric human populations. PLoS Genet. 2019;15: e1008302. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67.de Filippo C, Meyer M, Prüfer K. Quantifying and reducing spurious alignments for the analysis of ultra-short ancient DNA sequences. BMC Biol. 2018;16:121. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 68.Martiniano R, Garrison E, Jones ER, Manica A, Durbin R. Removing reference bias and improving indel calling in ancient DNA data analysis by mapping to a sequence variation graph. Genome Biol. 2020;21:250. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69.Günther T, Schraiber JG. Estimating allele frequencies, ancestry proportions and genotype likelihoods in the presence of mapping bias. bioRxiv. 2024;07(01):601500. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 70.Koptekin D, Yapar E, Vural KB, Sağlıcan E, Altınışık NE, Malaspinas A-S, et al. Pre-processing of paleogenomes: Mitigating reference bias and postmortem damage in ancient genome data. bioRxiv. 2023;11(11):566695. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71.Moots HM, Antonio M, Sawyer S, Spence JP, Oberreiter V, Weiß CL, et al. A genetic history of continuity and mobility in the Iron Age central Mediterranean. Nat Ecol Evol. 2023;7:1515–24. [DOI] [PubMed] [Google Scholar]
  • 72.Antonio ML, Weiß CL, Gao Z, Sawyer S, Oberreiter V, Moots HM, et al. Stable population structure in Europe since the Iron Age, despite high mobility. eLife. 2023;13. https://elifesciences.org/articles/79714. [DOI] [PMC free article] [PubMed]
  • 73.Chintalapati M, Patterson N, Moorjani P. The spatiotemporal patterns of major human admixture events during the European Holocene. eLife. 2022;11:e77625. https://elifesciences.org/articles/77625. [DOI] [PMC free article] [PubMed]
  • 74.Lazaridis I, Nadel D, Rollefson G, Merrett DC, Rohland N, Mallick S, et al. Genomic insights into the origin of farming in the ancient Near East. Nature. 2016;536:419–24. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 75.Kimura M, Weiss GH. The stepping stone model of population structure and the decrease of genetic correlation with distance. Genetics. 1964;49:561–76. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 76.Flegontova O, Işıldak U, Yüncü E, Williams MP, Huber CD, Kočí J, et al. Performance of qpAdm-based screens for genetic admixture on graph–shaped histories and stepping stone landscapes. Genetics. 2025;230:iyaf047. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 77.Speidel L, Silva M, Booth T, Raffield B, Anastasiadou K, Barrington C, et al. High-resolution genomic history of early medieval Europe. Nature. 2025;637:118–26. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 78.Childe VG. The dawn of European civilization. London, New York: K. Paul, Trench, Trubner & Co.; A.A. Knopf; 1925. [Google Scholar]
  • 79.Skoglund P, Malmström H, Raghavan M, Storå J, Hall P, Willerslev E, et al. Origins and Genetic Legacy of Neolithic Farmers and Hunter-Gatherers in Europe. Science. 2012;336:466–9. [DOI] [PubMed] [Google Scholar]
  • 80.Skoglund P, Malmström H, Omrak A, Raghavan M, Valdiosera C, Günther T, et al. Genomic Diversity and Admixture Differs for Stone-Age Scandinavian Foragers and Farmers. Science. 2014;344:747–50. [DOI] [PubMed] [Google Scholar]
  • 81.Gamba C, Jones ER, Teasdale MD, McLaughlin RL, Gonzalez-Fortes G, Mattiangeli V, et al. Genome flux and stasis in a five millennium transect of European prehistory. Nat Commun. 2014;5:5257. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 82.Lazaridis I, Patterson N, Mittnik A, Renaud G, Mallick S, Kirsanow K, et al. Ancient human genomes suggest three ancestral populations for present-day Europeans. Nature. 2014;513:409–13. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 83.Sikora M, Carpenter ML, Moreno-Estrada A, Henn BM, Underhill PA, Sánchez-Quinto F, et al. Population Genomic Analysis of Ancient and Modern Genomes Yields New Insights into the Genetic Ancestry of the Tyrolean Iceman and the Genetic Structure of Europe. PLoS Genet. 2014;10:e1004353. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 84.Günther T, Valdiosera C, Malmström H, Ureña I, Rodriguez-Varela R, Sverrisdóttir ÓO, et al. Ancient genomes link early farmers from Atapuerca in Spain to modern-day Basques. Proc Natl Acad Sci. 2015;112:11917–22. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 85.Broushaki F, Thomas MG, Link V, López S, van Dorp L, Kirsanow K, et al. Early Neolithic genomes from the eastern Fertile Crescent. Science. 2016;353:499–503. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 86.Hofmanová Z, Kreutzer S, Hellenthal G, Sell C, Diekmann Y, Díez-del-Molino D, et al. Early farmers from across Europe directly descended from Neolithic Aegeans. Proc Natl Acad Sci. 2016;113:6886–91. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 87.Lipson M, Szécsényi-Nagy A, Mallick S, Pósa A, Stégmár B, Keerl V, et al. Parallel palaeogenomic transects reveal complex genetic history of early European farmers. Nature. 2017;551:368–72. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 88.Mathieson I, Alpaslan-Roodenberg S, Posth C, Szécsényi-Nagy A, Rohland N, Mallick S, et al. The genomic history of southeastern Europe. Nature. 2018;555:197–203. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 89.Marchi N, Winkelbach L, Schulz I, Brami M, Hofmanová Z, Blöcher J, et al. The genomic origins of the world’s first farmers. Cell. 2022;185:1842-1859.e18. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 90.Patterson N, Isakov M, Booth T, Büster L, Fischer C-E, Olalde I, et al. Large-scale migration into Britain during the Middle to Late Bronze Age. Nature. 2022;601:588–94. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 91.Koptekin D, Yüncü E, Rodríguez-Varela R, Altınışık NE, Psonis N, Kashuba N, et al. Spatial and temporal heterogeneity in human mobility patterns in Holocene Southwest Asia and the East Mediterranean. Curr Biol. 2023;33:41-57.e15. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 92.Loog L, Lahr MM, Kovacevic M, Manica A, Eriksson A, Thomas MG. Estimating mobility using sparse data: Application to human genetic variation. Proc Natl Acad Sci. 2017;114:12213–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 93.Schmid C, Schiffels S. Estimating human mobility in Holocene Western Eurasia with large-scale ancient genomic data. Proc Natl Acad Sci. 2023;120:e2218375120. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 94.Racimo F, Woodbridge J, Fyfe RM, Sikora M, Sjögren K-G, Kristiansen K, et al. The spatiotemporal spread of human migrations during the European Holocene. Proc Natl Acad Sci. 2020;117:8989–9000. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 95.Cheng JY, Mailund T, Nielsen R. Fast admixture analysis and population tree estimation for SNP and NGS data. Bioinformatics. 2017;33:2148–55. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 96.Brown JL, Hill DJ, Dolan AM, Carnaval AC, Haywood AM. PaleoClim, high spatial resolution paleoclimate surfaces for global land areas. Sci Data. 2018;5:180254. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 97.Li H, Durbin R. Inference of human population history from individual whole-genome sequences. Nature. 2011;475:493–6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 98.Keller A, Graefen A, Ball M, Matzas M, Boisguerin V, Maixner F, et al. New insights into the Tyrolean Iceman’s origin and phenotype as inferred by whole-genome sequencing. Nat Commun. 2012;3:698. [DOI] [PubMed] [Google Scholar]
  • 99.Feldman M, Master DM, Bianco RA, Burri M, Stockhammer PW, Mittnik A, et al. Ancient DNA sheds light on the genetic origins of early Iron Age Philistines. Sci Adv. 2019;5:eaax0061. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 100.Kim J, Edge MD, Goldberg A, Rosenberg NA. Skin deep: The decoupling of genetic admixture levels from phenotypes that differed between source populations. Am J Phys Anthr. 2021;175:406–21. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 101.Spek RJ van der. Multi-ethnicity and ethnic segregation in Hellenistic Babylon. In: Derks T, Roymans N, editors. Ethnic Constructs in Antiquity: The role of power and tradition. Amsterdam University Press; 2009. p. 101–16.
  • 102.Bryce T. The Kingdom of the Hittites. 2005:21–40.
  • 103.Matney T. Material Culture and Identity: Assyrians, Aramaeans, and the Indigenous Peoples of Iron Age Southeastern Anatolia. In: Agency and Identity in the Ancient Near East: New Paths Forward. London-Oakville (CT): Routledge; 2010. p. 129–47.
  • 104.Parpola S. National and Ethnic Identity in the Neo-Assyrian Empire and Assyrian Identity in Post-Empire Times. Journal of Assyrian Academic Studies. 2004:5–49.
  • 105.Elsner J, Schibler J, Hofreiter M, Schlumbaum A. Burial condition is the most important factor for mtDNA PCR amplification success in Palaeolithic equid remains from the Alpine foreland. Archaeol Anthr Sci. 2015;7:505–15. [Google Scholar]
  • 106.Raffone C, Baeta M, Lambacher N, Granizo-Rodríguez E, Etxeberria F, de Pancorbo MM. Intrinsic and extrinsic factors that may influence DNA preservation in skeletal remains: A review. Forensic Sci Int. 2021;325:110859. [DOI] [PubMed] [Google Scholar]
  • 107.Parker C, Rohrlach AB, Friederich S, Nagel S, Meyer M, Krause J, et al. A systematic investigation of human DNA preservation in medieval skeletons. Sci Rep. 2020;10:18225. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 108.Benz M, Gresky J, Štefanisko D, Alarashi H, Knipper C, Purschwitz C, et al. Burying power: New insights into incipient leadership in the Late Pre-Pottery Neolithic from an outstanding burial at Baʻja, southern Jordan. PLoS ONE. 2019;14:e0221171. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 109.Sánchez-Quinto F, Malmström H, Fraser M, Girdland-Flink L, Svensson EM, Simões LG, et al. Megalithic tombs in western and northern Neolithic Europe were linked to a kindred society. Proc Natl Acad Sci. 2019;116:9469–74. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 110.Mittnik A, Massy K, Knipper C, Wittenborn F, Friedrich R, Pfrengle S, et al. Kinship-based social inequality in Bronze Age Europe. Science. 2019;366:731–4. [DOI] [PubMed] [Google Scholar]
  • 111.Border R, O’Rourke S, de Candia T, Goddard ME, Visscher PM, Yengo L, et al. Assortative mating biases marker-based heritability estimators. Nat Commun. 2022;13:660. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 112.Howie BN, Donnelly P, Marchini J. A Flexible and Accurate Genotype Imputation Method for the Next Generation of Genome-Wide Association Studies. PLoS Genet. 2009;5:e1000529. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 113.Browning BL, Browning SR. Genotype Imputation with Millions of Reference Samples. Am J Hum Genet. 2016;98:116–26. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 114.Rubinacci S, Ribeiro DM, Hofmeister RJ, Delaneau O. Efficient phasing and imputation of low-coverage sequencing data using large reference panels. Nat Genet. 2021;53:120–6. [DOI] [PubMed] [Google Scholar]
  • 115.Neuenschwander S, Dávalos DIC, Anchieri L, da Mota BS, Bozzi D, Rubinacci S, et al. Mapache: a flexible pipeline to map ancient DNA. Bioinformatics. 2023;39:btad028. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 116.Marques AG, Rubinacci S, Malaspinas A-S, Delaneau O, da Mota BS. Assessing the impact of post-mortem damage and contamination on imputation performance in ancient DNA. Sci Rep. 2024;14:6227. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 117.da Mota BS, Rubinacci S, Dávalos DIC, Amorim CEG, Sikora M, Johannsen NN, et al. Imputation of ancient human genomes. Nat Commun. 2023;14:3660. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 118.Tretmanis JM, Jay F, Ávila-Arcos MC, Huerta-Sanchez E. Simulation-based Benchmarking of Ancient Haplotype Inference for Detecting Population Structure. bioRxiv. 2023;09(28):560049. [Google Scholar]
  • 119.Ausmees K, Nettelblad C. Achieving improved accuracy for imputation of ancient DNA. Bioinformatics. 2022;39:btac738. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 120.Ausmees K, Sanchez-Quinto F, Jakobsson M, Nettelblad C. An empirical evaluation of genotype imputation of ancient DNA. G3. 2022;12:jkac089. [DOI] [PMC free article] [PubMed]
  • 121.Hui R, D’Atanasio E, Cassidy LM, Scheib CL, Kivisild T. Evaluating genotype imputation pipeline for ultra-low coverage ancient genomes. Sci Rep. 2020;10:18542. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 122.Çubukcu H, Kılınç GM. Evaluation of genotype imputation using Glimpse tools on low coverage ancient DNA. Mamm Genome. 2024;35:461–73. [DOI] [PubMed] [Google Scholar]
  • 123.Pearson A, Durbin R. Local Ancestry Inference for Complex Population Histories. bioRxiv. 2023;03(06):529121. [Google Scholar]
  • 124.Irving-Pease EK, Refoyo-Martínez A, Barrie W, Ingason A, Pearson A, Fischer A, et al. The selection landscape and genetic legacy of ancient Eurasians. Nature. 2024;625:312–20. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 125.Scheib CL, Hui R, D’Atanasio E, Wohns AW, Inskip SA, Rose A, et al. East Anglian early Neolithic monument burial linked to contemporary Megaliths. Ann Hum Biol. 2019;46:145–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 126.Ringbauer H, Huang Y, Akbari A, Mallick S, Olalde I, Patterson N, et al. Accurate detection of identity-by-descent segments in human ancient DNA. Nat Genet. 2024;56:143–51. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 127.Hui R, Scheib CL, D’Atanasio E, Inskip SA, Cessford C, Biagini SA, et al. Genetic history of Cambridgeshire before and after the Black Death. Sci Adv. 2024;10:eadi5903. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 128.Speidel L, Forest M, Shi S, Myers SR. A method for genome-wide genealogy estimation for thousands of samples. Nat Genet. 2019;51:1321–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 129.Browning BL, Browning SR. Detecting identity by descent and estimating genotype error rates in sequence data. Am J Hum Genet. 2013;93:840–51. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 130.Wang K, Tobias B, Pany-Kucera D, Berner M, Eggers S, Gnecchi-Ruscone GA, et al. Ancient DNA reveals reproductive barrier despite shared Avar-period culture. Nature. 2025:1–8. [DOI] [PMC free article] [PubMed]
  • 131.Osmond MM, Coop G. Estimating dispersal rates and locating genetic ancestors with genome-wide genealogies. bioRxiv. 2021;07(13):452277. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 132.Deraje P, Kitchens J, Coop G, Osmond MM. Inferring the geographic history of recombinant lineages using the full ancestral recombination graph. bioRxiv. 2024;04(10):588900. [Google Scholar]
  • 133.Wang X, Skourtanioti E, Benz M, Gresky J, Ilgner J, Lucas M, et al. Isotopic and DNA analyses reveal multiscale PPNB mobility and migration across Southeastern Anatolia and the Southern Levant. Proc Natl Acad Sci. 2023;120:e2210611120. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 134.Rivollat M, Rohrlach AB, Ringbauer H, Childebayeva A, Mendisco F, Barquera R, et al. Extensive pedigrees reveal the social organization of a Neolithic community. Nature. 2023;620:600–6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 135.Bongers JL, Nakatsuka N, O’Shea C, Harper TK, Tantaleán H, Stanish C, et al. Integration of ancient DNA with transdisciplinary dataset finds strong support for Inca resettlement in the south Peruvian coast. Proc Natl Acad Sci. 2020;117:18359–68. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 136.Tian Y, Koncz I, Defant S, Giostra C, Vyas DN, Sołtysiak A, et al. The role of emerging elites in the formation and development of communities after the fall of the Roman Empire. Proc Natl Acad Sci. 2024;121:e2317868121. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 137.Petr M, Haller BC, Ralph PL, Racimo F. slendr: a framework for spatio-temporal population genomic simulations on geographic landscapes. Peer Community Journal. 2023;3:e121. 10.24072/pcjournal.354. [DOI] [PMC free article] [PubMed]
  • 138.Haller BC, Messer PW. SLiM 2: Flexible, Interactive Forward Genetic Simulations. Mol Biol Evol. 2017;34:230–40. [DOI] [PubMed] [Google Scholar]
  • 139.Haller BC, Messer PW. SLiM 3: Forward Genetic Simulations Beyond the Wright-Fisher Model. Mol Biol Evol. 2019;36:632–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 140.Currat M, Arenas M, Quilodràn CS, Excoffier L, Ray N. SPLATCHE3: simulation of serial genetic data under spatially explicit evolutionary scenarios including long-distance dispersal. Bioinformatics. 2019;35:4480–3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 141.Currat M, Ray N, Excoffier L. splatche: a program to simulate genetic diversity taking into account environmental heterogeneity. Mol Ecol Notes. 2004;4:139–42. [Google Scholar]
  • 142.Flegontova O, Işıldak U, Yüncü E, Williams MP, Huber CD, Kočí J, et al. False discovery rates of qpAdm-based screens for genetic admixture. bioRxiv. 2023;04(25):538339.
  • 143.Silva NM, Rio J, Currat M. Investigating population continuity with ancient DNA under a spatially explicit simulation framework. BMC Genet. 2017;18:114. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 144.LaPolice TM, Williams MP, Huber CD. The European Neolithic Expansion: A Model Revealing Intense Assortative Mating and Restricted Cultural Transmission. bioRxiv. 2024;04(29):591653.
  • 145.Riede F, Hoggard C, Shennan S. Reconciling material cultures in archaeology with genetic data requires robust cultural evolutionary taxonomies. Palgrave Commun. 2019;5:55. [Google Scholar]
  • 146.Williams MP. genomic-footprints. GitHub. 2025. https://github.com/archgen/genomic-footprints.
  • 147.Williams MP, Huber CD. The genomic footprints of migration: how ancient DNA reveals our history of mobility – supplementary code. 2025. Zenodo. 10.5281/zenodo.15574872. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

Supplementary Materials

13059_2025_3664_MOESM1_ESM.pdf (801.9KB, pdf)

Additional file 1: Supplementary Note 1. Introduction to population genetic concepts. Supplementary Note 2. Mathematical derivation of the f4-statistic as a combination of f2-statistics. Supplementary Note 3. Implementation of qpAdm in studying ancient human genetic admixture. Fig. S1. Expected f-statistics under one-dimensional stepping stone and hierarchical stepping stone models.

Data Availability Statement

All scripts used for simulation generation and plot creation are publicly accessible through our GitHub and Zenodo repositories [146, 147] and are distributed under the Creative Commons Attribution 4.0 International licence.


Articles from Genome Biology are provided here courtesy of BMC
