Published in final edited form as: Mol Ecol. 2010 Jan 11;19(3):436–446. doi: 10.1111/j.1365-294X.2009.04515.x

In defence of model-based inference in phylogeography

Mark A Beaumont 1, Rasmus Nielsen 2, Christian Robert 3, Jody Hey 4, Oscar Gaggiotti 5, Lacey Knowles 6, Arnaud Estoup 7, Mahesh Panchal 8, Jukka Corander 9, Mike Hickerson 10, Scott A Sisson 11, Nelson Fagundes 12, Lounès Chikhi 13, Peter Beerli 14, Renaud Vitalis 15, Jean-Marie Cornuet 7, John Huelsenbeck 2, Matthieu Foll 16,17, Ziheng Yang 18, Francois Rousset 19, David Balding 20, Laurent Excoffier 16,17

Abstract

Recent papers have promoted the view that model-based methods in general, and those based on Approximate Bayesian Computation (ABC) in particular, are flawed in a number of ways, and are therefore inappropriate for the analysis of phylogeographic data. These papers further argue that Nested Clade Phylogeographic Analysis (NCPA) offers the best approach in statistical phylogeography. In order to remove the confusion and misconceptions introduced by these papers, we justify and explain the reasoning behind model-based inference. We argue that ABC is a statistically valid approach, alongside other computational statistical techniques that have been successfully used to infer parameters and compare models in population genetics. We also examine the NCPA method and highlight numerous deficiencies, whether it is used with single or multiple loci. We further show that the ages of clades are carelessly used to infer the ages of demographic events, and that these ages are estimated under a simple model of panmixia and population stationarity but are then used under different and unspecified models to test hypotheses, a usage that invalidates these testing procedures. We conclude by encouraging researchers to study and use model-based inference in population genetics.

Keywords: molecular evolution, phylogeography, population genetics-empirical, population genetics-theoretical

Introduction

How is it possible to use genetic data from related populations or species to figure out their recent evolutionary history? Each data set is open to various interpretations, yet in any particular case some interpretations might be better justified than others. The challenge is to develop a genetical and evolutionary theory that is general enough to include real histories, and yet simple but detailed enough that it can be used in a statistical framework to infer details of a specific history, including (importantly) measures of uncertainty.

The idea of a genealogy, or gene-tree, to represent the history of a sample of homologous gene copies is one of biology's most successful models thanks to its generality and flexibility. However, statistical inference under the gene-tree model is difficult. For many years investigators, often using mitochondrial sequences, struggled to interpret trees generated from their data in terms of demographic processes, such as population separation or gene exchange. In the early days, this field of phylogeography relied on heuristic and descriptive analyses, and it was essentially not statistical.

The situation changed with the introduction of Nested Clade Phylogeographical Analysis (NCPA) (Templeton 1998; Templeton et al. 1995). In combining an analysis of estimated gene-tree structure with an inference key to make conclusions about the demographic causes of the shape of the gene-tree, the method served a generation of evolutionary biologists eager to make sense of their data. To address the concern that gene-tree estimates can be wrong, the method accommodates a network of connections based on which haplotypes are likely to be connected in the true genealogy (Crandall 1996; Templeton et al. 1992). To address the concern that different unlinked genes can have widely different histories, even when sampled from the same organisms, ‘cross-validation’ of multiple loci was proposed (Templeton 2002, 2004a). Notwithstanding the apparent flexibility and generality of NCPA, or its popularity, the method has been subject to a number of criticisms (Knowles & Maddison 2002; Petit & Grivet 2002; Hey & Machado 2003; Panchal & Beaumont 2007; Knowles 2008; Manolopoulou 2008), and has been vigorously defended (Templeton 2004b, 2008, 2009b).

Today, in contrast to the years when NCPA first came on the scene, there are other approaches available for developing complex demographic inferences. The origins of these methods actually predate NCPA, going back to the first likelihood-based models for demographic and phylogenetic inference (Cavalli-Sforza & Edwards 1967; Thompson 1973; Felsenstein 1981) and the development of coalescent theory (Kingman 1982; Hudson 1983; Tajima 1983). Although they vary considerably in details, these methods differ sharply from NCPA in two fundamental ways. First, they are explicitly based on demographic models that include parameters such as population size and migration rates. Second, they use the genealogy as an unobserved variable that connects data to model parameters but need not be explicitly inferred (Hey & Nielsen 2007). These model-based approaches share the goal of computing a likelihood function (i.e. the probability of the data as a function of the parameters within a given model). Being likelihood-based, these methods open doors for population geneticists and phylogeographers to the repertoire of likelihood-based analyses, including maximum likelihood estimation of model parameters and likelihood-ratio hypothesis tests (e.g. Griffiths & Tavaré 1994; Kuhner et al. 1995; Beerli & Felsenstein 1999), as well as Bayesian analyses (Wilson & Balding 1998), including Approximate Bayesian Computation (ABC) (Tavare et al. 1997; Pritchard et al. 1999; Beaumont et al. 2002).

Templeton (2010), in response to Nielsen & Beaumont (2009), heavily promotes NCPA for analysing phylogeographic data, incorrectly asserting that it uses ‘a likelihood function that explicitly incorporates the randomness associated with the coalescent and mutational processes’. He also repeats many claims from Templeton (2009a), where he strongly criticizes the use of ABC methods for analysing phylogeographic data in general, and their application to discriminate between various human evolutionary scenarios in particular (Fagundes et al. 2007). He concludes that ‘because of its multiple flaws, ABC should not be used for hypothesis testing’. Yet ABC is simply a Monte Carlo method that can be used to approximate posterior distributions or likelihood surfaces from a model (see e.g. Tavare et al. 1997; Pritchard et al. 1999; Beaumont et al. 2002, for more details on ABC approaches). It is a numerical tool for solving problems within a statistical framework. Thus the majority of criticisms that Templeton (2009a, 2010) aims at ABC are in fact directed more generally against model-based inference in population genetics. We feel compelled to react against this broadly unsupported attack on model-based inference, and to point out important misconceptions underlying Templeton's critique.

First, we highlight Templeton's misconceptions of model-based inference, of Bayesian methods in general and of ABC in particular. Next, we underline major deficiencies of NCPA in inferring past demographic scenarios, as well as errors and misleading statements in Templeton's promotion of the method.

Misconceptions about model-based methods

Model specification

In population biology, as in many other scientific areas, there has been a longstanding tension between proponents and opponents of model-based inferences. The most familiar example is the debate between cladists and likelihoodists in phylogenetics. Although Templeton (2009a) claims to accept both hypothesis testing and models, including likelihood and Bayesian methods, many of his criticisms echo old arguments against the use of model-based inferences in phylogenetics. He argues that it is a flaw of ABC, and of model-based methods more generally, that they do not cover the entire ‘hypothesis space’ (Templeton 2009a, p. 320), but instead compare only a small number of potentially mis-specified and subjectively chosen models (Templeton 2010). However, for realistic problems, exhaustive coverage of all hypotheses is impossible. Moreover, the situation that ‘all hypotheses being compared are false’ (Templeton 2009a, p. 320) is in fact the norm in science, since models at best only approximate reality, as recognized in the widely cited words attributed to George Box: ‘all models are wrong, but some are useful’ (Box & Draper 1987, p. 424). As an aside, the distinction between ‘(i) testing a null hypothesis and (ii) assessing the relative fit of alternative hypotheses’ (Templeton 2009a, p. 320) is reminiscent of the 1930s debate between Fisherian and Neyman–Pearson hypothesis testing; the Neyman–Pearson approach of choosing among a limited set of competing models came to dominate statistical practice (Gigerenzer et al. 1990).

Strong vs. weak inference

Invoking Popper (1959), Templeton (2007) contends that, by relying on successive dichotomous tests, NCPA can make ‘strong’ phylogeographic inferences, which is not possible with model-based methods. However, ‘strong scientific inference’ (cf. Platt 1964) arises when the influence of unknown factors on the final result is minimized by randomization (Macneil 2008), which also underlies Fisher's (1925) null hypothesis testing. That is, without a properly randomized experiment, causal explanations are necessarily weak because they are potentially confounded with unobserved effects. Since they are based on observational data, phylogeographic studies are not amenable to randomized interventions, and therefore all phylogeographic inference methods, including NCPA, lead to ‘weak scientific inference’ in the sense that their conclusions do not arise from planned scientific experiments. Popper was fiercely opposed to inductivism, whereby facts are gathered and general laws then identified. In this regard, rather than being a Popperian falsification method, NCPA can in fact be viewed as an anti-Popperian inductivist approach (Beaumont & Panchal 2008), since a story is built out of the patterns in the data.

Unspecified models cannot be tested

Templeton (2009a) argues that since NCPA tests null hypotheses without reference to an explicit alternative, it does not rely on a restricted set of alternative models. However, except for the test of the null hypothesis of no correlation between geographic and genetic distances, we show below that NCPA's inferences about specific phylogeographic hypotheses are invalid. Moreover, since no alternative model is specified, there can be no measure of the relative support for the different hypotheses entertained by NCPA. The specification of alternative models is necessary to correctly assess the support of the data for a complex demographic model. This inevitably incurs additional possibilities of model misspecification, but there are many statistical techniques for assessing the fit of a model. The use of explicit models exposes their authors to criticism, but this is the price to pay for scientific progress: other researchers may propose better models that can be tested against the data, leading to an increasing refinement of the models, and of our understanding of the demographic patterns that they reflect.

Simulations under specific models

Templeton consistently rejects the use of simulations to validate models and to infer parameters. As evidence for ‘the extreme ambiguity of inference via computer simulations’, Templeton (2010) mentions two studies on human evolution (Eswaran et al. 2005; Fagundes et al. 2007) that simulated different evolutionary scenarios using different data sets and arrived at different conclusions. Two studies reaching different conclusions do not, of course, invalidate the common tools that they use. As stated above, the use of simulations in the ABC inference procedure criticized by Templeton is simply a means of evaluating or approximating the likelihood function. Templeton also argues against the use of simulations for evaluating the relative merits of different inference methods, because this requires the full specification of the parameter space to be explored and implies that choices need to be made about which models are used and contrasted. A related criticism by Templeton (2009b) is that the models that have been used to test NCPA are unlikely, and that the high false-positive rate attributed to NCPA is therefore also unlikely. However, an explicit model specification procedure, which is the rule in physics and most other sciences, involves no hidden assumptions, and the impact of alternative parameterizations can be conveniently studied. Because it is transparent, it is open to criticism and to the use of alternative specifications. By varying the conditions of the simulations it is possible to determine when methods fail and when they perform well. Indeed, without such objective testing it is impossible to assess the performance of a statistical procedure. If a method consistently leads to wrong inferences under all or most conditions explored, as we argue below is the case for NCPA, it should be discarded.

Misconceptions about Bayesian methods

We recognize that there are alternative ways to perform statistical inference. This is well reflected in the authorship of this paper, and arises from different epistemological traditions lying deep in the history of statistics. Our aim in this section is not to argue for the relative merits of one approach over another, but simply to correct factual errors concerning Bayesian inference found in Templeton (2009a, 2010), and to present the main arguments that underpin it.

Statistical validity of ABC

Templeton (2009a) presents an extensive critique of the ABC method, which is simply a way to perform model-based inference in a Bayesian setting when model likelihoods are intractable and thus need to be approximated by simulation. For example, Templeton questions ‘the statistical validity of all inferences made by the ABC method’ (p. 325) and argues that ‘the “posterior probabilities” that emerge from ABC [are] mathematically impossible … to be probabilities’ (p. 329). However, when the summary statistics used in ABC are statistically sufficient and parameter estimation uses only the simulations that exactly match the observed data, ABC is exact Bayesian inference (Marjoram & Tavare 2006). Thus Templeton is in effect claiming that standard Bayesian inferences are invalid, and that Bayesian posterior probabilities are mathematically incapable of being probabilities.
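
As a simple illustration of this point, the following minimal sketch (a toy normal-mean problem, not any of the analyses discussed here, with all numbers chosen for illustration) shows rejection ABC with a sufficient summary statistic recovering the analytic posterior as the tolerance becomes small.

```python
# Minimal rejection-ABC sketch (illustrative toy example): estimate the mean mu
# of a Normal(mu, 1) sample using the sample mean, a sufficient summary statistic.
import numpy as np

rng = np.random.default_rng(1)

# "Observed" data and its summary statistic
n = 50
x_obs = rng.normal(2.0, 1.0, size=n)
s_obs = x_obs.mean()

# Prior: mu ~ Normal(0, 10^2)
n_sims, tol = 200_000, 0.01
mu_prior = rng.normal(0.0, 10.0, size=n_sims)

# One simulated data set per prior draw; keep draws whose summary statistic
# lies within tol of the observed one (||s - s'|| <= tol).
s_sim = rng.normal(mu_prior, 1.0 / np.sqrt(n))   # sampling distribution of the mean
accepted = mu_prior[np.abs(s_sim - s_obs) <= tol]

# Analytic (conjugate) posterior for comparison: as tol -> 0 the accepted draws
# are samples from this posterior, because the sample mean is sufficient.
post_var = 1.0 / (1.0 / 10.0**2 + n / 1.0**2)
post_mean = post_var * (n * s_obs / 1.0**2)
print(f"ABC:      mean={accepted.mean():.3f}  sd={accepted.std():.3f}")
print(f"Analytic: mean={post_mean:.3f}  sd={np.sqrt(post_var):.3f}")
```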

Comparison of alternative models

Bayesian analysis is fundamentally a decision-making approach, in which the goal is to evaluate the relative support for different models under comparison. In contrast, Fisherian testing of a point null hypothesis using P-values only rejects models that inadequately explain the data. There is a large literature on the problems that arise when taking null hypothesis testing out of its original context in the analysis of designed experiments (see e.g. Berger & Sellke 1987). Templeton's claim that in ‘ABC there is no null hypothesis, which complicates the computation of sampling error’ (2009a, p.325) is incorrect: sampling error is evaluated in each model under consideration, and is not dependent on the specification of a null hypothesis.

Priors

Templeton's criticisms that in ABC a model can be rejected because ‘the simulated parameter values are wrong’ (Templeton 2009a, p. 323), and that ‘parameter ranges and distributions are only guessed based upon the subjective opinion of the investigators’ (Templeton 2010), are classical objections made against Bayesian approaches, which need the specification of a prior distribution for all the parameters of a model. Priors might be mis-specified and their choice may indeed carry some subjectivity, but their impact on posterior distributions, parameter inference, and model choice can be quantified (Berger 1990; Gelman et al. 1996).

Global parameter inference

Templeton's comment that NCPA ‘separate[s] out different phylogeographical components is a great advantage over ABC’ (Templeton 2009a, p. 324) ignores the fact that testing subsets of the data separately precludes any assessment of uncertainty in the overall conclusions. The fact that a method, like ABC, permits this assessment is a clear advantage over NCPA. A sound statistical approach should work with all data and parameters at once, and thus incorporate dependencies among the parameters and avoid multiple uses of the data. In particular, unlike NCPA, Bayesian methods avoid the problem of using an estimate as if it were the true value. Uncertainty in parameter values is explicitly modelled, at odds with NCPA, where for instance very little or no uncertainty in the topology of the gene-tree is assumed for the analysis.

Sampling error

Templeton's argument that simulated statistics and observed statistics cannot be compared because the observed statistic (s) is ‘current generation’ while a simulated statistic (s′) is ‘long-term’ (see fig. 2 in Templeton 2009a) is wrong. The error in the argument can be made explicit by replacing ‘statistics’ with ‘data’. The aim of model-based methods is to examine the relative probability of obtaining the data for different combinations of parameter values. It is acknowledged that the observations are influenced by both sampling error and evolutionary stochasticity in the model, and this is explicitly accounted for by ABC, which simulates data sets with sample sizes and numbers of loci matching exactly those observed. As mentioned before, ABC is then simply a way of using simulations to make inferences.

Simulation weighting

Templeton's claim of an artefactual increase in statistical power arising from the computation of a distance between observed and simulated summary statistics, ‖s − s′‖, is incorrect. In ABC, ‖s − s′‖ is not ‘a generalized goodness of fit statistic’ (Templeton 2009a, p. 328), but is used to determine whether a simulation is retained for parameter estimation. For retained simulations, ‖s − s′‖ also provides a weight allocated to the simulated parameter values in approximating the posterior distribution. Note that the ABC method is exact when simulations are retained only if ‖s − s′‖ = 0 and s is sufficient, since the fraction of retained simulations then provides a direct estimate of the likelihood. As the retention interval increases, the estimated posterior distributions typically become wider, tending towards the prior; the ABC approach is thus inherently conservative. How the approximated density converges to the true distribution (conditional on the summary statistics) as ‖s − s′‖ tends to zero is an area of active research (e.g. Ratmann et al. 2009; Blum & François 2010).
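
The following sketch illustrates this dual use of ‖s − s′‖ in the retention-and-weighting step, in the spirit of Beaumont et al. (2002); the model, summary statistic and threshold are toy stand-ins chosen purely for illustration.

```python
# Sketch of the retention-and-weighting step described above: the simulated
# (theta, s) pairs would come from prior-predictive simulations of a real model,
# here replaced by a toy stand-in.
import numpy as np

rng = np.random.default_rng(0)
theta = rng.uniform(0, 10, size=100_000)            # draws from the prior
s_sim = theta + rng.normal(0, 1, size=theta.size)   # toy summary statistic
s_obs = 4.2                                         # observed summary statistic

# The distance ||s - s'|| decides which simulations are retained ...
dist = np.abs(s_sim - s_obs)
delta = np.quantile(dist, 0.01)          # retention threshold (closest 1%)
keep = dist <= delta

# ... and, for retained simulations, also provides the weights
# (Epanechnikov kernel), so that closer simulations count for more.
w = 1.0 - (dist[keep] / delta) ** 2

post_mean = np.average(theta[keep], weights=w)
print(f"retained {keep.sum()} simulations; weighted posterior mean = {post_mean:.2f}")
```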

Posterior densities and Bayesian model choice

The section in Templeton (2009a, p. 326–327) that discusses full distributions and local probabilities contains a number of erroneous statements, as explained below. Templeton's Figure 3 is used to suggest that conditioning inferences on observed statistics may lead to wrong decisions in Bayesian model choice. The interpretation of the figure is itself problematic. The graph plots a posterior density against the value of a summary statistic, yet Bayesian inference typically aims to compute the posterior distribution of parameter values, not of statistics. Presumably what is meant is the posterior predictive distribution of the summary statistic, conditional on the observed summary statistic; the posterior predictive distribution is typically used in Bayesian model checking (Gelman et al. 1996).

Central to Templeton's argument are the premises (i) that observed statistics may often lie in the tails of this distribution, and (ii) that ABC (and, by extension, Bayesian) model choice procedures are based on an examination of this distribution around the observed statistics, so that when the centre of mass of the distribution lies further away from the observed statistics they lead to wrong inferences. These premises are incorrect. If the model fits well, the observed summary statistic will not typically lie in the tails of the posterior predictive distribution. Furthermore, as discussed in more detail below, Bayesian model choice is not based on the posterior predictive distribution at all, contrary to what is implied in Templeton (2009a, p. 326–327). An alternative interpretation of Templeton's Figure 3 is that it shows the prior predictive distribution, that is, the distribution of summary statistics under the model when the parameters are drawn from the prior. Under this interpretation, the prior predictive density at the observed summary statistic is the marginal likelihood. In the context of ABC, ratios of marginal likelihoods (Bayes factors) can be approximated by the ratio of the numbers of simulations made under alternative models that fall arbitrarily close to the observed data. Within the Bayesian framework this procedure is correct and is not based on any notion of ‘local probability’, so Templeton's criticism of this supposed deficiency of ABC is also unfounded.

Templeton further argues against the use of ABC (and hence Bayesian) methods for model comparison because they supposedly cannot take dimensionality into account, implying that they will always choose over-parameterized models; indeed, he appears to criticize ABC approaches for not using the correction of Schwarz (1978) in his Bayesian information criterion (BIC). However, from a Bayesian perspective there is no need to correct for dimension, nor to invoke Schwarz (1978), since the marginal likelihood naturally allows for differences in model dimensionality (see e.g. MacKay 2002, chapter 28.1 on Occam's razor). In fact, the penalty in Schwarz's (1978) BIC stems from a Taylor expansion of a standard Bayes factor (see also Schervish 1995), which illustrates the automatic penalty for dimension and over-parameterization when using Bayes factors.
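
As a concrete illustration of the Bayes factor approximation mentioned above, the sketch below uses two toy models (standing in for real demographic models, with made-up priors and an arbitrary observed statistic) and estimates the Bayes factor as the ratio of acceptance counts near the observed summary statistic.

```python
# Toy sketch of ABC model choice: simulate from each model with parameters drawn
# from its prior, and approximate the Bayes factor by the ratio of acceptance
# counts near the observed summary statistic.
import numpy as np

rng = np.random.default_rng(2)
s_obs, tol, n_sims = 1.3, 0.05, 500_000

# Model 1: toy model whose summary statistic is Exponential(rate = theta)
theta1 = rng.uniform(0.1, 5.0, size=n_sims)
s1 = rng.exponential(1.0 / theta1)

# Model 2: a different toy model whose summary statistic is Normal(mu, 1)
mu2 = rng.uniform(-5.0, 5.0, size=n_sims)
s2 = rng.normal(mu2, 1.0)

k1 = np.sum(np.abs(s1 - s_obs) <= tol)   # acceptances under model 1
k2 = np.sum(np.abs(s2 - s_obs) <= tol)   # acceptances under model 2

# Since the same number of simulations is drawn from each model, k1/k2
# approximates the Bayes factor (ratio of marginal likelihoods at s_obs).
print(f"accepted: model 1 = {k1}, model 2 = {k2}; BF(1 vs 2) = {k1 / k2:.2f}")
```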

Sample size

In the section on ‘Sample size’, Templeton (2009a, p. 327) claims that ‘ABC has severe constraints on sample size’. This is a misleading statement: indeed, one of the main motivations behind the approach is that it can potentially deal with larger data sets than can currently be handled by other model-based procedures. There are constraints set by computation time for very large data sets, but with efficient simulation methods implemented on computer clusters, sample size is not a major limitation of the approach for most practical applications. Further, Templeton argues that the sample sizes (8–12 individuals per continent) used in Fagundes et al. (2007) are too small to lead to reliable estimates, on the grounds that such sizes do not meet NCPA requirements. However, as noted above, the ABC framework, by simulating exactly the observed sample sizes, handles any sample size correctly; small sample sizes simply lead to wider credible intervals than large sample sizes. ABC methods are also not markedly constrained by the use of multiple loci, and, as is to be expected, the precision of estimates tends to increase when summary statistics are based on many loci (e.g. Excoffier et al. 2005).

Recent developments in ABC methods

In order to put the comments of Templeton (2009a) in context, it is perhaps helpful to provide a brief overview of the current status of ABC, which is now quite widely used in statistical inference. For example, it has been applied to infectious disease epidemiology (Tanaka et al. 2006; Luciani et al. 2009; McKinley et al. 2009) and systems biology (Ratmann et al. 2009; Toni et al. 2009). Although several studies have now shown that parameter posterior distributions inferred by ABC are very similar to those provided by full-likelihood approaches (see e.g. Marjoram et al. 2003; Bortot et al. 2007; Beaumont et al. 2009; Leuenberger & Wegmann 2010), the approach is still in its infancy and continues to evolve and improve. For instance, Marjoram et al. (2003) developed a Markov chain Monte Carlo (MCMC) ABC approach that improves the sampling efficiency of conventional ABC, which must otherwise explore sometimes very wide priors even though the posterior distribution may occupy only a narrow region of parameter space. This MCMC-ABC has some problems (Sisson et al. 2007), which are addressed in variants of the original approach (see e.g. Becquet et al. 2007; Bortot et al. 2007; Wegmann et al. 2009). More recently, sequential Monte Carlo (SMC) techniques have been adapted to ABC in order to further improve its efficiency (see e.g. Sisson et al. 2007; Beaumont et al. 2009; Del Moral et al. 2009). As noted by Beaumont et al. (2002), efficient conditional density estimation is a key aspect of ABC, and this has been developed further by Blum & François (2010). Further related developments concern the choice of statistics used to summarize data sets (Joyce & Marjoram 2008; Sousa et al. 2009) and how they can be combined (Hamilton et al. 2005; Wegmann et al. 2009). A number of software packages, such as DIY-ABC (Cornuet et al. 2008) and popABC (Lopes et al. 2009), now allow easy implementation of ABC analyses; they can accommodate a wide range of evolutionary models and be used for both model choice and parameter estimation.

Theoretical and statistical problems in NCPA

NCPA inferences are typically not tested by users

Templeton (2009a,b) claims that NCPA is embedded in a strong statistical framework, because it is based on the rejection of null models and on hypothesis tests using likelihood ratios to contrast NCPA inferences. It is therefore interesting to examine which aspects of the NCPA procedure actually involve hypothesis testing and the rejection of null models. In the hundreds of published empirical studies based on this method, the only statistical procedure in NCPA is a simple permutation test of the null hypothesis of no association between clades and geographic location (see e.g. Knowles 2008; Petit 2008). However, the processes inferred by NCPA have never been tested as null models to see whether they can actually give rise to data sets similar to those observed, and NCPA inferences are therefore typically presented without any further attempt at model checking or validation. There is thus no measure of confidence that can be assigned to the inferences being made, nor any indication of support in the data for alternative processes. Moreover, almost all published NCPA inferences are based on the analysis of a single locus, without the internal cross-validation of NCPA being used.

Lack of NCPA expectations under different scenarios

It has been claimed that NCPA is the method of choice, because of its ‘broader coverage’ of processes, when (i) there is a lack of strong prior knowledge of the universe of biological possibilities, or (ii) multiple processes may lead to the same output (Templeton 2004b). However, as emphasized above, because the interpretation of the patterns of genetic variation is not associated with a defined model, there is no basis for evaluating the inferences made with the dichotomous inference key of NCPA. In other words, there is no explicit description of the patterns of variation, or of the NCPA outcomes, expected under one historical scenario relative to another. No study has verified that the interpretations of the distance statistics used in NCPA (i.e. the DC and DN values) actually correspond to what is expected under the processes NCPA claims to be able to distinguish. This is not to say that model-based inference is without its challenges, especially with regard to issues surrounding model choice (as reviewed in Hey & Machado 2003; Knowles 2004, 2009; Nielsen & Beaumont 2009), but these difficulties should not be used as a justification for resorting to a method with undefined statistical properties (Knowles 2008). Any sound statistical method needs to provide an assessment of its error or uncertainty. Even if NCPA were not flawed in the many other ways described in this paper, the inference of phylogeographic processes based on purely verbal logic, with no alternative models and no statistical support, should be enough to relegate it to the status of, at best, an exploratory tool.

Cross-validation using multi-locus data lacks rigor

The suggestion that the new multilocus NCPA somehow overcomes these problems is likewise indefensible, and the statistical test on which it relies is flawed (see details below). Additionally, the claim that when NCPA analyses of two or more loci lead to the same inference, this constitutes a rigorous ‘cross-validation’, is not based on any statistical concept of validation. Any concordance in observed patterns across two loci depends on the evolutionary variance of the process itself, which is not evaluated in NCPA, and which may vary extensively among different evolutionary processes. For instance, patterns of molecular diversity after a range expansion can be highly correlated among unlinked loci, and the observation of similar patterns at two loci is expected (e.g. Di Rienzo et al. 1998), whereas a population bottleneck often induces a much larger evolutionary variance across loci (e.g. Bonneuil 1998; Teshima et al. 2006). Thus, the probability for a given number of loci to show congruent patterns can only be evaluated under a given evolutionary model. The fact that the number of false inferences drops with additional loci is expected, but there is no control over the resulting type II error.

NCPA inference key has still not been properly tested

The NCPA procedure consists of four main tasks: (i) the construction of cladograms; (ii) the computation of summary statistics based on geographic patterns associated with these cladograms; (iii) permutation tests to assess their statistical significance; and (iv) biological interpretation of the ‘significant’ summary statistics. Task (iv) is carried out via an ‘inference key’, which is consulted each time a statistically significant summary statistic is identified. The concomitant problem of multiple testing has been previously highlighted (Knowles & Maddison 2002; Panchal & Beaumont 2007) and acknowledged by Templeton (2008, 2009b). The inference key was originally provided in Templeton et al. (1995), and leads to the conclusion either that there are insufficient data to make an inference, or that some specified demographic event has occurred in the history of the population. Examples and discussion of the high rate of false positives generated by use of the inference key are given in Nielsen and Beaumont (2009) and in Panchal and Beaumont (2007), as well as in a later section of this article (see Table 1). An important point to note, however, is that the procedure is superficially similar in scope to the decision tree, or classification tree, used in machine learning and statistics (Breiman et al. 1984). The aim of a classification tree is to model a categorical dependent variable (the classification) as a function of independent variables. A sine qua non of such a procedure is that it must be validated on data of known classification to measure its classification error, and to compare its performance against alternative algorithms, before it is applied to real classification problems. There is no evidence that the rules encapsulated in the key of Templeton et al. (1995), including its later revisions (Templeton 2004b), have been generated or validated in this way, as required for a valid statistical procedure; the rules appear to be based solely on reasoned opinion (Templeton et al. 1995). A post hoc justification of this inference tree, which appears to uphold the purely verbal reasoning by which it was originally constructed, has been made through the analysis of empirical data sets, but the demographic history underlying these empirical data sets is not known with certainty. In the following section, further grounds for doubting the validity of these conclusions are raised.
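
For contrast, the following is a minimal sketch of the kind of validation that is routine for classification trees. The data here are synthetic and scikit-learn is assumed to be available, so the example is purely illustrative of the validation step that the NCPA key lacks, not of any phylogeographic analysis.

```python
# Minimal sketch of standard validation of a classification tree: fit on a
# training set, then measure error on held-out cases whose true class is known.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Synthetic data with known labels (illustrative stand-in for data simulated
# under scenarios whose true history is known)
X, y = make_classification(n_samples=2000, n_features=10, n_informative=4,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

tree = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_train, y_train)

# The misclassification rate on held-out cases is the performance measure that
# would need to be reported before trusting the tree on new data.
print(f"held-out accuracy: {accuracy_score(y_test, tree.predict(X_test)):.3f}")
```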

Table 1.

NCPA false positive rates of inference. Results are broken down by FST, lattice size, and model. The false positive rate, as defined in Panchal & Beaumont (2010), is shown for both single and multiple loci

Model                                   FST     3 × 3               7 × 7               10 × 10
                                                Single   Multiple   Single   Multiple   Single   Multiple
                                                locus    loci       locus    loci       locus    loci

Panmictic                               0       0.665    0.088      0.788    0.204      0.783    0.217
Stepping Stone                          0.03    0.753    0.300      0.747    0.150      0.781    0.129
                                        0.05    0.791    0.333      0.834    0.175      0.851    0.200
                                        0.10    0.879    0.386      0.903    0.300      0.903    0.358
                                        0.20    0.895    0.429      0.942    0.396      0.969    0.479
Lattice with Cauchy Dispersal Kernel    0.03    0.723    0.188      0.618    0.000      0.576    0.000
                                        0.05    0.783    0.179      0.654    0.000      0.649    0.000
                                        0.10    0.818    0.125      0.753    0.000      0.694    0.000
                                        0.20    0.888    0.042      0.831    0.000      0.802    0.000
Island Model                            0.03    0.914    0.613      0.883    0.358      0.823    0.291
                                        0.05    0.950    0.704      0.886    0.467      0.878    0.321
                                        0.10    0.975    0.717      0.949    0.567      0.904    0.454
                                        0.20    0.980    0.708      0.982    0.667      0.941    0.617

For single loci, each proportion shown here is based on 1200 data sets. For multiple loci, the single-locus simulations were grouped into sets of five, so each proportion is based on 240 data sets.

Positive controls do not rigorously test the validity of NCPA

The repeated claim that the inferences from NCPA have been ‘extensively validated’ refers to two studies in which, respectively, 13 and 150 empirical data sets with ‘strong a priori expectations’ were analysed (Templeton 1998, 2004b). Vigorous defence of this approach as a rigorous test of NCPA performance (and hence, of its validation) has been made (e.g. Templeton 2009b), including claims that any former criticisms are ‘outdated’ or based on ‘factual errors’. However, we emphasize that NCPA has never been successfully verified by researchers independent of its author.

Evaluations of NCPA based on simulated data (Knowles & Maddison 2002; Panchal & Beaumont 2007) and on empirical data (Templeton 1998, 2004b) have consistently inferred multiple processes other than those expected (in the case of the empirical data sets) or other than the actual processes (in the case of simulated data with known history). However, as mentioned above, Templeton has never conducted a validating simulation study. When such additional inferences arise in empirical data he has even suggested that they may not be false positives, but rather unexpected discoveries. When these ‘unexpected discoveries’ were found by other authors in simulated data sets they were of course classified as false positives (Knowles & Maddison 2002; Panchal & Beaumont 2007), but again not by Templeton (2009a,b), who strongly argues that the simulated data and/or their interpretation must be flawed in one way or another. It is also worth noting that while Templeton's ‘extensive validation’ relies almost exclusively on ‘positive controls’ based on single-locus studies, he charges that any critique of NCPA applied to single-locus data is outdated and unfair, given the more recent multilocus NCPA (Templeton 2009a,b). It should not be ignored that in doing so he is implicitly suggesting that all preceding papers that have used single-locus NCPA may have led to wrong inferences.

The ages of inferred events are crudely approximated by gene tree coalescent times

An important outcome of an NCPA analysis is the dating of inferred events. Estimated dates are subsequently used (i) to build complex evolutionary scenarios from NCPA (see e.g. Templeton 2002), (ii) to treat estimated dates as if they were the observed ages of inferred events in likelihood-ratio tests (Templeton 2004a), and (iii) to invalidate conflicting results obtained by other authors on other data sets (Templeton 2009a, 2010). It is therefore important to understand the estimation method and its foundations. Templeton (2004a) proposes to estimate the age of a given event inferred by NCPA as the ‘age of the youngest monophyletic clade that contributed in a statistically significant fashion to the inference’. The rationale is that ‘the age of the youngest clade marking an event or process is expected to be largely coincident with the age of the event itself in most cases’ (Templeton 2002), but several authors have underlined the dangers of dating population events from coalescence times on gene trees (see e.g. Pamilo & Nei 1988; Nichols 2001; Degnan & Rosenberg 2009). The events whose ages are estimated in NCPA are therefore, at best, genealogical events, and not demographic events as claimed. That is not to say that temporal and spatial inferences of genealogical events are uninformative, but by themselves they cannot directly lead to statements about demography.

Coalescent theory is not applied correctly

Templeton (2004a) estimates the time since the most recent common ancestor (TMRCA) of a given clade by applying results of Tajima (1983) on the expected coalescence time of a pair of genes (noted hereafter T2) conditional on the number of sites at which they differ (say π). There are serious problems attached to this estimation in the NCPA context. First, T2 is not equal to the TMRCA of a sample of n genes (noted here Tn). In a stationary panmictic population, Tn is roughly twice as large as T2, but the relation between T2 and Tn is different for more complex evolutionary scenarios. Second, since Templeton ignores sample sizes and concentrates only on the number of distinct DNA sequences in a given clade (say k), he is using Tajima's theory as if it could be applied to estimate the average TMRCA, Tk, among k haplotypes given their average number of pairwise differences π̅k, whereas Tajima's theory can only be used to estimate T̄2, the average T2 over all n(n − 1)/2 pairs of genes in the clade. Third, Tajima's derivations are only strictly valid under a specific evolutionary model, that of a panmictic population of constant size, whereas Templeton applies this theory to haplotypes found in a clade that shows some support for demographic events departing from stationarity (e.g. short- or long-range migration in a subdivided population, population spatial expansion, or vicariance events). Fourth, as noted by Rannala & Bertorelle (2001), subclades within a genealogy do not follow the standard coalescent: they are conditional on the other parts of the genealogy and are not independent, contrary to the assumption of Templeton's method. Thus, NCPA age inferences are not model-free, but are in fact based on a simple evolutionary model (an isolated, random-mating and constant-size population) that is used precisely to establish that a different model applies! This weakness seems to have previously been overlooked, and suggests that evolutionary scenarios inferred by NCPA are not only based on unreliably inferred demographic events, but also on a wrong timing of these events.
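
For reference, the standard coalescent expectations behind the statement that Tn is roughly twice T2 are, for a diploid panmictic population of constant effective size N with time measured in generations,

E[T_2] = 2N, \qquad E[T_n] = \sum_{j=2}^{n} \frac{4N}{j(j-1)} = 4N\left(1 - \frac{1}{n}\right), \qquad \frac{E[T_n]}{E[T_2]} = 2\left(1 - \frac{1}{n}\right),

which approaches 2 as n grows; under population structure, expansion or vicariance these expectations no longer hold, so the factor relating T2 to Tn becomes scenario-dependent.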

Likelihood ratio tests are not based on valid likelihoods

Multi-locus hypothesis testing in NCPA is based on the age distribution of inferred events, and basically evaluates the probability of a given number of loci showing NCPA-inferred events within a given time period. We now reexamine the theoretical foundations of this approach.

Templeton (2004a) proposed to take into account the stochasticity of the coalescent process by (incorrectly) assuming that Tk has a Gamma distribution with the same mean and variance as those derived by Tajima (1983) for T2. He obtained its distribution, conditional on its mean T̂k and on π̄k (defined above), as

f(T_k \mid \hat{T}_k, \bar{\pi}_k) = \frac{T_k^{\bar{\pi}_k} \, e^{-T_k(1+\bar{\pi}_k)/\hat{T}_k}}{\left(\hat{T}_k/(1+\bar{\pi}_k)\right)^{1+\bar{\pi}_k} \, \Gamma(1+\bar{\pi}_k)} \qquad (1)

Note that T̂k is an estimate but is used here as if it were known without error; this, however, is a minor point compared with the use of this theory in an evolutionary context where it does not apply. Templeton (2004a) then uses eqn (1) to infer the probability that a given NCPA-inferred event E occurred before a given time T as

\Pr(T_E \le T) = \int_{t=0}^{T} f(t \mid \hat{T}_k, \bar{\pi}_k)\, \mathrm{d}t \qquad (2)

However, Pr(TE ≤ T) is at best the probability that the TMRCA occurred before time T in a panmictic and stationary population. The use of eqn (2) as the probability of a given demographic event falling within a given time interval thus goes beyond the already doubtful assumption that the TMRCA of a clade can be used to date an inferred event. Indeed, it further assumes that the timings of these events are distributed as if they were coalescence times, which is a very strong assumption. This assumption is invalid because phenomena such as vicariance events or episodes of intercontinental gene flow (or any other NCPA-inferred event) will alter the distribution of the coalescence time between two DNA sequences, which will therefore not follow eqn (1). Despite these problems, Templeton (2009a) used eqn (1) further to estimate the probability of no gene flow between two continents between times Tl and Tu, where an episode of gene flow has been dated by NCPA at the i-th locus at T̂ki, as

\Pr(T_l \le T_E \le T_u) = \int_{t_i=T_l}^{T_u} f(t_i \mid \hat{T}_{k_i}, \bar{\pi}_{k_i})\, \mathrm{d}t_i \qquad (3)

However, this equation merely describes the probability that two genes drawn from a stationary panmictic population and differing at π̅ki sites coalesce between Tl and Tu, given their expected coalescence time T̂ki; it has nothing to do with the probability of an absence of gene flow between continents. It follows that such an equation cannot be used in likelihood-ratio tests as proposed for NCPA, and that these likelihood-ratio tests are not testing phylogeographic hypotheses. Moreover, these likelihoods cannot simply be fixed, as their inapplicability does not stem from mathematical errors but from a misinterpretation of what they are supposed to describe. Therefore, Templeton's assertions that NCPA ‘multilocus tests are based on explicit probability distributions and likelihood ratios’ (Templeton 2009a, p. 322), or that NCPA uses ‘a likelihood function that explicitly incorporates the randomness associated with the coalescent and mutational processes’ (Templeton 2010), are wrong.
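
To see concretely what these equations compute under Templeton's own assumptions, the sketch below (with illustrative, made-up numbers) evaluates eqns (2) and (3) as Gamma cumulative probabilities: the result is a statement about coalescence times in a stationary panmictic population, not about the timing of a demographic event.

```python
# Illustrative evaluation of eqns (2) and (3): under the assumptions of eqn (1)
# the clade age is Gamma-distributed with shape 1 + pi_bar and mean T_hat, so
# both probabilities reduce to Gamma cumulative distribution functions. The
# numbers below are invented for illustration only.
from scipy.stats import gamma

pi_bar = 3.2            # average number of pairwise differences in the clade (illustrative)
T_hat = 80_000.0        # estimated mean coalescence time, e.g. in years (illustrative)

shape = 1.0 + pi_bar
scale = T_hat / shape   # Gamma(shape, scale) then has mean T_hat, as in eqn (1)

# Eqn (2): probability that the clade TMRCA is younger than T
T = 100_000.0
p_before_T = gamma.cdf(T, a=shape, scale=scale)

# Eqn (3): probability that the TMRCA falls in the interval [T_l, T_u]
T_l, T_u = 60_000.0, 100_000.0
p_interval = gamma.cdf(T_u, a=shape, scale=scale) - gamma.cdf(T_l, a=shape, scale=scale)

print(f"Pr(T_E <= {T:.0f}) = {p_before_T:.3f}")
print(f"Pr({T_l:.0f} <= T_E <= {T_u:.0f}) = {p_interval:.3f}")
```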

Problems with multi-locus NCPA

In a recent article, Panchal & Beaumont (2010) evaluated the merits of the multi-locus method promoted by Templeton, using an automated program (ANeCA-ML). They simulated multi-locus test data sets under a variety of conditions and analysed them with NCPA, following closely the descriptions in Templeton (2002, 2004a,b). Four demographic scenarios were considered: panmixia, as described in Panchal & Beaumont (2007); an island model; a strict four-neighbour stepping-stone model; and a lattice model with a Cauchy dispersal kernel allowing for long-distance dispersal. All demes were laid out on a two-dimensional lattice (of sides 3, 7 and 10 demes) to provide geographical coordinates for NCPA. The data consist of sets of five loci, each of 500 bp, evolving under a Kimura two-parameter model.

The multi-locus analysis reveals a number of problems in addition to those described above for single-locus NCPA:

  1. Inferences are identified as cross-validated in NCPA when two different clades provide the same inference for the same geographical region, but the necessary degree of overlap of geographical area covered by clades is unspecified and thus arbitrary. For example in Templeton (2002), because of large variation in sample size among loci, inferences were deemed cross-validated if the two loci agreed on the same continent.

  2. Hypotheses of concordance of temporal events can be rejected for a group of clades, but subsets may then be found in which the hypothesis is not rejected, and each of these subsets can then be deemed to support the hypothesis of a particular event. For example, in Templeton (2002, fig. 3) the hypothesis that all five loci support the same temporal event is rejected. The loci are then grouped (apparently by eye, from their inferred event times) into a set of two loci (mtDNA and Y-chromosome) and a set of three (autosomal) loci, leading to the claim, which forms the basis of the entire paper, that there were two colonization events out of Africa. As an aside, no account seems to be taken of the fact that such differences are expected from the different modes of inheritance of these loci, and the claim may therefore be baseless.

  3. There is no relative weight attached to the various inferences that result from NCPA. The outcome is all-or-nothing, with no measure of uncertainty, whereas a model-comparison procedure (either Bayesian or frequentist) would allow for the possibility that the data support, for example, both an island model and a lattice model, or both a stepping-stone model and a lattice model.

  4. The reduction in false-positive rate arising from the use of multiple loci is very patchy, and depends on the (generally unknown) true scenario. This contrasts with the case of model-based inference in which the false positive rate is generally well controlled and the whole motivation for using more loci is to increase power (Rannala & Yang 2003).

Table 1 summarizes the results of Panchal and Beaumont (2010). With more loci the false-positive rate is indeed reduced, but, owing to the very specific nature of the inferences yielded by NCPA, it is highly variable across simulated scenarios. For example, under the stepping-stone model only restricted gene flow (RGF) with isolation by distance is regarded as a true positive, and any inference including long-distance dispersal (LDD) is regarded as a false positive. Under the lattice model with LDD, a much larger range of inferences is allowed, including RGF with isolation by distance and RGF with LDD. In the island model, all inferences of RGF are regarded as true positives as long as they do not include isolation by distance. A direct consequence is that the false-positive rate for the island model remains very high (54%), whereas that for the lattice model with LDD is less than 5%: in the latter case a much wider range of inferences was deemed consistent with the scenario, whereas with the island model any inference involving isolation by distance was deemed a false positive. The rates decrease with increasing lattice size, increase with increasing levels of population structure, and are almost always quite high for single loci. In conclusion, the use of multiple loci tends to reduce the false-positive rate in NCPA. However, when there is population structure, it does not lead to improved discrimination among its possible causes, because in this case the most frequent inference is restricted gene flow with isolation by distance, irrespective of whether the data come from an island model or a stepping-stone model.

Conclusions

Gleaning useful information about evolutionary processes from population genetic data is hard, and requires appreciation of the mathematical and conceptual underpinnings of population genetics theory. Such requirements are taken for granted by experimentalists in the physical sciences, while in evolutionary biology there remains a tendency to treat statistical procedures uncritically as ‘black boxes’, and to accept apparently easy solutions, especially those that fit with common-sense nostrums. We argue here that the need for rational, quantitative assessment of population genetics models and estimates is unavoidable.

In this article we have demonstrated that the majority of criticisms by Templeton (2009a, 2010) of ABC are in fact directed at model-based inference more generally, and are unfounded. Other criticisms arise from profound misconceptions of the ABC procedure itself, and are easily rebutted. Templeton promotes the use of NCPA, and we demonstrate that, despite its past popularity among empiricists, there are many problems associated with the method: there is no justification for the adoption of specific alternative hypotheses following the rejection of a simple null hypothesis by a permutation test; there is no measure of confidence in its support for hypotheses or estimates; the inference key of NCPA has not been properly validated, and its error rates have not been estimated; the ages of inferred events are estimated from a simple evolutionary model (the standard coalescent) in precisely those situations where it does not apply; and its likelihood-ratio tests are not based on valid likelihoods. As a result, NCPA maintains a highly inflated false-positive rate, even when applied to multi-locus data.

Current model-based statistical methodology does not match in scope the breadth of inference claimed by NCPA, but the latter's claims are not backed by real, external validation. ABC has limitations, but, like full-likelihood methods, it is based on explicit models, uses all the data simultaneously in inference, and allows an assessment of uncertainty in all inferences. Geographic and genetic information are intimately linked (Novembre et al. 2008), and the use of geographic information can certainly bring additional insight into past evolutionary processes such as environmental adaptation, range expansion and migration. While most inferential approaches integrating geography use only information on allele frequencies (e.g. Guillot et al. 2005; Novembre et al. 2005; Francois et al. 2006; Corander et al. 2008), coalescent-based approaches seem ideally placed to integrate molecular information into phylogeographic inferences (see e.g. Manolopoulou 2008; Itan et al. 2009). Ongoing advances in computation and methodology will undoubtedly yield increasing flexibility in the range of evolutionary and historical scenarios that can be considered, ensuring a major role for model-based approaches in reconstructing realistic demographic and evolutionary scenarios from the spatial distribution of genetic data. These advances should also give us a better appreciation of the complex and subtle relationships between demographic history, natural selection and genomic diversity.

Acknowledgments

We thank the editor and three anonymous reviewers for their helpful comments on an earlier version.

Footnotes

The authors are all involved in the development of model-based inference methodologies in population genetics, phylogenetics, phylogeography, or statistics.

References

  1. Beaumont MA, Panchal M. On the validity of nested clade phylogeographical analysis. Molecular Ecology. 2008;17:2563–2565. doi: 10.1111/j.1365-294X.2008.03786.x. [DOI] [PubMed] [Google Scholar]
  2. Beaumont MA, Zhang W, Balding DJ. Approximate Bayesian computation in population genetics. Genetics. 2002;162:2025–2035. doi: 10.1093/genetics/162.4.2025. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Beaumont MA, Cornuet JM, Marin JM, Robert CP. Adaptivity for ABC algorithms: the ABC-PMC scheme. Biometrika. 2009;96:983–990. [Google Scholar]
  4. Becquet C, Patterson N, Stone AC, Przeworski M, Reich D. Genetic structure of chimpanzee populations. PLoS Genet. 2007;3:e66. doi: 10.1371/journal.pgen.0030066. [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Beerli P, Felsenstein J. Maximum-likelihood estimation of migration rates and effective population numbers in two populations using a coalescent approach. Genetics. 1999;152:763–773. doi: 10.1093/genetics/152.2.763. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Berger JO. Robust bayesian-analysis—sensitivity to the prior. Journal of Statistical Planning and Inference. 1990;25:303–328. [Google Scholar]
  7. Berger JO, Sellke T. Testing a point null hypothesis—the irreconcilability of P-values and evidence. Journal of the American Statistical Association. 1987;82:112–122. [Google Scholar]
  8. Blum MGB, François O. Non-linear regression models for Approximate Bayesian Computation. Statistics and Computing. 2010 in press. [Google Scholar]
  9. Bonneuil N. Population paths implied by the mean number of pairwise nucleotide differences among mitochondrial DNA sequences. Annals of Human Genetics. 1998;62:61–73. doi: 10.1046/j.1469-1809.1998.6210061.x. [DOI] [PubMed] [Google Scholar]
  10. Bortot P, Coles SG, Sisson SA. Inference for stereological extremes. Journal of the American Statistical Association. 2007;102:84–92. [Google Scholar]
  11. Box GEP, Draper NR. Empirical Model-Building and Response Surfaces. John Wiley & Sons; Oxford, UK: 1987. [Google Scholar]
  12. Breiman L, Friedman JH, Olshen RA, Stone CJ. Classification and Regression Trees. Chapman and Hall/CRC Press; New York: 1984. [Google Scholar]
  13. Cavalli-Sforza LL, Edwards AWF. Phylogenetic analysis: models and estimation procedures. Evolution. 1967;32:550–570. doi: 10.1111/j.1558-5646.1967.tb03411.x. [DOI] [PubMed] [Google Scholar]
  14. Corander J, Marttinen P, Siren J, Tang J. Enhanced Bayesian modelling in BAPS software for learning genetic structures of populations. BMC Bioinformatics. 2008;9:539. doi: 10.1186/1471-2105-9-539. [DOI] [PMC free article] [PubMed] [Google Scholar]
  15. Cornuet JM, Santos F, Beaumont MA, et al. Inferring population history with DIY ABC: a user-friendly approach to approximate Bayesian computation. Bioinformatics. 2008;24:2713–2719. doi: 10.1093/bioinformatics/btn514. [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. Crandall KA. Multiple interspecies transmissions of human and simian T-cell leukemia/lymphoma virus type I sequences. Molecular Biology and Evolution. 1996;13:115–131. doi: 10.1093/oxfordjournals.molbev.a025550. [DOI] [PubMed] [Google Scholar]
  17. Degnan JH, Rosenberg NA. Gene tree discordance, phylogenetic inference and the multispecies coalescent. Trends in Ecology and Evolution. 2009;24:332–340. doi: 10.1016/j.tree.2009.01.009. [DOI] [PubMed] [Google Scholar]
  18. Del Moral P, Doucet A, Jasra A. An adaptive sequential Monte Carlo method for Approximate Bayesian Computation. Annals of Applied Statistics. 2009 http://www.cs.ubc.ca/∼arnaud/del-moral_doucet_jasra_smcabc.pdf.
  19. Di Rienzo A, Donnelly P, Toomajian C, et al. Heterogeneity of microsatellite mutations within and between loci, and implications for human demographic histories. Genetics. 1998;148:1269–1284. doi: 10.1093/genetics/148.3.1269. [DOI] [PMC free article] [PubMed] [Google Scholar]
  20. Eswaran V, Harpending H, Rogers AR. Genomics refutes an exclusively African origin of humans. Journal of Human Evolution. 2005;49:1–18. doi: 10.1016/j.jhevol.2005.02.006. [DOI] [PubMed] [Google Scholar]
  21. Excoffier L, Estoup A, Cornuet JM. Bayesian analysis of an admixture model with mutations and arbitrarily linked markers. Genetics. 2005;169:1727–1738. doi: 10.1534/genetics.104.036236. [DOI] [PMC free article] [PubMed] [Google Scholar]
  22. Fagundes NJ, Ray N, Beaumont M, et al. Statistical evaluation of alternative models of human evolution. Proceedings of the National Academy of Sciences of the United States of America. 2007;104:17614–17619. doi: 10.1073/pnas.0708280104. [DOI] [PMC free article] [PubMed] [Google Scholar]
  23. Felsenstein J. Evolutionary trees from DNA-sequences—a maximum-likelihood approach. Journal of Molecular Evolution. 1981;17:368–376. doi: 10.1007/BF01734359. [DOI] [PubMed] [Google Scholar]
  24. Fisher RA. Statistical Methods for Research Workers. Oliver and Boyd; London: 1925. [Google Scholar]
  25. Francois O, Ancelet S, Guillot G. Bayesian clustering using hidden Markov random fields in spatial population genetics. Genetics. 2006;174:805–816. doi: 10.1534/genetics.106.059923. [DOI] [PMC free article] [PubMed] [Google Scholar]
  26. Gelman A, Meng XL, Stern H. Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica. 1996;6:733–760. [Google Scholar]
  27. Gigerenzer G, Swijtink Z, Porter T, et al. The Empire of Chance: How Probability changed Science and Everyday Life. Cambridge University Press; Cambridge: 1990. [Google Scholar]
  28. Griffiths RC, Tavaré S. Ancestral inference in population genetics. Statistical Science. 1994;9:307–319.
  29. Guillot G, Estoup A, Mortier F, Cosson JF. A spatial statistical model for landscape genetics. Genetics. 2005;170:1261–1280. doi: 10.1534/genetics.104.033803.
  30. Hamilton G, Currat M, Ray N, et al. Bayesian estimation of recent migration rates after a spatial expansion. Genetics. 2005;170:409–417. doi: 10.1534/genetics.104.034199.
  31. Hey J, Machado CA. The study of structured populations—new hope for a difficult and divided science. Nature Reviews Genetics. 2003;4:535–543. doi: 10.1038/nrg1112.
  32. Hey J, Nielsen R. Integration within the Felsenstein equation for improved Markov chain Monte Carlo methods in population genetics. Proceedings of the National Academy of Sciences of the United States of America. 2007;104:2785–2790. doi: 10.1073/pnas.0611164104.
  33. Hudson RR. Testing the constant rate neutral allele model with protein sequence data. Evolution. 1983;37:203–217. doi: 10.1111/j.1558-5646.1983.tb05528.x.
  34. Itan Y, Powell A, Beaumont MA, Burger J, Thomas MG. The origins of lactase persistence in Europe. PLoS Computational Biology. 2009;5:e1000491. doi: 10.1371/journal.pcbi.1000491.
  35. Joyce P, Marjoram P. Approximately sufficient statistics and Bayesian computation. Statistical Applications in Genetics and Molecular Biology. 2008;7: Article 26. doi: 10.2202/1544-6115.1389.
  36. Kingman JFC. The coalescent. Stochastic Processes and Their Applications. 1982;13:235–248.
  37. Knowles LL. The burgeoning field of statistical phylogeography. Journal of Evolutionary Biology. 2004;17:1–10. doi: 10.1046/j.1420-9101.2003.00644.x.
  38. Knowles LL. Why does a method that fails continue to be used? Evolution. 2008;62:2713–2717. doi: 10.1111/j.1558-5646.2008.00481.x.
  39. Knowles LL. Statistical phylogeography. Annual Review of Ecology, Evolution, and Systematics. 2009;40:593–612.
  40. Knowles LL, Maddison WP. Statistical phylogeography. Molecular Ecology. 2002;11:2623–2635. doi: 10.1046/j.1365-294x.2002.01637.x.
  41. Kuhner MK, Yamato J, Felsenstein J. Estimating effective population size and mutation rate from sequence data using Metropolis–Hastings sampling. Genetics. 1995;140:1421–1430. doi: 10.1093/genetics/140.4.1421.
  42. Leuenberger C, Wegmann D. Bayesian computation and model selection without likelihoods. Genetics. 2010; in press. doi: 10.1534/genetics.109.109058.
  43. Lopes JS, Balding D, Beaumont MA. PopABC: a program to infer historical demographic parameters. Bioinformatics. 2009;25:2747–2749. doi: 10.1093/bioinformatics/btp487.
  44. Luciani F, Sisson SA, Jiang H, Francis AR, Tanaka MM. The epidemiological fitness cost of drug resistance in Mycobacterium tuberculosis. Proceedings of the National Academy of Sciences of the United States of America. 2009;106:14711–14715. doi: 10.1073/pnas.0902437106.
  45. MacKay DJC. Information Theory, Inference and Learning Algorithms. Cambridge University Press; Cambridge, UK: 2002.
  46. Macneil MA. Making empirical progress in observational ecology. Environmental Conservation. 2008;35:193–196.
  47. Manolopoulou I. A Bayesian approach to Nested Clade Analysis. University of Cambridge; Cambridge: 2008.
  48. Marjoram P, Tavaré S. Modern computational approaches for analysing molecular genetic variation data. Nature Reviews Genetics. 2006;7:759–770. doi: 10.1038/nrg1961.
  49. Marjoram P, Molitor J, Plagnol V, Tavaré S. Markov chain Monte Carlo without likelihoods. Proceedings of the National Academy of Sciences of the United States of America. 2003;100:15324–15328. doi: 10.1073/pnas.0306899100.
  50. McKinley T, Cook AR, Deardon R. Inference in epidemic models without likelihoods. The International Journal of Biostatistics. 2009;5: Article 24.
  51. Nichols R. Gene trees and species trees are not the same. Trends in Ecology and Evolution. 2001;16:358–364. doi: 10.1016/s0169-5347(01)02203-0.
  52. Nielsen R, Beaumont MA. Statistical inferences in phylogeography. Molecular Ecology. 2009;18:1034–1047. doi: 10.1111/j.1365-294X.2008.04059.x.
  53. Novembre J, Galvani A, Slatkin M. The geographic spread of the CCR5 Δ32 HIV-resistance allele. PLoS Biology. 2005;3:e339. doi: 10.1371/journal.pbio.0030339.
  54. Novembre J, Johnson T, Bryc K, et al. Genes mirror geography within Europe. Nature. 2008;456:98–101. doi: 10.1038/nature07331.
  55. Pamilo P, Nei M. Relationship between gene trees and species trees. Molecular Biology and Evolution. 1988;5:568–583. doi: 10.1093/oxfordjournals.molbev.a040517.
  56. Panchal M, Beaumont MA. The automation and evaluation of nested clade phylogeographic analysis. Evolution. 2007;61:1466–1480. doi: 10.1111/j.1558-5646.2007.00124.x.
  57. Panchal M, Beaumont MA. Evaluating nested clade phylogeographic analysis under models of restricted gene flow. Systematic Biology. 2010; in press. doi: 10.1093/sysbio/syq022.
  58. Petit RJ. The coup de grâce for the nested clade phylogeographic analysis? Molecular Ecology. 2008;17:516–518. doi: 10.1111/j.1365-294X.2007.03589.x.
  59. Petit RJ, Grivet D. Optimal randomization strategies when testing the existence of a phylogeographic structure. Genetics. 2002;161:469–471; author reply 473–465. doi: 10.1093/genetics/161.1.469.
  60. Platt JR. Strong Inference: Certain systematic methods of scientific thinking may produce much more rapid progress than others. Science. 1964;146:347–353. doi: 10.1126/science.146.3642.347.
  61. Popper KR. The Logic of Scientific Discovery. Hutchinson; London: 1959.
  62. Pritchard J, Seielstad M, Perez-Lezaun A, Feldman M. Population growth of human Y chromosomes: a study of Y chromosome microsatellites. Molecular Biology and Evolution. 1999;16:1791–1798. doi: 10.1093/oxfordjournals.molbev.a026091.
  63. Rannala B, Bertorelle G. Using linked markers to infer the age of a mutation. Human Mutation. 2001;18:87–100. doi: 10.1002/humu.1158.
  64. Rannala B, Yang Z. Bayes estimation of species divergence times and ancestral population sizes using DNA sequences from multiple loci. Genetics. 2003;164:1645–1656. doi: 10.1093/genetics/164.4.1645.
  65. Ratmann O, Andrieu C, Wiuf C, Richardson S. Model criticism based on likelihood-free inference, with an application to protein network evolution. Proceedings of the National Academy of Sciences of the United States of America. 2009;106:10576–10581. doi: 10.1073/pnas.0807882106.
  66. Schervish MJ. Theory of Statistics. Springer-Verlag; New York: 1995.
  67. Schwarz G. Estimating the dimension of a model. Annals of Statistics. 1978;6:461–464.
  68. Sisson SA, Fan Y, Tanaka MM. Sequential Monte Carlo without likelihoods. Proceedings of the National Academy of Sciences of the United States of America. 2007;104:1760–1765 (erratum: 2009;106:16889). doi: 10.1073/pnas.0607208104.
  69. Sousa VC, Fritz M, Beaumont MA, Chikhi L. Approximate Bayesian computation without summary statistics: the case of admixture. Genetics. 2009;181:1507–1519. doi: 10.1534/genetics.108.098129.
  70. Tajima F. Evolutionary relationship of DNA sequences in finite populations. Genetics. 1983;105:437–460. doi: 10.1093/genetics/105.2.437.
  71. Tanaka MM, Francis AR, Luciani F, Sisson SA. Using approximate Bayesian computation to estimate tuberculosis transmission parameters from genotype data. Genetics. 2006;173:1511–1520. doi: 10.1534/genetics.106.055574.
  72. Tavaré S, Balding DJ, Griffiths RC, Donnelly P. Inferring coalescence times from DNA sequence data. Genetics. 1997;145:505–518. doi: 10.1093/genetics/145.2.505.
  73. Templeton AR. Nested clade analyses of phylogeographic data: testing hypotheses about gene flow and population history. Molecular Ecology. 1998;7:381–397. doi: 10.1046/j.1365-294x.1998.00308.x.
  74. Templeton AR. Out of Africa again and again. Nature. 2002;416:45–51. doi: 10.1038/416045a.
  75. Templeton AR. A maximum likelihood framework for cross validation of phylogeographic hypotheses. In: Wasser SP, editor. Evolutionary Theory and Processes: Modern Horizon. Kluwer Academic Publishers; Dordrecht: 2004a. pp. 209–230.
  76. Templeton AR. Statistical phylogeography: methods of evaluating and minimizing inference errors. Molecular Ecology. 2004b;13:789–809. doi: 10.1046/j.1365-294x.2003.02041.x.
  77. Templeton AR. Genetics and recent human evolution. Evolution. 2007;61:1507–1519. doi: 10.1111/j.1558-5646.2007.00164.x.
  78. Templeton AR. Nested clade analysis: an extensively validated method for strong phylogeographic inference. Molecular Ecology. 2008;17:1877–1880. doi: 10.1111/j.1365-294X.2008.03731.x.
  79. Templeton AR. Statistical hypothesis testing in intraspecific phylogeography: nested clade phylogeographical analysis vs. approximate Bayesian computation. Molecular Ecology. 2009a;18:319–331. doi: 10.1111/j.1365-294X.2008.04026.x.
  80. Templeton AR. Why does a method that fails continue to be used? The answer. Evolution. 2009b;63:807–812. doi: 10.1111/j.1558-5646.2008.00600.x.
  81. Templeton AR. Coalescent-based, maximum likelihood inference in phylogeography. Molecular Ecology. 2010. doi: 10.1111/j.1365-294X.2009.04514.x.
  82. Templeton AR, Crandall KA, Sing CF. A cladistic analysis of phenotypic associations with haplotypes inferred from restriction endonuclease mapping and DNA sequence data. III. Cladogram estimation. Genetics. 1992;132:619–633. doi: 10.1093/genetics/132.2.619.
  83. Templeton AR, Routman E, Phillips CA. Separating population structure from population history—a cladistic analysis of the geographical distribution of mitochondrial DNA haplotypes in the tiger salamander, Ambystoma tigrinum. Genetics. 1995;140:767–782. doi: 10.1093/genetics/140.2.767.
  84. Teshima KM, Coop G, Przeworski M. How reliable are empirical genomic scans for selective sweeps? Genome Research. 2006;16:702–712. doi: 10.1101/gr.5105206.
  85. Thompson EA. The Icelandic admixture problem. Annals of Human Genetics. 1973;37:69–80. doi: 10.1111/j.1469-1809.1973.tb01815.x.
  86. Toni T, Welch D, Strelkova N, Ipsen A, Stumpf MPH. Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. Journal of the Royal Society Interface. 2009;6:187–202. doi: 10.1098/rsif.2008.0172.
  87. Wegmann D, Leuenberger C, Excoffier L. Efficient Approximate Bayesian Computation coupled with Markov Chain Monte Carlo without likelihood. Genetics. 2009;182:1207–1218. doi: 10.1534/genetics.109.102509.
  88. Wilson IJ, Balding DJ. Genealogical inference from microsatellite data. Genetics. 1998;150:499–510. doi: 10.1093/genetics/150.1.499.
