Systematic Biology. 2017 Dec 13;67(3):503–517. doi: 10.1093/sysbio/syx087

Online Bayesian Phylogenetic Inference: Theoretical Foundations via Sequential Monte Carlo

Vu Dinh 1, Aaron E Darling 2, Frederick A Matsen IV 3
Editor: Edward Susko
PMCID: PMC5920340  PMID: 29244177

Abstract

Phylogenetics, the inference of evolutionary trees from molecular sequence data such as DNA, is an enterprise that yields valuable evolutionary understanding of many biological systems. Bayesian phylogenetic algorithms, which approximate a posterior distribution on trees, have become a popular if computationally expensive means of doing phylogenetics. Modern data collection technologies are quickly adding new sequences to already substantial databases. With all current techniques for Bayesian phylogenetics, computation must start anew each time a sequence becomes available, making it costly to maintain an up-to-date estimate of a phylogenetic posterior. These considerations highlight the need for an online Bayesian phylogenetic method which can update an existing posterior with new sequences. Here, we provide theoretical results on the consistency and stability of methods for online Bayesian phylogenetic inference based on Sequential Monte Carlo (SMC) and Markov chain Monte Carlo. We first show a consistency result, demonstrating that the method samples from the correct distribution in the limit of a large number of particles. Next, we derive the first reported set of bounds on how phylogenetic likelihood surfaces change when new sequences are added. These bounds enable us to characterize the theoretical performance of sampling algorithms by bounding the effective sample size (ESS) with a given number of particles from below. We show that the ESS is guaranteed to grow linearly as the number of particles in an SMC sampler grows. Surprisingly, this result holds even though the dimensions of the phylogenetic model grow with each new added sequence.

Keywords: Bayesian inference, effective sample size, online inference, phylogenetics, sequential Monte Carlo, subtree optimality


Maximum likelihood and Bayesian methods currently form the most popular means of phylogenetic inference. The Bayesian methods in particular enjoy the flexibility to incorporate a wide range of ancillary model features such as geographical information or trait data which are essential for some applications (Lemey et al., 2009; Ronquist et al., 2012). However, Bayesian tree inference with current implementations is a computationally intensive task, often requiring days or weeks of CPU time to analyze modest data sets with 100 or so sequences.

New developments in DNA and RNA sequencing technology have led to sustained growth in sequence data sets. This advanced technology has enabled real-time outbreak surveillance efforts, such as ongoing Zika, Ebola, and foodborne disease sequencing projects, which make pathogen sequence data available as an epidemic unfolds (Gardy et al., 2015; Quick et al., 2016). In general, these new pathogen sequences arrive one at a time (or in small batches) into a background of existing sequences. Most phylogenetic inferences, however, are performed “from scratch” even when an inference has already been made on the previously available sequences. Projects such as nextflu (Neher and Bedford, 2015) incorporate new sequences into trees as they become available, but do so by recalculating the phylogeny from scratch at each update using a fast approximation to maximum likelihood inference, rather than a Bayesian method.

Thus, modern researchers using phylogenetics are in the situation of having previous inferences, having new sequences, and yet having no principled method to incorporate those new sequences into existing inferences. Existing methods either treat a previous point estimate as an established fact and directly insert a new sequence into a phylogeny (Matsen et al., 2010; Berger et al., 2011) or use such a tree as a starting point for a new maximum-likelihood search (Izquierdo-Carrasco et al., 2014). There is currently no method to update posterior distributions on phylogenetic trees with additional sequences.

In this article, we develop the theoretical foundations for an online Bayesian method for phylogenetic inference based on Sequential Monte Carlo (SMC) and Markov chain Monte Carlo (MCMC). Unlike previous applications of SMC to phylogenetics (Bouchard-Côté et al., 2012; Bouchard-Côté, 2014; Wang et al., 2015), we develop and analyze online algorithms that can update a posterior distribution as new sequence data become available. We first show a consistency result, demonstrating that the method samples from the correct distribution in the limit of a large number of particles in the SMC. Next, we derive the first reported set of bounds on how phylogenetic likelihood surfaces change when new sequences are added. These bounds enable us to characterize the theoretical performance of sampling algorithms by developing a lower bound on the effective sample size (ESS) for a given number of particles. Surprisingly, this result holds even though the dimensions of the phylogenetic model grow with each new added sequence.

Mathematical Setting

Background and Notation

Throughout this article, a phylogenetic tree Inline graphic is a tree Inline graphic with leaves labeled by a set of taxon names (e.g., species names), such that each edge Inline graphic is associated with a non-negative number Inline graphic. These trees will be unrooted, although sometimes a root will be designated for notational convenience. For each phylogenetic tree Inline graphic, we will refer to Inline graphic as its tree topology and to Inline graphic as the vector of branch lengths. We denote by Inline graphic the set of all edges in trees with topology Inline graphic; any edge adjacent to a leaf is called a pendant edge, and any other edge is called an internal edge.

We will employ the standard likelihood-based framework for statistical phylogenetics on discrete characters under the common assumption that alignment sites are independently and identically distributed (IID) (Felsenstein, 2004), which we now review briefly. Let Inline graphic denote the set of character states and let Inline graphic. For DNA Inline graphic and Inline graphic. We assume that the mutation events occur according to a continuous time Markov chain on states Inline graphic with instantaneous rate matrix Inline graphic and stationary distribution Inline graphic. This rate matrix Inline graphic and the branch length Inline graphic on the edge Inline graphic define the transition matrix Inline graphic on edge Inline graphic, where Inline graphic denotes the probability of mutating from state Inline graphic to state Inline graphic across the edge Inline graphic (with length Inline graphic).
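As a concrete instance of the transition matrix described above, the following sketch uses the Jukes–Cantor (JC69) model. This model is chosen here only for illustration, since the text does not commit to a particular rate matrix. Under JC69, with rates scaled to one expected substitution per unit branch length, the matrix exponential has a closed form. The function name `jc69_transition` is hypothetical.

```python
import math

STATES = "ACGT"  # DNA character states

def jc69_transition(branch_length):
    """Return the 4x4 JC69 transition matrix for a single edge.

    Entry [i][j] is the probability of mutating from state i to state j
    across an edge of the given length (expected substitutions per site).
    """
    decay = math.exp(-4.0 * branch_length / 3.0)
    same = 0.25 + 0.75 * decay  # probability of ending in the starting state
    diff = 0.25 - 0.25 * decay  # probability of ending in one given other state
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

P = jc69_transition(0.1)
row_sums = [sum(row) for row in P]  # each row is a probability distribution
```

At branch length zero the matrix is the identity, and as the branch length grows every row approaches the uniform stationary distribution of JC69.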

In an online setting, the taxa Inline graphic and their corresponding observed sequences Inline graphic, each of length Inline graphic, arrive in a specific order, where Inline graphic is a finite but large number. For all Inline graphic, we consider the set of all phylogenetic trees that have Inline graphic as their set of taxa and seek to sample from a sequence of probability distributions Inline graphic of increasing dimension corresponding to phylogenetic likelihood functions (Felsenstein, 2004).

For a fixed phylogenetic tree Inline graphic, the phylogenetic likelihood is defined as follows and will be denoted by Inline graphic. Given the set of observations Inline graphic of length Inline graphic up to sequence Inline graphic, the likelihood of observing Inline graphic given the tree has the form

graphic file with name M41.gif

where Inline graphic ranges over all extensions of Inline graphic to the internal nodes of the tree, Inline graphic denotes the assigned state of node Inline graphic by Inline graphic, and Inline graphic denotes the root of the tree. Although we designate a root for notational convenience, the methods and results we discuss apply equally to unrooted trees.
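The sum over all extensions to internal nodes can be made concrete on a toy tree. The sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes the JC69 model, a hypothetical three-leaf rooted tree, and the usual form of the site likelihood (stationary probability of the root state times the product of edge transition probabilities, summed over internal states). It compares this brute-force sum with Felsenstein's pruning recursion, a standard technique that the text does not describe but that computes the same quantity.

```python
import math
from itertools import product

def jc69(t):
    d = math.exp(-4.0 * t / 3.0)
    return [[0.25 + 0.75 * d if i == j else 0.25 - 0.25 * d
             for j in range(4)] for i in range(4)]

ETA = [0.25] * 4  # JC69 stationary distribution

# Hypothetical toy tree: child -> (parent, branch length); "r" is the root.
PARENT = {"A": ("u", 0.1), "B": ("u", 0.2), "u": ("r", 0.05), "C": ("r", 0.3)}
LEAVES = ["A", "B", "C"]
INTERNAL = ["u", "r"]

def site_likelihood_bruteforce(obs):
    """Sum over every assignment of states to the internal nodes."""
    total = 0.0
    for states in product(range(4), repeat=len(INTERNAL)):
        a = dict(obs)
        a.update(zip(INTERNAL, states))
        term = ETA[a["r"]]
        for child, (parent, length) in PARENT.items():
            term *= jc69(length)[a[parent]][a[child]]
        total += term
    return total

def site_likelihood_pruning(obs):
    """Same quantity, computed bottom-up with partial likelihoods."""
    children = {}
    for child, (parent, length) in PARENT.items():
        children.setdefault(parent, []).append((child, length))

    def partial(node):
        # partial(node)[i] = P(observed leaves below node | node is in state i)
        if node in obs:
            return [1.0 if i == obs[node] else 0.0 for i in range(4)]
        out = [1.0] * 4
        for child, length in children[node]:
            P, below = jc69(length), partial(child)
            for i in range(4):
                out[i] *= sum(P[i][j] * below[j] for j in range(4))
        return out

    return sum(ETA[i] * p for i, p in enumerate(partial("r")))

def likelihood(alignment):
    """alignment: dict leaf -> sequence string; sites are treated as IID."""
    n_sites = len(alignment[LEAVES[0]])
    result = 1.0
    for s in range(n_sites):
        obs = {leaf: "ACGT".index(alignment[leaf][s]) for leaf in LEAVES}
        result *= site_likelihood_pruning(obs)
    return result
```

The brute-force version costs a number of terms exponential in the number of internal nodes, whereas pruning is linear in the number of nodes, which is why it is the version used per site in `likelihood`.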

Given a proper prior distribution with density Inline graphic imposed on branch lengths and tree topologies, the target posterior distributions have densities Inline graphic. We will also use Inline graphic to denote the unnormalized density Inline graphic Throughout the article, we assume that the phylogenetic trees of interest all have non-negative branch lengths bounded from above by Inline graphic and use Inline graphic to denote the set of all such trees.

An important goal of this article is to show that the collection of particles generated by the online phylogenetic sequential Monte Carlo (OPSMC) algorithm does a good job of approximating a sample from the posterior distribution, in the same way that a sample from an MCMC algorithm approximates a sample from the posterior. We would like to show that this approximation becomes arbitrarily good as the number of particles goes to infinity, that is, that the particle distribution converges to the posterior distribution. The type of convergence we will demonstrate is called weak convergence, which means here that for any integrable real-valued function Inline graphic defined on the set of trees, the weighted average of the value of the function Inline graphic over the trees Inline graphic and corresponding weights generated by the algorithm converges (with probability 1) to the posterior mean of Inline graphic. This convergence result also implies weak convergence of many key quantities of interest in phylogenetics: convergence of posterior probabilities on trees, branch lengths, and splits.

The proofs require a slightly more abstract measure-theoretic perspective. A measure in this context is a function from subsets of some set to the non-negative real numbers with nice properties, such as that the measure of a countable union of disjoint sets is the sum of the measures of the individual sets. A classic example is Lebesgue measure on subsets of real space, which simply gives the volume of each subset. We can extend this to Inline graphic, the product space of the space of all possible tree topologies and the space of all branch lengths Inline graphic, via the corresponding product measure, Inline graphic, that is uniform on trees and Euclidean on each branch length space. In our presentation, we will use the notation Inline graphic to denote the integration of a test function Inline graphic (defined on Inline graphic) with respect to the measure Inline graphic. Specifically, this integration can be computed by

graphic file with name M65.gif

for any Inline graphic and test function Inline graphic defined on Inline graphic, where Inline graphic is the number of different topologies of Inline graphic. A posterior distribution on Inline graphic can also be considered a measure, such that the value of the posterior measure on a subset of Inline graphic is simply the integral of the posterior on that subset.

In this article, we will also be considering measures on collections of discrete particles, which are simply non-negative weights on those particles: the measure applied to a collection of particles is the sum of those particle weights. This measure will be called the empirical measure. Thus, said in measure-theoretic terms, we would like to show that the empirical measure on the particles converges weakly to the posterior measure on Inline graphic.

Sequential Monte Carlo

SMC methods are designed to approximate a sequence of probability distributions changing through time. These probability distributions may be of increasing dimension or complexity. They track the sequence of probability distributions of interest by producing a discrete representation of the distribution Inline graphic at each generation Inline graphic through a random collection of weighted particles. In the online phylogenetic context, these particles are trees; after each generation, new sequences arrive, and the collection of particles is updated to represent the next target distribution.

To make it easier to keep track of all variables involved in the description of SMC, we also provide a table of variables in the Appendix. Figure 1 can also be used as a reference to the steps of the algorithm. It is worth noting that the algorithm starts with a set of weighted particles, which are illustrated at the center of Figure 1, right before the “resampling step.” While the details of the algorithms might vary, the main idea of SMC interspersed with MCMC sampling can be described as follows.

Figure 1. An overview of the OPSMC algorithm.

At the beginning of each generation Inline graphic, a list of Inline graphic particles Inline graphic is maintained along with a positive weight Inline graphic associated with each particle Inline graphic. (In our phylogenetic context, each particle at this point is a phylogenetic tree with Inline graphic taxa.) Using a valid SMC algorithm, the weights of these particles can be normalized to approximate the integral of any integrable test function Inline graphic with respect to the target distribution, which in our case is:

graphic file with name M83.gif (1)

This approximation will be formalized into a statement about weak convergence of the empirical particle distribution to the target (in our case the phylogenetic posterior).
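The particle approximation described above, in which normalized weights are used to estimate the integral of a test function, can be sketched in a few lines. The function name `weighted_estimate` and the toy particles are assumptions made for illustration; in the phylogenetic setting the particles would be trees and the test function could be, for example, an indicator of a split.

```python
def weighted_estimate(particles, weights, phi):
    """Self-normalized estimate of the expectation of phi under the target.

    The weights need not sum to one; they are normalized here.
    """
    total = sum(weights)
    return sum(w * phi(t) for t, w in zip(particles, weights)) / total

# Toy usage: particles are plain numbers and phi is the identity.
estimate = weighted_estimate([1.0, 2.0, 3.0], [1.0, 1.0, 2.0], lambda t: t)
```

Because the estimate divides by the sum of the weights, multiplying all weights by a constant leaves it unchanged, which is why unnormalized target densities suffice.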

A new list of Inline graphic particles is then created in three steps: resampling, Markov transition, and mutation.

Resampling step

The aim of the resampling step is to obtain an unweighted empirical distribution of the weighted measure Inline graphic by discarding samples with small weights and allowing samples with large weights to reproduce (Fig. 1, “resampling” step). Particle sampling is done in proportion to particle weight; thus, from an evolutionary perspective, this is a Wright–Fisher step using particle weights as fitnesses. After resampling, the weights on the particles have effectively been translated into their frequency, and so each particle is then assigned the same weight. A total of Inline graphic particles are sampled from the weights Inline graphic, and we denote the particles obtained after this step by Inline graphic.
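A minimal sketch of the resampling step follows, assuming plain multinomial resampling (drawing with replacement in proportion to weight), which matches the Wright–Fisher description above. The function name `resample` is hypothetical, and other schemes (stratified, systematic) are not shown.

```python
import random

def resample(particles, weights, rng):
    """Draw len(particles) particles with replacement, proportionally to weight.

    Returns the resampled particles and their new, equal weights.
    """
    k = len(particles)
    chosen = rng.choices(particles, weights=weights, k=k)
    return chosen, [1.0 / k] * k

rng = random.Random(0)
new_particles, new_weights = resample(["t1", "t2", "t3"], [0.0, 0.0, 5.0], rng)
```

In this toy call only the third particle has positive weight, so it is the only one that can reproduce, which is the extreme case of the sample depletion discussed in the next step.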

Markov transition

The scheme employed in the resampling step introduces some Monte Carlo error. Moreover, when the distribution of the weights from the previous generation is skewed, the particles having high importance weights might be oversampled. This results in a depletion of samples (or path degeneracy): after some generations, numerous particles are in fact sharing the same ancestor as the result of a “selective sweep” in the SMC algorithm.

An optional Markov transition step can be employed to alleviate this sampling bias, during which MCMC steps are run separately on each particle Inline graphic for a certain amount of time to obtain a new approximately independent sample Inline graphic (Fig. 1, “MH moves” step). These transitions are done via the Metropolis–Hastings kernel Inline graphic, which we will describe in more detail in the next section. Note that in our phylogenetic setting, the dimension of the state space does not change in the Resampling and Markov transition steps, and all particles have the same number of taxa as ones from the original list (Inline graphic taxa).

Mutation step

Finally, in the mutation step, new particles Inline graphic are created from the previous generation Inline graphic using a proposal distribution Inline graphic (that may be dependent on data). That is, for each Inline graphic, the particle Inline graphic is independently generated from the proposal distribution Inline graphic. The particles are then weighted by an appropriate weight function Inline graphic. If we assume further that for each state Inline graphic, there exists a unique state, denoted by Inline graphic, such that Inline graphic, then Inline graphic can be chosen as

graphic file with name M104.gif (2)

The new particles Inline graphic with their corresponding weights Inline graphic (where Inline graphic for Inline graphic) now represent the distribution Inline graphic and act as the input for the next generation.
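The weighting in the mutation step can be sketched as follows, under the assumption that the weight function takes the usual SMC importance-weight form: the new target density at the proposed state, divided by the product of the previous target density at its unique parent state and the proposal density of the move. The function names are hypothetical, and the computation is done in log space because phylogenetic likelihoods underflow easily.

```python
import math

def log_weight(log_target_next, log_target_prev, log_proposal):
    """Log importance weight of a proposed state.

    log_target_next: log (unnormalized) target density of the new state
    log_target_prev: log (unnormalized) target density of its parent state
    log_proposal:    log proposal density of moving from parent to new state
    """
    return log_target_next - log_target_prev - log_proposal

def normalize(log_weights):
    """Convert log weights to normalized weights without underflow."""
    m = max(log_weights)
    unnorm = [math.exp(lw - m) for lw in log_weights]
    z = sum(unnorm)
    return [u / z for u in unnorm]

lw = log_weight(math.log(0.2), math.log(0.5), math.log(0.1))
normalized = normalize([math.log(1.0), math.log(3.0)])
```

Subtracting the maximum log weight before exponentiating keeps at least one term equal to one, so the normalization stays well defined even when every log weight is very negative.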

In our phylogenetic context, this mutation step adds a new sequence to the tree: the new particle Inline graphic will be an Inline graphic-taxon tree obtained from the Inline graphic-taxon tree Inline graphic (Fig. 1, “propose addition of new sequence”). Then Inline graphic is simply Inline graphic but with the most recently added pendant edge removed. To add a sequence, the proposal strategy Inline graphic will specify: an edge Inline graphic to which the new pendant edge is added, the position Inline graphic on that edge to attach the new pendant edge, and the length Inline graphic of the pendant edge. Different ways of choosing Inline graphic lead to different sampling strategies and performances. Note that such an addition changes not only the distribution on the state space, but also the state space itself.

The process is then iterated until Inline graphic.
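To show how resampling and mutation fit together across generations, here is a toy sketch on a state space that grows by one coordinate per generation. Everything in it is a hypothetical stand-in: states are tuples of bits rather than trees, the target makes each bit equal to 1 with probability 0.3, the proposal appends a uniformly chosen bit, and the optional Markov transition step is omitted. The weight is the new target divided by the old target at the parent times the proposal probability.

```python
import random

P_ONE = 0.3  # toy target: each coordinate equals 1 with probability 0.3

def target(state):
    """Target density of a tuple of bits (product over coordinates)."""
    out = 1.0
    for bit in state:
        out *= P_ONE if bit == 1 else 1.0 - P_ONE
    return out

def propose(state, rng):
    """Extend the state by one uniform bit; return (new_state, proposal_prob)."""
    return state + (rng.randint(0, 1),), 0.5

def smc(n_generations, n_particles, rng):
    particles = [()] * n_particles  # generation 0: the empty state
    weights = [1.0] * n_particles
    for _ in range(n_generations):
        # Resampling: draw particles in proportion to their weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
        # (An optional MCMC move targeting the current distribution would go here.)
        # Mutation: extend each particle, then weight the extended particle.
        extended, weights = [], []
        for parent in particles:
            child, q = propose(parent, rng)
            extended.append(child)
            weights.append(target(child) / (target(parent) * q))
        particles = extended
    return particles, weights

particles, weights = smc(n_generations=5, n_particles=4000, rng=random.Random(1))
# Weighted estimate of the probability that the newest coordinate equals 1.
estimate = sum(w * s[-1] for s, w in zip(particles, weights)) / sum(weights)
```

Although the proposal is uniform and therefore wrong for the target, the weights correct for the mismatch, so `estimate` should be close to 0.3 for a large number of particles.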

Online Phylogenetic Inference via Sequential Monte Carlo

Next, we more fully develop OPSMC and isolate technical conditions that will ensure a correct sampler. For these conditions, we make a distinction between Criteria, which are a more general set of conditions that imply consistency of OPSMC, and stronger Assumptions, which guarantee that the Criteria hold and enable further analyses of the sampler.

In contrast to the traditional setting of SMC, for OPSMC when the number of leaves Inline graphic of the particles increases, not only does the local dimension of the space Inline graphic increase linearly, but the number of different topologies in Inline graphic also increases superexponentially in Inline graphic. Careful constructions of the proposal distribution Inline graphic, which will build Inline graphic-taxon trees out of Inline graphic-taxon trees, and the Markov transition kernel Inline graphic are essential to cope with this increasing complexity. This goes beyond simply satisfying the criteria guaranteeing a correct sampler (Fourment et al., 2017).

General Criteria for a Consistent OPSMC Sampler

The proposal distributions Inline graphic will be designed in such a way that the following criterion holds.

Criterion 1.

At every generation of the OPSMC sampling process, for any two trees Inline graphic and Inline graphic in the tree space Inline graphic, the proposal density Inline graphic satisfies Inline graphic if and only if Inline graphic can be obtained from Inline graphic by removing the taxon Inline graphic and its corresponding edge.

This formulation is analogous to the definition of “covering” from Wang et al. (2015), although it is distinct in the setting of online inference. In either case, for every tree Inline graphic, there exists a unique tree Inline graphic in Inline graphic such that Inline graphic and thus a weight function of the form (2) can be used.

As we mentioned in the previous section, to obtain an Inline graphic-taxon tree from an Inline graphic-taxon tree, a proposal strategy Inline graphic must specify:

  1. an edge Inline graphic to which the new pendant edge is added,

  2. the position Inline graphic on that edge to attach the new pendant edge, and

  3. the length Inline graphic of the pendant edge.

The position Inline graphic on an edge of a tree will be specified by its distal length, which is the distance from the attachment location to the end of the edge that is farthest away from the root of the tree.
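The three choices listed above (edge, distal position, pendant length) determine the new tree, and removing the newest pendant edge recovers the old one. The sketch below illustrates this with a hypothetical representation: a tree is a dictionary mapping directed edges `(parent, child)` to lengths, with `parent` the end nearer the designated root, and the new internal node is labeled `("joint", new_leaf)`. None of these names come from the source.

```python
def attach(edges, edge, distal, pendant, new_leaf):
    """Attach new_leaf to `edge` at distance `distal` from its far (child) end."""
    parent, child = edge
    length = edges[edge]
    if not 0.0 <= distal <= length:
        raise ValueError("distal position must lie on the chosen edge")
    joint = ("joint", new_leaf)  # hypothetical label for the new internal node
    out = dict(edges)
    del out[edge]
    out[(parent, joint)] = length - distal  # part of the edge nearer the root
    out[(joint, child)] = distal            # part of the edge farther from the root
    out[(joint, new_leaf)] = pendant        # the new pendant edge
    return out

def detach(edges, new_leaf):
    """Inverse of attach: remove the pendant edge of new_leaf and merge the split edge."""
    joint = ("joint", new_leaf)
    out = dict(edges)
    del out[(joint, new_leaf)]
    (parent,) = [u for (u, v) in out if v == joint]
    (child,) = [v for (u, v) in out if u == joint]
    out[(parent, child)] = out.pop((parent, joint)) + out.pop((joint, child))
    return out

three_taxa = {("r", "A"): 0.1, ("r", "B"): 0.2, ("r", "C"): 0.3}
four_taxa = attach(three_taxa, ("r", "B"), distal=0.05, pendant=0.4, new_leaf="D")
restored = detach(four_taxa, "D")
```

Each attachment adds two edges, consistent with an unrooted binary tree on n taxa having 2n − 3 edges, and `detach` plays the role of the map from a tree to its unique predecessor.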

Besides the SMC proposal strategy Inline graphic, an appropriate Markov transition kernel Inline graphic can increase the effectiveness of an OPSMC algorithm. It is worth noting that the problem of sample depletion is even more severe for OPSMC than for typical applications of SMC, since after each generation, the sampling space actually expands in dimensionality and complexity. To alleviate this sampling bias, MCMC steps can be run separately on each particle for a certain amount of time to obtain new independent samples. We require the following criterion, which is as expected for any Markov transition kernel used in standard MCMC.

Criterion 2.

At every generation of the OPSMC sampling process, if a Markov transition kernel is used, the Markov transition kernel Inline graphic has Inline graphic as its stationary distribution.
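Criterion 2 can be checked exactly on a small discrete example. The sketch below is a toy illustration, not a phylogenetic kernel: it builds the transition matrix of a Metropolis–Hastings chain with a symmetric nearest-neighbour proposal on four states and verifies that one step leaves the target distribution unchanged. The function name `mh_kernel` and the target values are assumptions.

```python
def mh_kernel(target):
    """Metropolis-Hastings transition matrix on states 0..m-1.

    Proposal: move to i-1 or i+1 with probability 1/2 each (symmetric);
    proposals outside the range, or rejected ones, leave the state unchanged.
    """
    m = len(target)
    K = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in (i - 1, i + 1):
            if 0 <= j < m:
                K[i][j] = 0.5 * min(1.0, target[j] / target[i])
        K[i][i] = 1.0 - sum(K[i])  # probability of staying put
    return K

target = [0.1, 0.4, 0.3, 0.2]
K = mh_kernel(target)
# Distribution after one step when starting from the target itself.
after_one_step = [sum(target[i] * K[i][j] for i in range(4)) for j in range(4)]
```

Stationarity follows from detailed balance: the probability flow between neighbouring states i and j is half the smaller of the two target values in both directions.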

As we will see later in the proof of consistency of OPSMC, Criterion 2 is the only assumption required of the Markov transition kernel if it is applied. This leaves us with a great degree of freedom to improve the efficiency of the sampling algorithm without damaging its theoretical properties. For example, one can use global information provided by the population of particles, such as ESS (Beskos et al., 2014), to guide the proposal, or to define a transition kernel on the whole set (or some subset) of particles (Andrieu et al., 2001). In the context of phylogenetics, we can design a sampler that recognizes subtrees that have been insufficiently sampled, and samples more particles to improve the ESS within such regions. Similarly, one can use samplers that rearrange the tree structure in the neighborhood of newly added pendant edges.

We will prove in Theorem 10 that when Criteria 1 and 2 are satisfied, then the OPSMC sampler is consistent.

Designing an Effective OPSMC Sampler

Although Criteria 1 and 2 guarantee the consistency of OPSMC, they do not lead to any insights about the performance of the OPSMC sampler. From a practical point of view, one is interested in quantifying some measure of efficiency of an MCMC algorithm. This question is even more interesting for SMC (and in particular, for phylogenetic SMC algorithms), since in the general case it is known that SMC-type algorithms may suffer from the curse of dimensionality: when the dimension of the problem increases, the number of particles must increase exponentially to maintain a constant effective sample size (Chopin, 2004; Bengtsson et al., 2008; Bickel et al., 2008; Snyder et al., 2008).
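This part of the article does not restate a formula for the ESS. A common estimate, assumed here for illustration, is Kish's effective sample size: the squared sum of the weights divided by the sum of the squared weights. The function name is hypothetical.

```python
def effective_sample_size(weights):
    """Kish's effective sample size of a set of non-negative particle weights."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)

ess_uniform = effective_sample_size([1.0, 1.0, 1.0, 1.0])      # all particles count
ess_degenerate = effective_sample_size([1.0, 0.0, 0.0, 0.0])   # one particle dominates
```

Under this estimate the ESS ranges from 1, when a single particle carries all the weight, to the number of particles, when the weights are equal, and it is unaffected by rescaling the weights.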

In this article, we are interested in how the ESS of the OPSMC sampler behaves in the limit of a large number of particles. This task is challenging because we need to quantify how the target distribution changes with newly added sequences, and how the proposals should be designed to cope with these changes in an efficient way. Throughout this section, we will also derive several assumptions, imposed on both the target distribution and the design of the proposal, to enable such analyses.

First we assume the prior distribution does not change drastically when a new taxon is added:

Assumption 3.

There exists Inline graphic independent of Inline graphic, and Inline graphic such that

Assumption 3.

and Inline graphic for all Inline graphic.

In practice, the most common prior distribution on unrooted trees is the uniform distribution on topologies with an identical prior on each of the branches, which satisfies Assumption 3 with Inline graphic.

As we will discuss in later sections, to make sure that the proposals can capture the posterior distributions Inline graphic efficiently, some regularity conditions on Inline graphic are also necessary. These conditions are formalized in terms of a lower bound on the posterior expectation of Inline graphic, the average branch length of Inline graphic for a given tree Inline graphic:

Assumption 4

(Assumption on the average branch length). There exist positive constants Inline graphic (independent of Inline graphic) such that for each Inline graphic

Assumption 4

where Inline graphic is the number of tree topologies with Inline graphic taxa, and Inline graphic denotes the average of branch lengths of the tree Inline graphic.

In other words, we assume that on average (with respect to the posterior distribution), the average branch lengths of the trees are bounded from below by Inline graphic. While the condition is technical, it can be verified in many realistic cases in phylogenetic inference. For example, in the absence of data, the posterior is just the prior, and Assumption 4 holds for any prior that puts equal weight on topologies and selects branch lengths from a distribution with nonzero mean. Similarly, if the target distributions concentrate on a single “correct” tree generated from a Yule process with rate parameter Inline graphic, then the average branch length converges to Inline graphic.

As we discussed above, to quantify the efficiency of OPSMC, we need to provide some estimates on how the target distribution changes with newly added sequences. We will discuss in detail below that the most extreme changes on the target distributions happen when some of the target distribution concentrates on a set of trees with short edges, for which a single new sequence can shake up all previous beliefs about the phylogenies of interest. Assumption 4 rules out such pathological behaviors and provides a foundation for analyzing the ESS, although it does impose some limitations. For instance, if we think of a random process generating Inline graphic taxon trees in a fixed time (e.g., since the dawn of life on earth) then for most natural models, the average branch length might go to zero as Inline graphic goes to infinity. However, we note that both Assumptions 3 and 4 can be further relaxed to address more general cases (details in the section “Effective sample sizes of online phylogenetic SMC”).

Throughout the article, we will investigate two different classes of sampling schemes: length-based proposals and likelihood-based proposals.

Length-based proposals

For length-based proposals:

  1. the edge Inline graphic is chosen from a multinomial distribution weighted by length of the edges,

  2. the distal position Inline graphic is selected from a distribution Inline graphic (with density Inline graphic) across the edge length, and

  3. the pendant length Inline graphic is sampled from a distribution Inline graphic (with density Inline graphic) with support contained in Inline graphic.

For example, if these Inline graphic distributions are uniform, we obtain a uniform prior on attachment locations across the tree.
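A minimal sketch of a length-based proposal with uniform distributions follows. The names are hypothetical, the tree is reduced to a dictionary of edge lengths, and `max_pendant` stands for the upper bound on branch lengths assumed throughout the article. The function also returns the proposal density of the sampled triple, which is needed when weighting particles.

```python
import random

def length_based_proposal(edge_lengths, max_pendant, rng):
    """Sample (edge, distal position, pendant length) and the proposal density."""
    edges = list(edge_lengths)
    lengths = [edge_lengths[e] for e in edges]
    # 1. Choose an edge with probability proportional to its length.
    edge = rng.choices(edges, weights=lengths, k=1)[0]
    # 2. Choose the distal position uniformly along that edge.
    distal = rng.uniform(0.0, edge_lengths[edge])
    # 3. Choose the pendant length uniformly on [0, max_pendant].
    pendant = rng.uniform(0.0, max_pendant)
    # Density = P(edge) * density(distal | edge) * density(pendant).
    density = (edge_lengths[edge] / sum(lengths)) \
        * (1.0 / edge_lengths[edge]) * (1.0 / max_pendant)
    return edge, distal, pendant, density

tree = {"e1": 0.1, "e2": 0.2, "e3": 0.7}
edge, distal, pendant, density = length_based_proposal(tree, 2.0, random.Random(0))
```

The edge length cancels in the density, leaving one over the product of the total tree length and `max_pendant`, which is the sense in which uniform choices give a uniform distribution on attachment locations across the tree.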

We assume that

Assumption 5.

The densities Inline graphic of the distal position on edge Inline graphic and Inline graphic of the pendant edge lengths are continuous on Inline graphic and Inline graphic, respectively, and satisfy

Assumption 5.

where Inline graphic denotes the length of edge Inline graphic and Inline graphic is independent of Inline graphic.

One way to look at this assumption is that it is designed to guarantee Criterion 1: given two trees Inline graphic and Inline graphic such that Inline graphic can be obtained from Inline graphic by removing the taxon Inline graphic and its corresponding edge, we want to guarantee that Inline graphic is positive. In the length-based proposal, this corresponds to guaranteeing that the proposal does not “miss” any possible choice of the edge, the distal position, and the pendant edge length, and one way to do so is to assume that the densities Inline graphic and Inline graphic are bounded away from zero. The assumption is clearly implied by (and so is weaker than) a uniform positive lower bound on the densities Inline graphic and Inline graphic. The constant Inline graphic in the assumption quantifies how “spread out” the distribution is on Inline graphic and plays an important role in the efficiency of the algorithm.

Example 6.

For any density function Inline graphic on Inline graphic such that Inline graphic is integrable, the family of proposals Inline graphic satisfies Assumption 5.

Likelihood-based proposals

We denote by Inline graphic the tree obtained by adding an edge of length Inline graphic to edge Inline graphic of the tree Inline graphic at distal position Inline graphic. Any tree Inline graphic can be represented by Inline graphic, where Inline graphic is the edge on which the pendant edge containing the most recent taxon is attached. In the likelihood-based approach, the edge Inline graphic (from the tree Inline graphic) is chosen from a multinomial distribution weighted by a user-defined likelihood-based utility function Inline graphic. Likelihood-based proposals are informed by data and are capable of capturing the posterior distribution more efficiently, but with an additional cost for computing the likelihoods.

In a Bayesian inference framework, the most natural utility function is the average likelihood utility function

graphic file with name M225.gif

In this setting, for a fixed tree Inline graphic, the utility function on each edge Inline graphic of the tree is computed by Inline graphic. The utility quantifies the expected value of the unnormalized posterior if we indeed attach the new pendant edge to edge Inline graphic. The sampler then uses these weights to choose the edge to attach the new sequence. By doing so, the proposals obtain a set of “better” trees that are more likely to survive in the subsequent generations.

Many other likelihood-based utility functions (that are quicker to compute) can be defined. We will assume that the likelihood-based utility function Inline graphic satisfies the following assumption.

Assumption 7.

There exist Inline graphic independent of Inline graphic such that

Assumption 7.

for all Inline graphic.

This assumption ensures that the sampler is efficient as long as the utility function Inline graphic is “comparable” to the average likelihood utility function up to some multiplicative constants Inline graphic and Inline graphic. By analyzing OPSMC under this assumption rather than the average likelihood utility function itself, we have the option to choose other (cheaper) utility functions for the proposal, the most important of which is the maximum a posteriori (MAP) utility function

graphic file with name M238.gif

Here, we approximate an integral by the MAP multiplied by the area of integration (in this case, Inline graphic). This utility function can be computed via a simple optimization procedure that can be done quickly and efficiently in computational phylogenetics software and avoids the burden of sampling the posterior distribution required to compute the average likelihood utility function. The following lemma (proven in the Appendix) establishes that the MAP utility function also satisfies Assumption 7.
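The relationship between the two utilities can be illustrated numerically. The sketch below makes several assumptions that are not in the source: the average likelihood utility is taken to be the integral of the unnormalized posterior over the rectangle of distal positions and pendant lengths, the posterior is replaced by a hypothetical smooth function `toy_posterior`, and the MAP is located by a grid search rather than by the optimizer a phylogenetics package would use.

```python
import math

def toy_posterior(x, y):
    # Hypothetical stand-in for the unnormalized posterior of the tree obtained
    # by attaching at distal position x with pendant length y.
    return math.exp(-((x - 0.03) ** 2) / 0.002 - ((y - 0.2) ** 2) / 0.05)

def utilities(edge_length, max_pendant, grid=200):
    """Return (average-type utility, MAP-type utility) for one edge."""
    dx, dy = edge_length / grid, max_pendant / grid
    values = [toy_posterior((i + 0.5) * dx, (j + 0.5) * dy)
              for i in range(grid) for j in range(grid)]
    average_utility = sum(values) * dx * dy                 # midpoint-rule integral
    map_utility = max(values) * edge_length * max_pendant   # maximum times area
    return average_utility, map_utility

avg_u, map_u = utilities(edge_length=0.1, max_pendant=1.0)
```

On the same grid the MAP-type value can never fall below the integral estimate, since every grid value is at most the maximum. A bound in the other direction is what requires an argument about how sharply the posterior can peak.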

Lemma 8.

There exists Inline graphic independent of Inline graphic such that

Lemma 8.

for all Inline graphic.

In a similar manner, the distributions Inline graphic and Inline graphic might also be guided by information about the likelihood function. As for the length-based proposal, we assume the following conditions on the distal position and pendant edge length proposals for the likelihood-based approach.

Assumption 9.

The densities Inline graphic and Inline graphic are absolutely continuous with respect to the Lebesgue measure on Inline graphic and Inline graphic, respectively. Moreover, there exists Inline graphic independent of Inline graphic such that

Assumption 9.

This condition plays the same role as Assumption 5 in the length-based proposal: to guarantee Criterion 1 by ensuring that the densities Inline graphic and Inline graphic are bounded away from zero. Since the likelihood-based proposals are more difficult to analyze than the length-based one, Assumption 9 is a bit stronger than Assumption 5. However, we note that the density functions from Example 6 satisfy both assumptions.

Consistency of Online Phylogenetic SMC

We recall that the OPSMC sampler maintains a list of Inline graphic particles Inline graphic with a positive weight Inline graphic associated with each particle Inline graphic. We would like to evaluate whether a sample of particles provides a good approximation to the posterior distribution in the sense of (1), which is made rigorous in measure-theoretic terms as follows. We form the normalized empirical measure Inline graphic and define the integral Inline graphic of a test function Inline graphic with respect to this measure by

graphic file with name M262.gif

We will show that the normalized empirical measure Inline graphic converges weakly to Inline graphic; that is, the integral Inline graphic of a test function Inline graphic with respect to this measure converges with probability 1 to the integral of Inline graphic with respect to the posterior distribution.
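As a concrete illustration of integrating a test function against the normalized empirical measure, the following minimal Python sketch computes the weighted average of a test function over a particle list. The names `empirical_integral`, `particles`, `weights`, and `phi` are hypothetical, and plain numbers stand in for trees.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def empirical_integral(
    particles: Sequence[T],
    weights: Sequence[float],
    phi: Callable[[T], float],
) -> float:
    """Integrate phi against the normalized empirical measure of weighted particles."""
    if len(particles) != len(weights):
        raise ValueError("particles and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("total weight must be positive")
    # Weighted sum of phi over the particles, divided by the total mass.
    return sum(w * phi(s) for s, w in zip(particles, weights)) / total


# Toy usage (assumption: numbers stand in for trees, phi is the identity).
estimate = empirical_integral([1.0, 2.0, 3.0], [1.0, 1.0, 2.0], lambda s: s)
```

Dividing by the total weight means the weights never need to be normalized in advance.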

In this section, we will demonstrate this weak convergence by induction on the number of taxa Inline graphic; that is, for every Inline graphic, assuming that Inline graphic, we will prove that Inline graphic (where Inline graphic means weak convergence). We note that although the measures mentioned above are indexed by Inline graphic, they implicitly depend on the number of particles from the previous generations. Thus, the convergence should be interpreted as holding when the number of particles in every generation approaches infinity.

We note that when Inline graphic, the set of all rooted trees with no taxa consists of a single tree Inline graphic. Thus, if we use this single tree as the ensemble of particles at Inline graphic, then Inline graphic is precisely Inline graphic. Alternatively, we can start with Inline graphic and use an MCMC method to create an ensemble of particles with stationary distribution Inline graphic. In either case, an induction argument gives the main theorem:

Theorem 10

(Consistency). If Criteria 1 and 2 are satisfied and the sampler starts at Inline graphic with a list consisting of a single rooted tree with no taxa, or at Inline graphic with an ensemble of particles created by an ergodic MCMC method with stationary distribution Inline graphic, then Inline graphic converges weakly to the posterior Inline graphic. That is, for every integrable test function Inline graphic and Inline graphic

Theorem 10

as Inline graphic, where a.s. denotes almost sure convergence (i.e., convergence with probability 1).

Theorem 10 shows that approximation by the OPSMC sampler becomes arbitrarily good as the number of particles goes to infinity.

Remark 11.

Theorem 10 describes the asymptotic behavior of OPSMC in the limit when the number of particles of all generations (up to generation Inline graphic) approaches infinity, and guarantees that the algorithm is consistent regardless of the relative rates with which the Inline graphic approach infinity. For example, in the traditional setting where the number of particles is the same in every generation, that is, Inline graphic, we also have

Remark 11.

as Inline graphic.

However, it is worth pointing out that because the sampler is built sequentially, to make a prediction with a given accuracy at generation Inline graphic with finite data, we need to control the number of particles from all previous generations. These issues will be further addressed in the latter part of the article, where we uniformly bound the ESS of the sampler as the generations proceed.

Characterizing Changes in the Likelihood Landscapes When New Sequences Arrive

Although the consistency of OPSMC is guaranteed, and informative OPSMC samplers can be developed by changing the Markov transition kernels, its applicability is constrained by an implicit assumption: the distance between target distributions of consecutive generations is not too large. Since SMC methods are built upon the idea of recycling particles from one generation to explore the target distribution of the next generation, it is obvious that one would never be able to design an efficient SMC sampler if Inline graphic and Inline graphic are very different from one another.

While a condition on minor changes in the target distributions may be easy to verify in some applications, it is not straightforward in the context of phylogenetic inference. A similar question on how the “optimal” trees (under some appropriate measure of optimality) change has been studied extensively in the field, with negative results for almost all regular measures of optimality (Heath et al., 2008; Cueto and Matsen, 2011). However, to the best of our knowledge, no work has been done detailing how phylogenetic likelihood landscapes change when new sequences arrive.

In this section, we will establish that under some minor regularity conditions on the distribution described in the previous sections, the relative changes between target distributions from consecutive generations are uniformly bounded. This result enables us to provide a lower bound on the effective sample size of OPSMC algorithms in the next section.

Lemma 12.

Consider an arbitrary tree Inline graphic obtained from the parent tree Inline graphic by choosing edge Inline graphic, distal position Inline graphic and pendant length Inline graphic. Denote

Lemma 12.

and

Lemma 12.

Recalling that our sequences are of length Inline graphic and assuming that Inline graphic for all Inline graphic, we have

Lemma 12.

Sketch of proof.

Using the 1D formulation of the phylogenetic likelihood function derived in Dinh and Matsen (2016), we can prove that

Sketch of proof. (3)

Similarly, we have Inline graphic for all Inline graphic.

For any tree Inline graphic with Inline graphic taxa, we let Inline graphic and denote by Inline graphic and Inline graphic the chosen edge for attaching the last taxon, the distal length, and the length of the pendant edge, respectively. We deduce that

Sketch of proof.

Recall that Inline graphic is the average branch length of Inline graphic. Using the fact that for a fixed tree Inline graphic, Inline graphic and Inline graphic, we have

Sketch of proof.

Recalling that Inline graphic, we deduce that

Sketch of proof.

This completes the proof. □

The main idea of Lemma 12 is to bound the posterior values of a tree Inline graphic by that of its parent Inline graphic with an explicit constant independent of Inline graphic. While the proof of the lemma is technical, its main insights can be explained from the observation that the most extreme changes of the posteriors happen when either (1) the new pendant edge is small, or (2) the edge lengths of the parent tree Inline graphic are small.

Indeed, in the simplest case where both priors on the sets of trees with Inline graphic and Inline graphic-taxa are uniform, we can choose Inline graphic. Equation (3) and the subsequent equation establish that the ratio between the likelihood values of tree Inline graphic and its parent Inline graphic can be bounded from above as long as the length of the pendant edge is not too small. Similarly, the ratio between the total masses of Inline graphic and Inline graphic can also be controlled if the target distribution Inline graphic does not concentrate on a set of trees with short edges. This motivates Assumption 4 to provide a lower bound on the total edge lengths and to rule out such pathological behaviors.

ESSs of Online Phylogenetic SMC

In this section, we are interested in the asymptotic behavior of OPSMC in the limit of large Inline graphic, that is, when the number of particles of the sampler approaches infinity. This asymptotic behavior is illustrated via estimates of the ESS of the sampler with large numbers of particles. We note that although there are several studies on the stability of SMC as the generations proceed, most of them focus on cases where the sequence of target distributions has a common state space of fixed dimension (Del Moral, 1998; Douc and Moulines, 2008; Künsch, 2005; Oudjane and Rubenthaler, 2005; Del Moral et al., 2009; Beskos et al., 2014). In general, establishing stability bounds for SMC requires imposing some conditions on the effect of data at any generation Inline graphic on the target distribution at generation Inline graphic (Crisan and Doucet, 2002; Chopin, 2004; Doucet and Johansen, 2009). Lemma 12 helps validate a condition of this type.

In this section, we use the ESS (Kong et al., 1994; Liu and Chen, 1995) of the particles at generation Inline graphic as a measure of the sampler’s efficiency. The ESS is computed as

graphic file with name M342.gif

The detailed derivation of the formula is provided in Kong et al. (1994) and Liu and Chen (1995), but a simple intuition is as follows: if the particles all have roughly the same weight, then the ESS is approximately equal to the number of particles; on the other hand, if Inline graphic of the particles share almost all of the weight equally, Inline graphic. This formulation of the ESS is commonly used for adaptive resampling, in which additional resampling steps are performed when the ESS drops below a certain threshold. We emphasize that, as with other measures of efficiency of MCMC methods, a good ESS is necessary but not sufficient to ensure a good-quality posterior. From the definition, it is also clear that the ESS cannot exceed the number of particles.
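The intuition above can be checked numerically. The following minimal Python sketch assumes the usual form of the ESS from Kong et al. (1994), namely the squared sum of the weights divided by the sum of the squared weights. The function names and the threshold fraction of 0.5 are hypothetical choices made for illustration.

```python
from typing import Sequence


def effective_sample_size(weights: Sequence[float]) -> float:
    """ESS in the standard form (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    if total_sq == 0:
        raise ValueError("at least one weight must be nonzero")
    return total * total / total_sq


def needs_resampling(weights: Sequence[float], fraction: float = 0.5) -> bool:
    """Hypothetical adaptive-resampling rule: resample when ESS < fraction * K."""
    return effective_sample_size(weights) < fraction * len(weights)


equal = [0.5] * 10                     # all particles carry the same weight
concentrated = [1.0] * 3 + [0.0] * 7   # three particles carry all of the weight
```

With equal weights the ESS equals the number of particles, and when three particles share all of the weight equally the ESS is three, so only the second list would trigger resampling under this rule.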

The following result, proven in the Appendix, enables us to estimate the asymptotic behavior of the sample’s ESS in various settings.

Theorem 13.

In the limit as the number of particles approaches infinity, we have

Theorem 13.

with probability 1.

This asymptotic estimate and the results on likelihood landscapes from the previous section allow us to prove the following theorem (see Appendix for proof).

Theorem 14

(ESS of OPSMC for likelihood-based proposals). If Assumptions 3, 4, 7, and 9 hold, then with probability 1, there exists Inline graphic independent of Inline graphic such that Inline graphic. That is, the effective sample size of an OPSMC sampler with likelihood-based proposals is bounded below by a constant multiple of the number of particles. Moreover, if Assumption 4 does not hold, the ESS of OPSMC algorithms decays at most linearly as the dimension (the number of taxa) increases.

We also have similar estimates for length-based proposals (see Appendix for proof):

Theorem 15

(ESS of OPSMC for length-based proposals). If Assumptions 3, 4, and 5 hold, then with probability 1, the ESS of OPSMC with length-based proposals is bounded below by a constant multiple of the number of particles. Moreover, if Assumption 4 does not hold, the ESS of OPSMC algorithms decays at most quadratically as the dimension (the number of taxa) increases.

In summary, we are able to prove that in many settings, the ESS of OPSMC is bounded from below. These results are surprising, since in the general case it is known that SMC-type algorithms may suffer from the curse of dimensionality: when the dimension of the problem increases, the number of particles must increase exponentially to maintain a constant ESS (Chopin, 2004; Bengtsson et al., 2008; Bickel et al., 2008; Snyder et al., 2008). We further note that although the Markov transition kernels Inline graphic after the mutation step are not involved in the theoretical analysis of the ESS in this section, the results hold true for all kernels that satisfy Criterion 2.

Discussion

In this article, we establish foundations for OPSMC, including essential theoretical convergence results. We prove that under some mild regularity conditions and with carefully constructed proposals, the OPSMC sampling algorithm is consistent. This includes relaxing the condition used in Bouchard-Côté et al. (2012), in which the authors assume that the weights of the particles are bounded from above. We then investigate two different classes of sampling schemes for online phylogenetic inference: length-based proposals and likelihood-based proposals. In both cases, we show the ESS to be bounded below by a multiple of the number of particles.

The consistency and convergence results in this article apply to a variety of sampling strategies. One possibility would be for an algorithm to use a large number of particles, directly using the SMC machinery to approximate the posterior. Alternatively, the SMC part of the sampler could be quite limited, resulting in an algorithm which combines many independent parallel MCMC runs in a principled way. As described above, the SMC portion of the algorithm enables MCMC transition kernels that would normally be disallowed by the requirement of preserving detailed balance. For example, one could use a kernel that focuses effort around the part of the tree which has recently been disturbed by adding a new sequence.

In the future we will develop efficient and practical implementations of these ideas, and a first step in this direction has already been made (Fourment et al., 2017). Many challenges remain. For example, the exclusive focus of this article has been on the tree structure, consisting of topology and branch lengths. However, Bayesian phylogenetics algorithms typically coestimate mutation model parameters along with the tree structures. Although proposals for other model parameters can be obtained by particle MCMC (Andrieu et al., 2010), we have not attempted to incorporate it into the current SMC framework. In addition, we note that the input for this type of phylogenetics algorithm consists of a multiple sequence alignment (MSA) of many sequences, rather than just the individual sequences themselves. This raises the question of how to maintain an up-to-date MSA. Programs exist to add sequences into existing MSAs (Caporaso et al., 2010; Katoh and Standley, 2013), although from a statistical perspective, it could be preferable to jointly estimate a sequence alignment and tree posterior (Suchard and Redelings, 2006). It is an open question how that could be done in an online fashion, although in principle it could be facilitated by some modifications to the sequence addition proposals described here.

Funding

This work was funded by National Science Foundation grants [DMS-1223057, CISE-1564137 to V.D. and F.A.M.]; National Institutes of Health grant [U54-GM111274 to V.D. and F.A.M.]; and a Faculty Scholar grant from the Howard Hughes Medical Institute and the Simons Foundation to F.A.M.

Appendix

Notation and variables

Inline graphic Particle in generation Inline graphic, weighted by Inline graphic
Inline graphic Particle obtained after the “resampling step,” unweighted
Inline graphic Particle obtained after the “Markov transition step,” unweighted
Inline graphic Particle in generation Inline graphic, obtained after the “mutation step,” weighted by Inline graphic
Inline graphic Unnormalized measure, induced by the particles Inline graphic
Inline graphic Unnormalized measure, induced by the particles Inline graphic
Inline graphic Unnormalized measure, induced by the particles Inline graphic
Inline graphic Unnormalized measure, induced by the particles Inline graphic
Inline graphic The posterior distribution at generation Inline graphic
Inline graphic The proposal distribution at generation Inline graphic
Inline graphic The Markov transition kernel at generation Inline graphic
Inline graphic Proposal distribution of the distal position
Inline graphic Proposal distribution of the pendant edge length

Integration on Inline graphic

As described in the introduction, to do integration on Inline graphic, the product space of the space of all possible tree topologies and the space of all branch lengths Inline graphic, we consider the corresponding product measure, Inline graphic, that is uniform on trees and Euclidean on each branch length space. Integration using this measure is then just the average of the integrals for each topology:

graphic file with name M378.gif

for any Inline graphic and test function Inline graphic defined on Inline graphic, where Inline graphic is the number of different topologies of Inline graphic.

Consistency of Online Phylogenetic SMC

For clarity, we restate below the main steps of OPSMC and introduce some important notation. We recall that at the beginning of each generation Inline graphic, a list of Inline graphic particles Inline graphic is maintained along with a positive weight Inline graphic associated with each particle Inline graphic. These weighted particles form an unnormalized measure and a corresponding normalized empirical measure

graphic file with name M389.gif

such that Inline graphic approximates Inline graphic. A new list of Inline graphic particles is then created in three steps: resampling, Markov transition and mutation.

Resampling step (the “resampling” step illustrated in Fig. 1).

A total of Inline graphic particles are sampled from the distribution Inline graphic, and after resampling we obtain the unweighted measure

graphic file with name M395.gif

where Inline graphic is the multiplicity of particle Inline graphic (i.e., the number of times Inline graphic arose in the sample), sampled from a multinomial distribution parameterized by the weights Inline graphic. We denote the particles obtained after this step by Inline graphic.
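The resampling step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the string labels are placeholders standing in for trees, and `multinomial_resample` is a hypothetical helper name. It relies on `random.Random.choices`, which draws independently with replacement with probability proportional to the supplied weights.

```python
import random
from collections import Counter
from typing import Hashable, List, Sequence


def multinomial_resample(
    particles: Sequence[Hashable],
    weights: Sequence[float],
    n_out: int,
    rng: random.Random,
) -> List[Hashable]:
    """Draw n_out particles independently, each with probability proportional to its weight."""
    return rng.choices(particles, weights=weights, k=n_out)


rng = random.Random(0)
particles = ["t1", "t2", "t3"]        # placeholder labels standing in for trees
weights = [0.7, 0.3, 0.0]             # unnormalized weights are also accepted
resampled = multinomial_resample(particles, weights, 1000, rng)
multiplicities = Counter(resampled)   # number of times each particle was drawn
```

The resulting list is unweighted: each particle's former weight is now reflected only in its multiplicity.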

Markov transition (the “MH moves” step illustrated in Fig. 1). MCMC steps can be run separately on each particle Inline graphic for a certain amount of time to obtain a new approximately independent sample Inline graphic with (unweighted) measure denoted Inline graphic.
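A minimal Python sketch of such per-particle moves follows. It is not the tree-space kernel used by OPSMC: the state space is the integers, the proposal is a symmetric random walk, and `metropolis_step` and `log_target` are hypothetical names with a toy target chosen only for illustration.

```python
import math
import random
from typing import Callable


def metropolis_step(
    state: int,
    log_target: Callable[[int], float],
    rng: random.Random,
) -> int:
    """One Metropolis move with a symmetric +/-1 random-walk proposal."""
    proposal = state + rng.choice((-1, 1))
    log_ratio = log_target(proposal) - log_target(state)
    # 1 - random() lies in (0, 1], so the logarithm is always defined.
    if math.log(1.0 - rng.random()) <= log_ratio:
        return proposal
    return state


def log_target(x: int) -> float:
    """Toy unnormalized log-density supported on {0, ..., 4}."""
    return -0.5 * (x - 2) ** 2 if 0 <= x <= 4 else -math.inf


rng = random.Random(1)
particles = [2, 2, 0, 4]          # unweighted particles after resampling
for _ in range(50):               # run a short chain on each particle separately
    particles = [metropolis_step(s, log_target, rng) for s in particles]
```

Because each particle is moved independently, duplicates created by resampling tend to spread out while the target distribution is left unchanged.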

Mutation step (the “propose addition of new sequence” step illustrated in Fig. 1).

In the mutation step, new particles Inline graphic are created from a proposal distribution Inline graphic and are weighted by the weight function Inline graphic defined by equation (2). The new particles Inline graphic with their corresponding weights now represent the distribution Inline graphic and act as the input for the next generation.

The proposal Inline graphic is assumed to be normalized, and the unnormalized measure over the particles Inline graphic can be computed by

graphic file with name M411.gif

for any measurable set Inline graphic.
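The mutation step can be illustrated with a toy Python sketch. Several assumptions are made here and none is taken from the article: a "tree" is reduced to a tuple of branch lengths, the proposal only appends an exponentially distributed pendant length, the target `log_gamma` is invented for the example, and the weight is assumed to take the common SMC form of the new unnormalized target divided by the product of the old unnormalized target and the proposal density. Equation (2) itself is not reproduced.

```python
import math
import random
from typing import Tuple

Tree = Tuple[float, ...]   # toy stand-in: a "tree" is just a tuple of branch lengths


def log_gamma(tree: Tree, decay: float = 2.0) -> float:
    """Toy unnormalized log-target: one exponential-shaped term per branch."""
    return -decay * sum(tree)


def mutate(parent: Tree, rate: float, rng: random.Random) -> Tuple[Tree, float]:
    """Propose a child by appending one pendant length; return (child, weight)."""
    pendant = rng.expovariate(rate)                 # draw from the proposal
    child = parent + (pendant,)
    log_q = math.log(rate) - rate * pendant         # log proposal density
    # Assumed weight: new target / (old target * proposal density).
    log_w = log_gamma(child) - log_gamma(parent) - log_q
    return child, math.exp(log_w)


rng = random.Random(2)
child, weight = mutate((0.1, 0.3), rate=2.0, rng=rng)
```

In this toy setting the proposal rate matches the decay of the target, so every child receives the same weight; a mismatched proposal would produce uneven weights and hence a lower ESS.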

The process is then iterated until Inline graphic. For convenience, we will denote the unnormalized empirical measures of the particles right after generation Inline graphic by Inline graphic, Inline graphic and Inline graphic, respectively. Similarly, the corresponding normalized distributions will be denoted by Inline graphic, Inline graphic and Inline graphic.

For convenience, let Inline graphic and Inline graphic be the numbers of particles at the Inline graphic and Inline graphic generations, respectively. Recalling that the normalized distributions after the substeps of OPSMC are denoted by Inline graphic, Inline graphic, and Inline graphic, we have the following lemma.

Lemma 16.

Assume that Criteria 1 and 2 are satisfied. If we define

Lemma 16.

then the following statements hold, where Inline graphic means weak convergence:

  1. If Inline graphic, then Inline graphic.

  2. If Inline graphic, then Inline graphic.

  3. If Inline graphic, then Inline graphic.

  4. Inline graphic is proportional to Inline graphic.

  5. If Inline graphic, then Inline graphic.

Proof of Lemma 16.

(1). Assume that Inline graphic converges to Inline graphic a.s.,

Proof of Lemma 16.

where Inline graphic denotes the total mass of the empirical measure Inline graphic (i.e., the sum of all particle weights). Since the particles in generation Inline graphic are sampled independently from a multinomial distribution with fixed weights, by the strong law of large numbers, we have

Proof of Lemma 16.

This implies

Proof of Lemma 16.

Hence, when Inline graphic, we have Inline graphic a.s.

(2). The rationale behind the use of MCMC moves is based on the observation that if the unweighted particles are distributed according to Inline graphic, then when we apply a Markov transition kernel Inline graphic of invariant distribution Inline graphic to any particle, the new particles are still distributed according to the posterior distribution of interest.

Formally, if Inline graphic a.s. for every integrable test function Inline graphic, by choosing Inline graphic for any measurable set Inline graphic, we deduce that

Proof of Lemma 16.

a.s., since the Markov kernel Inline graphic is invariant with respect to Inline graphic. Therefore, for any measurable function Inline graphic, we have that Inline graphic converges to Inline graphic almost surely.
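The invariance property used in part (2) can be verified exactly on a small discrete example. The Python sketch below is an illustration only: it builds the transition matrix of a nearest-neighbor Metropolis kernel for a hypothetical three-state target `pi` and checks that pushing `pi` through the kernel returns `pi`. The function names are assumptions introduced here.

```python
from typing import List


def metropolis_matrix(pi: List[float]) -> List[List[float]]:
    """Transition matrix of a +/-1 random-walk Metropolis kernel targeting pi."""
    n = len(pi)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i + 1):
            if 0 <= j < n:
                # Propose j with probability 1/2, accept with min(1, pi_j / pi_i).
                P[i][j] = 0.5 * min(1.0, pi[j] / pi[i])
        # Rejected or out-of-range proposals leave the chain where it is.
        P[i][i] = 1.0 - sum(P[i])
    return P


def push_forward(mu: List[float], P: List[List[float]]) -> List[float]:
    """Distribution obtained by applying one step of kernel P to mu."""
    n = len(mu)
    return [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]


pi = [0.2, 0.5, 0.3]                  # hypothetical target with positive entries
P = metropolis_matrix(pi)
after_one_step = push_forward(pi, P)  # equals pi up to rounding
```

A distribution other than `pi` would generally change under `push_forward`, which is why the argument requires the particles to be approximately distributed according to the target before the kernel is applied.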

(3). Since Inline graphic a.s. for every integrable test function Inline graphic, by choosing Inline graphic for any measurable set Inline graphic, we deduce that

Proof of Lemma 16.

Therefore, for any measurable function Inline graphic, we also have Inline graphic converges to Inline graphic almost surely.

(4). We note that

Proof of Lemma 16. (A.1)

(5). Since the proposal Inline graphic and the Markov kernel Inline graphic are assumed to be normalized, we have Inline graphic.

We have

Proof of Lemma 16. (A.2)

By a similar argument, we have

Proof of Lemma 16.

In other words, Inline graphic converges to Inline graphic. □

Proof of Theorem 13.

By definition, we have

Proof of Theorem 13.

Thus,

Proof of Theorem 13.

On the other hand, by applying Theorem 10 for Inline graphic, we have

Proof of Theorem 13.

which completes the proof via the convergence result (A.2). □

An induction argument with Lemma 16 gives the main theorem.

Other Proofs

Proof of Lemma 8.

The lower bound is straightforward. For the upper bound, consider Inline graphic and fix Inline graphic; by the same arguments as in the proof of Lemma 12, we have

Proof of Lemma 8.

Thus, if we define

Proof of Lemma 8.

then we have Inline graphic and

Proof of Lemma 8.

On the other hand, from Lemma 12, we have Inline graphic.

By choosing Inline graphic, we obtain

Proof of Lemma 8.

which completes the proof. □

Proof of Lemma 12.

Let Inline graphic be the length of the edge Inline graphic, let Inline graphic be the transition matrix across an edge of length Inline graphic, and let Inline graphic be the observed value at site Inline graphic of the newly added taxon. Following the formulation of the 1D phylogenetic likelihood function in Dinh and Matsen (2016, Section 2.2), we fix all parameters except Inline graphic and consider the likelihood of Inline graphic as a function of Inline graphic; we have

Proof of Lemma 12.

where Inline graphic is the probability of observing Inline graphic and Inline graphic at the left and right nodes of Inline graphic at the site index Inline graphic, respectively (note that in Dinh and Matsen 2016, it is called Inline graphic). Similarly, by representing the likelihood of the tree Inline graphic in terms of Inline graphic, Inline graphic and Inline graphic, we have

Proof of Lemma 12. (A.3)

where the indices Inline graphic range over all possible state characters. Since Inline graphic for all Inline graphic and Inline graphic, we deduce that

Proof of Lemma 12. (A.4)

Similarly, we have Inline graphic for all Inline graphic.

Recall that Inline graphic is the average branch length of Inline graphic. Using the fact that for a fixed tree Inline graphic, Inline graphic and Inline graphic, we have

Proof of Lemma 12.

Noting that Inline graphic, we obtain

Proof of Lemma 12. (A.5)

which implies

Proof of Lemma 12.

Proof of Theorem 14.

We recall that Inline graphic denotes the tree obtained by adding an edge of length Inline graphic to edge Inline graphic of the tree Inline graphic at distal position Inline graphic. Any tree Inline graphic can be represented by Inline graphic, where Inline graphic is the edge on which the pendant edge containing the most recent taxon is attached.

Define

Proof of Theorem 14.

Since edge Inline graphic is chosen from a multinomial distribution weighted by Inline graphic, given any tree Inline graphic obtained from the parent tree Inline graphic, chosen edge Inline graphic, distal position Inline graphic and pendant length Inline graphic, we have

Proof of Theorem 14.

On the other hand, by Lemma 12 and the fact that Inline graphic, we have

Proof of Theorem 14.

where Inline graphic is the constant from Assumption 3, Inline graphic and Inline graphic are defined as in Lemma 12.

These two identities and Assumption 9 imply that

Proof of Theorem 14.

Note that from Assumption 7, we have

Proof of Theorem 14.

and

Proof of Theorem 14.

Thus

Proof of Theorem 14.

Now by Theorem 13, there exists Inline graphic independent of Inline graphic and Inline graphic such that with probability one, there exists Inline graphic such that we have

Proof of Theorem 14.

Let

Proof of Theorem 14.

Note that since Inline graphic and Inline graphic are positive (the ESS is at least 1), and the infimum is taken over a finite set, Inline graphic is positive and does not depend on Inline graphic. Thus Inline graphic is independent of Inline graphic and satisfies Inline graphic.

We also note that without the assumption on average branch lengths, a crude estimate gives Inline graphic, which leads to a linear decay in the upper bound on the ESS. □

Proof of Theorem 15.

Since the edge Inline graphic is chosen from a multinomial distribution weighted by the lengths of the edges, given any tree Inline graphic obtained from the parent tree Inline graphic by choosing edge Inline graphic, distal position Inline graphic and pendant length Inline graphic, we have

Proof of Theorem 15.

where Inline graphic and Inline graphic are the length of edge Inline graphic and the total tree length, respectively, and Inline graphic and Inline graphic are the numbers of tree topologies of Inline graphic and Inline graphic.

We denote

Proof of Theorem 15.

and recall that

Proof of Theorem 15.

where Inline graphic is the constant from Assumption 5, and

Proof of Theorem 15.

from the assumption on the average branch length (Assumption 4). We have

Proof of Theorem 15.

By the assumption of maximum branch length Inline graphic, we have

Proof of Theorem 15.

Thus by Theorem 13 there exists Inline graphic independent of Inline graphic and Inline graphic such that Inline graphic. □

References

  1. Andrieu, C., Doucet, A., Holenstein, R. 2010. Particle Markov chain Monte Carlo methods. J. R. Stat. Soc. Series B Stat. Methodol. 72:269–342.
  2. Andrieu, C., Doucet, A., Punskaya, E. 2001. Sequential Monte Carlo methods for optimal filtering. In: Doucet, A., de Freitas, N., Gordon, N., editors. Sequential Monte Carlo Methods in Practice. New York: Springer; p. 79–95.
  3. Bengtsson, T., Bickel, P., Li, B. 2008. Curse-of-dimensionality revisited: collapse of the particle filter in very large scale systems. In: Probability and Statistics: Essays in Honor of David A. Freedman. Institute of Mathematical Statistics, p. 316–334.
  4. Berger, S.A., Krompass, D., Stamatakis, A. 2011. Performance, accuracy, and web server for evolutionary placement of short sequence reads under maximum likelihood. Syst. Biol. 60:291–302.
  5. Beskos, A., Crisan, D., Jasra, A. 2014. On the stability of sequential Monte Carlo methods in high dimensions. Ann. Appl. Probab. 24:1396–1445.
  6. Bickel, P., Li, B., Bengtsson, T. 2008. Sharp failure rates for the bootstrap particle filter in high dimensions. In: Pushing the Limits of Contemporary Statistics: Contributions in Honor of Jayanta K. Ghosh. Institute of Mathematical Statistics, p. 318–329.
  7. Bouchard-Côté, A. 2014. SMC (Sequential Monte Carlo) for Bayesian phylogenetics. In: Chen, M.-H., Kuo, L., Lewis, P.O., editors. Bayesian Phylogenetics: Methods, Algorithms, and Applications. Boca Raton (FL): CRC Press. p. 163–186.
  8. Bouchard-Côté, A., Sankararaman, S., Jordan, M.I. 2012. Phylogenetic inference via sequential Monte Carlo. Syst. Biol. 61:579–593.
  9. Caporaso, J.G., Bittinger, K., Bushman, F.D., DeSantis, T.Z., Andersen, G.L., Knight, R. 2010. PyNAST: a flexible tool for aligning sequences to a template alignment. Bioinformatics 26:266–267.
  10. Chopin, N. 2004. Central limit theorem for sequential Monte Carlo methods and its application to Bayesian inference. Ann. Stat. 32:2385–2411.
  11. Crisan, D., Doucet, A. 2002. A survey of convergence results on particle filtering methods for practitioners. IEEE Trans. Signal Process. 50:736–746.
  12. Cueto, M.A., Matsen, F.A. 2011. Polyhedral geometry of phylogenetic rogue taxa. Bull. Math. Biol. 73:1202–1226.
  13. Del Moral, P. 1998. A uniform convergence theorem for the numerical solving of the nonlinear filtering problem. J. Appl. Probab. 35:873–884.
  14. Del Moral, P., Patras, F., Rubenthaler, S., et al. 2009. Tree based functional expansions for Feynman–Kac particle models. Ann. Appl. Probab. 19:778–825.
  15. Dinh, V., Matsen, F.A. 2016. The shape of the one-dimensional phylogenetic likelihood function. Ann. Appl. Probab. (in press). http://arxiv.org/abs/1507.03647.
  16. Douc, R., Moulines, E. 2008. Limit theorems for weighted samples with applications to sequential Monte Carlo methods. Ann. Stat. 36:2344–2376.
  17. Doucet, A., Johansen, A.M. 2009. A tutorial on particle filtering and smoothing: fifteen years later. In: The Oxford handbook of nonlinear filtering. Oxford: Oxford University Press; p. 656–704.
  18. Felsenstein, J. 2004. Inferring phylogenies, vol. 2. Sunderland (MA): Sinauer Associates.
  19. Fourment, M., Claywell, B.C., Dinh, V., McCoy, C., Matsen IV, F.A., Darling, A. E. 2017. Effective online Bayesian phylogenetics via sequential Monte Carlo with guided proposals. bioRxiv, submitted to Syst. Biol. 145219. https://www.biorxiv.org/content/early/2017/06/02/145219.
  20. Gardy, J., Loman, N.J., Rambaut, A. 2015. Real-time digital pathogen surveillance—the time is now. Genome Biol. 16:155.
  21. Heath, T.A., Hedtke, S.M., Hillis, D.M. 2008. Taxon sampling and the accuracy of phylogenetic analyses. J. Systemat. Evol. 46:239–257.
  22. Izquierdo-Carrasco, F., Cazes, J., Smith, S.A., Stamatakis, A. 2014. PUmPER: phylogenies updated perpetually. Bioinformatics 30:1476–1477.
  23. Katoh, K., Standley, D.M. 2013. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30:772–780.
  24. Kong, A., Liu, J.S., Wong, W.H. 1994. Sequential imputations and Bayesian missing data problems. J. Am. Stat. Assoc. 89:278–288.
  25. Künsch, H. R. 2005. Recursive Monte Carlo filters: algorithms and theoretical analysis. Ann. Stat. 33:1983–2021.
  26. Lemey, P., Rambaut, A., Drummond, A.J., Suchard, M.A. 2009. Bayesian phylogeography finds its roots. PLoS Comput. Biol. 5:e1000520.
  27. Liu, J.S., Chen, R. 1995. Blind deconvolution via sequential imputations. J. Am. Stat. Assoc. 90:567–576.
  28. Matsen, F., Kodner, R., Armbrust, E.V. 2010. pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics 11:538.
  29. Neher, R.A., Bedford, T. 2015. nextflu: real-time tracking of seasonal influenza virus evolution in humans. Bioinformatics 31:3546–3548.
  30. Oudjane, N., Rubenthaler, S. 2005. Stability and uniform particle approximation of nonlinear filters in case of non ergodic signals. Stoch. Anal. Appl. 23:421–448.
  31. Quick, J., Loman, N.J., Duraffour, S., Simpson, J.T., Severi, E., Cowley, L., Bore, J.A., Koundouno, R., Dudas, G., Mikhail, A., Ouédraogo, N., Afrough, B., Bah, A., Baum, J.H.J., Becker-Ziaja, B., Boettcher, J.P., Cabeza-Cabrerizo, M., Camino-Sánchez, A., Carter, L.L., Doerrbecker, J., Enkirch, T., García-Dorival, I., Hetzelt, N., Hinzmann, J., Holm, T., Kafetzopoulou, L.E., Koropogui, M., Kosgey, A., Kuisma, E., Logue, C.H., Mazzarelli, A., Meisel, S., Mertens, M., Michel, J., Ngabo, D., Nitzsche, K., Pallasch, E., Patrono, L.V., Portmann, J., Repits, J.G., Rickett, N.Y., Sachse, A., Singethan, K., Vitoriano, I., Yemanaberhan, R.L., Zekeng, E.G., Racine, T., Bello, A., Sall, A.A., Faye, O., Faye, O., Magassouba, N., Williams, C.V., Amburgey, V., Winona, L., Davis, E., Gerlach, J., Washington, F., Monteil, V., Jourdain, M., Bererd, M., Camara, A., Somlare, H., Camara, A., Gerard, M., Bado, G., Baillet, B., Delaune, D., Nebie, K.Y., Diarra, A., Savane, Y., Pallawo, R.B., Gutierrez, G.J., Milhano, N., Roger, I., Williams, C.J., Yattara, F., Lewandowski, K., Taylor, J., Rachwal, P., Turner, D.J., Pollakis, G., Hiscox, J.A., Matthews, D.A., O’Shea, M. K., Johnston, A.M., Wilson, D., Hutley, E., Smit, E., Di Caro, A., Wölfel, R., Stoecker, K., Fleischmann, E., Gabriel, M., Weller, S.A., Koivogui, L., Diallo, B., Keïta, S., Rambaut, A., Formenty, P., Günther, S., Carroll, M. W. 2016. Real-time, portable genome sequencing for Ebola surveillance. Nature 530:228–232.
  32. Ronquist, F., Teslenko, M., Van Der Mark, P., Ayres, D.L., Darling, A., Höhna, S., Larget, B., Liu, L., Suchard, M.A., Huelsenbeck, J.P. 2012. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst. Biol. 61:539–542.
  33. Snyder, C., Bengtsson, T., Bickel, P., Anderson, J. 2008. Obstacles to high-dimensional particle filtering. Mon. Weather Rev. 136:4629–4640.
  34. Suchard, M.A., Redelings, B.D. 2006. BAli-Phy: simultaneous Bayesian inference of alignment and phylogeny. Bioinformatics 22:2047–2048.
  35. Wang, L., Bouchard-Côté, A., Doucet, A. 2015. Bayesian phylogenetic inference using a combinatorial sequential Monte Carlo method. J. Am. Stat. Assoc. 110:1362–1374.

Articles from Systematic Biology are provided here courtesy of Oxford University Press
