Adaptive dating and fast proposals: Revisiting the phylogenetic relaxed clock model

Jordan Douglas; Rong Zhang; Remco Bouckaert

doi:10.1371/journal.pcbi.1008322

2021 Feb 2;17(2):e1008322. doi: 10.1371/journal.pcbi.1008322

Editor: Roger Dimitri Kouyos

PMCID: PMC7880504 PMID: 33529184

Abstract

Relaxed clock models enable estimation of molecular substitution rates across lineages and are widely used in phylogenetics for dating evolutionary divergence times. Under the (uncorrelated) relaxed clock model, tree branches are associated with molecular substitution rates which are independently and identically distributed. In this article we delved into the internal complexities of the relaxed clock model in order to develop efficient MCMC operators for Bayesian phylogenetic inference. We compared three substitution rate parameterisations, introduced an adaptive operator which learns the weights of other operators during MCMC, and we explored how relaxed clock model estimation can benefit from two cutting-edge proposal kernels: the AVMVN and Bactrian kernels. This work has produced an operator scheme that is up to 65 times more efficient at exploring continuous relaxed clock parameters compared with previous setups, depending on the dataset. Finally, we explored variants of the standard narrow exchange operator which are specifically designed for the relaxed clock model. In the most extreme case, this new operator traversed tree space 40% more efficiently than narrow exchange. The methodologies introduced are adaptive and highly effective on short as well as long alignments. The results are available via the open source optimised relaxed clock (ORC) package for BEAST 2 under a GNU licence (https://github.com/jordandouglas/ORC).

Author summary

Biological sequences, such as DNA, accumulate mutations over generations. By comparing such sequences in a phylogenetic framework, the evolutionary tree of lifeforms can be inferred and historic divergence dates can be estimated. With the overwhelming availability of biological sequence data, and the increasing affordability of collecting new data, the development of fast and efficient phylogenetic algorithms is more important than ever. In this article we focus on the relaxed clock model, which is very popular in phylogenetics. We explored how a range of optimisations can improve the statistical inference of the relaxed clock. This work has produced a phylogenetic setup which can infer parameters related to the relaxed clock up to 65 times faster than previous setups, depending on the dataset. The methods introduced adapt to the dataset during computation and are highly efficient when processing long biological sequences.

Introduction

The molecular clock hypothesis states that the evolutionary rates of biological sequences are approximately constant through time [1]. This assumption forms the basis of clock-model-based phylogenetics, under which the evolutionary trees and divergence dates of life forms are inferred from biological sequences, such as nucleic and amino acids [2, 3]. In Bayesian phylogenetics, these trees and their associated parameters are estimated as probability distributions [4–6]. Statistical inference can be performed by the Markov chain Monte Carlo (MCMC) algorithm [7, 8] using platforms such as BEAST [9], BEAST 2 [10], MrBayes [11], and RevBayes [12].

The simplest phylogenetic clock model—the strict clock—makes the mathematically convenient assumption that the evolutionary rate is constant across all lineages [4, 5, 13]. However, molecular substitution rates are known to vary over time, over population sizes, over evolutionary pressures, and over nucleic acid replicative machineries [14–16]. Moreover, any given dataset could be clock-like (where substitution rates have a small variance across lineages) or non clock-like (a large variance). In the latter case, a strict clock is probably not suitable.

This led to the development of relaxed (uncorrelated) clock models, under which each branch in the phylogenetic tree has its own molecular substitution rate [3]. Branch rates can be drawn from a range of probability distributions including log-normal, exponential, gamma, and inverse-gamma distributions [3, 17, 18]. This class of models is widely used, and has aided insight into many recent biological problems, including the 2016 Zika virus outbreak [19] and the COVID-19 pandemic [20].

Finally, although not the focus of this article, the class of correlated clock models assumes some form of auto-correlation between rates over time. The correlation itself can invoke a range of stochastic models, including compound Poisson [21] and CIR processes [17], Bayesian parametric models [22], or it can exist as a series of local clocks [23, 24]. While these models may be more biologically realistic than uncorrelated models [17], due to their correlated and discrete natures the time required for MCMC to achieve convergence can be cumbersome, particularly for larger datasets [24]. In the remainder of this paper we only consider uncorrelated relaxed clock models.

With the overwhelming availability of biological sequence data, the development of efficient Bayesian phylogenetic methods is more important than ever. The performance of MCMC depends not only on computational performance but also on how effectively an MCMC setup achieves convergence. A critical task therein is the further advancement of MCMC operators. Recent developments in this area include the advancement of guided tree proposals [25–27], coupled MCMC [28, 29], adaptive multivariate transition kernels [30], and other explorative proposal kernels such as the Bactrian and mirror kernels [31, 32]. In the case of relaxed clocks, informed tree proposals can account for correlations between substitution rates and divergence times [33]. The rate parameterisation itself can also affect the ability to “mix” during MCMC [3, 18, 33].

While a range of advanced operators and other MCMC optimisation methods have arisen over the years, there has yet to be a large scale performance benchmarking of such methods as applied to the relaxed clock model. In this article, we systematically evaluate how the relaxed clock model can benefit from i) adaptive operator weighting, ii) substitution rate parameterisation, iii) Bactrian proposal kernels [31], iv) tree operators which account for correlations between substitution rates and times, and v) adaptive multivariate operators [30]. The discussed methods are implemented in the ORC package and compared using BEAST 2 [10].

Models and methods

Preliminaries

Let $T$ be a binary rooted time tree with N taxa, and data D associated with the tips, such as a multiple sequence alignment with L sites, morphological data, or geographic locations. The posterior density of a phylogenetic model is described by

\begin{matrix} p (T, {\vec{R}}^{}, σ, μ_{C}, θ | D) \propto p (D | T, r ({\vec{R}}^{}), μ_{C}, θ) p (T | θ) p ({\vec{R}}^{} | σ) p (σ) p (μ_{C}) p (θ) \end{matrix}

(1)

where σ and μ_C represent clock model related parameters, and $p (T | θ)$ is the tree prior where θ describes parameters related to the tree branching or coalescent process. The tree likelihood $p (D | T, r ({\vec{R}}^{}), μ_{C}, θ)$ has μ_C as the overall clock rate and ${\vec{R}}^{}$ is an abstracted vector of branch rates which is transformed into real rates by function $r ({\vec{R}}^{})$ . Branch rates have a mean of 1 under the prior to avoid non-identifiability with the clock rate μ_C. Three methods of representing rates as ${\vec{R}}^{}$ are presented in Branch rate parameterisations.

Let t_i be the height (time) of node i. Each node i in $T$ , except for the root, is associated with a parental branch length τ_i (the height difference between i and its parent) and a parental branch substitution rate $r_{i} = r (R_{i})$ . In an uncorrelated relaxed clock model, each of the 2N − 2 elements in $\vec{R}$ is independently distributed under the prior $p (\vec{R} | σ)$ .

The posterior distribution is sampled by the Metropolis-Hastings-Green MCMC algorithm [7, 8, 34], under which the probability of accepting proposed state x′ from state x is equal to:

\begin{matrix} α (x^{'} | x) = min (1, \frac{p (x^{'} | D)}{p (x | D)} \frac{q (x | x^{'})}{q (x^{'} | x)} | J |) \end{matrix}

(2)

where q(x′|x) is the transition kernel: the probability of proposing state x′ from state x. The ratio between the two $\frac{q (x | x^{'})}{q (x^{'} | x)}$ is known as the Hastings ratio [8]. The determinant of the Jacobian matrix |J|, known as the Green ratio, solves the dimension-matching problem for proposals which operate on multiple terms across one or more spaces [34, 35].
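As an illustration of Eq 2, the acceptance step can be sketched in log space, which is how such ratios are usually handled numerically. This is a minimal sketch and not the BEAST 2 implementation; the function names and arguments are hypothetical, and the log posterior densities, log Hastings ratio, and log |J| are assumed to be supplied by the caller.

```python
import math
import random


def log_acceptance_probability(log_post_current, log_post_proposed,
                               log_hastings_ratio=0.0, log_abs_jacobian=0.0):
    """Log of Eq 2: min(1, posterior ratio * Hastings ratio * |J|)."""
    log_ratio = (log_post_proposed - log_post_current
                 + log_hastings_ratio + log_abs_jacobian)
    return min(0.0, log_ratio)


def accept(log_alpha, rng=random):
    """Return True with probability exp(log_alpha)."""
    # 1 - rng.random() lies in (0, 1], so the logarithm is always defined.
    return math.log(1.0 - rng.random()) <= log_alpha
```

A proposal that increases the posterior density (with no Hastings or Green correction) gives `log_alpha = 0` and is always accepted.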

Branch rate parameterisations

In Bayesian inference, the way parameters are represented in the model can affect the mixing ability of the model and the meaning of the model itself [36]. Three methods for parameterising substitution rates are described below. Each parameterisation is associated with i) an abstraction of the branch rate vector ${\vec{R}}^{}$ , ii) some function for transforming this parameter into unabstracted branch rates $r ({\vec{R}}^{})$ , and iii) a prior density function of the abstraction $p ({\vec{R}}^{} | σ)$ . The three methods are summarised in Fig 1.

Fig 1 — Top left: the prior density of a branch rate r under a Log-normal(−0.5σ², σ) distribution (with its mean fixed at 1). The function for transforming $R$ into branch rates $r (R)$ is depicted for *real* (top right), *cat* (bottom left), and *quant* (bottom right). For visualisation purposes, there are only 10 bins/pieces displayed, however in practice we use 2N − 2 bins for *cat* and 100 pieces for *quant*. The first and final *quant* pieces are equal to the underlying function (solid lines) however the pieces in between use linear approximations of this function (dashed lines).

1. Real rates

The natural (and unabstracted) parameterisation of a substitution rate is a real number $R_{i} \in \mathbb{R}, R_{i} > 0$ which is equal to the rate itself. Under the real parameterisation:

r ({\vec{R}}^{}) = {\vec{R}}^{} .

(3)

Under a log-normal clock prior $p ({\vec{R}}^{} | σ)$ , rates are distributed with a mean of 1:

\begin{matrix} p (R_{i} | σ) = \frac{1}{R_{i} σ \sqrt{2 π}} exp (- \frac{{(ln R_{i} - μ)}^{2}}{2 σ^{2}}) \end{matrix}

(4)

where μ = −0.5σ² is set such that the expected value is 1. In this article we only consider log-normal clock priors, however the methods discussed are general.
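The density in Eq 4 and the effect of setting μ = −0.5σ² can be checked with a few lines of code. The sketch below is illustrative only; the function names are hypothetical. It uses the standard result that a log-normal distribution with log-scale parameters μ and σ has mean exp(μ + σ²/2).

```python
import math


def log_prior_real_rate(rate, sigma):
    """Log of Eq 4 for a single branch rate, with mu fixed at -0.5 * sigma^2."""
    mu = -0.5 * sigma ** 2
    return (-math.log(rate * sigma * math.sqrt(2.0 * math.pi))
            - (math.log(rate) - mu) ** 2 / (2.0 * sigma ** 2))


def lognormal_mean(sigma):
    """Mean of the log-normal prior: exp(mu + sigma^2 / 2), which equals 1 here."""
    mu = -0.5 * sigma ** 2
    return math.exp(mu + 0.5 * sigma ** 2)
```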

Zhang and Drummond 2020 introduced a series of tree operators which propose node heights and branch rates, such that the resulting genetic distances (r_i × τ_i) remain constant [33]. These operators account for correlations between branch rates and branch times. By keeping the genetic distance of each branch constant, the likelihood is unaltered by the proposal.

2. Categories

The category parameterisation cat is an abstraction of the real parameterisation. Each of the 2N − 2 branches is assigned an integer from 0 to n − 1:

\begin{matrix} \vec{R} \in \{0, 1, \dots, n - 1\}^{2 N - 2} . \end{matrix}

(5)

These integers correspond to n rate categories (Fig 1). Let f(x|σ) be the probability density function (PDF) and let $F (x | σ) = \int_{0}^{x} f (t | σ) d t$ be the cumulative distribution function (CDF) of the prior distribution used by the underlying real clock model (a log-normal distribution for the purposes of this article). In the cat parameterisation, f(x|σ) is discretised into n bins and each element within ${\vec{R}}^{}$ points to one such bin. The rate of each bin is equal to its median value:

r (R_{i}) = F^{- 1} (\frac{R_{i} + 0.5}{n}),

(6)

where F⁻¹ is the inverse cumulative distribution function (i-CDF). Each element of $\vec{R}$ is uniformly distributed over the n categories under the prior:

\begin{matrix} p (R_{i} | σ) = p (R_{i}) = \frac{1}{n} . \end{matrix}

(7)

The key advantage of the cat parameterisation is the removal of a term from the posterior density (Eq 1), or more accurately the replacement of a non-trivial $p ({\vec{R}}^{} | σ)$ term with that of a uniform prior. This may facilitate efficient exploration of the posterior distribution by MCMC.
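The mapping from categories to rates in Eq 6 can be sketched as follows for the log-normal case. This is a minimal sketch with hypothetical function names; it assumes the log-normal i-CDF is evaluated exactly through the standard normal i-CDF, F⁻¹(p) = exp(μ + σΦ⁻¹(p)), rather than by whatever numerical routine BEAST 2 uses.

```python
import math
from statistics import NormalDist


def lognormal_inverse_cdf(p, sigma):
    """Inverse CDF of Log-normal(mu = -0.5 * sigma^2, sigma) at probability p."""
    mu = -0.5 * sigma ** 2
    return math.exp(mu + sigma * NormalDist().inv_cdf(p))


def category_rates(n, sigma):
    """Rate of each of the n bins (Eq 6): the median of the bin."""
    return [lognormal_inverse_cdf((k + 0.5) / n, sigma) for k in range(n)]


def rate_of_category(category, n, sigma):
    """Rate r(R_i) for a branch assigned the integer `category` in 0..n-1."""
    return lognormal_inverse_cdf((category + 0.5) / n, sigma)
```

With an odd number of bins, the middle category maps to the median of the prior, exp(−0.5σ²).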

This parameterisation has been widely used in BEAST and BEAST 2 analyses [3]. However, the recently developed constant distance operators—which are incompatible with the cat parameterisation—can yield an increase in mixing rate under real by up to an order of magnitude over that of cat, depending on the dataset [33].

3. Quantiles

Finally, rates can be parameterised as real numbers describing the rate’s quantile with respect to some underlying clock model distribution [18]. Under the quant parameterisation, each of the 2N − 2 elements in $\vec{R}$ is uniformly distributed:

\vec{R} \in \mathbb{R}^{2 N - 2}, 0 < R_{i} < 1

(8)

p (R_{i} | σ) = p (R_{i}) = 1

(9)

Transforming these quantiles into rates invokes the i-CDF of the underlying real clock model distribution. Evaluating the log-normal i-CDF has a high computational cost, and therefore an approximation is used instead:

r (R_{i}) = {\hat{F}}^{- 1} (R_{i})

(10)

where ${\hat{F}}^{- 1}$ is a linear piecewise approximation with 100 pieces. While this approach has clear similarities with cat, the domain of rates here is continuous instead of discrete. In this project we extended the family of constant distance operators [33] so that they are compatible with quant. Further details on the quant piecewise approximation and constant distance operators can be found in S1 Appendix.
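The idea behind Eq 10 can be illustrated with a simple piecewise linear approximation of the log-normal i-CDF. The details of the approximation used in ORC are given in S1 Appendix and are not reproduced here; the sketch below assumes evenly spaced knots in quantile space and, following the Fig 1 caption, evaluates the first and final pieces with the exact function. All names are hypothetical.

```python
import math
from statistics import NormalDist

N_PIECES = 100  # number of pieces stated in the text


def exact_icdf(q, sigma):
    """Exact i-CDF of Log-normal(mu = -0.5 * sigma^2, sigma)."""
    mu = -0.5 * sigma ** 2
    return math.exp(mu + sigma * NormalDist().inv_cdf(q))


def build_knots(sigma, n_pieces=N_PIECES):
    """Exact rates at q = 1/n, 2/n, ..., (n-1)/n (assumed even spacing)."""
    return [exact_icdf(k / n_pieces, sigma) for k in range(1, n_pieces)]


def approx_icdf(q, sigma, knots, n_pieces=N_PIECES):
    """Piecewise linear approximation of the i-CDF for 0 < q < 1."""
    piece = min(int(q * n_pieces), n_pieces - 1)
    if piece == 0 or piece == n_pieces - 1:
        # Tails: fall back to the underlying function.
        return exact_icdf(q, sigma)
    lower = knots[piece - 1]      # rate at q = piece / n
    upper = knots[piece]          # rate at q = (piece + 1) / n
    fraction = q * n_pieces - piece
    return lower + fraction * (upper - lower)
```

The knots only need to be rebuilt when σ changes, so repeated rate lookups reduce to one interpolation each.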

Clock model operators

The weight of an operator determines the probability of the operator being selected. Weights are typically fixed throughout MCMC. In BEAST 2, operators can have their own tunable parameter s, which determines the step size of the operator. This term is tuned over the course of MCMC to achieve a target acceptance rate, typically 0.234 [10, 37, 38]. We define clock model operators as those which generate proposals for either $\vec{R}$ or σ. Pre-existing BEAST 2 clock model operators are summarised in Table 1, and further operators are introduced throughout this paper.

Table 1. Summary of pre-existing BEAST 2 operators.

Operator	Description	Parameters	Parameterisations
`RandomWalk`	Moves a single element by a tunable amount.	${\vec{R}}^{}, σ$	cat, real, quant
`Scale`	Applies `RandomWalk` on the log-transformation (suitable for parameters with positive domains).	${\vec{R}}^{}, σ$	cat, real, quant
`Interval`	Applies `RandomWalk` on the logit-transformation (suitable for parameters with upper and lower limits).	${\vec{R}}^{}$	quant
`Swap`	Swaps two random elements within the vector [3].	${\vec{R}}^{}$	cat, real, quant
`Uniform`	Resamples one element in the vector from a uniform distribution.	${\vec{R}}^{}$	cat, quant
`ConstantDistance`	Adjusts an internal node height and recalculates all incident branch rates such that the genetic distances remain constant [33].	${\vec{R}}^{}, t$	real, quant
`SimpleDistance`	Applies `ConstantDistance` to the root node [33].	${\vec{R}}^{}, t$	real, quant
`SmallPulley`	Proposes new branch rates incident to the root such that their combined genetic distance is constant [33].	${\vec{R}}^{}$	real, quant
`CisScale`	Applies `Scale` to σ. Then recomputes all rates such that their quantiles are constant (for real [33]) or recomputes all quantiles such that their rates are constant (quant).	${\vec{R}}^{}, σ$	real, quant

Summary of pre-existing BEAST 2 operators, which apply to either branch rates ${\vec{R}}^{}$ or the clock standard deviation σ, and the substitution rate parameterisation they apply to. ConstantDistance and SimpleDistance also adjust node heights t.

The family of constant distance operators (ConstantDistance, SimpleDistance, and SmallPulley [33]) are best suited for larger datasets (or datasets with strong signal) where the likelihood distribution is peaked (Fig 1). While simple one dimensional operators such as RandomWalk or Scale must take small steps in order to stay “on the ridge” of the likelihood function, the constant distance operators “wander along the ridge” by ensuring that genetic distances are constant after the proposal.

Scale and CisScale both operate on the clock model standard deviation σ, but they behave differently in the real and quant parameterisations (Fig 2). In real, large proposals σ → σ′ made by Scale could incur large penalties in the clock model prior density $p (\vec{R} | σ^{'})$ and thus may be rejected quite often. This led to the development of the fast clock scaler [33] (herein referred to as CisScale). This operator recomputes all branch rates $\vec{R} \to \vec{R}^{'}$ such that their quantiles under the new clock model prior remain constant, $p (\vec{R} | σ) = p (\vec{R}^{'} | σ^{'})$ . In contrast, a proposal σ → σ′ made by Scale under the quant parameterisation implicitly alters all branch rates $r (\vec{R})$ while leaving the quantiles $\vec{R}$ themselves constant, whereas applying CisScale under quant recomputes all quantiles $\vec{R} \to \vec{R}^{'}$ such that their rates are constant, i.e. $r (\vec{R}) = r (\vec{R}^{'})$ . In summary, Scale and CisScale propose rates/quantiles in the opposite (trans) or same (cis) space that the clock model is parameterised under (Fig 2).
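For the real parameterisation with a log-normal prior, holding quantiles constant amounts to holding each rate's standard normal score z = (ln r − μ)/σ fixed while σ changes. The sketch below shows only this rate recomputation; it is not the ORC implementation, it omits the Hastings and Green ratio terms that a full operator requires, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def quantile(rate, sigma):
    """Quantile of a rate under Log-normal(mu = -0.5 * sigma^2, sigma)."""
    mu = -0.5 * sigma ** 2
    return NormalDist(mu, sigma).cdf(math.log(rate))


def cis_scale_real(rates, sigma, sigma_new):
    """Recompute rates so that each keeps its quantile when sigma -> sigma_new."""
    mu = -0.5 * sigma ** 2
    mu_new = -0.5 * sigma_new ** 2
    new_rates = []
    for rate in rates:
        z = (math.log(rate) - mu) / sigma          # fixed by the quantile
        new_rates.append(math.exp(mu_new + sigma_new * z))
    return new_rates
```

A rate sitting at the prior median stays at the median: under σ = 0.4 the median is exp(−0.08), and after moving to σ′ = 0.8 it becomes exp(−0.32).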

Fig 2 — The two operators above propose a clock standard deviation σ → σ′. Then, either the new quantiles are such that the rates remain constant (“New quantiles”, above) or the new rates are such that the quantiles remain constant (“New rates”). In the *real* parameterisation, these two operators are known as `Scale` and `CisScale`, respectively. Whereas, in *quant*, they are known as `CisScale` and `Scale`.

Adaptive operator weighting

It is not always clear which operator weighting scheme is best for a given dataset. In this article we introduce AdaptiveOperatorSampler—a meta-operator which learns the weights of other operators during MCMC and then samples these operators according to their learned weights. This meta-operator undergoes three phases. In the first phase (burn-in), AdaptiveOperatorSampler samples from its set of sub-operators uniformly at random. In the second phase (learn-in), the meta-operator starts learning several terms detailed below whilst continuing to sample operators uniformly at random. In its final phase, AdaptiveOperatorSampler samples operators (denoted by ω) using the following distribution:

p (ω) \propto \begin{cases} 1 & \text{with probability } Ω \\ \frac{1}{T (ω)} \sum_{p \in POI} \sum_{x \in accepts (ω)} D (x_{p}, x_{p}^{'}) & \text{with probability } 1 - Ω \end{cases}

(11)

where $T (ω)$ is the cumulative computational time spent on each operator, $D$ is a distance function, and we use Ω = 0.01 to allow any sub-operator to be sampled regardless of its performance. The parameters of interest (POI) may be either a set of numerical parameters (such as branch rates or node heights), or it may be the tree itself, but it cannot be both in its current form. The distance between state x_p and its (accepted) proposal $x_{p}^{'}$ with respect to parameter p is determined by

D (x_{p}, x_{p}^{'}) = \begin{cases} RF {(x_{p}, x_{p}^{'})}^{2} & \text{if } p \text{ is a tree} \\ \frac{1}{| p |} \left[ \frac{‖ x_{p} - x_{p}^{'} ‖}{σ_{p}} \right]^{2} & \text{if } p \text{ is numerical} \end{cases}

(12)

where RF is the Robinson-Foulds tree distance [39], and |p| is the number of dimensions of numerical parameter p (1 for σ, 2N − 2 for ${\vec{R}}^{}$ , and 2N − 1 for node heights t). The remaining terms are trained during the second and third phases: the sample standard deviation σ_p of each numerical parameter of interest p, the cumulative computational runtime spent on each operator $T (ω)$ , and the summed distances $\sum_{x} D (x_{p}, x_{p}^{'})$ .

Under Eqs 11 and 12, operators which effect larger changes on the parameters of interest, in shorter runtime, are sampled with greater probabilities. Division of the squared distance by a parameter’s sample variance $σ_{p}^{2}$ enables comparison between numerical parameters which exist in different spaces.
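The sampling rule in Eqs 11 and 12 can be sketched as follows for numerical parameters of interest. This is an illustrative sketch rather than the `AdaptiveOperatorSampler` code: the function names are hypothetical, the norm in Eq 12 is assumed to be Euclidean, the summed distances and runtimes are assumed to have been accumulated elsewhere during the learn-in phase, and each runtime is assumed to be positive.

```python
import random

OMEGA = 0.01  # probability of sampling uniformly, as stated in the text


def numerical_distance(x, x_new, sample_sd):
    """Eq 12, numerical case: (1/|p|) * (||x - x'|| / sd)^2, Euclidean norm assumed."""
    squared_norm = sum((a - b) ** 2 for a, b in zip(x, x_new))
    return squared_norm / (sample_sd ** 2) / len(x)


def operator_scores(summed_distance, runtime):
    """Second case of Eq 11: summed distance over accepts divided by runtime."""
    return {op: summed_distance[op] / runtime[op] for op in runtime}


def sample_operator(scores, rng=random, omega=OMEGA):
    """Sample uniformly with probability omega, else in proportion to score."""
    operators = list(scores)
    if rng.random() < omega or sum(scores.values()) == 0:
        return rng.choice(operators)
    weights = [scores[op] for op in operators]
    return rng.choices(operators, weights=weights, k=1)[0]
```

An operator that moves the parameters five times further per unit of runtime is therefore sampled roughly five times as often, while the Ω term keeps every sub-operator reachable.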

Datasets which contain very poor signal (or small L) are likely to mix better when more weight is placed on bold operators (Fig 3). We therefore introduce the SampleFromPrior( ${\vec{x}}^{}$ ) operator. This operator resamples ψ randomly selected elements within vector ${\vec{x}}^{}$ from their prior distributions, where $ψ \sim Binomial (n = | {\vec{x}}^{} |, p = \frac{s}{| {\vec{x}}^{} |})$ for tunable term s. SampleFromPrior is included among the set of operators under AdaptiveOperatorSampler and serves to make the boldest proposals for datasets with poor signal.
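The resampling step of SampleFromPrior can be sketched as below. This shows only how the number of resampled elements ψ is drawn and applied; the Hastings ratio of the proposal is not covered. The function name and the `draw_from_prior` callback are hypothetical, and s is capped so that the binomial probability does not exceed 1.

```python
import random


def sample_from_prior(x, s, draw_from_prior, rng=random):
    """Resample psi ~ Binomial(n=|x|, p=s/|x|) random elements of x from the prior."""
    n = len(x)
    p = min(1.0, s / n)
    # A Binomial(n, p) draw as the number of successes in n Bernoulli(p) trials.
    psi = sum(1 for _ in range(n) if rng.random() < p)
    proposal = list(x)
    for index in rng.sample(range(n), psi):
        proposal[index] = draw_from_prior(rng)
    return proposal
```

Under the quant parameterisation, for example, the prior on each element is uniform on (0, 1), so `draw_from_prior` could be `lambda rng: rng.random()`. The expected number of resampled elements is s, which is why tuning s controls how bold the proposal is.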

In this article we apply three instances of the AdaptiveOperatorSampler meta-operator to the real, cat, and quant parameterisations. These are summarised in Table 2.

Table 2. Summary of `AdaptiveOperatorSampler` operators and their parameters of interest (POI).

Meta-operator	POI	Operators
`AdaptiveOperatorSampler(σ)`	σ	`CisScale`( $σ, {\vec{R}}^{}$ )
		`RandomWalk(σ)`
		`Scale(σ)`
		`SampleFromPrior(σ)`
`AdaptiveOperatorSampler`( ${\vec{R}}^{}$ )	${\vec{R}}^{}, t$	`ConstantDistance`( ${\vec{R}}^{}, t$ )
		`RandomWalk`( ${\vec{R}}^{}$ )
		`Scale`( ${\vec{R}}^{}$ )
		`Interval`( ${\vec{R}}^{}$ )
		`Swap`( ${\vec{R}}^{}$ )
		`SampleFromPrior`( ${\vec{R}}^{}$ )
`AdaptiveOperatorSampler(root)`	${\vec{R}}^{}, t$	`SimpleDistance`( ${\vec{R}}^{}, t$ )
		`SmallPulley`( ${\vec{R}}^{}, t$ )
`AdaptiveOperatorSampler(leaf)`	${\vec{R}}_{leaf}^{}, t$	`ConstantDistance`( ${\vec{R}}_{leaf}^{}, t$ )
		`LeafAVMVN`( ${\vec{R}}_{leaf}^{}$ )
		`RandomWalk`( ${\vec{R}}_{leaf}^{}$ )
		`Scale`( ${\vec{R}}_{leaf}^{}$ )
		`Interval`( ${\vec{R}}_{leaf}^{}$ )
		`Swap`( ${\vec{R}}_{leaf}^{}$ )
		`SampleFromPrior`( ${\vec{R}}_{leaf}^{}$ )
`AdaptiveOperatorSampler(internal)`	${\vec{R}}_{int}^{}, t$	`ConstantDistance`( ${\vec{R}}_{int}^{}, t$ )
		`RandomWalk`( ${\vec{R}}_{int}^{}$ )
		`Scale`( ${\vec{R}}_{int}^{}$ )
		`Interval`( ${\vec{R}}_{int}^{}$ )
		`Swap`( ${\vec{R}}_{int}^{}$ )
		`SampleFromPrior`( ${\vec{R}}_{int}^{}$ )
`AdaptiveOperatorSampler(NER)`	$T$	NER{}
		$NER \{D_{A E}, D_{B E}, D_{C E}\}$

Different operators are applicable to different substitution rate parameterisations (Table 1). Nodes are broken down into regions to enable operators to be weighted according to their dimension. AdaptiveOperatorSampler(root) applies the root-targeting constant distance operators only [33] while AdaptiveOperatorSampler( $\vec{R}$ ) targets all rates and all node heights t. Nodes are further broken down into leaf rate $\vec{R}_{leaf}$ and internal node rate $\vec{R}_{int}$ operators. This facilitates suitable weighting of the LeafAVMVN operator, which is only applicable to leaf nodes (4. Screening operators for acceptance rate using simulated data.). In this setup, the RandomWalk(x), Scale(x), and SampleFromPrior(x) operators apply to the corresponding set of branch rates x, whereas ConstantDistance(x, t) is only applicable to internal nodes which have at least one child of type $x \in \{\vec{R}_{leaf}, \vec{R}_{int}\}$ . Finally, the Robinson-Foulds distance between trees before and after every accepted proposal is used to train the weights behind AdaptiveOperatorSampler(NER) (see Narrow exchange rate). In the special case of NER proposals, the RF distance is always equal to 1.

Bactrian proposal kernel

The step size of a proposal kernel q(x′|x) should be such that the proposed state x′ is sufficiently far from the current state x to explore vast areas of parameter space, but not so large that the proposal is rejected too often [37]. Operators which attain an acceptance probability of 0.234 are often considered to have arrived at a suitable midpoint between these two extremes [10, 37]. The standard uniform distribution kernel has recently been challenged by the family of Bactrian kernels [31, 32]. The (Gaussian) Bactrian(m) distribution is defined as an equal-weight mixture of two normal distributions, each with variance 1 − m²:

\begin{matrix} Σ \sim Bactrian (m) \equiv \frac{1}{2} Normal (- m, 1 - m^{2}) + \frac{1}{2} Normal (m, 1 - m^{2}) \end{matrix}

(13)

where 0 ≤ m < 1 describes modality. When m = 0, the Bactrian distribution is equivalent to Normal(0, 1). As m approaches 1, the distribution becomes increasingly bimodal (Fig 4). Yang et al. 2013 [31] demonstrate that Bactrian(m = 0.95) yields a proposal kernel which traverses the posterior distribution more efficiently than the standard uniform kernel, by placing minimal probability on steps which are too small or too large. In this case, a target acceptance probability of around 0.3 is optimal.

Fig 4 — The step size made under a Bactrian proposal kernel is equal to sΣ where Σ is drawn from the above distribution and s is tunable.
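Drawing from the Bactrian(m) distribution of Eq 13 is straightforward: pick one of the two components with equal probability, then draw from a normal distribution centred at ±m with variance 1 − m². The sketch below is illustrative, with hypothetical function names, and shows the distribution applied to a simple random walk with step size sΣ.

```python
import math
import random


def sample_bactrian(m=0.95, rng=random):
    """Draw from Bactrian(m): an equal mixture of Normal(-m, 1-m^2) and Normal(m, 1-m^2)."""
    mean = m if rng.random() < 0.5 else -m
    return rng.gauss(mean, math.sqrt(1.0 - m * m))  # gauss takes a standard deviation


def bactrian_random_walk(x, s, m=0.95, rng=random):
    """Propose x' = x + s * Sigma, where Sigma ~ Bactrian(m) and s is the tunable step size."""
    return x + s * sample_bactrian(m, rng)
```

By construction the distribution has mean 0 and variance m² + (1 − m²) = 1 for any m, so s has the same interpretation as under a standard normal kernel; what changes is that very small steps become rare.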

In this article we compare the abilities of uniform and Bactrian(0.95) proposal kernels at estimating clock model parameters. The clock model operators which these proposal kernels apply to are described in S1 Appendix.

Narrow exchange rate

The NarrowExchange operator [41], used widely in BEAST [9, 42] and BEAST 2 [10], is similar to nearest neighbour interchange [43], and works as follows (Fig 5):

Step 1. Sample an internal/root node E from tree $T$ , where E has grandchildren.
Step 2. Identify the child of E with the greater height. Denote this child as D and its sibling as C (i.e. t_D > t_C). If D is a leaf node, then reject the proposal.
Step 3. Randomly identify the two children of D as A and B.
Step 4. Relocate the B − D branch onto the C − E branch, so that B and C become siblings and their parent is D. All node heights remain constant.
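The four steps above can be sketched on a minimal tree structure. The `Node` class and `narrow_exchange` function are hypothetical and exist only for this illustration; BEAST 2 has its own tree classes, and the sketch does not compute the Hastings ratio of the move.

```python
import random


class Node:
    """A minimal rooted binary tree node (illustrative only)."""

    def __init__(self, name, height, children=()):
        self.name = name
        self.height = height
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self

    def is_leaf(self):
        return not self.children


def narrow_exchange(nodes, rng=random):
    """Apply one NarrowExchange move in place; return False if rejected."""
    # Step 1: sample a node E that has grandchildren.
    candidates = [n for n in nodes if any(not c.is_leaf() for c in n.children)]
    if not candidates:
        return False
    e = rng.choice(candidates)
    # Step 2: D is the higher child of E, C is its sibling.
    d, c = sorted(e.children, key=lambda n: n.height, reverse=True)
    if d.is_leaf():
        return False
    # Step 3: randomly label the two children of D as A and B.
    a, b = rng.sample(d.children, 2)
    # Step 4: B and C become siblings under D; A becomes a child of E.
    d.children = [b, c]
    e.children = [a, d]
    a.parent, c.parent = e, d
    return True
```

Node heights are untouched, and because t_D > t_C the new parent of C is still above it, so the resulting tree remains valid.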

We hypothesised that if NarrowExchange was adapted to the relaxed clock model by ensuring that genetic distances remain constant after the proposal (analogous to constant distance operators [33]), then its ability to traverse the state space may improve.

Here, we present the NarrowExchangeRate (NER) operator. Let r_A, r_B, r_C, and r_D be the substitution rates of nodes A, B, C, and D, respectively. In addition to the modest topological change applied by NarrowExchange, NER also proposes new branch rates ${r_{A}}^{'}$ , ${r_{B}}^{'}$ , ${r_{C}}^{'}$ , and ${r_{D}}^{'}$ . While NER does not alter t_D (i.e. ${t_{D}}^{'} \leftarrow t_{D}$ ), we also consider NERw, a variant of the NER operator which additionally applies a random walk to t_D:

\begin{matrix} {t_{D}}^{'} \leftarrow t_{D} + s Σ \end{matrix}

(14)

for random walk step size sΣ where s is tunable and Σ is drawn from a uniform or Bactrian distribution. NER (and NERw) are compatible with both the real and quant parameterisations. Analogous to the ConstantDistance operator, the proposed rates ensure that the genetic distances between nodes A, B, C, and E are constant. Let $D_{i j}$ be the constraint defined by a constant genetic distance between nodes i and j before and after the proposal. There are six pairwise distances between these four nodes and therefore there are six such constraints:

\begin{matrix} D_{A B} : & r_{A} (t_{D} - t_{A}) + r_{B} (t_{D} - t_{B}) = \\ {r_{A}}^{'} (t_{E} - t_{A}) + {r_{D}}^{'} (t_{E} - {t_{D}}^{'}) + {r_{B}}^{'} ({t_{D}}^{'} - t_{B}) \end{matrix}

(15)

\begin{matrix} D_{A C} : & r_{A} (t_{D} - t_{A}) + r_{D} (t_{E} - t_{D}) + r_{C} (t_{E} - t_{C}) = \\ {r_{A}}^{'} (t_{E} - t_{A}) + {r_{D}}^{'} (t_{E} - {t_{D}}^{'}) + {r_{C}}^{'} ({t_{D}}^{'} - t_{C}) \end{matrix}

(16)

\begin{matrix} D_{A E} : & r_{A} (t_{D} - t_{A}) + r_{D} (t_{E} - t_{D}) = \\ {r_{A}}^{'} (t_{E} - t_{A}) \end{matrix}

(17)

\begin{matrix} D_{B C} : & r_{B} (t_{D} - t_{B}) + r_{D} (t_{E} - t_{D}) + r_{C} (t_{E} - t_{C}) = \\ {r_{B}}^{'} ({t_{D}}^{'} - t_{B}) + {r_{C}}^{'} ({t_{D}}^{'} - t_{C}) \end{matrix}

(18)

\begin{matrix} D_{B E} : & r_{B} (t_{D} - t_{B}) + r_{D} (t_{E} - t_{D}) = \\ {r_{B}}^{'} ({t_{D}}^{'} - t_{B}) + {r_{D}}^{'} (t_{E} - {t_{D}}^{'}) \end{matrix}

(19)

\begin{matrix} D_{C E} : & r_{C} (t_{E} - t_{C}) = \\ {r_{C}}^{'} ({t_{D}}^{'} - t_{C}) + {r_{D}}^{'} (t_{E} - {t_{D}}^{'}) \end{matrix}

(20)

Further constraints are imposed by the model itself:

\begin{matrix} r_{i}, {r_{i}}^{'} > 0 \text{ for } i \in \{A, B, C, D\} \end{matrix}

(21)

\begin{matrix} t_{B}, t_{C} < {t_{D}}^{'} < t_{E} . \end{matrix}

(22)

Unfortunately, there is no solution to all six $D_{i j}$ constraints unless non-positive rates or illegal trees are permitted. Therefore instead of conserving all six pairwise distances, NER conserves a subset of distances. It is not immediately clear which subset should be conserved.
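One way to see why the full set of constraints cannot be satisfied uses only Eqs 15, 17 and 19. This short derivation is added here for illustration and is not part of the automated analysis described in the next section. Adding Eq 17 to Eq 19 gives the same right-hand side as Eq 15, so the left-hand sides must also agree:

```latex
\begin{aligned}
\text{Eq 17} + \text{Eq 19}: \quad
  & r_A (t_D - t_A) + r_B (t_D - t_B) + 2 r_D (t_E - t_D) \\
  &= r_A' (t_E - t_A) + r_B' (t_D' - t_B) + r_D' (t_E - t_D') \\
\text{Eq 15}: \quad
  & r_A (t_D - t_A) + r_B (t_D - t_B) \\
  &= r_A' (t_E - t_A) + r_D' (t_E - t_D') + r_B' (t_D' - t_B) \\
\Rightarrow \quad & 2 r_D (t_E - t_D) = 0
\end{aligned}
```

This requires either r_D = 0, which violates Eq 21, or t_D = t_E, which is not a legal tree since D is a child of E. Any subset containing $D_{AB}$, $D_{AE}$, and $D_{BE}$ together is therefore unsatisfiable under the model constraints.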

Automated generation of operators and constraint satisfaction

The total space of NER operators consists of all possible subsets of distance constraints (i.e. $\{\}, \{D_{A B}\}, \{D_{A C}\}, \dots, \{D_{A B}, D_{A C}, D_{A E}, D_{B C}, D_{B E}, D_{C E}\}$ ) that are solvable. The simplest NER kernel, the null operator denoted by NER{}, does not satisfy any distance constraints and is equivalent to NarrowExchange. To determine which NER variants have the best performance, we developed an automated pipeline for generating and testing these operators.

1. Solution finding

Using standard analytical linear-system solving libraries in MATLAB [44], the 2⁶ = 64 subsets of distance constraints were solved. 54 of the 64 subsets were found to be solvable, and the unsolvables were discarded.

2. Solving Jacobian determinants

The determinant of the Jacobian matrix J is required for computing the Green ratio of this proposal. J is defined as

\begin{matrix} J & = [\begin{matrix} \frac{\partial {r_{A}}^{'}}{\partial r_{A}} & \frac{\partial {r_{A}}^{'}}{\partial r_{B}} & \frac{\partial {r_{A}}^{'}}{\partial r_{C}} & \frac{\partial {r_{A}}^{'}}{\partial r_{D}} & \frac{\partial {r_{A}}^{'}}{\partial t_{D}} \\ \frac{\partial {r_{B}}^{'}}{\partial r_{A}} & \frac{\partial {r_{B}}^{'}}{\partial r_{B}} & \frac{\partial {r_{B}}^{'}}{\partial r_{C}} & \frac{\partial {r_{B}}^{'}}{\partial r_{D}} & \frac{\partial {r_{B}}^{'}}{\partial t_{D}} \\ \frac{\partial {r_{C}}^{'}}{\partial r_{A}} & \frac{\partial {r_{C}}^{'}}{\partial r_{B}} & \frac{\partial {r_{C}}^{'}}{\partial r_{C}} & \frac{\partial {r_{C}}^{'}}{\partial r_{D}} & \frac{\partial {r_{C}}^{'}}{\partial t_{D}} \\ \frac{\partial {r_{D}}^{'}}{\partial r_{A}} & \frac{\partial {r_{D}}^{'}}{\partial r_{B}} & \frac{\partial {r_{D}}^{'}}{\partial r_{C}} & \frac{\partial {r_{D}}^{'}}{\partial r_{D}} & \frac{\partial {r_{D}}^{'}}{\partial t_{D}} \\ \frac{\partial {t_{D}}^{'}}{\partial r_{A}} & \frac{\partial {t_{D}}^{'}}{\partial r_{B}} & \frac{\partial {t_{D}}^{'}}{\partial r_{C}} & \frac{\partial {t_{D}}^{'}}{\partial r_{D}} & \frac{\partial {t_{D}}^{'}}{\partial t_{D}} \end{matrix}] . \end{matrix}

(23)

Solving the determinant |J| invokes standard analytical differentiation and linear algebra libraries of MATLAB. 6 of the 54 solvable operators were found to have |J| = 0, corresponding to irreversible proposals, and were discarded.

3. Automated generation of BEAST 2 operators

Java class files were generated using string processing. Each class corresponded to a single operator, extended the class of a meta-NER-operator, and consisted of the solutions found in 1 and the Jacobian determinant found in 2. |J| is further augmented if the quant parameterisation is employed (S1 Appendix). One such operator is expressed in Algorithm 1 and a second in S1 Appendix.

Algorithm 1 The $NER {D_{A E}, D_{B E}, D_{C E}}$ operator.

procedure proposal(t_A, t_B, t_C, t_D, t_E, r_A, r_B, r_C, r_D)

sΣ ← getRandomWalkSize() ▹ Random walk size is 0 unless this is NERw

$t_{D}' \leftarrow t_{D} + sΣ$ ▹ Propose new node height for D

$r_{A}' \leftarrow \frac{r_{A} (t_{D} - t_{A}) + r_{D} (t_{E} - t_{D})}{t_{E} - t_{A}}$ ▹ Propose new rates

$r_{B}' \leftarrow \frac{r_{B} (t_{D} - t_{B}) + r_{D} (t_{D}' - t_{D})}{t_{D}' - t_{B}}$

$r_{C}' \leftarrow \frac{r_{C} (t_{E} - t_{C}) - r_{D} (t_{E} - t_{D}')}{t_{D}' - t_{C}}$

$r_{D}' \leftarrow r_{D}$

$|J| \leftarrow \frac{(t_{D} - t_{A}) (t_{D} - t_{B}) (t_{E} - t_{C})}{(t_{E} - t_{A}) (t_{D}' - t_{B}) (t_{D}' - t_{C})}$ ▹ Calculate Jacobian determinant

return $(r_{A}', r_{B}', r_{C}', r_{D}', t_{D}', |J|)$
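As an illustration, Algorithm 1 can be transcribed into a few lines of Python. This is a minimal sketch rather than the ORC implementation: the function and variable names are hypothetical, and the two distance helpers assume the topology implied by the rate equations (before the move, D is the parent of A and B, and E is the parent of D and C; afterwards A and C have swapped places). The final check confirms that the three genetic distances to E are unchanged by the proposal.

```python
import math


def ner_proposal(t_A, t_B, t_C, t_D, t_E, r_A, r_B, r_C, r_D, step=0.0):
    """Transcription of Algorithm 1 (hypothetical function name).

    `step` plays the role of the random walk size: 0 for NER, nonzero for NERw.
    """
    t_D_new = t_D + step  # propose a new node height for D
    # Propose new rates so that the distances A-E, B-E and C-E are conserved.
    r_A_new = (r_A * (t_D - t_A) + r_D * (t_E - t_D)) / (t_E - t_A)
    r_B_new = (r_B * (t_D - t_B) + r_D * (t_D_new - t_D)) / (t_D_new - t_B)
    r_C_new = (r_C * (t_E - t_C) - r_D * (t_E - t_D_new)) / (t_D_new - t_C)
    r_D_new = r_D
    jacobian = ((t_D - t_A) * (t_D - t_B) * (t_E - t_C)) / (
        (t_E - t_A) * (t_D_new - t_B) * (t_D_new - t_C)
    )
    return r_A_new, r_B_new, r_C_new, r_D_new, t_D_new, jacobian


def distances_before(t_A, t_B, t_C, t_D, t_E, r_A, r_B, r_C, r_D):
    """Distances to E, assuming D is the parent of A and B, and E of D and C."""
    via_D = r_D * (t_E - t_D)
    return (r_A * (t_D - t_A) + via_D, r_B * (t_D - t_B) + via_D, r_C * (t_E - t_C))


def distances_after(t_A, t_B, t_C, t_D, t_E, r_A, r_B, r_C, r_D):
    """Distances to E, assuming D is now the parent of C and B, and E of D and A."""
    via_D = r_D * (t_E - t_D)
    return (r_A * (t_E - t_A), r_B * (t_D - t_B) + via_D, r_C * (t_D - t_C) + via_D)


# Arbitrary illustrative node heights and rates (not taken from any dataset).
heights = dict(t_A=0.0, t_B=0.0, t_C=0.5, t_D=1.0, t_E=2.0)
rates = dict(r_A=1.2, r_B=0.8, r_C=1.5, r_D=0.6)

r_A2, r_B2, r_C2, r_D2, t_D2, jac = ner_proposal(**heights, **rates, step=0.1)
before = distances_before(**heights, **rates)
after = distances_after(
    heights["t_A"], heights["t_B"], heights["t_C"], t_D2, heights["t_E"],
    r_A2, r_B2, r_C2, r_D2,
)
conserved = all(math.isclose(b, a) for b, a in zip(before, after))
```

Note that a proposed rate can be non-positive, for example $r_{C}'$ when $r_{D} (t_{E} - t_{D}')$ exceeds $r_{C} (t_{E} - t_{C})$. A full operator would presumably reject such proposals; this sketch does not.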

4. Screening operators for acceptance rate using simulated data

The best NER variant to proceed to benchmarking on empirical data (Results) was selected by performing MCMC on simulated data, measuring the acceptance rate of each of the 96 NER/NERw variants, and comparing them with the null operator NER{} / NarrowExchange. In total, there were 300 simulated datasets, each with N = 30 taxa and varying alignment lengths.

These experiments showed that NER variants which satisfied the genetic distances between nodes B and A (i.e. $D_{A B}$ ) or between B and C (i.e. $D_{B C}$ ) usually performed worse than the standard NarrowExchange operator (Fig 6). This is an intuitive result. If there is high uncertainty in the positioning of B with respect to A and C, then there is no value in respecting either of these distance constraints, and the proposals made to the rates may often be too extreme or the Green ratio |J| too small for the proposal to be accepted.

Fig 6 also revealed a cluster of NER variants which, under the conditions of the simulation, performed better than the null operator NER{} around 25% of the time and performed worse around 10% of the time. One such operator was $NER\{D_{AE}, D_{BE}, D_{CE}\}$ (Algorithm 1). This variant conserves the genetic distance between nodes A, B, C and their grandparent E. This operator performed well when branch rates had a large variance (σ > 0.5), corresponding to non clock-like data. On the other hand, the null operator NER{} performed better on shorter sequences (L < 1 kb) with weaker signal. Overall, $NER\{D_{AE}, D_{BE}, D_{CE}\}$ outperformed the standard NarrowExchange operator when the data was not clock-like and contained sufficient signal.

Finally, this initial screening showed that applying a (Bactrian) random walk to the node height t_D made the operator worse. This effect was most dominant for the NER variants which satisfied distance constraints (i.e. the operators which are not NER{}).

Although there were several operators which behaved equivalently during this initial screening process, we selected $NER\{D_{AE}, D_{BE}, D_{CE}\}$ to proceed to benchmarking (Results). Due to the apparent sensitivity of NER operators to the data, we introduce the adaptive operator AdaptiveOperatorSampler(NER) which allows the operator scheme to fall back on the standard NarrowExchange in the event of $NER\{D_{AE}, D_{BE}, D_{CE}\}$ performing poorly (Table 2).

An adaptive leaf rate operator

The adaptable variance multivariate normal (AVMVN) kernel learns correlations between parameters during MCMC [30, 42]. Baele et al. 2017 observed a large increase (≈ 5–10×) in sampling efficiency from applying the AVMVN kernel to substitution model parameters [30]. Here, we consider application of the AVMVN kernel to the branch rates of leaf nodes. This operator, referred to as LeafAVMVN, is not readily applicable to internal node branch rates due to their dependencies on tree topology.

Leaf rate AVMVN kernel

The AVMVN kernel assumes its parameters live in $x \in \mathbb{R}^{N}$ for taxon count N and that these parameters follow a multivariate normal distribution with covariance matrix Σ_N. Hence, the kernel operates on the logarithmic or logistic transformation of the N leaf branch rates, depending on the rate parameterisation:

$$x = \begin{cases} \log r & \text{for real} \\ \log \frac{q}{1 - q} & \text{for quant} \end{cases}$$

(24)

where r is a real rate and q is a rate quantile. The AVMVN probability density is defined by

$$\mathrm{AVMVN}(\vec{x}) = \mathrm{MVN}\left(\vec{x},\ (1 - \beta) \frac{\Sigma_{N}}{N} + \beta \frac{I_{N}}{N}\right),$$

(25)

where $MVN$ is the multivariate normal probability density. β = 0.05 is a constant which determines the fraction of the proposal determined by the identity matrix $I_{N}$, as opposed to the covariance matrix Σ_N which is trained during MCMC. Our BEAST 2 implementation of the AVMVN kernel is adapted from that of BEAST [42].

LeafAVMVN has the advantage of operating on all N leaf rates simultaneously (as well as learning their correlations), as opposed to ConstantDistance which operates on at most 2, or Scale which operates on at most 1 leaf rate at a time. As the size of the covariance matrix Σ_N grows with the number of taxa N, LeafAVMVN is likely to be less efficient with larger taxon sets. Therefore, the weight behind this operator is learned by AdaptiveOperatorSampler.
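The transformation in Eq 24 and the proposal covariance in Eq 25 can be sketched in plain Python as follows. This is an illustrative sketch only, not the LeafAVMVN implementation: all function names are hypothetical, Σ_N is supplied as a fixed input rather than trained during MCMC, sampling via a Cholesky factor is simply one convenient choice, and the Hastings ratio arising from the transformation is omitted.

```python
import math
import random


def to_unbounded(value, parameterisation):
    """Eq 24: log for real rates, logit for rate quantiles."""
    if parameterisation == "real":
        return math.log(value)
    if parameterisation == "quant":
        return math.log(value / (1.0 - value))
    raise ValueError("expected 'real' or 'quant'")


def from_unbounded(x, parameterisation):
    """Inverse of Eq 24."""
    if parameterisation == "real":
        return math.exp(x)
    if parameterisation == "quant":
        return 1.0 / (1.0 + math.exp(-x))
    raise ValueError("expected 'real' or 'quant'")


def proposal_covariance(sigma, beta=0.05):
    """Covariance in Eq 25: (1 - beta) * Sigma_N / N + beta * I_N / N."""
    n = len(sigma)
    return [
        [((1.0 - beta) * sigma[i][j] + (beta if i == j else 0.0)) / n for j in range(n)]
        for i in range(n)
    ]


def cholesky(m):
    """Lower-triangular factor L with L * L^T = m (m must be positive definite)."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def propose(x, sigma, beta=0.05, rng=random):
    """Draw x' from a multivariate normal centred on x with the Eq 25 covariance."""
    low = cholesky(proposal_covariance(sigma, beta))
    z = [rng.gauss(0.0, 1.0) for _ in x]
    return [x[i] + sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(x))]


# Hypothetical leaf rates and covariance matrix for N = 3 taxa.
rng = random.Random(42)
leaf_rates = [0.8, 1.0, 1.3]
sigma_N = [[0.20, 0.05, 0.00], [0.05, 0.30, 0.10], [0.00, 0.10, 0.25]]
x = [to_unbounded(r, "real") for r in leaf_rates]
new_rates = [from_unbounded(v, "real") for v in propose(x, sigma_N, rng=rng)]
```

Because the proposal is made in the transformed space, every proposed rate maps back to a positive value (or to a quantile in (0, 1) under quant), and all N leaf rates move together in a single step.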

To prevent the learned weight behind LeafAVMVN from dominating the AdaptiveOperatorSampler weighting scheme and therefore inhibiting the mixing of internal node rates, we introduce the AdaptiveOperatorSampler(leaf) and AdaptiveOperatorSampler(internal) meta-operators which operate exclusively on leaf node rates $\vec{R}_{leaf}$ and internal node rates $\vec{R}_{int}$ respectively (Table 2). The former employs the LeafAVMVN operator and learns its weight during MCMC (after providing it sufficient time to learn Σ_N).

Model specification and MCMC settings

In all phylogenetic analyses presented here, we use a Yule [45] tree prior $p (T | λ)$ with birth rate λ ∼ Log-normal(1, 1.25). Here and throughout the article, a Log-normal(a, b) distribution is parameterised such that a and b are the mean and standard deviation in log-space. The clock standard deviation has a σ ∼ Gamma(0.5396, 0.3819) prior. Datasets are partitioned into subsequences, where each partition is associated with a distinct HKY substitution model [46]. The transition-transversion ratio κ ∼ Log-normal(1, 1.25), the four nucleotide frequencies (f_A, f_C, f_G, f_T)∼Dirichlet(10, 10, 10, 10), and the relative clock rate μ_C ∼ Log-normal(−0.18, 0.6) are estimated independently for each partition. The operator scheme ensures that the clock rates μ_C have a mean of 1 across all partitions. This avoids non-identifiability with branch substitution rates. To enable rapid benchmarking of larger datasets we use BEAGLE for high-performance tree likelihood calculations [47] and coupled MCMC with four chains for efficient mixing [29]. The neighbour joining tree [48] is used as the initial state in each MCMC chain.
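To make the Log-normal(a, b) parameterisation concrete, the short sketch below draws from the κ prior using Python's standard library, whose `random.lognormvariate` takes the same log-space mean and standard deviation. The helper name, seed, and sample size are arbitrary choices made for illustration.

```python
import math
import random


def lognormal_median_and_mean(a, b):
    """Median and mean of Log-normal(a, b), with a and b the mean and s.d. in log-space."""
    return math.exp(a), math.exp(a + b * b / 2.0)


rng = random.Random(2021)  # arbitrary seed, for reproducibility only
# Draws from the kappa prior, Log-normal(1, 1.25).
kappa_draws = [rng.lognormvariate(1.0, 1.25) for _ in range(20000)]
# The mean of the log-transformed draws should be close to a = 1.
log_mean = sum(math.log(k) for k in kappa_draws) / len(kappa_draws)

median, mean = lognormal_median_and_mean(1.0, 1.25)
```

Under this parameterisation the prior median is $e^{a}$, whereas the prior mean $e^{a + b^{2}/2}$ is considerably larger when b is large.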

Throughout the article, we have introduced four new operators. These are summarised in Table 3.

Table 3. Summary of clock model operators introduced throughout this article.

Operator	Description	Parameters
`AdaptiveOperatorSampler`	Samples sub-operators proportionally to their weights, which are learned (see Adaptive operator weighting).	$\vec{R}, σ, t, T$
`SampleFromPrior`	Resamples a random number of elements from their prior (see Adaptive operator weighting).	$\vec{R}, σ$
`NarrowExchangeRate`	Moves a branch and recomputes branch rates so that their genetic distances are constant (see Narrow exchange rate).	$\vec{R}, T$
`LeafAVMVN`	Proposes new rates for all leaves in one move (see An adaptive leaf rate operator) [30].	$\vec{R}$


Pre-existing clock model operators are summarised in Table 1. t denotes node heights while $T$ denotes the whole tree, including topology.

In Table 4, we define all operator configurations which are benchmarked throughout Results.

Table 4. Operator configurations and the substitution rate parameterisations which each operator is applicable to.

Configuration	Operator	Weight	real	cat	quant
nocons	`RandomWalk`( $\vec{R}$ )	10	✓	✓
	`Scale`( $\vec{R}$ )	10	✓
	`Uniform`( $\vec{R}$ )	10		✓	✓
	`Interval`( $\vec{R}$ )	10			✓
	`Swap`( $\vec{R}$ )	10	✓	✓	✓
	`Scale(σ)`	10	✓	✓	✓
cons	`ConstantDistance`( $\vec{R}, t$ )	$20 \times \frac{2 N - 2}{2 N - 1}$	✓		✓
	`SimpleDistance`( $\vec{R}, t$ )	$\frac{20}{2} \times \frac{1}{2 N - 1}$	✓		✓
	`SmallPulley`( $\vec{R}$ )	$\frac{20}{2} \times \frac{1}{2 N - 1}$	✓		✓
	`RandomWalk`( $\vec{R}$ )	5	✓
	`Scale`( $\vec{R}$ )	2.5	✓
	`Uniform`( $\vec{R}$ )	5			✓
	`Interval`( $\vec{R}$ )	2.5			✓
	`Swap`( $\vec{R}$ )	2.5	✓		✓
	`CisScale`( $σ, \vec{R}$ )	10	✓
	`Scale(σ)`	10			✓
adapt	`AdaptiveOperatorSampler(σ)`	10	✓	✓	✓
	`AdaptiveOperatorSampler`( $\vec{R}$ )	$30 \times \frac{2 N - 2}{2 N - 1}$	✓		✓
	`AdaptiveOperatorSampler`( $\vec{R}$ )	30		✓
	`AdaptiveOperatorSampler(root)`	$30 \times \frac{1}{2 N - 1}$	✓		✓
AVMVN	`AdaptiveOperatorSampler(σ)`	10	✓		✓
	`AdaptiveOperatorSampler(leaf)`	$30 \times \frac{N}{2 N - 1}$	✓		✓
	`AdaptiveOperatorSampler(internal)`	$30 \times \frac{N - 2}{2 N - 1}$	✓		✓
	`AdaptiveOperatorSampler(root)`	$30 \times \frac{1}{2 N - 1}$	✓		✓
NER{}	`NarrowExchange`	15	✓	✓	✓
NER	`AdaptiveOperatorSampler(NER)`	15	✓		✓


Within each configuration (and substitution rate parameterisation), the weight behind $\vec{R}$ sums to 30, the weight of σ is equal to 10, and the weight of NER is equal to 15. Operators which apply to specific node sets (root, internal, leaf, or all) are weighted according to leaf count N. The adaptive operators are further broken down in Table 2. All other operators (i.e. those which apply to other terms in the state, such as the nucleotide substitution model) are held constant within each dataset.
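The claim that the branch rate weights sum to 30 can be checked directly from the Table 4 entries. The sketch below is only a bookkeeping aid with hypothetical names; it uses exact fractions and relies on the fact that a binary tree with N leaves has 2N − 1 nodes (N leaves, N − 2 non-root internal nodes, and one root), which is where the node-set fractions come from.

```python
from fractions import Fraction


def branch_rate_weights(n_taxa):
    """Branch-rate operator weights from Table 4, keyed by (configuration, parameterisation)."""
    n_nodes = 2 * n_taxa - 1  # N leaves + (N - 2) non-root internal nodes + 1 root
    constant_distance = [
        20 * Fraction(2 * n_taxa - 2, n_nodes),  # ConstantDistance
        Fraction(20, 2) * Fraction(1, n_nodes),  # SimpleDistance
        Fraction(20, 2) * Fraction(1, n_nodes),  # SmallPulley
    ]
    adapt = [
        30 * Fraction(2 * n_taxa - 2, n_nodes),  # AdaptiveOperatorSampler(R)
        30 * Fraction(1, n_nodes),               # AdaptiveOperatorSampler(root)
    ]
    avmvn = [
        30 * Fraction(n_taxa, n_nodes),      # AdaptiveOperatorSampler(leaf)
        30 * Fraction(n_taxa - 2, n_nodes),  # AdaptiveOperatorSampler(internal)
        30 * Fraction(1, n_nodes),           # AdaptiveOperatorSampler(root)
    ]
    return {
        ("nocons", "real"): [10, 10, 10],   # RandomWalk, Scale, Swap
        ("nocons", "cat"): [10, 10, 10],    # RandomWalk, Uniform, Swap
        ("nocons", "quant"): [10, 10, 10],  # Uniform, Interval, Swap
        # cons adds RandomWalk, Scale, Swap (real) or Uniform, Interval, Swap (quant)
        ("cons", "real"): constant_distance + [5, Fraction(5, 2), Fraction(5, 2)],
        ("cons", "quant"): constant_distance + [5, Fraction(5, 2), Fraction(5, 2)],
        ("adapt", "real"): adapt,
        ("adapt", "quant"): adapt,
        ("adapt", "cat"): [30],
        ("AVMVN", "real"): avmvn,
        ("AVMVN", "quant"): avmvn,
    }


# N = 38 is used here purely as an example taxon count.
totals = {key: sum(weights) for key, weights in branch_rate_weights(38).items()}
```

Every entry of `totals` equals 30 regardless of the taxon count, so changing N only shifts weight between node sets and never changes the total effort spent on branch rates.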

Results

To avoid an intractably large search space, the five targets for clock model improvement were evaluated sequentially in the following order: Adaptive operator weighting, Branch rate parameterisations, Bactrian proposal kernel, Narrow exchange rate, and An adaptive leaf rate operator. The four operators introduced in these sections are summarised in Table 3. The setting which was considered to be the best in each step was then incorporated into the following step. This protocol and its outcomes are summarised in Fig 7.

Fig 7 — Each area (detailed in Models and methods) is optimised sequentially, and the best setting from each step is used when optimising the following step.

Methodologies were assessed according to the following criteria.

Validation. This was assessed by measuring the coverage of all estimated parameters in well-calibrated simulation studies. These are presented in S2 Appendix and give confidence that the operators are implemented correctly.
Mixing of parameters. Key parameters were evaluated for the number of effective samples generated per hour (ESS/hr). These key parameters were the likelihood $L$ and prior p densities, tree length l (i.e. the sum of all branch lengths), mean branch rate $\bar{r}$ , branch rate of all leaf nodes r, and relaxed clock standard deviation σ. We also included the HKY substitution model term κ. The mixing of κ should not be strongly affected by any of the clock model operators, and thus it served as a positive control in each experiment.

Methodologies were benchmarked using one simulated and eight empirical datasets [49–56]. The latter were compiled by Lanfear as “benchmark alignments” (Table 5) [57, 58]. Each methodology was benchmarked for million-states-per-hour using the Intel Xeon Gold 6138 CPU (2.00 GHz). These terms were multiplied by the ESS-per-state across 20 replicates on the New Zealand eScience Infrastructure (NeSI) cluster to compute the total ESS/hr of each dataset under each setting. All methodologies used identical models and operator configurations, except where a difference is specified.
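The ESS/hr arithmetic described above can be sketched as follows. The function name and all numbers are hypothetical and serve only to illustrate the calculation; summarising the replicates by their mean and standard error is an assumption about how the terms were combined, not a statement of the exact procedure used.

```python
import statistics


def ess_per_hour(million_states_per_hour, ess_per_million_states):
    """ESS/hr for each replicate, plus the mean and standard error across replicates."""
    per_replicate = [million_states_per_hour * e for e in ess_per_million_states]
    mean = statistics.mean(per_replicate)
    std_err = statistics.stdev(per_replicate) / len(per_replicate) ** 0.5
    return per_replicate, mean, std_err


# Hypothetical numbers for illustration only (not results from the article).
speed = 12.0  # million states per hour on the benchmark CPU
replicate_ess = [40.0, 50.0, 45.0, 55.0]  # ESS per million states, one per replicate
per_replicate, mean_ess_hr, se_ess_hr = ess_per_hour(speed, replicate_ess)
```

Separating the two terms in this way means that runtime only needs to be measured on one machine, while the ESS-per-state replicates can be run on any hardware.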

Table 5. Benchmark datasets, sorted in increasing order of taxon count N.

	N	P	L (kb)	L_unq (kb)	$\hat{σ}$	Description
1	38	16	15.5	10.5	0.44	Seed plants (Ran 2018 [49])
2	44	7	5.9	1.8	0.30	Squirrel Fishes (Dornburg 2012 [50])
3	44	3	1.9	0.8	0.27	Bark beetles (Cognato 2001 [51])
4	51	6	5.4	1.8	0.52	Southern beeches (Sauquet 2011 [52])
5	61	8	6.9	4.3	0.53	Bony fishes (Broughton 2013 [53])
6	70	3	2.2	0.9	0.19	Caterpillars (Kawahara 2013 [54])
7	80	1	10.0	0.9	0.82	Simulated data
8	94	4	2.2	1	0.34	Bees (Rightmyer 2013 [55])
9	106	1	0.8	0.5	0.37	Songbirds (Moyle 2016 [56])


Number of partitions P, total alignment length L, and number of unique site patterns L_unq in the alignment are specified. Clock standard deviation estimates $\hat{σ}$ are of moderate magnitude, suggesting that most of these datasets are not clock-like.

Round 1: A simple operator-weight learning algorithm greatly improved performance

We compared the nocons, cons, and adapt operator configurations (Table 4). nocons contained all of the standard BEAST 2 operator configurations and weightings for real, cat, and quant. cons additionally contained (cons)tant distance operators and employed the same operator weighting scheme used previously [33] (real and quant only). Finally, the adapt configuration combined all of the above applicable operators, as well as the simple-but-bold SampleFromPrior operator, and learned the weights of each operator using the AdaptiveOperatorSampler.

This experiment revealed that nocons usually performed better than cons on smaller datasets (i.e. small L) while cons consistently performed better on larger datasets (Fig 8 and S1 Fig). This result is unsurprising (Fig 3). Furthermore, the adapt setup dramatically improved mixing for real by finding the right balance between cons and nocons. This yielded an ESS/hr (averaged across all 9 datasets) 95% faster than cons and 520% faster than nocons, with respect to leaf branch rates, and 620% and 190% faster for σ. Similar results were observed with quant. However, adapt neither helped nor harmed cat, suggesting that the default operator weighting scheme was sufficient.

Fig 8 — Top left, top right, bottom left: each plot compares the ESS/hr (±1 standard error) across two operator configurations. Bottom right: the effect of sequence length L on operator weights learned by `AdaptiveOperatorSampler`. Both sets of observations are fit by logistic regression models. The benchmark datasets are displayed in Table 5. The *cat* and *quant* settings are evaluated in S1 Fig.

This experiment also revealed that the standard Scale operator was preferred over CisScale for the real configuration. Averaged across all datasets, the learned weights behind these two operators were 0.47 and 0.03. This was due to the computationally demanding nature of CisScale, which invokes the i-CDF function. In contrast, the performance of Scale and CisScale was more similar under the quant configuration, where they were weighted at 0.28 and 0.45. For both real and quant, proposals which altered quantiles, while leaving the rates constant (Scale and CisScale respectively), were preferred.

Overall, the AdaptiveOperatorSampler operator was included in all subsequent rounds in the tournament.

Round 2: The real parameterisation yielded the fastest mixing

In Round 1 we selected the best configuration for each of the three rate parameterisations described in Branch rate parameterisations, and in Round 2 we compared the three to each other. adapt (real) and adapt (quant) both employed constant distance tree operators [33] and both used the AdaptiveOperatorSampler operator to learn clock model operator weights. Clock model operator weights were also learned in the adapt (cat) configuration.

This experiment showed that the real parameterisation greatly outperformed cat on most datasets and most parameters (Fig 9). This disparity was strongest for long alignments. In the most extreme case, leaf substitution rates r and clock standard deviation σ both mixed around 50× faster on the 15.5 kb seed plant dataset (Ran et al. 2018 [49]) for real than they did for cat. The advantages in using constant distance operators would likely be even stronger for larger L. Furthermore, real outperformed quant on most datasets, but this was mostly due to the slow computational performance of quant compared with real, as opposed to differences in mixing prowess (Fig 10). Irrespective of mixing ability, the adapt (real) configuration had the best computational performance and generated samples 40% faster than adapt (cat) and 60% faster than adapt (quant).

Fig 10 — The computational time required for a setting to sample a single state is divided by that of the nocons (*cat*) configuration. The geometric mean under each configuration, averaged across all 9 datasets, is displayed as a horizontal bar.

Overall, we determined that real, and its associated operators, made the best parameterisation covered here and it proceeded to the following rounds of benchmarking. This outcome allowed Rounds 3, 4, and 5 to commence. If cat had dominated in this round, then the operators benchmarked in Rounds 4 and 5 would not have been applicable to the parameterisation, and the Bactrian proposal kernel benchmarked in Round 3 would not have been applicable to branch rate categories in its present form.

Round 3: Bactrian proposal kernels were around 15% more efficient than uniform kernels

We benchmarked the adapt (real) configuration with a) standard uniform proposal kernels, and b) Bactrian(0.95) kernels [31]. These kernels applied to all clock model operators. These results confirmed that the Bactrian kernel yields faster mixing than the standard uniform kernel (Fig 11). All relevant continuous parameters considered had an ESS/hr, averaged across the 9 datasets, between 15% and 20% faster compared with the standard uniform kernel. Although the Bactrian proposal made little-to-no difference to the caterpillar dataset (Kawahara and Rubinoff 2013 [54]), every other dataset did in fact benefit. Bactrian proposal kernels proceeded to Round 4 of the relaxed clock model optimisation protocol.

Fig 11 — The ESS/hr (±1 s.e.) under the Bactrian configuration, divided by that under the uniform kernel, is shown in the y-axis for each dataset and relevant parameter. Horizontal bars show the geometric mean under each parameter.
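For readers unfamiliar with the Bactrian(0.95) kernel benchmarked in this round, the sketch below draws from it using its definition as an equal mixture of two normal distributions centred at ±m, each with variance 1 − m², so that the kernel has mean 0 and variance 1 [31]. The function names and the random-walk wrapper are hypothetical and are not taken from the BEAST 2 code base.

```python
import math
import random


def bactrian_draw(m=0.95, rng=random):
    """One draw from a Bactrian(m) kernel, which has mean 0 and variance 1."""
    centre = m if rng.random() < 0.5 else -m  # pick one of the two humps
    return rng.gauss(centre, math.sqrt(1.0 - m * m))


def bactrian_random_walk(x, step_size, m=0.95, rng=random):
    """Random walk proposal x' = x + step_size * (Bactrian draw)."""
    return x + step_size * bactrian_draw(m, rng)


# Empirical check of the first two moments with an arbitrary seed.
rng = random.Random(7)
draws = [bactrian_draw(0.95, rng) for _ in range(50000)]
sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / (len(draws) - 1)
```

With m = 0.95 almost all of the probability mass sits near ±1 step sizes, so proposals that barely move the state are rare. This is the intuition usually given for why such kernels can mix faster than a uniform or single-normal kernel of the same variance.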

Round 4: NER operators outperformed on larger datasets

Our initial screening of the NarrowExchangeRate (NER) operators revealed that the $NER\{D_{AE}, D_{BE}, D_{CE}\}$ operator outperformed the standard NarrowExchange / NER{} operator about 25% of the time on simulated data; however, it was also very sensitive to the dataset. Therefore, we wrapped the two operators (NER{} and $NER\{D_{AE}, D_{BE}, D_{CE}\}$) within an AdaptiveOperatorSampler operator so that the appropriate weights could be learned. In this round we benchmarked the Bactrian + adapt (real) setting with the adaptive NER operator (Table 4). The benchmark datasets are fairly non clock-like and therefore could potentially benefit from NER (Table 5).

Our experiments confirmed that $NER\{D_{AE}, D_{BE}, D_{CE}\}$ was indeed superior on larger datasets (where L > 5 kb; Fig 12). While there was no significant difference in the ESS/hr of continuous parameters, $NER\{D_{AE}, D_{BE}, D_{CE}\}$ did have an acceptance rate 41% higher than that of the standard NarrowExchange operator in the most extreme case (the bony fish alignment by Broughton et al. [53]). The moderate variance in branch substitution rates ($\hat{σ} = 0.5$), coupled with a long alignment (7 kb) and high topological uncertainty (Fig 12), made this dataset the perfect target. Every acceptance of a branch rearrangement proposal yields a new topology and thus facilitates traversal of tree space.

In contrast, the standard NarrowExchange operator outperformed on smaller datasets. The new operator was not always helpful and sometimes it even hindered performance. Use of an adaptive operator (AdaptiveOperatorSampler) removes the burden from the user in making the decision of which operator to use. The AdaptiveOperatorSampler(NER) operator proceeded into the final round of the tournament.

Round 5: The AVMVN leaf rate operator was computationally demanding and improved mixing very slightly

We tested the applicability of the AVMVN kernel to leaf rate proposals. This operator exploits any correlations which exist between leaf branch substitution rates. To do this, we wrapped the LeafAVMVN operator within an AdaptiveOperatorSampler (Table 2). The two configurations compared here were a) adapt + Bactrian + NER (real) and b) AVMVN + NER + Bactrian + adapt (real) (Table 4).

These results showed that the AVMVN operator yielded slightly better mixing (around 6% faster) for the tree likelihood, the tree length, and the mean branch rate (Fig 13). However, it also produced slightly slower mixing for κ, reflecting the high computational costs associated with the LeafAVMVN operator (Fig 10). The learned weight of the LeafAVMVN operator was quite small (ranging from 1 to 8% across all datasets), again reflecting its costly nature, but also reinforcing the value in having an adaptive weight operator which penalises slow operators. The LeafAVMVN operator provided some, but not much, benefit in its current form.

Fig 13 — See Fig 11 caption for figure notation.

Overall, we determined that the AVMVN operator configuration was the final winner of the tournament; however, its performance benefits were minor and therefore the computational complexities introduced by the LeafAVMVN operator may not be worth the trouble.

Tournament conclusion

In conjunction with all settings which came before it, the tournament winner outperformed both the historical cat configuration [3] as well as the recently developed cons (real) scheme [33]. Averaged across all datasets, this configuration yielded a relaxed clock mixing rate between 1.2 and 13 times as fast as cat and between 1.8 and 7.8 times as fast as cons (real), depending on the parameter. For the largest dataset considered (seed plants by Ran et al. 2018 [49]), the new settings were up to 66 and 37 times as fast respectively. This is likely to be even more extreme for larger alignments.

Discussion

Modern operator design

Adaptability and advanced proposal kernels, such as Bactrian kernels, are increasingly prevalent in MCMC operator design [60–63]. Adaptive operators undergo training to improve their efficiency over time [64]. In previous work, the conditional clade probabilities of neighbouring trees have served as the basis of adaptive tree operators [26, 27]. Proposal step sizes can be tuned during MCMC [38]. The mirror kernel learns a target distribution which acts as a “mirror image” of the current point [32]. The AVMVN operator learns correlations between numerical parameters in order to traverse the joint posterior distribution efficiently [30].

Here, we introduced an adaptive operator which learns the weights of other operators, by using a target function that rewards operators which bring about large changes to the state and penalises operators which exhibit poor computational runtime (Eq 11). We demonstrated how learning the operator weightings, on a dataset-by-dataset basis, can improve mixing by up to an order of magnitude. We also demonstrated the versatility of this operator by applying it to a variety of settings. Assigning operator weights is an important task in Bayesian MCMC inference and the use of such an operator can relieve some of the burden from the person making this decision. However, this operator is no silver bullet and it must be used in a way that maintains ergodicity within the search space [64].

We also found that a Bactrian proposal kernel quite reliably increased mixing efficiency by 15–20% (Fig 11). Similar observations were made by Yang et al. [31]. While this may only be a modest improvement, incorporation of the Bactrian kernels into pre-existing operators is a computationally straightforward task and we recommend implementing them in Bayesian MCMC software packages.

Traversing tree space

In this article we introduced the family of narrow exchange rate operators (Fig 5). These operators are built on top of the narrow exchange operator and are specifically designed for the (uncorrelated) relaxed clock model, by accounting for the correlation which exists between branch lengths and branch substitution rates. This family consists of 48 variants, each of which conserves a unique subset of genetic distances before and after the proposal. While most of these operators turned out to be worse than narrow exchange, a small subset were more efficient, but only on large datasets.

Lakner et al. 2008 categorised tree operators into two classes. “Branch-rearrangement” operators relocate a branch and thus alter tree topology. Members of this class include narrow exchange, nearest neighbour interchange, and subtree-prune-and-regraft [43]. In contrast, “branch-length” operators propose branch lengths, but can potentially alter the tree topology as a side-effect. Such operators include subtree slide [65], LOCAL [66], and continuous change [67]. Lakner et al. 2008 observed that topological proposals made by the former class consistently outperformed topological changes invoked by the latter [68].

We hypothesised that the increased efficiency behind narrow exchange rate operators could facilitate proposing internal node heights in conjunction with branch rearrangements. This would enable the efficient exploration of both topology and branch length spaces with a single proposal. Unfortunately, by incorporating a random walk on the height of the node being relocated, the acceptance rate of the operator declined dramatically (Fig 6). This decline was greater when more genetic distances were conserved.

These findings support Lakner’s hypothesis. The design of operators which are able to efficiently traverse topological and branch length spaces simultaneously remains an open problem.

Larger datasets require smarter operators

As signal within the dataset becomes stronger, the posterior distribution becomes increasingly peaked (Fig 3). This change in the posterior topology necessitates the use of operators which exploit known correlations in the posterior density; operators such as AVMVN [30], constant distance [33], and the narrow exchange rate operators introduced in this article.

We have shown that while the latter two operators are efficient on large alignments, they are also quite frequently outperformed by simple random walk operators on small alignments. For instance, we found that constant distance operators outperformed standard operator configurations by up to two orders of magnitude on larger datasets but they were up to three times slower on smaller ones (Fig 8). Similarly, our narrow exchange rate operators were up to 40% more efficient on large datasets but up to 10% less efficient on smaller ones (Fig 12).

This emphasises the value in our adaptive operator weighting scheme, which can ensure that operator weights are suitable for the size of the alignment. Given the overwhelming availability of sequence data, high performance on large datasets is more important than ever.

Future outlook

Based on these experiments, we have corroborated the results of Zhang and Drummond 2020 [33] by showing that the real rate parameterisation of the relaxed clock can be greatly superior to the categories parameterisation. However, this is conditional on the choice of operators. Standard scale operators are ineffective in the real rate space on large datasets, and constant distance operators tend to be similarly ineffective on smaller ones. As the categories configuration is the current relaxed clock model default in both BEAST and BEAST 2, we recommend that researchers make the transition to the real rate parameterisation and employ an operator setup similar to that described in this article.

However, there remain several avenues for improvement. First, the use of a Weibull clock model prior, as opposed to a log-normal, could further decrease calculation time due to its closed-form inverse cumulative distribution function. Second, the leaf rate AVMVN operator presented in this article could be improved by proposing rates for only a small subset of correlated branch rates per proposal, as opposed to all at once. This would mitigate the computational burden associated with a covariance matrix which grows with the tree size (An adaptive leaf rate operator). Finally, while our adaptive weighting operator can learn operator weights within parameter spaces, it is not clear how to best assign weights between spaces; the relative weighting between clock model operators and tree topology operators for instance.

Conclusion

In this article, we delved into the highly correlated structure between substitution rates and divergence times of relaxed clock models, in order to develop MCMC operators which traverse its posterior space efficiently. We introduced a range of relaxed clock model operators and compared three molecular substitution rate parameterisations. These methodologies were compared by constructing phylogenetic models from several empirical datasets and comparing their abilities to converge in a tournament-like protocol (Fig 7). The methods introduced are adaptive, treat each dataset differently, and rarely perform worse than without adaptation. This work has produced an operator configuration which is highly effective on large alignments and can explore relaxed clock model space up to two orders of magnitude more efficiently than previous setups.

Supporting information

S1 Appendix. Rate quantiles and operators.

The linear piecewise approximation used in the quant parameterisation is described. Constant distance tree operators, CisScale, and NarrowExchangeRate are extended to the quant parameterisation. A second NER algorithm is specified.

(PDF)


S2 Appendix. Well-calibrated simulation studies.

Methodologies are validated using well-calibrated simulation studies.

(PDF)


S1 Fig. Round 1: Benchmarking adapt under the quant and cat parameterisations.

These results show that cat does not benefit from adaptive weight sampling. In contrast, adapt and cons both greatly improve the quant parameterisation for most datasets, as expected.

(PDF)


Acknowledgments

We wish to thank Alexei J. Drummond for his help during conception of the narrow exchange rate operators.

Data Availability

The work presented in this article was implemented as a user friendly open source BEAST 2 package with GUI support via BEAUti making it easy to set up an analysis. Instructions for using and installing this package can be found at https://github.com/jordandouglas/ORC.

Funding Statement

J.D. and R.B. were funded by Marsden grant 18-UOA-096. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Zuckerkandl E. Molecular disease, evolution, and genetic heterogeneity. Horizons in biochemistry. 1962; p. 189–225.
2. Douzery EJ, Delsuc F, Stanhope MJ, Huchon D. Local molecular clocks in three nuclear genes: divergence times for rodents and other mammals and incompatibility among fossil calibrations. Journal of Molecular Evolution. 2003;57(1):S201–S213. doi:10.1007/s00239-003-0028-x
3. Drummond AJ, Ho SY, Phillips MJ, Rambaut A. Relaxed phylogenetics and dating with confidence. PLoS biology. 2006;4(5):e88. doi:10.1371/journal.pbio.0040088
4. Kuhner MK, Yamato J, Felsenstein J. Estimating effective population size and mutation rate from sequence data using Metropolis-Hastings sampling. Genetics. 1995;140(4):1421–1430.
5. Larget B, Simon DL. Markov chain Monte Carlo algorithms for the Bayesian analysis of phylogenetic trees. Molecular biology and evolution. 1999;16(6):750–759. doi:10.1093/oxfordjournals.molbev.a026160
6. Mau B, Newton MA, Larget B. Bayesian phylogenetic inference via Markov chain Monte Carlo methods. Biometrics. 1999;55(1):1–12. doi:10.1111/j.0006-341X.1999.00001.x
7. Metropolis N. Equation of state calculations by fast computing machines. J Chem Phys. 1953;21:1087–1092. doi:10.1063/1.1699114
8. Hastings W. Monte-Carlo sampling methods using Markov chains and their applications. Biometrika. 1970;57:97–109. doi:10.1093/biomet/57.1.97
9. Drummond AJ, Suchard MA, Xie D, Rambaut A. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Molecular biology and evolution. 2012;29(8):1969–1973. doi:10.1093/molbev/mss075
10. Bouckaert R, Vaughan TG, Barido-Sottani J, Duchêne S, Fourment M, Gavryushkina A, et al. BEAST 2.5: An advanced software platform for Bayesian evolutionary analysis. PLoS computational biology. 2019;15(4):e1006650. doi:10.1371/journal.pcbi.1006650
11. Ronquist F, Teslenko M, Van Der Mark P, Ayres DL, Darling A, Höhna S, et al. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Systematic biology. 2012;61(3):539–542. doi:10.1093/sysbio/sys029
12. Höhna S, Landis MJ, Heath TA, Boussau B, Lartillot N, Moore BR, et al. RevBayes: Bayesian phylogenetic inference using graphical models and an interactive model-specification language. Systematic biology. 2016;65(4):726–736. doi:10.1093/sysbio/syw021
13. Zuckerkandl E, Pauling L. Evolutionary divergence and convergence in proteins. In: Evolving genes and proteins. Elsevier; 1965. p. 97–166.
14. Gillespie JH. The causes of molecular evolution. vol. 2. Oxford University Press; On Demand; 1994.
15. Woolfit M. Effective population size and the rate and pattern of nucleotide substitutions. Biology letters. 2009;5(3):417–420. doi:10.1098/rsbl.2009.0155
16. Loh E, Salk JJ, Loeb LA. Optimization of DNA polymerase mutation rates during bacterial evolution. Proceedings of the National Academy of Sciences. 2010;107(3):1154–1159. doi:10.1073/pnas.0912451107
17. Lepage T, Bryant D, Philippe H, Lartillot N. A general comparison of relaxed molecular clock models. Molecular biology and evolution. 2007;24(12):2669–2680. doi:10.1093/molbev/msm193
18. Li WLS, Drummond AJ. Model averaging and Bayes factor calculation of relaxed molecular clocks in Bayesian phylogenetics. Molecular biology and evolution. 2012;29(2):751–761. doi:10.1093/molbev/msr232
19. Faria NR, Quick J, Claro I, Theze J, de Jesus JG, Giovanetti M, et al. Establishment and cryptic transmission of Zika virus in Brazil and the Americas. Nature. 2017;546(7658):406–410. doi:10.1038/nature22401
20. Giovanetti M, Benvenuto D, Angeletti S, Ciccozzi M. The first two cases of 2019-nCoV in Italy: Where they come from? Journal of medical virology. 2020;92(5):518–521. doi:10.1002/jmv.25699
21. Huelsenbeck JP, Larget B, Swofford D. A compound Poisson process for relaxing the molecular clock. Genetics. 2000;154(4):1879–1892.
22. Thorne JL, Kishino H, Painter IS. Estimating the rate of evolution of the rate of molecular evolution. Molecular biology and evolution. 1998;15(12):1647–1657. doi:10.1093/oxfordjournals.molbev.a025892
23. Yoder AD, Yang Z. Estimation of primate speciation dates using local molecular clocks. Molecular Biology and Evolution. 2000;17(7):1081–1090. doi:10.1093/oxfordjournals.molbev.a026389
24. Drummond AJ, Suchard MA. Bayesian random local clocks, or one rate to rule them all. BMC biology. 2010;8(1):1–12. doi:10.1186/1741-7007-8-114
25. Zhang C, Huelsenbeck JP, Ronquist F. Using parsimony-guided tree proposals to accelerate convergence in Bayesian phylogenetic inference. Systematic Biology. 2020. doi:10.1093/sysbio/syaa002
26. Meyer X. Adaptive Tree Proposals for Bayesian Phylogenetic Inference. BioRxiv. 2019; p. 783597.
27. Höhna S, Drummond AJ. Guided tree topology proposals for Bayesian phylogenetic inference. Systematic biology. 2012;61(1):1–11. doi:10.1093/sysbio/syr074
28. Altekar G, Dwarkadas S, Huelsenbeck JP, Ronquist F. Parallel metropolis coupled Markov chain Monte Carlo for Bayesian phylogenetic inference. Bioinformatics. 2004;20(3):407–415. doi:10.1093/bioinformatics/btg427
29. Müller NF, Bouckaert R. Adaptive Metropolis-coupled MCMC for BEAST 2. PeerJ. 2020. doi:10.7717/peerj.9473
30. Baele G, Lemey P, Rambaut A, Suchard MA. Adaptive MCMC in Bayesian phylogenetics: an application to analyzing partitioned data in BEAST. Bioinformatics. 2017;33(12):1798–1805. doi:10.1093/bioinformatics/btx088
31. Yang Z, Rodríguez CE. Searching for efficient Markov chain Monte Carlo proposal kernels. Proceedings of the National Academy of Sciences. 2013;110(48):19307–19312. doi:10.1073/pnas.1311790110
32. Thawornwattana Y, Dalquen D, Yang Z, et al. Designing simple and efficient Markov chain Monte Carlo proposal kernels. Bayesian Analysis. 2018;13(4):1037–1063. doi:10.1214/17-BA1084
33. Zhang R, Drummond A. Improving the performance of Bayesian phylogenetic inference under relaxed clock models. BMC Evolutionary Biology. 2020;20:1–28. doi:10.1186/s12862-020-01609-4
34. Green PJ. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika. 1995;82(4):711–732. doi:10.1093/biomet/82.4.711
35. Geyer CJ. The metropolis-hastings-green algorithm; 2003.
36. Gelman A. Parameterization and Bayesian modeling. Journal of the American Statistical Association. 2004;99(466):537–545. doi:10.1198/016214504000000458
37. Roberts GO, Gelman A, Gilks WR, et al. Weak convergence and optimal scaling of random walk Metropolis algorithms. The annals of applied probability. 1997;7(1):110–120. doi:10.1214/aoap/1034625254
38. Rosenthal JS, et al. Optimal proposal distributions and adaptive MCMC. Handbook of Markov Chain Monte Carlo. 2011;4(10.1201).
39. Robinson DF, Foulds LR. Comparison of phylogenetic trees. Mathematical biosciences. 1981;53(1-2):131–147. doi:10.1016/0025-5564(81)90043-2
40. Jukes TH, Cantor CR, et al. Evolution of protein molecules. Mammalian protein metabolism. 1969;3:21–132. doi:10.1016/B978-1-4832-3211-9.50009-7
41. Drummond AJ, Nicholls GK, Rodrigo AG, Solomon W. Estimating mutation parameters, population history and genealogy simultaneously from temporally spaced sequence data. Genetics. 2002;161(3):1307–1320.
42. Suchard MA, Lemey P, Baele G, Ayres DL, Drummond AJ, Rambaut A. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus evolution. 2018;4(1):vey016. doi:10.1093/ve/vey016
43. Semple C, Steel M, et al. Phylogenetics. vol. 24. Oxford University Press; on Demand; 2003.
44. Higham DJ, Higham NJ. MATLAB guide. SIAM; 2016.
45. Yule GU. II. A mathematical theory of evolution, based on the conclusions of Dr. JC Willis, FRS. Philosophical transactions of the Royal Society of London Series B, containing papers of a biological character. 1925;213(402-410):21–87. doi:10.1098/rstb.1925.0002
46. Hasegawa M, Kishino H, Yano Ta. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. Journal of molecular evolution. 1985;22(2):160–174. doi:10.1007/BF02101694
47. Ayres DL, Darling A, Zwickl DJ, Beerli P, Holder MT, Lewis PO, et al. BEAGLE: an application programming interface and high-performance computing library for statistical phylogenetics. Systematic biology. 2012;61(1):170–173. doi:10.1093/sysbio/syr100
48. Saitou N, Nei M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular biology and evolution. 1987;4(4):406–425.
49. Ran JH, Shen TT, Wang MM, Wang XQ. Phylogenomics resolves the deep phylogeny of seed plants and indicates partial convergent or homoplastic evolution between Gnetales and angiosperms. Proceedings of the Royal Society B: Biological Sciences. 2018;285(1881):20181012. doi:10.1098/rspb.2018.1012
50. Dornburg A, Moore JA, Webster R, Warren DL, Brandley MC, Iglesias TL, et al. Molecular phylogenetics of squirrelfishes and soldierfishes (Teleostei: Beryciformes: Holocentridae): Reconciling more than 100 years of taxonomic confusion. Molecular Phylogenetics and Evolution. 2012;65(2):727–738. doi:10.1016/j.ympev.2012.07.020
51. Cognato AI, Vogler AP. Exploring Data Interaction and Nucleotide Alignment in a Multiple Gene Analysis of Ips (Coleoptera: Scolytinae). Systematic Biology. 2001;50(6):758–780. doi:10.1080/106351501753462803
52. Sauquet H, Ho SYW, Gandolfo MA, Jordan GJ, Wilf P, Cantrill DJ, et al. Testing the Impact of Calibration on Molecular Divergence Times Using a Fossil-Rich Group: The Case of Nothofagus (Fagales). Systematic Biology. 2011;61(2):289–313. doi:10.1093/sysbio/syr116
53. Broughton RE, Betancur-R R, Li C, Arratia G, Ortí G. Multi-locus phylogenetic analysis reveals the pattern and tempo of bony fish evolution. PLoS Currents. 2013. doi:10.1371/currents.tol.2ca8041495ffafd0c92756e75247483e
54. Kawahara AY, Rubinoff D. Convergent evolution of morphology and habitat use in the explosive Hawaiian fancy case caterpillar radiation. Journal of Evolutionary Biology. 2013;26(8):1763–1773. doi:10.1111/jeb.12176
55. Rightmyer MG, Griswold T, Brady SG. Phylogeny and systematics of the bee genus Osmia (Hymenoptera: Megachilidae) with emphasis on North American Melanosmia: subgenera, synonymies and nesting biology revisited. Systematic Entomology. 2013;38(3):561–576. doi:10.1111/syen.12013
56. Moyle RG, Oliveros CH, Andersen MJ, Hosner PA, Benz BW, Manthey JD, et al. Tectonic collision and uplift of Wallacea triggered the global songbird radiation. Nature Communications. 2016;7(1). doi:10.1038/ncomms12709
57. Lanfear R. BenchmarkAlignments. https://github.com/roblanf/BenchmarkAlignments. GitHub. 2019.
58. Lanfear R, Frandsen PB, Wright AM, Senfeld T, Calcott B. PartitionFinder 2: new methods for selecting partitioned models of evolution for molecular and morphological phylogenetic analyses. Molecular biology and evolution. 2016;34(3):772–773.
59. Douglas J. UglyTrees: a browser-based multispecies coalescent tree visualiser. Bioinformatics. 2020. doi:10.1093/bioinformatics/btaa679
60. Haario H, Saksman E, Tamminen J, et al. An adaptive Metropolis algorithm. Bernoulli. 2001;7(2):223–242. doi:10.2307/3318737
61. Vihola M. Robust adaptive Metropolis algorithm with coerced acceptance rate. Statistics and Computing. 2012;22(5):997–1008. doi:10.1007/s11222-011-9269-5
62. Benson A, Friel N, et al. Adaptive MCMC for multiple changepoint analysis with applications to large datasets. Electronic Journal of Statistics. 2018;12(2):3365–3396. doi:10.1214/18-EJS1418
63. Davis A, Hauser J. Blocking borehole conductivity logs at the resolution of above-ground electromagnetic systems. Geophysics. 2020;85(2):E67–E77. doi:10.1190/geo2019-0095.1
64. Roberts GO, Rosenthal JS. Coupling and ergodicity of adaptive Markov chain Monte Carlo algorithms. Journal of applied probability. 2007;44(2):458–475. doi:10.1017/S0021900200117954
65. Höhna S, Defoin-Platel M, Drummond AJ. Clock-constrained tree proposal operators in Bayesian phylogenetic inference. In: 2008 8th IEEE International Conference on BioInformatics and BioEngineering. IEEE; 2008. p. 1–7.
66. Simon D, Larget B. Bayesian analysis in molecular biology and evolution (BAMBE). http://www.mathcs.duq.edu/larget/bambe.html. Pittsburgh, Pennsylvania. 1998.
67. Jow H, Hudelot C, Rattray M, Higgs P. Bayesian phylogenetics using an RNA substitution model applied to early mammalian evolution. Molecular Biology and Evolution. 2002;19(9):1591–1601. doi:10.1093/oxfordjournals.molbev.a004221
68. Lakner C, Van Der Mark P, Huelsenbeck JP, Larget B, Ronquist F. Efficiency of Markov chain Monte Carlo tree proposals in Bayesian phylogenetics. Systematic biology. 2008;57(1):86–103. doi:10.1080/10635150801886156

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1008322.r001

Decision Letter 0

Roger Dimitri Kouyos, Stefano Allesina

29 Oct 2020

Dear Mr Douglas,

Thank you very much for submitting your manuscript "Adaptive dating and fast proposals: revisiting the phylogenetic relaxed clock model" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. The reviewers appreciated the attention to an important topic. Based on the reviews, we are likely to accept this manuscript for publication, providing that you modify the manuscript according to the review recommendations.

Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Roger Dimitri Kouyos

Associate Editor

PLOS Computational Biology

Stefano Allesina

Deputy Editor

PLOS Computational Biology

***********************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: This paper describes a number of measures to improve the efficiency and performance of Bayesian phylogenetic analysis using relaxed molecular clocks. These measures are evaluated using a sequential approach, and benchmarked using synthetic data and eight empirical datasets. The authors find considerable improvements in MCMC efficiency, particularly for large datasets. One especially useful finding is that an adaptive operator weighting scheme can tune the operator weights to optimise performance for datasets of different sizes.

I have no major concerns about the parameterisations and operators that are proposed in the paper, but the paper itself needs a reduction in length. There is an excessive number of figures and tables, and the level of detail in the Methods and Results can be cut down.

In the Abstract, I suggest adding a brief explanation of uncorrelated relaxed clocks (for readers unfamiliar with the term).

In the Author Summary, I suggest mentioning molecular dating or divergence time estimation, which is the main aim when using relaxed clock models.

In the first paragraph of the Introduction, it should be noted that the molecular clock is only essential to molecular dating, but not to phylogenetic analysis in general. Clock models are rarely used in phylogenetic analysis unless one of the goals is to infer the divergence times. BEAST is a notable exception because it always implements some form of clock model. In contrast, the clock models in MrBayes are quite rarely used.

In the fourth paragraph of the Introduction, it would be reasonable to cite the work of Thorne et al. (1998, Mol Biol Evol), who presented the first Bayesian relaxed (autocorrelated) clock and also described a likelihood approximation to reduce computational burden. One other consideration is whether autocorrelated or uncorrelated models are more biologically relevant. This has been considered in a number of studies, including by Lepage et al. (2007, cited by the authors).

In the first paragraph of the Methods, presumably theta represents the parameters of the branching or coalescent model used to generate the tree prior. It would be useful to clarify this.

In the first paragraph of the section Clock Model Operators, please include an explanation of how the parameter s is learned over the course of the MCMC. Is this tuned to achieve a target acceptance rate?

Combine some of the Tables listing operators?

Given the superior performance of the _real_ parameterisation for the rate, should this be adopted as the default in BEAST rather than _cat_?

Figure 13 is only mentioned briefly in the text and does not seem essential.

In the Discussion it would be helpful for the authors to describe some potential avenues for improving the proposed operator configuration.

Reviewer #2: Review is uploaded as an attachment

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: No: Numerical data from figures unavailable

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms, etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, PLOS recommends that you deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see http://journals.plos.org/ploscompbiol/s/submission-guidelines#loc-materials-and-methods

Attachment

Submitted filename: Review Relaxed Clock Model.docx

PLoS Comput Biol. 2021 Feb 2;17(2):e1008322. doi: 10.1371/journal.pcbi.1008322.r002

Author response to Decision Letter 0

5 Nov 2020

Attachment

Submitted filename: reviewer_response.pdf

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1008322.r003

Decision Letter 1

Roger Dimitri Kouyos, Stefano Allesina

30 Nov 2020

Dear Mr Douglas,

We are pleased to inform you that your manuscript 'Adaptive dating and fast proposals: revisiting the phylogenetic relaxed clock model' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology.

Best regards,

Roger Dimitri Kouyos

Associate Editor

PLOS Computational Biology

Stefano Allesina

Deputy Editor

PLOS Computational Biology

***********************************************************

Please address the few remaining issues regarding labelling and figure numbering (highlighted by reviewer 2) in the final version of the paper.

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The authors have addressed all of my previous comments. The new "Future Outlook" section at the end of the Discussion is particularly useful.

I have no further concerns about the study or further comments on the manuscript.

Reviewer #2: Dear authors, Thank you for the revision and responses. The authors have addressed well the reviewers' questions and comments.

Before the final submission, please note there are still several inconsistencies in the node height abbreviation (Table 2, 3, and 4) and a table number is missing (page 12).

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Reviewer #1: Yes

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1008322.r004

Acceptance letter

Roger Dimitri Kouyos, Stefano Allesina

27 Jan 2021

PCOMPBIOL-D-20-01606R1

Adaptive dating and fast proposals: revisiting the phylogenetic relaxed clock model

Dear Dr Douglas,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Alice Ellingham

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

S1 Appendix. Rate quantiles and operators.

(PDF)

S2 Appendix. Well-calibrated simulation studies.

Methodologies are validated using well-calibrated simulation studies.

(PDF)

Data Availability Statement

[pcbi.1008322.ref001] 1. Zuckerkandl E. Molecular disease, evolution, and genetic heterogeneity. Horizons in biochemistry. 1962; p. 189–225. [Google Scholar]

[pcbi.1008322.ref002] 2. Douzery EJ, Delsuc F, Stanhope MJ, Huchon D. Local molecular clocks in three nuclear genes: divergence times for rodents and other mammals and incompatibility among fossil calibrations. Journal of Molecular Evolution. 2003;57(1):S201–S213. 10.1007/s00239-003-0028-x [DOI] [PubMed] [Google Scholar]

[pcbi.1008322.ref003] 3. Drummond AJ, Ho SY, Phillips MJ, Rambaut A. Relaxed phylogenetics and dating with confidence. PLoS biology. 2006;4(5):e88 10.1371/journal.pbio.0040088 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1008322.ref004] 4. Kuhner MK, Yamato J, Felsenstein J. Estimating effective population size and mutation rate from sequence data using Metropolis-Hastings sampling. Genetics. 1995;140(4):1421–1430. [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1008322.ref005] 5. Larget B, Simon DL. Markov chain Monte Carlo algorithms for the Bayesian analysis of phylogenetic trees. Molecular biology and evolution. 1999;16(6):750–759. 10.1093/oxfordjournals.molbev.a026160 [DOI] [Google Scholar]

[pcbi.1008322.ref006] 6. Mau B, Newton MA, Larget B. Bayesian phylogenetic inference via Markov chain Monte Carlo methods. Biometrics. 1999;55(1):1–12. 10.1111/j.0006-341X.1999.00001.x [DOI] [PubMed] [Google Scholar]

[pcbi.1008322.ref007] 7. Metropolis N. Equation of state calculations by fast computing machines. J Chem Phys. 1953;21:1087–1092. 10.1063/1.1699114 [DOI] [Google Scholar]

[pcbi.1008322.ref008] 8. Hastings W. Monte-Carlo sampling methods using Markov chains and their applications. Biometrika. 1970;57:97–109. 10.1093/biomet/57.1.97 [DOI] [Google Scholar]

[pcbi.1008322.ref009] 9. Drummond AJ, Suchard MA, Xie D, Rambaut A. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Molecular biology and evolution. 2012;29(8):1969–1973. 10.1093/molbev/mss075 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1008322.ref010] 10. Bouckaert R, Vaughan TG, Barido-Sottani J, Duchêne S, Fourment M, Gavryushkina A, et al. BEAST 2.5: An advanced software platform for Bayesian evolutionary analysis. PLoS computational biology. 2019;15(4):e1006650 10.1371/journal.pcbi.1006650 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1008322.ref011] 11. Ronquist F, Teslenko M, Van Der Mark P, Ayres DL, Darling A, Höhna S, et al. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Systematic biology. 2012;61(3):539–542. 10.1093/sysbio/sys029 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1008322.ref012] 12. Höhna S, Landis MJ, Heath TA, Boussau B, Lartillot N, Moore BR, et al. RevBayes: Bayesian phylogenetic inference using graphical models and an interactive model-specification language. Systematic biology. 2016;65(4):726–736. 10.1093/sysbio/syw021 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1008322.ref013] 13. Zuckerkandl E, Pauling L. Evolutionary divergence and convergence in proteins In: Evolving genes and proteins. Elsevier; 1965. p. 97–166. [Google Scholar]

[pcbi.1008322.ref014] 14. Gillespie JH. The causes of molecular evolution. vol. 2 Oxford University Press; On Demand; 1994. [Google Scholar]

[pcbi.1008322.ref015] 15. Woolfit M. Effective population size and the rate and pattern of nucleotide substitutions. Biology letters. 2009;5(3):417–420. 10.1098/rsbl.2009.0155 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1008322.ref016] 16. Loh E, Salk JJ, Loeb LA. Optimization of DNA polymerase mutation rates during bacterial evolution. Proceedings of the National Academy of Sciences. 2010;107(3):1154–1159. 10.1073/pnas.0912451107 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1008322.ref017] 17. Lepage T, Bryant D, Philippe H, Lartillot N. A general comparison of relaxed molecular clock models. Molecular biology and evolution. 2007;24(12):2669–2680. 10.1093/molbev/msm193 [DOI] [PubMed] [Google Scholar]

[pcbi.1008322.ref018] 18. Li WLS, Drummond AJ. Model averaging and Bayes factor calculation of relaxed molecular clocks in Bayesian phylogenetics. Molecular biology and evolution. 2012;29(2):751–761. 10.1093/molbev/msr232 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1008322.ref019] 19. Faria NR, Quick J, Claro I, Theze J, de Jesus JG, Giovanetti M, et al. Establishment and cryptic transmission of Zika virus in Brazil and the Americas. Nature. 2017;546(7658):406–410. 10.1038/nature22401 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1008322.ref020] 20. Giovanetti M, Benvenuto D, Angeletti S, Ciccozzi M. The first two cases of 2019-nCoV in Italy: Where they come from? Journal of medical virology. 2020;92(5):518–521. 10.1002/jmv.25699 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1008322.ref021] 21. Huelsenbeck JP, Larget B, Swofford D. A compound Poisson process for relaxing the molecular clock. Genetics. 2000;154(4):1879–1892. [DOI] [PMC free article] [PubMed] [Google Scholar]

22. Thorne JL, Kishino H, Painter IS. Estimating the rate of evolution of the rate of molecular evolution. Molecular Biology and Evolution. 1998;15(12):1647–1657. doi:10.1093/oxfordjournals.molbev.a025892

23. Yoder AD, Yang Z. Estimation of primate speciation dates using local molecular clocks. Molecular Biology and Evolution. 2000;17(7):1081–1090. doi:10.1093/oxfordjournals.molbev.a026389

24. Drummond AJ, Suchard MA. Bayesian random local clocks, or one rate to rule them all. BMC Biology. 2010;8(1):1–12. doi:10.1186/1741-7007-8-114

25. Zhang C, Huelsenbeck JP, Ronquist F. Using parsimony-guided tree proposals to accelerate convergence in Bayesian phylogenetic inference. Systematic Biology. 2020. doi:10.1093/sysbio/syaa002

26. Meyer X. Adaptive Tree Proposals for Bayesian Phylogenetic Inference. bioRxiv. 2019; p. 783597.

27. Höhna S, Drummond AJ. Guided tree topology proposals for Bayesian phylogenetic inference. Systematic Biology. 2012;61(1):1–11. doi:10.1093/sysbio/syr074

28. Altekar G, Dwarkadas S, Huelsenbeck JP, Ronquist F. Parallel Metropolis coupled Markov chain Monte Carlo for Bayesian phylogenetic inference. Bioinformatics. 2004;20(3):407–415. doi:10.1093/bioinformatics/btg427

29. Müller NF, Bouckaert R. Adaptive Metropolis-coupled MCMC for BEAST 2. PeerJ. 2020. doi:10.7717/peerj.9473

30. Baele G, Lemey P, Rambaut A, Suchard MA. Adaptive MCMC in Bayesian phylogenetics: an application to analyzing partitioned data in BEAST. Bioinformatics. 2017;33(12):1798–1805. doi:10.1093/bioinformatics/btx088

31. Yang Z, Rodríguez CE. Searching for efficient Markov chain Monte Carlo proposal kernels. Proceedings of the National Academy of Sciences. 2013;110(48):19307–19312. doi:10.1073/pnas.1311790110

32. Thawornwattana Y, Dalquen D, Yang Z, et al. Designing simple and efficient Markov chain Monte Carlo proposal kernels. Bayesian Analysis. 2018;13(4):1037–1063. doi:10.1214/17-BA1084

33. Zhang R, Drummond A. Improving the performance of Bayesian phylogenetic inference under relaxed clock models. BMC Evolutionary Biology. 2020;20:1–28. doi:10.1186/s12862-020-01609-4

34. Green PJ. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika. 1995;82(4):711–732. doi:10.1093/biomet/82.4.711

35. Geyer CJ. The Metropolis-Hastings-Green algorithm; 2003.

36. Gelman A. Parameterization and Bayesian modeling. Journal of the American Statistical Association. 2004;99(466):537–545. doi:10.1198/016214504000000458

37. Roberts GO, Gelman A, Gilks WR, et al. Weak convergence and optimal scaling of random walk Metropolis algorithms. The Annals of Applied Probability. 1997;7(1):110–120. doi:10.1214/aoap/1034625254

38. Rosenthal JS, et al. Optimal proposal distributions and adaptive MCMC. Handbook of Markov Chain Monte Carlo. 2011;4(10.1201).

39. Robinson DF, Foulds LR. Comparison of phylogenetic trees. Mathematical Biosciences. 1981;53(1-2):131–147. doi:10.1016/0025-5564(81)90043-2

40. Jukes TH, Cantor CR, et al. Evolution of protein molecules. Mammalian Protein Metabolism. 1969;3:21–132. doi:10.1016/B978-1-4832-3211-9.50009-7

41. Drummond AJ, Nicholls GK, Rodrigo AG, Solomon W. Estimating mutation parameters, population history and genealogy simultaneously from temporally spaced sequence data. Genetics. 2002;161(3):1307–1320.

42. Suchard MA, Lemey P, Baele G, Ayres DL, Drummond AJ, Rambaut A. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evolution. 2018;4(1):vey016. doi:10.1093/ve/vey016

43. Semple C, Steel M, et al. Phylogenetics. vol. 24. Oxford University Press on Demand; 2003.

44. Higham DJ, Higham NJ. MATLAB guide. SIAM; 2016.

45. Yule GU. II.—A mathematical theory of evolution, based on the conclusions of Dr. JC Willis, FRS. Philosophical Transactions of the Royal Society of London. Series B, Containing Papers of a Biological Character. 1925;213(402-410):21–87. doi:10.1098/rstb.1925.0002

46. Hasegawa M, Kishino H, Yano Ta. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. Journal of Molecular Evolution. 1985;22(2):160–174. doi:10.1007/BF02101694

47. Ayres DL, Darling A, Zwickl DJ, Beerli P, Holder MT, Lewis PO, et al. BEAGLE: an application programming interface and high-performance computing library for statistical phylogenetics. Systematic Biology. 2012;61(1):170–173. doi:10.1093/sysbio/syr100

48. Saitou N, Nei M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular Biology and Evolution. 1987;4(4):406–425.

49. Ran JH, Shen TT, Wang MM, Wang XQ. Phylogenomics resolves the deep phylogeny of seed plants and indicates partial convergent or homoplastic evolution between Gnetales and angiosperms. Proceedings of the Royal Society B: Biological Sciences. 2018;285(1881):20181012. doi:10.1098/rspb.2018.1012

50. Dornburg A, Moore JA, Webster R, Warren DL, Brandley MC, Iglesias TL, et al. Molecular phylogenetics of squirrelfishes and soldierfishes (Teleostei: Beryciformes: Holocentridae): Reconciling more than 100 years of taxonomic confusion. Molecular Phylogenetics and Evolution. 2012;65(2):727–738. doi:10.1016/j.ympev.2012.07.020

51. Cognato AI, Vogler AP. Exploring Data Interaction and Nucleotide Alignment in a Multiple Gene Analysis of Ips (Coleoptera: Scolytinae). Systematic Biology. 2001;50(6):758–780. doi:10.1080/106351501753462803

52. Sauquet H, Ho SYW, Gandolfo MA, Jordan GJ, Wilf P, Cantrill DJ, et al. Testing the Impact of Calibration on Molecular Divergence Times Using a Fossil-Rich Group: The Case of Nothofagus (Fagales). Systematic Biology. 2011;61(2):289–313. doi:10.1093/sysbio/syr116

53. Broughton RE, Betancur-R R, Li C, Arratia G, Ortí G. Multi-locus phylogenetic analysis reveals the pattern and tempo of bony fish evolution. PLoS Currents. 2013. doi:10.1371/currents.tol.2ca8041495ffafd0c92756e75247483e

54. Kawahara AY, Rubinoff D. Convergent evolution of morphology and habitat use in the explosive Hawaiian fancy case caterpillar radiation. Journal of Evolutionary Biology. 2013;26(8):1763–1773. doi:10.1111/jeb.12176

55. Rightmyer MG, Griswold T, Brady SG. Phylogeny and systematics of the bee genus Osmia (Hymenoptera: Megachilidae) with emphasis on North American Melanosmia: subgenera, synonymies and nesting biology revisited. Systematic Entomology. 2013;38(3):561–576. doi:10.1111/syen.12013

56. Moyle RG, Oliveros CH, Andersen MJ, Hosner PA, Benz BW, Manthey JD, et al. Tectonic collision and uplift of Wallacea triggered the global songbird radiation. Nature Communications. 2016;7(1). doi:10.1038/ncomms12709

57. Lanfear R. BenchmarkAlignments. https://github.com/roblanf/BenchmarkAlignments. GitHub. 2019.

58. Lanfear R, Frandsen PB, Wright AM, Senfeld T, Calcott B. PartitionFinder 2: new methods for selecting partitioned models of evolution for molecular and morphological phylogenetic analyses. Molecular Biology and Evolution. 2016;34(3):772–773.

59. Douglas J. UglyTrees: a browser-based multispecies coalescent tree visualiser. Bioinformatics. 2020. doi:10.1093/bioinformatics/btaa679

60. Haario H, Saksman E, Tamminen J, et al. An adaptive Metropolis algorithm. Bernoulli. 2001;7(2):223–242. doi:10.2307/3318737

61. Vihola M. Robust adaptive Metropolis algorithm with coerced acceptance rate. Statistics and Computing. 2012;22(5):997–1008. doi:10.1007/s11222-011-9269-5

62. Benson A, Friel N, et al. Adaptive MCMC for multiple changepoint analysis with applications to large datasets. Electronic Journal of Statistics. 2018;12(2):3365–3396. doi:10.1214/18-EJS1418

63. Davis A, Hauser J. Blocking borehole conductivity logs at the resolution of above-ground electromagnetic systems. Geophysics. 2020;85(2):E67–E77. doi:10.1190/geo2019-0095.1

64. Roberts GO, Rosenthal JS. Coupling and ergodicity of adaptive Markov chain Monte Carlo algorithms. Journal of Applied Probability. 2007;44(2):458–475. doi:10.1017/S0021900200117954

65. Hohna S, Defoin-Platel M, Drummond AJ. Clock-constrained tree proposal operators in Bayesian phylogenetic inference. In: 2008 8th IEEE International Conference on BioInformatics and BioEngineering. IEEE; 2008. p. 1–7.

66. Simon D, Larget B. Bayesian analysis in molecular biology and evolution (BAMBE). http://www.mathcs.duq.edu/larget/bambe.html. Pittsburgh, Pennsylvania. 1998.

67. Jow H, Hudelot C, Rattray M, Higgs P. Bayesian phylogenetics using an RNA substitution model applied to early mammalian evolution. Molecular Biology and Evolution. 2002;19(9):1591–1601. doi:10.1093/oxfordjournals.molbev.a004221

68. Lakner C, Van Der Mark P, Huelsenbeck JP, Larget B, Ronquist F. Efficiency of Markov chain Monte Carlo tree proposals in Bayesian phylogenetics. Systematic Biology. 2008;57(1):86–103. doi:10.1080/10635150801886156

