Zigzag path connects two Monte Carlo samplers: Hamiltonian counterpart to a piecewise deterministic Markov process

Akihiko Nishimura; Zhenyu Zhang; Marc A Suchard

doi:10.1080/01621459.2024.2395587

. Author manuscript; available in PMC: 2025 Dec 5.

Published in final edited form as: J Am Stat Assoc. 2024 Dec 5;120:1077–1089. doi: 10.1080/01621459.2024.2395587

Zigzag path connects two Monte Carlo samplers: Hamiltonian counterpart to a piecewise deterministic Markov process

Akihiko Nishimura ¹, Zhenyu Zhang ², Marc A Suchard ³

PMCID: PMC12330883 NIHMSID: NIHMS2023543 PMID: 40786117

Abstract

Zigzag and other piecewise deterministic Markov process samplers have attracted significant interest for their non-reversibility and other appealing properties for Bayesian posterior computation. Hamiltonian Monte Carlo is another state-of-the-art sampler, exploiting fictitious momentum to guide Markov chains through complex target distributions. We establish an important connection between the zigzag sampler and a variant of Hamiltonian Monte Carlo based on Laplace-distributed momentum. The position and velocity component of the corresponding Hamiltonian dynamics travels along a zigzag path paralleling the Markovian zigzag process; however, the dynamics is non-Markovian in this position-velocity space as the momentum component encodes non-immediate pasts. This information is partially lost during a momentum refreshment step, in which we preserve its direction but re-sample magnitude. In the limit of increasingly frequent momentum refreshments, we prove that Hamiltonian zigzag converges strongly to its Markovian counterpart. This theoretical insight suggests that, when retaining full momentum information, Hamiltonian zigzag can better explore target distributions with highly correlated parameters by suppressing the diffusive behavior of Markovian zigzag. We corroborate this intuition by comparing performance of the two zigzag cousins on high-dimensional truncated multivariate Gaussians, including a 11,235-dimensional target arising from a Bayesian phylogenetic multivariate probit modeling of HIV virus data.

Keywords: Bayesian statistics, Hamiltonian Monte Carlo, Markov chain Monte Carlo, non-reversible, piecewise deterministic Markov process, truncated normal distribution

1. Introduction

Emergence of Monte Carlo methods based on continuous-time, non-reversible processes has been hailed as fundamentally new development (Fearnhead et al. 2018) and has attracted an explosion of interest (Dunson & Johndrow 2020). The two most prominent among such algorithms are the bouncy particle sampler (Bouchard-Côté et al. 2018) and zigzag sampler (Bierkens et al. 2019a), both having roots in the computational physics literature (Peters & de With 2012, Turitsyn et al. 2011). These algorithms draw samples from target distributions by simulating piecewise deterministic Markov processes (PDMP), which move along linear trajectories with instantaneous changes in their velocities occurring at random times according to inhomogeneous Poisson processes.

One known issue with the bouncy particle sampler is its near-reducible behavior in the absence of frequent velocity refreshment (Bouchard-Côté et al. 2018, Fearnhead et al. 2018). In case of a high-dimensional i.i.d. Gaussian, Bierkens et al. (2022) show that the bouncy particle sampler’s optimal performance is achieved when refreshment accounts for as much as 78% of all the velocity changes. Such frequent velocity refreshment can lead to “random-walk behavior,” hurting computational efficiency of the sampler (Neal 2010, Fearnhead et al. 2018, Andrieu & Livingstone 2021).

The zigzag sampler on the other hand is provably ergodic without velocity refreshment (Bierkens et al. 2019b) and appears to have a competitive edge in high-dimensions (Bierkens et al. 2022). Much remains unknown, however, about how these samplers perform relative one another and against other classes of algorithms. Early empirical results, while informative, remain limited by their focus on low-dimensional or synthetic examples. For example, the simulated logistic regression examples in Bierkens et al. (2019a) only have 16 regression coefficients and the most complex examples of Bierkens et al. (2022) comprise 256-dimensional multi-variate Gaussians and spherically symmetric t-distributions. Generating further theoretical and empirical insight on the practical performance of these algorithms thus stands out as one of the most critical research areas (Fearnhead et al. 2018, Bierkens et al. 2019a, Dunson & Johndrow 2020).

This article brings novel insight into the zigzag sampler’s performance by revealing its intimate connection to a version of Hamiltonian Monte Carlo (HMC) (Duane et al. 1987, Neal 2010), another state-of-the-art paradigm for Bayesian computation. HMC exploits an auxiliary momentum variable and simulates Hamiltonian dynamics to guide exploration of the original parameter space. This notion of “guidance” provided by momentum has historically inspired the earliest examples of non-reversible Monte Carlo algorithms (Diaconis et al. 2000). Beyond this analogy and heuristic, however, the precise nature of relation between HMC and non-reversible methods has never been explored. In fact, the PDMP sampler literature has so far mentioned HMC only in passing or used it merely as a computational benchmark (Bierkens et al. 2019a, Bouchard-Côté et al. 2018, Sherlock & Thiery 2021). A faint link between the two paradigms appears in Deligiannidis et al. (n.d.), who observe a univariate randomized HMC to emerge as the first-coordinate marginal of the bouncy particle sampler in a high-dimensional limit. Their weak convergence result, however, only concerns the univariate marginal and offers little insight as to why a randomized HMC may appear as the limit.

We study a less explored version of HMC, based on the momentum variable components $p_{i}$ having independent Laplace distributions. The corresponding Hamiltonian dynamics follows a zigzag path akin to that of the Markovian zigzag process. We thus call the resulting algorithm zigzag HMC and, to differentiate the dynamics underlying the two samplers, refer to them as Hamiltonian and Markovian zigzag. In other words, except when we explicitly invoke partial momentum refreshment as a theoretical tool, Hamiltonian zigzag refers to the deterministic dynamics that constitutes a proposal generation mechanism for zigzag HMC. Hamiltonian zigzag in a sense is also “Markovian,” although this term is rarely applied to a deterministic process, in the position-momentum space as its future trajectory depends solely on the current state. Hamiltonian zigzag becomes non-Markovian, however, when viewed as dynamics in the position-velocity space, velocity being the time derivative of position.

We establish that Markovian zigzag is essentially Hamiltonian zigzag with “less momentum,” thereby providing a unified perspective to compare the two Monte Carlo paradigms. We consider a partial momentum refreshment for Hamiltonian zigzag, in which $p_{i} ’ s$ retain their signs but have their magnitudes resampled from the exponential distributions. With this refreshment step inserted at every $Δ t > 0$ time interval, Hamiltonian zigzag converges strongly to its Markovian counterpart as $Δ t \to 0$ .

This result has significant implications in the relative performance of the two zigzag algorithms. Markovian zigzag to some extent avoids random-walk behavior by retaining its direction from previous moments; the inertia induced by retaining full momentum information, however, may allow Hamiltonian zigzag to better explore the space when parameters exhibit strong dependency. The intuition is as follows. Along each coordinate, the partial derivative of the log-density depends on other coordinates and its sign can flip back and forth as the dynamics evolves. Such fluctuation leads to inconsistent guidance from the derivative, “pushing” the dynamics to one direction at one moment and to the opposite at the next. However, momentum can help each coordinate of the dynamics keeps traveling in an effective direction without being affected by small fluctuation in the derivative.

The advantage of having full momentum is visually illustrated in Figure 1. After comparable amounts of computation, Hamiltonian zigzag has traversed a high-density region while Markovian zigzag is still slowly diffusing away from the initial position. This difference in behavior also manifests itself in the overall distance traveled by the dynamics (Figure 2). Our observation here suggests that, with its ability to make larger transitions under comparable computational efforts, Hamiltonian zigzag constitutes a more effective transition kernel.

Figure 2: — Squared distance ${‖ x (t) - x_{0} ‖}^{2}$ of the two zigzag dynamics from the initial position, plotted as function of the number of velocity change events. The experimental setup is identical to that of Figure 1. The dashed line indicates the expected squared distance between the initial position and an independent sample from the target, as a benchmark of the distance traveled by an efficient transition kernel.

We empirically quantify the superior performance of zigzag HMC over Markovian zigzag on a range of truncated Gaussians, a special yet practically relevant class of targets on which we can efficiently simulate both zigzags. One of our examples arises from a Bayesian phylogenetic multivariate probit model for studying correlation structure among binary biological traits of HIV viruses (Zhang et al. 2021). This is a real-world high-dimensional problem in which the super-linear scaling property of Markovian zigzag established by Bierkens et al. (2019a) does not hold — with the number of parameters growing proportional to that of observations, their sub-sampling and control variate techniques cannot be applied here. The same reasons hamper a use of the sub-sampling approach by Bouchard-Côté et al. (2018) based on the local bouncy particle sampler (Bardenet et al. 2017).

The deterministic nature of its dynamics endows Hamiltonian zigzag with a couple of additional advantages over its Markovian counterpart as a building block for sampling algorithms for truncated Gaussians. First, we can combine Hamiltonian zigzag with the no-U-turn algorithm to yield an effectively tuning-free sampler (Section 4.1). Hamiltonian zigzag is further applicable whenever a subset of parameters is conditionally distributed as truncated Gaussian; the split HMC framework (Shahbaba et al. 2014, Nishimura et al. 2020) allows us to combine Hamiltonian zigzag with other integrators to jointly update all the parameters. The split HMC extension is successfully deployed by Zhang et al. (2023), who additionally demonstrates zigzag HMC’S advantage over the bouncy particle sampler. These samplers based on Hamiltonian zigzag are provided as a part of the Bayesian phylogenetic software beast (Suchard et al. 2018) and are also available via the R package hdtg (Zhang et al. 2022). Finally, while Hamiltonian zigzag’s use outside truncated Gaussians is beyond the scope of this article, we can in principle apply zigzag HMC to any targets by approximating the dynamics through the coordinate-wise integrator of Nishimura et al. (2020) or through the mid-point integrator of Chin & Nishimura (2024) (Supplement Section S4).

While this work focuses on the two zigzags, their relationship as presented here seems to hint at HMC’S more general connection to PDMP and other non-reversible methods. We explore, and to some extent substantiate, this conjecture in the Discussion section. In particular, we construct a discrete space analogue of Hamiltonian dynamics which, when combined with partial momentum refreshments, recovers the non-reversible algorithm of Diaconis et al. (2000). Our result on the two zigzags, therefore, portends broader implications in advancing our understanding of HMC and other non-reversible methods.

2. Zigzag Hamiltonian Monte Carlo

2.1. Hamiltonian dynamics based on Laplace momentum

In order to sample from the parameter of interest $x$ , HMC introduces an auxiliary momentum variable $p$ and targets the augmented distribution $π (x, p) : = π_{X} (x) π_{P} (p)$ . HMC explores the augmented space by simulating Hamiltonian dynamics, whose evolution is governed by the differential equation known as Hamilton’s equation:

\frac{d x}{d t} = \nabla K (p), \frac{d p}{d t} = - \nabla U (x),

(1)

where $U (x) = - log π_{X} (x)$ and $K (p) = - log π_{P} (p)$ are referred to as potential and kinetic energy. The solution of (1) preserves the target $π (x, p)$ and, if it can be simulated exactly, can be deployed as a transition kernel. Such exact simulation is infeasible in most applications, but a numerical approximation by a reversible integrator constitutes a valid Metropolis-Hastings proposal (Fang et al. 2014). To this day, the original version of HMC by Duane et al. (1987), based on Gaussian-distributed momentum and the leapfrog integrator, dominates practice (Salvatier et al. 2016, Carpenter et al. 2017).

For the purpose of dealing with discontinuous target densities, Nishimura et al. (2020) proposes the use of Laplace momentum $π_{P} (p) \propto exp (- \sum_{i} | p_{i} |)$ in HMC. The authors then proceed to develop a reversible integrator that qualitatively approximates the corresponding Hamiltonian dynamics, noting only in passing the similarity of its trajectories to those of Markovian zigzag. Under Laplace momentum, Hamilton’s equation becomes

\frac{d x}{d t} = sign (p), \frac{d p}{d t} = - \nabla U (x),

(2)

in which the velocity $v : = d x / d t \in {\pm 1}^{d}$ depends only on the sign of $p$ and thus remains constant except when one of the $p_{i} ’ s$ undergoes a sign change. This property yields a piecewise linear trajectory as follows.

We momentarily ignore a mathematical technicality that arises from the discontinuity of the sign function in (2). Starting from the state $(x (τ)$ , $p (τ))$ with $p_{i} (τ) \neq 0$ for all $i$ and $v = sign (p)$ , the dynamics according to (2) evolves as

x (τ + t) = x (τ) + t v (τ),

(3)

p_{i} (τ + t) = p_{i} (τ) - \int_{0}^{t} \partial_{i} U (x (τ + s)) d s for all i,

(4)

where (3) holds as long as the signs of $p_{i}$ , and hence $v$ , remain constant on $[τ, τ + t)$ . During this time, the momentum magnitude evolves as

| p_{i} | (τ + t) = | p_{i} | (τ) - \int_{0}^{t} v_{i} (τ) \partial_{i} U (x (τ) + s v (τ)) d s for all i .

(5)

From (4) and (5), we see that the sign change in $p_{i}$ occurs at time $τ + t_{i}$ where

t_{i} = inf_{t > 0} {| p_{i} (τ) | = \int_{0}^{t} v_{i} (τ) \partial_{i} U (x (τ) + s v (τ)) d s} .

(6)

Let $i^{*} = {argmin}_{i} t_{i}$ denote the first coordinate to experience a sign change. As $p_{i^{*}}$ changes its sign at time $τ + t_{i^{*}}$ , the velocity $v = d x / d t$ undergoes an instantaneous change

v_{i^{*}} (τ + t_{i^{*}}) = - v_{i^{*}} (τ), v_{j} (τ + t_{i^{*}}) = v_{j} (τ) for j \neq i^{*} .

(7)

Afterward, the position component proceeds along a new linear path

x (τ + t) = x (τ + t_{i^{*}}) + (t - t_{i^{*}}) v (τ + t_{i^{*}}) for t \geq t_{i^{*}}

(8)

until the next sign change event. All the while, the momentum component continues evolving according to (4). We summarize the properties (3)–(8) in the algorithmic description of Hamiltonian zigzag in Section 2.2.

Under mild conditions, which we quantify momentarily, the trajectory of $(x (t), p (t))$ as described satisfies the equation (2) except at the instantaneous moments of sign changes in $p_{i} ’ s$ . For a differential equation with discontinuous right-hand side, what constitutes a solution and whether it is unique are delicate and complex questions. A study of existence and uniqueness typically starts by interpreting the equation as a differential inclusion problem (Filippov 1988). More precisely, given a differential equation $d z / d t = f (z)$ and discontinuity points of $f$ , the equality requirement is relaxed to an inclusion of the form $d z / d t \in F (z)$ for a suitable set $F (z)$ , such as a convex polytope whose extreme points consist of ${lim}_{z^{'} \to z} f (z^{'})$ . The theory remains incomplete, however, despite years of research effort (Fetecau et al. 2003, Khulief 2013).

As current theory falls short, we will directly establish an existence and uniqueness result for Hamilton’s equation with Laplace momentum. For continuously differentiable $U$ , the process (3)–(8) defines a unique trajectory consistent with (2) as long as it stays away from the sets

S_{i} = {(x, p) : \partial_{i} U (x) = 0, p_{i} = 0} .

(9)

If $\partial_{i} U (x (τ + t_{i})) \neq 0$ at the moment of sign change $p_{i} (τ + t_{i}) = 0$ , the relation (4) dictates

lim_{t \to t_{i}^{+}} sign (p_{i} (τ + t)) = - lim_{t \to t_{i}^{-}} sign (p_{i} (τ + t)),

where $t_{i}^{+}$ and $t_{i}^{-}$ indicate the right and left limit. Hence, the velocity change and subsequent evolution according to (7) and (8) define a unique trajectory that satisfies Equation (2) for almost every $t \geq 0$ and has its position components continuous in $t$ .

In Theorem 2.1 below, we make a convenient assumption to ensure that a trajectory avoids the problematic sets (9) from almost every initial state. While identifying more general conditions is beyond the scope of this work, we believe the dynamics to be well-defined under a much weaker assumption on $U (x)$ satisfied by most practical situations. The theorem also establishes the time-reversibility and symplecticity of the dynamics, which together imply that Hamiltonian zigzag preserves the target distribution and thus constitutes a valid transition kernel (Neal 2010, Fang et al. 2014).

Theorem 2.1.

Suppose that $U (x)$ is twice continuously differentiable and that the sets ${x : \partial_{i} U (x) = 0}$ comprise differentiable manifolds of dimension at most $d - 1$ . Then Eq (2) defines a unique dynamics on $ℝ^{2 d}$ away from a set of Lebesgue measure zero. More precisely, there is a measure zero set $Ω \subset ℝ^{2 d}$ such that, for all initial conditions $(x (0), p (0)) \in ℝ^{2 d} \ Ω$ , there exists a unique solution that satisfies Eq (2) at almost every $t \geq 0$ , that remains in $ℝ^{2 d} \ Ω$ for all $t \geq 0$ , and whose position component is continuous in $t$ . Moreover, the dynamics is time-reversible and symplectic on $ℝ^{2 d} \ Ω$ .

The assumption of Theorem 2.1 in particular holds for strongly convex $U (x)$ since it would imply that $\nabla \partial_{i} U = {(\partial_{1} \partial_{i} U, \dots, \partial_{d} \partial_{i} U)}^{⊤} \neq 0$ and, by the implicit function theorem, that ${x : \partial_{i} U (x) = 0}$ is a differentiable manifold of dimension $d - 1$ (Spivak 1965). We thus have the following corollary:

Corollary 2.2.

For a twice continuously differentiable, strongly log-concave target $π_{X} (x)$ and the corresponding potential energy $U (x) = - log π_{X} (x)$ , the Hamiltonian dynamics based on Laplace momentum is well-defined, time-reversible, and symplectic away from a set of Lebesgue measure zero.

We note that, the exact nature of Hamiltonian zigzag on smooth targets being auxiliary to their work, Nishimura et al. (2020) establish corresponding results only under a piecewise constant $U$ with piecewise linear discontinuity set. Given the incomplete theory behind discontinuous differential equations and non-smooth Hamiltonian mechanics, Theorem 2.1 is significant on its own and of independent interest. We defer the proof to Supplement Section S1, however, to keep the article’s focus on the connection between Hamiltonian and Markovian zigzag and its implication on Monte Carlo simulation. We similarly defer to Supplement Section S2 a discussion of the theory’s extension to accommodate constraints on the parameter space, which justifies our application of Hamiltonian zigzag to truncated Gaussian targets (Section 4).

2.2. Simulation of Hamiltonian zigzag dynamics

To summarize the above discussion and prepare for the subsequent discussion of Hamiltonian zigzag’s connection to Markovian one, we describe the evolution of Hamiltonian zigzag in the space $(x, v, | p |)$ with $p_{i} = v_{i} | p_{i} |$ as follows. We denote the $k$ -th event time by $τ^{(k)}$ for $k \geq 1$ and the corresponding state at the moment by $(x^{(k)}, v^{(k)}, | p |^{(k)}) = (x (τ^{(k)}), v (τ^{(k)}), | p | (τ^{(k)}))$ . From a given initial condition at time $τ^{(0)}$ , the position coordinate follows a piecewise linear path segmented by the event times $τ^{(k)}$ , in-between which the dynamics evolves according to

\begin{matrix} x (τ^{(k)} + t) = x^{(k)} + t v^{(k)}, v (τ^{(k)} + t) = v^{(k)}, and \\ | p_{i} | (τ^{(k)} + t) = {| p_{i} |}^{(k)} - \int_{0}^{t} v_{i}^{(k)} \partial_{i} U (x^{(k)} + s v^{(k)}) d s . \end{matrix}

The $(k + 1)$ -th event occurs at time

τ^{(k + 1)} = τ^{(k)} + min_{i} t_{i}^{(k)} where t_{i}^{(k)} = inf_{t > 0} {{| p_{i} |}^{(k)} = \int_{0}^{t} v_{i}^{(k)} \partial_{i} U (x^{(k)} + s v^{(k)}) d s},

resulting in an instantaneous change in the $i^{*}$ -th component of velocity for $i^{*} (k) = {argmin}_{i} t_{i}^{(k)}$ :

v_{i^{*}}^{(k + 1)} = - v_{i^{*}}^{(k)} and v_{j}^{(k + 1)} = v_{j}^{(k)} for j \neq i^{*} .

The position and momentum magnitude at time $τ^{(k + 1)}$ are given by

\begin{matrix} x^{(k + 1)} = x^{(k)} + (τ^{(k + 1)} - τ^{(k)}) v^{(k)} and \\ {| p_{i} |}^{(k + 1)} = {| p_{i} |}^{(k)} - \int_{0}^{τ^{(k + 1)} - τ^{(k)}} v_{i}^{(k)} \partial_{i} U (x^{(k)} + s v^{(k)}) d s . \end{matrix}

The dynamics then continues in the same manner for the next interval $[τ^{(k + 1)}, τ^{(k + 2)})$ .

Algorithm 1 summarizes the trajectory simulation process as pseudo-code. For the moment, we do not concern ourselves with either how we would solve for $t_{i}$ in Line 7 or how to evaluate the integrals in Line 12 and 16. As we demonstrate in Section 4, we can exploit the analytical solutions available under (truncated) multivariate Gaussians for a highly efficient implementation (Supplement Section S5).

Algorithm 1.

Hamiltonian zigzag trajectory simulation for $t \in [0, T]$

1:	function HamiltonianZigzag $(x, p, T)$
2:	$τ \leftarrow 0$
3:	$v \leftarrow sign (p)$
4:	while $τ < T$ do
5:	for $i = 1, \dots, d$ do
6:
7:	$\begin{matrix} t_{i} = inf_{t > 0} {\| p_{i} \| = \\ \int_{0}^{t} v_{i} \partial_{i} U (x + s v) d s} \end{matrix}$
8:	$t^{*} \leftarrow {min}_{i} t_{i}$
9:	if $τ + t^{} > T$ then*
10:	# No further event occurred
11:	$x \leftarrow x + (T - τ) v$
12:	$p \leftarrow p - \int_{0}^{T - τ} \nabla U (x + s v) d s$ .
13:	$τ \leftarrow T$
14:	else
15:	$x \leftarrow x + t^{*} v$
16:	$p \leftarrow p - \int_{0}^{t^{*}} \nabla U (x + s v) d s$ .
17:	$i^{*} \leftarrow {argmin}_{i} t_{i}$
18:	$v_{i^{}} \leftarrow - v_{i^{}}$
19:	$τ \leftarrow τ + t^{*}$
20:	return $(x, p)$

Open in a new tab

Algorithm 2.

Markovian zigzag trajectory simulation for $t \in [0, T]$

1:	function MarkovianZigzag $(x, v, T)$
2:	$τ \leftarrow 0$
3:
4:	while $τ < T$ do
5:	for $i = 1, \dots, d$ do
6:	$u_{i} \sim Unif (0, 1)$
7:	$\begin{matrix} t_{i} = inf_{t > 0} {- log u_{i} = \\ \int_{0}^{t} {[v_{i} \partial_{i} U (x + s v)]}^{+} d s} \end{matrix}$
8:	$t^{*} \leftarrow {min}_{i} t_{i}$
9:	if $τ + t^{} > T$ then*
10:	# No further event occurred
11:	$x \leftarrow x + (T - τ) v$
12:
13:	$τ \leftarrow T$
14:	else
15:	$x \leftarrow x + t^{*} v$
16:
17:	$i^{*} \leftarrow {argmin}_{i} t_{i}$
18:	$v_{i^{}} \leftarrow - v_{i^{}}$
19:	$τ \leftarrow τ + t^{*}$
20:	return $(x, v)$

Open in a new tab

3. Link between Hamiltonian and Markovian zigzags

3.1. Hamiltonian zigzag’s apparent similarity to Markovian zigzag

The Markovian zigzag process by Bierkens et al. (2019a) follows a piecewise linear trajectory similar to Hamiltonian zigzag, but without any apparent concept of momentum. Starting from the state $(x (τ), v (τ))$ with $v \in {\pm 1}^{d}$ , Markovian zigzag follows a linear path

x (τ + t) = x (τ) + t v (τ), v (τ + t) = v (τ)

for $t \geq 0$ , until the next velocity switch event $v_{i} \leftarrow - v_{i}$ that occurs with Poisson rate

λ_{i} (x, v) = {[v_{i} \partial_{i} U (x)]}^{+} : = max {0, v_{i} \partial_{i} U (x)} .^{1}

(10)

In particular, the next event time $τ + t^{*}$ can be simulated by setting $t^{*} = {min}_{i} t_{i}$ where

t_{i} = inf_{t > 0} {- log u_{i} = \int_{0}^{t} {[v_{i} (τ) \partial_{i} U (x (τ) + s v (τ))]}^{+} d s} for u_{i} \sim Unif (0, 1) .

(11)

At time $τ + t^{*}$ , the velocity undergoes an instantaneous change

v_{i^{*}} (τ + t_{i^{*}}) = - v_{i^{*}} (τ), v_{j} (τ + t_{i^{*}}) = v_{j} (τ) for j \neq i^{*} = \underset{i}{argmin} t_{i} .

The position component then proceeds along a new linear path

x (τ + t) = x (τ + t_{i^{*}}) + (t - t_{i^{*}}) v (τ + t_{i^{*}}) for t \geq t_{i^{*}}

until the next sign change event. The Markovian zigzag process as described has the stationary distribution $π (x, v) : = 2^{- d} π_{X} (x)$ , with $v$ distributed uniformly on ${\pm 1}^{d}$ .

Algorithm 2 describes the dynamics of Markovian zigzag in pseudo-code with empty lines inserted as appropriate to facilitate comparison with the dynamics of Hamiltonian zigzag. The similarity between the two zigzags is striking. The main difference lies in Lines 6 and 7 of the algorithms, reflecting the formulae (11) and (6) for their respective velocity switch event times. For one thing, Markovian zigzag’s event time depends on the random quantity $u_{i}$ while Hamiltonian zigzag’s is deterministic. On the other hand, when combining Hamiltonian zigzag with momentum refreshment, the distributional equality $- log u_{i} \overset{d}{=} | p_{i} | \sim Exp (scale = 1)$ makes the quantities $- log u_{i}$ and $| p_{i} |$ comparable in a sense and, as we will show in Section 3.2, is a key element connecting the two zigzags.

The quantities $- log u_{i}$ and $| p_{i} |$ being comparable, the only remaining difference between (11) and (6) is the presence and absence of the positive part operator ${[\cdot]}^{+}$ in the integrands. These presence and absence of ${[\cdot]}^{+}$ are manifestations of the fact that Markovian zigzag is memory-less while Hamiltonian zigzag transfers energy between the potential and kinetic parts and encodes this information in momentum. The etiology and consequence of this difference is most easily seen in the case of a one-dimensional unimodal target $U (x)$ , as visually illustrated in Figure 3. Before a velocity switch event, a zigzag trajectory from the initial position $x_{0}$ and velocity $v_{0}$ satisfies

\begin{array}{l} \int_{0}^{t} {[v_{0} \partial_{i} U (x_{0} + s v_{0})]}^{+} d s = {\begin{array}{l} 0 for 0 \leq t \leq t_{min} \\ U (x_{0} + t v_{0}) - U_{min} for t > t_{min}, \end{array} \\ \int_{0}^{t} v_{0} \partial_{i} U (x_{0} + s v_{0}) d s = U (x_{0} + t v_{0}) - U (x_{0}) \\ = U (x_{0} + t v_{0}) - U_{min} - [U (x_{0}) - U_{min}], \\ where t_{min} : = \underset{t \geq 0}{argmin} U (x_{0} + t v_{0}) and U_{min} : = U (x_{0} + t_{min} v_{0}) . \end{array}

Figure 3: — Comparison of Markovian (left) and Hamiltonian (right) zigzag trajectories under the one-dimensional potential $U (x)$ . Neither zigzag is affected by velocity switch events while going down the potential energy hill, during which the velocity and gradient point in the opposite directions and the relation $v (t) \partial U (x (t)) < 0$ holds. During this time, Hamiltonian zigzag stores up kinetic energy converted from potential energy, while Markovian zigzag remains memory-less. Once the trajectories reaches the potential energy minimum $U_{min}$ at time $t_{min} : = {argmin}_{t \geq 0} U (x_{0} + t v_{0})$ and start going “uphill,” the accumulated momentum $| p (t_{min}) | = | p_{0} | + U (x_{0}) - U_{min}$ keeps Hamiltonian zigzag traveling in the same direction longer than Markovian zigzag. The last statement technically holds only “on average” due to randomness in the realized values of $| p_{0} | \overset{d}{=} - \log u$ .

The velocity switch event formulae (11) and (6) therefore simplify to

\begin{matrix} t^{M} = inf_{t \geq t_{min}} {- log u = U (x_{0} + t v_{0}) - U_{min}}, \\ t^{H} = inf_{t \geq t_{min}} {U (x_{0}) - U_{min} + | p_{0} | = U (x_{0} + t v_{0}) - U_{min}} . \end{matrix}

(12)

From (12), we see that the Markovian event necessarily precedes the Hamiltonian one (i.e. $t^{M} \leq t^{H}$ ) when $| p_{0} | = - log u$ and hence $| p (t_{min}) | = U (x_{0}) - U_{min} + | p_{0} | \geq - log u$ .

In higher dimensions, the same reasoning applies to the relative behavior of the two zigzags along each coordinate. On average, the memory-less property as manifested by the presence of ${[\cdot]}^{+}$ in (11) causes Markovian zigzag to experience more frequent velocity switch events and travel shorter distances along each linear segment. In contrast, when a coordinate of Hamiltonian zigzag is going down potential energy hills (i.e. $v_{i} \partial_{i} U (x) < 0$ ), the decrease in potential energy causes an equivalent increase in kinetic energy and in momentum magnitude as dictated by the relation (5). Hamiltonian zigzag can then use this stored kinetic energy to continue traveling in the same direction for longer distances.

3.2. Markovian zigzag as an infinite momentum refreshment limit

We now consider a version of Hamiltonian zigzag in which we periodically refresh the momentum by resampling their magnitudes $| p_{i} (τ) | \sim Exp (1)$ while keeping their signs. This process follows a zigzag path as before, but its inter-event times are now random. We see from the formula of (6) that, following a momentum magnitude refreshment at time $τ$ , the velocity flip $v_{i} \leftarrow - v_{i}$ occurs during the interval $[τ, τ + Δ t]$ if and only if

| p_{i} (τ) | \leq max_{0 \leq t \leq Δ t} [\int_{0}^{t} v_{i} (τ) \partial_{i} U (x (τ + s)) d s] .

Provided that $\partial_{i} U$ is continuous and $\partial_{i} U (x (τ)) \neq 0$ , the sign of $\partial_{i} U (x (τ + s))$ stays constant on the interval $s \in [τ, τ + Δ t]$ for sufficiently small $Δ t$ , so that

max_{0 \leq t \leq Δ t} [\int_{0}^{t} v_{i} (τ) \partial_{i} U (x (τ + s)) d s] < 0 or = \int_{0}^{Δ t} {[v_{i} (τ) \partial_{i} U (x (τ + s))]}^{+} d s .

Under these conditions, the probability of the $i$ -th velocity flip is therefore

\begin{matrix} ℙ {| p_{i} (τ) | \leq max_{0 \leq t \leq Δ t} [\int_{0}^{t} v_{i} (τ) \partial_{i} U (x (τ + s)) d s]} \\ = ℙ {| p_{i} (τ) | \leq \int_{0}^{Δ t} {[v_{i} (τ) \partial_{i} U (x (τ + s))]}^{+} d s} \\ = 1 - exp {- \int_{0}^{Δ t} {[v_{i} (τ) \partial_{i} U (x (τ + s))]}^{+} d s} \\ = {[v_{i} (τ) \partial_{i} U (x (τ))]}^{+} Δ t + O (Δ t^{2}) . \end{matrix}

(13)

Equation (13) shows that, immediately following the momentum magnitude refreshment, an i-th velocity switch event for Hamiltonian zigzag happens at a rate essentially identical to that of Markovian zigzag as given in (10).

Now consider resampling the momentum magnitude after every time interval of size $Δ t$ and letting $Δ t \to 0$ . Our analysis above suggests that, under this limit, the rate of coordinate-wise velocity switch events for Hamiltonian zigzag converges to $λ_{i} (x, v) = {[v_{i} \partial_{i} U (x)]}^{+}$ . That is, Hamiltonian zigzag becomes equivalent to Markovian zigzag under this infinite momentum refreshment limit.

We now turn the above intuition into a rigorous argument. In Theorem 3.1 below, $D [0, \infty)$ denotes the space of right-continuous-with-left-limit functions from $[0, \infty)$ to $ℝ^{d} \times ℝ^{d}$ endowed with Skorokhod topology, the canonical space to study convergence of stochastic processes with jumps (Billingsley 1999, Ethier & Kurtz 2005). In particular, the convergence $(x_{Δ t}, v_{Δ t}) \to (x, v)$ in this space implies the convergence of the ergodic average $T^{- 1} \int_{0}^{T} f (x_{Δ t} (t)) d t \to T^{- 1} \int_{0}^{T} f (x (t)) d t$ for any continuous real-valued function $f$ .

Theorem 3.1 (Weak convergence).

Given an initial position $x (0)$ and velocity $v (0) \in {\pm 1}^{d}$ , consider Hamiltonian zigzag dynamics with $p_{i} (0) = v_{i} (0) | p_{i} (0) |$ with $| p_{i} (0) | \sim Exp (1)$ and with momentum magnitude resampling at every $Δ t$ interval, i.e. $| p_{i} (n Δ t) | \sim Exp (1)$ for $n \in ℤ^{+}$ . For each $Δ t$ , let $(x_{Δ t}, p_{Δ t})$ denote the corresponding dynamics and $(v_{Δ t})$ the right-continuous modification of velocity $d x_{Δ t} / d t = sign (p_{Δ t})$ . Then, as $Δ t \to 0$ , the dynamics $(x_{Δ t}, v_{Δ t})$ converges weakly to the Markovian zigzag process in $D [0, \infty)$ .

This weak convergence result characterizes Markovian zigzag as a special case, albeit only in the limit, of Hamiltonian zigzag. As such, it almost guarantees that Hamiltonian zigzag, if combined with optimal momentum refreshment schedule, will outperform Markovian zigzag. At a more practical level, the interpretation of Markovian zigzag as Hamiltonian zigzag with less momentum pinpoints the cause of dramatic differences in their efficiency observed in our numerical examples (Section 4).

En route to the weak convergence result, we in fact establish a stronger convergence in probability via explicit coupling of Hamiltonian and Markovian zigzag processes:

Theorem 3.2 (Strong convergence).

The Hamiltonian zigzags with momentum magnitude refreshments, as described in Theorem 3.1, can be constructed so that their position-velocity components $(x_{Δ t}, v_{Δ t})$ converge strongly to the Markovian zigzag in $D [0, T]$ for all $T < \infty$ . More precisely, there exists a family of Hamiltonian zigzags $(x_{Δ t}^{H}, p_{Δ t}^{H})$ and Markovian zigzags $(x_{Δ t}^{M}, v_{Δ t}^{M})$ on the same probability space such that, for any $ϵ > 0$ and $T > 0$ ,

\lim_{Δ t \to 0} ℙ {ρ_{T} [(x_{Δ t}^{H}, v_{Δ t}^{H}), (x_{Δ t}^{M}, v_{Δ t}^{M})] > ϵ} = 0,

(14)

where $v_{Δ t}^{H}$ is the right-continuous modification of $sign (p_{Δ t}^{H})$ and $ρ_{T} (\cdot, \cdot)$ is the Skorokhod metric on $[0, T]$ . In fact, the two zigzags can be constructed so that

lim_{Δ t \to 0} ℙ {(x_{Δ t}^{H}, v_{Δ t}^{H}) \equiv (x_{Δ t}^{M}, v_{Δ t}^{M}) o n [0, T]} = 1 .

In the statement above, the distributions of the Hamiltonian zigzags $(x_{Δ t}^{H}, v_{Δ t}^{H})$ depend on $Δ t$ due to momentum magnitude refreshments, but their Markovian counterparts all have the same distribution $(x_{Δ t}^{M}, v_{Δ t}^{M}) \overset{d}{=} (x^{M}, v^{M})$ . By Theorem 3.1 of Billingsley (1999), convergence in the sense of (14) implies weak convergence $(x_{Δ t}^{H}, v_{Δ t}^{H}) \to (x^{M}, v^{M})$ in $D [0, T]$ for any $T > 0$ and hence, by Theorem 16.7 of Billingsley (1999), in $D [0, \infty)$ . In particular, our Theorem 3.1 follows from our Theorem 3.2, whose proof is in Supplement Section S3.

4. Numerical study: two zigzags duel over truncated multivariate Gaussians

Our theoretical result of Section 3.2 shows Markovian zigzag as essentially equivalent to Hamiltonian zigzag with constant refreshment of momentum magnitude. As we heuristically argue in Section 1 and 3.1, the loss of full momentum information can make Markovian zigzag more prone to random-walk behavior in the presence of strong dependency among parameters. We validate this intuition empirically in this section.

We have so far put aside the issue of numerically simulating zigzag trajectories in practice. The coordinate-wise integrator of Nishimura et al. (2020) provides one way to qualitatively approximate Hamiltonian zigzag. With a suitable modification (Supplement Section S4), the mid-point integrator of Chin & Nishimura (2024) provides another option. For exact simulations, however, both zigzags require computing the times of velocity switch events (Line 7 in Algorithm 1 and 2). Hamiltonian zigzag additionally requires computing the integrals of Line 12 and 16 for updating momentum. Being a Markovian process, Markovian zigzag allows the use of Poisson thinning in determining event times (Bierkens et al. 2022). This fact makes it somewhat easier to simulate Markovian zigzag, while an efficient implementation remains challenging except for a limited class of models (Vanetti et al. 2017).

Here we focus on sampling from a truncated multivariate Gaussian, a special yet practically relevant class of targets, on which we can efficiently simulate both zigzags. Besides simple element-wise multiplications and additions, simulating each linear segment of the zigzags only requires solving $d$ quadratic equations and extracting a column of the Gaussian precision matrix $Φ$ (Supplement Section S5). This in particular gives the zigzags a major potential advantage, depending on the structure of $Φ$ , over other state-of-the-art algorithms for truncated Gaussians that require computationally expensive pre-processing operations involving $Φ$ (Pakman & Paninski 2014, Botev 2017). In fact, the numerical results of Zhang et al. (2022) indicate zigzag HMC as a preferred choice over the algorithms of Pakman & Paninski (2014) and Botev (2017) in many high-dimensional applications. Supplement Section S7 provides more detailed discussion of how these algorithms compare in their algorithmic complexities and complement the benchmark of Zhang et al. (2022) with additional numerical results.

We compare performances of the two zigzags on a range of truncated Gaussians, consisting of both synthetic and real-data posteriors. As predicted, Hamiltonian zigzag emerges as a clear winner as dependency among parameters increases.

4.1. Zigzag-Nuts: Hamiltonian zigzag with no-U-turn algorithm

Given the availability of analytical solutions in simulating zigzag trajectories, Markovian zigzag is completely tuning-free in the truncated Gaussian case. Hamiltonian zigzag requires periodic momentum refreshments $p_{i} \sim$ Laplace(scale = 1) for ergodicity, so the integration time $T$ in-between refreshments remains a user-specified input. On the other hand, being a reversible dynamics, Hamiltonian zigzag can take advantage of the no-U-turn algorithm (nuts) of Hoffman & Gelman (2014) to automatically determine an effective integration time. This way, we only need to supply a base integration time $Δ T$ to Hamiltonian zigzag — the no-U-turn algorithm will then identify an appropriate integration time $T = 2^{k} Δ T$ , where $k \geq 0$ is the height of a binary trajectory tree at which the trajectory exhibits a U-turn behavior for the first time. We provide in Supplement Section S8 the details of how to combine the no-U-turn algorithm with reversible dynamics in general.

With the automatic multiplicative adjustment of the total integration time, the combined Zigzag-Nuts algorithm only requires us to set $Δ T$ as a reasonable underestimate of an optimal integration time. Based on the intuition that the integration time should be proportional to a width of the target in the least constrained direction (Neal 2010), we choose $Δ T$ for Zigzag-Nuts as follows. In the absence of truncation, this width of the target is proportional to $ν_{max}^{1 / 2} (Φ^{- 1}) = ν_{min}^{- 1 / 2} (Φ)$ where $ν_{max}$ and $ν_{min}$ denote the largest and smallest eigenvalues, both of which can be computed quickly via a small number of matrix-vector operations $w \to Φ^{- 1} w$ or $w \to Φ w$ using the Lanzcos algorithm (Meurant 2006). For the standard HMC based on Gaussian momentum, an optimal integration time on multivariate Gaussian targets is $T = ϖ ν_{min}^{- 1 / 2} (Φ) / 2$ (Bou-Rabee & Sanz-Serna 2018), where $ϖ \approx$ 3.14 denotes Archimedes’s constant. This suggests that $Δ T \approx ν_{min}^{- 1 / 2} (Φ)$ should be close to the upper end of reasonable base integration times. We hence propose a choice $Δ T = ν_{min}^{- 1 / 2} (Φ) Δ T_{rel}$ for $Δ T_{rel} \leq 1$ , where $Δ T_{rel}$ represents a base integration time relative to the target’s width $ν_{min}^{- 1 / 2} (Φ)$ .

In our numerical results, we find that $Δ T_{rel} = 0.1$ works well in a broad range of problems. We observe further performance gains from a larger value, i.e. $Δ T_{rel} > 0.1$ , if the target is highly constrained. We use $Δ T_{rel} = 0.1$ in this section for simplicity’s sake, but additional numerical results are available in Supplement Section S9.

4.2. Study set-up and efficiency metrics

The existing empirical evaluations of Markovian zigzag rely on simple low-dimensional target distributions; consequently, there is great interest in having its performance tested on more challenging higher-dimensional problems (Dunson & Johndrow 2020). We start from where Bierkens et al. (2022) left off — 256-dimensional correlated Gaussians (without truncation) — and first test the two zigzags on synthetic truncated Gaussians of dimension up to 4,096. We then proceed to a real-world application, comparing the performances of the two zigzags on a 11,235-dimensional truncated Gaussian posterior.

We compare the two zigzags’ performances using effective sample sizes (ESS), a well-established metric for quantifying efficiency of Markov chain Monte Carlo (MCMC) algorithms (Geyer 2011). When assessing a relative performance of two MCMC algorithms, we also need to account for their per-iteration computational costs. We therefore report ESS per unit time, as is commonly done in the literature, where “time” refers to the actual time it takes for our code to run and is not to be confused with the time scales of zigzag dynamics. We note that relative computational speed can vary significantly from one computing environment to another due to various performance optimization strategies used by modern hardware, such as instruction-level parallelism and multi-tiered memory cache (Guntheroth 2016, Nishimura & Suchard 2023, Holbrook et al. 2020). For this reason, while ESS per time is arguably the most practically relevant metric, we consider an alternative platform-independent performance metric in Supplement Section S10.

We run Zigzag-Nuts for 25,000 iterations on each synthetic posterior of Section 4.3. Each iteration of Zigzag-Nuts is more computationally intensive on the real-data posterior of Section 4.4, but also mixes more efficiently than on the hardest synthetic posterior. We hence use a shorter chain of 1,500 iterations on the real-data one. With these chain lengths, we obtain at least 100 ESS along each coordinate in all our examples.

For each Markovian zigzag simulation, we collect MCMC samples spaced at time intervals of size $Δ T$ , the base integration time for Zigzag-Nuts. This way, we sample Markovian zigzag at least as frequently as Hamiltonian zigzag along their respective trajectories. We thus ensure a fair comparison between the two zigzags and, if any, tilt the comparison in favor of Markovian zigzag. To obtain at least 100 ESS along each coordinate, we simulate Markovian zigzag for $T = 250, 000 \times Δ T$ on each synthetic posterior and $T = 1, 500 \times Δ T$ on the real-data posterior, generating 250,000 and 1,500 samples respectively.

Note that, while both zigzags can in theory utilize entire trajectories to estimate posterior quantities of interest (Bierkens et al. 2019a, Nishimura & Dunson 2020), such approaches are often impractical in high-dimensional settings due to memory constraints. In the 11,235-dimensional example of Section 4.4, for example, 1,500 iterations of Markovian (Hamiltonian) zigzag undergoes 1.6 × 10⁸ (2.1 × 10⁸) velocity switch events. Storing all these event locations would require 1.8 (2.4) TB in 64-bit double precision, while providing little practical benefit because of their high auto-correlations.

We implement both zigzags (Algorithm S2 and S3 in Supplement Section S5) in the Java programming language as part of the Bayesian phylogenetic software beast; the code and instruction to reproduce the results are available at https://github.com/aki-nishimura/code-for-hamiltonian-zigzag-2024. We run each MCMC on a c5.2xlarge instance in Amazon Elastic Compute Cloud, equipped with 4 Intel Xeon Platinum 8124M processors and 16 GB of memory. For each target, we repeat the simulation 5 times with different seeds and report ESS averaged over these 5 independent replicates. ESS’s are computed using the R CODA package (Plummer et al. 2006).

4.3. Threshold model posteriors under correlated Gaussian priors

Bierkens et al. (2022) consider Gaussian targets with compound symmetric covariance

Var (x_{i}) = 1, Cov (x_{i}, x_{j}) = ρ \in [0, 1) for i \neq j .

(15)

Such a distribution can be interpreted as a prior induced by a model $x_{i} = ρ^{1 / 2} z + {(1 - ρ)}^{1 / 2} ϵ_{i}$ , with shared latent factor $z \sim 𝒩 (0, 1)$ and individual variations $ϵ_{i} \sim 𝒩 (0, 1)$ . We construct truncated Gaussian posteriors by assuming a threshold model

y_{i} = 𝟙 {x_{i} > 0} - 𝟙 {x_{i} \leq 0} .

An arbitrary thresholding would make it difficult to get any feel of the geometric structure behind the resulting truncated Gaussian posterior. We hence assume a simple thresholding with $y_{i} = 1$ for all $i$ , inducing posteriors constrained to the positive orthant ${x_{i} > 0}$ . We investigate the effect of degree of correlation on zigzags’ performance by varying the correlation coefficient, using the values $ρ = 1 - {(0.1)}^{k}$ for $k = 0, 1, 2$ .

The numerical result for the compound symmetric posteriors, including the i.i.d. case $ρ = 0$ , is summarized in Table 1. Since $x_{i}$ are exchangeable, we calculate ESS only along the first coordinate. We also calculate ESS along the principal eigenvector of $Φ^{- 1}$ since HMC typically struggles most in sampling from the least constrained direction (Neal 2010). Markovian zigzag face a similar challenge as evidenced by the visual comparison of the two zigzag samples projected onto the principal component (Figure 4).

Table 1:

ESS per computing time — relative to that of Markovian zigzag — under the compound symmetric posteriors. We test the algorithms under three correlation parameter values ( $ρ = 0$ , 0.9, and 0.99) and two varying dimensions ( $d = 256$ and 1,024). ESS’S are calculated along the first coordinate and along the principal eigenvector of $Φ^{- 1}$ , each shown under the labels $" x_{1} "$ and “PC.”

	Relative Ess per time
Compound symmetric	$ρ = 0$	$ρ = 0.9$		$ρ = 0.99$
	$x_{1}$	$x_{1}$	PC	$x_{1}$	PC

Case: $d = 256$
Markovian	1	1	1	1	1
Zigzag-Nuts	0.64	4.5	4.6	41	40
Zigzag-Hmc $(T_{rel} = \sqrt{2})$	5.5	46	66	180	180
Case: $d = 1$ ,024
Zigzag-Nuts	0.57	4.7	4.5	54	54
Zigzag-Hmc $(T_{rel} = \sqrt{2})$	5.6	56	85	300	300

Open in a new tab

Figure 4: — Traceplot of the zigzag samples from the 1,024-dimensional compound symmetric posterior (15) with $ρ = 0.99$ , projected onto the principal component $u = (1, \dots, 1) / \sqrt{d}$ via a map $x \to 〈 x, u 〉$ . The horizontal axis is scaled to represent the number of velocity switch events.

As predicted, Hamiltonian zigzag demonstrates increasingly superior performance over its Markovian counterpart as the correlation increases, delivering 4.5 to 4.7-fold gains in relative ESS at $ρ = 0.9$ and 40 to 54-fold gains at $ρ = 0.99$ . The efficiency gain is generally greater at the higher dimension $d = 1, 024$ . For the i.i.d. case, Hamiltonian zigzag seems to have no advantage. This is in a sense expected since, on an i.i.d. target, both zigzags become equivalent to running $d$ independent one-dimensional dynamics and have no interactions among the coordinates. In other words, Hamiltonian zigzag’s additional momentum plays no role when parameters are independent. Also, for a univariate Gaussian target, Markovian zigzag has been shown to induce negative auto-correlations and thus achieve sampling efficiency above that of independent Monte Carlo (Bierkens & Duncan 2017).

The compound symmetric targets here have a particularly simple correlation structure; the covariance matrix can be written as $Φ^{- 1} = (1 - ρ) I + ρ 1 1^{⊤}$ for $1 = (1, \dots, 1)$ , meaning that the probability is tightly concentrated along the principal component and is otherwise distributed symmetrically in all the other directions. This simple structure in particular allows us to manually identify an effective integration time $T$ for Hamiltonian zigzag without too much troubles. We therefore use this synthetic example to investigate how Zigzag-Nuts perform relative to manually-tuned Hamiltonian zigzag, which we denote as “Zigzag-HMC” in Table 1.

For each of the three targets, we try $T = ν_{min}^{- 1 / 2} (Φ) T_{rel}$ with $T_{rel} = 2^{k / 2}$ for $k = - 2, - 1, 0, 1, 2$ . We report the ESS’s based on $T_{rel} = 2^{1 / 2}$ in Table 1 as we find this choice to yield the optimal ESS in the majority of cases. We see that manually-optimized Hamiltonian zigzag delivers substantial increases in ESS compared to Zigzag-Nuts. Such efficiency gains are also observed by the authors who proposed alternative methods for tuning HMC (Wang et al. 2013, Wu et al. 2018). The results here indicate that their tuning approaches may be worthy alternatives to Zigzag-Nuts and may further reinforce Hamiltonian zigzag’s advantage over Markovian zigzag.

Finally, given that the zigzags’s motions are restricted to the discrete set of directions $v \in {\pm 1}^{d}$ and that the compound symmetric posteriors happen to be concentrated along one of these directions, one may wonder whether this coincidental structure affect the above numerical results. To answer this question, we conduct additional simulations with rotated versions of the compound symmetric posterior, corresponding to the covariance matrices $Φ^{- 1} = (1 - ρ) I + ρ u u^{⊤}$ with the principal components $u \in ℝ^{d}$ drawn uniformly from the $(d - 1)$ -dimensional sphere. These additional simulations indicate that essentially the same pattern holds in the relative performance of two zigzags and that absolute ESS per time changes little across different rotations of the target (Supplement Section S11). The latter finding reinforces the theoretical results of Bierkens et al. (2023) who show that, under a class of anistropic Gaussian targets, any deviation from diagonal covariance results in diffusive behavior in Markovian zigzag. On the other hand, the inertia provided by full momentum information appears to endow Hamiltonian zigzag with fundamentally different behavior.

4.4. Posterior from phylogenetic multivariate probit model

We now consider a 11,235-dimensional target arising from the phylogenetic multivariate probit model of Zhang et al. (2021). For simplicity’s sake, here we describe the model with some simplifications and refer interested readers to the original work for full details.

The goal of Zhang et al. (2021) is to learn correlation structure among $m = 21$ binary biological traits across $n = 535$ HIV viruses while accounting for their shared evolutionary history. Their model assumes that, conditional on the bifurcating phylogenetic tree $𝒯$ informed by the pathogen genome sequences, latent continuous biological traits $x_{1}, \dots, x_{n} \in ℝ^{m}$ follows Brownian motion along the tree with an unknown $m \times m$ diffusion covariance $Γ$ (Figure 5). The latent traits $x = [x_{1}, \dots, x_{n}]$ map to the binary observation $y \in ℝ^{m \times n}$ via the threshold model $y_{i j} = 𝟙 {x_{i j} > 0} - 𝟙 {x_{i j} \leq 0}$ .

Figure 5: — Example paths of latent biological traits following the phylogenetic diffusion. The traits of two distinct organisms evolve together until a branching event. Beyond that point, the traits evolve independently but with the same diffusion covariance induced by a shared bio-molecular mechanism.

The model gives rise to a posterior distribution on the joint space $(x, Γ, 𝒯)$ of latent biological traits, diffusion covariance, and phylogenetic tree. To deal with this complex space, Zhang et al. (2021) deploys a Gibbs sampler, the computational bottleneck of which is updating $x$ from its full conditional — a 11,235 = 21 × 535 dimensional truncated multivariate Gaussian. The $m n \times m n$ precision matrix $Φ = Φ (Γ, 𝒯)$ of $x ∣ Γ, 𝒯$ induced by the phylogenetic Brownian diffusion model changes at each Gibbs iteration, precluding use of sampling algorithms that require expensive pre-processings of $Φ$ . For the purpose of our simulation, we fix $Γ$ at the highest posterior probability sample and $𝒯$ at the maximum clade credibility tree obtained from the prior analysis by Zhang et al. (2021). Our target then is the distribution $x ∣ Γ, 𝒯 \sim 𝒩 (μ (Γ, 𝒯), Φ^{- 1} (Γ, 𝒯))$ truncated to ${sign (x) = y$ }. There are 404 (3.6%) missing entries in $y$ and the target remains unconstrained along these coordinates. The model is parametrized in such a way that the marginal variances of $x ∣ Γ, 𝒯$ are comparable across the coordinates; the diagonal preconditioning technique for HMC (Stan Development Team 2018, Nishimura et al. 2020), therefore, plays little role in this application (Supplement Section S12). The target distribution parameters $μ$ , $Φ$ , and $y$ are available on a Zenodo repository at http://doi.org/10.5281/zenodo.4679720.

On this real-world posterior, Hamiltonian zigzag again outperforms its Markovian counterpart, delivering 6.5-fold increase in the minimum ESS across the coordinates and 19-fold increase in the ESS along the principal eigenvector of $Φ^{- 1} (Γ, 𝒯)$ (Table 2). There is no simple way to characterize the underlying correlation structure for this non-synthetic posterior, so we provide a histogram of the pairwise correlations in Figure 6 as a crude descriptive summary. We in particular find that only 0.00156% of correlations have magnitudes above 0.9. However, the joint structure, truncation, and high-dimensionality apparently make for a complex target, which Hamiltonian zigzag can explore more efficiently by virtue of its full momentum.

Table 2:

Relative ESS per computing time under the phylogenetic probit posterior $(d = 11,235)$ . The “min” label indicates the minimum ESS across all the coordinates.

Phylogenetic probit	Relative Ess per time
Phylogenetic probit	min	PC

Markovian	1	1
Zigzag-Nuts	6.5	19

Open in a new tab

Figure 6: — Histogram of the pairwise correlations, i.e. the upper off-diagonal entries of $Φ^{- 1} (Γ, 𝒯)$ , in the phylogenetic probit posterior. The vertical axis is in log₁₀ scale. No correlation falls outside the horizontal plot range [−0.5,1.0], with the smallest and largest correlation being −0.416 and 0.989. Out of $m n (m n - 1) / 2 \approx$ 63.1 × 10⁶ correlations, 55.3×10⁶ (87.6%) lie within [−0.2,0.2] and 987 (0.00156%) within [0.9,1].

5. Discussion

In recent years, both piecewise deterministic Markov process (PDMP) samplers and HMC have garnered intense research interests as potential game changers for Bayesian computation. In this article, we established that one of the most prominent PDMP is actually a close cousin of HMC, differing only in the amount of “momentum” they are born with. This revelation provided novel insights into relative performance of the two samplers, demonstrated via the practically relevant case of truncated multivariate Gaussian targets.

The uncovered kinship between two zigzags begs the question: is there a more general relationship between PDMP and HMC? Searching for an affirmative answer to this question would require us to go beyond the current HMC framework based on classical Hamiltonian dynamics. Discontinuous changes in velocity seen in PDMP cannot be imitated by smooth Hamiltonian dynamics; such behavior is possible under Hamiltonian zigzag only because of its non-differentiable momentum distribution.

In a sense, our result shows that HMC’S momentum is really made of two components: direction and magnitude of inertia. PDMP’S velocity on the other hand consists only of direction. We suspect it is possible to introduce a notion of inertia magnitudes to PDMP in the form of auxiliary parameters — these parameters can interact with the main parameters so as to emulate the behavior of HMC, preserving the total log-density and storing inertia obtained from potential energy downhills for later use.

While we leave more thorough investigations for future research, we now demonstrate how the above idea translates into a novel non-reversible algorithm based on Hamiltonian-like dynamics in a discrete space. We illustrate the approach on a cyclic graph $G$ with $n$ vertices, i.e. $n$ discrete states placed along a circle. For $x \in G$ , we use $x + 1$ to denote the neighbor in the clockwise direction and $x - 1$ in the counter-clockwise direction.

We first augment the position space ${x \in G}$ with “direction” $v \in {- 1, + 1}$ and “inertia” $| p | \geq 0 -$ there is no native notion of “momentum $p$ ” in this setting, but we use the notation and terminology to draw parallels with Hamiltonian zigzag. On this augmented space, we define discrete Hamiltonian dynamics with associated “potential energy” $U (x)$ , whose trajectory ${x (t), v (t), | p | (t)}$ evolves as follows. At each time $t \in ℤ$ , the next state is given by

\begin{matrix} x (t + 1) = x (t) + v (t), v (t + 1) = v (t), \\ | p | (t + 1) = | p | (t) - U (x (t) + v (t)) + U (x (t)) \end{matrix}

if $| p | (t) \geq U (x (t) + v (t)) - U (x (t))$ ; otherwise,

x (t + 1) = x (t), v (t + 1) = - v (t), | p | (t + 1) = | p | (t) .

Figure 7 visually illustrates behavior of this discrete dynamics, which has much conceptual similarity to one-dimensional Hamiltonian zigzag as studies in Section 3.1 and illustrated in Figure 3. In particular, the dynamics by design conserves the sum $U (x) + | p |$ , with the “inertia” $| p |$ playing the role analogous to that of the kinetic energy $K (p)$ in the continuous case. It is straightforward to verify that the dynamics is reversible and volume-preserving, admits $π (x, v, | p |) = \frac{1}{2} π_{X} (x) \exp (- | p |)$ for $π_{X} (x) \propto \exp (- U (x))$ as an invariant distribution, and thus constitutes a valid transition kernel (Fang et al. 2014, Nishimura et al. 2020).

Now consider adding a “partial momentum refreshment” to the above dynamics, drawing $| p | (t) \sim Exp (1)$ and flipping the sign of $v (t)$ with probability $r \in [0, 1]$ at the beginning of each time step. Remarkably, the position-velocity marginal of this dynamics coincides with the general form of non-reversible algorithms presented in Section 5 of Diaconis et al. (2000). Under a “full momentum refreshment” of drawing $| p | (t) \sim Exp (1)$ and $v (t) \sim$ Unif $({- 1, + 1})$ , the dynamics reduces to a Metropolis algorithm with symmetric proposals to the neighbors. We can analogously construct Hamiltonian-like dynamics on a more general discrete space, adding partial momentum refreshment to which recovers the fiber algorithm of Diaconis et al. (2000) and its generalization by Herschlag et al. (2020).

Despite using HMC as an inspiration, Diaconis et al. (2000) and subsequent work have failed to recognize the actual HMC analogues lurking behind their non-reversible methods. This is understandable — the notion of momentum from smooth Hamiltonian dynamics does not easily transfer to discrete spaces or non-differentiable paths of PDMP. On the other hand, having established the explicit link between the two zigzags and having identified the dual role of momentum in providing both direction and inertia, we only need a bit of outside-the-box thinking to extend the idea to discrete spaces.

A unified framework for the HMC and other non-reversible paradigms would present a number of opportunities. For example, the introduction of momentum to Herschlag et al. (2020)’s flow-based algorithm could improve its performance by reducing random walk behavior along each flow. There could also be cross-fertilization of ideas between continuous- and discrete-space methods. The elaborate procedures developed in the PDMP literature for choosing the next direction at an event time (Fearnhead et al. 2018, Wu & Robert 2020), for example, is likely applicable to the flow-based algorithm in discrete spaces. All in all, our revelation of the two zigzags’ kinship — and the insights generated from it — will likely catalyze further novel developments in Monte Carlo methods.

Supplementary Material

Supp 1

NIHMS2023543-supplement-Supp_1.pdf^{(1.6MB, pdf)}

Acknowledgment

This work is supported by: the Alfred P. Sloan Foundation and the Johns Hopkins Open Source Programs Office (A.N.); Food and Drug Administration grant HHS 75F40120D00039 (A.N. and M.A.S.); as well as NIH grants U19 AI135995, R01 AI153044 and R01 AI162611 (M.A.S.).

Footnotes

Disclosure Statement

The authors report no competing interests.

Markovian zigzag based on the rate (10) is referred to as the canonical zigzag by Bierkens et al. (2019a) and is the predominant version in the literature. More generally, however, any Poission rate satisfying $λ_{i} (x, v) - λ_{i} (x, F_{i} (v)) = v_{i} \partial_{i} U (x)$ can be used, where ${[F_{i} (v)]}_{i} = - v_{i}$ and ${[F_{i} (v)]}_{j} = v_{j}$ for $j \neq i$ .

Contributor Information

Akihiko Nishimura, Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University.

Zhenyu Zhang, Department of Biostatistics, University of California, Los Angeles.

Marc A. Suchard, Department of Biostatistics, Computational Medicine, and Human Genetics,University of California, Los Angeles

References

Andrieu C. & Livingstone S. (2021), ‘Peskun–tierney ordering for Markovian Monte Carlo: Beyond the reversible scenario’, The Annals of Statistics 49(4), 1958–1981. [Google Scholar]
Bardenet R, Doucet A. & Holmes C. (2017), ‘On Markov chain Monte Carlo methods for tall data’, Journal of Machine Learning Research 18(47). [Google Scholar]
Bierkens J. & Duncan A. (2017), ‘Limit theorems for the zig-zag process’, Advances in Applied Probability 49(3), 791–825. [Google Scholar]
Bierkens J, Fearnhead P. & Roberts G. (2019a), ‘The zig-zag process and super-efficient sampling for Bayesian analysis of big data’, The Annals of Statistics 47(3), 1288–1320. [Google Scholar]
Bierkens J, Kamatani K. & Roberts G. (2022), ‘High-dimensional scaling limits of piecewise deterministic sampling algorithms’, The Annals of Applied Probability 32(5), 3361–3407. [Google Scholar]
Bierkens J, Kamatani K. & Roberts GO (2023), ‘Scaling of piecewise deterministic Monte Carlo for anisotropic targets’, arXiv:2305.00694. [Google Scholar]
Bierkens J, Roberts G. & Zitt P-A (2019b), ‘Ergodicity of the zigzag process’, The Annals of Applied Probability 29(4), 2266–2301. [Google Scholar]
Billingsley P. (1999), Convergence of Probability Measures, Wiley Series in Probability and Statistics, John Wiley & Sons Inc. [Google Scholar]
Botev ZI (2017), ‘The normal law under linear restrictions: simulation and estimation via minimax tilting’, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 79(1), 125–148. [Google Scholar]
Bou-Rabee N. & Sanz-Serna J. (2018), ‘Geometric integrators and the Hamiltonian Monte Carlo method’, Acta Numerica 27, 113–206. [Google Scholar]
Bouchard-Côté A, Vollmer SJ & Doucet A. (2018), ‘The bouncy particle sampler: a non-reversible rejection-free Markov chain Monte Carlo method’, Journal of the American Statistical Association pp. 1–13.30034060 [Google Scholar]
Carpenter B, Gelman A, Hoffman MD, Lee D, Goodrich B, Betancourt M, Brubaker M, Guo J, Li P. & Riddell A. (2017), ‘Stan: A probabilistic programming language’, Journal of Statistical Software 76(1). [DOI] [PMC free article] [PubMed] [Google Scholar]
Chin A. & Nishimura A. (2024), ‘MCMC using bouncy Hamiltonian dynamics: A unifying framework for Hamiltonian Monte Carlo and piecewise deterministic processes’, arXiv:2405.08290. [Google Scholar]
Deligiannidis G, Paulin D, Bouchard-Côté A. & Doucet A. (n.d.), ‘Randomized Hamiltonian Monte Carlo as scaling limit of the bouncy particle sampler and dimension-free convergence rates’, The Annals of Applied Probability 31(6), 2612–2662. [Google Scholar]
Diaconis P, Holmes S. & Neal RM (2000), ‘Analysis of a nonreversible Markov chain sampler’, The Annals of Applied Probability pp. 726–752. [Google Scholar]
Duane S, Kennedy AD, Pendleton BJ & Roweth D. (1987), ‘Hybrid Monte Carlo’, Physics Letters B 195(2), 216–222. [Google Scholar]
Dunson DB & Johndrow J. (2020), ‘The Hastings algorithm at fifty’, Biometrika 107(1), 1–23. [Google Scholar]
Ethier SN & Kurtz TG (2005), Markov processes: characterization and convergence, John Wiley & Sons. [Google Scholar]
Fang Y, Sanz-Serna J-M & Skeel RD (2014), ‘Compressible generalized hybrid Monte Carlo’, The Journal of Chemical Physics 140(17), 174108. [DOI] [PubMed] [Google Scholar]
Fearnhead P, Bierkens J, Pollock M. & Roberts G. (2018), ‘Piecewise deterministic Markov processes for continuous-time Monte Carlo’, Statistical Science 33(3), 386–412. [Google Scholar]
Fetecau RC, Marsden JE, Ortiz M. & West M. (2003), ‘Nonsmooth Lagrangian mechanics and variational collision integrators’, SIAM Journal on Applied Dynamical Systems 2(3), 381–416. [Google Scholar]
Filippov AF (1988), Differential equations with discontinuous righthand sides, Kluwer Academic Publishers. [Google Scholar]
Geyer C. (2011), Introduction to Markov chain Monte Carlo, in ‘Handbook of Markov Chain Monte Carlo’, CRC Press, pp. 3–48. [Google Scholar]
Guntheroth K. (2016), Optimized C++: Proven Techniques for Heightened Performance, O’Reilly Media, Inc. [Google Scholar]
Herschlag G, Mattingly JC, Sachs M. & Wyse E. (2020), ‘Non-reversible Markov chain Monte Carlo for sampling of districting maps’, arXiv:2008.07843. [Google Scholar]
Hoffman MD & Gelman A. (2014), ‘The no-U-turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo’, Journal of Machine Learning Research 15(1), 1593–1623. [Google Scholar]
Holbrook AJ, Lemey P, Baele G, Dellicour S, Brockmann D, Rambaut A. & Suchard MA (2020), ‘Massive parallelization boosts big bayesian multidimensional scaling’, Journal of Computational and Graphical Statistics pp. 1–34. [DOI] [PMC free article] [PubMed] [Google Scholar]
Khulief Y. (2013), ‘Modeling of impact in multibody systems: an overview’, Journal of Computational and Nonlinear Dynamics 8(2). [Google Scholar]
Meurant GA (2006), The Lanczos and Conjugate Gradient Algorithms: from Theory to Finite Precision Computations, Society for Industrial and Applied Mathematics. [Google Scholar]
Neal RM (2010), MCMC using Hamiltonian Dynamics, in ‘Handbook of Markov chain Monte Carlo’, CRC Press. [Google Scholar]
Nishimura A. & Dunson D. (2020), ‘Recycling intermediate steps to improve Hamiltonian Monte Carlo’, Bayesian Analysis 15(4), 1087–1108. [Google Scholar]
Nishimura A, Dunson D. & Lu J. (2020), ‘Discontinuous Hamiltonian Monte Carlo for discrete parameters and discontinuous likelihoods’, Biometrika 107(2), 365–380. [Google Scholar]
Nishimura A. & Suchard MA (2023), ‘Supplement to “Prior-preconditioned conjugate gradient for accelerated Gibbs sampling in ‘large n, large p’ sparse Bayesian regression”’, Journal of the American Statistical Association 118(544), 2468–2481. [DOI] [PMC free article] [PubMed] [Google Scholar]
Pakman A. & Paninski L. (2014), ‘Exact Hamiltonian Monte Carlo for truncated multivariate gaussians’, Journal of Computational and Graphical Statistics 23(2), 518–542. [Google Scholar]
Peters EAJF & de With G. (2012), ‘Rejection-free Monte Carlo sampling for general potentials’, Physical Review E 85, 026703. [DOI] [PubMed] [Google Scholar]
Plummer M, Best N, Cowles K. & Vines K. (2006), ‘CODA: Convergence diagnosis and output analysis for MCMC’, R News 6(1), 7–11. [Google Scholar]
Salvatier J, Wiecki TV & Fonnesbeck C. (2016), ‘Probabilistic programming in Python using PyMC3’, PeerJ Computer Science 2, e55. [DOI] [PMC free article] [PubMed] [Google Scholar]
Shahbaba B, Lan S, Johnson WO & Neal RM (2014), ‘Split Hamiltonian Monte Carlo’, Statistics and Computing 24(3), 339–349. [Google Scholar]
Sherlock C. & Thiery A. (2021), ‘A discrete bouncy particle sampler’, Biometrika. [Google Scholar]
Spivak M. (1965), Calculus on manifolds: a modern approach to classical theorems of advanced calculus, Addison-Wesley. [Google Scholar]
Stan Development Team (2018), Stan Modeling Language Users Guide and Reference Manual, Version 2.18.0. URL: http://mc-stan.org/ [Google Scholar]
Suchard MA, Lemey P, Baele G, Ayres DL, Drummond AJ & Rambaut A. (2018), ‘Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10’, Virus Evolution 4(1), vey016. [DOI] [PMC free article] [PubMed] [Google Scholar]
Turitsyn KS, Chertkov M. & Vucelja M. (2011), ‘Irreversible Monte Carlo algorithms for efficient sampling’, Physica D: Nonlinear Phenomena 240(4–5), 410–414. [Google Scholar]
Vanetti P, Bouchard-Côté A, Deligiannidis G. & Doucet A. (2017), ‘Piecewise-deterministic Markov chain Monte Carlo’, arXiv:1707.05296. [Google Scholar]
Wang Z, Mohamed S. & Freitas N. (2013), Adaptive Hamiltonian and Riemann manifold Monte Carlo, in ‘Proceedings of the 30th International Conference on Machine Learning’, pp. 1462–1470. [Google Scholar]
Wu C. & Robert CP (2020), ‘Coordinate sampler: a non-reversible gibbs-like MCMC sampler’, Statistics and Computing 30(3), 721–730. [Google Scholar]
Wu C, Stoehr J. & Robert CP (2018), ‘Faster Hamiltonian Monte Carlo by learning leapfrog scale’, arXiv:1810.04449. [Google Scholar]
Zhang Z, Chin A, Nishimura A. & Suchard MA (2022), ‘hdtg: an R package for high-dimensional truncated normal simulation’, arXiv:2210.01097. [Google Scholar]
Zhang Z, Nishimura A, Bastide P, Ji X, Payne RP, Goulder P, Lemey P. & Suchard MA (2021), ‘Large-scale inference of correlation among mixed-type biological traits with Phylogenetic multivariate probit models’, The Annals of Applied Statistics 15(1), 230–251. [Google Scholar]
Zhang Z, Nishimura A, Trovão NS, Cherry JL, Holbrook AJ, Ji X, Lemey P. & Suchard MA (2023), ‘Accelerating Bayesian inference of dependency between mixed-type biological traits’, PLOS Computational Biology 19(8), e1011419. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supp 1

NIHMS2023543-supplement-Supp_1.pdf^{(1.6MB, pdf)}

[R1] Andrieu C. & Livingstone S. (2021), ‘Peskun–tierney ordering for Markovian Monte Carlo: Beyond the reversible scenario’, The Annals of Statistics 49(4), 1958–1981. [Google Scholar]

[R2] Bardenet R, Doucet A. & Holmes C. (2017), ‘On Markov chain Monte Carlo methods for tall data’, Journal of Machine Learning Research 18(47). [Google Scholar]

[R3] Bierkens J. & Duncan A. (2017), ‘Limit theorems for the zig-zag process’, Advances in Applied Probability 49(3), 791–825. [Google Scholar]

[R4] Bierkens J, Fearnhead P. & Roberts G. (2019a), ‘The zig-zag process and super-efficient sampling for Bayesian analysis of big data’, The Annals of Statistics 47(3), 1288–1320. [Google Scholar]

[R5] Bierkens J, Kamatani K. & Roberts G. (2022), ‘High-dimensional scaling limits of piecewise deterministic sampling algorithms’, The Annals of Applied Probability 32(5), 3361–3407. [Google Scholar]

[R6] Bierkens J, Kamatani K. & Roberts GO (2023), ‘Scaling of piecewise deterministic Monte Carlo for anisotropic targets’, arXiv:2305.00694. [Google Scholar]

[R7] Bierkens J, Roberts G. & Zitt P-A (2019b), ‘Ergodicity of the zigzag process’, The Annals of Applied Probability 29(4), 2266–2301. [Google Scholar]

[R8] Billingsley P. (1999), Convergence of Probability Measures, Wiley Series in Probability and Statistics, John Wiley & Sons Inc. [Google Scholar]

[R9] Botev ZI (2017), ‘The normal law under linear restrictions: simulation and estimation via minimax tilting’, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 79(1), 125–148. [Google Scholar]

[R10] Bou-Rabee N. & Sanz-Serna J. (2018), ‘Geometric integrators and the Hamiltonian Monte Carlo method’, Acta Numerica 27, 113–206. [Google Scholar]

[R11] Bouchard-Côté A, Vollmer SJ & Doucet A. (2018), ‘The bouncy particle sampler: a non-reversible rejection-free Markov chain Monte Carlo method’, Journal of the American Statistical Association pp. 1–13.30034060 [Google Scholar]

[R12] Carpenter B, Gelman A, Hoffman MD, Lee D, Goodrich B, Betancourt M, Brubaker M, Guo J, Li P. & Riddell A. (2017), ‘Stan: A probabilistic programming language’, Journal of Statistical Software 76(1). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R13] Chin A. & Nishimura A. (2024), ‘MCMC using bouncy Hamiltonian dynamics: A unifying framework for Hamiltonian Monte Carlo and piecewise deterministic processes’, arXiv:2405.08290. [Google Scholar]

[R14] Deligiannidis G, Paulin D, Bouchard-Côté A. & Doucet A. (n.d.), ‘Randomized Hamiltonian Monte Carlo as scaling limit of the bouncy particle sampler and dimension-free convergence rates’, The Annals of Applied Probability 31(6), 2612–2662. [Google Scholar]

[R15] Diaconis P, Holmes S. & Neal RM (2000), ‘Analysis of a nonreversible Markov chain sampler’, The Annals of Applied Probability pp. 726–752. [Google Scholar]

[R16] Duane S, Kennedy AD, Pendleton BJ & Roweth D. (1987), ‘Hybrid Monte Carlo’, Physics Letters B 195(2), 216–222. [Google Scholar]

[R17] Dunson DB & Johndrow J. (2020), ‘The Hastings algorithm at fifty’, Biometrika 107(1), 1–23. [Google Scholar]

[R18] Ethier SN & Kurtz TG (2005), Markov processes: characterization and convergence, John Wiley & Sons. [Google Scholar]

[R19] Fang Y, Sanz-Serna J-M & Skeel RD (2014), ‘Compressible generalized hybrid Monte Carlo’, The Journal of Chemical Physics 140(17), 174108. [DOI] [PubMed] [Google Scholar]

[R20] Fearnhead P, Bierkens J, Pollock M. & Roberts G. (2018), ‘Piecewise deterministic Markov processes for continuous-time Monte Carlo’, Statistical Science 33(3), 386–412. [Google Scholar]

[R21] Fetecau RC, Marsden JE, Ortiz M. & West M. (2003), ‘Nonsmooth Lagrangian mechanics and variational collision integrators’, SIAM Journal on Applied Dynamical Systems 2(3), 381–416. [Google Scholar]

[R22] Filippov AF (1988), Differential equations with discontinuous righthand sides, Kluwer Academic Publishers. [Google Scholar]

[R23] Geyer C. (2011), Introduction to Markov chain Monte Carlo, in ‘Handbook of Markov Chain Monte Carlo’, CRC Press, pp. 3–48. [Google Scholar]

[R24] Guntheroth K. (2016), Optimized C++: Proven Techniques for Heightened Performance, O’Reilly Media, Inc. [Google Scholar]

[R25] Herschlag G, Mattingly JC, Sachs M. & Wyse E. (2020), ‘Non-reversible Markov chain Monte Carlo for sampling of districting maps’, arXiv:2008.07843. [Google Scholar]

[R26] Hoffman MD & Gelman A. (2014), ‘The no-U-turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo’, Journal of Machine Learning Research 15(1), 1593–1623. [Google Scholar]

[R27] Holbrook AJ, Lemey P, Baele G, Dellicour S, Brockmann D, Rambaut A. & Suchard MA (2020), ‘Massive parallelization boosts big bayesian multidimensional scaling’, Journal of Computational and Graphical Statistics pp. 1–34. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R28] Khulief Y. (2013), ‘Modeling of impact in multibody systems: an overview’, Journal of Computational and Nonlinear Dynamics 8(2). [Google Scholar]

[R29] Meurant GA (2006), The Lanczos and Conjugate Gradient Algorithms: from Theory to Finite Precision Computations, Society for Industrial and Applied Mathematics. [Google Scholar]

[R30] Neal RM (2010), MCMC using Hamiltonian Dynamics, in ‘Handbook of Markov chain Monte Carlo’, CRC Press. [Google Scholar]

[R31] Nishimura A. & Dunson D. (2020), ‘Recycling intermediate steps to improve Hamiltonian Monte Carlo’, Bayesian Analysis 15(4), 1087–1108. [Google Scholar]

[R32] Nishimura A, Dunson D. & Lu J. (2020), ‘Discontinuous Hamiltonian Monte Carlo for discrete parameters and discontinuous likelihoods’, Biometrika 107(2), 365–380. [Google Scholar]

[R33] Nishimura A. & Suchard MA (2023), ‘Supplement to “Prior-preconditioned conjugate gradient for accelerated Gibbs sampling in ‘large n, large p’ sparse Bayesian regression”’, Journal of the American Statistical Association 118(544), 2468–2481. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R34] Pakman A. & Paninski L. (2014), ‘Exact Hamiltonian Monte Carlo for truncated multivariate gaussians’, Journal of Computational and Graphical Statistics 23(2), 518–542. [Google Scholar]

[R35] Peters EAJF & de With G. (2012), ‘Rejection-free Monte Carlo sampling for general potentials’, Physical Review E 85, 026703. [DOI] [PubMed] [Google Scholar]

[R36] Plummer M, Best N, Cowles K. & Vines K. (2006), ‘CODA: Convergence diagnosis and output analysis for MCMC’, R News 6(1), 7–11. [Google Scholar]

[R37] Salvatier J, Wiecki TV & Fonnesbeck C. (2016), ‘Probabilistic programming in Python using PyMC3’, PeerJ Computer Science 2, e55. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R38] Shahbaba B, Lan S, Johnson WO & Neal RM (2014), ‘Split Hamiltonian Monte Carlo’, Statistics and Computing 24(3), 339–349. [Google Scholar]

[R39] Sherlock C. & Thiery A. (2021), ‘A discrete bouncy particle sampler’, Biometrika. [Google Scholar]

[R40] Spivak M. (1965), Calculus on manifolds: a modern approach to classical theorems of advanced calculus, Addison-Wesley. [Google Scholar]

[R41] Stan Development Team (2018), Stan Modeling Language Users Guide and Reference Manual, Version 2.18.0. URL: http://mc-stan.org/ [Google Scholar]

[R42] Suchard MA, Lemey P, Baele G, Ayres DL, Drummond AJ & Rambaut A. (2018), ‘Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10’, Virus Evolution 4(1), vey016. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R43] Turitsyn KS, Chertkov M. & Vucelja M. (2011), ‘Irreversible Monte Carlo algorithms for efficient sampling’, Physica D: Nonlinear Phenomena 240(4–5), 410–414. [Google Scholar]

[R44] Vanetti P, Bouchard-Côté A, Deligiannidis G. & Doucet A. (2017), ‘Piecewise-deterministic Markov chain Monte Carlo’, arXiv:1707.05296. [Google Scholar]

[R45] Wang Z, Mohamed S. & Freitas N. (2013), Adaptive Hamiltonian and Riemann manifold Monte Carlo, in ‘Proceedings of the 30th International Conference on Machine Learning’, pp. 1462–1470. [Google Scholar]

[R46] Wu C. & Robert CP (2020), ‘Coordinate sampler: a non-reversible gibbs-like MCMC sampler’, Statistics and Computing 30(3), 721–730. [Google Scholar]

[R47] Wu C, Stoehr J. & Robert CP (2018), ‘Faster Hamiltonian Monte Carlo by learning leapfrog scale’, arXiv:1810.04449. [Google Scholar]

[R48] Zhang Z, Chin A, Nishimura A. & Suchard MA (2022), ‘hdtg: an R package for high-dimensional truncated normal simulation’, arXiv:2210.01097. [Google Scholar]

[R49] Zhang Z, Nishimura A, Bastide P, Ji X, Payne RP, Goulder P, Lemey P. & Suchard MA (2021), ‘Large-scale inference of correlation among mixed-type biological traits with Phylogenetic multivariate probit models’, The Annals of Applied Statistics 15(1), 230–251. [Google Scholar]

[R50] Zhang Z, Nishimura A, Trovão NS, Cherry JL, Holbrook AJ, Ji X, Lemey P. & Suchard MA (2023), ‘Accelerating Bayesian inference of dependency between mixed-type biological traits’, PLOS Computational Biology 19(8), e1011419. [DOI] [PMC free article] [PubMed] [Google Scholar]

PERMALINK

Zigzag path connects two Monte Carlo samplers: Hamiltonian counterpart to a piecewise deterministic Markov process

Akihiko Nishimura

Zhenyu Zhang

Marc A Suchard

Abstract

1. Introduction

Figure 1:

Figure 2:

2. Zigzag Hamiltonian Monte Carlo

2.1. Hamiltonian dynamics based on Laplace momentum

Theorem 2.1.

Corollary 2.2.

2.2. Simulation of Hamiltonian zigzag dynamics

Algorithm 1.

Algorithm 2.

3. Link between Hamiltonian and Markovian zigzags

3.1. Hamiltonian zigzag’s apparent similarity to Markovian zigzag

Figure 3:

3.2. Markovian zigzag as an infinite momentum refreshment limit

Theorem 3.1 (Weak convergence).

Theorem 3.2 (Strong convergence).

4. Numerical study: two zigzags duel over truncated multivariate Gaussians

4.1. Zigzag-Nuts: Hamiltonian zigzag with no-U-turn algorithm

4.2. Study set-up and efficiency metrics

4.3. Threshold model posteriors under correlated Gaussian priors

Table 1:

Figure 4:

4.4. Posterior from phylogenetic multivariate probit model

Figure 5:

Table 2:

Figure 6:

5. Discussion

Figure 7:

Supplementary Material

Acknowledgment

Footnotes

Contributor Information

References

Associated Data

Supplementary Materials

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases