Compressed computations using wavelets for hidden Markov models with continuous observations

Luca Bello; John Wiedenhöft; Alexander Schliep

doi:10.1371/journal.pone.0286074

. 2023 Jun 6;18(6):e0286074. doi: 10.1371/journal.pone.0286074

Compressed computations using wavelets for hidden Markov models with continuous observations

Luca Bello ¹, John Wiedenhöft ^2,^#, Alexander Schliep ^1,^3,^*,^#

Editor: Amita Nandal⁴

PMCID: PMC10243634 PMID: 37279196

Abstract

Compression as an accelerant of computation is increasingly recognized as an important component in engineering fast real-world machine learning methods for big data; c.f., its impact on genome-scale approximate string matching. Previous work showed that compression can accelerate algorithms for Hidden Markov Models (HMM) with discrete observations, both for the classical frequentist HMM algorithms—Forward Filtering, Backward Smoothing and Viterbi—and Gibbs sampling for Bayesian HMM. For Bayesian HMM with continuous-valued observations, compression was shown to greatly accelerate computations for specific types of data. For instance, data from large-scale experiments interrogating structural genetic variation can be assumed to be piece-wise constant with noise, or, equivalently, data generated by HMM with dominant self-transition probabilities. Here we extend the compressive computation approach to the classical frequentist HMM algorithms on continuous-valued observations, providing the first compressive approach for this problem. In a large-scale simulation study, we demonstrate empirically that in many settings compressed HMM algorithms very clearly outperform the classical algorithms with no, or only an insignificant effect, on the computed probabilities and infered state paths of maximal likelihood. This provides an efficient approach to big data computations with HMM. An open-source implementation of the method is available from https://github.com/lucabello/wavelet-hmms.

Introduction

Hidden Markov Models (HMM) [1] are still one of the most widely used machine learning methods across an extensive range of data modalities—texts to images, time-series of hand movements to weather events—and virtually all disciplines analyzing data, from political science [2] to education [3] and computer security [4] to autonomous vehicles [5].

Their simple structure, i.i.d. observations generated in each state of the underlying Markov chain and Markov dynamics between states, is on the one hand powerful enough for black-box modeling and on the other hand amenable to handcrafting models [6] reflecting expert knowledge from the application domain. Their immense success in bioinformatics is largely owed to the latter.

Another aspect driving their popularity is computational efficiency. The standard algorithms—Rabiner’s three problems [1]—for computing likelihood under a model, finding the optimal state sequence given an observation with the Viterbi algorithm, and each iteration of the maximum likelihood (ML) estimation of parameters, i.e., the iterations of the Baum-Welch training, are linear in the length of the observation sequence T, more exactly O(N² ⋅ T) where N is the number of states. A tighter bound can be found if the model is sparse, or more specifically, when all out-degrees of the states are bounded by a small constant as for example in profile HMMs [7].

Still, with sufficient amounts of data even a linear algorithms might not deliver running times sufficiently fast in practice. This is certainly the case when analyzing biological sequences, where for example popular gene finding models have hundreds of states. The insight that the combination of a small alphabet—the four characters A, C, G, and T representing the four constituent nucleotides of DNA molecules—and large sequence length T, up to 248 million for the longest Human chromosome, forces a high degree of repetitiveness in the sequences [8] suggested that exploiting this repetitiveness in computations may achieve a large impact.

Mozes et al. [9] recognized that for discrete observations rewriting HMM-computations as matrix multiplications of transition-emission operators M^σ, similar to [10] and observable operator models [11], allowed them to pre-compute partial products representing a frequently occurring sequence pattern and thus, with a sufficiently compressible sequence, achieve considerable speed-ups for likelihood, Viterbi and Baum-Welch training.

Also for discrete observations, Mahmud et al. [12] used transition-emission operators to substantially accelerate Forward-Backward Gibbs (FBG) sampling [13], a rapidly converging sampler for computations with fully Bayesian HMM, the fourth problem for HMM. There, marginal state probabilities conditioned on the data provides a robust alternative to locally optimal ML estimation followed by Viterbi-computation.

Clearly, there is no trivial path to extend the work from discrete to continuous-valued observations, but a specific type of important data naturally suggested a workable approach. A popular type of experiment in genetics identifies chromosomal duplications and deletions, changes where segments from several hundreds to many thousands of nucleotides of length are lost or added. The data coming from array CGH [14], SNP array platforms [15] or high-throughput sequencing [16] indicates the abundance of DNA vs. genomic location. In absence of any errors the data would be integer-valued and piecewise constant; the observed data is a noisy version of that ideal. The discrete underlying states and the large number of observations between changes explains why HMMs are a very suitable model for this data [17] and suggests a compressive computation approach similar in spirit to run-length encoding in the discrete case. The fundamental assumption is that a state change from t to t + 1 is prohibited when consecutive observations x_t and x_t+1 are close. This assumption was the basis for a greedy clustering algorithm and adapted sampling procedures leading to a substantial improvement [18] in the running times of the FBG sampler for continuous-valued observations with negligble errors in the estimation. Later, Wiedenhoeft et al. [19] proposed a principled approach based on wavelets, in which the breakpoints, i.e. the position of discontinuities in the wavelet reconstruction, defined the boundaries of segments or blocks; see section on wavelets for details. Highly optimized algorithms and data structures for computing such breakpoints [20] helped to achieve several-thousand fold acceleration and greatly improved convergence of the sampler allowing Bayesian computation even for genome-scale data at single-nucleotide resolution [21], thereby settling compressive computation for the fourth problem.

Novelty and outline

In the following, we introduce the first compressive approach for Rabiner’s three original problems [1]—likelihood computation, computing Viterbi paths, and Maximum-likelihood estimation with the Baum-Welch algorithm—for Hidden-Markov Models with continous emissions based on compression obtained with wavelets. We review the basics of the wavelet compression and present adapted equations for HMM computations in this compressive setting. With a large-scale evaluation study, we demonstrate the large improvements in running times for data which is amenable to this compressive approach, namely piece-wise constant with noise, or, equivalently, data generated by HMM with separable means and high self-transition probabilities. An open-source implementation of the method is available from https://github.com/lucabello/wavelet-hmms/tree/v1.0-thesis.

Materials and methods

First, let us fix some notation for introducing the models and (un-)compressed computations. Random variables will be denoted as upright letters (x or X), their instantiated values in italics (x or X), and their domains in script ( $X$ ). To avoid notational clutter, we denote density and mass functions as

p (x ∣ y) ≔ f_{X ∣ Y = y} (x),

where the names of the random variables involved can be inferred from their instantiations. Sometimes, instead of evaluating this function at x, we have a specific value j from context, which would make inferring the associated random variable impossible. We therefore use the notation

p (x = j ∣ y) ≔ f_{X ∣ Y = y} (j),

for disambiguation, where $j \in X$ . Accordingly, fixed values are denoted like p(x = 1∣y) and p(x∣y = 5); note that the first expression explicitly denotes a density and the second a likelihood function. Indeed, being algebraically equivalent, we also have

p (x ∣ y) ≔ L (y ∣ X = x) .

Importantly, note that Bayes’ theorem holds not only for probability measures, but for densities and mass functions as well:

f_{X ∣ Y = y} (x) = \frac{f_{Y ∣ X = x} (y)}{f_{Y} (y)} f_{X} (x) \Rightarrow p (x ∣ y) = \frac{p (y ∣ x)}{p (y)} p (x) .

In the following we introduce the three fundamental algorithms we will later accelerate with the wavelet-based compression.

The three classical HMM problems

It’s important to define some notation to properly describe the problems ahead:

$q_{t} \in Q$ denotes the hidden state at position t, and $q_{s}^{t} = (q_{s}, \dots, q_{t}) \in Q^{t - s + 1}$ denotes a state sequence, also called (or state path, generating path). For the length T of the data, we define $q = q_{1}^{T}$ for convenience.
$N = | Q |$ , the number of states of the HMM;
$A = {A_{i j}} \in A$ , 1 ≤ i, j ≤ N, the transition matrix of the Markov process on $Q$ .
$y_{t} \in Y$ denotes the observed value at position t. Analog to the definition of $q_{s}^{t}$ , $y_{s}^{t}$ is a partial observation sequence $y_{s}^{t} = (y_{s}, \dots, y_{t}) \in Y^{t - s + 1}$ and $y = y_{1}^{T}$ .
θ_j denotes the parametrization of the emission distribution of y_t in state j. Due to their importance, we denote the likelihood functions of emissions as L_j(t) ≔ p(y_t∣q_t = j, θ_j).
π = {π_i}, the initial state distribution.

The model will be often indicated through the compact notation

\begin{matrix} λ = (N, A, θ, π) . \end{matrix}

(1)

Evaluation problem

The evaluation problem concerns measuring how well a specific sequence of observations is represented by a given model, through the computation of the probability that the observed sequence was produced by the model. Given an observation sequence y and a model λ, the goal is to compute its likelihood p(y∣q, λ).

The standard algorithm used to solve the evaluation problem is the forward algorithm. The key element is the forward variable α_t(i), defined as the joint probability of observing the sequence up to time t and being in state i at time t

\begin{matrix} α_{t} (i) = p (y_{1}^{t}, q_{t} = i ∣ λ) . \end{matrix}

(2)

Through induction, the following procedure can be defined:

\begin{matrix} α_{1} (i) = π_{i} L_{i} (1), 1 \leq i \leq N \end{matrix}

(3a)

\begin{matrix} α_{t + 1} (j) = [\sum_{i = 1}^{N} α_{t} (i) A_{i j}] L_{j} (t + 1), 1 \leq t \leq T - 1, 1 \leq j \leq N \end{matrix}

(3b)

\begin{matrix} p (y ∣ λ) = \sum_{i = 1}^{N} α_{T} (i) . \end{matrix}

(3c)

Looking at the computational complexity, the number of calculations for each observation is N² (N per each state); repeating this for the whole sequence length gives a complexity that is $O (T \cdot N^{2})$ .

The backward-variables $β_{t} (j) = ≔ p (y_{t + 1}^{T} ∣ q_{t} = j)$ give the probability of observing the remainder of the observation sequence starting from state j at time t and lend themselves to computation with a similar dynamic programming scheme as for the α_t(i).

Decoding problem

The decoding problem deals with the computation of the most likely generating path of an observation sequence for a given model; formally, this means finding the maximization of p(q∣y, λ). The standard approach is to use the Viterbi algorithm: it follows a similar strategy to the forward algorithm, but applying a maximization in place of the summation,

\begin{matrix} δ_{t} (i) = max_{q_{1}^{t - 1}} p (q_{1}^{t - 1}, q_{t} = i, y_{1}^{t} ∣ λ) . \end{matrix}

(4)

The most likely path will be the argument of this maximization over all the states considering the whole observations sequence; it can be defined in the notation as ψ. Through induction it is possible to write the following equations:

\begin{matrix} δ_{1} (i) = π_{i} L_{i} (1), 1 \leq i \leq N \\ ψ_{1} = 0 \end{matrix}

(5a)

\begin{matrix} δ_{t} (j) = max_{1 \leq i \leq N} [δ_{t - 1} (i) A_{i j}] L_{j} (t), 2 \leq t \leq T, 1 \leq j \leq N \\ ψ_{t} (j) = \underset{1 \leq i \leq N}{arg max} [δ_{t - 1} (i) A_{i j}], 2 \leq t \leq T, 1 \leq j \leq N \end{matrix}

(5b)

\begin{matrix} P^{*} = max_{1 \leq i \leq N} [δ_{T} (i)] \\ q_{T}^{*} = \underset{1 \leq i \leq N}{arg max} [δ_{T} (i)] \end{matrix}

(5c)

\begin{matrix} q_{t}^{*} = ψ_{t + 1} (q_{t + 1}^{*}), t = T - 1, T - 2, \dots, 1 . \end{matrix}

(5d)

Training problem

The goal is to find the model λ that maximizes p(y∣λ); a popular technique is the Baum-Welch method, which starts from initial parameter estimates and iteratively performs reestimations of the parameters to improve the likelihood.

This algorithm introduces a new key variable: ξ(i, j), the probability of being in state i at time t and in state j at time t + 1

\begin{matrix} ξ_{t} (i, j) = p (q_{t} = i, q_{t + 1} = j ∣ y, λ) . \end{matrix}

(6)

It can be useful to express this equation using the forward and backward variables. In fact, the forward variable α_t(i) accounts for the observations from the first one up to y_t in state i; the backward variable $β_{t + 1} (j) ≔ p (y_{t + 1}^{T} ∣ q_{t} = j)$ does the complementary job, considering the observation sequence starting in state j and from observation y_t+1 up to the last one. The step between t and t + 1 has been left out: to tie the two variables, it is necessary to include the probability of transitioning from state i to j and observing y_t+1, which is A_ijL_j(t + 1). The new formulation of ξ_t can be written as

\begin{matrix} ξ_{t} (i, j) = \frac{α_{t} (i) A_{i j} L_{j} (t + 1) β_{t + 1} (j)}{p (y ∣ λ)} = \frac{α_{t} (i) A_{i j} L_{j} (t + 1) β_{t + 1} (j)}{\sum_{p = 1}^{N} \sum_{q = 1}^{N} α_{t} (p) A_{p q} L_{q} (t + 1) β_{t + 1} (q)} . \end{matrix}

(7)

We define the γ_t(i) variables as the probability of being in state i at time t while emitting the observation sequence. This can be expressed in terms of the ξ_t as

\begin{matrix} γ_{t} (i) = \sum_{j = 1}^{N} ξ_{t} (i, j) . \end{matrix}

(8)

Note that the sum γ_t(i) over t can be interpreted as the expected number of times that the state i is visited, or equivalently as the expected number of transitions out of state i (if we exclude the last observation at time T),

\begin{matrix} \sum_{t = 1}^{T - 1} γ_{t} (i) = expected number of transitions out of state i . \end{matrix}

(9)

Similarly, the sum of ξ_t(i, j) over t can be interpreted as the expected number of transitions from i to j

\begin{matrix} \sum_{t = 1}^{T - 1} ξ_{t} (i, j) = expected number of transitions from state i to j . \end{matrix}

(10)

These interpretations lead to the definition of two reestimation formulas for the initial distribution and the transition probabilities

\begin{matrix} {\bar{π}}_{i} = γ_{1} (i), and \end{matrix}

(11a)

\begin{matrix} {\bar{a}}_{i j} = \frac{\sum_{t = 1}^{T - 1} ξ_{t} (i, j)}{\sum_{t = 1}^{T - 1} γ_{t} (i)} . \end{matrix}

(11b)

The HMMs that have been considered throughout this thesis work have continuous emission densities. While the model formalism allows finite mixtures of log-concave or elliptically symmetric multi-variate densities, we restricted ourselves to a single univariate Gaussian as emission densities, i.e., in state j we have

\begin{matrix} L_{j} (y) = N (y | μ_{j}, σ^{2}), \end{matrix}

(12)

where μ_j is the mean and σ² is the variance of the Gaussian distribution associated with the state j. Thus, the reestimation formulas are given as:

\begin{matrix} {\bar{μ}}_{j} = \frac{\sum_{t = 1}^{T} γ_{t} (j) \cdot y_{t}}{\sum_{t = 1}^{T} γ_{t} (j)} \end{matrix}

(13a)

\begin{matrix} {\bar{σ^{2}}}_{j} = \frac{\sum_{t = 1}^{T} γ_{t} (j) \cdot {(y_{t} - μ_{j})}^{2}}{\sum_{t = 1}^{T} γ_{t} (j)} \end{matrix}

(13b)

Applying the reestimation formulas (11a), (11b), (13a) and (13b) produces a reestimated model $\bar{λ}$ ; the Baum-Welch algorithm guarantees that either the original model λ is a critical point of the likelihood function (the result would be $λ = \bar{λ}$ ) or the model $\bar{λ}$ is more likely than the previous one, meaning that $p (y ∣ \bar{λ}) > p (y ∣ λ)$ . The iteration of this procedure converges to a local maximum and produces a maximum likelihood estimate of the model, providing a solution to the training problem.

Wavelet compression

In the following we briefly introduce wavelet shrinkage and transform and the resulting block structure enabling compressed HMM computations. Haar wavelet shrinkage is a powerful regression method for univariate time series y = f + ϵ, where f is a piecewise constant function and ϵ is a vector of centered homoscedastic Gaussian noise of variance σ². If the latter is known, wavelet coefficients whose absolute value is below the universal threshold $λ_{u} = \sqrt{2 ln T} σ$ can be attributed to noise and removed, yielding a minimax estimator $y = \hat{f} + \hat{ϵ}$ [22]. The coefficients of the Haar wavelet transform can be computed in-place and in linear time [23, 24], and the variance can conveniently be estimated from the finest detail wavelet coefficients themselves.

Data generated by a homoscedastic Gaussian HMM can be treated in a similar fashion, with discontinuities in $\hat{f}$ marking the approximate location of state transitions; for details, see [25]. Note that despite the fact that we can expect $\hat{f} [t] \approx μ_{q [t]}$ , $\hat{f}$ cannot reliably be used to identify q directly. However, since the normal distribution is in the exponential family, sufficient statistics between the discontinuities in $\hat{f}$ can be used in likelihood computations required for inference of HMM parameters, assuming the generating Markov process does not switch states in between these positions, or that at least the contribution of such transitions is negligible (weak path assumption [18]). Blocks are defined as contiguous subsequences of observations respectively hidden states where no state switches are allowed; cf. Fig 1.

Fig 1 — Within each block, only self-transitions are allowed. Relevant variables such as the forward or backward variables are computed for each block instead of for each individual observation. Our evaluation demonstrates that for compressible input, i.e., input with large average block sizes, the compressed computation has a neglible impact on accuracy while accelerating computations greatly, in particular for models with many states.

While such blocks of sufficient statistics can be precomputed for homoscedastic HMM, in heteroscedastic settings wavelet shrinkage does not yield a minimax estimator, but can still be used for compression, in that the smallest emission variance can be used for thresholding, at the cost of undercompression in high-variance regions. Unfortunately, estimating this in a manner similar to wavelet regression is challenging, since it is unknown which finest detail coefficients come from low-variance segments. In order to estimate HMM parameters, a data structure is required to quickly yield blocks of sufficient statistics at arbitrary variance thresholds, i.e. different compression levels, without re-estimating $\hat{f}$ for every given σ². For this purpose, we have previously developed a highly efficient data structure called a breakpoint array with a linear time constructor [20, 25] based on the lifting scheme [23, 24].

Compressed computation for HMM

To perform the compressed computations, the classic equations have to be adapted to the compression scheme introduced in the previous section; see Fig 1 for a schematic view. Restructuring the forward algorithm does not alter the interpretation of the forward variables. For a block of observations, α_w(i) is the approximation of the uncompressed forward variable at the end of the w-th block, and E_w(j) the exponent resulting from considering the length of the block and summary statistics, yielding

\begin{matrix} α_{1} (i) = π_{i} e^{E_{1} (i)} for 1 \leq i \leq N, \end{matrix}

(14a)

\begin{matrix} α_{w} (j) = [\sum_{i = 1}^{N} α_{w - 1} (i) A_{i j}] e^{E_{w} (j)} for 1 \leq w \leq W, 1 \leq j \leq N, and \end{matrix}

(14b)

\begin{matrix} p (y ∣ λ) \approx \sum_{i = 1}^{N} α_{W} (i) . \end{matrix}

(14c)

Here, the term $e^{E_{w} (j)}$ above results from making the Gaussian assumption explicit and rewriting the regular induction for forward variables adapted to (n − 1) self-transitions and one non-self transitions in a block,

\begin{matrix} α_{w} (j) = \sum_{i = 1}^{N} [α_{w - 1} (i) a_{i j}] a_{j j}^{n_{w} - 1} \prod_{k = 1}^{n_{w}} b_{j} (y_{w, k}) . \end{matrix}

(15)

and collecting as many terms as possible in the exponential, which additionally improves numerical stability, leading to

\begin{matrix} E_{w} (j) ≔ \frac{2 μ_{j} Σ_{1, w} - Σ_{2, w}}{2 σ_{j}^{2}} + K (n_{w}, j), and \end{matrix}

(16a)

\begin{matrix} K (n_{w}, j) ≔ (n_{w} - 1) log (a_{j j}) - n_{w} (log (σ_{j}) + \frac{μ_{j}^{2}}{2 σ_{j}^{2}} + \frac{1}{2} log (2 π)) . \end{matrix}

(16b)

Both the backward and Viterbi algorithms require similar transformations, using block summary information in place of the individual observations.

The Baum-Welch algorithm is more complex than the others, requiring more variables in the parameter reestimation. It is important to remember that both the compressed forward and backward variables refer to the end of a block. Consequently, different situations result based on computing ξ inside or outside a block. For the purpose of rewriting the reestimation equations, it is useful to define the ξ variable for a block in the following way

\begin{matrix} ξ_{w} (i, j) = \sum_{t \in Y_{w}} ξ_{t} (i, j) = \end{matrix}

(17)

\begin{matrix} = \frac{1}{p (y ∣ λ)} \cdot {\begin{matrix} (n_{w} - 1) α_{w} (i) β_{w} (i) + α_{w} (i) A_{i j} e^{E_{w + 1} (j)} β_{w + 1} (j), & for i = j \land w \neq W \\ (n_{W} - 1) α_{W} (i) β_{W} (i), & for i = j \land w = W \\ α_{w} (i) A_{i j} e^{E_{w + 1} (j)} β_{w + 1} (j), & for i \neq j \land w \neq W \\ 0, & for i \neq j \land w = W . \end{matrix} \end{matrix}

(18)

Moving forward, it is interesting to note that by interpreting γ_t(i) as the probability of visiting the state i at time t, the variable is also constant over t inside a block (also implied from the result above); this means that for any t inside a block, any γ_t(i) can be representative for the whole block, say

\begin{matrix} γ_{t} (i) = \frac{α_{t} (i) β_{t} (i)}{p (y ∣ λ)} = \frac{α_{w} (i) β_{w} (i)}{p (y ∣ λ)} . \end{matrix}

(19)

It is worth noting that this reformulation correctly maintains the definition given in (8). For convenience it is useful to define γ_w(i) as the representative value for a block, which means that for all the t associated with a block, γ_w(i) = γ_t(i).

The reestimation formulas Eqs (11a), (11b), (13a) and (13b) can be updated with the new definitions of ξ_t(i, j) and γ_t(i), yielding

\begin{matrix} {\bar{π}}_{i} = γ_{1} (i), and \end{matrix}

(20a)

\begin{matrix} {\bar{a}}_{i j} = \frac{\sum_{w = 1}^{W} ξ_{w} (i, j)}{\sum_{w = 1}^{W} [n_{w} γ_{w} (i)] - γ_{W} (i)} . \end{matrix}

(20b)

The mean and standard deviation reestimations follow a slightly more complex reformulation, both for the general mixture and the single Gaussian distribution. In particular, Eq (13a) multiplies the single observation value by the respective γ_t(i). Since γ_t(i) is constant inside a block, the equation can be rewritten as

\begin{matrix} {\bar{μ}}_{j} = \frac{\sum_{t = 1}^{T} γ_{t} (j) y_{t}}{\sum_{t = 1}^{T} γ_{t} (j)} = \frac{\sum_{w = 1}^{W} γ_{w} (j) \sum_{k = 1}^{n_{w}} y_{w, k}}{\sum_{w = 1}^{W} γ_{w} (j) \cdot n_{w}} = \frac{\sum_{w = 1}^{W} γ_{w} (j) \cdot Σ_{1, w}}{\sum_{w = 1}^{W} γ_{w} (j) \cdot n_{w}} . \end{matrix}

(21)

The same reasoning applies to the variance reestimation, yielding

\begin{matrix} {\bar{σ}}_{j}^{2} = \frac{\sum_{t = 1}^{T} γ_{t} (j) {(y_{t} - μ_{j})}^{2}}{\sum_{t = 1}^{T} γ_{t} (j)} = \frac{\sum_{w = 1}^{W} γ_{w} (j) [Σ_{2, w} - 2 {\bar{μ}}_{j} Σ_{1, w} + n_{w} {\bar{μ}}_{j}^{2}]}{\sum_{w = 1}^{W} γ_{w} (j) \cdot n_{w}} . \end{matrix}

(22)

Implementation

Our implementation named WaHMM (Wavelets HMM) adapts the classic algorithms to the compressed structures: instead of operating on every individual observation in a sequence, it takes advantage of the blocks’ summary information. The wavelet compression is performed using parts of the software HaMMLET https://schlieplab.org/Software/HaMMLET/.

WaHMM is written in C++ for better efficiency and a smaller memory footprint. To offer a simpler interface to the user, some Python scripts are provided for model creations, usage of the actual tool, and for analysis and graphical display of results. The software implements HMM with univariate Gaussian state densities. The Python framework Pomegranate (https://github.com/jmschrei/pomegranate) is used to generate data from the model, through the script generate_data.py.

Note that in WaHMM probabilities are represented in a logarithmic space; this not only scales the [0, 1] interval to (−∞, 0], easing the burden of numerical precision, but also transforms all the products into summations, which are much easier and faster to perform.

Results

In the following we introduce the simulation study which varies compressibility and thus indicates speed-ups of our method for various scenarios. To achieve acceleration with the proposed method requires that the sequences are compressible. This is consistent with compressive computations in the discrete observation case, where for example computations for a non-repeating sequence of N observations using M = N different symbols cannot be accelerated. In either case, the highest possible acceleration is possible with constant sequences.

Simulation study

We designed a large-scale simulation study to evaluate accuracy and speedups of the compressed approach. In total we created 50 different HMMs and simulated a total of 500 millions observations. The HMM topologies, reflecting popular choices in applications, belonged to one of three classes: fully-connected with transitions from state i to every state j; left-to-right, where a partial ordering on states exists and the possibility of a transition i → j implies i ≤ j; circular, with transitions i → i + 1 mod M or i → i. See Fig 2 for details. For each topology, we choose a range of values for the number of states, N ∈ {2, 3, 5, 10, 20}. The transition probabilities were always chosen so that the average number of state transitions in a generated sequence would be 10N. Specifically, for fully-connected topologies the self-transition probabilties were defined to be 1 − (10N/T) and the non-self transition probabilties where chosen uniformly according to the topology, so that the total of outgoing probabilities sum up to 1. For left-to-right and circular topologies the self-transition and non-self-transitions were chosen similarly.

Fig 2 — The simulations used N ∈ {2, 3, 5, 10, 20} states. The presented conclusions will cover the fully-connected model due to other topologies yielding very similar results.

The identifyability of states depends on how different the univariate Gaussian state densities are. If the pair-wise differences of means are bounded from below by some ϵ > 0 and all variances are 0, then the model simplifies to a Markov chain. On the other hand, the absolute values of the pair-wise differences only matter with respect to the variance. Consequently, the states’ emission probabilities were taken as evenly distanced Gaussians starting from the standard Gaussian Z(0, 1), with state separation $η_{S_{1}, S_{2}} = \frac{| μ_{S_{1}} - μ_{S_{2}} |}{3 (σ_{S_{1}} + σ_{S_{2}})}$ varying from 0.1 to 1.0 in 0.1 increments, to obtain a range of emission densities corresponding to hard and easy scenarios for state identifyability.

For each HMM, a set of 100 tests was executed to average out the results: each test includes a data generation phase (with 100k observations), and a run of the evaluation, decoding and training algorithms. Evaluation and decoding was run once. As the training algorithm can yield quite different results based on the starting model estimation, in each test the training was repeated 10 times with different initial parameter estimates produced by the K-means algorithm. The Baum-Welch algorithm was set to terminate after 100 iterations or when the improvement on the evaluation probability (in the logarithmic space) was below 10⁻³.

Observations

Here we will present observations from our simulation study. As we observed highly concordant results for all tested topologies we only present the results for fully-connected HMMs. As we obtained very favorable and highly consistent results for all simulations, we also present figures with additional diagnostics in the supplement.

For all three algorithms—evaluation, decoding and training—, the error compared to the uncompressed version is negligible regardless of the model topology. Fig 3 shows the error for the evaluation algorithm. Despite the overall negligible magnitude, the largest relative error displayed in Fig 3 is approximately 7e–3, which corresponds to an absolute error on the scale of 10⁻⁵⁰⁰⁰. There are some differences depending on model size and state separation, with larger models and lower state seperation displaying higher errors.

Fig 3 — Since the errors are relative to a large magnitude negative log-probability, the actual error is extremely small. As an example, the relative error on the evaluation probability for a test involving a fully-connected model with 2 states is approximately 2e-3; the absolute error in the logarithmic space is around -6e2, which corresponds to a normal absolute error of approximately 1e-261.

Having ascertained that the results of the compressed algorithms are virtually indistinguishable from the result of the standard algorithms, we investigated the attainable speedups. A general trend through all the tests is that the compressed algorithms are much faster then the standard ones, and that it particularly shows when the state separation is low. Evaluation and decoding results, see Figs 4 and 5 respectively, are again quite similar, as for both of them the compressed algorithm is several times faster than the standard one. In the evaluation task a 10-state model saw a speedup of 6–10 times, and a 5 state-model of 2-3 times compared to the uncompressed case. We observed no speedup for the 2-state model. A likely explanation is that the small size of the dynamic programming matrices and the model will allow very rapid branch-free in-cache computations, whereas the compressed version had a higher algorithmic overhead. The trends are very clear though in that the compression empirically reduces the running time from O(N²T) for the uncompressed version to something linear in both number of states and sequence length.

The most impressive speedups are obtained for the training algorithm, see Fig 5. There are three contributing factors to the speedup. First, as Baum-Welch training performs one forward and backward computation at O(N²T) per each individual reestimation step and then some additional computation on the order of O(NT) we expect a speed-up as large as the one seen for the evaluation problem, i.e. of 2–10 depending on model size. Second, the benchmarking for the evaluation problem included the setup operations such as wavelet computation and population of the breakpoint array followed by one execution of the forward algorithm. In Baum-Welch training the setup costs are amortized over multiple forward and backward computations, yielding an even larger acceleration over the uncompressed algorithm in this case. Third, we observe that the compression also has a large impact on the convergence, in particular for hard instances with low state separation. There, the compression greatly reduces the amount of reestimation steps needed, up to 10-fold for 10 states and state seperation 0.1.

Discussion

In recent years, compression has become an effective computational building block to accelerate algorithms in response to ever-increasing data set sizes across a wide range of fields. The main idea is often to avoid re-computation for identical patterns in the data, as in the case for Hidden Markov Models (HMM) with discrete-valued observations whether for the frequentist classical three algorithms [9] or Gibbs sampling for Bayesian HMM [12].

For continous-valued observations, and for data resembling piece-wise constant functions plus Gaussian noise in the ideal case, our prior work [19, 20] on Gibbs sampling for Bayesian HMM has clearly shown how much can be gained from a compressive approach. The main insight there was that the compression—assuming that hidden states are not changed within contiguous blocks of the data with little variation—allows the Bayesian computation to focus on breakpoints, or state changes, between blocks instead of the variation within a block. This resulted not only in accelerated computations due to reducing the effort from something proportional to the number of blocks instead of the length of the observation sequence, but also a drastically accelerated convergence rate of the Gibbs sampling.

Here, we introduce the first compressive approach for Rabiner’s three original problems [1]—likelihood computation, computing Viterbi paths, and Maximum-likelihood estimation with the Baum-Welch algorithm—for Hidden-Markov Models with continous emissions based on compression obtained with wavelets. We utilized the approach to compression developed in [19, 20] and extended it to the classical three algorithms. We used a wavelet transform to rapidly detect a superset of possible block boundaries in the observation sequence where state changes might occur, and adapted the computations of forward, backward and Viterbi variables accordingly by summing over blocks instead of individual observations.

In a large-scale simulation study, assuming a similar type of data, piece-wise constant functions plus homoscedastic Gaussian noise, we demonstrate that again we reach acceleration across all the fundamental dynamic programming algorithms and achieve practically relevant speed-up for training, likelihood, and Viterbi-path computations for models with 5 states and larger. In particular for training, we observe similar behaviour as for the convergence rate of the Gibbs sampler as in the Bayesian case [19]. There, enforcing that the hidden state variable for all observations in a block must be equal, prevented the sampler from exploring states where one or several of those variables where set to discordant values. In the Baum-Welch algorithm, the blocks seem to similarly prevent exploration of intermediate, sub-optimal states and thus to accelerate convergence. This is effect is clearly visible in the cases of low separability between state density means in contrast with high separability with total speedups—combining the contribution of accelerated computation of forward and backward variables and the contribution of accelerated convergence—of up to several hundred fold. The latter are almost trivial to resolve and uncompressed Baum-Welch converges also very rapidly, the impact of the compression on the convergence rate increasese as the separability decreases.

Note that the likelihood contribution from all observations in a block is de facto averaged and a decision about the best hidden state variable is made jointly for the block. If the data is indeed compressible, then the wavelet compression will indicate a superset of blocks and the adapted algorithm will find, as demonstrated empirically, the best hidden state variable per-block, which allows the training to converge rapidly.

As expected, the acceleration depends on compressibility of data and the number of states. Small models with only a few states benefit less than large models. We observed negligible influence of the model topology (not shown).

Conclusion

Together with prior work we have demonstrated that compression can substantially accelerate frequentist and Bayesian HMM algorithms with both discrete and continuos observations as long as the data is amenable to compression. In the latter case, the method assumes that the data is piecewise constant with homoscedastic noise.

In future work it would be interesting to extend the approach to heteroscedastic noise, which requires static respectively dynamic adaptation of the noise threshold in the wavelet compression. For discrete observations, the counterpart to the current work is run-length encoding (RLE) and data compressible with RLE; note that compressed computations are exact in the discrete case. Mozes et al. [9] also proposed compression based on Lempel-Ziv parsing. Finding an analogue for continous observations might be a challenge, but possibly symbolic representation schemes developed for indexing time-series, e.g. [26], can provide a starting point.

Acknowledgments

LB acknowledges support from the EU’s Erasmus program for an exchange with Chalmers Technical University.

Data Availability

All data used in the manuscript is simulated. Software to generate the data is distributed along with the open-source implementation of the method at https://github.com/lucabello/wavelet-hmms.

Funding Statement

The author(s) received no specific funding for this work.

References

1. Rabiner LR. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE. 1989;77(2):257–285. doi: 10.1109/5.18626 [DOI] [Google Scholar]
2. Schrodt PA. Pattern Recognition of International Crises using Hidden Markov Models. In: Richards D, editor. Non-linear Models and Methods in Political Science. Ann Arbor, MI: University of Michigan Press; 1998. [Google Scholar]
3. Balakrishnan G, Coetzee D. Predicting student retention in massive open online courses using hidden markov models. Electrical Engineering and Computer Sciences University of California at Berkeley. 2013;53:57–58. [Google Scholar]
4.Wright C, Monrose F, Masson GM. HMM profiles for network traffic classification. In: Proceedings of the 2004 ACM workshop on Visualization and data mining for computer security; 2004. p. 9–15.
5.Martinsson J, Mohammadiha N, Schliep A. Clustering Vehicle Maneuver Trajectories Using Mixtures of Hidden Markov Models. In: 21st International Conference on Intelligent Transportation Systems ITSC. IEEE; 2018. p. 3698–3705.
6.Krogh A. Two methods for improving performance of an HMM and their application for gene finding. Proceedings / International Conference on Intelligent Systems for Molecular Biology; ISMB International Conference on Intelligent Systems for Molecular Biology. 1997;5:179–86. [PubMed]
7. Krogh A, Brown M, Mian IS, Sjölander K, Haussler D. Hidden Markov models in computational biology: Applications to protein modeling. Journal of molecular biology. 1994;235(5):1501–1531. doi: 10.1006/jmbi.1994.1104 [DOI] [PubMed] [Google Scholar]
8. Haubold B, Wiehe T. How repetitive are genomes? BMC bioinformatics. 2006;7(1):1–10. [DOI] [PMC free article] [PubMed] [Google Scholar]
9.Mozes S, Weimann O, Ziv-Ukelson M. Speeding Up HMM Decoding and Training by Exploiting Sequence Repetitions. Lecture Notes in Computer Science. 2007;.
10.Balasubramanian V. Equivalence and Reduction of Hidden Markov Models;.
11. Jaeger H. Observable operator models for discrete stochastic time series. Neural computation. 2000;12(6):1371–1398. doi: 10.1162/089976600300015411 [DOI] [PubMed] [Google Scholar]
12. Mahmud M, Schliep A. Speeding Up Bayesian HMM by the Four Russians Method. In: Przytycka T, Sagot MF, editors. Algorithms in Bioinformatics. vol. 6833 of Lecture Notes in Computer Science. Springer; Berlin / Heidelberg; 2011. p. 188–200. Available from: http://link.springer.com/chapter/10.1007%2F978-3-642-23038-7_17. [Google Scholar]
13. Scott S. Bayesian Methods for Hidden Markov Models: Recursive Computing in the 21st Century. Journal of the American Statistical Association. 2002; p. 337–351. doi: 10.1198/016214502753479464 [DOI] [Google Scholar]
14. Pinkel D, Albertson DG. Nature Genetics. 2005;37:11–17. doi: 10.1038/ng0105-11 [DOI] [PubMed] [Google Scholar]
15. LaFramboise T. Single nucleotide polymorphism arrays: a decade of biological, computational and technological advances. Nucleic Acids Research. 2009;37:4181–4193. doi: 10.1093/nar/gkp552 [DOI] [PMC free article] [PubMed] [Google Scholar]
16. Soon WW, Hariharan M, Snyder MP. High-throughput sequencing for biology and medicine. Molecular Systems Biology. 2013;9. doi: 10.1038/msb.2012.61 [DOI] [PMC free article] [PubMed] [Google Scholar]
17. Willenbrock H, Fridlyand J. A comparison study: applying segmentation to array CGH data for downstream analyses. Bioinformatics. 2005;21(22):4084–4091. doi: 10.1093/bioinformatics/bti677 [DOI] [PubMed] [Google Scholar]
18. Mahmud M, Schliep A. Fast MCMC Sampling for Hidden Markov Models to Determine Copy Number Variations. BMC Bioinformatics. 2011;12(1):428. doi: 10.1186/1471-2105-12-428 [DOI] [PMC free article] [PubMed] [Google Scholar]
19. Wiedenhoeft J, Brugel E, Schliep A. Fast Bayesian Inference of Copy Number Variants using Hidden Markov Models with Wavelet Compression. PLoS Computational Biology. 2016;12(5):e1004871. doi: 10.1371/journal.pcbi.1004871 [DOI] [PMC free article] [PubMed] [Google Scholar]
20. Wiedenhoeft J, Cagan A, Kozhemyakina R, Gulevich R, Schliep A. Bayesian localization of CNV candidates in WGS data within minutes. Algorithms for Molecular Biology. 2019;14(20). doi: 10.1186/s13015-019-0154-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
21. Wiedenhoeft J, Schliep A. Using HaMMLET for Bayesian Segmentation of WGS Read-Depth Data. Methods Mol Biol. 2018;(1833):83–93. doi: 10.1007/978-1-4939-8666-8_6 [DOI] [PubMed] [Google Scholar]
22. Donoho DL, Johnstone IM. Ideal spatial adaptation by wavelet shrinkage. Biometrika. 1994;81(3):425–455. doi: 10.1093/biomet/81.3.425 [DOI] [Google Scholar]
23. Sweldens W. Lifting scheme: a new philosophy in biorthogonal wavelet constructions. International Society for Optics and Photonics; 1995. p. 68–79. Available from: http://proceedings.spiedigitallibrary.org/proceeding.aspx?articleid=1007578. [Google Scholar]
24. Sweldens W. The Lifting Scheme: A Construction of Second Generation Wavelets. SIAM Journal on Mathematical Analysis. 1998;29(2):511–546. doi: 10.1137/S0036141095289051 [DOI] [Google Scholar]
25.John Wiedenhoeft. Dynamically Compressed Bayesian Hidden Markov Models Using Haar Wavelets [Ph.D.]. Rutgers University; 2018. Available from: https://rucore.libraries.rutgers.edu/rutgers-lib/59275/.
26. Lin J, Keogh E, Wei L, Lonardi S. Experiencing SAX: a novel symbolic representation of time series. Data Mining and knowledge discovery. 2007;15(2):107–144. doi: 10.1007/s10618-007-0064-z [DOI] [Google Scholar]

PLoS One. doi: 10.1371/journal.pone.0286074.r001

Decision Letter 0

Amita Nandal

14 Feb 2023

PONE-D-22-22196Compressed computations using wavelets for hidden Markov models with continuous observationsPLOS ONE

Dear Dr. Schliep,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Mar 31 2023 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Amita Nandal

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

Additional Editor Comments (if provided):

Authors need to adhere to the reviewer's comments

1. There should be a separate section in the manuscript highlighting the novelty of the proposed method.

2. The proposed method should be compared with other state-of-the-art techniques to highlight its effectiveness.

3.At the end of the introduction, it is difficult to understand your research motivation due to the lack of necessary summary and innovation of the method.

4.There is a lack of concatenation among the chapters of the article. It is suggested that the motivation between the chapters be supplemented.

5. Formula (3) lacks the introduction of variable definition and it is suggested to be supplemented.

6. It is suggested that the next research direction should be supplemented in the conclusion, and the reasonable prospect of this study should be carried out in order to inspire the follow-up scholars.

7.What are the meanings of these variables in formula (2)? Please add.

8. Why the method proposed by the author has achieved the best effect, please make a specific analysis.

[Note: HTML markup is below. Please do not edit.]

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The paper is well written and has some novelty. Following are the comments-

1. There should be a separate section in the manuscript highlighting the novelty of the proposed method.

2. The proposed method should be compared with other state-of-the-art techniques to highlight its effectiveness.

Reviewer #2: 1.At the end of the introduction, it is difficult to understand your research motivation due to the lack of necessary summary and innovation of the method.

2.There is a lack of concatenation among the chapters of the article. It is suggested that the motivation between the chapters be supplemented.

3. Formula (3) lacks the introduction of variable definition and it is suggested to be supplemented.

4. It is suggested that the next research direction should be supplemented in the conclusion, and the reasonable prospect of this study should be carried out in order to inspire the follow-up scholars.

5.What are the meanings of these variables in formula (2)? Please add.

6. Why the method proposed by the author has achieved the best effect, please make a specific analysis.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2023 Jun 6;18(6):e0286074. doi: 10.1371/journal.pone.0286074.r002

Author response to Decision Letter 0

1 Mar 2023

Response to Reviewers

PONE-D-22-22196 Compressed computations using wavelets for hidden Markov models with continuous observations

In the following I will address the reviewer’s comments designated as mandatory by the academic editor in detail. It seems that we failed to communicate sufficiently clearly that our method is the very first and only method to allow compressive computations for frequentist HMM algorithms with continuous emissions (cf. below). We added exposition to the manuscript to clarify this point.

For the authors, Alexander Schliep

1. There should be a separate section in the manuscript highlighting the novelty of the proposed method.

The novelty of the method has previously been highlighted in the final paragraph of the introduction. We added a section header to indicate it.

2. The proposed method should be compared with other state-of-the-art techniques to highlight its effectiveness. 

Ours is the first method to perform compressed computations of Likelihood, Viterbi paths, and Maximum-likelihood estimation with the Baum-Welch algorithm for HMMs with continuous emissions. Prior approaches, including ours for Bayesian HMMs with discrete and continuous emissions, tackle different problems or different types of emissions. As such, our method is the only method for this problem, making an evaluation benchmark impossible as no competing methods exist.

We added some text to abstract, introduction and conclusion to make this more obvious.

3.At the end of the introduction, it is difficult to understand your research motivation due to the lack of necessary summary and innovation of the method. 

To summarize, the research motivation is as follows. Compressed computations for Hidden Markov Models have been used with great success for three out of the four cases resulting from two types of emissions (discrete vs. continuous) and two types of computations (frequentist—Viterbi, likelihood and Baum-Welch—vs. Bayesian MCMC) by us and one other group. The submission focuses on the fourth and last open case (frequentist and continuous). This has been motivated in the abstract “Here we extend the compressive computation approach for the first time to the classical frequentist HMM algorithms on continuous-valued observations” . In the introduction we give a complete summary of the subfield “Mozes et al. ... for discrete observations achieve considerable speed-ups”, “Also for discrete observations, Mahmud et al. substantially accelerate Forward-Backward Gibbs

(FBG) sampling ...” and “ substantial improvement [18] in the running times of the FBG sampler for continuous-valued observation”, to “In the following, we introduce compressive computations based on wavelets for Rabiner’s three original problems [1]: likelihood computation, computing Viterbi paths, and Maximum-likelihood estimation with the Baum-Welch algorithm.” We also added corresponding language to the discussion.

The last paragraph of the introduction is a succinct, complete and exhaustive summary of what we accomplish in the manuscript, now highlighted by a separate section header.

4.There is a lack of concatenation among the chapters of the article. It is suggested that the motivation between the chapters be supplemented. 

We added some additional exposition to the manuscript.

5. Formula (3) lacks the introduction of variable definition and it is suggested to be supplemented. 

It is not clear to us which variable(s) the reviewer might be referring to. Every variable used in equations (3a)–(3b) has been defined and explained in lines 86–97, resp. lines 72–84 (conditional probability), and line 102 (the likelihood function). We additionally made the definition of y_s^t explicit by repeating and adapting the definition of q_s^t.

As a matter of fact we already do discuss potential follow-up research in the conclusion (note, we follow PLOS ONE suggested article structure), in particular suggesting an extension to heteroscedastic noise and a potential use of the symbolic indexing by Keogh et al..

7.What are the meanings of these variables in formula (2)? Please add.

It is not clear to us which variable(s) the reviewer might be referring to. Every variable used in equation (2) has been defined and explained in lines 86–97, resp. lines 72–84 (conditional probability), and line 102 (the likelihood function). We additionally made the definition of y_s^t explicit by repeating and adapting the definition of q_s^t.

8. Why the method proposed by the author has achieved the best effect, please make a specific analysis.

We don’t make the claim that our method has the “best effect”, as we do not perform a benchmark evaluation against other methods. The reason why we don’t perform a benchmark evaluation is that ours is the first method to perform compressed computations of likelihood, Viterbi paths, and Maximum-likelihood estimation with the Baum-Welch algorithm for HMMs with continuous emissions. Prior approaches, including ours for Bayesian HMMs with discrete and continuous emissions, tackle different problems or different types of emissions. As such, our method is the only and thus the state-of-the-art method for this problem, making an evaluation benchmark impossible as no competing method exists.

What we did benchmark is a comparison of uncompressed (prior art) and compressed (our contribution) versions; to reiterate there are no other compression method for the continuous case and the particular HMM algorithm we consider.

Attachment

Submitted filename: ResponeToReviewers.pdf

Click here for additional data file.^{(80.4KB, pdf)}

PLoS One. doi: 10.1371/journal.pone.0286074.r003

Decision Letter 1

Amita Nandal

9 May 2023

Compressed computations using wavelets for hidden Markov models with continuous observations

PONE-D-22-22196R1

Dear Dr. Schliep,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Amita Nandal

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

The manuscript can be accepted for publication.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: All comments have been addressed

Reviewer #3: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #2: Yes

Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: Yes

Reviewer #3: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #2: Yes

Reviewer #3: Yes

**********

6. Review Comments to the Author

Reviewer #2: The authors of the article have made extensive edits. It's finally ready for journal publishing, in my opinion.

Reviewer #3: The authors have revised the paper significantly. I think it can be now accepted for publication in the journal.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No

Reviewer #3: No

**********

PLoS One. doi: 10.1371/journal.pone.0286074.r004

Acceptance letter

Amita Nandal

29 May 2023

PONE-D-22-22196R1

Compressed computations using wavelets for hidden Markov models with continuous observations

Dear Dr. Schliep:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Amita Nandal

Academic Editor

PLOS ONE

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Attachment

Submitted filename: ResponeToReviewers.pdf

Click here for additional data file.^{(80.4KB, pdf)}

Data Availability Statement

All data used in the manuscript is simulated. Software to generate the data is distributed along with the open-source implementation of the method at https://github.com/lucabello/wavelet-hmms.

[pone.0286074.ref001] 1. Rabiner LR. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE. 1989;77(2):257–285. doi: 10.1109/5.18626 [DOI] [Google Scholar]

[pone.0286074.ref002] 2. Schrodt PA. Pattern Recognition of International Crises using Hidden Markov Models. In: Richards D, editor. Non-linear Models and Methods in Political Science. Ann Arbor, MI: University of Michigan Press; 1998. [Google Scholar]

[pone.0286074.ref003] 3. Balakrishnan G, Coetzee D. Predicting student retention in massive open online courses using hidden markov models. Electrical Engineering and Computer Sciences University of California at Berkeley. 2013;53:57–58. [Google Scholar]

[pone.0286074.ref004] 4.Wright C, Monrose F, Masson GM. HMM profiles for network traffic classification. In: Proceedings of the 2004 ACM workshop on Visualization and data mining for computer security; 2004. p. 9–15.

[pone.0286074.ref005] 5.Martinsson J, Mohammadiha N, Schliep A. Clustering Vehicle Maneuver Trajectories Using Mixtures of Hidden Markov Models. In: 21st International Conference on Intelligent Transportation Systems ITSC. IEEE; 2018. p. 3698–3705.

[pone.0286074.ref006] 6.Krogh A. Two methods for improving performance of an HMM and their application for gene finding. Proceedings / International Conference on Intelligent Systems for Molecular Biology; ISMB International Conference on Intelligent Systems for Molecular Biology. 1997;5:179–86. [PubMed]

[pone.0286074.ref007] 7. Krogh A, Brown M, Mian IS, Sjölander K, Haussler D. Hidden Markov models in computational biology: Applications to protein modeling. Journal of molecular biology. 1994;235(5):1501–1531. doi: 10.1006/jmbi.1994.1104 [DOI] [PubMed] [Google Scholar]

[pone.0286074.ref008] 8. Haubold B, Wiehe T. How repetitive are genomes? BMC bioinformatics. 2006;7(1):1–10. [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0286074.ref009] 9.Mozes S, Weimann O, Ziv-Ukelson M. Speeding Up HMM Decoding and Training by Exploiting Sequence Repetitions. Lecture Notes in Computer Science. 2007;.

[pone.0286074.ref010] 10.Balasubramanian V. Equivalence and Reduction of Hidden Markov Models;.

[pone.0286074.ref011] 11. Jaeger H. Observable operator models for discrete stochastic time series. Neural computation. 2000;12(6):1371–1398. doi: 10.1162/089976600300015411 [DOI] [PubMed] [Google Scholar]

[pone.0286074.ref012] 12. Mahmud M, Schliep A. Speeding Up Bayesian HMM by the Four Russians Method. In: Przytycka T, Sagot MF, editors. Algorithms in Bioinformatics. vol. 6833 of Lecture Notes in Computer Science. Springer; Berlin / Heidelberg; 2011. p. 188–200. Available from: http://link.springer.com/chapter/10.1007%2F978-3-642-23038-7_17. [Google Scholar]

[pone.0286074.ref013] 13. Scott S. Bayesian Methods for Hidden Markov Models: Recursive Computing in the 21st Century. Journal of the American Statistical Association. 2002; p. 337–351. doi: 10.1198/016214502753479464 [DOI] [Google Scholar]

[pone.0286074.ref014] 14. Pinkel D, Albertson DG. Nature Genetics. 2005;37:11–17. doi: 10.1038/ng0105-11 [DOI] [PubMed] [Google Scholar]

[pone.0286074.ref015] 15. LaFramboise T. Single nucleotide polymorphism arrays: a decade of biological, computational and technological advances. Nucleic Acids Research. 2009;37:4181–4193. doi: 10.1093/nar/gkp552 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0286074.ref016] 16. Soon WW, Hariharan M, Snyder MP. High-throughput sequencing for biology and medicine. Molecular Systems Biology. 2013;9. doi: 10.1038/msb.2012.61 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0286074.ref017] 17. Willenbrock H, Fridlyand J. A comparison study: applying segmentation to array CGH data for downstream analyses. Bioinformatics. 2005;21(22):4084–4091. doi: 10.1093/bioinformatics/bti677 [DOI] [PubMed] [Google Scholar]

[pone.0286074.ref018] 18. Mahmud M, Schliep A. Fast MCMC Sampling for Hidden Markov Models to Determine Copy Number Variations. BMC Bioinformatics. 2011;12(1):428. doi: 10.1186/1471-2105-12-428 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0286074.ref019] 19. Wiedenhoeft J, Brugel E, Schliep A. Fast Bayesian Inference of Copy Number Variants using Hidden Markov Models with Wavelet Compression. PLoS Computational Biology. 2016;12(5):e1004871. doi: 10.1371/journal.pcbi.1004871 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0286074.ref020] 20. Wiedenhoeft J, Cagan A, Kozhemyakina R, Gulevich R, Schliep A. Bayesian localization of CNV candidates in WGS data within minutes. Algorithms for Molecular Biology. 2019;14(20). doi: 10.1186/s13015-019-0154-7 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0286074.ref021] 21. Wiedenhoeft J, Schliep A. Using HaMMLET for Bayesian Segmentation of WGS Read-Depth Data. Methods Mol Biol. 2018;(1833):83–93. doi: 10.1007/978-1-4939-8666-8_6 [DOI] [PubMed] [Google Scholar]

[pone.0286074.ref022] 22. Donoho DL, Johnstone IM. Ideal spatial adaptation by wavelet shrinkage. Biometrika. 1994;81(3):425–455. doi: 10.1093/biomet/81.3.425 [DOI] [Google Scholar]

[pone.0286074.ref023] 23. Sweldens W. Lifting scheme: a new philosophy in biorthogonal wavelet constructions. International Society for Optics and Photonics; 1995. p. 68–79. Available from: http://proceedings.spiedigitallibrary.org/proceeding.aspx?articleid=1007578. [Google Scholar]

[pone.0286074.ref024] 24. Sweldens W. The Lifting Scheme: A Construction of Second Generation Wavelets. SIAM Journal on Mathematical Analysis. 1998;29(2):511–546. doi: 10.1137/S0036141095289051 [DOI] [Google Scholar]

[pone.0286074.ref025] 25.John Wiedenhoeft. Dynamically Compressed Bayesian Hidden Markov Models Using Haar Wavelets [Ph.D.]. Rutgers University; 2018. Available from: https://rucore.libraries.rutgers.edu/rutgers-lib/59275/.

[pone.0286074.ref026] 26. Lin J, Keogh E, Wei L, Lonardi S. Experiencing SAX: a novel symbolic representation of time series. Data Mining and knowledge discovery. 2007;15(2):107–144. doi: 10.1007/s10618-007-0064-z [DOI] [Google Scholar]

PERMALINK

Compressed computations using wavelets for hidden Markov models with continuous observations

Luca Bello

John Wiedenhöft

Alexander Schliep

Roles

Abstract

Introduction

Novelty and outline

Materials and methods

The three classical HMM problems

Evaluation problem

Decoding problem

Training problem

Wavelet compression

Fig 1. An example of a compressed trellis for a three-state model with states labeled +, − and =.

Compressed computation for HMM

Implementation

Results

Simulation study

Fig 2. The HMM topologies used in the tests shown for N = 4 states.

Observations

Fig 3. We show the relative difference between the P(O|λ) log-probabilities of the compressed and standard algorithms.

Fig 4. Speedup on the evaluation problem using the compressed algorithm, including the input data processing time.

Fig 5. Speedup on the training problem using the compressed algorithm, including the input data processing time.

Discussion

Conclusion

Acknowledgments

Data Availability

Funding Statement

References

Decision Letter 0

Amita Nandal

Roles

Author response to Decision Letter 0

Decision Letter 1

Amita Nandal

Roles

Acceptance letter

Amita Nandal

Roles

Associated Data

Supplementary Materials

Data Availability Statement

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases