A reaction network scheme for hidden Markov model parameter learning

Carsten Wiuf; Abhishek Behera; Abhinav Singh; Manoj Gopalkrishnan

doi:10.1098/rsif.2022.0877

2023 Jun 21;20(203):20220877.

PMCID: PMC10282575 PMID: 37340782

Abstract

With a view towards artificial cells, molecular communication systems, molecular multiagent systems and federated learning, we propose a novel reaction network scheme (termed the Baum–Welch (BW) reaction network) that learns parameters for hidden Markov models (HMMs). All variables including inputs and outputs are encoded by separate species. Each reaction in the scheme changes only one molecule of one species to one molecule of another. The reverse change is also accessible but via a different set of enzymes, in a design reminiscent of futile cycles in biochemical pathways. We show that every positive fixed point of the BW algorithm for HMMs is a fixed point of the reaction network scheme, and vice versa. Furthermore, we prove that the ‘expectation’ step and the ‘maximization’ step of the reaction network separately converge exponentially fast and compute the same values as the E-step and the M-step of the BW algorithm. We simulate example sequences, and show that our reaction network learns the same parameters for the HMM as the BW algorithm, and that the log-likelihood increases continuously along the trajectory of the reaction network.

Keywords: molecular programming, synthetic biology, hidden Markov model, Baum–Welch algorithm, statistical learning

1. Introduction

The implementation of abstract dynamical systems with molecular systems has gained scientific interest as a promising piece in the nanotechnology toolbox. Several automated theoretical schemes can now compile arbitrary networks of formal reactions into DNA oligonucleotide sequences [1–8]. Experimental demonstrations have synthesized these oligonucleotides, mixed them in a single test tube, and verified that they interact via base pairing reactions to implement the dynamics of the formal network [2,7,9–11]. In this way, if an algorithm can be described by reaction network dynamics, it might equally be carried out in a test tube by a DNA machine. Understanding how to describe algorithms in terms of reaction network dynamics is thus becoming a matter of interest.

Dynamical systems described by formal reaction networks are known to be computationally universal [12]. Several examples of functions computed by reaction network dynamics have been described [13–20]. However, the computational universality proofs often do not lend themselves to the most elegant ways of implementing algorithms with chemical reaction networks. We are particularly interested in problems arising in statistical learning theory, such as maximum-likelihood estimation and related optimization problems, Bayesian posterior sampling, and inference from incomplete data. These problems are at the core of statistical theory and may have many practical applications when implemented inside artificial cells. By exploiting the formal connection between the notion of entropy in statistics and that in statistical mechanics, our work demonstrates that these problems are particularly amenable to implementation by reaction network dynamics.

Molecule-based statistical inference is receiving increasing attention. In [21], dynamic Bayesian decision-making in the form of a two-state hidden Markov model (HMM) is implemented by means of intracellular kinetics (which might be interpreted in terms of reaction networks), where the target is to infer an approximate posterior distribution. Similar ideas have been applied to decode information in biological processes [22]. On a different note, assessing a cell’s potential for solving tasks and learning requires a thorough understanding of design principles in reaction networks and of how molecular sources of stochastic information are communicated within and between cells [23,24]. Such tasks might be seen as computational and statistical problems posed to the cell.

In this article, we describe a reaction network scheme whose dynamics learns parameters of HMMs. HMMs are standard statistical models widely used in machine learning to model complex data with a linear spatial structure as in bioinformatics [25], or a temporal structure as in speech recognition [26]. HMMs also form an essential component of communication systems as well as of intelligent agents trained by reinforcement learning methods. They might be used in an exploratory sense without stipulating the interpretation of the hidden variables in advance, or in a concrete sense to learn the strength and influence of known hidden variables. Our group has previously worked on statistical learning theory from the perspective of reaction networks and log-linear models [27–31], and the current paper builds on these experiences.

Our proposed algorithm has similarities to, and important differences from, the Baum–Welch (BW) algorithm [32] which is the standard learning algorithm for HMMs. The BW algorithm is an iterative expectation-maximization (EM) algorithm where one step is performed at a time in a prescribed sequence. Our reaction network scheme is divided into four subnetworks that correspond to the forward algorithm, the backward algorithm, the expectation step (E-step), and the maximization step (M-step) of the BW algorithm. Each subnetwork describes a system of ordinary differential equations (ODEs) that might be run separately, exactly mimicking the steps of the BW algorithm; or run simultaneously in continuous time and in a distributed manner, obtaining a variant on the BW theme where all four stages of the BW algorithm are being performed at the same time. Because our algorithm permits the different stages to be run simultaneously and without coordination, it is particularly suited to software federated learning implementations where HMMs might need to be run on a network of edge devices in a distributed manner.

We obtain the following results.

—
Our scheme can be partitioned into two modules, the inference module (forward, backward, E-step) and the learning module (M-step), which provide feedback to each other. We show in theorem 4.2 that each module separately converges exponentially fast to the correct value (the value of the BW algorithm) when the other module is kept switched off.
—
We prove in theorem 4.3 that our scheme (when divided into two modules, or when considered jointly) and the BW algorithm have the same set of positive fixed points.
—
We demonstrate practical feasibility of our algorithm by simulation of example HMMs, and by showing that parameters can be learned successfully with performance comparable to that of the BW algorithm.

The proposed reaction network is a formal (abstract) reaction network without particular chemical features. It might be turned into a chemically realistic reaction network by compiling the formal reactions into reactions between DNA oligonucleotide sequences, using recently proposed techniques [1–4]. Promising areas of application of this work come from cellular biology. In many cellular processes, only partial information about the environment is available in the form of a sequence of observations. For example, this might happen when an enzyme acts processively on a substrate or a molecular walker locates its position on a grid [33–35]. In the future, a molecule-based HMM device might learn a molecular environment within an organism by sensing and interacting with the environment at the molecular level. It might take action according to the learning outcome, for example, choosing among different drug options, or a molecule-based HMM might be used as a building block in an artificial cell or population of cells, enabling cooperative behaviour among cells or facilitating various tasks.

2. The Baum–Welch algorithm

A stochastic map between two finite sets P and Q is a matrix A = (a_pq)_P×Q, such that a_pq ≥ 0 for p ∈ P, q ∈ Q, and $\sum_{q \in Q} a_{p q} = 1$ for p ∈ P. One might think of a stochastic map as a collection of conditional probability distributions.

An HMM is a tuple (H, V, θ, ψ, π) of two finite sets H (for ‘hidden’) and V (for ‘visible’), an initial probability distribution π = (π_h)_h∈H on H, and two stochastic maps: the transition matrix θ : H → H, and the emission matrix ψ : H → V. See figure 1 for an example.

Figure 1. — Learning HMMs from sequences. (a) **HMM**: the hidden states, elements of H = {H₁, H₂}, are not directly observable. Instead, we observe elements from a set V = {V₁, V₂} of visible states. The parameters θ₁₁, θ₁₂, θ₂₁, θ₂₂ denote the probabilities of transitions between the hidden states. The probability of observing the states V₁, V₂ depends on the parameters ψ₁₁, ψ₁₂, ψ₂₁, ψ₂₂, as indicated in the figure. (b) The **forward algorithm** computes the joint probability of the first ℓ + 1 observed states and hidden state H₁ at position ℓ + 1 (the position ℓ + 1 likelihood) $α_{ℓ + 1, 1} = α_{ℓ 1} θ_{11} ψ_{1 v_{ℓ + 1}} + α_{ℓ 2} θ_{21} ψ_{1 v_{ℓ + 1}}$ by forward propagating the position ℓ likelihoods $α_{ℓ 1}$ and $α_{ℓ 2}$. Here, $v_{ℓ}, v_{ℓ + 1} \in V$ are the observed symbols at positions ℓ and ℓ + 1, respectively. (c) The **backward algorithm** computes the conditional probability of the observed states ℓ + 1, …, L, given hidden state H₁ at position ℓ (the position ℓ conditional probability) $β_{ℓ 1} = θ_{11} ψ_{1 v_{ℓ + 1}} β_{ℓ + 1, 1} + θ_{12} ψ_{2 v_{ℓ + 1}} β_{ℓ + 1, 2}$ by propagating the position ℓ + 1 conditional probabilities $β_{ℓ + 1, 1}$ and $β_{ℓ + 1, 2}$ backwards. (d) The **BW algorithm** is a fixed point expectation-maximization computation. The E-step calls the forward and backward algorithms as subroutines and, conditioned on the entire observed sequence $(v_{1}, v_{2}, \dots, v_{L}) \in V^{L}$, computes the probabilities $γ_{ℓ g}$ of being in state g ∈ H at position ℓ and the probabilities $ξ_{ℓ g h}$ of taking the transition (g, h) ∈ H² at position ℓ. The M-step updates the parameters θ and ψ to maximize their likelihood given the observed sequence.

The likelihood $P (v_{1}, \dots, v_{L} | θ, ψ)$ of an observed sequence v₁, …, v_L ∈ V given the parameter (θ, ψ) is the probability

P (v_{1}, \dots, v_{L} | θ, ψ) = \sum_{η \in H^{L}} π_{h_{1}} ψ_{h_{1} v_{1}} \prod_{ℓ = 2}^{L} θ_{h_{ℓ - 1} h_{ℓ}} ψ_{h_{ℓ} v_{ℓ}}, \qquad (2.1)

where the sum is over all sequences η = (h₁, …, h_L) ∈ H^L, with L ≥ 2 (to avoid triviality). Knowing π and the sequence $(v_{1}, v_{2}, \dots, v_{L})$ of visible states, one can estimate maximum-likelihood values for (θ, ψ) by means of the BW algorithm, which is a particular instance of the EM algorithm. In §5, we show how to extend this construction to unknown π and multiple sequences while preserving the theoretical guarantees.
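For small L, definition (2.1) can be checked directly by enumerating all |H|^L hidden paths. The Python sketch below is our own illustration, not code from the paper: the two-state parameter values are assumptions, states and symbols are coded as 0-based indices, and the helper names are hypothetical. Because the cost grows as |H|^L, this brute-force sum is only a sanity check; the forward recursion computes the same quantity efficiently.

```python
import itertools
import numpy as np

def is_stochastic(A, tol=1e-12):
    """True if A is a stochastic map: non-negative entries and every row summing to one."""
    A = np.asarray(A, dtype=float)
    return bool(np.all(A >= 0) and np.allclose(A.sum(axis=1), 1.0, atol=tol))

def likelihood_bruteforce(obs, theta, psi, pi):
    """Evaluate (2.1) by summing over every hidden path eta = (h_1, ..., h_L) in H^L."""
    total = 0.0
    for eta in itertools.product(range(len(pi)), repeat=len(obs)):
        p = pi[eta[0]] * psi[eta[0], obs[0]]
        for l in range(1, len(obs)):
            p *= theta[eta[l - 1], eta[l]] * psi[eta[l], obs[l]]
        total += p
    return total

# Illustrative two-state, two-symbol HMM (assumed values); indices are 0-based.
theta = np.array([[0.7, 0.3], [0.4, 0.6]])
psi = np.array([[0.6, 0.4], [0.3, 0.7]])
pi = np.array([0.5, 0.5])
assert is_stochastic(theta) and is_stochastic(psi)

# Summing the likelihood over all |V|^L observed sequences of length L = 4 should give 1.
L = 4
total = sum(likelihood_bruteforce(v, theta, psi, pi)
            for v in itertools.product(range(psi.shape[1]), repeat=L))
print(total)
```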

The standard BW algorithm is composed of four subroutines. The forward algorithm (figure 1b) outputs the quantities α_ℓh computed from initial values of (θ, ψ) and π by the recursion

α_{1 h} = π_{h} ψ_{h v_{1}} and α_{ℓ + 1, h} = \sum_{g \in H} α_{ℓ g} θ_{g h} ψ_{h v_{ℓ + 1}},

and the backward algorithm (figure 1c) outputs the quantities β_ℓh computed by the recursion

β_{L h} = 1 and β_{ℓ h} = \sum_{g \in H} θ_{h g} ψ_{g v_{ℓ + 1}} β_{ℓ + 1, g},

where h ∈ H and ℓ = 1, …, L − 1. The E-step computes

ξ_{ℓ g h} = \frac{α_{ℓ g} θ_{g h} ψ_{h v_{ℓ + 1}} β_{ℓ + 1, h}}{\sum_{f \in H} α_{ℓ f} β_{ℓ f}} and γ_{ℓ h} = \frac{α_{ℓ h} β_{ℓ h}}{\sum_{f \in H} α_{ℓ f} β_{ℓ f}},

for ℓ = 1, …, L (for γ) and ℓ = 1, …, L − 1 (for ξ), with g, h ∈ H. Finally, the M-step computes an update of (θ, ψ),

θ_{g h}^{'} = \frac{\sum_{ℓ = 1}^{L - 1} ξ_{ℓ g h}}{\sum_{ℓ = 1}^{L - 1} \sum_{f \in H} ξ_{ℓ g f}} and ψ_{h w}^{'} = \frac{\sum_{ℓ = 1}^{L} γ_{ℓ h} δ_{w, v_{ℓ}}}{\sum_{ℓ = 1}^{L} γ_{ℓ h}},

for g, h ∈ H and w ∈ V, where δ is the Kronecker delta, $δ_{w, v_{ℓ}} = 1$ if w = v_ℓ and 0 otherwise. Here, θ_gh′ and ψ_hw′ (with primes) are the updated values of θ_gh and ψ_hw, respectively, after one iteration of the BW algorithm. As π is assumed known, it is not updated.
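The four subroutines can be sketched in NumPy as a single function performing one BW iteration. This is a minimal illustration under our own conventions (0-based indices, π known and fixed, no rescaling of α and β, so it is only suitable for short sequences), not a reference implementation; the function name and parameter values are hypothetical.

```python
import numpy as np

def bw_iteration(obs, theta, psi, pi):
    """One BW iteration: forward, backward, E-step and M-step (pi is not updated).

    obs holds 0-based symbol indices; returns theta', psi', alpha, beta, gamma, xi.
    """
    obs = np.asarray(obs)
    L, H = len(obs), len(pi)
    # Forward algorithm: alpha[l, h]
    alpha = np.zeros((L, H))
    alpha[0] = pi * psi[:, obs[0]]
    for l in range(L - 1):
        alpha[l + 1] = (alpha[l] @ theta) * psi[:, obs[l + 1]]
    # Backward algorithm, starting from beta[L-1, h] = 1
    beta = np.ones((L, H))
    for l in range(L - 2, -1, -1):
        beta[l] = theta @ (psi[:, obs[l + 1]] * beta[l + 1])
    # E-step; sum_f alpha[l, f] * beta[l, f] is the same for every l
    norm = (alpha * beta).sum(axis=1, keepdims=True)            # shape (L, 1)
    gamma = alpha * beta / norm
    emit_next = psi[:, obs[1:]].T * beta[1:]                      # [l, h] = psi[h, v_{l+1}] beta[l+1, h]
    xi = alpha[:-1, :, None] * theta[None, :, :] * emit_next[:, None, :] / norm[:-1, :, None]
    # M-step
    theta_new = xi.sum(axis=0) / xi.sum(axis=(0, 2))[:, None]
    onehot = np.eye(psi.shape[1])[obs]                            # [l, w] = delta_{w, v_l}
    psi_new = (gamma.T @ onehot) / gamma.sum(axis=0)[:, None]
    return theta_new, psi_new, alpha, beta, gamma, xi

theta = np.array([[0.7, 0.3], [0.4, 0.6]])                        # illustrative values
psi = np.array([[0.6, 0.4], [0.3, 0.7]])
pi = np.array([0.5, 0.5])
obs = [0, 1, 1, 0, 1]
theta_new, psi_new, alpha, beta, gamma, xi = bw_iteration(obs, theta, psi, pi)
print(np.round(theta_new, 4))
print(np.round(psi_new, 4))
```

The identities $\sum_{h} ξ_{ℓ g h} = γ_{ℓ g}$ and $\sum_{g} ξ_{ℓ g h} = γ_{ℓ + 1, h}$ give a convenient consistency check on such an implementation.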

The BW algorithm (figure 1d) sequentially runs the forward algorithm, the backward algorithm, the E-step, and the M-step until the change in the likelihood (2.1) is insignificant. In practice, this means until the difference between (θ, ψ) and (θ′, ψ′) becomes smaller than a prescribed tolerance level. For the steps to be well defined, division by zero is not allowed. Denote the parameter region for which division by zero does not happen in any step of an iteration by

Θ = \{(θ, ψ) : \sum_{ℓ = 1}^{L} α_{ℓ h} β_{ℓ h} > 0 \text{ for all } h \in H\}

(see the electronic supplementary material for details and proof). Furthermore, let $Θ_{0} = \{(θ, ψ) \mid θ > 0, ψ > 0\}$ and $Θ_{1} = \{(θ, ψ) \mid θ \geq 0, ψ \geq 0\}$ (the full parameter space), where vector/matrix inequalities are taken coordinate-wise. It follows that $Θ_{0} \subseteq Θ \subseteq Θ_{1}$, and that all steps in one iteration of the BW algorithm are valid, provided $(θ, ψ) \in Θ$. Each row (a conditional distribution) of the update (θ′, ψ′) automatically sums to one.
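A sketch of the full iteration, stopping when the largest change in (θ, ψ) falls below a tolerance, is given next. It raises an error if a denominator in the E-step or M-step vanishes, which is the situation that restricting to Θ is meant to rule out. The tolerance, the iteration cap, the starting point and the example sequence are illustrative assumptions, and the function names are hypothetical. Since each iteration is an EM step, the recorded log-likelihoods should be non-decreasing.

```python
import numpy as np

def bw_iteration(obs, theta, psi, pi):
    """One BW iteration; returns (theta', psi') and the log-likelihood at the current (theta, psi)."""
    L, H = len(obs), len(pi)
    alpha, beta = np.zeros((L, H)), np.ones((L, H))
    alpha[0] = pi * psi[:, obs[0]]
    for l in range(L - 1):
        alpha[l + 1] = (alpha[l] @ theta) * psi[:, obs[l + 1]]
    for l in range(L - 2, -1, -1):
        beta[l] = theta @ (psi[:, obs[l + 1]] * beta[l + 1])
    lik = alpha[-1].sum()                                   # likelihood (2.1)
    if lik <= 0:
        raise ValueError("zero likelihood: the E-step is undefined")
    gamma = alpha * beta / lik
    xi = alpha[:-1, :, None] * theta[None] * (psi[:, obs[1:]].T * beta[1:])[:, None, :] / lik
    den_theta, den_psi = xi.sum(axis=(0, 2)), gamma.sum(axis=0)
    if np.any(den_theta <= 0) or np.any(den_psi <= 0):       # division by zero in the M-step
        raise ValueError("(theta, psi) gives a zero denominator in the M-step")
    theta_new = xi.sum(axis=0) / den_theta[:, None]
    psi_new = gamma.T @ np.eye(psi.shape[1])[obs] / den_psi[:, None]
    return theta_new, psi_new, np.log(lik)

def baum_welch(obs, theta, psi, pi, tol=1e-8, max_iter=10_000):
    """Iterate until max |(theta', psi') - (theta, psi)| < tol; pi stays fixed."""
    obs = np.asarray(obs)
    logliks = []
    for _ in range(max_iter):
        theta_new, psi_new, ll = bw_iteration(obs, theta, psi, pi)
        logliks.append(ll)
        change = max(np.abs(theta_new - theta).max(), np.abs(psi_new - psi).max())
        theta, psi = theta_new, psi_new
        if change < tol:
            break
    return theta, psi, logliks

theta0 = np.array([[0.6, 0.4], [0.3, 0.7]])    # illustrative starting point in Theta_0
psi0 = np.array([[0.5, 0.5], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
obs = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1]
theta_hat, psi_hat, logliks = baum_welch(obs, theta0, psi0, pi)
print(len(logliks), round(logliks[0], 4), round(logliks[-1], 4))
```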

The following lemma is essential. See the electronic supplementary material for a proof.

Lemma 2.1. —

Assume all letters of V are in the observed sequence v₁, …, v_L and π > 0, L ≥ 2, or all letters of V are in v₂, …, v_L (excluding v₁) and π ≥ 0, L ≥ 3. If $(θ, ψ) \in Θ_{0}$ , then the same holds for the updated parameter value, $(θ^{'}, ψ^{'}) \in Θ_{0}$ .

Let (θ_n, ψ_n) denote the value of (θ′, ψ′) after n iterations of the BW algorithm. According to lemma 2.1, if $(θ_{0}, ψ_{0}) \in Θ_{0}$, then $(θ_{n}, ψ_{n}) \in Θ_{0}$ for all n ≥ 0. If $(θ_{n}, ψ_{n}) \to (θ^{*}, ψ^{*}) \in Θ_{0}$ as n → ∞, then the limit is a local extremum or saddle point of the likelihood (2.1) [36–39]. If $(θ^{*}, ψ^{*}) \in Θ$, then it is a fixed point of the BW algorithm. However, the limit might lie outside $Θ$ [36–38]. In general, the BW algorithm might have multiple fixed points, and which one is reached depends on the initial point (θ₀, ψ₀) and the observed sequence.

Henceforth, we make the assumptions of lemma 2.1. If v ∈ V is not observed, then the M-step gives ψ_hv′ = 0 for all h ∈ H, and one might equivalently consider an HMM with the state v removed from V. Hence, the only real (mild) restriction is π > 0, L ≥ 2, or π ≥ 0, L ≥ 3.

3. Baum–Welch reaction network

Let S be a finite set. A formal (chemical) reaction over S is a pair (a, b) of non-negative integer vectors $a, b \in \mathbb{Z}_{\geq 0}^{S}$. This is commonly represented in chemical equation notation as $\sum_{i \in S} a_{i} X_{i} \to \sum_{i \in S} b_{i} X_{i}$, where X_i, i ∈ S, represent formal (chemical) species. Given a rate constant k > 0, mass-action kinetics describes the change of concentrations through time by the system of ODEs $\dot{X}(t) = (b - a) k \prod_{i \in S} X_{i}(t)^{a_{i}}$, where species names are overloaded to also represent the vector of concentrations X(t) = (X_i(t))_{i∈S} at time t ≥ 0. A reaction network is a finite collection $(a_{1}, b_{1}), (a_{2}, b_{2}), \dots, (a_{m}, b_{m})$ of formal reactions, together with a choice of reaction rate constants $k_{1}, \dots, k_{m} > 0$. Combining the effect of all reactions yields the ODE system $\dot{X}(t) = \sum_{j = 1}^{m} (b_{j} - a_{j}) k_{j} \prod_{i \in S} X_{i}(t)^{a_{j i}}$. For background on reaction networks, see [40].
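The mass-action ODE translates directly into code. The following sketch uses a hypothetical helper of our own, `mass_action_rhs`, that evaluates $\sum_{j} (b_{j} - a_{j}) k_{j} \prod_{i} X_{i}^{a_{j i}}$, and integrates a toy network $X_{1} + C \to X_{2} + C$, $X_{2} \to X_{1}$ (both with k = 1) by forward Euler. Here C is a catalyst, as in the BW reaction network, so its concentration never changes; the step size and horizon are illustrative choices.

```python
import numpy as np

def mass_action_rhs(x, reactions):
    """sum_j (b_j - a_j) k_j prod_i x_i^{a_ji} for reactions given as (a_j, b_j, k_j)."""
    dx = np.zeros(len(x))
    for a, b, k in reactions:
        a, b = np.asarray(a), np.asarray(b)
        dx += (b - a) * k * np.prod(x ** a)
    return dx

# Toy network over species (X1, X2, C): X1 + C -> X2 + C and X2 -> X1, both with k = 1.
reactions = [((1, 0, 1), (0, 1, 1), 1.0),
             ((0, 1, 0), (1, 0, 0), 1.0)]
x = np.array([3.0, 0.0, 2.0])
dt = 0.01
for _ in range(2000):            # forward Euler up to t = 20
    x = x + dt * mass_action_rhs(x, reactions)
print(np.round(x, 6))            # equilibrium satisfies X1 * C = X2 with X1 + X2 = 3 and C = 2
```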

We now describe a reaction network that implements HMM learning. Our scheme bears close correspondence to the BW algorithm as presented in the previous section. Let an HMM (H, V, θ, ψ, π) and an observed sequence $(v_{1}, v_{2}, \dots, v_{L}) \in V^{L}$ be given. Choose an arbitrary hidden state h* ∈ H and an arbitrary visible state v* ∈ V. This choice is merely an artifice to break symmetry, and our results hold independently of these choices. We represent every variable appearing in the BW algorithm by a separate species. The species are θ_gh, θ_gh′, ψ_hw, ψ_hw′, π_h, α_ℓh, β_ℓh, γ_ℓh, ξ_ℓgh with indices g, h ∈ H, w ∈ V, ℓ = 1, …, L. That is, both (θ, ψ) and the update (θ′, ψ′) are represented as species.

We first work out in full detail how the forward algorithm may be translated into chemical reactions. Recall that the forward algorithm is the recursion

α_{1 h} = π_{h} ψ_{h v_{1}} and α_{ℓ + 1, h} = \sum_{g \in H} α_{ℓ g} θ_{g h} ψ_{h v_{ℓ + 1}},

for ℓ = 1, …, L − 1 and g, h ∈ H. This implies the balance equations

α_{1 h} π_{h^{*}} ψ_{h^{*} v_{1}} = α_{1 h^{*}} π_{h} ψ_{h v_{1}} \qquad (3.1)

and

α_{ℓ + 1, h} \sum_{g \in H} α_{ℓ g} θ_{g h^{*}} ψ_{h^{*} v_{ℓ + 1}} = α_{ℓ + 1, h^{*}} \sum_{g \in H} α_{ℓ g} θ_{g h} ψ_{h v_{ℓ + 1}}, \qquad (3.2)

for all h ∈ H and ℓ = 1, …, L − 1. For the initialization step, this prompts the use of the reactions

α_{1 h} + π_{h^{*}} + ψ_{h^{*} v_{1}} \overset{1}{\to} α_{1 h^{*}} + π_{h^{*}} + ψ_{h^{*} v_{1}}

and

α_{1 h^{*}} + π_{h} + ψ_{h v_{1}} \overset{1}{\to} α_{1 h} + π_{h} + ψ_{h v_{1}},

for all h ∈ H, where $\overset{1}{\to}$ indicates that the reaction rate constant is set to 1. Only one species changes in each reaction (α_1h in the first reaction, α_1h* in the second); the other species are catalysts that remain unchanged by the reaction. The rate at which α_1h is converted into α_1h* depends on the concentrations of the catalysts and is thus time-dependent. By design, at equilibrium the set of reactions fulfils the balance equation (3.1). For the recursion step, we use the reactions

\begin{aligned} α_{ℓ + 1, h} + α_{ℓ g} + θ_{g h^{*}} + ψ_{h^{*} v_{ℓ + 1}} \overset{1}{\to} α_{ℓ + 1, h^{*}} + α_{ℓ g} + θ_{g h^{*}} + ψ_{h^{*} v_{ℓ + 1}} \end{aligned}

and

\begin{aligned} α_{ℓ + 1, h^{*}} + α_{ℓ g} + θ_{g h} + ψ_{h v_{ℓ + 1}} \overset{1}{\to} α_{ℓ + 1, h} + α_{ℓ g} + θ_{g h} + ψ_{h v_{ℓ + 1}}, \end{aligned}

for all g, h ∈ H, and ℓ = 1, …, L − 1. Again by design, at equilibrium the balance equation (3.2) is fulfilled for this set of reactions.
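To see the balance equation (3.1) emerge from the dynamics, the following sketch integrates the mass-action ODE of the initialization reactions (rate constants 1, h* taken as the first state) by forward Euler and compares the equilibrium with $π_{h} ψ_{h v_{1}}$ rescaled to the conserved total $A_{1}$. The parameter values, step size and integration time are assumptions chosen for illustration, and the function name is hypothetical.

```python
import numpy as np

def r1_alpha_rhs(alpha1, pi, psi_v1, hstar):
    """Mass-action ODE of the initialization reactions (rate constants 1).

    For h != h*: alpha_1h -> alpha_1h* at rate pi_h* psi_h*v1 alpha_1h, and
    alpha_1h* -> alpha_1h at rate pi_h psi_hv1 alpha_1h*; pi and psi are catalysts.
    """
    c = pi * psi_v1                      # catalyst products pi_h psi_{h v_1}
    d = c * alpha1[hstar] - c[hstar] * alpha1
    d[hstar] = 0.0
    d[hstar] = -d.sum()                  # alpha_1h* gains exactly what the others lose
    return d

pi = np.array([0.2, 0.3, 0.5])           # illustrative values (assumption)
psi_v1 = np.array([0.9, 0.5, 0.1])       # column psi_{h v_1} for the observed symbol v_1
alpha1 = np.array([1.0, 0.0, 0.0])       # arbitrary start; conserved total A_1 = 1
dt = 0.05
for _ in range(4000):                    # forward Euler up to t = 200
    alpha1 = alpha1 + dt * r1_alpha_rhs(alpha1, pi, psi_v1, hstar=0)

target = pi * psi_v1 / (pi * psi_v1).sum()   # BW value alpha_1h, rescaled to total A_1 = 1
print(np.round(alpha1, 6), np.round(target, 6))
```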

The reactions depend on the observed sequence $(v_{1}, v_{2}, \dots, v_{L}) \in V^{L}$ of visible states. This is a problem because one would have to design a different reaction network for each observed sequence, defeating the whole purpose. To solve this problem, we introduce the catalyst species E_ℓw with $ℓ = 1, \dots, L$ and w ∈ V. The E_ℓw species are initialized such that at time zero, E_ℓw(0) = 1 for w = v_ℓ, and E_ℓw(0) = 0 for $w \neq v_{ℓ}$. As the E_ℓw act only as catalysts, their concentrations remain fixed throughout time, and a reaction catalysed by E_ℓw proceeds only if w = v_ℓ. Hence, the observed sequence enters the network only through this initial condition.
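The effect of the E_ℓw catalysts can be illustrated on one recursion step of the forward subnetwork. Summing the reaction rates over w, only the term with w = v_{ℓ+1} survives, so the effective catalyst weight of α_{ℓ+1,h} is $\sum_{g} α_{ℓ g} θ_{g h} ψ_{h v_{ℓ + 1}}$. In the sketch below, α_ℓ is held at fixed values (as if its own subnetwork had already equilibrated), rate constants are 1, and all numbers and names are illustrative assumptions.

```python
import numpy as np

obs = [1, 0, 2]                          # observed symbols v_1, v_2, v_3 (0-based, illustrative)
E = np.eye(3)[obs]                       # E[l, w] = 1 if w == v_l else 0; fixed catalysts
theta = np.array([[0.8, 0.2], [0.3, 0.7]])          # illustrative parameters (assumption)
psi = np.array([[0.5, 0.3, 0.2], [0.1, 0.2, 0.7]])
alpha_l = np.array([0.6, 0.4])           # alpha_l held fixed, as if already equilibrated
ell, hstar = 0, 0                        # compute alpha_{ell+1} from alpha_ell (0-based ell)

def r_alpha_rhs(alpha_next):
    """Mass-action ODE of one forward recursion subnetwork, summed over g and w."""
    # c_h = sum_{g,w} alpha_lg theta_gh psi_hw E_{l+1,w}; only w = v_{l+1} contributes.
    c = (alpha_l @ theta) * (psi @ E[ell + 1])
    d = c * alpha_next[hstar] - c[hstar] * alpha_next
    d[hstar] = 0.0
    d[hstar] = -d.sum()
    return d

alpha_next = np.array([0.5, 0.5])        # arbitrary start; conserved total A_{l+1} = 1
for _ in range(4000):                    # forward Euler, dt = 0.05, up to t = 200
    alpha_next = alpha_next + 0.05 * r_alpha_rhs(alpha_next)

bw = (alpha_l @ theta) * psi[:, obs[ell + 1]]       # forward recursion for alpha_{l+1}
print(np.round(alpha_next, 6), np.round(bw / bw.sum(), 6))
```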

The other parts of the BW algorithm may be translated into chemical reactions using a similar logic. The full set of reactions is shown in table 1 and is termed the BW reaction network. It consists of four parts that are further divided into smaller subnetworks. Each of the four parts corresponds to one of the four parts of the BW algorithm, as shown in table 1. The corresponding equations of the ODE system are shown in appendix A (equations (A 1)–(A 8)). Each of the smaller subnetworks in table 1 when run independently can be separately analysed as a mono-molecular reaction network for which the dynamical behaviour can be fully described as shown in appendix A.

Table 1.

The BW reaction network, divided into four parts corresponding to the forward ( $R_{1}^{α}, R_{ℓ}^{α}$ ) and backward ( $R_{ℓ}^{β}$ ) algorithms, the E-step ( $R_{ℓ}^{γ}, R_{ℓ}^{ξ}$ ) and the M-step ( $R^{θ}, R^{ψ}$ ). All parts but the M-step are further divided into small subnetworks, one for each ℓ = 1, …, L − 1 (or L). Catalytic species are in black, non-catalytic in red; in every reaction, the non-catalytic species is the first species on each side. The indices vary over g, h ∈ H, w ∈ V, and $ℓ = 1, \dots, L - 1$. For γ species, ℓ = L is also allowed.

BW algorithm	BW reaction network	subnetwork
$\begin{matrix} α_{1 h} = π_{h} ψ_{h v_{1}} \\ α_{ℓ + 1, h} = \sum_{g \in H} α_{ℓ g} θ_{g h} ψ_{h v_{ℓ + 1}} \end{matrix}$	$\begin{aligned} α_{1 h} + π_{h^{*}} + ψ_{h^{*} w} + E_{1 w} \to α_{1 h^{*}} + π_{h^{*}} + ψ_{h^{*} w} + E_{1 w} \\ α_{1 h^{*}} + π_{h} + ψ_{h w} + E_{1 w} \to α_{1 h} + π_{h} + ψ_{h w} + E_{1 w} \\ α_{ℓ + 1, h} + α_{ℓ g} + θ_{g h^{*}} + ψ_{h^{*} w} + E_{ℓ + 1, w} \to α_{ℓ + 1, h^{*}} + α_{ℓ g} + θ_{g h^{*}} + ψ_{h^{*} w} + E_{ℓ + 1, w} \\ α_{ℓ + 1, h^{*}} + α_{ℓ g} + θ_{g h} + ψ_{h w} + E_{ℓ + 1, w} \to α_{ℓ + 1, h} + α_{ℓ g} + θ_{g h} + ψ_{h w} + E_{ℓ + 1, w} \end{aligned}$	$\begin{matrix} R_{1}^{α} \\ R_{ℓ}^{α} \end{matrix}$
$\begin{matrix} β_{L h} = 1 \\ β_{ℓ h} = \sum_{g \in H} θ_{h g} ψ_{g v_{ℓ + 1}} β_{ℓ + 1, g} \end{matrix}$	$\begin{aligned} β_{ℓ h} + β_{ℓ + 1, g} + θ_{h^{*} g} + ψ_{g w} + E_{ℓ + 1, w} \to β_{ℓ h^{*}} + β_{ℓ + 1, g} + θ_{h^{*} g} + ψ_{g w} + E_{ℓ + 1, w} \\ β_{ℓ h^{*}} + β_{ℓ + 1, g} + θ_{h g} + ψ_{g w} + E_{ℓ + 1, w} \to β_{ℓ h} + β_{ℓ + 1, g} + θ_{h g} + ψ_{g w} + E_{ℓ + 1, w} \end{aligned}$	$R_{ℓ}^{β}$
$\begin{aligned} γ_{ℓ h} = \frac{α_{ℓ h} β_{ℓ h}}{\sum_{g \in H} α_{ℓ g} β_{ℓ g}} \\ ξ_{ℓ g h} = \frac{α_{ℓ g} θ_{g h} ψ_{h v_{ℓ + 1}} β_{ℓ + 1, h}}{\sum_{f \in H} α_{ℓ f} β_{ℓ f}} \end{aligned}$	$\begin{aligned} γ_{ℓ h} + α_{ℓ h^{*}} + β_{ℓ h^{*}} \to γ_{ℓ h^{*}} + α_{ℓ h^{*}} + β_{ℓ h^{*}} \\ γ_{ℓ h^{*}} + α_{ℓ h} + β_{ℓ h} \to γ_{ℓ h} + α_{ℓ h} + β_{ℓ h} \\ ξ_{ℓ g h} + α_{ℓ h^{*}} + θ_{h^{*} h^{*}} + β_{ℓ + 1, h^{*}} + ψ_{h^{*} w} + E_{ℓ + 1, w} \to \\ ξ_{ℓ h^{*} h^{*}} + α_{ℓ h^{*}} + θ_{h^{*} h^{*}} + β_{ℓ + 1, h^{*}} + ψ_{h^{*} w} + E_{ℓ + 1, w} \\ ξ_{ℓ h^{*} h^{*}} + α_{ℓ g} + θ_{g h} + β_{ℓ + 1, h} + ψ_{h w} + E_{ℓ + 1, w} \to \\ ξ_{ℓ g h} + α_{ℓ g} + θ_{g h} + β_{ℓ + 1, h} + ψ_{h w} + E_{ℓ + 1, w} \end{aligned}$	$\begin{matrix} R_{ℓ}^{γ} \\ R_{ℓ}^{ξ} \end{matrix}$
$\begin{matrix} θ_{g h}^{'} = \frac{\sum_{ℓ = 1}^{L - 1} ξ_{ℓ g h}}{\sum_{ℓ = 1}^{L - 1} \sum_{f \in H} ξ_{ℓ g f}} \\ ψ_{h w}^{'} = \frac{\sum_{ℓ = 1}^{L} γ_{ℓ h} δ_{w, v_{ℓ}}}{\sum_{ℓ = 1}^{L} γ_{ℓ h}} \end{matrix}$	$\begin{matrix} θ_{g h}^{'} + ξ_{ℓ g h^{*}} \to θ_{g h^{*}}^{'} + ξ_{ℓ g h^{*}} \\ θ_{g h^{*}}^{'} + ξ_{ℓ g h} \to θ_{g h}^{'} + ξ_{ℓ g h} \\ ψ_{h w}^{'} + γ_{ℓ h} + E_{ℓ v^{*}} \to ψ_{h v^{*}}^{'} + γ_{ℓ h} + E_{ℓ v^{*}} \\ ψ_{h v^{*}}^{'} + γ_{ℓ h} + E_{ℓ w} \to ψ_{h w}^{'} + γ_{ℓ h} + E_{ℓ w} \end{matrix}$	$\begin{matrix} R^{θ} \\ R^{ψ} \end{matrix}$


4. The dynamics of the Baum–Welch reaction network

In the following, we expose the relationship between the dynamics of the BW algorithm and the BW reaction network. For this, we define three different, alternative ways of running the BW reaction network in table 1.

Let α_ℓ(t) = (α_ℓh(t))_h∈H denote the vector of concentrations at time t ≥ 0, with similar notation for other quantities. For convenience, we leave out the species E_ℓw.

4.1. BW1

Initialization. Fix an initial value, $(θ, ψ) = (θ_{0}, ψ_{0}) \in Θ_{0}$. Initialize the concentrations at time t = 0: $α_{ℓ}(0), β_{ℓ}(0), γ_{ℓ}(0) \in \mathbb{R}_{\geq 0}^{H}$, $ξ_{ℓ}(0), θ^{'}(0) \in \mathbb{R}_{\geq 0}^{H \times H}$ and $ψ^{'}(0) \in \mathbb{R}_{\geq 0}^{H \times V}$, such that $\sum_{h} α_{ℓ h}(0) = A_{ℓ}$, $\sum_{h} β_{ℓ h}(0) = B_{ℓ}$, $\sum_{h} γ_{ℓ h}(0) = 1$, $\sum_{g, h} ξ_{ℓ g h}(0) = 1$, $\sum_{h} θ_{g h}^{'}(0) = 1$ and $\sum_{v} ψ_{h v}^{'}(0) = 1$ for all ℓ, g and h, where A_ℓ, B_ℓ are arbitrary positive constants.

Execution. The ODE systems of the subnetworks are executed sequentially in the order $R_{1}^{α}, \dots, R_{L}^{α}$ , $R_{L - 1}^{β}, \dots, R_{1}^{β}$ , $R_{1}^{γ},$ $\dots, R_{L}^{γ}$ , $R_{1}^{ξ}, \dots, R_{L - 1}^{ξ}$ , $R^{θ}$ , $R^{ψ}$ (table 1), such that the ODE system of the kth subnetwork (k = 1, …, 4L − 1) is run until an equilibrium is obtained with the chosen initial values, before the ODE system of the (k + 1)th subnetwork is executed. The catalyst species concentrations of the kth subnetwork are fixed to their equilibrium values obtained in the k′th subnetworks, k′ < k. In particular, the concentrations of the species θ, ψ are fixed to their initial values $(θ_{0}, ψ_{0}) \in Θ_{0}$ .

Iteration. The above procedure is then iterated. After completion of the nth iteration, n ≥ 0, the concentrations of the species θ, ψ in the (n + 1)th iteration are initialized to the equilibrium values $(θ_{n}, ψ_{n}) \in Θ_{0}$ of the species (θ′, ψ′) in the nth iteration. All other species are initialized at their current values.

Completion. This process is continued until convergence of (θ_n, ψ_n) has been achieved. The limit of (θ_n, ψ_n) is denoted the equilibrium of BW1 with initial point (θ₀, ψ₀).

4.2. BW2

Initialization. As in BW1.

Execution. The ODE systems corresponding to the inference module (forward, backward and E-step) and the learning module (M-step) of the BW algorithm are executed sequentially and run until an equilibrium is obtained, with the concentrations of the species θ, ψ fixed to the initial values $(θ_{0}, ψ_{0}) \in Θ_{0}$ .

Iteration and completion. BW2 is iterated similarly to BW1. The limit of (θ_n, ψ_n) is denoted the equilibrium of BW2 with initial point (θ₀, ψ₀). See figure 2.
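A small numerical experiment in the spirit of figure 2a can be set up as follows. The sketch integrates the mass-action ODEs of all subnetworks in table 1 with θ, ψ held fixed and compares the resulting (θ′, ψ′) with one BW update; by theorem 4.2 the two should agree. It is our own illustration, not the authors' code, and it runs all subnetworks simultaneously rather than in the two-module order of BW2; with θ, ψ fixed the subnetworks form a feed-forward cascade, so this should not change the equilibrium. Assumptions: rate constants 1, h* = v* = first state, A_ℓ = B_ℓ = 1, β_L held at equal concentrations (mirroring β_Lh = 1), a fixed-step Runge-Kutta integrator with hand-picked step size and horizon, and illustrative parameter values.

```python
import numpy as np

HSTAR, VSTAR = 0, 0          # designated h* and v* (an arbitrary choice, as in the text)

def star_flow(x, c, star):
    """Catalytic star network: x_h -> x_star at rate c[star]*x_h, x_star -> x_h at rate c[h]*x_star."""
    d = c * x[star] - c[star] * x
    d[star] = 0.0
    d[star] = -d.sum()       # total mass of the subnetwork is conserved
    return d

def rhs(y, theta, psi, pi, E):
    """Mass-action ODEs of table 1 (rate constants 1); theta and psi are held fixed."""
    al, be, ga, xi = y["alpha"], y["beta"], y["gamma"], y["xi"]
    L, H = al.shape
    pobs = psi @ E.T         # pobs[h, l] = sum_w psi_hw E_lw = psi_{h v_l}
    d = {key: np.zeros_like(val) for key, val in y.items()}
    d["alpha"][0] = star_flow(al[0], pi * pobs[:, 0], HSTAR)
    for l in range(L - 1):
        nxt = pobs[:, l + 1]
        d["alpha"][l + 1] = star_flow(al[l + 1], (al[l] @ theta) * nxt, HSTAR)
        d["beta"][l] = star_flow(be[l], theta @ (nxt * be[l + 1]), HSTAR)  # beta_L stays fixed
        c = al[l][:, None] * theta * (nxt * be[l + 1])[None, :]
        d["xi"][l] = star_flow(xi[l].ravel(), c.ravel(), HSTAR * H + HSTAR).reshape(H, H)
    for l in range(L):
        d["gamma"][l] = star_flow(ga[l], al[l] * be[l], HSTAR)
    c_theta, c_psi = xi.sum(axis=0), ga.T @ E    # sum_l xi_lgh and sum_l gamma_lh E_lw
    for h in range(H):
        d["theta_p"][h] = star_flow(y["theta_p"][h], c_theta[h], HSTAR)
        d["psi_p"][h] = star_flow(y["psi_p"][h], c_psi[h], VSTAR)
    return d

def rk4(f, y, dt, steps):
    """Fixed-step fourth-order Runge-Kutta on a dict of arrays."""
    for _ in range(steps):
        k1 = f(y)
        k2 = f({k: y[k] + 0.5 * dt * k1[k] for k in y})
        k3 = f({k: y[k] + 0.5 * dt * k2[k] for k in y})
        k4 = f({k: y[k] + dt * k3[k] for k in y})
        y = {k: y[k] + dt / 6 * (k1[k] + 2 * k2[k] + 2 * k3[k] + k4[k]) for k in y}
    return y

def bw_step(obs, theta, psi, pi):
    """Reference: one iteration of the BW algorithm, returning (theta', psi')."""
    L, H = len(obs), len(pi)
    al, be = np.zeros((L, H)), np.ones((L, H))
    al[0] = pi * psi[:, obs[0]]
    for l in range(L - 1):
        al[l + 1] = (al[l] @ theta) * psi[:, obs[l + 1]]
    for l in range(L - 2, -1, -1):
        be[l] = theta @ (psi[:, obs[l + 1]] * be[l + 1])
    ga = al * be / al[-1].sum()
    xi = al[:-1, :, None] * theta[None] * (psi[:, obs[1:]].T * be[1:])[:, None, :]
    return (xi.sum(axis=0) / xi.sum(axis=(0, 2))[:, None],
            ga.T @ np.eye(psi.shape[1])[obs] / ga.sum(axis=0)[:, None])

theta = np.array([[0.7, 0.3], [0.4, 0.6]])   # (theta_n, psi_n): illustrative values
psi = np.array([[0.6, 0.4], [0.3, 0.7]])
pi = np.array([0.5, 0.5])
obs = np.array([0, 1, 1, 0, 1])
L, H, V = len(obs), 2, 2
E = np.eye(V)[obs]                            # E_lw(0) = 1 iff w = v_l
y0 = {"alpha": np.full((L, H), 1 / H), "beta": np.full((L, H), 1 / H),
      "gamma": np.full((L, H), 1 / H), "xi": np.full((L - 1, H, H), 1 / H**2),
      "theta_p": theta.copy(), "psi_p": psi.copy()}
y = rk4(lambda s: rhs(s, theta, psi, pi, E), y0, dt=0.2, steps=2000)   # up to t = 400
theta_bw, psi_bw = bw_step(obs, theta, psi, pi)
print(np.abs(y["theta_p"] - theta_bw).max(), np.abs(y["psi_p"] - psi_bw).max())
```

If the sketch behaves as intended, both printed deviations should be tiny, in line with theorem 4.2.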

Figure 2. — (a) Iterations of BW2. In the nth iteration, n ≥ 0, the concentrations of (θ, ψ) are fixed at (θ_n, ψ_n). The ODE system is initialized at (θ′(0), ψ′(0)) = (θ_n, ψ_n). Over time, (θ′(t), ψ′(t)) converges to $(θ_{n + 1}, ψ_{n + 1})$. As n → ∞, (θ_n, ψ_n) converges to the blue dot. (b) A trajectory of BW3 converging towards the blue dot, initialized at a (random) point in the parameter space.

4.3. BW3

Initialization. As in BW1 with (θ′(0), ψ′(0)) = (θ₀, ψ₀).

Execution. The species θ, ψ are identified with the species θ′, ψ′; that is, the primed species are substituted for the unprimed ones in table 1. The full ODE system as described in §3 is executed at once, without fixing any species concentrations, so that all species concentrations are dynamic.

Completion. The limit of (θ′(t), ψ′(t)) (if it exists) is denoted the equilibrium of BW3 with initial point (θ₀, ψ₀).
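A sketch of BW3 replaces θ, ψ by θ′, ψ′ inside every catalyst and integrates the whole system from a random positive (θ₀, ψ₀), recording the log-likelihood (2.1) along the trajectory. As before, this is our own illustration under stated assumptions: rate constants 1, h* = v* = first state, β_L held at equal concentrations, a fixed-step Runge-Kutta integrator, and an arbitrary seed, horizon and example sequence. Row sums of θ′ and ψ′ are conserved by construction; whether the trajectory ends at a positive or a boundary equilibrium depends on the data and the initial point.

```python
import numpy as np

HSTAR, VSTAR = 0, 0              # designated h* and v* (arbitrary choice)

def star_flow(x, c, star):
    """x_h -> x_star at rate c[star]*x_h and x_star -> x_h at rate c[h]*x_star (h != star)."""
    d = c * x[star] - c[star] * x
    d[star] = 0.0
    d[star] = -d.sum()
    return d

def rhs_bw3(y, pi, E):
    """Table 1 with theta, psi identified with theta', psi', so every species is dynamic."""
    theta, psi = y["theta_p"], y["psi_p"]
    al, be, ga, xi = y["alpha"], y["beta"], y["gamma"], y["xi"]
    L, H = al.shape
    pobs = psi @ E.T                              # pobs[h, l] = psi'_{h v_l}
    d = {key: np.zeros_like(val) for key, val in y.items()}
    d["alpha"][0] = star_flow(al[0], pi * pobs[:, 0], HSTAR)
    for l in range(L - 1):
        nxt = pobs[:, l + 1]
        d["alpha"][l + 1] = star_flow(al[l + 1], (al[l] @ theta) * nxt, HSTAR)
        d["beta"][l] = star_flow(be[l], theta @ (nxt * be[l + 1]), HSTAR)
        c = al[l][:, None] * theta * (nxt * be[l + 1])[None, :]
        d["xi"][l] = star_flow(xi[l].ravel(), c.ravel(), HSTAR * H + HSTAR).reshape(H, H)
    for l in range(L):
        d["gamma"][l] = star_flow(ga[l], al[l] * be[l], HSTAR)
    c_theta, c_psi = xi.sum(axis=0), ga.T @ E
    for h in range(H):
        d["theta_p"][h] = star_flow(theta[h], c_theta[h], HSTAR)
        d["psi_p"][h] = star_flow(psi[h], c_psi[h], VSTAR)
    return d

def rk4(f, y, dt, steps):
    """Fixed-step fourth-order Runge-Kutta on a dict of arrays."""
    for _ in range(steps):
        k1 = f(y)
        k2 = f({k: y[k] + 0.5 * dt * k1[k] for k in y})
        k3 = f({k: y[k] + 0.5 * dt * k2[k] for k in y})
        k4 = f({k: y[k] + dt * k3[k] for k in y})
        y = {k: y[k] + dt / 6 * (k1[k] + 2 * k2[k] + 2 * k3[k] + k4[k]) for k in y}
    return y

def loglik(obs, theta, psi, pi):
    """Log of the likelihood (2.1), via the forward recursion."""
    a = pi * psi[:, obs[0]]
    for v in obs[1:]:
        a = (a @ theta) * psi[:, v]
    return float(np.log(a.sum()))

rng = np.random.default_rng(0)                    # arbitrary seed
theta0 = rng.random((2, 2)) + 0.1
theta0 /= theta0.sum(axis=1, keepdims=True)
psi0 = rng.random((2, 2)) + 0.1
psi0 /= psi0.sum(axis=1, keepdims=True)
pi = np.array([0.5, 0.5])
obs = np.array([0, 1, 1, 0, 1, 0, 0, 1])
L, H, V = len(obs), 2, 2
E = np.eye(V)[obs]
y = {"alpha": np.full((L, H), 1 / H), "beta": np.full((L, H), 1 / H),
     "gamma": np.full((L, H), 1 / H), "xi": np.full((L - 1, H, H), 1 / H**2),
     "theta_p": theta0.copy(), "psi_p": psi0.copy()}
lls = [loglik(obs, y["theta_p"], y["psi_p"], pi)]
for _ in range(15):                               # 15 chunks of length t = 10
    y = rk4(lambda s: rhs_bw3(s, pi, E), y, dt=0.1, steps=100)
    lls.append(loglik(obs, y["theta_p"], y["psi_p"], pi))
print(np.round(lls, 4))
print(np.round(y["theta_p"], 3), np.round(y["psi_p"], 3))
```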

BW1 replicates the BW algorithm with one equilibrium update for each step in one iteration of the BW algorithm. In total, there are 4L steps in one iteration of the BW algorithm, equalling the number of subnetworks (table 1). BW2 combines all 4L steps in one iteration of the BW algorithm, corresponding to the simultaneous calculation of the forward and backward algorithm, the E-step and the M-step in one iteration. BW3 makes all updates simultaneously. BW3 is a feedback system where the concentrations of all species are adjusted continuously.

Proofs of the following statements are in appendix A.

Theorem 4.1. —

If the BW algorithm and BW1 are initiated at the same point $(θ_{0}, ψ_{0}) \in Θ_{0}$ , then their equilibria always exist and agree.

Theorem 4.2. —

If BW1 and BW2 are initiated at the same point $(θ_{0}, ψ_{0}) \in Θ_{0}$ , then their equilibria always exist and agree. Furthermore, this equilibrium is globally asymptotically stable for the BW2 dynamics and convergence is exponentially fast, subject to the invariant subspace defined by the initialization of θ′, ψ′: the sum of the entries of each row of θ′(t), respectively, ψ′(t), is one.

The two theorems imply that the initial conditions of the species other than (θ′, ψ′) are irrelevant, hence we need not be concerned with these. Furthermore, BW1 and BW2 compute the same parameter value as the BW algorithm provided $(θ_{0}, ψ_{0}) \in Θ_{0}$ . Theorems 4.1 and 4.2 might not be true if (θ₀, ψ₀) is at the boundary $Θ ∖ Θ_{0}$ ; see below for discussion. In particular, an equilibrium of BW3 at the boundary might not be an equilibrium of the BW algorithm.

Theorem 4.3. —

The sets of positive equilibria of BW1, BW2 and BW3 agree.

Any of the three algorithms as well as the BW algorithm might have several positive equilibria depending on the observed data and the initial point. In general, one should expect coexistence of boundary and positive equilibria [37,39].

Lemma 4.4. —

The solution to the ODE system of BW3 exists for all times and any initial condition. Furthermore, if $(θ^{'}(t), ψ^{'}(t)) \to (θ^{*}, ψ^{*}) \in Θ_{0}$ as t → ∞ for some initial condition, then x(t) → x* for some positive vector x*, where x(t) denotes the vector of concentrations of all species except the species θ′, ψ′.

If the initial point (θ₀, ψ₀) belongs to $Θ ∖ Θ_{0}$ , then the dynamics of BW1/BW2 and BW3 might differ. Imagine that one or more of the (time-dependent) reaction rates are set to zero in figure 3. This could happen in different ways. In figure 3b, the graph is broken into two, effectively replacing one conserved quantity with two conserved quantities, one for each subgraph (one graph has only one node, the other six). The equilibrium depends on how much ‘mass’ is allocated to each subgraph initially. The same is the case if two dead-end nodes are created, nodes from which no mass can flow out; see figure 3c. Also here, an additional conserved quantity is created. See lemma A.3 for further details.
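The effect of vanishing rates can be reproduced on a three-node star network, in the spirit of figure 3c. In the sketch below (our own toy example, with rates chosen for illustration and a hypothetical helper name), setting the rate of the two reactions into the middle node to zero turns the outer nodes into dead ends, so the equilibrium depends on how the initial mass was allocated. With all rates positive, the equilibrium is the same for every allocation with the same total.

```python
import numpy as np

def star_rhs(x, c, star=0):
    """x_h -> x_star at rate c[star]*x_h and x_star -> x_h at rate c[h]*x_star, for h != star."""
    d = c * x[star] - c[star] * x
    d[star] = 0.0
    d[star] = -d.sum()
    return d

def equilibrate(x, c, dt=0.05, steps=8000):
    """Forward Euler up to t = dt * steps."""
    for _ in range(steps):
        x = x + dt * star_rhs(x, c)
    return x

x_a = np.array([0.4, 0.6, 0.0])          # two allocations of the same total mass 1
x_b = np.array([0.4, 0.0, 0.6])
c_pos = np.array([0.5, 1.0, 1.0])        # all rates positive: strongly connected graph
c_dead = np.array([0.0, 1.0, 1.0])       # c[star] = 0: the outer nodes become dead ends

results = {}
for name, c in (("positive", c_pos), ("dead_end", c_dead)):
    results[name] = (equilibrate(x_a, c), equilibrate(x_b, c))
    print(name, np.round(results[name][0], 4), np.round(results[name][1], 4))
```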

Figure 3. — Each subnetwork in table 1 is a catalytic mono-molecular reaction network with time-dependent reaction rates. The designated species with indices h*, (h*, h*) or v*, depending on the subnetwork, correspond to the nodes in the middle of the reaction graphs. (a) If the reaction rates are all positive, then the reaction graph is strongly connected. (b) Three reaction rates are zero, breaking the graph into two connected components and creating one node (blue) from which all mass eventually disappears. (c) Two reaction rates are zero, creating two dead-end nodes (blue) from which mass cannot flow out. All mass eventually accumulates in the blue nodes.

To complete the story, discrepancies between BW1 and BW2 (and similarly BW3) might also arise if some reaction rates eventually become zero. We illustrate this with a simple example. Consider the two-species reaction network X₁ → X₂ with time-dependent rate κ(t) = e^{−t}, t ≥ 0, and conserved amount T = x₁(t) + x₂(t). Its solution is x₁(t) = x₁(0) exp(e^{−t} − 1) → x₁(0)/e as t → ∞, whereas the limiting reaction network with zero reaction rate has a constant solution. Thus, the limit of a trajectory of the reaction network with time-dependent rate agrees with a trajectory of the limiting reaction network only if the initial condition of the former is e ≈ 2.72 times that of the latter.
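The example can be checked numerically. The plain-Python sketch below integrates $\dot{x}_{1} = -e^{-t} x_{1}$ with a classical fourth-order Runge-Kutta scheme (our choice of integrator, step size and horizon t = 40) and compares the result with the closed form and with the limit x₁(0)/e.

```python
import math

def rk4_x1(x1_0, t_end=40.0, dt=1e-3):
    """Classical RK4 for dx1/dt = -exp(-t) * x1, i.e. X1 -> X2 with rate kappa(t) = e^{-t}."""
    def f(t, x):
        return -math.exp(-t) * x
    t, x = 0.0, x1_0
    for _ in range(int(round(t_end / dt))):
        k1 = f(t, x)
        k2 = f(t + dt / 2, x + dt / 2 * k1)
        k3 = f(t + dt / 2, x + dt / 2 * k2)
        k4 = f(t + dt, x + dt * k3)
        x += dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += dt
    return x

x1_0 = 1.0
x1_end = rk4_x1(x1_0)
closed_form = x1_0 * math.exp(math.exp(-40.0) - 1.0)   # x1(t) = x1(0) exp(e^{-t} - 1) at t = 40
print(x1_end, closed_form, x1_0 / math.e)
```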

5. Multiple sequences and unknown π

If there are multiple observed sequences, v₁, …, v_R, $v_{i} = (v_{i 1}, \dots, v_{i L_{i}})$ , i = 1, …, R, potentially of different length, then the forward and backward algorithms in the BW algorithm are replaced with R forward and backward algorithms, one for each sequence and initialized with the same values of (θ, ψ) and π [41]. Similarly, the E-step is replaced by R E-steps, one for each sequence. The main difference lies in the M-step, which is replaced by [41]

π_{g}^{'} = \frac{1}{R} \sum_{i = 1}^{R} γ_{i 1 g}, \quad θ_{g h}^{'} = \frac{\sum_{i = 1}^{R} \sum_{ℓ = 1}^{L_{i} - 1} ξ_{i ℓ g h}}{\sum_{i = 1}^{R} \sum_{ℓ = 1}^{L_{i} - 1} \sum_{f \in H} ξ_{i ℓ g f}}

and

ψ_{h w}^{'} = \frac{\sum_{i = 1}^{R} \sum_{ℓ = 1}^{L_{i}} γ_{i ℓ h} δ_{w, v_{i ℓ}}}{\sum_{i = 1}^{R} \sum_{ℓ = 1}^{L_{i}} γ_{i ℓ h}},

where π is now updated and the additional index i in γ_iℓh and ξ_iℓgh refers to the ith sequence; otherwise, the update is the same as in the one-sequence algorithm (if π is considered known, the update step for π is simply omitted). The forward and backward algorithms and the E-step are implemented as in the one-sequence reaction network, with a separate set of species for each observed sequence and common species for θ, ψ and π. The M-step might be implemented by the reactions

π_{g}^{'} + γ_{i 1 g^{*}} \to π_{g^{*}}^{'} + γ_{i 1 g^{*}} \qquad (5.1)

π_{g^{*}}^{'} + γ_{i 1 g} \to π_{g}^{'} + γ_{i 1 g} \qquad (5.2)

\begin{aligned} θ_{g h}^{'} + ξ_{i ℓ g h^{*}} & \to θ_{g h^{*}}^{'} + ξ_{i ℓ g h^{*}} \\ θ_{g h^{*}}^{'} + ξ_{i ℓ g h} & \to θ_{g h}^{'} + ξ_{i ℓ g h} \\ ψ_{h w}^{'} + γ_{i ℓ h} + E_{ℓ v^{*}}^{i} & \to ψ_{h v^{*}}^{'} + γ_{i ℓ h} + E_{ℓ v^{*}}^{i} \\ ψ_{h v^{*}}^{'} + γ_{i ℓ h} + E_{ℓ w}^{i} & \to ψ_{h w}^{'} + γ_{i ℓ h} + E_{ℓ w}^{i}, \end{aligned}

where $E_{ℓ w}^{i}$ is defined similarly to E_ℓw, but for the ith sequence alone. All these reactions take the same mono-molecular form as in the one-sequence algorithm, resulting in statements analogous to theorems 4.1–4.3 (with analogous proofs). The reaction networks for the R sequences are executed simultaneously, in any of the three ways BW1, BW2 and BW3.
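The pooled M-step for several sequences can be sketched as follows. The function names are hypothetical, π is updated as in the formulas above, and the per-sequence E-step is the same forward-backward computation as in the one-sequence case; the sequences and parameter values are illustrative assumptions. Because numerators and denominators are sums over sequences, duplicating every sequence leaves the update unchanged, which gives a simple check.

```python
import numpy as np

def e_step(obs, theta, psi, pi):
    """Forward, backward and E-step for one sequence: gamma (L x H), xi ((L-1) x H x H)."""
    obs = np.asarray(obs)
    L, H = len(obs), len(pi)
    alpha, beta = np.zeros((L, H)), np.ones((L, H))
    alpha[0] = pi * psi[:, obs[0]]
    for l in range(L - 1):
        alpha[l + 1] = (alpha[l] @ theta) * psi[:, obs[l + 1]]
    for l in range(L - 2, -1, -1):
        beta[l] = theta @ (psi[:, obs[l + 1]] * beta[l + 1])
    lik = alpha[-1].sum()
    gamma = alpha * beta / lik
    xi = alpha[:-1, :, None] * theta[None] * (psi[:, obs[1:]].T * beta[1:])[:, None, :] / lik
    return gamma, xi

def m_step_multi(seqs, theta, psi, pi):
    """Pooled M-step over R sequences (pi is updated); returns (pi', theta', psi')."""
    V = psi.shape[1]
    pi_num, th_num, psi_num = np.zeros_like(pi), np.zeros_like(theta), np.zeros_like(psi)
    for obs in seqs:
        gamma, xi = e_step(obs, theta, psi, pi)
        pi_num += gamma[0]                                  # gamma_{i1g}
        th_num += xi.sum(axis=0)                            # sum over l = 1, ..., L_i - 1
        psi_num += gamma.T @ np.eye(V)[np.asarray(obs)]     # sum_l gamma_{ilh} delta_{w, v_il}
    return (pi_num / len(seqs),
            th_num / th_num.sum(axis=1, keepdims=True),
            psi_num / psi_num.sum(axis=1, keepdims=True))

theta = np.array([[0.7, 0.3], [0.4, 0.6]])        # illustrative values (assumption)
psi = np.array([[0.6, 0.4], [0.3, 0.7]])
pi = np.array([0.5, 0.5])
seqs = [[0, 1, 1, 0, 1], [1, 1, 0], [0, 0, 1, 0]]  # R = 3 sequences of different lengths
pi_new, theta_new, psi_new = m_step_multi(seqs, theta, psi, pi)
print(np.round(pi_new, 4), np.round(theta_new, 4), np.round(psi_new, 4), sep="\n")
```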

6. Examples

To illustrate the performance of BW3 versus the BW algorithm, we simulated an observed sequence of length L = 100 from an HMM with two hidden states and two visible states v₁, v₂; see figure 4 and appendix A for details. Of the 100 symbols, 51 were v₁ and 49 were v₂. We then ran the two algorithms from different initial values, with π = (0.5, 0.5) fixed. In the first case, the two algorithms returned the same estimates (figure 4a), namely

(θ^{'}, ψ^{'}) = ([\begin{matrix} 0.5 & 0.5 \\ 0.5 & 0.5 \end{matrix}], [\begin{matrix} 0.51 & 0.49 \\ 0.51 & 0.49 \end{matrix}]),

far from the true values (see appendix A), while in the second case (figure 4b), the two algorithms returned markedly different values. For the BW algorithm, we obtained

(θ^{'}, ψ^{'}) = ([\begin{matrix} 0.150 & 0.850 \\ 0.998 & 0.002 \end{matrix}], [\begin{matrix} 0.642 & 0.358 \\ 0.356 & 0.644 \end{matrix}]),

with log-likelihood −68.5617, while for BW3, we obtained values on the boundary,

(θ^{'}, ψ^{'}) = ([\begin{matrix} 0.000 & 1.000 \\ 0.229 & 0.771 \end{matrix}], [\begin{matrix} 0.000 & 1.000 \\ 0.631 & 0.369 \end{matrix}]),

with log-likelihood −68.6750.

Figure 4. Dynamics of BW3 and the BW algorithm for an HMM with two hidden states and two visible states, for an observed sequence of length L = 100 and two different initializations. (a) Log-likelihood of the sequence for BW3 (left) compared with the BW algorithm (right), starting from equal initial parameter values. (b) Log-likelihood of the sequence for BW3 (left) compared with the BW algorithm (right), starting from random initial parameter values. In both cases, the log-likelihood is non-decreasing over time/iterations. See the main text and appendix A for details.
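The estimate in the first case can be checked by hand. With uniform θ and ψ and π = (0.5, 0.5), every hidden state is equally likely at every position, so γ_ℓh = 1/2 and ξ_ℓgh = 1/4. The M-step then returns a uniform θ′ and, for both hidden states, ψ′_hw equal to the relative frequency of w in the sequence. Because the two rows of ψ′ coincide, the hidden states remain indistinguishable and the same values are returned again. The minimal sketch below counts the symbols of the sequence listed in appendix A.5, re-encoded by hand as a digit string ('1' for v₁, '2' for v₂), and evaluates the log-likelihood at this fixed point. At this point the log-likelihood reduces to that of an i.i.d. model for the symbols.

```python
import math
from collections import Counter

# Sequence from appendix A.5 encoded as digits: '1' = v1, '2' = v2
# (six chunks of 17, 17, 17, 17, 17 and 15 symbols, following the printed rows).
SEQ = ("21221221211121112" "11222211221112122" "22111222121211222"
       "21111212121111112" "12212121212121122" "221221111112222")

counts = Counter(SEQ)
L = len(SEQ)

# With γ_{ℓh} = 1/2 for all ℓ and h, the M-step gives for both hidden states
# ψ'_{hw} = Σ_ℓ γ_{ℓh} δ_{w,v_ℓ} / Σ_ℓ γ_{ℓh} = (count of w) / L.
psi_row = [counts["1"] / L, counts["2"] / L]

# Identical rows of ψ' make the hidden path irrelevant, so the likelihood factorizes
# over positions, as for an i.i.d. model with symbol probabilities psi_row.
loglik = counts["1"] * math.log(psi_row[0]) + counts["2"] * math.log(psi_row[1])
print(f"L = {L}, v1: {counts['1']}, v2: {counts['2']}, psi row = {psi_row}, "
      f"log-likelihood = {loglik:.4f}")
```

The resulting row of ψ′ agrees with the estimate (0.51, 0.49) reported for the first case.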

As a second example, we study a small observed sequence v₂, v₁, v₂, v₁, v₂ of length 5, generated from an HMM with two hidden states and two visible states. With initial parameter values given in appendix A, the BW algorithm returns the boundary equilibrium

(θ^{'}, ψ^{'}) = ([\begin{matrix} 0 & 1 \\ 1 & 0 \end{matrix}], [\begin{matrix} 0 & 1 \\ 1 & 0 \end{matrix}]),

while the BW3 returns a different boundary equilibrium

(θ^{'}, ψ^{'}) = ([\begin{matrix} 0.366 & 0.634 \\ 1 & 0 \end{matrix}], [\begin{matrix} 0 & 1 \\ 1 & 0 \end{matrix}]) .

Both examples illustrate that different behaviours can arise at the boundary. In the second example, the BW3 equilibrium belongs to $Θ$. However, when initiated at that point, the BW algorithm returns the first equilibrium point. Hence, the BW3 equilibrium is not an equilibrium of the BW algorithm (since this equilibrium is not positive, theorem 4.3 is not contradicted).
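The last claim can be checked with a single BW iteration. Since ψ at the BW3 equilibrium is a permutation matrix, the sequence v₂, v₁, v₂, v₁, v₂ forces the hidden path. Every transition is then 1 → 2 or 2 → 1, so the M-step sets θ′ to the swap matrix. The following minimal sketch implements one BW iteration with π fixed at (0.6, 0.4) as in appendix A.5. It uses the standard scaled forward–backward recursion, not the authors' code. Symbols are 0-based (index 0 = v₁), and the rounded BW3 values reported above are the starting point.

```python
import numpy as np


def bw_step(pi, theta, psi, obs):
    """One Baum-Welch iteration with π held fixed (scaled forward-backward)."""
    obs = np.asarray(obs)
    L, H = len(obs), len(pi)
    alpha = np.zeros((L, H))
    beta = np.ones((L, H))
    scale = np.zeros(L)
    for l in range(L):
        prev = pi if l == 0 else alpha[l - 1] @ theta   # Σ_g α_{ℓ-1,g} θ_{gh}
        alpha[l] = prev * psi[:, obs[l]]
        scale[l] = alpha[l].sum()
        alpha[l] /= scale[l]
    for l in range(L - 2, -1, -1):
        beta[l] = theta @ (psi[:, obs[l + 1]] * beta[l + 1]) / scale[l + 1]
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    # ξ_{ℓgh} ∝ α_{ℓg} θ_{gh} ψ_{h v_{ℓ+1}} β_{ℓ+1,h}
    xi = alpha[:-1, :, None] * theta[None] * (psi[:, obs[1:]].T * beta[1:])[:, None, :]
    xi /= xi.sum(axis=(1, 2), keepdims=True)
    theta_new = xi.sum(axis=0)
    theta_new /= theta_new.sum(axis=1, keepdims=True)
    psi_new = gamma.T @ np.eye(psi.shape[1])[obs]       # Σ_ℓ γ_{ℓh} δ_{w,v_ℓ}
    psi_new /= psi_new.sum(axis=1, keepdims=True)
    return theta_new, psi_new, np.log(scale).sum()


obs = [1, 0, 1, 0, 1]                                   # v2, v1, v2, v1, v2
pi = np.array([0.6, 0.4])
theta_bw3 = np.array([[0.366, 0.634], [1.0, 0.0]])
psi_bw3 = np.array([[0.0, 1.0], [1.0, 0.0]])

theta1, psi1, _ = bw_step(pi, theta_bw3, psi_bw3, obs)  # one BW iteration
theta2, psi2, _ = bw_step(pi, theta1, psi1, obs)        # and one more
print(theta1, psi1, sep="\n")
```

In this sketch, the first iteration already reaches the BW boundary equilibrium, and a second iteration leaves it unchanged.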

We further study an example in which the initial probabilities are also learned, by adding reactions according to equations (5.1)–(5.2). Consider a casino game that depends on the outcome of a three-sided die. A fair die has equal probabilities for each outcome [42]. However, the casino can switch to an unfair die with unequal outcome probabilities. The player observes only the sequence of die outcomes. An HMM with two hidden casino states (Honest and Dishonest) and three visible states (the die outcomes) can be used to predict whether and when the casino is dishonest [42]. We use the following HMM:

(π^{'}, θ^{'}, ψ^{'}) = ([\begin{matrix} 1.0 \\ 0.0 \end{matrix}], [\begin{matrix} 0.95 & 0.05 \\ 0.25 & 0.75 \end{matrix}], [\begin{matrix} 0.34 & 0.33 & 0.33 \\ 0.01 & 0.01 & 0.98 \end{matrix}])

to generate sample data of 300 rolls, where the Dishonest state has unequal outcome probabilities. The statistics of the generated sample are visualized in figure 5a. We then start from random initial HMM parameters, train them with both BW and BW3 on the training data (the first 150 rolls), and use the learned parameters to infer the hidden states on the test data (the next 150 rolls). BW converges to

(π^{'}, θ^{'}, ψ^{'}) = ([\begin{matrix} 0.0 \\ 1.0 \end{matrix}], [\begin{matrix} 0.9 & 0.01 \\ 0.03 & 0.96 \end{matrix}], [\begin{matrix} 0.03 & 0.07 & 0.90 \\ 0.37 & 0.35 & 0.28 \end{matrix}]),

with a log-likelihood of −147.9. BW3 converges to

(π^{'}, θ^{'}, ψ^{'}) = ([\begin{matrix} 0.81 \\ 0.19 \end{matrix}], [\begin{matrix} 0.48 & 0.52 \\ 0.43 & 0.57 \end{matrix}], [\begin{matrix} 0.28 & 0.23 & 0.49 \\ 0.27 & 0.35 & 0.38 \end{matrix}]),

with a log-likelihood of −157.4. The inference results are visualized in figure 5b; spikes indicate a switch to the hidden state Dishonest. We find that both BW and BW3 predict the hidden state sequence with high accuracy when compared with the hidden states of the HMM used to generate the sequence. More details can be found in appendix A.
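Data of this kind can be generated by sampling directly from the HMM. The following minimal sketch uses NumPy's random Generator with the generating parameters listed above and the 150/150 split between training and test data. Because the seed is arbitrary, the sample will not reproduce the one shown in figure 5a. As an assumption, state 0 is Honest and state 1 is Dishonest, following the row order of the matrices above.

```python
import numpy as np


def sample_hmm(pi, theta, psi, length, rng):
    """Draw a hidden-state path and an observation sequence from a discrete HMM."""
    n_hidden, n_visible = psi.shape
    states = np.empty(length, dtype=int)
    obs = np.empty(length, dtype=int)
    states[0] = rng.choice(n_hidden, p=pi)
    for l in range(length):
        if l > 0:
            states[l] = rng.choice(n_hidden, p=theta[states[l - 1]])
        obs[l] = rng.choice(n_visible, p=psi[states[l]])
    return states, obs


pi = np.array([1.0, 0.0])                                  # start Honest
theta = np.array([[0.95, 0.05], [0.25, 0.75]])
psi = np.array([[0.34, 0.33, 0.33], [0.01, 0.01, 0.98]])   # rows: Honest, Dishonest

rng = np.random.default_rng(2023)                          # arbitrary seed
states, rolls = sample_hmm(pi, theta, psi, 300, rng)
train_rolls, test_rolls = rolls[:150], rolls[150:]
test_states = states[150:]                                 # ground truth for scoring inference
```

Keeping the sampled hidden path alongside the rolls is what allows the inferred states on the test data to be compared with the generating states.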

7. Discussion

Some machine learning algorithms, such as gradient descent, are based on continuous-time dynamical systems, while others, such as message passing, appear essentially discrete. Before our work, the BW algorithm fell into the second category: performing the different steps in order was seen as an important part of the algorithm. Here, we have presented a continuous-time dynamical system based on the BW algorithm that implements HMM learning. Our work shows that the design space for BW algorithm implementations is larger than was previously known. Specifically, the steps need not be run to completion, and need not be run in order. This reduces the synchronization burden for distributed implementations of such extended BW algorithms. As a result, our algorithm can be implemented on a distributed network of edge devices, and in this manner serves as a federated learning scheme. Our work shows that reaction networks are a natural language for designing machine learning algorithms when data and computation must be handled in a distributed manner with minimal synchronization overhead.

Our implementation of this algorithm is given explicitly in the form of a chemical reaction network. This opens the possibility of molecular implementations of the algorithm. If implemented in an artificial cell, it might provide the cell with the ability to sample possible realities and act according to these imaginings. This is important because the world of molecules is a noisy world. Obtaining exquisite control over a large number of molecules, which is one of the main goals of nanotechnology, requires algorithms that are robust to such noise. Denoising or error correction algorithms are essentially statistical algorithms of the kind that we have implemented here. Thus, not only does our work point to future technological directions in nanotechnology; we also believe reaction network implementations of algorithms of this kind are inevitable when attempting to exquisitely control large ensembles of molecules.

One of the big challenges in biology has been the immense complexity of living cells. On the one hand, cells are capable of behaviours of incredible sophistication; on the other hand, our imagination of their workings has been limited to the very low level of abstraction described by systems biology. Understanding the behaviour of cells will require the invention of layers of abstraction above that of systems biology, and these layers will have to explain the algorithmic power of biochemical reaction networks. Biochemical reaction networks in living systems are known to perform inference. There may be an opportunity to understand them from the vantage point of our work, which gives concrete schemes by which reaction networks can perform inference. Our design reproduces certain characteristics found in biochemical reaction networks, such as the ubiquitous presence of enzymes and futile cycles, for example the phosphorylation and dephosphorylation of the same site on a protein. This has led to a novel idea for explaining futile cycles: their purpose is to achieve probabilistic inference. For example, reactions of the form $A \to A^{*}$, catalysed in the forward direction by a kinase and in the backward direction by a phosphatase, are shown here to be the paradigmatic form of reactions needed to carry out probabilistic inference. The work here suggests a research programme to pursue this idea to its logical conclusions, but this is beyond the scope of the current paper.

Appendix A

A.1. The ODE system of the Baum–Welch reaction network

With mass-action kinetics the ODE system of the reaction network in table 1 becomes

{\dot{α}}_{1 h} = α_{1 h^{*}} π_{h} ψ_{h v_{1}} - α_{1 h} π_{h^{*}} ψ_{h^{*} v_{1}}, \tag{A 1}

{\dot{α}}_{ℓ h} = α_{ℓ h^{*}} \sum_{g \in H} α_{ℓ - 1, g} θ_{g h} ψ_{h v_{ℓ}} - α_{ℓ h} \sum_{g \in H} α_{ℓ - 1, g} θ_{g h^{*}} ψ_{h^{*} v_{ℓ}}, \tag{A 2}

{\dot{β}}_{L h} = 0, \tag{A 3}

{\dot{β}}_{ℓ h} = β_{ℓ h^{*}} \sum_{g \in H} θ_{h g} ψ_{g v_{ℓ + 1}} β_{ℓ + 1, g} - β_{ℓ h} \sum_{g \in H} θ_{h^{*} g} ψ_{g v_{ℓ + 1}} β_{ℓ + 1, g}, \tag{A 4}

{\dot{γ}}_{ℓ h} = γ_{ℓ h^{*}} α_{ℓ h} β_{ℓ h} - γ_{ℓ h} α_{ℓ h^{*}} β_{ℓ h^{*}}, \tag{A 5}

\begin{aligned} {\dot{ξ}}_{ℓ g h} = {} & ξ_{ℓ h^{*} h^{*}} α_{ℓ g} θ_{g h} ψ_{h v_{ℓ + 1}} β_{ℓ + 1, h} \\ & - ξ_{ℓ g h} α_{ℓ h^{*}} θ_{h^{*} h^{*}} ψ_{h^{*} v_{ℓ + 1}} β_{ℓ + 1, h^{*}}, \end{aligned} \tag{A 6}

{\dot{θ}}_{g h}^{'} = θ_{g h^{*}}^{'} \sum_{ℓ = 1}^{L - 1} ξ_{ℓ g h} - θ_{g h}^{'} \sum_{ℓ = 1}^{L - 1} ξ_{ℓ g h^{*}} \tag{A 7}

and

{\dot{ψ}}_{h w}^{'} = ψ_{h v^{*}}^{'} \sum_{ℓ = 1}^{L} γ_{ℓ h} δ_{w, v_{ℓ}} - ψ_{h w}^{'} \sum_{ℓ = 1}^{L} γ_{ℓ h} δ_{v^{*}, v_{ℓ}}, \tag{A 8}

where h* ∈ H and v* ∈ V are fixed states, $h \neq h^{*}$ in (A 1)–(A 5), $(g, h) \neq (h^{*}, h^{*})$ in (A 6), g ∈ H, $h \neq h^{*}$ in (A 7), and h ∈ H, $w \neq v^{*}$ in (A 8). In (A 2), ℓ = 2, …, L; in (A 4) and (A 6), ℓ = 1, …, L − 1; and in (A 5), ℓ = 1, …, L. The equations for the remaining species follow from ${\dot{α}}_{1 h^{*}} = - \sum_{h \neq h^{*}} {\dot{α}}_{1 h}$, and similarly for the other species, reflecting that the sum $A_{ℓ} = \sum_{h} α_{ℓ h} (t)$ (and its analogues for the other species) is conserved through time. The explicit dependence on time is suppressed in (A 1)–(A 8). The species E_ℓw are left out for convenience.
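Equations (A 1)–(A 2) can be integrated directly to see the forward part of the network settle on the normalized forward variables. This minimal sketch integrates all α layers at once, in the spirit of BW2, so that each layer is driven by the current, not yet equilibrated, values of the previous layer. It then compares the end state with the recursion of appendix A.2 with A_ℓ = 1. As assumptions, the parameters reuse the generating values of appendix A.5 purely for illustration, the observation sequence is a short hypothetical one, and h* is the first state.

```python
import numpy as np
from scipy.integrate import odeint

# Illustrative parameters (generating values of appendix A.5) and a short
# hypothetical observation sequence; symbols are 0-based (0 = v1, 1 = v2).
pi = np.array([0.5, 0.5])
theta = np.array([[0.2, 0.8], [0.7, 0.3]])
psi = np.array([[0.75, 0.25], [0.3, 0.7]])
obs = [1, 0, 1, 1, 0]
L, H = len(obs), len(pi)
H_STAR = 0  # reference state h* (assumption: the first state)


def alpha_rhs(y, t):
    """Right-hand side of (A 1)-(A 2), integrating all layers ℓ simultaneously."""
    a = y.reshape(L, H)
    da = np.zeros_like(a)
    for l in range(L):
        # Catalytic weight of state h: π_h ψ_{h v_1} for ℓ = 1, otherwise
        # Σ_g α_{ℓ-1,g} θ_{gh} ψ_{h v_ℓ} using the current values of layer ℓ-1.
        w = pi * psi[:, obs[0]] if l == 0 else (a[l - 1] @ theta) * psi[:, obs[l]]
        for h in range(H):
            if h != H_STAR:
                net = a[l, H_STAR] * w[h] - a[l, h] * w[H_STAR]
                da[l, h] += net
                da[l, H_STAR] -= net
    return da.ravel()


y0 = np.full(L * H, 1.0 / H)           # every layer sums to A_ℓ = 1
t = np.linspace(0.0, 200.0, 100)       # 100 output points
alpha_ode = odeint(alpha_rhs, y0, t)[-1].reshape(L, H)

# Normalized forward recursion of appendix A.2 with A_ℓ = 1, for comparison.
alpha_rec = np.zeros((L, H))
for l in range(L):
    prev = pi if l == 0 else alpha_rec[l - 1] @ theta
    alpha_rec[l] = prev * psi[:, obs[l]]
    alpha_rec[l] /= alpha_rec[l].sum()
print(np.abs(alpha_ode - alpha_rec).max())
```

The final time of 200 is chosen generously so that even the slowest layer, whose relaxation rate here is at least about 0.25, has converged well within odeint's default tolerances.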

A.2. Reformulation of the Baum–Welch algorithm

The BW algorithm can be formulated in an equivalent way that exposes the relationship to the dynamics of the BW reaction network; see the electronic supplementary material for proof and details. Define α_ℓh, β_ℓh, h ∈ H, ℓ = 1, …, L, recursively by

α_{1 h} = \frac{A_{1} π_{h} ψ_{h v_{1}}}{\sum_{g \in H} π_{g} ψ_{g v_{1}}}, α_{ℓ h} = \frac{A_{ℓ} \sum_{g \in H} α_{ℓ - 1, g} θ_{g h} ψ_{h v_{ℓ}}}{\sum_{f \in H} \sum_{g \in H} α_{ℓ - 1, f} θ_{f g} ψ_{g v_{ℓ}}}

and

β_{L h} = \frac{B_{L}}{\# H}, \quad β_{ℓ h} = \frac{B_{ℓ} \sum_{g \in H} θ_{h g} ψ_{g v_{ℓ + 1}} β_{ℓ + 1, g}}{\sum_{f \in H} \sum_{g \in H} θ_{f g} ψ_{g v_{ℓ + 1}} β_{ℓ + 1, g}},

where A_ℓ, B_ℓ are arbitrary positive constants, ℓ = 1, …, L. These two recursions correspond to the forward and backward algorithms of the standard BW algorithm, respectively; the essential difference is that α_ℓh and β_ℓh are normalized so that, summed over h ∈ H, they equal A_ℓ and B_ℓ, respectively. The E-step and the M-step are identical to the corresponding steps of the standard BW algorithm.
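The reformulated recursions translate directly into code. The following minimal sketch implements one BW iteration with π fixed and A_ℓ = B_ℓ = 1, so that α_ℓ· and β_ℓ· each sum to one, followed by the standard E-step and M-step. It is applied to the sequence of appendix A.5, starting from the random initial values (2) listed there, and records the log-likelihood at each iteration. This is an illustrative re-implementation, not the hmmlearn code used for the examples. The sequence string and the iteration count are choices made here.

```python
import numpy as np


def bw_iteration(pi, theta, psi, obs):
    """One BW iteration (π fixed) via the normalized recursions with A_ℓ = B_ℓ = 1.

    Returns (θ', ψ') and the log-likelihood of the input parameters (θ, ψ).
    """
    obs = np.asarray(obs)
    L, H = len(obs), len(pi)
    alpha, beta = np.zeros((L, H)), np.zeros((L, H))
    loglik = 0.0
    for l in range(L):                                   # forward: α_ℓ sums to A_ℓ = 1
        prev = pi if l == 0 else alpha[l - 1] @ theta
        a = prev * psi[:, obs[l]]
        loglik += np.log(a.sum())                        # log P(v_ℓ | v_1, ..., v_{ℓ-1})
        alpha[l] = a / a.sum()
    beta[L - 1] = 1.0 / H                                # β_{Lh} = B_L / #H
    for l in range(L - 2, -1, -1):                       # backward: β_ℓ sums to B_ℓ = 1
        b = theta @ (psi[:, obs[l + 1]] * beta[l + 1])
        beta[l] = b / b.sum()
    gamma = alpha * beta                                 # E-step
    gamma /= gamma.sum(axis=1, keepdims=True)
    xi = alpha[:-1, :, None] * theta[None] * (psi[:, obs[1:]].T * beta[1:])[:, None, :]
    xi /= xi.sum(axis=(1, 2), keepdims=True)
    theta_new = xi.sum(axis=0)                           # M-step
    theta_new /= theta_new.sum(axis=1, keepdims=True)
    psi_new = gamma.T @ np.eye(psi.shape[1])[obs]
    psi_new /= psi_new.sum(axis=1, keepdims=True)
    return theta_new, psi_new, loglik


SEQ = ("21221221211121112" "11222211221112122" "22111222121211222"
       "21111212121111112" "12212121212121122" "221221111112222")
obs = [int(c) - 1 for c in SEQ]                          # 0 = v1, 1 = v2
pi = np.array([0.5, 0.5])
theta = np.array([[0.774, 0.226], [0.427, 0.573]])       # initial values (2), appendix A.5
psi = np.array([[0.916, 0.084], [0.510, 0.490]])

logliks = []
for _ in range(200):
    theta, psi, ll = bw_iteration(pi, theta, psi, obs)
    logliks.append(ll)
print(logliks[0], logliks[-1])
```

Rescaling α_ℓ· and β_ℓ· by arbitrary positive constants does not change γ or ξ, because both are renormalized. This is why the choice of A_ℓ and B_ℓ is immaterial for the E-step and M-step.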

In this formulation, α_ℓh, β_ℓh, h ∈ H, in the BW algorithm can be seen as the equilibrium values of the corresponding reaction networks $R_{ℓ}^{α}, R_{ℓ}^{β}$; see lemma A.3. Since γ_ℓh sums to one over h, ξ_ℓgh sums to one over (g, h), and θ_gh sums to one over h, a similar interpretation applies to these quantities. The same holds for ψ_hv, which sums to one over v.

A.3. Mono-molecular reaction networks

Properties of mono-molecular reaction networks are of particular importance for the proofs of the statements in the main text.

Consider a mono-molecular reaction network with n species, X₁, …, X_n, and reactions

X_{r_{i}} \xrightarrow{κ_{i} (t)} X_{p_{i}}, \quad i = 1, \dots, m,

where $r_{i} \neq p_{i}$ , r_i, p_i ∈ {1, …, n} and κ_i(t), i = 1, …, m, are mass-action time-dependent reaction rate constants, t ≥ 0. We assume κ_i(t) > 0 and let κ(t) = (κ₁(t), …, κ_m(t)) be the vector of reaction rate constants. Furthermore, we assume there is at most one reaction i such that (r_i, p_i) = (r, p) ∈ {1, …, n}². (If there is more than one reaction, the reaction rate constants are summed.)

The (non-autonomous) ODE system of the reaction network is

{\dot{x}}_{i} = - x_{i} \sum_{j : r_{j} = i} κ_{j} (t) + \sum_{j : p_{j} = i} κ_{j} (t) x_{r_{j}}, \quad i = 1, \dots, n .

Define the n × n matrix A(t) = (a_ij(t))_{i,j=1, …,n} by

a_{i j} (t) = κ_{k} (t) \text{ if there is } k \in \{1, \dots, m\} \text{ such that } (r_{k}, p_{k}) = (j, i), \text{ and } a_{i j} (t) = 0 \text{ otherwise } (i \neq j),

and

a_{i i} (t) = - \sum_{j : r_{j} = i} κ_{j} (t) .

Then, the ODE system can be written as

\dot{x} = A (t) x . \tag{A 9}

The matrix A(t) is a Laplacian matrix: its column sums are zero, its off-diagonal entries are non-negative and its diagonal entries are non-positive. Consequently, the non-zero eigenvalues of A(t) have negative real parts [44,45].

The Laplacian graph G corresponding to A(t) is independent of time as κ(t) is positive for t ≥ 0 by assumption. Let λ be the number of connected components of G and τ the number of maximal strongly connected subgraphs. Always τ ≥ λ. In figure 3a: τ = λ = 1, figure 3b: τ = λ = 2, and figure 3c: τ = 2, λ = 1. Each connected component of the Laplacian graph gives rise to a conserved quantity, namely the sum of the variables in the connected component:

T_{j} = \sum_{i \in G_{j}} x_{i}, T_{j} \geq 0,

where G_j is the jth connected component, j = 1, …, λ. The non-negative vector $T = (T_{1}, \dots, T_{λ})$ defines an invariant subspace of (A 9). Let $z_{j} \in \mathbb{R}^{n}$ be the vector with ith component 1 if i ∈ G_j and 0 otherwise. Then z_j is a left kernel vector of A(t), that is, z_jA(t) = 0, and hence a linear first integral of (A 9).
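The construction of A(t) and its conserved quantities can be checked numerically for a fixed choice of rates. This minimal sketch builds A from a list of mono-molecular reactions, following the definition of a_ij and a_ii above. It then verifies that the column sums vanish, that the component indicators z_j are left kernel vectors, and that the rank and eigenvalues behave as described. The network and its rate constants are hypothetical. Both connected components are strongly connected, so τ = λ = 2.

```python
import numpy as np


def laplacian(n, reactions):
    """Matrix A with a_ij = κ_k for a reaction X_j -> X_i and a_ii = -(total outflow rate of X_i).

    `reactions` is a list of (r, p, kappa) triples for X_r -> X_p (0-based indices);
    repeated (r, p) pairs have their rate constants summed.
    """
    A = np.zeros((n, n))
    for r, p, kappa in reactions:
        A[p, r] += kappa
        A[r, r] -= kappa
    return A


# Hypothetical network: X0 <-> X1 <-> X2 and, separately, X3 <-> X4.
reactions = [(0, 1, 2.0), (1, 0, 1.0), (1, 2, 0.5), (2, 1, 3.0),
             (3, 4, 1.0), (4, 3, 4.0)]
A = laplacian(5, reactions)

z = np.array([[1, 1, 1, 0, 0],        # z_1: indicator of the first component
              [0, 0, 0, 1, 1]])       # z_2: indicator of the second component
eigenvalues = np.linalg.eigvals(A)
print("column sums:", A.sum(axis=0))
print("z A =", z @ A)                 # left kernel vectors, i.e. conserved totals T_j
print("rank:", np.linalg.matrix_rank(A), "eigenvalues:", np.round(eigenvalues, 4))
```

The rank n − λ = 3 and the two zero eigenvalues, one per conserved total T_j, are what lemma A.1(i)–(ii) predicts for this case.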

Lemma A.1. —

The following hold:

(i) The rank of A(t) equals n − λ if and only if τ = λ.

(ii) If τ = λ, then $z_{1}, \dots, z_{λ}$ is a basis of the left kernel of A(t). Any linear first integral is a linear combination of $z_{1}, \dots, z_{λ}$.

(iii) Assume A(t) = A is independent of time. If τ = λ, then there is a unique non-negative, exponentially and globally stable equilibrium within each invariant subspace given by T ≥ 0. The equilibrium is positive if and only if the connected components are strongly connected and T > 0.

Lemma A.2. —

Assume τ = λ and that κ(t), t ≥ 0, converges exponentially fast towards $κ = (κ_{1}, \dots, κ_{m}) \in \mathbb{R}_{> 0}^{m}$ as t → ∞, i.e. there exist γ₁ > 0 and K₁ > 0 such that

$\| κ (t) - κ \| \leq K_{1} e^{- γ_{1} t}$ for t ≥ 0.

Let A be the matrix A(t) with κ inserted for κ(t). Then, provided T > 0, any solution to $\dot{x} = A (t) x$ converges exponentially fast towards the unique globally stable equilibrium $x^{*} \in \mathbb{R}_{> 0}^{n}$ of the ODE system $\dot{x} = A x$ within the relevant invariant subspace. That is, there exist γ₂ > 0 and K₂ > 0 such that

$\| x (t) - x^{*} \| \leq K_{2} e^{- γ_{2} t}$ for t ≥ 0.
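Lemma A.2 can be illustrated on a small example. This minimal sketch uses the star network of lemma A.3 with n = 3 and hypothetical limiting rates κ. The rates are perturbed by 3e^{−t}, so that ‖κ(t) − κ‖ = 3√3 e^{−t}, which means K₁ = 3√3 and γ₁ = 1. The non-autonomous system is integrated with odeint, and the distance to the equilibrium of the limiting autonomous system is tracked.

```python
import numpy as np
from scipy.integrate import odeint

# Limiting rates κ for the star network of lemma A.3 with n = 3 (hypothetical values):
# X_i -> X_3 at rate κ_3 and X_3 -> X_i at rate κ_i, i = 1, 2.
kappa = np.array([1.0, 2.0, 0.5])


def kappa_t(t):
    """Time-dependent rates with ‖κ(t) − κ‖ = 3√3 e^{−t}."""
    return kappa + 3.0 * np.exp(-t)


def rhs(x, t):
    k = kappa_t(t)
    n = len(x)
    dx = np.zeros(n)
    for i in range(n - 1):
        net = k[i] * x[n - 1] - k[n - 1] * x[i]   # (X_n -> X_i) minus (X_i -> X_n)
        dx[i] += net
        dx[n - 1] -= net
    return dx


x0 = np.array([0.7, 0.2, 0.1])                    # T = 1
t = np.linspace(0.0, 40.0, 100)
traj = odeint(rhs, x0, t)
x_star = x0.sum() * kappa / kappa.sum()           # equilibrium of the limiting system
err = np.linalg.norm(traj - x_star, axis=1)
print(err[::20])                                  # distance to x* at a few time points
```

For these rates the limiting system relaxes at rate at least κ₃ = 0.5, so the error should shrink roughly like e^{−t/2} once the perturbation has died out.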

Lemma A.3. —

Consider a mono-molecular reaction network with reactions

$X_{i} \underset{κ_{i}}{\overset{κ_{n}}{\rightleftharpoons}} X_{n}$, for i = 1, …, n − 1,

such that κ_i > 0 for i = 1, …, n (then the Laplacian graph is as in figure 3a). In particular, τ = λ = 1. Then, the unique positive equilibrium for T > 0 is

$x^{*} = (x_{1}^{*}, \dots, x_{n}^{*})$ with $x_{i}^{*} = \frac{T κ_{i}}{\sum_{j = 1}^{n} κ_{j}}$ for i = 1, …, n.
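The closed form in lemma A.3 can be verified directly. For X_i with i < n, the equilibrium condition is κ_n x_i = κ_i x_n, so x_i ∝ κ_i. The following minimal sketch builds the Laplacian of the star network for arbitrary, hypothetical positive rates and checks that the formula gives a positive kernel vector with total T.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T = 5, 2.0
kappa = rng.uniform(0.1, 2.0, size=n)          # arbitrary positive κ_1, ..., κ_n

# Laplacian of the star network: X_i -> X_n at rate κ_n and X_n -> X_i at rate κ_i.
A = np.zeros((n, n))
for i in range(n - 1):
    A[n - 1, i] += kappa[n - 1]                # X_i -> X_n
    A[i, i] -= kappa[n - 1]
    A[i, n - 1] += kappa[i]                    # X_n -> X_i
    A[n - 1, n - 1] -= kappa[i]

x_star = T * kappa / kappa.sum()               # closed form of lemma A.3
print("A x* =", A @ x_star)                    # zero up to rounding
```

Since the graph is strongly connected, the kernel of A is one-dimensional (rank n − 1), so this positive kernel vector with total T is the unique equilibrium in its invariant subspace.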

A.4. Proofs

We provide proofs of theorems 4.1–4.3 and lemma 4.4. The remaining statements are proven in the electronic supplementary material, except lemma A.3 which is straightforward. We assume π > 0, L ≥ 2, and note that similar arguments apply if π ≥ 0 and L ≥ 3.

A.4.1. Proof of theorem 4.1

It is sufficient to demonstrate that (θ_n, ψ_n), n ≥ 0, are the same and positive after each iteration of the BW algorithm and BW1. For this, assume θ, ψ (and π) are positive. Then the subnetwork $R_{1}^{α}$ is mono-molecular with species α_1h, h ∈ H, and fixed positive reaction rates. The Laplacian graph is strongly connected and τ = λ = 1. Lemma A.1(iii) gives global stability of the unique positive equilibrium for the concentrations of the species α_1h, h ∈ H, identical to that of the BW algorithm (compare lemma A.3). This argument is repeated inductively for each subnetwork (in the order listed for BW1). Specifically, for the kth subnetwork (k = 2, …, 4L), the equilibria of the k′th subnetworks, k′ < k, are positive by the induction hypothesis. Hence, the reaction rates for the kth subnetwork are positive and fixed, given by the equilibrium values of the subnetworks k′ < k and θ, ψ, so the Laplacian graph is strongly connected with τ = λ = 1. Using lemma A.1(iii) completes the argument. In particular, (θ′, ψ′) is positive and the same for the BW algorithm and BW1.

A.4.2. Proof of theorem 4.2

The proof is similar to that of theorem 4.1. The layered structure of BW2 implies that the equilibrium of the species in the kth subnetwork is unaffected by the presence of the k′th subnetwork, k′ > k. Assume θ, ψ (and π) are positive. Then the subnetwork $R_{1}^{α}$ is mono-molecular with species α_1h, h ∈ H, and fixed positive reaction rates. The Laplacian graph is strongly connected and τ = λ = 1. Lemma A.1(iii) gives exponential stability of the unique positive equilibrium for the concentrations of the species α_1h, h ∈ H. By lemma A.1(iii), it is the same as that of BW1. Consider next the subnetwork $R_{2}^{α}$ . It is mono-molecular with positive time-dependent reaction rates, given by the concentrations of the species α_1h, h ∈ H, at time t ≥ 0 and θ, ψ. Hence, the Laplacian graph is strongly connected and τ = λ = 1. Applying lemma A.2 yields exponential convergence towards the unique positive equilibrium for the concentrations of the species α_2h, h ∈ H. This equilibrium is the same as that of BW1 for $R_{2}^{α}$ (lemma A.2). The theorem now follows by repeated application of the above procedure for all subnetworks.

A.4.3. Proof of theorem 4.3

If the catalyst species concentrations of a subnetwork at equilibrium are positive, then the unique equilibrium concentrations of the non-catalyst species are likewise positive (lemma A.3). Consequently, if θ, ψ (and π) are positive, then the unique equilibrium concentrations of the species α_{1 h}, h ∈ H, in $R_{1}^{α}$ are positive. Inductively, the equilibrium concentrations in any subnetwork are positive, if θ, ψ (and π) are positive.

For BW1 and BW2, if $(θ, ψ) \in Θ_{0}$ is an equilibrium, then (θ′, ψ′) is well-defined and (θ′, ψ′) = (θ, ψ). Consequently, the species concentrations at equilibrium solve the same equations in the three cases, thus the sets of equilibria are the same. The same reasoning gives the uniqueness statement. Theorem 4.1 gives uniqueness for the BW algorithm.

A.4.4. Proof of lemma 4.4

The first part follows by standard arguments as the trajectories are confined to a compact space. The second part follows from [46, theorem 1.2] and lemma A.1(iii).

A.5. Simulated examples

We consider an HMM with two hidden states and two visible states v₁, v₂, and generate a random sequence of length L = 100 from π = (0.5, 0.5) (fixed throughout) and

(θ, ψ) = ([\begin{matrix} 0.2 & 0.8 \\ 0.7 & 0.3 \end{matrix}], [\begin{matrix} 0.75 & 0.25 \\ 0.3 & 0.7 \end{matrix}]) .

The generated sequence is

\begin{aligned} (v_{2}, v_{1}, v_{2}, v_{2}, v_{1}, v_{2}, v_{2}, v_{1}, v_{2}, v_{1}, v_{1}, v_{1}, v_{2}, v_{1}, v_{1}, v_{1}, v_{2}, \\ v_{1}, v_{1}, v_{2}, v_{2}, v_{2}, v_{2}, v_{1}, v_{1}, v_{2}, v_{2}, v_{1}, v_{1}, v_{1}, v_{2}, v_{1}, v_{2}, v_{2}, \\ v_{2}, v_{2}, v_{1}, v_{1}, v_{1}, v_{2}, v_{2}, v_{2}, v_{1}, v_{2}, v_{1}, v_{2}, v_{1}, v_{1}, v_{2}, v_{2}, v_{2}, \\ v_{2}, v_{1}, v_{1}, v_{1}, v_{1}, v_{2}, v_{1}, v_{2}, v_{1}, v_{2}, v_{1}, v_{1}, v_{1}, v_{1}, v_{1}, v_{1}, v_{2}, \\ v_{1}, v_{2}, v_{2}, v_{1}, v_{2}, v_{1}, v_{2}, v_{1}, v_{2}, v_{1}, v_{2}, v_{1}, v_{2}, v_{1}, v_{1}, v_{2}, v_{2}, \\ v_{2}, v_{2}, v_{1}, v_{2}, v_{2}, v_{1}, v_{1}, v_{1}, v_{1}, v_{1}, v_{1}, v_{2}, v_{2}, v_{2}, v_{2}) . \end{aligned}

The log-likelihood of the sequence is −68.7955. We trained the HMM using BW3 and the BW algorithm, starting from two different initial values: (1) equal transition and emission probabilities

(θ^{'}, ψ^{'}) = ([\begin{matrix} 0.5 & 0.5 \\ 0.5 & 0.5 \end{matrix}], [\begin{matrix} 0.5 & 0.5 \\ 0.5 & 0.5 \end{matrix}])

and (2) randomly chosen values

(θ^{'}, ψ^{'}) = ([\begin{matrix} 0.774 & 0.226 \\ 0.427 & 0.573 \end{matrix}], [\begin{matrix} 0.916 & 0.084 \\ 0.510 & 0.490 \end{matrix}]) .

The other species in the reactions of BW3 were initialized with (1) equal concentrations and (2) concentrations obtained from one step of the BW algorithm followed by appropriate normalization.

In the second example, the initial values are π = (0.6, 0.4) (fixed), and initial transition and emission probabilities

(θ^{'}, ψ^{'}) = ([\begin{matrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{matrix}], [\begin{matrix} 0.6 & 0.4 \\ 0.6 & 0.4 \end{matrix}]) .

In the final example of a dishonest casino, the randomly chosen initial values are π = (0.83, 0.17), and initial transition and emission probabilities

(θ^{'}, ψ^{'}) = ([\begin{matrix} 0.1 & 0.9 \\ 0.02 & 0.98 \end{matrix}], [\begin{matrix} 0.12 & 0.08 & 0.80 \\ 0.07 & 0.89 & 0.04 \end{matrix}]) .

For all of the examples, BW3 was simulated with the odeint function of the SciPy library (scipy.integrate), with output at 100 time points from time 0 to the final time. For the BW algorithm, the hmmlearn package (a scikit-learn-compatible library) was used.

Data accessibility

Code used for the simulation of examples in this paper may be found at: https://github.com/GeekyPeas/Chemical-Baum-Welch-Algorithm.

The data are provided in electronic supplementary material [43].

Authors' contributions

C.W.: conceptualization, formal analysis, funding acquisition, methodology, writing—original draft; A.B.: conceptualization, methodology, software, validation, writing—review and editing; A.S.: conceptualization, funding acquisition, methodology, software, validation, writing—review and editing; M.G.: conceptualization, writing—original draft.

All authors gave final approval for publication and agreed to be held accountable for the work performed therein.

Conflict of interest declaration

We declare we have no competing interests.

Funding

The work presented in this article is supported by Novo Nordisk Foundation, grant no. NNF19OC0058354. A.B. thanks Bharti Center for Communication, IIT Bombay. A.S. thanks the INSPIRE DST Scholarship for Higher Education.

References

1. Soloveichik D, Seelig G, Winfree E. 2010. DNA as a universal substrate for chemical kinetics. Proc. Natl Acad. Sci. USA 107, 5393-5398. (doi:10.1073/pnas.0909380107)
2. Srinivas N. 2015. Programming chemical kinetics: engineering dynamic reaction networks with DNA strand displacement. PhD thesis, California Institute of Technology.
3. Qian L, Soloveichik D, Winfree E. 2011. Efficient Turing-universal computation with DNA polymers. In DNA computing and molecular programming (eds Sakakibara Y, Mi Y). Lecture Notes in Computer Science, vol. 6518, pp. 123-140. Berlin, Germany: Springer. (doi:10.1007/978-3-642-18305-8_12)
4. Cardelli L. 2011. Strand algebras for DNA computing. Nat. Comput. 10, 407-428. (doi:10.1007/s11047-010-9236-7)
5. Lakin MR, Youssef S, Cardelli L, Phillips A. 2011. Abstractions for DNA circuit design. J. R. Soc. Interface 9, 470-486. (doi:10.1098/rsif.2011.0343)
6. Cardelli L. 2013. Two-domain DNA strand displacement. Math. Struct. Comp. Sci. 23, 247-271. (doi:10.1017/S0960129512000102)
7. Chen YJ, Dalchau N, Srinivas N, Phillips A, Cardelli L, Soloveichik D, Seelig G. 2013. Programmable chemical controllers made from DNA. Nat. Nanotech. 8, 755-762. (doi:10.1038/nnano.2013.189)
8. Lakin MR, Stefanovic D, Phillips A. 2016. Modular verification of chemical reaction network encodings via serializability analysis. Theor. Comp. Sci. 632, 21-42. (doi:10.1016/j.tcs.2015.06.033)
9. Srinivas N, Parkin J, Seelig G, Winfree E, Soloveichik D. 2017. Enzyme-free nucleic acid dynamical systems. Science 358, eaal2052. (doi:10.1126/science.aal2052)
10. Cherry KM, Qian L. 2018. Scaling up molecular pattern recognition with DNA-based winner-take-all neural networks. Nature 559, 370-376. (doi:10.1038/s41586-018-0289-6)
11. Zechner C, Seelig G, Rullan M, Khammash M. 2016. Molecular circuits for dynamic noise filtering. Proc. Natl Acad. Sci. USA 113, 4729-4734. (doi:10.1073/pnas.1517109113)
12. Hjelmfelt A, Weinberger ED, Ross J. 1991. Chemical implementation of neural networks and Turing machines. Proc. Natl Acad. Sci. USA 88, 10 983-10 987. (doi:10.1073/pnas.88.24.10983)
13. Buisman HJ, Hilbers PAJ, Liekens AML. 2009. Computing algebraic functions with biochemical reaction networks. Artif. Life 15, 5-19. (doi:10.1162/artl.2009.15.1.15101)
14. Oishi K, Klavins E. 2011. Biomolecular implementation of linear I/O systems. Syst. Biol. IET 5, 252-260. (doi:10.1049/iet-syb.2010.0056)
15. Soloveichik D, Cook M, Winfree E, Bruck J. 2008. Computation with finite stochastic chemical reaction networks. Nat. Comput. 7, 615-633. (doi:10.1007/s11047-008-9067-y)
16. Chen HL, Doty D, Soloveichik D. 2014. Deterministic function computation with chemical reaction networks. Nat. Comput. 13, 517-534. (doi:10.1007/s11047-013-9393-6)
17. Qian L, Winfree E. 2011. Scaling up digital circuit computation with DNA strand displacement cascades. Science 332, 1196-1201. (doi:10.1126/science.1200520)
18. Napp NE, Adams RP. 2013. Message passing inference with chemical reaction networks. Adv. Neural Inform. Proc. Syst. 26, 2247-2255.
19. Qian L, Winfree E, Bruck J. 2011. Neural network computation with DNA strand displacement cascades. Nature 475, 368-372. (doi:10.1038/nature10262)
20. Cardelli L, Kwiatkowska M, Whitby M. 2018. Chemical reaction network designs for asynchronous logic circuits. Nat. Comput. 17, 109-130. (doi:10.1007/s11047-017-9665-7)
21. Kobayashi TJ. 2010. Implementation of dynamic Bayesian decision making by intracellular kinetics. Phys. Rev. Lett. 104, 228104. (doi:10.1103/PhysRevLett.104.228104)
22. Kobayashi TJ. 2013. Information decoding in microscopic biological processes. Annu. Int. Conf. IEEE Eng. Med. Biol. Soc. 2013, 2704-2707. (doi:10.1109/EMBC.2013.6610098)
23. McGregor S, Vasas V, Husbands P, Fernando C. 2012. Evolution of associative learning in chemical networks. PLoS Comput. Biol. 8, e1002739. (doi:10.1371/journal.pcbi.1002739)
24. Bowsher CI, Swain P. 2012. Identifying sources of variation and the flow of information in biochemical networks. Proc. Natl Acad. Sci. USA 109, E1320-E1328. (doi:10.1073/pnas.1119407109)
25. Durbin R, Eddy S, Krogh A, Mitchison G. 1998. Biological sequence analysis. Cambridge, UK: Cambridge University Press.
26. Juang BH, Rabiner LR. 1991. Hidden Markov models for speech recognition. Technometrics 33, 251-272. (doi:10.1080/00401706.1991.10484833)
27. Gopalkrishnan M. 2016. A scheme for molecular computation of maximum likelihood estimators for log-linear models. In DNA computing and molecular programming (eds Rondelez Y, Woods D). Lecture Notes in Computer Science, vol. 9818, pp. 3-18. Cham, Switzerland: Springer. (doi:10.1007/978-3-319-43994-5_1)
28. Virinchi MV, Behera A, Gopalkrishnan M. 2017. A stochastic molecular scheme for an artificial cell to infer its environment from partial observations. In DNA computing and molecular programming (eds Brijder R, Qian L). Lecture Notes in Computer Science, vol. 10467, pp. 82-97. Cham, Switzerland: Springer. (doi:10.1007/978-3-319-66799-7_6)
29. Virinchi MV, Behera A, Gopalkrishnan M. 2018. A reaction network scheme which implements the EM algorithm. In DNA computing and molecular programming (eds Doty D, Dietz H). Lecture Notes in Computer Science, vol. 11145, pp. 189-207. Cham, Switzerland: Springer. (doi:10.1007/978-3-030-00030-1_12)
30. Singh A, Wiuf C, Behera A, Gopalkrishnan M. 2019. A reaction network scheme which implements inference and learning for hidden Markov models. In DNA computing and molecular programming (eds Thachuk C, Liu Y). Lecture Notes in Computer Science, vol. 11648, pp. 54-79. Cham, Switzerland: Springer. (doi:10.1007/978-3-030-26807-7_4)
31. Poole W, Ortiz-Munoz A, Behera A, Jones NS, Ouldridge TE, Winfree E, Gopalkrishnan M. 2017. Chemical Boltzmann machines. In DNA computing and molecular programming (eds Brijder R, Qian L). Lecture Notes in Computer Science, vol. 10467, pp. 210-231. Cham, Switzerland: Springer. (doi:10.1007/978-3-319-66799-7_14)
32. Rabiner LR. 1989. A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 77, 257-286. (doi:10.1109/5.18626)
33. Shin JS, Pierce NA. 2004. A synthetic DNA walker for molecular transport. J. Am. Chem. Soc. 126, 10 834-10 835. (doi:10.1021/ja047543j)
34. Reif J. 2003. The design of autonomous DNA nano-mechanical devices: walking and rolling DNA. DNA Comput. 2, 439-461. (doi:10.1023/B:NACO.0000006775.03534.92)
35. Sherman W, Seeman N. 2004. A precisely controlled DNA biped walking device. Nano Lett. 4, 1203-1207. (doi:10.1021/nl049527q)
36. Dempster AP, Laird NM, Rubin DB. 1977. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. B 39, 1-38. (doi:10.1111/j.2517-6161.1977.tb01600.x)
37. Wu CFJ. 1993. On the convergence properties of the EM algorithm. Ann. Stat. 11, 95-103.
38. McLachlan GJ, Krishnan T. 2008. The EM algorithm and extensions. Hoboken, NJ: John Wiley & Sons, Inc.
39. Pachter L, Sturmfels B. 2005. Algebraic statistics for computational biology. Cambridge, UK: Cambridge University Press.
40. Feinberg M. 2019. Foundations of chemical reaction network theory. Cham, Switzerland: Springer.
41. Li X, Parizeau M, Plamondon R. 2000. Training hidden Markov models with multiple observations: a combinatorial method. IEEE Trans. Pattern Anal. Mach. Intell. 22, 371-377. (doi:10.1109/34.845379)
42. Durbin R, Eddy SR, Krogh A, Mitchison G. 1998. Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge, UK: Cambridge University Press.
43. Wiuf C, Behera A, Singh A, Gopalkrishnan M. 2023. A reaction network scheme for hidden Markov model parameter learning. Figshare. (doi:10.6084/m9.figshare.c.6688821)
44. Fiedler M, Pták V. 1962. On matrices with non-positive off-diagonal elements and positive principal minors. Czech. Math. J. 12, 382-400. (doi:10.21136/CMJ.1962.100526)
45. Berman A, Plemmons RJ. 1994. Nonnegative matrices in the mathematical sciences, vol. 9. Classics in Applied Mathematics. Philadelphia, PA: Society for Industrial and Applied Mathematics.
46. Thieme HR. 1999. Asymptotically autonomous differential equations in the plane. Rocky Mt. J. Math. 24, 351-380. (doi:10.1216/rmjm/1181072470)

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Availability Statement

Code used for the simulation of examples in this paper may be found at: https://github.com/GeekyPeas/Chemical-Baum-Welch-Algorithm.

The data are provided in electronic supplementary material [43].
