Bayesian Inference of Gene Regulatory Networks at Stochastic Steady State

Anshi Gupta; Ryeongkyung Yoon; Krešimir Josić

doi:10.64898/2026.01.10.698684

This is a preprint.

It has not yet been peer reviewed by a journal.


[Preprint]. 2026 Jan 12:2026.01.10.698684. [Version 1] doi: 10.64898/2026.01.10.698684


PMCID: PMC12879671 PMID: 41659677

Abstract

Gene Regulatory Networks (GRNs) form the regulatory backbone that coordinates gene expression. The architecture of GRNs shapes their function and constrains the biochemical pathways through which information flows. Inferring the structure of regulatory interactions is thus essential for understanding biological systems and designing targeted therapies. Despite substantial progress in GRN inference, most approaches – from statistical methods to deep learning – do not take into account fundamental biochemical processes that drive regulatory dynamics. To address this shortcoming, here we present a novel Bayesian inference approach based on using the Chemical Langevin Equation (CLE) as a model of gene expression dynamics at stochastic equilibrium. Interactions in GRNs are sparse, and we thus use a regularized horseshoe prior enabling selective shrinkage of unsupported interactions while identifying strong regulatory edges. We evaluate our method using synthetic gene expression data, allowing for benchmarking against a known ground truth. Our approach allows us to infer kinetic parameters, identify network structure, and infer regulatory cycles without the need to observe transient dynamics. This Bayesian alternative to current methods thus provides both biological interpretability and structural identifiability in GRN inference.

1. Introduction

Gene Regulatory Networks (GRNs) play a central role in controlling gene expression and coordinating cellular functions during development, cell differentiation, and responses to environmental stimuli [12]. Inferring the structure of GRNs is thus essential to understanding the regulatory mechanisms that govern complex biological processes. Despite extensive efforts to develop experimental and statistical approaches for GRN inference, reconstructing network structure and dynamics from gene expression data remains a challenging inverse problem. Key obstacles include stochasticity in gene expression [67], sparse connectivity between regulatory components [16], and nonlinear interactions within the network [45]. Sparse sampling and measurements near dynamical equilibrium further compound these challenges, making it difficult to resolve dynamic regulatory interactions from gene expression data.

Previous approaches have typically attempted to infer regulatory interactions from high-throughput, high-dimensional biological data, often without relying on detailed mechanistic assumptions. Techniques such as mutual information [46, 78], regression-based inference [36, 37], and deep learning [66] have been employed to recover GRN structure, aiming to balance predictive performance with interpretability. Although such methods capture complex, nonlinear gene interactions, issues with interpretability and validation persist. On the other hand, many model-based inference methods require measurements of transient dynamics: Since multiple parameter combinations are consistent with identical steady-state behavior, observations of transients are often needed to ensure parameter identifiability [35].

Few existing methods for GRN inference simultaneously capture stochastic gene expression dynamics, enforce biologically motivated sparsity, and infer network structure, interaction types, and kinetic parameters from data at stochastic equilibrium while providing principled uncertainty quantification [38, 39]. To address this problem we introduce a Bayesian inference framework that describes intrinsic stochasticity using the Chemical Langevin Equation (CLE) [22, 30] and incorporates a regularized horseshoe prior [57] to reflect the sparsity of interactions in GRNs. We apply our method to synthetic data from GRNs at stochastic equilibrium, with measurements sampled from the stationary distribution. At equilibrium, the temporal structure of fluctuations provides information about regulatory interactions and kinetic parameters that is not captured by the moments of the distribution alone [61]. Our approach is closely related to the SINDy algorithm [8, 32, 43], and is conceptually related to fluctuation-dissipation relations in statistical physics, where equilibrium fluctuations reveal response properties without requiring external perturbations [77]. We show that we can identify the underlying network structure and infer kinetic parameters and interaction types, producing interpretable outputs with well-calibrated uncertainty estimates. Notably, we can recover features such as regulatory cycles and autoregulatory interactions, which are often missed by other Bayesian methods [20].

Our approach is based on a mechanistic model of regulatory interactions, in contrast to non-parametric models which offer greater flexibility but may sacrifice interpretability and biological grounding. We demonstrate reliable performance on small-scale networks, even when only partially observed. We thus lay the foundation for a robust, mechanistic network inference framework applicable in more complex settings.

2. Materials and Methods

Our goal is to infer the structure and regulatory parameters of gene regulatory networks (GRNs) from stationary gene expression data. To do so, we fit a mechanistic model of gene expression to data using a Bayesian approach. The resulting posterior distributions determine network topology (presence/absence of edges), interaction types (activation or repression), and reaction rates, while providing principled uncertainty quantification for all inferred quantities.

2.1. GRN model

A gene regulatory network (GRN) is a system in which regulatory molecules (primarily transcription factors (TFs), but also non-coding RNAs and other factors) regulate gene expression through activation or repression. Here, we concentrate on regulation by TFs, and represent a GRN as a directed graph with nodes corresponding to genes (or more precisely, gene products). For each gene, $i$ , we denote by $X_{i} (t)$ the abundance of the corresponding transcription factor at time $t$ , and let $X (t) = (X_{1} (t), \dots, X_{p} (t))$ be the state of the network, where $p$ is the total number of TFs and genes. We assume reactions in a system at constant volume so that $X_{i} (t)$ can represent the molecular count or concentration. A directed edge, $i \to j$ , represents a regulatory interaction where gene $i$ (via its product) impacts the expression of gene $j$ . Edges carry a sign indicating activation (the TF increases the target’s transcription rate) or repression (the TF decreases it). We allow for auto-regulation which is represented by a self-loop. At the molecular level, regulation arises when TFs bind promoter or enhancer regions and modulate RNA polymerase recruitment [56, 58]. Although other regulatory mechanisms are possible, here we consider only transcriptional regulation via activation and repression.

The relationship between TF concentrations and transcriptional output is nonlinear, due to saturation at high concentrations and regulatory thresholds [6, 41]. The dependence of transcription rate on TF concentration is commonly described using sigmoidal (Hill-type) functions, which approximate equilibrium TF–promoter binding and reproduce the nonlinear dose–response curves observed experimentally [3, 40, 62]. We model the effect of a single regulator on transcription rate by

f_{a} (X) = β \frac{X^{n}}{K^{n} + X^{n}}, f_{r} (X) = γ \frac{K^{n}}{K^{n} + X^{n}},

(1)

where $β$ and $γ$ denote the maximal transcription rates under activation ( $f_{a}$ ) and repression ( $f_{r}$ ), respectively. The parameter $K$ is the half-maximal effective concentration, representing the TF level at which transcription reaches half of its maximal rate. The Hill coefficient, $n$ , determines the steepness of the input–output curve and reflects the apparent cooperativity of the regulatory interactions. For simplicity, we assume negligible basal transcription, but our framework can be extended to include it.
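As an illustration of Eq. (1), the following minimal Python sketch evaluates the two single-regulator Hill functions. The function names and the numerical values are hypothetical and chosen only for illustration; they are not taken from the simulations reported here.

```python
def hill_activation(x, beta, K, n):
    """f_a(X) = beta * X^n / (K^n + X^n), as in Eq. (1)."""
    return beta * x**n / (K**n + x**n)


def hill_repression(x, gamma, K, n):
    """f_r(X) = gamma * K^n / (K^n + X^n), as in Eq. (1)."""
    return gamma * K**n / (K**n + x**n)


# Illustrative values only (assumed, not from the paper).
beta, gamma, K, n = 10.0, 10.0, 50.0, 2

# At X = K both functions equal half of their maximal rate.
half_act = hill_activation(K, beta, K, n)
half_rep = hill_repression(K, gamma, K, n)
```

Evaluating either function at $X = K$ returns half of the maximal rate, which is the defining property of the half-maximal concentration $K$.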

Many genes are regulated by multiple TFs, and promoter activity often reflects their combined influence. Because the molecular mechanisms of combinatorial TF regulation are often unknown or intractable, promoter responses are typically modeled using coarse-grained logic functions including AND, OR, and hybrid gates [3, 9, 63, 73]. Here, we focus on the AND gate, and assume that transcription occurs only when all required TFs are bound to their respective binding sites. This represents a common regulatory motif with multiple inputs converging to activate transcription. While we only consider the case of AND gates with two TFs, the framework can be extended to other logic gates (OR, NOT, hybrid) and to genes with more than two regulators.

We use the following functions to describe AND gate type regulation for different combinations of two activators and repressors [55]:

\begin{array}{l} \text{Activation–Activation (AND logic):} \\ H_{a a} (X_{1}, X_{2}) = \frac{X_{1}^{n_{1}} X_{2}^{n_{2}}}{(X_{1}^{n_{1}} + K_{1}^{n_{1}}) (X_{2}^{n_{2}} + K_{2}^{n_{2}})}, \\ \text{Repression–Repression (NOR logic):} \\ H_{r r} (X_{1}, X_{2}) = \frac{K_{1}^{n_{1}} K_{2}^{n_{2}}}{(X_{1}^{n_{1}} + K_{1}^{n_{1}}) (X_{2}^{n_{2}} + K_{2}^{n_{2}})}, \\ \text{Activation–Repression (AND logic):} \\ H_{a r} (X_{1}, X_{2}) = \frac{X_{1}^{n_{1}} K_{2}^{n_{2}}}{(X_{1}^{n_{1}} + K_{1}^{n_{1}}) (X_{2}^{n_{2}} + K_{2}^{n_{2}})} . \end{array}

(2)

These normalized functions are multiplied by the maximal transcription rate to obtain the actual production rate.
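The two-input functions in Eq. (2) can be sketched in Python as products of single-regulator occupancy terms. The helper names below are hypothetical, and the repression-activation function `H_ra` is included on the assumption that it mirrors $H_{a r}$ with the roles of the two regulators exchanged.

```python
def occupied(x, K, n):
    """Normalized activating term X^n / (X^n + K^n)."""
    return x**n / (x**n + K**n)


def unoccupied(x, K, n):
    """Normalized repressing term K^n / (X^n + K^n)."""
    return K**n / (x**n + K**n)


def H_aa(x1, x2, K1, K2, n1, n2):
    return occupied(x1, K1, n1) * occupied(x2, K2, n2)


def H_ar(x1, x2, K1, K2, n1, n2):
    return occupied(x1, K1, n1) * unoccupied(x2, K2, n2)


def H_ra(x1, x2, K1, K2, n1, n2):
    # Assumed mirror image of H_ar (regulator 1 represses, regulator 2 activates).
    return unoccupied(x1, K1, n1) * occupied(x2, K2, n2)


def H_rr(x1, x2, K1, K2, n1, n2):
    return unoccupied(x1, K1, n1) * unoccupied(x2, K2, n2)


# Illustrative arguments only: (X1, X2, K1, K2, n1, n2).
args = (30.0, 80.0, 50.0, 50.0, 2, 2)
total = H_aa(*args) + H_ar(*args) + H_ra(*args) + H_rr(*args)
```

Because each regulator contributes an activating and a repressing term that sum to one, the four gate functions sum to one for any state, so `total` equals 1 up to rounding.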

Gene expression is inherently stochastic at the single-cell level due to the discrete nature of molecular events and the probabilistic timing of biochemical reactions. Intrinsic noise is due to the probabilistic nature of biochemical events (e.g., transcription factor or RNA polymerase binding), while extrinsic noise stems from cell-to-cell variability in global factors such as growth state and environmental influences [15, 54, 67, 68]. We focus on intrinsic noise as it directly reflects the biochemical processes we model (transcription, degradation, regulation), as well as the regulatory topology. Incorporating extrinsic noise would require including additional sources of variability.

We validate our approach using synthetic data. To do so we generate sample expression trajectories using the stochastic simulation algorithm (SSA), with the reaction rates as defined in Eqs. (1) and (2). The SSA produces trajectories whose probability law is specified by the Chemical Master Equation (CME) which determines the evolution of the probability distribution over all possible molecular states [22].

Direct inference using the CME quickly becomes computationally intractable. Instead, we use a discretized form (Euler–Maruyama) of the Chemical Langevin Equation (CLE) [23, 30]. The CLE is a mesoscopic SDE whose sample paths are distributed approximately according to the probability law described by the CME. Intuitively, the CLE replaces the discrete change in TF number over small time intervals by Gaussian increments with reaction propensities determining the mean and variance of each increment. This diffusion approximation is appropriate under well-stirred conditions, and is valid when molecule numbers are sufficiently large (typically > 50) so that continuous state variables are appropriate, yet small enough that stochastic fluctuations remain important. The discretized CLE yields tractable transition probabilities, enabling efficient likelihood-based parameter inference [25–27].

2.2. Example: Single gene autoregulation

As a first example, and to explain notation, we consider single-gene autoregulation [4, 42]. This simplest of GRNs can exhibit rich dynamics, and serves as the basis for multi-gene systems analyzed later. Let $X$ denote the TF concentration, synthesized with a state-dependent propensity $f (X)$ and degraded at rate $α X$ , resulting in the following reactions

Ø \overset{f (X)}{\to} X, X \overset{α X}{\to} Ø .

The corresponding CLE takes the form [30, 74]:

d X_{t} = (f (X_{t}) - α X_{t}) d t + \sqrt{f (X_{t}) + α X_{t}} d W_{t},

(3)

where $W_{t}$ is a standard Wiener process. The drift term represents net synthesis and degradation, while the diffusion term accounts for stochastic fluctuations inherent in both processes. Setting $f (X)$ equal to one of the Hill functions in Eq. (1) allows us to model activation or repression.

The CLE framework provides a mechanistic description of stochastic gene expression, and can be used as a generative model for statistical inference of GRN structure and parameters given a sequence of TF measurements. When discretized using the Euler–Maruyama scheme, the conditional distribution of the state $X_{t + Δ t}$ at time $t + Δ t$ given the current state $X_{t}$ is Gaussian, with its mean determined by the drift term and variance by the diffusion term [1, 8, 29, 64]. For parameter inference when network architecture is known, we can use the discretized version of Eq. (3) which takes the form

Δ X = (f (X) - α X) Δ t + \sqrt{f (X) + α X} \sqrt{Δ t} ϵ,

(4)

where $ϵ \sim N (0, 1)$ denotes a sample from a standard Gaussian distribution, $Δ X = X (t) - X (t - Δ t)$ , and we suppressed the explicit dependence on $t$ . This discretization yields a stochastic process, $X (t)$ , evolving in discrete time steps of size $Δ t$ and approximating the continuous-time process $X_{t}$ .
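The update in Eq. (4) can be sketched as a short Euler-Maruyama simulation. This is a minimal illustration under stated assumptions: the function names, the parameter values, and the clipping of the state at zero (which keeps the square root defined) are choices made here and are not part of the model described above.

```python
import math
import random


def em_step(x, production, alpha, dt, rng):
    """One Euler-Maruyama step of Eq. (4) for a single species."""
    synth = production(x)   # f(X)
    degr = alpha * x        # alpha * X
    drift = synth - degr
    diffusion = math.sqrt(synth + degr)
    x_new = x + drift * dt + diffusion * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    # Assumption: clip at zero so that the next diffusion term is well defined.
    return max(x_new, 0.0)


def simulate_cle(x0, production, alpha, dt, n_steps, seed=0):
    """Return a path of n_steps + 1 states starting from x0."""
    rng = random.Random(seed)
    path = [x0]
    for _ in range(n_steps):
        path.append(em_step(path[-1], production, alpha, dt, rng))
    return path


# Hypothetical self-repressing gene; values are illustrative only.
gamma, K, n, alpha = 200.0, 50.0, 2, 1.0


def repression(x):
    return gamma * K**n / (K**n + x**n)


path = simulate_cle(x0=60.0, production=repression, alpha=alpha,
                    dt=0.01, n_steps=2000)
```

Passing a different `production` function, for example the activation Hill function of Eq. (1), yields the self-activating case without other changes.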

Our goal is also to infer the existence and type of interactions in a GRN. Thus we will not assume that the architecture of the GRN is known, but will use models that include a range of potential interactions. In the present example this means that for inference we use the following model which allows for both activation and repression,

\begin{array}{l} Δ X (t) = (\frac{γ K^{n}}{X^{n} + K^{n}} + \frac{β X^{n}}{X^{n} + K^{n}} - α X) Δ t \\ + \sqrt{\frac{γ K^{n}}{X^{n} + K^{n}} + \frac{β X^{n}}{X^{n} + K^{n}} + α X} \sqrt{Δ t} ϵ . \end{array}

(5)

We assume that elements in the GRN either activate or repress one another, or do not interact. We discuss below how a Bayesian approach using Eq. (5) and horseshoe priors for $β$ and $γ$ results in successful inference with estimates $\hat{β} = 0$ or $\hat{γ} = 0$ , or both.

2.3. General Inference Model

Larger GRNs can be modeled by adding further nodes and interactions. The model we use for inference will always be a generalization of the model used to generate synthetic data that includes a range of potential interactions, e.g. compare Eq. (4) and Eq. (5). In the following we therefore only describe the model used for inference. We will see that a generalization of Eq. (5) yields explicit transition densities and can be used to define a likelihood function that can be evaluated directly from a sequence of TF measurements. The CLE that approximates the generative model described exactly by the CME also underlies the statistical model used for inference. Hence, the inferred parameters, such as activation and repression rates and degradation constants, retain their meaning in the underlying biochemical reactions.

To extend the inference model to larger GRNs, we represent activation, repression, and degradation using matrix-valued functions of the state vector. For simplicity, we suppress the explicit dependence on time in all models defined in this section. If the state of a GRN comprising $p$ genes and the corresponding TFs is given by the vector $X \in R^{p}$ , the discretized CLE generalizing Eq. (5) in the absence of combinatorial regulation has the form

\begin{array}{l} Δ X = \underbrace{[(β ⊙ A + γ ⊙ R) 1_{p} - α X]}_{μ (X)} Δ t \\ + \underbrace{\sqrt{(β ⊙ A + γ ⊙ R) 1_{p} + α X}}_{σ (X)} ⊙ \sqrt{Δ t} ϵ, \end{array}

(6)

where $A : R^{p} \to R^{p \times p}$ and $R : R^{p} \to R^{p \times p}$ are matrix-valued functions. Each matrix element is a normalized Hill function of a single state variable: $A_{i j} (X)$ has the form of $f_{a} (X_{j})$ in Eq. (1) with unit maximal rate, and captures the dependence of the transcription rate of the TF encoded by gene $i$ on the product of gene $j$ . Similarly, $R_{i j} (X)$ has the form of $f_{r} (X_{j})$ with unit maximal rate and models repression. The matrix $α \in R^{p \times p}$ is diagonal with protein-specific degradation/dilution rates, while $β, γ \in R^{p \times p}$ contain the maximal transcription rates under activation and repression, respectively. Here and below the vector $ϵ \in R^{p}$ consists of $p$ independent samples from the standard Gaussian distribution, and $1_{p} \in R^{p}$ is a vector with all entries equal to 1. In this and subsequent models, $μ (X)$ and $σ (X)$ denote the drift and diffusion terms, respectively.

For example, in a two gene regulatory network, the parameter matrices take the form

\begin{array}{l} β = \begin{pmatrix} β_{11} & β_{12} \\ β_{21} & β_{22} \end{pmatrix}, γ = \begin{pmatrix} γ_{11} & γ_{12} \\ γ_{21} & γ_{22} \end{pmatrix}, α = \begin{pmatrix} α_{1} & 0 \\ 0 & α_{2} \end{pmatrix}, \\ A = \begin{pmatrix} A_{11} (X_{1}) & A_{12} (X_{2}) \\ A_{21} (X_{1}) & A_{22} (X_{2}) \end{pmatrix}, R = \begin{pmatrix} R_{11} (X_{1}) & R_{12} (X_{2}) \\ R_{21} (X_{1}) & R_{22} (X_{2}) \end{pmatrix}, \end{array}

(7)

where the elements of the matrices $A$ and $R$ are Hill functions,

A_{i j} (X_{j}) = \frac{X_{j}^{n}}{K_{j}^{n} + X_{j}^{n}}, R_{i j} (X_{j}) = \frac{K_{j}^{n}}{K_{j}^{n} + X_{j}^{n}} .
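To make the matrix notation of Eq. (6) concrete, the sketch below computes the drift $μ (X)$ and diffusion $σ (X)$ vectors with plain Python lists. Multiplying $(β ⊙ A + γ ⊙ R)$ by $1_{p}$ amounts to summing each row over the regulator index $j$ . The function names, the storage of the diagonal of $α$ as a list, and the toggle-switch parameter values are assumptions made for illustration.

```python
import math


def hill_act(x, K, n):
    return x**n / (K**n + x**n)


def hill_rep(x, K, n):
    return K**n / (K**n + x**n)


def drift_and_diffusion(X, beta, gamma, alpha, K, n):
    """Drift mu(X) and diffusion sigma(X) of Eq. (6).

    beta, gamma: p x p nested lists of maximal rates.
    alpha: list holding the diagonal of the degradation matrix.
    K: list of half-maximal concentrations, one per regulator j.
    """
    p = len(X)
    mu, sigma = [], []
    for i in range(p):
        # Row sum over regulators j: ((beta * A + gamma * R) 1_p)_i
        production = sum(
            beta[i][j] * hill_act(X[j], K[j], n)
            + gamma[i][j] * hill_rep(X[j], K[j], n)
            for j in range(p)
        )
        degradation = alpha[i] * X[i]
        mu.append(production - degradation)
        sigma.append(math.sqrt(production + degradation))
    return mu, sigma


# Hypothetical two-gene toggle switch: each gene represses the other.
beta = [[0.0, 0.0], [0.0, 0.0]]
gamma = [[0.0, 20.0], [20.0, 0.0]]
alpha = [0.1, 0.1]
K = [50.0, 50.0]
mu, sigma = drift_and_diffusion([50.0, 50.0], beta, gamma, alpha, K, n=2)
```

With both genes at $X_{j} = K_{j}$ , each repression term equals half of its maximal rate, so the production is 10, the degradation is 5, and each drift entry is 5.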

To capture combinatorial regulation, we extend the model to allow for two regulators per gene via an AND gate interaction:

\begin{array}{l} Δ X = \underbrace{[(β ⊙ A + γ ⊙ R) 1_{p} + (η ⊙ H) 1_{q} - α X]}_{μ (X)} Δ t \\ + \underbrace{\sqrt{(β ⊙ A + γ ⊙ R) 1_{p} + (η ⊙ H) 1_{q} + α X}}_{σ (X)} ⊙ \sqrt{Δ t} ϵ, \end{array}

(8)

Here, $q$ is the maximum number of regulatory input combinations that a gene can receive, and the matrix $H : R^{p} \to R^{p \times q}$ contains Hill functions representing combinatorial regulation. Each entry $H_{m n}^{i} (X)$ depends on two components of the state vector. For example, in a two-gene regulatory network with possible AND-gate regulation, $q = 4$ corresponding to the four possible two-input combinations so that the matrix $H$ takes the form:

H = (\begin{matrix} H_{a a}^{1} (X_{1}, X_{2}) & H_{a r}^{1} (X_{1}, X_{2}) & H_{r a}^{1} (X_{1}, X_{2}) & H_{r r}^{1} (X_{1}, X_{2}) \\ H_{a a}^{2} (X_{1}, X_{2}) & H_{a r}^{2} (X_{1}, X_{2}) & H_{r a}^{2} (X_{1}, X_{2}) & H_{r r}^{2} (X_{1}, X_{2}) \end{matrix}),

where the superscripts denote the gene index, and the subscripts, $a$ and $r$ indicate whether the corresponding regulator is activating or repressing. The functions used in this matrix are defined in Eq. (2). The parameter matrix $η \in R^{p \times q}$ determines the strength of AND gate mediated regulation. The activation and repression matrices, $A$ and $R$ , and the parameter matrices $α, β$ , $γ$ are defined as in the absence of combinatorial regulation, i.e. as in Eq. (6).

We found that Eq. (8) cannot be used to accurately infer parameters in GRNs with three or more nodes. Convergence and identifiability issues become more pronounced with an increasing number of horseshoe parameters (see Supplementary Fig. 10). To improve inference in the presence of combinatorial regulation, we thus performed inference in two steps. In the first step, we identify only the presence and absence of regulatory edges. We do so by introducing a matrix of Boolean variables, $Ω \in \{0, 1\}^{p \times q}$ , whose elements indicate the presence or absence of AND gate mediated regulation, and a matrix $θ \in \{0, 1\}^{p \times p}$ indicating whether each single-TF regulatory interaction is an activation or a repression. This reduces the number of parameters governed by the horseshoe prior, leading to faster and more stable inference. The resulting model is given by:

\begin{array}{l} Δ X = \underbrace{[((ρ Ω) ⊙ H) 1_{q} + (v ⊙ (θ ⊙ A + \tilde{θ} ⊙ R)) 1_{p} - α X]}_{μ (X)} Δ t \\ + \underbrace{\sqrt{((ρ Ω) ⊙ H) 1_{q} + (v ⊙ (θ ⊙ A + \tilde{θ} ⊙ R)) 1_{p} + α X}}_{σ (X)} ⊙ \sqrt{Δ t} ϵ . \end{array}

(9)

Here $H, A$ and $R$ are defined as in Eq. (8). Also, $\tilde{θ} = 1 - θ$ with the difference taken entry-wise. Thus $θ_{i j} = 1$ indicates the potential presence of an activation of gene $i$ by the product of gene $j$ , and the absence of corresponding repression. The matrix $ρ \in R^{p \times p}$ determines the strength of the AND gate mediated regulation, while $v \in R^{p \times p}$ represents the strength of direct (single-input) regulatory interactions. The entries in row $i$ of the matrix $Ω$ determine the potential presence of AND gates at gene $i$ : Let $ω_{i j} = 1$ if the product of gene $j$ activates gene $i$ in combination with another TF, $ω_{i j} = 0$ if it represses it, and let ${\tilde{ω}}_{i j} = 1 - ω_{i j}$ . Then the elements in the $i^{th}$ row of the matrix $Ω$ are the $q = 2 p (p - 1)$ possible pairwise products of all combinations of $ω_{i j}$ and ${\tilde{ω}}_{i j}$ with differing indices $j$ . Thus, $ω_{i j_{1}} = ω_{i j_{2}} = 1$ indicates potential activation-activation regulation of gene $i$ by genes $j_{1}$ and $j_{2}$ , while $ω_{i j_{1}} = {\tilde{ω}}_{i j_{2}} = 1$ indicates potential activation-repression type regulation.

In the same example of two genes with possible AND-gate regulation discussed above, the matrix $H$ remains unchanged, while the matrices $ρ, Ω, v$ , and $θ$ take the following forms:

\begin{array}{l} ρ = \begin{pmatrix} ρ_{1} & 0 \\ 0 & ρ_{2} \end{pmatrix}, Ω = \begin{pmatrix} ω_{11} ω_{12} & ω_{11} {\tilde{ω}}_{12} & {\tilde{ω}}_{11} ω_{12} & {\tilde{ω}}_{11} {\tilde{ω}}_{12} \\ ω_{21} ω_{22} & ω_{21} {\tilde{ω}}_{22} & {\tilde{ω}}_{21} ω_{22} & {\tilde{ω}}_{21} {\tilde{ω}}_{22} \end{pmatrix}, \\ v = \begin{pmatrix} v_{11} & v_{12} \\ v_{21} & v_{22} \end{pmatrix}, θ = \begin{pmatrix} θ_{11} & θ_{12} \\ θ_{21} & θ_{22} \end{pmatrix}, \tilde{θ} = \begin{pmatrix} {\tilde{θ}}_{11} & {\tilde{θ}}_{12} \\ {\tilde{θ}}_{21} & {\tilde{θ}}_{22} \end{pmatrix} . \end{array}

Here exactly one of the entries in each row of $Ω$ is non-zero, since only one of the possible interactions listed in Eq. (2) can be present.
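The construction of a row of $Ω$ from the binary indicators $ω_{i j}$ can be sketched as follows. The function name and the ordering of entries (activation-activation, activation-repression, repression-activation, repression-repression for each regulator pair) are assumptions chosen to match the two-gene example above.

```python
from itertools import combinations, product


def omega_row(omega_i):
    """Build row i of Omega from indicators omega_i[j] in {0, 1}.

    For every pair of distinct regulators (j1, j2) the row holds the four
    products of omega or (1 - omega), giving 2 * p * (p - 1) entries.
    """
    p = len(omega_i)
    row = []
    for j1, j2 in combinations(range(p), 2):
        # s = 1 selects omega (activating), s = 0 selects 1 - omega (repressing).
        for s1, s2 in product((1, 0), repeat=2):
            t1 = omega_i[j1] if s1 else 1 - omega_i[j1]
            t2 = omega_i[j2] if s2 else 1 - omega_i[j2]
            row.append(t1 * t2)
    return row


# Two-gene example: regulator 1 activates, regulator 2 represses.
row_two_genes = omega_row([1, 0])

# Three-gene example: 2 * 3 * 2 = 12 entries in the row.
row_three_genes = omega_row([1, 1, 0])
```

For two genes the row has four entries, exactly one of which equals 1. Under this construction a row for $p > 2$ genes contains one non-zero entry for each pair of regulators.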

In the second inference step we infer the reaction rates. To do so we use the GRN architecture inferred in the first step, and the model given by Eq. (8). Assuming a fixed GRN architecture allows us to infer the reaction rates by estimating a substantially reduced number of parameters.

Thus, we use Eq. (6) for inference under the assumption that each gene has a single regulator. Eq. (9) is a generalization that we use to allow for possible combinatorial regulation. In the former case, inference requires estimating only continuous parameters $(β, γ, α)$ , while in the latter both continuous ( $ρ, v, α$ ) and discrete ( $θ, Ω$ ) parameters need to be estimated. These descriptions of the evolution of gene products allow us to define appropriate likelihoods, and infer both network structure and kinetic rates from data. We describe this inference framework next.

2.4. Bayesian inference framework

We next show how to use Bayesian inference to infer both GRN network architecture and reaction rates from data [7]. Bayes’ Theorem relates model parameters, $θ$ , and the observed measurements, $𝒟$ , via the relation $p (θ ∣ 𝒟) \propto p (𝒟 ∣ θ) p (θ)$ . In the present case the likelihood $p (𝒟 ∣ θ)$ is determined by the discretized CLE. Using the Euler–Maruyama approximation, the increments, $Δ X$ , follow a Gaussian distribution with mean determined by the drift term and variance by the diffusion term in the CLE [8]. The likelihood of a full trajectory is then the product of Gaussians evaluated at the observed counts or concentrations across all time steps,

\begin{array}{l} p (𝒟 ∣ θ) = \\ \prod_{t = 1}^{T - 1} 𝒩 (X (t); X (t - Δ t) + μ (X (t - Δ t)) Δ t, Σ^{2} (X (t - Δ t)) Δ t), \end{array}

where $𝒩 (x; μ, Σ)$ denotes the probability density function of a Gaussian distribution with mean $μ$ and covariance $Σ$ evaluated at $x$ . Here, the variance matrix is diagonal and equals $Σ^{2} (X) = \mathrm{diag} (σ^{2} (X))$ , where $σ^{2} (X)$ is a vector obtained by squaring each component of the diffusion vector $σ (X)$ elementwise. The variance matrix is diagonal because in the underlying biochemical reaction model the noise terms for different species arise from independent reaction channels [23]. Thus the increments follow the laws defined by the generative models in Eqs. (6)–(9).
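Because the covariance is diagonal, the log-likelihood of a trajectory is a sum of univariate Gaussian log-densities over species and time steps. The sketch below illustrates this computation; the function name and the callables `drift` and `diffusion_sq`, which return $μ (X)$ and $σ^{2} (X)$ as lists, are hypothetical interfaces introduced here.

```python
import math


def em_log_likelihood(xs, dt, drift, diffusion_sq):
    """Log-likelihood of a trajectory under the Euler-Maruyama transition law.

    xs: list of states, each a list of length p, sampled every dt.
    drift(x), diffusion_sq(x): return mu(x) and sigma^2(x) as lists.
    """
    log_lik = 0.0
    for prev, curr in zip(xs[:-1], xs[1:]):
        mu = drift(prev)
        var = diffusion_sq(prev)
        for x0, x1, m, v in zip(prev, curr, mu, var):
            mean = x0 + m * dt      # X(t - dt) + mu * dt
            variance = v * dt       # sigma^2 * dt
            log_lik += (-0.5 * math.log(2.0 * math.pi * variance)
                        - 0.5 * (x1 - mean) ** 2 / variance)
    return log_lik


# Single transition with illustrative numbers: mean = 10 + 2 * 0.5 = 11,
# variance = 4 * 0.5 = 2, and the observation sits exactly at the mean.
ll_at_mean = em_log_likelihood([[10.0], [11.0]], 0.5,
                               lambda x: [2.0], lambda x: [4.0])
ll_off_mean = em_log_likelihood([[10.0], [13.0]], 0.5,
                                lambda x: [2.0], lambda x: [4.0])
```

In practice the drift and diffusion depend on the kinetic parameters, so this quantity is evaluated for each proposed parameter value during posterior sampling.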

Parameters Estimated.

In all examples, we estimated only the maximal transcription rates and degradation rates, which were consistently identifiable from the data. In contrast, the parameters $K$ (half-maximal concentration) and $n$ (Hill coefficient) were generally more difficult to infer. For instance, as shown in Fig. 1(e), when input concentrations fall within the saturated region of the response curve, the system becomes insensitive to changes in $K$ , rendering this parameter difficult to identify. In contrast, Fig. 1 (f) illustrates that when the input spans the dynamic range around $K$ , its influence on the output becomes observable, enabling more reliable inference. Even in such regimes, inference of $K$ remained sensitive to prior assumptions; good estimates required using informative priors. The Hill coefficient, $n$ , was typically even more weakly identifiable. It is known that $K$ and $n$ are often structurally or practically unidentifiable, as they do not strongly determine the system’s dynamics [11, 31, 48]. We thus fixed both parameters to their known values during inference.

Figure 1: **(a)-(b)** TF time-series and the corresponding network structures for a single gene: (a) self-activating, and (b) self-repressing regulatory network. Each realization was generated using the Gillespie algorithm. Parameters were chosen so that the marginal means and variances of the simulated trajectories were approximately equal for both systems. **(c)-(d)** Normalized boxplots of the posterior means of the inferred reaction parameters for the self-activating (c) and self-repressing (d) models across 50 realizations. Here, and in the following figures, we show the ratio between the posterior mean and the true value when the true value is nonzero (green), whereas the posterior mean of parameters whose true value is zero is shifted by 1 (red). Hence, estimates close to 1 indicate accurate inference. **(e)-(f)** Activation (blue) and repression (red) Hill function in the identifiable (f) and non-identifiable (e) parameter regimes. Detailed results and summary statistics are provided in the Supplementary Material.

Priors.

We used uninformative priors in all examples. These priors could be changed to include available information about the structure of the GRN or reaction rates. We distinguish two cases corresponding to the two model formulations:

Single-regulator model (no combinatorial regulation) (Eq. (6)). Reaction rates are determined by the activation strengths, $β$ , repression strengths, $γ$ , and degradation rates, $α$ . To promote sparsity in interactions, we used horseshoe priors for $β$ and $γ$ [57]. These priors reflect the need for strong evidence for an interaction, and the assumption that a TF cannot both activate and repress the same target, i.e. that for each gene pair, $(i, j)$ , at most one of $β_{i j}$ and $γ_{i j}$ is nonzero. These shrinkage priors strongly pull most interaction strengths toward zero while allowing a subset to remain large, consistent with the expectation that only a few of all possible interactions are present in GRNs.
Combinatorial model (Eq. (9)). In addition to single-input activation and repression regulatory strengths in the parameter matrix $v$ and degradation rates $α$ , this model includes the gate-strength matrix, $ρ$ , as well as matrices of binary indicator variables: $θ$ , corresponding to activation/repression, and $Ω$ , corresponding to AND gate regulation. We used regularized horseshoe priors for parameters $ρ$ and $v$ to reflect the assumption of sparsity in both gated and non-gated interactions. The binary indicators are assigned independent, uninformative Bernoulli(0.5) priors, representing the prior belief that each potential interaction or gate is equally likely to exist or not.

For the degradation rates, $α$ , which are always positive, we used uninformative half-normal priors. This enforces non-negativity while constraining the inferred values to the expected range of degradation rates.

To improve sampling efficiency and avoid issues related to thick Cauchy tails when using horseshoe priors, we adopted the recommendations by Piironen and Vehtari [57] and replaced the Half-Cauchy priors with half-Student- $t$ distributions. Specifically, we set

\begin{array}{l} ψ ∣ {\tilde{λ}}_{i}, τ, c \sim 𝒩 (0, {\tilde{λ}}_{i}^{2} τ^{2}), \\ {\tilde{λ}}_{i} = \frac{c λ_{i}}{\sqrt{c^{2} + λ_{i}^{2} τ^{2}}}, \\ λ_{i} \sim \text{Half-Student-}t (1, 1), \\ c^{2} \sim \text{Inv-Gamma} (\frac{v}{2}, \frac{v s^{2}}{2}), \\ τ \sim \text{Half-Student-}t (1, 0.1) . \end{array}

Here $ψ$ represents parameters for which we used the horseshoe prior. Following standard practice we chose $v = 4$ and $s = 2$ for the inverse-gamma prior on $c^{2}$ . This formulation provides robust shrinkage, ensures well-behaved posteriors, and facilitates efficient posterior sampling for high-dimensional or weakly identified models [32, 57].
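The effect of the regularized local scale ${\tilde{λ}}_{i}$ can be illustrated numerically. The sketch below, with a hypothetical function name and illustrative values of $τ$ and $c$ , evaluates the prior standard deviation ${\tilde{λ}}_{i} τ$ in the two limiting regimes.

```python
import math


def regularized_local_scale(lam, tau, c):
    """lambda_tilde = c * lambda / sqrt(c^2 + lambda^2 * tau^2)."""
    return c * lam / math.sqrt(c**2 + lam**2 * tau**2)


# Illustrative values (assumed): global scale tau and slab scale c.
tau, c = 0.1, 2.0

# Small local scale: the prior s.d. is close to lambda * tau (strong shrinkage).
sd_small = regularized_local_scale(1e-3, tau, c) * tau

# Large local scale: the prior s.d. saturates near the slab scale c.
sd_large = regularized_local_scale(1e6, tau, c) * tau
```

When $λ_{i} τ \ll c$ the prior behaves like the original horseshoe with standard deviation $λ_{i} τ$ , while for $λ_{i} τ \gg c$ the standard deviation is capped near $c$ , which regularizes the largest coefficients.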

Approximating the posterior distributions.

Samples from the posterior distributions were generated using PyMC [2, 53]. For models with only continuous parameters, we used Hamiltonian Monte Carlo (HMC) [5] with the No-U-Turn Sampler (NUTS) [33], accelerated via the BlackJAX backend for efficient sampling on CPUs and GPUs. When discrete parameters were present (e.g. indicator matrices), we reverted to PyMC’s standard NUTS implementation, which marginalizes over discrete variables when possible. Convergence was assessed using standard diagnostics, including the $\hat{R}$ statistic, effective sample size, and visual inspection of trace plots.

2.5. Synthetic data

We generated synthetic gene-expression trajectories using the stochastic simulation algorithm (SSA) [22, 24]. In all simulations, we discarded an initial transient and retained measurements after the system equilibrated. In this regime molecular counts fluctuated randomly, but with stable statistical properties.

We chose the sampling interval $Δ t$ to be sufficiently small so that the assumed discretization provides a good approximation of the underlying dynamics, and sufficiently large so that increments in TF numbers are approximately Gaussian and the correlation between consecutive observations is not excessive. For each parameter setting, we generated 50 independent realizations, each consisting of 3,000 measurements. Generating replicate datasets ensured robust inference and allowed us to account for variability across stochastic sample paths [68]. All SSA simulations were implemented using the Python package GillesPy2 [47].
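The data-generation procedure described above can be sketched for a single autoregulated gene with a hand-written Gillespie simulation. This is a minimal standard-library illustration, not the GillesPy2 implementation used for the reported results; the function name, the parameter values, and the burn-in length are assumptions.

```python
import random


def ssa_sampled(x0, production, alpha, dt, n_samples, burn_in, seed=0):
    """Gillespie simulation of 0 -> X (rate f(X)) and X -> 0 (rate alpha * X).

    The state is recorded every dt time units after discarding burn_in.
    """
    rng = random.Random(seed)
    t, x = 0.0, x0
    samples = []
    next_sample = burn_in
    while len(samples) < n_samples:
        a_prod = production(x)
        a_deg = alpha * x
        a_total = a_prod + a_deg
        if a_total == 0.0:
            # No reaction can fire: the state stays constant from here on.
            samples.extend([x] * (n_samples - len(samples)))
            break
        wait = rng.expovariate(a_total)
        # Record the current state at every sampling time before the next event.
        while len(samples) < n_samples and next_sample <= t + wait:
            samples.append(x)
            next_sample += dt
        t += wait
        if rng.random() * a_total < a_prod:
            x += 1   # synthesis event
        else:
            x -= 1   # degradation event
    return samples


# Hypothetical self-repressing gene; values are illustrative only.
gamma, K, n, alpha = 200.0, 50.0, 2, 1.0


def repression(x):
    return gamma * K**n / (K**n + x**n)


samples = ssa_sampled(x0=60, production=repression, alpha=alpha,
                      dt=0.1, n_samples=3000, burn_in=20.0, seed=0)
```

Calling the function with different seeds produces independent realizations, mirroring the replicate datasets described above.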

3. Results

GRNs govern cellular behavior through transcriptional regulation, with transcription factors (TFs) activating or repressing target genes. Inferring GRN structure and dynamics from gene expression data is challenging due to the complexity of regulatory interactions and the stochasticity of gene expression. We show that the proposed Bayesian inference framework allows us to simultaneously recover network topology, regulatory modes (activation/repression), and kinetic parameters from time-series expression data.

We demonstrate the performance of the proposed inference method using synthetic data from a range of canonical gene regulatory motifs. Throughout, we consider GRNs at stochastic equilibrium, with molecular fluctuations having reached their stationary probability distribution. ODE and logic-based models cannot capture stationary fluctuations in molecule numbers [71]. Structural inference of GRNs using such models therefore requires the observation of transients. In contrast, we show that a generative model that accounts for intrinsic stochasticity allows for the inference of kinetic parameters and regulatory structure from measurements of GRNs at stochastic equilibrium.

We illustrate the proposed method using a sequence of examples of increasing complexity. We begin with single-gene autoregulation, then consider the classic two-gene toggle switch [21], followed by an extension to a toggle with combinatorial regulation. We next consider the three-gene repressilator [14], a canonical oscillatory circuit, and finally turn to a coherent type-1 feed-forward loop [3, 44]. This allows us to assess both the accuracy and the scalability of the framework.

Throughout we work in rescaled (non-dimensional) time, so all parameters are expressed without physical units. Under this convention, the transcription rate parameters correspond to the maximal expected number of molecules transcribed per unit rescaled time. In all examples we performed inference on 50 independent realizations to quantify robustness. We also report posterior means (denoted by $\hat{ψ}$ for a parameter $ψ$ ) and highest density intervals (HDIs) for a single, representative realization. We show representative results here, while the full inference results along with a complete description of the models used for inference are provided in the Supplementary Material.
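For readers unfamiliar with the summaries reported here, the sketch below computes a posterior mean and a highest density interval from a list of posterior draws in plain Python. The helper `hdi`, the 95% mass, and the toy draws are assumptions for illustration and do not reproduce any result from the study.

```python
import math
import random


def hdi(samples, mass=0.95):
    """Narrowest interval containing a fraction `mass` of the samples."""
    s = sorted(samples)
    n = len(s)
    k = max(1, math.ceil(mass * n))      # number of points inside the interval
    # Slide a window of k consecutive order statistics and keep the narrowest.
    best = min(range(n - k + 1), key=lambda i: s[i + k - 1] - s[i])
    return s[best], s[best + k - 1]


rng = random.Random(1)
draws = [rng.gauss(10.0, 1.0) for _ in range(5000)]   # toy posterior draws
post_mean = sum(draws) / len(draws)
lo, hi = hdi(draws)
print(f"posterior mean {post_mean:.2f}, 95% HDI [{lo:.2f}, {hi:.2f}]")
```

For a unimodal, roughly symmetric posterior the HDI is close to the central credible interval; the two differ when the posterior is skewed.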

3.1. Single-Gene Motifs

We first consider a single-gene autoregulatory circuit. In this motif, the product of a gene regulates its own production, as shown in Fig. 1(a,b). Here, and in all subsequent examples, black edges represent activation and red edges repression. We consider both self-activation and self-repression, with transcription rates defined by Hill functions given in Eq. (1). Details about the parameters used to generate the synthetic data for this and subsequent examples are provided in the Supplementary Material.

We first asked whether we can use the proposed inference framework to distinguish between activation and repression from trajectories in stochastic equilibrium. To do so, we generated synthetic TF trajectories for both cases using parameters that produced marginal distributions with matched means and variances. Example trajectories are shown in Fig. 1(a,b). The associated marginal distributions of TF counts are nearly indistinguishable (see Supplementary Fig. 1). Many existing inference approaches, such as moment-matching [61] and fitting to the steady-state distribution [28], use only marginal statistics and would therefore fail to distinguish activation and repression from such data.

Despite the similarity in marginal distributions, we successfully inferred the correct regulatory mode in all 50 independent realizations. In the activation case, the repression parameter $γ_{11}$ was estimated to be near zero (posterior mean ${\hat{γ}}_{11} = 0.001$ , HDI: [−0.95, 1.05]), whereas the activation parameter $β_{11}$ was inferred to be close to its true value of 10 ( ${\hat{β}}_{11} = 9.09$ , HDI: [8.31, 9.87]). Conversely, in the repression case, the activation parameter $β_{22}$ was inferred to be near zero ( ${\hat{β}}_{22} = - 0.003$ , HDI: [−1.02, 0.94]), while the repression parameter $γ_{22}$ was estimated to be close to 10 ( ${\hat{γ}}_{22} = 9.52$ , HDI: [8.74, 10.40]) (see Supplementary Table 3 for exact values). Fig. 1(c) and (d) show the normalized box plots of the inferred parameters for self-activation and self-repression across 50 independent SSA realizations.

Key to the successful discrimination of regulatory modes is the observation of temporal dynamics: self-activation and self-repression exhibit distinct autocorrelation structures and transient responses to fluctuations, even when their steady-state distributions are identical [13]. Using the discretized CLE as a generative model exploits these temporal features to accurately infer regulatory modes that are difficult to distinguish using marginal TF distributions alone.
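To make this argument concrete, the following sketch evaluates a Gaussian transition log-likelihood obtained from an Euler-Maruyama discretization of a single-gene CLE, and compares a repression model with an activation model on the same trajectory. The specific drift and noise terms (Hill production, linear degradation), the function names, and the parameter values are assumptions for illustration; the likelihood used in the study is defined by the equations in the Methods and the Supplementary Material.

```python
import math
import random


def hill(x, rate, K, n, mode):
    """Assumed Hill production rate for activation or repression."""
    xn, Kn = max(x, 0.0) ** n, K ** n
    return rate * (xn if mode == "activation" else Kn) / (Kn + xn)


def cle_step_moments(x, rate, alpha, K, n, mode, dt):
    """Mean and variance of X(t + dt) given X(t) = x under Euler-Maruyama."""
    prod = hill(x, rate, K, n, mode)
    deg = alpha * max(x, 0.0)
    mean = x + (prod - deg) * dt
    var = max((prod + deg) * dt, 1e-9)   # guard against non-positive variance
    return mean, var


def simulate_cle(x0, n_steps, rate, alpha, K, n, mode, dt, seed=0):
    rng = random.Random(seed)
    xs = [x0]
    for _ in range(n_steps):
        mean, var = cle_step_moments(xs[-1], rate, alpha, K, n, mode, dt)
        xs.append(rng.gauss(mean, math.sqrt(var)))
    return xs


def log_likelihood(xs, rate, alpha, K, n, mode, dt):
    """Sum of Gaussian log-densities of consecutive increments."""
    total = 0.0
    for x_prev, x_next in zip(xs, xs[1:]):
        mean, var = cle_step_moments(x_prev, rate, alpha, K, n, mode, dt)
        total += -0.5 * (math.log(2 * math.pi * var) + (x_next - mean) ** 2 / var)
    return total


# Hypothetical parameters; both modes share the fixed point x = 50.
params = dict(rate=10.0, alpha=0.1, K=50.0, n=2, dt=0.5)
data = simulate_cle(50.0, 3000, mode="repression", **params)
ll_rep = log_likelihood(data, mode="repression", **params)
ll_act = log_likelihood(data, mode="activation", **params)
print(f"log-likelihood: repression {ll_rep:.1f}, activation {ll_act:.1f}")
```

In this toy setting both models predict the same production rate at the fixed point, so the difference in log-likelihood comes from how the drift responds to fluctuations around it, which is information carried by consecutive observations rather than by the marginal distribution.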

However, we found that the system is not identifiable in the saturated regimes of the Hill function (Fig. 1(e)). At sufficiently high or very low concentrations of the transcription factor, the Hill function saturates and fluctuations in counts of a regulating TF have a negligible effect on the production rate. In this regime, the likelihood function is approximately constant, making it difficult to distinguish between activation and repression, and to infer the transcription rates. This limitation is well recognized: in saturated regimes, different Hill functions can produce indistinguishable observable statistics [62]. A likelihood-based inference framework helps to determine when such identifiability issues arise [60], and to determine the parameter regimes in which the likelihood is sufficiently sensitive to changes in TF count to make inference possible (Fig. 1(f)). In all the examples discussed in this text, we therefore work with synthetic data generated in this regime.
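The loss of sensitivity in the saturated regimes can be seen directly from the derivative of the Hill function. The sketch below assumes the standard activating form $β x^{n} / (K^{n} + x^{n})$ with hypothetical parameter values; it is meant only to illustrate the argument and does not reproduce Fig. 1(e,f).

```python
def hill_activation(x, beta, K, n):
    """Assumed activating Hill function."""
    return beta * x**n / (K**n + x**n)


def hill_sensitivity(x, beta, K, n):
    """Analytic derivative of hill_activation with respect to x."""
    return beta * n * K**n * x ** (n - 1) / (K**n + x**n) ** 2


beta, K, n = 10.0, 50.0, 2          # hypothetical parameter values
sens_low = hill_sensitivity(1.0, beta, K, n)      # far below K
sens_mid = hill_sensitivity(50.0, beta, K, n)     # at x = K
sens_high = hill_sensitivity(1000.0, beta, K, n)  # far above K

for label, x in (("low", 1.0), ("mid", 50.0), ("high", 1000.0)):
    print(f"{label:>4}: rate {hill_activation(x, beta, K, n):7.3f}, "
          f"sensitivity {hill_sensitivity(x, beta, K, n):.2e}")
```

When the sensitivity is close to zero, fluctuations in the TF count barely change the production rate, so the increments of the trajectory carry little information about the sign or strength of regulation.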

3.2. Two-Gene Motifs

We next consider a two-gene toggle-switch-type network, in which one gene activates a partner while being repressed in return, as shown in Fig. 2(a). Such canonical motifs have been studied in both natural regulatory circuits and synthetic gene networks [19, 76]. We generated synthetic data from the GRN shown in Fig. 2(a) and modeled the system using Eq. (6), where the two-gene case is provided as an illustrative example (see Eq. (7) for the corresponding matrices). For inference, we used a fully connected GRN model that encompasses all potential interactions (see Supplementary Eq. (2) for the definition of the likelihoods).

Figure 2: (a) Expression levels and corresponding network structures for a two-gene system without combinatorial regulation. (b) Normalized boxplots of the posterior means of the inferred parameters obtained from 50 realizations of the trajectories. (c) Expression levels for a two-gene regulatory system with combinatorial AND-gate regulation. (d) Posterior means of the inferred binary regulatory indicators ( $θ, ω$ ) obtained from 50 time series realizations. We used a threshold of 0.5 to determine the presence and type of regulatory interactions. (e) Normalized posterior means of the inferred reaction rates. Here and in subsequent figures we present results for a representative subset of key parameters; detailed results are provided in the Supplementary Material.

We again accurately recovered the underlying regulatory network in 50 independent realizations. The activation parameter $β_{12}$ (true value = 10) was inferred to be close to its true value ( ${\hat{β}}_{12} = 9.7$ , HDI: [8.47, 10.46]), correctly identifying gene 2 as an activator of gene 1. Likewise, the repression parameter $γ_{21}$ (true value = 20) was estimated to be close to 20 ( ${\hat{γ}}_{21} = 19.183$ , HDI: [17.05, 21.31]), accurately capturing the repressive interaction. The degradation parameters $α_{1}$ and $α_{2}$ were also accurately estimated ( ${\hat{α}}_{1} = 0.028$ , HDI: [0.022, 0.032] and ${\hat{α}}_{2} = 0.018$ , HDI: [0.011, 0.022], respectively), corresponding to relative errors below 5%. All other regulatory parameters $β_{11}$ and $γ_{11}$ corresponding to nonexistent interactions had posterior means below 0.2, with 95% HDIs restricted to values near zero. Hence, we did not identify any spurious interactions.

These inference results are summarized in Fig. 2(b), which demonstrates that we can accurately infer both the structure of the GRN and the parameters governing TF dynamics at a stochastic equilibrium.

3.3. Two-Gene motifs with an AND gate

We next considered the case of a two-gene motif with both cross-regulation and autoregulation shown in Fig. 2(c). Such network motifs, combining cross-feedback with autoregulation, are common in both natural and synthetic gene networks [34] and can generate rich dynamical behaviors including oscillations and bistability [10, 52]. We again focus on the non-oscillatory regime.

This circuit poses a stringent test for network inference because autoregulatory interactions can be obscured by cross-regulatory signals and because self-regulation produces temporal correlations that can confound parameter estimation [72]. Successful recovery of both auto- and cross-regulatory interactions in this motif thus demonstrates the ability to disentangle multiple regulatory modes acting on the same gene.

We used synthetic trajectories from a system in which gene 1 was regulated by gene 2 and by its own product shown in Fig. 2(c). Regulation was modeled using AND-gate logic according to Eq. (9). For inference, we used Eq. (9) instead of Eq. (8) because it is computationally more efficient, requiring the inference of fewer parameters.

For inference, we again used a fully connected GRN model that included all potential interactions, including autoregulation (See supplementary Eq (4)). Binary-valued variables in the matrices $Ω$ and $Θ$ indicate whether potential regulatory interactions were activating or repressive, while continuous-valued parameters ( $ρ, v$ ) quantified the interaction strength (See Section 2.3). When a continuous parameter was inferred to be close to zero (i.e. when the corresponding HDI is concentrated around 0), the corresponding edge was assumed to be absent. Thus, edge existence and interaction strengths were inferred simultaneously within a single inference step.

We were able to infer the correct two-gene AND gate motif structure from all 50 independent synthetic trajectories. The scatter plots in Fig. 2(d) and Fig. 2(e) show the posterior means for the binary variables and continuous variables, respectively. In this example, the presence of regulatory interactions is determined by the product of the corresponding binary and continuous variables, specifically through the terms $θ_{i j} v_{i j}$ and $ρ_{i} ω_{i j} ω_{i k}$ , which become negligible when either component is inferred to be close to zero. In Fig. 2(d), orange dots represent binary variables associated with parameters whose true values are zero.

Since gene 1 represses its own production and is repressed by $X_{2}$ , its reaction is described by a repression-repression term, $ρ_{1} ({\tilde{ω}}_{11}) ({\tilde{ω}}_{12}) H_{r r}^{1}$ . Green dots in Fig. 2(d) correspond to ${\hat{ω}}_{11}, {\hat{ω}}_{12}$ being close to zero indicating the presence of combinatorial AND gate repression at gene 1. The parameter $ρ_{1}$ is inferred to be close to its true value of 20 ( ${\hat{ρ}}_{1} = 18.27$ , HDI: [16.81, 19.73]), while the parameters $θ_{11}$ and $θ_{12}$ are estimated to be close to zero ( ${\hat{θ}}_{11} = 0$ and ${\hat{θ}}_{12} = 0$ ). The corresponding parameters, $v_{11}$ and $v_{12}$ , are also inferred to be close to zero ( ${\hat{v}}_{11} = - 0.04$ , HDI: [−0.61, 0.48]) and ( ${\hat{v}}_{12} = 0.09$ , HDI: [−0.76, 1.02]) (See Fig. 2(e)), indicating the absence of the corresponding edges in the GRN.

Gene 2 transcription is activated by the product of gene 1, which is captured by the term $v_{21} θ_{21} A_{21}$ . As shown in Fig. 2 (d) and (e), green dots correspond to the estimate ${\hat{θ}}_{21} \approx 1$ . The related parameter $v_{21}$ indicates a strong activating effect of gene 1 on gene 2 ( ${\hat{v}}_{21} = 9.42$ , HDI: [8.82, 10.03]). On the other hand, the parameters $v_{22}$ ( ${\hat{v}}_{22} = 0.03$ , HDI: [−0.65, 0.78]) and $ρ_{2}$ ( ${\hat{ρ}}_{2} = 0.02$ , HDI: [−0.84, 0.94]) were inferred to be close to zero, indicating the absence of self-regulation and combinatorial regulation at gene 2.

Thus, the inferred GRN structure and parameters were consistent with the motif structure we used to generate the data, and we were able to infer both the existence of combinatorial regulation and the strengths of the underlying interactions in the presence of multiple regulatory inputs.

3.4. Repressilator

We next considered the repressilator [50], a synthetic circuit that allows us to test the performance of the proposed inference methods with cyclic regulatory architectures. The repressilator consists of three transcriptional repressors arranged in a cyclic inhibitory loop [14], as shown in Fig. 3(a). While this network can generate sustained oscillations, we focused on the non-oscillatory regime, where trajectories converge to a stochastic equilibrium.

Figure 3: (a) Trajectories of TF levels in a fully observed repressilator system $(X_{1} ⊣ X_{2} ⊣ X_{3} ⊣ X_{1})$ . (b) Normalized boxplots of the posterior means of inferred regulatory parameters obtained from 50 realizations of the trajectories. (c) The same repressilator system when only partially observed with the product of gene 3 (light green line) not used for inference. (d) The normalized boxplots of the inferred parameters show that we recovered the correct regulatory topology even when the GRN is only partially observed.

We were again able to both identify the correct regulatory structure, and accurately estimate the maximal transcription rates, as well as degradation rates from synthetic data. The boxplots in Fig. 3(b) show that the recovered parameters closely match the ground truth. Across 50 independent realizations, the repression parameters $γ_{13}, γ_{21}$ and $γ_{32}$ were inferred with posterior mean estimates close to their true values of 10 ( ${\hat{γ}}_{13} = 9.485$ , HDI:[8.43, 10.36]), 20 ( ${\hat{γ}}_{21} = 18.287$ , HDI:[16.53, 19.86]), and 15 ( ${\hat{γ}}_{32} = 14.197$ , HDI:[13.20, 15.15]), respectively. All other interaction parameters ( ${\hat{β}}_{i j}$ and ${\hat{γ}}_{i j}$ ) were estimated to be close to zero, correctly indicating the absence of other interactions. The degradation rates, ${\hat{α}}_{1}, {\hat{α}}_{2}$ and ${\hat{α}}_{3}$ , were also recovered accurately.

3.5. Partial Observability

In practice, we can often observe only a subset of a GRN. This poses significant challenges for network inference and parameter identifiability [69]. To evaluate our method’s performance under partial observability, we returned to the three-gene repressilator motif but assumed that only the products of genes 1 and 2 are measurable (Fig. 3(c)). This allowed us to test what interactions can be inferred when the network is partially observed.

Despite the absence of direct measurements of the product of gene 3, we recovered all parameters governing the dynamics of the observed nodes. We were able to infer that gene 1 represses gene 2, with $γ_{21}$ estimated close to its true value of 20 ( ${\hat{γ}}_{21} = 18.28$ , HDI: [16.46, 19.84]), and that gene 3 represses gene 1, with $γ_{13}$ estimated near its true value of 10 ( ${\hat{γ}}_{13} = 9.49$ , HDI:[8.43, 10.34]). The degradation rates were also inferred accurately, with ${\hat{α}}_{1} = 0.03$ , HDI:[0.027, 0.032] and ${\hat{α}}_{2} = 0.05$ , HDI:[0.042, 0.050]. All other parameters governing the evolution of $X_{1}$ and $X_{2}$ were also estimated well. The horseshoe prior effectively suppressed spurious interactions, with the estimated activation parameters $β_{13}$ and $β_{23}$ taking posterior means near zero ( ${\hat{β}}_{13} = 0.018$ , HDI: [−0.29, 0.38]) and ( ${\hat{β}}_{23} = 0.088$ , HDI:[−0.24, 0.62]), respectively, indicating the absence of the corresponding regulatory effects.

As expected, parameters determining the dynamics of products of the unobserved gene 3, including ${\hat{γ}}_{32}$ and ${\hat{α}}_{3}$ , had large posterior uncertainty. However, the parameter $γ_{13}$ captures the regulatory impact of the unobserved gene 3 on the observed system dynamics, and was inferred correctly. Thus, our method correctly inferred the presence and strength of this hidden regulatory link without generating false positive interactions, demonstrating robust performance under partial observability.

3.6. Coherent Feedforward Loop (C1-FFL)

The coherent type-1 feedforward loop (C1-FFL) is one of the most frequently occurring regulatory motifs in transcriptional networks [3, 44]. It consists of a regulator that activates a downstream target both directly and indirectly through an intermediate regulator, with all three interactions being activating. C1-FFLs generate sign-sensitive delays, in which activation of the target requires sustained input. This property enables the motif to filter out transient fluctuations and contributes to the robustness of gene expression [44].

To model this GRN we assumed that gene 1 is transcribed at a constant rate and activates the expression of gene 2, while the products of genes 1 and 2 jointly activate the expression of gene 3, as shown in Fig. 4(a). In the model used for inference we did not allow for self-loops, as they significantly increase the number of possible combinations of interactions at each gene. Although they could be incorporated, doing so would considerably increase computational demands. We performed inference in two steps, since inferring the GRN structure and the transcription rates simultaneously produced unreliable results. These difficulties were due in part to limitations of the Hamiltonian Monte Carlo (HMC/NUTS) samplers, which frequently exhibited poor chain mixing and persistent divergences, often caused by chains becoming trapped. Sampling performance improved substantially even over the course of this project as the HMC/NUTS implementations in PyMC were refined (see Conclusion).

Figure 4: (a) Gene expression trajectories generated from a network motif in which $X_{1}$ is produced at a constant rate of 5 units and activates $X_{2}$ , and $X_{3}$ is jointly activated by both $X_{1}$ and $X_{2}$ through an AND gate. (b) Normalized boxplots from the second inference step, showing the ratios of posterior means to true parameter values for $β$ (activation strength), $α$ (degradation rate), and $η$ (combinatorial regulation strength). (c) Scatter plot of inferred binary regulatory indicators $(θ, ω)$ from the first inference step, where a threshold of 0.5 was used to determine the presence and type of regulatory interactions. (d) Scatter plots of the inferred continuous parameters $(v, ρ)$ from the first inference step, representing the strength of individual and combinatorial regulatory interactions.

The first inference step was focused on inferring GRN architecture. Here we used Eq. (9) to define the likelihood of the model parameters. This formulation of the generative model incorporates binary indicator variables to identify active regulatory interactions. When the associated continuous parameters had HDIs concentrated around zero, the corresponding interactions were assumed to be absent. Pruning these potential interactions yielded a reduced model that we used in the second inference step focused on inferring interaction strengths and degradation rates. We used Eq. (8), with the architecture inferred in the first step. This two-stage approach reduced the effective dimensionality of the parameter space and mitigated the influence of spurious edges on rate estimates.

In the first step across 50 realizations the model consistently inferred ${\hat{θ}}_{12} = 1, {\hat{θ}}_{23} = 1, {\hat{θ}}_{21} = 1, {\hat{θ}}_{13} = 1, {\hat{θ}}_{31} = 1$ , and ${\hat{θ}}_{32} = 1$ , indicating that all six potential interactions were present (See Fig. 4(c)). The AND-gate binary indicator variables $ω_{i j}$ were also inferred to be active: ${\hat{ω}}_{12} = 1, {\hat{ω}}_{13} = 1, {\hat{ω}}_{21} = 1, {\hat{ω}}_{23} = 1, {\hat{ω}}_{31} = 1$ , and ${\hat{ω}}_{32} = 1$ .

As shown in Fig. 4(d), the AND-gate strength parameters were estimated as ${\hat{ρ}}_{1} = 0$ , HDI:[−0.71, 0.36] and ${\hat{ρ}}_{2} = 10.42$ , HDI:[−0.36, 19.33], suggesting no AND-gate regulation at gene 1 but a potential combinatorial effect at gene 2. The single-input parameters acting on gene 2 were inferred as ${\hat{v}}_{21} = 9.11$ , HDI:[−0.27, 19.33] and ${\hat{v}}_{23} = 0.13$ , HDI:[−0.52, 0.78]. Since only ${\hat{v}}_{21}$ admits clearly nonzero values, we concluded that $X_{1}$ regulates $X_{2}$ , ruling out the presence of an actual AND gate at gene 2.

For gene 3, $ρ_{3}$ was inferred to be nonzero ( ${\hat{ρ}}_{3} = 10.87$ , HDI:[−0.40, 19.75]), and the corresponding single-input parameters, estimated as ${\hat{v}}_{31} = 9.32$ , HDI:[−0.46, 19.42] and ${\hat{v}}_{32} = 10.14$ , HDI:[−0.34, 19.76], had wide HDIs that included substantial nonzero values, indicating an AND-gate interaction involving TFs $X_{1}$ and $X_{2}$ . In the second step, only the AND gate regulation of $X_{3}$ was retained, with the single-input parameters $v_{31}$ and $v_{32}$ pruned to zero. Although the single-input terms had positive estimated values, they were pruned because the strong interaction term $ρ_{3}$ alone was sufficient to explain the data. This resulted in a simpler and more interpretable model, in which gene 3 is activated exclusively by the simultaneous presence of both transcription factors $X_{1}$ and $X_{2}$ , with no independent activation by either factor alone.
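The HDI-based screening used in the first inference step can be sketched as a simple rule applied to the intervals quoted above. The rule itself (discard a parameter when its entire HDI lies within a tolerance of zero) and the tolerance value are assumptions made for illustration; the text does not specify a numerical criterion, and the final model choice described above involved further reasoning that this rule does not capture.

```python
def possibly_nonzero(hdi_low, hdi_high, tol=1.5):
    """Keep a parameter unless its whole HDI lies within +/- tol of zero.

    The tolerance is a hypothetical choice, not a value from the study.
    """
    return max(abs(hdi_low), abs(hdi_high)) >= tol


# HDIs of the continuous parameters as reported for the first inference step.
first_step_hdis = {
    "rho_1": (-0.71, 0.36),
    "rho_2": (-0.36, 19.33),
    "v_21": (-0.27, 19.33),
    "v_23": (-0.52, 0.78),
    "rho_3": (-0.40, 19.75),
    "v_31": (-0.46, 19.42),
    "v_32": (-0.34, 19.76),
}

retained = sorted(name for name, (lo, hi) in first_step_hdis.items()
                  if possibly_nonzero(lo, hi))
pruned = sorted(name for name, (lo, hi) in first_step_hdis.items()
                if not possibly_nonzero(lo, hi))
print("candidates kept for further consideration:", retained)
print("pruned as absent:", pruned)
```

Under this rule only $ρ_{1}$ and $v_{23}$ are discarded outright. The remaining candidates have wide HDIs, and deciding among them (for example, between a single-input and an AND-gate explanation at gene 2) requires the additional arguments given in the text.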

The final inference results demonstrate that the structure of the GRN and the interaction strengths were accurately recovered (see Fig. 4(b)). Thus, a two-step approach consisting of first identifying the GRN structure and then estimating the corresponding rate parameters can improve inference in complex networks, enabling both accurate identification of network architecture and parameter estimation.

4. Discussion

We have shown that the Chemical Langevin Equation (CLE) provides a tractable and biologically grounded framework for inferring network structure and kinetic parameters in gene regulatory networks operating in the mesoscopic regime, where intrinsic stochasticity is significant but molecule counts are sufficiently large to justify a continuous approximation. The CLE captures both deterministic reaction kinetics and stochastic fluctuations within a unified stochastic differential equation framework, making it well-suited for modeling gene expression, where noise plays important functional roles [15, 59]. While exact likelihood-based inference using the Chemical Master Equation (CME) is theoretically optimal, it becomes computationally intractable for multi-gene networks [51]. The CLE offers a practical alternative that retains essential stochastic information while remaining amenable to efficient Bayesian inference algorithms. The CLE assumes a well-stirred, constant-volume system and does not provide a good approximation when molecule numbers approach zero [23]. Despite these limitations, our results demonstrate that a CLE-based inference approach can be used to successfully recover network topology and parameters from simulated gene expression time series across a range of biologically relevant regimes, including systems with cross-regulation and strong regulatory feedback.

Central to the proposed Bayesian approach is the use of horseshoe priors [32], which reflect the assumed sparsity of interactions and allow for the accurate recovery of network architecture. We have shown that such priors suppress spurious edges while retaining true regulatory interactions. We have also tried alternatives, such as Laplace priors [65] (not shown), and found that well-tuned horseshoe priors provided consistently sharper discrimination between existing and absent interactions.
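The selective shrinkage provided by such priors can be illustrated by drawing from a regularized horseshoe prior in plain Python. The sketch below uses one common parameterization, in which a half-Cauchy local scale $λ$ is combined with a global scale $τ$ and a slab width $c$; the fixed values of $τ$ and $c$ and all function names are assumptions for illustration and are not the hyperparameters used in the study.

```python
import math
import random
import statistics


def half_cauchy(rng, scale=1.0):
    """Draw from a half-Cauchy distribution by inverse-CDF sampling."""
    return scale * abs(math.tan(math.pi * (rng.random() - 0.5)))


def regularized_scale(lam, tau, c):
    """Effective prior standard deviation; bounded above by the slab width c."""
    return tau * lam * c / math.sqrt(c**2 + (tau * lam) ** 2)


def sample_regularized_horseshoe(n_draws, tau=0.1, c=10.0, seed=0):
    rng = random.Random(seed)
    draws, scales = [], []
    for _ in range(n_draws):
        lam = half_cauchy(rng)                 # local shrinkage scale
        s = regularized_scale(lam, tau, c)
        scales.append(s)
        draws.append(rng.gauss(0.0, s))        # coefficient given its scale
    return draws, scales


draws, scales = sample_regularized_horseshoe(5000)
median_abs = statistics.median(abs(d) for d in draws)
print(f"median |coefficient|: {median_abs:.3f}")
print(f"largest prior scale:  {max(scales):.3f} (slab width c = 10)")
```

Most prior draws are concentrated near zero, reflecting the assumption that most potential edges are absent, while the heavy tail of the local scale lets a few coefficients escape shrinkage up to a magnitude set by the slab width.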

We used a particular form of regulatory interactions to define the statistical model and the resulting likelihood functions, such as the AND gate model of cross-regulation. Other regulatory logic functions can be incorporated in this framework, and our approach can be extended to determine which function best fits the data if multiple candidates are included in the model.

Inference in gene regulatory networks is subject to both structural uncertainty – the inability to distinguish between architectures – and limited parameter identifiability, which arises either when the equations do not uniquely determine certain parameters regardless of how much data is available [60], or, in practice, when finite, noisy observations do not constrain parameter estimates [17, 70]. We consistently recovered the correct network topology in all test cases. However, we observed practical identifiability limitations for certain parameters. Specifically, in the case of the coherent type-1 feed-forward loop, structural identifiability issues became evident during the first step of the inference procedure. Some parameters were linearly correlated; for example, $v_{21}$ and $ρ_{2}$ were consistently assigned similar values, indicating that the model could not uniquely distinguish them. Similarly, the pairs ( $ρ_{2}$ , $v_{23}$ ), ( $v_{32}$ , $ρ_{3}$ ), and ( $v_{31}$ , $ρ_{3}$ ) showed strong correlations. These dependencies limit the accuracy with which individual parameter values can be inferred. The correlation patterns are shown in Supplementary Fig. 10.
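Dependencies of this kind can be screened for by computing pairwise correlations between posterior draws. The sketch below uses hypothetical toy draws in which two parameters share a common component; the parameter labels are borrowed from the text only for readability, and neither the draws nor the threshold correspond to results from the study.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlated_pairs(draws, threshold=0.8):
    """Return (name_a, name_b, r) for pairs with |r| >= threshold."""
    names = sorted(draws)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(draws[a], draws[b])
            if abs(r) >= threshold:
                flagged.append((a, b, r))
    return flagged


rng = random.Random(2)
shared = [rng.gauss(10.0, 4.0) for _ in range(2000)]   # common component
toy_draws = {
    "v_21": [s + rng.gauss(0.0, 1.0) for s in shared],
    "rho_2": [s + rng.gauss(0.0, 1.0) for s in shared],
    "alpha_2": [rng.gauss(0.05, 0.005) for _ in range(2000)],
}

flagged = correlated_pairs(toy_draws)
for a, b, r in flagged:
    print(f"{a} and {b}: r = {r:.2f}")
```

A strongly correlated pair indicates that the data constrain a combination of the two parameters better than either one individually, which is one motivation for fixing the architecture before estimating rates.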

Our approach has several limitations. First, computational and data requirements grow with network size: inference for $N$ genes involves $O (N^{2})$ parameters, making inference in larger networks (> 5 genes) challenging. Second, for validation we used only synthetic data under idealized conditions. While this is a necessary first step in validating an inference method, we ignored extrinsic noise, measurement error, and parameter non-stationarity present in real experiments [15]. Third, our model assumes Hill function kinetics, CLE approximations valid at moderate copy numbers, and negligible time delays – assumptions that can be violated in biological systems and may thus impact inference. Future work should address these limitations through benchmarking, experimental validation, and model extensions that incorporate biologically important system properties.

Several avenues are available to address these challenges. First, incorporating both intrinsic and extrinsic noise sources into the likelihoods would yield more biologically realistic models. Second, systematic validation on experimental time-series data will be essential to establish practical utility. Third, improving scalability through approximate inference schemes [75], dimensionality reduction [49], or variational Bayesian approaches [18] will broaden applicability to larger systems. Finally, extending the framework to more complex network architectures, including feedback loops and cross-regulatory modules, could help uncover design principles of natural gene networks at larger scales. Together, these directions can enhance both the computational feasibility and biological relevance of the method, paving the way for systems-level studies of gene regulation.

Supplementary Material

Supplement 1

media-1.pdf (21.6 MB, pdf)

Acknowledgements

This work was supported by funding from the joint National Science Foundation and National Institutes of Health grant 1R01GM144959 and National Science Foundation EFRI-OI grant 2515431 (A.G. and K.J.).

Code Availability

Code implementing the proposed inference algorithm and reproducing all results in the text is available at http://bit.ly/3Yq14Rx

References

[1] Aalto A., Viitasaari L., Ilmonen P., Mombaerts L., and Gonçalves J. Gene regulatory network inference from sparsely sampled noisy data. Nature Communications, 11(1):3493, 2020.
[2] Abril-Pla O., Andreani V., Carroll C., Dong L., Fonnesbeck C. J., Kochurov M., Kumar R., Lao J., Luhmann C. C., Martin O. A., et al. PyMC: a modern, and comprehensive probabilistic programming framework in Python. PeerJ Computer Science, 9:e1516, 2023.
[3] Alon U. An Introduction to Systems Biology: Design Principles of Biological Circuits. Chapman and Hall/CRC, 2nd edition, 2019.
[4] Becskei A., Séraphin B., and Serrano L. Positive feedback in eukaryotic gene networks: cell differentiation by graded to binary response conversion. The EMBO Journal, 20(10):2528–2535, 2001.
[5] Betancourt M. A conceptual introduction to Hamiltonian Monte Carlo. arXiv preprint arXiv:1701.02434, 2017.
[6] Bintu L., Buchler N. E., Garcia H. G., Gerland U., Hwa T., Kondev J., and Phillips R. Transcriptional regulation by the numbers: models. Current Opinion in Genetics & Development, 15(2):116–124, 2005.
[7] Box G. E. P. and Tiao G. C. Bayesian Inference in Statistical Analysis. John Wiley & Sons, 1973. Reprinted 2011.
[8] Brunton S. L., Proctor J. L., and Kutz J. N. Sparse identification of nonlinear dynamics with control (SINDYc). IFAC-PapersOnLine, 49(18):710–715, 2016.
[9] Buchler N. E., Gerland U., and Hwa T. On schemes of combinatorial transcription logic. Proceedings of the National Academy of Sciences, 100(9):5136–5141, 2003.
[10] Chen Y., Kim J. K., Hirning A. J., Josić K., and Bennett M. R. Emergent genetic oscillations in a synthetic microbial consortium. Science, 349(6251):986–989, 2015.
[11] Chis O.-T., Banga J. R., and Balsa-Canto E. Structural identifiability of systems biology models: a critical comparison of methods. PLoS ONE, 6(11):e27755, 2011.
[12] Davidson E. H. and Levine M. Gene regulatory networks. Proceedings of the National Academy of Sciences, 102(14):4935–4936, 2005.
[13] Dunlop M. J., Cox R. S. III, Levine J. H., Murray R. M., and Elowitz M. B. Regulatory activity revealed by dynamic correlations in gene expression noise. Nature Genetics, 40(12):1493–1498, 2008.
[14] Elowitz M. B. and Leibler S. A synthetic oscillatory network of transcriptional regulators. Nature, 403(6767):335–338, 2000.
[15] Elowitz M. B., Levine A. J., Siggia E. D., and Swain P. S. Stochastic gene expression in a single cell. Science, 297(5584):1183–1186, 2002.
[16] Espinosa-Soto C. On the role of sparseness in the evolution of modularity in gene regulatory networks. PLoS Computational Biology, 14(5):e1006172, 2018.
[17] FitzGerald C. E., Reich S., Agaba V., Mathur A., Werner M. S., and Mangan N. M. Practical indistinguishability in a gene regulatory network inference problem, a case study. arXiv preprint arXiv:2508.21006, 2025.
[18] Fox C. W. and Roberts S. J. A tutorial on variational Bayesian inference. Artificial Intelligence Review, 38(2):85–95, 2012.
[19] François P. and Hakim V. Core genetic module: the mixed feedback loop. Physical Review E, 72(3):031908, 2005.
[20] Friedman N., Linial M., Nachman I., and Pe’er D. Using Bayesian networks to analyze expression data. Journal of Computational Biology, 7(3–4):601–620, 2000.
[21] Gardner T. S., Cantor C. R., and Collins J. J. Construction of a genetic toggle switch in Escherichia coli. Nature, 403(6767):339–342, 2000.
[22] Gillespie D. T. Exact stochastic simulation of coupled chemical reactions. The Journal of Physical Chemistry, 81(25):2340–2361, 1977.
[23] Gillespie D. T. The chemical Langevin equation. The Journal of Chemical Physics, 113(1):297–306, 2000.
[24] Gillespie D. T. Stochastic simulation of chemical kinetics. Annual Review of Physical Chemistry, 58:35–55, 2007.
[25] Golightly A. and Wilkinson D. J. Bayesian inference for stochastic kinetic models using a diffusion approximation. Biometrics, 61(3):781–788, 2005.
[26] Golightly A. and Wilkinson D. J. Bayesian sequential inference for stochastic kinetic biochemical network models. Journal of Computational Biology, 13(3):838–851, 2006.
[27] Golightly A. and Wilkinson D. J. Bayesian parameter inference for stochastic biochemical network models using particle Markov chain Monte Carlo. Interface Focus, 1(6):807–820, 2011.
[28] Herbach U., Bonnaffoux A., Espinasse T., and Gandrillon O. Inferring gene regulatory networks from single-cell data: a mechanistic approach. BMC Systems Biology, 11(1):105, 2017.
[29] Higham D. J. An algorithmic introduction to numerical simulation of stochastic differential equations. SIAM Review, 43(3):525–546, 2001.
[30] Higham D. J. Modeling and simulating chemical reactions. SIAM Review, 50(2):347–368, 2008.
[31] Hines K. E., Middendorf T. R., and Aldrich R. W. Determination of parameter identifiability in nonlinear biophysical models: a Bayesian approach. Journal of General Physiology, 143(3):401–416, 2014.
[32] Hirsh S. M., Barajas-Solano D. A., and Kutz J. N. Sparsifying priors for Bayesian uncertainty quantification in model discovery. Royal Society Open Science, 9(2):211823, 2022.
[33] Hoffman M. D. and Gelman A. The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research, 15(1):1593–1623, 2014. Available at http://jmlr.org/papers/v15/hoffman14a.html.
[34] Huang S., Guo Y.-P., May G., and Enver T. Bifurcation dynamics in lineage-commitment in bipotent progenitor cells. Developmental Biology, 305(2):695–713, 2007.
[35].Huang X.-N., Shi W.-J., Zhou Z., and Zhang X.-J.. The identifiability of gene regulatory networks: the role of observation data. Journal of Biological Physics, 48(1):93–110, 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
[36].Huynh-Thu V. A. and Geurts P.. dyngenie3: dynamical genie3 for the inference of gene networks from time series expression data. Scientific Reports, 8(1):3384, 2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
[37].Huynh-Thu V. A., Irrthum A., Wehenkel L., and Geurts P.. Inferring regulatory networks from expression data using tree-based methods. PLoS ONE, 5(9):e12776, 2010. [DOI] [PMC free article] [PubMed] [Google Scholar]
[38].James G. M., Sabatti C., Zhou N., and Zhu J.. Sparse regulatory networks. The Annals of Applied Statistics, 4(2):663–686, 2010. [DOI] [PMC free article] [PubMed] [Google Scholar]
[39].Jiang R., Singh P., Wrede F., Hellander A., and Petzold L.. Identification of dynamic mass-action biochemical reaction networks using sparse Bayesian methods. PLoS Computational Biology, 18(1):e1009830, 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
[40].Kim H. and Gelenbe E.. Stochastic gene expression modeling with Hill function for switch-like gene responses. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 9(4):973–979, 2012. [DOI] [PubMed] [Google Scholar]
[41].Kuhlman T., Zhang Z., Jr M. H.. Saier, and T. Hwa. Combinatorial transcriptional control of the lactose operon of Escherichia coli. Proceedings of the National Academy of Sciences, 104(14):6043–6048, 2007. [Google Scholar]
[42].Kuwahara H. and Soyer O. S.. Bistability in feedback circuits as a byproduct of evolution of evolvability. Molecular Systems Biology, 8:564, 2012. [DOI] [PMC free article] [PubMed] [Google Scholar]
[43].Mangan N. M., Brunton S. L., Proctor J. L., and Kutz J. N.. Inferring biological networks by sparse identification of nonlinear dynamics. IEEE Transactions on Molecular, Biological and Multi-Scale Communications, 2(1):52–63, 2016. [Google Scholar]
[44].Mangan S., Zaslaver A., and Alon U.. The coherent feedforward loop serves as a sign-sensitive delay element in transcription networks. Journal of Molecular Biology, 334(2):197–204, 2003. [DOI] [PubMed] [Google Scholar]
[45].Manicka S., Johnson K., Levin M., and Murrugarra D.. The nonlinearity of regulation in biological networks. NPJ Systems Biology and Applications, 9(1):10, 2023. [DOI] [PMC free article] [PubMed] [Google Scholar]
[46].Margolin A. A., Nemenman I., Basso K., Wiggins C., Stolovitzky G., Dalla Favera R., and Califano A.. Aracne: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics, 7(Suppl 1):S7, 2006. [Google Scholar]
[47].Matthew S., Carter F., Cooper J., Dippel M., Green E., Hodges S., Kidwell M., Nickerson D., Rumsey B., Reeve J., Petzold L. R., Sanft K. R., and Drawert B.. GillesPy2: a biochemical modeling framework for simulation driven biological discovery. Letters in Biomathematics, 10(1):87–103, 2023. [PMC free article] [PubMed] [Google Scholar]
[48].Middendorf T. R. and Aldrich R. W.. The structure of binding curves and practical identifiability of equilibrium ligand-binding parameters. Journal of General Physiology, 149(1):121–147, 2017. [DOI] [PMC free article] [PubMed] [Google Scholar]
[49].Moon K. R., van Dijk D., Wang Z., Gigante S., Burkhardt D. B., Chen W. S., Yim K., van den Elzen A., Hirn M. J., Coifman R. R., et al. Visualizing structure and transitions in high-dimensional biological data. Nature Biotechnology, 37(12):1482–1492, 2019. [Google Scholar]
[50].Müller S., Hofbauer J., Endler L., Flamm C., Widder S., and Schuster P.. A generalized model of the repressilator. Journal of Mathematical Biology, 53(6):905–937, 2006. [DOI] [PubMed] [Google Scholar]
[51].Munsky B. and Khammash M.. The finite state projection algorithm for the solution of the chemical master equation. The Journal of Chemical Physics, 124(4):044104, 2006. [DOI] [PubMed] [Google Scholar]
[52].Panovska-Griffiths J., Page K. M., and Briscoe J.. A gene regulatory motif that generates oscillatory or multiway switch outputs. Journal of The Royal Society Interface, 10(79):20120826, 2013. [DOI] [PMC free article] [PubMed] [Google Scholar]
[53].Patil A., Huard D., and Fonnesbeck C. J.. Pymc: Bayesian stochastic modelling in Python. Journal of Statistical Software, 35(4):1–81, 2010. [PMC free article] [PubMed] [Google Scholar]
[54].Paulsson J.. Summing up the noise in gene networks. Nature, 427(6973):415–418, 2004. [DOI] [PubMed] [Google Scholar]
[55].Phillips R., Bois J., and Meyer M.. Analysis of feed-forward loops. http://be150.caltech.edu/2020/content/lessons/05_ffls.html, 2020. Caltech BE/Bi 150: Biological Circuit Design, Lesson 5. [Google Scholar]
[56].Phillips R., Kondev J., Theriot J., and Garcia H. G.. Physical Biology of the Cell. Garland Science, 2 edition, 2012. [Google Scholar]
[57].Piironen J. and Vehtari A.. Sparsity information and regularization in the horseshoe and other shrinkage priors. Electronic Journal of Statistics, 11(2):5018–5051, 2017. [Google Scholar]
[58].Ptashne M. and Gann A.. Genes & Signals. Cold Spring Harbor Laboratory Press, 2002. [Google Scholar]
[59].Raj A. and van Oudenaarden A.. Nature, nurture, or chance: stochastic gene expression and its consequences. Cell, 135(2):216–226, 2008. [DOI] [PMC free article] [PubMed] [Google Scholar]
[60].Raue A., Kreutz C., Maiwald T., Bachmann J., Schilling M., Klingmüller U., and Timmer J.. Structural and practical identifiability analysis of partially observed dynamical models by exploiting the profile likelihood. Bioinformatics, 25(15):1923–1929, 2009. [DOI] [PubMed] [Google Scholar]
[61].Ruess J. and Lygeros J.. Moment-based methods for parameter inference and experiment design for stochastic biochemical reaction networks. ACM Transactions on Modeling and Computer Simulation, 25(2):8, 2015. [Google Scholar]
[62].Santillán M.. On the use of the Hill functions in mathematical models of gene regulatory networks. Mathematical Modelling of Natural Phenomena, 3(2):85–97, 2008. [Google Scholar]
[63].Schlitt T. and Brazma A.. Current approaches to gene regulatory network modelling. BMC Bioinformatics, 8(Suppl 6):S9, 2007. [Google Scholar]
[64].Schnoerr D., Sanguinetti G., and Grima R.. Approximation and inference methods for stochastic biochemical kinetics—a tutorial review. Journal of Physics A: Mathematical and Theoretical, 50(9):093001, 2017. [Google Scholar]
[65].Seeger M., Steinke F., and Tsuda K.. Bayesian inference and optimal design in the sparse linear model. In Proceedings of the 11th International Conference on Artificial Intelligence and Statistics (AISTATS), pages 444–451, 2007. Available at http://proceedings.mlr.press/v2/seeger07a.html. [Google Scholar]
[66].Shu H., Zhou J., Lian Q., Li H., Zhao D., Zeng J., and Ma J.. Modeling gene regulatory networks using neural network architectures. Nature Computational Science, 1(7):491–501, 2021. [DOI] [PubMed] [Google Scholar]
[67].Swain P. S., Elowitz M. B., and Siggia E. D.. Intrinsic and extrinsic contributions to stochasticity in gene expression. Proceedings of the National Academy of Sciences, 99(20):12795–12800, 2002. [Google Scholar]
[68].Taniguchi Y., Choi P. J., Li G.-W., Chen H., Babu M., Hearn J., Emili A., and Xie X. S.. Quantifying E. coli proteome and transcriptome with single-molecule sensitivity in single cells. Science, 329(5991):533–538, 2010. [DOI] [PMC free article] [PubMed] [Google Scholar]
[69].Villaverde A. F.. Observability and structural identifiability of nonlinear biological systems. Complexity, 2019:8497093, 2019. [Google Scholar]
[70].Villaverde A. F., Barreiro A., and Papachristodoulou A.. Structural identifiability of dynamic systems biology models. PLoS Computational Biology, 12(10):e1005153, 2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
[71].Wang R.-S., Saadatpour A., and Albert R.. Boolean modeling in systems biology: an overview of methodology and applications. Physical Biology, 9(5):055001, 2012. [DOI] [PubMed] [Google Scholar]
[72].Wang Y. and He S.. Inference on autoregulation in gene expression with variance-to-mean ratio. Journal of Mathematical Biology, 86(5):87, 2023. [DOI] [PMC free article] [PubMed] [Google Scholar]
[73].Warmflash A. and Dinner A. R.. Signatures of combinatorial regulation in intrinsic biological noise. Proceedings of the National Academy of Sciences, 105(45):17262–17267, 2008. [Google Scholar]
[74].Wilkinson D. J.. Stochastic modelling for quantitative description of heterogeneous biological systems. Nature Reviews Genetics, 10(2):122–133, 2009. [Google Scholar]
[75].Wilkinson D. J.. Stochastic Modelling for Systems Biology. Chapman and Hall/CRC, 3rd edition, 2018. [Google Scholar]
[76].Williams K., Savageau M. A., and Blumenthal R. M.. A bistable hysteretic switch in an activator–repressor regulated restriction–modification system. Nucleic Acids Research, 41(12):6045–6057, 2013. [DOI] [PMC free article] [PubMed] [Google Scholar]
[77].Yan C.-C. S. and Hsu C.-P.. The fluctuation-dissipation theorem for stochastic kinetics—implications on genetic regulations. The Journal of Chemical Physics, 139(22):224109, 2013. [DOI] [PubMed] [Google Scholar]
[78].Zhang X., Zhao X.-M., He K., Lu L., Cao Y., Liu J., Hao J.-K., Liu Z.-P., and Chen L.. Inferring gene regulatory networks from gene expression data by path consistency algorithm based on conditional mutual information. Bioinformatics, 28(1):98–104, 2012. [DOI] [PubMed] [Google Scholar]

Associated Data

Supplementary Materials

Supplement 1

media-1.pdf (21.6 MB, PDF)

Data Availability Statement

Code implementing the proposed inference algorithm and reproducing all results in the text is available at http://bit.ly/3Yq14Rx
