Spectral solutions to stochastic models of gene expression with bursts and regulation

Andrew Mugler; Aleksandra M Walczak; Chris H Wiggins

doi:10.1103/PhysRevE.80.041921

Author manuscript; available in PMC: 2011 Jun 15.

Published in final edited form as: Phys Rev E Stat Nonlin Soft Matter Phys. 2009 Oct 20;80(4 Pt 1):041921. doi: 10.1103/PhysRevE.80.041921

PMCID: PMC3115574 NIHMSID: NIHMS189350 PMID: 19905356

Abstract

Signal-processing molecules inside cells are often present at low copy number, which necessitates probabilistic models to account for intrinsic noise. Probability distributions have traditionally been found using simulation-based approaches which then require estimating the distributions from many samples. Here we present in detail an alternative method for directly calculating a probability distribution by expanding in the natural eigenfunctions of the governing equation, which is linear. We apply the resulting spectral method to three general models of stochastic gene expression: a single gene with multiple expression states (often used as a model of bursting in the limit of two states), a gene regulatory cascade, and a combined model of bursting and regulation. In all cases we find either analytic results or numerical prescriptions that greatly outperform simulations in efficiency and accuracy. In the last case, we show that bimodal response in the limit of slow switching is not only possible but optimal in terms of information transmission.

I. INTRODUCTION

Signals are processed in cells using networks of interacting components, including proteins, mRNAs, and small signaling molecules. These components are usually present in low numbers [1-6], which means the size of the fluctuations in their copy counts is comparable to the copy counts themselves. Noise in gene networks has been shown to propagate [7] and therefore explicitly accounting for the stochastic nature of gene expression appears important when predicting the properties of real biological networks.

Although summary statistics such as mean and variance are sometimes sufficient for answering questions of biological interest [8], calculating certain quantities, such as information transmission [9-14], requires knowing the full probability distribution. Full knowledge of the probability distribution can also be used to discern different molecular models of the noise sources based on recent exact measurements of probability distributions [15-18].

Much analytical and purely computational effort has gone into detailed models of noise in small genetic switches [8,19-21]. The most general description is based on the master equation describing the time evolution of the joint probability distribution over all copy counts [22]. Some progress has been made by applying approximations to the master equation [24-26]. For example, a wide class of approximations focuses on limits of large concentrations or small switches [8,19,23]. More often, modelers resort to stochastic simulation techniques, the most common of which is based on the varying-step Monte Carlo method [27,28]. This method requires a computational challenge (generating many sample trajectories) followed by an even more difficult statistical challenge (parametrizing or otherwise estimating the probability distribution from which the samples are drawn) [29]. Recently there have been significant advances in simulation-based methods to circumvent these problems [20,21,30]. Simulation techniques are especially useful for more detailed studies of experimentally well-characterized systems, including those incorporating DNA looping, non-specific binding [30], and explicit spatial effects [20,21]. However, it is often beneficial to first gain intuition from simplified analytical models.

In a recent paper [31] we introduce an alternative method for calculating the steady-state distributions of chemical reactants, which we call the spectral method. The procedure relies on exploiting the natural basis of a simpler problem from the same class. The full problem is then solved numerically as an expansion in the natural basis [32]. In the spectral method we use the analytical guidance of a simple birth-death problem to reduce the master equation for a cascade to a set of algebraic equations. We break the problem into two parts: a parameter-independent preprocessing step and the parameter-dependent step of obtaining the actual probability distributions. The spectral method allows huge computational gains with respect to simulations. In prior work [31] we illustrate the method in the example of gene regulatory cascades. We combine the spectral method with a Markov approximation, which exploits the observation that the behavior of a given species should depend only weakly on distant nodes given the proximal nodes.

In this paper we expand upon the application of the spectral method to more biologically realistic models of regulation: (i) a model of bursting in gene expression and (ii) a model that includes both bursts and explicit regulation by binding of transcription factor proteins. In both cases we demonstrate how the spectral method gives either analytic results or reduced algebraic expressions that can be solved numerically in orders of magnitude less time than stochastic simulations.

We note that although information propagation in biological networks is impeded by numerous mechanisms collectively modeled as noise, here we focus on the inescapable or “intrinsic” noise resulting from the finite copy number of the molecules. One should consider these results, then, as an upper bound on information propagation, further hampered by, for example, cell division [33]; spatial effects [20,21]; active degradation of the constituents (here modeled via a constant degradation rate); and other complicating mechanisms particular to specific systems. Additionally, here we focus on steady-state solutions, but extensions to dynamics are straightforward and currently being pursued.

We begin with a model of a multistate birth-death process, a special case of which has been used to describe transcriptional bursting [17,34]. We illustrate how the spectral method reduces the model to a simple iterative algebraic equation, and in the appropriate limiting case recovers the known analytic results. We also use this section to introduce the basic notation used throughout the paper. Next we explore the problem of gene regulation in detail. The main idea behind the spectral method is the exploitation of an underlying natural basis for a problem which we can solve exactly. We explore four different spectral representations of the regulation model used in previous work [31] that arise from four natural choices of eigenbasis in which to expand the solution (cf. Sec. III A and Fig. 2). All representations reduce the master equation to a set of linear algebraic equations, and one admits an analytic solution by virtue of the tridiagonal matrix algorithm. We compare the efficiencies of the representations’ numerical implementations and show that all outperform simulation. Lastly, we apply the spectral method to a model that combines bursting and regulation. We obtain a linear algebraic expression that permits large speedup over simulation and thus admits optimization of information transmission. Optimization reveals two types of solutions: a unimodal response when the rates of switching between expression states are comparable to degradation rates and a bimodal response when switching rates are much slower than degradation rates.

FIG. 2. Summary of the bases discussed in Sec. III A (black) and their gauge freedoms (barred parameters in gray). In the top row, neither the m nor the n sector is expanded in an eigenbasis; in the middle row, one sector is expanded; and in the bottom row, both sectors are expanded. The |j,k〉 basis can be viewed as a special case of the |j,k_j〉 or |j,k_n〉 basis with q‾_j=q‾ or q‾_n=q‾, respectively. The |j,m〉 basis is not discussed as it is not useful in simplifying the problem.

II. BURSTS OF GENE EXPRESSION

We first consider a model of gene expression in which a gene exists in one of Z stochastic “states,” i.e., protein production obeys a simple birth-death process but with a state-dependent birth rate. In the special case of Z=2, this corresponds to a gene existing in an on or an off state due, for example, to the binding and unbinding of the RNA polymerase. Such a model has been used to describe transcriptional bursting [17,34], and we specialize to this case in Sec. II C.

For a general Z-state process, the master equation

{\dot{p}}_{n}^{z} = g_{z} p_{n - 1}^{z} + (n + 1) p_{n + 1}^{z} - (g_{z} + n) p_{n}^{z} + \sum_{z^{'}} Ω_{z z^{'}} p_{n}^{z^{'}}

(1)

describes the time evolution of the joint probability distribution $p_{n}^{z}$ , where z specifies the state (1≤z≤Z), n is the number of proteins, g_z is the production rate in state z, Ω_zz’ is a stochastic matrix of transition rates between states, and ${\dot{p}}_{n}^{z}$ denotes differentiation of the probability distribution with respect to time. Time and all rates have been nondimensionalized by the protein degradation rate. Note that conservation of probability requires

\sum_{z} Ω_{z z^{'}} = 0 .

(2)

The relationship between the transition rates in Ω_zz’ and the probabilities $π_{z} = Σ_{n} p_{n}^{z}$ of being in the zth state can be seen by summing Eq. (1) over n; at steady state one obtains

\sum_{z^{'}} Ω_{z z^{'}} π_{z^{'}} = 0,

(3)

and normalization requires

\sum_{z} π_{z} = 1 .

(4)
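Equations (3) and (4) fix the occupancies π_z as the normalized null vector of Ω. The following is a minimal NumPy sketch of that computation; the function name `state_occupancies` and the three-state rate matrix are illustrative assumptions, not taken from the analysis above.

```python
import numpy as np

def state_occupancies(Omega):
    """Solve Eqs. (3)-(4): Omega @ pi = 0 subject to sum(pi) = 1.

    Omega[z, zp] is the transition rate from state zp to state z,
    so every column sums to zero [Eq. (2)].
    """
    Omega = np.asarray(Omega, dtype=float)
    Z = Omega.shape[0]
    # Stack the normalization condition under the null-space condition.
    A = np.vstack([Omega, np.ones(Z)])
    rhs = np.zeros(Z + 1)
    rhs[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return pi

# Hypothetical three-state example with unequal switching rates.
Omega = np.array([[-1.0, 2.0, 0.0],
                  [1.0, -3.0, 4.0],
                  [0.0, 1.0, -4.0]])
pi = state_occupancies(Omega)
```

For this example the occupancies are (8/13, 4/13, 1/13), which can be confirmed by substitution into Eq. (3).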

In the following sections, we introduce the spectral method and demonstrate how it can be used to solve for the full joint distribution $p_{n}^{z}$ .

A. Notation and definitions

We begin our solution of Eq. (1) by defining the generating function [22] $G_{z} (x) = Σ_{n} p_{n}^{z} x^{n}$ over complex variable x [35] (note that superscript z is an index, while superscript n on xⁿ is a power). It will prove more convenient to rewrite the generating function in a more abstract representation using states indexed by protein number |n〉,

∣ G_{z} 〉 = \sum_{n} p_{n}^{z} ∣ n 〉,

(5)

with inverse transform

p_{n}^{z} = 〈 n ∣ G_{z} 〉 .

(6)

The more familiar form can be recovered by projecting the position space 〈x| onto Eq. (5), with the provision that

〈 x ∣ n 〉 = x^{n} .

(7)

Concurrent choices of conjugate state

〈 n ∣ x 〉 = \frac{1}{x^{n + 1}}

(8)

and inner product

〈 f ∣ f^{'} 〉 = \oint \frac{d x}{2 π i} 〈 f ∣ x 〉 〈 x ∣ f^{'} 〉

(9)

ensure orthonormality of the states, 〈n|n’〉=δ_nn’, as can be verified using Cauchy’s theorem,

\oint \frac{d x}{2 π i} \frac{f (x)}{{(x - x_{0})}^{n + 1}} = \frac{1}{n!} \partial_{x}^{n} {[f (x)]}_{x = x_{0}} θ (n + 1),

(10)

where the convention θ(0)=0 is used for the Heaviside function.

With these definitions, summing Eq. (1) over n against |n〉 yields

∣ {\dot{G}}_{z} 〉 = - ({\hat{a}}^{+} - 1) ({\hat{a}}^{-} - g_{z}) ∣ G_{z} 〉 + \sum_{z^{'}} Ω_{z z^{'}} ∣ G_{z^{'}} 〉,

(11)

where the operators â⁺ and â⁻ raise and lower protein number, respectively, i.e.,

{\hat{a}}^{+} ∣ n 〉 = ∣ n + 1 〉,

(12)

{\hat{a}}^{-} ∣ n 〉 = n ∣ n - 1 〉,

(13)

with adjoint operations

〈 n ∣ {\hat{a}}^{+} = 〈 n - 1 ∣,

(14)

〈 n ∣ {\hat{a}}^{-} = (n + 1) 〈 n + 1 ∣ .

(15)

As in the operator treatment of the simple harmonic oscillator in quantum mechanics, the raising and lowering operators satisfy the commutation relation [â⁻,â⁺]=1 and â⁺â⁻ is a number operator, i.e., â⁺â⁻|n〉=n|n〉 [36]. This operator formalism for the generating function was introduced in the context of diffusion independently by Doi [37] and Zel’dovich [38] and developed by Peliti [39]. We note that all results can equivalently be obtained by remaining in x space (for example, â⁺↔x and â⁻↔∂_x). However, the raising and lowering operators define an extremely simple algebra which allows us to calculate all projections without explicitly computing overlap integrals (as shown, for example, in Appendix A). A review by Mattis and Glasser [40] introduces and discusses the applications of the formalism for diffusion.
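The operator algebra can be made concrete by writing â⁺ and â⁻ as matrices acting on the vector of coefficients p_n. The sketch below assumes a finite cutoff N and an example rate g (both choices are ours); it checks the commutator, the number operator, and that the factorized operator of Eq. (11) reproduces the birth-death terms of Eq. (1) for a single state.

```python
import numpy as np

N = 30        # cutoff in protein number (an assumption of this sketch)
g = 4.0       # example production rate (an assumption of this sketch)
n = np.arange(N)

# Matrix elements from Eqs. (12) and (13), acting on coefficient vectors.
a_plus = np.diag(np.ones(N - 1), k=-1)         # |n> -> |n+1>
a_minus = np.diag(n[1:].astype(float), k=1)    # |n> -> n |n-1>

commutator = a_minus @ a_plus - a_plus @ a_minus   # [a-, a+]
number_op = a_plus @ a_minus                       # a+ a-

# Factorized birth-death operator of Eq. (11), without the switching term.
identity = np.eye(N)
L = -(a_plus - identity) @ (a_minus - g * identity)

def birth_death_rhs(p):
    """Birth-death part of the right-hand side of Eq. (1)."""
    p_up = np.append(p[1:], 0.0)        # p_{n+1}, zero beyond the cutoff
    p_down = np.insert(p[:-1], 0, 0.0)  # p_{n-1}, zero below n = 0
    return g * p_down + (n + 1) * p_up - (g + n) * p
```

With a finite cutoff the commutator equals the identity everywhere except in the last diagonal entry, which is an artifact of truncating â⁺ at n=N−1.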

The factorized form of the birth-death operator in Eq. (11) suggests the definition of shifted raising and lowering operators

{\hat{b}}^{+} = {\hat{a}}^{+} - 1,

(16)

{\hat{b}}_{z}^{-} = {\hat{a}}^{-} - g_{z},

(17)

making Eq. (11)

∣ {\dot{G}}_{z} 〉 = - {\hat{b}}^{+} {\hat{b}}_{z}^{-} ∣ G_{z} 〉 + \sum_{z^{'}} Ω_{z z^{'}} ∣ G_{z^{'}} 〉 .

(18)

Since b̂⁺ and ${\hat{b}}_{z}^{-}$ are simply shifted from â⁺ and â⁻ by constants, ${\hat{b}}^{+} {\hat{b}}_{z}^{-}$ is a new number operator, and its eigenvalues are nonnegative integers j, i.e.,

{\hat{b}}^{+} {\hat{b}}_{z}^{-} ∣ j_{z} 〉 = j ∣ j_{z} 〉,

(19)

where j indexes (z-dependent) eigenfunctions |j_z〉. In position space, b̂⁺ and ${\hat{b}}_{z}^{-}$ are x−1 and ∂_x−g_z, respectively. Projecting 〈x| onto Eq. (19) therefore yields a first-order ordinary differential equation whose solution (up to normalization) is

〈 x ∣ j_{z} 〉 = {(x - 1)}^{j} e^{g_{z} (x - 1)},

(20)

with conjugate

〈 j_{z} ∣ x 〉 = \frac{e^{- g_{z} (x - 1)}}{{(x - 1)}^{j + 1}},

(21)

such that orthonormality $〈 j_{z} ∣ j_{z}^{'} 〉 = δ_{j j^{'}}$ is satisfied under the inner product in Eq. (9). Note that b̂⁺ and ${\hat{b}}_{z}^{-}$ raise and lower eigenstates |j_z〉 as in Eqs. (12)–(15), i.e.,

{\hat{b}}^{+} ∣ j_{z} 〉 = ∣ {(j + 1)}_{z} 〉,

(22)

{\hat{b}}_{z}^{-} ∣ j_{z} 〉 = j ∣ {(j - 1)}_{z} 〉,

(23)

〈 j_{z} ∣ {\hat{b}}^{+} = 〈 {(j - 1)}_{z} ∣,

(24)

〈 j_{z} ∣ {\hat{b}}_{z}^{-} = (j + 1) 〈 {(j + 1)}_{z} ∣ .

(25)

As we will see in this and subsequent sections, going between the protein number basis |n〉 and the eigenbasis |j_z〉 requires the mixed product 〈n|j_z〉 or its conjugate 〈j_z|n〉. There are several ways of computing these objects, as described in Appendix A. Notable special cases are

〈 {(j = 0)}_{z} ∣ n 〉 = 1,

(26)

〈 n ∣ {(j = 0)}_{z} 〉 = e^{- g_{z}} \frac{{(g_{z})}^{n}}{n!},

(27)

where the latter is the Poisson distribution.

B. Spectral method

We now demonstrate how the spectral method exploits the eigenfunctions |j_z〉 to decompose and simplify the equation of motion. Expanding the generating function in the eigenbasis,

∣ G_{z} 〉 = \sum_{j} G_{j}^{z} ∣ j_{z} 〉,

(28)

and projecting the conjugate state 〈j_z| onto Eq. (18) yields the equation of motion

{\dot{G}}_{j}^{z} = - j G_{j}^{z} + \sum_{z^{'}} Ω_{z z^{'}} \sum_{j^{'}} G_{j^{'}}^{z^{'}} 〈 j_{z} ∣ j_{z^{'}}^{'} 〉

(29)

for the expansion coefficients $G_{j}^{z}$ [where the dummy index j in Eq. (28) has been changed to j’ in Eq. (29)]. Using Eqs. (9), (10), (20), and (21), the product $〈 j_{z} ∣ j_{z^{'}}^{'} 〉$ evaluates to

〈 j_{z} ∣ j_{z^{'}}^{'} 〉 = \frac{{(- Δ_{z z^{'}})}^{j - j^{'}}}{(j - j^{'})!} θ (j - j^{'} + 1),

(30)

where Δ_zz’=g_z−g_z’. Noting that 〈j_z|j_z’〉=1 and that $〈 j_{z} ∣ j_{z}^{'} 〉 = 0$ for j’<j, Eq. (29) becomes

{\dot{G}}_{j}^{z} = - j G_{j}^{z} + \sum_{z^{'}} Ω_{z z^{'}} G_{j}^{z^{'}} + \sum_{z^{'} \neq z} Ω_{z z^{'}} \sum_{j^{'} < j} G_{j^{'}}^{z^{'}} \frac{{(- Δ_{z z^{'}})}^{j - j^{'}}}{(j - j^{'})!} .

(31)

The last term in Eq. (31) makes clear that each jth term is slaved to terms with j’<j, allowing the $G_{j}^{z}$ to be computed iteratively in j. The lower-triangular structure of the equation is a consequence of rotating to the eigenspace of the birth-death operator; this structure was not present in the original master equation.
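The overlap in Eq. (30) can be checked numerically by evaluating the contour integral of Eq. (9) with the states of Eqs. (20) and (21) on a circle around x=1. The sketch below is only a verification aid; the function names, the contour radius, and the number of quadrature points are our assumptions.

```python
import numpy as np
from math import factorial

def overlap_contour(j, jp, g_z, g_zp, radius=1.0, M=512):
    """<j_z|j'_z'> from Eq. (9), integrated on a circle around x = 1."""
    theta = 2.0 * np.pi * np.arange(M) / M
    u = radius * np.exp(1j * theta)           # u = x - 1
    bra = np.exp(-g_z * u) / u ** (j + 1)     # <j_z|x>, Eq. (21)
    ket = u ** jp * np.exp(g_zp * u)          # <x|j'_z'>, Eq. (20)
    # dx / (2 pi i) = u dtheta / (2 pi); for a periodic integrand the
    # trapezoid rule reduces to a plain average over the sample points.
    return np.mean(bra * ket * u).real

def overlap_closed_form(j, jp, g_z, g_zp):
    """Right-hand side of Eq. (30), using the convention theta(0) = 0."""
    if jp > j:
        return 0.0
    delta = g_z - g_zp
    return (-delta) ** (j - jp) / factorial(j - jp)
```

For example, with j=3, j’=1, g_z=5, and g_z’=3, both functions give 2, while exchanging j and j’ gives 0, which is the lower-triangular structure exploited in Eq. (31).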

At steady state, $G_{j}^{z}$ obeys

j G_{j}^{z} - \sum_{z^{'}} Ω_{z z^{'}} G_{j}^{z^{'}} = \sum_{z^{'} \neq z} Ω_{z z^{'}} \sum_{j^{'} < j} G_{j^{'}}^{z^{'}} \frac{{(- Δ_{z z^{'}})}^{j - j^{'}}}{(j - j^{'})!},

(32)

from which it is clear that $G_{j}^{z}$ can be computed successively in j. Since

G_{0}^{z} = 〈 {(j = 0)}_{z} ∣ G_{z} 〉 = \sum_{n} p_{n}^{z} 〈 {(j = 0)}_{z} ∣ n 〉 = π_{z}

(33)

[cf. Eq. (26)], the computation is initialized using Eqs. (3) and (4), i.e.,

\sum_{z^{'}} Ω_{z z^{'}} G_{0}^{z^{'}} = 0,

(34)

\sum_{z} G_{0}^{z} = 1 .

(35)

Recalling Eqs. (6) and (28), the probability distribution is retrieved via

p_{n}^{z} = \sum_{j} G_{j}^{z} 〈 n ∣ j_{z} 〉,

(36)

where the mixed product 〈n|j_z〉 can be computed as described in Appendix A.

There is an alternative way to decompose the master equation spectrally. Instead of expanding the generating function in eigenfunctions |j_z〉, which depend on the production rates g_z in each state, we may expand in eigenfunctions parametrized by a single rate g‾, i.e.,

∣ G_{z} 〉 = \sum_{j} G_{j}^{z} ∣ j 〉,

(37)

where

〈 x ∣ j 〉 = {(x - 1)}^{j} e^{\overset{‒}{g} (x - 1)},

(38)

〈 j ∣ x 〉 = \frac{e^{- \overset{‒}{g} (x - 1)}}{{(x - 1)}^{j + 1}} .

(39)

The parameter g‾ is arbitrary and thus acts as a “gauge” freedom.

We may now partition the birth-death operator as

- {\hat{b}}^{+} {\hat{b}}_{z}^{-} = - {\hat{b}}^{+} {\overset{‒}{b}}^{-} - {\hat{b}}^{+} Γ_{z}

(40)

where b‾⁻=â⁻−g‾ such that the |j〉 are the eigenstates of b̂⁺b‾⁻, i.e.,

{\hat{b}}^{+} {\overset{‒}{b}}^{-} ∣ j 〉 = j ∣ j 〉,

(41)

and Γ_z=g‾−g_z describes the deviation of each state’s production rate from the constant g‾.

Projecting the conjugate state 〈j| onto Eq. (18) and using Eq. (37) (with dummy index j changed to j’) gives

{\dot{G}}_{j}^{z} = - j G_{j}^{z} - \sum_{j^{'}} 〈 j ∣ {\hat{b}}^{+} Γ_{z} ∣ j^{'} 〉 G_{j^{'}}^{z} + \sum_{z^{'}} Ω_{z z^{'}} G_{j}^{z^{'}} .

(42)

Recalling Eq. (24), Eq. (42) at steady state becomes

\sum_{z^{'}} (Ω_{z z^{'}} - j δ_{z z^{'}}) G_{j}^{z^{'}} = Γ_{z} G_{j - 1}^{z} .

(43)

Equation (43) is subdiagonal in j, meaning computation of the jth term requires only the previous (j−1)th term and the inversion of the Z-by-Z matrix (Ω−jI) (where I is the identity matrix). It is initialized with Eqs. (34) and (35) and solved successively in j. The probability distribution is retrieved via

p_{n}^{z} = \sum_{j} G_{j}^{z} 〈 n ∣ j 〉,

(44)

where 〈n|j〉 is computed as described in Appendix A.
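The recursion of Eq. (43), its initialization by Eqs. (34) and (35), and the inverse transform of Eq. (44) can be sketched in a few lines. The methods of Appendix A are not reproduced here; instead, the mixed product 〈n|j〉 is built from the Poisson column at j=0 [cf. Eq. (27)] by repeatedly applying b̂⁺=â⁺−1, which with Eq. (14) gives 〈n|j+1〉=〈n−1|j〉−〈n|j〉. The function names, the cutoffs N and J, and the example rates are our assumptions.

```python
import numpy as np

def mixed_product(gbar, N, J):
    """M[n, j] = <n|j> for <x|j> = (x - 1)^j exp(gbar (x - 1)), Eq. (38)."""
    M = np.zeros((N, J))
    M[0, 0] = np.exp(-gbar)
    for n in range(1, N):                  # j = 0 column: Poisson(gbar)
        M[n, 0] = M[n - 1, 0] * gbar / n
    for j in range(J - 1):                 # |j+1> = (a+ - 1)|j>
        M[0, j + 1] = -M[0, j]
        M[1:, j + 1] = M[:-1, j] - M[1:, j]
    return M

def solve_multistate(Omega, g, gbar, N=60, J=50):
    """Steady-state p[n, z] via Eqs. (34), (35), (43), and (44)."""
    Omega = np.asarray(Omega, dtype=float)
    g = np.asarray(g, dtype=float)
    Z = len(g)
    Gamma = gbar - g                       # deviations from the gauge rate
    G = np.zeros((J, Z))
    # j = 0: normalized null vector of Omega, Eqs. (34) and (35).
    A = np.vstack([Omega, np.ones(Z)])
    rhs = np.zeros(Z + 1)
    rhs[-1] = 1.0
    G[0] = np.linalg.lstsq(A, rhs, rcond=None)[0]
    # j >= 1: one Z-by-Z linear solve per mode, Eq. (43).
    for j in range(1, J):
        G[j] = np.linalg.solve(Omega - j * np.eye(Z), Gamma * G[j - 1])
    return mixed_product(gbar, N, J) @ G   # Eq. (44)

omega = 0.1                                            # slow switching
Omega = omega * (np.ones((3, 3)) - 3.0 * np.eye(3))    # Eq. (45)
g = [0.0, 5.0, 10.0]      # hypothetical off / low / high production rates
p = solve_multistate(Omega, g, gbar=5.0)
```

Here the gauge g‾ is set to the mean of the three rates, which keeps the deviations Γ_z moderate; other choices are possible and may change the number of modes J needed for a given accuracy.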

As an example of a simple computation employing the spectral method, Fig. 1 shows probability distributions for the case of Z=3 states, corresponding to a gene that is either off, producing proteins at a low rate, or producing proteins at a high rate. For simplicity we set the rates of switching among all states equal to a constant ω, making the stochastic matrix

Ω = (\begin{matrix} - 2 ω & ω & ω \\ ω & - 2 ω & ω \\ ω & ω & - 2 ω \end{matrix}) .

(45)

As seen in Fig. 1, when ω≪1 (corresponding to a switching rate much slower than the degradation rate) the dwell time in each expression state lengthens. The slow switching gives the protein copy number time to equilibrate in any of the three expression states, resulting in a trimodal marginal distribution p_n. When ω≫1 (corresponding to a switching rate much faster than the degradation rate), the system switches frequently among the three expression states, resulting in an average production rate. In this limit, the expression state equilibrates on a faster time scale than the protein number state.

C. On/off gene

For the special case of Z=2 states, as when a gene is either “on” (z=+) or “off” (z=−), it is useful to demonstrate how the spectral method reproduces known analytic results [17,34]. The probability distribution can be written in vector form as

{\vec{p}}_{n} = (\begin{matrix} p_{n}^{-} \\ p_{n}^{+} \end{matrix}),

(46)

and defining ω₊ and ω₋ as the transition rates to and from the on state, respectively, the stochastic matrix takes the form

Ω = (\begin{matrix} - ω_{+} & ω_{-} \\ ω_{+} & - ω_{-} \end{matrix}) .

(47)

Note that Eq. (3) implies

\frac{π_{-}}{π_{+}} = \frac{ω_{-}}{ω_{+}},

(48)

which makes clear that increasing the rate of transition to either state increases the probability of being in that state.

From Eq. (32) the spectral expansion coefficients obey

j G_{j}^{\pm} + ω_{\mp} G_{j}^{\pm} - ω_{\pm} G_{j}^{\mp} = ω_{\pm} \sum_{j^{'} < j} G_{j^{'}}^{\mp} \frac{{(\mp Δ)}^{j - j^{'}}}{(j - j^{'})!},

(49)

where Δ=Δ₊₋=−Δ₋₊. Initializing with $G_{0}^{\pm} = ω_{\pm} ∕ (ω_{+} + ω_{-})$ and computing the first few terms reveals the pattern

G_{j}^{\pm} = \frac{ω_{\pm}}{ω_{+} + ω_{-}} \frac{{(\mp Δ)}^{j}}{j!} \frac{\prod_{j^{'} = 0}^{j - 1} (j^{'} + ω_{\mp})}{\prod_{j^{″} = 0}^{j - 1} (j^{″} + ω_{+} + ω_{-} + 1)}

(50)

= \frac{ω_{\pm}}{ω_{+} + ω_{-}} \frac{{(\mp Δ)}^{j}}{j!} \frac{Γ (j + ω_{\mp})}{Γ (ω_{\mp})} \frac{Γ (ω_{+} + ω_{-} + 1)}{Γ (j + ω_{+} + ω_{-} + 1)},

(51)

where in the second line the products are written in terms of the Gamma function.
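As a numerical check of the pattern, one can iterate Eq. (49) directly and compare with the closed form of Eq. (50). The sketch below does this for example rates; the function names and parameter values are our assumptions, and the comparison is a consistency check rather than a proof.

```python
import numpy as np
from math import factorial

def onoff_recursion(w_plus, w_minus, delta, J):
    """Iterate Eq. (49); row j holds (G_j^-, G_j^+)."""
    s = w_plus + w_minus
    G = np.zeros((J, 2))
    G[0] = [w_minus / s, w_plus / s]
    for j in range(1, J):
        # Left-hand side of Eq. (49) as a 2-by-2 system in (G_j^-, G_j^+).
        A = np.array([[j + w_plus, -w_minus],
                      [-w_plus, j + w_minus]])
        src_minus = sum(G[jp, 1] * delta ** (j - jp) / factorial(j - jp)
                        for jp in range(j))
        src_plus = sum(G[jp, 0] * (-delta) ** (j - jp) / factorial(j - jp)
                       for jp in range(j))
        G[j] = np.linalg.solve(A, [w_minus * src_minus, w_plus * src_plus])
    return G

def onoff_closed_form(w_plus, w_minus, delta, J):
    """Closed form of Eq. (50); row j holds (G_j^-, G_j^+)."""
    s = w_plus + w_minus
    G = np.zeros((J, 2))
    for j in range(J):
        denom = np.prod([jp + s + 1.0 for jp in range(j)])
        num_minus = np.prod([jp + w_plus for jp in range(j)])
        num_plus = np.prod([jp + w_minus for jp in range(j)])
        G[j, 0] = w_minus / s * delta ** j / factorial(j) * num_minus / denom
        G[j, 1] = w_plus / s * (-delta) ** j / factorial(j) * num_plus / denom
    return G
```

The alternating sums on the right-hand side of Eq. (49) lose precision as j grows, so the comparison is best restricted to a moderate number of modes.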

For comparison with known results [17,34] we write the total generating function |G〉=Σ_±|G_±〉 in position space,

G (x) = \sum_{\pm} 〈 x ∣ G_{\pm} 〉 = \sum_{\pm} \sum_{j} 〈 x ∣ j_{\pm} 〉 〈 j_{\pm} ∣ G_{\pm} 〉

(52)

= \sum_{\pm} \sum_{j} {(x - 1)}^{j} e^{g_{\pm} (x - 1)} G_{j}^{\pm}

(53)

= \sum_{\pm} \frac{ω_{\pm}}{ω_{+} + ω_{-}} e^{g_{\pm} (x - 1)} \times Φ [ω_{\mp}, ω_{+} + ω_{-} + 1; \mp Δ (x - 1)],

(54)

where

Φ [α, β; y] = \sum_{j} \frac{Γ (j + α)}{Γ (α)} \frac{Γ (β)}{Γ (j + β)} \frac{y^{j}}{j!}

(55)

is the confluent hypergeometric function. As shown in Appendix B, in the limit g₋=0, Eq. (54) reduces to

G (x) = Φ [ω_{+}, ω_{+} + ω_{-}; g_{+} (x - 1)],

(56)

and the marginal p_n is given by

p_{n} = \frac{g_{+}^{n}}{n!} \frac{Γ (n + ω_{+})}{Γ (ω_{+})} \frac{Γ (ω_{+} + ω_{-})}{Γ (n + ω_{+} + ω_{-})} \times Φ [ω_{+} + n, ω_{+} + ω_{-} + n; - g_{+}],

(57)

in agreement with the results of Iyer-Biswas et al. [34] and Raj et al. [17]. We remind the reader that in addition to reducing to known results in the special case of Z=2 states with a vanishing production rate in the off state (g₋=0), the spectral method is valid for any number of states with arbitrary production rates.
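Equation (57) can be evaluated with the series of Eq. (55). In the sketch below, the alternating series at argument −g₊ is avoided by Kummer’s transformation, Φ[α,β;−y]=e^{−y}Φ[β−α,β;y], which is a standard identity rather than a step taken in the text; the function names, tolerances, and example parameters are likewise our assumptions.

```python
import numpy as np
from math import lgamma, exp, log

def kummer(a, b, y, tol=1e-16, max_terms=2000):
    """Series of Eq. (55) for Phi[a, b; y]; used here only for y >= 0."""
    term, total = 1.0, 1.0
    for j in range(max_terms):
        term *= (a + j) / (b + j) * y / (j + 1)
        total += term
        if abs(term) < tol * abs(total):
            break
    return total

def onoff_marginal(g_plus, w_plus, w_minus, N):
    """p_n of Eq. (57) for an on/off gene with g_- = 0, n = 0..N-1."""
    s = w_plus + w_minus
    p = np.zeros(N)
    for n in range(N):
        # Prefactor of Eq. (57), assembled in logarithms to avoid overflow.
        log_pref = (n * log(g_plus) - lgamma(n + 1)
                    + lgamma(n + w_plus) - lgamma(w_plus)
                    + lgamma(s) - lgamma(n + s))
        # Phi[w+ + n, s + n; -g+] = exp(-g+) Phi[w-, s + n; g+]
        phi = exp(-g_plus) * kummer(w_minus, s + n, g_plus)
        p[n] = exp(log_pref) * phi
    return p

p_onoff = onoff_marginal(g_plus=10.0, w_plus=0.5, w_minus=1.0, N=80)
```

Two simple checks are available: the distribution sums to one, and its mean equals g₊π₊=g₊ω₊/(ω₊+ω₋), as follows from differentiating Eq. (56) at x=1.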

III. GENE REGULATION

Next we consider a two-gene regulatory cascade, in which the production rate of the second gene is a function of the number of proteins of the first gene. As shown in previous work [31], a cascade of any length can be reduced to such a generalized two-dimensional system using the Markov approximation, which asserts that the probability distribution for a given node of the cascade should depend only weakly on the probability distributions of the distant nodes given the proximal nodes.

In the present section, we consider only the generalized two-dimensional equation and explore different approaches to solving it. The equation describes two genes, each with one expression state, with regulation encoded by a functional dependence of the downstream protein production rate on the upstream protein copy number. In Sec. III D we make an explicit connection between the on/off gene discussed in Sec. II C and the case when the functional dependence is a threshold. Finally, in Sec. IV we combine the two types of models and consider a system with regulation and bursts.

A. Representations of the master equation

1. |n,m〉 basis

Defining n and m as the numbers of proteins produced by the first and second gene, respectively, the master equation describing the time evolution of the joint probability distribution p_nm is [31]

{\dot{p}}_{n m} = g_{n - 1} p_{n - 1, m} + (n + 1) p_{n + 1, m} - (g_{n} + n) p_{n m} + ρ [q_{n} p_{n, m - 1} + (m + 1) p_{n, m + 1} - (q_{n} + m) p_{n m}] .

(58)

The function q_n describes the regulation of the second species by the first, and the function g_n describes the effective autoregulation of the first species due either to a non-Poissonian input distribution or to effects further upstream in the case of a longer cascade [31]. Time is rescaled by the first gene’s degradation rate so that each gene’s production rate (g_n or q_n) is normalized by its respective degradation rate, and ρ is the ratio of the second gene’s degradation rate to that of the first. We impose no constraints on the form of g_n or q_n; they can be arbitrary nonlinear functions.

Summing Eq. (58) over m gives a simple recursion relation between g_n and p_n at steady state, from which explicit relations are easily identified. If p_n is known, g_n is found as

g_{n} = (n + 1) \frac{p_{n + 1}}{p_{n}} .

(59)

If on the other hand g_n is known, p_n is found as

p_{n} = \frac{p_{0}}{n!} \prod_{n^{'} = 0}^{n - 1} g_{n^{'}},

(60)

with p₀ set by normalization. Note that if the first species obeys a simple birth-death process, g_n=g=constant, and Eq. (60) reduces to the Poisson distribution.
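Equations (59) and (60) translate directly into code. The following sketch assumes a finite cutoff N and represents g_n as a callable; the function names are ours.

```python
import numpy as np

def marginal_from_rates(g, N):
    """p_n from Eq. (60); g is a callable n -> g_n and N is the cutoff."""
    p = np.ones(N)
    for n in range(1, N):
        # p_n = p_{n-1} g_{n-1} / n is Eq. (60) written as a recursion.
        p[n] = p[n - 1] * g(n - 1) / n
    return p / p.sum()          # p_0 is fixed by normalization

def rates_from_marginal(p):
    """g_n from Eq. (59), for n = 0, ..., len(p) - 2."""
    n = np.arange(len(p) - 1)
    return (n + 1) * p[1:] / p[:-1]
```

For a constant rate the first function returns the Poisson distribution (up to the negligible effect of the cutoff), and applying the second function to the output of the first recovers g_n.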

In the current representation [Eq. (58)], which we denote the |n,m〉 basis, finding the steady-state solution for the joint distribution p_nm means finding the null space of an infinite (or, effectively for numerical purposes, very large) locally banded tridiagonal matrix. More precisely, defining N as the numerical cutoff in protein number n or m, the problem amounts to inverting an N²-by-N² matrix, which is computationally taxing even for moderate cutoffs N.
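For reference, the direct approach in the |n,m〉 basis can be sketched as follows. The sketch builds the N²-by-N² rate matrix of Eq. (58) densely, switches off births at the cutoff so that probability is conserved, and replaces one equation by the normalization condition; these implementation choices, the function name, and the example regulation function are our assumptions.

```python
import numpy as np

def direct_steady_state(g, q, rho, N):
    """Steady state of Eq. (58) on an N-by-N grid; returns p[n, m]."""
    def idx(n, m):
        return n * N + m

    A = np.zeros((N * N, N * N))
    for n in range(N):
        for m in range(N):
            i = idx(n, m)
            if n + 1 < N:                      # birth of first species
                A[idx(n + 1, m), i] += g(n)
                A[i, i] -= g(n)
            if m + 1 < N:                      # birth of second species
                A[idx(n, m + 1), i] += rho * q(n)
                A[i, i] -= rho * q(n)
            if n > 0:                          # death of first species
                A[idx(n - 1, m), i] += n
                A[i, i] -= n
            if m > 0:                          # death of second species
                A[idx(n, m - 1), i] += rho * m
                A[i, i] -= rho * m
    A[-1, :] = 1.0          # replace one equation by sum(p) = 1
    rhs = np.zeros(N * N)
    rhs[-1] = 1.0
    return np.linalg.solve(A, rhs).reshape(N, N)

# Hypothetical example: constant g_n and an activating regulation q_n.
p_direct = direct_steady_state(lambda n: 3.0,
                               lambda n: 1.0 + 4.0 * n ** 2 / (n ** 2 + 9.0),
                               rho=2.0, N=25)
```

Even at N=25 the matrix has 625² entries, and the cost of the dense solve grows as N⁶, which illustrates why this representation becomes taxing at moderate cutoffs.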

In order to solve Eq. (58) more efficiently we will employ the spectral method. We begin as before by defining the generating function [22] G(x, y)=Σ_nm p_nm xⁿ y^m over complex variables x and y or, in state notation,

∣ G 〉 = \sum_{n m} p_{n m} ∣ n, m 〉,

(61)

with inverse transform

p_{n m} = 〈 n, m ∣ G 〉 .

(62)

Summing Eq. (58) over n and m against |n,m〉 and employing the same operator notation as in Eqs. (12)–(15) yields

∣ \dot{G} 〉 = - \hat{H} ∣ G 〉,

(63)

where

\hat{H} = {\hat{b}}_{n}^{+} {\hat{b}}_{n}^{-} (n) + ρ {\hat{b}}_{m}^{+} {\hat{b}}_{m}^{-} (n)

(64)

and

{\hat{b}}_{n}^{+} = {\hat{a}}_{n}^{+} - 1,

(65)

{\hat{b}}_{m}^{+} = {\hat{a}}_{m}^{+} - 1,

(66)

{\hat{b}}_{n}^{-} (n) = {\hat{a}}_{n}^{-} - {\hat{g}}_{n},

(67)

{\hat{b}}_{m}^{-} (n) = {\hat{a}}_{m}^{-} - {\hat{q}}_{n} .

(68)

Here the regulation functions have been promoted to operators obeying ĝ_n|n〉=g_n|n〉 and q̂_n|n〉=q_n|n〉, subscripts on operators denote the sector (n or m) on which they operate, and the arguments of ${\hat{b}}_{n}^{-}$ and ${\hat{b}}_{m}^{-}$ remind us that both are n dependent.

Equation (64) makes clear that if not for the n dependence of the operators the Hamiltonian Ĥ would be diagonalizable, or, equivalently, if g_n and q_n were constants the master equation would factorize into two individual birth-death processes. We may still, however, partition the full Hamiltonian as

\hat{H} = {\hat{H}}_{0} + {\hat{H}}_{1},

(69)

where Ĥ₀ is a diagonalizable part (and Ĥ₁ is the corresponding deviation from the diagonal form), and expand |G〉 in the eigenbasis of Ĥ₀ to exploit the diagonality.

As with the multistate system in Sec. II B, where we expand the solution in two different bases, there are several natural choices of eigenbasis of Ĥ₀. Figure 2 summarizes these choices diagrammatically: starting in the |n,m〉 basis (at the top of Fig. 2), one may expand in eigenfunctions either the first species, yielding the |j,m〉 basis (left), or the second species, yielding the |n,k_n〉 basis (right; in general we allow the parameter defining the second species’ eigenfunctions to depend on n to reflect the regulation of the second species by the first). From either the |j,m〉 or the |n,k_n〉 basis, one may expand in eigenfunctions the remaining species, yielding either the |j,k_n〉 basis (bottom left), in which the second species’ eigenfunctions depend on the first species’ copy number n, or the |j,k_j〉 basis (bottom right), in which the second species’ eigenfunctions depend on the first species’ eigenmode number j. Both bases reduce to the |j,k〉 basis (bottom center) when the parameter of the second species’ eigenfunctions is a constant.

The |n,m〉 and |j,m〉 bases are less numerically useful than the other bases: as discussed above and detailed in Fig. 3, the |n,m〉 basis is numerically inefficient; and the |j,m〉 basis does not exploit the natural structure of the problem, since, unlike the other bases, it neither retains the tridiagonal structure in n nor gains a lower triangular structure in k (see the sections below). Each of the remaining bases, however, has preferable properties in terms of numerical stability and ability to represent the function sparsely yet accurately (either using a few values of n in the |n,k_n〉 basis or a few values of j in the |j,k〉, |j,k_j〉, or |j,k_n〉 bases, for example). Moreover, the equation of motion simplifies differently in each of the different bases. We present the derivations of the equations of motion in the following sections, beginning with the |j,k〉 basis, generalizing to the |j,k_j〉 basis, moving to the |n,k_n〉 basis, and ending with the |j,k_n〉 basis.

2. |j,k〉 basis

For expository purposes we start by recalling the spectral representation used in previous work [31]. We choose the diagonal part of the Hamiltonian to correspond to two birth-death processes with constant production rates g‾ and q‾,

{\hat{H}}_{0} = {\hat{b}}_{n}^{+} {\overset{‒}{b}}_{n}^{-} + ρ {\hat{b}}_{m}^{+} {\overset{‒}{b}}_{m}^{-},

(70)

where ${\overset{‒}{b}}_{n}^{-} = {\hat{a}}_{n}^{-} - \overset{‒}{g}$ and ${\overset{‒}{b}}_{m}^{-} = {\hat{a}}_{m}^{-} - \overset{‒}{q}$ . As in Sec. II B, the gauge parameters g‾ and q‾ can be set arbitrarily without affecting the final solution; however, their values can affect the numerical stability of the method. For example, when the regulation q_n is a threshold function, a large discontinuity can narrow the range of q‾ for which the method is numerically stable. The nondiagonal part,

{\hat{H}}_{1} = {\hat{b}}_{n}^{+} {\hat{Γ}}_{n} + ρ {\hat{b}}_{m}^{+} {\hat{Δ}}_{n},

(71)

captures the deviations Γ̂_n=g‾−ĝ_n and Δ̂_n=q‾−q̂_n of the regulation functions from the constant rates. We expand the generating function as

∣ G 〉 = \sum_{j k} G_{j k} ∣ j, k 〉,

(72)

where |j,k〉 is the eigenbasis of Ĥ₀, i.e.,

{\hat{H}}_{0} ∣ j, k 〉 = (j + ρ k) ∣ j, k 〉 .

(73)

The eigenbasis is parametrized by the rates g‾ and q‾, meaning in position space 〈x|j〉 is as in Eq. (38) and similarly for 〈y|k〉 with x→y, j→k, and g‾→q‾.

With Eqs. (70)–(72), projecting the conjugate state 〈j,k| onto Eq. (63) yields

{\dot{G}}_{j k} = - (j + ρ k) G_{j k} - \sum_{j^{'} k^{'}} 〈 j ∣ {\hat{b}}_{n}^{+} {\hat{Γ}}_{n} ∣ j^{'} 〉 〈 k ∣ k^{'} 〉 G_{j^{'} k^{'}} - ρ \sum_{j^{'} k^{'}} 〈 k ∣ {\hat{b}}_{m}^{+} ∣ k^{'} 〉 〈 j ∣ {\hat{Δ}}_{n} ∣ j^{'} 〉 G_{j^{'} k^{'}}

(74)

[where the dummy indices j and k in Eq. (72) have been changed to j’ and k’, respectively, in Eq. (74)]. Recalling Eq. (24) and restricting attention to steady state, Eq. (74) becomes

0 = - (j + ρ k) G_{j k} - \sum_{j^{'}} Γ_{j - 1, j^{'}} G_{j^{'} k} - ρ \sum_{j^{'}} Δ_{j j^{'}} G_{j^{'}, k - 1},

(75)

where the deviations have been rotated into the eigenbasis as

Γ_{j j^{'}} = 〈 j ∣ {\hat{Γ}}_{n} ∣ j^{'} 〉 = \sum_{n} 〈 j ∣ n 〉 (\overset{‒}{g} - g_{n}) 〈 n ∣ j^{'} 〉,

(76)

Δ_{j j^{'}} = 〈 j ∣ {\hat{Δ}}_{n} ∣ j^{'} 〉 = \sum_{n} 〈 j ∣ n 〉 (\overset{‒}{q} - q_{n}) 〈 n ∣ j^{'} 〉 .

(77)

Equation (75) is subdiagonal in k and is therefore similar to Eq. (43) in that the last term acts as a source term. The subdiagonality is a consequence of the topology of the two-gene network: the first species regulates only itself (effectively) and the second species. Although the spectral method is fully applicable to systems with feedback, the subdiagonal structure is not preserved.

Equation (75) is initialized using

G_{j 0} = 〈 j, k = 0 ∣ G 〉 = \sum_{n m} p_{n m} 〈 j ∣ n 〉 〈 0 ∣ m 〉

(78)

= \sum_{n} p_{n} 〈 j ∣ n 〉,

(79)

[cf. Eq. (26)] with known p_n [cf. Eq. (60)], then solved at each subsequent k using the result for k−1. Equation (75) can be written in linear algebraic notation as

{\vec{G}}_{k} = - ρ {(D^{k} + S^{-} Γ)}^{- 1} Δ {\vec{G}}_{k - 1},

(80)

where ${\vec{G}}_{k}$ is a vector over j, bold denotes matrices, and $D_{j j^{'}}^{k} = (j + ρ k) δ_{j j^{'}}$ and $S_{j j^{'}}^{-} = δ_{j - 1, j^{'}}$ are diagonal and subdiagonal matrices, respectively. Equation (80) makes clear that the solution involves only matrix multiplication and the inversion of a J-by-J matrix K times, where J and K are cutoffs in the eigenmode numbers j and k, respectively. In fact, if the first species obeys a simple birth-death process, i.e., g_n=g=constant, setting g‾=g makes Γ_jj’=0, and since D^k is diagonal, the solution involves only matrix multiplication. The decomposition of the master equation into a linear algebraic equation results in huge gains in efficiency over direct solution in the |n,m〉 basis; the efficiency of all bases presented in this section is described in Sec. III B and illustrated in Fig. 3.

Recalling Eqs. (62) and (72), the joint distribution is retrieved from G_jk via the inverse transform

p_{n m} = \sum_{j k} 〈 n ∣ j 〉 G_{j k} 〈 m ∣ k 〉,

(81)

a computation again involving only matrix multiplication. The mixed product matrices 〈n|j〉 and 〈m|k〉 are computed as described in Appendix A.
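The inverse transform of Eq. (81) can be sketched as two matrix products. The sketch below assumes that the mixed product for a constant rate has the closed form quoted later as Eq. (126), with the n-dependent rate replaced by a constant; the function names `mixed_product` and `inverse_transform` are hypothetical.

```python
import math
import numpy as np

def mixed_product(N, J, g):
    """Matrix A[n, j] = <n|j> for a constant rate g.

    Uses the closed form of Eq. (126) with m -> n, k -> j, q_n -> g
    (an assumption of this sketch; see Appendix A of the paper).
    """
    A = np.zeros((N, J))
    for n in range(N):
        for j in range(J):
            s = sum(1.0 / (math.factorial(l) * math.factorial(n - l)
                           * math.factorial(j - l) * (-g) ** l)
                    for l in range(min(n, j) + 1))
            A[n, j] = (-1) ** j * math.exp(-g) * g ** n * math.factorial(j) * s
    return A

def inverse_transform(G, A_n, A_m):
    """Eq. (81): p_{nm} = sum_{jk} <n|j> G_{jk} <m|k>."""
    return A_n @ G @ A_m.T
```

With G_jk=δ_j0 δ_k0 this returns the product of two Poisson distributions with means g‾ and q‾, which is a convenient sanity check before applying the transform to a nontrivial G_jk.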

3. |j,k_j〉 basis

The |j,k〉 basis treats both genes similarly by expanding each around a constant production rate. We may instead imagine an eigenbasis that more closely reflects the underlying asymmetry imposed by the regulation and make the basis of the second gene a function of that of the first. That is, we expand the first gene in a basis |j〉 with gauge g‾ as before, but now we expand the second gene in a basis |k_j〉 with a j-dependent local gauge q‾_j. We write the generating function as

∣ G 〉 = \sum_{j k} G_{j k} ∣ j, k_{j} 〉,

(82)

and Eq. (63) at steady state becomes

0 = - \sum_{j k} G_{j k} [{\hat{H}}_{0} (j) + {\hat{H}}_{1} (j)] ∣ j, k_{j} 〉,

(83)

where we have partitioned the Hamiltonian for each j as

{\hat{H}}_{0} (j) = {\hat{b}}_{n}^{+} {\overset{‒}{b}}_{n}^{-} + ρ {\hat{b}}_{m}^{+} {\overset{‒}{b}}_{m}^{-} (j),

(84)

{\hat{H}}_{1} (j) = {\hat{b}}_{n}^{+} {\hat{Γ}}_{n} + ρ {\hat{b}}_{m}^{+} {\hat{Δ}}_{n} (j),

(85)

with ${\overset{‒}{b}}_{n}^{-} = {\hat{a}}_{n}^{-} - \overset{‒}{g}$ and Γ̂_n=g‾−ĝ_n as before and now ${\overset{‒}{b}}_{m}^{-} (j) = {\hat{a}}_{m}^{-} - {\overset{‒}{q}}_{j}$ and Δ̂_n(j)=q‾_j−q̂_n. Note that this basis enjoys the eigenvalue equation

{\hat{H}}_{0} (j) ∣ j, k_{j} 〉 = (j + ρ k) ∣ j, k_{j} 〉 .

(86)

Projecting the conjugate state 〈j,k_j| onto Eq. (83) yields, after some simplification (cf. Appendix C), the equation of motion

0 = (j + ρ k) G_{j k} + \sum_{j^{'}} Γ_{j - 1, j^{'}} G_{j^{'} k} + \sum_{ℓ = 1}^{k} \sum_{j^{'}} (Γ_{j - 1, j^{'}} V_{j j^{'}}^{ℓ} + ρ Δ_{j j^{'}} V_{j j^{'}}^{ℓ - 1}) G_{j^{'}, k - ℓ},

(87)

where Γ_jj’ is as in Eq. (76) and

Δ_{j j^{'}} = 〈 j ∣ {\hat{Δ}}_{n} (j) ∣ j^{'} 〉 = \sum_{n} 〈 j ∣ n 〉 ({\overset{‒}{q}}_{j} - q_{n}) 〈 n ∣ j^{'} 〉,

(88)

V_{j j^{'}}^{ℓ} = \frac{{(- Q_{j j^{'}})}^{ℓ}}{ℓ!},

(89)

with Q_jj’=q‾_j−q‾_j’. Equation (87) can be written linear algebraically as

{\vec{G}}_{k} = - {(D^{k} + S^{-} Γ)}^{- 1} \sum_{ℓ = 1}^{k} [(S^{-} Γ) * V^{ℓ} + ρ Δ * V^{ℓ - 1}] {\vec{G}}_{k - ℓ},

(90)

where $D_{j j^{'}}^{k}$ and $S_{j j^{'}}^{-}$ are defined as before [cf. Eq. (80)], and * denotes an element-by-element matrix product. Once again, Eq. (90) is lower-triangular in k and requires only matrix multiplication and the inversion of a J-by-J matrix K times. ${\vec{G}}_{k}$ is initialized as in Eq. (79), and the kth term is computed from the previous k’<k terms. The joint distribution is retrieved via the inverse transform

p_{n m} = \sum_{j k} 〈 n ∣ j 〉 G_{j k} 〈 m ∣ k_{j} 〉 .

(91)
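The recursion of Eq. (90) differs from Eq. (80) only by the inner sum over ℓ and the element-by-element products with V^ℓ. A minimal sketch, assuming Γ, Δ, the local gauges q‾_j, and the initial column are supplied by the caller (the name `solve_jkj` and its signature are illustrative), could read:

```python
import math
import numpy as np

def solve_jkj(Gamma, Delta, q_bar, G0, rho, K):
    """Iterate Eq. (90) in the |j, k_j> basis.

    q_bar : (J,) array of local gauges, so Q_{jj'} = q_bar[j] - q_bar[j'].
    Returns a (J, K) array whose column k holds G_{jk}.
    """
    J = len(G0)
    j = np.arange(J)
    SG = np.eye(J, k=-1) @ Gamma           # (S^- Gamma)_{jj'} = Gamma_{j-1, j'}
    Q = q_bar[:, None] - q_bar[None, :]
    # V[l] is the matrix V^l of Eq. (89); V^0 is all ones because 0**0 = 1.
    V = [(-Q) ** l / math.factorial(l) for l in range(K)]
    G = np.zeros((J, K))
    G[:, 0] = G0
    for k in range(1, K):
        rhs = np.zeros(J)
        for l in range(1, k + 1):          # inner loop over l
            rhs += (SG * V[l] + rho * Delta * V[l - 1]) @ G[:, k - l]
        G[:, k] = -np.linalg.solve(np.diag(j + rho * k) + SG, rhs)
    return G
```

When all entries of `q_bar` are equal, Q vanishes, only the ℓ=1 term survives, and the routine reproduces the |j,k〉 recursion of Eq. (80).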

4. |n,k_n〉 basis

Expansion in either the |j,k〉 or the |j,k_j〉 basis conveniently turns the master equation into a lower-triangular linear algebraic equation and replaces cutoffs in protein number with cutoffs in eigenmode number (which can be smaller with appropriate choices of gauge). However, these bases sacrifice the original tridiagonal structure of the master equation in the copy number of the first gene, n. Therefore we now consider a mixed representation, in which the first gene remains in protein number space |n〉, and we expand the second gene in an n-dependent eigenbasis |k_n〉,

∣ G 〉 = \sum_{n k} G_{n k} ∣ n, k_{n} 〉 .

(92)

If the rate parameter of the |k_n〉 basis were the regulation function q_n, it would be natural to make Ĥ₀ the entire Hamiltonian [Eq. (64)]. For generality we will instead allow the rate parameter of the |k_n〉 basis to be an arbitrary n-dependent local gauge q‾_n, such that Eq. (63) at steady state naturally partitions as

0 = - \sum_{n k} G_{n k} [{\hat{H}}_{0} (n) + {\hat{H}}_{1} (n)] ∣ n, k_{n} 〉,

(93)

where

{\hat{H}}_{0} (n) = {\hat{b}}_{n}^{+} {\hat{b}}_{n}^{-} + ρ {\hat{b}}_{m}^{+} {\overset{‒}{b}}_{m}^{-} (n),

(94)

{\hat{H}}_{1} (n) = ρ {\hat{b}}_{m}^{+} {\hat{Δ}}_{n} (n),

(95)

with ${\overset{‒}{b}}_{m}^{-} (n) = {\hat{a}}_{m}^{-} - {\overset{‒}{q}}_{n}$ and Δ̂_n(n)=q‾_n−q̂_n. Note that |n,k_n〉 is not the eigenbasis of Ĥ₀(n) but rather Ĥ₀(n)|n,k_n〉 retains the original tridiagonal structure in n, i.e.,

{\hat{H}}_{0} (n) ∣ n, k_{n} 〉 = (g_{n} + n) ∣ n, k_{n} 〉 - g_{n} ∣ n + 1, k_{n} 〉 - n ∣ n - 1, k_{n} 〉 + ρ k ∣ n, k_{n} 〉,

(96)

where Eqs. (12), (13), (65), and (67) are recalled in applying the first term of Ĥ₀(n).

Projecting the conjugate state 〈n,k_n| onto Eq. (93) yields, after some simplification (cf. Appendix C), the equation of motion

g_{n - 1} G_{n - 1, k} + (n + 1) G_{n + 1, k} - (g_{n} + n + ρ k) G_{n k} = - g_{n - 1} \sum_{ℓ = 1}^{k} V_{n ℓ}^{-} G_{n - 1, k - ℓ} - (n + 1) \times \sum_{ℓ = 1}^{k} V_{n ℓ}^{+} G_{n + 1, k - ℓ} + ρ Δ_{n} G_{n, k - 1},

(97)

where

Δ_{n} = {\overset{‒}{q}}_{n} - q_{n},

(98)

V_{n ℓ}^{\pm} = \frac{{(- Q_{n}^{\pm})}^{ℓ}}{ℓ!},

(99)

and $Q_{n}^{\pm} = {\overset{‒}{q}}_{n} - {\overset{‒}{q}}_{n \pm 1}$ . Equation (97) can be written linear algebraically as

{\vec{G}}_{k} = {(T^{k})}^{- 1} {- (S^{-} \vec{g}) * diag [V^{-} {(S^{-} \tilde{G})}^{T}] - (S^{+} \vec{n}) * diag [V^{+} {(S^{+} \tilde{G})}^{T}] + ρ \vec{Δ} * {\vec{G}}_{k - 1}},

(100)

where X^T indicates the transpose of X, * denotes an element-by-element product, $S_{n n^{'}}^{\pm} = δ_{n \pm 1, n^{'}}$ are super-(+) and subdiagonal (−) matrices, $T_{n n^{'}}^{k} = g_{n - 1} δ_{n - 1, n^{'}} + (n + 1) δ_{n + 1, n^{'}} - (g_{n} + n + ρ k) δ_{n n^{'}}$ is a tridiagonal matrix, and $\tilde{G}$ , which is G with the columns reversed (i.e., ${\tilde{G}}_{n ℓ} = G_{n, k - ℓ}$ ), is built incrementally in k. As with the |j,k〉 and |j,k_j〉 bases, the kth term is slaved to the previous k’<k terms; in total the solution requires K inversions of an N-by-N matrix. However, here the task of inversion is simplified because the matrix to be inverted is tridiagonal. In fact, using the Thomas algorithm [42], we obtain an analytic solution for the case of constant production of the first gene and threshold regulation of the second gene, as described in Sec. III C.
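To illustrate why the tridiagonal structure simplifies the inversion, the following is a generic sketch of the Thomas algorithm applied to the bands of T^k. It is a textbook forward-elimination and back-substitution routine without pivoting, not the authors' implementation; the helper names `thomas` and `bands_of_T` are hypothetical.

```python
def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system in O(N) operations.

    lower[i] multiplies x[i-1] in row i (lower[0] is unused);
    upper[i] multiplies x[i+1] in row i (upper[-1] is unused).
    """
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):                  # forward elimination
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):         # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def bands_of_T(g, rho, k):
    """Bands of T^k: g_{n-1} below, -(g_n + n + rho k) on, and n+1 above the diagonal."""
    N = len(g)
    lower = [0.0] + [g[n - 1] for n in range(1, N)]
    diag = [-(g[n] + n + rho * k) for n in range(N)]
    upper = [float(n + 1) for n in range(N)]   # last entry lies beyond the cutoff N
    return lower, diag, upper
```

For k≥1 and ρ>0 each diagonal entry of T^k exceeds in magnitude the sum of the off-diagonal entries in its column, so elimination without pivoting is expected to be well behaved.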

The solution is initialized at k=0 using

G_{n 0} = 〈 n, {(k = 0)}_{n} ∣ G 〉 = \sum_{n^{'} m} 〈 n ∣ n^{'} 〉 〈 0_{n} ∣ m 〉 p_{n^{'} m} = p_{n}

(101)

[cf. Eq. (26)], where p_n is known [cf. Eq. (60)], and the joint distribution is retrieved via the inverse transform

p_{n m} = 〈 n, m ∣ G 〉 = \sum_{k} G_{n k} 〈 m ∣ k_{n} 〉 .

(102)

5. |j,k_n〉 basis

We now consider a basis which employs both the constant-rate eigenfunctions |j〉 in the n sector and the n-dependent eigenfunctions |k_n〉 in the m sector. Expressing the joint distribution directly in terms of the eigenbasis expansion of the generating function [43], we write

p_{n m} = \sum_{j k} G_{j k} 〈 n, m ∣ j, k_{n} 〉,

(103)

G_{j k} = \sum_{n m} p_{n m} 〈 j, k_{n} ∣ n, m 〉 .

(104)

Substituting Eq. (103) into Eq. (58) at steady state gives, after some simplification (cf. Appendix C),

0 = \sum_{j k} G_{j k} 〈 n, m ∣ {- \hat{H} ∣ j, k_{n} 〉 + {\hat{a}}_{n}^{+} {\hat{g}}_{n} ∣ j, δ_{-} k_{n} 〉 + {\hat{a}}_{n}^{-} ∣ j, δ_{+} k_{n} 〉},

(105)

where Ĥ is defined as in Eq. (64), ${\hat{a}}_{n}^{+}$ and ${\hat{a}}_{n}^{-}$ act as in Eqs. (12)-(15), and

∣ δ_{\pm} k_{n} 〉 \equiv ∣ k_{n \pm 1} 〉 - ∣ k_{n} 〉 .

(106)

We may now partition Ĥ=Ĥ₀+Ĥ₁ as

{\hat{H}}_{0} (n) = {\hat{b}}_{n}^{+} {\overset{‒}{b}}_{n}^{-} + ρ {\hat{b}}_{m}^{+} {\overset{‒}{b}}_{m}^{-} (n),

(107)

{\hat{H}}_{1} (n) = {\hat{b}}_{n}^{+} {\hat{Γ}}_{n} + ρ {\hat{b}}_{m}^{+} {\hat{Δ}}_{n} (n),

(108)

with ${\overset{‒}{b}}_{n}^{-} = {\hat{a}}_{n}^{-} - \overset{‒}{g}$ and Γ̂_n=g‾−ĝ_n as in the |j,k〉 and |j,k_j〉 bases and ${\overset{‒}{b}}_{m}^{-} (n) = {\hat{a}}_{m}^{-} - {\overset{‒}{q}}_{n}$ and Δ̂_n(n)=q‾_n−q̂_n as in the |n,k_n〉 basis. Noting that |j,k_n〉 is the eigenbasis of Ĥ₀(n), i.e.,

{\hat{H}}_{0} (n) ∣ j, k_{n} 〉 = (j + ρ k) ∣ j, k_{n} 〉,

(109)

Equation (105) becomes, after more simplification (cf. Appendix C),

0 = - (j + ρ k) G_{j k} - \sum_{j^{'}} Γ_{j - 1, j^{'}} G_{j^{'} k} - ρ \sum_{j^{'}} Δ_{j j^{'}} G_{j^{'}, k - 1} + \sum_{\pm} \sum_{j^{'}} \sum_{ℓ = 1}^{k} Λ_{j j^{'}}^{\pm ℓ} G_{j^{'}, k - ℓ},

(110)

where Γ_jj’ is as in Eq. (76),

Δ_{j j^{'}} = \sum_{n} 〈 j ∣ n 〉 ({\overset{‒}{q}}_{n} - q_{n}) 〈 n ∣ j^{'} 〉,

(111)

Λ_{j j^{'}}^{+ ℓ} = \sum_{n} 〈 j ∣ n 〉 (n + 1) 〈 n + 1 ∣ j^{'} 〉 V_{n ℓ}^{+},

(112)

Λ_{j j^{'}}^{- ℓ} = \sum_{n} 〈 j ∣ n 〉 g_{n - 1} 〈 n - 1 ∣ j^{'} 〉 V_{n ℓ}^{-},

(113)

and $V_{n ℓ}^{\pm}$ is as in Eq. (99). Linear algebraically,

{\vec{G}}_{k} = {(D^{k} + S^{-} Γ)}^{- 1} (- ρ Δ {\vec{G}}_{k - 1} + \sum_{\pm} \sum_{ℓ = 1}^{k} Λ^{\pm ℓ} {\vec{G}}_{k - ℓ}),

(114)

with $D_{j j^{'}}^{k}$ and $S_{j j^{'}}^{-}$ defined as before [cf. Eq. (80)], revealing once again a lower-triangular equation (i.e., each kth term is slaved to the previous k’<k terms) requiring only matrix multiplication and the inversion of a J-by-J matrix K times. Recalling Eq. (104), the scheme is initialized using

G_{j 0} = \sum_{n m} p_{n m} 〈 j ∣ n 〉 〈 {(k = 0)}_{n} ∣ m 〉 = \sum_{n} p_{n} 〈 j ∣ n 〉

(115)

[cf. Eq. (26)] with known p_n [cf. Eq. (60)], and the joint distribution is retrieved using Eq. (103).

B. Comparison of the representations

The spectral representations in Sec. III A produce equations of motion with similar levels of numerical complexity. In all cases, the original two-dimensional master equation has been reduced by the lower-triangular structure in the second gene’s eigenmode number k to a hierarchy of evaluations of one-dimensional problems. The bases differ in the rate parameters or equivalently gauge freedoms that one is free to choose: the |j,k〉 basis requires two constants g‾ and q‾; the |j,k_j〉 basis requires g‾ and a J-valued vector q‾_j; the |n,k_n〉 basis requires an N-valued vector q‾_n; and the |j,k_n〉 basis requires g‾ and q‾_n.

The bases also differ in the types of problems for which they are most suitable. For example, the |j,k〉, |j,k_j〉, and |j,k_n〉 bases, which all expand the parent species in eigenfunctions |j〉, are best when a cutoff in j is most appropriate, such as when the parent distribution is Poisson. The |n,k_n〉 basis, on the other hand, is useful when a cutoff in n is most appropriate, such as when the parent species is concentrated at low protein number. Different bases are more robust to numerical errors for different regulation functions as well: the |n,k_n〉 and |j,k_n〉 bases, which both rely upon repeated manipulation of the object $Q_{n}^{\pm} = {\overset{‒}{q}}_{n} - {\overset{‒}{q}}_{n \pm 1}$ , are best for smooth regulation functions, for which the differences between q‾_n and q‾_n+1 are small; the |j,k〉 and |j,k_j〉 bases on the other hand, which involve the deviations q‾−q_n and q‾_j−q_n respectively, are less susceptible to numerical error given sharp regulation functions, such as a threshold.

As indicated in Fig. 2, the |j,k〉 basis can be viewed as a special case of either the |j,k_j〉 basis with q‾_j=q‾ [in which case Eq. (87) reduces to Eq. (75)] or of the |j,k_n〉 basis with q‾_n=q‾ [in which case Eq. (110) reduces to Eq. (75)]. Although possible in principle, expanding in the |j,m〉 basis does not exploit the natural structure of the problem, since it neither retains the tridiagonal structure in n nor gains the lower triangular structure in k. This example explicitly shows that not all bases are good candidates for simplifying the master equation.

The strength of all the spectral bases discussed in this section and of the proposed spectral method in general is that it allows for a fast and accurate calculation of full steady-state probability distributions of the number of protein molecules in a gene regulatory network. In Fig. 3 we demonstrate this property for the two-gene system by plotting error versus computational runtime for each spectral basis, as well as for a stochastic simulation using a varying step Monte Carlo procedure [28]. For error we use the Jensen-Shannon divergence [41] (a measure in bits between two probability distributions) between the distribution p_nm computed in the |n,m〉 basis (via iterative solution of the original master equation) and the distribution computed either via the spectral formulas in this section or by stochastic simulation. We plot this measure against the runtime of each method scaled by the runtime of the iterative solution in the |n,m〉 basis (all numerical experiments are performed in matlab). We find that the computations via the spectral bases achieve accuracy up to machine precision ~10³–10⁴ times faster than the iterative method’s runtime and ~10⁷–10⁸ times faster than the runtime necessary for the stochastic simulation to achieve the same accuracy. Computation in the |j,k〉 basis is most efficient since its equation of motion is simplest [cf. Eq. (80)]; the |j,k_j〉 and |j,k_n〉 bases tend to be slightly less efficient since they require inner loops over ℓ [cf. Eqs. (90) and (114)]. Figure 3 demonstrates the tremendous gain in performance achieved by the joint analytic-numerical spectral method over traditional simulation approaches.
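For concreteness, the error measure used above can be sketched as follows. This is the standard definition of the Jensen-Shannon divergence in bits, written for two distributions given as flat lists of probabilities; the function names are illustrative and the code is not taken from the paper's numerical experiments.

```python
import math

def entropy_bits(p):
    """Shannon entropy in bits, skipping zero-probability entries."""
    return -sum(x * math.log2(x) for x in p if x > 0.0)

def jensen_shannon_bits(p, q):
    """Jensen-Shannon divergence H(m) - [H(p) + H(q)]/2 with m = (p + q)/2."""
    m = [0.5 * (a + b) for a, b in zip(p, q)]
    return entropy_bits(m) - 0.5 * (entropy_bits(p) + entropy_bits(q))
```

The measure vanishes for identical distributions and equals 1 bit for distributions with disjoint support, so it is bounded and symmetric in its two arguments. A joint distribution p_nm would be flattened into a single list before being passed in.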

C. Analytic solution

In general, the equations of motion in the spectral representations [Eqs. (75), (87), (97), and (110)] need to be evaluated numerically. In the case of the |n,k_n〉 basis, however, we can exploit the tridiagonal structure of Eq. (97) to find an exact analytic solution. Specifically, in the case of a Poisson parent (g_n=g=const) and for threshold regulation, i.e.,

q_{n} = \begin{cases} q_{-} & \text{for } n \leq n_{0}, \\ q_{+} & \text{for } n > n_{0}, \end{cases}

(116)

setting q‾_n=q_n makes Eq. (97)

g G_{n - 1, k} + (n + 1) G_{n + 1, k} - (ρ k + g + n) G_{n k} = - g ϕ_{k}^{-} δ_{n n_{1}} - n_{1} ϕ_{k}^{+} δ_{n n_{0}},

(117)

where

ϕ_{k}^{-} = \sum_{ℓ = 1}^{k} \frac{{(- Δ)}^{ℓ}}{ℓ!} G_{n_{0}, k - ℓ},

(118)

ϕ_{k}^{+} = \sum_{ℓ = 1}^{k} \frac{Δ^{ℓ}}{ℓ!} G_{n_{1}, k - ℓ},

(119)

Δ=q₊−q₋, and n₁=n₀+1. Equation (117) is solved using the tridiagonal matrix algorithm (also called the Thomas algorithm [42]), as described in detail in Appendix D. The result is an analytic expression for the kth column of G_nk in terms of its previous columns (i.e., the matrix inversion has been performed explicitly),

G_{n k} = \frac{n_{1}}{g} \frac{(n_{0} - 1)!}{n!} \frac{η_{n}^{k}}{η_{n_{0} - 1}^{k}} \times \begin{cases} (ϕ_{k}^{+} + f_{k} F_{n_{1}}^{k}) ∕ (∊_{n_{0}}^{k} - 1) & n \leq n_{0}, \\ f_{k} F_{n}^{k} ∕ \prod_{i = n_{0}}^{n - 1} (∊_{i}^{k} - 1) & n > n_{0}, \end{cases}

(120)

where

f_{k} = ϕ_{k}^{+} - \frac{g n_{0}}{n_{1}} \frac{η_{n_{0} - 1}^{k}}{η_{n_{0}}^{k}} (∊_{n_{0}}^{k} - 1) ϕ_{k}^{-},

(121)

F_{n}^{k} = \sum_{i = 0}^{N - n} \prod_{ℓ = n}^{n + i} \frac{1}{∊_{ℓ}^{k} - 1},

(122)

∊_{n}^{k} = \frac{ρ k + g + n}{g n} \frac{η_{n}^{k}}{η_{n - 1}^{k}},

(123)

η_{n}^{k} = \sum_{i = 0}^{n} \frac{n!}{i! (n - i)!} g^{n - i} \prod_{ℓ = 0}^{i - 1} (ρ k + ℓ)

(124)

= \frac{1}{Γ (ρ k)} \int_{0}^{\infty} d t e^{- t} t^{(ρ k - 1)} {(g + t)}^{n},

(125)

and N is the cutoff in protein number n. Along with the analytic form of the mixed product

〈 m ∣ k_{n} 〉 = {(- 1)}^{k} e^{- q_{n}} {(q_{n})}^{m} k! \times \sum_{ℓ = 0}^{\min (m, k)} \frac{1}{ℓ! (m - ℓ)! (k - ℓ)! {(- q_{n})}^{ℓ}}

(126)

(cf. Appendix A), Eq. (120) in the limit N → ∞ constitutes an exact analytic solution for the joint distribution p_nm as calculated using Eq. (102).
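The equality of the sum in Eq. (124) and the integral in Eq. (125) follows from writing the product over ℓ as Γ(ρk+i)/Γ(ρk), using the integral definition of the gamma function over t from 0 to ∞, and resumming with the binomial theorem. As a numerical illustration, the sketch below compares the two forms, evaluating the integral with a simple composite Simpson rule on a truncated interval. The function names, the truncation point, and the number of intervals are choices of this sketch, and the quadrature assumes ρk≥1 so that the integrand is regular at t=0.

```python
import math

def eta_sum(n, k, g, rho):
    """Eq. (124): sum over i of C(n, i) g^(n-i) prod_{l=0}^{i-1} (rho k + l)."""
    a = rho * k
    total = 0.0
    for i in range(n + 1):
        rising = 1.0
        for l in range(i):
            rising *= a + l
        total += math.comb(n, i) * g ** (n - i) * rising
    return total

def eta_integral(n, k, g, rho, t_max=60.0, intervals=6000):
    """Eq. (125) by composite Simpson's rule on [0, t_max] (assumes rho k >= 1)."""
    a = rho * k
    f = lambda t: math.exp(-t) * t ** (a - 1.0) * (g + t) ** n
    h = t_max / intervals                  # intervals must be even
    s = f(0.0) + f(t_max)
    for i in range(1, intervals):
        s += (4.0 if i % 2 else 2.0) * f(i * h)
    return s * h / 3.0 / math.gamma(a)
```

For example, with n=3, k=2, g=2, and ρ=1 the sum evaluates to 8+24+36+24=92, and the quadrature agrees to within its discretization error.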

D. Threshold-regulated gene approximates the on/off gene

If a gene is regulated via a threshold function [cf. Eq. (116)], its steady-state protein distribution p_m can be well approximated by the two-state process discussed in Sec. II C. To make the connection clear, we first observe that the off-state (z=−) corresponds to the first gene expressing the same or fewer proteins n than the threshold n₀, i.e.,

p_{m}^{-} = \sum_{n \leq n_{0}} p_{n m},

(127)

and the on state (z=+) corresponds to the first gene expressing more proteins than the threshold, i.e.,

p_{m}^{+} = \sum_{n > n_{0}} p_{n m} .

(128)

The dynamics of $p_{m}^{\pm}$ are then obtained by summing the master equation for two-gene regulation [Eq. (58)] over either all n≤n₀ or all n>n₀, giving

{\dot{p}}_{m}^{\pm} = ρ [q_{\pm} p_{m - 1}^{\pm} + (m + 1) p_{m + 1}^{\pm} - (q_{\pm} + m) p_{m}^{\pm}] \mp n_{1} p_{n_{1} m} \pm g_{n_{0}} p_{n_{0} m},

(129)

where n₁=n₀+1. Making the approximations

\frac{p_{m}^{-}}{π_{-}} = p (m ∣ -) \approx p (m ∣ n_{0}) = \frac{p_{n_{0} m}}{p_{n_{0}}},

(130)

\frac{p_{m}^{+}}{π_{+}} = p (m ∣ +) \approx p (m ∣ n_{1}) = \frac{p_{n_{1} m}}{p_{n_{1}}},

(131)

where

π_{-} = \sum_{m} p_{m}^{-} = \sum_{n \leq n_{0}} p_{n},

(132)

π_{+} = \sum_{m} p_{m}^{+} = \sum_{n > n_{0}} p_{n}

(133)

are the total probabilities of being in the off and on states, respectively, and noting from Eq. (59) that g_n₀=n₁p_n₁/p_n₀, Eq. (129) at steady state becomes

0 = q_{z} p_{m - 1}^{z} + (m + 1) p_{m + 1}^{z} - (q_{z} + m) p_{m}^{z} + \sum_{z^{'}} Ω_{z z^{'}} p_{m}^{z^{'}},

(134)

with z=± and

Ω = (\begin{matrix} - ω_{+} & ω_{-} \\ ω_{+} & - ω_{-} \end{matrix}),

(135)

where

ω_{\pm} = \frac{n_{1} p_{n_{1}}}{ρ π_{\mp}} .

(136)

Equations (134) and (135) have the same form as Eqs. (1) and (47) at steady state with n→m and g→q, and Eq. (136) relates the effective switching rates ω_± to input and regulation parameters p_n₁, π_±, and n₁, and the ratio ρ of the degradation rate of the second gene to that of the first. Note that Eq. (136) satisfies

\frac{π_{-}}{π_{+}} = \frac{ω_{-}}{ω_{+}},

(137)

in agreement with Eq. (48), and exhibits the intuitive behavior that increasing ρ (i.e., decreasing the relative response rate of the first gene) is equivalent to decreasing the switching rates ω_±.

A comparison of the distributions of a threshold-regulated gene with those of an on/off gene for various parameter settings reveals that Eqs. (130) and (131) are a good approximation. Figure 4 shows a demonstration for a threshold-regulated system with a Poisson input distribution. In the first column, the mean g of the input lies above the threshold n₀, making the output more likely to be in the on-state, i.e., π₊>π₋; in the second column, g<n₀, making π₊<π₋. In the first row ρ<1; in the second row ρ>1, corresponding to lower effective switching rates ω_± and producing bimodal distributions with peaks near the on/off rates q_±. In all examples, the approximation as a two-state process with switching rates given by Eq. (136) agrees well with the actual output from threshold regulation.

FIG. 4 — Protein distributions for a gene regulated by a threshold function [dots; calculated via Eq. (75)] and a gene with two stochastic states [circles; calculated via Eq. (43)]. The relationship between regulation parameters and state transition rates is given by Eq. (136). In all panels the input to the regulation is a Poisson distribution with mean g=7, and the regulation rates [cf. Eq. (116)] are q₋=2 and q₊=15. In the first column the threshold is n₀=4 making π₊=0.827>π₋=0.173; in the second column n₀=8 making π₊=0.271<π₋=0.729. In the first row the ratio of the second gene’s degradation rate to that of the first is ρ=0.1; in the second row ρ=10.
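The mapping from threshold regulation to effective switching rates can be made concrete with a short sketch. Assuming a Poisson input with mean g, the code below evaluates π_± from Eqs. (132) and (133) and ω_± from Eq. (136); the function names and the truncation `n_max` are illustrative choices, not part of the paper.

```python
import math

def poisson_pmf(n, g):
    return math.exp(-g) * g ** n / math.factorial(n)

def two_state_rates(g, n0, rho, n_max=100):
    """Return (pi_minus, pi_plus, omega_minus, omega_plus) for a Poisson(g) input.

    pi_minus and pi_plus follow Eqs. (132) and (133); omega_+/- follows
    Eq. (136), omega_{+/-} = n1 p_{n1} / (rho pi_{-/+}), with n1 = n0 + 1.
    """
    p = [poisson_pmf(n, g) for n in range(n_max + 1)]
    pi_minus = sum(p[: n0 + 1])
    pi_plus = sum(p[n0 + 1:])
    n1 = n0 + 1
    omega_plus = n1 * p[n1] / (rho * pi_minus)    # rate into the on state
    omega_minus = n1 * p[n1] / (rho * pi_plus)    # rate out of the on state
    return pi_minus, pi_plus, omega_minus, omega_plus
```

With g=7 this reproduces the occupancies quoted in the caption of Fig. 4 (π₋≈0.173 for n₀=4 and π₋≈0.729 for n₀=8), and the returned rates satisfy Eq. (137) by construction. Changing ρ rescales both ω₊ and ω₋ by the same factor and leaves π_± unchanged.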

IV. REGULATION WITH BURSTS

The final system we consider combines the multistate process used to model bursts of expression in Sec. II with gene regulation as discussed in Sec. III. Specifically we consider a system of two species, with protein numbers n and m, existing in Z possible states, distinguished by the settings of the two production rates g_z and q_z respectively, where 1≤z≤Z. Regulation is achieved by allowing the rates of transition among states affecting the production of the second gene to depend on the number n of proteins expressed by the first gene. Recalling Eqs. (1) and (58), the master equation describing the evolution of the joint probability distribution $p_{n m}^{z}$ reads

{\dot{p}}_{n m}^{z} = g_{z} p_{n - 1, m}^{z} + (n + 1) p_{n + 1, m}^{z} - (g_{z} + n) p_{n m}^{z} + ρ [q_{z} p_{n, m - 1}^{z} + (m + 1) p_{n, m + 1}^{z} - (q_{z} + m) p_{n m}^{z}] + \sum_{z^{'}} Ω_{z z^{'}} (n) p_{n m}^{z^{'}},

(138)

where the dependence of the stochastic matrix Ω_zz’ on n incorporates the regulation.

As with the previously discussed models, Eq. (138) benefits from spectral expansion, and for simplicity we present only the formulation in the |j,k〉 basis, parametrized by constant rates g‾ and q‾ respectively, as in Secs. II B and III A 2. As before the first step is to define the generating function

∣ G_{z} 〉 = \sum_{n m} p_{n m}^{z} ∣ n, m 〉,

(139)

with which Eq. (138), upon summing over n and m against |n,m〉, becomes

∣ {\dot{G}}_{z} 〉 = - {\hat{H}}_{z} ∣ G_{z} 〉 + \sum_{z^{'}} {\hat{Ω}}_{z z^{'}} ∣ G_{z^{'}} 〉,

(140)

where

{\hat{H}}_{z} = {\hat{b}}_{n}^{+} {\hat{b}}_{n z}^{-} + ρ {\hat{b}}_{m}^{+} {\hat{b}}_{m z}^{-}

(141)

{\hat{b}}_{n}^{+} = {\hat{a}}_{n}^{+} - 1,

(142)

{\hat{b}}_{m}^{+} = {\hat{a}}_{m}^{+} - 1,

(143)

{\hat{b}}_{n z}^{-} = {\hat{a}}_{n}^{-} - g_{z},

(144)

{\hat{b}}_{m z}^{-} = {\hat{a}}_{m}^{-} - q_{z},

(145)

and Ω̂_zz’ is Ω_zz’(n) with every instance of n replaced by the number operator ${\hat{a}}_{n}^{+} {\hat{a}}_{n}^{-}$ . Defining ${\overset{‒}{b}}_{n}^{-} = {\hat{a}}_{n}^{-} - \overset{‒}{g}$ and ${\overset{‒}{b}}_{m}^{-} = {\hat{a}}_{m}^{-} - \overset{‒}{q}$ , we partition the Hamiltonian as ${\hat{H}}_{z} = {\hat{H}}_{0} + {\hat{H}}_{1}^{z}$ , with

{\hat{H}}_{0} = {\hat{b}}_{n}^{+} {\overset{‒}{b}}_{n}^{-} + ρ {\hat{b}}_{m}^{+} {\overset{‒}{b}}_{m}^{-}

(146)

as the operator of which |j,k〉 is the eigenbasis, i.e.,

{\hat{H}}_{0} ∣ j, k 〉 = (j + ρ k) ∣ j, k 〉

(147)

and

{\hat{H}}_{1}^{z} = {\hat{b}}_{n}^{+} Γ_{z} + ρ {\hat{b}}_{m}^{+} Δ_{z}

(148)

capturing the deviations Γ_z=g‾−g_z and Δ_z=q‾−q_z of the constant rates from the state-dependent rates. Upon expanding the generating function in the eigenbasis,

∣ G_{z} 〉 = \sum_{j k} G_{j k}^{z} ∣ j, k 〉,

(149)

and taking dummy indices j→j’ and k→k’, projecting the conjugate state 〈j,k| onto Eq. (140) gives

{\dot{G}}_{j k}^{z} = - (j + ρ k) G_{j k}^{z} - Γ_{z} G_{j - 1, k}^{z} - ρ Δ_{z} G_{j, k - 1}^{z} + \sum_{z^{'}} \sum_{j^{'}} 〈 j ∣ {\hat{Ω}}_{z z^{'}} ∣ j^{'} 〉 G_{j^{'} k}^{z^{'}},

(150)

where the components of Ω̂_zz’ need only be evaluated in the j sector, not the k sector, because the transition rates depend only on n, not on m [cf. Eq. (138)]. Like Eqs. (43) and (75), Eq. (150) is subdiagonal in k and thus far more efficient to solve than the original master equation [Eq. (138)], as we demonstrate for a special case in the next section. The joint distribution is retrieved from $G_{j k}^{z}$ via inverse transform,

p_{n m}^{z} = \sum_{j k} 〈 n ∣ j 〉 G_{j k}^{z} 〈 m ∣ k 〉,

(151)

with the mixed products calculated as in Appendix A.

A. Four-state process

As a simple example of the model in Eq. (138), we consider a system in which each of the two species has an on state and an off state, and the transition rate of the second species to its on state is a function of the number of copies of the first species. This system models both (i) a single gene for which the production of proteins depends on the number of transcripts, and each is produced in on and off states by the binding and unbinding of ribosomes and RNA polymerase, respectively, and (ii) one gene regulating another with each undergoing burstlike expression.

There are a total of Z=4 states, i.e.,

p_{n m}^{z} = (p_{n m}^{- -}, p_{n m}^{+ -}, p_{n m}^{- +}, p_{n m}^{+ +})

(152)

where the first signed index denotes the state of the first gene (with protein count n) and the second signed index denotes the state of the second gene (with protein count m). Defining g_± as the production rates of the first species in its on (+) and off states (−), and similarly q_± for the second species, the production rates of the Z=4 states are

g_{z} = (g_{-}, g_{+}, g_{-}, g_{+}),

(153)

q_{z} = (q_{-}, q_{-}, q_{+}, q_{+}) .

(154)

Defining ω_± as the transition rates of the first species to (+) and from (−) its on state, and similarly α_± for the second species, the transition matrix takes the form

Ω_{z z^{'}} (n) = (\begin{matrix} - ω_{+} - α_{+} (n) & ω_{-} & α_{-} & 0 \\ ω_{+} & - ω_{-} - α_{+} (n) & 0 & α_{-} \\ α_{+} (n) & 0 & - ω_{+} - α_{-} & ω_{-} \\ 0 & α_{+} (n) & ω_{+} & - ω_{-} - α_{-} \end{matrix}) .

(155)

The simple form α₊(n)=cn^ν for constant c and integer ν corresponds to the first species activating the second as a multimer, with ν the order of the multimerization. In the limit of fast switching this description reduces to a Hill function with cooperativity ν [23]. Recalling that ${\hat{b}}_{n}^{+} = {\hat{a}}_{n}^{+} - 1$ and ${\overset{‒}{b}}_{n}^{-} = {\hat{a}}_{n}^{-} - \overset{‒}{g}$ , the n-dependent terms of 〈j|Ω̂_zz’|j’〉 are evaluated as

〈 j ∣ α_{+} ({\hat{a}}_{n}^{+} {\hat{a}}_{n}^{-}) ∣ j^{'} 〉 = c 〈 j ∣ {[({\hat{b}}_{n}^{+} + 1) ({\overset{‒}{b}}_{n}^{-} + \overset{‒}{g})]}^{ν} ∣ j^{'} 〉 .

(156)

Since ${\hat{b}}_{n}^{+}$ and ${\overset{‒}{b}}_{n}^{-}$ raise and lower |j’〉 states respectively [cf. Eqs. (22) and (23)], the modified transition matrix 〈j|Ω̂_zz’|j’〉 is nearly diagonal, with nonzero terms only for |j−j’|≤ν.
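The structure of Eqs. (155) and (156) can be illustrated with a short sketch. It assumes the ladder actions b̂⁺|j〉=|j+1〉 and b‾⁻|j〉=j|j−1〉, which are consistent with the eigenvalue relation in Eq. (147) but are not restated in this section; the function names and the truncation at J modes are choices of the sketch.

```python
import numpy as np

def omega_matrix(n, w_minus, w_plus, a_minus, c, nu):
    """Eq. (155) for the state order (--, +-, -+, ++), with alpha_+(n) = c n^nu."""
    a_plus = c * n ** nu
    return np.array([
        [-w_plus - a_plus, w_minus,           a_minus,           0.0],
        [w_plus,           -w_minus - a_plus, 0.0,               a_minus],
        [a_plus,           0.0,               -w_plus - a_minus, w_minus],
        [0.0,              a_plus,            w_plus,            -w_minus - a_minus],
    ])

def number_operator_j(J, g_bar):
    """<j| n_hat |j'> with n_hat = (b^+ + 1)(b_bar^- + g_bar), truncated at J modes."""
    b_plus = np.eye(J, k=-1)                             # <j|b^+|j'> = delta_{j, j'+1}
    b_minus = np.diag(np.arange(1.0, J), k=1)            # <j|b_bar^-|j'> = j' delta_{j, j'-1}
    identity = np.eye(J)
    return (b_plus + identity) @ (b_minus + g_bar * identity)

def alpha_plus_j(J, g_bar, c, nu):
    """Eq. (156): c <j| n_hat^nu |j'>, banded with half-width nu."""
    return c * np.linalg.matrix_power(number_operator_j(J, g_bar), nu)
```

Each column of `omega_matrix` sums to zero, as required for probability conservation, and `alpha_plus_j` has nonzero entries only within ν of the diagonal, which is what keeps the 4J-by-4J system sparse.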

Equation (150) at steady state,

- (j + ρ k) G_{j k}^{z} - Γ_{z} G_{j - 1, k}^{z} + \sum_{z^{'}} \sum_{j^{'}} 〈 j ∣ {\hat{Ω}}_{z z^{'}} ∣ j^{'} 〉 G_{j^{'} k}^{z^{'}} = ρ Δ_{z} G_{j, k - 1}^{z},

(157)

is solved successively in k, requiring the inversion of a 4J-by-4J matrix K times. It is initialized at k=0 by computing the null space of the left-hand side and normalizing with $Σ_{z} G_{00}^{z} = 1$ [cf. Eq. (35)]. The joint distribution $p_{n m}^{z}$ is retrieved via inverse transform [Eq. (151)].

With ν=2, a typical solution of Eq. (157) takes a few seconds (in matlab), which, depending on the cutoff N, is ~10²–10³ times faster than direct solution of the master equation [Eq. (138)] by iteration for equivalent accuracy. The advantage of such a large efficiency gain is that it allows repeated evaluations of the governing equation, necessary for parameter inference or optimization [31]. We demonstrate this possibility in the next section by finding and interpreting the solutions that optimize the information flow from the first to the second species.

B. Information-optimal solution

Cells use regulatory processes to transmit relevant information from one species to the next [44-48]. Information processing is quantified by the mutual information I, which, between the first and second species in the four-state process, is

I = \sum_{n m} p_{n m} \log_{2} \frac{p_{n m}}{p_{n} p_{m}},

(158)

where the distributions p_nm, p_n, and p_m are obtained from summing the joint distribution $p_{n m}^{z}$ [cf. Eq. (151)], and the log is taken with base 2 to give I in bits.
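A minimal sketch of Eq. (158) is given below, assuming the per-state joint distributions are available as nested lists indexed as p[z][n][m]. The function names are illustrative; terms with p_nm=0 are skipped, following the usual convention that 0 log 0 = 0.

```python
import math

def sum_over_states(p_z):
    """p_nm = sum_z p^z_nm for a list of per-state joint distributions."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*p_z)]

def mutual_information_bits(p_nm):
    """Eq. (158): I = sum_nm p_nm log2[p_nm / (p_n p_m)]."""
    p_n = [sum(row) for row in p_nm]
    p_m = [sum(col) for col in zip(*p_nm)]
    total = 0.0
    for n, row in enumerate(p_nm):
        for m, p in enumerate(row):
            if p > 0.0:
                total += p * math.log2(p / (p_n[n] * p_m[m]))
    return total
```

Two limiting cases are useful checks: a product distribution p_nm=p_n p_m gives I=0, and two equally likely, perfectly correlated outcomes give I=1 bit.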

Upon optimization of I for the four-state process, two distinct types of optimal solutions become clear: those in which the distribution p_nm has one peak and those in which p_nm has two peaks. The former occur when copy number is constrained to be low, and switching rates are constrained to be near the decay rates of both species, producing a single peak at low copy number [see lower left inset of Fig. 5(B)]. As these constraints are lifted, it is optimal for the switching rates of the parent species to become much less than the decay rate. The slow switching produces a second peak whose location is specified by the on rate of each species [see upper right inset of Fig. 5(B)].

FIG. 5 — (A) Mutual information I [cf. Eq. (158)] versus stiffness σ [cf. Eq. (160)] for fixed gain [γ=16, cf. Eq. (159)] obtained by optimizing Eq. (161) for λ values between 10⁻³ and 10¹. Squares denote solutions whose joint distribution p_nm has one peak (cf. B, lower left inset), and dots denote solutions for which p_nm has two peaks (cf. B, upper right inset). Solid lines show the convex hulls of the one- and two-peaked solutions. Dotted lines indicate the stiffness value at which the hulls intersect and the stiffness values of the hull points to the left and right of the intersection. (B) Phase diagram between one- and two-peaked optimal solutions in the gain-stiffness plane. Circles and left and right error bars at each gain are determined by the stiffness values at the intersection of the one- and two-peak convex hulls and at the hull points to the left and right of the intersection, respectively (see dotted lines for the example case in A). Solid line shows a line of best fit. Insets show examples of one-(lower left) and two-peaked (upper right) optimal distributions p_nm.

To quantify the transition between the two types of solutions, we numerically optimized mutual information over parameters g₊, q₊, ω₋, ω₊, α₋, and c (the off-rates g₋ and q₋ were fixed at 0; the cooperativity ν was fixed at 2; and the decay rate ratio ρ was fixed at 1). Information may always be trivially optimized by allowing infinite copy number or arbitrary separation of relevant time scales. We limit copy number by constraining the gain

γ = \frac{Γ + Δ}{2},

(159)

defined as the average of the parent gain Γ=g₊−g₋ and the child gain Δ=q₊−q₋. Since g₋=q₋=0, the maximum number of particles is dictated by the on rates g₊ and q₊ and thus constraining γ limits the copy number. We limit separation between the switching time scales and the decay time scales by constraining the stiffness

σ = \frac{1}{4} (∣ \log_{10} ω_{-} ∣ + ∣ \log_{10} ω_{+} ∣ + ∣ \log_{10} α_{-} ∣ + ∣ \log_{10} [α_{+} 〈 n^{ν} 〉] ∣),

(160)

where the average 〈n^ν〉 is taken over p_n. Stiffness σ is the average of the absolute deviation of (the logs of) all four switching rates from the (unit) decay rates so constraining σ prevents fast or slow switching. Gain is fixed by varying g₊ and q₊ such that γ is a constant, and stiffness is constrained by optimizing the objective function

L = I - λ σ

(161)

for a given value of the Lagrange multiplier λ.
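The constraint quantities in Eqs. (159) through (161) are simple to evaluate once the rates are known. The sketch below reads the last term of Eq. (160) as the mean activation rate c〈n^ν〉, which is an interpretation of α₊〈n^ν〉 given α₊(n)=cn^ν; the function names and argument order are hypothetical.

```python
import math

def gain(g_plus, g_minus, q_plus, q_minus):
    """Eq. (159): average of the parent gain and the child gain."""
    return 0.5 * ((g_plus - g_minus) + (q_plus - q_minus))

def stiffness(w_minus, w_plus, a_minus, c, mean_n_nu):
    """Eq. (160): mean absolute log10 deviation of the four switching rates from 1.

    mean_n_nu is <n^nu> averaged over p_n, so c * mean_n_nu is the mean of alpha_+(n).
    """
    rates = (w_minus, w_plus, a_minus, c * mean_n_nu)
    return sum(abs(math.log10(r)) for r in rates) / 4.0

def objective(mutual_info, sigma, lam):
    """Eq. (161): L = I - lambda sigma."""
    return mutual_info - lam * sigma
```

When all four rates equal the unit decay rate the stiffness is zero, and switching one rate to 10 and another to 0.1 while leaving the other two at 1 gives σ=0.5, so fast and slow switching are penalized symmetrically.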

As shown in Fig. 5(A), one-peaked solutions are more informative at low stiffness, while two-peaked solutions are more informative at high stiffness. We compute the convex hulls of the one- and two-peaked data to remove suboptimal solutions, and the transition occurs at the stiffness value at which the convex hulls intersect [cf. Fig. 5(A)]. Repeating this procedure for many choices of gain allows one to trace out the phase transition shown in Fig. 5(B), which makes clear that one-peaked solutions are most informative at low stiffness, two-peaked solutions are most informative at high stiffness, and the critical stiffness decreases weakly with increasing gain.

Examining the two-peaked solutions, which are optimal at high stiffness, we find that the marginal probability to be in the (−,+) state, where the first gene is off and the second gene is on, $π^{-, +} = Σ_{n m} p_{n m}^{-, +}$ , is much smaller than the marginal probabilities to be in the other three states [π^−,+/π^z<0.01, where z={(+,+), (+,−), (−,−)}]. This result states that it is unlikely for the regulated gene to be expressing proteins at a high level, if the regulator protein or mRNA (depending on the interpretation of the model) is not being expressed. Optimizing information, we find rates which result in this intuitive solution.

V. CONCLUSIONS

The presented spectral method exploits the linearity of the master equation to solve for probability distributions directly by expanding in the natural eigenfunctions of the linear operator. We demonstrate the method on three models of gene expression: a single gene with multiple expression states, a gene regulatory cascade, and a model that combines multistate expression with explicit regulation through binding of transcription factor proteins.

The spectral method permits huge computational gains over simulation. As demonstrated for all spectral expansions of the two-gene cascade (cf. Fig. 3), directly solving for the distribution via the spectral method is ~10⁷–10⁸ times faster than building the distribution from samples using a simulation technique. This massive speedup makes possible optimization and inference problems requiring full probability distributions that were not computationally feasible previously. For example, by optimizing information flow in a two-gene cascade in which both parent and child undergo two-state production, we reveal a transition from a one-peaked to a two-peaked joint probability distribution when constraints on protein number and time scale separation are relaxed. We emphasize that this optimization would not have been possible without the efficiency of the spectral method.

The spectral method also makes explicit the linear algebraic structure underlying the master equation. In many cases, such as two-state bursting and the two-gene threshold regulation problem, this leads to analytic solutions. More generally, as shown for the linear cascade, it yields a set of natural bases for expanding the generating function and reveals which features of each basis suit which types of problems. Specifically, bases in which the parent species is expanded in eigenfunctions are best when the parent distribution is Poissonian, and bases in which the parent is left in protein number space are best when the parent distribution is concentrated at low protein number. Likewise, bases in which the eigenfunctions of the child depend on the number of copies of the parent’s protein are best suited for smooth regulation functions, whereas a basis in which the eigenfunctions of the child are parametrized by a constant is more numerically robust for sharp regulation functions such as thresholds. In all cases the linear algebraic structure of the spectral decomposition yields numerical prescriptions that greatly outperform simulation techniques. We anticipate that the computational speedup of the method, as well as the removal of the statistical obstacle of density estimation inherently limiting simulation-based approaches, will make spectral methods such as those demonstrated here useful in addressing a wide variety of biological questions regarding accurate and efficient modeling of noisy information transmission in biological systems.

ACKNOWLEDGMENTS

A.M. was supported by National Science Foundation Grant No. DGE-0742450. A.M. and C.W. were supported by National Science Foundation Grant No. ECS-0332479. A.M.W. was supported by the Princeton Center for Theoretical Science and by Columbia’s Professional Schools Visiting Scholar Grant. C.W. was supported by National Institutes of Health Grants No. 5PN2EY016586-03 and No. 1U54CA121852-01A1.

APPENDIX A

In this appendix we describe two ways to compute the mixed products 〈n|j〉 and 〈j |n〉 between the protein number states |n〉 and the eigenstates |j〉: by direct evaluation and by recursive updating.

The direct evaluation follows from Eqs. (8)–(10) and (20), and the fact that repeated derivatives of a product follow a binomial expansion. Introducing g as the rate parameter for the |j〉 states,

〈 n ∣ j 〉 = \oint \frac{d x}{2 π i} 〈 n ∣ x 〉 〈 x ∣ j 〉

(A1)

= \oint \frac{d x}{2 π i} \frac{e^{g (x - 1)} {(x - 1)}^{j}}{x^{n + 1}}

(A2)

= \frac{1}{n!} \partial_{x}^{n} {[e^{g (x - 1)} {(x - 1)}^{j}]}_{x = 0}

(A3)

= \frac{1}{n!} \sum_{ℓ = 0}^{n} \frac{n!}{ℓ! (n - ℓ)!} \partial_{x}^{n - ℓ} {[e^{g (x - 1)}]}_{x = 0} \times \partial_{x}^{ℓ} {[{(x - 1)}^{j}]}_{x = 0}

(A4)

= \sum_{ℓ = 0}^{n} \frac{1}{ℓ! (n - ℓ)!} [g^{n - ℓ} e^{- g}] \times [\frac{j!}{(j - ℓ)!} {(- 1)}^{j - ℓ} θ (j - ℓ + 1)]

(A5)

= {(- 1)}^{j} e^{- g} g^{n} j! ξ_{n j},

(A6)

where

ξ_{n j} = \sum_{ℓ = 0}^{\min (n, j)} \frac{1}{ℓ! (n - ℓ)! (j - ℓ)! {(- g)}^{ℓ}} .

(A7)

Similarly, noting Eqs. (7) and (21),

〈 j ∣ n 〉 = n! {(- g)}^{j} ξ_{n j},

(A8)

with ξ_nj as in Eq. (A7). Equations (A6) and (A8) clearly reduce to Eqs. (26) and (27) for the special case j=0.
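The direct evaluation can be illustrated with a short numerical sketch. The Python below implements Eqs. (A6)–(A8) as written and checks the orthonormality of the |j〉 states by summing over n; the function names and the cutoff `n_max` are illustrative assumptions and are not taken from the paper.

```python
from math import exp, factorial

def xi(n, j, g):
    """xi_{nj} of Eq. (A7)."""
    return sum(
        1.0 / (factorial(l) * factorial(n - l) * factorial(j - l) * (-g) ** l)
        for l in range(min(n, j) + 1)
    )

def bra_n_ket_j(n, j, g):
    """Mixed product <n|j> from Eq. (A6)."""
    return (-1) ** j * exp(-g) * g ** n * factorial(j) * xi(n, j, g)

def bra_j_ket_n(j, n, g):
    """Mixed product <j|n> from Eq. (A8)."""
    return factorial(n) * (-g) ** j * xi(n, j, g)

def overlap(j, jp, g, n_max=80):
    """sum_n <j|n><n|j'>, truncated at n_max (an assumed, illustrative cutoff)."""
    return sum(bra_j_ket_n(j, n, g) * bra_n_ket_j(n, jp, g) for n in range(n_max))

if __name__ == "__main__":
    g = 3.0
    for j in range(3):
        print([round(overlap(j, jp, g), 10) for jp in range(3)])
```

For moderate g the truncated sums should reproduce the identity matrix to within rounding, which is a convenient check on sign conventions.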

It is more computationally efficient to take advantage of the selection rules in Eqs. (12)–(15) and (22)–(25) to compute the mixed products recursively. For example, using Eqs. (14), (16), and (22),

〈 n ∣ j + 1 〉 = 〈 n ∣ {\hat{b}}^{+} ∣ j 〉 = 〈 n ∣ ({\hat{a}}^{+} - 1) ∣ j 〉 = 〈 n - 1 ∣ j 〉 - 〈 n ∣ j 〉,

(A9)

which can be initialized using $〈 n ∣ 0 〉 = e^{- g} g^{n} / n!$ [cf. Eq. (A6)] and updated recursively in j. Equation (A9) makes clear that in n space the (j+1)th mode is simply the (negative of the) discrete derivative of the jth mode.

Alternatively, Eqs. (15), (17), and (23) give

(n + 1) 〈 n + 1 ∣ j 〉 = 〈 n ∣ {\hat{a}}^{-} ∣ j 〉 = 〈 n ∣ ({\hat{b}}^{-} + g) ∣ j 〉 = j 〈 n ∣ j - 1 〉 + g 〈 n ∣ j 〉,

(A10)

which can be initialized using $〈 0 ∣ j 〉 = {(- 1)}^{j} e^{- g}$ [cf. Eq. (A6)] and updated recursively in n.
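As a minimal sketch of the recursive approach, the following Python fills a table of 〈n|j〉 column by column with Eq. (A9), starting from the Poisson column j=0, and then evaluates the residual of Eq. (A10) as a consistency check. The function names and the convention 〈−1|j〉=0 at the lower boundary are assumptions made for this illustration.

```python
from math import exp

def mixed_products_by_j(g, n_max, j_max):
    """Table T[n][j] = <n|j>, filled column by column with Eq. (A9)."""
    T = [[0.0] * (j_max + 1) for _ in range(n_max + 1)]
    # Initialization <n|0> = e^{-g} g^n / n!, built iteratively to avoid large factorials.
    T[0][0] = exp(-g)
    for n in range(1, n_max + 1):
        T[n][0] = T[n - 1][0] * g / n
    for j in range(j_max):
        for n in range(n_max + 1):
            below = T[n - 1][j] if n > 0 else 0.0  # assumed boundary value <-1|j> = 0
            T[n][j + 1] = below - T[n][j]
    return T

def a10_residual(T, g, n, j):
    """Left side minus right side of Eq. (A10) for one (n, j) pair."""
    lower = j * T[n][j - 1] if j > 0 else 0.0
    return (n + 1) * T[n + 1][j] - lower - g * T[n][j]

if __name__ == "__main__":
    T = mixed_products_by_j(2.5, 30, 5)
    worst = max(abs(a10_residual(T, 2.5, n, j)) for n in range(30) for j in range(6))
    print("largest (A10) residual:", worst)
```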

One may similarly derive recursion relations for 〈j |n〉, i.e.,

〈 j ∣ n + 1 〉 = 〈 j - 1 ∣ n 〉 + 〈 j ∣ n 〉,

(A11)

(j + 1) 〈 j + 1 ∣ n 〉 = n 〈 j ∣ n - 1 〉 - g 〈 j ∣ n 〉,

(A12)

initialized with 〈j |0〉=(−g)^j / j! or 〈0|n〉=1, respectively [cf. Eq. (A8)] and updated recursively in n or j, respectively.

One may also use the full birth-death operator b̂⁺b̂⁻ to derive the recursion relations

(n + 1) 〈 n + 1 ∣ j 〉 = (g + n - j) 〈 n ∣ j 〉 - g 〈 n - 1 ∣ j 〉,

(A13)

g 〈 j ∣ n + 1 〉 = (g + n - j) 〈 j ∣ n 〉 - n 〈 j ∣ n - 1 〉,

(A14)

initialized with $〈 0 ∣ j 〉 = {(- 1)}^{j} e^{- g}$ and $〈 1 ∣ j 〉 = {(- 1)}^{j} e^{- g} (g - j)$ [cf. Eq. (A6)], and $〈 j ∣ 0 〉 = {(- g)}^{j} / j!$ and $〈 j ∣ 1 〉 = {(- g)}^{j} (1 - j / g) / j!$ [cf. Eq. (A8)], respectively, and updated recursively in n. We find Eqs. (A13) and (A14) are more numerically stable than Eqs. (A9)–(A12), as the former are two-term recursion relations while the latter are one-term recursion relations.
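The recursions in Eqs. (A13) and (A14) might be coded as follows. This is a sketch only: the function names, the list-based storage, and the modest cutoff used in the example are assumptions, and no claim is made here about behavior at large cutoffs.

```python
from math import exp, factorial

def ket_column(j, g, n_max):
    """[<0|j>, ..., <n_max|j>] via Eq. (A13), using the stated initial values."""
    col = [(-1) ** j * exp(-g), (-1) ** j * exp(-g) * (g - j)]
    for n in range(1, n_max):
        col.append(((g + n - j) * col[n] - g * col[n - 1]) / (n + 1))
    return col[: n_max + 1]

def bra_row(j, g, n_max):
    """[<j|0>, ..., <j|n_max>] via Eq. (A14), using the stated initial values."""
    first = (-g) ** j / factorial(j)
    row = [first, first * (1.0 - j / g)]
    for n in range(1, n_max):
        row.append(((g + n - j) * row[n] - n * row[n - 1]) / g)
    return row[: n_max + 1]

if __name__ == "__main__":
    g, n_max = 4.0, 30
    for j in range(3):
        row = bra_row(j, g, n_max)
        # Each printed row should approximate a row of the identity matrix.
        print([round(sum(r * c for r, c in zip(row, ket_column(jp, g, n_max))), 9)
               for jp in range(3)])
```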

APPENDIX B

In the limit g₋=0, Eq. (54) reads

G (x) = \frac{ω_{+}}{ω_{+} + ω_{-}} e^{y} Φ [ω_{-}, ω_{+} + ω_{-} + 1; - y] + \frac{ω_{-}}{ω_{+} + ω_{-}} Φ [ω_{+}, ω_{+} + ω_{-} + 1; y],

(B1)

where $y = g_{+} (x - 1)$. Using the fact that [49]

e^{y} Φ [α, β; - y] = Φ [β - α, β; y],

(B2)

Eq. (B1) can be written

G (x) = \frac{ω_{+}}{ω_{+} + ω_{-}} Φ [ω_{+} + 1, ω_{+} + ω_{-} + 1; y] + \frac{ω_{-}}{ω_{+} + ω_{-}} Φ [ω_{+}, ω_{+} + ω_{-} + 1; y],

(B3)

or noting Eq. (55) and the fact that Γ(z+1)=zΓ(z) for any z,

G (x) = \sum_{j} (\frac{ω_{+}}{ω_{+} + ω_{-}} \frac{Γ (j + ω_{+} + 1)}{Γ (ω_{+} + 1)} + \frac{ω_{-}}{ω_{+} + ω_{-}} \frac{Γ (j + ω_{+})}{Γ (ω_{+})}) \times \frac{Γ (ω_{+} + ω_{-} + 1)}{Γ (j + ω_{+} + ω_{-} + 1)} \frac{y^{j}}{j!}

(B4)

= \sum_{j} (\frac{ω_{+}}{ω_{+} + ω_{-}} \frac{(j + ω_{+}) Γ (j + ω_{+})}{ω_{+} Γ (ω_{+})} + \frac{ω_{-}}{ω_{+} + ω_{-}} \frac{Γ (j + ω_{+})}{Γ (ω_{+})}) \times \frac{(ω_{+} + ω_{-}) Γ (ω_{+} + ω_{-})}{(j + ω_{+} + ω_{-}) Γ (j + ω_{+} + ω_{-})} \frac{y^{j}}{j!}

(B5)

= \sum_{j} \frac{Γ (j + ω_{+})}{Γ (ω_{+})} \frac{Γ (ω_{+} + ω_{-})}{Γ (j + ω_{+} + ω_{-})} \frac{y^{j}}{j!}

(B6)

= Φ [ω_{+}, ω_{+} + ω_{-}; y],

(B7)

as in Eq. (56).
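The chain of identities (B1)–(B7) can be spot-checked numerically. The sketch below evaluates the confluent hypergeometric function by its power series [the form implied by Eq. (B4)] and compares the right-hand sides of Eqs. (B1), (B3), and (B7); the function names, the fixed number of series terms, and the parameter values are illustrative assumptions.

```python
from math import exp

def kummer_phi(alpha, beta, y, terms=200):
    """Phi[alpha, beta; y] summed as a truncated power series in y."""
    total, term = 0.0, 1.0
    for j in range(terms):
        total += term
        term *= (alpha + j) / (beta + j) * y / (j + 1)
    return total

def g_b1(w_plus, w_minus, y):
    """Right-hand side of Eq. (B1)."""
    s = w_plus + w_minus
    return ((w_plus / s) * exp(y) * kummer_phi(w_minus, s + 1, -y)
            + (w_minus / s) * kummer_phi(w_plus, s + 1, y))

def g_b3(w_plus, w_minus, y):
    """Right-hand side of Eq. (B3), i.e., Eq. (B1) after applying Eq. (B2)."""
    s = w_plus + w_minus
    return ((w_plus / s) * kummer_phi(w_plus + 1, s + 1, y)
            + (w_minus / s) * kummer_phi(w_plus, s + 1, y))

def g_b7(w_plus, w_minus, y):
    """Right-hand side of Eq. (B7)."""
    return kummer_phi(w_plus, w_plus + w_minus, y)

if __name__ == "__main__":
    for y in (-2.0, -0.5, 1.5):
        print(y, g_b1(0.7, 1.3, y), g_b3(0.7, 1.3, y), g_b7(0.7, 1.3, y))
```

The three columns should agree to within rounding for any positive switching rates.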

The marginal p_n is given by

〈 n ∣ G 〉 = \frac{1}{n!} \partial_{x}^{n} {[G (x)]}_{x = 0}

(B8)

[cf. Eq. (10)]. Using Eq. (56) and the derivative of the confluent hypergeometric function,

\partial_{y}^{n} Φ [α, β; y] = \frac{Γ (n + α)}{Γ (α)} \frac{Γ (β)}{Γ (n + β)} Φ [α + n, β + n; y],

(B9)

one obtains Eq. (57).
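A numerical sketch of this last step follows. It assumes y = g₊(x−1), so that each x derivative in Eq. (B8) contributes a factor g₊, and it applies Eq. (B9) to G(x) = Φ[ω₊, ω₊+ω₋; y]; the resulting expression is our own rearrangement and is not copied from Eq. (57). Equation (B2) is used to evaluate Φ at negative argument without an alternating series. All names and parameter values are illustrative.

```python
from math import exp, lgamma, log

def kummer_phi(alpha, beta, y, terms=400):
    """Phi[alpha, beta; y] summed as a truncated power series in y."""
    total, term = 0.0, 1.0
    for j in range(terms):
        total += term
        term *= (alpha + j) / (beta + j) * y / (j + 1)
    return total

def p_n(n, g, w_plus, w_minus):
    """Marginal p_n from Eqs. (B8) and (B9), assuming y = g (x - 1) with g = g_+ > 0."""
    s = w_plus + w_minus
    # log of (g^n / n!) * Gamma(n + w_plus) Gamma(s) / (Gamma(w_plus) Gamma(n + s))
    log_pref = (n * log(g) - lgamma(n + 1)
                + lgamma(n + w_plus) - lgamma(w_plus)
                + lgamma(s) - lgamma(n + s))
    # Phi[w_plus + n, s + n; -g] = e^{-g} Phi[w_minus, s + n; g] by Eq. (B2)
    phi = exp(-g) * kummer_phi(w_minus, s + n, g)
    return exp(log_pref) * phi

if __name__ == "__main__":
    g, wp, wm = 6.0, 0.5, 0.8
    probs = [p_n(n, g, wp, wm) for n in range(100)]
    print("normalization:", sum(probs))
    print("mean:", sum(n * p for n, p in enumerate(probs)), "expected:", g * wp / (wp + wm))
```

Two simple checks are that the probabilities sum to one, since G(1)=1, and that the mean equals g₊ω₊/(ω₊+ω₋), the first derivative of G at x=1.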

APPENDIX C

In this appendix, we fill in the details of the derivations of the equations of motion for the latter three of the four spectral bases discussed in Sec. III A.

1. |j,k_j〉 basis

Projecting the conjugate state 〈j,k_j| onto Eq. (83) (in which dummy indices j and k are changed to j’ and k’, respectively) gives

0 = \sum_{j^{'} k^{'}} (j^{'} + ρ k^{'}) 〈 j ∣ j^{'} 〉 〈 k_{j} ∣ k_{j^{'}}^{'} 〉 G_{j^{'} k^{'}} + \sum_{j^{'} k^{'}} 〈 j ∣ {\hat{b}}_{n}^{+} {\hat{Γ}}_{n} ∣ j^{'} 〉 \times 〈 k_{j} ∣ k_{j^{'}}^{'} 〉 G_{j^{'} k^{'}} + ρ \sum_{j^{'} k^{'}} 〈 k_{j} ∣ {\hat{b}}_{m}^{+} ∣ k_{j^{'}}^{'} 〉 〈 j ∣ {\hat{Δ}}_{n} (j^{'}) ∣ j^{'} 〉 G_{j^{'} k^{'}} .

(C1)

From the orthonormality of states, the first term of Eq. (C1) simplifies to

\sum_{k^{'}} (j + ρ k^{'}) 〈 k_{j} ∣ k_{j}^{'} 〉 G_{j k^{'}} = (j + ρ k) G_{j k} .

(C2)

Recalling Eq. (30), the product $〈 k_{j} ∣ k_{j^{'}}^{'} 〉$ simplifies to

〈 k_{j} ∣ k_{j^{'}}^{'} 〉 = \frac{{(- Q_{j j^{'}})}^{k - k^{'}}}{(k - k^{'})!} θ (k - k^{'} + 1),

(C3)

with $Q_{j j^{'}} = {\overset{‒}{q}}_{j} - {\overset{‒}{q}}_{j^{'}}$, whereupon Eq. (C1), separating the part of its second term which is diagonal in k from that which is subdiagonal and applying Eq. (24) to its third term, becomes

0 = (j + ρ k) G_{j k} + \sum_{j^{'}} Γ_{j - 1, j^{'}} G_{j^{'} k} + \sum_{k^{'} < k} \sum_{j^{'}} Γ_{j - 1, j^{'}} \frac{{(- Q_{j j^{'}})}^{k - k^{'}}}{(k - k^{'})!} G_{j^{'} k^{'}} + ρ \sum_{k^{'} < k} \sum_{j^{'}} Δ_{j j^{'}} \frac{{(- Q_{j j^{'}})}^{k - k^{'} - 1}}{(k - k^{'} - 1)!} G_{j^{'} k^{'}},

(C4)

with Γ_jj’ as in Eq. (76) and

Δ_{j j^{'}} = 〈 j ∣ {\hat{Δ}}_{n} (j^{'}) ∣ j^{'} 〉 = 〈 j ∣ ({\overset{‒}{q}}_{j^{'}} - {\hat{q}}_{n}) ∣ j^{'} 〉

(C5)

= 〈 j ∣ ({\overset{‒}{q}}_{j} - {\hat{q}}_{n}) ∣ j^{'} 〉

(C6)

= \sum_{n} 〈 j ∣ n 〉 ({\overset{‒}{q}}_{j} - q_{n}) 〈 n ∣ j^{'} 〉

(C7)

[where the orthonormality of |j〉 states is used in going from Eq. (C5) to Eq. (C6)]. Defining ℓ=k−k’ and

V_{j j^{'}}^{ℓ} = \frac{{(- Q_{j j^{'}})}^{ℓ}}{ℓ!},

(C8)

Eq. (C4) can be written more compactly as Eq. (87).
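The overlap in Eq. (C3) between child eigenstates with different rate parameters can be checked numerically by inserting a complete set of |m〉 states and using the mixed products of Eqs. (A6)–(A8). In the sketch below, `q` and `qp` stand for the rate parameters of the bra and ket states, respectively, and Q is taken to be their difference; these names, the cutoff `m_max`, and the numerical values are assumptions made for illustration.

```python
from math import exp, factorial

def xi(n, j, g):
    """xi_{nj} of Eq. (A7)."""
    return sum(
        1.0 / (factorial(l) * factorial(n - l) * factorial(j - l) * (-g) ** l)
        for l in range(min(n, j) + 1)
    )

def ket(m, k, g):
    """<m|k> for an eigenstate with rate parameter g, Eq. (A6)."""
    return (-1) ** k * exp(-g) * g ** m * factorial(k) * xi(m, k, g)

def bra(k, m, g):
    """<k|m> for an eigenstate with rate parameter g, Eq. (A8)."""
    return factorial(m) * (-g) ** k * xi(m, k, g)

def overlap_numeric(k, kp, q, qp, m_max=80):
    """sum_m <k|m> <m|k'> with bra rate q and ket rate qp (truncated sum)."""
    return sum(bra(k, m, q) * ket(m, kp, qp) for m in range(m_max))

def overlap_c3(k, kp, q, qp):
    """Closed form of Eq. (C3) with Q = q - qp; zero for k < k'."""
    if k < kp:
        return 0.0
    return (-(q - qp)) ** (k - kp) / factorial(k - kp)

if __name__ == "__main__":
    for k, kp in [(3, 1), (2, 2), (1, 2)]:
        print(k, kp, overlap_numeric(k, kp, 2.0, 3.5), overlap_c3(k, kp, 2.0, 3.5))
```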

2. |n,k_n〉 basis

Projecting the conjugate state 〈n,k_n| onto Eq. (93) (in which dummy indices n and k are changed to n’ and k’, respectively) gives

0 = \sum_{n^{'} k^{'}} (g_{n^{'}} + n^{'} + ρ k^{'}) 〈 n ∣ n^{'} 〉 〈 k_{n} ∣ k_{n^{'}}^{'} 〉 G_{n^{'} k^{'}} - \sum_{n^{'} k^{'}} g_{n^{'}} 〈 n ∣ n^{'} + 1 〉 \times 〈 k_{n} ∣ k_{n^{'}}^{'} 〉 G_{n^{'} k^{'}} - \sum_{n^{'} k^{'}} n^{'} 〈 n ∣ n^{'} - 1 〉 〈 k_{n} ∣ k_{n^{'}}^{'} 〉 G_{n^{'} k^{'}} + ρ \sum_{n^{'} k^{'}} Δ_{n^{'}} 〈 n ∣ n^{'} 〉 〈 {(k - 1)}_{n} ∣ k_{n^{'}}^{'} 〉 G_{n^{'} k^{'}},

(C9)

where $Δ_{n} = {\overset{‒}{q}}_{n} - q_{n}$. Noting that, as in Eq. (30),

〈 k_{n} ∣ k_{n \pm 1}^{'} 〉 = \frac{{(- Q_{n}^{\pm})}^{k - k^{'}}}{(k - k^{'})!} θ (k - k^{'} + 1),

(C10)

where $Q_{n}^{\pm} = {\overset{‒}{q}}_{n} - {\overset{‒}{q}}_{n \pm 1}$ , Eq. (C9) becomes

0 = (g_{n} + n + ρ k) G_{n k} - g_{n - 1} \sum_{k^{'} \leq k} \frac{{(- Q_{n}^{-})}^{k - k^{'}}}{(k - k^{'})!} G_{n - 1, k^{'}} - (n + 1) \sum_{k^{'} \leq k} \frac{{(- Q_{n}^{+})}^{k - k^{'}}}{(k - k^{'})!} G_{n + 1, k^{'}} + ρ Δ_{n} G_{n, k - 1} .

(C11)

Separating the parts of the second and third term that are diagonal in k and defining ℓ=k−k’ and

V_{n ℓ}^{\pm} = \frac{{(- Q_{n}^{\pm})}^{ℓ}}{ℓ!},

(C12)

Eq. (C11) becomes Eq. (97).

3. |j,k_n〉 basis

Substituting Eq. (103) into Eq. (58) at steady state gives

0 = \sum_{j k} G_{j k} {g_{n - 1} 〈 n - 1, m ∣ j, k_{n - 1} 〉 + (n + 1) 〈 n + 1, m ∣ j, k_{n + 1} 〉 - (g_{n} + n) 〈 n, m ∣ j, k_{n} 〉 + ρ [q_{n} 〈 n, m - 1 ∣ j, k_{n} 〉 + (m + 1) 〈 n, m + 1 ∣ j, k_{n} 〉 - (q_{n} + m) 〈 n, m ∣ j, k_{n} 〉]},

(C13)

or in terms of raising and lowering operators [cf. Eqs. (14)]

0 = \sum_{j k} G_{j k} 〈 n, m ∣ {{\hat{a}}_{n}^{+} {\hat{g}}_{n} ∣ j, k_{n - 1} 〉 + {\hat{a}}_{n}^{-} ∣ j, k_{n + 1} 〉 - ({\hat{g}}_{n} + {\hat{a}}_{n}^{+} {\hat{a}}_{n}^{-}) ∣ j, k_{n} 〉 + ρ [{\hat{a}}_{m}^{+} {\hat{q}}_{n} ∣ j, k_{n} 〉 + {\hat{a}}_{m}^{-} ∣ j, k_{n} 〉 - ({\hat{q}}_{n} + {\hat{a}}_{m}^{+} {\hat{a}}_{m}^{-}) ∣ j, k_{n} 〉]} .

(C14)

Using the definitions in Eqs. (64)–(68), Eq. (C14) can be written as Eq. (105).

Using Eqs. (107)–(109), Eq. (105) can be written

0 = - 〈 n, m ∣ \sum_{j^{'} k^{'}} (j^{'} + ρ k^{'}) ∣ j^{'}, k_{n}^{'} 〉 G_{j^{'} k^{'}} - 〈 n, m ∣ \sum_{j^{'} k^{'}} {\hat{b}}_{n}^{+} {\hat{Γ}}_{n} ∣ j^{'}, k_{n}^{'} 〉 G_{j^{'} k^{'}} - 〈 m ∣ ρ \sum_{j^{'} k^{'}} ({\overset{‒}{q}}_{n} - q_{n}) \times 〈 n ∣ j^{'} 〉 {\hat{b}}_{m}^{+} ∣ k_{n}^{'} 〉 G_{j^{'} k^{'}} + 〈 m ∣ \sum_{j^{'} k^{'}} g_{n - 1} 〈 n - 1 ∣ j^{'} 〉 ∣ δ_{-} k_{n}^{'} 〉 G_{j^{'} k^{'}} + 〈 m ∣ \sum_{j^{'} k^{'}} (n + 1) 〈 n + 1 ∣ j^{'} 〉 ∣ δ_{+} k_{n}^{'} 〉 G_{j^{'} k^{'}},

(C15)

where

∣ δ_{\pm} k_{n} 〉 \equiv ∣ k_{n \pm 1} 〉 - ∣ k_{n} 〉,

(C16)

and the dummy indices j and k have been changed to j’ and k’ respectively. Using Eq. (C10) to note that

〈 k_{n} ∣ δ_{\pm} k_{n}^{'} 〉 = \frac{{(- Q_{n}^{\pm})}^{k - k^{'}}}{(k - k^{'})!} θ (k - k^{'}),

(C17)

where $Q_{n}^{\pm} = {\overset{‒}{q}}_{n} - {\overset{‒}{q}}_{n \pm 1}$ , we multiply Eq. (C15) by 〈k_n|m〉 and sum over m to obtain

0 = - 〈 n ∣ \sum_{j^{'}} (j^{'} + ρ k) ∣ j^{'} 〉 G_{j^{'} k} - 〈 n ∣ \sum_{j^{'}} {\hat{b}}_{n}^{+} {\hat{Γ}}_{n} ∣ j^{'} 〉 G_{j^{'} k} - ρ \sum_{j^{'}} ({\overset{‒}{q}}_{n} - q_{n}) 〈 n ∣ j^{'} 〉 G_{j^{'}, k - 1} + \sum_{j^{'}} g_{n - 1} 〈 n - 1 ∣ j^{'} 〉 \sum_{ℓ = 1}^{k} V_{n ℓ}^{-} G_{j^{'}, k - ℓ} + \sum_{j^{'}} (n + 1) 〈 n + 1 ∣ j^{'} 〉 \sum_{ℓ = 1}^{k} V_{n ℓ}^{+} G_{j^{'}, k - ℓ},

(C18)

in which we exploit the completeness of |m〉 states, i.e., Σ_m|m〉〈m|=1, and $V_{n ℓ}^{\pm}$ is as in Eq. (99). Multiplying Eq. (C18) by 〈j |n〉, summing over n, and exploiting Σ_n|n〉〈n|=1 for the first two terms, we obtain Eq. (110).

APPENDIX D

In this appendix we explicitly solve for G_nk in Eq. (117) using the tridiagonal matrix or Thomas [42] algorithm. We start by identifying the subdiagonal, diagonal, superdiagonal, and right-hand side elements of Eq. (117), respectively, as

A_{n} = g (n = 1, \dots, N),

(D1)

B_{n} = - (ρ k + g + n) (n = 0, \dots, N),

(D2)

C_{n} = n + 1 (n = 0, \dots, N - 1),

(D3)

R_{n} = - g ϕ_{k}^{-} δ_{n n_{1}} - n_{1} ϕ_{k}^{+} δ_{n n_{0}} (n = 0, \dots, N),

(D4)

where N is the cutoff in protein count n and n₁=n₀+1. Auxiliary variables are defined iteratively as

C_{0}^{'} = \frac{C_{0}}{B_{0}},

(D5)

C_{n}^{'} = \frac{C_{n}}{B_{n} - C_{n - 1}^{'} A_{n}} (n = 1, \dots, N - 1),

(D6)

R_{0}^{'} = \frac{R_{0}}{B_{0}},

(D7)

R_{n}^{'} = \frac{R_{n} - R_{n - 1}^{'} A_{n}}{B_{n} - C_{n - 1}^{'} A_{n}} (n = 1, \dots, N),

(D8)

and the solution is obtained by backward iteration with

G_{N}^{k} = R_{N}^{'},

(D9)

G_{n - 1}^{k} = R_{n - 1}^{'} - C_{n - 1}^{'} G_{n}^{k} (n = N, \dots, 1)

(D10)

(where k has been moved from subscript to superscript for ease of reading).
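The forward sweep of Eqs. (D5)–(D8) and the backward iteration of Eqs. (D9) and (D10) can be sketched in Python as below. The solver follows the equations as written; the helper `build_system`, which fills in the bands of Eqs. (D1)–(D4), and the numerical values of ρ, k, g, n₀, and ϕ_k^± in the example are hypothetical choices for illustration only.

```python
def thomas_solve(A, B, C, R):
    """Solve a tridiagonal system with subdiagonal A[1..N], diagonal B[0..N],
    superdiagonal C[0..N-1], and right-hand side R[0..N]; A[0] and C[N] are unused."""
    N = len(B) - 1
    Cp = [0.0] * (N + 1)
    Rp = [0.0] * (N + 1)
    Cp[0] = C[0] / B[0]                      # Eq. (D5)
    Rp[0] = R[0] / B[0]                      # Eq. (D7)
    for n in range(1, N + 1):
        denom = B[n] - Cp[n - 1] * A[n]
        if n < N:
            Cp[n] = C[n] / denom             # Eq. (D6)
        Rp[n] = (R[n] - Rp[n - 1] * A[n]) / denom  # Eq. (D8)
    G = [0.0] * (N + 1)
    G[N] = Rp[N]                             # Eq. (D9)
    for n in range(N, 0, -1):
        G[n - 1] = Rp[n - 1] - Cp[n - 1] * G[n]    # Eq. (D10)
    return G

def build_system(g, rho, k, n0, phi_plus, phi_minus, N):
    """Bands and right-hand side of Eqs. (D1)-(D4) for one value of k."""
    n1 = n0 + 1
    A = [0.0] + [g] * N
    B = [-(rho * k + g + n) for n in range(N + 1)]
    C = [n + 1.0 for n in range(N)] + [0.0]
    R = [0.0] * (N + 1)
    R[n1] += -g * phi_minus
    R[n0] += -n1 * phi_plus
    return A, B, C, R

if __name__ == "__main__":
    A, B, C, R = build_system(g=3.0, rho=0.5, k=1, n0=4,
                              phi_plus=0.2, phi_minus=-0.1, N=40)
    G = thomas_solve(A, B, C, R)
    print(G[:8])
```

The cost is linear in the cutoff N, and the result can be verified by substituting G back into each row of the tridiagonal system.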

Computing the first few terms of Eq. (D6) reveals the pattern

C_{n}^{'} = - (n + 1) \frac{η_{n}^{k}}{η_{n + 1}^{k}},

(D11)

where

η_{n}^{k} = \sum_{i = 0}^{n} \frac{n!}{i! (n - i)!} g^{n - i} \prod_{ℓ = 0}^{i - 1} (ρ k + ℓ),

(D12)

with the convention that $\prod_{a}^{b} [\cdot] = 1$ if a>b. Note that since $\prod_{ℓ = 0}^{i - 1} (ρ k + ℓ) = Γ (ρ k + i) ∕ Γ (ρ k)$ , we may also use the integral representation of the Gamma function to write

η_{n}^{k} = \frac{1}{Γ (ρ k)} \sum_{i = 0}^{n} \frac{n!}{i! (n - i)!} g^{n - i} \int_{0}^{\infty} dt \, e^{- t} t^{ρ k + i - 1}

(D13)

= \frac{1}{Γ (ρ k)} \int_{0}^{\infty} dt \, e^{- t} t^{ρ k - 1} \sum_{i = 0}^{n} \frac{n!}{i! (n - i)!} g^{n - i} t^{i}

(D14)

= \frac{1}{Γ (ρ k)} \int_{0}^{\infty} dt \, e^{- t} t^{ρ k - 1} {(g + t)}^{n} .

(D15)
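The pattern in Eq. (D11) can be confirmed numerically by iterating Eq. (D6) and comparing against the sum in Eq. (D12). The following is a minimal sketch with assumed function names and parameter values; it uses the band definitions of Eqs. (D1)–(D3).

```python
from math import comb

def eta(n, k, g, rho):
    """eta_n^k from the finite sum in Eq. (D12)."""
    total = 0.0
    for i in range(n + 1):
        rising = 1.0
        for l in range(i):          # empty product equals 1 when i = 0
            rising *= rho * k + l
        total += comb(n, i) * g ** (n - i) * rising
    return total

def c_prime(N, k, g, rho):
    """[C'_0, ..., C'_{N-1}] from Eqs. (D5) and (D6) with A_n = g,
    B_n = -(rho k + g + n), and C_n = n + 1."""
    def B(n):
        return -(rho * k + g + n)
    cp = [1.0 / B(0)]
    for n in range(1, N):
        cp.append((n + 1) / (B(n) - cp[n - 1] * g))
    return cp

if __name__ == "__main__":
    k, g, rho, N = 2, 3.0, 0.5, 12
    cp = c_prime(N, k, g, rho)
    for n in range(N):
        closed = -(n + 1) * eta(n, k, g, rho) / eta(n + 1, k, g, rho)
        print(n, cp[n], closed)
```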

Using Eq. (D8) it is immediately clear that

R_{n < n_{0}}^{'} = 0 .

(D16)

The first nonzero term is

R_{n_{0}}^{'} = \frac{n_{1} ϕ_{k}^{+}}{g n_{0}} \frac{η_{n_{0}}^{k}}{η_{n_{0} - 1}^{k}} \frac{1}{∊_{n_{0}}^{k} - 1},

(D17)

where we have defined

∊_{n}^{k} = \frac{ρ k + g + n}{g n} \frac{η_{n}^{k}}{η_{n - 1}^{k}} .

(D18)

Further iteration of Eq. (D8) makes clear that

R_{n > n_{0}}^{'} = \frac{n_{1}}{g} f_{k} \prod_{i = n_{0}}^{n} (\frac{1}{i} \frac{η_{i}^{k}}{η_{i - 1}^{k}} \frac{1}{∊_{i}^{k} - 1})

(D19)

= \frac{n_{1}}{g} \frac{(n_{0} - 1)!}{n!} \frac{η_{n}^{k}}{η_{n_{0} - 1}^{k}} f_{k} \prod_{i = n_{0}}^{n} \frac{1}{∊_{i}^{k} - 1},

(D20)

where

f_{k} = ϕ_{k}^{+} - \frac{g n_{0}}{n_{1}} \frac{η_{n_{0} - 1}^{k}}{η_{n_{0}}^{k}} (∊_{n_{0}}^{k} - 1) ϕ_{k}^{-} .

(D21)

Computing the first few terms of Eq. (D10) reveals that

G_{n > n_{0}}^{k} = (∊_{n}^{k} - 1) R_{n > n_{0}}^{'} F_{n}^{k}

(D22)

= \frac{n_{1}}{g} \frac{(n_{0} - 1)!}{n!} \frac{η_{n}^{k}}{η_{n_{0} - 1}^{k}} f_{k} F_{n}^{k} \prod_{i = n_{0}}^{n - 1} \frac{1}{∊_{i}^{k} - 1},

(D23)

where

F_{n}^{k} = \sum_{i = 0}^{N - n} \prod_{ℓ = n}^{n + i} \frac{1}{∊_{ℓ}^{k} - 1} .

(D24)

At the threshold Eq. (D10) gives

G_{n_{0}}^{k} = \frac{n_{1}}{g n_{0}} \frac{η_{n_{0}}^{k}}{η_{n_{0} - 1}^{k}} \frac{1}{∊_{n_{0}}^{k} - 1} (ϕ_{k}^{+} + f_{k} F_{n_{1}}^{k})

(D25)

and since $R_{n < n_{0}}^{'} = 0$ , the solution is easily completed using Eq. (D10), giving

G_{n < n_{0}}^{k} = G_{n_{0}}^{k} \prod_{i = 1}^{n_{0} - n} (- C_{n_{0} - i}^{'})

(D26)

= \frac{n_{1}}{g} \frac{(n_{0} - 1)!}{n!} \frac{η_{n}^{k}}{η_{n_{0} - 1}^{k}} \frac{ϕ_{k}^{+} + f_{k} F_{n_{1}}^{k}}{∊_{n_{0}}^{k} - 1} .

(D27)

These results are summarized in Eqs. (120)–(125).

Footnotes

PACS number(s): 87.10.Mn, 82.20.Fd, 87.10.Vg, 02.70.Hm

Contributor Information

Andrew Mugler, Department of Physics, Columbia University, New York, New York 10027, USA.

Aleksandra M. Walczak, Princeton Center for Theoretical Science, Princeton University, Princeton, New Jersey 08544, USA.

Chris H. Wiggins, Department of Applied Physics and Applied Mathematics, Center for Computational Biology and Bioinformatics, Columbia University, New York, New York 10027, USA.

References

[1] Hooshangi S, Thiberge S, Weiss R. Proc. Natl. Acad. Sci. U.S.A. 2005;102:3581. doi: 10.1073/pnas.0408507102.
[2] Thattai M, van Oudenaarden A. Biophys. J. 2002;82:2943. doi: 10.1016/S0006-3495(02)75635-X.
[3] Elowitz MB, Levine AJ, Siggia ED, Swain PS. Science. 2002;297:1183. doi: 10.1126/science.1070919.
[4] Ozbudak EM, Thattai M, Kurtser I, Grossman AD, van Oudenaarden A. Nat. Genet. 2002;31:69. doi: 10.1038/ng869.
[5] Swain PS, Elowitz MB, Siggia ED. Proc. Natl. Acad. Sci. U.S.A. 2002;99:12795. doi: 10.1073/pnas.162041399.
[6] Paulsson J, Berg OG, Ehrenberg M. Proc. Natl. Acad. Sci. U.S.A. 2000;97:7148. doi: 10.1073/pnas.110057697.
[7] Pedraza JM, van Oudenaarden A. Science. 2005;307:1965. doi: 10.1126/science.1109090.
[8] Paulsson J. Nature (London) 2004;427:415. doi: 10.1038/nature02257.
[9] Tkačik G, Callan CG, Bialek W. Phys. Rev. E. 2008;78:011910. doi: 10.1103/PhysRevE.78.011910.
[10] Tkačik G, Callan CG, Jr., Bialek W. Proc. Natl. Acad. Sci. U.S.A. 2008;105:12265. doi: 10.1073/pnas.0806077105.
[11] Ziv E, Nemenman I, Wiggins CH. PLoS ONE. 2007;2:e1077. doi: 10.1371/journal.pone.0001077.
[12] Mugler A, Ziv E, Nemenman I, Wiggins CH. IET Sys. Bio. 2009;3:379. doi: 10.1049/iet-syb.2008.0165.
[13] Emberly E. Phys. Rev. E. 2008;77:041903. doi: 10.1103/PhysRevE.77.041903.
[14] Tostevin F, ten Wolde PR. Phys. Rev. Lett. 2009;102:218101. doi: 10.1103/PhysRevLett.102.218101.
[15] Elf J, Li G-W, Xie XS. Science. 2007;316:1191. doi: 10.1126/science.1141967.
[16] Choi PJ, Cai L, Frieda K, Xie XS. Science. 2008;322:442. doi: 10.1126/science.1161427.
[17] Raj A, Peskin CS, Tranchina D, Vargas DY, Tyagi S. PLoS Biol. 2006;4:e309. doi: 10.1371/journal.pbio.0040309.
[18] Golding I, Paulsson J, Zawilski SM, Cox EC. Cell. 2005;123:1025. doi: 10.1016/j.cell.2005.09.031.
[19] Tǎnase-Nicola S, Warren PB, Rein ten Wolde P. Phys. Rev. Lett. 2006;97:068102. doi: 10.1103/PhysRevLett.97.068102.
[20] van Zon JS, Morelli MJ, Tǎnase-Nicola S, ten Wolde PR. Biophys. J. 2006;91:4350. doi: 10.1529/biophysj.106.086157.
[21] van Zon JS, ten Wolde PR. J. Chem. Phys. 2005;123:234910. doi: 10.1063/1.2137716.
[22] van Kampen NG. Stochastic Processes in Physics and Chemistry. North-Holland; Amsterdam: 1992.
[23] Walczak AM, Sasai M, Wolynes PG. Biophys. J. 2005;88:828. doi: 10.1529/biophysj.104.050666.
[24] Lan Y, Papoian GA. J. Chem. Phys. 2006;125:154901. doi: 10.1063/1.2358342.
[25] Lan Y, Wolynes PG, Papoian GA. J. Chem. Phys. 2006;125:124106. doi: 10.1063/1.2353835.
[26] Lan Y, Papoian GA. Phys. Rev. Lett. 2007;98:228301. doi: 10.1103/PhysRevLett.98.228301.
[27] Bortz AB, Kalos MH, Lebowitz JL. J. Comput. Phys. 1975;17:10.
[28] Gillespie DT. J. Phys. Chem. 1977;81:2340.
[29] Bellman RE. Adaptive Control Processes. Princeton University Press; Princeton, NJ: 1961.
[30] Morelli MJ, ten Wolde PR, Allen RJ. Proc. Natl. Acad. Sci. U.S.A. 2009;106:8101. doi: 10.1073/pnas.0810399106.
[31] Walczak AM, Mugler A, Wiggins CH. Proc. Natl. Acad. Sci. U.S.A. 2009;106:6529. doi: 10.1073/pnas.0811999106.
[32] A master equation in which the coordinate appears explicitly in the rates [e.g. g_n or q_n in Eq. (58)] is sometimes mistakenly termed “nonlinear” in the literature, perhaps discouraging calculations which exploit its inherent linear algebraic structure. We remind the reader that the master equation is perfectly linear.
[33] Klumpp S, Hwa T. Proc. Natl. Acad. Sci. U.S.A. 2008;105:20245. doi: 10.1073/pnas.0804953105.
[34] Iyer-Biswas S, Hayot F, Jayaprakash C. Phys. Rev. E. 2009;79:031911. doi: 10.1103/PhysRevE.79.031911.
[35] Note that setting x=e^ik makes clear that the generating function is simply the Fourier transform.
[36] Note here, however, the difference with respect to the normalization convention commonly used in quantum mechanics in the prefactors of the creation and annihilation operations.
[37] Doi M. J. Phys. A. 1976;9:1465.
[38] Zel’dovich Ya. B., Ovchinnikov AA. JETP. 1978;47:829.
[39] Peliti L. J. Phys. A. 1986;19:L365.
[40] Mattis DC, Glasser ML. Rev. Mod. Phys. 1998;70:979.
[41] Lin J. IEEE Trans. Inf. Theory. 1991;37:145.
[42] Thomas L. Watson Sci. Columbia University; New York: 1949.
[43] Because this basis is a function of three coordinates (j, k, and n), we cannot abstractly define the generating function as in Eq. (61); we must instead work directly with the joint probability distribution.
[44] Gomperts BD, Kramer IM, Tantham PER. Signal Transduction. Academic Press; San Diego: 2002.
[45] Ting AY, Endy D. Science. 2002;298:1189. doi: 10.1126/science.1079331.
[46] Detwiler PB, Ramanathan S, Sengupta A, Shraiman BI. Biophys. J. 2000;79:2801. doi: 10.1016/S0006-3495(00)76519-2.
[47] Bolouri H, Davidson EH. Proc. Natl. Acad. Sci. U.S.A. 2003;100:9371. doi: 10.1073/pnas.1533293100.
[48] Bassler BL. Curr. Opin. Microbiol. 1999;2:582. doi: 10.1016/s1369-5274(99)00025-9.
[49] Koepf W. Hypergeometric Summation: An Algorithmic Approach to Summation and Special Function Identities. Vieweg; Braunschweig, Germany: 1998.
