Author manuscript; available in PMC: 2026 Apr 4.
Published in final edited form as: Phys Rev Res. 2026 Mar 18;8(1):013296. doi: 10.1103/f7qj-f7qy

Optimizing information transmission in optogenetic Wnt signaling

Olivier Witteveen 1, Samuel J Rosen 2, Ryan S Lach 3, Maxwell Z Wilson 4,5,6,7,*, Marianne Bauer 1
PMCID: PMC13048778  NIHMSID: NIHMS2158384  PMID: 41939533

Abstract

Populations of cells regulate gene expression in response to external signals, but their ability to make reliable collective decisions is limited by both intrinsic noise in molecular signaling and variability between individual cells. In this work, we use optogenetic control of the canonical Wnt pathway as an example to study how reliably information about an external signal is transmitted to a population of cells, and determine an optimal encoding strategy to maximize information transmission from Wnt signals to gene expression. We find that it is possible to reach an information capacity beyond 1 bit only through an appropriate, discrete encoding of signals: using no Wnt, a short Wnt pulse, or a sustained Wnt signal. By averaging over an increasing number of outputs, we systematically vary the effective noise in the pathway. As the effective noise decreases, the optimal encoding comprises more discrete input signals. These signals do not need to be fine-tuned to achieve near-optimal information transmission. The optimal code transitions into a continuous code in the small-noise limit, which can be shown to be consistent with the Jeffreys prior. We visualize the performance of different signal encodings using decoding maps. Our results suggest that optogenetic Wnt signaling allows for regulatory control beyond a simple binary switch and provide a framework to apply ideas from information processing to single-cell in vitro experiments.

I. INTRODUCTION

Cells respond to external signals by adapting their gene expression [1]. However, gene regulatory responses can fluctuate [2–5]. Especially in the context of development, precise responses are important for coordinated cell-fate decisions that lead to the healthy development of an organism [6–8]; therefore, the considerable cell-to-cell variability that has been observed in the downstream targets of signaling pathways crucial for development may seem surprising [9–11].

The mutual information between a signal and an output quantifies the precision in information transfer [12,13]. Experiments on mammalian signaling pathways often report values barely exceeding one bit [14,15], which is the minimum amount of information required for a reliable decision between only two states, for example, a differentiated and an undifferentiated state. To understand whether cells actually have more information available, a variety of approaches have been proposed in different biological contexts, including specific computational strategies, signaling architectures, cell-to-cell contact adjustments, or incorporating the information contained in temporal dynamics [9,10,16–21]. Here, we turn to an optimization over the space of possible input signals to investigate how information transmission in an important signaling pathway can be made more precise.

We focus on the canonical Wnt signaling pathway, a key regulator of cell-fate decisions during development and maintenance of adult tissues. Wnt signaling is crucial for the differentiation of stem cells into lineages such as skin, bone, and other tissues [22,23]. We study cellular responses using an established reporter of Wnt transcriptional activity, TopFlash, following optogenetic activation of the canonical Wnt signaling pathway [24,25]. We vary the duration of the optogenetic Wnt signals and observe long-tailed distributions of gene expression; we describe them using gamma distributions. Similar distributions have been widely observed for protein expression among cells in a population using fluorescent reporters [5], and are consistent with stochastic gene expression models [26,27].

We optimize information transmission between optogenetic inputs and gene expression outputs by determining the optimal prior distribution over input signals. We find that if signals are chosen uniformly from all possible signal durations, the mutual information is approximately 0.7 bits, whereas with a prior that involves only two discrete signals—no Wnt, or a long sustained Wnt signal—we can reach almost one bit. We can obtain more than one bit when we fully optimize the distribution of input signals, and find that the optimal prior consists of three different input signals: a Wnt off state, a short Wnt pulse, and a long sustained Wnt signal. Since different maps between input and output can have the same mutual information, we visualize which input signals can be distinguished based on the output using decoding maps, for which we provide analytic expressions whenever possible.

It is possible that cells have more information available than observed with our fluorescent reporter. This increased precision could be a consequence of different biological scenarios: for example, if cells average expressed proteins among neighbors [28–30], or if they derive differentiation outcomes in response to not one but multiple relevant Wnt targets [16,31–33]. Therefore, we explore how these optimal signals change if gene expression responses are represented by those we observe, but more precise; mathematically, we describe this increased precision by a narrowing gamma distribution. We use the Blahut-Arimoto algorithm to optimize the input distribution. A central result is that the optimal input signals are discrete for the noisy gene expression we observe and smoothly transition to a continuous encoding as the noise decreases. We show that the continuous distribution in the small-noise limit is equivalent to the Jeffreys prior. Finally, we present calculations to show that the discrete optimal priors yield “sloppy” optima [34,35]: This means that they do not need to be fine-tuned.

This signal-level optimization has potential application in engineering contexts: Our optimized signals correspond to those that external users should supply such that the output can be decoded with high precision. It may also offer insight into the signals that cells might encounter in natural contexts: Ideas from efficient coding [36] suggest that signal transmission is optimized, and therefore would predict that the signals we calculate should be those typically encountered by cells (see Refs. [37–39] for work on neurons).

While it is unclear if cells need to decode Wnt signal durations, how exactly Wnt provides information is also not yet established: hypotheses in different contexts include Wnt timing or duration [40–43], fold-changes [44,45], or absolute concentrations, either in gradients or dynamics [42,46–49]. Timing and duration of signals can play a key role in guiding differentiation [25,50,51], and we chose to investigate the duration of Wnt here because it is easily accessible in the opto-Wnt experiments. Since our work depends only on the parametrization of the conditional output distribution, our results will also apply to other gene regulatory outputs and input signals. We discuss how cells may be able to interpret signals with discrete priors inspired by recent work that optimizes an understanding or model of the world with finite samples [52].

More broadly, we present a systematic framework for investigating how heterogeneity in gene expression across a population impacts information transmission, including cases where the realistic biological noise is difficult to characterize.

II. CELLULAR RESPONSES TO WNT SIGNALING

We explore the expression of genes that respond to the canonical Wnt signaling pathway in a clonal established human embryonic kidney cell line (HEK293T) engineered to respond to optogenetic Wnt signals [24,25]. The duration of the Wnt signal can be varied experimentally, and we use this duration t as an input signal. We measure cellular responses to Wnt signaling using a synthetic fluorescent TCF/LEF (TopFlash-type) iRFP reporter that reflects the activation of Wnt/β-catenin target genes [24]. This reporter is established for Wnt targets [24,53–55] and we refer to it as TopFlash from now on. At the molecular level, TopFlash and many canonical Wnt/β-catenin target genes are activated as a result of β-catenin accumulation in the cytoplasm and nucleus, following the binding of extracellular Wnt ligands to membrane receptors [56].

We collect the output expression levels (fluorescent intensity per cell) of TopFlash, denoted by g, of ca. 1500 ± 800 cells in response to optogenetic Wnt input signals of varying durations t, ranging from 0 to 20 h [Fig. 1(a)]. The experiment is conducted using a high-throughput light stimulation device, the LITOS plate, which enables optogenetic activation across multiple experimental conditions simultaneously (Appendix A) [57]. To ensure that the measured fluorescence has stabilized and to remove effects from residual signaling dynamics, we include a 4-h cool-down period after signal termination before measuring g. This allows Wnt pathway effectors, such as stabilized β-catenin, to return to baseline levels [25]. Since g is not degraded and cell division in the cool-down period is negligible, the value of g represents a robust measure for gene expression as a consequence of the Wnt pulse.

FIG. 1.

(a) Optogenetic control of Wnt signaling. In the absence of light, there is no Wnt signal and no expression of TopFlash g. When the light is activated, Wnt target genes are expressed. We vary the duration t of the Wnt signal and measure the resulting gene expression g. (b) The histograms over g are long-tailed, right-skewed, and unimodal; shown here for Wnt signal durations t = 5, 10, 15, and 20 h. Black lines show the gamma distribution from Eq. (1), evaluated at the appropriate t. (c) The mean μg(t) of each histogram scales linearly with the standard deviation σg(t). The black line shows the linear dependence predicted by Eq. (1). (d) Rescaling the histograms (dividing by the standard deviation) shows a collapse of gene expression data. The collapsed data are described well by a gamma distribution with shape parameter k̂ = 2.88 ± 0.01 and unit variance (black line). (e) The mean gene expression μg(t) grows linearly with time, as captured by Eq. (1) (black line). Error bars show the standard deviation. (f) We can view our system analogously to a communication channel where input t is mapped to output g via the noisy transmission probability p(g|t).

Histograms of g for a given signal duration t are right-skewed, long-tailed, and unimodal [see Fig. 1(b) for signal durations t = 5, 10, 15, and 20 h]. We observe that the histograms are well described by gamma distributions, and that the mean response μg(t) is directly proportional to the standard deviation σg(t) [Fig. 1(c)]. The latter implies that the gamma distribution is parametrized by a constant shape parameter k and a time-dependent scale parameter θ(t):

p(g|t) = 1/[Γ(k) θ(t)^k] g^{k−1} e^{−g/θ(t)},  (1)

where Γ is the gamma function. The mean and variance of g are given by

μg(t) = k θ(t),  (2)
σg²(t) = k θ(t)²,  (3)

respectively. The distribution of Eq. (1) predicts that one can collapse all histograms by normalizing each by their standard deviation [Fig. 1(d)]: Indeed, after rescaling, all data collapse onto a gamma distribution with shape parameter kˆ=2.88±0.01 and unit variance.

To identify how the scale parameter θ(t) depends on the Wnt signal duration t, we plot the mean μg(t) for all experimental conditions [Fig. 1(e)]. For Wnt signals longer than ~1 h, we find that the mean gene expression grows linearly with the signal duration. Therefore, the scale parameter θ(t) ≈ at must also grow linearly with time, where we estimate â = 23.0 ± 0.1 a.u. h⁻¹. We add a small correction term ε e^{−t/τ} to the scale parameter θ(t) to fit the data in the regime t ≲ τ ≈ 1 h (Appendix B).

We note that the gamma distribution was fit empirically and is thus a phenomenological description of the data. We choose it here due to its convenient parametrization. Other long-tailed distributions, such as log-normal, may also fit the data well. These long-tailed distributions, as well as the linear relationship between the standard deviation and the mean, have been suggested to hold universally for protein fluctuations among cells in a population [5] and emerge in bottom-up stochastic models of gene expression [26,27,58].
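The scaling properties of Eqs. (1)–(3) can be checked in a few lines: if p(g|t) is a gamma distribution with fixed shape k and scale θ(t) = at, then μg/σg = √k for every duration, and dividing g by σg(t) collapses all histograms onto a single curve. The following sketch illustrates this with simulated data, using the fitted values k = 2.88 and a = 23.0 a.u. h⁻¹ from the text (the sample sizes and durations are arbitrary choices for illustration, not the experimental ones).

```python
import numpy as np

rng = np.random.default_rng(0)
k, a = 2.88, 23.0   # fitted shape parameter and scale slope from the text

def sample_expression(t, n=200_000):
    """Draw TopFlash-like outputs g ~ Gamma(k, theta(t)) with theta(t) = a*t."""
    return rng.gamma(shape=k, scale=a * t, size=n)

# Eqs. (2)-(3): mu = k*theta and sigma^2 = k*theta^2, so mu/sigma = sqrt(k) for all t.
for t in (5.0, 10.0, 20.0):
    g = sample_expression(t)
    assert abs(g.mean() / g.std() - np.sqrt(k)) < 0.05

# Dividing by the standard deviation sigma(t) = sqrt(k)*a*t collapses the histograms:
# the rescaled quantiles agree across very different signal durations [cf. Fig. 1(d)].
q1 = np.quantile(sample_expression(5.0) / (np.sqrt(k) * a * 5.0), [0.25, 0.5, 0.75])
q2 = np.quantile(sample_expression(20.0) / (np.sqrt(k) * a * 20.0), [0.25, 0.5, 0.75])
assert np.allclose(q1, q2, atol=0.02)
```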

Next, we quantify how precisely we can reconstruct the Wnt signal from the gene expression. Given the broad, long-tailed distributions with substantial overlap between experimental conditions, we anticipate that the information transmission in the pathway will appear limited.

III. INFERRING THE WNT SIGNAL FROM GENE EXPRESSION IN SINGLE CELLS

We can view our system analogously to a communication channel t → g with transmission probability p(g|t) [Fig. 1(f)]. As such, we quantify how much information about the input t is captured by the output g using the mutual information [12,59]:

I(g;t) = ∫₀^∞ dt ∫₀^∞ dg p(g|t) p(t) log₂[p(g|t)/p(g)].  (4)

This mutual information I(g;t) captures (in bits) how much we expect to learn about the Wnt signal by observing the gene expression. For the rest of this manuscript, we use the phrase “input signal” or “Wnt signal” to refer to the Wnt signal duration t.

The mutual information requires knowledge of the distribution of input signals p(t), also referred to as the prior distribution [12,13,59]. A sensible prior distribution that, like the experiment, favors no particular signal condition is one that is uniform over all available signals t ∈ [0, ∞). For this uniform prior, we obtain I(g;t) ≈ 0.67 bits. Since 1 bit is the minimum required to reliably distinguish two states (e.g., an “on-off” switch), this result suggests that the gene expression carries less than the information required to support even a binary regulatory decision.
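A mutual information of this kind can be estimated numerically by discretizing Eq. (4) on finite grids. The sketch below does this for the gamma channel with the fitted parameters k = 2.88 and a = 23.0 a.u. h⁻¹; the truncated input window, grid resolutions, and the omission of the small-t correction are our own simplifying assumptions, so the resulting number is only indicative of, not identical to, the 0.67 bits quoted above.

```python
import numpy as np
from scipy.stats import gamma

k, a = 2.88, 23.0                     # fitted shape and scale slope from the text
t = np.linspace(0.25, 20.0, 200)      # truncated, discretized input grid (assumption)
g = np.linspace(0.5, 6000.0, 6000)    # output grid covering the bulk of p(g|t)
dg = g[1] - g[0]

# Channel p(g|t): gamma with fixed shape k and scale theta(t) = a*t, cf. Eq. (1).
P = gamma.pdf(g[None, :], k, scale=a * t[:, None])
P /= P.sum(axis=1, keepdims=True) * dg   # renormalize on the finite grid

pt = np.full(len(t), 1.0 / len(t))       # uniform prior over the grid
pg = pt @ P                              # marginal p(g)
ratio = np.divide(P, pg[None, :], out=np.ones_like(P), where=P > 0)
I = float(np.sum(pt[:, None] * P * np.log2(ratio)) * dg)
print(f"I(g;t) = {I:.2f} bits under a uniform prior")
```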

The numerical value of the mutual information can be difficult to assess abstractly. It is bounded from above by the entropy of the input distribution, which in turn depends on the size of the state space of possible input signals. For example, if the input distribution includes several discrete states, a mutual information of 1 bit does not necessarily imply that any two particular states are neatly distinguishable. Therefore, it can be useful to employ quantities other than the mutual information that allow us to more clearly identify which signals become confused in the information transmission from input to output.

To do so, we ask how well one can infer the optogenetic Wnt signal t from a measurement of the gene expression g. This is captured by the posterior distribution p(t|g), which one obtains from Bayes’ theorem, p(t|g) = p(g|t)p(t)/p(g), where p(g) = ∫₀^∞ dt′ p(g|t′) p(t′). With a uniform distribution p(t) over the interval t ∈ [0, ∞), we find

p(t|g) = p(g|t) / ∫₀^∞ dt′ p(g|t′) ≈ a(k−1) p(g|t),  (5)

where the final expression is valid in the regime t ≫ 1 h. In principle, features of the posterior p(t|g) can be used to quantify the precision in this inference problem. Decoding errors, such as the variance of the inferred t around its true value, are often used to quantify inference precision [60]. However, such metrics rely on selecting a decoding rule, such as the posterior mean or the MAP estimate, which may be misleading when the posterior p(t|g) is skewed, heavy-tailed, or multimodal [61]. Here, we use a decoding map to quantify our ability to decode without committing to a particular estimator. Decoding maps have been used to quantify positional precision from gap gene expression patterns in the early fly embryo [61,62].

The decoding map quantifies the average posterior p(t|g) generated from a true input t*. To construct the decoding map, we consider the Markov chain in Fig. 2(a) and integrate out the regulatory output through which we intend to infer the signal:

p^(1)(t|t*) = ∫₀^∞ dg p(t|g) p(g|t*).  (6)

The superscript “(1)” refers to the fact that we are considering gene expression from a single (N = 1) cell. While the benefit of decoding maps is most obvious for multidimensional g, where they provide a means to visualize the precision of the inference in a two-dimensional object, they can also be useful for scalar g: We will use them later to visualize the performance of different signal encodings. If the gene expression provides enough information to reconstruct the Wnt signal accurately, the density p^(1)(t|t*) will be sharply peaked around the diagonal t = t*.

FIG. 2.

(a) The input signal t* leads to a gene output g drawn from the transmission probability p(g|t*). Based on a measurement of g, one can use the posterior distribution p(t|g) to infer the input signal. (b) Decoding map p^(1)(t|t*) from Eq. (7), showing the average probability assigned to t by the posterior p(t|g) given that the true signal is t*. Here, the input distribution p(t) is uniform over all possible signals t ∈ [0, ∞) h.

We can compute the distribution p^(1)(t|t*) analytically in the regime t, t* ≫ 1 h, by inserting the posterior from Eq. (5) into Eq. (6) and performing the change of variables g′ = g[1/θ(t) + 1/θ(t*)], to obtain

p^(1)(t|t*) ≈ [Γ(2k−1) / (Γ(k)Γ(k−1))] (t t*)^{k−1} / (t + t*)^{2k−1}.  (7)

The distribution in Eq. (7) is a beta-prime distribution, and the normalizing constant can be identified as a beta-function B(k,k-1)=Γ(k)Γ(k-1)/Γ(2k-1) [63]. We plot the decoding map in Fig. 2(b) and observe that the width of the decoding map broadens linearly with the Wnt signal duration t*. This suggests that absolute decoding errors also grow proportional to t*, while relative errors remain constant.
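The beta-prime form of Eq. (7) can be verified directly against the defining integral: with the uniform-prior posterior of Eq. (5) and θ(t) = at, the decoding map of Eq. (6) becomes a(k−1) ∫ dg p(g|t) p(g|t*). A minimal numerical check, assuming the fitted parameters from the text and an arbitrary true signal t* = 10 h:

```python
import numpy as np
from scipy.stats import gamma
from scipy.special import gammaln

k, a = 2.88, 23.0
tstar = 10.0
g = np.linspace(0.01, 8000.0, 80000)
dg = g[1] - g[0]

def decode_numeric(t):
    """Eq. (6) with the posterior of Eq. (5): a(k-1) * integral of p(g|t) p(g|t*)."""
    return a * (k - 1) * np.sum(gamma.pdf(g, k, scale=a * t) *
                                gamma.pdf(g, k, scale=a * tstar)) * dg

def decode_analytic(t):
    """Beta-prime expression of Eq. (7), evaluated in log space for stability."""
    logB = gammaln(k) + gammaln(k - 1) - gammaln(2 * k - 1)
    return float(np.exp(-logB + (k - 1) * np.log(t * tstar)
                        - (2 * k - 1) * np.log(t + tstar)))

for t in (2.0, 10.0, 40.0):
    assert abs(decode_numeric(t) - decode_analytic(t)) < 5e-3 * decode_analytic(t)
```

The agreement also makes the linear broadening visible: rescaling t → ct and t* → ct* leaves Eq. (7) invariant up to a factor 1/c, so relative decoding errors are constant.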

Next, we ask how reliable information transfer from Wnt to a single target gene could be possible. To do so, we note that the mutual information between the Wnt signal and gene expression depends not only on the channel p(g|t), which we take as given from the experimental data, but also on how one chooses the input signals p(t).

IV. OPTIMAL ENCODING OF WNT SIGNALS USES A DISCRETE DISTRIBUTION

As a first step toward optimizing the signal distribution, we focus on a binary input distribution consisting of a Wnt “off” state (t = 0 h) and a single Wnt “on” state of duration t = Δt h [Fig. 3(a)]. We explore this setup, since for noisy channels with limited capacity (on the order of 1 bit or less), an efficient coding strategy is to use two maximally distinguishable signal states [12,64–67]. We find that an optogenetic Wnt signal of approximately Δt ≈ 10 h is necessary to reliably distinguish the “on” state from the “off” state, approaching the information-theoretic upper bound of I(g;t) = 1 bit.
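The saturation of the binary code toward 1 bit can be reproduced with the gamma channel alone. In the sketch below, the “off” state is given a small residual scale via a hypothetical baseline correction (ε = 20 a.u., τ = 1 h, standing in for the small-t correction of Appendix B; these numbers are our assumption, not fitted values), and the mutual information of the two-symbol prior is computed for increasing Δt.

```python
import numpy as np
from scipy.stats import gamma

k, a = 2.88, 23.0
eps, tau = 20.0, 1.0    # hypothetical baseline correction: theta(t) = a*t + eps*exp(-t/tau)

def theta(t):
    return a * t + eps * np.exp(-t / tau)

g = np.linspace(0.01, 6000.0, 60000)
dg = g[1] - g[0]

def binary_information(dt):
    """I(g;t) in bits for the two-symbol prior {t=0, t=dt} with equal weights."""
    p_off = gamma.pdf(g, k, scale=theta(0.0))
    p_on = gamma.pdf(g, k, scale=theta(dt))
    pg = 0.5 * (p_off + p_on)
    out = 0.0
    for p in (p_off, p_on):
        mask = p > 0
        out += 0.5 * np.sum(p[mask] * np.log2(p[mask] / pg[mask])) * dg
    return out

info = [binary_information(dt) for dt in (1.0, 5.0, 10.0, 20.0)]
assert all(x < y for x, y in zip(info, info[1:]))   # longer pulses are more distinguishable
assert 0.8 < info[-1] < 1.0                         # approaching, but never exceeding, 1 bit
```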

FIG. 3.

(a) Binary input distribution of optogenetic Wnt signals, containing (i) an “off” state of t = 0 h and (ii) an “on” state of t = Δt h. (b) Mutual information I(g;t) as a function of the duration Δt of the on state. Results from the data (blue) and predictions from the gamma distribution in Eq. (1) (black) are shown. Error bars are obtained via subsampling. The upper bound I = 1 bit (gray line) corresponds to perfect distinguishability of the on and off states.

This finding is of biological interest: cells may use Wnt signaling to make binary cell-fate decisions, for example, between remaining undifferentiated or committing to mesoderm [33]. In this context, our results suggest that such a binary decision is only reliable if the Wnt “on” signal persists for durations longer than ~10 h. This timescale is biologically realistic and lies well within the doubling time of the cells (ca. 20–30 h [68–70]). In addition, recent work in the intestinal crypt has shown that stem cells commit to differentiation after Wnt signaling is lost for ca. 10 h [50]. This observation suggests that a binary encoding scheme, based on the sustained presence or absence of Wnt signaling, could have biological relevance.

Next, we optimize the signal distribution p(t) to obtain the maximally achievable mutual information or channel capacity:

I* = max_{p(t)} I(g;t).  (8)

The capacity-achieving distribution p*(t) tells us how to encode Wnt signals to create maximally distinguishable gene expression outcomes within the noisy constraints. In most cases, this optimization is analytically intractable. Instead, we optimize numerically using the Blahut-Arimoto (BA) algorithm [71,72]. The algorithm converges to a discrete solution [Fig. 4(a)]: The optimal encoding of optogenetic Wnt signals selects a set of three discrete signals (or “symbols”) at t1, t2, and t3. We obtain a capacity of

I^(1) ≈ 1.12 bits,  (9)

which is a significant improvement over the naive uniform encoding.

FIG. 4.

(a) Optimal encoding of optogenetic Wnt signals for single cells: the Blahut-Arimoto algorithm converges to a discrete solution p^(1)(t), consisting of three optimally distinguishable Wnt signal durations or “symbols.” (b) Decoding map p^(1)(t|t*) obtained using the optimal prior: at the cost of discretizing the space of input signals, we gain distinguishability [cf. Fig. 2(b)].

Convergence of the BA algorithm to the discrete solution p*(t) is slow compared to the convergence to the information capacity I*, especially if the density of symbols is high. We can exploit the knowledge that p*(t) is discrete to significantly accelerate convergence to the optimal solution [52,73,74]. To initialize the distribution, we use a weighted sum of K delta functions, representing K discrete symbols:

p(t) = Σ_{i=1}^{K} w_i δ(t − t_i).  (10)

We iteratively optimize their locations ti using gradient descent, while updating the weights wi using a BA-type update rule. To find the optimal K, we use lower and upper bounds to the information capacity to either add or remove a delta function after convergence (Appendix C).
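For orientation, the plain grid-based BA iteration (without the delta-function acceleration described above) can be sketched in a few lines. The grids, the truncated input window, and the omission of the small-t correction are our own simplifying assumptions; accordingly, the capacity obtained here only illustrates the behavior of the algorithm and does not reproduce the value in Eq. (9) exactly.

```python
import numpy as np
from scipy.stats import gamma

k, a = 2.88, 23.0
t = np.linspace(0.25, 20.0, 120)      # discretized input grid (assumption)
g = np.linspace(0.5, 6000.0, 1500)
dg = g[1] - g[0]

P = gamma.pdf(g[None, :], k, scale=a * t[:, None])
P /= P.sum(axis=1, keepdims=True) * dg

def kl_bits(P, pg):
    """Per-input D_KL[p(g|t) || p(g)] in bits, integrated on the g grid."""
    ratio = np.divide(P, pg[None, :], out=np.ones_like(P), where=P > 0)
    return np.sum(P * np.log2(ratio), axis=1) * dg

def blahut_arimoto(P, n_iter=600):
    """Standard BA update: p(t) <- p(t) * 2^{D_KL[p(g|t) || p(g)]} / Z."""
    pt = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(n_iter):
        pt = pt * np.exp2(kl_bits(P, pt @ P))
        pt /= pt.sum()
    return pt, float(np.sum(pt * kl_bits(P, pt @ P)))

pt_opt, capacity = blahut_arimoto(P)
I_uniform = float(np.mean(kl_bits(P, np.full(len(t), 1 / len(t)) @ P)))
assert capacity >= I_uniform               # optimizing the prior can only help
assert 1.0 / np.sum(pt_opt**2) < 60        # mass concentrates on few grid points
```

The last assertion uses the participation ratio 1/Σᵢ p(tᵢ)² as a rough effective symbol count; consistent with the slow convergence noted above, the prior concentrates long before the surviving peaks sharpen into true delta functions.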

It is interesting to note that under noisy conditions, nontrivial inputs beyond binary “on/off” encoding may be optimal for cells using Wnt signaling. The decoding map in Fig. 4(b) visualizes the way the optimal encoding improves information transmission: Even though the input signals are not perfectly distinguishable based on their outputs, information transmission is improved compared to a binary input distribution containing two (almost) completely distinguishable states.

V. DECODING FROM N OUTPUTS

We pursued the above analysis under the assumption that the probability with which a particular cell expresses a target gene g is set by the observed experimental response of TopFlash. However, the distribution of TopFlash observed in our cell culture may be noisier than a biologically relevant output of cells in realistic tissue settings [75–77]. Therefore, we ask in this section how our analysis changes if the biologically relevant output is systematically more precise than the TopFlash distributions we observe. For example, cells might communicate output with neighbors via surface signaling or molecular exchange, or have access to multiple regulatory targets. If the observed TopFlash distribution is also representative of such other outputs, we can think of these extra outputs as additional samples from the same distribution.

We consider N gene expression outputs g = (g1, g2, …, gN), where each output gi is sampled from a gamma distribution, and use g to decode the Wnt signal t. This can correspond to, for example, cells producing N regulatory outputs; alternatively, cells may have access to the mean output of N neighbors, ḡ = (1/N) Σ_{i=1}^{N} gi, through cell-cell communication. Mathematically, it turns out that both options are equivalent. If the responses gi are independent, one can show that the sample mean ḡ is a sufficient statistic, implying that all information contained in g about the input signal t is preserved in ḡ (Appendix D). Indeed, for the HEK293T cells in our experiment, we find negligible correlation between the TopFlash expression of neighboring cells (Appendix E). Hence, the mutual information satisfies I(ḡ;t) = I(g;t) and decoding is identical, p(t|ḡ) = p(t|g). Therefore, we can consider the sample mean ḡ of N cells in what follows.

Since ḡ is the mean of N identically gamma-distributed random variables with shape parameter k and scale parameter θ(t), its distribution is also gamma, with shape parameter Nk and scale parameter θ(t)/N. The distribution of ḡ given a Wnt signal t is thus

p(ḡ|t) = 1/[Γ(Nk) (θ(t)/N)^{Nk}] ḡ^{Nk−1} e^{−Nḡ/θ(t)}.  (11)

The mean and variance are given by μḡ(t) = kθ(t) and σḡ²(t) = kθ(t)²/N, respectively; as such, N in Eq. (11) not only represents an integer number of samples, but can also be seen as a continuous parameter that changes the effective noise by a factor of 1/N.
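The claim behind Eq. (11), that averaging N i.i.d. Gamma(k, θ) draws yields a Gamma(Nk, θ/N) variable with unchanged mean and variance reduced by 1/N, is quick to verify by simulation (the parameter values below are arbitrary illustrative choices):

```python
import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(1)
k, theta, N = 2.88, 100.0, 10   # illustrative shape, scale, and ensemble size

# Average N i.i.d. Gamma(k, theta) outputs per "cell ensemble", many ensembles.
gbar = rng.gamma(shape=k, scale=theta, size=(500_000, N)).mean(axis=1)

# Eq. (11): gbar ~ Gamma(N*k, theta/N) -> same mean, variance reduced by 1/N.
assert abs(gbar.mean() - k * theta) < 1.0
assert abs(gbar.var() - k * theta**2 / N) < 50.0

# The full distribution matches as well: compare empirical and analytic quantiles.
q = np.quantile(gbar, [0.1, 0.5, 0.9])
assert np.allclose(q, gamma.ppf([0.1, 0.5, 0.9], N * k, scale=theta / N), rtol=0.01)
```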

We now ask how the mutual information and decoding change as we increase N. We can expect the mutual information to scale asymptotically as I ~ (1/2) log₂ N (Appendix F). We confirm this scaling for both a prior distribution p(t) that is uniform and one that is optimized [Fig. 5(a)]. The mutual information using the binary prior, where our input is restricted to two signal durations, is limited by definition to 1 bit and is not a good choice for maximizing the mutual information when the effective map from input to output becomes more precise. However, in regimes where the noise is large (N ≈ 1), a binary encoding comes close to achieving the information capacity.

FIG. 5.

(a) We show the information capacity achieved by the optimal encoding of Wnt signals (red line), and show convergence to analytical results in the small- and large-noise regimes. (b) The optimal code for Wnt signaling consists of a discrete number of symbols (blue dots). As the effective noise decreases, the optimal number of symbols increases, and approaches a continuous optimal code p^(∞)(t) with a heavy tail that decays as ~1/t. The color of the markers (blue shade) indicates the relative probability mass of the symbols: darker blue indicates a higher probability mass. (c) Decoding maps visualize how encoding strategies affect signal inference. Shown are decoding maps p^(N)(t|t*) for ensembles of N = 2 (top row) and N = 10 (bottom row) cells. A uniform prior over opto-Wnt signals (left column) leads to broad posterior distributions, while the optimized discrete choice of signals (right column) yields more distinguishable responses and higher mutual information I^(N)(g;t), at the cost of discretizing the space of input signals.

In the limit of large N, or small effective noise, one can derive an analytic expression for the optimal p(t) [6,13,62,78–82]. In this small-noise approximation, we assume that p(g|t) is a narrow Gaussian distribution, and that we can calculate p(t|g) by performing an expansion (Appendix F); then, taking a variational derivative of I(g;t) with respect to p(t), one finds

p^(∞)(t) ∝ (1/σg(t)) dμg(t)/dt.  (12)

Since both μg(t) and σg(t) grow linearly with t for longer Wnt signals, it follows from Eq. (12) that the tail of p^(∞)(t) decays as ~1/t.

As expected, the small-noise approximation approaches the information capacity from below [Fig. 5(a)] and is a good approximation for N ≳ 20. Notably, we can show that this optimal continuous encoding from the small-noise limit in Eq. (12) is equivalent to the Jeffreys prior (Appendix F). The Jeffreys prior is a noninformative prior that is invariant to changes in parametrization, defined as p_J(t) ∝ |ℐ(t)|^{1/2}, where ℐ(t) is the Fisher information [83]:

ℐ(t) = ∫₀^∞ dg p(g|t) [∂ ln p(g|t)/∂t]².  (13)

Indeed, it is known that in the limit of an infinite number of identical, independent trials of the same experiment (i.e., N → ∞), the prior that maximizes the mutual information between input and output converges weakly to the Jeffreys prior [84].
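For the gamma channel with θ(t) = at and fixed k, Eq. (13) can be evaluated in closed form: the score is ∂ ln p(g|t)/∂t = a(g/θ² − k/θ), so ℐ(t) = a² Var(g)/θ⁴ = k/t², and the Jeffreys prior p_J(t) ∝ √ℐ(t) ∝ 1/t reproduces the tail of Eq. (12). A numerical check of this identity (grid sizes and finite-difference step are arbitrary choices):

```python
import numpy as np
from scipy.stats import gamma

k, a = 2.88, 23.0

def fisher_information(t, dt=1e-4):
    """Eq. (13), with the score from a central difference of ln p(g|t)."""
    g = np.linspace(0.01, 60.0 * a * t, 200_000)
    dg = g[1] - g[0]
    score = (gamma.logpdf(g, k, scale=a * (t + dt))
             - gamma.logpdf(g, k, scale=a * (t - dt))) / (2 * dt)
    p = gamma.pdf(g, k, scale=a * t)
    return float(np.sum(p * score**2) * dg)

for t in (2.0, 10.0, 20.0):
    # Analytic result for the scale family theta(t) = a*t: I(t) = k / t^2.
    assert abs(fisher_information(t) - k / t**2) < 1e-3 * k / t**2
```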

Next, we investigate how the numerically optimized prior p^(N)(t) changes as N increases. Since we know that the optimal prior consists of three discrete symbols for N = 1 and should approach the continuous distribution in Eq. (12) for large N, we expect that it will admit an increasing number of symbols as N increases. We find that this is indeed the case: Figure 5(b) shows a bifurcationlike diagram of the positions and weights of the optimal prior distribution, where symbols split into two and additional symbols are added as N increases. For high N, the density of the symbols starts approaching the optimal distribution p^(∞)(t) from the small-noise approximation.

As before, we visualize how the optimization improves decoding performance and the mutual information with decoding maps [Fig. 5(c)]. Unlike the uniform prior, which leads to smoothly narrowing posteriors as N increases (approaching a Gaussian for large N), the optimized prior increases the mutual information by admitting more discrete symbols. The number of symbols K in the optimal prior p^(N)(t) follows an asymptotic scaling law I ~ (3/4) log₂ K, consistent with recent literature (Appendix G) [52,74]. The discretization enables better distinguishability between inputs, as illustrated by increased activity along the diagonal of the decoding map. At the same time, the optimal prior does not achieve perfect distinguishability of symbols, reflected by the remaining off-diagonal activity. The optimal input distribution, as in the N = 1 case, strikes a balance between distinguishability and adding additional symbols. An alternative strategy is to encode fewer symbols K′ < K than optimal and approach the bound of log₂ K′ bits: This leads to better distinguishability but does not achieve the maximum mutual information.

The fact that the optimal input distribution is discrete is interesting: there may be biological situations in which cells need to distinguish between a discrete number of cell fates, such as the different germ layers. In practice, cell fates are regulated by complex networks of multiple input signals and genetic targets, a more intricate setting than the one we study here. It is therefore not clear whether our optimal input distribution carries direct biological meaning. Yet, it is conceivable that discrete inputs are optimal for these more complex input spaces as well, and it is therefore interesting to investigate whether cells attempt to map the signals they receive onto discrete input states. We also note that the optimal input distribution is one that permits slight errors, which suggests that some uncertainty in the inference of signals is inherently part of optimal information processing; as such, some observed cellular noise in differentiation could be part of an information-maximization strategy.

We observe that the numerical optimization for the optimal prior distribution converges more quickly to the correct value of the mutual information than to the correct number of delta functions K and their positions ti, especially as N becomes larger [52]. This implies that the information landscape at the optimum is smooth and has directions along which the parameters of the prior distribution can still change while the optimum is almost attained. These directions are typically referred to as “sloppy” directions [34,35], and their presence has important implications for the ability of biological systems to show variability in parameter space, even at the optimum [85]. Indeed, in Fig. 6(a) we show the mutual information for N = 2 as a function of the positions of two out of four delta functions and observe a broad optimum with different sensitivities depending on the direction in which one moves away from the optimum.

FIG. 6.

The exact duration of individual symbols does not need to be fine-tuned. (a) For N = 2, the optimal prior consists of K = 4 symbols [cf. Fig. 5(b)]. We vary the positions of two symbols, t2 and t3, while keeping their weights fixed. (b) Information I(g;t) (color shade) shows a broad optimum (red star) as a function of the positions of the peaks t2 and t3. (c) The Hessian matrix [Eq. (14)] has a sloppy spectrum that widens as N increases: Symbols at longer durations become more sloppy and symbols at shorter durations become more stiff.

The sloppiness can be quantified using the Hessian matrix of the cost function, in this case the mutual information I(g;t) for a given N [62,85]. Calculating this Hessian can be numerically difficult. Here, we have access to the functional form of the probability distribution p(g|t), and can therefore calculate the Hessian with respect to the positions ti of the discrete symbols in the optimal encoding p^(N)(t) (Appendix H). Writing p^(N)(t) as in Eq. (10), the Hessian matrix becomes

\[
\chi_{ij} = \frac{\partial^2 I(g;t)}{\partial t_i\,\partial t_j} = \frac{w_i}{\ln 2}\int_0^\infty dg\left[\delta_{ij}\left(\frac{\partial^2 p(g\mid t_i)}{\partial t_i^2}\,\ln\frac{p(g\mid t_i)}{p(g)} + \frac{1}{p(g\mid t_i)}\left(\frac{\partial p(g\mid t_i)}{\partial t_i}\right)^{\!2}\right) - \frac{w_j}{p(g)}\,\frac{\partial p(g\mid t_i)}{\partial t_i}\,\frac{\partial p(g\mid t_j)}{\partial t_j}\right].
\tag{14}
\]

The eigenvectors of χ determine directions in parameter space t_i that have independent effects on the mutual information, and the eigenvalues λ_i give the sensitivity along these directions. We evaluate the Hessian at the stationary point that maximizes I(g;t). After diagonalization, we indeed observe a sloppy spectrum [Fig. 6(c)], with eigenvalues spanning ca. two decades. The stiffest eigendirections correspond to the shorter durations, where the density of symbols is highest. As N increases, the spectrum broadens: symbols at longer durations become more sloppy, and those at shorter durations become more stiff. Practically, the fact that the optimal prior is sloppy implies that the optimal signal encoding does not need to be fine-tuned [62,85]; this could indeed be one advantage of information transmission using channels with similarly long-tailed distributions of gene expression outputs.

In deriving Eq. (14), we assumed the weights w_i are fixed; alternatively, one can keep the weights optimized while varying the positions of the symbols. In that case, we also obtain a sloppy spectrum (Appendix H). We emphasize that Eq. (14) (and its extension for variable weights in Appendix H) is general and can be used to obtain the sensitivity spectrum of a discrete prior for any choice of transmission probability.

VI. DISCUSSION

In this work, we optimized information transmission from optogenetic Wnt signals to (representative) Wnt target gene expression. We measured expression of a fluorescent reporter in a HEK cell line for different optogenetic signal durations, and found that gamma distributions with a constant shape parameter describe the output distributions well across different input conditions. We calculated the distribution of input signals that maximizes information transfer and found that it consists of a discrete set of signals that allows for optimal decoding from the output. We also explored how this optimal set of signals would change if cells had access to multiple instances of the output we measure: for example, because they average multiple genetic outputs or exchange outputs with neighboring cells. As the effective noise decreases, the optimal input distribution evolves from three discrete symbols to a continuous distribution: We showed that the latter can be obtained from the small-noise approximation or, equivalently, the Jeffreys prior. Using decoding maps, we visualized how the optimal choice of optogenetic inputs improves signal inference, and finally, we showed that the optimal signals do not need to be fine-tuned. In the following, we discuss the biological and theoretical impact of our findings.

a. Discrete Wnt input signals.

The canonical Wnt signaling pathway is active in many cell types, often with the goal of directing mammalian cells toward differentiation. The effective features of Wnt signals that cells respond to depend on context and may include absolute or relative concentrations, thresholds, timers, and combinatorial interactions with other signaling pathways. Here, we focus on the duration of the Wnt signal as a potential input, since it can be precisely controlled in our optogenetic setup. Durations of Wnt signals may have developmental relevance, particularly in the context of organoid development [41–43,50]. We found that two Wnt input signals (a binary prior) provide more information than a uniform distribution when cells have to decide based on a single output: When the Wnt input is either no signal or a signal lasting at least ca. 10 h, the output expression is precise enough to reliably distinguish between the two. Such a binary response between two states, e.g., differentiating or remaining in the undifferentiated state, could be biologically relevant for Wnt [86,87]. The timescale of ca. 10 h is also biologically reasonable: it fits well within a cell's division cycle, and it connects to recent findings in gut cells, where differentiation is triggered only after a loss of Wnt signal lasting about 10 h, while shorter transient losses leave cells undifferentiated [50].

Our inference of optimal priors would imply, in analogy with ideas from efficient coding, that cells either experience strongly different Wnt signals, or map the Wnt signals they experience onto signals that are clearly distinguishable. This is similar to ideas from optimal parameter inference with limited data [52]. While it is unclear whether this happens biologically, it is interesting that both recent work on differentiation timers [50] and the clustering of cells with similar inputs [88] could suggest that cells indeed try to map the input they receive onto clear binary states (here, the presence or absence of Wnt). In our analysis, we find that cells could, in principle, resolve a third, intermediate state as well. We do not know whether this additional state is used as such in vivo, but it seems possible that multiple Wnt input states can be decoded in some contexts [89,90], and it is interesting that this possibility emerges purely from the distributions observed in the experiment and a model-free optimization framework.

In addition to suggesting what signals cells may receive naturally, our work also has meaning from an engineering perspective: the optimal optogenetic signals we infer could be used to engineer stimuli to which populations of cells can respond precisely. This could also be of interest in the context of synthetic development.

b. Optimal priors for systematically lower noise, connections to population coding, and decoding maps.

Our work connects to previous work on optimizing information transmission in gene regulation. That a binary prior (or switch) can optimize transmission for noisy systems has also previously been discussed in the context of genetic networks [62,66,67,80]. An interesting and perhaps more surprising finding of our work is the increasing number of discrete signals as the effective noise in the signaling pathway decreases: While the precise phase diagram of symbols depends on the underlying distribution, in general, additional symbols emerge as the noise decreases since the Shannon-optimal prior balances complexity with the ability to resolve different input signals. Reference [52] showed this effect in the context of Bayesian inference, to predict what models should be inferred for noisy data; in our work, it is either the cell or the engineer that would employ the optimal priors for reliable signal processing. The fact that there is sloppiness of the exact positions of these signals can make near-optimal information transmission easier than expected.

We investigated signaling via multiple cells or outputs as a way to improve information transmission; this is equivalent to using multiple communication channels and is related to population coding in neural systems [91–93]. An alternative approach, when information transmission from signals to outputs may be higher than observed at the population level, is to apply repeated stimuli to single cells: indeed, recent work shows that single cells responding to repeated optogenetic stimuli produce a more reproducible output—higher mutual information—than population-level measurements suggest [94].

To visualize how the encoding of input signals influences inference, we use decoding maps alongside the mutual information. They are reminiscent of recurrence plots for nonlinear dynamical systems, showing at what times a dynamical system reverts to a state it has visited before. Recurrence plots can be viewed as a measure of the “predictability” of the system [95], similar to how decoding maps can visualize ambiguity in inference from multimodal posterior distributions [61,62]. In our case, the decoding maps visualize how the optimal encoding discretizes the input space to improve distinguishability between signals, and how signal inference becomes more precise as the effective noise decreases.

c. Outlook and future work.

Our work is directly applicable to gene expression responses that follow gamma distributions, and we expect qualitatively similar results for other long-tailed distributions commonly observed for gene expression [5,26,27,58]. This suggests that the implications of our work may go beyond Wnt signaling. Our systematic reduction of noise predicts that discrete input distributions maximize information flow for those signaling pathways as well. While we do not know whether discrete intermediate durations are a relevant input in vivo, Wnt/β-catenin systems exhibit multistate dynamic decoding in other contexts, and developing embryoid systems pass through intermediate Wnt activity states during patterning [89,90]. In the context of general gene regulatory responses, it seems possible that smooth inputs are mapped onto a discrete but finite number of outputs, such as the finite number of segments in the fly embryo body plan; in such contexts, it may be beneficial to map signal inputs onto distinct discrete states. A more detailed investigation of evidence for the use of discrete signal priors in gene regulatory or other biological contexts is an interesting direction for future work.

Our work relates closely to the problem of selecting effective models that maximize the information extracted from finite data, as discussed in Ref. [52]. In their framework, this corresponds to choosing a Bayesian prior that maximizes the mutual information between parameters and predictions. In our system, there are two concrete applications: First, the optimal prior can predict signal durations that should be used in synthetic, engineered experiments if cells are to respond distinguishably. Second, if cells operate consistent with the model selection work of Ref. [52], the cell itself should map the signal onto a discrete state space to optimally extract information. It will be interesting to explore the optimal input distributions in multi-input settings [33,96,97] in the future.

ACKNOWLEDGMENTS

We thank Florian Berger, William Bialek, Aneta Koseska, Pieter Rein ten Wolde, and the members of the Bauer group for useful discussions. We acknowledge funding from the NWO VIDI Talent Programme, NWO/VI.Vidi.223.169 (M.B. and O.W.). We also acknowledge funding from the U.S. Army Research Office and experiments were accomplished under contract W911NF-19-D-0001 and cooperative agreement W911NF-19–2-0026 for the Institute for Collaborative Biotechnologies (S.J.R., R.S.L., and M.Z.W.).

APPENDIX A: EXPERIMENTAL METHODS

Human 293T cells were cultured at 37 °C and 5% CO2 in Dulbecco’s Modified Eagle Medium, high glucose GlutaMAX (Thermo Fisher Scientific, 10566016) medium supplemented with 10% fetal bovine serum (Atlas Biologicals, F-0500-D) and 1% penicillin-streptomycin (Thermo Fisher Cat. No. 15140–122). Clonal 293Ts containing CRISPR tdmRuby3-β-cat, oLRP6-Puro (AddGene ID: 249712), and pPig-8X-TOPFlash-tdIRFP-Puro (AddGene ID: 249713) were obtained from Dr. Ryan Lach’s previous experiments [24,25].

We seeded these cells onto a 96-well glass-bottom plate (Cellvis Cat. No.: P96–1.5H-N) coated with fibronectin (Thermo Fisher Cat. No.: 33010018) in Dulbecco’s Modified Eagle Medium, high glucose GlutaMAX (Thermo Fisher Scientific, 10566016) medium supplemented with 10% fetal bovine serum (Atlas Biologicals, F-0500-D) and 1% penicillin-streptomycin. Cells were allowed to adhere to the plate for 24h. Afterward, cells were stimulated via a benchtop LED array purpose-built for light delivery to cells in standard tissue culture plates (LITOS) [57]. Light was patterned to cover the entire surface of intended wells of plates used, rather than a single microscope imaging field. Post-LITOS stimulation, cells were moved into dark conditions for 4h to ensure the Wnt pathway was properly deactivated prior to imaging. After 4h, cells were then imaged using a Nikon W2 SoRa spinning-disk confocal microscope equipped with an incubation chamber maintaining cells at 37 °C and 5% CO2. We then used our previously published image analysis pipeline (methods, image analysis from Ref. [25]). We quantified the TopFlash-tdiRFP fluorescent intensity (measured in arbitrary units, a.u.) by taking the median pixel value for each individual segmented cell; we refer to this quantity as g. The data from the experiment are openly available in Ref. [98], along with the code used for its analysis [99].

We analyzed g for populations of ca. 1500 ± 800 cells that were optically stimulated for up to 24 h, which is of the order of the doubling time of 293T cells (20–30 h [68–70]). We included data up to 20 h in this experiment, since 20 h is on the lower end of possible cell-cycle estimates, and we observe a clear linear increase of mean TopFlash expression with time in this time frame. After 1–3 division cycles, cells overgrow the plate, and tracking becomes more difficult as this time frame is approached. Therefore, a time frame of ca. 20 h both marks a natural end to our experiment and lies well within the regime of reliable experimental data acquisition. Finally, we note that cell division is negligible during the ca. 4-h cooldown window after the Wnt signal, since only ca. 15% of cells are expected to divide in this time frame. We therefore expect no decay of TopFlash intensity in this regime, and TopFlash accumulation has been found to be stable in similar experiments even for over 10 h after the end of the Wnt "on" condition; for details on TopFlash dynamics, we refer to Ref. [25]. Analysis of the smaller dataset of Ref. [25] (not shown) suggests that TopFlash distributions at longer cooldown windows are also gamma distributed with the same shape parameter k but different scale parameters; since most analytic results in our manuscript depend explicitly on k, our results do not depend sensitively on the cooldown window and could easily be extended to others.

APPENDIX B: FITTING THE GENE EXPRESSION DATA

Gene expression distributions observed in cell cultures are frequently long-tailed, and have been successfully modeled using, e.g., gamma, negative binomial, or log-normal distributions [5]. These choices are not only empirically well matched to the data, but also mechanistically plausible in systems where gene expression is shaped by multiple interacting timescales [26,27,58]. In our case, we find that our distributions are particularly well described by a gamma distribution:

\[
p(g\mid k, \theta(t)) = \frac{1}{\Gamma(k)\,\theta(t)^{k}}\, g^{k-1}\, e^{-g/\theta(t)},
\tag{B1}
\]

parametrized by a constant shape parameter k and a time-dependent scale parameter θ(t). In fact, this parametrization is forced upon us by the observation that the mean μ_g(t) and standard deviation σ_g(t) are directly proportional [Fig. 1(c)]: for a gamma distribution, μ_g/σ_g = √k, so a constant ratio implies a constant shape parameter. Our aim here is to estimate k and θ(t) from the data using a maximum-likelihood estimate.

We denote the TopFlash data as g_ij and the signal durations as t_i, where i = 1, …, n and j = 1, …, m_i; the ith experimental condition contains m_i cells at the end of the experiment. We first consider the regime where θ(t) = at, so that our task is to estimate k and a. The likelihood function is given by

\[
\mathcal{L}(k,a) = \prod_{i=1}^{n}\prod_{j=1}^{m_i} p(g_{ij}\mid k, a, t_i),
\tag{B2}
\]

and hence our log-likelihood is

\[
\ln\mathcal{L}(k,a) = \sum_{i=1}^{n}\sum_{j=1}^{m_i}\ln p(g_{ij}\mid k,a,t_i) = \sum_{i=1}^{n}\sum_{j=1}^{m_i}\left[-\ln\Gamma(k) - k\ln a - k\ln t_i + (k-1)\ln g_{ij} - \frac{g_{ij}}{a\,t_i}\right].
\tag{B3}
\]

Setting

\[
\left.\frac{\partial\ln\mathcal{L}}{\partial a}\right|_{\hat k,\hat a} = 0,
\tag{B4}
\]

we obtain

\[
\hat a = \frac{1}{\hat k}\,\frac{1}{\sum_{l=1}^{n} m_l}\sum_{i=1}^{n}\sum_{j=1}^{m_i}\frac{g_{ij}}{t_i}.
\tag{B5}
\]

Further, setting

\[
\left.\frac{\partial\ln\mathcal{L}}{\partial k}\right|_{\hat k,\hat a} = 0,
\tag{B6}
\]

we get

\[
\psi^{(0)}(\hat k) = \ln(\hat k) - \frac{1}{\sum_{l=1}^{n} m_l}\sum_{i=1}^{n}\sum_{j=1}^{m_i}\left[-\ln g_{ij} + \ln\!\left(\frac{1}{\sum_{r=1}^{n} m_r}\sum_{p=1}^{n}\sum_{q=1}^{m_p}\frac{g_{pq}}{t_p}\right) + \ln t_i\right],
\tag{B7}
\]

where ψ^(0)(k) = d ln(Γ(k))/dk is the polygamma function of order 0. We can solve this numerically for k̂ without too much trouble, but we can also proceed analytically, to very good approximation, using the asymptotic expansion:

\[
\psi^{(0)}(k) \sim \ln(k) - \sum_{l=1}^{\infty}\frac{B_l}{l\,k^{l}}.
\tag{B8}
\]

Here, B_l are the Bernoulli numbers with the convention B_1 = +1/2. Keeping the first two terms in the sum, we obtain the following estimator for k:

\[
\hat k = \frac{1 + \sqrt{1 + \tfrac{4}{3}C}}{4C},
\tag{B9}
\]

where

\[
C = \frac{1}{\sum_{l=1}^{n} m_l}\sum_{i=1}^{n}\sum_{j=1}^{m_i}\left[-\ln g_{ij} + \ln\!\left(\frac{1}{\sum_{r=1}^{n} m_r}\sum_{p=1}^{n}\sum_{q=1}^{m_p}\frac{g_{pq}}{t_p}\right) + \ln t_i\right].
\tag{B10}
\]

We obtain, using Eqs. (B5) and (B9),

\[
\hat k = 2.88 \pm 0.01,
\tag{B11}
\]
\[
\hat a = 23.0 \pm 0.1\ \mathrm{h}^{-1},
\tag{B12}
\]

where the errors are obtained using the asymptotic normality of maximum-likelihood estimators.
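The closed-form estimators above can be sanity-checked on synthetic data. The sketch below draws gamma-distributed outputs with known parameters and recovers them via Eqs. (B5), (B9), and (B10); the durations and the number of cells per condition are illustrative choices, not the experimental values, and with equal cell numbers per condition Eq. (B10) reduces to C = ln⟨g/t⟩ − ⟨ln(g/t)⟩.

```python
import numpy as np

# Sanity check of the closed-form estimators, Eqs. (B5), (B9), (B10), on
# synthetic gamma data with known parameters. The durations t and the
# number of cells per condition are illustrative, not the experimental ones.
rng = np.random.default_rng(0)
k_true, a_true = 2.88, 23.0
t = np.array([2.5, 5.0, 10.0, 20.0])        # signal durations (h)
m = 20_000                                  # cells per condition
g = rng.gamma(shape=k_true, scale=a_true * t[:, None], size=(len(t), m))

# Eq. (B10): with equal cell numbers per condition, C reduces to
# C = ln<g/t> - <ln(g/t)>, averaging over all cells
ratio = g / t[:, None]
C = np.log(ratio.mean()) - np.log(ratio).mean()

# Eq. (B9): shape estimator from the truncated digamma expansion
k_hat = (1 + np.sqrt(1 + 4 * C / 3)) / (4 * C)

# Eq. (B5): estimator for the slope of the scale parameter
a_hat = ratio.mean() / k_hat
```

For this sample size, the recovered k̂ and â agree with the true values to within a few percent, consistent with the small error bars quoted in Eqs. (B11) and (B12).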

If we take the scale parameter to be θ(t) = at, the gamma distribution in Eq. (B1) is singular at t = 0. The singularity is not physical: it would imply that any signal t > 0 is perfectly distinguishable from t = 0, and, more importantly, it does not match the experimental data in this regime. To regularize the behavior of p(g|t) near t = 0, we add a small exponential term to the scale parameter:

\[
\theta(t) = a\left(t + \epsilon\, e^{-t/\tau}\right),
\tag{B13}
\]

where ε̂ = 0.86 h and τ̂ = 0.87 h are positive constants that we estimate from the data. The mean and variance are then given by

\[
\mu_g(t) = k\,a\left(t + \epsilon\, e^{-t/\tau}\right),
\tag{B14}
\]
\[
\sigma_g^2(t) = k\,a^2\left(t + \epsilon\, e^{-t/\tau}\right)^{2},
\tag{B15}
\]

respectively. At times tτ, the exponential term becomes irrelevant and we recover θ(t)at.

APPENDIX C: ALGORITHM FOR COMPUTING THE CHANNEL CAPACITY

We want to maximize the mutual information I(g;t) with respect to the input distribution p(t):

\[
I^{*} = \max_{p(t)} I(g;t),
\tag{C1}
\]

where I* is the channel capacity and p(g|t) is fixed. In most cases, this optimization is analytically intractable and we must proceed numerically. The Blahut–Arimoto (BA) algorithm is the standard algorithm for solving this problem [71,72]. One starts with an initial guess p^(0)(t) for the input distribution, which is updated at each iteration as follows:

\[
p^{(\tau)}(t) = \frac{1}{Z^{(\tau-1)}}\, p^{(\tau-1)}(t)\, e^{f_{\mathrm{KL}}^{(\tau-1)}(t)},
\tag{C2}
\]

where

\[
f_{\mathrm{KL}}^{(\tau)}(t) = \int_0^\infty dg\; p(g\mid t)\,\ln\frac{p(g\mid t)}{p^{(\tau)}(g)}
\tag{C3}
\]

and

\[
p^{(\tau)}(g) = \int_0^T dt\; p(g\mid t)\, p^{(\tau)}(t).
\tag{C4}
\]

Practically, we restrict ourselves to a finite domain t ∈ [0, T], where T is the maximum signal duration.

After τ iterations, the lower bound to the channel capacity is given by

\[
I_L^{(\tau)} = \frac{1}{\ln 2}\int_0^T dt\; p^{(\tau)}(t)\, f_{\mathrm{KL}}^{(\tau)}(t),
\tag{C5}
\]

and an upper bound is given by

\[
I_U^{(\tau)} = \frac{\max_t f_{\mathrm{KL}}^{(\tau)}(t)}{\ln 2}.
\tag{C6}
\]

As such, we iterate until convergence

\[
I_U^{(\tau)} - I_L^{(\tau)} < \epsilon,
\tag{C7}
\]

and use I* ≈ I_L^(τ) and p*(t) ≈ p^(τ)(t) as our estimates for the channel capacity and the optimal input distribution, respectively. The optimization problem in Eq. (C1) is convex, and the iteration is guaranteed to converge to the global maximum [12].
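The iteration in Eqs. (C2)–(C7) can be sketched for a channel that is already discretized in both input and output. The function below is a minimal standalone implementation (the accompanying code of the paper contains the full example for the gamma channel); the binary symmetric channel used at the end is a standard test case with exactly known capacity, and all numerical values are illustrative.

```python
import numpy as np

def blahut_arimoto(P, tol=1e-10, max_iter=100_000):
    """Channel capacity (in bits) of a discrete channel P[x, y] = p(y|x).

    Minimal sketch of the update in Eqs. (C2)-(C7): iterate the input
    distribution and stop once the upper and lower capacity bounds agree
    to within tol."""
    n_in = P.shape[0]
    p = np.full(n_in, 1.0 / n_in)              # p^(0): uniform initial guess
    for _ in range(max_iter):
        q = p @ P                              # marginal output distribution
        with np.errstate(divide="ignore", invalid="ignore"):
            log_ratio = np.where(P > 0, np.log(P / q), 0.0)
        f_kl = (P * log_ratio).sum(axis=1)     # Eq. (C3), in nats
        I_lower = p @ f_kl / np.log(2)         # Eq. (C5)
        I_upper = f_kl.max() / np.log(2)       # Eq. (C6)
        if I_upper - I_lower < tol:            # Eq. (C7)
            break
        p = p * np.exp(f_kl)                   # Eq. (C2)
        p /= p.sum()
    return I_lower, p

# Test case: binary symmetric channel with crossover probability 0.1,
# whose capacity is known exactly: 1 - H2(0.1) ≈ 0.531 bits.
P_bsc = np.array([[0.9, 0.1], [0.1, 0.9]])
cap_bsc, p_bsc = blahut_arimoto(P_bsc)
```

For the symmetric test channel the optimal input is uniform, so the bounds already coincide at the first iteration; for asymmetric channels the loop tightens them progressively.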

As shown in Fig. 4, and as noted by Mattingly et al. [52], convergence to a discrete solution in the interior of the domain t ∈ [0, T] is rather slow compared to the boundaries, especially when the density of delta functions is high. To overcome this, we can exploit the knowledge that p*(t) is discrete by starting with a sum of K delta functions:

\[
p(t) = \sum_{i=1}^{K} w_i\,\delta(t - t_i)
\tag{C8}
\]

and adjusting the weights w_i and positions t_i iteratively. Starting with K equally spaced delta functions, we use the BA algorithm to adjust the weights w_i until convergence. After this, we use the gradient

\[
\frac{\partial I}{\partial t_i} = \frac{w_i}{\ln 2}\int_0^\infty dg\;\frac{\partial p(g\mid t_i)}{\partial t_i}\,\ln\frac{p(g\mid t_i)}{p(g)}
\tag{C9}
\]

to adjust the positions t_i. Alternating between the BA update of the weights and gradient ascent on the positions, we converge to the optimal discrete solution.

In contrast to the BA algorithm, the optimization over w_i and t_i is not convex: in particular, the outcome depends on the number of peaks K we define beforehand. After convergence, we can compute f_KL(t) everywhere in the domain. If max_t f_KL(t) is greater than f_KL(t) evaluated at the peaks t_i, we have to add another delta function [52]. This way, we ensure (1) that we have converged to the global maximum and (2) that we have used the optimal number of delta functions.

The accompanying code [99] includes a self-contained example for computing the channel capacity using the procedure above.

APPENDIX D: SUFFICIENT STATISTICS FOR INDEPENDENT, IDENTICAL, GAMMA-DISTRIBUTED VARIABLES

We find that there is negligible spatial correlation in the gene expression g (Appendix E). Hence, we can treat the cells as responding independently conditional on the Wnt signal t. Below, we show that when one considers a group of N cells with outputs g = (g_1, …, g_N), the arithmetic mean ḡ is a sufficient statistic for the signal duration t. That is, ḡ contains as much information about t as the whole dataset g, as claimed in the main text.

A single cell i in a group of N cells exposed to a Wnt signal of duration t responds independently by expressing output g_i ~ Gamma(k, θ(t)). Hence, the likelihood function for the group of N cells is given by

\[
p(\mathbf{g}\mid t) = \prod_{i=1}^{N} p(g_i\mid t)
\tag{D1}
\]
\[
= \frac{\prod_{i=1}^{N} g_i^{k-1}}{\Gamma(k)^{N}\,\theta(t)^{Nk}}\; e^{-\sum_{i=1}^{N} g_i/\theta(t)}.
\tag{D2}
\]

In Sec. V, we claim that the arithmetic mean ḡ ≡ (1/N) Σ_{i=1}^N g_i is a sufficient statistic for t. A quick way to see this is by observing that Eq. (D2) satisfies the Fisher–Neyman factorization criterion [100]: it can be written in the form p(g|t) = h(g) f(ḡ, t) for nonnegative functions h and f, from which the defining property p(g|ḡ, t) = p(g|ḡ) follows. Below, however, we derive this property explicitly.

The likelihood of ḡ conditioned on t can be written as

\[
p(\bar g\mid t) = \frac{1}{\Gamma(Nk)\,\left(\theta(t)/N\right)^{Nk}}\;\bar g^{\,Nk-1}\, e^{-N\bar g/\theta(t)},
\tag{D3}
\]

which follows from the addition of N independent and gamma-distributed random variables. Using Bayes’ theorem, we obtain

\[
p(\mathbf{g}\mid\bar g, t) = \frac{p(\mathbf{g}\mid t)}{p(\bar g\mid t)}.
\tag{D4}
\]

Substituting Eqs. (D2) and (D3) into the above equation, we get

\[
p(\mathbf{g}\mid\bar g, t) = \frac{\prod_{i=1}^{N} g_i^{k-1}}{\bar g^{\,Nk-1}}\;\frac{\Gamma(Nk)}{N^{Nk}\,\Gamma(k)^{N}} = p(\mathbf{g}\mid\bar g),
\tag{D5}
\]

as required. The last equality follows as all t dependence has been canceled out.

Sufficiency of ḡ also implies that the posterior distributions p(t|ḡ) and p(t|g) are identical. This can be seen by applying Bayes' theorem, p(t|g) = p(g|t) p(t)/p(g). Substituting p(g|t) = p(g|ḡ) p(ḡ|t) and p(g) = p(g|ḡ) p(ḡ), we obtain

\[
p(t\mid\mathbf{g}) = \frac{p(\bar g\mid t)\, p(t)}{p(\bar g)} = p(t\mid\bar g),
\tag{D6}
\]

as required.

We can also show that the mutual information satisfies I(t; ḡ) = I(t; g). This makes precise the statement that the statistic ḡ contains as much information about the signal t as the whole dataset g. To show this, we use the data-processing inequality for the mutual information in two ways. First, as ḡ is a function of the dataset g, we must have

\[
I(t;\bar g) \le I(t;\mathbf{g}).
\tag{D7}
\]

Second, we have just shown that p(g|ḡ, t) = p(g|ḡ). This implies that we have a Markov chain t → ḡ → g, to which we can also apply the data-processing inequality:

\[
I(t;\mathbf{g}) \le I(t;\bar g).
\tag{D8}
\]

Together, these inequalities imply that I(t; ḡ) = I(t; g), as required.

Finally, we can show that the decoding maps p_ḡ(t|t*) and p_g(t|t*) are identical. Starting with p_g(t|t*), we can introduce an integral over ḡ using the law of total probability:

\[
p_{\mathbf{g}}(t\mid t^{*}) = \int_0^\infty d\mathbf{g}\; p(t\mid\mathbf{g})\, p(\mathbf{g}\mid t^{*})
\tag{D9}
\]
\[
= \int_0^\infty d\bar g\int_0^\infty d\mathbf{g}\; p(t\mid\mathbf{g})\, p(\mathbf{g}\mid\bar g)\, p(\bar g\mid t^{*}).
\tag{D10}
\]

By changing the order of integration and using the fact that p(t|g) = p(t|ḡ), we obtain

\[
p_{\mathbf{g}}(t\mid t^{*}) = \int_0^\infty d\bar g\; p(t\mid\bar g)\, p(\bar g\mid t^{*})\int_0^\infty d\mathbf{g}\; p(\mathbf{g}\mid\bar g)
\tag{D11}
\]
\[
= p_{\bar g}(t\mid t^{*}).
\tag{D12}
\]

As such, the decoding maps are identical whether we consider the gene expression of the whole group, g, or just the mean expression, ḡ.
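The distribution of the mean in Eq. (D3) can be checked by direct simulation. The sketch below draws groups of N i.i.d. gamma variables and compares the first two moments of their mean against the Gamma(Nk, θ/N) prediction; all parameter values are illustrative.

```python
import numpy as np

# Simulation check of Eq. (D3): if g_1, ..., g_N are i.i.d. Gamma(k, theta),
# their arithmetic mean is Gamma(N*k, theta/N)-distributed, so its mean is
# k*theta and its variance k*theta**2 / N. Parameter values are illustrative.
rng = np.random.default_rng(2)
k, theta, N = 2.88, 100.0, 8
samples = rng.gamma(shape=k, scale=theta, size=(200_000, N))
gbar = samples.mean(axis=1)

mean_gbar = gbar.mean()    # expected: k * theta
var_gbar = gbar.var()      # expected: k * theta**2 / N
```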

FIG. 7.

There are negligible spatial correlations between responses g to the optogenetic Wnt signal. Here, we show data from a repeat experiment for t = 12.5 h. (a) Scatterplot showing TopFlash expression g vs the expression of the nearest neighboring cell, g_nn. (b) Mutual information I(g; g_nn) and I(g; g_scr), where g_scr is a scrambled permutation of g, as a function of inverse sample size. Extrapolating to infinite sample size, we obtain I(g; g_nn) ≈ 0.2 bits and I(g; g_scr) ≈ 0.1 bits, which are negligible compared to the joint entropies H(g, g_nn) ≈ 21.0 bits and H(g, g_scr) ≈ 21.1 bits.

APPENDIX E: NEGLIGIBLE SPATIAL CORRELATIONS IN TOPFLASH EXPRESSION

In the main text, we consider the vector response g of a group of N cells to an optogenetic Wnt signal of duration t, and we treated each response as independent. To justify this assumption, we investigate here whether there are spatial correlations; specifically, whether the TopFlash response g of a particular cell is correlated with that of its nearest neighbor, g_nn. Indeed, we find no significant correlation between g and g_nn: The results from a repeat experiment for t = 12.5 h are shown in Fig. 7. The scatterplot of g against g_nn reveals an uncorrelated cloud [Fig. 7(a)], and the mutual information I(g; g_nn) is negligible [Fig. 7(b)].
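The scrambling control of Fig. 7(b) can be illustrated with a simple plug-in (histogram) mutual-information estimate. The sketch below uses synthetic correlated Gaussian data, not the TopFlash measurements; the estimator `mi_binned`, the bin count, and the correlation value are illustrative assumptions.

```python
import numpy as np

# Illustration of the scrambling control in Fig. 7(b): a plug-in estimate
# of the mutual information between paired samples, compared to a random
# permutation that destroys the pairing. Synthetic Gaussian data with
# correlation rho; the true MI is -0.5*log2(1 - rho**2) ≈ 1.2 bits here.
rng = np.random.default_rng(3)
n, rho = 100_000, 0.9
x = rng.standard_normal(n)
y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)

def mi_binned(x, y, bins=20):
    # Plug-in mutual information (in bits) from a 2D histogram
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0
    return float((pxy[mask] * np.log2(pxy[mask] / (px @ py)[mask])).sum())

mi_pair = mi_binned(x, y)                  # order of the true MI
mi_scr = mi_binned(x, rng.permutation(y))  # near zero up to estimator bias
```

The scrambled estimate is not exactly zero because the plug-in estimator is biased at finite sample size; this is why Fig. 7(b) extrapolates in inverse sample size.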

APPENDIX F: SMALL-NOISE REGIME AND EQUIVALENCE TO THE JEFFREYS PRIOR

As before, we seek the distribution p*(t) that maximizes the mutual information I(g;t). In this Appendix, we consider regimes where the optimization is analytically tractable. In Fig. 5, we show that we can analytically compute the channel capacity I* in regimes where the effective noise is small (N ≫ 1). We derive these results here.

As the effective noise approaches zero, it is known that the optimal code for a communication channel becomes continuous [52,74]. In this regime, we can use the fact that the noise is small to derive p*(t) analytically. In the literature, this is often referred to as the small-noise approximation, and it is widely used for studying information transmission in biological systems [6,62,78–82]. We will also show that the prior obtained in this limit is formally the same as the Jeffreys prior, a noninformative prior often used in Bayesian statistics [83].

We consider the gene regulatory response g to a particular signal duration t, which is distributed according to

\[
p(g\mid t) = \frac{1}{\Gamma(Nk)\,\left(\theta(t)/N\right)^{Nk}}\; g^{Nk-1}\, e^{-Ng/\theta(t)}.
\tag{F1}
\]

In the limit of small noise (N ≫ 1), we can approximate this gamma distribution by a narrow Gaussian with mean μ_g(t) = kθ(t) and variance σ_g²(t) = kθ(t)²/N, respectively.

We write the mutual information as a difference of entropies:

\[
I(g;t) = H(g) - H(g\mid t),
\tag{F2}
\]

where

\[
H(g) = -\int_0^\infty dg\; p(g)\log_2 p(g)
\tag{F3}
\]

and

\[
H(g\mid t) = -\int_0^T dt\; p(t)\int_0^\infty dg\; p(g\mid t)\log_2 p(g\mid t).
\tag{F4}
\]

When the noise σ_g(t) is small, the mapping p(g|t) is almost deterministic. As such, we can write

\[
p(t) \approx p(g)\left|\frac{d\mu_g}{dt}\right|.
\tag{F5}
\]

This allows us to write the entropy in Eq. (F3) as

\[
H(g) \approx H(t) + \int_0^T dt\; p(t)\log_2\left|\frac{d\mu_g}{dt}\right|.
\tag{F6}
\]

Further, using the Gaussian approximation for p(g|t), the conditional entropy in Eq. (F4) becomes

\[
H(g\mid t) \approx \frac{1}{2}\int_0^T dt\; p(t)\log_2\!\left(2\pi e\,\sigma_g^2(t)\right).
\tag{F7}
\]

Having written both H(g) and H(g|t) in a form where p(t) is the only "free" distribution that we can vary, we can now proceed with the optimization. We add a Lagrange multiplier to ensure normalization of p(t) and optimize the functional

\[
\mathcal{L}[p(t)] = I(g;t) - \beta\int_0^T dt\; p(t).
\tag{F8}
\]

Taking the variational derivative with respect to p(t), we get

\[
\frac{\delta\mathcal{L}}{\delta p(t)} = \log_2\left|\frac{d\mu_g}{dt}\right| - \log_2 p(t) - \frac{1}{\ln 2} - \frac{1}{2}\log_2\!\left(2\pi e\,\sigma_g^2(t)\right) - \beta.
\tag{F9}
\]

Setting δℒ/δp(t) = 0, we obtain

\[
p^{*}(t) = \frac{1}{Z}\,\frac{1}{\sigma_g(t)}\left|\frac{d\mu_g}{dt}\right|,
\tag{F10}
\]

where

\[
Z = \int_0^T \frac{dt}{\sigma_g(t)}\left|\frac{d\mu_g}{dt}\right|
\tag{F11}
\]

is a normalizing constant. Using the optimal input distribution in Eq. (F10) to evaluate I(g;t), we obtain the channel capacity given by [13]

\[
I^{*} = \log_2\frac{Z}{\sqrt{2\pi e}}.
\tag{F12}
\]

Since θ(t) ∝ t for times t ≫ 1 h, both μ_g(t) and σ_g(t) grow linearly with t in this regime. It follows from Eq. (F10) that the tail of the optimal prior decays as 1/t [cf. Fig. 5(b)].

In Fig. 5, we show the mutual information obtained using the prior in Eq. (F10) derived from the small-noise approximation. Indeed, with increasing N we converge asymptotically to the channel capacity I*. To derive the functional behavior of the channel capacity I* at large N, we can proceed by inspection. By construction, the mean μ_g(t) does not depend on N. The variance obeys σ_g²(t) ∝ 1/N, and therefore the normalizing constant Z in Eq. (F11) satisfies Z ∝ N^(1/2). As such, the channel capacity in Eq. (F12) asymptotically scales as

\[
I^{*} \sim \frac{1}{2}\log_2 N + \mathcal{O}(1).
\tag{F13}
\]
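The scaling in Eq. (F13) can be illustrated numerically: since σ_g ∝ N^(−1/2), the normalization Z of Eq. (F11) grows as √N, so the small-noise capacity gains exactly (1/2)log₂4 = 1 bit when N increases fourfold. The sketch below uses the fitted parameter values of Appendix B; the time grid and the cutoff T = 20 h are illustrative choices.

```python
import numpy as np

# Numerical illustration of Eqs. (F11)-(F13): Z grows as sqrt(N), so the
# small-noise capacity I* gains exactly 1 bit when N increases four-fold.
# Fitted values from Appendix B; grid and maximum duration T illustrative.
k, a, eps, tau, T = 2.88, 23.0, 0.86, 0.87, 20.0
t = np.linspace(1e-3, T, 100_000)
theta = a * (t + eps * np.exp(-t / tau))     # regularized scale, Eq. (B13)
dmu = k * np.gradient(theta, t)              # d(mu_g)/dt with mu_g = k*theta

def capacity(N):
    sigma = np.sqrt(k / N) * theta           # sigma_g(t) = sqrt(k/N)*theta(t)
    f = np.abs(dmu) / sigma
    Z = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t))   # Eq. (F11), trapezoid
    return np.log2(Z / np.sqrt(2 * np.pi * np.e))     # Eq. (F12)

gain = capacity(4) - capacity(1)             # expected: (1/2)*log2(4) = 1 bit
```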

We also note that the prior obtained in the small-noise limit is formally identical to the Jeffreys prior [83]. It is known that in the limit of an infinite number of independent, identical trials of the same experiment (i.e., N → ∞), the Shannon-optimal prior converges weakly to the Jeffreys prior [84]. Indeed, the same is true for a single precise experiment in the limit of small noise. We derive the Jeffreys prior below and verify that it is indeed identical to Eq. (F10).

The Jeffreys prior is designed to be invariant under reparametrization of the probability distribution, and is defined as

\[
p_J(t) \propto \left|\mathcal{I}(t)\right|^{1/2},
\tag{F14}
\]

where ℐ(t) is the Fisher information,

\[
\mathcal{I}(t) = \int_0^\infty dg\; p(g\mid t)\left(\frac{\partial\ln p(g\mid t)}{\partial t}\right)^{\!2}.
\tag{F15}
\]

In our case, p(g|t) is gamma distributed with shape parameter Nk and a time-dependent scale parameter θ(t)/N:

\[
\ln p(g\mid t) = -\ln\Gamma(Nk) - Nk\ln\!\left(\theta(t)/N\right) + (Nk-1)\ln g - \frac{Ng}{\theta(t)}.
\tag{F16}
\]

Taking the first derivative and substituting μ_g(t) = kθ(t) and σ_g²(t) = kθ(t)²/N, we obtain

\[
\left(\frac{\partial\ln p(g\mid t)}{\partial t}\right)^{\!2} = \frac{1}{\sigma_g(t)^4}\left(\frac{d\mu_g(t)}{dt}\right)^{\!2}\left(g - \mu_g(t)\right)^2.
\tag{F17}
\]

The expectation of (g − μ_g(t))² is just the variance σ_g²(t); hence, the Fisher information in Eq. (F15) becomes

\[
\mathcal{I}(t) = \frac{1}{\sigma_g(t)^2}\left(\frac{d\mu_g(t)}{dt}\right)^{\!2}.
\tag{F18}
\]

The Jeffreys prior can now be written as

\[
p_J(t) \propto \frac{1}{\sigma_g(t)}\left|\frac{d\mu_g}{dt}\right|,
\tag{F19}
\]

which is identical to the Shannon-optimal prior in the limit of small noise in Eq. (F10). We have thus verified that the Jeffreys prior and the prior that optimizes the mutual information are equivalent in the limit of small noise.
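Equation (F18) can also be verified numerically, since the Fisher information equals the local curvature of the KL divergence between the response distributions at neighboring durations, 2·KL(p(·|t) ∥ p(·|t+dt))/dt² for small dt. The sketch below uses the fitted values of Appendix B with N = 1; the closed-form KL divergence between two gamma distributions with equal shape is standard, and the evaluation point t = 5 h is an illustrative choice.

```python
import numpy as np

# Numerical check of Eq. (F18): the Fisher information
# I(t) = (dmu_g/dt)^2 / sigma_g^2 should match the curvature of the KL
# divergence between neighboring conditions. Fitted values from
# Appendix B; N = 1 for simplicity.
k, a, eps, tau = 2.88, 23.0, 0.86, 0.87

def theta(t):
    # Regularized scale parameter, Eq. (B13)
    return a * (t + eps * np.exp(-t / tau))

def fisher(t, h=1e-6):
    # Eq. (F18) with mu_g = k*theta and sigma_g^2 = k*theta^2 (N = 1)
    dtheta = (theta(t + h) - theta(t - h)) / (2 * h)
    return k * (dtheta / theta(t)) ** 2

def kl_gamma(t1, t2):
    # KL divergence between Gamma(k, theta(t1)) and Gamma(k, theta(t2))
    r = theta(t1) / theta(t2)
    return k * (r - 1 - np.log(r))

t0, dt = 5.0, 1e-4
curvature = 2 * kl_gamma(t0, t0 + dt) / dt**2   # should approach fisher(t0)
```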

FIG. 8.

Asymptotic scaling law between the number of symbols K in the optimized prior p*(t) and the mutual information I(g;t), in agreement with the literature [52,74]. The bound I(g;t) ≤ log₂ K is shown as a dotted line.

APPENDIX G: ASYMPTOTIC SCALING LAW FOR THE CHANNEL CAPACITY

In the main text, we explored what happens to our optimal prior p^(N)(t) as we consider multiple cells N. As N becomes larger, the effective noise level decreases—analogous to sending the same message multiple times through a communication channel. As the noise level approaches zero, the number of symbols K in the optimal code is known to follow an asymptotic scaling law: in this limit, the channel capacity I* scales with the logarithm of the number of symbols as I* ~ (3/4) log₂ K [74]. Indeed, we confirm this scaling law for our system in Fig. 8, as a nontrivial check of our optimization and of its consistency with the existing literature. The dotted line shows the fundamental limit I ≤ log₂ K, which would be reached if the K symbols were perfectly distinguishable.

FIG. 9.

Fine-tuning of symbols in the optimal code; here, the weights w_i are kept optimized as the positions t_i are varied. (a) For N=2, the optimal prior consists of K=4 symbols [cf. Fig. 5(b)]. We vary the positions of two symbols, t_2 and t_3, while keeping the weights w_i optimized. (b) Information I(g;t) (color-shade) shows a broad optimum (red star) as a function of the positions of peaks t_2 and t_3. (c) The Hessian matrix [cf. Eq. (H8)] has a sloppy spectrum that widens as N increases.

APPENDIX H: FINE-TUNING OF THE OPTIMAL CODE

Here, we consider a prior distribution p(t) composed of a set of K discrete symbols t_i with respective weights w_i:

\[
p(t) = \sum_{i=1}^{K} w_i\,\delta(t - t_i),
\tag{H1}
\]

and ask to what extent the capacity-achieving distribution p*(t) needs to be fine-tuned. Since the BA algorithm converges much more quickly to the channel capacity I* than to the optimal prior p*(t), we anticipate a sloppy information landscape around the optimum.

To investigate fine-tuning of the prior, we write the mutual information I(g;t) as a function of the positions t_i while keeping the weights w_i fixed:

\[
I(g;t)\left[\{t_i\}\right] = \sum_{i=1}^{K} w_i\int_0^\infty dg\; p(g\mid t_i)\log_2\frac{p(g\mid t_i)}{p(g)}.
\tag{H2}
\]

Denoting the optimized positions as t_i*, we can expand the mutual information around its maximum, the capacity I*:

\[
I(g;t)\left[\{t_i\}\right] = I^{*} + \frac{1}{2}\sum_{i,j=1}^{K}\left(t_i - t_i^{*}\right)\chi_{ij}\left(t_j - t_j^{*}\right) + \cdots,
\tag{H3}
\]

where χ_ij is the Hessian matrix:

\[
\chi_{ij} = \left.\frac{\partial^2 I}{\partial t_i\,\partial t_j}\right|_{\{t_i^{*}\}}.
\tag{H4}
\]

The eigenvectors of χ determine directions in parameter space t_i that have independent effects on the mutual information, whereas the eigenvalues λ_i tell us the sensitivity along these directions [62]. Since we have fit the data with a functional form for p(g|t_i), we can compute the Hessian analytically.

To proceed, we express the partial derivative of the marginal distribution p(g) with respect to the symbol position t_i,

\[
\frac{\partial p(g)}{\partial t_i} = w_i\,\frac{\partial p(g\mid t_i)}{\partial t_i}.
\tag{H5}
\]

We can then compute the first derivative of the mutual information with respect to t_i:

\[
\frac{\partial I}{\partial t_i} = \frac{w_i}{\ln 2}\int_0^\infty dg\;\frac{\partial p(g\mid t_i)}{\partial t_i}\,\ln\frac{p(g\mid t_i)}{p(g)}.
\tag{H6}
\]

Note that this is the same gradient as the one used to iteratively adjust the positions in Eq. (C9). Taking a second derivative with respect to t_j, we obtain

\[
\chi_{ij} = \frac{\partial^2 I}{\partial t_i\,\partial t_j} = \frac{w_i}{\ln 2}\int_0^\infty dg\left[\delta_{ij}\left(\frac{\partial^2 p(g\mid t_i)}{\partial t_i^2}\,\ln\frac{p(g\mid t_i)}{p(g)} + \frac{1}{p(g\mid t_i)}\left(\frac{\partial p(g\mid t_i)}{\partial t_i}\right)^{\!2}\right) - \frac{w_j}{p(g)}\,\frac{\partial p(g\mid t_i)}{\partial t_i}\,\frac{\partial p(g\mid t_j)}{\partial t_j}\right].
\tag{H7}
\]

To compute the integral, we insert the functional form of p(g|t) and integrate numerically, after which we find the spectrum λ_i by diagonalizing χ [Fig. 5(d)]. We note that in deriving Eq. (H7), we have not assumed a functional form for p(g|t); the equation thus holds for any choice of transmission probability and discrete prior.

In the above and in the main text, we kept the weights w_i fixed while adjusting the positions t_i of the symbols. Alternatively, one can keep the weights optimized while varying the positions of the symbols. In that case, the Hessian becomes a total derivative, χ_ij = d²I/dt_i dt_j, and receives a contribution from the adjustment of the weights as one departs from the optimum. Denoting the blocks of second partial derivatives of the mutual information by (I_tt)_ij = ∂²I/∂t_i∂t_j, (I_wt)_ij = ∂²I/∂w_i∂t_j, and (I_ww)_ij = ∂²I/∂w_i∂w_j, we can write the Hessian as

$$\chi = I^{tt} - \bigl(I^{wt}\bigr)^{T} M\, I^{wt}, \qquad \text{(H8)}$$

where M is a matrix given by

$$M = \bigl(I^{ww}\bigr)^{-1} - \frac{\bigl(I^{ww}\bigr)^{-1} \mathbf{1}\, \mathbf{1}^{T} \bigl(I^{ww}\bigr)^{-1}}{\mathbf{1}^{T} \bigl(I^{ww}\bigr)^{-1} \mathbf{1}}, \qquad \text{(H9)}$$

and $\mathbf{1} = (1, \ldots, 1)^{T}$. Again, these expressions are general and hold for any choice of transmission probability and discrete prior. In Fig. 9(a), we plot the mutual information for $N=2$ as a function of the positions of two out of four delta functions while keeping the weights optimized. We observe a broad optimum similar to that in Fig. 6(a). In Fig. 9(b), we use Eq. (H8) to plot the spectrum as a function of $N$; spanning four decades, it is sloppier than in the fixed-weights case of the main text.
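The projection in Eqs. (H8) and (H9) is straightforward to assemble from the three blocks of partial second derivatives. In the sketch below the blocks are random placeholder matrices rather than integrals computed from the fitted model; by construction, $M$ annihilates the all-ones direction, reflecting the normalization constraint $\sum_i w_i = 1$ on the weights:

```python
import numpy as np

def hessian_optimized_weights(Itt, Iwt, Iww):
    """Total Hessian chi = Itt - (Iwt)^T M Iwt of Eq. (H8), with M of
    Eq. (H9) built from the weight-weight block Iww."""
    ones = np.ones(Iww.shape[0])
    Iww_inv = np.linalg.inv(Iww)
    v = Iww_inv @ ones
    # Eq. (H9): constrained inverse of Iww; note M @ ones = 0
    M = Iww_inv - np.outer(v, v) / (ones @ v)
    return Itt - Iwt.T @ M @ Iwt

# placeholder blocks standing in for the partial second derivatives
K = 4
rng = np.random.default_rng(1)
A = rng.normal(size=(K, K))
Iww = -(A @ A.T + K * np.eye(K))   # symmetric negative definite, invertible
Itt = -np.eye(K)
Iwt = rng.normal(size=(K, K))
chi = hessian_optimized_weights(Itt, Iwt, Iww)
spectrum = np.linalg.eigvalsh(chi)
```

Because $I^{ww}$ is symmetric, $M$ and hence $\chi$ are symmetric, and the spectrum can again be obtained with a symmetric eigensolver.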

DATA AVAILABILITY

The data that support the findings of this article are openly available [98,99].

References

1. Jacob F and Monod J, Genetic regulatory mechanisms in the synthesis of proteins, J. Mol. Biol. 3, 318 (1961).
2. Elowitz MB, Levine AJ, Siggia ED, and Swain PS, Stochastic gene expression in a single cell, Science 297, 1183 (2002).
3. Elf J, Paulsson J, Berg OG, and Ehrenberg M, Near-critical phenomena in intracellular metabolite pools, Biophys. J. 84, 154 (2003).
4. Paulsson J, Summing up the noise in gene networks, Nature (London) 427, 415 (2004).
5. Salman H, Brenner N, Tung C-K, Elyahu N, Stolovicki E, Moore L, Libchaber A, and Braun E, Universal protein fluctuations in populations of microorganisms, Phys. Rev. Lett. 108, 238105 (2012).
6. Dubuis JO, Tkačik G, Wieschaus EF, Gregor T, and Bialek W, Positional information, in bits, Proc. Natl. Acad. Sci. USA 110, 16301 (2013).
7. Merle M, Friedman L, Chureau C, Shoushtarizadeh A, and Gregor T, Precise and scalable self-organization in mammalian pseudo-embryos, Nat. Struct. Mol. Biol. 31, 896 (2024).
8. Warmflash A, Sorre B, Etoc F, Siggia ED, and Brivanlou AH, A method to recapitulate early embryonic spatial patterning in human embryonic stem cells, Nat. Methods 11, 847 (2014).
9. Stanoev A, Schröter C, and Koseska A, Robustness and timing of cellular differentiation through population-based symmetry breaking, Development 148, dev197608 (2021).
10. Suderman R, Bachman JA, Smith A, Sorger PK, and Deeds EJ, Fundamental trade-offs between information flow in single cells and cellular populations, Proc. Natl. Acad. Sci. USA 114, 5755 (2017).
11. el Azhar Y, Schulthess P, van Oostrom MJ, Weterings SDC, Meijer WHM, Tsuchida-Straeten N, Thomas WM, Bauer M, and Sonnen KF, Unravelling differential Hes1 dynamics during axis elongation of mouse embryos through single-cell tracking, Development 151, dev202936 (2024).
12. Cover TM and Thomas JA, Elements of Information Theory (Wiley, New York, 1991).
13. Bialek W, Biophysics: Searching for Principles (Princeton University Press, Princeton, 2012).
14. Cheong R, Rhee A, Wang CJ, Nemenman I, and Levchenko A, Information transduction capacity of noisy biochemical signaling networks, Science 334, 354 (2011).
15. Uda S, Saito TH, Kudo T, Kokaji T, Tsuchiya T, Kubota H, Komori Y, Ozaki Y.-i., and Kuroda S, Robustness and compensation of information transmission of signaling pathways, Science 341, 558 (2013).
16. Iyer KS, Prabhakara C, Mayor S, and Rao M, Cellular compartmentalisation and receptor promiscuity as a strategy for accurate and robust inference of position during morphogenesis, eLife 12, e79257 (2023).
17. Bettoni R, Dupont G, Walczak AM, and de Buyl S, Optimizing information transmission in neural induction constrains cell surface contacts of ascidian embryos, PRX Life 3, 033004 (2025).
18. Selimkhanov J, Taylor B, Yao J, Pilko A, Albeck J, Hoffmann A, Tsimring L, and Wollman R, Accurate information transmission through dynamic biochemical signaling networks, Science 346, 1370 (2014).
19. Tostevin F and ten Wolde PR, Mutual information between input and output trajectories of biochemical networks, Phys. Rev. Lett. 102, 218101 (2009).
20. Reinhardt M, Tkačik G, and ten Wolde PR, Path weight sampling: Exact Monte Carlo computation of the mutual information between stochastic trajectories, Phys. Rev. X 13, 041017 (2023).
21. Moor A-L and Zechner C, Dynamic information transfer in stochastic biochemical networks, Phys. Rev. Res. 5, 013032 (2023).
22. Cadigan KM and Nusse R, Wnt signaling: a common theme in animal development, Genes Dev. 11, 3286 (1997).
23. Arnold CP, Benham-Pyle BW, Lange JJ, Wood CJ, and Sánchez Alvarado A, Wnt and TGFβ coordinate growth and patterning to regulate size-dependent behaviour, Nature (London) 572, 655 (2019).
24. Lach RS, Qiu C, Kajbaf EZ, Baxter N, Han D, Wang A, Lock H, Chirikian O, Pruitt B, and Wilson MZ, Nucleation of the destruction complex on the centrosome accelerates degradation of β-catenin and regulates Wnt signal transmission, Proc. Natl. Acad. Sci. USA 119, e2204688119 (2022).
25. Rosen SJ, Witteveen O, Baxter N, Lach RS, Hopkins E, Bauer M, and Wilson MZ, Anti-resonance in developmental signaling regulates cell fate decisions, eLife 14, RP107794 (2026).
26. Friedman N, Cai L, and Xie XS, Linking stochastic dynamics to population distribution: An analytical framework of gene expression, Phys. Rev. Lett. 97, 168302 (2006).
27. Cai L, Friedman N, and Xie XS, Stochastic protein expression in individual cells at the single molecule level, Nature (London) 440, 358 (2006).
28. Mikels AJ and Nusse R, Wnts as ligands: Processing, secretion and reception, Oncogene 25, 7461 (2006).
29. Erdmann T, Howard M, and ten Wolde PR, Role of spatial averaging in the precision of gene expression patterns, Phys. Rev. Lett. 103, 258101 (2009).
30. Gasparski AN and Beningo KA, Mechanoreception at the cell membrane: More than the integrins, Arch. Biochem. Biophys. 586, 20 (2015).
31. Funa NS, Schachter KA, Lerdrup M, Ekberg J, Hess K, Dietrich N, Honoré C, Hansen K, and Semb H, β-Catenin regulates primitive streak induction through collaborative interactions with SMAD2/SMAD3 and OCT4, Cell Stem Cell 16, 639 (2015).
32. Kim K, Cho J, Hilzinger TS, Nunns H, Liu A, Ryba BE, and Goentoro L, Two-element transcriptional regulation in the canonical Wnt pathway, Curr. Biol. 27, 2357 (2017).
33. Kicheva A and Briscoe J, Control of tissue development by morphogens, Annu. Rev. Cell Dev. Biol. 39, 91 (2023).
34. Gutenkunst RN, Waterfall JJ, Casey FP, Brown KS, Myers CR, and Sethna JP, Universally sloppy parameter sensitivities in systems biology models, PLoS Comput. Biol. 3, e189 (2007).
35. Transtrum MK, Machta BB, Brown KS, Daniels BC, Myers CR, and Sethna JP, Perspective: Sloppiness and emergent theories in physics, biology, and beyond, J. Chem. Phys. 143, 010901 (2015).
36. Barlow HB, Possible principles underlying the transformations of sensory messages, in Sensory Communication (The MIT Press, Cambridge, 1961).
37. Laughlin S, A simple coding procedure enhances a neuron’s information capacity, Z. Naturforsch. Sect. C, Biosci. 36, 910 (1981).
38. Bialek W, Rieke F, de Ruyter van Steveninck RR, and Warland D, Reading a neural code, Science 252, 1854 (1991).
39. Brenner N, Bialek W, and de Ruyter van Steveninck R, Adaptive rescaling maximizes information transmission, Neuron 26, 695 (2000).
40. Anton R, Kestler HA, and Kühl M, β-Catenin signaling contributes to stemness and regulates early differentiation in murine embryonic stem cells, FEBS Lett. 581, 5247 (2007).
41. Spence JR, Mayhew CN, Rankin SA, Kuhar MF, Vallance JE, Tolle K, Hoskins EE, Kalinichenko VV, Wells SI, Zorn AM, Shroyer NF, and Wells JM, Directed differentiation of human pluripotent stem cells into intestinal tissue in vitro, Nature (London) 470, 105 (2011).
42. Sato T and Clevers H, Growing self-organizing mini-guts from a single intestinal stem cell: Mechanism and applications, Science 340, 1190 (2013).
43. van den Brink SC, Alemany A, van Batenburg V, Moris N, Blotenburg M, Vivié J, Baillie-Johnson P, Nichols J, Sonnen KF, Arias AM, and van Oudenaarden A, Single-cell and spatial transcriptomics reveal somitogenesis in gastruloids, Nature (London) 582, 405 (2020).
44. Goentoro L and Kirschner MW, Evidence that fold-change, and not absolute level, of β-catenin dictates Wnt signaling, Mol. Cell 36, 872 (2009).
45. Goentoro L, Shoval O, Kirschner MW, and Alon U, The incoherent feedforward loop can provide fold-change detection in gene regulation, Mol. Cell 36, 894 (2009).
46. Aulehla A, Wiegraebe W, Baubet V, Wahl MB, Deng C, Taketo M, Lewandoski M, and Pourquié O, A β-catenin gradient links the clock and wavefront systems in mouse embryo segmentation, Nat. Cell Biol. 10, 186 (2008).
47. Cimetta E, Cannizzaro C, James R, Biechele T, Moon RT, Elvassore N, and Vunjak-Novakovic G, Microfluidic device generating stable concentration gradients for long term cell culture: application to Wnt3a regulation of β-catenin signaling, Lab. Chip 10, 3277 (2010).
48. Sonnen KF, Lauschke VM, Uraji J, Falk HJ, Petersen Y, Funk MC, Beaupeux M, François P, Merten CA, and Aulehla A, Modulation of phase shift between Wnt and Notch signaling oscillations controls mesoderm segmentation, Cell 172, 1079 (2018).
49. Cooper EJ and Scholpp S, Transport and gradient formation of Wnt and FGF in the early zebrafish gastrula, Curr. Top. Dev. Biol. 157, 125 (2024).
50. Kok R, Zheng X, Spoelstra W, Clement T, van de Grift Y, Clevers H, van Amerongen R, Tans S, and van Zon J, Loss of Paneth cell contact starts a Wnt differentiation timer in intestinal crypts, Res. Square (2024).
51. Yadav M, Koch D, and Koseska A, Homeorhetic regulation of cellular phenotype, bioRxiv (2025).
52. Mattingly HH, Transtrum MK, Abbott MC, and Machta BB, Maximizing the information learned from finite data selects a simple model, Proc. Natl. Acad. Sci. USA 115, 1760 (2018).
53. Ferrer-Vaquer A, Piliszek A, Tian G, Aho RJ, Dufort D, and Hadjantonakis A-K, A sensitive and bright single-cell resolution live imaging reporter of Wnt/β-catenin signaling in the mouse, BMC Dev. Biol. 10, 121 (2010).
54. Ogamino S, Yamamichi M, Sato K, and Ishitani T, Dynamics of Wnt/β-catenin reporter activity throughout whole life in a naturally short-lived vertebrate, npj Aging 10, 23 (2024).
55. Fuerer C and Nusse R, Lentiviral vectors to probe and manipulate the Wnt signaling pathway, PLoS One 5, e9370 (2010).
56. Liu J, Xiao Q, Xiao J, Niu C, Li Y, Zhang X, Zhou Z, Shu G, and Yin G, Wnt/β-catenin signalling: function, biological mechanisms, and therapeutic opportunities, Sig. Transduct. Target. Ther. 7, 3 (2022).
57. Höhener TC, Landolt AE, Dessauges C, Hinderling L, Gagliardi PA, and Pertz O, LITOS: A versatile LED illumination tool for optogenetic stimulation, Sci. Rep. 12, 13139 (2022).
58. Koch AL, The logarithm in biology 1. Mechanisms generating the log-normal distribution exactly, J. Theor. Biol. 12, 276 (1966).
59. Shannon CE, A mathematical theory of communication, Bell System Tech. J. 27, 379 (1948).
60. Zdeborová L and Krzakala F, Statistical physics of inference: Thresholds and algorithms, Adv. Phys. 65, 453 (2016).
61. Petkova MD, Tkačik G, Bialek W, Wieschaus EF, and Gregor T, Optimal decoding of cellular identities in a genetic network, Cell 176, 844 (2019).
62. Bauer M, Petkova MD, Gregor T, Wieschaus EF, and Bialek W, Trading bits in the readout from a genetic network, Proc. Natl. Acad. Sci. USA 118, e2109011118 (2021).
63. Abramowitz M and Stegun I, Handbook of Mathematical Functions: With Formulas, Graphs, and Mathematical Tables, Applied Mathematics Series (Dover Publications, New York, 1965).
64. Smith JG, The information capacity of amplitude- and variance-constrained scalar Gaussian channels, Inf. Control 18, 203 (1971).
65. Levchenko A and Nemenman I, Cellular noise and information transmission, Curr. Opin. Biotechnol. 28, 156 (2014).
66. Tkačik G, Callan CG, and Bialek W, Information capacity of genetic regulatory elements, Phys. Rev. E 78, 011910 (2008).
67. Mijatović T, Kok AR, Zwanikken JW, and Bauer M, Weak transcription factor clustering at binding sites can facilitate information transfer from molecular signals, PRX Life 4, 013003 (2026).
68. Bairoch A, The Cellosaurus, a cell-line knowledge resource, J. Biomol. Tech. 29, 25 (2018).
69. Yang J, Guertin P, Jia G, Lv Z, Yang H, and Ju D, Large-scale microcarrier culture of HEK293T cells and Vero cells in single-use bioreactors, AMB Express 9, 70 (2019).
70. Moosemiller J, HEK Cell Splitting and Maintenance, https://receptor.nsm.uh.edu/research/protocols/experimental/hekcells-split, accessed 2 June 2025.
71. Blahut R, Computation of channel capacity and rate-distortion functions, IEEE Trans. Inf. Theory 18, 460 (1972).
72. Arimoto S, An algorithm for computing the capacity of arbitrary discrete memoryless channels, IEEE Trans. Inf. Theory 18, 14 (1972).
73. Dauwels J, Numerical computation of the capacity of continuous memoryless channels, in Proceedings of the 26th Symposium on Information Theory in the Benelux (WIC, Brussels, Belgium, 2005).
74. Abbott MC and Machta BB, A scaling law from discrete to continuous solutions of channel capacity problems in the low-noise limit, J. Stat. Phys. 176, 214 (2019).
75. Chakrabarti S and Michor F, Circadian clock effects on cellular proliferation: Insights from theory and experiments, Curr. Opin. Cell Biol. 67, 17 (2020).
76. Sandberg R and Ernberg I, The molecular portrait of in vitro growth by meta-analysis of gene-expression profiles, Genome Biol. 6, R65 (2005).
77. Liu L and Warmflash A, Self-organized signaling in stem cell models of embryos, Stem Cell Rep. 16, 1065 (2021).
78. Tkačik G, Callan CG Jr., and Bialek W, Information flow and optimization in transcriptional regulation, Proc. Natl. Acad. Sci. USA 105, 12265 (2008).
79. Tkačik G and Walczak AM, Information transmission in genetic regulatory networks: A review, J. Phys.: Condens. Matter 23, 153102 (2011).
80. Walczak AM, Tkačik G, and Bialek W, Optimizing information flow in small genetic networks. II. Feed-forward interactions, Phys. Rev. E 81, 041905 (2010).
81. Bauer M, How does an organism extract relevant information from transcription factor concentrations? Biochem. Soc. Trans. 50, 1365 (2022).
82. Bauer M and Bialek W, Information bottleneck in molecular sensing, PRX Life 1, 023005 (2023).
83. Jeffreys H, An invariant form for the prior probability in estimation problems, Proc. R. Soc. London A 186, 453 (1946).
84. Scholl HR, Shannon optimal priors on independent identically distributed statistical experiments converge weakly to Jeffreys’ prior, Test 7, 75 (1998).
85. Bauer M, Bialek W, Goddard C, Holmes CM, Krishnamurthy K, Palmer SE, Pang R, Schwab DJ, and Susman L, Optimization and variability can coexist, arXiv:2505.23398.
86. Amel A, Rabeling A, Rossouw S, and Goolam M, Wnt and BMP signalling direct anterior–posterior differentiation in aggregates of mouse embryonic stem cells, Biol. Open 12, bio059981 (2023).
87. van den Brink SC, Baillie-Johnson P, Balayo T, Hadjantonakis A-K, Nowotschin S, Turner DA, and Martinez Arias A, Symmetry breaking, germ layer specification and axial organisation in aggregates of mouse embryonic stem cells, Development 141, 4231 (2014).
88. Repina NA, Johnson HJ, Bao X, Zimmermann JA, Joy DA, Bi SZ, Kane RS, and Schaffer DV, Optogenetic control of Wnt signaling models cell-intrinsic embryogenic patterning using 2D human pluripotent stem cell culture, Development 150, dev201386 (2023).
89. Rosenbloom AB, Tarczyński M, Lam N, Kane RS, Bugaj LJ, and Schaffer DV, β-catenin signaling dynamics regulate cell fate in differentiating neural stem cells, Proc. Natl. Acad. Sci. USA 117, 28828 (2020).
90. McNamara HM, Solley SC, Adamson B, Chan MM, and Toettcher JE, Recording morphogen signals reveals mechanisms underlying gastruloid symmetry breaking, Nat. Cell Biol. 26, 1832 (2024).
91. Averbeck BB, Latham PE, and Pouget A, Neural correlations, population coding and computation, Nat. Rev. Neurosci. 7, 358 (2006).
92. Pillow JW, Shlens J, Paninski L, Sher A, Litke AM, Chichilnisky EJ, and Simoncelli EP, Spatio-temporal correlations and visual signalling in a complete neuronal population, Nature (London) 454, 995 (2008).
93. Schneidman E, Bialek W, and Berry MJ II, Synergy, redundancy, and independence in population codes, J. Neurosci. 23, 11539 (2003).
94. Kramar M, Hahn L, Walczak AM, Mora T, and Coppey M, Single cells can resolve graded stimuli, PRX Life 3, 043016 (2025).
95. Marwan N, Romano MC, Thiel M, and Kurths J, Recurrence plots for the analysis of complex systems, Phys. Rep. 438, 237 (2007).
96. Camacho-Aguilar E, Yoon ST, Ortiz-Salazar MA, Du S, Guerra MC, and Warmflash A, Combinatorial interpretation of BMP and Wnt controls the decision between primitive streak and extraembryonic fates, Cell Syst. 15, 445 (2024).
97. Lehr S, Brückner DB, Minchington TG, Greunz-Schindler M, Merrin J, Hannezo E, and Kicheva A, Self-organized pattern formation in the developing mouse neural tube by a temporal relay of BMP signaling, Dev. Cell 60, 567 (2025).
98. Rosen SJ, Experimental data for Witteveen et al., Zenodo (2026), doi: 10.5281/zenodo.18601164.
99. Witteveen O, GitHub (2026), https://github.com/olivierwitteveen/optimizing-info-in-opto-wnt.
100. Halmos PR and Savage LJ, Application of the Radon-Nikodym theorem to the theory of sufficient statistics, Ann. Math. Stat. 20, 225 (1949).
