Published in final edited form as: Phys. Rev. E 80, 031920 (2009). doi: 10.1103/PhysRevE.80.031920

Optimizing information flow in small genetic networks

Gašper Tkačik,1,* Aleksandra M. Walczak,2 and William Bialek2,3

Abstract

In order to survive, reproduce, and (in multicellular organisms) differentiate, cells must control the concentrations of the myriad different proteins that are encoded in the genome. The precision of this control is limited by the inevitable randomness of individual molecular events. Here we explore how cells can maximize their control power in the presence of these physical limits; formally, we solve the theoretical problem of maximizing the information transferred from inputs to outputs when the number of available molecules is held fixed. We start with the simplest version of the problem, in which a single transcription factor protein controls the readout of one or more genes by binding to DNA. We further simplify by assuming that this regulatory network operates in steady state, that the noise is small relative to the available dynamic range, and that the target genes do not interact. Even in this simple limit, we find a surprisingly rich set of optimal solutions. Importantly, for each locally optimal regulatory network, all parameters are determined once the physical constraints on the number of available molecules are specified. Although we are solving an oversimplified version of the problem facing real cells, we see parallels between the structure of these optimal solutions and the behavior of actual genetic regulatory networks. Subsequent papers will discuss more complete versions of the problem.

I. INTRODUCTION

Much of the everyday business of organisms involves the transmission and processing of information. On our human scale, the familiar examples involve the signals taken in through our sense organs [1]. On a cellular scale, information flows from receptors on the cell surface into the cell, modulating biochemical events and ultimately controlling gene expression [2]. In the course of development in multicellular organisms, individual cells acquire information about their location in the embryo by responding to particular “morphogen” molecules whose concentration varies along the main axes of the embryo [3,4]. In all these examples, information of interest to the organism ultimately is represented by events at the molecular level, whether the molecules are transcription factors regulating gene expression or ion channels controlling electrical signals in the brain. This representation is limited by fundamental physical principles: individual molecular events are stochastic, so that with any finite number of molecules there is a limit to the precision with which small signals can be discriminated reliably and there is a limit to the overall dynamic range of the signals. Our goal in this paper (and its sequel) is to explore these limits to information transmission in the context of small genetic control circuits.

The outputs of genetic control circuits are protein molecules that are synthesized by the cell from messenger RNA (mRNA), which in turn is transcribed from the DNA template. The inputs often are protein molecules as well, “transcription factors” that bind to the DNA and regulate the synthesis of the mRNA. In the last decade, a number of experiments have mapped the input/output relations of these regulatory elements and characterized their noise, that is, the fluctuations in the output protein concentration when the inputs are held fixed [5–17]. In parallel, a number of theoretical papers have tried to understand the origins of this noise, which ultimately reflects the random behavior of individual molecules along the path from input to output—the arrival of transcription factors at their targets along the DNA, the initiation of transcription and the degradation of mRNA, and the initiation of protein synthesis and the degradation of the output proteins [18–29]. While open questions remain, it seems fair to say that we have a physical picture of the noise in genetic control that we can use to ask questions about the overall function and design of these systems.

The ability of any system to transmit information is determined not just by input/output relations and noise levels, but also by the distribution of inputs; maximal information transmission requires a matching between the intrinsic properties of the system and the input statistics [30,31]. In the context of sensory information processing, these matching conditions have been explored almost since the inception of information theory [32–35]. In particular, because the distribution of sensory inputs varies with time, optimal information transmission requires that the input/output relation track or adapt to these variations and this theoretical prediction has led to a much richer view of adaptation in the neural code [36–40]. There are analogous matching conditions for genetic regulatory elements and these conditions provide parameter-free predictions about the behavior of the system based on the idea that cells are trying to transmit the maximum amount of information [41]. Comparison with recent experiments has been encouraging [42].

In this paper we go beyond the matching conditions to ask how cells can adjust the input/output relations of genetic regulatory elements so as to maximize the information that is transmitted through these systems. Absent any constraints, the answer will always be to make more molecules, since this reduces the effective noise level, so we consider the problem of maximizing information transmission with a fixed mean or maximum number of molecules at both the input and the output. In this sense we are asking how cells can extract the maximum control power, measured in bits, from a given number of molecules, thus optimizing functionality under clear physical constraints. In general this problem is very difficult, so we start here with the simplest case of a single input transcription factor that controls (potentially) many genes, but there is no interaction among these outputs. Further, we focus on a limit (small noise) where some analytic progress is possible. We will see that, even in this case, the optimal solutions have an interesting structure, which emerges as a result of the interplay between noise sources at the input and the output of the regulatory elements. For other approaches to the optimization of information transmission in biochemical and genetic networks, see Refs. [43–45].

Optimization of information transmission is a concise, abstract principle grounded in the physics of the molecular interactions that underlie biological function. It would be attractive if we could derive the behavior of biological systems from such a principle rather than taking the myriad parameters of these systems simply as quantities that must be fit to data. It is not at all clear, however, that such a general principle should apply to real biological systems. Indeed, it is possible that solutions to our optimization problem are far from plausible in comparison with what we find in real cells. Thus, our most important result is that the parameters which we derive are reasonable in relation to experiment. While a realistic comparison requires us to solve the optimization problem in a fully interacting system, even in the simpler problem discussed here we can see the outlines of a theory for real genetic networks. Subsequent papers will address the full, interacting version of the problem.

II. FORMULATING THE PROBLEM

A gene regulatory element translates the concentration of input molecules ℐ into output molecules 𝒪. We would like to measure, quantitatively, how effectively changes in the input serve to control the output. If we make many observations on the state of the cell, we will see that inputs and outputs are drawn from a joint distribution p(ℐ, 𝒪) and our measure of control power should be a functional of this distribution. In his classic work, Shannon showed that there is only one such measure of control power which obeys certain plausible constraints and this is the mutual information between ℐ and 𝒪 [30,46].

To be concrete, we consider a set of genes, i=1,2,…,M, that all are controlled by a single transcription factor. Let the concentration of the transcription factor be c and let the levels of protein expressed from each gene be gi; below we discuss the units and normalization of these quantities. Thus, the input ℐ≡c and the output 𝒪≡{gi}. In principle these quantities all depend on time. We choose to focus here on the steady-state problem, where we assume that the output expression levels reach their equilibrium values before the input transcription factor concentrations change.

We view the steady-state approximation not necessarily as an accurate model of the dynamics in real cells, but as a useful starting point, and already the steady-state problem has a rich structure. In particular, as we will see, in this limit we have analytic control over the role of nonlinearities in the input/output relation describing the function of the different regulatory elements in our network. In contrast, most approaches to information transmission by dynamic signals are limited to the regime of linear response; see, for example, Ref. [45]. Although we are focused here on information transmission in genetic circuits, it is interesting that the same dichotomy—nonlinear analyses of static networks and dynamic analyses of linear networks—also exists in the literature on information transmission in neural networks [34,35].

To specify the joint distribution of inputs and outputs, it is convenient to think that the transcription factor concentration is being chosen out of a probability distribution PTF(c) and then the target genes respond with expression levels chosen out of the conditional distribution P({gi} |c). In general, the mutual information between the set of expression levels {gi} and the input c is given by [30,31]

$$I(\{g_i\};c)=\int dc\int d^Mg\;P(c,\{g_i\})\,\log_2\!\left[\frac{P(c,\{g_i\})}{P_{TF}(c)\,P(\{g_i\})}\right], \tag{1}$$

where the overall distribution of expression levels is given by

$$P(\{g_i\})=\int dc\,P_{TF}(c)\,P(\{g_i\}|c). \tag{2}$$

Shannon’s uniqueness theorem of course leaves open a choice of units; here we make the conventional choice of bits, hence the logarithm is base 2.

We will approach the problem of optimizing information transmission in two steps. First, we will adjust the distribution PTF(c) to take best advantage of the input/output relations and then we will adjust the input/output relations themselves. Even the first step is difficult in general, so we start by focusing on the limit in which noise is small.

A. Information in the small noise limit

As noted in Sec. I, we will confine our attention in this paper to the case where each gene responds independently to its inputs and there are no interactions among the output genes; we point toward generalizations in Sec. V below and return to the more general problem in subsequent papers. The absence of interactions means that the conditional distribution of expression levels must factorize, $P(\{g_i\}|c)=\prod_{i=1}^{M}P_i(g_i|c)$. Further, we assume that the noise in expression levels is Gaussian. Then we have [47]

$$P(\{g_i\}|c)=\exp\!\left\{-\frac{M}{2}\ln(2\pi)-\frac{1}{2}\sum_{i=1}^{M}\ln[\sigma_i^2(c)]-\frac{1}{2}\sum_{i=1}^{M}\frac{1}{\sigma_i^2(c)}\,[g_i-\bar g_i(c)]^2\right\}. \tag{3}$$

The input/output relation of each gene is defined by the mean ḡi(c), while σi²(c) measures the variance of the fluctuations or noise in the expression levels at fixed input,

$$\sigma_i^2(c)=\left\langle[g_i-\bar g_i(c)]^2\right\rangle. \tag{4}$$

In the limit that the noise levels σi are small, we can develop a systematic expansion of the information I({gi};c) generalizing the approach of Refs. [41,42]. The key idea is that, in the small noise limit, observation of the output expression levels {gi} should be sufficient to determine the input concentration c with relatively high accuracy; further, we expect that errors in this estimation process would be well approximated as Gaussian. Formally, this means that we should have

$$P(c|\{g_i\})\approx\frac{1}{\sqrt{2\pi\sigma_c^2(\{g_i\})}}\exp\!\left[-\frac{[c-c^*(\{g_i\})]^2}{2\sigma_c^2(\{g_i\})}\right], \tag{5}$$

where c*({gi}) is the most likely value of c given the outputs and σc²({gi}) is the variance of the true value around this estimate. We can use this expression to calculate the information by writing I({gi};c) as the difference between two entropies:

$$I(\{g_i\};c)=-\int dc\,P_{TF}(c)\log_2P_{TF}(c)-\int d^Mg\,P(\{g_i\})\left[-\int dc\,P(c|\{g_i\})\log_2P(c|\{g_i\})\right] \tag{6}$$
$$=-\int dc\,P_{TF}(c)\log_2P_{TF}(c)-\frac{1}{2}\int d^Mg\,P(\{g_i\})\log_2\!\left[2\pi e\,\sigma_c^2(\{g_i\})\right]. \tag{7}$$

Intuitively, the first term is the entropy of inputs, which sets an absolute maximum on the amount of information that can be transmitted [48]; the second term is (minus) the entropy of the input given the output or the “equivocation” [30] that results from noise in the mapping from inputs to outputs. To complete the calculation we need an expression for this effective noise level σc.

Using Bayes’ rule, we have

$$P(c|\{g_i\})=\frac{P(\{g_i\}|c)\,P_{TF}(c)}{P(\{g_i\})} \tag{8}$$
$$=\frac{1}{\mathcal{Z}(\{g_i\})}\exp\!\left[-F(c,\{g_i\})\right], \tag{9}$$

where

$$F(c,\{g_i\})=-\ln P_{TF}(c)+\frac{1}{2}\sum_{i=1}^{M}\ln[\sigma_i^2(c)]+\frac{1}{2}\sum_{i=1}^{M}\frac{1}{\sigma_i^2(c)}\,[g_i-\bar g_i(c)]^2. \tag{10}$$

Now it is clear that c*({gi}) and σc({gi}) are defined by

$$0=\left.\frac{\partial F(c,\{g_i\})}{\partial c}\right|_{c=c^*(\{g_i\})}, \tag{11}$$
$$\frac{1}{\sigma_c^2(\{g_i\})}=\left.\frac{\partial^2F(c,\{g_i\})}{\partial c^2}\right|_{c=c^*(\{g_i\})}. \tag{12}$$

The leading term at small σi is then given by

$$\frac{1}{\sigma_c^2(\{g_i\})}=\sum_{i=1}^{M}\frac{1}{\sigma_i^2}\left(\frac{d\bar g_i(c)}{dc}\right)^2\Bigg|_{c=c^*(\{g_i\})}. \tag{13}$$

Finally, we note that, in the small noise limit, averages over all the expression levels can be approximated by an integral along the trajectory of mean expression levels with an appropriate Jacobian. More precisely,

$$\int d^Mg\,P(\{g_i\})\,[\cdots]\approx\int dc\,P_{TF}(c)\int d^Mg\,\prod_{i=1}^{M}\delta\big(g_i-\bar g_i(c)\big)\,[\cdots]. \tag{14}$$

Putting all these terms together, we have

$$I(\{g_i\};c)=-\int dc\,P_{TF}(c)\log_2P_{TF}(c)+\frac{1}{2}\int dc\,P_{TF}(c)\log_2\!\left[\frac{1}{2\pi e}\sum_{i=1}^{M}\frac{1}{\sigma_i^2(c)}\left(\frac{d\bar g_i(c)}{dc}\right)^2\right]. \tag{15}$$

The small noise approximation is not just a theorist’s convenience. A variety of experiments show that fluctuations in gene expression level can be 10%–25% of the mean [5,9,10,13,14,17]. As noted above, maximizing information transmission requires matching the distribution of input signals to the structure of the input/output relations and noise, and in applying these conditions to a real regulatory element in the fruit fly embryo it was shown that the (analytically accessible) small noise approximation gives results which are in semiquantitative agreement with the (numerical) exact solutions [42]. Thus, although it would be interesting to explore the quantitative deviations from the small noise limit, we believe that this approximation is a good guide to the structure of the full problem.
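
Equation (15) is also straightforward to evaluate numerically. The sketch below (our own illustration, not the authors’ code) computes the two terms by quadrature for a hypothetical single gene; the uniform input distribution, the Hill-type mean response, and the constant 10% noise level are all illustrative assumptions, not values taken from the paper.

```python
import numpy as np
from scipy.integrate import trapezoid

# Sketch: evaluate Eq. (15) by quadrature for a single gene (M = 1).
# The uniform P_TF, Hill response, and constant 10% noise below are
# assumptions made for illustration only.
def info_small_noise(c, p_tf, gbar, sigma):
    """Mutual information (bits) in the small noise limit, Eq. (15)."""
    p_tf = p_tf / trapezoid(p_tf, c)               # normalize the input distribution
    dg_dc = np.gradient(gbar, c)                   # d gbar / dc
    entropy = -trapezoid(p_tf * np.log2(p_tf), c)  # first term of Eq. (15)
    equivocation = 0.5 * trapezoid(
        p_tf * np.log2(dg_dc**2 / (2 * np.pi * np.e * sigma**2)), c)
    return entropy + equivocation

c = np.linspace(1e-3, 1.0, 2000)       # input concentrations (arbitrary units)
gbar = c**2 / (c**2 + 0.5**2)          # Hill activator, n = 2, K = 0.5 (assumed)
sigma = 0.1 * np.ones_like(c)          # constant 10% output noise (assumed)
p_tf = np.ones_like(c)                 # uniform input distribution (assumed)
print(f"I = {info_small_noise(c, p_tf, gbar, sigma):.2f} bits")
```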

To proceed, Eq. (15) for the information in the small noise limit instructs us to compute the mean response ḡi(c) and the noise σi(c) for every regulated gene. Since the properties of noise in gene expression determine to a large extent the structure of optimal solutions, we present in Sec. II B a detailed description of these noise sources. In Sec. II C we then introduce the “cost of coding,” measured by the number of signaling molecules that the cell has to spend to transmit information reliably. Finally, we look for optimal solutions in Sec. III.

B. Input/output relations and noise

Transcription factors act by binding to DNA near the point at which the “reading” of a gene begins, and either enhancing or inhibiting the process of transcription into mRNA. In bacteria, a simple geometrical view of this process seems close to correct and one can try to make a detailed model of the energies for binding of the transcription factor(s) and the interaction of these bound factors with the transcriptional apparatus, RNA polymerase in particular [49,50]. For eukaryotes the physical picture is less clear, so we proceed phenomenologically. If binding of the transcription factor activates the expression of gene i, we write

$$\bar g_i(c)=\frac{c^{n_i}}{c^{n_i}+K_i^{n_i}}, \tag{16}$$

and similarly if the transcription factor represses expression we write

$$\bar g_i(c)=\frac{K_i^{n_i}}{c^{n_i}+K_i^{n_i}}. \tag{17}$$

These are smooth monotonic functions that interpolate between roughly linear response (n=1 and large K) and steep, switchlike behavior (n → ∞) at some threshold concentration (c=K). Such “Hill functions” often are used to describe the cooperative binding of n molecules to their target sites [51], with F=−kBT ln K the free energy of binding per molecule, and this is a useful intuition even if it is not correct in detail.
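
In code, these input/output relations are one-liners; the sketch below is a direct transcription of Eqs. (16) and (17), with the concentration grid and parameter values chosen arbitrarily for illustration.

```python
import numpy as np

# Hill input/output relations of Eqs. (16) and (17), transcribed directly;
# c and K must be in the same (arbitrary) concentration units.
def hill_activator(c, K, n):
    return c**n / (c**n + K**n)

def hill_repressor(c, K, n):
    return K**n / (c**n + K**n)

c = np.linspace(0.0, 2.0, 5)
print(hill_activator(c, K=1.0, n=2))   # rises from 0 toward 1
print(hill_repressor(c, K=1.0, n=2))   # falls from 1 toward 0
```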

To complete our formulation of the problem we need to understand the noise or fluctuations in expression level at fixed inputs as summarized by variances σi2. There are several contributions to the variance, which we can divide into two broad categories, as in Fig. 1.

FIG. 1. Input proteins at concentration c act as transcription factors for the expression of output proteins, g. The diffusive noise in transcription factor concentration and the shot noise at the output both contribute to stochastic gene expression. The regulation process is described using a conditional probability distribution of the output given the input, P(g|c), which can be modeled as a Gaussian process with variance σg²(c). In this paper we consider the case of multiple output genes {gi}, i=1,…,M, each of which is independently regulated by the process illustrated here, with corresponding noise σi².

The transcription of mRNA and its translation into protein can be thought of as the “output” side of the regulatory apparatus. Ultimately these processes are composed of individual molecular events, and so there should be shot noise from the inherent randomness of these events. This suggests that there will be an output noise variance proportional to the mean, σi,out² ∝ ḡi.

The arrival of transcription factor molecules can be thought of as the “input” side of the apparatus and again there should be noise associated with the randomness in this arrival. This noise is equivalent to a fluctuation in the input concentration itself; the variance in concentration should again be proportional to the mean and the impact of this noise needs to be propagated through the input/output relation so that σi,in² ∝ c(dḡi/dc)².

Putting together the input and output noise, we have

$$\sigma_i^2(c)=a\,\bar g_i(c)+b\,c\left(\frac{d\bar g_i(c)}{dc}\right)^2, \tag{18}$$

where a and b are constants. Comparing this intuitive estimate to more detailed calculations [21,29] allows us to interpret these constants. If ḡi is normalized so that its maximum value is 1, then a=1/Nmax, where Nmax is the maximum number of independent molecules that are made from gene i. If, for example, each mRNA molecule generates many proteins during its lifetime, then (if the synthesis of mRNA is limited by a single kinetic step) Nmax is the maximum number of mRNAs, as discussed in Refs. [20,22,29].

The shot noise in the arrival of transcription factors at their targets ultimately arises from diffusion of these molecules. Analysis of the coupling between diffusion and the events that occur at the binding site [21,26,28] shows that the total input noise has both a term ∝ c(dḡi/dc)² and additional terms that can be made small by adjusting the parameters describing kinetics of steps that occur after the molecules arrive at their target; here we assume that nature chooses parameters which make these nonfundamental noise sources negligible [52]. In the remaining term, we have b ~ 1/(Dℓτ), where D is the diffusion constant of the transcription factor, ℓ is the size of its target on the DNA, and τ is the time over which signals are integrated in establishing the steady state.

With the (semi)microscopic interpretation of the parameters, we can write

$$\sigma_i^2(c)=\frac{1}{N_{\max}}\left[\bar g_i(c)+c\,c_0\left(\frac{d\bar g_i(c)}{dc}\right)^2\right], \tag{19}$$

where there is a natural scale of concentration,

$$c_0=\frac{N_{\max}}{D\ell\tau}. \tag{20}$$

To get a rough feeling for this scale, we note that diffusion constants for proteins in the cytoplasm are ~µm²/s [16,53–55], target sizes are measured in nanometers, and integration times are minutes or hundreds of seconds (although there are few direct measurements). The maximum number of independent molecules depends on the character of the target genes. In many cases of interest, these are also transcription factors, in which case a number of experiments suggest that Nmax ~ 10–100 [12,22,29]. Putting these numbers together, we have c0 ~ 10–100 molecules/µm³, or ~15–150 nM, although this (obviously) is just an order of magnitude estimate.
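
To make the arithmetic explicit, here is a short worked version of this estimate; the particular values of D, ℓ, and τ, and the nM conversion factor, are round numbers we chose from the ranges quoted above, not measurements.

```python
# Worked order-of-magnitude estimate of c0 = N_max/(D * l * tau), Eq. (20).
# D, l, tau below are assumed round numbers from the ranges in the text.
D   = 1.0      # diffusion constant of the TF, um^2/s
ell = 3e-3     # linear size of the DNA target site, um (a few nm)
tau = 300.0    # integration time, s (minutes)

for n_max in (10, 100):
    c0 = n_max / (D * ell * tau)   # molecules per cubic micron
    c0_nM = c0 / 0.602             # 1 nM ~ 0.602 molecules/um^3
    print(f"N_max = {n_max:3d}: c0 ~ {c0:5.0f} um^-3 ~ {c0_nM:5.0f} nM")
```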

To summarize, two rather general forms of noise limit the information transmission in genetic regulatory networks. Both combine additively and ultimately trace their origin to a finite (and possibly small) number of signaling molecules. The input noise is caused by a small concentration of transcription factor molecules and its effect on the regulated gene is additionally modulated by the input–output relation. The output noise is caused by the small number of gene products and this noise is simply proportional to the mean. It is reasonable to believe that the strengths of these two noise sources, in appropriate units, will be of comparable magnitude. Since the organism has to pay a metabolic price to reduce either noise source, it would be wasting resources if it were to lower the strength of one source alone far below the limiting effect of the other.

C. Constraining means or maxima

To proceed, we need to decide how the problem of maximizing information transmission will be constrained. One possibility is that we fix the maximum number of molecules at the input and the output. The constraint on the output can be implemented by measuring the expression levels in units such that the largest values of the mean expression levels ḡi are all equal to 1 [56]. On the input side, we restrict the range of c to be c ∈ [0, cmax]. With this normalization and limits on the c integrals, we can maximize I({gi};c) directly by varying the distribution of inputs, adding only a Lagrange multiplier to fix the normalization of PTF(c),

$$\frac{\delta}{\delta P_{TF}(c)}\left[I(\{g_i\};c)-\lambda\int dc\,P_{TF}(c)\right]=0. \tag{21}$$

As discussed in Ref. [42], the solution to the variational problem defined in Eq. (21) is

$$P_{TF}^*(c)=\frac{1}{Z_1}\,\frac{1}{\sqrt{2\pi e}}\,\frac{1}{\sigma_c(c)} \tag{22}$$
$$=\frac{1}{Z_1}\left[\frac{1}{2\pi e}\sum_{i=1}^{M}\frac{1}{\sigma_i^2(c)}\left(\frac{d\bar g_i(c)}{dc}\right)^2\right]^{1/2}, \tag{23}$$

where the normalization constant Z1 is given by

$$Z_1=\int_0^{c_{\max}}dc\left[\frac{1}{2\pi e}\sum_{i=1}^{M}\frac{1}{\sigma_i^2(c)}\left(\frac{d\bar g_i(c)}{dc}\right)^2\right]^{1/2}. \tag{24}$$

The information transmission with this optimal choice of PTF(c) takes a simple form,

$$I_1^*=\log_2Z_1. \tag{25}$$

The expression for Z1, and hence the optimal information transmission, has a simple geometric interpretation. As the concentration of the input transcription factor varies, the output moves, on average, along a trajectory in the M-dimensional space of expression levels; this trajectory is defined by {ḡi(c)}. Nearby points along this trajectory cannot really be distinguished because of noise; the information transmission should be related to the number of distinguishable points. If the noise level were the same everywhere, this count of distinguishable states would be just the length of the trajectory in units where the standard deviation of the output fluctuations, projected along the trajectory, is 1. Since the noise is not uniform, we should introduce the local noise level into our metric for measuring distances in the space of expression levels and this is exactly what we see in Eq. (24). Thus, we can think of the optimal information transmission as being determined by the length of the path in expression space that the network traces as the input concentration varies, where length is measured with a metric determined by the noise level.

This information capacity still depends upon the input/output relations and the noise levels, so we have a second layer of optimization that we can perform. Before doing this, however, we consider another formulation of the constraints.

As an alternative to fixing the maximum concentration of input transcription factor molecules, we consider fixing the mean concentration. To do this, we introduce, as usual, a second Lagrange multiplier α, so that our optimization problem becomes

$$\frac{\delta}{\delta P_{TF}(c)}\left[I(\{g_i\};c)-\lambda\int dc\,P_{TF}(c)-\alpha\int dc\,P_{TF}(c)\,c\right]=0. \tag{26}$$

Notice that we can also think of this as maximizing information transmission in the presence of some fixed cost per input molecule.

Solving Eq. (26) for the distribution of inputs, PTF(c), we find

$$P_{TF}^*(c)=\frac{1}{Z_2}\left[\frac{1}{2\pi e}\sum_{i=1}^{M}\frac{1}{\sigma_i^2(c)}\left(\frac{d\bar g_i(c)}{dc}\right)^2\right]^{1/2}e^{-\alpha c}, \tag{27}$$

where

$$Z_2=\int_0^{\infty}dc\left[\frac{1}{2\pi e}\sum_{i=1}^{M}\frac{1}{\sigma_i^2(c)}\left(\frac{d\bar g_i(c)}{dc}\right)^2\right]^{1/2}e^{-\alpha c}. \tag{28}$$

As usual in such problems we need to adjust the Lagrange multipliers to match the constraints, which is equivalent to solving

$$-\frac{\partial\ln Z_2}{\partial\alpha}=\langle c\rangle. \tag{29}$$

The optimal information transmission in this case is

$$I_2^*=\log_2Z_2+\alpha\langle c\rangle. \tag{30}$$
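
A minimal numerical sketch of this constrained-mean formulation for a single Hill activator (our own illustration; the values of K, n, and the target ⟨c⟩ are arbitrary choices, with concentrations in units of c0): tabulate the integrand of Eq. (28), tune α by root finding so that Eq. (29) holds, and read off Eq. (30).

```python
import numpy as np
from scipy.integrate import trapezoid
from scipy.optimize import brentq

# Sketch of Eqs. (26)-(30) for one Hill activator; concentrations are in
# units of c0, and K, n, and the target <c> are arbitrary assumed values.
K, n = 0.5, 2.0
c = np.linspace(1e-4, 50.0, 200001)
gbar  = c**n / (c**n + K**n)
dg_dc = n * K**n * c**(n - 1) / (c**n + K**n) ** 2
w = np.sqrt(dg_dc**2 / (gbar + c * dg_dc**2))   # Z2 integrand without exp(-alpha c)

def mean_c(alpha):
    """<c> under P*(c) of Eq. (27); equivalent to Eq. (29)."""
    p = w * np.exp(-alpha * c)
    return trapezoid(c * p, c) / trapezoid(p, c)

target = 0.3                                     # desired <c>/c0 (assumed)
alpha = brentq(lambda a: mean_c(a) - target, 1e-3, 100.0)
Z2_tilde = trapezoid(w * np.exp(-alpha * c), c)
# Eq. (30), with alpha (in nats here) converted to bits; the additive
# (1/2) log2(N_max / 2 pi e) offset is omitted.
I2 = np.log2(Z2_tilde) + alpha * target * np.log2(np.e)
print(f"alpha = {alpha:.2f}, relative I2* = {I2:.2f} bits")
```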

One might think that, for symmetry’s sake, we should consider a formulation in which the mean number of output molecules also is constrained. There is some subtlety to this, since if we know the input/output functions, {ḡi(c)}, and the distribution of inputs, PTF(c), then the mean output levels are determined. Thus, it is not obvious that we have the freedom to adjust the mean output levels. We return to this point in Sec. III C.

III. ONE INPUT, ONE OUTPUT

To get a feeling for the structure of our optimization problem, we consider the case where the transcription factor regulates the expression level of just one gene. If we constrain the maximum concentrations at the input and output, then the information capacity is set by I=log2 Z1 [Eq. (25)]; substituting our explicit expression for the noise [Eq. (19)] we have

$$Z_1=\int_0^{c_{\max}}dc\left[\frac{N_{\max}}{2\pi e}\,\frac{[d\bar g(c)/dc]^2}{\bar g(c)+c\,c_0\,[d\bar g(c)/dc]^2}\right]^{1/2}. \tag{31}$$

The first point to note is that if the natural scale of concentration, c0, is either very large or very small, then the optimization problem loses all of its structure. Specifically, in these two limits we have

$$Z_1(c_0\to\infty)=\left[\frac{D\ell\tau}{2\pi e}\right]^{1/2}\int_0^{c_{\max}}\frac{dc}{\sqrt{c}} \tag{32}$$
$$=\left[\frac{2D\ell\tau\,c_{\max}}{\pi e}\right]^{1/2}, \tag{33}$$

and

$$Z_1(c_0\to0)=\left[\frac{N_{\max}}{2\pi e}\right]^{1/2}\int_0^{c_{\max}}\frac{dc}{\sqrt{\bar g(c)}}\left|\frac{d\bar g(c)}{dc}\right| \tag{34}$$
$$=\left[\frac{2N_{\max}}{\pi e}\right]^{1/2}\left|\sqrt{\bar g(c_{\max})}-\sqrt{\bar g(0)}\right|. \tag{35}$$

In both cases, the magnitude of the information capacity becomes independent of the shape of the input/output relation ḡ(c). Thus, the possibility that real input/output relations are determined by the optimization of information transmission depends on the scale c0 being comparable to the range of concentrations actually used in real cells. Although we have only a rough estimate of c0 ~ 15–150 nM, Table I shows that this is the case.

TABLE I.

Concentration scales for transcription factors. We collect absolute concentration measurements on transcription factors from several different systems, sometimes indicating the maximum observed concentration and in other cases the concentration that achieves half-maximal activation or repression (midpoint). Bcd is the bicoid protein, a transcription factor involved in early embryonic pattern formation; GAGA is a transcription factor in Drosophila; crp is a transcription factor that acts on a wide range of metabolic genes in bacteria; lac is the well studied operon that encodes proteins needed for lactose metabolism in E. coli; the lac repressor (LacI) is the transcription factor that represses expression of the lac operon; O1–O3 are binding sites for the lac repressor.

Concentration     Scale      System                         Ref.
55 ± 10 nM        Midpoint   λ repressor in E. coli         [10]
55 ± 3 nM         Maximum    Bcd in Drosophila embryo       [17]
5.3 ± 0.7 nM      Midpoint   GAGA                           [57]
~5 nM             Midpoint   crp to lac site                [50]
~0.2 nM           Midpoint   lac repressor to O1            [50,58]
~3 nM             Midpoint   lac repressor to O2            [50,58]
~110 nM           Midpoint   lac repressor to O3            [50,58]
22 ± 3 nM         Midpoint   lac repressor to O1 in vitro   [59]

A. Numerical results with cmax

To proceed, we choose c0 as the unit of concentration, so that

$$Z_1=\left[\frac{N_{\max}}{2\pi e}\right]^{1/2}\tilde Z_1, \tag{36}$$
$$\tilde Z_1(K/c_0,n;C)=\int_0^Cdx\left[\frac{[d\bar g(x)/dx]^2}{\bar g(x)+x\,[d\bar g(x)/dx]^2}\right]^{1/2}, \tag{37}$$

where C=cmax/c0 and

$$\bar g(x)=\frac{x^n}{(K/c_0)^n+x^n} \tag{38}$$

in the case of an activator. It now is straightforward to explore, numerically, the function Z̃1. An example, with cmax/c0=1, is shown in Fig. 2.
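
As an illustration of this numerical exploration (our own reimplementation, with grid resolutions chosen arbitrarily, not the authors’ code), one can tabulate Eq. (37) over a grid in K/c0 and n and locate the maximum; with C = 1 the result should land near the broad optimum visible in Fig. 2.

```python
import numpy as np
from scipy.integrate import trapezoid

# Reimplementation sketch of the scan behind Fig. 2: evaluate
# Z1-tilde(K/c0, n; C) of Eq. (37) on a grid and locate its maximum.
C = 1.0
x = np.linspace(1e-6, C, 10001)          # input concentration in units of c0

def z1_tilde(K, n):                      # here K stands for K/c0
    g  = x**n / (K**n + x**n)            # Hill activator, Eq. (38)
    dg = n * K**n * x**(n - 1) / (K**n + x**n) ** 2
    return trapezoid(np.sqrt(dg**2 / (g + x * dg**2)), x)

Ks = np.linspace(0.05, 1.0, 40)          # grid resolutions are arbitrary choices
ns = np.linspace(0.5, 5.0, 46)
Z = np.array([[z1_tilde(K, n) for n in ns] for K in Ks])
iK, iN = np.unravel_index(np.argmax(Z), Z.shape)
print(f"K_opt ~ {Ks[iK]:.2f} c0, n_opt ~ {ns[iN]:.1f}, "
      f"log2 Z1-tilde = {np.log2(Z[iK, iN]):.2f}")
```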

FIG. 2. (Color online) Information capacity for one (activator) input and one output. The information is I = log2 Z̃1 + A, with A independent of the parameters; the map shows Z̃1 as computed from Eq. (37), here with C ≡ cmax/c0=1. We see that there is a broad optimum with cooperativity nopt=1.86 and Kopt=0.48c0=0.48cmax.

We see that, with cmax=c0, there is a well defined but broad optimum of the information transmission as a function of the parameters K and n describing the input/output relation. Maximum information transmission occurs at modest levels of cooperativity (n ≈ 2) and with the midpoint of the input/output relation near the midpoint of the available dynamic range of input concentrations (Kcmax/2).

Optimal solutions for activators and repressors have qualitatively similar behaviors, with the optimal parameters Kopt and nopt both increasing as cmax increases [Fig. 3(a)]. Interestingly, at the same value of cmax, the optimal repressors make fuller use of the dynamic range of outputs. The information capacity itself, however, is almost identical for activators and repressors across a wide range of cmax [Fig. 3(c)]. This is important, because it shows that our optimization problem, even in this simplest form, can have multiple nearly degenerate solutions. We also see that increases in cmax far beyond c0 produce a rapidly saturating information capacity, as expected from Eq. (35). Therefore, although increasing the dynamic range always results in an increase in capacity, the advantage in terms of information capacity gained by the cell being able to use input concentration regimes much larger than c0 is quite small.

FIG. 3. (Color online) The optimal solutions for one gene controlled by one transcription factor. The optimization of information transmission in the small noise limit depends on only one parameter, which we take here as the maximum concentration of the input molecules, measured in units determined by the noise itself [c0 from Eq. (20)]. Panel (a) shows the optimal input/output relations with cmax/c0=0.3, 1, 3, 10, 30, 100, 300; activators shown in blue (solid line), repressors in green (dashed line). Although the input/output relation is defined for all c, we show here only the part of the dynamic range that is accessed when 0<c<cmax. Panel (b) shows the optimal distributions, PTF*(c), for each of these solutions. Panel (c) plots log2 Z̃1 for these optimal solutions as a function of cmax/c0. Up to an additive constant, this is the optimal information capacity, in bits.

B. Some analytic results

Although the numerical results are straightforward, we would like to have some intuition about these optimal solutions from analytic approximations. Our basic problem is to do the integral defining Z̃1 in Eq. (37). We know that this integral becomes simple in the limit that C is either large or small, so let us start by trying to generate an approximation that will be valid at large C.

At large C, the concentration of input molecules can become large, so we expect that the output noise contribution, ∝ ḡ, will be dominant. This suggests that we write

$$\tilde Z_1=\int_0^Cdx\left\{\frac{[d\bar g(x)/dx]^2}{\bar g(x)+x[d\bar g(x)/dx]^2}\right\}^{1/2}\approx\int_0^Cdx\left|\frac{d\bar g(x)}{dx}\right|\frac{1}{\sqrt{\bar g(x)}}\left[1-\frac{x}{2}\frac{1}{\bar g(x)}\left(\frac{d\bar g(x)}{dx}\right)^2+\cdots\right]. \tag{39}$$

To proceed, we note the combination dx |dḡ/dx|, which invites us to convert this into an integral over ḡ. We use the fact that, for activators described by the Hill function in Eq. (38),

$$x=\frac{K}{c_0}\left(\frac{\bar g}{1-\bar g}\right)^{1/n}, \tag{40}$$
$$\frac{d\bar g(x)}{dx}=\frac{n}{x}\,\bar g(1-\bar g). \tag{41}$$

Substituting, we find

$$\tilde Z_1\approx\int_0^{\bar g(C)}\frac{d\bar g}{\sqrt{\bar g}}\left[1-\frac{c_0n^2}{2K}\,\bar g^{1-1/n}(1-\bar g)^{2+1/n}+\cdots\right] \tag{42}$$
$$=2\sqrt{\bar g(C)}-\frac{c_0n^2}{2K}\int_0^{\bar g(C)}d\bar g\,\bar g^{1/2-1/n}(1-\bar g)^{2+1/n}+\cdots. \tag{43}$$

Again, we are interested in large C, so we can approximate ḡ(C) ≈ 1 − (K/cmax)ⁿ. Similarly, the second term in Eq. (43) can be approximated by letting the upper limit on the integral approach 1; the difference between ḡ(C) and 1 generates higher order terms in powers of 1/C. Thus, we have

$$\tilde Z_1^{\rm act}\approx2-\left(\frac{K}{c_{\max}}\right)^{n}-A(n)\,\frac{c_0n^2}{2K}+\cdots, \tag{44}$$
$$A(n)=\int_0^1dz\,z^{1/2-1/n}(1-z)^{2+1/n} \tag{45}$$
$$=\frac{\Gamma(3/2-1/n)\,\Gamma(3+1/n)}{\Gamma(9/2)}. \tag{46}$$

The approximate expression for Z̃1 expresses the basic compromise involved in optimizing information transmission. On the one hand, we would like K to be small so that the output runs through its full dynamic range; correspondingly, we want to decrease the term (K/cmax)ⁿ. On the other hand, we want to move the most sensitive part of the input/output relation to higher concentrations, so that we are less sensitive to the input noise; this corresponds to decreasing the term ∝ c0/K. The optimal compromise is reached at

$$K_{\rm opt}^{\rm act}\approx c_{\max}\left[\frac{nA(n)c_0}{2c_{\max}}\right]^{1/(n+1)}. \tag{47}$$

Parallel arguments yield, for repressors,

$$\tilde Z_1^{\rm rep}\approx2-2\left(\frac{K}{c_{\max}}\right)^{n/2}-B(n)\,\frac{c_0n^2}{2K}+\cdots, \tag{48}$$
$$K_{\rm opt}^{\rm rep}\approx c_{\max}\left[\frac{nB(n)c_0}{2c_{\max}}\right]^{2/(n+2)}, \tag{49}$$
$$B(n)=\int_0^1dz\,z^{1/2+1/n}(1-z)^{2-1/n} \tag{50}$$
$$=\frac{\Gamma(3/2+1/n)\,\Gamma(3-1/n)}{\Gamma(9/2)}. \tag{51}$$
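
These closed forms are easy to evaluate; the sketch below computes A(n) and B(n) from Eqs. (46) and (51) and the corresponding Kopt of Eqs. (47) and (49), for an assumed n = 2 and a few values of cmax in units of c0 (both choices are ours, for illustration).

```python
import numpy as np
from scipy.special import gamma

# Evaluate the large-c_max approximations, Eqs. (46)-(51); the Hill
# coefficient n = 2 and the c_max values are assumed for illustration.
def A(n):                                   # Eq. (46)
    return gamma(1.5 - 1/n) * gamma(3 + 1/n) / gamma(4.5)

def B(n):                                   # Eq. (51)
    return gamma(1.5 + 1/n) * gamma(3 - 1/n) / gamma(4.5)

n = 2.0
for c_max in (3.0, 10.0, 30.0, 100.0):      # in units of c0
    k_act = c_max * (n * A(n) / (2 * c_max)) ** (1 / (n + 1))   # Eq. (47)
    k_rep = c_max * (n * B(n) / (2 * c_max)) ** (2 / (n + 2))   # Eq. (49)
    print(f"c_max/c0 = {c_max:5.0f}: K_act ~ {k_act:5.2f} c0, "
          f"K_rep ~ {k_rep:5.2f} c0, K_act/K_rep = {k_act/k_rep:.1f}")
```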

The first thing we notice about our approximate results is that the optimal values of K are almost proportional to cmax, as one might expect, but not quite—the growth of K with cmax is slightly sublinear. Also, one might have expected that K would be chosen to divide the available dynamic range into roughly equal “on” and “off” regions, which should maximize the entropy of the output and hence increase the capacity; to achieve this requires Kopt/cmax ≈ 1/2. In fact we see that the ratio Kopt/cmax is determined by a combination of terms and depends in an essential way on the scale of the input noise c0, even though we assume that the maximal concentration is large compared with this scale.

The basic compromise between extending the dynamic range of the outputs and avoiding low input concentrations works differently for activators and repressors. As a result, the optimal values of K are different in the two cases. From Eq. (37), it is clear that the symmetry between the two types of regulation is broken by the noise term proportional to ḡ. Unless the optimal Hill coefficient for repressors were very much smaller than for activators (and it is not), Eqs. (47) and (49) predict that Kopt^rep will be smaller than Kopt^act, in agreement with the numerical results in Fig. 3.

To test these analytic approximations, we can compare the predicted values of Kopt with those found numerically. There is a slight subtlety, since our analytic results for Kopt depend on the Hill coefficient n. We can take this coefficient as known from the numerical optimization or we can use the approximations to Z̃1 [as in Eq. (44)] to simultaneously optimize for K and n. In contrast to the optimization of K, however, there is no simple formula for nopt, even in our approximation at large cmax.

Results for the approximate vs numerically exact optimal K are shown in Fig. 4. As it should, the approximation approaches the exact answer as cmax becomes large. In fact, the approximation is quite good even at cmax/c0 ~ 10, and for activators the error in Kopt is only ~15% at cmax/c0 ~ 3. Across the full range of cmax/c0 > 1, the analytic approximation captures the basic trends: Kopt/cmax is a slowly decreasing function of cmax/c0, Koptact is larger than Koptrep by roughly a factor of 2, and for both activators and repressors we have Kopt noticeably smaller than cmax/2. Similarly good results are obtained for the approximate predictions of the optimal Hill coefficient n as shown in Fig. 4(b).

FIG. 4. Approximate results for the optimal values of K (a) and n (b) compared with exact numerical results for activators (black lines) and repressors (gray lines). As explained in the text, we can use our analytic approximations to determine, for example, the optimal K assuming n is known (large cmax with known n results) or we can simultaneously optimize both parameters (large cmax results); results are shown for both calculations.

As noted above, the large cmax approximation makes clear that optimizing information transmission is a compromise between using the full dynamic range of outputs and avoiding expression levels associated with large noise at low concentration of the input. The constraint of using the full dynamic range pushes the optimal K downward; this constraint is stronger for repressors [compare the second terms of Eqs. (44) and (48)] causing the optimal Ks of repressors to be smaller than those of the activators. On the other hand, avoiding input noise pushes the most sensitive part of the expression profile toward high concentrations favoring large K. The fact that this approximation captures the basic structure of the numerical solution to the optimization problem encourages us to think that this intuitive compromise is the essence of the problem. It is also worth noting that as cmax increases, activators increase their output range, hence gaining capacity. On the other hand, the output of the repressed systems is small for large cmax and the output noise thus is large, limiting the increase in capacity compared to the activated genes, as is seen in Fig. 3(c).

In the case of small cmax it is harder to obtain detailed expressions for K; however, we can still gain insight from the expression for the capacity in this limit. To obtain the large cmax limit we assumed that ḡ(x) dominates the term x(dḡ/dx)² in the denominator of the integrand which defines Z̃1; to obtain the small cmax limit we make the opposite assumption:

$$\tilde Z_1=\int_0^Cdx\left\{\frac{[d\bar g(x)/dx]^2}{\bar g(x)+x[d\bar g(x)/dx]^2}\right\}^{1/2}=\int_0^C\frac{dx}{\sqrt{x}}\left[\frac{1}{1+\bar g(x)/\{x[d\bar g(x)/dx]^2\}}\right]^{1/2}\approx\int_0^C\frac{dx}{\sqrt{x}}\left[1-\frac{x}{2n^2}\,\frac{1}{\bar g(1-\bar g)^2}+\cdots\right], \tag{52}$$

where in the last step we use the relation in Eq. (41). We see that, if ḡ approaches 1, the first correction term will diverge. This allows us to predict the essential feature of the optimal solutions at small cmax, namely, that they do not access the full dynamic range of outputs.

C. Constraining means

Here we would like to solve the same optimization problem by constraining the mean concentrations rather than imposing a hard constraint on the maximal concentrations; as noted above we can also think of this problem as maximizing information subject to a fixed cost per molecule. To compare results in a meaningful way, we should know how the mean concentration varies as a function of cmax when we solve the problem with constrained maxima and this is shown in Fig. 5(a). An interesting feature of these results is that mean concentrations are much less than half of the maximal concentration. Also, the mean input concentrations for activator and repressor systems are similar despite different values of the optimal K. This result shows that for a given dynamic range defined by cmax, there is an optimal mean input concentration, which is independent of whether the input/output relation is up or down regulating.

FIG. 5. (a) Mean concentration of the transcription factor when we optimize information transmission subject to a constraint on the maximum concentration. Results are shown for one input and one output, both for activators and repressors. The dashed black line shows equality. (b)–(d) Comparing two formulations of the optimization problem for activators (black lines) and repressors (gray lines) calculated with a finite dynamic range (cmax—circles and solid lines) and constrained means (crosses and dashed lines). The panels show the relative information in panel (b), the optimal value of K in panel (c), and the optimal value of the Hill coefficient in panel (d). In panel (c), approximate results for K are shown as a function of 〈c〉 from Eqs. (56) and (58).

Equation (28) shows us how to compute the partition function Z2 for the case where we constrain the mean concentration of transcription factors and Eq. (30) relates this to the information capacity I2. Substituting our explicit expressions for the noise in the case of one input and one output, we have

$$Z_2=\left[\frac{N_{\max}}{2\pi e}\right]^{1/2}\tilde Z_2, \tag{53}$$
$$\tilde Z_2=\int_0^{\infty}dc\left\{\frac{[d\bar g(c)/dc]^2}{\bar g(c)+c\,c_0\,[d\bar g(c)/dc]^2}\right\}^{1/2}e^{-\alpha c}. \tag{54}$$

As before, we choose Hill functions for ḡ(c) and maximize I2 with respect to the parameters K and n. This defines a family of optimal solutions parametrized by the Lagrange multiplier α and we can tune this parameter to match the mean concentration 〈c〉. Using the calibration in Fig. 5(a), we can compare these results with those obtained by optimizing with a fixed maximum concentration. Results are shown in Figs. 5(b)–5(d).

The most important conclusion from Fig. 5 is that constraining mean concentrations and constraining maximal concentrations give—for this simple problem of one input and one output—essentially the same answer. The values of the optimal Ks are almost identical [Fig. 5(c)], as are the actual number of bits that can be transmitted [Fig. 5(b)]. The only systematic difference is in the Hill coefficient n, where having a fixed maximal concentration drives the optimization toward slightly larger values of n [Fig. 5(d)] so that more of the dynamic range of outputs is accessed before the system runs up against the hard limit at c=cmax.

It is interesting that the optimal value of K is more nearly a linear function of 〈c〉 than of cmax, as we see in Fig. 5(c). To understand this, we follow the steps in Sec. III B, expanding the expression for 〈c〉 in the same approximation that we used for large cmax:

$$\langle c\rangle=\frac{\displaystyle\int_0^Cdc\,c\left\{\frac{[d\bar g(c)/dc]^2}{\bar g(c)+c\,c_0[d\bar g(c)/dc]^2}\right\}^{1/2}}{\displaystyle\int_0^Cdc\left\{\frac{[d\bar g(c)/dc]^2}{\bar g(c)+c\,c_0[d\bar g(c)/dc]^2}\right\}^{1/2}}\approx\frac{\displaystyle\int_{\bar g(0)}^{\bar g(C)}d\bar g\,\frac{c}{\sqrt{\bar g}}-\frac{1}{2}\int_{\bar g(0)}^{\bar g(C)}d\bar g\,n^2\sqrt{\bar g}\,(1-\bar g)^2}{\displaystyle\int_{\bar g(0)}^{\bar g(C)}d\bar g\,\frac{1}{\sqrt{\bar g}}-\frac{1}{2}\int_{\bar g(0)}^{\bar g(C)}d\bar g\,\frac{n^2}{c}\sqrt{\bar g}\,(1-\bar g)^2}. \tag{55}$$

In the case of an activator, c = (K/c0)[ḡ/(1−ḡ)]^{1/n} (concentrations measured in units of c0), and the leading terms become

$$\langle c\rangle=\frac{\displaystyle\int_{\bar g(0)}^{\bar g(C)}d\bar g\,\bar g^{1/n-1/2}(1-\bar g)^{-1/n}}{\displaystyle\int_{\bar g(0)}^{\bar g(C)}d\bar g\,\bar g^{-1/2}}\times\left[K+\frac{n^2}{2}\,\frac{\displaystyle\int_{\bar g(0)}^{\bar g(C)}d\bar g\,\bar g^{1/2-1/n}(1-\bar g)^{2+1/n}}{\displaystyle\int_{\bar g(0)}^{\bar g(C)}d\bar g\,\bar g^{-1/2}}+\cdots\right]-\frac{n^2}{2}\,\frac{\displaystyle\int_{\bar g(0)}^{\bar g(C)}d\bar g\,\bar g^{1/2}(1-\bar g)^2}{\displaystyle\int_{\bar g(0)}^{\bar g(C)}d\bar g\,\bar g^{-1/2}}. \tag{56}$$

To get some intuition for the numerical values of these terms we will assume the integral covers the whole expression range ḡ ∈ [0, 1], and n=3. Then this expression simplifies to

$$\langle c\rangle\approx0.86K+0.52, \tag{57}$$

so we understand how this simple result emerges, at least asymptotically at large cmax.

In the case of repressors the leading terms are

$$\langle c\rangle=\frac{\displaystyle\int_{\bar g(0)}^{\bar g(C)}d\bar g\,\bar g^{-1/n-1/2}(1-\bar g)^{1/n}}{\displaystyle\int_{\bar g(0)}^{\bar g(C)}d\bar g\,\bar g^{-1/2}}\times\left[K+\frac{n^2}{2}\,\frac{\displaystyle\int_{\bar g(0)}^{\bar g(C)}d\bar g\,\bar g^{1/2+1/n}(1-\bar g)^{2-1/n}}{\displaystyle\int_{\bar g(0)}^{\bar g(C)}d\bar g\,\bar g^{-1/2}}+\cdots\right]-\frac{n^2}{2}\,\frac{\displaystyle\int_{\bar g(0)}^{\bar g(C)}d\bar g\,\bar g^{1/2}(1-\bar g)^2}{\displaystyle\int_{\bar g(0)}^{\bar g(C)}d\bar g\,\bar g^{-1/2}}. \tag{58}$$

As in the case of the activator, making the rough approximation that n=3 and ḡ ∈ [0, 1] allows us to get some intuition for this large cmax result:

$$\langle c\rangle\approx2.8K+1.19. \tag{59}$$

These extremely crude estimates do predict the basic linear trends in Fig. 5(c), including the fact that for a given value of the mean concentration, the repressor has a smaller K than the activator.

Before leaving this section, we should return to the question of constraining mean outputs, as well as mean inputs. We have measured the input concentration in absolute units (or relative to the physical scale c0), so when we constrain the mean input we really are asking that the system use only a fixed mean number of molecules. In contrast, we have measured outputs in relative units, so that the maximum of (c) is 1. If we want to constrain the mean number of output molecules, we need to fix not 〈g〉, but rather Nmaxg〉, since the factor of Nmax brings us back to counting the molecules in absolute terms [60]. Thus, exploring constrained mean output requires us to view Nmax (and hence the scale c0) as an extra adjustable parameter.

By itself, adding Nmax as an additional optimization parameter makes our simple problem more complicated, but does not seem to add much insight. In principle it would allow us to discuss the relative information gain on adding extra input vs output molecules, with the idea that we might find optimal information transmission subject to some net resource constraint; for initial results in this direction see Ref. [41]. In networks with feedback, the target genes also act as transcription factors and these tradeoffs should be more interesting. We will return to this problem in subsequent papers.

IV. MULTIPLE OUTPUTS

When the single transcription factor at the input of our model system has multiple independent target genes, and we constrain the maximal concentrations, the general form of the information capacity in the small noise limit is given by Eq. (24),

$$Z_1=\int_0^{c_{\max}}dc\left\{\frac{1}{2\pi e}\sum_{i=1}^{M}\frac{1}{\sigma_i^2(c)}\left[\frac{d\bar g_i(c)}{dc}\right]^2\right\}^{1/2}=\left[\frac{N_{\max}}{2\pi e}\right]^{1/2}\int_0^{c_{\max}}dc\left\{\sum_{i=1}^{M}\frac{[d\bar g_i(c)/dc]^2}{\bar g_i(c)+c\,c_0\,[d\bar g_i(c)/dc]^2}\right\}^{1/2}, \tag{60}$$

where we assume for simplicity that the basic parameters Nmax and Dℓτ are the same for all the target genes. Once again, c0 = Nmax/(Dℓτ) provides a natural unit of concentration. We limit ourselves to an extended discussion of the case with a hard upper bound, cmax, to the dynamic range of the input. As in the case of a single output, the calculation with a constrained mean input concentration gives essentially the same results.

To get an initial feeling for the structure of the problem, we try the case of five target genes, all of which are activated by the transcription factor. Then,

$$\bar g_i(c)=\frac{c^{n_i}}{c^{n_i}+K_i^{n_i}}, \tag{61}$$

and we can search numerically for the optimal settings of all the parameters {Ki, ni}. Results are shown in Fig. 6. A striking feature of the problem is that, for small values of the maximal concentration C=cmax/c0, the optimal solution is actually to have all five target genes be completely redundant, with identical values of Ki and ni. As cmax increases, this redundancy is lifted and the optimal solution becomes a sequence of target genes with staggered activation curves, in effect “tiling” the input domain 0<c<cmax. To interpret these results, we realize that for small maximal concentration the input noise dominates and the optimal strategy for M genes is to “replicate” one well-placed gene M times: having M independent and redundant readouts (with identical K and n) of the input concentration will decrease the effective noise by a factor of √M. However, as the dynamic range increases and output noise has a chance to compete with the input noise, more information can be transmitted by using M genes to probe the input at different concentrations, thereby creating a cascade of genes that get activated at successively higher and higher input levels. The transition between these two readout strategies is described in more detail below.
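
A sketch of how such a search can be set up (our own illustrative reimplementation; the optimizer, restarts, and grids are our choices, not necessarily the authors’): maximize the logarithm of Eq. (60) over {Ki, ni} for M = 5 activators, with concentrations in units of c0.

```python
import numpy as np
from scipy.integrate import trapezoid
from scipy.optimize import minimize

# Illustrative reimplementation of the M-gene optimization behind Fig. 6;
# Nelder-Mead with random restarts is our arbitrary choice of method.
M, C = 5, 10.0
x = np.linspace(1e-6, C, 5001)               # concentration in units of c0

def neg_log2_Z1(p):
    K = np.clip(np.exp(p[:M]), 1e-3, 10 * C) # log-parametrize, clip for safety
    n = np.clip(np.exp(p[M:]), 0.2, 20.0)
    s = np.zeros_like(x)
    for Ki, ni in zip(K, n):
        g  = x**ni / (Ki**ni + x**ni)        # Eq. (61)
        dg = ni * Ki**ni * x**(ni - 1) / (Ki**ni + x**ni) ** 2
        s += dg**2 / (g + x * dg**2)         # one term of the sum in Eq. (60)
    return -np.log2(trapezoid(np.sqrt(s), x))

rng, best = np.random.default_rng(0), None
for _ in range(10):                          # random restarts
    p0 = np.concatenate([np.log(rng.uniform(0.1, C, M)),
                         np.log(rng.uniform(1.0, 4.0, M))])
    res = minimize(neg_log2_Z1, p0, method="Nelder-Mead",
                   options={"maxiter": 5000, "xatol": 1e-6, "fatol": 1e-9})
    if best is None or res.fun < best.fun:
        best = res
print("K_i/c0 at optimum:", np.sort(np.exp(best.x[:M])).round(2))
print("relative capacity (log2 Z1-tilde):", round(-best.fun, 3))
```

Rerunning this sketch at small C (say C = 0.3) should drive the Ki together, while at C = 10 they spread out, mirroring the redundancy-to-tiling transition described above.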

FIG. 6. (Color online) Optimal input/output relations for the case of five independent target genes activated by the TF at concentration c. Successive panels [(a)–(e)] correspond to different values of the maximal input concentration as indicated (C=0.3, 1, 3, 5, 10). Panel (f) summarizes the optimal values of the Ki as a function of C=cmax/c0: as C is increased, the Ki of the fully redundant input/output relations for C=0.3 bifurcate such that at C=10 the genes tile the whole input range.

To look more closely at the structure of the problem, we drop down to consider two target genes. Then there are three possibilities—two activators (AA), two repressors (RR), and one of each (AR). For each of these discrete choices, we have to optimize two exponents (n1, n2) and two half-maximal points (K1, K2). In Fig. 7 we show how Z̃1 varies in the (K1, K2) plane assuming that at every point we choose the optimum exponents; the different quadrants correspond to the different discrete choices of activator and repressor. The results show clearly how the redundant (K1=K2) solutions at low values of cmax bifurcate into asymmetric (K1≠K2) solutions at larger values of cmax; the critical value of cmax is different for activators and repressors. This bifurcation structure is summarized in Fig. 8, where we also see that, for each value of cmax, the three different kinds of solutions (AA, RR, and AR) achieve information capacities that differ by less than 0.1 bits.

FIG. 7. (Color online) The case of two target genes. The maps show contour plots of relative information (log2 Z̃1) as a function of the K values of the two genes: K1 and K2. In each map, the upper right quadrant (A-A) contains solutions where both genes are activated by a common TF, in the lower left quadrant (R-R) both genes are repressed, and the other two quadrants (A-R) contain an activator-repressor mix. The maximal concentration of the input molecules is fixed at cmax/c0=0.1 in map (a), at 0.5 in map (b), and at 1 in map (c). We see that, for example, only at the highest value of cmax does the two-activator solution in the upper right quadrant correspond to distinct values of K1 and K2; at lower values of cmax the optimum is along the “redundant” line K1=K2. The redundancy is lifted at lower values of cmax in the case of repressors, as we see in the lower left quadrants, and the mixed activator/repressor solutions are always asymmetric. At large cmax we also see that there are two distinct mixed solutions.

FIG. 8. The relative information for stable solutions for two genes as a function of cmax [panel (a)]. The inset shows the difference in information transmission for two activators and the mixed case relative to the two repressors. In panel (b), the optimal K1 and K2 are plotted as a function of cmax for two activators (squares) and two repressors (circles). The bifurcation in K is a continuous transition that happens at lower cmax in the case of two repressors.

The information capacity is an integral of the square root of a sum of terms, one for each target gene [Eq. (60)]. Thus, if we add redundant copies of a single gene, all with the same values of K and n, the integral Z1 will scale as √M, where M is the number of genes. In particular, as we go from one to two target genes, Z would increase by a factor of √2 and hence the information capacity, log2 Z, would increase by one half bit; more generally, with M redundant copies, we have (1/2)log2 M bits of extra information relative to having just one gene. On the other hand, if we could arrange for two target genes to make nonoverlapping contributions to the integral, then two genes could have a value of Z that is twice as large as for one gene, generating an extra bit rather than an extra half bit. In fact a full factor of 2 increase in Z is not achievable because once the two target genes are sampling different regions of concentration they are making different tradeoffs between the input and output noise terms; since the one gene had optimized this tradeoff, bifurcating into two distinguishable targets necessarily reduces the contribution from each target. Indeed, if the maximal concentration is too low then there is no “space” along the c axis to fit two distinct activation (or repression) curves, and this is why low values of cmax favor the redundant solutions.

Figure 9(a) shows explicitly that when we increase the number of target genes at low values of cmax, the optimal solution is to use the genes redundantly and hence the gain in information is (1/2)log2 M. At larger values of cmax, going from one target to two targets one can gain more than half a bit, but this gain is bounded by 1 bit, and indeed over the range of cmax that we explore here the full bit is never quite reached.

FIG. 9. The relative information for different values of cmax as a function of the number of genes, M, shown in panel (a). At low cmax the genes are redundant and so the capacity grows as (1/2)log2 M; at high cmax, the increase in capacity is larger but bounded from above by 1 bit. The differences in information for various combinations of activators and repressors are comparable to the size of the plot symbols. In panel (b), the relative information for different numbers of genes as a function of cmax. At higher M, the system can make better use of the input dynamic range.

We can take a different slice through the parameter space of the problem by holding the number of target genes fixed and varying cmax. With a single target gene, we have seen (Fig. 3) that the information capacity saturates rapidly as cmax is increased above c0. We might expect that, with multiple target genes, it is possible to make better use of the increased dynamic range and this is what we see in Fig. 9(b).

For a system with many target genes, it is illustrative to plot the optimal distribution of input levels, PTF*(c) ∝ σc⁻¹(c). Figure 10 shows the results for the case of M=2,3,…,9 genes at low (C=0.3) and high (C=30) input dynamic range. At low input dynamic range the distributions for various M collapse onto each other (because the genes are redundant), while at high C increasing the number of genes drives the optimal distribution closer to ∝ c⁻¹ᐟ². We recall that the input noise is σc ∝ √c, so this shows that, as the number of targets becomes large, the input noise becomes dominant over a wider and wider dynamic range.
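
The shape of PTF*(c) is easy to tabulate from Eq. (22) once the input/output relations are fixed. In the sketch below we hand-place moderately steep activators (log-spaced Ki and ni = 10 are our own illustrative choices, not optimized values) and estimate the local log-log slope of PTF*(c); as M grows, the slope should drift toward the −1/2 of the c⁻¹ᐟ² law.

```python
import numpy as np
from scipy.integrate import trapezoid

# Sketch: tabulate P_TF*(c) ~ 1/sigma_c(c), Eq. (22), for M hand-placed
# steep activators (K_i log-spaced, n_i = 10 -- illustrative choices,
# not optimized), and measure the local power law of the distribution.
C, n = 30.0, 10.0
x = np.linspace(1e-3, C, 200001)             # concentration in units of c0

def p_opt(Ks):
    s = np.zeros_like(x)
    for K in Ks:
        g  = x**n / (K**n + x**n)
        dg = n * K**n * x**(n - 1) / (K**n + x**n) ** 2
        s += dg**2 / (g + x * dg**2)         # = 1/sigma_c^2, cf. Eq. (13)
    p = np.sqrt(s)                           # P* ~ 1/sigma_c, Eq. (22)
    return p / trapezoid(p, x)

for M in (3, 5, 9):
    p = p_opt(np.geomspace(0.3, 15.0, M))
    mask = (x > 0.5) & (x < 10.0)            # fit inside the tiled region
    slope = np.polyfit(np.log(x[mask]), np.log(p[mask]), 1)[0]
    print(f"M = {M}: local slope of log P* vs log c ~ {slope:+.2f} (law: -0.50)")
```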

FIG. 10. (Color online) The optimal probability distribution of inputs, PTF*(c). In red (dotted line), plotted for C=0.3. In blue (solid line), plotted for C=30. Different lines correspond to solutions with 2, 3,…,9 genes. At low C (red dotted line), the genes are degenerate and the input distribution is independent of the number of genes. At high C (blue solid line), where the genes tile the concentration range, the optimal input distribution approaches (c/c0)⁻¹ᐟ² (dashed line) as the number of target genes increases.

Finally, one can ask how finely tuned the input/output relations for the particular genes need to be in a maximally informative system. To consider how the capacity of the system changes when the parameters of the input/output relations change slightly, we analyzed the (Hessian) matrix of second derivatives of the information with respect to fractional changes in the various parameters; we also made more explicit maps of the variations in information with respect to the individual parameters and sampled the variations in information that result from random variations in the parameters within some range. Results for a two gene system are illustrated in Fig. 11.

FIG. 11. (Color online) Parameter variations away from the optimum. Results are shown for a two gene system focusing on the solution with two activators. (a) Analysis of the Hessian matrix for cmax/c0=0.3, where the two genes are redundant. Top four panels show the variation in information (δI in bits) along each dimension of the parameter space (thick red line) and the quadratic approximation. (b) As in (a), but with cmax/c0=10, where the optimal solution is nonredundant. We also show the eigenvectors and eigenvalues of the Hessian matrix. (c) Distribution of information loss ΔI when the parameters K1 and K2 are chosen at random from a uniform distribution in ln K, with widths as shown; here cmax/c0=10. (d) As in (c), but for variations in the Hill coefficients n1 and n2.

The first point concerns the scale of the variations—20% changes in parameters away from the optimum result in only ~0.01 bits of information loss, and this is true both at low cmax where the solutions are redundant and at high cmax where they are not. Interestingly, the eigenmodes of the Hessian reveal that in the asymmetric case the capacity is most sensitive to variations in the larger K. The second most sensitive (much weaker than the first) direction is a linear combination of both of the parameters K and n for the gene which is activated at lower concentrations. Perhaps surprisingly, this means that genes which activate at higher K need to have their input/output relations positioned with greater accuracy along the c axis, even in fractional terms. If we think of K ~ e^{F/kBT}, where F is the binding (free) energy of the transcription factor to its specific target site along the genome, another way of stating this result is that weaker binding energies (smaller F) must be specified with greater precision to achieve a criterion level of performance. Finally, if we allow parameters to vary at random, we see [Figs. 11(c) and 11(d)] that the effects on information capacity are extremely small as long as these variations are bounded, so that the range of the natural log of the parameters is significantly less than 1. If we allow larger fluctuations, there is a transition to a much broader distribution of information capacities, with a substantial loss relative to the optimum.
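
In the same spirit, here is a compact finite-difference version of such a robustness analysis (our own sketch; the step size, the two-activator parameter point, and C are chosen arbitrarily rather than taken from the paper’s optima). Working in log parameters makes the Hessian directly a curvature with respect to fractional changes.

```python
import numpy as np
from scipy.integrate import trapezoid

# Sketch of the robustness analysis: finite-difference Hessian of the
# relative information log2 Z1-tilde with respect to fractional changes
# (ln K_i, ln n_i) for two activators; parameter values are illustrative.
C = 10.0
x = np.linspace(1e-6, C, 20001)           # concentration in units of c0

def info(logp):
    K, n = np.exp(logp[:2]), np.exp(logp[2:])
    s = np.zeros_like(x)
    for Ki, ni in zip(K, n):
        g  = x**ni / (Ki**ni + x**ni)
        dg = ni * Ki**ni * x**(ni - 1) / (Ki**ni + x**ni) ** 2
        s += dg**2 / (g + x * dg**2)
    return np.log2(trapezoid(np.sqrt(s), x))

p0 = np.log([1.0, 4.0, 2.0, 2.0])         # (K1, K2, n1, n2): assumed near-optimal
h, d = 0.05, len(p0)                      # ~5% fractional steps
H = np.zeros((d, d))
for i in range(d):
    for j in range(d):
        e_i, e_j = np.eye(d)[i] * h, np.eye(d)[j] * h
        H[i, j] = (info(p0 + e_i + e_j) - info(p0 + e_i - e_j)
                   - info(p0 - e_i + e_j) + info(p0 - e_i - e_j)) / (4 * h * h)

w, v = np.linalg.eigh(H)
print("Hessian eigenvalues (bits per unit log-parameter^2):", w.round(3))
print("stiffest direction:", v[:, np.argmin(w)].round(2))  # most negative curvature
```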

V. DISCUSSION

The ability of cells to control the expression levels of their genes is central to growth, development, and survival. In this work we have explored perhaps the simplest model for this control process, in which changes in the concentration of a single transcription factor protein modulate the expression of one or more genes by binding to specific sites along the DNA. Such models have many parameters, notably the binding energies of the transcription factor to the different target sites and the interactions or cooperativity among factors bound to nearby sites that contribute to the control of the same gene. This rapid descent from relatively simple physical pictures into highly parametrized models is common to most modern attempts at quantitative analysis of biological systems. Our goal in this work is to understand whether these many parameters can be determined by appeal to theoretical principles rather than solely by fitting to data.

We begin our discussion with a caveat. Evidently, deriving the many parameters that describe a complex biological system is an ambitious goal and what we present here is at best a first step. By confining ourselves to systems in which one transcription factor modulates the expression of many genes, with no further inputs or interactions, we almost certainly exclude the possibility of direct, quantitative comparisons with real genetic control networks. Understanding this simpler problem, however, is a prerequisite to analysis of more complex systems, and, as we argue here, sufficient to test the plausibility of our theoretical approach.

The theoretical principle to which we appeal is the optimization of information transmission. In the context of genetic control systems, we can think of information transmission as a measure of control power—if the system can transmit I bits of information, then adjustment of the inputs allows the cell to access, reliably, 2^I distinguishable states of gene expression. In unicellular organisms, for example, these different states could be used to match cellular metabolism to the available nutrients, while in the developing embryo of a multicellular organism these different states could be the triggers for emergence of different cell types or spatial structures; in either case, it is clear that information transmission quantifies our intuition about the control power or (colloquially) complexity that the system can achieve. Although one could imagine different measures, specialized to different situations, we know from Shannon that the mutual information is the unique measure that satisfies certain plausible conditions and works in all situations [30,31].
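To unpack this reading of the mutual information, the following sketch computes I for a small discrete channel; the four-level channel is hypothetical, not a model of any real regulatory element.

```python
import numpy as np

# Mutual information of a discrete input/output channel,
#   I = sum_{c,g} P(c) P(g|c) log2[ P(g|c) / P(g) ].
def mutual_information(p_c, p_g_given_c):
    p_cg = p_c[:, None] * p_g_given_c        # joint distribution P(c,g)
    p_g = p_cg.sum(axis=0)                   # output marginal P(g)
    mask = p_cg > 0                          # avoid 0*log(0) terms
    ratio = p_cg[mask] / (p_c[:, None] * p_g[None, :])[mask]
    return np.sum(p_cg[mask] * np.log2(ratio))

# A noiseless mapping of 4 input levels onto 4 expression states carries
# I = 2 bits: the cell can address 2**2 = 4 distinguishable states.
p_c = np.full(4, 0.25)
print(mutual_information(p_c, np.eye(4)))              # -> 2.0 bits

# Noise blurs the mapping and the usable number of states shrinks.
noisy = 0.80 * np.eye(4) + 0.05 * np.ones((4, 4))      # rows sum to 1
print(mutual_information(p_c, noisy))                  # -> ~1.15 bits
```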

Information transmission is limited by noise. In the context of genetic control systems, noise is significant because the number of molecules involved in the control process is small and basic physical principles dictate the random behavior of the individual molecules. In this sense, the maximization of information transmission really is the principle that organisms should extract maximum control power from a limited number of molecules. Analysis of experiments on real control elements suggests that the actual number of molecules used by these systems sets a limit of 1–3 bits on the capacity of a transcription factor to control the expression level of one gene, that significant increases in this capacity would require enormous increases in the number of molecules, and that, at least in one case, the system can achieve ~90% of its capacity [41,42]. Although these observations are limited in scope, they suggest that cells may need to operate close to the informational limits set by the number of molecules that they can afford to devote to these genetic control processes.

The strategy needed to optimize information transmission depends on the structure of the noise in the system. In the case of transcriptional control, there are two irreducible noise sources, the random arrival of transcription factors at their target sites and the shot noise in the synthesis and degradation of the output molecules (mRNA or protein). The interplay between these noise sources sets a characteristic scale for the concentration of transcription factors, c_0 ~ 15–150 nM. If the maximum available concentration is too much larger or smaller than this scale, then the optimization of information transmission becomes degenerate and we lose predictive power. Further, c_0 sets the scale for diminishing returns, such that increases in concentration far beyond this scale contribute progressively smaller amounts of added information capacity. Thus, with any reasonable cost for producing the transcription factor proteins, the optimal tradeoff between bits and cost will set the mean or maximal concentration of transcription factors in the range of c_0. Although this is only a very rough prediction, it follows without detailed calculation, and it is correct (Table I).
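A toy calculation makes the diminishing returns explicit. Assuming, purely for illustration, an effective input-referred noise that crosses over from a diffusion-like √(c c_0) behavior below c_0 to a multiplicative behavior ∝ c above it, and using the standard small-noise estimate I ≈ log2[Z/√(2πe)] with Z = ∫ dc/σ_eff(c), each decade of added c_max buys fewer bits than the one before; the noise prefactor below is arbitrary.

```python
import numpy as np
from scipy.integrate import quad

c0 = 1.0  # characteristic concentration scale (all c in units of c0)

# Hypothetical effective noise referred to the input: a diffusion-like
# term ~sqrt(c*c0) dominates below c0, a multiplicative term ~c above.
# The overall prefactor (1/30) stands in for the molecule-count factor
# and is chosen arbitrarily so that capacities come out a few bits.
def sigma_eff(c):
    return np.sqrt(c * c0 + c ** 2) / 30.0

# Small-noise estimate of the capacity: I ~ log2[ Z / sqrt(2*pi*e) ],
# with Z = integral of dc / sigma_eff(c) over the available input range.
def capacity(c_max):
    Z, _ = quad(lambda c: 1.0 / sigma_eff(c), 0.0, c_max)
    return np.log2(Z / np.sqrt(2.0 * np.pi * np.e))

# Each added decade of c_max contributes fewer additional bits.
for c_max in [1.0, 10.0, 100.0, 1000.0, 10000.0]:
    print(f"c_max = {c_max:8.0f} c0   I ~ {capacity(c_max):.2f} bits")
```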

The optimization of information transmission is largely a competition between the desire to use the full dynamic range of outputs and the preference for outputs that can be generated reproducibly, that is, at low noise. Because of the combination of noise sources, this competition has nontrivial consequences, even for a single transcription factor controlling one gene. As we consider the control of multiple genes, the structure of the solutions becomes richer. Activators and repressors are both possible and can achieve nearly identical information capacities. With multiple target genes, all the combinations of activators and repressors also are possible [61]. This suggests that, generically, there will be exponentially many networks that are local optima, with nearly identical capacities, making it possible for a theory based on optimization to generate diversity.

For a limited range of input transcription factor concentrations, the solutions which optimize information transmission involve multiple redundant target genes. Absent this result, the observation of redundant targets in real systems would be interpreted as an obvious sign of nonoptimality, a remnant of evolutionary history, or perhaps insurance against some rare catastrophic failure of one component. As the available range of transcription factor concentrations becomes larger, optimal solutions diversify, with the responses of the multiple target genes tiling the dynamic range of inputs. In these tiling solutions, targets that require higher concentrations to be activated or repressed also are predicted to exhibit greater cooperativity; in such an optimized system one thus should find some genes controlled by a small number of strong binding sites for the transcription factor and other genes with a large number of weaker sites.
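A sketch of what such a tiling solution looks like, with invented parameters rather than the actual optima: three activators with Hill input/output relations ḡ(c) = c^n/(c^n + K^n), thresholds K spaced along the input axis and cooperativity n increasing with K.

```python
import numpy as np

def hill_activator(c, K, n):
    """Mean expression of an activated gene: g(c) = c^n / (c^n + K^n)."""
    return c ** n / (c ** n + K ** n)

# Invented (K, n) pairs illustrating the predicted trend: genes with
# higher activation thresholds K get larger Hill coefficients n.
# These are not the paper's optimal values; units are arbitrary.
params = [(0.1, 1.5), (0.5, 2.5), (2.5, 4.0)]

c = np.logspace(-2, 1, 7)                 # sample input concentrations
for K, n in params:
    g = hill_activator(c, K, n)
    print(f"K={K:3.1f} n={n:3.1f}:", " ".join(f"{x:4.2f}" for x in g))
```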

To a large extent, the basic structure of the (numerically) optimal solutions can be recovered analytically through various approximation schemes. These analytic approximations make clear that the optimization really is driven by a conflict between using the full dynamic range of outputs and avoiding states with high intrinsic noise. In particular, this means that simple intuitions based on maximizing the entropy of output states, which are correct when the noise is unstructured [34], fail. Thus, almost all solutions have the property that at least one target gene is not driven through the full dynamic range of its outputs, and even with one gene the midpoint of the optimal activation curve can be far from the midpoint of the available range of inputs. The interplay between different noise sources also breaks the symmetry between activators and repressors, so that repressors optimize their information transmission by using only a small fraction of the available input range.

The predictive power of our approach depends on the existence of well defined optima. At the same time, it would be difficult to imagine evolution tuning the parameters of these models with extreme precision, so the optima should not be too sharply defined. Indeed, we find that optima are clear but broad. In the case of multiple genes, random ~25% variations in parameters around their optima result in only tiny fractions of a bit of information loss, but once fluctuations become larger than this the information drops precipitously. Looking more closely, we find that proper placement of the activation curves at the upper end of the input range is more critical, implying that it is actually the weaker binding sites whose energies need to be adjusted more carefully (perhaps contrary to intuition).

With modest numbers of genes, the optimization approach we propose here has the promise of making rather detailed predictions about the structure of the input/output relations, generating what we might think of as a spectrum of Ks and ns. In the limit of larger networks, we might expect this spectrum to have some universal properties, and we see hints of this in Fig. 10. Here, as we add more and more target genes, the optimal distribution of inputs approaches an asymptote P_TF(c) ∝ 1/c; more of this limiting behavior is accessible if the available dynamic range of inputs is larger. This is the form we expect if the effective noise is dominated by the input noise, σ_c ∝ c. Thus, adding more targets and placing them optimally allows the system to suppress output noise and approach ever more closely the fundamental limits set by the physics of diffusion.
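To see what the 1/c asymptote means in practice: in the small-noise limit the optimal input distribution is inversely proportional to the effective noise, so σ_c ∝ c gives P_TF(c) ∝ 1/c, a distribution that is flat in ln c. A short numerical check, with toy integration bounds:

```python
import numpy as np
from scipy.integrate import trapezoid

# P*(c) ∝ 1/c on a hypothetical input range [0.01, 10] (arbitrary units).
c = np.linspace(0.01, 10.0, 200001)
p = 1.0 / c
p /= trapezoid(p, c)                       # normalize over the range

def weight(a, b):
    """Probability mass that P*(c) assigns to the interval [a, b]."""
    m = (c >= a) & (c <= b)
    return trapezoid(p[m], c[m])

# Equal probability per decade of input concentration: each ~1/3.
print(weight(0.01, 0.1), weight(0.1, 1.0), weight(1.0, 10.0))
```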

Although there are not so many direct physical measurements specifying the input/output relations of genetic regulatory elements, there are many systems in which there is evidence for tiling of the concentration axis by a set of target genes, all regulated by the same transcription factor, along the lines predicted here [62]. For example, in quorum sensing by bacteria, the concentrations of extracellular signaling molecules are translated internally into different concentrations of LuxR, which acts as a transcription factor on a number of genes, and these can be classified as being responsive to low, intermediate, and high levels of LuxR [63]. Similarly, the decision of Bacillus subtilis to sporulate is controlled by the phosphorylated form of the transcription factor Spo0A, which regulates the expression of ~30 genes as well as an additional 24 multigene operons [64]. For many of these targets the effects of Spo0A~P are direct, and the sensitivity to high vs low concentrations can be correlated with the binding energies of the transcription factor to the particular promoters [65]. In yeast, the transcription factor Pho4 is a key regulator of phosphate metabolism and activates targets such as pho5 and pho84 at different concentrations [66]. All of these are potential test cases for the theoretical approach we have outlined here (each with its own complications), but a substantially new level of quantitative experimental work would be required to test the theory meaningfully.

The classic example of multiple thresholds in the activation of genes by a single transcription factor is in embryonic development [3,4]. In this context, spatial gradients in the concentration of transcription factors and other signaling molecules mean that otherwise identical cells in the same embryo experience different inputs. If multiple genes are activated by the same transcription factor but at different thresholds, then smooth spatial gradients can be transformed into sharper “expression domains” that provide the scaffolding for more complex spatial patterns. Although controversies remain about the detailed structure of the regulatory network, the control of the “gap genes” in the Drosophila embryo by the transcription factor Bicoid seems to provide a clear example of these ideas [4,6771]. Recent experimental work [16,17] suggests that it will be possible to make absolute measurements of (at least) Bicoid concentrations, and to map the input/output relations and noise in this system, holding out the hope for more quantitative comparison with theory.

Finally, we look ahead to the more general problem in which multiple target genes are allowed to interact. Absent these interactions, even our optimal solutions have a strong degree of redundancy—as the different targets turn on at successively higher concentrations of the input, there is a positive correlation and hence redundancy among the signals that they convey. This redundancy could be removed by mutually repressive interactions among the target genes, increasing the efficiency of information transmission in much the same way that lateral inhibition or center-surround organization enhances the efficiency of neural coding in the visual system [33,35]. It is known that such mutually repressive interactions exist, for example, among the gap genes in the Drosophila embryo [72]. The theoretical challenge is to see if these observed structures can be derived, quantitatively, from the optimization of information transmission.
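The redundancy in question is easy to exhibit numerically: two non-interacting targets driven by the same input are necessarily positively correlated, as in this toy simulation (Hill parameters and noise level invented for illustration).

```python
import numpy as np

rng = np.random.default_rng(1)
c = rng.uniform(0.0, 10.0, 100000)        # common input, arbitrary units

def hill(c, K, n):
    return c ** n / (c ** n + K ** n)

# Two non-interacting targets tiling the input range (made-up K and n);
# both rise with the same c, so their noisy outputs co-vary.
g1 = hill(c, 1.0, 2) + 0.05 * rng.standard_normal(c.size)
g2 = hill(c, 4.0, 3) + 0.05 * rng.standard_normal(c.size)
print(np.corrcoef(g1, g2)[0, 1])          # strongly positive correlation
```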

ACKNOWLEDGMENTS

We thank T. Gregor, J. B. Kinney, P. Mehta, T. Mora, S. F. Nørrelykke, E. D. Siggia, and especially C. G. Callan for helpful discussions. Work at Princeton was supported in part by NSF Grant No. PHY-0650617, and by NIH Grants No. P50 GM071508 and No. R01 GM077599. G.T. was supported in part by NSF Grants No. DMR04-25780 and No. IBN-0344678 and by the Vice Provost for Research at the University of Pennsylvania. W.B. also thanks his colleagues at the University of Rome, La Sapienza for their hospitality during a portion of this work.

REFERENCES

1. Rieke F, Warland D, de Ruyter van Steveninck R, Bialek W. Spikes: Exploring the Neural Code. Cambridge: MIT Press; 1997.
2. Ptashne M, Gann A. Genes and Signals. New York: Cold Spring Harbor Press; 2002.
3. Wolpert L. J. Theor. Biol. 1969;25:1. doi: 10.1016/s0022-5193(69)80016-0.
4. Lawrence PA. The Making of a Fly: The Genetics of Animal Design. Oxford: Blackwell; 1992.
5. Elowitz MB, Levine AJ, Siggia ED, Swain PS. Science. 2002;297:1183. doi: 10.1126/science.1070919.
6. Ozbudak E, Thattai M, Kurtser I, Grossman AD, van Oudenaarden A. Nat. Genet. 2002;31:69. doi: 10.1038/ng869.
7. Blake WJ, Kaern M, Cantor CR, Collins JJ. Nature (London). 2003;422:633. doi: 10.1038/nature01546.
8. Setty Y, Mayo AE, Surette MG, Alon U. Proc. Natl. Acad. Sci. U.S.A. 2003;100:7702. doi: 10.1073/pnas.1230759100.
9. Raser JM, O'Shea EK. Science. 2004;304:1811. doi: 10.1126/science.1098641.
10. Rosenfeld N, Young JW, Alon U, Swain PS, Elowitz MB. Science. 2005;307:1962. doi: 10.1126/science.1106914.
11. Pedraza JM, van Oudenaarden A. Science. 2005;307:1965. doi: 10.1126/science.1109090.
12. Golding I, Paulsson J, Zawilski SM, Cox EC. Cell. 2005;123:1025. doi: 10.1016/j.cell.2005.09.031.
13. Newman JRS, Ghaemmaghami S, Ihmels J, Breslow DK, Noble M, DeRisi L, Weissman JS. Nature (London). 2006;441:840. doi: 10.1038/nature04785.
14. Bar-Even A, Paulsson J, Maheshri N, Carmi M, O'Shea E, Pilpel Y, Barkai N. Nat. Genet. 2006;38:636. doi: 10.1038/ng1807.
15. Kuhlman T, Zhang Z, Saier MH Jr, Hwa T. Proc. Natl. Acad. Sci. U.S.A. 2007;104:6043. doi: 10.1073/pnas.0606717104.
16. Gregor T, Wieschaus EF, McGregor AP, Bialek W, Tank DW. Cell. 2007;130:141. doi: 10.1016/j.cell.2007.05.026.
17. Gregor T, Tank DW, Wieschaus EF, Bialek W. Cell. 2007;130:153. doi: 10.1016/j.cell.2007.05.025.
18. Arkin A, Ross J, McAdams HH. Genetics. 1998;149:1633. doi: 10.1093/genetics/149.4.1633.
19. Kepler T, Elston T. Biophys. J. 2001;81:3116. doi: 10.1016/S0006-3495(01)75949-8.
20. Swain PS, Elowitz MB, Siggia ED. Proc. Natl. Acad. Sci. U.S.A. 2002;99:12795. doi: 10.1073/pnas.162041399.
21. Bialek W, Setayeshgar S. Proc. Natl. Acad. Sci. U.S.A. 2005;102:10040. doi: 10.1073/pnas.0504321102.
22. Paulsson J. Nature (London). 2004;427:415. doi: 10.1038/nature02257.
23. Walczak AM, Sasai M, Wolynes PG. Biophys. J. 2005;88:828. doi: 10.1529/biophysj.104.050666.
24. Buchler NE, Gerland U, Hwa T. Proc. Natl. Acad. Sci. U.S.A. 2005;102:9559. doi: 10.1073/pnas.0409553102.
25. Tanase-Nicola S, Warren PB, ten Wolde PR. Phys. Rev. Lett. 2006;97:068102. doi: 10.1103/PhysRevLett.97.068102.
26. Tkačik G, Bialek W. Phys. Rev. E. 2009;79:051901. doi: 10.1103/PhysRevE.79.051901.
27. Morelli MJ, Allen RJ, Tanase-Nicola S, ten Wolde PR. Biophys. J. 2008;94:3413. doi: 10.1529/biophysj.107.116699.
28. Bialek W, Setayeshgar S. Phys. Rev. Lett. 2008;100:258101. doi: 10.1103/PhysRevLett.100.258101; e-print arXiv:q-bio.MN/0601001.
29. Tkačik G, Gregor T, Bialek W. PLoS ONE. 2008;3:e2774. doi: 10.1371/journal.pone.0002774; e-print arXiv:q-bio.MN/0701002.
30. Shannon CE. Bell Syst. Tech. J. 1948;27:379; reprinted in Shannon CE, Weaver W. The Mathematical Theory of Communication. Urbana: University of Illinois Press; 1949.
31. Cover TM, Thomas JA. Elements of Information Theory. New York: John Wiley; 1991.
32. Attneave F. Psychol. Rev. 1954;61:183. doi: 10.1037/h0054663.
33. Barlow HB. Sensory Mechanisms, the Reduction of Redundancy, and Intelligence. In: Blake DV, Uttley AM, editors. Proceedings of the Symposium on the Mechanization of Thought Processes. London: HM Stationery Office; 1959. pp. 537–574.
34. Laughlin SB. Z. Naturforsch. C. 1981;36c:910.
35. Atick JJ, Redlich AN. Neural Comput. 1990;2:308. doi: 10.1162/neco.1996.8.6.1321.
36. Smirnakis S, Berry MJ II, Warland DK, Bialek W, Meister M. Nature (London). 1997;386:69. doi: 10.1038/386069a0.
37. Brenner N, Bialek W, de Ruyter van Steveninck R. Neuron. 2000;26:695. doi: 10.1016/s0896-6273(00)81205-2.
38. Fairhall AL, Lewen GD, Bialek W, de Ruyter van Steveninck RR. Nature (London). 2001;412:787. doi: 10.1038/35090500.
39. Maravall M, Petersen RS, Fairhall AL, Arabzadeh E, Diamond ME. PLoS Biol. 2007;5:e19. doi: 10.1371/journal.pbio.0050019.
40. Wark B, Lundstrom BN, Fairhall A. Curr. Opin. Neurobiol. 2007;17:423. doi: 10.1016/j.conb.2007.07.001.
41. Tkačik G, Callan CG Jr, Bialek W. Phys. Rev. E. 2008;78:011910. doi: 10.1103/PhysRevE.78.011910.
42. Tkačik G, Callan CG Jr, Bialek W. Proc. Natl. Acad. Sci. U.S.A. 2008;105:12265. doi: 10.1073/pnas.0806077105.
43. Ziv E, Nemenman I, Wiggins CH. PLoS ONE. 2007;2:e1077. doi: 10.1371/journal.pone.0001077; e-print arXiv:q-bio/0612041.
44. Emberly E. Phys. Rev. E. 2008;77:041903. doi: 10.1103/PhysRevE.77.041903.
45. Tostevin F, ten Wolde PR. Phys. Rev. Lett. 2009;102:218101. doi: 10.1103/PhysRevLett.102.218101.
46. Strictly speaking, Shannon considered the question "how much does the output tell us about the input?," which is almost the same as the question about control power posed here.
47. In general, if we allow for interactions, $P(\{g_i\}|c)=\frac{1}{(2\pi)^{M/2}}\exp\left\{\frac{1}{2}\ln\det(\mathcal{K})-\frac{1}{2}\sum_{i,j=1}^{M}[g_i-\bar{g}_i(c)]\,\mathcal{K}_{ij}\,[g_j-\bar{g}_j(c)]\right\}$, where $\mathcal{K}$ measures the (inverse) covariance of the fluctuations or noise in the expression levels at fixed input, $\langle[g_i-\bar{g}_i(c)][g_j-\bar{g}_j(c)]\rangle=(\mathcal{K}^{-1})_{ij}$. In the present paper we restrict the discussion to independent genes, $\mathcal{K}_{ij}=\delta_{ij}/\sigma_i^2$.
48. We are being a bit sloppy here, since the input c is a continuous variable, but the intuition is correct.
49. Bintu L, Buchler NE, Garcia H, Gerland U, Hwa T, Kondev J, Phillips R. Curr. Opin. Genet. Dev. 2005;15:116. doi: 10.1016/j.gde.2005.02.007.
50. Bintu L, Buchler NE, Garcia H, Gerland U, Hwa T, Kondev J, Kuhlman T, Phillips R. Curr. Opin. Genet. Dev. 2005;15:125. doi: 10.1016/j.gde.2005.02.006.
51. Hill AV. J. Physiol. 1910;40:iv. doi: 10.1113/jphysiol.1910.sp001366.
52. The assumption that such nonfundamental noise sources are negligible is supported by a detailed quantitative analysis of the transformation between Bicoid and Hunchback in the early Drosophila embryo [29], although it is not known whether this is true more generally.
53. Elowitz MB, Surette MG, Wolf P-E, Stock JB, Leibler S. J. Bacteriol. 1999;181:197. doi: 10.1128/jb.181.1.197-203.1999.
54. Golding I, Cox EC. Phys. Rev. Lett. 2006;96:098102. doi: 10.1103/PhysRevLett.96.098102.
55. Elf J, Li G-W, Xie XS. Science. 2007;316:1191. doi: 10.1126/science.1141967.
56. This normalization of the mean expression levels is not exactly the same as fixing an absolute maximum, but in the small noise limit the difference is not significant.
57. Pedone PV, Ghirlando R, Clore GM, Gronenborn AM, Felsenfeld G, Omichinski JG. Proc. Natl. Acad. Sci. U.S.A. 1996;93:2822. doi: 10.1073/pnas.93.7.2822.
58. Oehler S, Amouyal M, Kolkhof P, von Wilcken-Bergmann B, Müller-Hill B. EMBO J. 1994;13:3348. doi: 10.1002/j.1460-2075.1994.tb06637.x.
59. Wang Y, Guo L, Golding I, Cox EC, Ong NP. Biophys. J. 2009;96:609. doi: 10.1016/j.bpj.2008.09.040.
60. N_max is the maximum number of independent molecular events, which is not the same as the maximum number of protein molecules expressed, but they are proportional.
61. Most transcription factors act either as activators or as repressors, but not both. Thus, we should not take the existence of locally optimal solutions with "mixed" functions for the input molecule too literally. We think the important lesson here is that even simple formulations of our optimization problem have many solutions, so we expect that a realistic formulation, with multiple interacting targets, will also have multiple solutions. This is crucial if we want a theory based on optimization to predict the diversity of biological mechanisms.
62. Our discussion here has been in the limit where expression levels are in steady state. A complementary view of why tiling of the concentration axis is important is that, as the transcription factor concentration changes over time, the cell would cross the thresholds for activation or repression of different genes in sequence, generating a temporal program of expression in which different proteins are synthesized at different times. See, for example, Refs. [73,74].
63. Waters C, Bassler BL. Genes Dev. 2006;20:2754. doi: 10.1101/gad.1466506.
64. Molle V, Fujita M, Jensen ST, Eichenberger P, González-Pastor JE, Liu JS, Losick R. Mol. Microbiol. 2003;50:1683. doi: 10.1046/j.1365-2958.2003.03818.x.
65. Fujita M, González-Pastor JE, Losick R. J. Bacteriol. 2005;187:1357. doi: 10.1128/JB.187.4.1357-1368.2005.
66. Lam FH, Steger DJ, O'Shea EK. Nature (London). 2008;453:246. doi: 10.1038/nature06867.
67. Driever W, Nüsslein-Volhard C. Cell. 1988;54:83. doi: 10.1016/0092-8674(88)90182-1.
68. Driever W, Nüsslein-Volhard C. Cell. 1988;54:95. doi: 10.1016/0092-8674(88)90183-3.
69. Driever W, Nüsslein-Volhard C. Nature (London). 1989;337:138. doi: 10.1038/337138a0.
70. Struhl G, Struhl K, Macdonald PM. Cell. 1989;57:1259. doi: 10.1016/0092-8674(89)90062-7.
71. Burz DS, Rivera-Pomar R, Jäckle H, Hanes SD. EMBO J. 1998;17:5998. doi: 10.1093/emboj/17.20.5998.
72. For a sampling of the evidence and models that describe these interactions, see Refs. [4,75–78].
73. Ronen M, Rosenberg R, Shraiman B, Alon U. Proc. Natl. Acad. Sci. U.S.A. 2002;99:10555. doi: 10.1073/pnas.152046799.
74. Zaslaver A, Mayo AE, Rosenberg R, Bashkin P, Sberro H, Tsalyuk M, Surette MG, Alon U. Nat. Genet. 2004;36:486. doi: 10.1038/ng1348.
75. Jäckle H, Tautz D, Schuh R, Seifert E, Lehmann R. Nature (London). 1986;324:668.
76. Rivera-Pomar R, Jäckle H. Trends Genet. 1996;12:478. doi: 10.1016/0168-9525(96)10044-5.
77. Sanchez L, Thieffry D. J. Theor. Biol. 2001;211:115. doi: 10.1006/jtbi.2001.2335.
78. Yu D, Small S. Curr. Biol. 2008;18:868. doi: 10.1016/j.cub.2008.05.050.
