Biophysical Journal. 2011 Sep 7;101(5):1123–1129. doi: 10.1016/j.bpj.2011.07.041

High-Affinity Quasi-Specific Sites in the Genome: How the DNA-Binding Proteins Cope with Them

J Chakrabarti †,‡, Navin Chandra §, Paromita Raha §, Siddhartha Roy §
PMCID: PMC3164169  PMID: 21889449

Abstract

Many prokaryotic transcription factors home in on one or a few target sites in the presence of a huge number of nonspecific sites. Our analysis of λ-repressor in the Escherichia coli genome, based on single basepair substitution experiments, shows the presence of hundreds of sites having binding energy within 3 Kcal/mole of the OR1 binding energy, and thousands of sites with binding energy above the nonspecific binding energy. The effect of such sites on DNA-based processes has not been fully explored. The presence of such sites dramatically lowers the occupation probability of the specific site, far more than would be the case if the genome were composed of nonspecific sites only. Our Brownian dynamics studies show that the presence of quasi-specific sites results in very significant kinetic effects as well. In contrast to λ-repressor, the E. coli genome has orders of magnitude fewer quasi-specific sites for GalR, an integral transcription factor, thus causing little competition for the specific site. We propose that GalR and perhaps repressors of the same family have evolved binding modes that lead to much smaller numbers of quasi-specific sites to remove the untoward effects of genomic DNA.

Introduction

Within a bacterial cell, hundreds of DNA-binding proteins coexist with the genome, which consists of several million basepairs of DNA. Many DNA-binding proteins possess high affinity for their cognate target sequence and also have significant affinity for the noncognate sites. This multicomponent system must be highly optimized so that the functions of the DNA-binding proteins are not significantly affected by the presence of the other proteins or by the noncognate sites present within the genome.

It is even possible that the noncognate sequences are used to the advantage of the organism. In their seminal work on target location in the genome by transcription factors, Berg and co-workers (1) first suggested that DNA-binding proteins enhance the target search kinetics by utilizing the noncognate sequences of the genome. However, the mechanisms of this process are still hotly debated. Before the last decade, the genome sequences were largely unknown, which made it difficult to understand the exact role played by the noncognate sequences. Berg and co-workers treated the nontarget DNA as having a uniform binding potential. This was a necessary approximation because little was known about nontarget binding sequences. However, they perceptively mentioned the possibility of higher-affinity sites, based purely on statistical arguments. Takeda and co-workers (2,3) were the first to perform single basepair substitution experiments, which provided information about how single basepairs may contribute to the binding energy of a protein-DNA interaction. In the last several years, a number of high-throughput techniques have given us ideas about the contributions of individual basepairs for several other transcription factors, mostly eukaryotic (4–6).

Several previous studies indicated that the searching of target sequences by DNA-binding proteins may involve many complex steps (7–9). The importance of many of these steps is still not well understood. A number of authors have emphasized that the combination of three-dimensional diffusion with one-dimensional diffusion (sliding) might overcome the entropic barrier of the search process (8,9). To understand the sophisticated mechanisms underlying the optimization of such complex processes, we must first understand the molecular interactions between proteins and DNA sequences. With the elucidation of cocomplex structures of many of these DNA-binding proteins, we now have significant knowledge about how the cognate sequences are recognized by the DNA-binding proteins (10,11). However, the structure of the cocomplex does not directly reveal how a basepair within the target sequence affects the binding potential. The indirect effects of the basepairs on DNA conformation modulate the binding potential without directly coming in contact with the protein, at least in some cases (12). Less is known about the interactions of the DNA-binding proteins with the noncognate DNA (13). The noncognate DNA is generally treated as a single class of sequences that have only nonbase-specific interactions. Thus, in this picture, target sequences are embedded in a sea of nontarget sequences of equal but much weaker binding potential. The situation, however, may be more complicated. The basepairs within the target sequences (generally 14–18 basepairs long) do not contribute equally to the binding potential, giving rise to degenerate sequences with a similar binding potential. Even basepairs that do contribute to the binding energy can be mutated to create sequences that are somewhat less potent in binding but still possess significantly higher affinity than the truly noncognate sequence. We call such sequences quasi-specific sequences.

In this study we attempt to understand how the quasi-specific sites may affect the equilibrium and kinetic properties of DNA binding by proteins. We focus our attention on prokaryotes because eukaryotic chromosomes are more difficult to analyze due to the presence of nucleosomes and, in almost all cases, the presence of multiprotein complexes. In particular, we estimate the number of high-affinity sites for λ-repressor using single basepair substitution data obtained by Takeda and co-workers (3). We further show that such sites have a greater effect than the nontarget site on the specific binding in equilibrium.

We consider a simple theoretical model to understand the kinetic consequences of the pseudo-operator sites in the genome. In this model, the DNA is represented by a fixed chain of beads, and the protein is represented by a ball. We numerically simulate via a Brownian dynamics (BD) algorithm (14) the thermal motion of the ball in the presence of the forces generated by the beads. Each bead is the size of a typical protein-binding region of the DNA, whereas the ball is the size of a typical prokaryotic transcription factor. The chain and ball are dispersed in an aqueous medium. All of the beads and the ball are charged, with the charges on the ball and the beads being opposite in sign. The neutralizing counterions and other ionic species, dispersed in water, screen the electrostatic attraction between the ball and the beads. Further, the ball experiences short-ranged dispersive forces from the beads. The parameters in the model interaction potential are adjusted so that the minimum value of the interaction potential corresponds to the experimentally known binding energy. We distinguish among specific, nonspecific, and quasi-specific beads on the basis of the minimum in the bead-ball interaction energy. One of the beads is taken to have an interaction potential with a deeper minimum. This bead is called the specific bead, and the remaining beads are called nonspecific. The quasi-specific bead is selected out of the nonspecific beads and assigned an interaction potential whose minimum has a depth intermediate between that at the specific site and that of the other nonspecific sites.

In a previous work, we studied the motion of a ball over a fluctuating chain (15). In contrast, the model presented here considers the diffusion of the ball both along the chain and in the three-dimensional space, while the fixed chain generates the force field for the ball in motion. We calculate the characteristics of the ball motion and estimate the first passage time of the ball to the target bead. The probability distribution of separation of the ball with the beads clearly shows the localization of the ball in the vicinity of the quasi-specific bead. The diffusion of the ball along the chain is very slow and is further slowed down by virtue of binding to the quasi-specific bead. The mean first passage time required to locate the target bead starting from an arbitrary nontarget bead shows the signature of slowing down due to the quasi-specific site. These results give us an obvious impetus to perform single basepair substitution measurements for GalR, an integral protein of Escherichia coli, and compare the energy landscape of GalR with that of λ-repressor, which is a guest protein. On the basis of our findings, we suggest that a plausible mechanism evolved in these organisms to counteract the competition of the quasi-specific sites.

Methods

Identification of quasi-specific genomic sites

The complete E. coli K12 genome (NC_000913) was obtained from the National Center for Biotechnology Information web site (ftp://ftp.ncbi.nih.gov/). The λ-repressor binds specifically to a 17-basepair-long OR1 operator sequence. Binding energies measured for single basepair mutants of OR1 (2) were used as the input file for a FORTRAN program, which calculated the total binding energy difference for each 17-base sequence, moving one base at a time throughout the E. coli genome (4639675 sites on each strand). The entire genome was scanned in both directions. For GalR, we used the data obtained in this study from titrations of all possible single base mutants of OE with the Gal repressor protein to assess the distribution of binding sites over the total E. coli genome sequence.
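The sliding-window scan described above can be sketched as follows. This is a minimal Python illustration, not the FORTRAN program used in the study. It assumes that single basepair penalties add independently, treats the genome as a linear string (windows spanning the origin of the circular chromosome are ignored), and uses a hypothetical 3-bp toy operator in place of the 17-bp OR1 data. The names `toy_penalty`, `window_ddg`, and `scan_genome` are invented for this example.

```python
from typing import Dict, List, Tuple

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement, used to scan the opposite strand."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def toy_penalty(consensus: str, cost: float) -> List[Dict[str, float]]:
    """Hypothetical penalty table: 0 for the consensus base, `cost` otherwise.

    Real data would hold one measured ddG per position and base.
    """
    return [{b: (0.0 if b == c else cost) for b in "ACGT"} for c in consensus]


def window_ddg(window: str, penalty: List[Dict[str, float]]) -> float:
    """Sum the per-position penalties (additivity is an assumption)."""
    return sum((penalty[i][base] for i, base in enumerate(window)), 0.0)


def scan_genome(
    genome: str, penalty: List[Dict[str, float]], cutoff: float
) -> List[Tuple[str, int, float]]:
    """Slide a window one base at a time over both strands.

    Returns (strand, start, ddG) for every window with ddG below `cutoff`.
    For the "-" strand, `start` indexes the reverse-complemented sequence.
    """
    width = len(penalty)
    hits = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for start in range(len(seq) - width + 1):
            ddg = window_ddg(seq[start:start + width], penalty)
            if ddg < cutoff:
                hits.append((strand, start, ddg))
    return hits
```

With measured substitution data, `penalty` would hold 17 entries, and counting the hits below 1, 2, and 3 Kcal/mole would give the kind of tallies reported in the Results.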

BD calculations

We perform BD (14) calculations on a model system consisting of a fixed chain of N beads and a ball. The ball and the beads are taken to be of the same size. We apply the periodic boundary condition (14) in the z direction such that the chain topologically mimics a circular DNA. The other walls confine the chain and the ball in the box. The equation of motion of the ball (16) is given by

$$6\pi\eta d\,\frac{\mathrm{d}\mathbf{R}}{\mathrm{d}t} = \mathbf{f}_{sys} + \mathbf{f}_{wall} + \mathbf{f}_{random}.$$

Here η is the viscosity of the surrounding water, $\mathbf{R}$ is the position vector of the ball of diameter d at time t, $\mathbf{f}_{sys}$ on the right-hand side is the systematic force on the ball due to all of the beads, $\mathbf{f}_{wall}$ is the force generated by the confining walls, and $\mathbf{f}_{random}$ is the random force (16) due to numerous particles in the surrounding medium.

The systematic force is calculated from the model interaction potential. The interaction potential is taken to consist of two parts: the first part of the interaction describes the van der Waals interaction originating from the dispersion forces, and the second part takes care of the electrostatic forces. The interaction potential of the ball at $\mathbf{R}$ with a bead at $\mathbf{r}$ is given by

$$V_\alpha(\xi) = 4\varepsilon_\alpha\left[\left(\frac{d}{\xi}\right)^{12} - \left(\frac{d}{\xi}\right)^{6}\right] - \lambda\,\frac{\exp[-\kappa\xi]}{\xi},$$

where $\xi = |\mathbf{R} - \mathbf{r}|$, $\varepsilon_\alpha$ is the strength of the dispersion interaction with the bead, λ is the strength of the electrostatic interaction, and κ is the inverse Debye screening length in the medium. Here $\lambda = (Qe)^2/\varepsilon k_B T$, where Q is the charge on the bead, e is the fundamental electronic charge, $k_B$ is the Boltzmann constant, T is the absolute temperature, and ε is the dielectric constant of the medium. We take $\varepsilon_\alpha$ to be sensitive to the bead identity: α = T indicates the target bead, α = NT indicates the nontarget beads, and α = P indicates the pseudo-operator bead. We choose $\varepsilon_T > \varepsilon_P > \varepsilon_{NT}$. The potential due to the confining walls is given by $V_{wall}(Z) = 4\varepsilon_w\,(d/(Z - H))^{12}$, where Z is the z component of $\mathbf{R}$, and H is the wall position. The systematic force is

$$\mathbf{f}_{sys} = -\frac{\partial}{\partial \mathbf{R}}\left[V_T(\xi) + V_P(\xi) + \sum_i V_{NT,i}(\xi) + V_{wall}(Z)\right],$$

where the summation runs over the nontarget beads.
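As an illustration of the bead-ball interaction, the sketch below evaluates a Lennard-Jones term plus a screened attractive Coulomb term, together with the corresponding radial force. It assumes reduced units (energies in $k_BT$, lengths in d) and an attractive sign for the electrostatic term, as implied by the opposite charges on ball and beads. The function names and default values are choices made for this example, and the sketch does not attempt to reproduce the well depths quoted in the text.

```python
import math


def pair_potential(xi: float, eps: float, lam: float = 1.0,
                   kappa: float = 1.0, d: float = 1.0) -> float:
    """Lennard-Jones plus screened Coulomb attraction at separation xi."""
    ratio6 = (d / xi) ** 6
    dispersion = 4.0 * eps * (ratio6 * ratio6 - ratio6)
    electrostatic = -lam * math.exp(-kappa * xi) / xi
    return dispersion + electrostatic


def radial_force(xi: float, eps: float, lam: float = 1.0,
                 kappa: float = 1.0, d: float = 1.0) -> float:
    """Analytic -dV/dxi; positive values push the ball away from the bead."""
    ratio6 = (d / xi) ** 6
    dispersion = 4.0 * eps * (12.0 * ratio6 * ratio6 - 6.0 * ratio6) / xi
    electrostatic = -lam * math.exp(-kappa * xi) * (kappa * xi + 1.0) / xi ** 2
    return dispersion + electrostatic
```

The vector force on the ball from one bead is `radial_force(xi, ...)` multiplied by the unit vector pointing from the bead to the ball; summing over beads gives the bead contribution to the systematic force.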

The equation of motion is discretized with a time step dt, so that the position vector $\mathbf{R}$ can be calculated using the following algorithm:

$$\mathbf{R}(t + dt) = \mathbf{R}(t) + \frac{\mathbf{f}_{sys}\,dt}{6\pi\eta d} + \mathbf{R}_{random}.$$

The random part of the displacement comes from the random force. The components of the random displacement are chosen from a Gaussian distribution of mean zero and variance $2D\,dt$, where D is the diffusion coefficient of the ball. The natural length scale in the problem is set by d, and the natural timescale is $t_s = d^2/2D$. Each trajectory starts from a position of the ball in the vicinity of a randomly selected nontarget bead, and the equation of motion is integrated numerically until the ball reaches the target bead within a radius $r_0$. Note that the trajectory depends on the realization of the Gaussian random number. We therefore repeat the calculation for a given starting point of the ball for many different realizations of the Gaussian random number.
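The update rule can be written compactly in code. The following is a minimal sketch of a single overdamped Brownian dynamics step, assuming the caller supplies the total systematic force and a friction coefficient (6πηd in the notation of the text). The function name `bd_step` and its signature are illustrative and not taken from the original implementation.

```python
import math
import random
from typing import Sequence, Tuple


def bd_step(pos: Sequence[float], force: Sequence[float], dt: float,
            diffusion: float, friction: float,
            rng: random.Random) -> Tuple[float, ...]:
    """Advance the ball by one time step.

    Each coordinate receives a deterministic drift force*dt/friction and a
    Gaussian random displacement of mean zero and variance 2*diffusion*dt.
    """
    sigma = math.sqrt(2.0 * diffusion * dt)
    return tuple(
        x + f * dt / friction + rng.gauss(0.0, sigma)
        for x, f in zip(pos, force)
    )
```

If lengths are measured in d and times in $t_s = d^2/2D$, the diffusion coefficient passed to `bd_step` is 0.5 in reduced units. Seeding `random.Random` with different values gives the different noise realizations over which results are averaged.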

The BD calculations are carried out for N = 100 beads in a rectangular parallelepiped box of dimensions 100d × 10d × 10d, where d = 30 Å, which is a typical size for the DNA-binding region. The protein sphere is taken to be the same size as a bead. Let us consider the parameters in the bead-ball interactions. The dispersive interaction parameters are chosen as follows: $\varepsilon_{NT}/k_BT = 10.0$, $\varepsilon_T/k_BT = 50.0$, and $\varepsilon_P/k_BT = 30.0$. We further choose the electrostatic interaction parameters, $\lambda/(d\,k_BT) = 1.0$ and $\kappa d = 1.0$. With these parameters, the ball and the nonspecific bead interaction, $V_{NT}(\xi/d)$, has a minimum of $-5\,k_BT$ at room temperature at $\xi/d = 1.0$ that corresponds to a surface-to-surface contact between a nonspecific bead and the ball. Similarly, the minimum of the ball and the target bead interaction, $V_T(\xi/d)$, located at $\xi/d = 1.0$, is $\sim -20\,k_BT$, and that for the ball and the pseudo-operator bead interaction, $V_P(\xi/d)$, is $-10\,k_BT$. The energy values are comparable to experimentally known binding energies. The target bead, $n_T$, is chosen to be the 50th bead at the center of the box. Note that this choice is quite arbitrary due to the periodic boundary condition in the z direction. The wall potential parameter, $\varepsilon_w$, is 100. The other parameters are as follows: η = 1 cP, ε = 80, $D = 10^{-8}$ cm$^2$/s, and $r_0/d = 1.0$. We initially place the ball on a sphere of radius $1.5\,r_0/d$ around a randomly chosen nonspecific bead ($n_{in}$). The particle trajectory is calculated numerically using the updating algorithm with a time step $dt = 0.001\,t_s$, where the typical diffusion time $t_s \approx 10^{-6}$ s, until the specific bead is approached within a radius $r_0/d$. We repeat all of the calculations for 50 realizations of the Gaussian noise. We carry out the calculations both without any quasi-specific bead and with one quasi-specific bead, replacing one nonspecific site either near ($n_P = 40$) or far from ($n_P = 10$) the target bead.

We calculate the distribution of the ball with respect to an arbitrary bead. To this end, we consider the separation |s| between the ball and a nontarget site in the transverse x, y plane and |z| parallel to the z axis along the chain. We bin the distances in a two-dimensional histogram. This is done over different times and averaged over different realizations for a given initial position of the ball. The normalized histogram gives the probability distribution $g(|s|,|z|)$ of the ball position with respect to the beads in the chain. We similarly compute the probability distributions $g_{NT}(|s|,|z|)$, $g_P(|s|,|z|)$, and $g_T(|s|,|z|)$ of separations between the ball and the nontarget, quasi-specific, and target bead, respectively. We calculate the distribution $h(t_b)$ of time $t_b$ spent by the ball in the bound state. We define the ball to be in the bound state if the ball is within radius $r_0$ of a bead. The mean-squared displacement (MSD) of the ball along the z axis is defined as $d_z^2 = \langle (Z(t) - Z_0)^2 \rangle$. Here $Z(t)$ is the z component of the position vector of the ball, $\mathbf{R}(t)$, and $Z_0$ is the initial z coordinate of the ball. The angular brackets indicate averaging over different initial coordinates of the ball. Next we consider the approach of the ball to the target site. For a given starting point, the first passage time, $t_{fp}$, differs from one noise realization to another. In some cases the ball does not approach the target within a reasonable time in the calculation. In such cases, we abort the calculation after integrating for a very large time, t. We track the rate of successful encounters, $s_E$, with the target site within time t for different trajectories generated by different realizations of the noise from a given starting point. We estimate the mean first passage time, $t_{mfp}$, from the distribution of the first passage time for successful encounters with the target only when $s_E$ exceeds 0.5.
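The bookkeeping for the success rate and the mean first passage time can be sketched as below. The sketch assumes that each trajectory reports either its first passage time or `None` when it was aborted before reaching the target. The function name and the `None` convention are assumptions made for this example.

```python
from typing import Optional, Sequence, Tuple


def first_passage_summary(
    times: Sequence[Optional[float]], min_success: float = 0.5
) -> Tuple[float, Optional[float]]:
    """Return (success rate, mean first passage time).

    The mean is taken over successful trajectories only, and is reported
    only when the success rate exceeds `min_success`; otherwise it is None.
    """
    if not times:
        raise ValueError("at least one trajectory is required")
    successes = [t for t in times if t is not None]
    rate = len(successes) / len(times)
    if rate > min_success:
        return rate, sum(successes) / len(successes)
    return rate, None
```

For example, `first_passage_summary([1.0, 3.0, None, 2.0])` gives a success rate of 0.75 and a mean of 2.0, whereas a set with exactly half of the trajectories aborted yields no mean because the rate does not exceed 0.5.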

Results

There are hundreds of quasi-specific sites for λ-repressor in the E. coli genome

One can estimate the quasi-specific sites for a transcription factor on a genome by moving a target-length-sized window around the genome and estimating the binding potential using the single basepair substitution data. We start our analysis by deriving higher-affinity sites for λ-repressor in the E. coli genome based on the values obtained by Sarai and Takeda (2). Fig. 1 shows the estimated number of sites present and the affinity of each site scattered around the genome. Four sites have ΔΔG < 1 Kcal/mole from OR1 binding energy, 43 sites have <2 Kcal/mole, and 321 sites have <3 Kcal/mole. Some of these energy values are greater than those found in the weaker operator sites of phage λ (such as OR3). We now explore the thermodynamic and kinetic effects of having a significant number of such high-affinity sites in the genome.

Figure 1.

Polar plot of high-affinity quasi-specific sites for λ-repressor in the E. coli genome based on the data of Sarai and Takeda (2). The radial coordinates refer to ΔΔG in Kcal/mole from the binding energy of OR1. The genome is represented by a 360° circle, with nucleotide 0 starting at 0° and proceeding counterclockwise. The small filled circles represent the quasi-specific sites.

Thermodynamic consequences of high-affinity binding sites in the genome

We first explore the effect of the presence of such a large number of sites in the genome on the occupancy of the operator sites. The probability of occupation (P) of the target site for a single transcription factor in the presence of large excess of nonspecific genomic sites can be written as

$$P = \frac{\exp[-U_0/kT]}{\sum_i g_i \exp[-U_i/kT]}, \qquad (1)$$

where U0 is the binding energy of the single specific site, Ui is the energy for other sites, and gi is the respective statistical weight depending on degeneracy. If we assume that most of the proteins are bound to the genomic DNA, we can neglect the free protein term. This is reasonable because many minicell experiments have shown that >85–90% of the protein is bound to the genomic DNA (17,18). We will also assume that there are three classes of quasi-specific sites of numbers, k, l, and m, and with ΔΔG values 1, 2, and 3 Kcal/mole less than the OR1 affinity, respectively; n is the number of nonspecific sites, which is same as the number of basepairs in the genome. Under these circumstances, Eq. 1 reduces to

$$P = \frac{\exp[-U_0/kT]}{\exp[-U_0/kT] + k\exp[-U_1/kT] + l\exp[-U_2/kT] + m\exp[-U_3/kT] + n\exp[-U_4/kT]}, \qquad (2)$$

where Ui is the corresponding energy term. On rearrangement, we get from Eq. 2:

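$$P = \frac{1}{1 + k\exp\left[\frac{U_0 - U_1}{kT}\right] + l\exp\left[\frac{U_0 - U_2}{kT}\right] + m\exp\left[\frac{U_0 - U_3}{kT}\right] + n\exp\left[\frac{U_0 - U_4}{kT}\right]}. \qquad (3)$$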

For λ-repressor, all of the titrations were done in 0.2 M salt, and we use the value at that salt concentration. This is close to the ionic strength in vivo. We take U0 to be −12.8 Kcal/mole. As mentioned before, U1, U2, and U3 are less than U0 by 1, 2, and 3 Kcal/mole, respectively. U4 is taken as 4 Kcal/mole. From Fig. 1, k = 4, l = 39, and m = 278. We assume n to be $4 \times 10^6$ (the same as the genome size). Under these conditions, P reduces to

$$P = \frac{1}{1 + 0.705 + 1.605 + 1.85 + 1.6}.$$

The last term corresponds to the nonspecific interactions. P then reduces to 0.149. If we keep on adding terms for weaker quasi-specific sites (up to the energy of the nonspecific interaction energy), the probability reduces to a fairly small number (∼0.05). Thus, quasi-specific sites make large contributions, much bigger than the known nonspecific binding, at least in the case of λ-repressor.
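The arithmetic behind this estimate can be checked with a few lines of code. The sketch below takes the four Boltzmann-weighted terms quoted in the text as given and evaluates the occupancy with and without the quasi-specific contributions; it also includes a general helper for site classes specified by count and energy offset. The function names are invented for this example.

```python
import math
from typing import Iterable, Tuple


def occupancy_from_weights(weights: Iterable[float]) -> float:
    """P = 1 / (1 + sum of competing weights), as in Eq. 3."""
    return 1.0 / (1.0 + sum(weights))


def occupancy_from_classes(
    classes: Iterable[Tuple[float, float]], kT: float
) -> float:
    """Same quantity from (count, U_i - U_0) pairs in the units of kT.

    U_i - U_0 is positive for sites weaker than the specific site.
    """
    return occupancy_from_weights(
        count * math.exp(-offset / kT) for count, offset in classes
    )


# Terms quoted in the text: three quasi-specific classes, then nonspecific.
quasi_specific = [0.705, 1.605, 1.85]
nonspecific = [1.6]

p_all = occupancy_from_weights(quasi_specific + nonspecific)
p_nonspecific_only = occupancy_from_weights(nonspecific)
print(f"all sites: {p_all:.3f}, nonspecific only: {p_nonspecific_only:.3f}")
```

With the quoted terms, the occupancy is about 0.15 when all classes compete and about 0.38 when only the nonspecific term is retained, which illustrates how strongly the quasi-specific classes depress occupancy of the specific site.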

Kinetic effects of quasi-specific sites

Here we illustrate the kinetic effects of the quasi-specific bead revealed by our model calculations. For simplicity, we fix the chain along the z axis with the target bead ($n_T = 50$) at the center of the box. We choose the quasi-specific bead at $n_P = 40$. Let us first consider $g(|s|/d, |z|/d)$, the distribution of the beads with respect to the ball. This gives the probability that a bead will have a separation |s| from the ball in the x, y plane transverse to the chain and a separation |z| along the chain, such that the distance between the ball and bead is given by $[(s/d)^2 + (z/d)^2]^{1/2}$. We observe that $g(|s|/d, |z|/d)$ has a strong peak around $|s|/d = 1.0$ for $|z|/d = 0.25$, as shown in the inset of Fig. 2 a. This implies that the ball spends most of the time in the vicinity of the chain. We also consider the points of localization of the ball along the chain. We compute to this end $g_{NT}(|s|/d, |z|/d)$, $g_T(|s|/d, |z|/d)$, and $g_P(|s|/d, |z|/d)$, the probabilities of separations in the plane transverse and in the direction parallel to the chain, between the ball and the nonspecific, specific, and quasi-specific beads, respectively. We set $|s|/d = 1.0$, which corresponds to the peak in the inset of Fig. 2 a, and consider the probabilities for $|z|/d$ separation, the corresponding probability distributions being denoted by $g_{NT}^{(1)}(|z|/d)$, $g_T^{(1)}(|z|/d)$, and $g_P^{(1)}(|z|/d)$, respectively. Fig. 2 a shows that $g_{NT}^{(1)}(|z|/d)$ has a peak for small $|z|/d$ and a long tail for larger $|z|/d$, which implies that the ball remains in the vicinity of the chain in general. Fig. 2, b and c, show $g_P^{(1)}(|z|/d)$ and $g_T^{(1)}(|z|/d)$, respectively. The strong peak in $g_T^{(1)}(|z|/d)$ corresponds to localization of the ball in the vicinity of the specific bead. Similarly, the peak in $g_P^{(1)}(|z|/d)$ corresponds to the location of the quasi-specific bead with respect to that of the specific bead. Thus, the quasi-specific bead acts as an additional site for localization of the ball.
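A distribution of this kind can be accumulated from stored ball positions as sketched below. The example assumes that the chain lies on the z axis (x = y = 0), that the reference bead sits at height `bead_z`, and that an optional box length applies the minimum-image convention along z. The function name, the equal bin width in both directions, and the choice to drop samples outside the binned range are assumptions of this sketch.

```python
import math
from typing import List, Optional, Sequence, Tuple


def separation_histogram(
    positions: Sequence[Tuple[float, float, float]],
    bead_z: float,
    bin_width: float,
    n_bins: int,
    box_z: Optional[float] = None,
) -> List[List[float]]:
    """Normalized 2D histogram g[i][j] over (|s|, |z|) separations.

    |s| is the transverse distance from the chain axis and |z| the distance
    along the chain from the reference bead.
    """
    counts = [[0] * n_bins for _ in range(n_bins)]
    total = 0
    for x, y, z in positions:
        s = math.hypot(x, y)
        dz = abs(z - bead_z)
        if box_z is not None:
            dz = dz % box_z
            dz = min(dz, box_z - dz)  # nearest periodic image along z
        i, j = int(s / bin_width), int(dz / bin_width)
        if i < n_bins and j < n_bins:
            counts[i][j] += 1
            total += 1
    if total == 0:
        return [[0.0] * n_bins for _ in range(n_bins)]
    return [[c / total for c in row] for row in counts]
```

Fixing the row that contains $|s|/d = 1.0$ and reading along the columns gives a one-dimensional profile in $|z|/d$ analogous to the curves discussed in the text.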

Figure 2.

The probability distribution of separation of the beads with respect to the ball in the model BD calculations: (a) Inset: g(|s|/d) versus |s|/d for |z|/d = 0.25, the distribution of an arbitrary bead. Main panel: gNT(1)(|z|/d) versus |z|/d plot, which gives the distribution with respect to the nontarget beads. (b) gP(1)(|z|/d) versus |z|/d plot for the distribution of the pseudo-target bead. (c) gT(1)(|z|/d) versus |z|/d plot for the distribution of the target bead.

We proceed to bring out the dynamic consequences of confining the ball in the vicinity of the chain. We consider the distribution $h(t_b/t_0)$ of time $t_b/t_0$ spent by the ball in the bound state. Let us consider the case without any quasi-specific bead for reference. The data are shown in the inset of Fig. 3 a. We estimate the mean residence time in the bound state, $\langle t_b/t_0 \rangle \sim 17$. The solid line in Fig. 3 a shows the MSD of the ball along the z axis, denoted by $d_z^2$. The best fit to the data is linear in time, indicating a diffusive motion along the chain, with the diffusion coefficient ($D_z \sim 0.5$) being estimated from the slope of the linear fit. The ball can diffuse over approximately $n_D \sim 25$ sites in time $\langle t_b/t_0 \rangle$ by one-dimensional diffusion with this $D_z$. Fig. 3 a shows a dramatic effect of the quasi-specific bead on $d_z^2$. For this, we consider the case of $n_P = 10$ such that the quasi-specific bead is located farther than $n_D$ beads away from the target bead. The data clearly show that the short time dynamics (up to a time $\sim \langle t_b/t_0 \rangle$) is diffusive with a slope very similar to that observed without the quasi-specific site. However, the long time limit of $d_z^2$ shows a very sluggish motion with a much smaller diffusion coefficient than that seen without the quasi-specific site as a consequence of the strong confinement in the vicinity of the quasi-specific site.
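Extracting a diffusion coefficient from an MSD curve can be sketched as follows. The example averages the squared z displacement over trajectories sampled at equal time intervals and fits a straight line by ordinary least squares. Converting the slope to a diffusion coefficient through MSD = 2Dt is the standard one-dimensional relation and is an assumption here, since the text does not state the conversion explicitly. Function names are illustrative.

```python
from typing import List, Sequence


def mean_squared_displacement(
    trajectories: Sequence[Sequence[float]]
) -> List[float]:
    """Average (Z(t) - Z0)^2 over trajectories of equal length."""
    n_steps = len(trajectories[0])
    return [
        sum((traj[k] - traj[0]) ** 2 for traj in trajectories)
        / len(trajectories)
        for k in range(n_steps)
    ]


def diffusion_coefficient(
    times: Sequence[float], msd: Sequence[float]
) -> float:
    """Least-squares slope of msd versus time, divided by 2 (1D case)."""
    n = len(times)
    t_mean = sum(times) / n
    m_mean = sum(msd) / n
    covariance = sum((t - t_mean) * (m - m_mean) for t, m in zip(times, msd))
    variance = sum((t - t_mean) ** 2 for t in times)
    return 0.5 * covariance / variance
```

Fitting separate time windows, for instance times shorter and longer than the mean bound-state residence time, would expose the change of slope that a quasi-specific bead produces.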

Figure 3.

(a) Main panel: MSD $d_z^2$ versus $t/t_s$ plots. Upper line: Without a pseudo-operator bead; lower line: with a pseudo-operator bead. The dotted line is the best linear fit. Inset: A typical $h(t_b/t_0)$ versus $t_b/t_0$ plot showing the distribution of times the ball spends in the bound state. (b) Inset: Distribution of the first passage time $f(t_{fp}/t_s)$ with $t_{fp}/t_s$. Main panel: The mean first passage time $t_{mfp}/t_s$ as a function of $|n_{in} - n_P|^2$. The mean was calculated only when the number of successful encounters exceeded 50% of the Brownian trajectories. Circles: data obtained without a pseudo-operator bead; squares: data obtained with a pseudo-operator bead at $n_P = 10$; triangles: data obtained with a pseudo-operator bead at $n_P = 40$. The lines are best fits drawn to guide the eye.

Next, we consider the approach of the ball to the target bead from an initial nonspecific binding site $n_{in}$. Let us consider the case without the quasi-specific bead. The rate of successful encounters with the target bead satisfies $s_E > 0.5$ as long as $|n_T - n_{in}| < n_D$, whereas $s_E$ falls off quite rapidly for larger values of $|n_T - n_{in}|$. We get two important cases in the presence of the quasi-specific bead. For $n_P = 10$, $|n_P - n_T| > n_D$, implying that the localization of the ball at the quasi-specific bead takes place at a distance larger than $n_D$ from the target site. However, the short-term diffusive motion in the bound state remains unaffected, so that $s_E > 0.5$ as long as $|n_T - n_{in}| < n_D$. On the other hand, the localization of the ball takes place within a distance of $n_D$ from the target site for $n_P = 40$, where we get $s_E > 0.5$ only for $|n_T - n_{in}| < |n_T - n_P|$. These observations indicate that one-dimensional diffusion along the chain would be the dominant process for locating the target bead.

We estimate the mean first passage time $t_{mfp}/t_s$ only when $s_E > 0.5$. The inset in Fig. 3 b shows a typical histogram $f(t_{fp}/t_s)$ for $n_{in} = 46$ without any quasi-specific bead where $s_E \approx 0.95$. We estimate the mean first passage time, $t_{mfp}/t_s$, as the mean of the first passage time distribution. In Fig. 3 b we show $t_{mfp}/t_s$ as a function of $|n_{in} - n_P|^2$. The circles correspond to the case without any quasi-specific bead, the boxes represent the case of a distant quasi-specific bead ($n_P = 10$), and the triangles indicate a nearby quasi-specific bead ($n_P = 40$). Of interest, the features of the data do not differ appreciably in these cases, as is apparent from the best-fitted lines drawn in the figure to guide the eye. Thus, the bound-state diffusion length $n_D d$ is the relevant length scale for the target location. A quasi-specific bead situated at a separation much larger than $n_D d$ from the target site hardly affects the rate of search. However, a quasi-specific bead within the diffusion length from the target site severely hinders the target search.

GalR, an integral protein, has a much smoother landscape

Because the estimated effect of quasi-specific sites on λ-repressor binding to specific operator sites is high, we wondered what the situation might be for integral DNA-binding proteins of E. coli. To our knowledge, there is no literature report of single basepair substitution experiments for other E. coli DNA-binding proteins. We measured the single basepair substitution effect of binding of GalR to its external operator, OE (P. Raha, R. Saha, and S. Roy, unpublished observation), and used the resultant data to estimate the number of quasi-specific sites in the E. coli genome for GalR (Fig. 4). In contrast to the λ-repressor scenario, there are very small numbers of quasi-specific sites on the E. coli genome for GalR (only three sites within the ΔΔG value of 3 Kcal/mole of OE). This is because in OE almost all substitutions lead to some loss of binding energy, as opposed to the OR1/λ-repressor, where substitutions at several positions show very little effect on the binding energy.

Figure 4.

Polar plot of high-affinity quasi-sites in the E. coli genome, derived from data for GalR based on experimentally determined values (unpublished observation). The radial coordinates refer to ΔΔG in Kcal/mole from the binding energy of OE. The genome is represented by a 360° circle, with nucleotide 0 starting at 0° and proceeding counterclockwise. The small filled circles represent the quasi-specific sites.

Discussion and Conclusions

Transcription factors operate in the cellular space in the presence of a very large amount of genomic DNA and have to form a stable complex at the target sequence within a reasonable period of time. It is generally believed that affinity for nonspecific binding vis-à-vis affinity for the cognate sequence plays a crucial role in the stability of the cognate complex. The overwhelming concentration difference between the cognate site and noncognate sites indicates that the cognate complex will not form unless the energy difference between specific and nonspecific complexes is significantly large. It is difficult to obtain quantitative estimates because detailed affinity information regarding nontarget genomic DNA is not generally available. In the case of λ-repressor, however, both accurate single basepair substitution measurements and detailed and quantitative information on nonspecific binding are available. As a result, we were able to accurately estimate the binding potential of genomic DNA. Of interest, the quasi-specific sites and not truly nonspecific sites are the major competitors against the target sites. We do not know exactly how λ-repressor evolved a strategy to counteract this opposing force. The formation of protein-protein complexes in the form of octameric and dodecameric loops in the lysogenic state adds a substantial amount of stability to the specific complex (19). This mechanism is not available for the quasi-specific complexes because the quasi-specific sites are unlikely to be in proper proximity to and alignment within the genome. The quasi-specific sites create a severe kinetic bottleneck. A possible mechanism to overcome this bottleneck would be to increase the copy number of the protein, which would increase the probability of occupation of the target site. We note that there are ∼125 λ-repressor dimers in the lysogen, far in excess of occupancy requirements (20).

Because λ-repressor is a guest protein in E. coli, its counteracting strategy may be very different from that of the integral transcription factors of E. coli. We performed a single basepair substitution experiment for a GalR target sequence, OE. In contrast to λ-repressor, most basepairs contribute to the binding energy. As expected, this leads to an orders-of-magnitude lowering of the number of quasi-specific sites. Thus, we see that the evolution of such a mode of specific DNA-protein interaction is a countering strategy for competition by the quasi-specific sites. Our model study indicates that one-dimensional diffusion over the DNA chain is the dominant transport mechanism, and the smoothness of the genomic energy landscape of GalR suggests that the sliding process may be greatly facilitated in this case (21).

One might argue that because it is a guest protein in E. coli, λ-repressor has not evolved an optimized search landscape. However, a better explanation may lie in the nature of the regulatory complex formed by these two repressors. The GalR/HU complex is basically a reversible complex that regulates reversible switching between the on and off states of the gal operon depending on the presence of galactose. The λ-repressor, on the other hand, forms a very stable multimember loop that switches state very infrequently, and the switch is initiated by irreversible cleavage of the repressor. Therefore, it may face fewer kinetic issues, and thus may not need an optimized genomic energy landscape to function. In eukaryotes, particularly higher ones, the burden of genomic DNA is much higher. However, most of the transcription factors in eukaryotes operate as part of a protein-protein complex, and this may be a strategy to counteract the pressure of nonspecific and quasi-specific sites. We conclude that quasi-specific sites offer major impediments against target searching for prokaryotic transcription factors, and that organisms have evolved different strategies to counteract this impediment.

Acknowledgments

This study was supported by the Council of Scientific and Industrial Research (India) and a J. C. Bose Fellowship to S.R.

References

  • 1. Berg O.G., Winter R.B., von Hippel P.H. How do genome-regulatory proteins locate their DNA target sites? Trends Biochem. Sci. 1982;7:52–55.
  • 2. Sarai A., Takeda Y. λ Repressor recognizes the approximately 2-fold symmetric half-operator sequences asymmetrically. Proc. Natl. Acad. Sci. USA. 1989;86:6513–6517. doi: 10.1073/pnas.86.17.6513.
  • 3. Takeda Y., Sarai A., Rivera V.M. Analysis of the sequence-specific interactions between Cro repressor and operator DNA by systematic base substitution experiments. Proc. Natl. Acad. Sci. USA. 1989;86:439–443. doi: 10.1073/pnas.86.2.439.
  • 4. Badis G., Berger M.F., Bulyk M.L. Diversity and complexity in DNA recognition by transcription factors. Science. 2009;324:1720–1723. doi: 10.1126/science.1162327.
  • 5. Bulyk M.L., Johnson P.L.F., Church G.M. Nucleotides of transcription factor binding sites exert interdependent effects on the binding affinities of transcription factors. Nucleic Acids Res. 2002;30:1255–1261. doi: 10.1093/nar/30.5.1255.
  • 6. Linnell J., Mott R., Udalova I.A. Quantitative high-throughput analysis of transcription factor binding specificities. Nucleic Acids Res. 2004;32:e44. doi: 10.1093/nar/gnh042.
  • 7. Berg O.G., von Hippel P.H. Selection of DNA binding sites by regulatory proteins. Statistical-mechanical theory and application to operators and promoters. J. Mol. Biol. 1987;193:723–750. doi: 10.1016/0022-2836(87)90354-8.
  • 8. Halford S.E., Marko J.F. How do site-specific DNA-binding proteins find their targets? Nucleic Acids Res. 2004;32:3040–3052. doi: 10.1093/nar/gkh624.
  • 9. Stanford N.P., Szczelkun M.D., Halford S.E. One- and three-dimensional pathways for proteins to reach specific DNA sites. EMBO J. 2000;19:6546–6557. doi: 10.1093/emboj/19.23.6546.
  • 10. Lewis M., Chang G., Lu P. Crystal structure of the lactose operon repressor and its complexes with DNA and inducer. Science. 1996;271:1247–1254. doi: 10.1126/science.271.5253.1247.
  • 11. Stayrook S., Jaru-Ampornpan P., Lewis M. Crystal structure of the λ repressor and a model for pairwise cooperative operator binding. Nature. 2008;452:1022–1025. doi: 10.1038/nature06831.
  • 12. Rohs R., Jin X., Mann R.S. Origins of specificity in protein-DNA recognition. Annu. Rev. Biochem. 2010;79:233–269. doi: 10.1146/annurev-biochem-060408-091030.
  • 13. Kalodimos C.G., Biris N., Kaptein R. Structure and flexibility adaptation in nonspecific and specific protein-DNA complexes. Science. 2004;305:386–389. doi: 10.1126/science.1097064.
  • 14. Allen M., Tildesley D.J. Computer Simulations of Liquids. Oxford University Press; New York: 1991.
  • 15. Chakrabarti J., Roy S. Simulation of the kinetics of a sphere attached to a fluctuating polymer: Implications for target search by DNA-binding proteins. Phys. Rev. E Stat. Nonlin. Soft Matter Phys. 2004;69:021904. doi: 10.1103/PhysRevE.69.021904.
  • 16. Chaikin P.M., Lubensky T.C. Principles of Condensed Matter Physics. Cambridge University Press; Cambridge, UK: 1998.
  • 17. Shepherd N., Dennis P., Bremer H. Cytoplasmic RNA polymerase in Escherichia coli. J. Bacteriol. 2001;183:2527–2534. doi: 10.1128/JB.183.8.2527-2534.2001.
  • 18. Kao-Huang Y., Revzin A., von Hippel P.H. Nonspecific DNA binding of genome-regulating proteins as a biological control mechanism: measurement of DNA-bound Escherichia coli lac repressor in vivo. Proc. Natl. Acad. Sci. USA. 1977;74:4228–4232. doi: 10.1073/pnas.74.10.4228.
  • 19. Dodd I.B., Perkins A.J., Egan J.B. Octamerization of CI repressor is needed for effective repression of PRM and efficient switching from lysogeny. Genes Dev. 2001;15:3013–3022. doi: 10.1101/gad.937301.
  • 20. Bakk A., Metzler R. In vivo non-specific binding of λ CI and Cro repressors is significant. FEBS Lett. 2004;563:66–68. doi: 10.1016/S0014-5793(04)00249-2.
  • 21. Slutsky M., Mirny L.A. Kinetics of protein-DNA interaction: facilitated target location in sequence-dependent potential. Biophys. J. 2004;87:4021–4035. doi: 10.1529/biophysj.104.050765.

Articles from Biophysical Journal are provided here courtesy of The Biophysical Society
