Many cellular functions depend on highly specific intermolecular interactions, with mutational changes in each component of the interaction imposing coevolutionary pressure on the remaining members (e.g., a transcription factor and its DNA binding sites). The conflict between mutation pressure toward reduced affinity and selective pressure for greater interaction results in an evolutionary equilibrium distribution for the affinity between interacting partners. Nevertheless, conditional on the maintenance of a critical level of molecular recognition, the sites containing the key residues of binding interfaces are free to evolve. The theory developed suggests that most such evolution is a simple consequence of random genetic drift and not an outcome of adaptive fine tuning.
Keywords: cellular evolution, molecular interaction, transcription, random genetic drift, coevolution
Many cellular functions depend on highly specific intermolecular interactions, for example transcription factors and their DNA binding sites, microRNAs and their RNA binding sites, the interfaces between heterodimeric protein molecules, the stems in RNA molecules, and kinases and their response regulators in signal-transduction systems. Despite the need for complementarity between interacting partners, such pairwise systems seem to be capable of high levels of evolutionary divergence, even when subject to strong selection. Such behavior is a consequence of the diminishing advantages of increasing binding affinity between partners, the multiplicity of evolutionary pathways between selectively equivalent alternatives, and the stochastic nature of evolutionary processes. Because mutation pressure toward reduced affinity conflicts with selective pressure for greater interaction, situations can arise in which the expected distribution of the degree of matching between interacting partners is bimodal, even in the face of constant selection. Although biomolecules with larger numbers of interacting partners are subject to increased levels of evolutionary conservation, their more numerous partners need not converge on a single sequence motif or be increasingly constrained in more complex systems. These results suggest that most phylogenetic differences in the sequences of binding interfaces are not the result of adaptive fine tuning but a simple consequence of random genetic drift.
Much of biology relies on the specificity of intermolecular interactions: the regulation of gene expression, the transmission of information via signal transduction, the assembly of monomeric subunits into multimers, vesicle sorting in eukaryotic cells, toxin–antitoxin systems in microbes, mating-type recognition, and many other cellular features. Although the basic structural features of such fitness-related traits would seem to be under strong purifying selection, molecular specificity often seems to be highly flexible in evolutionary time. For example, transcription-factor binding-site motifs often vary dramatically among orthologous genes in different species and even among similarly regulated genes within the same species (1–3). The binding interfaces of multimeric proteins can vary substantially among species, sometimes with no overlap at all (4, 5). The key amino acid sequences involved in intermolecular cross-talk in signal-transduction systems can evolve at high rates (6, 7), and growing evidence suggests that the locations of sites involved in posttranslational modification in individual proteins are under much weaker selective constraints than their absolute numbers (8). Although it is often argued that subtle differences in the motifs involved in intermolecular interactions are molded by the demands of natural selection, direct evidence in support of such arguments has seldom been provided.
The theory provided below makes the case that substantial divergence of the motifs involved in molecular cross-talk is expected even in the face of strong selection for high specificity. Such behavior is a natural consequence of several factors: the degrees of freedom typically associated with the biophysical aspects of molecular binding, the opposing pressures of mutation and selection, the evolutionary noise produced by coevolutionary interactions, and the limits to the efficiency of natural selection. In the following pages, we show how the joint application of theory from biophysics and population genetics helps move our understanding of evolutionary aspects of cell biology beyond the too-common view that all aspects of biodiversity are molded by the unbridled power of natural selection.
Although intermolecular interactions underlie a wide variety of issues in cell biology, the following theory will be phrased in the context of the evolution of a transcription-factor binding site (TFBS) and its cognate transcription factor (TF). The general formulations can be equally applied to the complementary interfaces involving an untranslated region of an mRNA and a microRNA, the monomeric subunits of a dimeric molecule, a cargo protein and its molecular motor, an intron–exon junction and the spliceosome, a sensor kinase and a response regulator protein, and so on.
Gene expression, which in turn influences individual fitness, will be assumed to be a function of binding-site affinity. The DNA-binding domain of any TF defines a specific TFBS motif that maximizes the binding strength. However, owing to the recurrent introduction of mutations, variation will inevitably arise among the TFBS sequences of different genes serviced by the same TF within a species as well as among orthologous genes in different species. Selection will prevent extreme TFBS degeneration, but there are diminishing returns on increasing the binding-site strength beyond the point at which the associated gene is in a near-optimal state of transcriptional activation. Thus, levels of TF–TFBS matching can be expected to wander along the boundaries dictated by the prevailing features of mutation, selection, and random genetic drift. Because there are typically numerous ways in which selectively equivalent binding affinities can be achieved, this means that substantial variation in binding-site signatures may arise even in the face of strong selection.
One Evolvable Component.
As a simple entrée into the problem, consider a single TFBS interacting with a TF whose binding domain is unable to evolve, at least on a time scale comparable to that which allows TFBS flexibility. It will be shown below that this situation is closely approximated for TFs servicing multiple target genes. Such a scenario also applies to a protein binding domain for a nonevolvable substrate such as an inorganic ion or an intermediate metabolite. Several features of this model can be evaluated analytically, the initial point being the probability distribution of the possible binding motifs under drift–mutation–selection equilibrium.
We start with two simple assumptions: (i) that all TFBSs with the same number of matches (m) to the TF are equivalent with respect to binding affinity, regardless of the position of the mismatches, and (ii) that each of the four nucleotides (A, C, G, and T) mutates to each of the other three states at the same rate μ. With a TF recognition motif ℓ nucleotides in length, there are then $\ell + 1$ TFBS matching classes to consider ($m = 0, 1, \ldots, \ell$), each consisting of multiple subclasses with equal expected probabilities under selection–mutation equilibrium. For example, matching class $m = \ell - 1$ has a multiplicity of 3ℓ TFBS types, because the single mismatch can reside in sites 1 to ℓ and involve any of the three possible mismatching nucleotides.
The general approach used here assumes a population that usually resides in a nearly monomorphic TFBS state, with the average interval between stochastic state transitions being long enough that single mutational changes are fixed in a sequential manner. Letting $P_m(t)$ denote the probability that a TFBS resides in matching class m at time t, and $n = \ell - m$ denote the number of mismatches, the time-dependent behavior of the system is described by

$$P_m(t+1) - P_m(t) = 3(m+1)\mu N\,\phi(m+1,m)\,P_{m+1}(t) - n\mu N\,\phi(m,m+1)\,P_m(t) - 3m\mu N\,\phi(m,m-1)\,P_m(t) + (n+1)\mu N\,\phi(m-1,m)\,P_{m-1}(t) \qquad [1]$$

with the first two terms being dropped when $m = \ell$ and the last two being dropped when $m = 0$. Here, we assume a haploid population of N individuals (for a diploid population, 2N should be substituted for N throughout). This dynamical equation consists of three kinds of terms. The first denotes the influx of probability from the next higher (more beneficial) class, with each of the $m+1$ matching sites mutating to nonmatching states at rate $3\mu$ in each gene copy (the 3 accounting for mutation to three alternative nucleotide types) and becoming fixed in the population with probability $\phi(m+1, m)$. The terms involving $P_m(t)$ account for the efflux from class m to the next higher and lower classes ($m+1$ and $m-1$), again accounting for the number of possible mutations that cause such movement and their probabilities of fixation. The final term describes the influx from the next lower class, which has $n+1$ mismatches, each back-mutating to a matching state at rate μ. The fixation probabilities are provided by Kimura’s (9) diffusion equation for newly arisen mutations,
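$$\phi(x,y) = \frac{1 - e^{-2N_e s_{xy} p_0}}{1 - e^{-2N_e s_{xy}}} \qquad [2]$$

where $N_e$ is the effective population size, $p_0 = 1/N$ is the initial frequency of a mutation (for a haploid population), and $s_{xy}$ is the fractional selective advantage of allelic class y over x.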
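Eq. 2 can be evaluated directly. The following is a minimal Python sketch, assuming a haploid population with $p_0 = 1/N$ as in the text. The function name `fixation_probability` is hypothetical. `math.expm1` keeps nearly neutral cases accurate, and a separate branch avoids overflow for strongly deleterious mutations.

```python
import math


def fixation_probability(s: float, Ne: float, N: float) -> float:
    """Kimura's diffusion approximation for a new mutation in a haploid population.

    s  -- fractional selective advantage of the new allelic class (s_xy)
    Ne -- effective population size
    N  -- census population size; the mutation starts at frequency p0 = 1/N
    """
    p0 = 1.0 / N
    a = 2.0 * Ne * s
    if abs(a) < 1e-12:
        return p0  # neutral limit: fixation probability equals initial frequency
    if a < -700.0:
        # Strongly deleterious: (e^{|a| p0} - 1) / (e^{|a|} - 1) ~ (e^{|a| p0} - 1) e^{-|a|}
        return math.expm1(-a * p0) * math.exp(a)
    # (1 - e^{-a p0}) / (1 - e^{-a}); numerator and denominator share a sign
    return math.expm1(-a * p0) / math.expm1(-a)


if __name__ == "__main__":
    Ne = N = 1000
    s = 0.001
    ratio = fixation_probability(s, Ne, N) / fixation_probability(-s, Ne, N)
    # Upward-to-downward ratio expected for this haploid model: e^{2 Ne s (1 - 1/N)}
    print(ratio, math.exp(2 * Ne * s * (1 - 1 / N)))
```

The ratio printed in the usage example is the kind of upward-to-downward fixation ratio that enters the exponential term of the equilibrium solution.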
Because there are nonzero transition probabilities between all adjacent classes, Eq. 1 converges on a global equilibrium probability distribution after a sufficiently long period, regardless of the starting conditions. At this point, the flux from each class into each adjacent class is exactly balanced by the reverse flux. This condition is known as detailed balance, and for a linear chain of states it follows from the equality of total fluxes into and out of each class. One simple approach to obtaining the equilibrium for a linear array of TFBS states is to view the system as a flow diagram with connecting arrows denoting the flux rates between adjacent classes (Fig. 1). The relative probability of each state is then the product of all coefficients on the arrows pointing toward that class from one side of the chain, multiplied by the product of all coefficients on the arrows pointing toward it from the other side (5). After a number of simplifying steps, the solution reduces to

$$\tilde{P}_m = C\left[\binom{\ell}{m}3^{\ell-m}\right]e^{2N_e s_m} \qquad [3]$$

where $s_m$ is the selective advantage of matching class m relative to class $m = 0$, and C is a normalization constant [equal to the reciprocal of the sum of the terms to the right of C for all m, often referred to as the partition function (10, 11)]. The exponential term on the right is the cumulative ratio of fixation probabilities in the upward to downward directions, which is derived from Eq. 2 in SI Text. Given a constant set of population-genetic parameters, $\tilde{P}_m$ can be interpreted in two ways: (i) for a single TFBS, it represents the long-term proportion of evolutionary time spent in the various matching states, and (ii) for a set of different TFBSs under the same selective constraints, it represents the expected distribution of states at any point in time.
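The flow-diagram recipe can also be carried out numerically. The sketch below is illustrative and assumes a user-supplied fitness for each matching class. It builds the upward coefficient $(\ell-m)\mu N\phi$ and the downward coefficient $3(m+1)\mu N\phi$ for each pair of adjacent classes, applies detailed balance in log space, and normalizes. The function names are hypothetical. With equal fitnesses the result should reproduce the neutral expectation, and rescaling μ should leave it unchanged.

```python
import math


def log_fixation_probability(s, Ne, N):
    """ln of Kimura's fixation probability (haploid, p0 = 1/N); overflow-safe."""
    p0 = 1.0 / N
    a = 2.0 * Ne * s
    if abs(a) < 1e-12:
        return math.log(p0)
    if a > 0:
        return math.log(-math.expm1(-a * p0)) - math.log(-math.expm1(-a))
    b = -a  # deleterious: phi = (e^{b p0} - 1) / (e^{b} - 1)
    return math.log(math.expm1(b * p0)) - (b + math.log1p(-math.exp(-b)))


def equilibrium_by_detailed_balance(W, mu, Ne, N):
    """Equilibrium probabilities of matching classes m = 0..ell, given fitnesses W[m]."""
    ell = len(W) - 1
    log_p = [0.0]
    for m in range(ell):
        s_up = (W[m + 1] - W[m]) / W[m]        # advantage of class m+1 over m
        s_down = (W[m] - W[m + 1]) / W[m + 1]  # advantage of class m over m+1
        log_up = math.log((ell - m) * mu * N) + log_fixation_probability(s_up, Ne, N)
        log_down = math.log(3 * (m + 1) * mu * N) + log_fixation_probability(s_down, Ne, N)
        # Detailed balance: P[m+1] * down = P[m] * up
        log_p.append(log_p[-1] + log_up - log_down)
    top = max(log_p)
    weights = [math.exp(x - top) for x in log_p]  # rescale before exponentiating
    total = sum(weights)                          # plays the role of 1/C
    return [w / total for w in weights]


if __name__ == "__main__":
    W = [1.0 + 0.01 * m / 8 for m in range(9)]  # hypothetical fitnesses
    print([round(p, 4) for p in equilibrium_by_detailed_balance(W, 1e-8, 1000, 1000)])
```

Working in log space avoids underflow of the products along the chain when selection is strong.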
Fig. 1.
Flow diagram for the alternative states of a binding site of length ℓ, with the circle diagrams below simply illustrating one specific type within each category of numbers of mismatches (open circles denote a mismatch). The final two classes, with one and zero matches, are not shown. The transition rates on the arrows are given for the case of neutrality. Here the probability of fixation is 1/N, so that the rate of substitution equals the mutation rate per site. This rate is 3μ for single-site losses (arrows to the right), because each appropriate nucleotide can mutate to three others, and μ for site improvement (arrows to the left), because each mismatch can mutate to the appropriate state in only one way. With selection in operation, each coefficient needs to be multiplied by the number of individuals (N) and the associated probability of fixation.
There are several notable features of this solution. First, because each transition rate in the linear chain is a factor of μ, the equilibrium probabilities are completely independent of the mutation rate. This would be true even with a different mutation spectrum, although the factor of 3 (here the ratio of deleterious to advantageous mutations) would change. Second, the expression within the square brackets defines the expected distribution of matching states in the absence of selection, that is, the neutral expectation $\tilde{P}_m^{(0)} = \binom{\ell}{m}3^{\ell-m}/4^{\ell}$. This term is equivalent to the number of unique ways in which a sequence of length ℓ can harbor m sites matching the optimal binding motif, accounting for both the distinct spatial configurations of matches and the fact that there are three alternative inappropriate nucleotides for each mismatch. Eq. 3 shows that the effect of selection on the equilibrium probability distribution is equivalent to a simple transformation of the neutral expectation, with each state probability being weighted by an exponential function of the product of its relative selective advantage ($s_m$) and the effective population size ($N_e$). Because $1/N_e$ is a measure of the power of random genetic drift, the weighting terms are equivalent to the ratio of the power of selection to that of drift.
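Because Eq. 3 is a reweighting of the neutral expectation, it takes only a few lines to evaluate. This is a minimal sketch, assuming the selective advantages $s_m$ are supplied relative to any common reference class, since a constant offset cancels on normalization. The function names are hypothetical.

```python
import math


def neutral_expectation(ell):
    """Neutral distribution of matches: C(ell, m) * 3^(ell - m) / 4^ell."""
    return [math.comb(ell, m) * 3 ** (ell - m) / 4 ** ell for m in range(ell + 1)]


def reweighted_equilibrium(s, Ne):
    """Eq. 3: neutral expectation weighted by exp(2 * Ne * s[m]), then normalized."""
    ell = len(s) - 1
    neutral = neutral_expectation(ell)
    s_max = max(s)  # subtracting a constant only guards against overflow
    weights = [p * math.exp(2.0 * Ne * (x - s_max)) for p, x in zip(neutral, s)]
    total = sum(weights)  # reciprocal of the normalization constant C
    return [w / total for w in weights]


if __name__ == "__main__":
    ell = 16
    print([round(p, 4) for p in neutral_expectation(ell)])  # peaks near m = ell / 4
    s = [0.001 * m for m in range(ell + 1)]  # hypothetical linear advantage
    print([round(p, 4) for p in reweighted_equilibrium(s, Ne=1e4)])
```

The neutral distribution is binomial with success probability 1/4, so its mean is ℓ/4. This is why the neutral expectation is so heavily weighted toward few matches.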
What remains is the definition of the fitness function. The universality of the basic mode of transcription—the interaction of a specific protein (the TF) with a specific DNA binding site (the TFBS)—provides a mechanistic basis for addressing this issue in biophysical terms (12). The most common approach is to consider individual fitness to be a linear function of the fraction of time that a TFBS with m matching sites is expected to be bound by its cognate TF (a minimum requirement for expression of the associated gene),
$$W_m = 1 + \alpha p_m \qquad [4]$$

with α being a scaling factor relating binding probability to fitness (10, 11, 13–15). As $\alpha \to 0$, $W_m \to 1$ for all m, implying neutrality. Under this model, $p_m$ reflects the thermodynamic equilibrium of binding and unbinding of the TF.
The probability of TFBS occupancy by a cognate TF is a function of the binding site itself and of the features of the intracellular environment that restrict the accessibility of the site to cognate TFs. Clearly, $p_m$ will increase with the number of TF molecules within the cell ($n_{TF}$), but equally important is the number of ways in which individual TF molecules can become sidetracked by binding to alternative genomic sites. Other genes serviced by the TF (numbering $n_{ot}$, where ot denotes off-target) will compete for the pool of TFs, but nonspecific binding of TFs across the genome is almost always more important. Letting G denote the total genome size (in base pairs), there are essentially G nonspecific sites in a haploid cell because $G \gg \ell$.
Using results from statistical-mechanics theory, an expression for $p_m$ can be obtained in terms of the binding energy of a motif (β) and background interference (B) within the cell,

$$p_m = \frac{e^{\beta m}}{e^{\beta m} + B} \qquad [5]$$

(SI Text). The exponential term is a measure of the total strength of binding (in Boltzmann units of 0.6 kcal/mol), under the assumption that binding strength scales linearly with the number of matching sites (m) between a TFBS and the optimal binding motif of its TF. Multiple empirical studies involving single-base changes in TFBSs suggest an average energetic cost of a mismatch of about 1.4 kcal/mol, that is, β ≈ 2.3 Boltzmann units (Table 1). Numerous studies also support the additivity assumption as a first-order approximation (16–20), although higher-order effects involving TFBS shape can also contribute to the overall binding energy (21). Eq. 5 treats the binding of alternative TFBSs as statistically independent. As a model of transcription, it is therefore more prokaryotic than eukaryotic in nature, because eukaryotic gene transcription often involves the coordinated use of multiple TFs. However, as noted above, the general issues being evaluated here go well beyond TF/TFBS interactions, many of which are simple enough to be treated in the manner of Eq. 5.
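Eq. 5 can be written as $p_m = 1/(1 + Be^{-\beta m})$, which gives a sigmoid occupancy curve whose half-point lies at $m = \ln B/\beta$. The following is a minimal sketch, assuming that form and taking β from the Table 1 average (1.4 kcal/mol divided by 0.6 kcal/mol). The function and constant names are hypothetical.

```python
import math

KT_KCAL_PER_MOL = 0.6           # Boltzmann unit used in the text
BETA = 1.4 / KT_KCAL_PER_MOL    # ~2.3 per match, the Table 1 average


def binding_probability(m, B, beta=BETA):
    """Occupancy of a TFBS with m matches, p_m = e^(beta m) / (e^(beta m) + B).

    Written as 1 / (1 + B e^(-beta m)) to avoid overflow for long motifs.
    """
    return 1.0 / (1.0 + B * math.exp(-beta * m))


if __name__ == "__main__":
    B = 1e4
    print("half-occupancy at m =", math.log(B) / BETA)
    for m in range(9):
        print(m, round(binding_probability(m, B), 4))
```

With B on the order of $10^4$, only about four matches are needed for half-occupancy. Each further match therefore brings a rapidly diminishing gain in binding.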
Table 1.
Features of the motifs of well-studied TFs
TF | Species | Motif length, bp | Cost of mismatch, kcal/mol: mean (range) | Refs. |
CI | Lambda phage | 17 | 1.4 (0.5–3.5) | 17 |
Cro | Lambda phage | 9 | 1.4 (0.5–2.5) | 18 |
Mnt | Salmonella phage P22 | 21 | 1.0 (0.3–1.6) | 19, 57 |
CRP | E. coli | 22 | 1.7 (0.9–2.5) | 58, 59 |
CRP | Synechocystis sp. | 22 | 1.8 (0.7–3.0) | 60 |
ArcA | Shewanella oneidensis | 15 | 1.3 (0.1–3.4) | 61, 62 |
Gcn4 | S. cerevisiae | 11 | 1.0 (0.5–1.7) | 63 |
c-Myb | Homo sapiens | 6 | 1.6 (0.6–2.8) | 64 |
Motif size is based on consensus sequences. The estimated costs of mismatches are obtained from binding-strength experiments in which single-base changes were made in motifs. Costs of single-base mismatches are in units of kilocalories per mole; these average to 1.4 across the full set of studies, or to 2.3 in Boltzmann units (kT ≈ 0.6 kcal/mol).
In the present context, the interference term is a measure of the concentration of background (nonspecific) binding sites relative to the number of TF molecules available for the specific target site. The approximate magnitude of B can be inferred by noting that G is generally in the range of $10^6$ to $10^{10}$ bp, with prokaryotes falling at the lower end and multicellular eukaryotes at the higher end of the range (22). For the model bacterium Escherichia coli, there are generally at most about 1,000 molecules per cell for particular TFs, with just a few cases ranging as high as 50,000 (23). Somewhat lower numbers have been estimated for another bacterium, Leptospira interrogans (24). In prokaryotes, it is unusual for a particular TF to service more than a modest number of genes, so the range for B for such species is on the order of $10^3$ to $10^6$.
For the yeast Saccharomyces cerevisiae, the average number of molecules for individual TFs is on the order of 8,000 per cell (25), so with a genome size of 12 Mb, B should be in the range of $10^3$ to $10^4$. Proteins within mammalian cells are about 10 times as numerous as those in yeast (26), but with a genome size of ∼3,000 Mb, B can be expected to be on the order of $10^4$ to $10^5$. All of these estimates of background interference assume that the primary deterrent to TF accessibility is nonspecific binding to DNA. If other sources of interference exist (such as promiscuous binding to other proteins), B would be correspondingly higher. However, DNA-binding proteins, such as histones in eukaryotes, could reduce B by restricting access of a TF to only a fraction of the genome.
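The order-of-magnitude estimates above amount to dividing the number of nonspecific sites by the number of TF molecules. This is a minimal sketch, assuming $B \approx G/n_{TF}$ and neglecting off-target genes. The E. coli genome size of about 4.6 Mb is supplied here and is not from the text. The function name is hypothetical.

```python
def interference(genome_size_bp: float, n_tf: float) -> float:
    """Rough background interference B ~ G / n_TF (hypothetical helper).

    Assumes nonspecific binding to DNA dominates and ignores off-target
    genes (n_ot), which are few compared with G.
    """
    return genome_size_bp / n_tf


if __name__ == "__main__":
    cases = {
        "E. coli (G ~ 4.6 Mb, 1,000 TF molecules)": (4.6e6, 1_000),
        "S. cerevisiae (G ~ 12 Mb, 8,000 TF molecules)": (12e6, 8_000),
        "mammal (G ~ 3,000 Mb, ~80,000 TF molecules)": (3e9, 80_000),
    }
    for name, (G, n_tf) in cases.items():
        print(f"{name}: B ~ {interference(G, n_tf):.1e}")
```

Note that the tenfold higher TF abundance in mammals is more than offset by the 250-fold larger genome, so B ends up higher than in yeast.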
The solution of Eqs. 3–5 illustrates several general principles (Fig. 2). First, $\tilde{P}_m$ depends only on the product of the effective population size, $N_e$, and the strength of selection, which scales with α. This implies that a doubling in $N_e$ has the same influence as a doubling in the strength of selection. Because Eq. 3 was derived under the sequential stepwise-fixation model, extensive computer simulations were performed with a Wright–Fisher model to evaluate the domain of validity of this treatment. These simulations included episodes of mutation, selection, and random genetic drift to generate the frequency of all possible genotypes each generation. The distributions predicted by Eq. 3 provide excellent fits to the more detailed analyses so long as the number of new mutations arising per site per generation ($N\mu$) is small (SI Text). Empirical data suggest that this condition is met in most natural populations (22), apparently because mutation rates evolve to increasingly lower levels with increasing $N_e$ (27, 28). Methods for generalizing to arbitrarily large $N\mu$ show that the neutral expectation, $\tilde{P}_m^{(0)}$, continues to play a central role in determining the overall distribution of m (29).
Fig. 2.
Equilibrium evolutionary distributions of binding-site matches with transcription-factor motifs of lengths and 16. Results are given for various levels of the strength of selection relative to the power of genetic drift and two levels of background interference (black lines denote weak interference and red lines strong interference ). Blue circles denote the expectation under neutral evolution.
Second, regardless of the set of parameter values, substantial variation in m is almost always expected, even in the face of constant directional selection. Unless the motif size is at the low end of the range typically seen and levels of background interference and selection pressures are very high, the vast majority of binding sites are expected to contain mismatches with respect to the optimum motif. With larger population sizes the distribution is pushed toward larger m, but with an optimum motif size of 16 bp essentially no TFBS is expected to be perfect. This behavior can be understood by considering the selection coefficients, s, associated with adjacent matching classes. Because $p_m$ is sigmoid, the fitness function approaches an asymptotic slope of zero at high values of m, owing to the near-certain level of binding. The point at which the selective difference between adjacent classes becomes smaller than the power of drift, $1/N_e$, represents the barrier beyond which selection is incapable of influencing the rate of allelic substitution.
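The drift barrier can be located by scanning the selective advantage of each one-step improvement and asking where it falls below $1/N_e$. This is a minimal sketch, assuming $W_m = 1 + \alpha p_m$ and $p_m = 1/(1 + Be^{-\beta m})$. The parameter values in the example (B = $10^4$, β = 2.3, α = 0.01) and the helper names are hypothetical.

```python
import math


def binding_probability(m, B, beta):
    """p_m = 1 / (1 + B e^(-beta m))."""
    return 1.0 / (1.0 + B * math.exp(-beta * m))


def drift_barrier(ell, B, beta, alpha, Ne):
    """Highest matching class reached by a step whose selective advantage is at
    least the power of drift, 1/Ne (hypothetical helper)."""
    W = [1.0 + alpha * binding_probability(m, B, beta) for m in range(ell + 1)]
    barrier = 0
    for k in range(ell):
        s = (W[k + 1] - W[k]) / W[k]  # advantage of k+1 matches over k
        if s >= 1.0 / Ne:
            barrier = k + 1
    return barrier


if __name__ == "__main__":
    for Ne in (1e4, 1e6, 1e8):
        print(Ne, drift_barrier(16, 1e4, 2.3, 0.01, Ne))
```

For these hypothetical values, steps beyond about half of a 16-bp motif are effectively neutral, even with $N_e = 10^6$. Raising $N_e$ moves the barrier up by roughly one match per factor of $e^{\beta} \approx 10$.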
Third, with relatively weak selection pressure (small $N_e\alpha$), $\tilde{P}_m$ is very heavily skewed toward small numbers of matches (converging on the neutral expectation). This intrinsic weighting toward low numbers of matches is a result of the biased mutation pressure toward mismatches and the increasing multiplicity of configurations leading to the same m with increasing numbers of mismatches.
Fourth, because the neutral distribution is strongly weighted toward low m, there can be a strong “phase transition” in the form of the probability distribution as $N_e\alpha$ crosses a critical threshold. As can be seen in Fig. 2, Upper, at intermediate strengths of selection there are even cases in which $\tilde{P}_m$ is bimodal, with a peak to the left driven by mutation pressure and a peak to the right driven by selection pressure.
The preceding results can be used to evaluate the validity of the popular notion that TFBSs can be detected by searching for relatively conserved intergenic patches of orthologous nucleotide sequence in genome comparisons. For genes in an early stage of divergence, that is, with only a few single-nucleotide substitutions per motif, as would be the case for closely related species, the rate of evolution relative to the neutral expectation is defined by

$$\omega = \frac{1}{3\ell}\sum_{m=0}^{\ell}\tilde{P}_m\left[3mN\phi(m,m-1) + (\ell-m)N\phi(m,m+1) + 2(\ell-m)\right] \qquad [6]$$

(derived in SI Text). Here the final term accounts for neutral changes between alternative mismatching nucleotides, which leave m unchanged. Effective neutrality results in $\omega = 1$, whereas strong purifying selection results in $\omega \to 0$. Analysis of Eq. 6 demonstrates that, despite their centrality to gene expression, TFBS sequences are expected to be frequently under only weak purifying selection on short time scales, unless the motif size is very small and $N_e$ is very large (Fig. 3). The width of the selective sieve (ω) declines with increasing $N_e$, but for a wide range of parameter space the majority of nucleotide substitutions are free to proceed to fixation ($\omega > 0.5$). Larger motifs are subject to weaker selective effects because of the greater degrees of freedom involving alternative binding-site configurations.
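The width of the selective sieve can be computed by combining the equilibrium distribution with the per-class substitution rates. The following is a minimal sketch, assuming $W_m = 1 + \alpha p_m$, $p_m = 1/(1 + Be^{-\beta m})$, Kimura fixation probabilities with $p_0 = 1/N$, and ω written as the ratio of the expected substitution rate to the neutral rate of 3ℓμ per motif. It is not the authors' code, and the names are hypothetical.

```python
import math


def log_fixation_probability(s, Ne, N):
    """ln of Kimura's fixation probability (haploid, p0 = 1/N); overflow-safe."""
    p0 = 1.0 / N
    a = 2.0 * Ne * s
    if abs(a) < 1e-12:
        return math.log(p0)
    if a > 0:
        return math.log(-math.expm1(-a * p0)) - math.log(-math.expm1(-a))
    b = -a  # deleterious: phi = (e^{b p0} - 1) / (e^{b} - 1)
    return math.log(math.expm1(b * p0)) - (b + math.log1p(-math.exp(-b)))


def sieve_width(ell, B, beta, alpha, Ne, N):
    """omega: substitution rate relative to neutrality, averaged over equilibrium."""
    p = [1.0 / (1.0 + B * math.exp(-beta * m)) for m in range(ell + 1)]
    W = [1.0 + alpha * x for x in p]

    def log_Nphi(x, y):  # ln[N * phi(x, y)] for a class-y mutant in a class-x population
        return math.log(N) + log_fixation_probability((W[y] - W[x]) / W[x], Ne, N)

    # Equilibrium over m by detailed balance; mu cancels and is omitted.
    log_P = [0.0]
    for m in range(ell):
        log_P.append(log_P[-1] + math.log(ell - m) + log_Nphi(m, m + 1)
                     - math.log(3 * (m + 1)) - log_Nphi(m + 1, m))
    top = max(log_P)
    P = [math.exp(x - top) for x in log_P]
    total = sum(P)

    rate = 0.0
    for m in range(ell + 1):
        r = 2.0 * (ell - m)  # mismatch -> other mismatch: m unchanged, neutral
        if m > 0:
            r += 3 * m * math.exp(log_Nphi(m, m - 1))      # match -> mismatch
        if m < ell:
            r += (ell - m) * math.exp(log_Nphi(m, m + 1))  # mismatch -> match
        rate += (P[m] / total) * r
    return rate / (3.0 * ell)  # neutral substitution rate is 3 * ell * mu per motif


if __name__ == "__main__":
    for ell in (8, 16):
        print(ell, sieve_width(ell, 1e4, 2.3, 0.01, 1e6, 1e6))
```

With the same hypothetical parameters, the 8-bp motif is almost frozen. The 16-bp motif, by contrast, lets most mutations through, which illustrates the point about degrees of freedom.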
Fig. 3.
(Upper) Width of the selective sieve (proportion of new mutations capable of fixation) in the early stages of divergence as a function of the effective population size (assumed to be 1/10 the actual population size) for four forms of the fitness function. Black lines denote 16-bp motifs; red lines denote 8-bp motifs. (Lower) Asymptotic sequence identity for motifs at independently evolving loci.
The asymptotic level of divergence of TFBS motifs is also of interest, because this ultimately determines the extent to which consensus motifs can be determined by “phylogenetic footprinting” from patches of intergenic sequence conservation among distantly related species (where all surrounding nonfunctional DNA has been effectively randomized). Following ample time for mutational saturation, the average fraction of sites conserved among all random pairs of motifs is
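$$\bar{S} = \frac{1}{\ell}\sum_{x=0}^{\ell}\sum_{y=0}^{\ell}\tilde{P}_x\tilde{P}_y\,I(x,y) \qquad [7a]$$

$$I(x,y) = \frac{xy}{\ell} + \frac{(\ell-x)(\ell-y)}{3\ell} \qquad [7b]$$

where $I(x,y)$ is the expected number of identical sites for a pair of random motifs with x and y sites matching the optimum TF motif. Sites that match the optimum in both motifs are identical, whereas sites that mismatch in both are identical with probability 1/3. Solution of Eq. 7a reveals a wide range of parameter space for which the asymptotic level of sequence similarity is <50%, even in the face of fairly efficient selection, especially when the recognition motif of the TF exceeds 10 bp or so in length (Fig. 3). As expected for random sequences with four equally frequent nucleotides, the average sequence similarity converges on 0.25 in populations with small effective sizes.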
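The asymptotic similarity can be evaluated by summing over all pairs of matching classes. This is a minimal sketch, assuming that match positions within a motif are placed independently and uniformly, and that each mismatch is equally likely to be any of the three wrong nucleotides. The function names are hypothetical.

```python
import math


def neutral_expectation(ell):
    """Neutral distribution of matches: C(ell, m) * 3^(ell - m) / 4^ell."""
    return [math.comb(ell, m) * 3 ** (ell - m) / 4 ** ell for m in range(ell + 1)]


def expected_identical_sites(x, y, ell):
    """I(x, y): sites matching the optimum in both motifs are identical; sites
    mismatching in both share the same wrong nucleotide with probability 1/3."""
    return (x * y + (ell - x) * (ell - y) / 3.0) / ell


def asymptotic_identity(P):
    """Average fraction of identical sites between two motifs whose match
    counts are drawn independently from the distribution P[0..ell]."""
    ell = len(P) - 1
    total = 0.0
    for x in range(ell + 1):
        for y in range(ell + 1):
            total += P[x] * P[y] * expected_identical_sites(x, y, ell)
    return total / ell


if __name__ == "__main__":
    print(asymptotic_identity(neutral_expectation(16)))  # random sequences
    print(asymptotic_identity([0.0] * 12 + [1.0] + [0.0] * 4))  # every motif has 12 of 16 matches
```

Under the neutral expectation this recovers the random-sequence value of 0.25. Motifs that all sit at 12 of 16 matches share only about 58% of their sites.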
Notably, Eq. 7a applies to the average similarity of random pairs of sequences, which does not fully reveal the overall level of sequence conservation. If, for example, the average level of pairwise sequence similarity is 0.5, there is still a considerable amount of variance in the number of shared sites among pairs of TFBSs, and different pairs with the same fractional level of sequence conservation will not necessarily be identical at the same sites.
A Coevolving Two-Component System.
The preceding model can be readily extended to the situation in which both the TFBS and the optimal binding motif of the TF are capable of evolving. Although such a scenario opens up the possibility that a suboptimal TFBS can be restored to more favorable binding through a compensatory mutation in the TF, it also provides additional and typically more numerous routes to suboptimal binding through the accumulation of deleterious TF modifications. The net effect of such coevolution will be the random wandering of the joint system over the entire domain of possible sequence space, constrained only by the joint maintenance of a level of cooperativity defined by the drift barrier. This type of scenario should apply to a number of other specialized pairwise intracellular interactions, such as bacterial two-component signaling systems and toxin–antitoxin systems.
To accommodate the evolution of trans effects in the TF, we retain a focus on the number of matches between a TFBS and its cognate TF, while making two modifications to the flow diagram in Fig. 1. First, instead of the reversion rate from a system with m to $m+1$ matches being defined as $(\ell-m)\mu$, it is rescaled to $(\ell-m)(\mu+\nu)$, where ν is the rate of mutation to improved binding within the TF protein per mismatch. Second, the rate of loss of preexisting binding sites is rescaled from $3m\mu$ to $m(3\mu+\delta\nu)$. The term δ can be viewed as a scalar for the rate of loss of TF binding potential relative to the rate of gain. These modifications do not alter the feature of detailed balance in the overall system but can lead to modified equilibrium distributions.
If δ = 3, all upward and downward mutation rates are proportional to $\mu+\nu$, and the equilibrium probability distribution of binding affinity is independent of the total mutation pressure and identical to that given in the preceding section. If, however, δ ≠ 3, the term $3^{\ell-m}$ in Eq. 3 needs to be modified to $\Delta^{\ell-m}$, where $\Delta = (3\mu+\delta\nu)/(\mu+\nu)$. In effect, Δ is a relative measure of the system-wide mutational pressure toward reduced vs. improved binding. Because there are likely many more ways to reduce than improve the ability of a TF to bind a specific sequence, Δ is likely to be larger than 3.0. In that case, the mean expected binding strength will be depressed relative to the expectations outlined in the previous section, where the preferred TF motif is frozen in evolutionary time. This means that, rather than promoting perfection, coevolution leads to a less efficient interaction, as each member of the pair creates a never-ending necessity of evolutionary fine-tuning by the opposite member. In addition, with increasingly higher Δ, this mutational weighting toward lower m becomes increasingly strong, so again there will be a critical range of selection efficiencies within which the equilibrium distribution of m will be bimodal.
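The effect of TF coevolution on the neutral baseline can be checked directly. This is a minimal sketch, assuming the rescaled rates $(\ell-m)(\mu+\nu)$ for gains and $m(3\mu+\delta\nu)$ for losses. Under these rates the neutral weight becomes $\binom{\ell}{m}\Delta^{\ell-m}$, a binomial distribution with mean $\ell/(1+\Delta)$. The function names and example rates are hypothetical.

```python
import math


def mutational_bias(mu, nu, delta):
    """Delta = (3*mu + delta*nu) / (mu + nu): per-site pressure toward reduced
    vs. improved binding when the TF can also evolve."""
    return (3.0 * mu + delta * nu) / (mu + nu)


def neutral_coevolving(ell, Delta):
    """Neutral expectation with 3^(ell - m) replaced by Delta^(ell - m)."""
    w = [math.comb(ell, m) * Delta ** (ell - m) for m in range(ell + 1)]
    total = sum(w)  # equals (1 + Delta)^ell
    return [x / total for x in w]


if __name__ == "__main__":
    mu, nu = 1e-9, 5e-10  # hypothetical rates
    for delta in (3.0, 9.0, 30.0):
        Delta = mutational_bias(mu, nu, delta)
        P = neutral_coevolving(16, Delta)
        print(delta, round(Delta, 2), round(sum(m * p for m, p in enumerate(P)), 3))
```

With δ = 3 the baseline is unchanged (Δ = 3, mean ℓ/4). Any larger δ pushes the neutral mean below ℓ/4, which is the depression in binding strength described above.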
The Consequence of Multiple Downstream Components.
A common view, especially among developmental biologists, is that key evolutionary changes associated with gene regulation are much more likely to involve modifications at the level of cis-regulatory (TFBS) sites than at the trans (TF) level. The implicit assumption underlying this idea is that the evolution of transcription factors becomes increasingly constrained with increasing numbers of target genes. To explore the validity of this idea, we now move beyond the one-to-one situation outlined in the preceding section to allow for the coevolution of the optimal recognition motif of a TF with the TFBSs of multiple genes.
To keep things reasonably tractable in this initial exploration, we treat the TF recognition motif in a manner parallel to that of the TFBSs, that is, as a sequence of fixed length ℓ with four possible states at each position. We set ν = μ and δ = 3, so that Δ = 3 and the relevant sites in the TF motif are subject to the same mutation pressure as those in the TFBSs. Under this computational setting, the number of matches (m) between the TF and any TFBS is defined by the states of the corresponding positions. This particular approach would be especially appropriate for the analysis of the coevolution of microRNAs and their binding sites.
By adhering to the sequential model, the evolution of the entire system can be followed in a stepwise manner over time, with the transition probabilities to alternative states being defined by the products of the mutation rates and fixation probabilities to alternative adjacent states. The approach is identical to that used above except for the more intricate details—the precise sequence of the TF and each TFBS must be monitored. We assumed that total fitness is determined by the product of the locus-specific fitnesses of the TFBSs, each defined in the manner outlined above using Eq. 4.
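The bookkeeping for multiple client genes reduces to counting positionwise matches and multiplying locus-specific fitnesses. The following is a minimal sketch, assuming $W_m = 1 + \alpha p_m$ and $p_m = 1/(1 + Be^{-\beta m})$ for each TFBS. It shows only the fitness evaluation, not the full stepwise simulation, and the function names are hypothetical.

```python
import math


def count_matches(tf_motif: str, tfbs: str) -> int:
    """Number of positions at which a TFBS agrees with the TF recognition motif."""
    if len(tf_motif) != len(tfbs):
        raise ValueError("TF motif and TFBS must have the same length")
    return sum(a == b for a, b in zip(tf_motif, tfbs))


def total_fitness(tf_motif, sites, B, beta, alpha):
    """Product of locus-specific fitnesses, each W_m = 1 + alpha * p_m (assumed)."""
    w = 1.0
    for site in sites:
        m = count_matches(tf_motif, site)
        p = 1.0 / (1.0 + B * math.exp(-beta * m))  # occupancy of this TFBS
        w *= 1.0 + alpha * p
    return w


if __name__ == "__main__":
    tf = "ACGTACGT"
    sites = ["ACGTACGT", "ACGAACGT", "TCGTACCT"]
    print([count_matches(tf, s) for s in sites])  # [8, 7, 6]
    print(total_fitness(tf, sites, B=1e4, beta=2.3, alpha=0.01))
```

Because a change in the TF motif alters m at every client site at once, its effect on total fitness is compounded across loci. This is consistent with the increasing constraint on the TF reported below.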
Three general results emerge from this analysis (Fig. 4). First, except under neutrality (α = 0), the rate of evolution of the TF recognition motif declines with increasing numbers of genes serviced, increasingly so with stronger selection. Second, the rate of evolution of the individual TFBSs is independent of the total number of sites and corresponds to the results given by Eq. 6. Third, the average degree of conservation among coexisting TFBSs within a genome is also independent of the total number of sites and corresponds to the single-site result given by Eq. 7a. Overall, these results support the idea that a TF becomes increasingly constrained with increasing numbers of client genes, whereas the TFBSs evolve independently of each other, driven mainly by the current state of the TF.
Fig. 4.
The evolutionary features of the motifs of a TF and its TFBSs as a function of increasing numbers of the latter. Black lines denote weak interference and red lines strong interference . An optimal motif size of is assumed. Data points denote results from computer simulations; horizontal lines denote expectations based on the analytical solutions in the text. Divergence rates are relative to the neutral expectation.
Stabilizing Selection on the Rate of Expression.
In the preceding analyses, it was assumed that selection favors maximum binding strength. Such a fitness function is justified by analyses of TFBSs in E. coli and S. cerevisiae that consistently infer a monotonic increase of fitness with increasing binding strength (15, 30, 31). Nonetheless, situations likely exist in which an intermediate phenotype is favored. For example, because the construction of gene products involves an energetic expenditure, selection may favor a less than maximum level of gene expression (32). Here we consider a Gaussian fitness function, with an optimal level of $p_m$ equal to θ,
$$W_m = \exp\left[-\left(\frac{p_m-\theta}{\sigma}\right)^2\right] \qquad [8]$$

The gradient of this fitness function around the optimum is defined by σ, with higher σ causing a flatter (and hence more neutral) fitness scenario. Denoting the term within parentheses as d, $W_m \simeq 1 - d^2$ for $d \ll 1$. For example, a 0.01 deviation of $p_m$ from θ in units of σ leads to a 0.0001 reduction in fitness relative to the optimal value of 1.0. Recalling Eq. 3, this implies that the equilibrium distribution of m under this model is primarily determined by the composite parameter $N_e/\sigma^2$, which is a measure of the ratio of the power of stabilizing selection to random genetic drift.
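The quadratic approximation is easy to confirm numerically. This is a minimal sketch, assuming Eq. 8 has the form $W = \exp(-d^2)$ with $d = (p_m - \theta)/\sigma$. The function name is hypothetical.

```python
import math


def gaussian_fitness(p, theta, sigma):
    """W = exp(-d^2) with d = (p - theta) / sigma."""
    d = (p - theta) / sigma
    return math.exp(-d * d)


if __name__ == "__main__":
    # d = 0.01: the fitness reduction should be close to d^2 = 1e-4
    print(1.0 - gaussian_fitness(0.51, 0.50, 1.0))
    # Shrinking sigma tenfold (so d = 0.1) raises the reduction about 100-fold
    print((1.0 - gaussian_fitness(0.51, 0.50, 0.1)) / (1.0 - gaussian_fitness(0.51, 0.50, 1.0)))
```

The ratio of the two reductions is close to 100. This is why only $N_e/\sigma^2$, and not $N_e$ and σ separately, governs the equilibrium.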
Because of the nonlinear relationship between $p_m$ and m and the discrete nature of m, this model exhibits several unique features (Fig. 5). First, the optimum level of expression, $p_m = \theta$, is typically unachievable, with the two least disadvantageous classes of m usually having values of $p_m$ bridging the optimum. As a consequence, the highest fitness class can sometimes remain stable despite substantial change in the optimum level of $p_m$. At some critical value of θ, the optimal degree of matching will suddenly switch to the next class of m. In doing so, it will also switch from a point at which $p_m < \theta$ to one at which $p_m > \theta$, or vice versa. This kind of behavior of alleles with discrete states has been pointed out before for quantitative traits (33). Second, Gaussian selection on gene expression does not translate into Gaussian selection on m. Owing to the sigmoid relationship between m and $p_m$, the fitness function on the scale of m is generally asymmetrical around the optimum value, sometimes highly so. Finally, again because of the sigmoid form of $p_m$, the fitness function becomes flat at extreme values of m, imposing a drift barrier to selection for very low and very high $p_m$.
Fig. 5.
(Upper) Selective disadvantage associated with the degree of matching, m, to a motif, under a Gaussian fitness function with an optimal binding probability θ and width σ; the binding probability is defined by Eq. 5. An increase in σ by a factor of 10 reduces the selective disadvantage by a factor of about 100. (Lower) Equilibrium evolutionary distributions of m as a function of the scaled strength of selection relative to drift, $N_e/\sigma^2$, for two optimum levels of expression (θ = … and 0.75).
The methods developed above can be applied equally well to stabilizing selection (or to any other fitness function). For example, the equilibrium distribution of matching sites is obtained by substituting Eq. 8 into Eq. 3. Under this model, there is an approximate cutoff value of the scaled selection parameter $N_e/\sigma^2$ below which the evolutionary distribution is close to the neutral expectation, regardless of the optimum level of expression (Fig. 5). For sufficiently large $N_e/\sigma^2$, stabilizing selection is so efficient that almost no variation is expected in m, and, again, there can be an intermediate selection intensity that yields a bimodal distribution.
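The recipe of substituting Eq. 8 into Eq. 3 can be sketched numerically. Because Eq. 3 is not reproduced in this section, the sketch assumes a sequential-fixation equilibrium of the form $P(m) \propto \binom{n}{m} 3^{n-m} W(m)^{2N_e}$ (the constant in the exponent may differ from the paper's Eq. 3), where the combinatorial term is the neutral expectation for a motif of length n under unbiased mutation among four nucleotides. The logistic binding probability and its parameters are hypothetical stand-ins for Eq. 5. With Gaussian W, the selective term is $-2(N_e/\sigma^2)(p_b - \theta)^2$, so only the composite $N_e/\sigma^2$ enters.

```python
import math

N_SITES = 10         # hypothetical motif length
EPS, MU0 = 1.0, 5.0  # hypothetical logistic parameters standing in for Eq. 5

def p_bind(m: int) -> float:
    """Hypothetical sigmoid binding probability of a site with m matches."""
    return 1.0 / (1.0 + math.exp(MU0 - EPS * m))

def equilibrium(scaled_sel: float, theta: float, n: int = N_SITES) -> list[float]:
    """Assumed equilibrium P(m) ∝ C(n, m) 3^(n-m) W(m)^(2 N_e), with Gaussian W.

    scaled_sel is the composite N_e / sigma**2, so that
    ln[W(m)^(2 N_e)] = -2 * scaled_sel * (p_b(m) - theta)**2.
    """
    log_weights = []
    for m in range(n + 1):
        log_neutral = math.log(math.comb(n, m)) + (n - m) * math.log(3.0)
        log_selection = -2.0 * scaled_sel * (p_bind(m) - theta) ** 2
        log_weights.append(log_neutral + log_selection)
    top = max(log_weights)  # subtract the maximum to avoid overflow/underflow
    weights = [math.exp(x - top) for x in log_weights]
    total = sum(weights)
    return [w / total for w in weights]

for scaled_sel in (1e-3, 10.0, 1e3, 1e6):
    dist = equilibrium(scaled_sel, theta=0.75)
    mean_m = sum(m * p for m, p in enumerate(dist))
    mode = max(range(len(dist)), key=dist.__getitem__)
    print(f"N_e/sigma^2 = {scaled_sel:g}: mean m = {mean_m:.2f}, "
          f"mode = {mode}, P(mode) = {dist[mode]:.3f}")
```

At very small $N_e/\sigma^2$ the output reproduces the neutral binomial expectation (mean m = n/4), whereas at very large values nearly all mass sits on the single class of m whose $p_b$ lies closest to θ.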
Although the preceding analyses have been couched in terms of transcription factors and their binding sites, they should apply to a variety of other coevolutionary problems involving pairwise molecular interactions. These include microRNAs and their binding sites, bacterial two-component signaling systems and toxin–antitoxin systems, the interfaces within heterodimeric molecules, proteins involved in vesicle sorting, and so on. Thus, the preceding results have potentially general implications for evolutionary cell biology.
First, owing to both the diminishing-returns aspect of intermolecular binding with increasing numbers of participating residues and the multiplicity of effectively equivalent alternative motifs, substantial variation is typically expected in the sequences underlying pairwise interactions, even in the face of strong selection for conserved function. Exceedingly high intensities of selection are generally required to maintain sequence identity across the full length of a motif, and the degrees of freedom associated with the locations and nucleotide identities of mismatching sites provide numerous paths along which motifs can wander neutrally in sequence space. Thus, although it has been argued that selection favors short TFBSs as a means of minimizing mutational breakdown (14), less-than-maximal TF/TFBS matching lengths arise naturally as a consequence of mutation–selection balance, without any direct selection for mutational robustness.
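The degrees of freedom invoked here can be counted directly. Assuming a motif of length n in which each mismatching position can carry any of the three non-consensus nucleotides, there are $\binom{n}{m}3^{n-m}$ sequences with exactly m matches. From any such sequence, the 3n possible point mutations split into 3m that lose a match, n − m that gain one, and 2(n − m) that change the motif without altering m. The hypothetical helper functions below simply tabulate these counts for a 10-bp motif.

```python
from math import comb

def n_sequences(n: int, m: int) -> int:
    """Number of n-bp sequences with exactly m matches to a consensus (3 alternatives per mismatch)."""
    return comb(n, m) * 3 ** (n - m)

def neighbor_classes(n: int, m: int) -> tuple[int, int, int]:
    """Split the 3n point mutations of an m-match sequence into (lose, lateral, gain)."""
    lose = 3 * m           # any change at a matching position destroys that match
    gain = n - m           # one of three changes at a mismatch restores the match
    lateral = 2 * (n - m)  # the other two swap one mismatch for another
    return lose, lateral, gain

n = 10
for m in range(n, n - 4, -1):
    print(m, n_sequences(n, m), neighbor_classes(n, m))
# 10 1 (30, 0, 0)
# 9 30 (27, 2, 1)
# 8 405 (24, 4, 2)
# 7 3240 (21, 6, 3)
```

Even two mismatches in a 10-bp motif already allow 405 distinct sequences of equal matching score, connected by lateral mutations that leave m unchanged.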
Second, with intermediate levels of selection, bimodal distributions of binding strengths can emerge among motifs exposed to identical selection pressures, raising questions about the interpretation of species differences in TFBS affinities as indicators of lineage-specific differences in selection pressures. Such behavior is a simple consequence of the conflict between mutational bias toward low affinity and selection bias toward high affinity. Because the motifs within the different peaks of such distributions will differ in both length and sequence, this result may help explain the widespread use of secondary TFBS motifs by TFs in mammals and land plants (2, 34, 35). Bimodality in binding-affinity distributions is predicted to be less likely in species with high $N_e$, as increasingly efficient selection overwhelms the mutation bias, generating a unimodal distribution with high average binding strength.
Third, unless effective population sizes are enormous and selection is exceedingly strong, 50% or more of the mutations arising in binding motifs are typically free to move toward fixation. As a consequence of this drift process, orthologous motifs in closely related species, despite being critical to fitness, can often evolve at rates of at least half the mutation rate, and those in distantly related lineages will frequently have asymptotic levels of divergence of up to 50%. For motifs that are only 10 bp or so in length, this level of divergence can be exceedingly difficult to discriminate from the neutral expectation. Consistent with this view, many studies have documented substantial within-species variation in the sequences of orthologous regulatory elements (36–39). In addition, studies in diverse lineages, sometimes involving closely related species, have routinely shown near-complete scrambling of regulatory-region sequence, often with no apparent functional consequences (40–48).
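A rough numerical illustration of why 50% divergence is hard to distinguish from neutrality in a short motif: assuming equal nucleotide frequencies, two unconstrained aligned sites at saturation differ with probability 3/4, so a 10-bp motif is expected to show about 7.5 differences. The sketch below, an illustrative one-sided exact binomial test rather than an analysis from the paper, computes the probability of observing 5 or fewer differences under that neutral expectation.

```python
from math import comb

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

MOTIF_LENGTH = 10      # a 10-bp motif, as in the text
P_DIFF_NEUTRAL = 0.75  # assumed: saturated, unconstrained sites with equal base frequencies
observed_diffs = 5     # 50% divergence between orthologous motifs

# One-sided test: is 50% divergence significantly below the neutral 75%?
p_value = binom_cdf(observed_diffs, MOTIF_LENGTH, P_DIFF_NEUTRAL)
print(f"P(X <= {observed_diffs}) = {p_value:.3f}")  # prints 0.078
```

The result, about 0.078, would not reach conventional significance at the 5% level for a single motif, even though the motif in this scenario is under selection.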
Fourth, if both members of an interacting pair are free to evolve, even greater variation and divergence in binding-site affinity is expected. This is because selection operates only on the overall degree of matching between participants, with the precise motifs involved in the interaction being largely irrelevant (so long as spurious cross-talk with noncognate systems is avoided). Provided the degree of overall affinity remains within certain bounds, the degrees of freedom beyond the drift barrier present ample opportunities for unbounded molecular wandering of the individual motifs. Such within-species drift will passively give rise to incompatibilities among isolated lineages as the reciprocal partners in heterospecific combinations no longer recognize each other (49, 50).
Finally, the pattern of evolution of participating partners is expected to vary with the number of transactions involved. With a highly specific (one-to-one) association, long-term rates of evolution must be identical for both members of the pair; otherwise, fitness would necessarily decline as the coevolutionary loop is broken. In contrast, under a one-to-many scenario, where one member of the pair interacts with multiple representatives of the other, there can be substantial asymmetry in the rates of evolution of the two components. The member of the pair servicing multiple partners (i.e., having more pleiotropic effects) experiences the strongest selective constraint, with its overall rate of evolutionary divergence declining with increasing numbers of partners. The more numerous partners, however, evolve in an essentially independent fashion, with distributional and rate features identical to those expected in a one-to-one system. In effect, with multiple interacting partners the master controlling element becomes increasingly constrained to accepting only the reduced subset of mutations that are effectively neutral for all partners, or the even smaller subset with a net positive effect overall. Consistent with this prediction, within the γ-Proteobacteria, TFs with larger numbers of target genes are more evolutionarily conserved at the amino acid sequence level, including in the TFBS recognition sequence (51). In addition, the decline in TF binding-site specificity with increasing numbers of genes serviced in both E. coli and yeast (14, 52) seems consistent with the evolution of more generalized recognition systems in TFs with greater pleiotropic effects.
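The increasing constraint on a master element can be caricatured with a toy calculation. Assume, purely for illustration, that a mutation in the shared element is effectively neutral with respect to any one partner with probability q, independently across its k partners, and ignore the small subset of mutations with a net positive effect. The fraction of mutations the shared element can accept then falls as $q^k$, whereas each partner, which interacts only with the shared element, retains an acceptable fraction of q regardless of k.

```python
def acceptable_fraction_master(q: float, k: int) -> float:
    """Toy model: chance that a mutation in the shared element is neutral for all k partners."""
    return q ** k

def acceptable_fraction_partner(q: float) -> float:
    """Each partner sees only the shared element, so its acceptable fraction does not depend on k."""
    return q

Q = 0.5  # hypothetical per-partner probability of effective neutrality
for k in (1, 2, 5, 10):
    print(k, acceptable_fraction_master(Q, k), acceptable_fraction_partner(Q))
# 1 0.5 0.5
# 2 0.25 0.5
# 5 0.03125 0.5
# 10 0.0009765625 0.5
```

Under this caricature, the long-term substitution rate of the shared element (mutation rate times acceptable fraction) declines geometrically with the number of partners, while the partners' rates stay at the one-to-one value.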
Collectively, the preceding theory indicates that, without direct empirical evidence, there is little justification for interpreting motif variation (either among multiple genes within genomes or among orthologous sequences across species) as evidence for the adaptive fine-tuning of individual loci. Rather, such variation is expected to be a natural consequence of the degrees of freedom associated with binding interfaces, the diminishing advantages of increased binding affinity, and the limits to the power of natural selection. Thus, the common inability to locate orthologous TFBSs in comparative studies is most likely not an artifact of inadequate computational tools but an inherent consequence of the evolutionary features of such sequences.
It has recently been argued that some genomic features may exhibit evolutionary behaviors that are nearly independent of population size, particularly when such features are linearly related to a phenotypic trait under stabilizing selection (53). However, a central conclusion from the preceding theory is that the evolutionary behavior of binding interfaces is strongly influenced by $N_e$, even when the level of gene expression is under stabilizing selection. Consistent with the theory, a comparison of TF systems in Drosophila and mammals (cases of relatively high vs. low $N_e$) suggests higher rates of evolution in the latter (3). Such work will need to be repeated over many other phylogenetic lineages before definitive conclusions can be drawn. However, as large, curated databases continue to develop for multiple organisms (e.g., for TFs and their binding sites; see refs. 54–56), it will become possible to test the various evolutionary hypotheses concerning the roles played by effective population size, numbers of molecules per cell, number of interacting partners, and so on.
A number of additional avenues of inquiry remain for the future. For example, although we have provided a fairly extensive analysis of one-to-one and one-to-many types of interactions, the many-to-one and many-to-many scenarios remain to be explored. Many-to-one scenarios include the regulation of the large proportion of genes in eukaryotes that require multiple TFs for gene activation. Many-to-many interactions include higher-order gene networks and heteromeric proteins, and long chains of interactions are relevant to many signal-transduction cascades in eukaryotes. In all of these cases, the issue of epistasis (nonmultiplicative interactions among fitness-determining loci) will merit investigation, because this might influence the degree to which the partners at equivalent levels in such systems evolve in an independent manner. Recombination will have no influence on the preceding results provided the population size is small enough that the mode of evolution is consistent with the sequential-fixation model, but will become increasingly important in cases involving multiple loci in large populations where jointly segregating polymorphic sites have an appreciable probability of occurrence.
Supplementary Material
We thank J. Gunawardena, M. Lässig, and G. Marinov for helpful comments. This work has been supported by National Institutes of Health Grant R01 GM036827 (to M. Lynch and W. K. Thomas), National Science Foundation Grant MCB-1050161 (to M. Lynch), and Award W911NF-09-1-0444 from the US Army Research Office (to M. Lynch, P. Foster, H. Tang, and S. Finkel).
The authors declare no conflict of interest.
This article contains supporting information online at
- 1. Dowell RD. Transcription factor binding variation in the evolution of gene regulation. Trends Genet. 2010;26(11):468–475. doi: 10.1016/j.tig.2010.08.005.
- 2. Nakagawa S, Gisselbrecht SS, Rogers JM, Hartl DL, Bulyk ML. DNA-binding specificity changes in the evolution of forkhead transcription factors. Proc Natl Acad Sci USA. 2013;110(30):12349–12354. doi: 10.1073/pnas.1310430110.
- 3. Villar D, Flicek P, Odom DT. Evolution of transcription factor binding in metazoans - mechanisms and functional implications. Nat Rev Genet. 2014;15(4):221–233. doi: 10.1038/nrg3481.
- 4. Dayhoff JE, Shoemaker BA, Bryant SH, Panchenko AR. Evolution of protein binding modes in homooligomers. J Mol Biol. 2010;395(4):860–870. doi: 10.1016/j.jmb.2009.10.052.
- 5. Lynch M. Evolutionary diversification of the multimeric states of proteins. Proc Natl Acad Sci USA. 2013;110(30):E2821–E2828. doi: 10.1073/pnas.1310980110.
- 6. Capra EJ, Perchuk BS, Skerker JM, Laub MT. Adaptive mutations that prevent crosstalk enable the expansion of paralogous signaling protein families. Cell. 2012;150(1):222–232. doi: 10.1016/j.cell.2012.05.033.
- 7. Rowland MA, Deeds EJ. Crosstalk and the evolution of specificity in two-component signaling. Proc Natl Acad Sci USA. 2014;111(15):5550–5555. doi: 10.1073/pnas.1317178111.
- 8. Landry CR, Freschi L, Zarin T, Moses AM. Turnover of protein phosphorylation evolving under stabilizing selection. Front Genet. 2014;5:245. doi: 10.3389/fgene.2014.00245.
- 9. Kimura M. On the probability of fixation of mutant genes in a population. Genetics. 1962;47(6):713–719. doi: 10.1093/genetics/47.6.713.
- 10. Berg J, Willmann S, Lässig M. Adaptive evolution of transcription factor binding sites. BMC Evol Biol. 2004;4:42. doi: 10.1186/1471-2148-4-42.
- 11. Lässig M. From biophysics to evolutionary genetics: Statistical aspects of gene regulation. BMC Bioinformatics. 2007;8(Suppl 6):S7. doi: 10.1186/1471-2105-8-S6-S7.
- 12. Bintu L, et al. Transcriptional regulation by the numbers: Applications. Curr Opin Genet Dev. 2005;15(2):125–135. doi: 10.1016/j.gde.2005.02.006.
- 13. Gerland U, Hwa T. On the selection and evolution of regulatory DNA motifs. J Mol Evol. 2002;55(4):386–400. doi: 10.1007/s00239-002-2335-z.
- 14. Stewart AJ, Hannenhalli S, Plotkin JB. Why transcription factor binding sites are ten nucleotides long. Genetics. 2012;192(3):973–985. doi: 10.1534/genetics.112.143370.
- 15. Haldane A, Manhart M, Morozov AV. Biophysical fitness landscapes for transcription factor binding sites. PLOS Comput Biol. 2014;10(7):e1003683. doi: 10.1371/journal.pcbi.1003683.
- 16. von Hippel PH, Berg OG. On the specificity of DNA-protein interactions. Proc Natl Acad Sci USA. 1986;83(6):1608–1612. doi: 10.1073/pnas.83.6.1608.
- 17. Sarai A, Takeda Y. Lambda repressor recognizes the approximately 2-fold symmetric half-operator sequences asymmetrically. Proc Natl Acad Sci USA. 1989;86(17):6513–6517. doi: 10.1073/pnas.86.17.6513.
- 18. Takeda Y, Sarai A, Rivera VM. Analysis of the sequence-specific interactions between Cro repressor and operator DNA by systematic base substitution experiments. Proc Natl Acad Sci USA. 1989;86(2):439–443. doi: 10.1073/pnas.86.2.439.
- 19. Fields DS, He Y, Al-Uzri AY, Stormo GD. Quantitative specificity of the Mnt repressor. J Mol Biol. 1997;271(2):178–194. doi: 10.1006/jmbi.1997.1171.
- 20. Shultzaberger RK, Malashock DS, Kirsch JF, Eisen MB. The fitness landscapes of cis-acting binding sites in different promoter and environmental contexts. PLoS Genet. 2010;6(7):e1001042. doi: 10.1371/journal.pgen.1001042.
- 21. Yang L, et al. TFBSshape: A motif database for DNA shape features of transcription factor binding sites. Nucleic Acids Res. 2014;42(Database issue):D148–D155. doi: 10.1093/nar/gkt1087.
- 22. Lynch M. The Origins of Genome Architecture. Sinauer; Sunderland, MA: 2007.
- 23. Robison K, McGuire AM, Church GM. A comprehensive library of DNA-binding site matrices for 55 proteins applied to the complete Escherichia coli K-12 genome. J Mol Biol. 1998;284(2):241–254. doi: 10.1006/jmbi.1998.2160.
- 24. Malmström J, et al. Proteome-wide cellular protein concentrations of the human pathogen Leptospira interrogans. Nature. 2009;460(7256):762–765. doi: 10.1038/nature08184.
- 25. Ghaemmaghami S, et al. Global analysis of protein expression in yeast. Nature. 2003;425(6959):737–741. doi: 10.1038/nature02046.
- 26. Schwanhäusser B, et al. Global quantification of mammalian gene expression control. Nature. 2011;473(7347):337–342. doi: 10.1038/nature10098.
- 27. Lynch M. The lower bound to the evolution of mutation rates. Genome Biol Evol. 2011;3:1107–1118. doi: 10.1093/gbe/evr066.
- 28. Sung W, Ackerman MS, Miller SF, Doak TG, Lynch M. Drift-barrier hypothesis and mutation-rate evolution. Proc Natl Acad Sci USA. 2012;109(45):18488–18492. doi: 10.1073/pnas.1216223109.
- 29. Nourmohammad A, Schiffels S, Lässig M. Evolution of molecular phenotypes under stabilizing selection. arXiv:1301.3981. 2013.
- 30. Mustonen V, Kinney J, Callan CG, Jr, Lässig M. Energy-dependent fitness: A quantitative model for the evolution of yeast transcription factor binding sites. Proc Natl Acad Sci USA. 2008;105(34):12376–12381. doi: 10.1073/pnas.0805909105.
- 31. Mustonen V, Lässig M. Evolutionary population genetics of promoters: Predicting binding sites and functional phylogenies. Proc Natl Acad Sci USA. 2005;102(44):15936–15941. doi: 10.1073/pnas.0505537102.
- 32. Dekel E, Alon U. Optimality and evolutionary tuning of the expression level of a protein. Nature. 2005;436(7050):588–592. doi: 10.1038/nature03842.
- 33. de Vladar HP, Barton N. Stability and response of polygenic traits to stabilizing selection and mutation. Genetics. 2014;197(2):749–767. doi: 10.1534/genetics.113.159111.
- 34. Badis G, et al. Diversity and complexity in DNA recognition by transcription factors. Science. 2009;324(5935):1720–1723. doi: 10.1126/science.1162327.
- 35. Franco-Zorrilla JM, et al. DNA-binding specificities of plant transcription factors and their potential to define target genes. Proc Natl Acad Sci USA. 2014;111(6):2367–2372. doi: 10.1073/pnas.1316278111.
- 36. Balhoff JP, Wray GA. Evolutionary analysis of the well characterized endo16 promoter reveals substantial variation within functional sites. Proc Natl Acad Sci USA. 2005;102(24):8591–8596. doi: 10.1073/pnas.0409638102.
- 37. Zheng W, Gianoulis TA, Karczewski KJ, Zhao H, Snyder M. Regulatory variation within and between species. Annu Rev Genomics Hum Genet. 2011;12:327–346. doi: 10.1146/annurev-genom-082908-150139.
- 38. Garfield D, Haygood R, Nielsen WJ, Wray GA. Population genetics of cis-regulatory sequences that operate during embryonic development in the sea urchin Strongylocentrotus purpuratus. Evol Dev. 2012;14(2):152–167. doi: 10.1111/j.1525-142X.2012.00532.x.
- 39. Heinz S, et al. Effect of natural genetic variation on enhancer selection and function. Nature. 2013;503(7477):487–492. doi: 10.1038/nature12615.
- 40. Romano LA, Wray GA. Conservation of Endo16 expression in sea urchins despite evolutionary divergence in both cis and trans-acting components of transcriptional regulation. Development. 2003;130(17):4187–4199. doi: 10.1242/dev.00611.
- 41. Oda-Ishii I, Bertrand V, Matsuo I, Lemaire P, Saiga H. Making very similar embryos with divergent genomes: Conservation of regulatory mechanisms of Otx between the ascidians Halocynthia roretzi and Ciona intestinalis. Development. 2005;132(7):1663–1674. doi: 10.1242/dev.01707.
- 42. Hare EE, Peterson BK, Iyer VN, Meier R, Eisen MB. Sepsid even-skipped enhancers are functionally conserved in Drosophila despite lack of sequence conservation. PLoS Genet. 2008;4(6):e1000106. doi: 10.1371/journal.pgen.1000106.
- 43. Barrière A, Gordon KL, Ruvinsky I. Distinct functional constraints partition sequence conservation in a cis-regulatory element. PLoS Genet. 2011;7(6):e1002095. doi: 10.1371/journal.pgen.1002095.
- 44. Barrière A, Gordon KL, Ruvinsky I. Coevolution within and between regulatory loci can preserve promoter function despite evolutionary rate acceleration. PLoS Genet. 2012;8(9):e1002961. doi: 10.1371/journal.pgen.1002961.
- 45. He BZ, Holloway AK, Maerkl SJ, Kreitman M. Does positive selection drive transcription factor binding site turnover? A test with Drosophila cis-regulatory modules. PLoS Genet. 2011;7(4):e1002053. doi: 10.1371/journal.pgen.1002053.
- 46. Ludwig MZ, Manu RK, Kittler R, White KP, Kreitman M. Consequences of eukaryotic enhancer architecture for gene expression dynamics, development, and fitness. PLoS Genet. 2011;7(11):e1002364. doi: 10.1371/journal.pgen.1002364.
- 47. Paris M, et al. Extensive divergence of transcription factor binding in Drosophila embryos with highly conserved gene expression. PLoS Genet. 2013;9(9):e1003748. doi: 10.1371/journal.pgen.1003748.
- 48. Stefflova K, et al. Cooperativity and rapid evolution of cobound transcription factors in closely related mammals. Cell. 2013;154(3):530–540. doi: 10.1016/j.cell.2013.07.007.
- 49. True JR, Haag ES. Developmental system drift and flexibility in evolutionary trajectories. Evol Dev. 2001;3(2):109–119. doi: 10.1046/j.1525-142x.2001.003002109.x.
- 50. Johnson NA, Porter AH. Evolution of branched regulatory genetic pathways: Directional selection on pleiotropic loci accelerates developmental system drift. Genetica. 2007;129(1):57–70. doi: 10.1007/s10709-006-0033-2.
- 51. Rajewsky N, Socci ND, Zapotocky M, Siggia ED. The evolution of DNA regulatory regions for proteo-gamma bacteria by interspecies comparisons. Genome Res. 2002;12(2):298–308. doi: 10.1101/gr.207502.
- 52. Sengupta AM, Djordjevic M, Shraiman BI. Specificity and robustness in transcription control networks. Proc Natl Acad Sci USA. 2002;99(4):2072–2077. doi: 10.1073/pnas.022388499.
- 53. Charlesworth B. Stabilizing selection, purifying selection, and mutational bias in finite populations. Genetics. 2013;194(4):955–971. doi: 10.1534/genetics.113.151555.
- 54. Abdulrehman D, et al. YEASTRACT: Providing a programmatic access to curated transcriptional regulatory associations in Saccharomyces cerevisiae through a web services interface. Nucleic Acids Res. 2011;39(Database issue):D136–D140. doi: 10.1093/nar/gkq964.
- 55. Salgado H, et al. RegulonDB v8.0: Omics data sets, evolutionary conservation, regulatory phrases, cross-validated gold standards and more. Nucleic Acids Res. 2013;41(Database issue):D203–D213. doi: 10.1093/nar/gks1201.
- 56. Shazman S, Lee H, Socol Y, Mann RS, Honig B. OnTheFly: A database of Drosophila melanogaster transcription factors and their binding sites. Nucleic Acids Res. 2014;42(Database issue):D167–D171. doi: 10.1093/nar/gkt1165.
- 57. Berggrun A, Sauer RT. Contributions of distinct quaternary contacts to cooperative operator binding by Mnt repressor. Proc Natl Acad Sci USA. 2001;98(5):2301–2305. doi: 10.1073/pnas.041612198.
- 58. Gunasekera A, Ebright YW, Ebright RH. DNA sequence determinants for binding of the Escherichia coli catabolite gene activator protein. J Biol Chem. 1992;267(21):14713–14720.
- 59. Kinney JB, Murugan A, Callan CG, Jr, Cox EC. Using deep sequencing to characterize the biophysical mechanism of a transcriptional regulatory sequence. Proc Natl Acad Sci USA. 2010;107(20):9158–9163. doi: 10.1073/pnas.1004290107.
- 60. Omagari K, et al. Systematic single base-pair substitution analysis of DNA binding by the cAMP receptor protein in cyanobacterium Synechocystis sp. PCC 6803. FEBS Lett. 2004;563(1-3):55–58. doi: 10.1016/S0014-5793(04)00248-0.
- 61. Schildbach JF, Karzai AW, Raumann BE, Sauer RT. Origins of DNA-binding specificity: Role of protein contacts with the DNA backbone. Proc Natl Acad Sci USA. 1999;96(3):811–817. doi: 10.1073/pnas.96.3.811.
- 62. Wang X, et al. A high-throughput percentage-of-binding strategy to measure binding energies in DNA-protein interactions: Application to genome-scale site discovery. Nucleic Acids Res. 2008;36(15):4863–4871. doi: 10.1093/nar/gkn477.
- 63. Nutiu R, et al. Direct measurement of DNA affinity landscapes on a high-throughput sequencing instrument. Nat Biotechnol. 2011;29(7):659–664. doi: 10.1038/nbt.1882.
- 64. Oda M, Furukawa K, Ogata K, Sarai A, Nakamura H. Thermodynamics of specific and non-specific DNA binding by the c-Myb DNA-binding domain. J Mol Biol. 1998;276(3):571–590. doi: 10.1006/jmbi.1997.1564.