Abstract
We use Monte Carlo simulations to analyze the simultaneous interactions of multiple proteins with a long DNA molecule. We study the time dependence of protein organization on DNA for different regimes that comprise (non)cooperative sequence-independent protein assembly, dissociation, and linear motion. A range of different behaviors is observed for the dynamics, final coverage, and cluster size distributions. We observe that the DNA substrate is almost never completely covered by protein when taking into account only (non)cooperative binding, because gaps remain on the substrate that are smaller than the binding site size of the protein. Due to these gaps, the apparent binding size of a protein during noncooperative binding can be overestimated by up to 30%. During dissociation of cooperatively bound proteins, the dissociation curve can be exponentially shaped even when allowing only end-dependent dissociation. We discuss the potential of our method for the analysis of a number of single-molecule experiments, for example, the binding of the DNA-repair proteins RecA and Rad51 to DNA.
INTRODUCTION
In the last decade, new experimental techniques have opened the way to study protein-DNA or protein-protein interactions at the level of single molecules. In contrast to bulk experiments, single-molecule experiments do not suffer from averaging multiple events, thereby allowing much more detailed characterization. The interaction between proteins and DNA involves a variety of relevant processes, e.g., binding, dissociation, translocation, shape deformation, etc. To describe the dynamic interactions between a protein and DNA, one can study systems where the protein-DNA interaction is restricted to a specific site, e.g., a recognition sequence for a restriction enzyme (1). For this type of experiment with a single protein interacting with DNA, models were developed to extract the relevant kinetic interaction rates.
For proteins that bind nonspecifically to DNA, however, the situation may be much more complex. Often, many proteins interact with DNA and other proteins simultaneously. One approach to study these systems in detail has been to avoid multiple events by severely reducing the amount of target area. For example, the length of available DNA substrate can be limited to only tens of bases, or the concentration of protein present in the reaction can be substantially lowered with respect to the target area. Another approach to study the dynamics of multiple nonspecific interactions of proteins with DNA or other proteins is to develop models that go beyond the description of single-entity binding.
In the classic protein-binding model of McGhee and von Hippel (MVH) (2), two cases of nonspecific protein-DNA binding are addressed: noncooperative and cooperative binding. In the former, proteins bind randomly to the lattice without any preference to bind adjacent to an already bound protein. In cooperative binding, however, a nucleation event is followed by an extension phase where proteins preferably bind next to an occupied lattice position (Fig. 1 A). In their analytical approach, MVH assumed an infinite lattice to which proteins can bind (non)cooperatively, without taking into account disassembly. Depending on the protein concentration and the strength of the cooperativity, the fractional final coverage was deduced at equilibrium—yielding a value for the binding constant of the protein to the lattice.
Although this model is valuable and widely used to determine the magnitude of cooperativity for a certain protein-DNA system (3–5), it has certain limitations. In a single-molecule experiment, one can measure the fractional coverage as it develops in time. The approach of MVH does not allow us to describe this dynamics or to extract kinetic parameters from the experimental single-molecule data, because it restricts the description to the final equilibrated system. Furthermore, the model proposed by MVH assumes that for B proteins that bind noncooperatively to the lattice, there are B + 1 gaps of bare DNA in between the bound proteins. This is a priori not true, because even in the noncooperative case, proteins can bind next to an already bound protein. Finally, the obtained coverage for cooperative binding in the MVH model is always complete. This outcome is incorrect because gaps smaller than the binding size of the protein remain on the lattice due to the random nucleation of proteins along the lattice.
Recently, an analytical tool based upon hidden Markov modeling was developed and applied to extract kinetic rates from single-molecule fluorescence data (6,7) and ion-channel data (8,9). A regular Markov model consists of a series of states where, at each time, the system may change from the state it was in the moment before, or may stay in the same state. These states are directly visible to the observer. In a hidden Markov model, however, the model contains an underlying stochastic process that is not observable (it is hidden) (10). A correct interpretation of single-molecule data using hidden Markov modeling depends on the number of states in the model and the corresponding probabilities involved (11). Furthermore, the states within the model should be independent of each other. For example, hidden Markov modeling does not work well for RNA secondary-structure analysis (10). Studying protein binding to a lattice using hidden Markov modeling causes a similar problem, because an already bound protein can influence a different protein cluster.
Here, we develop a new analysis method based upon Monte Carlo simulations that allows a description of both the dynamics and the final states of systems with multiple protein-DNA or protein-protein interactions via nonspecific target areas. In these Monte Carlo simulations, a Markov chain of different states is calculated. The simulations allow following the interaction of multiple proteins with a single DNA or protein substrate modeled as a one-dimensional lattice in time. We separately show the results for proteins that bind noncooperatively or cooperatively, dissociate, or reorganize along the DNA substrate. Furthermore, combinations of these three different interaction modes are implemented in the Monte Carlo simulations. The usefulness of our method was recently illustrated with a comparison to the interaction between the recombinase Rad51 with single- and double-stranded DNA, where a fit of the model to the experimental data allows us to extract a variety of protein-DNA interaction parameters that could not be obtained otherwise (12). Finally, we suggest a number of different systems to which this method can be applied.
Description of the model
We model protein-DNA interaction using Monte Carlo simulations (13–16). In our Monte Carlo simulations, we model the interaction between protein and DNA with a Markov chain where the next state of the protein-DNA complex depends on the current state. The transition probabilities between different states are given, and a certain stochastic pathway results. The Monte Carlo approach allows us to study both the dynamics as well as equilibrium states.
We first describe the concept for simulating the simplest two-state process that can be written as
$$\mathrm{A} \xrightarrow{\;k_1\;} \mathrm{B} \qquad (1)$$
The reaction rate k1 is coupled to a transition probability p1 in a Markov chain as
$$p_1 = k_1 \, \Delta t \qquad (2)$$
where Δt is the duration of a single simulation step in the Monte Carlo simulations. The duration of a simulation step is taken such that 1), the transition probability within a single simulation step is always much smaller than unity; and 2), the chance of having two local transitions within a single simulation step is negligible. In the Monte Carlo simulations, a transition to the new state occurs when the transition probability is larger than a random value extracted from a uniform distribution between 0 and 1.
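The two-state scheme can be sketched in a few lines of Python. This is a minimal illustration, not the original IDL code: it assumes the small-step relation p1 = k1·Δt between rate and per-step probability, and the function name and parameter values are hypothetical.

```python
import math
import random

def simulate_two_state(k1, dt, n_molecules, n_steps, seed=0):
    """Fraction of molecules still in the initial state after each step."""
    rng = random.Random(seed)
    p1 = k1 * dt  # assumed relation; only sensible while p1 << 1
    in_initial = n_molecules
    fractions = [1.0]
    for _step in range(n_steps):
        # A molecule switches state when its uniform draw falls below p1.
        switched = sum(1 for _m in range(in_initial) if rng.random() < p1)
        in_initial -= switched
        fractions.append(in_initial / n_molecules)
    return fractions

fractions = simulate_two_state(k1=1.0, dt=0.01, n_molecules=5000, n_steps=200)
expected = math.exp(-1.0 * 0.01 * 200)  # analytic survival exp(-k1 * t)
print(fractions[-1], expected)
```

With a step that satisfies both conditions on Δt, the simulated survival fraction should stay close to the exponential decay expected for a first-order process.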
The interaction between proteins and DNA is implemented as follows in the Monte Carlo simulations (for details, see Methods). The DNA substrate is represented as a long one-dimensional array with the number of elements equal to the number of nucleotides or basepairs available. Upon binding, a protein occupies a number of elements equal to its binding site size. This corresponds to the simplest model of irreversible adsorption, known as random sequential adsorption. The random sequential adsorption model has proven to be quite successful in describing a number of systems (17), and both its lattice and continuum limits have been studied (18,19). Subsequently, the protein can dissociate or move along the substrate, respectively freeing or occupying other elements of the array. The protocol for binding, dissociating, and moving is repeated for each protein interacting with the DNA. The transitions can be monitored in time by evaluating observables such as the lattice occupancy, contour length, or stepping of a single labeled protein.
METHODS
Binding of protein onto a DNA substrate was modeled using Monte Carlo simulations implemented in Interactive Data Language (RSI, Boulder, CO). A one-dimensional array was used to represent the DNA substrate containing a number of elements equivalent to the number of nucleotides or basepairs of the DNA molecule of interest. Simulations were done with various binding sizes for the protein. Cooperative binding was described by nucleation followed by growth that extended the nucleation point, whereas noncooperative binding involved nucleation only.
Nucleation was allowed to occur at any point along the entire molecule. In the Monte Carlo simulations, the nucleation step was simulated as follows: a value was randomly extracted from a uniform distribution yielding a value between 0 and 1. If this value was smaller than a given threshold corresponding to the set nucleation rate for the entire molecule, a protein was bound. The binding location was deduced from a second random number between 0 and 1, which was extracted from a uniform distribution that was multiplied by the number of elements in the one-dimensional array. Binding occurred only when this site plus the following n – 1 sites were not covered by another protein, to account for the fact that each protein covers n nucleotides or basepairs.
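The nucleation step described above can be sketched as follows. This is an illustrative Python version, not the original IDL implementation; the function name, the Boolean-list lattice, and the rule that a protein may not overhang the end of the array are assumptions.

```python
import random

def try_nucleate(lattice, n, p_nuc, rng):
    """One nucleation attempt; returns True if a protein was bound."""
    if rng.random() >= p_nuc:          # first draw: does nucleation occur at all?
        return False
    start = int(rng.random() * len(lattice))  # second draw: which site?
    if start + n > len(lattice):       # assumed rule: no overhang past the end
        return False
    if any(lattice[start:start + n]):  # one of the n sites is already covered
        return False
    lattice[start:start + n] = [True] * n
    return True

rng = random.Random(1)
lattice = [False] * 300
bound = sum(try_nucleate(lattice, 3, 0.05, rng) for _ in range(20000))
coverage = sum(lattice) / len(lattice)
print(bound, coverage)
```

Because rejected attempts leave the lattice unchanged, the number of occupied sites is always exactly n times the number of bound proteins.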
For cooperative binding, we evaluated all nucleation sites where protein patch extension could occur. For each site, a value was extracted from a uniform distribution and compared to a given threshold corresponding to the set rate of extension for a single protein patch. If this value was smaller than the threshold, the protein patch was extended if the next n nucleotides or basepairs were not already covered by protein. Extension was only permitted into the direction of higher numbers in the one-dimensional array.
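A sketch of one extension pass is given below. It assumes that bound proteins are tracked by the index of their first site (the hypothetical set `starts`) and that a patch can only grow toward higher indices, as stated above. The demonstration uses an extension probability of 1 purely to make the outcome deterministic; a real run would use a much smaller value.

```python
import random

def extend_patches(starts, lattice, n, p_ext, rng):
    """One extension pass; returns the number of proteins added."""
    added = 0
    for s in sorted(starts):               # snapshot, so new proteins wait a pass
        nxt = s + n                        # first site beyond this protein
        if nxt in starts:                  # not the growing end of its patch
            continue
        if rng.random() >= p_ext:          # draw compared with the threshold
            continue
        if nxt + n <= len(lattice) and not any(lattice[nxt:nxt + n]):
            lattice[nxt:nxt + n] = [True] * n
            starts.add(nxt)
            added += 1
    return added

lattice = [False] * 30
lattice[0:3] = [True] * 3                  # a single nucleus at the low end
starts = {0}
rng = random.Random(0)
passes = 0
while extend_patches(starts, lattice, 3, 1.0, rng):
    passes += 1
print(passes, sum(lattice))
```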
The probabilities for nucleation and growth per time step were chosen so small that the chance of two binding events within a single Monte Carlo step was negligible. For comparison to experiments, the threshold values, which are rates expressed in units of (Monte Carlo step)⁻¹, can be converted into kinetic rates expressed in s⁻¹ by adjusting the time axis of the Monte Carlo growth curve to the experimental growth data. Whereas our simple modeling involved protein patch extension and disassembly in a unidirectional fashion, the model can be extended to allow protein patch extension and/or disassembly in both directions. Essentially the same results are found if extension and disassembly occur in both directions, albeit with slightly different values for the rates, which change by a factor of up to 2.
In those cases where disassembly was considered, we additionally allowed dissociation to occur after the protein patch extension step. At the end of each protein patch (i.e., a protein cluster consisting of m protein monomers, with m ≥ 1) opposite to the protein-patch-extension end (i.e., toward lower numbers in the array), a value was extracted from a uniform distribution, and if this value was smaller than the threshold set by the dissociation rate, the protein dissociated and a vacancy was created. In the case of diffusion of these end-bound monomers, the protein remained bound to the lattice. Alternatively, a second route was considered where dissociation was allowed at all monomer sites, i.e., also in the middle of protein patches. Here, the above procedure was extended to all bound proteins.
Reorganization of individual proteins or protein patches along the DNA substrate was incorporated as follows: a value was randomly extracted from a uniform distribution yielding a value between 0 and 1. If this value was smaller than a given threshold corresponding to the reorganization rate, the protein patch made a step of one nucleotide or basepair. For unidirectional translocation, the direction was fixed, always toward lower numbers in the array. For diffusive motion, the stepping direction was chosen randomly: toward higher numbers in the array when a value extracted from a uniform distribution was larger than 0.5, and toward lower numbers when it was smaller. Diffusive motion of end-bound monomers after detachment was treated similarly. Upon collision with individual proteins or protein patches, the diffusive motion was stopped.
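The stepping rule can be illustrated with a short Python sketch. The representation of a patch as a half-open interval `[left, right)` on a Boolean lattice, the function names, and the treatment of a collision as a simply rejected step are assumptions made for this example.

```python
import random

def move_patch(lattice, left, right, direction):
    """Shift the patch occupying lattice[left:right] by one site if possible."""
    if direction < 0:
        if left == 0 or lattice[left - 1]:
            return left, right               # blocked by the lattice end or a neighbor
        lattice[left - 1] = True
        lattice[right - 1] = False
        return left - 1, right - 1
    if right == len(lattice) or lattice[right]:
        return left, right                   # blocked by the lattice end or a neighbor
    lattice[right] = True
    lattice[left] = False
    return left + 1, right + 1

def reorganize(lattice, left, right, p_move, mode, rng):
    """One reorganization attempt for a single patch."""
    if rng.random() >= p_move:
        return left, right
    if mode == "unidirectional":
        direction = -1                       # always toward lower indices
    else:
        direction = 1 if rng.random() > 0.5 else -1
    return move_patch(lattice, left, right, direction)

lattice = [True, True, False, False, True, True, True, False, False, False]
pos = (4, 7)
rng = random.Random(0)
for _ in range(5):
    pos = reorganize(lattice, pos[0], pos[1], 1.0, "unidirectional", rng)
print(pos, lattice)
```

In this example the three-site patch translocates toward the two-site patch at the low end of the array and stops once the two are adjacent.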
To ensure the robustness of the code, all simulations were run a number of times (with different seeds) to validate that the outcome was similar for different runs. Typical data of the different scenarios is shown in the corresponding figures.
RESULTS
We modeled the interaction between proteins and DNA for a variety of processes, i.e., binding, dissociation, reorganization, and combinations of these. Protein-DNA binding can be divided into two different schemes—noncooperative and cooperative binding (see Fig. 1 A) (20–23). We first present the results for noncooperative binding of proteins to DNA.
Noncooperative binding
Noncooperative binding of proteins to a DNA molecule is modeled in the Monte Carlo simulations as random binding to a one-dimensional lattice. Upon binding, the protein covers a binding site of multiple nucleotides or basepairs. First, only binding is considered; i.e., once bound, a protein does not disassemble or rearrange. The occupancy of the lattice is followed in time (see Fig. 2 A). The resulting protein coverage displays an exponentially saturating growth profile, independent of the binding site size of the protein (see Fig. 2 B). The final occupancy, however, varies with the chosen size of the binding site of the protein (see Fig. 2 D). For a binding site size of 1 nt, full occupancy is obtained, as expected. However, the fractional occupancy decreases for an increasing binding site size, reaching a plateau of ∼0.76 (Fig. 2 D). Due to the finite size of the binding site, gaps of unoccupied lattice elements with a size smaller than the binding site remain throughout the lattice (see bottom panel in Fig. 2 A for an example). The actual number of proteins bound to the lattice is therefore smaller than it would be if all proteins were aligned such that no gaps remained on the lattice. Division of the length of the lattice by the number of bound proteins yields the apparent binding size for the protein, which is larger than its intrinsic binding size due to the existence of gaps. This leads to an increase in the apparent binding size of 29.5 ± 0.2% compared to the actual binding size (see Fig. 2 E).
The kinetics of simple noncooperative binding can be described analytically as follows (21). The binding process is limited by the number of free basepairs available on the DNA molecule (see Fig. 2 A). During growth, the number of free basepairs $N_{\mathrm{free}}$ decreases according to $dN_{\mathrm{free}}/dt = -aN_{\mathrm{free}}$, where a is the binding rate of the protein to the lattice, which together with the boundary condition $N_{\mathrm{free}}(0) = N$ yields $N_{\mathrm{free}} = Ne^{-at}$. The time-dependent occupancy θ becomes
$$\theta(t) = \frac{N - N_{\mathrm{free}}(t)}{N} = 1 - e^{-at} \qquad (3)$$
showing an exponential binding profile in excellent agreement with the profiles obtained in the Monte Carlo simulations (see black lines in Fig. 2 B).
The final occupancy depends on the binding size n of the protein. A lattice that consists of N possible binding sites allows binding of up to N/n proteins. During a noncooperative binding process, gaps of size i (1 ≤ i ≤ n − 1) are created throughout the entire lattice, reducing the final number of proteins on the DNA. For the final state, an effective binding size $n^{*} = n + s_{\mathrm{gap}}$ can be defined, where $s_{\mathrm{gap}}$ is the average gap size between proteins. This average gap size between proteins is not equivalent to but instead When the binding size increases, the possible gap sizes increase accordingly. Therefore, one needs to take into account that the actual number of proteins bound to the lattice with an adjacent gap of size i decreases as $(n + i)^{-1}$ and not as $n^{-1}$. Together, this yields for the fractional occupancy
(4) |
which can be simplified into
(5) |
where and Γ(x) is the γ-function. This relation between fractional coverage and binding site indeed describes the observed behavior from our Monte Carlo simulations well; see the solid red line in Fig. 2 D. The fractional coverage for a protein covering two sites (n = 2) is 0.857, close to the result of 0.865 derived by Flory (24) using combinatorial techniques. For large binding sites, noncooperative binding is similar to the car-parking problem, where one-dimensional cars are parked randomly in a linear array (25,26). Equation 5 yields a fractional coverage of 0.765 for n → ∞ in fairly good agreement with the result of 0.748 obtained for the car-parking problem (25).
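The jammed coverage of a finite lattice can also be estimated numerically and compared with the Flory and car-parking values quoted above. The sketch below is an assumption-laden shortcut rather than the time-stepped procedure of the Methods: because binding is irreversible, visiting every possible start site once in random order produces the same final state statistics as repeated random attempts. The function name and lattice size are hypothetical.

```python
import math
import random

def jammed_coverage(n_sites, n, seed=0):
    """Final coverage and apparent binding size for irreversible random binding."""
    rng = random.Random(seed)
    lattice = [False] * n_sites
    starts = list(range(n_sites - n + 1))
    rng.shuffle(starts)                     # random order of first attempts
    bound = 0
    for s in starts:
        if not any(lattice[s:s + n]):       # all n sites must be free
            lattice[s:s + n] = [True] * n
            bound += 1
    coverage = bound * n / n_sites
    apparent_size = n_sites / bound         # lattice length per bound protein
    return coverage, apparent_size

results = {n: jammed_coverage(30000, n) for n in (1, 2, 3, 10)}
flory = 1.0 - math.exp(-2.0)                # Flory's dimer result, about 0.865
for n, (cov, app) in results.items():
    print(n, round(cov, 3), round(app / n, 3))
```

The ratio of apparent to intrinsic binding size printed in the last column is simply the inverse of the fractional coverage.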
Cooperative binding
Cooperative binding of proteins to a DNA molecule is modeled in the Monte Carlo simulations in two steps: nucleation followed by extension. “Nucleation” denotes protein binding at an unoccupied DNA position not adjacent to already bound proteins, whereas we define “extension” as binding to a site directly adjacent to one that is already occupied. We can follow the binding process to the lattice in the Monte Carlo simulations in time by visualizing the binding of every individual protein (or protein cluster if binding occurs via multimers); see Fig. 3, A and B. Using these simulations, we obtained lattice occupancy profiles at different ratios between extension and nucleation (Fig. 3, C and D). Different qualitative behavior is observed, depending on the ratio between the rates for extension and nucleation, henceforth called the cooperativity number ω. For a high cooperativity number (i.e., when nucleation is rare; see solid line in Fig. 3, A and B), nothing happens until a first nucleation event occurs, after which the coverage of the DNA molecule increases linearly due to the extension of the protein patch. By contrast, at low cooperativity numbers, many nucleation loci are created followed by extension into multiple protein patches. The process ends when the molecule has no more free binding sites that are large enough to accommodate binding of another protein or protein cluster. With increasing cooperativity numbers, the obtained time-dependent binding profiles change from an exponential (for a ratio of zero, equivalent to noncooperative binding) to a linear relationship (for ratios >10⁶) (see Fig. 3, C and D). The final fractional coverage of the lattice increases for increasing cooperativity values, because the final number of gaps is reduced (Fig. 3 G) (20,27). Although the extension rate can be orders of magnitude larger than the nucleation rate, full coverage is hardly ever obtained for n ≥ 2.
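The dependence of the final coverage on the ratio between extension and nucleation can be explored with a compact simulation. The sketch below is illustrative only: the bookkeeping of growing ends in the set `ends`, the parameter values, and the use of the raw ratio `p_ext / p_nuc` as a stand-in for ω (without any per-site normalization) are assumptions, and the probabilities are larger than would be appropriate for a quantitative run.

```python
import random

def has_room(lattice, n):
    """True if some run of n consecutive free sites remains."""
    run = 0
    for occupied in lattice:
        run = 0 if occupied else run + 1
        if run >= n:
            return True
    return False

def final_coverage(n_sites, n, p_nuc, p_ext, seed=0, max_steps=5_000_000):
    """Binding-only run with nucleation and one-way extension until jammed."""
    rng = random.Random(seed)
    lattice = [False] * n_sites
    ends = set()  # first site beyond the high-index end of each patch

    def fits(start):
        return start + n <= n_sites and not any(lattice[start:start + n])

    def bind(start):
        lattice[start:start + n] = [True] * n
        ends.discard(start)        # a patch ending here was extended or joined
        ends.add(start + n)

    for _step in range(max_steps):
        bound = False
        if rng.random() < p_nuc:   # one draw for the whole molecule
            start = int(rng.random() * n_sites)
            if fits(start):
                bind(start)
                bound = True
        if p_ext > 0:
            for pos in list(ends): # one draw per growing end
                if not fits(pos):
                    ends.discard(pos)   # blocked for good in a binding-only run
                elif rng.random() < p_ext:
                    bind(pos)
                    bound = True
        if bound and not has_room(lattice, n):
            break
    return sum(lattice) / n_sites

coverage_low = final_coverage(600, 3, p_nuc=0.05, p_ext=0.0)   # noncooperative
coverage_high = final_coverage(600, 3, p_nuc=0.01, p_ext=0.5)  # ratio of 50
print(coverage_low, coverage_high)
```

Under these assumptions the cooperative run ends with fewer, longer patches and therefore fewer residual gaps than the noncooperative run.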
The final distribution of protein clusters along the lattice can be quantified. Different protein-patch length distributions can be obtained depending on the size of the protein cluster that binds during nucleation and extension and the ratio between nucleation and extension rate (see Fig. 3, E and F). For the noncooperative case, the distribution of protein patch sizes peaks around the size of the binding unit with a long tail toward longer protein patches (see Fig. 2 C). An analytical expression for the protein-patch-length distribution Fc has been proposed by (27)
(6) |
where
(7) |
and c the length of the protein patch. In the saturated state (θ = 1), the cluster size approaches infinity, independent of the binding site of the protein (27). This prediction for the noncooperative case of ω = 1 is not in agreement with the observed behavior where the distribution shows a Poissonian profile (see Fig. 2 C) due to the existence of gaps. We can fit the distributions for n ≥ 2 with Eq. 6, where one can reduce the number of free parameters to one, i.e., the cooperativity number ω, because the relative coverage θ is given by with bc the number of appearances of a protein patch with length c. Protein patch length distributions for n = 3 were fit for varying cooperativity numbers (ωin) with Eq. 6 to obtain an apparent value for the cooperativity number (ωout). Interestingly, the fits yield significant differences—even by orders of magnitude—between the values entered (ωin) and obtained after fitting (ωout) for the cooperativity number (see Fig. 3 H). Due to the finite lattice length and incomplete coverage, the fit severely underestimated the cooperativity number in all cases.
Multimeric binding and Hill coefficient
In the MVH model, it is assumed that the binding unit of the protein during nucleation and extension is the same. Both processes, however, can in principle involve different protein multimers. The binding unit can be determined from concentration-dependent binding reactions where the binding rate in either nucleation or extension is determined with respect to the protein concentration (see Fig. 4 A). This behavior can be described by the Hill equation
$$k([S]) = k_{\max}\,\frac{[S]^{n_H}}{S_{0.5}^{\,n_H} + [S]^{n_H}} \qquad (8)$$
where k is the binding rate at protein concentration [S], k_max its saturating value, nH is the Hill coefficient, and S0.5 the concentration where half-maximum activity occurs. The Hill coefficient can be interpreted as the minimal size of the binding unit, i.e., for nH = 1, the protein binds as a monomer to the lattice, whereas for larger values of nH, the protein binds as an nH-mer (28). This coordination between proteins, for example by binding of preformed multimers, is sometimes called cooperative binding, but this is entirely unrelated to the cooperative binding defined above (the ratio between extension and nucleation in protein patch formation). Within the Monte Carlo simulations, the binding unit in nucleation and extension can be varied independently. In the case where the binding units for nucleation and extension are equivalent, i.e., when the Hill coefficients are identical, the lattice occupancy profiles remain the same independent of protein concentration. On the other hand, if the binding units are not equivalent, the growth profiles and final occupancy change depending on the protein concentration (see Fig. 4 C).
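As an illustration of how a Hill coefficient can be read from concentration-dependent rates, the sketch below assumes the standard Hill form, with a saturating rate `k_max` that is a hypothetical symbol introduced here. It recovers nH as the slope of the Hill plot, log(k/(k_max − k)) versus log(concentration).

```python
import math

def hill(s, k_max, s_half, n_h):
    """Standard Hill form: rate at concentration s."""
    return k_max * s**n_h / (s_half**n_h + s**n_h)

def hill_coefficient(s1, k1, s2, k2, k_max):
    """Slope of the Hill plot through two (concentration, rate) points."""
    y1 = math.log(k1 / (k_max - k1))
    y2 = math.log(k2 / (k_max - k2))
    return (y2 - y1) / (math.log(s2) - math.log(s1))

k_max, s_half, n_h = 1.0, 2.0, 3
rate_low = hill(1.0, k_max, s_half, n_h)
rate_high = hill(4.0, k_max, s_half, n_h)
estimate = hill_coefficient(1.0, rate_low, 4.0, rate_high, k_max)
print(hill(s_half, k_max, s_half, n_h), estimate)
```

With experimental data, the saturating rate would itself have to be fitted or measured, which this two-point example does not address.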
Dissociation
In the above binding schemes, the binding was taken to be irreversible. However, proteins bound to a lattice can have a probability to detach from the lattice, i.e., they dissociate (22). Two different scenarios can be envisioned (see Fig. 1 B). As longer protein patches are formed on the lattice, 1), only proteins located at an end of a patch are allowed to dissociate (29,30); or 2), all proteins are allowed to dissociate regardless of their position within the patch (31). For the latter, the Monte Carlo simulations show an exponentially shaped disassembly curve (see red line in Fig. 5), as expected. This dissociation behavior is independent of the assembly history. Dissociation is, however, linear when dissociation occurs only at the end of a single protein patch (see black line in Fig. 5). If growth has resulted in a multitude of small patches, end dissociation, however, also leads to an exponentially shaped profile (see green line in Fig. 5) (30).
These observations can be understood straightforwardly. For noncooperative binding, bound proteins do not gain from protein-protein interactions and equivalently, once bound, every protein has an equal probability to dissociate. Following a similar reasoning as above for binding (Eq. 3), this yields an exponential dissociation profile, in agreement with the Monte Carlo simulations. In the case of cooperatively bound proteins, dissociation results in a linear profile only if a single protein patch exists on the lattice, because the proteins can only dissociate from one end, as indeed observed in the Monte Carlo simulations. Multiple patches result in multiple end-dissociation points, and the broad size distribution of the patches then leads to an approximately exponential dissociation curve (29,30).
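The contrast between the dissociation modes can be reproduced with a bookkeeping-only sketch that tracks patch sizes rather than lattice positions. The function names, patch sizes, and probabilities below are hypothetical choices for illustration.

```python
import random

def end_dissociation(patch_sizes, p_off, n_steps, seed=0):
    """Each nonempty patch can lose one end protein per step."""
    rng = random.Random(seed)
    sizes = list(patch_sizes)
    bound = [sum(sizes)]
    for _step in range(n_steps):
        for i, m in enumerate(sizes):
            if m > 0 and rng.random() < p_off:
                sizes[i] = m - 1
        bound.append(sum(sizes))
    return bound

def all_site_dissociation(n_proteins, p_off, n_steps, seed=0):
    """Every bound protein can dissociate independently in each step."""
    rng = random.Random(seed)
    remaining = n_proteins
    bound = [remaining]
    for _step in range(n_steps):
        remaining -= sum(1 for _p in range(remaining) if rng.random() < p_off)
        bound.append(remaining)
    return bound

single = end_dissociation([1000], p_off=0.1, n_steps=2000)       # one long patch
size_rng = random.Random(1)
small = [size_rng.randint(1, 40) for _ in range(100)]            # many small patches
many = end_dissociation(small, p_off=0.05, n_steps=400)
everywhere = all_site_dissociation(1000, p_off=0.001, n_steps=1000)
print(single[1000], single[2000], many[200], everywhere[-1])
```

The single patch loses proteins at a constant average rate, whereas the population of small patches slows down as patches disappear one by one, which bends the curve toward an exponential-like shape.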
Rearrangements
We also consider in our simulations the spatial rearrangement of proteins on the DNA, where bound proteins can move linearly along the DNA. This is modeled by three different pathways (see Fig. 1 C): 1), the protein cluster moves diffusively; or 2), the protein cluster translocates unidirectionally along the DNA molecule; or 3), end-bound monomers detach and move diffusively toward the neighboring protein cluster. Diffusive movement of the protein leads to a random walk in the Monte Carlo simulations (Fig. 6 B). Unidirectional motion leads to an approximately linear relation between traveled distance and time, as expected (see Fig. 6 A). Unidirectional motion is of course only possible at the expense of an available energy source, e.g., ATP hydrolysis.
One-dimensional diffusive motion of a protein cluster along the lattice can be written as ⟨x²⟩ = 2Dt, where ⟨x²⟩ is the mean-square displacement, D the diffusion coefficient, and t the time that the protein is moving along the lattice. As shown in Fig. 6 C, the mean-square displacement of a single protein over a given time window indeed follows this relation.
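The diffusion relation can be checked with a minimal random-walk sketch. It assumes an unobstructed walker that steps by one site with probability `p_move` per Monte Carlo step (a hypothetical parameter name), in which case the mean-square displacement after t steps is p_move·t and D = p_move/2 in units of sites² per step.

```python
import random

def mean_square_displacement(n_walkers, n_steps, p_move, seed=0):
    """Average squared displacement of independent one-dimensional walkers."""
    rng = random.Random(seed)
    total = 0
    for _walker in range(n_walkers):
        x = 0
        for _step in range(n_steps):
            if rng.random() < p_move:                 # does the walker step?
                x += 1 if rng.random() > 0.5 else -1  # unbiased direction
        total += x * x
    return total / n_walkers

p_move, n_steps = 0.2, 500
msd = mean_square_displacement(2000, n_steps, p_move)
diffusion_coefficient = msd / (2 * n_steps)           # from <x^2> = 2 D t
print(msd, diffusion_coefficient)
```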
So far, we have considered that, upon dissociation, an end-bound monomer detaches from the protein cluster on the DNA and vanishes to bulk solution. Instead of dissociation into the bulk solution, the detached monomer can also remain bound to the DNA molecule as sketched in scenario 3 (lower mode in Fig. 1 C). After end detachment, the monomer diffuses freely between two protein clusters (32). When the monomer reaches either protein cluster, it will bind.
Combination of processes
Above we have shown different interaction modes for a protein with the DNA substrate, i.e., (non)cooperative binding, dissociation, and reorganization. In the simulations, these modes can be combined in various ways to sort out the different processes that contribute to the occupancy of the lattice. Combination of these pathways can yield very different results. Here, we visualize these using kymographs, graphs that represent the one-dimensional lattice occupancy on one axis and time on the other (33).
Fig. 7 A shows a kymograph for cooperative binding, where the final lattice is not completely covered because gaps between protein clusters remain on the lattice. The grayscale in the kymographs indicates protein-bound (open) and protein-free (solid) DNA substrate. The permanent gaps in Fig. 7 A thus are seen as the solid horizontal lines that persist over time. Cooperative binding in the presence of end-dependent disassembly yields a different behavior (Fig. 7 B). As can be seen in Fig. 7 B, protein patches appear and disappear in time at different positions on the lattice. The dissociation rate is chosen such that the fractional coverage on the lattice remains approximately constant in time.
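A kymograph of this kind is simply a stack of lattice snapshots taken at regular intervals. The sketch below builds a text version for the simplest binding-only, noncooperative case; the function name, sampling interval, and character rendering are assumptions made for illustration, with time running down the page and lattice position running across.

```python
import random

def binding_kymograph(n_sites, n, p_nuc, n_steps, every, seed=0):
    """Record a lattice snapshot every `every` steps of a binding-only run."""
    rng = random.Random(seed)
    lattice = [False] * n_sites
    rows = []
    for step in range(n_steps):
        if rng.random() < p_nuc:
            start = int(rng.random() * n_sites)
            if start + n <= n_sites and not any(lattice[start:start + n]):
                lattice[start:start + n] = [True] * n
        if step % every == 0:
            rows.append(list(lattice))      # copy, so later binding does not alter it
    return rows

rows = binding_kymograph(n_sites=60, n=3, p_nuc=0.2, n_steps=2000, every=200)
for row in rows:
    print("".join("#" if occupied else "." for occupied in row))
```

Gaps that are too small to accept another protein show up as columns of dots that persist through all later rows.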
Upon allowing reorganization of detached end-bound monomers or bound protein patches either by diffusive or unidirectional motion, a completely covered lattice is obtained (see Fig. 7, C, E, and G, respectively). Unidirectional movement of the protein patches is observed in the kymographs by linear stripes in the downward direction (Fig. 7 G), which represent protein patches that shift and fuse with other patches at the bottom. Diffusive motion of monomers or protein patches leads to a more strongly fluctuating behavior (Fig. 7, C and E). Also, upon combining all three modes—cooperative growth, rearrangement, and end-dependent disassembly—the final lattice is completely covered (Fig. 7, D, F, and H). Note that the timescale now has significantly increased due to the presence of protein dissociation before saturation is obtained. The downward motion observed for one-dimensional diffusion is caused by protein monomers that erode from patches and after a diffusive walk, end up at the next patch. A similar behavior is observed in Fig. 7 B.
DISCUSSION
Using Monte Carlo simulations, we have modeled different interactions between protein and DNA, i.e., (non)cooperative binding, dissociation, and reorganization. The flexibility of the Monte Carlo simulations allows us to use different binding-site sizes of the protein during nucleation, filament extension, or dissociation. The Monte Carlo simulations of the different interactions yielded interesting results. First, the DNA substrate is almost never completely covered by protein when only binding is taken into account. Gaps that are smaller than the binding site size of the protein remain on the substrate. Second, the apparent binding size of a protein during noncooperative binding can be overestimated by up to 30% due to the existence of gaps. Furthermore, the fractional coverage increases for higher cooperativity numbers. Finally, the dissociation behavior of cooperatively bound protein can lead to an exponentially shaped dissociation curve even when allowing only end-dependent dissociation (30,31).
We can compare the benefits of our Monte Carlo simulations to the MVH model and hidden Markov modeling. MVH derived equations to describe the binding of a protein to a lattice while taking into account cooperativity. This model has been applied numerous times in equilibrium studies to extract binding constants and the cooperativity number. The model fails, however, to address the kinetics of individual proteins when dissociation and translocation of bound proteins are relevant. Furthermore, the final occupancy of the lattice is (incorrectly) always complete despite the finite binding site size of the protein involved. Indeed, our Monte Carlo results show that the MVH cooperativity number extracted from filament length distributions did not correspond to the input value. Finally, the binding site sizes of the protein during nucleation and protein patch extension in the original MVH model are identical, whereas in an experiment they can be different.
In contrast to the MVH model, hidden Markov modeling allows addressing the reaction kinetics. Due to the modular setup of a Markov chain, different interaction pathways can be modeled. However, hidden Markov modeling cannot cope with systems where different pathways influence the outcome of each other.
With Monte Carlo simulations, protein-DNA interactions can be followed in time for each protein involved. A disadvantage is that it takes a fair amount of computational power to simulate the kinetics of complex pathways in Monte Carlo simulations, but not unreasonably so. (The current results were obtained on a PC with an integrated computation time of three weeks.)
Application of Monte Carlo modeling
The Monte Carlo simulations described here can be applied to a variety of experimental systems. We briefly mention a few examples. One system to which this method was applied is the interaction between the RecA-like recombinase RAD51 and DNA (12) (see Fig. 8 A). Using magnetic tweezers, the end-to-end distance of a tethered DNA molecule was followed in time while RAD51 was allowed to bind, forming a nucleoprotein filament. Upon binding to the DNA substrate, RAD51 induces a change in end-to-end distance, yielding a measure for the lattice occupancy. The induced changes in end-to-end distance were fit with Monte Carlo simulated binding profiles, yielding all relevant single RAD51 (dis)assembly rates.
Another useful application would be RNA-dependent RNA polymerases. These polymerases can either generate template-long duplexes by synthesizing full-length RNA chains in one run, or generate many short duplexes by synthesizing short complementary RNA oligonucleotides scattered along the RNA template, a process known as abortive initiation (see Fig. 8 B) (34). The former corresponds to a highly cooperative binding mode, whereas abortive initiation corresponds to a low-cooperative binding mode. These modes can be distinguished experimentally because the creation of duplex RNA from a single-stranded template increases the stiffness of the RNA molecule, which changes the end-to-end distance of a tethered molecule in, e.g., a tweezers setup. These changes in end-to-end distance can be analyzed with the Monte Carlo simulations to obtain values for the rates of initiation and duplex extension.
Other protein-DNA binding reactions can be analyzed as well. The case of single-stranded binding proteins like SSB and RPA, or nucleosome binding to DNA, is conceptually very similar to the RAD51 binding that we have already described. Another example is the structural maintenance of chromosomes (SMC) proteins, which are the central components of several multiprotein complexes that help to organize chromosomes throughout the cell cycle (35) (Fig. 8 C). The analysis presented here provides a basis for quantifying, for example, the binding size, the presence of any cooperativity, or possible reorganization during binding of SMC proteins to DNA.
CONCLUSION
To overcome certain limitations of the classic MVH model and the hidden Markov model, we have used Monte Carlo simulations to model ligand-lattice interactions. These Monte Carlo simulations allow determination of protein-related binding rates even when multiple proteins interact simultaneously with the lattice. This tool was applied to understand the RAD51-DNA interaction. Its application can be extended to other systems in which cooperativity plays a crucial role, such as single-stranded binding proteins, polymerases, and SMC proteins.
Acknowledgments
We have benefited greatly from many discussions with Claire Wyman and Roland Kanaar over the years. We thank Marijn van Loenhout for useful discussions and a critical reading of the manuscript.
This work was supported by grants from the “Stichting voor Fundamenteel Onderzoek der Materie”, which is financially supported by the “Nederlandse Organisatie voor Wetenschappelijk Onderzoek”.
Thijn van der Heijden's present address is Leiden Institute of Physics, 2333 CA Leiden, The Netherlands.
Editor: David P. Millar.
References
1. Seidel, R., J. van Noort, C. van der Scheer, J. G. P. Bloom, N. H. Dekker, C. F. Dutta, A. Blundell, T. Robinson, K. Firman, and C. Dekker. 2004. Real-time observation of DNA translocation by the type I restriction modification enzyme EcoR124I. Nat. Struct. Mol. Biol. 11:838–843.
2. McGhee, J. D., and P. H. von Hippel. 1974. Theoretical aspects of DNA-protein interactions—cooperative and non-cooperative binding of large ligands to a one-dimensional homogeneous lattice. J. Mol. Biol. 86:469–489.
3. Lonberg, N., S. C. Kowalczykowski, L. S. Paul, and P. H. von Hippel. 1981. Interactions of bacteriophage T4-coded gene 32 protein with nucleic acids: III. Binding-properties of 2 specific proteolytic digestion products of the protein (G32P*I and G32P*III). J. Mol. Biol. 145:123–138.
4. Ando, R. A., and S. W. Morrical. 1998. Single-stranded DNA binding properties of the UvsX recombinase of bacteriophage T4: binding parameters and effects of nucleotides. J. Mol. Biol. 283:785–796.
5. Moreno-Herrero, F., L. Holtzer, D. A. Koster, S. Shuman, C. Dekker, and N. H. Dekker. 2005. Atomic force microscopy shows that Vaccinia topoisomerase IB generates filaments on DNA in a cooperative fashion. Nucleic Acids Res. 33:5945–5953.
6. Yang, H., G. B. Luo, P. Karnchanaphanurach, T. M. Louie, I. Rech, S. Cova, L. Y. Xun, and X. S. Xie. 2003. Protein conformational dynamics probed by single-molecule electron transfer. Science. 302:262–266.
7. Joo, C., S. A. McKinney, M. Nakamura, I. Rasnik, S. Myong, and T. Ha. 2006. Real-time observation of RecA filament dynamics with single monomer resolution. Cell. 126:515–527.
8. Qin, F., A. Auerbach, and F. Sachs. 2000. A direct optimization approach to hidden Markov modeling for single channel kinetics. Biophys. J. 79:1915–1927.
9. Qin, F., A. Auerbach, and F. Sachs. 2000. Hidden Markov modeling for single channel kinetics with filtering and correlated noise. Biophys. J. 79:1928–1944.
10. Eddy, S. R. 2004. What is a hidden Markov model? Nat. Biotechnol. 22:1315–1316.
11. Rabiner, L. R. 1989. A tutorial on hidden Markov-models and selected applications in speech recognition. Proc. IEEE. 77:257–286.
12. van der Heijden, T., R. Seidel, M. Modesti, R. Kanaar, C. Wyman, and C. Dekker. 2007. Real-time assembly and disassembly of human RAD51 filaments on individual DNA molecules. Nucleic Acids Res. 35:5646–5657.
13. Metropolis, N., and S. Ulam. 1949. The Monte Carlo method. J. Am. Stat. Assoc. 44:335–341.
14. Halton, J. H. 1970. A retrospective and prospective survey of Monte Carlo method. SIAM Rev. 12:1–63.
15. Binder, K. 1986. Monte Carlo methods in statistical physics. In Topics in Current Physics, 2nd Ed. Springer-Verlag, Berlin; New York.
16. Evans, J. W. 1993. Random and cooperative sequential adsorption. Rev. Mod. Phys. 65:1281–1329.
17. Onoda, G. Y., and E. G. Liniger. 1986. Experimental determination of the random-parking limit in two dimensions. Phys. Rev. A. 33:715–716.
18. Widom, B. 1966. Random sequential addition of hard spheres to a volume. J. Chem. Phys. 44:3888–3894.
19. Gonzales, J. J., P. C. Hemmer, and J. S. Hoye. 1974. Cooperative effects in random sequential polymer reactions. Chem. Phys. 3:228–238.
20. Epstein, I. R. 1978. Cooperative and non-cooperative binding of large ligands to a finite one-dimensional lattice—model for ligand-oligonucleotide interactions. Biophys. Chem. 8:327–339.
21. Epstein, I. R. 1979. Kinetics of large-ligand binding to one-dimensional lattices—theory of irreversible binding. Biopolymers. 18:765–788.
22. Epstein, I. R. 1979. Kinetics of nucleic acid large ligand interactions—exact Monte Carlo treatment and limiting cases of reversible binding. Biopolymers. 18:2037–2050.
23. Dateo, C., and I. R. Epstein. 1981. Kinetics of nucleic-acid large ligand interactions—multiplet-closure approximations and matrix-iteration techniques. Biopolymers. 20:1651–1669.
24. Flory, P. J. 1939. Intramolecular reaction between neighboring substituents of vinyl polymers. J. Am. Chem. Soc. 61:1518–1521.
25. Renyi, A. 1958. On a one-dimensional problem concerning random space-filling. Publ. Math. Inst. Hungarian Acad. Sci. 3:109–127.
26. Solomon, H., and H. Weiner. 1986. A review of the packing problem. Comm. Statist. Theory Methods. 15:2571–2607.
27. Kowalczykowski, S. C., L. S. Paul, N. Lonberg, J. W. Newport, J. A. McSwiggen, and P. H. von Hippel. 1986. Cooperative and noncooperative binding of protein ligands to nucleic-acid lattices—experimental approaches to the determination of thermodynamic parameters. Biochemistry. 25:1226–1240.
28. Weiss, J. N. 1997. The Hill equation revisited: uses and misuses. FASEB J. 11:835–841.
29. Lohman, T. M. 1984. Kinetics and mechanism of dissociation of cooperatively bound T4-gene-32-protein single-stranded nucleic-acid complexes: I. Irreversible dissociation induced by sodium-chloride concentration jumps. Biochemistry. 23:4656–4665.
30. Lohman, T. M. 1983. Model for the irreversible dissociation kinetics of cooperatively bound protein nucleic-acid complexes. Biopolymers. 22:1697–1713.
31. Balazs, A. C., and I. R. Epstein. 1984. Kinetics of irreversible dissociation for proteins bound cooperatively to DNA. Biopolymers. 23:1249–1259.
32. Lohman, T. M. 1984. Kinetics and mechanism of dissociation of cooperatively bound T4-gene-32-protein single-stranded nucleic-acid complexes: II. Changes in mechanism as a function of sodium-chloride concentration and other solution variables. Biochemistry. 23:4665–4675.
33. Waterman-Storer, C. M., A. Desai, J. C. Bulinski, and E. D. Salmon. 1998. Fluorescent speckle microscopy, a method to visualize the dynamics of protein assemblies in living cells. Curr. Biol. 8:1227–1230.
34. Makeyev, E. V., and D. H. Bamford. 2002. Cellular RNA-dependent RNA polymerase involved posttranscriptional gene silencing has two distinct activity modes. Mol. Cell. 10:1417–1427.
35. Hirano, T. 2006. At the heart of the chromosome: SMC proteins in action. Nat. Rev. Mol. Cell Biol. 7:311–322.