Abstract
To gain a deeper insight into cellular processes such as transcription and translation, one needs to uncover the mechanisms controlling the configurational changes of nucleic acids. As a step toward this aim, we present here a mesoscopic-level computational model that provides a new window into nucleic acid dynamics. We model a single-stranded nucleic as a polymer chain whose monomers are the nucleosides. Each monomer comprises a bead representing the sugar molecule and a pin representing the base. The bead-pin complex can rotate about the backbone of the chain. We consider pairwise stacking and hydrogen-bonding interactions. We use a modified Monte Carlo dynamics that splits the dynamics into translational bead motion and rotational pin motion. By performing a number of tests, we first show that our model is physically sound. We then focus on a study of the kinetics of a DNA hairpin—a single-stranded molecule comprising two complementary segments joined by a noncomplementary loop—studied experimentally. We find that results from our simulations agree with experimental observations, demonstrating that our model is a suitable tool for the investigation of the hybridization of single strands.
I. INTRODUCTION
Some of the most challenging questions in biochemistry—such as determining RNA secondary structure starting from sequence alone [1,2] or identifying the dynamic mechanism responsible for the slow folding of the molecule into its catalytic structure [3,4]—concern the mesoscopic behavior of nucleic-acid chains. The understanding of the configurational changes of nucleic acids is a key step if one wishes to control cellular processes such as transcription or translation. In addition, the configurational dynamics of single-stranded nucleic acids is also relevant to microarray experiments: The expression level assigned to a given gene is related to the hybridization of a labeled nucleic-acid chain (the probe) to another nucleic-acid chain tethered to a glass slide (the target) [5-7]. In microarrays, each gene is represented in 10 to 20 spots. Significantly, the hybridization yields for spots representing the same gene exhibit large fluctuations, posing serious problems for the interpretation of microarray results [8-10]. Understanding the hybridization of target and probe will thus help us in designing more reliable microarrays and in interpreting microarray data.
Nucleic-acid hairpins are likely the least complex system from which to assess mesoscopic properties of single strands. They are also relevant to a number of biologically important phenomena. For example, in RNA, the formation of hairpin structures is believed to be the critical step before the fast folding into the native configuration [11], while, in DNA, hairpin formation is relevant to a number of significant processes such as recombination, transposition, and gene expression [12-14]. For these reasons, hairpins are systems to which experimentalists have devoted much attention [15-20]. Importantly, experimental observations report that, even for short hairpins, the configurational dynamics is complex and strongly affected by sequence.
Here, we develop a mesoscopic-level model which we show can describe the dynamics of single-stranded nucleic acids. In order to validate our model, we study short DNA hairpins—single-stranded nucleic acid chains comprising two complementary “stems” joined by a noncomplementary “loop.” We show that simulations of the model consistently reproduce predicted melting temperatures. To validate the dynamics, we focus our attention on a DNA hairpin which was extensively studied experimentally by Ansari and co-workers [16,18] and show that the relaxation rates measured with our model agree with the relaxation rates measured experimentally.
This paper is organized as follows. In Sec. II, we review the existing modeling approaches for DNA. In Sec. III, we describe our model including the basic units, the types of interactions, and the implementation of the dynamics. In Sec. IV, we present the results of a number of tests used to validate the model, including the comparison with experimental observations for an extensively studied hairpin. Finally, in Sec. V we present our conclusions.
II. PRIOR NUCLEIC ACID MODELING
Nucleic acids are linear polynucleotide chains. Each nucleotide comprises a nitrogenous organic base attached to a pentose—a five-carbon sugar—which is also attached to a phosphoric acid. The pentose in DNA is a deoxyribose, while in RNA the pentose is a ribose. The carbon atoms in the pentose are labeled from 1′, the carbon to which the base is attached, to 5′, to which the phosphate group is attached. The bases fall onto two groups: The purines—adenine (A) and guanine (G)—and the pyrimidines—thymine (T), cytosine (C), and uracil (U). The combination of a nucleic base and a pentose is called a nucleoside. A nucleotide is formed by attaching one, two, or tree phosphate groups to a nucleoside.
Ab initio modeling
For short time and length scales, researchers typically use ab initio models, in which interactions between atoms are calculated by integration of the Schrödinger equation [21-23]. Since the electron orbitals are explicitly considered, this approach is adequate to investigate phenomena involving changes in electronic states such as chemical reactivity and absorption of light.
A weakness of ab initio modeling is that it takes into account neither the molecular structure nor solvent or temperature effects. Thus, these methods only describe the zero-temperature gas phase of nucleic acids. Nonetheless, the information obtained from ab initio calculations provides the theoretical grounds for the parametrization of more coarse-grained models [24].
Force-field models
Due to the complexity of ab initio calculations, the use of these models is restricted to single nucleotides or oligonucleotide dimers [25]. To model nucleic acids at larger scales, one can use force-field models [26,27] in which the DNA molecule is treated as a classical system composed of atoms held together by bonds. In these models, the energy of the system is a function only of the position of the atoms.
Force-field models have successfully predicted both static [28,29] and dynamic [30,31] structural properties of DNA. However, a serious handicap of this treatment is that the existence of a large number of long-range electrostatic pairwise interactions dramatically increases the duration of the simulations. To overcome this problem, one can truncate the potential, but this leads to the construction of an effective potential that is not necessarily accurate. Nevertheless, a chain with 12 base pairs can be simulated for typically 20 nanoseconds, which is the time scale associated with the rotation of a nucleotide [32].
Zipper models
The computational cost of force-field models imposes the need to develop even coarser descriptions in order to model longer time scales or longer chains. To characterize DNA denaturation, a successful approach is to consider a two-dimensional lattice model in which the two strands are bonded by springs and bases oscillate about their equilibrium position [33]. As an alternative approach, Isinglike models—which describe double-stranded (ds) DNA as an ensemble of molecule configurations in which bases are either open or closed [34-36]—are quite accurate in predicting equilibrium properties such as the melting temperatures of large chains. Recently, these models have been extended by including elasticity terms in order to describe different dynamic aspects observed in the so-called pulling and unzipping experiments of single molecules [19,37,38]. However, most of these models do not consider sequence heterogeneity, and even when they do, they do not take into account the sequence dependence of the single-strand contribution.
Bead models
A second class of mesoscopic models are the so-called bead models, which are used to study the long-time dynamics of DNA molecules [39-42]. In these models, each DNA single strand is a chain of beads. Each bead represents a rigid part of the nucleotide [43] or the center of mass of bases and backbone groups [40]. Bead models—which successfully reproduce the melting dynamics observed in experiments [40]—typically consider only interactions that affect double-helix stability, neglecting single-strand properties.
Elastic chains
To investigate even larger molecules, one has to introduce further simplifications. For instance, to investigate the supercoiling structure of dsDNA in chromosomes, researchers model dsDNA as an elastic chain whose units interact electrostatically [44]. With these models, it is possible to investigate the dynamics of very long chains containing thousands of base pairs for time scales on the order of milliseconds [45]—the time scale associated with site-specific recombination processes [46]—as well as the temperature- or torque-induced denaturation of long molecules [47].
Mesoscopic models for single-stranded nucleic acids
Recently, several groups have developed models to investigate the statics and dynamics of single-stranded nucleic acids at mesoscales [10,48-52]. Most of these models focus on the investigation of a system of great current interest: hairpins, which are single-stranded nucleic acids with two complementary sections linked by a noncomplementary loop [15-18]. Hairpins appear in both DNA and RNA and participate in a number of biological processes such as recombination and gene expression mechanisms [46,53]. All-atom models have also been used to study mesoscopic objects like hairpins [54,55]. Specifically, Sorin et al. have investigated the configurational dynamics of an RNA hairpin 14 bases long. Using 40 000 processors, they could simulate the molecule for 500 μs. This is clearly the largest scale that one can pursue with such models, but, unfortunately, it still falls short for the time scales involved in microarray experiments, which are of the order of seconds or more.
The models proposed for the study of hairpins fall roughly into two categories. In the first category, one finds models whose purpose is to investigate the elastic properties of the hairpin loops [49,52]. In these models, the hairpin has no stem. It is reduced to a homopolymer which represents the loop and whose monomers in the loop can be either stacked or unstacked. For example, in the model of Aalberts et al. [49], the polymer is divided into rigid segments comprised by an equal number of monomers to mimic different stacking strengths, whereas, in the model of Sain et al. [52], the stacking interaction between neighboring monomers is specifically taken into account. While these models are a practical first approach to investigate ring formation of single-stranded nucleic acids, their use is very limited, since the dynamics of hairpins with long stems cannot be investigated.
In the second category, one finds “configurational models.” These models are defined on a plane, thus they only consider the secondary structure of the hairpin [10,48,50,51]. Different configurations in these models differ in the sequence of base pairs bonded.
The model proposed by Chen and Dill [48] uses polymer graph theory to compute the entropy associated to the different configurations and uses a multiplicative factor to account for the loss in entropy due to the missing third dimension. The stacking free energies for the different configurations are computed for each particular sequence using the Turner rules [56]. However, single-stranded regions have no energy contribution. Recently, Zhang and Chen [10] studied the “configurational” dynamics with these types of models by introducing transition rates. The only allowed transitions are those that break or add a base pair to configurations comprising at least two stacked base pairs.
Cocco et al. proposed a similar model to study the unzipping dynamics of the pulling experiments on RNA hairpins. In their model, free energies are also computed using the Turner rules and an extra entropic term is assigned to the single-stranded ends of the molecule. The dynamics is implemented by assigning transition rates to the process of breaking or adding a single base pair at each time step.
All these models have a common feature: They rely on the zipping/unzipping mechanisms to describe the folding and unfolding of hairpins. This approach has been proven useful to study some aspects of how the dynamics relate to the free-energy landscape. However, since there is no sequence-specific treatment of a single strand and since they do not consider the diffusion of the molecule in space, they are not suitable to investigate the hybridization of target and probe under microarray conditions.
III. THE BEAD-PIN MODEL
The model we develop is closest in spirit to the “bead” models (Fig. 1). We model single-stranded nucleic acid chains as linear polymer chains in which each monomer comprises a bead rigidly attached to a pin. The bead represents the sugar molecule, while the pin represents the nitrogenous base. We model phosphodiester bonds as rigid rods that connect two consecutive sugar beads and form, with the beads, the backbone of the chain.
The sugar beads sit on the nodes of a three-dimensional triangular lattice (Fig. 2). This lattice is commonly used in simulations of polymers—see, for instance, Ref. [57]—because (i) each node has a larger number of first neighbors than cubic lattices, implying that a greater number of symmetries are preserved [58], and (ii) it is not possible for two strands to cross—a situation that is almost unavoidable for cubic lattices in which movements of the beads to next-nearest-neighbor nodes are allowed [59]. Note that at each time, we allow a single bead to sit on any lattice site.
A. Lattice configurations
To model the stiffness of the chain, we restrict the angle between two consecutive bonds [Fig. 2(b)]. The model generates sequence-dependent elastic properties [60] by means of base-specific stacking interactions [15,61] (Fig. 3). Because bonds that link two consecutive sugars in the strand can rotate almost freely [60], we impose no restrictions on the direction of the base pins.
An important factor concerning the implementation of the model is that the characteristic time scale for the rotation of the nucleosides about the chain axis is of the order of nanoseconds [62-65], at least two orders of magnitude faster than the time scale associated with the motion of the monomers in the polymer chain itself, which for molecules tens of bases long is of the order of fractions of microseconds for ssDNA [66] or microseconds for dsDNA [67]. The implication for the dynamic rules implemented in the model is that after the translational motion of a nucleotide, the nucleotide conformation is immediately relaxed to the temperature-specific equilibrium conformation [89]. It follows that the time resolution of our model is finite, hence phenomena taking place at time scales shorter than nanoseconds cannot be investigated.
B. Interactions
The model allows for different types of pairwise interactions including nucleotide-nucleotide and nucleotide-solvent. These interactions are assumed to be short-ranged and thus restricted to elements occupying neighboring sites in the lattice. In the following, we only describe interactions between pairs of bases. Solvability effects due to salt concentration can be effectively introduced by changing the values of the interactions.
We consider two types of nucleotide-nucleotide interactions: complementarity interactions and stacking interactions. Complementarity interactions lead to Watson-Crick (WC) pair formation through hydrogen bonds. These interactions occur when the pins of a pair of neighboring nucleotides point to one another. Thus, complementarity interactions are not possible between consecutive bases in a strand, although they are possible between bases belonging to the same strand as long as the rigidity conditions described in Fig. 2 are not violated. We show in Fig. 3(a) the strength of these interactions.
The stacking interaction arises from the fact two bases “like” to “lie” on top of each other. In our model, two consecutive nucleotides are stacked when the pins are parallel to each other and the relative angle between the pin orientation and the phosphodiester bond connecting the two nucleosides is greater than or equal to 60°. In general, this interaction is stronger for purines than for pyrimidines because of their larger size [60,61]. However, the strength of the interaction also depends on the sequence and, in the case of dsDNA, on the existence or not of base pairs above and below the considered one. In such a case, opposite bases belonging to adjacent bonded base pairs can be cross-stacked (Fig. 3).
The strength of the different interactions shown in Fig. 3 was obtained mostly from experimental data. As a first approximation, we consider the interaction to be symmetric, i.e., there is no difference between 5′ to 3′ and 3′ to 5′ interactions.
We obtain the base-stacking interactions directly from the enthalpies measured from the thermodynamic parameters for single-strand stacking reported in Chap. 8 of Ref. [60]. For convenience, we rescale them into the range [-4,0] for convenience. Note that the data are incomplete since there are no experimental measurements for the stacking enthalpies for Poly(G) or the combinations AC, GC, AG, GT, and CT. To assign the remaining stacking interactions, we use the following assumptions: (i) Purines have stronger stacking interactions; (ii) G’s have stronger stacking interactions than A’s and C’s have stronger stacking interactions than T’s or U’s. The rationale for (i) is purine’s larger size, while the rationale for (ii) is the greater stability of duplexes comprising G-C bonds.
We compute the hydrogen-bonding and cross-stacking interactions from the duplex stacking enthalpies used in the Turner rules. As a first approximation, we consider that cross-stacking interactions only occur between purines. As for the base-stacking interactions, we rescale all the interaction values into the range [-4,0].
C. Chain motion
A major challenge when modeling the kinetics of lattice polymer chains is the implementation of thermal dynamics that (i) sample the whole phase space, (ii) reproduce thermodynamic equilibrium properties, and (iii) are realistic and consistent with the kinetic features of the system being modeled. The selection of realistic chain movements that preserve ergodicity and do not introduce spurious symmetries into the conformations of the polymer is, thus, of the greatest importance.
In the past, there has been some discussion on whether the use of Monte Carlo (MC) dynamics is a valid tool to investigate polymer kinetics, since it was initially formulated to investigate static equilibrium properties. There are, however, plenty of examples in the polymer literature showing that by choosing an appropriate set of rules of motion and the correct simulation time scale, the results obtained using MC dynamics are as reliable as those obtained with molecular dynamics [68]. In fact, there is evidence that simulations using MC sampling reproduce the dynamics observed experimentally [68,69].
A number of algorithms using MC dynamics have been proposed over time to investigate polymers. One of the most popular is the Verdier-Stockmayer model [70], in which a number of local moves can be performed depending on the local conformation of the monomers, namely the so-called “crankshaft,” “end-bond,” and “kink-jump” movements. MC simulations using these dynamics have been shown to reproduce some real kinetic properties of homopolymers and proteins [59,69,70]. However, this algorithm has problems in the sampling of phase space [59]. Specifically, the relaxation of kinks toward the center of the polymer chain is very slow and the polymer can get locked in some configurations.
With other chain “moves,” such as reptation (or “slithering snake”) [71] and “pivot relaxation” [72], the sampling of phase space is much improved and the relaxation toward equilibrium is much faster. However, the rules of motion proposed in these algorithms are not “realistic” moves that happen in real polymers under dilute conditions [59]. Nevertheless, by constraining the reptation to a number of selected internal monomers, a modified reptation algorithm can be used to “propagate kinks” along the chain while keeping a correct description of the kinetic properties [73,74].
In our model, we use a generalized version of this “internal reptation” model which includes, but is not restricted to, the propagation of kinks along the polymer (Table I). Our dynamics includes all the local movements considered in the Verdier-Stockmayer algorithm, as well as the propagation of “local deformations” along the chain (see Fig. 4). This generalized dynamics has the advantage that in order to generate a new configuration, one does not have to study the local configuration of the monomers to see which local movement is possible as it happens in the Verdier-Stockmayer algorithm. The only constraint for the new configuration is that the stiffness conditions be fulfilled.
TABLE I.
1 | Randomly select a monomer, mi. |
2 | If mi is not bonded to its own chain: |
2.1 List all the empty neighboring sites that fulfill the stiffness constrains with at least one of the sites occupied by the two neighboring monomers in the chain mi-1 and mi+1. | |
2.2 Randomly select a new site from the list for the selected monomer to move to. | |
2.3 Select the direction of the chain that will reptate: | |
2.3.1 If the new position of the selected monomer is a neighbor of only the site occupied by the left (right) neighboring monomer in the chain, the monomers toward the right (left) end of the chain reptate. | |
2.3.2 If the selected site is a neighbor of the sites occupied by both mi-1 andmi+1, then: | |
2.3.2.1 If the new configuration satisfies stiffness constraints: | |
2.3.2.1.1 With probability p, only the selected monomer moves to the selected site. No other monomers reptate. | |
2.3.2.1.2 With probability 1-p, select randomly one of the ends (right or left), and allow the monomers along the chain toward the selected end to reptate. | |
2.3.2.2 If the new configuration does not satisfy stiffness constraints with the left (right) neighboring monomer in the chain, allow the monomers along the chain toward the left (right) end of the chain to reptate. | |
2.4 Move the selected monomer to the selected site and iterate the following steps for the monomers between the selected monomer and the selected end. | |
2.4.1 For the following monomer, build a list of empty neighboring sites that satisfy stiffness conditions with the previous monomers in the chain. | |
2.4.2 Select randomly a new site from the list for this monomer to move to. | |
2.4.3 If the selected site is a neighbor of the sites occupied by the two neighboring monomers in the chain and the new configuration satisfies stiffness conditions, stop the reptation with probability p. | |
3 | If mi is bonded to its own chain, propose a change of orientation. Accept or reject the change using the METROPOLIS algorithm. |
3.1 If the change is accepted, follow the procedure described in step 2. | |
3.2 If the change is rejected, the pair of bonded monomers move simultaneously in the same direction while the remaining monomers remain in the previous position. | |
3.2.1 List the pair of neighboring sites of the pair of bonded monomers that are neighbors of the consecutive monomers in the chain and satisfy stiffness conditions. | |
3.2.2 Select randomly a pair of sites among the list and perform the movement. | |
4 | Verify that none of the pin orientations overlaps with the backbone. |
4.1 If a pin orientation overlaps with the backbone, select randomly a new orientation for that pin. | |
5 | Compute the energy of the new configuration. |
6 | Accept or reject the change in configuration using the METROPOLIS algorithm. |
7 | For a number of times equal to the number of monomers in the chain, repeat the following steps: |
7.1 Randomly select a monomer mj. | |
7.2 Randomly select a new orientation for the pin of mj that does not overlap with the backbone of the polymer. | |
7.3 Compute the new energy. | |
7.4 Accept or reject the change in orientation using the METROPOLIS algorithm. | |
Because the time scales for the motion of the entire polymer and for the rotation of the pins differ by a factor of a 1000, we use a modified MC scheme that considers separately the translational motion of the beads and the rotational motion of the pins. Specifically, we “split” the motion of the nucleotide chain into two steps: (i) bead motion and (ii) pin motion. Table I describes in detail the algorithm by which we implement the motion of beads and pins—note that the pins thermalize regardless of whether the beads change their configuration or not.
In the algorithm described in Table I, we consider single strands whose bases can form WC pairs. If we consider two single strands bonded to each other, an immediate extension of the algorithm is to consider the simultaneous motion of the two bonded chains.
IV. MODEL VALIDATION
In order to validate our model, we first perform basic tests that ensure that our model displays a physically sound behavior. Next, we study static and kinetic properties of ssDNA hairpins, which are self-complementary single strands linked by a noncomplementary loop (Fig. 5). Hairpin conformations are ubiquitous in nature. In RNA, they dominate the secondary structure and are responsible for the fast folding into the native structure [11], while in DNA they are involved in important biological processes such as the regulation of gene expression [14,53] and DNA recombination [12,75] and transposition [13,76]. Importantly, hairpins are not static structures: In thermal equilibrium, they fluctuate between open and closed states (Fig. 5), providing an ideal model system for the investigation of single-strand properties.
A. Sampling of configuration space
First, we test if the motion algorithm implemented in our model is ergodic [77]. To this end, we investigate the sampling of configuration space for a polymer chain moving according to the algorithm described earlier and for different values of p. We study two polymers comprising six and eight monomers at infinite temperature. Our results indicate that the sampling of configuration space becomes more uniform as p→0 (Fig. 6). Importantly, our analysis also suggests that for p as large as 0.1, the sampling of configuration space is already essentially uniform. This is of practical relevance because even for p of order 0.1, one already observes a substantial decrease in simulation times. This decrease arises from the fact that in MC simulations, the energy difference between configurations increases with an increasing number of moving monomers. Larger energy differences make it less likely for the move to be accepted, resulting in longer equilibration times.
B. Average radius of gyration
The radius of gyration Rg is the mean distance of all monomers to the center of mass of the polymer,
(1) |
where L is the number of monomers in the chain.
According to polymer theory [79], the average radius of gyration 〈Rg〉 scales with the polymer length (i.e., the number of monomers) as 〈Rg〉∼Lv, with v>0. In our model, we have included volume constraints since a lattice site cannot be occupied by more than one monomer simultaneously, but we also have stiffness constraints and therefore we should expect to obtain an exponent value somewhat larger than the value for a self-avoiding random walk v=0.6. Figure 7(a) shows that our simulations agree with theoretical expectations, since the average radius of gyration for different polymer lengths scales as 〈Rg〉∼Lv, with v=0.75±0.07.
C. Diffusion
A polymer comprising L monomers diffuses with a diffusion constant that scales with the length of the chain D∼L-α [78]. In order to test this prediction, we measure the mean-squared displacement X2(t) of the center of mass of the chain as a function of time,
(2) |
where 〈···〉 indicates the averages over different dynamical histories of the chain, is the set of positions of the monomers, and each time step corresponds to a single chain movement. In the diffusive regime, X2 scales linearly with time: X2∼Dt.
We study the mean-squared displacement versus time for polymers of lengths L=48 to L=197. We find that the linear regime is reached after approximately 100 time steps. This linear growth is apparent in Fig. 7(b). By scaling all the curves for the different polymer lengths, we find that the diffusion coefficient scales as D∼L-α, with α=0.80±0.13.
D. Nucleotide movements: Thermal dynamics
To test whether at finite temperatures our model samples the different configurations with Boltzmann statistics, we study the simplest hairpin structure possible: A-TTTT-T. This hairpin, which comprises a one base-pair stem and a four-base T-loop, is the simplest because T has the weakest interactions of all nucleic bases.
The question of the uniform sampling of all possible configurations for very large temperatures was already addressed in Sec. IV A. We now calculate the equilibrium energy of the hairpin as a function of temperature [Fig. 8(a)]. To simplify the calculations, we set the stacking interactions of the T’s to zero. Under these conditions, there are two possible interactions with nonzero energy: the formation of the A-T WC pair in the stem (energy ε=-1) and the stacking interaction between the A and its neighboring T (energy ε=-2). Therefore, there are four possible energy values ε=-3,-2,-1,0. The minimum energy ε=-3 corresponds to a closed hairpin with A and T stacked. If the hairpin is open but A and T are stacked, the energy is ε=-2. If the hairpin is closed but there is no AT stacking, the energy is ε=-1. In all other cases, the energy is 0.
By enumerating the possible pin conformations for each of the the 7500 different lattice configurations for a polymer with L=6, we are able to compute the degeneracy of each energy level: 0, -1, -2, and -3. Hence, we can calculate the expected occupation numbers—i.e., average population—of each energy level as a function of temperature [Fig. 8(b)]. At high temperatures, one expects the energy to be dominated by the configurations with zero energy, because of their large number, while at lower temperatures, one expects that the dominant contribution comes from those configurations with lower energies. As Fig. 8 demonstrates, we find excellent agreement between the simulations and the theoretical predictions [90].
E. Melting temperatures
In order to show that our model correctly describes hairpin properties, we test whether equilibrium properties such as melting temperatures and closing times are in agreement with experimental observations. First, we demonstrate that our model is able to reach equilibrium and that we do observe a transition from a high-temperature region dominated by open configurations to a low-temperature region dominated by closed configurations (Fig. 9). We measure the average fraction of broken bonds θ(T) as a function of temperature and find a typical melting curve that goes from 1 at high temperatures, where hairpins are mostly open, to zero at low temperatures, where open configurations dominate the partition function. The temperature at which θ(Tm)=0.5 defines the melting temperature, which is also the temperature at which the melting curve has an inflection point and the specific heat has a maximum [91].
Next, we investigate if the values for the nucleotide interaction energies that we derived from experimental data in Ref. [82] lead to “self-consistent” predictions of melting temperatures of hairpins with different sequences. We perform simulations for more than 60 hairpins with randomly sampled stem sequences with stems comprising four, five, and six base pairs and loops comprising four T’s. We show in Fig. 10(a) the factor necessary to convert the melting temperatures in our simulations into the melting temperatures obtained using the Tm server of Zuker [1,81]. It is visually apparent that we obtain an approximately constant conversion factor for all those hairpin sequences.
In order to better evaluate the fluctuations of the conversion factor, we show in Fig. 10(b) the relative fluctuations of the sequence-specific conversion factor to the average conversion factor. Note that most cases are within 30% of the average, and that the standard deviation is only 15%. Moreover, as shown in Fig. 10(c), these fluctuations are well described by a Gaussian distribution with zero mean and standard deviation 0.15, indicating that there is no apparent bias in our estimation of melting temperatures [92].
We checked that the fluctuations of the ratios between the experimental melting temperatures for hairpins with short stems reported in [60] and the values predicted by the server are ten times smaller than the fluctuations observed with the ratios between the simulation results and the server predictions.
F. Relaxation rates
To validate the dynamics of our model, we compare the kinetic measurements obtained from simulations at a fixed temperature with experimental results. Specifically, we measure the relaxation rates kr for a hairpin of sequence GGATAA-T4-TTATC which was studied experimentally in [16]. The relaxation rate is defined as
(3) |
where τclosing and τopening stand for the closing and opening times, respectively. In Fig. 11(a), we show that simulation results for kr (black dots) are in agreement with experimental measurements (red dots) and show a decrease in rate with 1/T. Note that in order to convert simulation rates to experimental rates, one needs to use a factor of the order of 10-9 s/MCS. This value suggests a correspondence between one Monte Carlo step and one nanosecond, which is the time scale at which nucleosides move and get thermalized [65]. Recall that in our algorithm nucleosides are thermalized within one MCS, which is thus consistent with the experimental time scales.
The simplest description for the folding/unfolding transitions of a hairpin is a two-state system (open and closed) with a transition state at a constant energy barrier Ea. Two-state models are commonly used to describe the kinetics of the unfolding of single-domain proteins and hairpins [16,20,83-85]. Within this description, which we suppose to be valid close to Tm, the relaxation constants are assumed to have an Arrhenius dependence on the barrier. In this scenario, the relaxation constant is described by
(4) |
where Keq(T)=1/θ(T)-1 is the equilibrium constant, T is the absolute temperature (we have set kB=1 for convenience), and A is a phenomenological constant rate. Figure 11(b) shows that is well fit by an exponential with a negative activation energy Ea≈-3.5 kcal/mol consistent with the analysis of the experimental data for the same hairpin by Ansari et al. [16]. Negative activation energies are believed to be a hallmark of zipping processes in which the transition state has a lower energy than the coil configuration. In such processes, the rate-limiting step is the formation of a nucleus with a small number of hydrogen bonds—between residues in polypeptides or bases in nucleic acids—that immediately leads to the complete folding of the molecule [60]. This is not unlike the situation found in oligonucleotide dimers [86], protein β-sheet hairpins [83,87], and protein α-helices [88,93].
V. CONCLUSION
Understanding the configurational dynamics of nucleic acids is relevant to many open questions such as the folding of RNA or the hybridization of two separate DNA strands in microarrays. A particularly challenging problem is to understand how base heterogeneity affects mechanical and kinetic properties of nucleic acids at mesoscales.
In order to provide a new window into these questions, in this paper we have described a novel mesoscopic model for nucleic acid chains. The main feature of the model is that it considers single-strand properties individually, making it suitable for the study of double-stranded as well as single-stranded nucleic acids. We have demonstrated that the dynamical rules implemented are physically sound, and that they are realistic. Specifically, we performed a number of comparisons of static and dynamic properties obtained in our simulations with those for ssDNA hairpins and found good agreement. All these results validate our model, making it a suitable tool for the investigation of processes in which single-strand properties are relevant, such as the formation of complex structures such as H-pseudoknots which cannot be predicted by current models.
ACKNOWLEDGMENTS
We thank D. Aalberts, A. Ansari, C. A. Ng, D. B. Stouffer, A. Modragon, and M. Zuker for numerous suggestions and discussions. R.G. and M.S. thank the Fulbright Program and the Spanish Ministry of Education and Science. J.W. gratefully acknowledges the support of Grants No. R01-GM054692 and No. R01-GM058617 from the NIH.L.A.N.A. gratefully acknowledges the support of the Searle Leadership Fund and the NIH.
APPENDIX A: COMPARISON OF DIFFERENT MONTE CARLO SCHEMES
To ensure the validity of the dynamic algorithm described in Sec. III C, a further test is to compare the Monte Carlo scheme that we use to alternative schemes that also consider separately bead and pin motion. A possible alternative is one that we denote scheme B (Table II). This alternative scheme should, in principle, sample configurations with the same equilibrium distribution as the scheme described in Table I. Indeed, Fig. 12 shows that for a hairpin of sequence GGATAA-T4-TTATCC, parallel tempering simulations [80] yield identical melting and energy curves for the two schemes. However, Fig. 12(b) shows that the two schemes are not equivalent when performing simulations at a fixed temperature. In particular, scheme B samples configurations with energies significantly lower than the value obtained with parallel tempering simulations. This observation suggests that scheme B tends to sample the minimum energy conformations of each different lattice configuration, and as a result gets easily “trapped” in local minima. It follows that this alternative scheme is not as good a tool to investigate the DNA dynamics [94].
TABLE II.
1 | Generate a new lattice configuration for the polymer keeping the pin orientations fixed. That is, go through steps 1 to 5 in the algorithm described in Table I. |
2 | Thermalize the nucleotides in this new lattice configuration. That is, change the orientation of the pins following step 7 in the algorithm described in Table I. |
3 | Compute the energy of this new global configuration. |
4 | Accept or reject this new global configuration according to the METROPOLIS algorithm. |
APPENDIX B: OPTIMIZATION OF THE INTERACTION PARAMETERS
To provide better agreement between the simulation melting temperatures and the melting temperatures predicted by Zuker’s server [1], we need to optimize the interaction strengths presented in Fig. 3 and used in our simulations. To this end, we must define a cost function and select the set of parameters that minimizes it. In our case, an obvious choice for the cost function is the standard deviation of the relative fluctuations of the ratios between simulation and theoretical melting temperatures (Fig. 10). This is, however, a time-consuming task that we have not concluded yet.
To improve the parameter choices, we analyze the melting temperatures for hairpins with stems comprising four base pairs. We find out that hairpins whose stems are rich in GC pairs are more stable than they should be, whereas hairpins with stems rich in AT pairs are less stable than they should be. This suggests that one may reduce the fluctuations of the temperature ratios by introducing, for instance, the following modifications: (i) reducing the GC bond strength, (ii) increasing the AT bond strength, and (iii) reducing the GC stacking strength. Figure 13 shows the variation of the temperature ratios with respect to the nominal case (Fig. 10)—black dots—for the following cases: (i)—blue triangles, (ii)—left yellow triangles, (iii)—green diamonds, and changing the three parameters simultaneously—red squares. Note that these changes of parameters result in a decrease of the fluctuations, the best choice being the simultaneous change of the strength of AT bonds, GC bonds, and GC stacking interactions. To obtain the best set of parameters, one needs to obtain the overall minimum for the variance among all the possible sets of parameters.
References
- [1].Zuker M. Nucleic Acids Res. 2003;31:3406. doi: 10.1093/nar/gkg595. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [2].Xayaphoummine A, Bucher T, Thalmann F, Isambert H. Proc. Natl. Acad. Sci. U.S.A. 2003;100:15 310. doi: 10.1073/pnas.2536430100. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [3].Treiber DK, Williamson JR. Curr. Opin. Struct. Biol. 1999;9:339. doi: 10.1016/S0959-440X(99)80045-1. [DOI] [PubMed] [Google Scholar]
- [4].Chadalavada DM, Senchack SE, Bevilacqua P. J. Mol. Biol. 2002;317:559. doi: 10.1006/jmbi.2002.5434. [DOI] [PubMed] [Google Scholar]
- [5].Schena M. Microarray Analysis. John Wiley and Sons, Inc; Hoboken, NJ: 2002. [Google Scholar]
- [6].Southern E, Mir K, Shchepinov M. Nat. Genet. 1999;21(Suppl):5. doi: 10.1038/4429. [DOI] [PubMed] [Google Scholar]
- [7].Mir K, Southern E. Nat. Biotechnol. 1999;17:788. doi: 10.1038/11732. [DOI] [PubMed] [Google Scholar]
- [8].Naef F, Hacker CR, Patil N, Magnasco M. Genome Bio. 2002;3(4):1. doi: 10.1186/gb-2002-3-4-research0018. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [9].Dai H, Meyer M, Stepaniants S, Ziman M, Stoughton R. Nucleic Acids Res. 2002;30:e86. doi: 10.1093/nar/gnf085. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [10].Zhang W, Chen S-J. Proc. Natl. Acad. Sci. U.S.A. 2002;99:1931. doi: 10.1073/pnas.032443099. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [11].Woodson S. Cell. Mol. Life Sci. 2000;57:796. doi: 10.1007/s000180050042. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [12].Roth DB, Menetski JP, Nakajima PB, Bosma MJ, Gellert M. Cell. 1992;70:983. doi: 10.1016/0092-8674(92)90248-b. [DOI] [PubMed] [Google Scholar]
- [13].Kennedy A, Guhathakurta A, Kleckner N, Haniford D. Cell. 1998;95:125. doi: 10.1016/s0092-8674(00)81788-2. [DOI] [PubMed] [Google Scholar]
- [14].Zazopoulos E, Lalli E, Stocco DM, Sassone-Corsi P. Nature (London) 1997;390:311. doi: 10.1038/36899. [DOI] [PubMed] [Google Scholar]
- [15].Goddard NL, Bonnet G, Krichevsky O, Libchaber A. Phys. Rev. Lett. 2000;85:2400. doi: 10.1103/PhysRevLett.85.2400. [DOI] [PubMed] [Google Scholar]
- [16].Ansari A, Kuznetsov SV, Shen Y. Proc. Natl. Acad. Sci. U.S.A. 2001;98:7771. doi: 10.1073/pnas.131477798. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [17].Wallace MI, Ying L, Balasubramanian S, Klenerman D. Proc. Natl. Acad. Sci. U.S.A. 2001;98:5584. doi: 10.1073/pnas.101523498. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [18].Shen Y, Kuznetsov SV, Ansari A. J. Phys. Chem. B. 2001;105:12 202. [Google Scholar]
- [19].Bustamante C, Smith SB, Liphardt J, Smith D. Curr. Opin. Struct. Biol. 2000;10:279. doi: 10.1016/s0959-440x(00)00085-3. [DOI] [PubMed] [Google Scholar]
- [20].Liphardt J, Onoa B, Smith SB, Tinoco I, Jr., Bustamante C. Science. 2001;292:733. doi: 10.1126/science.1058498. [DOI] [PubMed] [Google Scholar]
- [21].Aida M, Nagata C. Int. J. Quantum Chem. 1986;29:1253. [Google Scholar]
- [22].Hobza P, Sponer J. Chem. Rev. Vol. 99. Washington, D.C.: 1999. p. 3247. [DOI] [PubMed] [Google Scholar]
- [23].Sponer J, Leszczynski J, Hobza P. Biopolymers. 2001;61:3. doi: 10.1002/1097-0282(2001)61:1<3::AID-BIP10048>3.0.CO;2-4. [DOI] [PubMed] [Google Scholar]
- [24].Lind KE, Sherlin VMLD. p. 18. in [25] [Google Scholar]
- [25].Lerontis N, SantaLucia J Jr., editors. Molecular Modeling of Nucleic Acids. 1998. (ACS Symp. Ser. 682). [Google Scholar]
- [26].Cheatham TE, Kollman PA. Annu. Rev. Phys. Chem. 2000;51:435. doi: 10.1146/annurev.physchem.51.1.435. [DOI] [PubMed] [Google Scholar]
- [27].Mackerell A, Wiórkiewicz-Kuczera J, Karplus M. J. Am. Chem. Soc. 1995;117:11946. [Google Scholar]
- [28].Langley DR. J. Biomol. Struct. Dyn. 1998;16:487. doi: 10.1080/07391102.1998.10508265. [DOI] [PubMed] [Google Scholar]
- [29].Orozco M, Perez A, Noy A, Luque F. Chem. Soc. Rev. 2003;32:350. doi: 10.1039/b207226m. [DOI] [PubMed] [Google Scholar]
- [30].Cheatham TE, Kollman PA. Structure (London) 1997;5:1297. doi: 10.1016/s0969-2126(97)00282-7. [DOI] [PubMed] [Google Scholar]
- [31].Cieplak P, Cheatham TE, Kollman PA. J. Am. Chem. Soc. 1997;119:6722. [Google Scholar]
- [32].Mazur AK. J. Comput. Chem. 2001;22:457. [Google Scholar]
- [33].Peyrard M, Bishop AR. Phys. Rev. Lett. 1989;62:2755. doi: 10.1103/PhysRevLett.62.2755. [DOI] [PubMed] [Google Scholar]
- [34].Poland D, Scheraga HA. Theory of Helix-Coil Transition in Biopolymers. Academic Press; New York: 1976. [Google Scholar]
- [35].Wartell RM, Benight AS. Phys. Rep. 1985;126:67. [Google Scholar]
- [36].Garel T, Orland H. e-print q-bio.BM/0402037.
- [37].Smith SB, Cui Y, Bustamante C. Science. 1996;271:795. doi: 10.1126/science.271.5250.795. [DOI] [PubMed] [Google Scholar]
- [38].Cocco S, Monasson R, Marko JF. Proc. Natl. Acad. Sci. U.S.A. 2001;98:8608. doi: 10.1073/pnas.151257598. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [39].Drukker K, Schatz GC. J. Phys. Chem. 2000;104:6108. [Google Scholar]
- [40].Drukker K, Wu GS, Schatz GC. J. Chem. Phys. 2000;114:579. [Google Scholar]
- [41].Zhang F, Collins MA. Phys. Rev. E. 1995;52:4217. doi: 10.1103/physreve.52.4217. [DOI] [PubMed] [Google Scholar]
- [42].Jian H, Vologodskii A, Schlick T. J. Comput. Phys. 1997;73:123. [Google Scholar]
- [43].Bruant N, Flatters D, Lavery R, Genest D. Biophys. J. 1999;77:2366. doi: 10.1016/S0006-3495(99)77074-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [44].Marko JF, Siggia ED. Science. 1994;265:506. doi: 10.1126/science.8036491. [DOI] [PubMed] [Google Scholar]
- [45].Schlick T, Beard AD, Huang J, Strahs A, Qian X. Comput. Sci. Eng. 2000;2:38. [Google Scholar]
- [46].Parker NA, Halford SE. Cell. 1991;66:781. doi: 10.1016/0092-8674(91)90121-e. [DOI] [PubMed] [Google Scholar]
- [47].Cocco S, Monasson R. Phys. Rev. Lett. 1999;83:5178. [Google Scholar]
- [48].Chen S-J, Dill KA. Proc. Natl. Acad. Sci. U.S.A. 2000;97:646. doi: 10.1073/pnas.97.2.646. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [49].Aalberts DP, Parman JM, Goddard NL. Biophys. J. 2003;84:3212. doi: 10.1016/S0006-3495(03)70045-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [50].Cocco S, Marko JF, Monasson R, Sarkar A, Yan J. Eur. Phys. J. E. 2003;10:249. doi: 10.1140/epje/i2002-10113-2. [DOI] [PubMed] [Google Scholar]
- [51].Leoni P, Vandenzande C. Phys. Rev. E. 2003;68:051904. doi: 10.1103/PhysRevE.68.051904. [DOI] [PubMed] [Google Scholar]
- [52].Sain A, Ha B-Y, Tsao H-K, Chen JZ. Phys. Rev. E. 2004;69:061913. doi: 10.1103/PhysRevE.69.061913. [DOI] [PubMed] [Google Scholar]
- [53].Schramke V, Allshire R. Science. 2003;301:1069. doi: 10.1126/science.1086870. [DOI] [PubMed] [Google Scholar]
- [54].Sorin EJ, Engelhardt MA, Herschlag D, Pande VS. J. Mol. Biol. 2002;317:493. doi: 10.1006/jmbi.2002.5447. [DOI] [PubMed] [Google Scholar]
- [55].Sorin EJ, Rhee YM, Nakatani BJ, Pande VS. Biophys. J. 2004;75:790. doi: 10.1016/S0006-3495(03)74520-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [56].Serra MJ, Turner DH. Methods Enzymol. 1995;259:242. doi: 10.1016/0076-6879(95)59047-1. [DOI] [PubMed] [Google Scholar]
- [57].Nelson P, Hatton T, Rutledge G. J. Chem. Phys. 1997;107:1269. [Google Scholar]
- [58].Rothman DH, Zaleski S. Lattice-Gas Cellular Automata: Simple Models of Complex Hydrodynamics. Cambridge University Press; Cambridge: 1997. [Google Scholar]
- [59].Binder K, Paul W. J. Polym. Sci., Part B: Polym. Phys. 1997;35:1. [Google Scholar]
- [60].Bloomfield VA, Crothers DM, Tinoco I., Jr. Nucleic Acids: Structures, Properties, and Functions. University Science Books; Sausalito, CA: 2000. [Google Scholar]
- [61].Haijun Z, Yang Z, Zhong-can O-Y. Phys. Rev. Lett. 1999;82:4560. [Google Scholar]
- [62].Bolton PH, James TL. Biochemistry. 1980;19:1388. doi: 10.1021/bi00548a019. [DOI] [PubMed] [Google Scholar]
- [63].Early TA, Kearns DR. Proc. Natl. Acad. Sci. U.S.A. 1979;76:4165. doi: 10.1073/pnas.76.9.4165. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [64].Hogan ME, Jardetsky O. Biochemistry. 1980;19:3460. doi: 10.1021/bi00556a009. [DOI] [PubMed] [Google Scholar]
- [65].Pörschke D. Biopolymers. 1978;17:315. [Google Scholar]
- [66].Xu X-H, Yeung ES. Science. 1997;275:1106. doi: 10.1126/science.275.5303.1106. [DOI] [PubMed] [Google Scholar]
- [67].Smith DE, Perkins TT, Chu S. Phys. Rev. Lett. 1995;75:4146. doi: 10.1103/PhysRevLett.75.4146. [DOI] [PubMed] [Google Scholar]
- [68].Binder K, editor. Monte Carlo Dynamics and Molecular Dynamics Simulations in Polymer Science. Oxford University Press; New York: 1995. [Google Scholar]
- [69].Socchi ND, Onuchic JN. J. Chem. Phys. 1994;101:1519. [Google Scholar]
- [70].Verdier PH, Stockmayer WH. J. Chem. Phys. 1962;36:227. [Google Scholar]
- [71].Wall FT, Mandel F. J. Chem. Phys. 1975;63:4592. [Google Scholar]
- [72].Lal M. Mol. Phys. 1969;17:57. [Google Scholar]
- [73].Hu W. J. Chem. Phys. 1998;109:3686. [Google Scholar]
- [74].Hu W. J. Chem. Phys. 2001;115:4395. [Google Scholar]
- [75].Froelich-Ammon SJ, Gale KC, Osheroff N. J. Biol. Chem. 1994;269:7719. [PubMed] [Google Scholar]
- [76].Bhasin A, Goryshin IY, Reznikoff WS. J. Biol. Chem. 1999;274:037021. doi: 10.1074/jbc.274.52.37021. [DOI] [PubMed] [Google Scholar]
- [77].Pathria RK. Statistical Mechanics. Butterworth-Heinemann; Oxford, UK: 1996. [Google Scholar]
- [78].Doi M, Edwards SF, editors. The Theory of Polymer Dynamics. Oxford University Press; New York: 1986. [Google Scholar]
- [79].de Gennes PG, editor. Scaling Concepts in Polymer Physics. Cornell University Press; Ithaca, NY: 1979. [Google Scholar]
- [80].Marinari E, Parisi G. Europhys. Lett. 1992;81:451. [Google Scholar]
- [81].Zuker M. http://www.bioinfo.rpi.edu/zukerm/
- [82].Turner DH. Nucleic Acids: Structures, Properties and Functions. University Science Books; Sausalito, CA: 2000. Chap. 8. [Google Scholar]
- [83].Muñoz V, Thompson PA, Hofrichter J, Eaton WA. Nature (London) 1997;390:196. doi: 10.1038/36626. [DOI] [PubMed] [Google Scholar]
- [84].Ritort F, Bustamante C, Tinoco I. Proc. Natl. Acad. Sci. U.S.A. 2003;99:13. doi: 10.1073/pnas.172525099. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [85].Mañosas M, Ritort F. e-print cond-mat/0405035.
- [86].Craig M, Crothers D, Doty P. J. Mol. Biol. 1971;62:383. doi: 10.1016/0022-2836(71)90434-7. [DOI] [PubMed] [Google Scholar]
- [87].Muñoz V, Henry ER, Hofrichter J, Eaton WA. Proc. Natl. Acad. Sci. U.S.A. 1998;95:5872. doi: 10.1073/pnas.95.11.5872. colloquium paper. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [88].Lednev I, Karnoup A, Sparrow M, Asher S. J. Am. Chem. Soc. 1999;121:8074. doi: 10.1021/ja003381p. [DOI] [PubMed] [Google Scholar]
- [89].This is in apparent disagreement with the motion of real nucleosides, which are believed to “adjust” their orientation while the nucleic acid chain moves. Nevertheless, it is still reasonable to assume that at a given instant of time (at which measurements are performed), the different pin conformations are sampled with the equilibrium weights that correspond to the specific lattice configuration of the chain.
- [90].We also checked that our algorithm is more suitable for kinetic measurements than alternative MC schemes with the same equilibrium distribution (Appendix A).
- [91].We also checked that the averages of the different observables at a fixed temperature had converged.
- [92].A significant point is that the interaction energies presented in Fig. 3 and used in our simulations have not yet been optimized to provide the best agreement with Zucker’s code [1]. To do this would be a trivial but computationally demanding task (see Appendix B).
- [93].However, some experiments in other hairpins have reported positive activation energies [15], a subject which is still under discussion and which might be associated to the experimental technique used, as we shall discuss in another paper.
- [94].This observation conveys the point that the parallel tempering technique is useful to recover equilibrium properties, even when the dynamics is not effective at low temperatures, as long as the equilibrium distribution is correct.