Abstract
Crystal simulations provide useful tools, along with solution simulations, to test nucleic acid force fields, but should be interpreted with care owing to the difficulty of establishing the environmental conditions needed to reproduce experimental crystal packing. These challenges underscore the need to construct proper protocols for carrying out crystal simulations and analyzing results to identify the origin of deviations from crystallographic data. Toward this end, we introduce a novel framework for B-factor decomposition into additive intramolecular, rotational, and translational atomic fluctuation components, and partitioning of each of these components into individual asymmetric unit and lattice contributions. We apply the framework to a benchmark set of A-DNA, Z-DNA and B-DNA double helix systems of various chain lengths. Overall the intramolecular deviations from the crystal were quite small (≤ 1.0 Å), suggesting high accuracy of the force field, whereas crystal packing was not well reproduced. The present work establishes a framework to conduct and analyze crystal simulations that ultimately take on issues of crystal packing, and can provide insight into nucleic acid force fields.
Introduction
The function of nucleic acids is intimately tied to their structure and dynamics.1 Structural biology experiments such as X-ray crystallography,2 NMR,3 and Cryo-EM4 are powerful tools that provide data about nucleic acid structure and dynamics, but require at least some level of modeling to interpret. Molecular simulations with realistic, physics-based force field models are poised to aid in the interpretation of a wide range of experimental structural biology data, and provide detailed conformational ensembles of the system that can be used to predict not only structure, but also thermodynamics and kinetics.5–10
To date, X-ray crystallography has been one of the primary sources of structural data for nucleic acids.11 Crystal structures, therefore, often serve as the departure points for molecular dynamics (MD) simulation studies that further probe mechanistic details about the structure and dynamics in solution that are not easily inferred from the experimental data.12 Crystal simulations play an important validation role by helping to separate possible artifacts of the models from features of the simulations that reflect true relaxation of the structure from a crystalline to a solution environment, and they afford a mechanism to compare results directly with crystallographic data.13–19 In the case of nucleic acids, crystal simulations of a B-DNA dodecamer20 were first used to demonstrate that, with rigorous treatment of long-range electrostatics, nucleic acids could be stably simulated without artificial restraints or charge scaling. More recently, crystal simulations of the hammerhead,21 hairpin,22 twister,23 and TS24 ribozymes have been used to validate the force field and molecular simulation protocol, and have helped lead to the identification of the most plausible active dynamical state in solution.
The value of molecular simulation results, however, relies heavily on the accuracy of the underlying force field. Due to the high charge of nucleic acid biopolymers, which requires special consideration of electrostatics, solvation and ionic interactions, the development of molecular simulation force fields for DNA and RNA has faced significant challenges.25–29 Traditionally, the accuracy of these force fields has lagged behind that of force fields for proteins. Recently, there has been a resurgence of effort to develop improved, next-generation force fields for nucleic acids, enabled largely by the concurrent evolution of specialized high-performance computing resources such as graphics processing units (GPUs) and the Anton supercomputer.9 As these resources and algorithms become more generally accessible, it becomes feasible to integrate more detailed sets of experimental data into the force field development pipeline such that nucleic acid models can advance more rapidly.
Molecular simulations of nucleic acid crystals provide rich benchmark data from which modern force fields can be evaluated and refined.8,9,18,30,31 In this regard, crystal simulations have the advantages that: 1) statistical sampling is highly efficient, as many copies of the nucleic acid can be simulated without need for large solvent boxes, 2) information is gained for intramolecular and intermolecular interactions, both of which are important for higher-order nucleic acid structures, and 3) simulation results can be directly compared with X-ray crystallographic data. In this way, crystal simulations of nucleic acids will play an increasingly important role in the evolution of next-generation force fields for nucleic acids. However, it is critical to note that crystal simulations alone are far from sufficient to fully benchmark performance of force fields or assess their limitations. Caution should be exercised in interpreting data from crystal simulations, for which the crystal packing environment is especially sensitive to conditions such as temperature and crystallization agents32 (the concentrations of which might not be known in the crystal itself), and may be further limited by unbalanced solute-solvent interactions and incomplete sampling.33 These factors could produce crystal packing results that lead to potentially false positive or false negative conclusions.34 Recently, results from crystal simulations of a DNA dodecamer have been critically examined in the context of intermolecular forces that arise from crystal packing and crystallization agents.32
Toward that end, it is important to establish best practices and provide a framework from which consistent benchmark crystal simulations of nucleic acids can be performed and analyzed. In the present work, we describe methods for performing and analyzing crystal simulations of nucleic acids. We begin by describing construction and preparation of the fundamental unit cell and its replication into an appropriately sized supercell for production simulation, collection of statistics, and comparison with crystallographic data. Next, we develop a novel B-factor decomposition scheme that allows contributions to atomic fluctuations to be separated into additive intramolecular, rotational, and translational components, as well as partitioning of each of these components into individual asymmetric unit (e.g., duplex “molecule”) and lattice (supercell) contributions. We then apply this framework to examine DNA duplex structure and fluctuations for benchmark A-, B- and Z-form DNA systems using the AMBER OL1535 DNA force field. We first examine helical parameters, base pair interactions, and torsion angle and sugar pucker distributions. We then examine intermolecular crystal packing interactions by analyzing the preservation of native crystal contacts and using B-factor decomposition analysis. In this way we can characterize the degree to which disruption of native packing leads to artificially large B-factor estimates. It should be pointed out that this analysis is not meant to be a critical assessment of the OL15 force field, as the many potential caveats described above for crystal simulations32 have not been addressed. For these reasons, the simulation results presented here are expected to reasonably represent intramolecular structure and dynamics, but likely do not represent well the intermolecular interactions that affect the crystal packing environment.
Rather, the consistent framework for conducting and analyzing crystal simulations that we develop here is meant to capture and quantify these deficiencies, and hence facilitate more in-depth studies of a broad range of crystals that could ultimately be used to benchmark nucleic acid force fields and assess their limitations.
Methods
All simulations were carried out using the AMBER1636 package, employing the OL1535 DNA force field with the TIP4P/Ew37 water model and the corresponding Joung & Cheatham monovalent38 and Li/Merz divalent39 ion parameters, under periodic boundary conditions at 300 K. An 8 Å nonbonded cutoff is applied, and electrostatics beyond the cutoff are accounted for by particle mesh Ewald.40 A Langevin thermostat with γ=5 ps−1 and a Berendsen isotropic barostat with τ=1 ps are used when necessary. A heavy-atom time step of 1 fs (as opposed to the more standard 2 fs, for consistency with other simulations performed in the lab such as QM/MM and certain free energy simulations) is used along with the SHAKE algorithm41 for hydrogens.
The number of water molecules needed for each system is determined through multiple unconstrained NPT simulations until the system (or an interpolation between two simulations) yields a volume within 0.5% of the experimental unit cell volume (Tables S2 and S4), which is comparable to the tolerance used in other crystal simulations reported in the literature.17 It should be pointed out that NPT simulations of crystals are difficult to set up because packing solvent is a tedious process of trial and error, which is why most crystal simulations are performed in the NVT ensemble. Each system is solvent equilibrated in a multistage fashion through a cycle of minimization, NVT heating, and NPT equilibration in which the constraint weight on the solute is approximately halved at each turn [50, 25, 10, 5, 2 kcal/(mol·Å2)], yielding a total of ~100 ns of solvent equilibration with fixed solute. Supercells are then constructed by propagating the solvent equilibrated unit cells and are further solvent equilibrated through the same cycle.
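The water-count search described above can be sketched as a linear interpolation between two trial NPT runs, followed by a check against the 0.5% volume tolerance. This is only an illustrative sketch: the function names and the trial numbers in the usage lines are hypothetical, and the text does not state how the interpolation was actually carried out.

```python
def interpolate_water_count(n_low, v_low, n_high, v_high, v_target):
    """Estimate the water count expected to reproduce v_target.

    Assumes the average NPT volume varies linearly with the number of
    waters between two trial simulations (an assumption of this sketch).
    """
    if v_high == v_low:
        raise ValueError("the two trial volumes must differ")
    slope = (n_high - n_low) / (v_high - v_low)  # waters per cubic angstrom
    return round(n_low + slope * (v_target - v_low))


def within_tolerance(v_sim, v_expt, tol=0.005):
    """True if the simulated volume is within tol (0.5%) of experiment."""
    return abs(v_sim - v_expt) / v_expt <= tol


# Hypothetical trial runs bracketing a target volume of 25500 cubic angstroms
n_guess = interpolate_water_count(500, 25000.0, 560, 26000.0, 25500.0)
print(n_guess, within_tolerance(25600.0, 25500.0))
```

In practice the interpolated count would be used to start a new NPT run, and the cycle repeated until the tolerance check passes.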
For each system, production simulations of supercells (SCs) with 12 unit cells (UCs) are run for 1.2 μs with frames collected every 10 ps. The first 200 ns are discarded and analyses are performed on the remaining 1 μs of simulations. The structural indexes and distributions monitored in the present study are reasonably well converged on this time scale, and the structures are observed to be equilibrated and stably fluctuate after 200 ns as discussed in the Results and Discussion section. Drifting of the crystal is removed by re-imaging the system around the solute center of mass, such that the center of mass of the system is fixed. For analyses at the molecular level (i.e. all except crystal contacts), each symmetry related molecule is mapped onto the asymmetric unit using reverse symmetry operations.
For each discussed property P, either an overall average, 〈P〉, or a molecular average, 〈P〉mol, is reported. 〈P〉 is calculated for the full ensemble of structures obtained by combining the reverse-symmetry-mapped time series of all DNA duplex trajectories, while 〈P〉mol is the average of P calculated separately for each individual DNA duplex trajectory. RMSD, B-factor, native contacts, helical parameters, hydrogen bond distances and torsional angle distributions are all calculated using the corresponding cpptraj modules, with the exception of sugar puckers, which were calculated using the ν1 and ν3 angles.42 Crystal contacts are calculated every 10 ns, while other properties are calculated every 10 ps.
Results and Discussion
Fundamentals of Crystal Simulations
Constructing the unit cell (UC)
Crystals have long-range order that involves replication of a unit cell, the fundamental repeating unit of the crystal. The unit cell may have internal symmetry dictated by the space group. The asymmetric unit is the symmetry-independent part of the unit cell that can be used to generate the complete unit cell from the space group symmetry operations, and its coordinates are typically the only ones deposited in crystallographic databases. In the present case, the complete unit cell contains molecules of DNA closely packed, with solvent and ions (typically between 30–70%) filling the interstitial spaces. It should be emphasized that symmetry of the crystal, and indeed its periodicity, is a result of an ensemble average over all unit cells and time; i.e., perfect symmetry and periodicity will not be apparent from an instantaneous configuration taken from a snapshot in time. Hence, simulations that introduce instantaneous symmetry and/or periodicity with “periodic boundary conditions” alone are artificial even for a perfect crystal. These artifacts can be systematically reduced by considering larger “supercells” (SCs) created by replicating copies of the unit cell along its lattice vector translations, and considering all atoms in the supercell as independent degrees of freedom in solving the equations of motion. Although this procedure is theoretically more rigorous, by not constraining simulations to preserve either the internal space group symmetry or exact periodicity of the fundamental unit cell, freedom is provided for the packing and fluctuations in the crystal to break symmetry and diverge from their experimental values. This then provides a sensitive test of the ability of the simulation models to reproduce intermolecular interactions in the crystal that affect packing, which may also have relevance as models for tertiary interactions in higher order nucleic acid structures and complexes.
Constructing a model supercell is not without its challenges: while the asymmetric unit contains the DNA molecule(s), typically only a small number of ordered solvent molecules can be resolved from crystallographic data. Hence, one must develop robust procedures to pack the interstitial space with water and ions. The only information available to guide this process is the experimental volume of the unit cell (and in rare cases the density of the crystal) and the identities of the solution conditions used in crystallization. The precise composition of crystallization agents in the crystal itself is often unknown if these components are not resolved from the X-ray diffraction, as the conditions in the crystal will differ from those of the crystallization buffer. These conditions are nonetheless important for forming correct crystal packing, and thus present a significant challenge to set up in simulations, as discussed in a recent in-depth study of a DNA dodecamer.32 We do not explore these issues in depth for the broad range of DNA crystals under study, but rather confine our attention to overall packing of the crystal with solvent and counterions so as to closely reproduce the unit cell volume. In the present study, we constructed UCs by keeping all the ion and solvent atomic positions from the X-ray data, adding counterions to neutralize the system, and packing water molecules so as to preserve the experimental lattice vectors and unit cell volume. Simulated unit cell contents in the present study are summarized in Table 1.
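The replication of a unit cell along its lattice vector translations can be sketched as follows. This is a minimal illustration using NumPy, not the procedure used in this work; the function name `replicate_unit_cell` and the toy coordinates are hypothetical. The cell is passed as a 3×3 matrix whose rows are the lattice vectors.

```python
import itertools

import numpy as np


def replicate_unit_cell(coords, cell, reps):
    """Build supercell coordinates by lattice translations of a unit cell.

    coords : (N, 3) array of unit cell atomic coordinates (angstrom)
    cell   : (3, 3) array whose rows are the lattice vectors a, b, c
    reps   : (na, nb, nc) number of replicas along each lattice vector
    Returns an (na*nb*nc*N, 3) array; all atoms are independent copies.
    """
    coords = np.asarray(coords, dtype=float)
    cell = np.asarray(cell, dtype=float)
    # Integer translation indices (i, j, k) for every replica
    indices = np.array(list(itertools.product(*(range(n) for n in reps))))
    shifts = indices @ cell  # Cartesian translation of each replica
    return (coords[None, :, :] + shifts[:, None, :]).reshape(-1, 3)


# Toy example: two atoms in an orthorhombic cell, replicated 2x2x3 (12 UCs)
toy_coords = np.array([[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]])
toy_cell = np.diag([10.0, 20.0, 30.0])
supercell = replicate_unit_cell(toy_coords, toy_cell, (2, 2, 3))
print(supercell.shape)
```

After replication, no symmetry or periodicity constraint ties the copies together, which is what allows the supercell to depart from the ideal lattice during dynamics.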
Table 1:
Unit Cell (UC) Contents and Properties of Each DNA Crystal. The two 5DNB columns correspond to the two sets of extra ions listed for this crystal; all other 5DNB entries are shared.
PDB ID | 1LJX49 | 137D48 | 5DNB43 | 5DNB43 | 1D2350 | 119D51 | 1BNA52
---|---|---|---|---|---|---|---
conformation | Z | A | B | B | B | B | B
bp^a in duplex | 6 | 10 | 10 | 10 | 10 | 12 | 12
duplexes in UC | 4 | 4 | 2 | 2 | 4 | 4 | 4
Xtal | | | | | | |
ion type | Mg2+ | | Mg2+ | Mg2+ | Mg2+ | Mg2+ |
ion count | 8 | | 14 | 14 | 8 | 2 |
water count | 232 | 404 | 308 | 308 | 614 | 545 | 320
Extra | | | | | | |
ion type | Mg2+ | Na+ | Na+ | Mg2+ | Mg2+ | Mg2+ | Mg2+
ion count | 12 | 72 | 8 | 4 | 28 | 42 | 44
water count | 250 | 697 | 224 | 228 | 430 | 570 | 1092
Total | | | | | | |
ions | 20 | 72 | 22^b | 18 | 36 | 44 | 44
water | 482 | 1101 | 532 | 536 | 1044 | 1115 | 1412
atoms | 3460 | 5887 | 2882 | 2890 | 5696 | 6429 | 7312
Unit cell | | | | | | |
Space Group | P2₁2₁2₁ | P2₁2₁2₁ | C2 | C2 | P2₁2₁2₁ | C2 | P2₁2₁2₁
a (Å) | 21.18 | 24.9 | 32.25 | 32.25 | 38.93 | 64.84 | 24.87
b (Å) | 28.36 | 44.84 | 25.53 | 25.53 | 39.63 | 35.36 | 39.63
c (Å) | 44.44 | 47.97 | 34.38 | 34.38 | 33.3 | 25.35 | 33.3
α (°) | 90 | 90 | 90 | 90 | 90 | 90 | 90
β (°) | 90 | 90 | 113.4 | 113.4 | 90 | 92.24 | 90
γ (°) | 90 | 90 | 90 | 90 | 90 | 90 | 90
Volume (Å3) | 26700 | 53560 | 25980 | 25980 | 51375 | 58080 | 66500
Volume/bp (Å3) | 1113 | 1339 | 1299 | 1299 | 1284 | 1210 | 1385
Resolution (Å) | 1.64 | 1.7 | 1.4 | 1.4 | 1.5 | 2.25 | 1.9
Collection T (K) | 293 | 290 | 273 | 273 | 273 | 288 | 290^c

a The ‘bp’ abbreviation stands for base pair.
b Total of Na+ and Mg2+ ions.
c Crystallization temperature, as collection temperature is not reported for this crystal.
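The unit cell volume and volume per base pair in Table 1 follow from the lattice parameters through the standard triclinic cell volume formula. The short sketch below (the function names are ours, chosen for illustration) evaluates that formula for the 5DNB and 1LJX entries.

```python
import math


def cell_volume(a, b, c, alpha, beta, gamma):
    """Volume (cubic angstrom) of a general unit cell; angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1.0 - ca**2 - cb**2 - cg**2 + 2.0 * ca * cb * cg)


def volume_per_bp(volume, duplexes_in_uc, bp_in_duplex):
    """Unit cell volume divided by the number of base pairs in the cell."""
    return volume / (duplexes_in_uc * bp_in_duplex)


# 5DNB (monoclinic, C2) and 1LJX (orthorhombic), parameters from Table 1
v_5dnb = cell_volume(32.25, 25.53, 34.38, 90.0, 113.4, 90.0)
v_1ljx = cell_volume(21.18, 28.36, 44.44, 90.0, 90.0, 90.0)
print(round(v_5dnb), round(volume_per_bp(v_5dnb, 2, 10)))
print(round(v_1ljx), round(volume_per_bp(v_1ljx, 4, 6)))
```

The same function can be applied to the average lattice parameters of an NPT trajectory to check the simulated volume against the experimental target.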
Determining optimal size of the supercell (SC)
As mentioned previously, the use of SCs affords a mechanism to systematically reduce the artifacts of enforcing periodic boundary conditions to mimic an infinite periodic crystal of exact cell replicas at every instant. In this sense, the larger the SC, the less pronounced the periodic boundary artifacts. Computational effort increases roughly linearly with the size of the SC; however, these requirements are at least partially offset by the increased number of independent DNA molecules that are sampled and contribute to statistics. Thus, it is important to find an optimum SC size that is sufficient for proper modeling of the crystal while being practical to compute.
To explore the effects of the SC size, we chose a test system of a ten base-pair B-DNA double helix (PDB: 5DNB43), which has been well studied previously in a crystal environment.44–47 We simulated this system with various SC sizes in the NPT and NVT ensembles for 250 ns (see Supporting Information Table S2 for details about NPT volume fluctuations). We examined B-factors and root mean square deviations (RMSD) (see Figure 1) of all replicas of the asymmetric unit, which for this system is a single DNA strand. B-factors are calculated using the atomic fluctuations from the simulation and give a measure of dynamics, whereas RMSD values provide a measure of the structural variation of the simulation average from the experimental structure. For each property, we present results for the overall average (average over all symmetry related molecules in the SC and time) and the molecular average (average of results obtained for individual molecules in the SC).
Figure 1:
Effect of supercell size on RMSDs and B-factors. Top and bottom panels show overall and molecular average properties, respectively. Molecular averages are obtained by averaging the values of each symmetry-related molecule. Each SC size is shown with a different color, consistent across all panels. Squares and circles denote NVT and NPT, respectively. For NPT simulations with multiple SC propagations leading to the same SC size, triangles are used for differentiation. RMSDs (left, Å) are with respect to the experimental structure. Standard deviations in 〈RMSD〉mol correspond to the variation among molecules. B-factors (right, Å2) are obtained from fluctuations in the simulation and are shown as residue averages alongside experimental data (black). Bracket notation is used to maintain consistency with other sections.
We see that the overall B-factors and both sets of RMSD values all initially increase with the SC size (Figure 1). This is an expected result, as the system is given more degrees of freedom, and is an indicator of improved sampling with increasing SC size. As the SC size continues to increase, the cell-average RMSD values converge to roughly 0.7 Å, with time-averages for individual molecules consistently higher at around 0.8 Å. B-factor root mean square error (RMSE) values from the cell-average values increase monotonically with SC size (Table S3), whereas the time-average values stabilize more quickly. As will be explained in more detail below, the former is due to improper crystal packing with large SC sizes, whereas the latter indicates that intramolecular structure and fluctuations are relatively stable. Overall, the best balance occurs with a SC size of 2×2×3 (12 UC replicas), which affords sufficient freedom to allow molecules to re-pack, and is the minimum propagation length ensuring that contacts in all directions are with independent molecules in the crystal (i.e., there are no self or neighbor image “intermolecular” interactions).
There is some variation observed between NVT and NPT ensembles. In general, the NVT ensemble gives results comparable to NPT for a single UC, but shows more substantial deviations when applied to a larger 2×2×1 SC. Since the goal of this work is to provide benchmark simulation tests that are sensitive to errors in force fields, we will henceforth use the NPT ensemble which will allow volume fluctuations that can be compared with experiment.
Decomposition of crystal fluctuations into intramolecular and intermolecular (translational and rotational) components
Consider the atomic fluctuations arising from a crystal simulation of a system of identical DNA molecules arranged in a supercell as described previously. Each molecule in the supercell contains independent atomic degrees of freedom (i.e., there are no explicit symmetry constraints). During dynamics, individual molecules will undergo internal (intramolecular) fluctuations, in addition to translational and rotational fluctuations about their equilibrium positions within the lattice. Further, since there are no explicit symmetry constraints, the supercell is free to break perfect symmetry of the original crystallographic lattice. This will cause the average structures of individual molecules, as well as their positions and orientations within the lattice, to deviate from one another. The result is that the positional fluctuations derived from the overall ensemble of lattice structures at each time point will be larger than the average fluctuations derived from the time series of individual molecules in the lattice.
Herein we propose a simple but general framework for analysis of atomic fluctuations from crystal simulations. This framework enables decomposition of fluctuations into additive translational, rotational and intramolecular components, as well as separation of each of these components into molecular (asymmetric unit) and lattice (supercell) contributions.
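We first introduce the following notation. The overall fluctuations in atomic positions $u_i$ for atom $i$ are derived from consideration of all replicated molecules in the supercell and all time points for each molecule. This ensemble can be constructed by applying reverse supercell translations and inverse symmetry operations so as to transform the atomic coordinates of each molecule to a common reference frame. Unless otherwise indicated, this common origin will be the center of mass origin of the fundamental asymmetric unit (i.e., the asymmetric unit from which all other asymmetric units in the unit cell are generated using forward symmetry operations). We designate the overall (total) atomic fluctuations derived from all molecules in the lattice as $\langle \Delta u_i^2 \rangle$, where the use of brackets 〈⋯〉 indicates an average over the lattice ensemble that contains the transformed coordinates for all molecules at every sampled time point. Alternatively, we can consider the atomic fluctuations of individual molecules in the lattice, and then average the individual time ensemble fluctuations for each molecule. We designate this fluctuation average as $\langle \Delta u_i^2 \rangle_{\mathrm{mol}}$, where we use the short-hand notation 〈⋯〉mol to indicate the sequential time ensemble average for each molecule, averaged over all molecules. We generically designate the difference between the atomic fluctuations derived from the lattice average 〈⋯〉 and molecular time-average 〈⋯〉mol atomic fluctuations as,

$$\Delta\langle \Delta u_i^2 \rangle = \langle \Delta u_i^2 \rangle - \langle \Delta u_i^2 \rangle_{\mathrm{mol}} \tag{1}$$

We now consider sequential transformations of the ensemble such that translational and rotational fluctuations are removed. Translational fluctuations can be removed from an ensemble by re-centering each molecule of the ensemble to a common center of mass origin. Rotational fluctuations about a common center of mass origin can be removed from an ensemble by re-orienting each molecule to a common reference frame. This can be achieved by finding the set of rotations that minimize the mass-weighted variance of atomic positions within the ensemble. Considering that the total fluctuations include intramolecular (I), rotational (R), and translational (T) components (i.e., $\langle \Delta u_i^2 \rangle \equiv \langle \Delta u_i^2 \rangle^{IRT}$), we designate the atomic fluctuations computed in the transformed ensembles with translational fluctuations removed as $\langle \Delta u_i^2 \rangle^{IR}$, and with both translational and rotational fluctuations removed as $\langle \Delta u_i^2 \rangle^{I}$, respectively.

Next we consider a cluster expansion of the total lattice fluctuations as,

$$\langle \Delta u_i^2 \rangle^{IRT} = \langle \Delta u_{i,I}^2 \rangle + \langle \Delta u_{i,R}^2 \rangle + \langle \Delta u_{i,T}^2 \rangle \tag{2}$$

where

$$\langle \Delta u_{i,I}^2 \rangle = \langle \Delta u_i^2 \rangle^{I} \tag{3}$$

$$\langle \Delta u_{i,R}^2 \rangle = \langle \Delta u_i^2 \rangle^{IR} - \langle \Delta u_i^2 \rangle^{I} \tag{4}$$

$$\langle \Delta u_{i,T}^2 \rangle = \langle \Delta u_i^2 \rangle^{IRT} - \langle \Delta u_i^2 \rangle^{IR} \tag{5}$$

Note that $\langle \Delta u_{i,I}^2 \rangle$ corresponds to purely intramolecular atomic fluctuations from the ensemble having translational and rotational fluctuations removed, and $\langle \Delta u_{i,R}^2 \rangle$ and $\langle \Delta u_{i,T}^2 \rangle$ correspond to additional contributions that arise from pure rotations and translations, respectively. The decomposition above can also be applied to the molecular average fluctuations from the time ensembles as

$$\langle \Delta u_i^2 \rangle^{IRT}_{\mathrm{mol}} = \langle \Delta u_{i,I}^2 \rangle_{\mathrm{mol}} + \langle \Delta u_{i,R}^2 \rangle_{\mathrm{mol}} + \langle \Delta u_{i,T}^2 \rangle_{\mathrm{mol}} \tag{6}$$

with analogous definitions for the intramolecular, rotational and translational components.

Similarly to Eqn 1, we define the differences between the atomic fluctuation components derived from the lattice average and molecular time-average atomic fluctuations as

$$\Delta\langle \Delta u_{i,I}^2 \rangle = \langle \Delta u_{i,I}^2 \rangle - \langle \Delta u_{i,I}^2 \rangle_{\mathrm{mol}} \tag{7}$$

$$\Delta\langle \Delta u_{i,R}^2 \rangle = \langle \Delta u_{i,R}^2 \rangle - \langle \Delta u_{i,R}^2 \rangle_{\mathrm{mol}} \tag{8}$$

$$\Delta\langle \Delta u_{i,T}^2 \rangle = \langle \Delta u_{i,T}^2 \rangle - \langle \Delta u_{i,T}^2 \rangle_{\mathrm{mol}} \tag{9}$$

The $\Delta\langle \Delta u_{i,X}^2 \rangle$ values (X = I, R, T) directly report on the differences between the average molecule time ensembles and the overall lattice ensemble, and thus provide insight into the degree to which the intramolecular structure, orientation and position of the molecules deviate from one another and break symmetry of the ideal crystal.

The general framework developed here thus enables the following additive decomposition relations, in which each row sums over the I, R and T components and each lattice quantity is the sum of the corresponding molecular quantity and its Δ difference:

$$\begin{array}{ccccccc}
\langle \Delta u_i^2 \rangle & = & \langle \Delta u_{i,I}^2 \rangle & + & \langle \Delta u_{i,R}^2 \rangle & + & \langle \Delta u_{i,T}^2 \rangle \\
\langle \Delta u_i^2 \rangle_{\mathrm{mol}} & = & \langle \Delta u_{i,I}^2 \rangle_{\mathrm{mol}} & + & \langle \Delta u_{i,R}^2 \rangle_{\mathrm{mol}} & + & \langle \Delta u_{i,T}^2 \rangle_{\mathrm{mol}} \\
\Delta\langle \Delta u_i^2 \rangle & = & \Delta\langle \Delta u_{i,I}^2 \rangle & + & \Delta\langle \Delta u_{i,R}^2 \rangle & + & \Delta\langle \Delta u_{i,T}^2 \rangle
\end{array}$$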
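The decomposition can be sketched numerically as follows. This is a minimal NumPy illustration under stated assumptions, not the implementation used in this work: all molecules are assumed to be already mapped to a common reference frame, the rotations that minimize the mass-weighted variance are approximated by iteratively superposing every frame onto the running ensemble mean with a Kabsch fit, and the function names (`decompose`, `lattice_and_molecular`) and dictionary keys are hypothetical.

```python
import numpy as np


def kabsch_rotation(mobile, target, w):
    """Rotation R minimizing sum_i w_i |R m_i - t_i|^2 (both sets centered)."""
    h = (mobile * w[:, None]).T @ target          # weighted 3x3 covariance
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))        # guard against reflections
    return vt.T @ np.diag([1.0, 1.0, d]) @ u.T


def fluctuations(x):
    """Per-atom mean-square fluctuation about the ensemble mean; x is (F, N, 3)."""
    return ((x - x.mean(axis=0)) ** 2).sum(axis=2).mean(axis=0)


def decompose(x, masses, n_iter=10):
    """Split per-atom fluctuations of an ensemble x (F, N, 3) into I, R, T."""
    w = masses / masses.sum()
    irt = fluctuations(x)                          # untransformed ensemble
    com = np.einsum("fna,n->fa", x, w)             # center of mass per frame
    xc = x - com[:, None, :]                       # translations removed
    ir = fluctuations(xc)
    ref = xc[0]
    for _ in range(n_iter):                        # iterative superposition
        aligned = np.array([f @ kabsch_rotation(f, ref, w).T for f in xc])
        ref = aligned.mean(axis=0)
    i_only = fluctuations(aligned)                 # rotations also removed
    return {"I": i_only, "R": ir - i_only, "T": irt - ir, "IRT": irt}


def lattice_and_molecular(trajs, masses):
    """trajs is (M, F, N, 3): M molecules, F frames, in a common frame."""
    lattice = decompose(trajs.reshape(-1, *trajs.shape[2:]), masses)
    per_mol = [decompose(t, masses) for t in trajs]
    mol = {k: np.mean([d[k] for d in per_mol], axis=0) for k in lattice}
    delta = {k: lattice[k] - mol[k] for k in lattice}
    return lattice, mol, delta
```

For two rigid molecules whose lattice positions are offset from one another but do not move in time, the molecular averages vanish while the lattice ensemble picks up a translational term, so the entire fluctuation appears in the translational Δ difference. This is the signature of broken lattice symmetry that the Δ quantities are designed to expose.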
Transformed atomic fluctuations of the test system
In order to decompose fluctuations into all of the individual components mentioned above, the first step is to calculate the fluctuations in the untransformed ensemble and in the ensembles with translations, and with both translations and rotations, removed ($\langle \Delta u_i^2 \rangle^{IRT}$, $\langle \Delta u_i^2 \rangle^{IR}$ and $\langle \Delta u_i^2 \rangle^{I}$), as well as their corresponding molecular counterparts ($\langle \Delta u_i^2 \rangle^{IRT}_{\mathrm{mol}}$, $\langle \Delta u_i^2 \rangle^{IR}_{\mathrm{mol}}$ and $\langle \Delta u_i^2 \rangle^{I}_{\mathrm{mol}}$). We simulated the same B-DNA system with 12 UCs for 1.2 μs, and monitored the RMSD time series for each asymmetric unit (molecule), as well as the RMSD time series for the instantaneous cell-average structure (i.e., the structure resulting from averaging the conformation of each of the molecules in the unit cell at the time point). Figure 2a shows the RMSD time series, and indicates that the structures are well converged after 200 ns, and fluctuate stably with average molecule RMSD values below 1.2 Å, and instantaneous cell-average RMSD values below 0.8 Å. In the remainder of the manuscript, sampling of fluctuations used for the B-factor decomposition and other statistical analyses was performed over the last 1 μs of simulation. The average atomic B-factors are calculated from the atomic fluctuations through the relation:

$$\langle B_i \rangle = \frac{8\pi^2}{3} \langle \Delta u_i^2 \rangle \tag{10}$$

where 〈Bi〉 is the B-factor for atom i, and $\langle \Delta u_i^2 \rangle$ is the corresponding atomic fluctuation. Figure 2b summarizes the B-factor results in two panels, with the top focusing on supercell level (〈⋯〉) and the bottom on molecular level (〈⋯〉mol) fluctuations. For the overall system fluctuations, the values obtained from the 1 μs trajectory are almost the same as those from the previous 250 ns simulation. On the other hand, we see a ~10 point increase in the $\langle B_i \rangle_{\mathrm{mol}}$ (or 〈B-factors〉mol) values. This can be explained by recognizing that, while individual DNA duplexes sampled more conformations during the longer simulation, the differences between the duplexes did not change significantly.
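As a small worked illustration, the standard isotropic relation between a mean-square positional fluctuation and a B-factor, together with a root mean square error against experimental values, can be written as below. The helper names and the numbers in the usage lines are hypothetical and are not results from this work.

```python
import math

B_PREFACTOR = 8.0 * math.pi**2 / 3.0  # converts <du^2> (A^2) to B (A^2)


def bfactor_from_msf(msf):
    """Isotropic B-factor from a mean-square fluctuation (angstrom squared)."""
    return B_PREFACTOR * msf


def msf_from_bfactor(b):
    """Inverse relation: mean-square fluctuation implied by a B-factor."""
    return b / B_PREFACTOR


def rmse(sim, expt):
    """Root mean square error between simulated and experimental values."""
    if len(sim) != len(expt):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((s - e) ** 2 for s, e in zip(sim, expt)) / len(sim))


# Hypothetical per-residue fluctuations (A^2) and experimental B-factors (A^2)
sim_b = [bfactor_from_msf(m) for m in (0.5, 0.8, 1.2)]
print([round(b, 1) for b in sim_b], round(rmse(sim_b, [15.0, 20.0, 30.0]), 2))
```

Because the prefactor is about 26.3, a mean-square fluctuation of 1 Å2 corresponds to a B-factor of roughly 26 Å2, which gives a sense of scale for the fluctuation components discussed here.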
Figure 2:
RMSD and B-factor analysis of 5DNB simulated in 12-unit SC. a) Molecular average (black) RMSD and standard deviation (gray) shown alongside RMSD of average trajectory (blue) over 1.2 μs of simulation. Average trajectory is created by obtaining a structural average of all replicas at each frame. b) B-factor decomposition for the last 1 μs of simulation. Top (overall lattice, 〈⋯〉) and bottom (molecular averages, 〈⋯〉mol) panels show experimental values (black, Å2) in comparison with intramolecular (I, red), center of mass corrected (IR, blue) and uncorrected (IRT, green) B-factors.
Comparing the calculated B-factor decomposition results, one observes that the intramolecular molecular-average values, $\langle \Delta u_i^2 \rangle^{I}_{\mathrm{mol}}$ (bottom, red), correlate and overlap best with the experimental values. A second observation is that the specific B-factor component does not affect the general trends, but rather produces an overall shift/amplification of the curves. These two results point to the strength of the force field in modeling intramolecular interactions, but also bring to the surface the deficiencies of the current simulations in packing. These are discussed in greater detail for the full set of benchmark systems studied in the next sections.
Benchmark DNA crystal systems
In addition to the 5DNB system discussed in the previous section, we consider five additional crystal systems comprising A-DNA (137D48), Z-DNA (1LJX49), and three other B-DNAs (1D23,50 119D51 and 1BNA52) as summarized in Table 1. These systems were chosen for study based on their resolution, size, structural diversity, and near room-temperature data collection. For each system, we constructed supercells consisting of 12 UCs as described in the previous section, and ran microsecond MD simulations for analysis (see Methods for details and Table S4 in Supporting Information for volume information).
In the remainder of the paper, we apply the protocols and analysis methods to characterize the structure, dynamics and crystal packing of the benchmark set of crystal systems described above using the AMBER OL1535 DNA force field. As stated previously, the goal is not to assess the force field itself, particularly with respect to intermolecular interactions and crystal packing that are sensitive to precise crystallization conditions. Rather the purpose is to develop and apply the framework for conducting and analyzing the simulations in order to set the stage for further study aimed at aiding in force field validation. We organize the analysis and discussion that follow in sections that examine separately intramolecular DNA duplex structure and fluctuations, and intermolecular crystal packing interactions. We take a hierarchical approach for data reduction that uses average structures for overall metrics like RMSD. We report only average helical values and standard deviations for helical parameters that, as discussed below, almost uniformly show small standard deviations. We report full angular distributions (also indicating averages and standard deviations) for torsion degrees of freedom and sugar pucker since these distributions can be multimodal.
DNA duplex structure and fluctuations
In this section, we examine structure, fluctuations, and interactions within each DNA duplex. We start first by using the RMSD as an overall measure of the structural similarity between the simulation average structures and the crystal structures. Second, we compare helical parameters and their fluctuations to examine variations between different DNA duplex forms. Third, we examine duplex base-pair hydrogen bonding and compare with crystallographic values. Fourth, we examine the distributions of DNA backbone (α-ζ) and glycosidic bond (χ) torsion angles, and sugar puckering phase (P) angles derived from the simulation, and compare residue averages with crystallographic values.
RMSD of average structures
We obtained average structures from combined 1-μs trajectories and, for each system, examined the root mean square deviation (RMSD) from the experimental structure. Figure 3 shows simulation averages (colored) aligned and overlaid with experimental structures (gray), along with RMSD (Å) values. All of the simulation average structures were very close to the crystal structures and had RMSD values below 1.0 Å (shown in parentheses), except for A-DNA (137D), which shows some deviation at the ends that likely arises from disruption of crystal packing (described in the Crystal Packing section below). Hence, overall, the average structure of each system was very close to the experimental crystal structure.
Figure 3:
Overlay of experimental (gray) and simulation average (colored) structures for the six DNA systems. Heavy-atom RMSD (Å) and RMSD per base pair (Å/bp) are shown in parentheses and brackets respectively.
We see that the best results (lowest RMSD) are obtained with Z-DNA (1LJX, 0.57 Å) and one of the dodecamers (119D, 0.64 Å), and the worst with A-DNA (137D, 1.07 Å). However, since RMSD usually increases with the size of the molecule (due to the global fitting procedure), we felt it valuable to also report the average RMSD per base pair (values in brackets) to help normalize values for duplexes of different length. With the base-pair normalized values we see that the dodecamers (119D and 1BNA) are closest to experiment, with values of 0.05 and 0.06 Å/base pair, and are very close to each other as well. Following the dodecamers are the two decamers (1D23 and 5DNB), where 1D23 is better than 5DNB. The worst two values are from the A-DNA 10-mer (137D, 0.11 Å/base pair) and Z-DNA 6-mer (1LJX, 0.10 Å/base pair).
Helical parameters
Next, we expand our measure of overall structural features to helical parameters. We calculated helical parameters using 3DNA software53 for each system at each frame of the trajectories. Table 2 compares the experimental and simulation average values over all base pairs for each system, along with standard deviations. The average helical parameter values calculated in this way from the trajectories are essentially identical to values calculated from the corresponding simulation average structure (see Table S6 in Supporting Information). For major and minor groove widths, we report the value calculated from the average structure alone.
Table 2:
Helical Parameters for all DNA Systems Obtained from Molecular Dynamics Simulation (Sim) Versus Experiment (Expt). Simulation helical parameters are averages over every frame of the trajectory, except for minor and major grooves which are calculated on the simulation average structures 〈Sim〉. Both simulation and experimental values are reported as single average values for each system with standard deviations showing the magnitude of variation among residues.
Parameter | Source | 1LJX (Z) | 137D (A) | 5DNB (B) | 1D23 (B) | 119D (B) | 1BNA (B) |
---|---|---|---|---|---|---|---|
rise (Å) | Expt | 3.6±0.2 | 3.3±0.2 | 3.3±0.2 | 3.4±0.2 | 3.3±0.3 | 3.4±0.2 |
rise (Å) | Sim | 3.6±0.3 | 3.3±0.1 | 3.4±0.1 | 3.3±0.1 | 3.3±0.1 | 3.3±0.1 |
tilt (deg) | Expt | −0.7±1.0 | −0.2±1.4 | 0.0±2.5 | −0.2±1.9 | 1.6±4.0 | −0.2±2.8 |
tilt (deg) | Sim | −0.2±2.2 | −0.2±1.6 | −0.1±1.6 | 0.0±1.3 | 0.5±2.7 | 0.0±1.9 |
roll (deg) | Expt | −2.4±2.6 | 6.3±4.9 | 2.2±5.8 | 0.0±4.4 | 2.2±3.3 | 0.0±5.4 |
roll (deg) | Sim | −2.8±2.9 | 10.4±4.1 | 2.0±3.0 | 1.1±3.8 | 2.3±3.3 | 0.2±3.4 |
twist (deg) | Expt | 26.7±19.7 | 30.7±4.3 | 35.2±8.7 | 37.1±5.0 | 35.5±5.1 | 35.6±4.9 |
twist (deg) | Sim | 27.5±14.6 | 29.3±1.8 | 34.4±6.7 | 36.2±5.6 | 35.3±6.3 | 35.5±3.6 |
x-disp (Å) | Expt | 12.5±12.8 | −4.5±1.5 | 0.5±1.8 | 0.2±1.0 | 0.3±1.1 | 0.1±0.8 |
x-disp (Å) | Sim | 9.6±9.1 | −3.2±0.9 | 0.1±1.2 | 0.0±0.7 | −0.1±1.1 | −0.2±0.6 |
y-disp (Å) | Expt | 1.2±1.8 | −0.1±1.4 | 0.0±1.0 | 0.0±0.6 | −0.2±1.0 | 0.0±0.9 |
y-disp (Å) | Sim | −0.3±1.6 | 0.0±0.5 | 0.0±0.4 | −0.1±0.7 | −0.2±0.8 | 0.0±0.7 |
minor groove (Å) | Expt | 11.7 | 17.4±0.6 | 11.6±1.0 | 11.2±1.5 | 12.1±1.9 | 10.9±1.5 |
minor groove (Å) | 〈Sim〉 | 11.6 | 17.1±0.7 | 11.2±0.8 | 11.0±1.5 | 11.4±1.8 | 10.6±1.0 |
major groove (Å) | Expt | 21.4 | 18.3±1.2 | 17.9±2.0 | 17.4±0.6 | 16.9±1.5 | 17.5±0.5 |
major groove (Å) | 〈Sim〉 | 21.2 | 19.1±1.1 | 18.5±0.4 | 17.9±0.4 | 17.2±1.4 | 18.1±0.5 |
Helical parameters calculated from simulation agree reasonably well with experimental values, especially for rise distance where both the average values and the standard deviations are in close agreement. Similarly, both major and minor grooves are well modeled by the simulation average structures.
For twist angle, the average values agree well (less than 1 deg) with experiment with only the A-DNA (137D) system being a minor exception (1.4 deg twist deviation). The residue standard deviations from the simulations are generally less than those of the crystal structures, suggesting the simulations are producing slightly more uniform helices than the crystal structures. We see this reduced variation in the simulation results for other parameters as well.
For tilt angle, while most systems give good agreement with experiment, the values for 1LJX (Z-DNA 6-mer) and 119D (B-DNA dodecamer) are both predicted to be slightly closer to zero (less tilt) in the simulation. In the case of Z-DNA the standard deviation is also higher than in the experiment, which is opposite to the general trend seen in simulations. For roll angles, 137D (A-DNA) and 1D23 (B-DNA decamer) are overestimated by more than 1 deg, and for the A-DNA system the difference is about 4 deg.
In x- and y-displacement the B-DNA systems agree well with the experimental values, but for the A-DNA and Z-DNA systems, there are clear discrepancies. The magnitude of the average x-displacement is underestimated for A-DNA, as is the standard deviation, but the major problems lie with the Z-DNA system. For Z-DNA, the x-displacement average is underestimated by almost 3 Å, whereas the y-displacement average is off by 1.5 Å and has the wrong sign.
For the 137D (A-DNA decamer) and particularly the 1LJX (Z-DNA 6-mer), there are minor to moderate deviations in helical parameters with respect to the crystallographic values. Nonetheless, overall, the experimental and simulation B-DNA structures have very similar helices.
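The discrepancies quoted in this subsection can be read directly off Table 2 as signed differences between simulation and experiment. The short sketch below tabulates a few of them; the dictionary layout and the helper name `deviation` are illustrative choices and not part of the analysis protocol.

```python
# (Expt, Sim) average values copied from Table 2 for the cases discussed in the text.
table2_subset = {
    ("roll (deg)", "137D"): (6.3, 10.4),
    ("roll (deg)", "1D23"): (0.0, 1.1),
    ("twist (deg)", "137D"): (30.7, 29.3),
    ("x-disp (Å)", "1LJX"): (12.5, 9.6),
    ("x-disp (Å)", "137D"): (-4.5, -3.2),
    ("y-disp (Å)", "1LJX"): (1.2, -0.3),
}


def deviation(expt, sim):
    """Signed deviation of the simulation average from the experimental average."""
    return sim - expt


deviations = {key: deviation(*pair) for key, pair in table2_subset.items()}

for (parameter, pdb), dev in deviations.items():
    print(f"{pdb:5s} {parameter:12s} Sim - Expt = {dev:+.1f}")
```

A negative x-displacement deviation for 1LJX and a positive one for 137D both correspond to a smaller magnitude in the simulation, since the experimental values have opposite signs.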
Hydrogen bond distances
Hydrogen bonding between base pairs (base pairing) of nucleic acids is fundamental to their structure and biological function.54 It is therefore essential for force fields to model these interactions accurately. Here, we calculated and compared hydrogen bond distances in the experimental and simulation average structures (Table 3).
Table 3:
Donor – Acceptor Hydrogen Bond Distances (Å) for Simulated DNA Duplexes. Reported values are averages of base-pair hydrogen bonding heavy atom distances. Experimental distances (Expt) are calculated from the PDB structure whereas simulation distances (Sim) are from the overall simulation average structures.
bp | 1LJX pair | 1LJX Expt | 1LJX Sim | 5DNB pair | 5DNB Expt | 5DNB Sim | 1D23 pair | 1D23 Expt | 1D23 Sim |
---|---|---|---|---|---|---|---|---|---|
1 | T–A | 2.91 | 2.92 | C–G | 2.87 | 2.85 | C–G | 2.82 | 2.88 |
2 | G–C | 2.88 | 2.87 | C–G | 2.91 | 2.88 | G–C | 2.92 | 2.89 |
3 | C–G | 2.93 | 2.87 | A–T | 2.98 | 2.91 | A–T | 2.96 | 2.92 |
4 | G–C | 2.91 | 2.87 | A–T | 2.91 | 2.88 | T–A | 2.86 | 2.89 |
5 | C–G | 2.92 | 2.91 | C–G | 2.84 | 2.88 | C–G | 2.82 | 2.88 |
6 | A–T | 2.92 | 2.94 | G–C | 2.91 | 2.89 | G–C | 2.87 | 2.89 |
7 | | | | T–A | 2.99 | 2.91 | A–T | 2.93 | 2.95 |
8 | | | | T–A | 2.89 | 2.89 | T–A | 2.89 | 2.91 |
9 | | | | G–C | 2.89 | 2.86 | C–G | 2.82 | 2.89 |
10 | | | | G–C | 2.87 | 2.86 | G–C | 2.78 | 2.88 |

bp | 137D pair | 137D Expt | 137D Sim | 119D pair | 119D Expt | 119D Sim | 1BNA pair | 1BNA Expt | 1BNA Sim |
---|---|---|---|---|---|---|---|---|---|
1 | G–C | 2.71 | 2.90 | C–G | 2.81 | 2.88 | C–G | 2.74 | 2.87 |
2 | C–G | 2.75 | 2.88 | G–C | 2.68 | 2.91 | G–C | 2.78 | 2.86 |
3 | G–C | 2.76 | 2.86 | T–A | 2.67 | 2.96 | C–G | 2.72 | 2.88 |
4 | G–C | 2.82 | 2.88 | A–T | 2.69 | 2.94 | G–C | 2.75 | 2.89 |
5 | G–C | 2.83 | 2.87 | G–C | 2.58 | 2.89 | A–T | 2.95 | 2.95 |
6 | C–G | 2.75 | 2.85 | A–T | 2.63 | 2.97 | A–T | 3.11 | 2.98 |
7 | C–G | 2.74 | 2.86 | T–A | 2.65 | 2.92 | T–A | 2.97 | 2.96 |
8 | C–G | 2.76 | 2.86 | C–G | 2.75 | 2.91 | T–A | 2.83 | 2.90 |
9 | G–C | 2.82 | 2.88 | T–A | 2.62 | 2.94 | C–G | 2.77 | 2.90 |
10 | C–G | 2.74 | 2.89 | A–T | 2.46 | 2.92 | G–C | 2.77 | 2.88 |
11 | | | | C–G | 2.71 | 2.88 | C–G | 2.85 | 2.89 |
12 | | | | G–C | 2.76 | 2.91 | G–C | 3.01 | 3.00 |

Averages | Expt | Sim |
---|---|---|
A–T | 2.84 ± 0.16 | 2.93 ± 0.03 |
C–G | 2.81 ± 0.08 | 2.88 ± 0.02 |
Overall, simulation values are in close agreement with experiment, especially for the Z-DNA (1LJX) and two decamer B-DNA (5DNB and 1D23) systems, which happen to have coaxial base stacking only. In the case of the two dodecamers the differences are larger. In 1BNA, although differences are still less than 0.25 Å, simulation gives slightly longer distances than experiment for most base pairs. This is amplified in 119D, where all simulation distances are longer than experiment, seven of them (6 A–T and 1 C–G) by about 0.25 Å or more. This is mostly because the experimental values for this system are significantly below the average hydrogen bond distances, whereas the simulation values are very close to the simulation values of the other systems (and therefore to the average simulation values). A-DNA (137D), which contains only C–G pairs, similarly produces longer distances than experiment: even though all the differences are less than 0.25 Å, hydrogen bonds are systematically ~0.1 Å longer.
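The systematic offsets noted for 137D and 119D can be checked by averaging the signed Sim − Expt differences over the base pairs of Table 3. The following is a minimal sketch using the tabulated distances; the helper name `mean_signed_difference` is hypothetical.

```python
# Donor-acceptor distances (Å) copied from Table 3, ordered by base pair.
hbond = {
    "137D": {
        "expt": [2.71, 2.75, 2.76, 2.82, 2.83, 2.75, 2.74, 2.76, 2.82, 2.74],
        "sim": [2.90, 2.88, 2.86, 2.88, 2.87, 2.85, 2.86, 2.86, 2.88, 2.89],
    },
    "119D": {
        "expt": [2.81, 2.68, 2.67, 2.69, 2.58, 2.63, 2.65, 2.75, 2.62, 2.46, 2.71, 2.76],
        "sim": [2.88, 2.91, 2.96, 2.94, 2.89, 2.97, 2.92, 2.91, 2.94, 2.92, 2.88, 2.91],
    },
}


def mean_signed_difference(expt, sim):
    """Average of (Sim - Expt) over base pairs; positive means longer in simulation."""
    diffs = [s - e for e, s in zip(expt, sim)]
    return sum(diffs) / len(diffs)


offsets = {
    pdb: mean_signed_difference(d["expt"], d["sim"]) for pdb, d in hbond.items()
}

for pdb, offset in offsets.items():
    print(f"{pdb}: mean Sim - Expt = {offset:+.3f} Å")
```

With these inputs the 137D offset is close to the ~0.1 Å quoted in the text, and the 119D offset is roughly 0.25 Å.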
We also calculated base-pair-specific average H-bond distances from all of the experimental and simulation average structures. The experimental structures exhibited a moderate standard deviation (0.08 Å) for C–G pairs and a fairly large one (0.16 Å) for A–T pairs. The simulation average structures exhibit much smaller standard deviations (0.02–0.03 Å). The experimental and simulation average distances for C–G and A–T pairs are quite close, with differences less than 0.1 Å, which is only marginally statistically significant given the standard deviations and number of data points.
Torsional angle distributions
We calculated all backbone and χ torsions of DNA along with sugar pucker (i.e. phase angle, P) for each system. We show the distribution density (Figure 4) since some torsions have bimodal distributions and reporting only averages can be less informative.
Figure 4:
Simulated torsional angle distributions for the six DNA systems. Simulation distributions calculated over all frames are shown in shades of gray according to population density. Experimental averages and the standard deviations among residues are shown with red tick marks and arcs, respectively; individual values for residues are shown with blue tick marks. In every dial, the origin (zero) of the angular axis is the horizontal right (“east”) reference direction and the angle increases moving counterclockwise.
Looking at all the distributions one torsion at a time (i.e. looking down a column), we can see clear patterns among different systems. For B-DNA there are slight variations in the narrowness (seen as light gray smudges) but overall the distributions for all torsional angles are very similar, more so than their corresponding experimental values. For example, for the α angle, while 5DNB, 1D23 and 1BNA all have very narrow distributions of experimental values (as seen by the red arc denoting the standard deviation among residues), 119D has a much wider arc and its blue ticks are more spread out. Yet, the simulation distribution for 119D is very similar to that of 5DNB, 1D23 and 1BNA. More specifically, 119D never populates α=−4°, even though it is the experimental value for one of its residues. We call this situation ‘no representation’, when the distribution from the simulation does not overlap so as to populate an experimental value. We see a few more ‘no representations’ in two of the B-DNAs, 119D and 1D23: in 119D, γ=−106° and P=95°–125° (3 residues); in 1D23, γ=−173°, δ=63° and P=35°. It should be noted that since these distributions combine torsions of all residues in one plot, residue-specific nuances are not distinguished.
Other than the few ‘no representations’ mentioned above, torsional angle distributions of B-DNA are generally well replicated in the simulations. The same is not true for A- and Z-DNA. For these systems, there are a few instances where the experimental values are not populated. For the Z-DNA these include γ=−60°, δ=48°–52° (3 residues), ϵ=−172° (2 residues) and ζ=44° (2 residues). For A-DNA, α=143° and γ=−177° are values with no representation. Unlike B-DNA, the experimental distributions for these torsional angles in A- and Z-DNA are very narrow, and the ‘no representation’ values correspond to isolated values.
The main issue with A- and Z-DNA, however, is not with the ‘no representation’ set, but a more problematic set of ‘misrepresentation’ values, where the simulation significantly populates a region that does not have an experimental counterpart. We see this in Z-DNA for 80° < δ < 100° along with a light shadow seen around ζ=−80°. In A-DNA it is seen at 140° < δ < 150°, −120° < χ < −100°, and 140° < P < 160°. One thing to note about the misrepresentations seen in each of the three angles in A-DNA and in ζ of Z-DNA is that the misrepresentation region corresponds to highly populated regions of B-DNA. From the current simulations one cannot tell whether these are due to temperature, an imbalance of solvent, ions and crystallization agents, insufficient sampling, and/or force field deficiencies. Further study of more diverse sets of A- and Z-form helices is required to draw general conclusions about the source of these deviations. Nonetheless, next-generation molecular simulation force fields for nucleic acids may need to pay close attention to parameters that distinguish A-DNA and Z-DNA from B-DNA helices.
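The ‘no representation’ and ‘misrepresentation’ categories above were identified from the dial plots, but the same idea can be sketched numerically with a circular histogram. The example below is an illustration only: the 10° bin width, the 0.1% population threshold, and the function names are assumptions, and a bin-exact comparison is stricter than visual inspection because it ignores neighboring bins.

```python
def bin_index(angle_deg, bin_width=10.0):
    """Map an angle in degrees onto a circular histogram bin over [-180, 180)."""
    n_bins = int(round(360.0 / bin_width))
    return int(((angle_deg + 180.0) % 360.0) // bin_width) % n_bins


def populated_bins(sim_angles_deg, bin_width=10.0, min_fraction=0.001):
    """Return the set of bins that hold at least `min_fraction` of all frames."""
    counts = {}
    for angle in sim_angles_deg:
        idx = bin_index(angle, bin_width)
        counts[idx] = counts.get(idx, 0) + 1
    n_frames = len(sim_angles_deg)
    return {idx for idx, c in counts.items() if c / n_frames >= min_fraction}


def classify(sim_angles_deg, expt_angles_deg, bin_width=10.0, min_fraction=0.001):
    """Split the comparison into unpopulated experimental values and
    populated simulation bins that contain no experimental value."""
    populated = populated_bins(sim_angles_deg, bin_width, min_fraction)
    expt_bins = {bin_index(a, bin_width) for a in expt_angles_deg}
    no_representation = [
        a for a in expt_angles_deg if bin_index(a, bin_width) not in populated
    ]
    misrepresented_bins = sorted(populated - expt_bins)
    return no_representation, misrepresented_bins


# Toy data (not from the simulations): samples clustered near -60 degrees,
# compared against one matching and one unpopulated experimental value.
sim = [-65.0, -60.0, -55.0, -62.0] * 25
expt = [-58.0, -4.0]
no_rep, mis_bins = classify(sim, expt)
print("no representation:", no_rep)
print("populated bins without an experimental value:", mis_bins)
```

The modulo arithmetic in `bin_index` makes −180° and 180° fall in the same bin, which matters for torsions such as γ or ϵ whose populations sit near the ±180° boundary.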
Intermolecular crystal packing interactions
The nucleic acid crystals studied here exhibit a diverse array of crystal packing (Figure 5). All studied systems except for the A-DNA (137D, ref 48) have some form of coaxial stacking of the helices, which is common in nucleic acid crystals. Z-DNA (1LJX, ref 49) and the two decamer B-DNAs (5DNB, ref 43, and 1D23, ref 50) form uniform columns and behave like “infinite” DNA helices, where the only coaxial interaction between molecules is base stacking. The dodecamer B-DNAs (119D, ref 51, and 1BNA, ref 52) have zig-zag coaxial stacking induced by the special intermolecular interactions denoted as Dickerson Interactions (DI),55 where sets of three base pairs at both ends hydrogen bond with the consecutive molecules. The A-DNA decamer (137D, ref 48) is an orthorhombic crystal with helices that, unlike the other crystals, are not coaxially stacked, but rather form a more complex packing arrangement with the reference duplex interacting with two molecules that are asymmetrically tilted away from the center of the duplex. It has been pointed out that the packing arrangement of the crystalline environment can strongly influence local DNA conformation and helix parameters.48
Figure 5:
Crystal packing of benchmark DNA systems. Each system is labeled by its PDB ID and space group. Molecules making up the unit cell are colored and the repeating units are shown in gray. Each system is represented at a view angle portraying packing and stacking most clearly; layers of non-unit-cell molecules are hidden to further enhance clarity.
These crystals thus present a wide range of crystal packing arrangements, in addition to sharing some common features, that enable us to make a critical assessment of the ability of our current simulations to model intermolecular interactions between DNA helices. Interactions similar to these are important for modeling tertiary or quaternary interactions in more complex nucleic acid systems and macromolecular assemblies.56–59 Here we analyze results and characterize the degree to which our current simulations are able to model this diverse array of packing interactions by comparing the preservation of native contacts, and examining the origins of discrepancies with crystallographic data through B-factor decomposition analysis.
Crystal contacts during simulation
In order to analyze the conservation of crystal contacts during simulations, we monitored the number of native intermolecular contacts for each residue. Specifically, we defined a contact as any pairwise atomic distance shorter than 3.5 Å between an atom of the residue and any atom of a neighboring duplex. The contact distance cut-off was chosen to represent roughly the vdW contact distance between two carbons (twice the carbon vdW radius of 1.7 Å). We report the calculated native crystal contacts for each residue where they exist alongside the simulation averages for all systems in Table 4.
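The contact definition above translates into a direct pair count. The sketch below assumes atoms are given as plain (x, y, z) tuples in Å and counts every atom pair under the cutoff; the function name `count_contacts` and the toy coordinates are hypothetical, and selection of the neighboring duplexes (including periodic images) is left out.

```python
import math

CUTOFF = 3.5  # Å, contact distance cut-off stated in the text


def count_contacts(residue_atoms, neighbor_atoms, cutoff=CUTOFF):
    """Count atom pairs closer than `cutoff` between one residue and the
    atoms of neighboring duplexes."""
    return sum(
        1
        for a in residue_atoms
        for b in neighbor_atoms
        if math.dist(a, b) < cutoff
    )


# Toy coordinates (not taken from any of the crystal structures).
residue = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
neighbors = [(0.0, 0.0, 3.0), (10.0, 0.0, 0.0)]
n_contacts = count_contacts(residue, neighbors)
print(n_contacts)
```

For native-contact monitoring, the same count would be restricted to the atom pairs that are in contact in the crystal structure and then averaged over frames and symmetry-related duplexes.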
Table 4:
Conservation of Native Contacts in Simulation. Number of contacts each residue makes in the native crystal and fraction of contacts observed over the simulations are presented. Cutoff distance for defining a contact is taken as 3.5 Å. For each system the number of native contacts per residue is shown in parentheses next to the PDB ID. Simulation contacts are reported as fraction averages over symmetry related duplexes along with standard deviations. The Chain column separates the two strands within a helix.
System | Chain | Res | Sim | Native |
---|---|---|---|---|
1LJX (5.2) | A | 1 | 2.1 ± 1.0 | 6 |
1LJX (5.2) | A | 3 | 0.2 ± 0.2 | 3 |
1LJX (5.2) | A | 4 | 2.8 ± 0.7 | 6 |
1LJX (5.2) | A | 6 | 5.3 ± 1.3 | 18 |
1LJX (5.2) | B | 7 | 2.9 ± 0.9 | 15 |
1LJX (5.2) | B | 8 | 0.9 ± 0.4 | 2 |
1LJX (5.2) | B | 10 | 1.0 ± 0.4 | 3 |
1LJX (5.2) | B | 12 | 3.3 ± 1.1 | 9 |
5DNB (3.0) | A | 1 | 1.8 ± 0.6 | 8 |
5DNB (3.0) | A | 4 | 0.1 ± 0.1 | 2 |
5DNB (3.0) | A | 10 | 2.8 ± 0.7 | 12 |
5DNB (3.0) | B | 11 | 2.6 ± 1.0 | 14 |
5DNB (3.0) | B | 13 | 0.2 ± 0.2 | 2 |
5DNB (3.0) | B | 14 | 0.3 ± 0.2 | 2 |
5DNB (3.0) | B | 20 | 3.8 ± 1.0 | 19 |
1D23 (3.4) | A | 1 | 2.3 ± 1.1 | 12 |
1D23 (3.4) | A | 3 | 0.2 ± 0.2 | 2 |
1D23 (3.4) | A | 4 | 0.0 ± 0.1 | 1 |
1D23 (3.4) | A | 6 | 1.0 ± 0.5 | 4 |
1D23 (3.4) | A | 8 | 0.2 ± 0.2 | 1 |
1D23 (3.4) | A | 10 | 5.1 ± 1.3 | 14 |
1D23 (3.4) | B | 11 | 2.9 ± 1.0 | 9 |
1D23 (3.4) | B | 12 | 0.2 ± 0.2 | 3 |
1D23 (3.4) | B | 13 | 0.1 ± 0.1 | 1 |
1D23 (3.4) | B | 16 | 1.7 ± 0.3 | 6 |
1D23 (3.4) | B | 17 | 0.2 ± 0.2 | 1 |
1D23 (3.4) | B | 20 | 5.5 ± 1.2 | 14 |
137D (3.2) | A | 1 | 1.1 ± 0.9 | 7 |
137D (3.2) | A | 2 | 0.3 ± 0.4 | 1 |
137D (3.2) | A | 3 | 0.6 ± 0.5 | 1 |
137D (3.2) | A | 4 | 0.6 ± 0.8 | 5 |
137D (3.2) | A | 5 | 0.0 ± 0.1 | 3 |
137D (3.2) | A | 7 | 0.4 ± 0.5 | 3 |
137D (3.2) | A | 8 | 0.5 ± 0.3 | 6 |
137D (3.2) | A | 10 | 1.5 ± 0.6 | 5 |
137D (3.2) | B | 11 | 1.4 ± 1.3 | 8 |
137D (3.2) | B | 12 | 0.2 ± 0.3 | 3 |
137D (3.2) | B | 13 | 0.5 ± 0.3 | 4 |
137D (3.2) | B | 14 | 0.2 ± 0.2 | 2 |
137D (3.2) | B | 15 | 0.4 ± 0.6 | 3 |
137D (3.2) | B | 16 | 0.3 ± 0.3 | 2 |
137D (3.2) | B | 18 | 0.5 ± 0.4 | 2 |
137D (3.2) | B | 19 | 1.3 ± 0.6 | 2 |
137D (3.2) | B | 20 | 1.1 ± 0.5 | 6 |
119D (4.6) | A | 1 | 4.6 ± 1.4 | 13 |
119D (4.6) | A | 2 | 4.4 ± 0.7 | 9 |
119D (4.6) | A | 3 | 4.6 ± 1.1 | 12 |
119D (4.6) | A | 4 | 0.4 ± 0.2 | 1 |
119D (4.6) | A | 6 | 0.3 ± 0.3 | 1 |
119D (4.6) | A | 10 | 1.5 ± 0.3 | 4 |
119D (4.6) | A | 11 | 0.6 ± 0.3 | 1 |
119D (4.6) | A | 12 | 8.3 ± 1.1 | 15 |
119D (4.6) | B | 13 | 4.6 ± 1.9 | 14 |
119D (4.6) | B | 14 | 4.7 ± 0.7 | 7 |
119D (4.6) | B | 15 | 4.6 ± 1.0 | 7 |
119D (4.6) | B | 19 | 0.3 ± 0.2 | 1 |
119D (4.6) | B | 20 | 0.3 ± 0.3 | 4 |
119D (4.6) | B | 22 | 0.8 ± 0.6 | 1 |
119D (4.6) | B | 23 | 1.8 ± 0.8 | 6 |
119D (4.6) | B | 24 | 8.8 ± 1.5 | 14 |
1BNA (2.2) | A | 1 | 0.7 ± 0.4 | 3 |
1BNA (2.2) | A | 2 | 2.2 ± 0.5 | 3 |
1BNA (2.2) | A | 3 | 2.0 ± 0.7 | 8 |
1BNA (2.2) | A | 8 | 0.5 ± 0.4 | 2 |
1BNA (2.2) | A | 12 | 5.2 ± 1.1 | 8 |
1BNA (2.2) | B | 14 | 2.3 ± 0.3 | 4 |
1BNA (2.2) | B | 15 | 2.1 ± 0.5 | 5 |
1BNA (2.2) | B | 16 | 2.2 ± 0.8 | 8 |
1BNA (2.2) | B | 17 | 0.4 ± 0.2 | 1 |
1BNA (2.2) | B | 22 | 0.9 ± 0.2 | 1 |
1BNA (2.2) | B | 24 | 6.9 ± 1.6 | 10 |
Looking at native contacts alone gives a sense of both the compactness and the packing motifs of the crystals. The number of total native contacts and the corresponding per residue values (shown in parentheses next to PDB IDs) are good indicators of how densely the crystals pack, and correlate well with the volume per base pair (bp) values (Table 1), except for a switch in the order between 5DNB and 137D. As for the crystal packing motifs, the axial base-stacking systems Z-DNA (1LJX) and two decamer B-DNAs (5DNB and 1D23) have the highest number of contacts at the ends of the duplex helices with far fewer contacts in between. The dodecamer B-DNAs (119D and 1BNA) also have a high number of contacts at the ends of the helices, in addition to residues flanking the ends. On the other hand, A-DNA (137D) has a much broader contact distribution, which reflects its packing arrangement that does not involve coaxial stacking.
Simulation averages portray a drastic loss of native crystal contacts through the simulations due to deviations from the native crystal packing. There is no clear pattern of contact loss among different regions of the molecules, but this can be explained by considering that a slight shift of the molecular orientation that disrupts a contact will propagate to residues involved in other contacts (since only native contacts are monitored). Overall, the best conserved system (1BNA) has only 48% of its contacts preserved in the simulation, and the worst conserved system is A-DNA (137D), with a mere 17% of native contacts preserved. With the exception of 5DNB (20%), we see similar percentage conservation for systems that have similar packing motifs. The two dodecamers with DI (119D and 1BNA) are in the 45% range, and the remaining two of the coaxial stacking systems (1LJX and 1D23) are at 30%. These values suggest that, despite having an overall good description of intramolecular structure and fluctuations with the AMBER OL15 DNA force field,35 our simulations fail to maintain the overall intermolecular contacts and crystal packing environment.
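The conservation percentages quoted above can be recovered from Table 4. The sketch below assumes the percentage is the sum of the simulation-average contacts divided by the sum of native contacts over the listed residues; this definition is an assumption, although it does reproduce the 48% and 17% values for 1BNA and 137D. The function name `conserved_fraction` is hypothetical.

```python
# Per-residue (Sim average, Native) contact counts copied from Table 4.
contacts = {
    "1BNA": {
        "sim": [0.7, 2.2, 2.0, 0.5, 5.2, 2.3, 2.1, 2.2, 0.4, 0.9, 6.9],
        "native": [3, 3, 8, 2, 8, 4, 5, 8, 1, 1, 10],
    },
    "137D": {
        "sim": [1.1, 0.3, 0.6, 0.6, 0.0, 0.4, 0.5, 1.5, 1.4,
                0.2, 0.5, 0.2, 0.4, 0.3, 0.5, 1.3, 1.1],
        "native": [7, 1, 1, 5, 3, 3, 6, 5, 8, 3, 4, 2, 3, 2, 2, 2, 6],
    },
}


def conserved_fraction(sim, native):
    """Fraction of native contacts retained, summed over residues (assumed definition)."""
    return sum(sim) / sum(native)


conserved = {
    pdb: conserved_fraction(d["sim"], d["native"]) for pdb, d in contacts.items()
}

for pdb, frac in conserved.items():
    print(f"{pdb}: {100 * frac:.0f}% of native contacts preserved")
```

Dividing the native totals by the number of residues in the duplex (24 for 1BNA, 20 for 137D) likewise gives the per-residue values shown in parentheses in Table 4.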
B-factor decomposition analysis
We performed B-factor decomposition analysis following the framework introduced in Scheme 1 for all systems (Table 5). At the single molecule level, intramolecular, rotational and translational fluctuation components give a sense for average fluctuations of individual duplexes around their lattice positions, whereas at the (super)cell level, these components incorporate the additional fluctuations introduced by considering the structural ensemble arising from all independent duplexes.
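As a reminder of the quantities involved, an isotropic B-factor is related to the mean-square positional fluctuation by B = (8π²/3)〈Δr²〉, and the decomposition treats the intramolecular, rotational and translational components as additive. The sketch below illustrates both points; the function names are hypothetical, and the numerical check uses the molecule-level 1LJX values from Table 5 rather than any re-analysis of trajectories.

```python
import math


def b_factor_from_msf(msf):
    """Isotropic B-factor (Å^2) from a 3D mean-square fluctuation (Å^2)."""
    return (8.0 * math.pi ** 2 / 3.0) * msf


def total_b(b_intra, b_rot, b_trans):
    """Total B-factor as the sum of the additive components."""
    return b_intra + b_rot + b_trans


# Molecule-level components for 1LJX as reported in Table 5 (Å^2).
b_total_1ljx = total_b(9.9, 4.6, 6.3)
print(f"1LJX molecule-level total: {b_total_1ljx:.1f} Å^2")

# A mean-square fluctuation of 1 Å^2 corresponds to roughly 26.3 Å^2 in B.
print(f"B for MSF = 1 Å^2: {b_factor_from_msf(1.0):.1f} Å^2")
```

How the mean-square fluctuation is split into the three components follows Scheme 1 and is not reproduced here.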
Scheme 1:
Atomic fluctuation decomposition.
Table 5:
Decomposition of B-factors (Å²) for all DNA Systems Following the Setup Shown in Scheme 1. Values in parentheses are percentages of the total supercell (〈⋯〉) fluctuations; Expt is the experimental B-factor.
System | Level | Intra | Rot | Trans | Total | Expt |
---|---|---|---|---|---|---|
1LJX | 〈⋯〉mol | 9.9 (32%) | 4.6 (15%) | 6.3 (20%) | 20.8 (67%) | |
1LJX | Δ | 3.5 (11%) | 3.5 (11%) | 3.0 (10%) | 10.0 (33%) | |
1LJX | 〈⋯〉 | 13.3 (43%) | 8.1 (26%) | 9.3 (30%) | 30.8 (100%) | 15.9 |
5DNB | 〈⋯〉mol | 16.6 (29%) | 8.6 (15%) | 11.0 (19%) | 36.2 (63%) | |
5DNB | Δ | 4.4 (8%) | 9.0 (16%) | 8.2 (14%) | 21.5 (37%) | |
5DNB | 〈⋯〉 | 21.0 (36%) | 17.5 (30%) | 19.2 (33%) | 57.7 (100%) | 13.9 |
137D | 〈⋯〉mol | 32.4 (25%) | 15.9 (12%) | 27.3 (21%) | 75.6 (58%) | |
137D | Δ | 18.4 (14%) | 12.8 (10%) | 23.2 (18%) | 54.4 (42%) | |
137D | 〈⋯〉 | 50.8 (39%) | 28.7 (22%) | 50.5 (39%) | 129.9 (100%) | 23.1 |
1D23 | 〈⋯〉mol | 16.1 (30%) | 6.6 (12%) | 13.3 (24%) | 36.1 (66%) | |
1D23 | Δ | 3.2 (6%) | 3.2 (6%) | 12.1 (22%) | 18.5 (34%) | |
1D23 | 〈⋯〉 | 19.3 (35%) | 9.8 (18%) | 25.4 (47%) | 54.5 (100%) | 12.9 |
119D | 〈⋯〉mol | 11.1 (25%) | 2.4 (5%) | 4.0 (9%) | 17.6 (39%) | |
119D | Δ | 8.4 (19%) | 7.2 (16%) | 11.5 (26%) | 27.1 (61%) | |
119D | 〈⋯〉 | 19.5 (44%) | 9.6 (22%) | 15.5 (35%) | 44.6 (100%) | 23.1 |
1BNA | 〈⋯〉mol | 21.9 (25%) | 13.0 (15%) | 29.4 (33%) | 64.4 (72%) | |
1BNA | Δ | 6.5 (7%) | 6.6 (7%) | 11.6 (13%) | 24.7 (28%) | |
1BNA | 〈⋯〉 | 28.4 (32%) | 19.6 (22%) | 41.0 (46%) | 89.1 (100%) | 37.3 |
In an ideal crystal simulation where average symmetry and crystal packing were preserved, and each duplex sampled the same conformational space, the Δ values (Δ rows of Table 5) would be zero. The Δ values track the fluctuation differences between the average molecule (duplex) time ensembles and the overall lattice ensemble. However, we see that this is far from the case for the current simulations. For most systems, the total Δ component makes up 30–40% of total simulation fluctuations, reaching as high as 61% for 119D, and alone is higher than the experimental B-factor in a few cases.
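The share of the lattice-level (Δ) contribution and its comparison with the experimental B-factor can be tabulated from the Total column of Table 5. The sketch below does this for all six systems; the variable and function names are illustrative.

```python
# (molecule-level total, Δ total, supercell total, Expt) in Å^2, copied from Table 5.
table5_totals = {
    "1LJX": (20.8, 10.0, 30.8, 15.9),
    "5DNB": (36.2, 21.5, 57.7, 13.9),
    "137D": (75.6, 54.4, 129.9, 23.1),
    "1D23": (36.1, 18.5, 54.5, 12.9),
    "119D": (17.6, 27.1, 44.6, 23.1),
    "1BNA": (64.4, 24.7, 89.1, 37.3),
}


def delta_share(delta_total, supercell_total):
    """Percentage of the supercell fluctuations contributed by the Δ term."""
    return 100.0 * delta_total / supercell_total


shares = {
    pdb: delta_share(delta, total)
    for pdb, (_, delta, total, _) in table5_totals.items()
}

# Systems where the Δ term alone already exceeds the experimental B-factor.
delta_exceeds_expt = sorted(
    pdb for pdb, (_, delta, _, expt) in table5_totals.items() if delta > expt
)

for pdb, share in shares.items():
    print(f"{pdb}: Δ share = {share:.0f}%")
print("Δ alone exceeds Expt for:", delta_exceeds_expt)
```

Because the tabulated values are rounded to one decimal place, the molecule-level and Δ totals add up to the supercell total only to within about 0.1 Å².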
Looking at percent contributions to total fluctuations (Table 5), we see some trends among different systems. For all systems the molecule-level intramolecular values (〈Intra〉mol) are within 25–32%, and except for the dodecamers (119D and 1BNA), have the highest contribution to the total fluctuations. In general, 〈Rot〉mol < 〈Trans〉mol, with 10–15% and 19–33%, respectively, where 119D is again an exception with its 5% and 9% contributions. In terms of Δ values, we see that ΔIntra is relatively small with 6–19% contribution, and ΔTrans is the largest of the set in systems except for Z-DNA (1LJX) and 5DNB. Finally, the supercell results yield 〈Intra〉 in a range of 32–44%, with 〈Rot〉 at 18–30% and 〈Trans〉 at 30–46%. Overall, the B-values calculated from the intramolecular fluctuations have comparable magnitude to the experimentally derived values, but these fluctuations represent less than 1/3 of the full simulated fluctuations. The B-value components arising from translational fluctuations around a lattice site (〈Trans〉mol), and between lattice sites (ΔTrans), have the next largest contributions, which reflects defects of packing.
Comparing simulation B-factors with experimental values, we see that even the intramolecular fluctuations alone, either at molecular or supercell levels, are enough for half of the systems to overshoot the experimental B-factors. Only the Z-DNA (1LJX), and the two dodecamer B-DNAs (119D and 1BNA) have intramolecular B-factor fluctuations (〈Intra〉mol and 〈Intra〉 in Table 5) lower than the experimental values. From the molecular level fluctuations (Total 〈⋯〉mol in Table 5), only 119D has B-factor values less than experimental B-factors. The disruption of crystal packing in the simulations thus leads to greatly exaggerated fluctuations when considering the ensemble of conformations arising from the supercell, including rotational and translational components.
One interesting result is that similar systems did not necessarily yield similar trends in B-values, the most striking example being the dodecamers where 119D gives B-factors much closer to experiment relative to 1BNA. These differences can be traced, in part, to differences in the overall tightness of the crystal packing environment. A quantitative measure of packing tightness is the volume per base pair. We saw that 119D, the better behaving system, had 10% lower base-pair volume compared to the other dodecamer 1BNA, and overall is the second smallest after the Z-DNA (1LJX). The two decamers had similar mid-range volume and A-DNA had the highest (see Table 1). This order agrees with the trend we see in the overall system B-factors, suggesting that the less tightly packed, the more solvent there is in the unit cell, and thus the more freedom there is to lose crystal contacts and break symmetry causing large fluctuations.
Conclusion
Crystal simulations are valuable tools to aid in assessing the reliability of nucleic acid force fields used to gain predictive insight into complex biological problems. Crystal simulations have the advantages that they provide dense statistical sampling (many copies of the asymmetric unit) at relatively low cost, and afford a mechanism to compare directly to observables from X-ray crystallographic data. Further, crystal simulations not only predict intramolecular structure and dynamics, but also serve as a sensitive probe of intermolecular interactions that may be important for modeling nucleic acid tertiary interactions. It should be emphasized that crystal simulations alone are not sufficient for force field assessment, and caution must be taken in interpreting data from these simulations which are sensitive to environmental conditions, including temperature and crystallization agents that might not be known in the crystal, in addition to sampling requirements. In the present work, we establish a framework from which consistent, benchmark crystal simulations of nucleic acids can be performed and analyzed. We examine a benchmark set of four B-DNA, one A-DNA and one Z-DNA crystals using the AMBER OL15 DNA force field with TIP4P/Ew water model and balanced ion parameters. As supercell size was increased, observed fluctuations from the full lattice ensemble initially increased but reasonably converged with 12 replicas of the fundamental unit cell. Intramolecular structure and fluctuations, as depicted by monitoring helical parameters, base pairing and torsion angle/sugar puckering profiles, were overall quite good, with best agreement with crystallographic data obtained for the B-DNA systems, and considerably worse agreement obtained for the A- and Z-DNA systems. 
A novel B-factor decomposition scheme was introduced that enables contributions to atomic fluctuations to be separated into additive intramolecular, rotational, and translational components, as well as partitioning of each of these components into individual duplex “molecule” (asymmetric unit) and lattice (supercell) contributions. This framework was applied to study fluctuations arising from the structural ensemble in the supercell, and helped to pinpoint artifacts that arise as a consequence of improper crystal packing. Overall the intramolecular deviations from the crystal were quite small (typically less than 1.0 Å), suggesting relatively high accuracy of the force field, whereas crystal packing was not well reproduced in these simulations. These simulation results, however, do not represent a critical assessment of the force field, as the challenges of addressing environmental conditions and factors that impact crystal packing were not explored, and hence one cannot draw conclusions about the origin of observed artifacts. Rather, the framework developed in the current work provides a systematic mechanism to conduct and analyze crystal simulations that do take on these issues. Analysis of such simulations would provide insight into the force field balance between solute, solvent, salt, and crystallization agents, which is expected to impact interactions between different molecules much more significantly than interactions within individual molecules.
Supplementary Material
Acknowledgement
The authors are grateful for financial support provided by the National Institutes of Health (No. GM107485 and GM62248). Computational resources were provided by the Office of Advanced Research Computing (OARC) at Rutgers, The State University of New Jersey, the National Institutes of Health under Grant No. S10OD012346, and by the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation (Grants No. ACI-1548562 and No. OCI-1053575). The authors thank Trich E. Chidae (Crystal River, FL) for engaging interactions and motivation for this work.
Footnotes
Supporting Information Description
The Supporting Information is available free of charge on the ACS Publications website.
The SI includes tabulations of: dependence of RMSD on supercell size; volume fluctuations of each supercell simulated; root mean square error and linear correlation coefficients for B-factors; volume fluctuations for all systems; and average structure RMSD for all systems.
References
- (1) Saenger W Principles of Nucleic Acid Structure; Springer; New York, 1984.
- (2) Egli M Nucleic acid crystallography: current progress. Curr. Opin. Chem. Biol 2004, 8, 580–591.
- (3) Mollova ET; Pardi A NMR solution structure determination of RNAs. Curr. Opin. Struct. Biol 2000, 10, 298–302.
- (4) Trabuco LG; Villa E; Mitra K; Frank J; Schulten K Flexible fitting of atomic structures into electron microscopy maps using molecular dynamics. Structure 2008, 16, 673–683.
- (5) Chen AA; García AE High-resolution reversible folding of hyperstable RNA tetraloops using molecular dynamics simulations. Proc. Natl. Acad. Sci. USA 2013, 110, 16820–16825.
- (6).Lemkul JA; MacKerell AD Jr. Polarizable Force Field for DNA Based on the Classical Drude Oscillator: II. Microsecond Molecular Dynamics Simulations of Duplex DNA. J. Chem. Theory Comput. 2017, 13, 2072–2085. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (7).Zgarbová M; Jurečka P; Lankaš F; Cheatham TE; Šponer J; Otyepka M Influence of BII Backbone Substates on DNA Twist: A Unified View and Comparison of Simulation and Experiment for All 136 Distinct Tetranucleotide Sequences. J. Chem. Inf. Model 2017, 57, 275–287. [DOI] [PubMed] [Google Scholar]
- (8).Zhang C; Lu C; Jing Z; Wu C; Piquemal JP; Ponder JW; Ren P Amoeba Polarizable Atomic Multipole Force Field for Nucleic Acids. J. Chem. Theory Comput. 2018, 14, 2084–2108. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (9).Tan D; Piana S; Dirks RM; Shaw DE RNA Force Field with Accuracy Comparable to State-of-the-Art Protein Force Fields. Proc. Natl. Acad. Sci. USA 2018, 115, 1346–1355. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (10).Sponer J; Bussi G; Krepl M; Banas P; Bottaro S; Cunha RA; Gil-Ley A; Pinamonti G; Poblete S; Jurecka P et al. RNA Structural Dynamics As Captured by Molecular Simulations: A Comprehensive Overview. Chem. Rev 2018, 118, 4177–4338. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (11).Berman HM; Westbrook J; Feng Z; Gilliland G; Bhat TN; Weissig H; Shindyalov IN; Bourne PE The Protein Data Bank. Nucleic Acids Res. 2000, 28, 235–242. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (12).Panteva MT; Dissanayake T; Chen H; Radak BK; Kuechler ER; Giambaşu GM; Lee T-S; York DM In Methods in Enzymology; Chen S-J, Burke-Aguero DH, Eds. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (13). York DM; Darden TA; Pedersen LG; Anderson MW Molecular Dynamics Simulation of HIV-1 Protease in a Crystalline Environment and in Solution. Biochemistry 1993, 32, 1443–1453.
- (14). York DM; Wlodawer A; Pedersen LG; Darden T Atomic level accuracy in simulations of protein crystals. Proc. Natl. Acad. Sci. USA 1994, 91, 8715–8718.
- (15). York DM; Darden T; Pedersen LG The effect of hydrostatic pressure on protein crystals investigated by molecular simulation. 1995.
- (16). Cerutti DS; Freddolino PL; Duke RE Jr.; Case DA Simulations of a protein crystal with a high resolution X-ray structure: evaluation of force fields and water models. J. Phys. Chem. B 2010, 114, 12811–12824.
- (17). Janowski PA; Cerutti DS; Holton J; Case DA Peptide Crystal Simulations Reveal Hidden Dynamics. J. Am. Chem. Soc. 2013, 135, 7938–7948.
- (18). Liu C; Janowski PA; Case DA All-atom crystal simulations of DNA and RNA duplexes. Biochim. Biophys. Acta 2015, 1850, 1059–1071.
- (19). Janowski PA; Liu C; Deckman J; Case DA Molecular dynamics simulation of triclinic lysozyme in a crystal lattice. Protein Sci. 2016, 25, 87–102.
- (20). York DM; Yang W; Lee H; Darden T; Pedersen LG Toward the accurate modeling of DNA: the importance of long-range electrostatics. J. Am. Chem. Soc. 1995, 117, 5001–5002.
- (21). Martick M; Lee T-S; York DM; Scott WG Solvent structure and hammerhead ribozyme catalysis. Chem. Biol. 2008, 15, 332–342.
- (22). Heldenbrand H; Janowski PA; Giambaşu G; Giese TJ; Wedekind JE; York DM Evidence for the role of active site residues in the hairpin ribozyme from molecular simulations along the reaction path. J. Am. Chem. Soc. 2014, 136, 7789–7792.
- (23). Gaines CS; York DM Ribozyme Catalysis with a Twist: Active State of the Twister Ribozyme in Solution Predicted from Molecular Simulation. J. Am. Chem. Soc. 2016, 138, 3058–3065.
- (24). Gaines CS; York DM Model for the Functional Active State of the TS Ribozyme from Molecular Simulation. Angew. Chem. Int. Ed. 2017, 129, 13577–13580.
- (25). Šponer J; Banáš P; Jurečka P; Zgarbová M; Kührová P; Havrila M; Krepl M; Stadlbauer P; Otyepka M Molecular dynamics simulations of nucleic acids. From tetranucleotides to the ribosome. J. Phys. Chem. Lett. 2014, 5, 1771–1782.
- (26). Pasi M; Maddocks JH; Beveridge D; Bishop TC; Case DA; Cheatham TE III; Dans PD; Jayaram B; Lankas F; Laughton C et al. μABC: a systematic microsecond molecular dynamics study of tetranucleotide sequence effects in B-DNA. Nucleic Acids Res. 2014, 42, 12272–12283.
- (27). Bergonzo C; Henriksen NM; Roe DR; Cheatham TE III. Highly sampled tetranucleotide and tetraloop motifs enable evaluation of common RNA force fields. RNA 2015, 21, 1578–1590.
- (28). Galindo-Murillo R; Robertson JC; Zgarbova M; Sponer J; Otyepka M; Jurecka P; Cheatham TE III Assessing the current state of AMBER force field modifications for DNA. J. Chem. Theory Comput. 2016, 12, 4114–4127.
- (29). Dans PD; Ivani I; Hospital A; Portella G; Gonzalez C; Orozco M How accurate are accurate force-fields for B-DNA? Nucleic Acids Res. 2017, 45, 4217–4230.
- (30). Hart K; Foloppe N; Baker CM; Denning EJ; Nilsson L; MacKerell AD Jr. Optimization of the CHARMM Additive Force Field for DNA: Improved Treatment of the BI/BII Conformational Equilibrium. J. Chem. Theory Comput. 2012, 8, 348–362.
- (31). Lemkul JA; Huang J; Roux B; MacKerell AD An Empirical Polarizable Force Field Based on the Classical Drude Oscillator Model: Development History and Recent Applications. Chem. Rev. 2016, 116, 4983–5013.
- (32). Kuzmanic A; Dans PD; Orozco M An In-Depth Look at DNA Crystals through the Prism of Molecular Dynamics Simulations. Chem 2019, 5, 649–663.
- (33). Galindo-Murillo R; Roe DR; Cheatham TE III Convergence and reproducibility in molecular dynamics simulations of the DNA duplex d(GCACGAACGAACGAACGC). Biochim. Biophys. Acta 2015, 1850, 1041–1058.
- (34). Knapp B; Ospina L; Deane CM Avoiding False Positive Conclusions in Molecular Simulation: The Importance of Replicas. J. Chem. Theory Comput. 2018, 14, 6127–6138.
- (35). Zgarbova M; Sponer J; Otyepka M; Cheatham TE III; Galindo-Murillo R; Jurecka P Refinement of the sugar-phosphate backbone torsion beta for AMBER Force Fields improves the description of Z- and B-DNA. J. Chem. Theory Comput. 2015, 11, 5723–5736.
- (36). Case DA; Betz RM; Cerutti DS; Cheatham TE III; Darden TA; Duke RE; Giese TJ; Gohlke H; Goetz AW; Homeyer N et al. AMBER 16. University of California, San Francisco: San Francisco, CA, 2016.
- (37). Horn HW; Swope WC; Pitera JW; Madura JD; Dick TJ; Hura GL; Head-Gordon T Development of an improved four-site water model for biomolecular simulations: TIP4P-Ew. J. Chem. Phys. 2004, 120, 9665–9678.
- (38). Joung IS; Cheatham TE III Determination of alkali and halide monovalent ion parameters for use in explicitly solvated biomolecular simulations. J. Phys. Chem. B 2008, 112, 9020–9041.
- (39). Li P; Roberts BP; Chakravorty DK; Merz KM Jr. Rational design of Particle Mesh Ewald compatible Lennard-Jones parameters for +2 metal cations in explicit solvent. J. Chem. Theory Comput. 2013, 9, 2733–2748.
- (40). Essmann U; Perera L; Berkowitz ML; Darden T; Hsing L; Pedersen LG A smooth particle mesh Ewald method. J. Chem. Phys. 1995, 103, 8577–8593.
- (41). Ryckaert JP; Ciccotti G; Berendsen HJC Numerical Integration of the Cartesian Equations of Motion of a System with Constraints: Molecular Dynamics of n-Alkanes. J. Comput. Phys. 1977, 23, 327–341.
- (42). Huang M; Giese TJ; Lee T-S; York DM Improvement of DNA and RNA Sugar Pucker Profiles from Semiempirical Quantum Methods. J. Chem. Theory Comput. 2014, 10, 1538–1545.
- (43). Prive GG; Yanagi K; Dickerson RE Structure of the B-DNA decamer C-C-A-A-C-G-T-T-G-G and comparison with isomorphous decamers C-C-A-A-G-A-T-T-G-G and C-C-A-G-G-C-C-T-G-G. J. Mol. Biol. 1991, 217, 177–199.
- (44). Bevan DR; Li L; Pedersen LG; Darden TA Molecular Dynamics Simulations of the d(CCAACGTTGG)2 Decamer: Influence of the Crystal Environment. Biophys. J. 2000, 78, 668–682.
- (45). Baucom J; Thomas T; Fuentes-Cabrera M; Krahn JM; Darden TA; Sagui C Molecular dynamics simulations of the d(CCAACGTTGG)2 decamer in crystal environment: comparison of atomic point-charge, extra-point, and polarizable force fields. J. Chem. Phys. 2004, 121, 6998–7008.
- (46). Babin V; Baucom J; Darden TA; Sagui C Molecular dynamics simulations of DNA with polarizable force fields: convergence of an ideal B-DNA structure to the crystallographic structure. J. Phys. Chem. B 2006, 110, 11571–11581.
- (47). Babin V; Baucom J; Darden TA; Sagui C Molecular Dynamics simulations of polarizable DNA in crystal environment. Int. J. Quantum Chem. 2006, 106, 3260–3269.
- (48). Ramakrishnan B; Sundaralingam M Evidence for crystal environment dominating base sequence effects on DNA conformation: Crystal structures of the orthorhombic and hexagonal polymorphs of the A-DNA decamer d(GCGGGCCCGC) and comparison with their isomorphous crystal structures. Biochemistry 1993, 32, 11458–11468.
- (49). Thiyagarajan S; Kumar PS; Rajan SS; Gautham N Structure of d(TGCGCA)2 at 293 K: comparison of the effects of sequence and temperature. Acta Cryst. Sec. D 2002, 58, 1381–1384.
- (50). Grzeskowiak K; Yanagi K; Prive GG; Dickerson RE The structure of B–helical C–G–A–T–C–G–A–T–C–G and comparison with C–C–A–A–C–G–T–T–G–G. J. Biol. Chem. 1991, 266, 8861–8883.
- (51). Leonard GA; Hunter WN Crystal and molecular structure of d(CGTAGATCTACG) at 2.25 Å resolution. J. Mol. Biol. 1993, 234, 198–208.
- (52). Drew HR; Wing RM; Takano T; Broka C; Tanaka S; Itakura K; Dickerson RE Structure of a B-DNA dodecamer: conformation and dynamics. Proc. Natl. Acad. Sci. USA 1981, 78, 2179–2183.
- (53). Zheng G; Lu X-J; Olson WK Web 3DNA – a web server for the analysis, reconstruction, and visualization of three-dimensional nucleic-acid structures. Nucleic Acids Res. 2009, 37, 240–246.
- (54). Bloomfield VA; Crothers DM; Tinoco I Jr. Nucleic Acids: Structures, Properties, and Functions; University Science Books: Sausalito, CA, 2000.
- (55). Tereshko V; Subirana JA Influence of packing interactions on the average conformation of B-DNA in crystalline structures. Acta Cryst. Sec. D 1999, 55, 810–819.
- (56). Butcher SE; Pyle AM The molecular interactions that stabilize RNA tertiary structure: RNA motifs, patterns, and networks. Acc. Chem. Res. 2011, 44, 1302–1311.
- (57). Todolli S; Perez PJ; Clauvelin N; Olson WK Contributions of Sequence to the Higher-Order Structures of DNA. Biophys. J. 2017, 112, 416–426.
- (58). Salmon L; Yang S; Al-Hashimi HM Advances in the Determination of Nucleic Acid Conformational Ensembles. Annu. Rev. Phys. Chem. 2014, 65, 293–316.
- (59). Jones CP; Ferré-D’Amaré AR RNA quaternary structure and global symmetry. Trends Biochem. Sci. 2015, 40, 211–220.