Abstract
Structure-based models (SBMs) are simplified models of the biomolecular dynamics that arise from funneled energy landscapes. We recently introduced an all-atom SBM that explicitly represents the atomic geometry of a biomolecule. While this initial study showed the robustness of the all-atom SBM Hamiltonian to changes in many of the energetic parameters, an important aspect, which has not been explored previously, is the definition of native interactions. In this study, we propose a general definition for generating atomically-grained contact maps called “Shadow.” The Shadow algorithm initially considers all atoms within a cutoff distance and then, controlled by a screening parameter, discards the occluded contacts. We show that this choice of contact map is not only well behaved for protein folding, since it produces consistently cooperative folding behavior in SBMs, but also desirable for exploring the dynamics of macromolecular assemblies since it distributes energy similarly between RNAs and proteins despite their disparate internal packing. All-atom structure-based models employing Shadow contact maps provide a general framework for exploring the geometrical features of biomolecules, especially the connections between folding and function.
1 Introduction
Structural biology techniques, such as nuclear magnetic resonance (NMR), x-ray crystallography, and cryogenic electron microscopy (cryo-EM), have provided extraordinary insights into the details of the functional configurations of biomolecular systems. Recent experimental advances have enabled the structural characterization of large and diverse molecular assemblies that are composed of heterogeneous parts: DNA, RNA, proteins and small molecules. Some notable examples include the ribosome,1 proteasome2 and spliceosome.3 Molecular simulations allow one to connect these static pictures with dynamical experimental data such as single-molecule Förster resonance energy transfer (FRET).4 To bridge static and dynamic structural data, it is essential that we establish robust theoretical models that are able to accurately describe the dynamics of complex biomolecules.
Long-range communication between spatially distant residues in these assemblies is, to a first approximation, controlled by the geometry of the molecular complexes. That structural biologists can capture detailed structural information implies that these folded structures occupy low free-energy configurations. Energy landscape theory5–7 and the principle of minimal frustration8 explain that robust folding and assembly implies that these low free-energy native structures are composed of consistent, or “minimally frustrated,” native interactions. This organization leads to a funnel-shaped energy landscape, where the overall energetic drive to the native structure is much larger than competing traps stemming from non-native interactions. While originally developed in the context of protein folding, energy landscape theory has been extended to account for biomolecular oligomerization9 and functional transitions.10 The class of theoretical models that probe the dynamics that emerge from a molecular geometry is called a structure-based model (SBM).11,12 SBMs represent the dynamics of minimally frustrated systems through the approximation that all native interactions are stabilizing, and they include repulsive non-native interactions to maintain proper excluded volume.
We recently introduced an all-atom SBM, which explicitly represents the atomic geometry of a biomolecule.12 This SBM is a baseline model that can be used to fully discern the role of biomolecular geometry. Moreover, by introducing additional energetic complexity, one may uncover the extent to which detailed energetics contribute to biomolecular structure, folding, and function. While our initial study showed the robustness of the all-atom SBM Hamiltonian to changes in many of the energetic parameters,12 an important aspect that has not been explored previously is the definition of native interactions. Each native interaction, or “native contact,” is an interaction between an atom-atom pair (or a residue-residue pair in a Cα representation) that is proximate in the native state. The set of native contacts is called a contact map and is a ubiquitous tool in the analysis of internal biomolecular interaction networks.13–15
The definitions of contact maps in the literature are nearly as diverse as their applications. The simplest algorithms define contacts between atom (or residue) pairs that are within a cutoff radius of each other.16 More complicated algorithms additionally consider, for example, solvent accessibility17,18 or atomic chemistry.19 For protein folding studies, contacts have often been defined through the atomic geometry, by choosing residue pairs that have heavy atoms within a cutoff distance (4.0 Å to 6.5 Å)20–23 or atom pairs that shield each other's solvent accessibility.11,24 In a SBM, the native contact map is an integral part of the Hamiltonian, since it defines the distribution of stabilizing energy in the biomolecule. Therefore, as SBMs are being explored on multiple levels of detail and applied to increasingly diverse and heterogeneous systems, a consistent method for choosing contact maps is desirable.
In this study, we propose a general definition for generating atomically-grained contact maps called “Shadow” (Figure 1). It is motivated by two requirements that a simple heavy-atom cutoff contact map cannot satisfy simultaneously: to include relevant contacts at distances of at least 6 Å without introducing nonphysical next-nearest-neighbor contacts. Long cutoffs enable the map to capture atomic contacts across structural waters or heavy metals that are not explicitly represented. At long cutoff distances, though, contacts are introduced between atom pairs that we do not wish to model, specifically those that have an intervening atom. The Shadow algorithm initially considers all atoms within a cutoff distance C and then, controlled by a screening parameter S, discards the occluded contacts. We compare two classes of contact maps: 1) maps based on a simple cutoff distance C with S = 0, and 2) maps with S > 0. They are compared dynamically by measuring the folding thermodynamics of well-studied two-state proteins, the thermodynamics of an RNA hairpin, and the native basin fluctuations of the ribosome. We find that the Shadow contact map gives a consistent definition of atomically-grained native interactions from small proteins up to macromolecular assemblies. Two-state proteins and RNA hairpins show reliably cooperative folding transitions. Also, Shadow contact maps distribute energy similarly between RNAs and proteins despite their disparate internal packing. All-atom structure-based models employing Shadow contact maps are a general framework for exploring the geometrical features of biomolecules, especially the connections between folding and function.
2 Methods
2.1 The all-atom structure-based model
The all-atom (AA) SBM12,25,26 discussed in this study has recently been used to study proteins, nucleic acids, and ligands, for both dynamics4,27,28 and molecular modeling.29–31 All heavy (non-hydrogen) atoms are included, and each atom is represented as a single bead of unit mass. Bond lengths, bond angles, improper dihedrals, and planar dihedrals are maintained by harmonic potentials. Dihedral energy functions are defined such that each dihedral angle is at a minimum in the native configuration. Non-bonded atom pairs between residues i and j, where i > j + 3, that are in contact in the native state are given an attractive potential, while all other non-local interactions are repulsive, which ensures that the atoms have a defined excluded volume. The functional form of the potential is,
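$$
V = \sum_{\text{bonds}} \frac{\epsilon_b}{2}(r - r_0)^2
  + \sum_{\text{angles}} \frac{\epsilon_\theta}{2}(\theta - \theta_0)^2
  + \sum_{\text{impropers/planar}} \frac{\epsilon_\chi}{2}(\chi - \chi_0)^2
  + \sum_{\text{backbone}} \epsilon_{BB} F_D(\phi)
  + \sum_{\text{side chains}} \epsilon_{SC} F_D(\phi)
  + \sum_{\text{contacts}} \epsilon_C\, C_{ij}(r_{ij})
  + \sum_{\text{non-contacts}} \epsilon_{NC} \left(\frac{r_{NC}}{r_{ij}}\right)^{12}
  \qquad (1)
$$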
where $F_D(\phi) = [1 - \cos(\phi - \phi_0)] + \frac{1}{2}[1 - \cos(3(\phi - \phi_0))]$ is a traditional dihedral potential. $C_{ij}$ is a contact potential, an effective short-range interaction between atoms i and j that are in contact in the native state (see Section 2.4). The definition of the native contacts is considered in detail in Section 2.3. Consistent with previous studies,12,25 three criteria define the values of εBB, εSC, and εC for a given molecular complex. 1) εBB and εSC are scaled so that $\epsilon_{BB}/\epsilon_{SC} = R_{BB/SC}$. 2) The energetic weight of each dihedral and contact is also scaled such that the ratio of total contact energy to total dihedral energy, $\sum \epsilon_C / \sum \epsilon_D = R_{C/D}$, is satisfied. 3) The total stabilizing energy is set such that $\sum \epsilon_C + \sum \epsilon_D = N\epsilon$, where N is the number of atoms and ε is the reduced energy unit. Here, RBB/SC = 2 for protein and RBB/SC = 1 for RNA, and RC/D = 2. This means that as the contact map is varied, even though the number of contacts may vary, the net energy contribution from the contacts is constant at $2N\epsilon/3$. The energy per contact, though, will vary. This allows for careful comparison between the different native contact maps. $r^0_{ij}$ is the native distance separation between atoms i and j. εb = 100ε, εθ = 20ε, εχ = 10ε, and εNC = ε. r0, θ0, χ0, ϕ0 and $r^0_{ij}$ are given the values found in the native state, and rNC = 1.7 Å. We note that previous implementations of SBMs of RNA25 reduced the strength of contacts between stacked bases by a factor of 3 when using cutoff maps (also see Section 3.4).
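The three normalization criteria fix the per-interaction strengths once the numbers of atoms, dihedrals, and contacts are known. The Python sketch below illustrates that arithmetic under stated assumptions: it reads criterion 1 as a per-dihedral ratio εBB/εSC, takes the total stabilizing energy to be N ε with N the number of atoms, and ignores any further bookkeeping (for example, several dihedrals sharing one bond) that a real topology generator such as smog@ctbp may apply. The function name `sbm_energy_scales` is hypothetical.

```python
def sbm_energy_scales(n_atoms, n_bb_dihedrals, n_sc_dihedrals, n_contacts,
                      r_bb_sc=2.0, r_c_d=2.0, eps=1.0):
    """Return per-interaction strengths (eps_BB, eps_SC, eps_C) in units of eps.

    Illustrative assumptions (not the smog@ctbp implementation):
      * sum(eps_C) + sum(eps_D) = n_atoms * eps   (total stabilizing energy)
      * sum(eps_C) / sum(eps_D) = r_c_d           (contact : dihedral ratio)
      * eps_BB / eps_SC = r_bb_sc                 (per-dihedral ratio)
    """
    total = n_atoms * eps
    e_dihedrals = total / (1.0 + r_c_d)   # summed over all dihedrals
    e_contacts = total - e_dihedrals      # 2/3 of the total when r_c_d = 2
    # eps_SC * (r_bb_sc * N_BB + N_SC) must equal the total dihedral energy
    eps_sc = e_dihedrals / (r_bb_sc * n_bb_dihedrals + n_sc_dihedrals)
    eps_bb = r_bb_sc * eps_sc
    eps_c = e_contacts / n_contacts
    return eps_bb, eps_sc, eps_c


# Fixed total contact energy: a denser map gives each contact a smaller share.
for contacts_per_atom in (1.2, 5.7):
    n_atoms = 1000
    n_contacts = round(contacts_per_atom * n_atoms)
    _, _, eps_c = sbm_energy_scales(n_atoms, 300, 400, n_contacts)
    print(f"{contacts_per_atom} contacts/atom -> eps_C = {eps_c:.3f}")
```

Because the contact share is fixed at 2Nε/3, the per-contact strength depends only on the number of contacts per atom: roughly 0.56ε at 1.2 contacts per atom versus roughly 0.12ε at 5.7 contacts per atom.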
2.2 Simulation details
AA structure-based simulations were performed using the Gromacs software package.32 Protein simulations were typically performed on 4 cores, and each ribosome simulation was performed on 128 or 256 cores. The Gromacs source code was modified to include the Gaussian interaction (available at http://smog.ucsd.edu); no further modifications were necessary. The Gromacs topology files were generated with the smog@ctbp webserver.33 Reduced units were used. The time step τ was 0.0005. Temperature was controlled through stochastic dynamics with a coupling constant of 2. For all systems simulated in this paper, several constant-temperature trajectories were obtained. In the case of folding, temperatures ranged from those at which the protein was always folded to those at which it was always unfolded, and trajectories contained many (> 20) folding transitions. The Weighted Histogram Analysis Method34,35 was used to combine data from multiple temperatures into single free energy profiles. Each ribosome simulation was performed for 2 × 10^7 time steps, with the second half used for data analysis. Fluctuations in proteins are calculated from 2 × 10^7 time steps of data. Native-state fluctuations converged within 10^7 time steps, since doubling the data gives no discernible difference in the averages.
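The WHAM step can be illustrated with a minimal energy-only version in reduced units (k_B = 1). This is a sketch under stated assumptions, not the analysis code used here: it assumes every constant-temperature run was histogrammed on the same potential-energy bins, iterates the standard WHAM self-consistency equations for the density of states g(E), and then evaluates the heat capacity C_V(T) = (⟨E²⟩ − ⟨E⟩²)/T². The function names `wham_log_dos` and `heat_capacity` are hypothetical.

```python
import numpy as np


def wham_log_dos(energies, histograms, betas, n_iter=10000, tol=1e-10):
    """Estimate ln g(E), up to an additive constant, from canonical histograms.

    energies:   bin centers, shape (n_bins,)
    histograms: counts, shape (n_temps, n_bins), one row per temperature
    betas:      inverse temperatures 1/T in reduced units, shape (n_temps,)
    """
    E = np.asarray(energies, dtype=float)
    H = np.asarray(histograms, dtype=float)
    b = np.asarray(betas, dtype=float)
    log_n = np.log(H.sum(axis=1))              # ln N_k, samples per run
    with np.errstate(divide="ignore"):
        log_num = np.log(H.sum(axis=0))        # ln sum_k H_k(E); -inf if empty
    f = np.zeros(len(b))                       # f_k = -ln Z_k (relative)
    for _ in range(n_iter):
        # ln g(E) = ln sum_k H_k(E) - ln sum_k N_k exp(f_k - beta_k E)
        log_den = np.logaddexp.reduce(
            log_n[:, None] + f[:, None] - b[:, None] * E[None, :], axis=0)
        log_g = log_num - log_den
        # f_k = -ln sum_E g(E) exp(-beta_k E), pinned so that f_0 = 0
        f_new = -np.logaddexp.reduce(
            log_g[None, :] - b[:, None] * E[None, :], axis=1)
        f_new -= f_new[0]
        converged = np.max(np.abs(f_new - f)) < tol
        f = f_new
        if converged:
            break
    return log_g


def heat_capacity(energies, log_g, temps):
    """C_V(T) = (<E^2> - <E>^2) / T^2 from ln g(E), with k_B = 1."""
    E = np.asarray(energies, dtype=float)
    cv = []
    for T in temps:
        log_w = log_g - E / T
        w = np.exp(log_w - np.max(log_w))      # shift to avoid overflow
        w /= w.sum()
        mean = np.dot(w, E)
        cv.append(np.dot(w, (E - mean) ** 2) / T**2)
    return np.array(cv)
```

Working with logarithms throughout (via `np.logaddexp.reduce`) keeps the Boltzmann factors from overflowing when energies span many kT, which is typical for folding simulations.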
2.3 Contact maps
Atom pairs that are spatially near in the native state are considered contacts, and the set of all contacts composes a native contact map. A contact map encodes which atom pairs ij are given attractive interactions in the SBM potential. In the context of a SBM, the native contact map sets the distribution of renormalized stabilizing enthalpy in the native state.
Here, we propose an algorithm for determining atomic contacts, called Shadow, that combines a heavy-atom cutoff distance with geometric occlusion. The algorithm has two parameters, the cutoff distance C and the screening radius S (Figure 1). It can be described metaphorically: if a light source were located at the center of atom i, and all other atoms were opaque, then all atoms within the cutoff C that have no shadow cast upon them would be considered contacts. To keep bonded atoms from overlapping, S is kept ≤ 0.5 Å when screening a bonded neighbor. To put shadowing in the context of other approaches, we compare it to the commonly used simple heavy-atom cutoff distance (S = 0). We denote a contact map with cutoff distance C and screening radius S as . C and S are given in units of Å. Related geometric occlusion methods have been employed by Wu et al.36 and by Veloso et al.19
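The shadowing rule can be sketched in a few lines of Python. This is a simplified illustration, not the smog@ctbp implementation: it treats the target atom j as a point, so the pair ij is discarded when the segment from i to j passes within S of some third atom k; only atoms closer to both i and j than r_ij may cast a shadow; atoms bonded to i or j shadow with radius min(S, 0.5 Å); and the residue-separation rule of Section 2.1 is applied only if residue indices are supplied. The names `shadow_contacts` and `point_segment_distance` are hypothetical.

```python
import numpy as np


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment from a to b."""
    ab = b - a
    t = np.clip(np.dot(p - a, ab) / np.dot(ab, ab), 0.0, 1.0)
    return float(np.linalg.norm(p - (a + t * ab)))


def shadow_contacts(coords, cutoff=6.0, screen=1.0, bonds=(), residues=None,
                    min_res_sep=4, bonded_screen=0.5):
    """Simplified Shadow-style contact map (illustrative sketch).

    screen = 0 reduces to a plain heavy-atom cutoff map.
    """
    x = np.asarray(coords, dtype=float)
    n = len(x)
    bonded = [set() for _ in range(n)]
    for a, b in bonds:
        bonded[a].add(b)
        bonded[b].add(a)
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    contacts = []
    for i in range(n):
        for j in range(i + 1, n):
            if residues is not None and abs(residues[i] - residues[j]) < min_res_sep:
                continue                      # too close in sequence
            r_ij = d[i, j]
            if r_ij > cutoff:
                continue
            shadowed = False
            for k in range(n):
                if k in (i, j) or d[i, k] >= r_ij or d[j, k] >= r_ij:
                    continue                  # k cannot sit between i and j
                radius = screen
                if k in bonded[i] or k in bonded[j]:
                    radius = min(screen, bonded_screen)
                if point_segment_distance(x[k], x[i], x[j]) < radius:
                    shadowed = True
                    break
            if not shadowed:
                contacts.append((i, j))
    return contacts


# Three collinear atoms 3 Å apart: the middle atom occludes the 6 Å pair.
line = [[0, 0, 0], [3, 0, 0], [6, 0, 0]]
print(shadow_contacts(line, cutoff=6.5, screen=0.0))  # plain cutoff map
print(shadow_contacts(line, cutoff=6.5, screen=1.0))  # shadowed map
```

The triple loop scales as O(N³) and is meant only to make the geometry explicit; a production version would restrict candidate occluders with a neighbor list.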
2.3.1 The Shadow map
The parameter set recommended for general use is termed “the Shadow map” and refers to C = 6 Å and S = 1 Å. Both the Shadow map and contact maps with variable S and C can be generated using the smog@ctbp webserver (http://smog.ucsd.edu).33
2.4 Contact potential
All of the pairs defined in the native contact map interact through an attractive potential, denoted $C_{ij}$ in the SBM potential (Equation 1). The contact potential has a minimum at the native distance between the pair, $r^0_{ij}$. Traditionally, a contact is defined through a Lennard-Jones (LJ) type potential,
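$$
C_{ij}(r_{ij}) = \left(\frac{r^0_{ij}}{r_{ij}}\right)^{12} - 2\left(\frac{r^0_{ij}}{r_{ij}}\right)^{6} \qquad (2)
$$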
The LJ potentials are well tested and perform well for many systems, but they introduce an excluded volume that scales with the contact distance (Figure 2). Since the effective volume of two atoms in contact grows with $r^0_{ij}$, this can complicate certain applications. In coarse-grained models, this variable repulsion introduces heterogeneity into the beads, which allows the model to capture effective excluded volume effects. However, it is less clear that this feature is appropriate for an all-atom SBM, since the excluded volume should already be captured explicitly by the all-atom geometry.
Here, we employ a contact potential that allows independent control of the excluded volume. This decouples the protein geometry from the energetics, so that the contact map definition is independent of the excluded volume. Without this feature, the consequences of varying the contact map would be obscured by the entropic effects of a varying excluded volume. As used elsewhere,27,37 we include contact interactions through an attractive Gaussian well coupled to a fixed LJ repulsion,
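$$
C_{ij}(r_{ij}) = \left[1 + \left(\frac{r_{ex}}{r_{ij}}\right)^{12}\right]\left[1 + G(r_{ij})\right] - 1 \qquad (3)
$$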
where
$$
G(r_{ij}) = -\exp\left[-\frac{(r_{ij} - r^0_{ij})^2}{2\sigma_{ij}^2}\right] \qquad (4)
$$
This functional form ensures that the depth of the minimum is −1 (scaled by εC in Equation 1), and rex sets the excluded volume. rex has the same function as rNC in Eq. 1; if rex = rNC, all atomic interactions have an equal excluded volume. For consistency with the LJ potentials, the width σij of the Gaussian well follows the variable width of the LJ well: σij is defined such that $G(1.2\,r^0_{ij}) = -1/2$, giving $\sigma_{ij} = r^0_{ij}/\sqrt{50 \ln 2}$. If rex is significantly smaller than $r^0_{ij}$, Eq. 3 reduces to a more transparent form,
$$
C_{ij}(r_{ij}) \approx \left(\frac{r_{ex}}{r_{ij}}\right)^{12} + G(r_{ij}) \qquad (5)
$$
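The difference in excluded volume between the two contact forms (Eqs. 2–4) is easy to see numerically. A minimal Python sketch, in units where εC = 1 and distances are in Å; the function names are hypothetical, the Gaussian width uses σij = r0/√(50 ln 2) so that the well reaches half depth at 1.2 r0, and the default r_ex = 1.7 Å is an illustrative choice matching rNC.

```python
import math


def lj_contact(r, r0):
    """Eq. 2: 12-6 contact with minimum -1 at r = r0; repulsive core scales with r0."""
    x = r0 / r
    return x**12 - 2.0 * x**6


def gaussian_width(r0):
    """sigma_ij = r0 / sqrt(50 ln 2), so that the Gaussian well is at half depth at 1.2 r0."""
    return r0 / math.sqrt(50.0 * math.log(2.0))


def gaussian_well(r, r0):
    """Eq. 4: G(r) = -exp(-(r - r0)^2 / (2 sigma^2))."""
    sigma = gaussian_width(r0)
    return -math.exp(-((r - r0) ** 2) / (2.0 * sigma**2))


def gaussian_contact(r, r0, r_ex=1.7):
    """Eq. 3: Gaussian well whose excluded volume is set by r_ex, not by r0."""
    return (1.0 + (r_ex / r) ** 12) * (1.0 + gaussian_well(r, r0)) - 1.0


# A 6 Å native contact evaluated at 3 Å.
print(lj_contact(3.0, 6.0))        # strongly repulsive
print(gaussian_contact(3.0, 6.0))  # nearly neutral
```

For a long (6 Å) contact, the LJ form is already strongly repulsive at 3 Å, whereas the Gaussian form is close to zero there, because its repulsive wall sits near r_ex for every contact regardless of r0.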
2.5 Thermodynamics
Folding experiments on small globular proteins have long shown evidence of thermodynamic and kinetic cooperativity, which indicates a phenomenon similar to a first-order phase transition between native and denatured states.38,39 To quantify the thermodynamics and cooperativity of the SBM, the heat capacity was calculated. Two different dimensionless measures of cooperativity are considered: the width of the peak in the heat capacity, κ1, and the van't Hoff criterion, κ2. Both are applicable for describing the cooperativity of two-state transitions.24,40,41
$$
\kappa_1 = \frac{\sigma_{1/2}}{T_{max}} \qquad (6)
$$
where σ1/2 is the full width at half maximum of the heat capacity and Tmax is the temperature corresponding to the peak in CV (Figure 3). κ1 is interpreted as a measure of the temperature range over which the transition occurs, where smaller κ1 indicates a higher degree of cooperativity.
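κ1 can be read directly off a tabulated heat-capacity curve. A minimal Python sketch, assuming C_V has been evaluated on a temperature grid (for example, from WHAM) and that any baseline under the peak is negligible or already subtracted; the half-maximum crossings are located by linear interpolation, and the function names are hypothetical.

```python
import numpy as np


def _half_max_crossing(t0, t1, c0, c1, level):
    """Temperature where the line through (t0, c0) and (t1, c1) reaches `level`."""
    return t0 + (level - c0) * (t1 - t0) / (c1 - c0)


def kappa1(temps, cv):
    """Eq. 6: full width at half maximum of the C_V peak divided by T_max."""
    t = np.asarray(temps, dtype=float)
    c = np.asarray(cv, dtype=float)
    imax = int(np.argmax(c))
    half = 0.5 * c[imax]
    lo = imax
    while lo > 0 and c[lo] > half:              # walk down the low-T side
        lo -= 1
    hi = imax
    while hi < len(c) - 1 and c[hi] > half:     # walk down the high-T side
        hi += 1
    if c[lo] > half or c[hi] > half:
        raise ValueError("C_V does not drop to half maximum within the grid")
    t_low = _half_max_crossing(t[lo], t[lo + 1], c[lo], c[lo + 1], half)
    t_high = _half_max_crossing(t[hi - 1], t[hi], c[hi - 1], c[hi], half)
    return (t_high - t_low) / t[imax]
```

For a Gaussian-shaped peak of standard deviation s centered at T_max, this returns 2√(2 ln 2)·s/T_max, so a narrower peak (a more cooperative transition) gives a smaller κ1.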
The van't Hoff criterion κ2 is a measure of cooperativity that is based on the enthalpy distribution during the transition. A cooperative transition has a well-defined energy separation between the unfolded (U) and folded (F) ensembles. With Keq = [F]/[U] as the equilibrium constant of the folding reaction, the van't Hoff criterion is defined at the midpoint of the transition, given by Keq = 1:
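$$
\kappa_2 = \left.\frac{\langle H\rangle_U - \langle H\rangle_F}{\Delta H_{cal}}\right|_{T_{1/2}} \qquad (7)
$$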
where ΔHcal is the calorimetric enthalpy change of the transition and 〈H〉X is the average enthalpy of ensemble X. ΔHcal is the integral of CV over the transition region and is determined by extrapolating the unfolded-state enthalpy and the folded-state enthalpy to T1/2, the temperature where Keq = 1 (Figure 3). These extrapolations, known as baselines, approximate the temperature dependence of the enthalpy in the absence of the protein transition.41 The baselines, HU and HF, isolate the heat change of the transition, ΔHcal = HU(T1/2) − HF(T1/2). Determination of T1/2 requires a definition of the unfolded and folded ensembles. In this investigation, a cutoff dc in the root mean square deviation (rmsd) from the native state is used to partition configurations.42,43 The “proper” dc may be determined simply by maximizing κ2, i.e. ∂κ2/∂dc = 0. Note that simplifying the calculation by fixing 〈H〉F = HF will overestimate κ2, since 〈H〉F > HF.
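The κ2 recipe combines an rmsd-based split of configurations with baseline-extrapolated enthalpies. A minimal Python sketch under stated assumptions: frames are sampled at (or reweighted to) T1/2 for the chosen dc, the baselines are linear fits to low- and high-temperature enthalpies, and the potential energy stands in for the enthalpy. All names are hypothetical.

```python
import numpy as np


def calorimetric_enthalpy(t_folded, h_folded, t_unfolded, h_unfolded, t_half):
    """Delta H_cal = H_U(T_1/2) - H_F(T_1/2), using linear baselines (an assumption)."""
    fit_f = np.polyfit(t_folded, h_folded, 1)
    fit_u = np.polyfit(t_unfolded, h_unfolded, 1)
    return float(np.polyval(fit_u, t_half) - np.polyval(fit_f, t_half))


def kappa2(energies, rmsd, d_c, delta_h_cal):
    """Eq. 7: (<H>_U - <H>_F) / Delta H_cal, splitting frames at rmsd cutoff d_c."""
    h = np.asarray(energies, dtype=float)
    folded = np.asarray(rmsd, dtype=float) < d_c
    if folded.all() or not folded.any():
        raise ValueError("d_c must leave frames in both folded and unfolded sets")
    return float((h[~folded].mean() - h[folded].mean()) / delta_h_cal)
```

In practice one would repeat this for several trial values of dc, recomputing T1/2 for each, and keep the dc that maximizes κ2, as described above.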
3 Results and discussion
Since SBMs are applied to diverse biomolecular systems, the present study encompasses a broad range of them, in particular globular proteins, RNA, and the ribosome. First, we discuss the effects of geometric occlusion on the number and distribution of native contacts in globular proteins and in RNA secondary structure. Then we analyze the sensitivity of both folding thermodynamics and native-state fluctuations to the choice of native contacts in model protein and RNA systems. Lastly, we examine the sensitivity of fluctuations to the contact map in a large molecular assembly: the ribosome.
3.1 Protein contact maps
Protein native structures, as determined by structural biology techniques, are compact, densely packed structures stabilized by both short- and long-range interactions.44,45 The all-atom SBM encodes the stability imparted by these interactions with short-range attractive potentials between pairs of atoms. These interactions drive the protein towards the low free-energy native configuration. The short-range atomic interactions in proteins are on the Å length scale. The closest pairs are the hydrogen-bonding interactions between the backbone carbonyl O and amide N found throughout α-helices and β-sheets; these N and O atoms are commonly separated by 2.6–3.0 Å. In the hydrophobic core, carbon pairs are separated by 3.5–4.5 Å. This longer distance is a consequence of the larger van der Waals radius of carbon compared to oxygen and nitrogen. Salt bridges exist in protein cores with separations up to 5.5 Å.46 Indirect pair interactions mediated by water molecules, either surface or buried, span 5–7 Å45 and are a source of enthalpic stabilization.47
An algorithm to generate protein contact maps that includes all of the above-mentioned short-range interactions must accommodate pair separations of at least 6 Å. While a simple cutoff distance criterion will capture all of the essential interactions, it will also include many occluded pair interactions that we are not seeking to model (Figure 1). The occluded interactions represent 3-body interactions, and their effects should be considered higher-order corrections. These higher-order, occluded contacts can be identified and discarded using the shadowing geometric criterion described in Section 2.3. The parameter choices C = 6 Å and S = 1 Å define the contact map henceforth called “the Shadow map.”
3.1.1 Removal of contacts through geometric occlusion
The abundance of occluded contacts is checked by constructing native contact maps with various S and C, both measured in Å and illustrated in Figure 1. S is the screening strength, which sets the radius of each shadowing atom, and C is the cutoff radius, which sets the maximum separation allowed between contacts. Results for 4 proteins are summarized in Table 1. To quantify average values and statistical variability in the contact map calculations, we use a standard library of 33 non-homologous globular proteins (NHGP) often used in structure prediction.47 Figure 4A shows the number of contacts per atom as a function of the cutoff radius for different shadowing sizes, averaged over NHGP. With C = 6 Å and no shadowing (S = 0), there are nearly 6 contacts per atom, but with S = 1 Å this drops to 1.2 contacts per atom. Thus, geometric occlusion removes 80% of the contacts if shadowing atoms are given a radius of 1 Å. Interestingly, shadowing has a significant effect even at cutoff distances as small as 4 Å.
Table 1.
Protein | Residues | Total contacts | | | | | | |
UBQ (a) | 76 | 387 | 322 | 262 | 625 | 874 | 1504 | 3510 |
CI2 (b) | 64 | 280 | 229 | 188 | 466 | 597 | 1117 | 2566 |
SH3 (c) | 57 | 325 | 266 | 185 | 409 | 532 | 1397 | 3155 |
BDPA (d) | 46 | 179 | 153 | 115 | 222 | 329 | 575 | 1345 |
 | Contacts per res. (e) | Contacts per atom | | | | | | |
CI2 | 9.3 | 0.56 | 0.45 | 0.37 | 0.92 | 1.2 | 2.2 | 5.1 |
SH3 | 9.3 | 0.71 | 0.58 | 0.41 | 0.89 | 1.2 | 3.1 | 6.9 |
BDPA | 7.1 | 0.49 | 0.41 | 0.31 | 0.60 | 0.90 | 1.6 | 3.6 |
NHGP (f) | 9.1 | 0.70 | 0.60 | 0.43 | 0.89 | 1.2 | 2.6 | 5.7 |
Dispersion in contacts between residues (g) | | | | | | | | |
NHGP | | 2.50 | 2.49 | 2.05 | 1.83 | 1.63 | 2.06 | 1.67 |
CV criterion, κ1 | | | | | | | | |
CI2 | | 3-state | 0.038 | 0.031 | 0.024 | 0.032 | 0.063 | 0.12 |
SH3 | | 0.027 | 0.025 | 0.021 | 0.025 | 0.028 | 0.033 | 0.047 |
BDPA | | 0.14 | 0.087 | 0.050 | 0.046 | 0.046 | 0.072 | 0.10 |
van't Hoff criterion, κ2 | | | | | | | | |
CI2 | | 3-state | 0.66 | 0.73 | 0.93 | 0.84 | 0.70 | 0.72 |
SH3 | | 0.82 | 0.91 | 0.93 | 0.96 | 0.94 | 0.90 | 0.89 |
BDPA | | 0.77 | 0.71 | 0.76 | 0.79 | 0.84 | 0.70 | 0.73 |
(a) PDB code 1UBQ.
(b) PDB code 1YPA.
(c) PDB code 1FMK, residues 84–140.
(d) PDB code 1BDD.
(e) Contacts per residue with map .
(f) Average over a set of 33 non-homologous globular proteins.47
(g) Atom-atom contacts per residue-residue contact; dispersion is normalized by the average.
While shadowing removes contacts that are occluded by intervening atoms, longer-distance contacts that are separated by buried (implicit) solvent are maintained. Figure 5A,B shows the contact networks in regions of disrupted secondary structure, where buried waters satisfy backbone hydrogen bonds that would otherwise be left unpaired. The black dotted lines highlight contacts that are separated by more than 4.5 Å but are included in the Shadow map. In both cases shown, there are waters sufficiently localized to be detected by x-ray crystallography, which are depicted as yellow spheres. The water molecules sit in voids in the protein interior and stabilize a configuration that would otherwise be enthalpically costly. Although the solvent is not explicitly modeled in the SBM, choosing contacts through shadowing automatically fills these open pockets with compensating contacts, because there are no occluding atoms in the void left by the solvent.
There are global differences between the distributions of stabilizing contact energy between cutoff and shadowing maps, i.e. S = 0 and S = 1. The most obvious difference is the significant reduction in the total number of contacts when S = 1. This reduction in contacts is strongest for the longest distance contacts, since they are more likely to be occluded. This alters the contact radial distribution function (Figure 4B,C). The distribution becomes more heavily weighted towards short-range contacts. Peaks at 3 Å and 4 Å become visible in and are more pronounced for (only nearest neighbors). For all contact maps, the 3 Å peak is due to the hydrogen bonding interactions along the secondary structure and the 4 Å peak results from hydrophobic interactions. A more subtle difference is that shadowing tends to smooth the distribution of stabilizing energy between residues. There is a reduction in residue-residue contact energy variance for S > 0 (Table 1). Residue-residue contact energy is defined as the sum of atom-atom contacts shared between two residues. These differing contact energy distributions will be seen to alter the thermodynamics of protein folding (discussed in Section 3.3).
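The residue-level smoothing can be quantified by grouping atom-atom contacts by residue pair. Since every contact in a protein map carries the same εC, residue-residue contact energy is proportional to this count. The footnote to Table 1 only says the dispersion is normalized by the average, so this Python sketch assumes it means the variance divided by the mean of the number of atom-atom contacts per contacting residue pair (an index of dispersion); a standard-deviation-over-mean reading would need a one-line change. The function name is hypothetical.

```python
from collections import Counter

import numpy as np


def residue_pair_dispersion(atom_contacts, residue_of):
    """Variance / mean of atom-atom contacts per contacting residue pair (assumed definition).

    atom_contacts: iterable of (i, j) atom-index pairs
    residue_of:    sequence mapping atom index -> residue index
    """
    counts = Counter()
    for i, j in atom_contacts:
        pair = tuple(sorted((residue_of[i], residue_of[j])))
        counts[pair] += 1
    n = np.array(list(counts.values()), dtype=float)
    return float(n.var() / n.mean())
```

A map that spreads its contacts evenly over residue pairs gives a value near zero; a map that piles many atom-atom contacts onto a few residue pairs gives a large value.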
A quantity that shows no systematic variation with contact map is the relative contact order (CO). Averaged over the proteins in NHGP, , and there is little variation from protein to protein since 〈|ΔCO|/CO〉 = 0.04. The constant CO shows that the ratio of long range to short range (in sequence) contacts is constant.
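Relative contact order is conventionally defined as CO = (1/(L·N)) Σ ΔS_ij, where L is the chain length in residues, N is the number of contacts, and ΔS_ij is the sequence separation of each contact. The Python sketch below assumes that each atom-atom contact is weighted by the sequence separation of its two residues, which may differ in detail from how the values above were computed; the function name is hypothetical.

```python
def relative_contact_order(atom_contacts, residue_of, n_residues):
    """CO = sum of sequence separations / (chain length * number of contacts)."""
    seps = [abs(residue_of[i] - residue_of[j]) for i, j in atom_contacts]
    return sum(seps) / (n_residues * len(seps))
```

Because CO is an average over contacts, removing occluded contacts uniformly across sequence separations leaves it unchanged, which is consistent with the lack of systematic variation reported above.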
3.1.2 Parameter reduction: C → ∞ and S → ∞
By increasing C → ∞, a cutoff-invariant definition of contacts is obtained. This corresponds to including as contacts any unshadowed atoms regardless of distance. As mentioned above, any protein interior contacts so generated are likely enthalpically important since an absence of mediating atoms is entropically unlikely. increases by 1.7 over to 2.9, but for a slightly larger shadow size, only increases by 0.3 contacts per atom over . The amount of free space rapidly decreases for S > 1. These additional contacts generated with long cutoffs are dominated by atoms near the protein surface interacting through multiple waters. In order to separate out the desired interior contacts, we would need to introduce a burial parameter, and this is left to future studies.
A parameter-less contact map results from C → ∞ and S → ∞. Since an atom k can only shadow a contact between atom pair ij if rik, rjk < rij, only includes the nearest neighbor pairs, . While can be used to find nearest neighbor pairs, nearly all interactions longer than 4.5 Å are excluded (Figure 4C) and it does not result in cooperative folding (data not shown).
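If the condition rik, rjk < rij is read as also sufficient in the S → ∞ limit (an interpretation, since the text only states that it is necessary), the parameter-less map keeps exactly the edges of the relative neighborhood graph of the atom centers. A minimal Python sketch with a hypothetical function name:

```python
import numpy as np


def nearest_neighbor_map(coords):
    """Pairs (i, j) for which no third atom k has both r_ik < r_ij and r_jk < r_ij.

    Illustrative S -> infinity, C -> infinity limit (relative neighborhood graph).
    """
    x = np.asarray(coords, dtype=float)
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    n = len(x)
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            blockers = (d[i] < d[i, j]) & (d[j] < d[i, j])
            blockers[[i, j]] = False      # i and j cannot shadow their own pair
            if not blockers.any():
                pairs.append((i, j))
    return pairs


# Four atoms on a unit square: the sides survive, the diagonals are shadowed.
square = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]]
print(nearest_neighbor_map(square))
```

Because any atom closer to both partners removes a pair regardless of distance, long pairs almost always find a blocker, which is consistent with nearly all interactions longer than 4.5 Å being excluded in this limit.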
3.2 Decoupling the protein geometry from the contact energy distribution
If the contact potentials introduce additional excluded volume between native atomic pairs (Figure 2), different contact maps will have different amounts of excluded volume. To probe the effects of introducing additional excluded volume in the native contacts, the thermodynamics of CI2 was calculated (Figure 6A). Heat capacities (CV) were compared for varied native repulsive distances and a constant repulsive distance (rNC) of 1.7 Å between non-native beads. For example, the black curve labeled “4Å” includes a Lennard-Jones-type repulsion (σNC) at 4 Å between all native pairs separated by 4 Å or more, and at the native distance for pairs closer than 4 Å. The CV peak becomes sharper (more cooperative) with increasing native repulsion. Also, since the folded basin is destabilized relative to the unfolded basin, the folding temperature TF (i.e., the temperature at the peak in CV) decreases. This excluded-volume effect makes the Hamiltonian with Lennard-Jones contact potentials (“LJ”) markedly more cooperative and less stable than the equivalent Hamiltonian with Gaussian contact potentials (“1.7Å”).
Native excluded volume has opposite thermodynamic consequences for Lennard-Jones and Gaussian contact potentials as the contact-map cutoff is varied (Figure 6B). With Lennard-Jones potentials, protein stability decreases as the cutoff C increases, since a larger C introduces more native contacts and thus more native excluded volume. The increased excluded volume decreases the entropy of the native basin relatively more than that of the unfolded basin, and therefore decreases the stability of the native state. In contrast, the Gaussian potentials isolate the effects of changing the contact energy distribution by maintaining a constant native excluded volume of 1.7 Å between all atoms. The Gaussian potentials show the opposite behavior: protein stability increases as C is increased. Now the dominant effect is the increased entropy of the native state as more contacts are introduced. This stabilizing effect is discussed further in the next section.
Independent of the contact map and contact potential, the repulsive size of the atoms also affects folding cooperativity and stability. The Gaussian potential also allows us to isolate the effects of changing the atomic repulsion between either only the non-native atomic pairs or all atomic pairs (Figure 6C). Here the Shadow map is used, non-native excluded volume is controlled by rNC (Equation 1), and native excluded volume is controlled by rex (Equation 3). Increasing the size of all the atoms has an effect similar to increasing only the repulsion between native pairs (Figure 6A): κ1 decreases and stability decreases. Since the native state is denser and has more atomic collisions than the unfolded configurations, the entropy of the native basin is relatively smaller when the atoms are larger. Somewhat surprisingly, increasing the repulsive size of only the non-native interactions follows the same trend. While one might surmise that a larger excluded volume for non-native interactions lowers the entropy of the unfolded basin more than that of the folded basin, the destabilizing effect shows that non-native interactions are in fact encountered more frequently in the folded basin of the all-atom model. This is opposite to the effect seen in a closely related coarse-grained Cα model.37 While the Cα atoms in the backbone are similarly constrained to their native positions in both the coarse-grained and all-atom models,12 the all-atom model introduces close-packed side chains that encounter many non-native atomic collisions. In addition to the close atomic distances, there are fewer native restraints on each atom, since the Shadow map gives 1.2 contacts per atom versus 2.6 contacts per residue. We note that the ability to encounter non-native collisions is enhanced by the smooth energy landscape. Previous work showed that an all-atom SBM makes comparatively more non-native contacts in the folded basin than an explicit-solvent transferable potential like OPLS.12
3.3 Shadowing tends to increase folding cooperativity
In this section, we explore the effects on folding cooperativity of changing the largest energetic component of the SBM, the native contact map. The native contact map defines the distribution of tertiary stabilizing energy. The effects of changing this distribution are isolated by using a Gaussian contact potential that maintains a constant excluded volume across contact maps (Figure 2). The model proteins are three small, fast-folding globular proteins: the B-domain of protein A (BDPA), chymotrypsin inhibitor 2 (CI2), and the SH3 domain of c-Src kinase (SH3). These three proteins, which we studied previously,12 are well studied both experimentally39,48,49 and theoretically11,20,50 and represent simple to complicated folds, respectively.15 Differential scanning microcalorimetry has shown that small globular proteins like BDPA, CI2, and SH3 fold cooperatively in a two-state manner, with a singly peaked heat capacity at the folding transition, κ1 < 0.05, and κ2 > 0.95.39,40,51
We find that using a contact map generated with geometric occlusion consistently increases folding cooperativity relative to a map generated with a cutoff distance alone. Figure 7 shows the heat capacity calculated for two sets of contact maps and three proteins. The first set of maps used a direct cutoff (S = 0) at three cutoff distances, while the second set has S = 1 at the corresponding cutoffs. In every case, the map with S = 1 has a smaller κ1 than the corresponding cutoff map (Table 1). In addition to consistently higher folding cooperativity, the thermal stabilities for S = 1 vary little within the same protein (<5%) and between proteins (<10%). The Shadow map dependably gives folding temperatures near 1.2 for globular proteins. Other proteins folded with the default all-atom SBM but not shown in Figure 7 (PDB codes 3MLG, 1RIS, 2A3D, and 2EFV) have folding temperatures of 1.12, 1.21, 1.18, and 1.15, respectively.
The thermodynamics of the cutoff contact maps show some interesting features. First, as C increases, the protein becomes more thermally stable, as seen by the increase in TF. This is because of two effects: 1) as C increases, the contacts are on average wider, and 2) the stabilizing energy is more diffuse. Both of these effects increase the entropy of the native state and hence increase stability. The cutoff-map contact distance distribution is skewed towards C, and therefore the average native distance between contacts increases with C (Figure 4). A larger native distance produces a wider contact potential, since $\sigma_{ij} \propto r^0_{ij}$ (Equation 4). The energy distribution becomes more diffuse because at higher C there are more total contacts; since the total energy available for the contacts is held fixed, each contact has a smaller share of the stabilizing energy. Interestingly, κ2 does not follow the trend of κ1 as C → 6 Å, instead staying constant or even increasing. This implies that the increase in κ1 does not come from the introduction of intermediate states, but rather from a more gradual conversion between well-defined unfolded and folded ensembles. Second, there is a minimum cutoff distance below which the protein no longer makes a cooperative transition. Remarkably, at C = 4 Å CI2 becomes a 3-state folder: its heat capacity shows a thermodynamic intermediate36 (Figure 7B). At C = 3.5 Å, SH3 resembles a downhill folder (Figure 7C). Last, since cooperativity vanishes at both low and high C, there is a peak in cooperativity at an intermediate range, 4 Å < C < 5 Å. The thermal stability TF of the most cooperative cutoff maps is near the stability of the Shadow map. This property, that contact maps with similar stabilities have similar cooperativities, was seen to hold among the many contact-map variations tested for this paper. It implies that there is an optimal folding temperature for a cooperative transition. Perhaps the Shadow map consistently achieves this stability and is thus cooperative.
3.4 Dynamics of RNA and macromolecular assemblies
There are many new and exciting areas ripe for exploration through the lens of energy landscape theory, the foundation upon which structure-based models are built. These theoretical tools are already being applied to the study of RNA folding25,52,53 and to the dynamics of molecular machines composed of proteins, such as kinesin,54 or of RNA-protein complexes, such as the ribosome.4,55 In this section we look beyond protein folding and show that Shadow contact maps provide a consistent treatment of heterogeneous systems and, thus, a solid framework for addressing the geometrical features of molecular machines.
3.4.1 RNA contact maps
RNA has three main types of contacts: Watson-Crick (WC) base pairs, base-stacking (BS) interactions, and tertiary backbone contacts (Figure 5C). WC pairs are the hydrogen-bonding interactions between complementary RNA bases (i.e., A·U and G·C). BS interactions refer to π-π stacking: attractive, non-covalent interactions between the aromatic rings of stacked bases that are adjacent in sequence. Maintaining a proper energetic balance among these interactions is important for the performance of RNA models.
Short-range cutoff contact maps have been shown to overweight BS interactions relative to WC pairs and tertiary contacts. To maintain a proper balance between secondary and tertiary structure in a study of the folding of the mRNA SAM-I riboswitch with an SBM,25 BS interactions were scaled by a factor of when using a 4 Å contact map . Here, we denote the cutoff contact map including scaled BS interactions as . The over-stabilization of BS interactions in arises from the geometry of closely packed rings. As seen in Figure 5C, atoms 1 and 2 are each within 4 Å of five atoms in the adjacent stacked base. The same holds for every atom in the ring and for every stacked ring in the riboswitch. Because of this close packing, however, accounting for geometric occlusion avoids the over-counting: with shadowing (S = 1 Å), atoms 1 and 2 each have only a single stacking interaction.
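The occlusion idea can be sketched in a few lines. This is a simplified stand-in, not the published Shadow implementation: it assumes that a pair (i, j) within the cutoff C is discarded whenever some third atom k lies strictly between i and j along the line joining them and within the screening distance S of that line. The exact geometric criterion of the published algorithm may differ. The defaults C = 6 Å and S = 1 Å are the values quoted in the Conclusions; `occludes` and `shadow_contacts` are hypothetical names.

```python
import math
from itertools import combinations

Point = tuple[float, float, float]


def occludes(p: Point, a: Point, b: Point, screen: float) -> bool:
    """True if p lies strictly between a and b (along a->b) and within `screen` of that line."""
    ab = tuple(bi - ai for ai, bi in zip(a, b))
    ap = tuple(pi - ai for ai, pi in zip(a, p))
    t = sum(x * y for x, y in zip(ap, ab)) / sum(x * x for x in ab)
    if not 0.0 < t < 1.0:
        return False  # p is not between the two atoms, so it cannot block them
    closest = tuple(ai + t * d for ai, d in zip(a, ab))
    return math.dist(p, closest) < screen


def shadow_contacts(coords: list[Point], cutoff: float = 6.0,
                    screen: float = 1.0) -> list[tuple[int, int]]:
    """Simplified occlusion-screened contact map (hypothetical, not the published code)."""
    kept = []
    for i, j in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) > cutoff:
            continue  # step 1: only pairs within the cutoff are candidates
        blocked = any(
            occludes(coords[k], coords[i], coords[j], screen)
            for k in range(len(coords))
            if k not in (i, j)
        )
        if not blocked:  # step 2: discard pairs with an intervening atom
            kept.append((i, j))
    return kept


# Three collinear atoms 2.5 Angstrom apart: atom 1 sits between atoms 0 and 2.
coords = [(0.0, 0.0, 0.0), (2.5, 0.0, 0.0), (5.0, 0.0, 0.0)]
print(shadow_contacts(coords))  # [(0, 1), (1, 2)]
```

A pure cutoff map would keep all three pairs; the screened map drops (0, 2) because atom 1 blocks it, which mirrors how, in the stacked-base example, only the nearest partner survives. Setting `screen=0.0` recovers the plain cutoff map.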
Shadowing naturally gives rise to approximately the same scaling of stacking interactions. Table 2 compares to for an RNA helix. Base-stacking interactions relative to WC pairs are decreased by a factor of 0.59/1.74 = 0.34. Relative to all contacts, BS contacts are decreased by a factor of 0.18/0.48 = 0.37, which is in surprisingly close agreement with the previous conjecture of .25 Thus, the energy distributions of and are similar in RNA, although the two maps differ by a factor of 2.5 in the total number of contacts. The heat capacity of the isolated 16-residue P2 helix of the SAM-I riboswitch25 was calculated for the two contact maps (Figure 8). is more cooperative, while is more stable. These trends are in line with those observed in Section 3.3 for proteins: in proteins is the analog of in RNA, introducing an excess of contacts that increased stability, while the Shadow map was less stable but more cooperative. Thus, while the Shadow map gives a reasonable distribution of energy within RNA, applying it to RNA-protein assemblies also requires that the energy be balanced between RNA and protein.
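The ratios quoted above follow directly from the RNA-helix contact counts in Table 2. The sketch below recomputes them from the raw counts, assuming that the first numeric column of Table 2 is the cutoff map and the second is the Shadow map (consistent with the WC/BS values in the text); `stacking_reduction` is a hypothetical helper. With unrounded counts, the BS/All reduction comes out near 0.38 rather than the 0.37 obtained from the rounded table entries.

```python
# RNA-helix contact counts from Table 2 (column order assumed: cutoff map, Shadow map).
counts = {
    "cutoff": {"WC": 137, "BS": 232, "All": 480},
    "shadow": {"WC": 61, "BS": 35, "All": 190},
}


def stacking_reduction(counts: dict[str, dict[str, int]]) -> tuple[float, float, float]:
    """Factor by which shadowing reduces BS relative to WC and to all contacts,
    plus the ratio of total contact numbers (cutoff / shadow)."""
    c, s = counts["cutoff"], counts["shadow"]
    vs_wc = (s["BS"] / s["WC"]) / (c["BS"] / c["WC"])
    vs_all = (s["BS"] / s["All"]) / (c["BS"] / c["All"])
    total_ratio = c["All"] / s["All"]
    return vs_wc, vs_all, total_ratio


vs_wc, vs_all, total_ratio = stacking_reduction(counts)
print(f"BS vs WC: {vs_wc:.2f}, BS vs All: {vs_all:.2f}, total contacts: {total_ratio:.2f}")
# BS vs WC: 0.34, BS vs All: 0.38, total contacts: 2.53
```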
Table 2.
| | Cutoff map | Shadow map |
|---|---|---|
| RNA helix | | |
| Watson-Crick contacts (WC) | 137 | 61 |
| Base-stacking contacts (BS) | 232 | 35 |
| Total contacts (All) | 480 | 190 |
| WC/BS | 0.59 | 1.74 |
| BS/All | 0.48 | 0.18 |
| Ribosome | | |
| Erna–rna contacts | 77529 | 57355 |
| Epro–rna contacts | 8045 | 14771 |
| Epro–pro contacts | 15053 | 28510 |
| Dihedral energy per RNA atom (a) | 0.37 | 0.37 |
| Dihedral energy per protein atom (a) | 0.26 | 0.26 |
| Contact and dihedral energy per RNA atom (b) | 1.18 | 1.01 |
| Contact and dihedral energy per protein atom (b) | 0.64 | 0.97 |
| in RNA atoms (c) | 0.27 | 0.26 |
| in protein atoms (c) | 0.63 | 0.49 |

(a) Dihedral energy in RNA (protein) divided by the number of RNA (protein) atoms.
(b) Total contact and dihedral energy in RNA (protein) divided by the number of RNA (protein) atoms.
(c) EC represents the contact energy per atom by residue.
EC represents the contact energy per atom by residue.
3.4.2 Shadowing in heterogeneous assemblies
Tables 1 and 2 indicate that atomic packing is very different in RNA and protein. In the RNA hairpin, has 150% more contacts than , whereas in proteins, has ~70% more contacts than . The regularity of base stacking dominates the short-range contacts in RNA. Proteins, in contrast, have no regular residue packing, since amino acid side chains have a diversity of shapes. This difference in packing causes short-range cutoff maps to skew the distribution of stabilizing energy in favor of RNA.
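The skew can be checked roughly against the ribosome rows of Table 2: subtracting the dihedral energy per atom (footnote a) from the total contact-plus-dihedral energy per atom (footnote b) leaves the contact energy per atom. This is a cruder aggregate than the per-residue average of EC used in the text, and the column assignment (cutoff map first, Shadow map second) is an assumption; `energy_ratios` is a hypothetical helper.

```python
# Ribosome rows of Table 2 (per-atom energies; column order assumed: cutoff, Shadow).
table = {
    "cutoff": {"rna": {"dihedral": 0.37, "total": 1.18},
               "protein": {"dihedral": 0.26, "total": 0.64}},
    "shadow": {"rna": {"dihedral": 0.37, "total": 1.01},
               "protein": {"dihedral": 0.26, "total": 0.97}},
}


def energy_ratios(rows: dict[str, dict[str, float]]) -> tuple[float, float]:
    """RNA/protein ratios of contact energy per atom and of total energy per atom."""
    # Contact energy per atom = (contact + dihedral) per atom - dihedral per atom.
    contact = {m: rows[m]["total"] - rows[m]["dihedral"] for m in ("rna", "protein")}
    contact_ratio = contact["rna"] / contact["protein"]
    total_ratio = rows["rna"]["total"] / rows["protein"]["total"]
    return contact_ratio, total_ratio


for name, rows in table.items():
    contact_ratio, total_ratio = energy_ratios(rows)
    print(f"{name}: contact RNA/protein = {contact_ratio:.2f}, "
          f"total RNA/protein = {total_ratio:.2f}")
# cutoff: contact RNA/protein = 2.13, total RNA/protein = 1.84
# shadow: contact RNA/protein = 0.90, total RNA/protein = 1.04
```

Under these assumptions the cutoff map gives RNA roughly twice the contact energy per atom of protein, while the Shadow map brings the contact-plus-dihedral totals to within a few percent of each other.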
The contact energy per atom by residue, EC, in the ribosome is shown in Figure 9. Even with the BS contacts scaled by in , the contact energy in RNA is double that in protein, , where is averaged over all residues of type X. The Shadow map gives a much closer division, . Since RNA has a higher density of dihedrals than protein, when the dihedral and contact energies are summed and compared, the Shadow map gives an equal distribution of energy, . This feature is desirable when simulating heterogeneous molecular assemblies. Fluctuations in the ribosome for two different contact maps, and , are compared to the fluctuations predicted from the experimental B-factors (Figure 10A). For the 23S ribosomal RNA, the correlation between experiment and the SBM with the Shadow map is 0.78. On a smaller scale, to highlight the variability between contact maps, fluctuations are shown for three proteins at ~0.75 TF (Figure 10B). While the correlation between the two maps is high, deviations can be seen. Future work will have to explore how robust these fluctuations are, since deviations in fluctuations between related proteins have been predicted to have functional consequences.28
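Comparing simulated fluctuations with crystallographic B-factors requires a conversion between the two. A common choice, assuming isotropic harmonic motion, is B = (8π²/3)⟨Δr²⟩, where ⟨Δr²⟩ is the mean-square fluctuation of an atom; this section does not state which conversion the study used. The sketch converts hypothetical mean-square fluctuations to B-factors and computes a Pearson correlation coefficient; all data values and function names are hypothetical.

```python
import math


def msf_to_bfactor(msf: float) -> float:
    """Isotropic B-factor (A^2) from a mean-square fluctuation <dr^2> (A^2)."""
    return 8.0 * math.pi ** 2 / 3.0 * msf


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-residue mean-square fluctuations from a simulation (A^2)
# and hypothetical experimental B-factors (A^2) for the same residues.
simulated_msf = [0.5, 1.0, 2.0, 1.5]
experimental_b = [15.0, 28.0, 50.0, 41.0]
predicted_b = [msf_to_bfactor(m) for m in simulated_msf]
print(f"r = {pearson(predicted_b, experimental_b):.2f}")
```

Because the Pearson coefficient is invariant to a positive scale factor, the prefactor 8π²/3 changes absolute magnitudes but not a reported correlation such as 0.78.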
4 Conclusions
We have proposed a general algorithm for generating atomically-grained contact maps called “Shadow” (Figure 1). This algorithm permits contact cutoff distances long enough to capture atomic contacts across structural waters or heavy metals that are not explicitly represented, without introducing contacts between atom pairs that one does not wish to model, specifically, those that have an intervening atom. The Shadow algorithm initially considers all atoms within a cutoff distance C = 6 Å and then, controlled by a screening parameter S = 1 Å, discards the occluded contacts. We showed that this choice of contact map is not only well behaved for protein folding, since it produces consistently cooperative folding behavior, but also desirable for exploring the dynamics of macromolecular assemblies, since it distributes energy similarly between RNAs and proteins despite their disparate internal packing.
The study of the connection between the contact distribution and folding cooperativity highlighted that many components of the SBM Hamiltonian affect cooperativity, especially the geometric components. We showed how the Lennard-Jones contact interaction mixes the geometric and energetic parts of the Hamiltonian by changing the excluded volume of native interactions. By decoupling the geometric and energetic parts with the Gaussian contact potential, it became clear that the increased cooperativity obtained through additional Lennard-Jones native contacts was caused by the extra excluded volume. Further, the decoupling showed that the innate cooperativity of the Shadow map was purely an effect of the contact energy distribution. In the case of CI2, the energetic effect of changing contact maps from to decreased κ1 from 0.12 to 0.032, while the geometric effect of increasing the diameter of the atoms from 1.7 Å to a more realistic 2.4 Å brought κ1 even further down to 0.018 (experimental range was κ1 < 0.05). Other studies have shown that, for example, excluded volume,56,57 backbone stiffness,12,58 contact potential width (e.g. σij in Eq. 4)37,59,60 and many-body effects24,57,61 affect the cooperativity of protein folding models.
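The difference between the Lennard-Jones and Gaussian contacts can be made concrete by evaluating both. The sketch assumes a 6-12 native contact with its minimum at the native distance r0, and a Gaussian contact of the kind introduced by Lammert et al.,37 in which a Gaussian well at r0 is combined with an independent r⁻¹² excluded-volume wall of radius r_ex. Eq. 4 itself is not reproduced in this section, so the exact form, the width σ = 0.5 Å, and r_ex = 1.7 Å are assumptions; `lj_contact` and `gaussian_contact` are hypothetical names. In the 6-12 form the repulsive wall sits at r0·2^(−1/6), so each added native contact also adds excluded volume that scales with r0; in the Gaussian form the wall is set by r_ex alone.

```python
import math


def lj_contact(r: float, r0: float, eps: float = 1.0) -> float:
    """6-12 native contact; minimum of -eps at the native distance r0."""
    x = r0 / r
    return eps * (x ** 12 - 2.0 * x ** 6)


def gaussian_contact(r: float, r0: float, sigma: float, r_ex: float,
                     eps: float = 1.0) -> float:
    """Gaussian well of depth eps at r0 combined with an independent (r_ex/r)^12 wall.

    Assumed form: eps * [(1 + (r_ex/r)^12) * (1 - G(r)) - 1],
    with G(r) = exp(-(r - r0)^2 / (2 sigma^2)).
    """
    g = math.exp(-((r - r0) ** 2) / (2.0 * sigma ** 2))
    return eps * ((1.0 + (r_ex / r) ** 12) * (1.0 - g) - 1.0)


r0 = 5.0  # native distance of a hypothetical contact (Angstrom)
print(f"LJ wall (V = 0) at {r0 * 2 ** (-1 / 6):.2f} A")
for r in (4.0, 4.5, 5.0):
    print(f"r = {r}: LJ {lj_contact(r, r0):+.3f}, "
          f"Gaussian {gaussian_contact(r, r0, 0.5, 1.7):+.3f}")
```

At r = 4 Å, one angstrom inside this contact's native distance, the 6-12 contact is already strongly repulsive while the Gaussian contact is still weakly attractive: the geometric (excluded-volume) and energetic parts are decoupled.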
Structure-based models will continue to be an important tool in the characterization of molecular machines and macromolecular assemblies. They are baseline models that can be used to fully discern the role of biomolecular geometry. Going forward, all-atom structure-based models employing Shadow contact maps provide a general framework for exploring the geometrical features of biomolecules, especially the connections between folding and function.
Acknowledgments
JKN wishes to thank Shachi Gosavi for helpful discussion and enthusiasm. This work was supported by the Center for Theoretical Biological Physics sponsored by the NSF (Grant PHY-0822283) and by NSF-MCB-1214457. JNO is a CPRIT Scholar in Cancer Research sponsored by the Cancer Prevention and Research Institute of Texas. This research was also supported in part by the NSF through TeraGrid resources provided by TACC under grant number TGMCB110021. JKN was supported in part by an NIH Molecular Biophysics Training Grant while at UCSD (Grant T32 GM08326).
References
1. Yusupov MM, Yusupova GZ, Baucom A, Lieberman K, Earnest TN, Cate JH, Noller HF. Science. 2001;292(5518):883–896. doi: 10.1126/science.1060089.
2. Bochtler M, Ditzel L, Groll M, Hartmann C, Huber R. Annu. Rev. Biophys. Biomol. Struct. 1999;28:295–317. doi: 10.1146/annurev.biophys.28.1.295.
3. Wahl MC, Will CL, Lührmann R. Cell. 2009;136(4):701–718. doi: 10.1016/j.cell.2009.02.009.
4. Whitford PC, Geggier P, Altman RB, Blanchard SC, Onuchic JN, Sanbonmatsu KY. RNA. 2010;16(6):1196–1204. doi: 10.1261/rna.2035410.
5. Bryngelson J, Wolynes P. J. Phys. Chem. 1989;93:6902–6915.
6. Leopold PE, Montal M, Onuchic JN. Proc. Nat. Acad. Sci. USA. 1992;89(18):8721–8725. doi: 10.1073/pnas.89.18.8721.
7. Onuchic JN, Wolynes PG. Curr. Opin. Struct. Biol. 2004;14(1):70–75. doi: 10.1016/j.sbi.2004.01.009.
8. Bryngelson J, Wolynes P. Proc. Nat. Acad. Sci. USA. 1987;84:7524. doi: 10.1073/pnas.84.21.7524.
9. Levy Y, Wolynes PG, Onuchic JN. Proc. Nat. Acad. Sci. USA. 2004;101(2):511–516. doi: 10.1073/pnas.2534828100.
10. Miyashita O, Onuchic JN, Wolynes PG. Proc. Nat. Acad. Sci. USA. 2003;100(22):12570–12575. doi: 10.1073/pnas.2135471100.
11. Clementi C, Nymeyer H, Onuchic JN. J. Mol. Biol. 2000;298(5):937–953. doi: 10.1006/jmbi.2000.3693.
12. Whitford PC, Noel JK, Gosavi S, Schug A, Sanbonmatsu KY, Onuchic JN. Proteins. 2009;75(2):430–441. doi: 10.1002/prot.22253.
13. Miyazawa S, Jernigan R. Macromolecules. 1985;18(3):534–552.
14. Tirion M. Phys. Rev. Lett. 1996;77(9):1905–1908. doi: 10.1103/PhysRevLett.77.1905.
15. Plaxco KW, Simons KT, Baker D. J. Mol. Biol. 1998;277(4):985–994. doi: 10.1006/jmbi.1998.1645.
16. Silveira CHD, Pires DEV, Minardi RC, Ribeiro C, Veloso CJM, Lopes JCD, Meira W, Neshich G, Ramos CHI, Habesch R, et al. Proteins. 2009;74(3):727–743. doi: 10.1002/prot.22187.
17. Sobolev V, Sorokine A, Prilusky J, Abola EE, Edelman M. Bioinformatics. 1999;15(4):327–332. doi: 10.1093/bioinformatics/15.4.327.
18. Sułkowska JI, Cieplak M. Biophys. J. 2008;95(7):3174–3191. doi: 10.1529/biophysj.107.127233.
19. Veloso CJM, Silveira CH, Melo RC, Ribeiro C, Lopes JCD, Santoro MM, Meira W. Genet. Mol. Res. 2007;6(4):799–820.
20. Shea J, Onuchic J, C. B. Proc. Nat. Acad. Sci. USA. 1999;96(22):12512–12517. doi: 10.1073/pnas.96.22.12512.
21. Koga N, Takada S. J. Mol. Biol. 2001;313(1):171–180. doi: 10.1006/jmbi.2001.5037.
22. Shen T, Zong C, Portman JJ, Wolynes PG. J. Phys. Chem. B. 2008;112(19):6074–6082. doi: 10.1021/jp076280n.
23. Zhang Z, Chan HS. Proc. Nat. Acad. Sci. USA. 2010;107(7):2920–2925. doi: 10.1073/pnas.0911844107.
24. Kaya H, Chan HS. J. Mol. Biol. 2003;326(3):911–931. doi: 10.1016/s0022-2836(02)01434-1.
25. Whitford PC, Schug A, Saunders J, Hennelly SP, Onuchic JN, Sanbonmatsu KY. Biophys. J. 2009;96(2):L7–9. doi: 10.1016/j.bpj.2008.10.033.
26. Noel JK, Onuchic JN. In: Computational Modeling of Biological Systems. Dokholyan N, editor. Springer; New York: 2012. Chapter 2.
27. Noel JK, Sulkowska JI, Onuchic JN. Proc. Nat. Acad. Sci. USA. 2010;107(35):15403–15408. doi: 10.1073/pnas.1009522107.
28. Nechushtai R, Lammert H, Michaeli D, Eisenberg-Domovich Y, Zuris JA, Luca MA, Capraro DT, Fish A, Shimshon O, Roy M, et al. Proc. Nat. Acad. Sci. USA. 2011;108(6):2240–2245. doi: 10.1073/pnas.1019502108.
29. Jamros MA, Oliveira LC, Whitford PC, Onuchic JN, Adams JA, Blumenthal DK, Jennings PA. J. Biol. Chem. 2010;285(46):36121–36128. doi: 10.1074/jbc.M110.116947.
30. Ratje AH, Loerke J, Mikolajka A, Brünner M, Hildebrand PW, Starosta AL, Dönhöfer A, Connell SR, Fucini P, Mielke T, et al. Nature. 2010;468(7324):713–716. doi: 10.1038/nature09547.
31. Schug A, Weigt M, Onuchic JN, Hwa T, Szurmant H. Proc. Nat. Acad. Sci. USA. 2009;106(52):22124–22129. doi: 10.1073/pnas.0912100106.
32. Hess B, Kutzner C, van der Spoel D, Lindahl E. J. Chem. Theory Comput. 2008;4(3):435–447. doi: 10.1021/ct700301q.
33. Noel JK, Whitford PC, Sanbonmatsu KY, Onuchic JN. Nucleic Acids Res. 2010;38:W657–61. doi: 10.1093/nar/gkq498.
34. Ferrenberg A, Swendsen R. Phys. Rev. Lett. 1988;61(23):2635–2638. doi: 10.1103/PhysRevLett.61.2635.
35. Ferrenberg A, Swendsen R. Phys. Rev. Lett. 1989;63(12):1195–1198. doi: 10.1103/PhysRevLett.63.1195.
36. Wu L, Zhang J, Qin M, Liu F, Wang W. J. Chem. Phys. 2008;128(23):235103. doi: 10.1063/1.2943202.
37. Lammert H, Schug A, Onuchic JN. Proteins. 2009;77(4):881–891. doi: 10.1002/prot.22511.
38. Privalov PL, Khechinashvili NN. J. Mol. Biol. 1974;86(3):665–684. doi: 10.1016/0022-2836(74)90188-0.
39. Jackson SE, Fersht AR. Biochemistry. 1991;30(43):10428–10435. doi: 10.1021/bi00107a010.
40. Privalov PL, Potekhin SA. Methods Enzymol. 1986;131:4–51. doi: 10.1016/0076-6879(86)31033-4.
41. Kaya H, Chan HS. Proteins. 2000;40(4):637–661. doi: 10.1002/1097-0134(20000901)40:4<637::aid-prot80>3.0.co;2-4.
42. Clementi C, García AE, Onuchic JN. J. Mol. Biol. 2003;326(3):933–954. doi: 10.1016/s0022-2836(02)01379-7.
43. Cho S, Levy Y, Wolynes PG. Proc. Nat. Acad. Sci. USA. 2006;103(3):586–591. doi: 10.1073/pnas.0509768103.
44. Faure G, Bornot A, de Brevern AG. Biochimie. 2008;90(4):626–639. doi: 10.1016/j.biochi.2007.11.007.
45. Williams MA, Goodfellow JM, Thornton JM. Protein Sci. 1994;3(8):1224–1235. doi: 10.1002/pro.5560030808.
46. Rashin AA, Honig B. J. Mol. Biol. 1984;173(4):515–521. doi: 10.1016/0022-2836(84)90394-2.
47. Papoian GA, Ulander J, Eastwood MP, Luthey-Schulten Z, Wolynes PG. Proc. Nat. Acad. Sci. USA. 2004;101(10):3352–3357. doi: 10.1073/pnas.0307851100.
48. Sato S, Religa TL, Daggett V, Fersht AR. Proc. Nat. Acad. Sci. USA. 2004;101(18):6952–6956. doi: 10.1073/pnas.0401396101.
49. Viguera AR, Martínez JC, Filimonov VV, Mateo PL, Serrano L. Biochemistry. 1994;33(8):2142–2150. doi: 10.1021/bi00174a022.
50. Hoang T, Cieplak M. J. Chem. Phys. 2000;113(18):8319–8328.
51. Schafer H, van Gunsteren WF, Mark AE. J. Comput. Chem. 1999;20(15):1604–1617.
52. Sorin EJ, Nakatani BJ, Rhee YM, Jayachandran G, Vishal V, Pande VS. J. Mol. Biol. 2004;337(4):789–797. doi: 10.1016/j.jmb.2004.02.024.
53. Hyeon C, Dima RI, Thirumalai D. Structure. 2006;14(11):1633–1645. doi: 10.1016/j.str.2006.09.002.
54. Hyeon C, Onuchic JN. Proc. Nat. Acad. Sci. USA. 2007;104(44):17382–17387. doi: 10.1073/pnas.0708828104.
55. Whitford PC, Ahmed A, Yu Y, Hennelly SP, Tama F, Spahn CMT, Onuchic JN, Sanbonmatsu KY. Proc. Nat. Acad. Sci. USA. 2011;108(47):18943–18948. doi: 10.1073/pnas.1108363108.
56. Qi X, Portman JJ. Proc. Nat. Acad. Sci. USA. 2007;104(26):10841–10846. doi: 10.1073/pnas.0609321104.
57. Suzuki Y, Noel JK, Onuchic JN. J. Chem. Phys. 2011;134(24):245101. doi: 10.1063/1.3599473.
58. Prieto L, Rey A. J. Chem. Phys. 2007;126(16):165103. doi: 10.1063/1.2727465.
59. Prieto L, de Sancho D, Rey A. J. Chem. Phys. 2005;123:154903. doi: 10.1063/1.2064888.
60. Suzuki Y, Noel JK, Onuchic JN. J. Chem. Phys. 2008;128(2):025101. doi: 10.1063/1.2812956.
61. Eastwood M, Wolynes P. J. Chem. Phys. 2001;114(10):4702.
62. Garcia A, Krumhansl J, Frauenfelder H. Proteins. 1997;29(2):153–160.