Interface Focus. 2019 Apr 19;9(3):20180062. doi: 10.1098/rsfs.2018.0062

Simulating and analysing configurational landscapes of protein–protein contact formation

Andrej Berg, Christine Peter
PMCID: PMC6501351  PMID: 31065336

Abstract

Interacting proteins can form aggregates and protein–protein interfaces with multiple patterns and different stabilities. Using molecular simulation one would like to understand the formation of these aggregates and which of the observed states are relevant for protein function and recognition. To characterize the complex configurational ensemble of protein aggregates, one needs a quantitative measure for the similarity of structures. We present well-suited descriptors that capture the essential features of non-covalent protein contact formation and domain motion. This set of collective variables is used with a nonlinear multi-dimensional scaling-based dimensionality reduction technique to obtain a low-dimensional representation of the configurational landscape of two ubiquitin proteins from coarse-grained simulations. We show that this two-dimensional representation is a powerful basis to identify meaningful states in the ensemble of aggregated structures and to calculate distributions and free energy landscapes for different sets of simulations. By using a measure to quantitatively compare free energy landscapes we can show how the introduction of a covalent bond between two ubiquitin proteins at different positions alters the configurational states of these dimers.

Keywords: molecular dynamics simulation, dimensionality reduction, structure characterization, protein aggregation, clustering

1. Introduction

Protein–protein interactions are crucial for the function of biological systems on the molecular level since they determine almost all signalling and catalytic pathways in the cell. Complex biological functions are often executed by large multi-protein complexes that can be divided into domains, which are assigned to specific functions and often perform correlated motions [1]. The relative position and orientation of the domains with respect to each other determine the overall morphology of the aggregate to a great extent and should therefore be decisive for protein function [2]. To obtain a molecular-level understanding, one needs to know the predominant conformations of the protein system, their thermodynamic weights and the transition rates between these states. This information can be obtained from ensembles produced by molecular dynamics (MD) simulations [3].

One very popular and extensively studied example is the ubiquitin (Ub) signalling system, in which the small protein Ub (figure 1) is covalently attached to certain lysine residues of substrate proteins [4]. Since Ub can itself be ubiquitylated at eight positions, Ub chains are formed which are then recognized by Ub binding domains. However, we still lack a full understanding of what determines the specific recognition of differently linked Ub chains for different signalling pathways [5]. One reason for this specificity is that the linkage position determines the morphology of Ub chains, as was shown in various structural studies of differently linked Ub dimers and longer chains [6]. On the other hand, for a single linkage type, very different conformations can be found as well (PDB ID: 1F9J (open), 1TBE (compact), 1AAR (closed)), which indicates that a Ub chain can adopt different structures in solution [7–10].

Figure 1.

Illustration of coarse-grained Ub model and simulation set-up. (a) CG representation of a single Ub protein. Backbone beads are coloured in grey and side chain beads in orange. Labels indicate the N-terminus and the seven lysine residues which can be used in a peptide or isopeptide bond to form Ub chains. Lysine side chains are coloured in blue. Thin white lines indicate the supportive elastic network within the individual Ub units. (b,c) Schematic representations of the two types of systems investigated in this study (water beads not shown). (b) System of two Ub proteins without any bonded interactions (2xUb). (c) One of the eight possible Ub dimers (diUb; here: K48-linked). (Online version in colour.)

MD simulation is ideally suited to complement structural experimental data by yielding ensembles which can be used to provide insights into the nature and thermodynamic stability of distinct conformations in solution [11]. Owing to increased computational resources like distributed computing and optimized algorithms, we are now able to perform MD simulations on long time scales which are relevant in biological processes, such as protein recognition [12,13]. Therefore, an increased need emerges for analysis techniques which are able to process massive amounts of data to extract information. Valuable knowledge is also obtained when information from different sources is connected, e.g. data from simulations at different levels of resolution or different types of experiments. Comparison and agreement between structures from different methods validates their significance and can show us the underlying thermodynamical principles behind the phenomena of interest [14].

Such datasets consist of a large number of structures, where each structure represents a point in a very high-dimensional space ($\mathbb{R}^{(3\times N_{\mathrm{atom}})-3}$). Direct visualization and interpretation of such high-dimensional data are impossible; one way to overcome this problem is to create a low-dimensional map of the configurational or conformational phase space [15]. This map can then be used as a basis for clustering of similar structures and therefore for state identification. A common way to reduce the phase space dimensionality is to find a set of descriptors which are internal, i.e. independent of the absolute atom positions in space. These descriptors are called collective variables (CVs) and need to be carefully chosen since they provide the essential basis for further characterization. Often the number of CVs required to capture the characteristics of complex structures and structural transitions is still relatively high, i.e. in this space, the sampling of configurations can neither be easily visualized nor can its essential features be grasped without further analysis. This problem can be addressed by dimensionality reduction to obtain a low-dimensional representation [16,17].

Recently, we used the example of covalently linked Ub dimers (diUb) to introduce a method for characterization of the conformational space of multi-domain proteins [18]. We identified CVs that were ideally suited to capture the characteristics of protein–protein interfaces and were an ideal basis for subsequent dimensionality reduction. We were able to obtain a two-dimensional free energy landscape for each of the eight different Ub dimers from coarse-grained (CG) MD simulations and to quantitatively compare them with each other. With this method, we were also able to compare the CG results with experimental structural data and the sampling in atomistic simulations.

Here, we extend this method and demonstrate that it can also be very well used for the characterization of the configurational landscape of two proteins which are not bound and can therefore move freely relative to each other. We performed CG simulations on the microsecond time scale of two Ub proteins to obtain an equilibrated ensemble. We used a set of 144 CVs that describe how the proteins in the simulations are oriented towards each other—their relative configurations. With a nonlinear, multi-dimensional scaling-like dimensionality reduction method called Sketch-map, a two-dimensional representation of the configurational space described by these CVs was constructed. From this low-dimensional projection, we could obtain a free energy landscape which illustrates the non-covalent dimerization of two Ub proteins in solution. We were also able to compare the previously obtained configurational landscapes of covalently linked Ub dimers to two unlinked Ub. We can show that the formation of a covalent bond between two Ub proteins alters the free energy landscape of protein–protein contacts in a linkage-specific manner.

2. Results

2.1. Non-covalent dimerization

Two unlinked Ub proteins (2xUb) were simulated extensively with a modified CG MARTINI model as is shown in figure 1 (see Material and methods) [18,19]. To this end, 48 independent runs of 10 µs each were carried out which were started from four different initial configurations with well separated proteins. This amounts to a total simulation time of 480 µs. Here and throughout the rest of the paper, we do not account for any possible additional speed-up of CG dynamics, which is estimated for the MARTINI model to be a factor of 4–8 for lipid diffusion [20], but unknown for the protein–protein association and reorientation processes investigated here. For further details about set-up and post-processing, see Material and methods section. On this time scale, we observed non-covalent dimerization between the two Ub proteins but also frequent disintegration and re-formation of protein–protein interfaces in all simulations. Nevertheless, we observed a strong tendency to form contacts, as indicated by the centre of geometry distance distribution shown in figure 2. From these data, we determined that a fraction of approximately 80% of the structures are in an aggregated state (see Material and methods). Note that due to severe finite size effects (with only two Ub molecules in a comparatively small simulation box), no reliable estimate for a dissociation constant KD can be given. Assuming a Ub concentration in the simulations of approximately 130 mM (the volume of Ub was excluded in this calculation), an observation of 80% aggregated structures is in qualitative agreement with the experimental value of KD=4.9 mM estimated from NMR experiments [21]. Before we go into a detailed analysis of the contact states that are visited by this 2xUb system, we first qualitatively assess which regions of the protein surfaces most predominantly participate in contact formation. 
Figure 2 shows the two Ub chains coloured according to the number of contacts that the respective residues form with any residue of the other protein. Red colouring shows that in our simulations, most of the contacts were mediated by the flexible as well as the β-sheet region of Ub. Importantly, the colouring is comparable on both chains which indicates that the simulation is reasonably well converged. This general observation of β-sheet mediated aggregation is well known for Ub [21].

Figure 2.

(a) Centre of geometry distance distribution between two Ub proteins obtained from CG simulations. (b) Secondary structure and structural features of Ub. Colouring is according to the number of contacts to the other chain. A very intense red colour corresponds to a high contact count. (Online version in colour.)

In the next step, we will analyse the structural ensemble that was obtained from the simulations in more detail. To obtain a deeper insight into the visited phase space of mutual protein–protein configurations and to cluster the data and identify states, we will use a recently introduced method that relies on a set of collective variables tailored to characterize such contact interfaces and a nonlinear dimensionality reduction technique [18]. For covalently linked diUb, residue-wise minimum distances (RMD) between the two subunits have been found to be a very good set of CVs that are able to capture the essential molecular-level features of non-covalent subunit–subunit contacts, i.e. that allow one to characterize protein–protein interactions and protein–protein interfaces. Transferring this approach to unlinked proteins means that any simulated structure (here two Ub units of 72 amino acid residues each; the highly flexible residues 73–76 of Ub were not included in analysis) is described by a vector of 144 distance values. These were determined by calculating for each of the 72 Cα atoms in one protein the minimum distance to any Cα atom in the respective other protein and vice versa.
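The construction of the RMD vector described above can be sketched in a few lines. This is a minimal illustration, not the analysis code used in the study: the function names `min_distances` and `rmd_vector` and the toy coordinates are assumptions made for this example, and real input would consist of 72 positions per chain.

```python
import math

def min_distances(coords_a, coords_b):
    """For each position in coords_a, return the distance to the nearest position in coords_b."""
    return [min(math.dist(a, b) for b in coords_b) for a in coords_a]

def rmd_vector(coords_a, coords_b):
    """Concatenate the minimum distances of chain A (first half) and chain B (second half)."""
    return min_distances(coords_a, coords_b) + min_distances(coords_b, coords_a)

# Toy example with three positions per chain (in nm); hypothetical values.
chain_a = [(0.0, 0.0, 0.0), (0.4, 0.0, 0.0), (0.8, 0.0, 0.0)]
chain_b = [(0.0, 1.0, 0.0), (0.4, 3.0, 0.0), (0.8, 5.0, 0.0)]

rmd = rmd_vector(chain_a, chain_b)
print(len(rmd))  # twice the number of residues per chain
```

With 72 residues per chain, the same function returns the 144-dimensional vector used as input for the dimensionality reduction.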

Thus, for simulated conformations extracted every 100 ps, the 144-dimensional RMD vector was calculated. Note that although the two chains are equivalent in the present case (without covalent linkage), they become pseudo ordered in CV space (distances of chain A are represented by the first half of the RMD vector and those of chain B by the second half). This set of CVs contains information about the distance between the proteins, about the residues that are part of the contacting interface (if a contact is formed) and—implicitly—about the relative orientation of the proteins with respect to each other. Following the approach in [18], those CVs were then used for dimensionality reduction with Sketch-map to obtain a two-dimensional representation. The choice of CVs, i.e. the internal coordinates at an intermediate level of dimensionality, has an enormous impact on how well the subsequent projection to a low-dimensional map works [16,22]. One may easily obtain a very fragmented map where similar structures are subdivided into pseudo-clusters or one may obtain a map that does not well separate different states. For the given problem, the RMD vector proved to be very suitable. Three hundred representative conformations were selected from the 2xUb ensemble as so-called landmarks to construct the Sketch-map [22]. After that, all other data points were projected into the map that was spanned based on those landmarks.
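To give an impression of the nonlinear distance transformation on which Sketch-map is built, the following sketch evaluates a sigmoid switching function of the form commonly given for this method. The functional form is quoted here as an assumption, and the parameter values `sigma`, `a` and `b` are purely illustrative; they are not the settings used for the maps in this study.

```python
def sketchmap_sigmoid(r, sigma, a, b):
    """Sigmoid applied to a pairwise distance r.

    sigma sets the distance that is mapped to 0.5; a and b control how
    quickly short and long distances saturate towards 0 and 1.
    """
    return 1.0 - (1.0 + (2.0 ** (a / b) - 1.0) * (r / sigma) ** a) ** (-b / a)

# Hypothetical parameters chosen only for illustration.
sigma, a, b = 1.0, 2.0, 6.0
for r in (0.0, 0.5, 1.0, 2.0, 10.0):
    print(r, sketchmap_sigmoid(r, sigma, a, b))
```

Distances well below `sigma` are compressed towards 0 and distances well above it towards 1, so the projection is mainly asked to reproduce distances of intermediate size.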

From the resulting two-dimensional probability distribution, an estimate of the configurational free energy landscape of the 2xUb system was obtained by Boltzmann inversion (figure 3a). As in the case of covalently linked diUb, we obtain a circular projection of the simulated structures in which the different regions can be linked to certain geometrical attributes. Configurations with particularly large interdomain distances (no or very little protein–protein contact) are found on the left-hand side of the map and are clearly separated from the aggregated structures on the right. This is illustrated by the small map in figure 3b(i) (green colouring), which is coloured according to the protein–protein centre of geometry distance. Figure 3 also shows representative structures for various regions of the map where proteins do not form many contacts. These snapshots illustrate that configurations with different relative orientations are well separated, i.e. as in the previous study of covalently linked diUb, the circular positioning on the map indicates certain protein–protein orientations. The other two small panels in figure 3 show which regions on the map are characterized by certain types of contact interfaces: 3b(ii) (orange colouring) shows where conformations with a high number of β-sheet contacts are located and 3b(iii) (purple colouring) shows where aggregates are dominated by contacts through the α-helices. Most importantly, the regions where contacts are formed via the β-sheet or α-helical regions of the protein surfaces appear well separated. Possible overlaps would then refer to structures where one protein interacts via its β-sheet and the other one via its α-helix. This figure already demonstrates the power of this analysis, which allows one to identify interesting features within the simulated ensembles and to extract the corresponding structures.
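Boltzmann inversion of a two-dimensional histogram can be sketched as follows. The function name `free_energy_2d`, the square grid, the temperature of 300 K and the toy data are assumptions made for this illustration; the zero of the free energy is arbitrary and is not shifted here.

```python
import math

K_B = 0.008314462618  # molar gas constant in kJ mol^-1 K^-1

def free_energy_2d(points, n_bins, lo, hi, temperature=300.0):
    """Bin 2D points on an n_bins x n_bins grid over [lo, hi] and return F = -kT ln(P).

    Points are assumed to lie inside [lo, hi] in both dimensions.
    Empty bins have no defined free energy and are returned as None.
    """
    width = (hi - lo) / n_bins
    counts = [[0] * n_bins for _ in range(n_bins)]
    for x, y in points:
        i = min(int((x - lo) / width), n_bins - 1)
        j = min(int((y - lo) / width), n_bins - 1)
        counts[i][j] += 1
    total = len(points)
    kt = K_B * temperature
    return [[-kt * math.log(c / total) if c else None for c in row] for row in counts]

# Toy projection: three points in one bin, one point in another (hypothetical data).
points = [(0.1, 0.1)] * 3 + [(0.9, 0.9)]
landscape = free_energy_2d(points, n_bins=2, lo=0.0, hi=1.0)
print(landscape)
```

The bin holding three times as many points lies lower in free energy by kT ln 3, which is the relation that turns populations on the map into the landscape shown in figure 3.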

Figure 3.

Free energy landscape of 2xUb (a) obtained from projected RMD data. Structures selected from certain points of the projection were back-mapped to the atomistic level in order to display the protein structure in a cartoon representation. Chain A (blue sphere) is aligned in all configurations. (b) Maps coloured according to the mean value of a certain property after structures were binned into histograms. (i) Centre of geometry distance between chain A and B; (ii) contact count to the other protein evaluated for residues involved in a β-sheet motif; (iii) contact count to the other protein evaluated for residues involved in an α-helix motif. (Online version in colour.)

The two-dimensional free energy landscape in figure 3 shows a relatively large shallow area with non-aggregated (left half of map) or only loosely aggregated conformations. Due to the finite simulation box, the two Ub proteins can have a maximum centre of geometry distance of about 8 nm (figure 2). In order to obtain a meaningful projection in which it is possible to distinguish different aggregated structures, it was not necessary to exclude conformations with very large distances from the projection and from the landmark selection. We point out that such a restriction of the analysis to a certain maximum distance might be an option for other set-ups with larger simulation boxes to improve the outcome of the dimensionality reduction. Here, we demonstrate that it is possible to obtain a low-dimensional representation which differentiates between high distances and orientation at the same time. Therefore, one could use this analysis method to study long range protein interactions such as electrostatic effects that might mediate aggregation [23].

In the following, we will look more closely at the highly populated region which represents aggregated structures in the 2xUb map (i.e. the area in figure 3 that is dominated by red, yellow and light green colours). Figure 4 shows that this region is separated into several low-free-energy states which are connected by shallow transition areas. Representative structures of several states are shown in such a way that chain A is always oriented the same way, in order to visualize how well the map separates aggregates with different relative orientations and contact interfaces. Furthermore, structure bundles are shown as examples for four states, again with all structures aligned on chain A. These bundles show that the simulated structures which are projected into the same region on the map are indeed very similar, i.e. that the two-dimensional map is a good basis for clustering and state identification.

Figure 4.

Highly populated region of the free energy landscape of 2xUb (zoom into figure 3). Representative structures were selected from local minima and back-mapped to the atomistic level in order to display the protein structure in a cartoon representation. Matching pairs of structures which are very similar if the chain ordering is interchanged are marked by matching geometric shapes (black/orange). A superposition of these configurations is shown in figure 5. Structure bundles show CG structures around selected minima (black boxes). In the bundles, structures are aligned to chain A, represented by CG backbone beads which are connected by lines, and coloured according to the secondary structure of Ub. (Online version in colour.)

As mentioned above, the CVs which were used to describe each conformation consist of 72 RMD values for chain A followed by 72 RMD values for chain B. This introduces an arbitrary ordering of the chains in CV space, although they are in principle interchangeable. Thus, in a perfect sampling of configuration space, the ensemble of observed RMD vectors exhibits a symmetry with respect to their upper and lower halves. It turns out that in consequence, the map exhibits a mirror symmetry with a symmetry axis which runs approximately horizontally in the middle of figure 4 (note that the fact that the axis runs horizontally is coincidental). The symmetry of the map is to be expected since the Sketch-map algorithm relies on a metric which (qualitatively) preserves distances between pairs of structures. Certain imperfections in the symmetry are due to the limited number of landmark structures from which the map is constructed. Interestingly, several states lie directly on the middle axis of the projection which separates the two halves, and consequently these structures are symmetric themselves. This can be seen clearly in the cluster representatives shown in the top row of figure 4, in particular in the second representative from the left. As a consequence of the mirror symmetry, configurations from two corresponding states in the upper and lower halves of the map are indeed similar if one reorders the chains accordingly. We marked such pairs of structures with matching geometric shapes in figure 4. These structures are shown again in figure 5 but now chain B of the structure from the upper half of the map (marked in orange) is aligned with chain A of the structure from the lower half of the map (marked in black). One sees that the positioning of the respective other, unaligned chains is remarkably similar. This result demonstrates that the configurational phase space of the two proteins is well sampled and that the projection and state identification are robust.
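The origin of the mirror symmetry can be illustrated with a small sketch. Relabelling the chains corresponds to exchanging the two halves of the RMD vector, and the Euclidean distance between two structures is unchanged if both are relabelled. The helper names and the toy vectors (two residues per chain) are assumptions made for this example, and the Euclidean distance is used here only as a simple stand-in for the distance in CV space.

```python
import math

def swap_chains(rmd):
    """Exchange the chain A and chain B halves of an RMD vector."""
    half = len(rmd) // 2
    return rmd[half:] + rmd[:half]

def euclidean(u, v):
    """Euclidean distance between two RMD vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Toy RMD vectors with two residues per chain (hypothetical values, in nm).
u = [0.5, 1.0, 2.0, 0.7]
v = [0.6, 0.9, 1.5, 1.1]

d_original = euclidean(u, v)
d_swapped = euclidean(swap_chains(u), swap_chains(v))
print(d_original, d_swapped)

# A structure whose two halves are identical is its own mirror image.
symmetric = [0.5, 1.0, 0.5, 1.0]
print(swap_chains(symmetric) == symmetric)
```

Because all pairwise distances are preserved under the exchange, a distance-preserving projection of a well-sampled ensemble places each structure and its relabelled counterpart at mirrored positions, with self-symmetric structures on the axis.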

Figure 5.

Illustration of the equivalence of low free energy states in the upper and lower half of the configurational landscape shown in figure 4. All six 2xUb structures (in cartoon representation) were already shown in figure 4 (marked by the black and orange geometric shapes). The black structures are oriented as in figure 4 (chain A positioned at the bottom and aligned to the same orientation). For the orange structures, chain B was aligned to chain A of the corresponding black structure. (Online version in colour.)

2.2. Impact of linkage

Now that we have a means to characterize the configurations/aggregate states of two Ub proteins, we can use this tool to investigate the impact of linking the two chains covalently (diUb) together. How does this linkage affect the sampled configurational space and the Ub–Ub contact interfaces? How does this depend on the type of linkage? In this section, we will show that the analysis with low-dimensional Sketch-maps and RMDs as CVs is very well suited to address these questions.

Since one of the seven lysine side chains or the N-terminal methionine can be linked to the C-terminal glycine of a second Ub, eight differently linked diUb types are possible. According to the common convention, we will call these chains proximal and distal, respectively, and denote the different diUb chains by their linking residues as M1, K6, K11, K27, K29, K33, K48 and K63. By covalently connecting the two protein chains, one introduces a constraint which should reduce the conformational space [14]. Therefore, large distances and some relative orientations between the two chains are not possible any more. In [18] analyses of simulations of all differently linked diUb have been presented. Simulations had been started from ‘open’ conformations, where the two Ub moieties were positioned in a way that no interdomain contacts were present. In total 12 independent CG simulations of 10 μs had been performed for each linker, summing up to 960 μs for all dimers. These data will be now used to address the above questions.

In a first step, we investigate the overall impact of covalent dimerization independent of the linkage type. Note that while the different linkage positions are distributed over the surface on the proximal chain, the distal chain is always connected via its C-terminal region with the second moiety. The effect of this asymmetry is nicely reflected in the distribution of protein–protein contacts over the surfaces of the two subunits which was analysed in an analogous fashion as in figure 2. Figure 6 shows that on the distal chain contacts are located at the β-sheet region around the C-terminus, as one would expect due to the covalent connection in this region. By contrast, the distribution of contacts on the proximal chain (averaged over all linkage types) is comparable with unlinked 2xUb, i.e. most contacts are mediated through the flexible and the β-sheet regions. A further differentiation of the linkages will be done below, using the conformational maps.

Figure 6.

Illustration of the distribution of subunit–subunit contacts over the protein surface of diUb (averaged over all linkage types). Colouring is according to the number of contacts to the respective other subunit comparable with figure 2. (Online version in colour.)

Next we exploit the potential of low-dimensional maps to qualitatively and even quantitatively compare the conformational and configurational sampling in different simulations, e.g. independent simulations of the same system with the same model, simulations at different levels of resolution (i.e. with an atomistic and a CG model as in [18]), or, as in the present case, simulations of different systems where we compare unlinked and differently linked two-Ub systems.

The only prerequisite for a seamless linking of different simulation levels/systems is that the CVs which underlie the dimensionality reduction have been suitably defined. Since RMDs have been defined based on Cα atoms in the structurally stable core of the two Ub moieties, the maps of 2xUb and diUb can be compared by either generating a new map from a combined set of landmarks or by projecting the diUb simulations into the 2xUb map. Starting from the assumption that simulations of two unlinked Ub provide a good representation of all relative configurations which can be adopted by two Ub, we used the latter approach. Figure 7 shows a free energy landscape (coloured areas) obtained from all simulations of all dimers projected using landmarks which were obtained from 2xUb simulations. One sees that diUb covers a certain subspace of the much larger 2xUb landscape (black outline in figure 7). Note that the diUb projection does not cover areas which are not present on the landscape of 2xUb and that there is no accumulation of probability density at the rims of the diUb area. This would have happened if the diUb simulations had sampled conformations and therefore RMD values which are not well represented by the landmarks from unlinked simulations. The area on the left-hand side of the 2xUb landscape, which represents structures with a large protein–protein distance, is not visited in simulations of diUb for obvious reasons. More remarkably, the landscape of diUb also lacks some regions which appeared aggregated in simulations of unlinked Ub. As discussed above, due to the character of the simulated system with two equivalent, unlinked chains, the projection turned out to be symmetric in the low energy region. This symmetry is broken by the introduction of a covalent linkage between the C-terminus of one Ub (distal subunit, chain A) and one of the lysine residues of the other Ub (proximal subunit, chain B): chain A and chain B are topologically not equivalent (see figure 1). 
As a consequence, diUb simulations do not sample the upper and the lower half of the landscape equally.

Figure 7.

Free energy landscape of diUb. Structures from all diUb simulations (eight different linkage types; coloured heatmap) were projected into the map of 2xUb (the outline of which is shown for comparison; outer rim: dashed line; areas where free energy of 2xUb is below −15, −20, −25 kJ mol−1: solid lines). (Online version in colour.)

This observation becomes drastically more pronounced if we analyse the differently linked dimers separately. Figure 8 (upper panels) shows the linkage dependent reduction of the configurational phase space compared to the sampling of two unlinked Ub in analogy to figure 7. One sees that the differently linked diUb cover very different areas of the map and that most linkages occupy regions on either the upper or the lower half of the map and on the horizontal axis in the middle (where symmetric dimer structures are found). This separation nicely illustrates how the topological constraint of a bond to one position on the proximal subunit affects the sampling of conformations in a linkage-specific manner.

Figure 8.

Free energy landscapes of differently linked diUb. Upper rows show linkage dependent maps of diUb (purple heatmaps) compared to 2xUb (grey rim). Lower rows show the highly populated area of Ub dimers. States of 2xUb (coloured heatmap taken from figure 4) in comparison with free energy minima of diUb (black contour lines). (Online version in colour.)

The lower panels in figure 8 show a zoom into the low free energy region that represents stable aggregated states. In this representation, the free energy minima of the different diUb simulations are indicated by contour lines which are overlaid on top of the 2xUb landscape (taken from figure 4). This allows for a direct comparison of diUb states to those of unlinked 2xUb which yields some very interesting observations. Some linkage types, i.e. K6, K11, K48 and K63, exhibit free energy minima only in regions where unlinked Ub also adopts stable structures. One could say those linkage types select and amplify conformations that are already present in unlinked Ub dimers or alter the shape of the respective state only slightly. By contrast, K27-linked dimers show a behaviour which has almost no similarity to unlinked Ub. It seems that this linkage position alters the relative position of two Ub chains on a large scale. For the remaining linkages (M1, K29 and K33), we find a combination of both features. They have some minima in common with 2xUb but adopt several linkage-specific states in addition.

To complete the comparison of Sketch-map projections originating from different simulations, one can attempt to quantify the similarity of different free energy landscapes. In [18], the so-called earth-mover distance (EMD) [24,25], widely used in the computer vision and image retrieval community, was found to be particularly well suited to quantitatively compare two-dimensional landscapes. EMDs were used as a similarity measure between all eight conformational landscapes of differently linked diUb and the results were visualized by arranging markers for the different linkages in a graph according to their respective pair-wise EMDs (again by employing a multi-dimensional scaling technique). As a result, patterns of (dis)similarities between the linkages could be identified that correlate well with diUb interaction behaviour found experimentally [26,27].
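For reference, the EMD between two normalized histograms can be written as a transport problem. The formulation below is the standard one for distributions of equal total weight; the symbols and the choice of a Euclidean ground distance between bin centres are assumptions made for this illustration, and the exact normalization used for the landscapes is the one defined in [18].

```latex
% P = \{(x_i, p_i)\} and Q = \{(y_j, q_j)\}: bin centres and weights of the two
% two-dimensional histograms, with \sum_i p_i = \sum_j q_j = 1.
\mathrm{EMD}(P, Q) = \min_{f_{ij} \ge 0} \sum_{i} \sum_{j} f_{ij}\, d_{ij},
\qquad d_{ij} = \lVert x_i - y_j \rVert ,
\quad \text{subject to} \quad
\sum_{j} f_{ij} = p_i , \qquad
\sum_{i} f_{ij} = q_j .
```

Here f_ij is the amount of probability moved from bin i of the first landscape to bin j of the second, so the EMD is small when two landscapes populate neighbouring regions of the map, even if their minima do not overlap bin by bin.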

An analogous analysis is now performed on the diUb maps generated in this study. Note that the appearance of the diUb conformational free energy landscapes is very different from those presented in [18], where the same diUb simulations had been projected using a set of landmark structures which were selected from these diUb simulations. Nevertheless, the pair-wise EMDs between the eight landscapes (figure 9a) are strikingly similar to the former results. In figure 9b, this is illustrated by showing the EMD-based two-dimensional arrangement of linkage types from this study (circles with full colours) alongside the data from [18] (circles with pale colours). Thus, even though the outcome of the dimensionality reduction with Sketch-map is very sensitive to the selection of landmarks and parameters, the results are quantitatively comparable even if landmarks from a different dataset are used, as long as these landmarks cover the whole phase space of the projected data and the projection parameters are well chosen. This is apparently the case for two Ub chains (and RMDs as CVs), where representative landmarks from unlinked 2xUb simulations generate a map on which the essential conformational features of the linked dimers are not only well represented but where important, distinctive characteristics are also well separated.

Figure 9.

(a) Relative pair-wise EMD between distributions of two-dimensional projections in figure 8. (b) Markers for all differently linked diUb positioned according to their relative pair-wise EMD by multi-dimensional scaling. For comparison, the pale markers show the result for the same analysis of the distributions in [18]. (Online version in colour.)

3. Conclusion

In this study, we showed that RMDs between different proteins or between subunits in multi-domain proteins can be used as CVs to characterize protein–protein aggregation and interface characteristics. Together with dimensionality reduction, we can obtain a representative position for each protein configuration in a two-dimensional map. Due to the nonlinear metric employed by the Sketch-map algorithm, small and large differences between structures (distances) are reproduced qualitatively while intermediate distances are accurately reproduced in the projection. This makes the approach particularly well suited to analyse states and transitions in molecular ensembles. The low-dimensional representation allows us to calculate distributions and free energy landscapes which can be compared with each other both qualitatively and quantitatively—the latter with the EMD metric, which opens up new possibilities for analysis and comparison of related molecular systems. The low-dimensional projections are also very useful for diverse visualization variants which help to understand relevant features of the projected ensemble. Since we use a dimensionality reduction technique which projects a single data point according to a fixed set of landmarks, new/additional configurations can be added to the projection. Thus, if the CVs underlying the dimensionality reduction allow this, configurations from different systems (e.g. 2xUb and diUb) as well as different levels of resolution can be compared on the same map [18]. We demonstrate the power of this method by characterizing the configurational ensemble of two unlinked Ub chains. The low-dimensional free energy landscape shows that two Ub proteins can adopt various states in solution which are dominated by contacts in the β-sheet and flexible region of Ub. 
Free energy landscapes of differently linked Ub dimers show that some linkage types tend to populate states which are already present in unlinked Ub, but in some cases new states do appear. In the future, this insight may be useful to investigate the possible mechanism of Ub ligation (forming the linking isopeptide bond). In this study, we demonstrated the impact of linkage formation; in a similar manner, one could investigate the effect of other factors such as mutations or post-translational modifications of certain residues on the structural ensemble of ubiquitin dimers. One remarkable result of this study is that representative structures from simulations of two unlinked Ub molecules generate a Sketch-map on which the diUb conformations are not only well represented but where important, distinctive characteristics are also well separated. As a consequence, the differences between the linkages are captured, as can be seen in the EMD-based classification of the linkage types.
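As an illustration of how a free energy landscape can be derived from a two-dimensional projection, the sketch below histograms the projected points and applies the standard relation F = −k_B T ln P. The function name, the number of bins and the shift of the minimum to zero are assumptions for illustration; the temperature of 300 K matches the simulations described in the methods.

```python
import numpy as np

K_B = 0.0083144626  # Boltzmann constant in kJ mol^-1 K^-1


def free_energy_2d(points, bins=50, temperature=300.0):
    """Return F = -kT ln(P / P_max) on a 2D grid; unvisited bins are +inf."""
    points = np.asarray(points, dtype=float)
    counts, x_edges, y_edges = np.histogram2d(points[:, 0], points[:, 1], bins=bins)
    prob = counts / counts.sum()
    with np.errstate(divide="ignore"):  # log(0) for empty bins is expected
        free_energy = -K_B * temperature * np.log(prob / prob.max())
    return free_energy, x_edges, y_edges
```

The most populated bin is assigned zero free energy, so all other values are relative to the global minimum of the map.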

The method presented here can be easily transferred to other systems where two domains perform motions relative to each other. The selection criteria for the Cα atoms which are included in the RMD calculation can be adjusted for other systems (here we excluded the flexible C-terminal region of Ub). For larger proteins, for example, one could limit the analysis to residues on the protein surface. On the other hand, more than just the Cα atoms could be included in RMD calculation to increase the sensitivity of projection where this is needed. Furthermore, RMD values could be used to investigate protein interaction with other (interaction) partners, such as small molecules or membranes. Also, an extension of this method to characterize more than two domains should be feasible.
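The following sketch shows how RMD values could be computed from a selection of Cα atoms. The precise definition is given in the Results section and in [18]; here it is assumed that the RMD of a residue is the distance from its Cα atom to the nearest selected Cα atom of the other chain. The function name and the input layout are hypothetical.

```python
import numpy as np


def residue_min_distances(ca_a, ca_b):
    """RMD vector for two chains given as (n_residues, 3) coordinate arrays.

    Assumption: one selected C-alpha position per residue, and both chains
    already placed as closest periodic images of each other.
    """
    ca_a = np.asarray(ca_a, dtype=float)
    ca_b = np.asarray(ca_b, dtype=float)
    # All pair-wise distances between selected atoms of chain A and chain B.
    pair_dist = np.linalg.norm(ca_a[:, None, :] - ca_b[None, :, :], axis=-1)
    rmd_a = pair_dist.min(axis=1)  # nearest atom of B for each residue of A
    rmd_b = pair_dist.min(axis=0)  # nearest atom of A for each residue of B
    return np.concatenate([rmd_a, rmd_b])
```

Adapting the method to another system then amounts to changing which atoms are passed in, for example only surface residues of a larger protein.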

We are convinced that the characterization of protein aggregates and contact interfaces presented here opens up a wide range of possibilities for simulation and analysis. The low-dimensional projections can be used for a systematic comparison of different methods or models (MD simulations with different force fields or at different levels of resolution) and they can be combined with different types of sampling and expansion schemes [28–30]. Last but not least, they can serve as an ideal basis for different clustering algorithms, such as k-means or density-based clustering.

4. Material and methods

4.1. Molecular dynamics simulations

All simulations were performed with the GROMACS simulation package v. 5 [31]. Temperature and pressure were kept at 300 K and 1 bar using the velocity rescaling thermostat and the Parrinello–Rahman barostat, respectively. The Verlet cut-off scheme was applied [32]. The default md (leap-frog) integrator was used with a 10 fs time step. The cut-off distance for short range van der Waals interactions was set to 1.1 nm and electrostatics were treated by the reaction field method with a cut-off distance of 1.1 nm and a dielectric constant of 15 [33].
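For orientation, the settings listed above can be collected in a GROMACS parameter (.mdp) file. The sketch below only renders the values stated in the text; the helper `render_mdp` is hypothetical, and options that a real run additionally needs (for example coupling groups, coupling time constants and compressibility) are deliberately left out because they are not given here.

```python
# Only values stated in the text; everything else is intentionally omitted.
MDP_SETTINGS = {
    "integrator": "md",               # leap-frog integrator
    "dt": 0.01,                       # ps, i.e. a 10 fs time step
    "cutoff-scheme": "Verlet",
    "tcoupl": "v-rescale",            # velocity rescaling thermostat
    "ref-t": 300,                     # K
    "pcoupl": "Parrinello-Rahman",
    "ref-p": 1.0,                     # bar
    "rvdw": 1.1,                      # nm
    "coulombtype": "Reaction-Field",
    "rcoulomb": 1.1,                  # nm
    "epsilon-r": 15,                  # relative dielectric constant
}


def render_mdp(settings):
    """Format a dictionary of options as 'key = value' lines."""
    width = max(len(key) for key in settings)
    lines = [f"{key:<{width}} = {value}" for key, value in settings.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_mdp(MDP_SETTINGS), end="")
```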

4.2. Coarse grained model

All CG simulations were performed using the MARTINI force field v. 2.2 with non-polarizable coarse-grained water as solvent [19,34]. As in [18], non-bonded interactions between water and protein bead types are increased compared to the original MARTINI force field. Structure and topology input files for CG simulations were created with a modified version of the martinize script, which allows the formation of an isopeptide bond. All CG simulations were performed using the ELNEDIN force field for bonded interactions [35]. An iteratively refined distance-based elastic network was used to reproduce the intrinsic dynamic properties of Ub correctly [18,36].

4.3. Set-up

Initial conformations for simulation of two unlinked Ub proteins were obtained by placing two copies of Ub (PDB-ID: 1UBQ [37]) well separated in a dodecahedron box with a box vector of 11 × 11 × 8 nm. The initial position of the two chains was chosen such that the distance between atoms on different chains was at least 2.5 nm. To obtain four different initial conformations, chain B was successively rotated by 90° around all three Cartesian axes. All ‘open’ initial conformations of diUb were constructed from two Ub units by placing the Ub moieties next to each other so that the C-terminal carboxyl group of the first chain and a lysine side chain of the second chain were close in space. For each linkage type, a second conformation was generated. For this, the relative orientation between the two chains was altered. For all simulations, diUb was placed in a 10 × 10 × 10 nm dodecahedron box. All structures were relaxed by energy minimization before and after solvation. Solvated systems were equilibrated in three short runs of 200 ps each: (1) under constant temperature (NVT) with position restrained backbone beads; (2) under constant temperature and pressure (NPT) with position restrained backbone beads; (3) NPT without any position restraints.
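The generation of the four starting orientations of chain B can be sketched as follows. This is one possible reading of the procedure: it assumes that the 90° rotations are applied cumulatively (x, then y, then z) and that the pivot is the geometric centre of the chain. Both choices, as well as the function names, are assumptions and not taken from the text.

```python
import numpy as np


def rotation_matrix(axis, angle_deg):
    """Right-handed rotation about a Cartesian axis ('x', 'y' or 'z')."""
    c, s = np.cos(np.radians(angle_deg)), np.sin(np.radians(angle_deg))
    return {
        "x": np.array([[1, 0, 0], [0, c, -s], [0, s, c]]),
        "y": np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]]),
        "z": np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]]),
    }[axis]


def starting_orientations(coords_b):
    """Unrotated chain B plus three copies after successive 90 degree turns."""
    coords = np.asarray(coords_b, dtype=float)
    centre = coords.mean(axis=0)  # pivot: geometric centre (assumption)
    orientations = [coords]
    for axis in "xyz":
        # Rotate the previous orientation about the pivot, then move it back.
        rotated = (orientations[-1] - centre) @ rotation_matrix(axis, 90.0).T + centre
        orientations.append(rotated)
    return orientations
```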

4.4. Post processing of structures

Structures of two unlinked Ub proteins had to be prepared for analysis after simulation. In particular, periodic boundary conditions were removed such that distances between the two not covalently linked proteins are always calculated between the two closest periodic images. In a first step, jumps between periodic images were removed from the trajectories. Next, chain A was positioned together with its closest periodic image of chain B inside the simulation box using gmx cluster. Finally, the two proteins were placed, in the relative orientation thus obtained, in the middle of a much larger cubic box (60 × 60 × 60 nm) to avoid later processing errors. This was necessary since otherwise the chains can (seemingly) interact with each other in two directions across the periodic boundaries. For some conformations, this leads to wrong results in the minimum distance calculation.
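The idea of pairing chain A with its closest periodic image of chain B can be illustrated with a brute-force search over neighbouring images. This is not the GROMACS-based workflow described above but a simplified stand-in; it assumes that both chains are already whole, that the box is given as a 3 × 3 array of box vectors, and that the 27 nearest images are sufficient. The function name is hypothetical.

```python
import itertools

import numpy as np


def closest_image(coords_a, coords_b, box):
    """Shift chain B by whole box vectors so that it lies closest to chain A.

    box: 3x3 array whose rows are the box vectors (nm).
    Returns the shifted coordinates of B and the minimum inter-chain distance.
    """
    coords_a = np.asarray(coords_a, dtype=float)
    coords_b = np.asarray(coords_b, dtype=float)
    box = np.asarray(box, dtype=float)
    best_dist, best_coords = np.inf, coords_b
    for shift in itertools.product((-1, 0, 1), repeat=3):
        image = coords_b + np.asarray(shift) @ box  # translate by box vectors
        dist = np.linalg.norm(
            coords_a[:, None, :] - image[None, :, :], axis=-1
        ).min()
        if dist < best_dist:
            best_dist, best_coords = dist, image
    return best_coords, best_dist
```

Once the closest image has been selected, the pair can be moved into a much larger box without changing any inter-chain distance.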

4.5. Sketch-map

Sketch-map v. 3.0 was used [22]. Based on the high-dimensional distance distribution of RMD from simulations of unlinked Ub, the sigmoid function parameters σ = 7.0, A = 12, B = 5, a = 2, b = 5 were chosen. Landmarks (N = 300) were selected from CG simulations of unlinked Ub only and their two-dimensional positions were optimized in 15 steps. This selection was done randomly in combination with the minmax option with γ = 0.01 to increase the number of representative structures from rarely sampled areas. We observed that the number of landmarks should not be too large since this increases the computational time for projection without giving significantly better results. Note that in some cases (different parameters for sigmoid function and landmark selection) the dimensionality reduction failed. Therefore, the selection of these parameters should be done with care and the resulting projection needs to be validated, for example by comparison of the clustered structures.
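To show what the sigmoid parameters control, the sketch below evaluates the Sketch-map transfer function in the form given in the Sketch-map literature [22], s(r) = 1 − (1 + (2^(a/b) − 1)(r/σ)^a)^(−b/a). The pair (A, B) is used for high-dimensional distances and (a, b) for distances in the projection. The function name is hypothetical and the formula is reproduced from memory of [22], so it should be checked against the Sketch-map documentation before use.

```python
def sketchmap_sigmoid(r, sigma, a, b):
    """Sketch-map transfer function; equals 0 at r = 0 and 0.5 at r = sigma."""
    return 1.0 - (1.0 + (2.0 ** (a / b) - 1.0) * (r / sigma) ** a) ** (-b / a)


if __name__ == "__main__":
    SIGMA = 7.0
    for r in (1.0, 7.0, 20.0):
        high_dim = sketchmap_sigmoid(r, SIGMA, 12, 5)  # A = 12, B = 5
        low_dim = sketchmap_sigmoid(r, SIGMA, 2, 5)    # a = 2, b = 5
        print(f"r = {r:5.1f}  high-dim: {high_dim:.3f}  low-dim: {low_dim:.3f}")
```

Distances well below σ are mapped close to 0 and distances well above σ close to 1, so the stress function is dominated by distances around σ. This is the reason why intermediate distances are reproduced most faithfully.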

4.6. Analysis of protein–protein contacts and residue-wise minimum distances

The fraction of aggregated structures and the residue-wise contacts were calculated from RMD values, which are described in more detail in the Results section and in [18]. Two unlinked Ub chains were counted as aggregated if at least one of the minimum distances was below 1.0 nm. The same criterion was applied to count residue-wise contacts, which were then scaled and used for colouring of the Ub structures in figures 2 and 6. For the β-sheet and α-helix contact counts in figure 3, residues were assigned according to the most common secondary structure motif observed in atomistic simulations of Ub.
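The aggregation criterion and the contact counting can be written compactly as below. The input is assumed to be an array with one row of RMD values per frame; the function name and the scaling of contact counts by their maximum are assumptions, since the scaling used for the figures is not specified here.

```python
import numpy as np

CONTACT_CUTOFF = 1.0  # nm, threshold stated in the text


def aggregation_stats(rmd, cutoff=CONTACT_CUTOFF):
    """Fraction of aggregated frames and residue-wise contact counts.

    rmd: array of shape (n_frames, n_values), one RMD vector per frame.
    """
    rmd = np.asarray(rmd, dtype=float)
    in_contact = rmd < cutoff                  # strictly below the cutoff
    aggregated = in_contact.any(axis=1)        # at least one contact in the frame
    fraction_aggregated = aggregated.mean()
    contact_counts = in_contact.sum(axis=0)    # contacts per residue over all frames
    # Scaling by the maximum count is an assumption made for colouring purposes.
    scaled = contact_counts / max(contact_counts.max(), 1)
    return fraction_aggregated, contact_counts, scaled
```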

4.7. Miscellaneous

CG diUb structures were back-mapped with BACKWARD [38] to obtain atomistic representations of simulated conformations. All figures were created using Python v. 3.5 and Matplotlib v. 2.2.2. As in [18], two-dimensional distributions were quantitatively compared using the EMD algorithm as it is implemented in Pyemd v. 0.5.1 [24]. Relative pair-wise EMDs between two-dimensional distributions of diUb simulations are shown in figure 9a. These distances were then used together with metric multi-dimensional scaling (implemented in sklearn.manifold.MDS) to image the relationship between linkage types in two dimensions (figure 9b). As a starting point for optimization, positions from [18] were used.
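The arrangement of markers from a matrix of pair-wise distances can be illustrated with classical (Torgerson) multi-dimensional scaling. Note that this is a swapped-in, closed-form variant chosen to keep the example short and deterministic; the analysis described above used the iterative metric MDS of sklearn.manifold.MDS, started from the positions in [18], and the two approaches generally give different layouts when the distances cannot be embedded exactly in two dimensions. The function name is hypothetical.

```python
import numpy as np


def classical_mds(dist, n_components=2):
    """Embed points from a symmetric distance matrix via classical MDS."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    centring = np.eye(n) - np.ones((n, n)) / n
    # Double centring turns squared distances into a Gram (inner product) matrix.
    gram = -0.5 * centring @ (dist ** 2) @ centring
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:n_components]  # largest eigenvalues first
    # Negative eigenvalues (non-Euclidean input) are clipped to zero.
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))
```

The resulting coordinates are only defined up to rotation, reflection and translation, which is why a common starting configuration is helpful when two such arrangements are to be overlaid.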

Acknowledgements

We thank Oleksandra Kukharenko for her assistance with dimensionality reduction and other aspects regarding the mathematical analysis and Christoph Globisch for providing tools for parametrization of the CG elastic network.

Data accessibility

This article has no additional data.

Authors' contributions

A.B. and C.P. designed the study. A.B. carried out the molecular dynamics simulations and analysis. C.P. coordinated the study. A.B. and C.P. wrote the manuscript and gave final approval for publication.

Competing interests

We declare we have no competing interests.

Funding

A.B. is funded by the DFG through SFB969 (project no. B09). We are grateful for computational resources of the bwHPC project (DFG grant INST no. 35/1134-1 FUGG and the state of Baden-Württemberg). We also gratefully acknowledge computing time granted by the John von Neumann Institute for Computing (NIC) and provided on the supercomputer JUWELS at Jülich Supercomputing Centre (JSC).

References

  • 1. Gavin AC et al. 2002. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 415, 141–147. (doi:10.1038/415141a)
  • 2. Changeux J-P, Edelstein S. 2011. Conformational selection or induced fit? 50 years of debate resolved. F1000 Biol. Rep. 3, 19. (doi:10.3410/B3-19)
  • 3. Shukla D, Hernández CX, Weber JK, Pande VS. 2015. Markov state models provide insights into dynamic modulation of protein function. Acc. Chem. Res. 48, 414–422. (doi:10.1021/ar5002999)
  • 4. Komander D, Rape M. 2012. The ubiquitin code. Annu. Rev. Biochem. 81, 203–229. (doi:10.1146/annurev-biochem-060310-170328)
  • 5. Dikic I, Wakatsuki S, Walters KJ. 2009. Ubiquitin-binding domains: from structures to functions. Nat. Rev. Mol. Cell Biol. 10, 659–671. (doi:10.1038/nrm2767)
  • 6. Ye Y et al. 2012. Ubiquitin chain conformation regulates recognition and activity of interacting proteins. Nature 492, 266–270. (doi:10.1038/nature11722)
  • 7. Phillips CL, Thrower J, Pickart CM, Hill CP. 2001. Structure of a new crystal form of tetraubiquitin. Acta Crystallogr. D: Biol. Crystallogr. 57, 341–344. (doi:10.1107/S090744490001800X)
  • 8. Cook WJ, Jeffrey LC, Kasperek E, Pickart CM. 1994. Structure of tetraubiquitin shows how multiubiquitin chains can be formed. J. Mol. Biol. 236, 601–609. (doi:10.1006/jmbi.1994.1169)
  • 9. Cook WJ, Jeffrey LC, Carson M, Chen Z, Pickart CM. 1992. Structure of a diubiquitin conjugate and a model for interaction with ubiquitin conjugating enzyme (E2). J. Biol. Chem. 267, 16467–16471.
  • 10. Kniss A et al. 2018. Chain assembly and disassembly processes differently affect the conformational space of ubiquitin chains. Structure 26, 249–258.e4. (doi:10.1016/j.str.2017.12.011)
  • 11. Bonomi M, Heller GT, Camilloni C, Vendruscolo M. 2017. Principles of protein structural ensemble determination. Curr. Opin. Struct. Biol. 42, 106–116. (doi:10.1016/j.sbi.2016.12.004)
  • 12. Dror RO, Dirks RM, Grossman J, Xu H, Shaw DE. 2012. Biomolecular simulation: a computational microscope for molecular biology. Annu. Rev. Biophys. 41, 429–452. (doi:10.1146/annurev-biophys-042910-155245)
  • 13. Krishnamani V, Globisch C, Peter C, Deserno M. 2016. Breaking a virus: identifying molecular level failure modes of a viral capsid by multiscale modeling. Eur. Phys. J.: Spec. Top. 225, 1757–1774. (doi:10.1140/epjst/e2016-60141-2)
  • 14. Wang Y, Tang C, Wang E, Wang J. 2014. PolyUbiquitin chain linkage topology selects the functions from the underlying binding landscape. PLoS Comput. Biol. 10, e1003691. (doi:10.1371/journal.pcbi.1003691)
  • 15. Zhuravlev PI, Materese CK, Papoian GA. 2009. Deconstructing the native state: energy landscapes, function, and dynamics of globular proteins. J. Phys. Chem. B 113, 8800–8812. (doi:10.1021/jp810659u)
  • 16. Ceriotti M, Tribello GA, Parrinello M. 2013. Demonstrating the transferability and the descriptive power of sketch-map. J. Chem. Theory Comput. 9, 1521–1532. (doi:10.1021/ct3010563)
  • 17. Comitani F, Rossi K, Ceriotti M, Sanz ME, Molteni C. 2017. Mapping the conformational free energy of aspartic acid in the gas phase and in aqueous solution. J. Chem. Phys. 146, 145102. (doi:10.1063/1.4979519)
  • 18. Berg A, Kukharenko O, Scheffner M, Peter C. 2018. Towards a molecular basis of ubiquitin signaling: a dual-scale simulation study of ubiquitin dimers. PLoS Comput. Biol. 14, e1006589. (doi:10.1371/journal.pcbi.1006589)
  • 19. Monticelli L, Kandasamy SK, Periole X, Larson RG, Tieleman DP, Marrink SJ. 2008. The MARTINI coarse-grained force field: extension to proteins. J. Chem. Theory Comput. 4, 819–834. (doi:10.1021/ct700324x)
  • 20. Ingólfsson HI, Lopez CA, Uusitalo JJ, de Jong DH, Gopal SM, Periole X, Marrink SJ. 2014. The power of coarse graining in biomolecular simulations. Wiley Interdiscip. Rev. Comput. Mol. Sci. 4, 225–248. (doi:10.1002/wcms.1169)
  • 21. Liu Z, Gong Z, Jiang WX, Yang J, Zhu WK, Guo DC, Zhang WP, Liu ML, Tang C. 2015. Lys63-linked ubiquitin chain adopts multiple conformational states for specific target recognition. eLife 4, 1–19. (doi:10.7554/eLife.05767)
  • 22. Ceriotti M, Tribello GA, Parrinello M. 2011. Simplifying the representation of complex free-energy landscapes using sketch-map. Proc. Natl Acad. Sci. USA 108, 13023–13028. (doi:10.1073/pnas.1108486108)
  • 23. Zhang Y, Vuković L, Rudack T, Han W, Schulten K. 2016. Recognition of poly-ubiquitins by the proteasome through protein refolding guided by electrostatic and hydrophobic interactions. J. Phys. Chem. B 120, 8137–8146. (doi:10.1021/acs.jpcb.6b01327)
  • 24. Pele O, Werman M. 2008. A linear time histogram metric for improved SIFT matching. In Computer vision: ECCV 2008 (eds D Forsyth, P Torr, A Zisserman). Lecture Notes in Computer Science, vol. 5304, pp. 495–508. Berlin, Germany: Springer. (doi:10.1007/978-3-540-88690-7_37)
  • 25. Pele O, Werman M. 2009. Fast and robust earth mover’s distances. In IEEE 12th Int. Conf. on Computer Vision, Kyoto, Japan, 29 September–2 October 2009, pp. 460–467. (doi:10.1109/ICCV.2009.5459199)
  • 26. Zhao X, Lutz J, Höllmüller E, Scheffner M, Marx A, Stengel F. 2017. Identification of proteins interacting with ubiquitin chains. Angew. Chem. Int. Ed. 56, 15764–15768. (doi:10.1002/anie.201705898)
  • 27. Zhang X, Smits AH, van Tilburg GBA, Jansen PWTC, Makowski MM, Ovaa H, Vermeulen M. 2017. An interaction landscape of ubiquitin signaling. Mol. Cell 65, 941–955.e8. (doi:10.1016/j.molcel.2017.01.004)
  • 28. Tribello GA, Ceriotti M, Parrinello M. 2012. Using sketch-map coordinates to analyze and bias molecular dynamics simulations. Proc. Natl Acad. Sci. USA 109, 5196–5201. (doi:10.1073/pnas.1201152109)
  • 29. Zheng W, Rohrdanz MA, Clementi C. 2013. Rapid exploration of configuration space with diffusion-map-directed molecular dynamics. J. Phys. Chem. B 117, 12769–12776. (doi:10.1021/jp401911h)
  • 30. Kukharenko O, Sawade K, Steuer J, Peter C. 2016. Using dimensionality reduction to systematically expand conformational sampling of intrinsically disordered peptides. J. Chem. Theory Comput. 12, 4726–4734. (doi:10.1021/acs.jctc.6b00503)
  • 31. Abraham MJ, Murtola T, Schulz R, Páll S, Smith JC, Hess B, Lindahl E. 2015. GROMACS: high performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX 1–2, 19–25. (doi:10.1016/j.softx.2015.06.001)
  • 32. Páll S, Hess B. 2013. A flexible algorithm for calculating pair interactions on SIMD architectures. Comput. Phys. Commun. 184, 2641–2650. (doi:10.1016/j.cpc.2013.06.003)
  • 33. de Jong DH, Baoukina S, Ingólfsson HI, Marrink SJ. 2016. Martini straight: boosting performance using a shorter cutoff and GPUs. Comput. Phys. Commun. 199, 1–7. (doi:10.1016/j.cpc.2015.09.014)
  • 34. de Jong DH, Singh G, Bennett WF, Arnarez C, Wassenaar TA, Schäfer LV, Periole X, Tieleman DP, Marrink SJ. 2013. Improved parameters for the martini coarse-grained protein force field. J. Chem. Theory Comput. 9, 687–697. (doi:10.1021/ct300646g)
  • 35. Periole X, Cavalli M, Marrink SJ, Ceruso MA. 2009. Combining an elastic network with a coarse-grained molecular force field: structure, dynamics, and intermolecular recognition. J. Chem. Theory Comput. 5, 2531–2543. (doi:10.1021/ct9002114)
  • 36. Globisch C, Krishnamani V, Deserno M, Peter C. 2013. Optimization of an elastic network augmented coarse grained model to study CCMV capsid deformation. PLoS ONE 8, e60582. (doi:10.1371/journal.pone.0060582)
  • 37. Vijay-Kumar S, Bugg CE, Cook WJ. 1987. Structure of ubiquitin refined at 1.8 Å resolution. J. Mol. Biol. 194, 531–544. (doi:10.1016/0022-2836(87)90679-6)
  • 38. Wassenaar TA, Pluhackova K, Böckmann RA, Marrink SJ, Tieleman DP. 2014. Going backward: a flexible geometric approach to reverse transformation from coarse grained to atomistic models. J. Chem. Theory Comput. 10, 676–690. (doi:10.1021/ct400617g)


Articles from Interface Focus are provided here courtesy of The Royal Society
