MDplot: Visualise Molecular Dynamics

Christian Margreitter; Chris Oostenbrink

Author manuscript; available in PMC: 2017 Aug 24.

Published in final edited form as: R J. 2017 May 10;9(1):164–186.

PMCID: PMC5570379 EMSID: EMS73867 PMID: 28845302

Abstract

The MDplot package provides plotting functions that automate the visualisation of molecular dynamics simulation output. It is especially useful where plot generation is tedious because of complex file formats, or where a large number of plots must be generated. The supported graphs range from standard ones, such as RMSD/RMSF (root-mean-square deviation and root-mean-square fluctuation, respectively), to less common ones, such as thermodynamic integration analysis and hydrogen bond monitoring over time. Together, they address many commonly used analyses. In this article, we describe the functions of the MDplot package, give examples of the function calls, and show the associated plots. Plotting and data parsing are separated in all cases, i.e. the respective functions can be used independently. Thus, data manipulation and the integration of additional file formats are fairly easy. Currently, the loading functions support GROMOS, GROMACS, and AMBER file formats. Moreover, we also provide a Bash interface that allows simple embedding of MDplot into Bash scripts as the final analysis step.

Availability

The package can be obtained in the latest major version from CRAN (https://cran.r-project.org/package=MDplot) or in the most recent version from the project's GitHub page at https://github.com/MDplot/MDplot, where feedback is also most welcome. MDplot is published under the GPL-3 license.

Introduction

The amount of data produced by molecular dynamics (MD) engines such as GROMOS (Schmid et al., 2012; Eichenberger et al., 2011), GROMACS (Pronk et al., 2013), NAMD (Phillips et al., 2005), AMBER (Cornell et al., 1995), and CHARMM (Brooks et al., 2009) has been increasing constantly over recent years, mainly because hardware has become more powerful and cheaper. As a result, both the length and the sheer number of MD simulations (i.e. trajectories) have increased enormously. Even large sets of simulations (e.g., in the context of drug design) are attainable nowadays, which calls for automated processing of the resulting information.

In this respect, automated yet flexible visualisation of molecular dynamics data would be highly advantageous, both to spare the user repetitive tasks and to yield the ultimately desired result instantly (see Figure 1). Moreover, generating some of the graphs can be cumbersome, for example plotting the time series produced by a clustering program or a time series of hydrogen bonds. Such cases are therefore well suited to being handled by a plotting library. Attempts have been made in that direction, for example the package bio3d (Grant et al., 2006; Skjærven et al., 2014), which allows trajectories to be processed in terms of principal component analysis (PCA), RMSD, and RMSF calculations, as well as MDtraj (McGibbon et al., 2015) and Rknots (Comoglio and Rinaldi, 2012). However, to the best of our knowledge, there is currently no R package available that offers the wide range of plotting functions and engine support that is provided by MDplot. R is the natural choice for this undertaking because of both its power in data handling and its vast plotting abilities.

Figure 1.

The overall workflow typically applied in molecular dynamics simulations, beginning with a single PDB (Berman et al., 2000) structure as the input for the simulation and ending with the graphical representation of the data obtained. For large amounts of data, generating figures can become a tedious, highly repetitive task.

In the following sections we outline all of the plotting functions that are currently supported. For each function, we detail examples of the function calls based on the test data included in the package, the resulting plots, the return values, and a table of arguments. The respective code samples use the loading functions (reported below) to parse the input files located in the folder 'extdata', which allows immediate testing and provides format information to users. Currently, the package supports GROMOS, GROMACS, and AMBER file formats as input.¹ However, extensions in both format support and plotting functionality are planned.
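
The bundled example inputs can be located from within R. The following is a minimal sketch that relies only on base R's `system.file()` and `list.files()`; `bundledExampleFiles()` is a hypothetical helper name introduced here and is not part of MDplot. If the package is not installed, the helper simply returns an empty vector.

```r
# Hypothetical helper (not part of MDplot): list the example input files
# that a package ships in its 'extdata' folder.
bundledExampleFiles <- function(package = "MDplot") {
  # system.file() returns "" when the package or folder cannot be found
  extdataDir <- system.file("extdata", package = package)
  if (!nzchar(extdataDir)) {
    return(character(0))
  }
  list.files(extdataDir)
}

exampleFiles <- bundledExampleFiles("MDplot")
print(exampleFiles)
```

Each listed file can then be passed to the matching loading function via `system.file()`, which keeps example code independent of where the package happens to be installed.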

Plotting functions

The package currently offers 14 distinct plotting functions (Table 1), which cover many of the graphs that are commonly required. Although the package focuses on the visualisation of data, values that characterise the underlying data are also calculated where appropriate. For example, TIcurve() calculates the thermodynamic integration free-energy values including error estimates and the hysteresis between the integration curves. In many cases, the plotting functions return useful information on the data used, e.g., range, mean and standard deviation of curves.
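
To make the thermodynamic integration example concrete, the sketch below integrates synthetic ensemble averages of dH/dλ over λ with the trapezoidal rule, propagates per-point error estimates, and derives a hysteresis value. It is written in base R under stated assumptions: the trapezoidal rule, independent errors, and hysteresis taken as the absolute difference of the two free-energy estimates are choices of this sketch and not necessarily what `TIcurve()` does internally. The name `integrateTI()` and all numbers are made up for illustration.

```r
# Trapezoidal integration of <dH/dlambda> over lambda, with a simple
# propagation of the per-point error estimates (assumed independent).
integrateTI <- function(lambda, dHdl, error) {
  stopifnot(length(lambda) == length(dHdl), length(lambda) == length(error))
  widths <- diff(lambda)
  # Trapezoid weight of each point: half the width of its adjacent intervals
  weights <- (c(widths, 0) + c(0, widths)) / 2
  list(deltaG = sum(weights * dHdl),
       error  = sqrt(sum((weights * error)^2)))
}

# Synthetic forward and backward curves on the same lambda grid
lambda   <- c(0, 0.25, 0.5, 0.75, 1)
forward  <- integrateTI(lambda, c(-10, -4, 2, 6, 12), rep(0.4, 5))
backward <- integrateTI(lambda, c(-9, -4, 3, 6, 11), rep(0.4, 5))

# Hysteresis taken here as the absolute difference of the two estimates
hysteresis <- abs(forward$deltaG - backward$deltaG)
print(c(forward = forward$deltaG, backward = backward$deltaG,
        hysteresis = hysteresis))
```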

To provide simple access to these functions, they may be called from within a Bash script. Examples are provided at the end of the manuscript.

Table 1.

All plotting functions currently implemented in MDplot. Most functions accept a Boolean parameter (`barePlot`) that restricts the output to the plotting area only, i.e. stripped of any additional features such as axis labels.

Plot function	Description
`clusters()`	Summary of clustering over trajectories (RMSD based).
`clusters_ts()`	Time series of cluster populations (RMSD based).
`dssp()`	Secondary structure annotation plot (DSSP based).
`dssp_ts()`	Time series of secondary structure elements (DSSP based).
`hbond()`	Hydrogen bonds summary plot.
`hbond_ts()`	Time series of hydrogen bonds.
`noe()`	Nuclear-Overhauser-effect violation plot.
`ramachandran()`	Dihedral angle plot.
`rmsd()`	Root-mean-square deviation plot.
`rmsd_average()`	Average root-mean-square deviation plot.
`rmsf()`	Root-mean-square fluctuation plot.
`TIcurve()`	Thermodynamic integration curves.
`timeseries()`	General time series plot.
`xrmsd()`	Cross-RMSD plot (heat-map of RMSD values).

Table 2.

Arguments of the `clusters()` function.

Argument name	Default value	Description
`clusters`	none	Matrix with clusters: trajectories are given in row-wise, clusters in column-wise fashion as provided by `load_clusters()`, the associated loading function.
`clustersNumber`	`NA`	When specified, only these first clusters are shown.
`legendTitle`	`"trajectories"`	The title of the legend.
`barePlot`	`FALSE`	A Boolean indicating whether the plot is to be made without any additional information or not.
…	none	Additional arguments.
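
The matrix layout expected by `clusters()` (trajectories in rows, clusters in columns) and the effect of `clustersNumber` can be illustrated without the package. The sketch below uses synthetic populations and base R's `barplot()`; `selectFirstClusters()` is a hypothetical helper that mimics the documented meaning of `clustersNumber` and is not MDplot's implementation.

```r
# Synthetic cluster populations: 3 trajectories (rows) x 5 clusters (columns)
clusterMatrix <- matrix(c(40, 25, 20, 10, 5,
                          10, 50, 15, 20, 5,
                          30, 30, 30,  5, 5),
                        nrow = 3, byrow = TRUE,
                        dimnames = list(c("traj1", "traj2", "traj3"),
                                        as.character(1:5)))

# Hypothetical helper: keep only the first 'clustersNumber' clusters
selectFirstClusters <- function(clusters, clustersNumber = NA) {
  if (is.na(clustersNumber)) {
    return(clusters)
  }
  clusters[, seq_len(min(clustersNumber, ncol(clusters))), drop = FALSE]
}

shown <- selectFirstClusters(clusterMatrix, clustersNumber = 3)

# Grouped bar plot: one group per cluster, one bar per trajectory
barplot(shown, beside = TRUE, xlab = "clusters", ylab = "members",
        legend.text = rownames(shown),
        args.legend = list(title = "trajectories"))
```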

Table 3.

Arguments of the `clusters_ts()` function.

Argument name	Default value	Description
`clustersDataTS`	none	List of cluster information as provided by `load_clusters_ts()`, the associated loading function.
`clustersNumber`	`NA`	An integer specifying the number of clusters that is to be plotted.
`selectTraj`	`NA`	Vector of indices of trajectories that are plotted (as given in the input file).
`selectTime`	`NA`	Range of time in snapshots.
`timeUnit`	`NA`	Abbreviation of time unit.
`snapshotsPerTimeInt`	1000	Number of snapshots per time unit.
…	none	Additional arguments.
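
Several time-series functions share the `timeUnit` and `snapshotsPerTimeInt` arguments. The base R sketch below shows one plausible reading of how they interact with a snapshot range such as `selectTime`: snapshot indices are divided by the number of snapshots per time unit. The helper `snapshotsToTime()` and the synthetic cluster assignments are assumptions made for illustration only.

```r
# Hypothetical helper: convert snapshot indices to time, following the
# documented meaning of snapshotsPerTimeInt (snapshots per time unit)
snapshotsToTime <- function(snapshots, snapshotsPerTimeInt = 1000) {
  snapshots / snapshotsPerTimeInt
}

# Synthetic time series: the cluster each snapshot is assigned to
set.seed(1)
snapshots <- 1:4000
clusterOfSnapshot <- sample(1:3, length(snapshots), replace = TRUE)

# Restrict to a range given in snapshots, as selectTime is described to do
selectTime <- c(1000, 3000)
keep <- snapshots >= selectTime[1] & snapshots <= selectTime[2]

timeUnit <- "ns"
timeValues <- snapshotsToTime(snapshots[keep], snapshotsPerTimeInt = 100)

plot(timeValues, clusterOfSnapshot[keep], pch = ".", yaxt = "n",
     xlab = paste0("time [", timeUnit, "]"), ylab = "cluster")
axis(2, at = 1:3)
```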

Table 4.

Arguments of the `dssp()` function.

Argument name	Default value	Description
`dsspData`	none	Table containing information on the secondary structure elements. Can be generated by function `load_dssp()`.
`printLegend`	`FALSE`	If `TRUE`, a legend is printed on the right hand side of the plot.
`useOwnLegend`	`FALSE`	If `FALSE`, the names of the secondary structure elements are considered to be in default order.
`elementNames`	`NA`	Vector of names for the secondary structure elements.
`colours`	`NA`	A vector of colours that can be specified to replace the default ones.
`showValues`	`NA`	A vector of boundaries for the values.
`showResidues`	`NA`	A vector of boundaries for the residues.
`plotType`	`"dots"`	Either `"dots"`, `"curves"`, or `"bars"`.
`selectedElements`	`NA`	A vector of names of the elements selected.
`barePlot`	`FALSE`	Boolean, indicating whether the plot is to be made without any additional information.
…	none	Additional arguments.

Table 5.

Arguments of the `dssp_ts()` function.

Argument name	Default value	Description
`tsData`	none	List of lists, which are composed of a name (string) and a values table (x … snapshots, y … residues). Can be generated by `load_dssp_ts()`.
`printLegend`	`TRUE`	If `TRUE`, a legend is printed on the right hand side of the plot.
`timeBoundaries`	`NA`	A vector of boundaries for the time in snapshots.
`residueBoundaries`	`NA`	A vector of boundaries for the residues.
`timeUnit`	`NA`	If set, the snapshots are transformed into the respective time (depending on parameter `snapshotsPerTimeInt`).
`snapshotsPerTimeInt`	1000	Number of snapshots per respective `timeUnit`.
`barePlot`	`FALSE`	A Boolean indicating whether the plot is to be made without any additional information.
…	none	Additional arguments.
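
The nested input described for `tsData` (a list of lists, each holding a name and a table of snapshot and residue values) can be mocked up as follows. This is a sketch under assumptions: the element names `name` and `values`, the column names `snapshot` and `residue`, and the helper `restrictTsData()` are hypothetical and merely illustrate how `timeBoundaries` and `residueBoundaries` could narrow the data before plotting.

```r
# Synthetic input: one entry per secondary structure element, each with a
# name and a table of (snapshot, residue) occurrences
tsData <- list(
  list(name = "Alpha-Helix",
       values = data.frame(snapshot = c(1, 1, 2, 500, 900),
                           residue  = c(5, 6, 5, 6, 40))),
  list(name = "Beta-Strand",
       values = data.frame(snapshot = c(3, 600),
                           residue  = c(20, 21)))
)

# Hypothetical helper: keep only occurrences inside both boundary vectors
restrictTsData <- function(tsData, timeBoundaries, residueBoundaries) {
  lapply(tsData, function(element) {
    values <- element$values
    keep <- values$snapshot >= timeBoundaries[1] &
            values$snapshot <= timeBoundaries[2] &
            values$residue  >= residueBoundaries[1] &
            values$residue  <= residueBoundaries[2]
    list(name = element$name, values = values[keep, , drop = FALSE])
  })
}

restricted <- restrictTsData(tsData,
                             timeBoundaries    = c(1, 600),
                             residueBoundaries = c(1, 30))
occurrences <- vapply(restricted, function(element) nrow(element$values),
                      integer(1))
print(occurrences)
```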

Table 6.

Arguments of the `hbond()` function.

Argument name	Default value	Description
`hbonds`	none	Table containing the hydrogen bond information in columns "hbondID", "resDonor", "resDonorName", "resAcceptor", "resAcceptorName", "atomDonor", "atomDonorName", "atomH", "atomAcceptor", "atomAcceptorName", "percentage" (automatically generated by function `load_hbond()`).
`plotMethod`	`"residue-wise"`	Sets the level of detail of the hydrogen bond information displayed. Options are: `"residue-wise"`.
`acceptorRange`	`NA`	A vector specifying the range of acceptor residues.
`donorRange`	`NA`	A vector specifying the range of donor residues.
`printLegend`	`TRUE`	A Boolean enabling the legend.
`showMultipleInteractions`	`TRUE`	If `TRUE`, multiple interactions between the same residues are marked by a black circle around the coloured dot.
`barePlot`	`FALSE`	A Boolean indicating whether the plot is to be made without any additional information.
…	none	Additional arguments.
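
The residue range arguments and the notion of multiple interactions between the same residues can be sketched with a small synthetic table that uses a subset of the documented column names. The helper `filterHbonds()` is hypothetical and only approximates the documented behaviour of `donorRange` and `acceptorRange`; the values are invented.

```r
# Synthetic hydrogen bond summary using a subset of the documented columns
hbonds <- data.frame(
  hbondID     = 1:5,
  resDonor    = c(10, 10, 12, 30, 45),
  resAcceptor = c(14, 14, 16, 34, 49),
  percentage  = c(80, 35, 60, 15, 90)
)

# Hypothetical helper: keep bonds whose residues fall inside the ranges
filterHbonds <- function(hbonds, donorRange = NA, acceptorRange = NA) {
  keep <- rep(TRUE, nrow(hbonds))
  if (!all(is.na(donorRange))) {
    keep <- keep & hbonds$resDonor >= donorRange[1] &
                   hbonds$resDonor <= donorRange[2]
  }
  if (!all(is.na(acceptorRange))) {
    keep <- keep & hbonds$resAcceptor >= acceptorRange[1] &
                   hbonds$resAcceptor <= acceptorRange[2]
  }
  hbonds[keep, , drop = FALSE]
}

selected <- filterHbonds(hbonds, donorRange = c(1, 40),
                         acceptorRange = c(1, 40))

# Residue pairs connected by more than one hydrogen bond
pairKey <- paste(selected$resDonor, selected$resAcceptor, sep = "-")
multiplePairs <- names(which(table(pairKey) > 1))
print(multiplePairs)
```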

Table 7.

Arguments of the `hbond_ts()` function.

Argument name	Default value	Description
`timeseries`	none	Table containing the time series information (e.g., produced by `load_hbond_ts()`).
`summary`	none	Table containing the summary information (e.g., produced by `load_hbond()`).
`acceptorRange`	`NA`	A vector of acceptor residues.
`donorRange`	`NA`	A vector of donor residues.
`plotOccurences`	`FALSE`	Specifies whether the overall summary should be plotted on the right hand side.
`scalingFactorPlot`	`NA`	Used to manually set the scaling factor (if necessary).
`printNames`	`FALSE`	Enables human readable names rather than the hydrogen bond identifiers.
`namesToSingle`	`FALSE`	If `printNames` is `TRUE`, this flag selects one-letter residue codes instead of three-letter ones.
`printAtoms`	`FALSE`	Enables atom names in hydrogen bond identification on the y-axis.
`timeUnit`	`NA`	Specifies the time unit on the x-axis.
`snapshotsPerTimeInt`	1000	Specifies how many snapshots make up one time unit (see above).
`timeRange`	`NA`	A vector specifying a certain time range.
`hbondIndices`	`NA`	A list containing vectors to select hydrogen bonds by their identifiers.
`barePlot`	`FALSE`	A Boolean indicating whether the plot is to be made without any additional information.
…	none	Additional arguments.
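
The effect of `printNames` and `namesToSingle` on axis labels can be pictured with the standard amino acid codes. In the sketch below, the label format ("donor->acceptor") and the helpers `toSingleLetter()` and `hbondLabel()` are assumptions made for illustration; the labels MDplot actually prints may look different.

```r
# Standard three-letter to one-letter amino acid codes
threeToOne <- c(ALA = "A", ARG = "R", ASN = "N", ASP = "D", CYS = "C",
                GLN = "Q", GLU = "E", GLY = "G", HIS = "H", ILE = "I",
                LEU = "L", LYS = "K", MET = "M", PHE = "F", PRO = "P",
                SER = "S", THR = "T", TRP = "W", TYR = "Y", VAL = "V")

# Hypothetical helper: unknown residue names are returned unchanged
toSingleLetter <- function(residueName) {
  single <- unname(threeToOne[toupper(residueName)])
  ifelse(is.na(single), residueName, single)
}

# Hypothetical label format for one hydrogen bond
hbondLabel <- function(donorName, donorNumber, acceptorName, acceptorNumber,
                       namesToSingle = FALSE) {
  if (namesToSingle) {
    donorName    <- toSingleLetter(donorName)
    acceptorName <- toSingleLetter(acceptorName)
  }
  paste0(donorName, donorNumber, "->", acceptorName, acceptorNumber)
}

longLabel  <- hbondLabel("ASP", 12, "LYS", 16)
shortLabel <- hbondLabel("ASP", 12, "LYS", 16, namesToSingle = TRUE)
print(c(longLabel, shortLabel))
```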

Table 8.

Arguments of the `noe()` function.

Argument name	Default value	Description
`noeData`	none	Input matrix. Generated by function `load_noe()`.
`printPercentages`	`TRUE`	If `TRUE`, the violations will be reported in a relative manner (percent) rather than absolute numbers.
`colours`	`NA`	Vector of colours to be used for the bars.
`lineTypes`	`NA`	If `plotSumCurves` is `TRUE`, this vector might be used to specify the types of curves plotted.
`names`	`NA`	Vector to name the input columns (legend).
`plotSumCurves`	`TRUE`	If `TRUE`, the violations are summed up from left to right to show the overall behaviour.
`maxYAxis`	`NA`	Can be used to manually set the y-axis of the plot.
`printLegend`	`FALSE`	A Boolean indicating if legend is to be plotted.
…	none	Additional arguments.
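
The two flags `printPercentages` and `plotSumCurves` correspond to simple transformations of a violation histogram. The base R sketch below applies them to synthetic counts; expressing the percentages relative to the total count of this toy histogram is an assumption of the sketch, and the reference quantity used by `noe()` may differ.

```r
# Synthetic NOE violation histogram: number of violations per distance bin
violationBins  <- c(0.05, 0.10, 0.15, 0.20, 0.25)  # made-up bin values in nm
violationCount <- c(12, 6, 4, 2, 1)

# Relative numbers, in the spirit of printPercentages = TRUE
violationPercent <- 100 * violationCount / sum(violationCount)

# Running sum from left to right, in the spirit of plotSumCurves = TRUE
sumCurve <- cumsum(violationPercent)

# Bars for the individual bins, with the sum curve drawn on top
barCentres <- barplot(violationPercent, names.arg = violationBins,
                      ylim = c(0, 100), xlab = "violation [nm]",
                      ylab = "violations [%]")
lines(barCentres, sumCurve, type = "b")
```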

Table 9.

Arguments of the `ramachandran()` function.

Argument name	Default value	Description
`dihedrals`	none	Matrix with angles (two columns). Generated by function `load_ramachandran()`.
`xBins`	150	Number of bins used to plot (x-axis).
`yBins`	150	Number of bins used to plot (y-axis).
`heatFun`	`"norm"`	Function selector for calculation of the colour. The possibilities are either `"norm"` for linear calculation or `"log"` for logarithmic calculation.
`structureAreas`	`c()`	List of areas, which are plotted as black lines.
`plotType`	`"sparse"`	Type of plot to be used, either `"sparse"` (default, using function `hist2d()`), `"comic"` (own binning, supports very few datapoints), or `"fancy"` (3D, using function `persp()`).
`printLegend`	`FALSE`	A Boolean specifying whether a heat legend is to be plotted or not.
`plotContour`	`FALSE`	A Boolean specifying whether a contour should be added or not.
`barePlot`	`FALSE`	A Boolean indicating whether the plot is to be made without any additional information.
…	none	Additional arguments.
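
The roles of `xBins`, `yBins`, and `heatFun` can be illustrated by binning dihedral angle pairs by hand. The sketch below is a base R stand-in and not the binning used by `ramachandran()` itself: `binDihedrals()` is a hypothetical helper, the angles are synthetic, and adding 1 before taking the logarithm is an assumption made to avoid `log(0)` in empty bins.

```r
# Hypothetical helper: bin (phi, psi) pairs into an xBins x yBins grid
binDihedrals <- function(dihedrals, xBins = 150, yBins = 150,
                         heatFun = "norm") {
  xBreaks <- seq(-180, 180, length.out = xBins + 1)
  yBreaks <- seq(-180, 180, length.out = yBins + 1)
  counts <- table(cut(dihedrals[, 1], xBreaks, include.lowest = TRUE),
                  cut(dihedrals[, 2], yBreaks, include.lowest = TRUE))
  counts <- matrix(as.numeric(counts), nrow = xBins, ncol = yBins)
  if (heatFun == "log") {
    counts <- log(counts + 1)  # + 1 avoids log(0); assumption of this sketch
  }
  counts
}

# Synthetic angles scattered around a helical region
set.seed(3)
dihedrals <- cbind(phi = rnorm(500, mean = -60, sd = 15),
                   psi = rnorm(500, mean = -45, sd = 15))

heatNorm <- binDihedrals(dihedrals, xBins = 36, yBins = 36)
heatLog  <- binDihedrals(dihedrals, xBins = 36, yBins = 36, heatFun = "log")

binEdges <- seq(-180, 180, length.out = 37)
image(x = binEdges, y = binEdges, z = heatLog, xlab = "phi", ylab = "psi")
```

A logarithmic colour scale makes sparsely populated regions visible next to a dominant basin, which is the usual reason for offering both options.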

Table 10.

Arguments of the `rmsd()` function.

Argument name	Default value	Description
`rmsdData`	none	List of (alternating) indices and RMSD value vectors, as produced by `load_rmsd()`.
`printLegend`	`TRUE`	A Boolean which triggers the plotting of the legend.
`factor`	`1000`	A number specifying how many snapshots are within one `timeUnit`.
`timeUnit`	`"ns"`	Specifies the time unit.
`rmsdUnit`	`"nm"`	Specifies the RMSD unit.
`colours`	`NA`	A vector of colours used for plotting.
`names`	`NA`	A vector holding the names of the trajectories.
`legendPosition`	`"bottomright"`	Indicates the position of the legend: either `"bottomright"`, `"bottomleft"`, `"topleft"`, or `"topright"`.
`barePlot`	`FALSE`	A Boolean indicating whether the plot is to be made without any additional information.
…	none	Additional arguments.
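
The alternating list layout described for `rmsdData` (indices, values, indices, values, and so on) is easy to get wrong when constructing input by hand. The following base R sketch builds such a list from synthetic data and derives the kind of per-curve summary (range, mean, standard deviation) that plotting functions are said to return; `summariseCurves()` is a hypothetical helper and not part of MDplot.

```r
# Synthetic input: alternating snapshot indices and RMSD value vectors
set.seed(7)
rmsdData <- list(1:1000, 0.25 + rnorm(1000, sd = 0.02),
                 1:1000, 0.30 + rnorm(1000, sd = 0.03))

# Hypothetical helper: summarise every value vector (even list positions)
summariseCurves <- function(rmsdData) {
  valuePositions <- seq(2, length(rmsdData), by = 2)
  lapply(valuePositions, function(i) {
    values <- rmsdData[[i]]
    list(minValue  = min(values),
         maxValue  = max(values),
         meanValue = mean(values),
         sdValue   = sd(values))
  })
}

curveSummaries <- summariseCurves(rmsdData)
print(sapply(curveSummaries, function(s) round(s$meanValue, 3)))
```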

Table 11.

Arguments of the `rmsd_average()` function.

Argument name	Default value	Description
`rmsdInput`	none	List of snapshot and RMSD value pairs, as, for example, provided by loading function `load_rmsd()`.
`levelFactor`	`NA`	If there are many datapoints, this parameter may be used to plot only every `levelFactor`th datapoint in order to obtain a clean graph.
`snapshotsPerTimeInt`	1000	Number specifying how many snapshots make up one `timeUnit`.
`timeUnit`	`"ns"`	Specifies the time unit.
`rmsdUnit`	`"nm"`	Specifies the RMSD unit.
`maxYAxis`	`NA`	Can be used to manually set the y-axis of the plot.
`barePlot`	`FALSE`	A Boolean indicating whether the plot is to be made without any additional information.
…	none	Additional arguments.
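
Averaging several RMSD curves and thinning the result, as `levelFactor` is described to do, can be sketched in base R as follows. The synthetic curves, the use of the standard deviation as the measure of spread, and the exact thinning rule (every `levelFactor`th point starting at the first) are assumptions of this sketch rather than a description of `rmsd_average()` internals.

```r
# Synthetic RMSD curves of three trajectories on a common snapshot axis
set.seed(42)
snapshots <- 1:2000
rmsdCurves <- sapply(1:3, function(i) {
  0.2 + 0.05 * log(snapshots) / log(2000) + rnorm(2000, sd = 0.01)
})

# Average and spread over the trajectories at every snapshot
meanRmsd <- rowMeans(rmsdCurves)
sdRmsd   <- apply(rmsdCurves, 1, sd)

# Thinning in the spirit of levelFactor: keep every 50th datapoint
levelFactor <- 50
keep <- seq(1, length(snapshots), by = levelFactor)

# snapshotsPerTimeInt = 1000 turns snapshot indices into nanoseconds here
timeNs <- snapshots[keep] / 1000
plot(timeNs, meanRmsd[keep], type = "l",
     ylim = range(meanRmsd - sdRmsd, meanRmsd + sdRmsd),
     xlab = "time [ns]", ylab = "RMSD [nm]")
lines(timeNs, meanRmsd[keep] + sdRmsd[keep], lty = 2)
lines(timeNs, meanRmsd[keep] - sdRmsd[keep], lty = 2)
```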

Table 12.

Arguments of the `rmsf()` function.

Argument name	Default value	Description
`rmsfData`	none	List of (alternating) atom numbers and RMSF values, as, for example, produced by `load_rmsf()`.
`printLegend`	`TRUE`	A Boolean controlling the plotting of the legend.
`rmsfUnit`	`"nm"`	Specifies the RMSF unit.
`colours`	`NA`	A vector of colours used for plot.
`residuewise`	`FALSE`	A Boolean specifying whether atoms or residues are plotted on the x-axis.
`atomsPerResidue`	`NA`	If `residuewise` is `TRUE`, this parameter can be used to specify the number of atoms per residue for plotting.
`names`	`NA`	A vector of the names of the trajectories.
`range`	`NA`	Range of atoms.
`legendPosition`	`"topright"`	Indicates the position of the legend: either `"bottomright"`, `"bottomleft"`, `"topleft"`, or `"topright"`.
`barePlot`	`FALSE`	A Boolean indicating whether the plot is to be made without any additional information.
…	none	Additional arguments.

Table 13.

Arguments of the `TIcurve()` function.

Argument name	Default value	Description
`lambdas`	none	List of matrices (automatically generated by `load_TIcurve()`) holding the thermodynamic integration information.
`invertedBackwards`	`FALSE`	If a forward and a backward TI are provided and the lambda points are enumerated in reverse (i.e. 0.3 of one TI is equivalent to 0.7 of the other), this flag can be set to `TRUE` in order to mirror the values automatically.
`energyUnit`	`"kJ/mol"`	Defines the energy unit used for the plot.
`printValues`	`TRUE`	If `TRUE`, the free energy values are printed.
`printErrors`	`TRUE`	A Boolean indicating whether error bars are to be plotted.
`errorBarThreshold`	0	If the error at a given lambda point is below this threshold, it is not plotted.
`barePlot`	`FALSE`	A Boolean indicating whether the plot is to be made without any additional information.
…	none	Additional arguments.
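
The mirroring implied by `invertedBackwards` and the filtering implied by `errorBarThreshold` can be written out in a few lines of base R. This sketch only remaps the lambda points (λ becomes 1 − λ) and decides which error bars would be drawn; the column names, the helper `mirrorBackward()`, and the numbers are hypothetical, and any sign convention of the engine is deliberately left untouched.

```r
# Synthetic backward TI whose lambda points are enumerated in reverse
backward <- data.frame(lambda = c(0, 0.3, 0.7, 1),
                       dHdl   = c(11, 5, -3, -9),
                       error  = c(0.5, 0.05, 0.4, 0.02))

# Hypothetical helper: map lambda to 1 - lambda and restore ascending order
mirrorBackward <- function(curve) {
  curve$lambda <- 1 - curve$lambda
  curve[order(curve$lambda), , drop = FALSE]
}

mirrored <- mirrorBackward(backward)

# Error bars are drawn only where the error is not below the threshold
errorBarThreshold <- 0.1
drawErrorBar <- mirrored$error >= errorBarThreshold

print(cbind(mirrored, drawErrorBar))
```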

Table 14.

Arguments of the `timeseries()` function.

Argument name	Default value	Description
`tsData`	none	List of (alternating) indices and response values, as produced by `load_timeseries()`.
`printLegend`	`TRUE`	Parameter enabling the plotting of the legend.
`snapshotsPerTimeInt`	1000	Number specifying how many snapshots make up one `timeUnit`.
`timeUnit`	`"ns"`	Specifies the time unit.
`valueName`	`NA`	Name of response variable.
`valueUnit`	`NA`	Specifies the response variable's unit.
`colours`	`NA`	A vector of colours used for plotting.
`names`	`NA`	A vector of names of the trajectories.
`legendPosition`	`"bottomright"`	Indicates the position of the legend: either `"bottomright"`, `"bottomleft"`, `"topleft"`, or `"topright"`.
`barePlot`	`FALSE`	A Boolean indicating whether the plot is to be made without any additional information.
…	none	Additional arguments.

Table 15.

Arguments of the `xrmsd()` function.

Argument name	Default value	Description
`xrmsdValues`	none	Input matrix (three rows: x-values, y-values, RMSD-values). Can be generated by function `load_xrmsd()`.
`printLegend`	`TRUE`	If `TRUE`, a legend is printed on the right hand side.
`xaxisRange`	`NA`	A vector of boundaries for the x-snapshots.
`yaxisRange`	`NA`	A vector of boundaries for the y-snapshots.
`colours`	`NA`	User-specified vector of colours to be used for plotting.
`rmsdUnit`	`"nm"`	Specifies in which unit the RMSD values are given.
`barePlot`	`FALSE`	A Boolean indicating whether the plot is to be made without any additional information.
…	none	Additional arguments.
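
A cross-RMSD heat map is built from triples of x snapshot, y snapshot, and RMSD value. The base R sketch below reshapes synthetic triples into a square matrix and applies axis ranges comparable to `xaxisRange` and `yaxisRange` before drawing with `image()`. The RMSD values are invented (proportional to the snapshot distance) and the reshaping step is an assumption of this sketch, not a description of `xrmsd()` internals.

```r
# Synthetic long-format input: one (x, y, RMSD) triple per snapshot pair
n <- 20
pairs <- expand.grid(x = 1:n, y = 1:n)
pairs$rmsd <- abs(pairs$x - pairs$y) * 0.01  # made-up RMSD values in nm

# Reshape the triples into a square matrix indexed by snapshot numbers
rmsdMatrix <- matrix(NA_real_, nrow = n, ncol = n)
rmsdMatrix[cbind(pairs$x, pairs$y)] <- pairs$rmsd

# Restrict both axes, comparable to xaxisRange and yaxisRange
xaxisRange <- c(5, 15)
yaxisRange <- c(1, 10)
xSelected <- xaxisRange[1]:xaxisRange[2]
ySelected <- yaxisRange[1]:yaxisRange[2]
subMatrix <- rmsdMatrix[xSelected, ySelected]

image(x = xSelected, y = ySelected, z = subMatrix,
      xlab = "snapshot (x)", ylab = "snapshot (y)")
```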
