Author manuscript; available in PMC: 2015 Dec 15.
Published in final edited form as: J Comput Chem. 2014 Oct 18;35(32):2305–2318. doi: 10.1002/jcc.23753

Lightweight Object Oriented Structure analysis: Tools for building Tools to Analyze Molecular Dynamics Simulations

Tod D Romo, Nicholas Leioatts, Alan Grossfield *
PMCID: PMC4227929  NIHMSID: NIHMS633076  PMID: 25327784

Abstract

LOOS (Lightweight Object-Oriented Structure-analysis) is a C++ library designed to facilitate making novel tools for analyzing molecular dynamics simulations by abstracting out the repetitive tasks, allowing developers to focus on the scientifically relevant part of the problem. LOOS supports input using the native file formats of most common biomolecular simulation packages, including CHARMM, NAMD, Amber, Tinker, and Gromacs. A dynamic atom selection language based on the C expression syntax is included and is easily accessible to the tool-writer. In addition, LOOS is bundled with over 120 pre-built tools, including suites of tools for analyzing simulation convergence, 3D histograms, and elastic network models. Through modern C++ design, LOOS is both simple to develop with (requiring knowledge of only 4 core classes and a few utility functions) and is easily extensible. A Python interface to the core classes is also provided, further facilitating tool development.

Keywords: molecular dynamics, analysis, software, membranes, convergence

Introduction

As computers increase in performance and decrease in price, more scientists are using simulations and generating ever more simulation data. The increasing availability of supercomputing resources has only hastened the production of simulation data that now approaches biologically relevant time-scales. With the growth of this data come new questions that can be asked and new ways to analyze the data that are not always addressed by the common analysis software tools, requiring the creation of a new tool. One of the first questions an aspiring tool-writer must ask is how to implement this tool. There are three basic approaches: hook into an existing system at the source code level, extend an existing system through a scripting interface or via plugins, or write a stand-alone tool. An additional concern that is sometimes overlooked is the license for a system. Not all packages permit the tool-writer to distribute their modifications, reducing the benefits to the community at large and reducing the incentive to modify the system to suit their needs.

CHARMM1, ptraj/cpptraj2,3, carma/grcarma4,5, and MDAnalysis6 are popular systems for which source code is available. Using an existing software system has the advantage of a fairly consistent interface for the user and readily available infrastructure for handling file formats and computations. The difficulty in this approach, however, is that the code may be quite large, such as CHARMM, or monolithic, such as carma/grcarma. There are nearly 14,000 lines of C code in carma, and grcarma, the graphical front-end to carma, is similarly a monolithic Perl/Tk program. Other packages, such as ptraj, are nearly monolithic. The “actions” module, which contains many analysis routines, consists of nearly 20,000 lines of C code. In these circumstances, it is often difficult to figure out where to place new code and how to integrate it within the supporting environment. Moreover, there is always the danger of side effects from the added code—unforeseen errors that manifest in other parts of the system.

Recently, ptraj was renamed to cpptraj, converted to C++, and refactored so that, among other structural changes, actions reside in separate source code units. Actions use a common programming interface via subclassing and the “command” design pattern, facilitating adding new analysis modules to cpptraj. However, there is still a barrier to adding even a simple analysis module in terms of “glue” code required to hook a new routine into the system. For example, the radial distribution function in cpptraj consists of 506 lines of C++ code, whereas the LOOS implementation (rdf) is 434 lines, 50 of which are documentation. Conceptually, actions are decoupled from the I/O and atom selection as these steps are handled by cpptraj itself. There are advantages to this approach, such as relieving the module-writer from having to worry about reading the trajectory and iterating over it. However, this decoupling may be nonintuitive for new programmers and, as we shall see with LOOS, iterating over a trajectory can be accomplished in as little as 5 lines of code, which is barely a barrier. Moreover, since actions are driven externally (by cpptraj), actions that require multiple passes through the trajectory must cache the requisite data.

Another approach to performing analysis is to integrate it directly into a visualization package such as Pymol7 and VMD8. The advantage of both approaches is that utilizing a graphical framework, and the possibility of making the analysis interactive, may significantly lower the barrier to use of the tool. Moreover, these tools have very large user communities, which means it is easier for users to find the new tool. The disadvantage is that using the scripting layer relies on existing functionality from the “host” program, which may or may not have the same flexibility in design as a library would, and it involves an added layer of execution (or interpretation) along with a corresponding increase in execution time. While the plugin approach would seem to solve the execution speed issue, the requirements of the plugin application programming interface (API) and the programming models typically used in interactive graphical applications add several more levels of complexity for even basic tools.

A similar approach is used by ST-Analyzer9, which provides an integrated graphical workbench using a web browser as a front-end. While there are clear benefits to providing a graphical interface to analysis tools from the standpoint of the end user, such as ease of use and tracking of results, it often imposes hardships on the programmer that can easily exceed those related to learning a new API, particularly an API with few classes and limited class hierarchies such as LOOS. Adding a new tool to ST-Analyzer requires an understanding of the event flow used by the workbench, as well as consideration of both the front and back-ends used, which may require some knowledge of the various technologies employed (e.g. AJAX, JQuery, and SQL). The amount of interface code required for typical GUI applications can also be significant. For example, the radial distribution function in ST-Analyzer consists of roughly 1,100 lines of Python, JQuery, and HTML code. In contrast, the similar tool in LOOS (xy-rdf) is under 600 lines of C++ code. Although ST-Analyzer provides separate documentation for adding new tools, this document is nearly 20 pages long, which may be quite daunting for less seasoned programmers, or those wanting to quickly test out a new analysis idea. Finally, the implementation of the analysis routine itself will likely still rely on some library for data manipulation and analysis (e.g. MDAnalysis and NumPy) and hence their APIs must also be understood.

Finally, the tool-writer can create a stand-alone tool. While this approach significantly reduces the overhead to creating a new tool, it typically requires a library to handle file formats and basic data structures for storing structural information. MDAnalysis6 follows the toolkit, or library, paradigm. It provides a set of utility functions and classes that are designed to facilitate the creation of new tools, in addition to providing several useful tools. In many respects, MDAnalysis is a “kindred spirit” to LOOS. However, there are several important differences in both philosophy and implementation. MDAnalysis uses a more “top down” approach to design in that, as much as possible, the library is written in Python. Performance bottlenecks are identified and rewritten in C/C++. LOOS, on the other hand, uses a “bottom up” approach where most of the library is written in C++ with a Python wrapper on top of it. In practice, the differences in performance, from the Python perspective, are probably small. However, it does mean that should performance or efficient utilization of resources be required, it is possible to only use the C++ layer of LOOS, which is not possible with MDAnalysis. The second, philosophical difference, is that MDAnalysis provides a deeper class hierarchy that mimics common structural elements (e.g. residues, chains, molecules), while LOOS only uses groups of shared atoms. As we will describe later, this can have important consequences on the idioms used when writing tools, and correspondingly the compactness and complexity of the code implementing the tool.

We believe that one of the paramount goals of any analysis package should be the rapid prototyping of new techniques. It is not always clear, from the outset, what the best method is, nor is it always clear what the implementation should be. During the course of development, it is often necessary to try several different approaches. It is therefore critical to minimize the barrier to this programming “noodling”. The monolithic and GUI-based approaches impose a substantial barrier to this development methodology, while the toolkit and scripting approaches largely eliminate it, leaving the scientific process of trial and error unimpeded.

Methods

General Overview

The design of LOOS is dictated by several fundamental goals. First, it is intended to be lightweight, eschewing the more complex class hierarchies typical of modeling packages, and is therefore easy to learn. Tool developers only need to know 4 core classes (Coord, Atom, AtomicGroup, and Trajectory) and a handful of utility functions. In addition to its simple interface, LOOS has few external dependencies, simplifying installation and maintenance. The primary dependencies are the Boost C++ libraries10, LAPACK11, and BLAS12,13 (or the ATLAS equivalents14), all available through the package managers of most versions of Linux. When compiled on MacOS, LOOS will take advantage of the Accelerate framework, thereby gaining both SIMD and multicore parallelization for linear algebra functions.

Another fundamental goal of LOOS is that it is easily extensible. Base classes (e.g. AtomicGroup) are used as much as possible both to fix the API as well as to collect related functions (e.g. most common geometric operations). Finding the common functions that work with an AtomicGroup is simply a matter of consulting the class documentation for AtomicGroup. Relying on base classes for functionality also encourages tools written with LOOS to be agnostic with respect to file format. As a rule, the only time the specialized classes are needed is to address specialized problems.

We designed LOOS from the outset for rapid prototyping and to be easy to use; the LOOS functions are highly expressive, so that most codes resemble scripting languages as much as C++. The relatively lean class structure of LOOS facilitates this goal, as does the grouping of related functions and our efforts to hide memory management from the tool developer. In order to aid the novice tool writer, several tool templates are also included with the distribution. These contain the framework for common analysis tasks, such as iterating through the atoms of a structure, iterating over a trajectory, or applying a transformation to the trajectory. These frameworks handle interfacing with the command-line, instantiating the appropriate objects, and reading required data. In principle, all that is required is for the tool-writer to insert their analysis code at the appropriate point.

LOOS is also designed to be powerful despite its simplicity. For example, a common problem facing MD analysis tools is how to select what parts of a system are of interest. CHARMM and VMD provide a powerful selection language, but programmatic access to the parser is difficult since the parser is intended to be used interactively. CPPTraj utilizes a masking system with a syntax based originally on MIDAS2,15. Gromacs utilizes an index file (in essence, a list of atom numbers) that can be created via different methods16. MDAnalysis, in contrast, implements a simple recursive-descent parser in Python for atom selection. The importance of this approach is that the selection can be dynamic (i.e. it can change during the course of the tool’s execution). Again, LOOS’ approach is similar to that of MDAnalysis, except that the parser is built using the standard Unix tools lex and yacc. These take a token and grammar specification, respectively, and generate code for the corresponding tokenizer and parser, making it relatively easy to extend the selection language. The selection system is made available to the tool-writer as a simple function call, but the low-level components are also exposed. A unique feature of how selections are implemented in LOOS is that they are actually compiled into a small “program” that performs the selection, and this program can be stored for later use or reuse.

A frequent problem encountered with code libraries (and the C and C++ languages in general) is the need to directly manage dynamically allocated memory. LOOS sidesteps this challenge by working mostly with Boost shared pointers allocated inside the library, effectively removing the responsibility for memory management from the developer. A shared pointer is essentially a reference-counted pointer, that is, a pointer that keeps track of how many things (objects or functions) are using the data it points to. When an object no longer needs the data, the count is decremented and, when the count reaches zero the associated memory is deallocated. Using shared pointers therefore serves as a form of garbage collection, giving LOOS developers the advantages of dynamic memory allocation without the hassles and pitfalls of explicit memory management.
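
The reference counting described above can be illustrated with a minimal sketch. It assumes std::shared_ptr as a stand-in for the Boost shared pointer used by LOOS (the two behave alike for this purpose); AtomRecord, AtomPtr, OwnerCounts, and countOwners are hypothetical names introduced only for this illustration and are not part of the LOOS API.

```cpp
#include <memory>

// Hypothetical stand-in for an atom record; not the LOOS Atom class.
struct AtomRecord {
    int id;
    double charge;
};

// Hypothetical alias playing the role that a shared atom pointer plays in the text.
using AtomPtr = std::shared_ptr<AtomRecord>;

// Owner counts observed at three points in the life of one atom.
struct OwnerCounts {
    long one_owner;
    long two_owners;
    long after_release;
};

OwnerCounts countOwners() {
    OwnerCounts counts{};
    AtomPtr first = std::make_shared<AtomRecord>(AtomRecord{1, -0.5});
    counts.one_owner = first.use_count();
    {
        AtomPtr second = first;  // copies the pointer, not the atom
        counts.two_owners = first.use_count();
    }  // 'second' is destroyed here and the count drops by one
    counts.after_release = first.use_count();
    return counts;  // 'first' is destroyed on return, which frees the atom
}
```

No delete appears anywhere: the storage is released when the last owner goes away, which is the behavior the paragraph above relies on.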

Class Structure

There are four fundamental classes in LOOS: three required to represent atomic structure and one that represents molecular dynamics trajectories. The structural classes are shown in Figure 1, along with derived classes used to represent specific file formats. For illustrative purposes, only a subset of member functions is shown.

Figure 1.

LOOS classes used to model molecular structure and associated file formats. Only a small subset of member functions and operators are shown for illustrative purposes.

The first class is Coord, which represents atomic coordinates. This class provides overloaded operators for math involving coordinates (e.g. vector addition, dot products, and cross products), and includes facilities for handling periodic boundary conditions (rectangular boxes). The Coord class uses a template to determine the data type used internally for representing the coordinate. Typically in LOOS, this is a double, although it need not even be a floating point type. For example, the 3D histogram classes use Coord with an integral coordinate type to represent grid coordinates.
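
To make the idea of a templated coordinate type concrete, the following is a minimal sketch. Vec3 and minimumImage are hypothetical names chosen for this illustration and do not reproduce the interface of the LOOS Coord class; the periodic wrapping shown is the ordinary minimum-image convention for a rectangular box, assumed here as one plausible form of the periodic-boundary facilities mentioned above.

```cpp
#include <cmath>

// Hypothetical three-component vector; a stand-in, not the LOOS Coord class.
template <typename T>
struct Vec3 {
    T x, y, z;

    Vec3 operator+(const Vec3& o) const { return Vec3{x + o.x, y + o.y, z + o.z}; }
    Vec3 operator-(const Vec3& o) const { return Vec3{x - o.x, y - o.y, z - o.z}; }

    // Dot product of this vector with o.
    T dot(const Vec3& o) const { return x * o.x + y * o.y + z * o.z; }

    // Cross product of this vector with o.
    Vec3 cross(const Vec3& o) const {
        return Vec3{y * o.z - z * o.y, z * o.x - x * o.z, x * o.y - y * o.x};
    }
};

// Minimum-image displacement in a rectangular periodic box: each component is
// shifted by a whole number of box lengths so that it lies within half a box
// length of zero.
inline Vec3<double> minimumImage(const Vec3<double>& d, const Vec3<double>& box) {
    return Vec3<double>{d.x - box.x * std::round(d.x / box.x),
                        d.y - box.y * std::round(d.y / box.y),
                        d.z - box.z * std::round(d.z / box.z)};
}
```

The same template serves Vec3<double> for Cartesian positions and Vec3<int> for grid indices, which mirrors the use of an integral coordinate type described for the 3D histogram classes.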

The next fundamental class is Atom, which encapsulates atomic information, such as coordinates (via a Coord), charge, connectivity, and associated meta-data (e.g. atom number, residue number, name, etc). In practice, tools almost never use Atom objects directly. Instead, a Boost shared pointer to the atom, referred to as a pAtom, is used. The job of allocating storage for the Atom is handled internally within LOOS and it is deallocated automatically when no longer needed by the shared pointer “wrapper.” There are additional advantages to using shared pointers beyond issues in memory management. Copying of atoms is “lightweight” in that only the pointer needs to be copied, not the data pointed to. In addition, sharing pointers to the same atom means that if one function updates the atom, all shared copies of the atom are also updated (for example, when new coordinates are read from the next frame from a trajectory). This will have important implications in how LOOS emulates traditional structural hierarchies of atoms.

The natural inclination when designing an object-oriented library for simulations analysis is to begin to model real-world components with classes, e.g. an atom class, a residue class, a chain class, etc. One difficulty with this approach is that the correspondence is not always clear or appropriate. Should a lipid be considered a residue, or a chain? This approach also leads to a large number of classes to model the real-world systems and their hierarchies. Another issue is an implicit tight coupling between the classes and their modeled components. In the older CHARMM2717,18, lipids were represented as multiple “residues” whereas in CHARMM3619, lipids consist of a single residue. Such a change requires either a reinterpretation of the associated class, or a redesign of that class.

Yet another difficulty with these hierarchical representations is how to best “drill down” and iterate at the level that the tool-writer is interested in. For example, a loop over all atoms might look like,

for molecule in world:
    for lipid in molecule:
        for residue in lipid:
            for atom in residue:
                do_something(atom)

Such an organization requires hard-coded nested iterations, which would quickly make the code messy and difficult to read, considering how often this task is performed. Alternatively, the library can provide another method to iterate over all atoms. However, this typically requires internal structures to support the iteration. Even more problematic is the question of at what granularity iteration should be supported: atoms, residues, chains, or all of them?

The approach LOOS takes is to dispense with explicitly modeling a hierarchy. Instead, only a collection of atoms is supported, called an AtomicGroup, which is the third fundamental class in LOOS. When coupled with an extensive system for selecting atoms (usually via atom metadata), virtually any hierarchy can be implicitly modeled. For example, selecting a range of residue ID’s returns an AtomicGroup containing all of the corresponding atoms (or strictly speaking, pointers to the atoms selected). If the tool needs to operate only on residues from that range, then this group can be further split into an array of AtomicGroup objects, each containing the atoms corresponding to a single residue.
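
The residue example above can be sketched with a flat collection of shared atom pointers. SimpleAtom, SimpleGroup, selectResidueRange, and splitByResidue are hypothetical names for this illustration; in particular, grouping by residue number through a std::map is an assumption of the sketch and not necessarily how LOOS splits a group.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical flat atom record and group; stand-ins for illustration only.
struct SimpleAtom {
    int id;
    int resid;
    std::string name;
};
using SimpleAtomPtr = std::shared_ptr<SimpleAtom>;
using SimpleGroup = std::vector<SimpleAtomPtr>;

// Keep the atoms whose residue number lies in the closed range [first, last].
SimpleGroup selectResidueRange(const SimpleGroup& group, int first, int last) {
    SimpleGroup picked;
    for (const auto& atom : group)
        if (atom->resid >= first && atom->resid <= last) picked.push_back(atom);
    return picked;
}

// Split a group into one group per residue, ordered by residue number.
// The hierarchy is implied by the metadata; no Residue class is needed.
std::vector<SimpleGroup> splitByResidue(const SimpleGroup& group) {
    std::map<int, SimpleGroup> by_residue;
    for (const auto& atom : group) by_residue[atom->resid].push_back(atom);

    std::vector<SimpleGroup> residues;
    for (const auto& entry : by_residue) residues.push_back(entry.second);
    return residues;
}
```

Iterating per residue then becomes a single loop over the returned vector, rather than a nest of loops over dedicated hierarchy classes.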

AtomicGroup can also be considered the “workhorse” class because it collects many common functions that operate on groups of atoms. These include calculating the center of mass, radius of gyration, and principal axis vectors, performing basic set operations, and aligning one group onto another. One of our design goals was to reduce all common tasks into 1-line operations; this not only simplifies new code, but makes existing code more expressive and comprehensible.

As mentioned above, AtomicGroup actually contains not Atom objects, but rather pAtom’s (Boost shared pointers to Atom’s). When an AtomicGroup is copied, or a set of atoms is selected and returned as a new AtomicGroup, the common Atom objects are shared between the two groups. Changes made to an atom in one group will also appear in the other group. This property is illustrated in Figure 2. The “Model” AtomicGroup contains the pAtom’s representing the entire system. The “Heavy Atoms” group was created from the “Model” group by using a non-hydrogen selection. Similarly, the “Backbone Atoms” group was created by selecting the backbone atoms by their name meta-data. Each AtomicGroup object has its own pAtom objects, but shares the underlying Atom objects. Altering the first backbone atom will also alter the corresponding atom in the “Model” and “Heavy Atoms” groups.

Figure 2.

Sharing Atom objects between different AtomicGroup’s
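
The sharing behavior shown in Figure 2 can be reproduced in a few lines. This sketch again assumes std::shared_ptr in place of the Boost shared pointer, and SharedAtom, SharedGroup, selectByName, and translateX are hypothetical names that do not correspond to LOOS functions.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical atom and group types; stand-ins for illustration only.
struct SharedAtom {
    std::string name;
    double x;
};
using SharedAtomPtr = std::shared_ptr<SharedAtom>;
using SharedGroup = std::vector<SharedAtomPtr>;

// Build a new group holding pointers to the atoms of 'source' with this name.
// Only pointers are copied; the atoms themselves are shared with 'source'.
SharedGroup selectByName(const SharedGroup& source, const std::string& name) {
    SharedGroup picked;
    for (const auto& atom : source)
        if (atom->name == name) picked.push_back(atom);
    return picked;
}

// Shift every atom reachable through 'group' along x.
void translateX(const SharedGroup& group, double dx) {
    for (const auto& atom : group) atom->x += dx;
}
```

Translating the selected group moves the same atoms as seen through the original group, because both groups point at one underlying set of atom objects.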

Reading the native file formats for different packages is handled by subclasses of AtomicGroup, but the tool-writer normally does not interact with these specialized classes. Instead, a factory function is used to read in a file regardless of the format, returning an AtomicGroup; as a result, LOOS programs are by default format-independent. For example, the following code will read in models stored in different package formats,

AtomicGroup namd_model = createSystem("model.pdb");
AtomicGroup gro_model = createSystem("model.gro");
AtomicGroup amber_model = createSystem("model.prmtop");
AtomicGroup tinker_model = createSystem("model.xyz");

The only exception is when writing out a structure. The only format supported for output is the PDB format, and involves converting an AtomicGroup into a PDB object (using a class function) and printing it out.
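
One way a format-independent factory can choose a reader is to dispatch on the file extension. The sketch below is an assumption about the general technique, not a description of how createSystem is implemented; ModelFormat, fileExtension, and guessModelFormat are hypothetical names, and the extensions are those listed for model formats in Table 1.

```cpp
#include <string>

// Hypothetical tags for the model formats listed in Table 1.
enum class ModelFormat { AmberPrmtop, CharmmCrd, Gromacs, PDB, PSF, TinkerXYZ, Unknown };

// Return the text after the last '.', or an empty string when there is none.
inline std::string fileExtension(const std::string& filename) {
    const std::string::size_type dot = filename.rfind('.');
    return dot == std::string::npos ? std::string() : filename.substr(dot + 1);
}

// Map a filename to a format tag by its extension (case-sensitive in this sketch).
inline ModelFormat guessModelFormat(const std::string& filename) {
    const std::string ext = fileExtension(filename);
    if (ext == "prmtop") return ModelFormat::AmberPrmtop;
    if (ext == "crd") return ModelFormat::CharmmCrd;
    if (ext == "gro") return ModelFormat::Gromacs;
    if (ext == "pdb") return ModelFormat::PDB;
    if (ext == "psf") return ModelFormat::PSF;
    if (ext == "xyz") return ModelFormat::TinkerXYZ;
    return ModelFormat::Unknown;
}
```

A factory built this way would construct the matching format-specific subclass and hand it back as the base type, so the calling tool never names a format.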

The final fundamental class is the Trajectory class, which provides an abstracted interface to all of the trajectory filetypes supported by LOOS (see Table 1). Every supported filetype derives from this class (e.g. DCD implements the CHARMM/NAMD trajectory format and is a child of Trajectory). Since Trajectories need to behave polymorphically, pointers to the appropriate Trajectory-derived object must be used. These are passed as Boost shared pointers called pTraj’s (a typedef, similar to pAtom, used to make the code simpler). It is also worth noting that LOOS does not restrict the use of a trajectory type with a specific model, i.e. it is possible to mix and match model file types with different trajectory formats, e.g. combining a model defined by a (NAMD) PSF file, and a GROMACS trajectory,

AtomicGroup model = createSystem("model.psf");
pTraj traj = createTrajectory("simulation.xtc", model);

Table 1.

File formats supported by LOOS for reading in data

Format                         Class Name   Type                       File Extensions
Amber Parmtop                  Amber        Model                      prmtop
Amber NetCDF                   AmberNetcdf  Trajectory                 mdcrd, crd, nc
Amber Restart                  AmberRst     Trajectory (single frame)  inpcrd, rst, rst7
Amber Trajectory               AmberTraj    Trajectory                 mdcrd, crd
Concatenated PDB               CCPDB        Trajectory                 pdb
CHARMM Coordinates             CHARMM       Model                      crd
NAMD DCD                       DCD          Trajectory                 dcd
GROMACS Model                  Gromacs      Model                      gro
NAMD PDB                       PDB          Model                      pdb
Multi-PDB Trajectory           PDBTraj      Trajectory
NAMD PSF                       PSF          Model (no coordinates)     psf
Tinker Arc                     TinkerArc    Trajectory                 arc
Tinker XYZ                     TinkerXYZ    Model                      xyz
GROMACS Trajectory             TRR          Trajectory                 trr
GROMACS Compressed Trajectory  XTC          Trajectory                 xtc

Mixing model and trajectory formats in this way can on occasion be convenient, because not all file formats provide identical information; PSF files contain information about connectivity and partial charges, while GRO files do not.

Since a common pattern for analysis tools is to work on each frame in a trajectory, a Trajectory object can also be used as an iterator.

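pTraj traj;        // assumed to be already opened, e.g. with createTrajectory()
AtomicGroup model; // assumed to be already read, e.g. with createSystem()

while (traj->readFrame()) {
  traj->updateGroupCoords(model); // copy the current frame's coordinates into the model
  analyze(model); // Do some analysis task
}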

In order to ensure consistent behavior of the iterator, seeking functions are implemented using the “Non-Virtual Interface” (NVI) idiom. The public seek and read functions maintain the state of the iterator while hidden implementation-specific functions are called to perform the actual seeking. For trajectory formats where the location of a frame is not easily calculable (e.g. GROMACS XTC), LOOS will scan the trajectory at initialization of the Trajectory object and build an index into the file for each frame, permitting rapid random-access to any frame in the trajectory. This is not necessary for formats with a fixed frame size, such as CHARMM/NAMD DCDs.
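
The Non-Virtual Interface idiom and the per-frame index can be sketched together. FrameSource and IndexedSource are hypothetical classes for illustration and do not mirror the LOOS Trajectory interface; the frame sizes passed to the constructor stand in for what a real reader would discover by scanning the file once at initialization.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical base class showing the Non-Virtual Interface idiom.
class FrameSource {
public:
    virtual ~FrameSource() = default;

    // Public and non-virtual: the iterator state is maintained here,
    // identically for every format.
    bool readFrame() {
        if (next_ >= frameCount()) return false;
        loadFrame(next_);  // format-specific work
        current_ = next_++;
        return true;
    }

    bool seekFrame(std::size_t i) {
        if (i >= frameCount()) return false;
        next_ = i;
        return true;
    }

    std::size_t currentFrame() const { return current_; }
    virtual std::size_t frameCount() const = 0;

private:
    // Hidden hook that each format implements.
    virtual void loadFrame(std::size_t i) = 0;

    std::size_t next_ = 0;
    std::size_t current_ = 0;
};

// Hypothetical format with variable-sized frames: byte offsets are indexed
// up front so that any frame can be reached directly.
class IndexedSource : public FrameSource {
public:
    explicit IndexedSource(const std::vector<std::size_t>& frame_sizes) {
        std::size_t offset = 0;
        for (std::size_t size : frame_sizes) {
            offsets_.push_back(offset);
            offset += size;
        }
    }

    std::size_t frameCount() const override { return offsets_.size(); }

    // Offset of the most recently loaded frame (exposed only for this demo).
    std::size_t lastOffset() const { return last_offset_; }

private:
    void loadFrame(std::size_t i) override { last_offset_ = offsets_[i]; }

    std::vector<std::size_t> offsets_;
    std::size_t last_offset_ = 0;
};
```

Because derived classes can only override the private hook, none of them can leave the shared iterator state inconsistent. A fixed-size format would compute the offset arithmetically and skip the index.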

Associating coordinates within a frame of a trajectory to a given atom can be problematic. LOOS solves this by including extra metadata with each Atom that specifies its location (or index) into the trajectory frame. This index is independent of the atomid property and is set when an AtomicGroup is read in by one of the derived-classes that represent specific file formats. The index is determined by the order the atoms appear in the model file, and is assumed to match the corresponding ordering of data within the trajectory frames.
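
A minimal sketch of this bookkeeping follows. It assumes a frame stored as a flat array of x, y, z triples; TrackedAtom and updateCoords are hypothetical names and not the LOOS implementation. The point is that a subset of atoms can be updated correctly whatever their atom ids are, because each atom carries its own position within the frame.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical atom record carrying both an id and a frame index.
struct TrackedAtom {
    int id;             // atom id as written in the model file
    std::size_t index;  // position of this atom within a trajectory frame
    double x, y, z;
};
using TrackedAtomPtr = std::shared_ptr<TrackedAtom>;

// Copy coordinates from a frame into the atoms of a group.
// The frame is stored flat: x0, y0, z0, x1, y1, z1, ...
void updateCoords(const std::vector<TrackedAtomPtr>& group,
                  const std::vector<double>& frame) {
    for (const auto& atom : group) {
        const std::size_t base = 3 * atom->index;
        if (base + 2 >= frame.size())
            throw std::out_of_range("atom index exceeds frame size");
        atom->x = frame[base];
        atom->y = frame[base + 1];
        atom->z = frame[base + 2];
    }
}
```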

Selection Language

Selecting atoms in LOOS is based on the idea that a selection is really just a filter, picking atoms based on specific properties and placing them into a new AtomicGroup. The selection language LOOS uses is loosely based on the C expression syntax. Atom properties are bound to keywords (summarized in Table 2), such as id for the atom-id and resname for the residue name. In addition, there are special keywords, such as all which matches everything, none matching nothing, and hydrogen matching only hydrogen atoms (based on name or molecular weight). The LOOS operators are summarized in Table 3, which includes the standard C relational operators. There are also two special operators. The first, =~, permits regular expression matching that gives the user a powerful pattern-matching system as well as substring matching. The second, ->, extracts numbers from within strings (e.g. residue names, segment names, etc). Logical operators (&& and ||) as well as parentheses are also defined. Operator priority follows the C-convention. As an example, selecting all alpha-carbons in LOOS is written as,

name == "CA"

Table 2.

LOOS Keywords

Keyword   Atom Property               Type
name      atom name                   string
id        atom id                     number
resname   residue name                string
resid     residue number              number
segid     segment name                string
segname   segment name                string
chainid   chain ID                    string
all       always true                 boolean
none      never true                  boolean
hydrogen  true if atom is a hydrogen  boolean

Table 3.

LOOS Operators

Operator  Operation                      Example
>         greater than                   resid > 10
<         less than                      resid < 10
>=        greater than or equals         resid >= 10
<=        less than or equals            resid <= 10
==        equals                         name == "CA"
!=        not equals                     segid != "SOLV"
=~        regular expression match       name =~ "(CA|C|N|O)"
&&        logical and                    name == "CA" && segid == "PROT"
||        logical or                     segid == "SOLV" || segid == "BULK"
!         logical not                    !hydrogen
not       logical not                    not hydrogen
->        number extraction from string  (segid -> "L(\d+)") < 100

Selecting a range of residues based on their resid is simply,

resid >= 10 && resid <= 20

To select only non-hydrogen atoms from the same block of residues, the previous selection is combined with one to exclude hydrogens,

( resid >= 10 && resid <= 20 ) && !hydrogen

In this case, because of operator precedence and the left-to-right parsing of the expression, parentheses must be used. An example of using regular expressions is how to select backbone atoms in LOOS,

name =~ "^(C|O|N|CA)$"

This selection matches any atom whose name is exactly “C”, “O”, “N”, or “CA”. Since regular expressions would normally match a substring, it is necessary to “anchor” the strings we want to match by using the ^ and $ operators. In fact, LOOS (via Boost) supports most of the Perl-extensions to regular expressions enabling very sophisticated pattern matching.
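
The effect of anchoring can be checked directly. This sketch uses std::regex from the C++ standard library as a stand-in for the Boost regular expressions that LOOS uses, with search (substring) semantics as described above; nameMatches is a hypothetical helper and not a LOOS function.

```cpp
#include <regex>
#include <string>

// True when 'pattern' matches anywhere within 'name' (substring semantics).
inline bool nameMatches(const std::string& name, const std::string& pattern) {
    return std::regex_search(name, std::regex(pattern));
}
```

With the anchored pattern "^(C|O|N|CA)$", the names "CA" and "N" match while "CB" and "OXT" do not. Without the anchors, "CB" would also match, because the pattern finds the "C" inside it.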

Although the LOOS selection language is more verbose than other systems, such as VMD or Gromacs, this verbosity can be an asset in that it makes the selections easier for the user to read and understand. The drawback, however, is that repeatedly entering long expressions can be tiring and error-prone. A simple trick of the Unix shell can eliminate this problem. The selection is stored in a text file and is substituted in-place by the shell. For example, the vibrational subsystem elastic network model tool vsa20 requires a subsystem and an environment selection. Selecting the trans-membrane (TM) helices of a G protein-coupled receptor (GPCR) would require 7 different residue range selections (i.e. resid >= A && resid <= B). If the subsystem selection was stored in a text file called subsystem, then it could be used in the command line as follows,

(`cat subsystem`) && name == 'CA'

This combines the TM helices selection with another selection that picks only alpha-carbons. The environment can then be any alpha-carbon not part of the TM helices selection,

!(`cat subsystem`) && name == 'CA'

A unique feature of the selection system used in LOOS is that it is implemented using the “command” design pattern and a minimal virtual machine with the actual “selection” handled by predicate objects. This has two benefits: ease of modification and the storing of selection “programs”. Since each operation in the language is encapsulated by an object, it is straightforward to add new operations. Moreover, each selection expression is converted into a set of objects that implement the selection. These can be stored for later use and re-used without the overhead of parsing the original selection string. The infrastructure for turning strings into selections, as well as the components of the selection system, are exposed to the tool-writer. A selection can be made with a single function call, or it can be built up using the low-level components for efficiency. This is in contrast to environments such as CHARMM, where there is an extensive and powerful selection system, but one that is difficult to access from within Fortran.
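
The idea of a selection compiled into a small stored “program” can be sketched with command objects acting on a stack of booleans. Everything here (SelAtom, Command, the four command classes, Program, and the helper functions) is hypothetical and written for illustration; it does not reproduce the classes of the LOOS selection system, and the program is assembled by hand rather than produced by a parser.

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

struct SelAtom {
    std::string name;
    int resid;
    bool hydrogen;
};

using BoolStack = std::vector<bool>;

// Pop the top of the stack, failing loudly on a malformed program.
inline bool popValue(BoolStack& stack) {
    if (stack.empty()) throw std::logic_error("selection stack underflow");
    const bool top = stack.back();
    stack.pop_back();
    return top;
}

// One instruction of the tiny virtual machine.
struct Command {
    virtual ~Command() = default;
    virtual void execute(const SelAtom& atom, BoolStack& stack) const = 0;
};

struct PushResidInRange : Command {
    int low, high;
    PushResidInRange(int lo, int hi) : low(lo), high(hi) {}
    void execute(const SelAtom& atom, BoolStack& stack) const override {
        stack.push_back(atom.resid >= low && atom.resid <= high);
    }
};

struct PushIsHydrogen : Command {
    void execute(const SelAtom& atom, BoolStack& stack) const override {
        stack.push_back(atom.hydrogen);
    }
};

struct LogicalNot : Command {
    void execute(const SelAtom&, BoolStack& stack) const override {
        stack.push_back(!popValue(stack));
    }
};

struct LogicalAnd : Command {
    void execute(const SelAtom&, BoolStack& stack) const override {
        const bool rhs = popValue(stack);
        const bool lhs = popValue(stack);
        stack.push_back(lhs && rhs);
    }
};

using Program = std::vector<std::shared_ptr<const Command>>;

// Hand-assembled program for: (resid >= 10 && resid <= 20) && !hydrogen
Program heavyAtomsInResidues10to20() {
    return {std::make_shared<PushResidInRange>(10, 20),
            std::make_shared<PushIsHydrogen>(),
            std::make_shared<LogicalNot>(),
            std::make_shared<LogicalAnd>()};
}

// Run the stored program against one atom; the result is left on the stack.
bool matches(const Program& program, const SelAtom& atom) {
    BoolStack stack;
    for (const auto& command : program) command->execute(atom, stack);
    return popValue(stack);
}

// Filter a list of atoms with a stored program; no parsing happens here.
std::vector<SelAtom> applySelection(const Program& program,
                                    const std::vector<SelAtom>& atoms) {
    std::vector<SelAtom> picked;
    for (const auto& atom : atoms)
        if (matches(program, atom)) picked.push_back(atom);
    return picked;
}
```

Once built, the Program value can be kept and applied to any number of atom lists, and a new operation is added by writing one more Command subclass.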

Extending LOOS

LOOS was designed to make it easy to add support for new file formats. To do so, one simply derives a new class from AtomicGroup and provides an appropriate read() function. Adding a new trajectory format is only slightly more complicated. A new class is derived from Trajectory and a handful of functions need to be defined, such as parsing frames, seeking to a frame, returning coordinates, and periodic box information. In both cases, once the code to provide the format-specific functionality is written, very little additional “glue” is needed to integrate with LOOS, such as adding the new object to the factory functions used to read files.

Extending the selection language is also straightforward. Since the selection language is built using the standard Unix tools lex and yacc, the high-level sources for the tokenizer and parser are available and easy to modify. All that remains is to create a new class to handle the corresponding action (derived from the Action class). That said, the parser appears to be feature-complete at this point, in that it contains access to the commonly used metadata currently stored in the Atom data structure.

Python Interface

The C++ language itself can be an impediment to creating new tools for those who are new to programming, due to the complexity of the language (in actuality, a federation of languages21), the difficulty in unwinding compilation errors from the volumes of messages compilers typically emit, and the time involved in the write-compile-debug cycle. Python, in contrast, is readily accessible to new programmers, provides many high-quality higher-level constructs and libraries, and requires no explicit compilation step. Exposing the core library and classes of LOOS to Python greatly expands the number of people who can create their own custom tools using LOOS and reduces the development time for most common tasks. The Python interface is useful even for experienced programmers, for tasks ranging from the common, such as calculating angles, distances, and various distributions, to the more complex, such as building and inserting peptides into a membrane system.

The Python interface to LOOS (PyLOOS) is implemented using the Simplified Wrapper and Interface Generator (SWIG)22. Only the core classes are exposed, such as AtomicGroup, Coord, Atom, Trajectory, and required dependent classes. In addition, the PDB and DCDWriter classes are available for writing the respective file types. Many utility functions are available as well, including the selectAtoms function that invokes the selection language. The Boost shared pointers used by LOOS can be transparently used in Python. It should also be noted, for those with a C++ background, that there is a nuance to Python assignments, in that it is in fact setting a reference or alias, not a copy operation. This aspect of Python and PyLOOS is described in more detail in Supplemental Information S1.

Some C++ constructs simply do not translate well into Python, such as template classes and policy classes. With a few notable exceptions, such as templates from Boost and STL, we have avoided providing a Python interface to these features of LOOS. Similarly, low level functions and C++ idioms that might be confusing to novice programmers, particularly when translated into Python, have been omitted. Much of this functionality however is available through higher level functions, albeit with some loss in performance (see Table 4 for benchmarks). Any loss however is generally offset by the ease of prototyping code in Python and by its accessibility.

Table 4.

Comparing common tasks in C++ and Python with LOOS. All times are reported in seconds as the average of five runs. The systems used were LfB in water (LfB), an opsin molecule in a membrane with water (Opsin), and a cannabinoid receptor dimer bound to its cognate G protein in a membrane with water (GPCR-complex). These timing runs were performed using LOOS v2.1.1 on a modern workstation (Intel i7-3770 CPU @ 3.40GHz, 32 GB memory).

                                                    System
Task                          Language    LfB       Opsin      GPCR-complex
Iteratively Align Structures  C++         3         58         276
                              Python      20        80         394
All-to-all RMSD               C++         91        170        484
                              Python      148       233        534
Inter-atomic Distance         C++         0         6          37
                              Python      0         6          38

Trajectory size (GB)                      0.15      2.25       12.19
System size (atoms)                       2,746     46,210     276,122
Number of Frames                          4,285     4,058      3,678
Selection Size (atoms)                    24        205        1,076

In addition to supporting most of the commonly used C++ idioms in PyLOOS, more Python-esque ones are also supported. For example, it is more natural for Python to iterate through a trajectory using a for-loop rather than the while-loop/updateGroupCoords() that is typical in C++, although we do make the latter available as well. PyLOOS includes two higher-level trajectory classes—PyTraj and PyAlignedTraj—that wrap the lower-level LOOS trajectory. Instances of these classes are iterable and can be sliced, like Python arrays and lists. In addition, the set of frames to be used from the trajectory can be configured at instantiation, such as skipping the first frames (equilibration) and skipping frames at each iteration (striding for data reduction). For example,

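from loos import *    # assumes PyLOOS is installed and exports these names at the top level

model = createSystem('foo.pdb')              # structure (model) file
traj = createTrajectory('foo.dcd', model)    # trajectory matching the model
ptraj = PyTraj(traj, model, skip = 50, stride = 2)

for frame in ptraj:
    calculate(frame)    # calculate() is a placeholder for the user's own analysis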

Here, the trajectory is wrapped in a PyTraj. The first 50 frames are skipped, and only every other frame thereafter is used (a stride of 2).
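The skip and stride semantics, along with iteration and slicing, can be illustrated with a small pure-Python wrapper. This is only a sketch of the idea: StridedFrames is a hypothetical class, not the PyTraj implementation, and the "frames" are plain integers standing in for coordinate sets.

```python
class StridedFrames:
    """Hypothetical iterable, sliceable view over a sequence of frames."""

    def __init__(self, frames, skip=0, stride=1):
        self._frames = frames
        # Indices of the frames that will actually be visited.
        self._indices = range(skip, len(frames), stride)

    def __len__(self):
        return len(self._indices)

    def __getitem__(self, i):
        if isinstance(i, slice):
            return [self._frames[j] for j in self._indices[i]]
        return self._frames[self._indices[i]]

    def __iter__(self):
        for j in self._indices:
            yield self._frames[j]

frames = list(range(60))          # stand-in "frames": just their indices
view = StridedFrames(frames, skip=50, stride=2)
print(list(view))                 # [50, 52, 54, 56, 58]
print(view[1:3])                  # [52, 54]
```

Storing only the index range keeps the wrapper cheap; no frame is touched until it is requested.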

The PyAlignedTraj is similar, except that the entire trajectory is first optimally aligned using an iterative procedure.23 The trajectory is not altered and not stored in memory; only the transforms needed to align each frame are stored. As each frame is retrieved, it is transformed before being passed to the user code.

Performance is a valid concern when using Python code for MD analysis, particularly considering the layers of code imposed by the SWIG interface between Python and C++. However, in practice, the performance of PyLOOS is quite reasonable, since much of the computation is performed by the underlying C++ library. When more of the work shifts to being done in Python, such as double-loops over computations, PyLOOS performance can suffer.

The relative performance of PyLOOS is illustrated in Table 4, which compares the performance of common tasks using only the C++ library vs using PyLOOS. Three different system sizes are also compared. As the system size increases, proportionally more time is spent within the core C++ library and PyLOOS performance moves toward that of the C++ implementation. For example, in the all-to-all RMSD benchmark, nearly half of the execution time is spent reading in the trajectory, which is dominated by the C++ code. The bulk of the remaining time is spent calculating the least-squares superposition using a C++ member function from AtomicGroup.

Results and Discussion

Bundled Tools

Beyond the development libraries, LOOS is also distributed with roughly 140 pre-written tools; most were initially developed for internal use in our lab, and were added to the distribution because of their general utility. These tools implement analysis tasks commonly used in molecular dynamics of macromolecules, for both protein and membrane systems. The tools can be divided into a set of core tools, representing the more common tasks, and a set of four packages that cover hydrogen bonding, assessing the convergence of simulations, constructing and analyzing elastic network models, and building 3D histograms for visualization. The packages are a logical grouping of more specialized tools. Frequently, these tools also require libraries that would not fit cleanly into the core of LOOS. A subset of the tools is listed in Table 5.

Table 5.

Examples of tools and packages that are included with LOOS.

Core Tools
  aligner             Optimally align trajectory
  contact-time        Time-series of atom contacts
  density-dist        Electron, mass, or charge density along z-axis
  merge-traj          Merge and subsample trajectories
  order_parameters    Order parameters analogous to 2H quadrupolar splitting for lipid chains
  rdf                 Radial distribution function
  rmsds               All-to-all RMSD for one or two trajectories
  svd                 Singular Value Decomposition of a trajectory (PCA)
  xy_rdf              Radial distribution function in the x,y-plane

Convergence Package
  block_average       Block average of arbitrary time-series data
  coscon              Cosine content of a trajectory
  decorr_time         Decorrelation time of a trajectory
  bcom, boot_bcom     Block covariance overlap method for determining convergence and sampling

Density Package
  blobid              Segment a density grid and find non-contiguous blobs of density
  grid2xplor          Convert density grid to X-plor electron density map format for visualization
  near_blobs          Find residues in a trajectory that are near a blob of density
  water-hist          3D histogram of atoms in a trajectory

Elastic Network Models
  anm                 Anisotropic network model
  enmovie             Visualize ENM motions by generating a trajectory based on an ENM solution
  vsa                 Vibrational subsystem analysis

Hydrogen Bonds
  hbonds              Find occupancies of putative hydrogen bonds in a trajectory
  hcontacts           Time-series of possible intra- and inter-molecular hydrogen bonds
  hcorrelation        Time-correlation of putative hydrogen bonds

LOOS tends to follow the “Unix Philosophy” in the design of its tools; that is, tools are short, simple, and modular. This is in contrast to monolithic tools such as ptraj/cpptraj3 and CHARMM1. The advantages of the small, modular approach include ease of maintenance, increased flexibility (using any Unix shell or a high-level language, such as Perl or Python, to combine tools into analysis pipelines), and ready code re-use. Often, a new analytical method is similar to an existing method (tool), and can be written by copying and modifying the existing code. This is a far easier proposition when the tool is small and focused, because no glue code is needed to integrate the new functionality into a larger package. That said, there are a few exceptions, where multiple functions are combined into a single tool. For example, the merge-traj tool not only efficiently combines trajectory fragments into a single large file, it also downsamples, recenters molecules, and fixes issues with periodic imaging. While including this functionality makes the tool somewhat complex, the alternative would be to require several passes to create a cleanly curated trajectory, which would consume significant time and disk space for large data sets.

A drawback to the Unix philosophy for tools is that they can be difficult to learn, particularly if multiple authors do not use the same command-line options. Internally, LOOS uses a framework to handle the command-line that is built using the strategy design pattern24 and utilizes ProgramOptions from BOOST to parse the command-line. Common sets of command-line options and arguments are represented by different classes (e.g. ModelWithCoords) that may be combined as needed by the tool. This approach helps to canonicalize how the command-line is handled by most LOOS tools. It is important to note that this framework is primarily for tools distributed and built with LOOS, and is entirely optional for the end-user/programmer. In our lab, we often find that we initially develop tools with a minimal command line interface, and only apply the full program options interface when preparing the tools for public release.

The other challenge with the “many small tools” approach can be figuring out which is the right tool and how to use it. With LOOS, we take a two-tiered approach to this problem. First, the documentation contains a list of all tools included with LOOS (including the packages) with a short description of each, including a reference to the paper describing the method, where appropriate. Second, nearly all tools support the options --help and --fullhelp. The former simply lists all available command line options, but the latter is far more expansive. The full help provides a detailed description of the tool, including use cases, example command lines, descriptions of logical workflows (“use tool A, then take the output and run it through tool B”), alternative tools (“use tool C instead if you want to do X”), and potential pitfalls (“this tool assumes you have already centered the membrane at z = 0”).

Basic Tools

Here, we present several typical use-cases of LOOS covering both the core bundled tools and most of the included packages. We will illustrate basic molecular dynamics simulations analysis with LOOS followed by membrane-specific applications. Next, we will examine how LOOS can be used to create Elastic Network Models (ENMs) and how these can be compared with MD simulations. We will also describe using LOOS for assessing sample convergence in simulations. Finally, we will give examples of using the “density” package for visualizing water and lipid density histograms created from simulations.

The typical work-flow for analyzing molecular simulations, particularly on-going ones, involves three steps: merge the latest trajectories into the “master” trajectory, align the system, and calculate something of interest. Simulations, through checkpointing, generally consist of a set of trajectories that grow over time. It is convenient to combine these chunks into a single trajectory for further processing and analyses. As discussed above, the merge-traj tool takes an existing “master” trajectory and the set of source chunks that have been extended with the latest simulation results, and can intelligently append the new results, correctly handling periodic boundary conditions, including molecules broken across the periodic boundary. It is often convenient to remove certain motions from the system, e.g. aligning the protein in a soluble system, or putting a membrane’s center at z=0; doing so at the point of merging simplifies things, because all future analysis tools can safely assume the coordinates are well curated. The aligner tool provides greater functionality, optimally aligning the trajectory using either an iterative procedure23 or a reference structure. Additional options are available, such as aligning in x and y only (for example, to preserve the tilt along a membrane normal aligned with z) and restricting translations in z.

Tools for many of the common analysis tasks are already included with LOOS. For example, many tools will output a time series for a particular geometric quantity: the distance between sets of atoms can be found with interdist. Basic measures for shape (e.g. bounding box, radius of gyration, and principal axes) for a user-specified set of atoms can be calculated with molshape. The torsion between the centroids of four sets of atoms can be found with torsion. The χ1 and χ2 side chain angles can be found using the rotamer tool, and backbone φ and ψ angles can be found with ramachandran. In addition, backbone torsions for nucleic acids, as defined by Wadley et al25, are supported. Conformational flexibility can be assessed per residue by the root mean square fluctuations (RMSF) using the rmsf tool. The root mean square difference (RMSD) from a reference structure can be determined with the rmsd2ref tool. The RMSD to the average structure can be easily found by first using averager to find the average, and then rmsd2ref to calculate the RMSD. The radial distribution function (RDF) is another common analytical method. LOOS includes tools for calculating an RDF in two different ways. The first, rdf, takes two selections and treats them as groups that can be split into separate residues, molecules, or segments, or left as a single unit. The second, atomic-rdf, instead treats each selection as a set of individual atoms rather than groups.

If you want to track the process by which two molecules interact (e.g. a peptide binding to a lipid bilayer), one way to do so is to track the contacts formed between individual residues on the peptide and specific components of the lipids26–28. The fcontacts tool will create a matrix of time series showing the fraction of contacts made between a “probe” set of atoms and a series of “target” sets, vs all possible contacts with the probe. The tool contact-time can be used to obtain the raw contact count for desired pairs. Finally, similar to the use of intramolecular contacts as a reaction coordinate for protein folding, native_contacts and transition_contacts count the number of contacts broken and formed over a trajectory and are useful for tracking structural transitions29.

Membrane Systems

Membrane systems often require special consideration, because unlike soluble macromolecules, they intrinsically break the isotropic symmetry of the simulation: the membrane normal (usually z) is unique. For this reason, there are a host of analysis tasks that are specific to membrane systems. LOOS includes a number of tools designed with these considerations in mind. For example, the lateral structure of a membrane system can be analyzed using the xy_rdf tool, which computes the radial distribution function for one set of atoms about another, but where only the x,y distance is used27,28. The distribution of either charge, mass, or electron density can be computed along the z axis (i.e. along the membrane normal) using density_dist. This can be useful for detecting changes in the membrane thickness, or for locating the depth of molecules (e.g. peptides) attached to a membrane26.
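The idea of a distribution along the membrane normal can be sketched as a weighted histogram over z. The function below is an illustrative assumption, not the implementation used by the LOOS tool: z_density and its parameters are hypothetical names, the weights stand in for per-atom mass, charge, or electron count, and normalization by slab volume and number of frames is omitted.

```python
def z_density(z_coords, weights, z_min, z_max, nbins):
    """Sum per-atom weights into bins along z; returns (bin_centers, sums)."""
    width = (z_max - z_min) / nbins
    sums = [0.0] * nbins
    for z, w in zip(z_coords, weights):
        if z_min <= z < z_max:
            # Guard against round-off placing a value in a bin past the end.
            idx = min(int((z - z_min) / width), nbins - 1)
            sums[idx] += w
    centers = [z_min + (i + 0.5) * width for i in range(nbins)]
    return centers, sums

# Five atoms with arbitrary weights, binned into four 1-unit slabs.
centers, sums = z_density([-1.5, -0.5, 0.5, 1.5, 0.2],
                          [1.0, 1.0, 2.0, 1.0, 3.0],
                          z_min=-2.0, z_max=2.0, nbins=4)
print(centers)   # [-1.5, -0.5, 0.5, 1.5]
print(sums)      # [1.0, 1.0, 5.0, 1.0]
```

This assumes the membrane has already been centered at z = 0, so that the bins are meaningful from frame to frame.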

A standard measure of membrane structure is the order parameter,

$$S_{CD} = \frac{1}{2}\left\langle 3\cos^{2}\theta_{CD} - 1 \right\rangle \qquad (1)$$

where θCD is the angle between the carbon-hydrogen bond and the membrane normal. This quantity is easily calculable from MD membrane simulations and can be determined experimentally by measuring deuterium quadrupolar splitting using solid-state NMR. The LOOS tool to perform this, orderparams, includes a novel method for estimating the statistical error in the order parameters: it applies block averaging30 to the instantaneous value of the order parameter for a given carbon, averaged over the full system; this approach accounts for both temporal and spatial correlations in the data which, when neglected, cause the statistical errors to be dramatically underestimated. An example order parameter plot with estimated errors is shown in Figure 3A. The order parameters are calculated from the palmitoyl chains of POPE (1-palmitoyl-2-oleoyl phosphatidylethanolamine) from a 400 ns all-atom simulation of a neat bilayer system consisting of POPE and POPG (1-palmitoyl-2-oleoyl phosphatidylglycerol) in a 3:1 ratio in solvent26.
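As a worked illustration of the order parameter, the sketch below evaluates the average of (3 cos²θ − 1)/2 over a set of carbon-hydrogen bond vectors. It is a minimal sketch under stated assumptions: order_parameter is a hypothetical helper, the sign convention may differ from that used by the LOOS tool, and the block-averaged error estimate described above is not included.

```python
import math

def order_parameter(bond_vectors, normal=(0.0, 0.0, 1.0)):
    """Return 0.5 * <3 cos^2(theta) - 1> over the given C-H bond vectors."""
    nx, ny, nz = normal
    nlen = math.sqrt(nx * nx + ny * ny + nz * nz)
    total = 0.0
    for (x, y, z) in bond_vectors:
        blen = math.sqrt(x * x + y * y + z * z)
        # Cosine of the angle between the bond and the membrane normal.
        cos_theta = (x * nx + y * ny + z * nz) / (blen * nlen)
        total += 3.0 * cos_theta ** 2 - 1.0
    return 0.5 * total / len(bond_vectors)

print(order_parameter([(0, 0, 1), (0, 0, -2)]))   # 1.0  (bonds along the normal)
print(order_parameter([(1, 0, 0), (0, 3, 0)]))    # -0.5 (bonds in the membrane plane)
```

In practice the average runs over all equivalent carbons in the system and over all frames of the trajectory.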

Figure 3.

Examples of membrane analysis methods included with LOOS. (A) Lipid order parameters with error estimation. (B) Distance-based lipid molecular order parameters. (C) Lipid orientation map.

These order parameters, though useful, are problematic for coarse-grained representations, which lack hydrogens (or even most of the carbons). LOOS includes two tools for these cases. The first, mops, implements a molecular order parameter: for each molecule in a selection, the principal axes are determined. The second and third axes are treated as fake C-H bonds, and used in EQ 1. This generates a single value giving the average molecular order parameter for the trajectory. However, when something is bound or inserted into a membrane, what is often interesting is how the membrane order is perturbed. These changes may be difficult to detect with a global order parameter, especially if the concentration of perturbants is small. The second tool, dibmops, addresses this by binning the molecular order parameter by the lateral distance to the nearest perturbant on the same leaflet. This approach is also useful for all-atom simulations since it can reveal the short-range effects that would have otherwise been lost in the aggregate order parameters. An example of this is shown in Figure 3B, which is taken from an all-atom simulation of a lactoferrin-derived hexapeptide in a bilayer consisting of a 3:1 mixture of POPE:POPG.

Another useful analytical approach for membrane systems is to calculate the 2D distribution of a membrane property about a protein (e.g. the height along the membrane normal, number density, or orientation). This can be represented as a “heat map” of values or a vector field, such as the in-plane “orientation field”. The LOOS tool membrane_map implements these calculations. The tool uses a virtual base class to define the interface for calculating membrane properties, with concrete calculators inheriting from it (e.g. densities, molecular order, etc). This design makes it simple to add new properties to the tool. An example of using membrane_map is shown in Figure 3C; the data come from a 1.6 µs all-atom simulation of rhodopsin in a bilayer with explicit water31. The vectors show the average orientation of the PC (phosphatidylcholine) head groups about the rhodopsin, projected onto the membrane plane (x, y-plane).

Packages

In addition to the tools bundled in the main distribution, LOOS includes four packages that contain specialized tools and additional libraries useful to the advanced user. Frequently, related tools rely on common code, which is refactored into libraries. However, a fundamental goal of LOOS is to keep the core library and API as simple and compact as possible. Keeping these “package” libraries distinct allows code re-use while maintaining an uncluttered core library. Currently, there are packages for assessing the convergence of MD simulations, constructing and analyzing elastic network models, finding hydrogen bonds based on geometrical properties, and the construction of a basic 3D histogram from trajectories. New packages are planned in the future, including Voronoi analysis of membrane systems, calculation of solid state NMR spectra, and construction of membrane-protein systems.

Elastic Network Modeling

LOOS provides the capacity to calculate and analyze Elastic Network Models32 (ENMs) through the “ElasticNetworks” package. ENMs are a method for computing macromolecular fluctuations using a harmonic model. This is done by reducing the system to a network of nodes connected by springs. Such modeling can be done at a number of resolutions, but typically single residues (modeled by the Cα or phosphate position) are used33,34. One then solves for the normal modes of motion analytically by diagonalizing the Hessian matrix. Normal mode analysis yields a set of eigenvalues and eigenvectors. Each eigenvector describes a direction of motion that is applied to all atoms in the system. Each eigenvalue gives the frequency of its paired eigenvector. These simple models have been shown to predict relevant motions in diverse systems such as HIV reverse transcriptase, GroEL, the ribosome, GPCRs, and many others26,35–39. The accuracy of predicted large-scale fluctuations can also be as good as hundred-nanosecond scale all-atom explicit solvent simulations37,40,41. These models are quite useful for predicting large-scale (slow) fluctuations and testing many hypotheses very quickly while requiring only modest computational resources. Indeed, normal mode analysis on a 200–400 node network (a typical monomeric protein with one node per residue) can be performed on a modern desktop in about a minute.
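To make the construction concrete, the following sketch builds the Hessian for a simple distance-cutoff network with identical springs. It is an illustration under stated assumptions rather than the LOOS implementation: anm_hessian is a hypothetical function, and the 15 Å cutoff and unit spring constant are arbitrary defaults.

```python
def anm_hessian(coords, cutoff=15.0, gamma=1.0):
    """Build the 3N x 3N Hessian of a distance-cutoff elastic network."""
    n = len(coords)
    hess = [[0.0] * (3 * n) for _ in range(3 * n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [coords[j][k] - coords[i][k] for k in range(3)]
            dist2 = sum(c * c for c in d)
            if dist2 > cutoff * cutoff:
                continue  # no spring between nodes beyond the cutoff
            for a in range(3):
                for b in range(3):
                    v = -gamma * d[a] * d[b] / dist2
                    hess[3 * i + a][3 * j + b] += v   # off-diagonal blocks
                    hess[3 * j + a][3 * i + b] += v
                    hess[3 * i + a][3 * i + b] -= v   # diagonal blocks are minus
                    hess[3 * j + a][3 * j + b] -= v   # the sum of off-diagonals
    return hess

# Three bonded nodes plus one node far outside the cutoff.
nodes = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (100.0, 0.0, 0.0)]
H = anm_hessian(nodes)
print(len(H), len(H[0]))   # 12 12
```

The normal modes would then be obtained by passing H to a symmetric eigensolver, for example numpy.linalg.eigh, which is left out here to keep the sketch free of dependencies.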

The “ElasticNetworks” package contains tools to define connectivity and resolution for an ENM, to perform the normal mode analysis, and to analyze the resulting motions. Several popular ENM parametrizations are already implemented, including the standard distance-based cutoff42, the harmonic CA (HCA)43, and the parameter-free44 approaches. In addition, the isotropic Gaussian network model33 (gnm), anisotropic network model42 (anm), and vibrational subsystem analysis20 (vsa) methodologies are all implemented. The resulting eigensets are written as ASCII-formatted matrices that are easily imported into Matlab/Octave, Python NumPy, or other LOOS tools for analysis. An object-oriented library is also provided for the rapid development of new parametrizations and new methods. Factory functions are used to control the spring function by the bundled tools, so new spring functions can be used by the existing tools with minimal additional code.

The general workflow for using ENMs in LOOS is shown in Figure 4:

  • Select a structure (molecule you are interested in) and spring definition (Figure 4A)

  • Calculate the normal modes via Hessian matrix diagonalization (Figure 4B). This can be done using any of the three LOOS-bundled models listed above (gnm, anm, or vsa).

  • Analyze the results using tools provided, or written in the language of your choice (Figure 4C). LOOS libraries may be imported into C++ or Python (see Section ).

Figure 4.

Using elastic network models. LOOS provides tools and libraries for normal mode analysis of elastic network models. (A) Construction of an ENM. The cartoon structure of a protein is shown in blue, with black spheres representing α-carbons. The yellow sticks connecting α-carbons illustrate the springs in a standard distance-cutoff ENM (as defined in ref. 42). (B) A representative Hessian matrix of an ENM. Normal mode analysis of this matrix yields collective motions (this figure is reproduced from ref. 40). (C) Reconstruction of a low-frequency motion. The yellow vectors indicate the direction of a given eigenvector (or normal mode). The relative lengths of these sticks are proportional to each α-carbon’s contribution to the mode.

LOOS includes several pre-built tools for analyzing network model results. For instance, the tool porcupine was used to create the illustration in Figure 4C. Here, the yellow vectors indicate the direction (and relative magnitude) of a particular normal mode. Similarly, the tool enmovie will create a dcd-format “trajectory” where atoms are displaced along a user-specified normal mode; the resulting motions can be easily visualized with standard tools, like VMD. LOOS also includes tools for quantitative ENM analyses, including the prediction of isotropic B-factors (eigenflucc) and comparisons between eigen-spaces (coverlap45) coming from both ENMs and PCA results from MD simulations.

Assessing Simulation Convergence

When discussing MD simulations, the terms “convergence” and “equilibration” are often used imprecisely. What is really of interest to the researcher is the statistical uncertainty in the average of an observable quantity, f, that depends on a structural configuration x⃗. The usual quantity of interest is therefore the standard error of f(x⃗), computed as

$$\mathrm{SE}(f) = \frac{\sigma_f}{\sqrt{N}} \qquad (2)$$

where σf is the standard deviation of f(x⃗) and N is the number of samples46. However, this equation applies only to independent, ideal sampling. In a typical molecular dynamics trajectory, there are significant correlations from one frame to the next. This means the number of independent samples is far smaller than the number of snapshots (otherwise, one could reduce statistical error by writing out frames more often). Determining just how much smaller is a difficult problem.
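The effect of correlation on the standard error can be demonstrated with a toy time series. The sketch below compares the naive estimate with one obtained from the means of contiguous blocks, which is the basic idea behind block averaging. Both functions are hypothetical helpers written for this illustration, and the artificially correlated data (each value repeated ten times) are invented for the example.

```python
import math

def standard_error(samples):
    """Naive standard error, sigma / sqrt(N), valid for independent samples."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / (n - 1)
    return math.sqrt(var / n)

def block_standard_error(samples, block_size):
    """Standard error estimated from the means of contiguous blocks."""
    nblocks = len(samples) // block_size
    means = [sum(samples[i * block_size:(i + 1) * block_size]) / block_size
             for i in range(nblocks)]
    return standard_error(means)

# 20 independent values, each written out 10 times in a row: 200 correlated
# "frames" that carry only 20 independent pieces of information.
values = [(7 * i) % 5 for i in range(20)]
series = [v for v in values for _ in range(10)]

naive = standard_error(series)
blocked = block_standard_error(series, block_size=10)
print(naive < blocked)    # True: the naive estimate is too optimistic
```

Once the block size reaches the correlation length (10 frames here), the blocked estimate matches the standard error of the 20 underlying independent values.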

One simple method is to calculate the autocorrelation time of the observable itself. However, this analysis can be misleading: an apparently fast relaxation can be coupled to a much slower, but relevant process23,47,48. To improve on standard methods, several groups have developed techniques that estimate slow relaxation times based on global protein motions45,46,49–54. However, previous work by Romo et al.54 suggested that there is no single ideal analysis, so the best practice is to use multiple tests, coupled with intuition.

LOOS provides several tools in its core distribution for qualitative assessment, along with a “Convergence” package containing more sophisticated methods for quantitative analysis. In general, all of these tools are fast, especially when compared to the time required to run a simulation.

The simplest test is to plot the RMSD between the relevant structures from each step in the simulation and their average structure. The LOOS tool rmsd2ref implements this, using an average structure computed from an optimally aligned ensemble, determined by an iterative alignment method23. An example of this is shown in Figure 5A for a 4 µs all-atom simulation of rhodopsin. While a divergent plot is a sure sign of problems with the simulation, a “converged” plot is not indicative of a converged simulation, as the ensemble of structures within a given RMSD radius can be quite large. A better test is the all-to-all RMSD plot (Figure 5B), where every snapshot is compared to every other49,55 using the rmsds tool. Conformational substates (ensembles of similar structures) appear as blocks along the diagonal. When substates are revisited, off-diagonal blocks appear. The presence of blocks along the diagonal with few, if any, off-diagonal blocks is a good indication that the relative populations of the different states are not well sampled. A similar qualitative approach is to plot the phase-space projection of the trajectory onto the first few principal component modes55,56. The svd tool computes the singular value decomposition (SVD)56 of the trajectory (equivalent to principal component analysis). The left singular vectors (LSVs) are the eigenvectors, or principal components, of the system while the right singular vectors (RSVs) are the projections of the trajectory along the corresponding LSVs. Plotting the first few RSVs can show the presence of substates and their transitions, or lack thereof. The phase-pdb tool creates a fake structure where each projection point in the phase space formed from three modes is an atom, and all points are connected by bonds. This phase portrait can be viewed in 3D in most molecular visualization programs (e.g. PyMOL, VMD, etc). Here, substates and their transitions appear as “beads on a string” in the case of poor sampling, and multi-lobed “furrballs” with better sampling57.
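The block structure of an all-to-all RMSD matrix can be seen even in a toy example. The sketch below is a deliberate simplification: it assumes the frames have already been aligned and performs no superposition, and rmsd and all_to_all_rmsd are hypothetical helpers rather than LOOS functions.

```python
import math

def rmsd(a, b):
    """RMSD between two frames given as equal-length lists of (x, y, z)."""
    total = sum((p[k] - q[k]) ** 2 for p, q in zip(a, b) for k in range(3))
    return math.sqrt(total / len(a))

def all_to_all_rmsd(frames):
    """Symmetric matrix of pairwise RMSDs (no superposition is performed)."""
    n = len(frames)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = rmsd(frames[i], frames[j])
    return matrix

# Two frames in substate A followed by two frames in substate B.
state_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
state_b = [(0.0, 3.0, 0.0), (1.0, 3.0, 0.0)]
matrix = all_to_all_rmsd([state_a, state_a, state_b, state_b])
for row in matrix:
    print(row)
```

The printed matrix has two low-RMSD blocks on the diagonal and high values off the diagonal, which is the signature of a single, unrevisited transition between substates.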

Figure 5.

Common methods of assessing quality of sampling using a 4 µs long all-atom rhodopsin simulation. (A) RMSD to the average structure. (B) All-to-All RMSD. Pairwise comparison of all structures from the dataset. (C) Cumulative histogram of ionone ring torsion.

Another simple test, particularly when a single quantity is of interest, is to examine how its distribution changes over the simulation. This can be found by calculating a set of histograms for the first n samples from the trajectory, where n increases from a small value up to the length of the trajectory. Alternatively, the histogram can be constructed from a window slid along the trajectory. The LOOS tool chist implements these methods, giving output suitable for graphing with gnuplot. Figure 5C shows the change in the distribution of the torsion for the ionone ring in the retinal from the same 4 µs all-atom simulation used previously, using the first method in chist. It is distressingly clear that it can take a very long time for the distribution of even a simple quantity to converge when it is coupled to slower modes.
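The first (cumulative) variant can be sketched in a few lines. The functions below are hypothetical helpers written for illustration, not the chist implementation, and the two-state torsion series is invented for the example.

```python
def histogram(values, lo, hi, nbins):
    """Normalized histogram (fraction per bin) of the values in [lo, hi)."""
    counts = [0] * nbins
    width = (hi - lo) / nbins
    for v in values:
        if lo <= v < hi:
            counts[min(int((v - lo) / width), nbins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * nbins

def cumulative_histograms(series, lo, hi, nbins, step):
    """Histograms of the first n samples, for n = step, 2*step, ..."""
    return [histogram(series[:n], lo, hi, nbins)
            for n in range(step, len(series) + 1, step)]

# A torsion (degrees) that sits near 10 for 50 frames, then hops to 100.
torsion = [10.0] * 50 + [100.0] * 50
hists = cumulative_histograms(torsion, lo=0.0, hi=180.0, nbins=2, step=50)
print(hists)    # [[1.0, 0.0], [0.5, 0.5]]
```

Halfway through, the distribution looks fully converged on a single state; only the second half of the data reveals the other state.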

More quantitative methods for assessing sampling and convergence can be found in the “Convergence” package. For example, Hess showed that the time course along principal components for a randomly diffusing system with high dimensionality appeared very cosine-like45,49. When a system is poorly sampled, the right singular vectors for the lowest-frequency, largest-amplitude modes, will resemble cosines, a property he quantified as the cosine content. The LOOS tool (coscon) calculates this quantity (see ref. 45, Equation 12) using the simulation’s principal components. This is done using a block-averaging approach so that the quantity can be tracked as a function of simulation time.
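For a single projection p(t) sampled over a trajectory of length T, the cosine content of mode i is commonly written as

c_i = (2/T) (∫ cos(iπt/T) p(t) dt)² / ∫ p(t)² dt,

with both integrals running from 0 to T. The sketch below evaluates this with a midpoint rule on evenly spaced samples. It is an illustrative assumption about how such a quantity can be computed: cosine_content is a hypothetical helper, not the coscon implementation, and the block averaging over simulation time is omitted.

```python
import math

def cosine_content(p, i=1):
    """Cosine content of the evenly sampled projection p for mode index i."""
    n = len(p)
    # Midpoint rule: sample k sits at t/T = (k + 0.5) / n; the factors of
    # T and dt cancel, leaving an overall prefactor of 2/n.
    num = sum(math.cos(i * math.pi * (k + 0.5) / n) * p[k] for k in range(n))
    den = sum(x * x for x in p)
    return (2.0 / n) * num * num / den

n = 500
half_cosine = [math.cos(math.pi * (k + 0.5) / n) for k in range(n)]
full_cosine = [math.cos(2.0 * math.pi * (k + 0.5) / n) for k in range(n)]

print(round(cosine_content(half_cosine, i=1), 6))   # 1.0
print(round(cosine_content(full_cosine, i=1), 6))   # 0.0
```

A value near 1 for the first principal component suggests motion that resembles random diffusion rather than sampling of distinct, revisited states.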

Another approach was developed by Lyman and Zuckerman, using structural histograms to calculate an effective sample size52,53. The effsize.pl tool provides a Perl front-end to drive this analysis. This effective sample size estimates the number of statistically independent conformations in a simulation by analyzing the variance in state populations, using randomly defined structural states and increasingly large samples from the trajectory. In addition, the decorrelation time method, also by Lyman and Zuckerman, is implemented in LOOS53.

Finally, we have implemented the block covariance overlap method developed by Romo and Grossfield54. This analysis combines a block-averaged30 calculation of the covariance overlap45 with a bootstrapping approach54. The Perl front-end (bootstrap_overlap.pl) will run the complete calculation. Briefly, the algorithm takes principal components of subsets of the trajectory and compares them to PCA results for the full simulation using the covariance overlap. This quantity is normalized by another covariance overlap comparison where bootstrapped trajectories are created by randomly pulling frames from the full-length simulation.

Density

The “Density” package contains a number of tools for calculating 3D histograms from trajectories for visualization. These histograms are written out as X-plor formatted electron density maps58 suitable for display in most molecular graphics programs (e.g. PyMOL and VMD). Tools for segmenting the density, extracting contiguous regions (blobs), and associating these with the model system are also included. This tool suite was originally designed for visualizing water distributions within membrane proteins, and as such provides options for determining which water molecules are to be considered “internal.” One method is to determine the first principal axis for the membrane protein and then accept any water molecule within a given radius of this axis. Simple methods, such as waters within a given radius of any protein atom, are also available. As it is often useful to scale the densities relative to the bulk solvent, the histogramming tool allows the user to specify a horizontal slice through the trajectory for determining the bulk solvent density. An example of visualizing the average water density inside a 1 µs long simulation of β2AR is shown in Figure 6A59.

Figure 6.

Examples of density calculated using LOOS and rendered with PyMOL. (A) Water density inside of β2AR contoured at bulk density59. (B) Membrane lipid density beneath a lipopeptide micelle (not shown). POPE lipids colored white (bulk lipid contour) and red (double bulk), and POPG lipids colored cyan (bulk)28.

Since LOOS is agnostic to chemistry (an atom is like every other atom), there are no restrictions on what can be treated as “water” and “protein” by the density tools. For example, a ligand could be selected as “water”. Membrane lipids can also be used. Figure 6B shows an example of visualizing the “crystallization” of one membrane lipid type induced by a bound lipopeptide micelle using a coarse-grained simulation28.

Distribution

LOOS is freely distributed as source code under the GNU GPLv3 license60 through SourceForge (http://loos.sourceforge.net). The ability to build LOOS is tested on most major Linux distributions including Debian, Ubuntu, and Fedora, along with multiple versions of each distribution. In addition, MacOS is supported, as is Windows (via cygwin). In total, 20 different operating systems and releases are tested and supported. Internally, nightly regression tests are run using common tool invocations on multiple datasets. All documentation is generated using the Doxygen61 tool, and is available with the distribution as well as on-line through SourceForge.

Conclusions

It is clear that addressing the needs of modern simulation analysis requires more than providing a well-rounded suite of tools. Empowering the researcher with the capacity to quickly and easily create new analytical tools is a necessity, and to that end, a clean, simple, and easy-to-learn API is critical. LOOS strives to provide just such an API. This design often results in C++ code that resembles a high-level scripting language. In addition, including Python bindings has enabled more researchers to develop new tools quickly, either by leveraging the many high-quality scientific libraries available for Python, or by eliminating the write-compile-debug cycle inherent to C++. Indeed, the latter feature allows the researcher to use LOOS to create spur-of-the-moment, disposable scripts. These are particularly useful when developing new ideas and methods, both for testing and for determining which implementation method may work best.

LOOS is not merely a library, however. The distribution includes a number of tools that have general applicability, or that we have found useful in our own research. The majority of the tools are geared towards a single task and intended to be combined into a work-flow via a scripting system such as the Unix shell or Python, although they may also be used interactively through a shell. In contrast to monolithic programs such as CHARMM and VMD, the LOOS tools are, in many ways, a continuation of the “Unix Philosophy” of small tools that are focused on a single task.

Supplementary Material

Supp Material

Simulation Analysis Library & Toolkit.

LOOS is a software library designed to facilitate making novel tools for analyzing molecular dynamics simulations using C++ or Python. LOOS supports reading the native file formats of most common biomolecular simulation packages. A dynamic atom selection language is included and is easily accessible to the tool-writer. LOOS is bundled with over 120 tools. Through modern C++ design, LOOS is both simple to develop with and is easily extensible.

References

  • 1. Brooks B, Bruccoleri R, Olafson B, States D, Swaminathan S, Karplus M. Journal of Computational Chemistry. 1983;4:187.
  • 2. Case DA, Cheatham TE, Darden T, Gohlke H, Luo R, Merz KM, Onufriev A, Simmerling C, Wang B, Woods RJ. Journal of Computational Chemistry. 2005;26:1668. doi: 10.1002/jcc.20290.
  • 3. Roe DR, Cheatham TE. Journal of Chemical Theory and Computation. 2013;9:3084. doi: 10.1021/ct400341p.
  • 4. Glykos NM. Journal of Computational Chemistry. 2006;27:1765. doi: 10.1002/jcc.20482.
  • 5. Koukos PI, Glykos NM. Journal of Computational Chemistry. 2013;34:2310. doi: 10.1002/jcc.23381.
  • 6. Michaud-Agrawal N, Denning EJ, Woolf TB, Beckstein O. Journal of Computational Chemistry. 2011;32:2319. doi: 10.1002/jcc.21787.
  • 7. DeLano WL. The PyMOL molecular graphics system. URL http://www.pymol.org.
  • 8. Humphrey W, Dalke A, Schulten K. Journal of Molecular Graphics. 1996;14:33. doi: 10.1016/0263-7855(96)00018-5.
  • 9. Jeong JC, Jo S, Wu EL, Qi Y, Monje-Galvan V, Yeom MS, Gorenstein L, Chen F, Klauda JB, Im W. Journal of Computational Chemistry. 2014;35:957. doi: 10.1002/jcc.23584.
  • 10. BOOST C++ Libraries. URL http://www.boost.org [accessed April 26, 2014].
  • 11. Anderson E, Bai Z, Dongarra J, Greenbaum A, McKenney A, Du Croz J, Hammarling S, Demmel J, Bischof C, Sorensen D. Supercomputing ‘90: Proceedings of the 1990 conference on Supercomputing; Los Alamitos, CA, USA. IEEE Computer Society Press; 1990. pp. 2–11.
  • 12. Dongarra JJ, Croz JD, Hammarling S, Hanson RJ. ACM Transactions on Mathematical Software. 1988;14:1.
  • 13. Dongarra JJ, Croz JD, Hammarling S, Duff I. ACM Transactions on Mathematical Software. 1990;16:1.
  • 14. Whaley RC, Dongarra J. SuperComputing 1998: High Performance Networking and Computing. 1998 (in CD-ROM Proceedings).
  • 15. Ferrin TE, Huang C-C, Jarvis LE, Langridge R. Journal of Molecular Graphics. 1988;6:13.
  • 16. van der Spoel D, Lindahl E, Hess B, Groenhof G, Mark AE, Berendsen HJC. Journal of Computational Chemistry. 2005;26:1701. doi: 10.1002/jcc.20291.
  • 17. Feller SE, Gawrisch K, Mackerell AD. Journal of the American Chemical Society. 2002;124:318. doi: 10.1021/ja0118340.
  • 18. Klauda JB, Brooks BR, Mackerell AD, Venable RM, Pastor RW. Journal of Physical Chemistry B. 2005;109:5300. doi: 10.1021/jp0468096.
  • 19. Klauda JB, Venable RM, Freites JA, O’Connor JW, Tobias DJ, Vorobyov I, Mondragon-Ramirez C, Pastor RW, MacKerell ADJ. Journal of Physical Chemistry B. 2010;114:7830. doi: 10.1021/jp101759q.
  • 20. Woodcock HL, Zheng W, Ghysels A, Shao Y, Kong J, Brooks BR. Journal of Chemical Physics. 2008;129:214109. doi: 10.1063/1.3013558.
  • 21. Meyers S. Effective C++: 55 Specific Ways to Improve Your Programs and Designs. 3rd Edition. Addison-Wesley Professional; 2005.
  • 22. Swig-3.0. URL http://www.swig.org/Doc3.0/index.html [accessed April 21, 2014].
  • 23. Grossfield A, Feller SE, Pitman MC. Proteins: Structure, Function, and Bioinformatics. 2007;67:31. doi: 10.1002/prot.21308.
  • 24. Gamma E, Helm R, Johnson R, Vlissides J. Design Patterns: Elements of Reusable Object-Oriented Software. 1st ed. Addison-Wesley Professional; 1994.
  • 25. Wadley LM, Keating KS, Duarte CM, Pyle AM. Journal of Molecular Biology. 2007;372:942. doi: 10.1016/j.jmb.2007.06.058.
  • 26. Romo TD, Bradney LA, Greathouse DV, Grossfield A. Biochimica et Biophysica Acta. 2011;1808:2019. doi: 10.1016/j.bbamem.2011.03.017.
  • 27. Horn JN, Romo TD, Grossfield A. Biochemistry. 2013;52:5604. doi: 10.1021/bi400773q.
  • 28. Horn JN, Sengillo JD, Lin D, Romo TD, Grossfield A. Biochimica et Biophysica Acta. 2012;1818:212. doi: 10.1016/j.bbamem.2011.07.025.
  • 29. Leioatts N, Suresh P, Romo TD, Grossfield A. Structure-based simulations reveal concerted dynamics of GPCR activation. 2014. doi: 10.1002/prot.24617.
  • 30. Flyvbjerg H, Petersen H. Journal of Chemical Physics. 1989;91:461.
  • 31. Grossfield A, Pitman MC, Feller SE, Soubias O, Gawrisch K. Journal of Molecular Biology. 2008;381:478. doi: 10.1016/j.jmb.2008.05.036.
  • 32. Tirion M. Physical Review Letters. 1996;77:1905. doi: 10.1103/PhysRevLett.77.1905.
  • 33. Bahar I, Atilgan AR, Erman B. Folding & Design. 1997;2:173. doi: 10.1016/S1359-0278(97)00024-2.
  • 34. Bahar I, Jernigan RL. Journal of Molecular Biology. 1998;281:871. doi: 10.1006/jmbi.1998.1978.
  • 35. Zheng W, Brooks BR, Thirumalai D. Biophysical Journal. 2007;93:2289. doi: 10.1529/biophysj.107.105270.
  • 36. Bakan A, Bahar I. Proceedings of the National Academy of Sciences of the United States of America. 2009;106:14349. doi: 10.1073/pnas.0904214106.
  • 37. Leioatts N, Romo TD, Grossfield A. Journal of Chemical Theory and Computation. 2012;8:2424. doi: 10.1021/ct3000316.
  • 38. Seckler JM, Leioatts N, Miao H, Grossfield A. Proteins: Structure, Function, and Bioinformatics. 2013;81:1792. doi: 10.1002/prot.24325.
  • 39. Tama F, Valle M, Frank J, Brooks CL. Proceedings of the National Academy of Sciences of the United States of America. 2003;100:9319. doi: 10.1073/pnas.1632476100.
  • 40. Romo TD, Grossfield A. Proteins: Structure, Function, and Bioinformatics. 2011;79:23. doi: 10.1002/prot.22855.
  • 41. Liu L, Gronenborn AM, Bahar I. Proteins: Structure, Function, and Bioinformatics. 2011;80:616.
  • 42. Atilgan AR, Durell SR, Jernigan RL, Demirel MC, Keskin O, Bahar I. Biophysical Journal. 2001;80:505. doi: 10.1016/S0006-3495(01)76033-X.
  • 43. Hinsen K, Petrescu A, Dellerue S. Chemical Physics. 2000;261:25.
  • 44. Yang L-W, Song G, Jernigan RL. Proceedings of the National Academy of Sciences of the United States of America. 2009;106:12347. doi: 10.1073/pnas.0902159106.
  • 45. Hess B. Physical Review E. 2002;65:031910. doi: 10.1103/PhysRevE.65.031910.
  • 46. Grossfield A, Zuckerman DM. Annu Rep Comput Chem. 2009;5:23. doi: 10.1016/S1574-1400(09)00502-7.
  • 47. Neale C, Bennett WFD, Tieleman DP, Pomès R. Journal of Chemical Theory and Computation. 2011;7:4175. doi: 10.1021/ct200316w.
  • 48. Romo TD, Grossfield A. Biophysical Journal. 2014;106:1553. doi: 10.1016/j.bpj.2014.03.007.
  • 49. Hess B. Physical Review E, Statistical physics, plasmas, fluids, and related interdisciplinary topics. 2000;62:8438. doi: 10.1103/physreve.62.8438.
  • 50. Smith LJ, Daura X, van Gunsteren WF. Proteins: Structure, Function, and Bioinformatics. 2002;48:487. doi: 10.1002/prot.10144.
  • 51. Faraldo-Gómez JD, Forrest LR, Baaden M, Bond PJ, Domene C, Patargias G, Cuthbertson J, Sansom MSP. Proteins: Structure, Function, and Bioinformatics. 2004;57:783. doi: 10.1002/prot.20257.
  • 52. Lyman E, Zuckerman DM. Biophysical Journal. 2006;91:164. doi: 10.1529/biophysj.106.082941.
  • 53. Lyman E, Zuckerman DM. Journal of Physical Chemistry B. 2007;111:12876. doi: 10.1021/jp073061t.
  • 54. Romo TD, Grossfield A. Journal of Chemical Theory and Computation. 2011;7:2464. doi: 10.1021/ct2002754.
  • 55. García AE. Physical Review Letters. 1992;68:2696. doi: 10.1103/PhysRevLett.68.2696.
  • 56. Romo TD, Clarage JB, Sorensen DC, Phillips GN. Proteins: Structure, Function, and Bioinformatics. 1995;22:311. doi: 10.1002/prot.340220403.
  • 57. Clarage JB, Romo TD, Andrews BK, Pettitt BM, Phillips GN. Proceedings of the National Academy of Sciences of the United States of America. 1995;92:3288. doi: 10.1073/pnas.92.8.3288.
  • 58. Brünger AT. X-PLOR Version 3.1: A System for X-ray Crystallography and NMR. Yale University Press; 1992.
  • 59. Romo TD, Grossfield A, Pitman MC. Biophysical Journal. 2010;98:76. doi: 10.1016/j.bpj.2009.09.046.
  • 60. GNU General Public License. URL http://www.gnu.org/licenses/gpl.html.
  • 61. van Heesch D. Doxygen. URL http://www.stack.nl/~dimitri/doxygen/
