Published in final edited form as: J Comput Chem. 2023 Dec 10;45(10):633–637. doi: 10.1002/jcc.27278

MPI-Parallelization of the Grid Inhomogeneous Solvation Theory Calculation

Daniel R Roe 1, Bernard R Brooks 1
PMCID: PMC10922152  NIHMSID: NIHMS1949117  PMID: 38071482

Abstract

The Grid Inhomogeneous Solvation Theory (GIST) method requires the often time-consuming calculation of water-water and water-solute energy on a grid. Previous efforts to speed up this calculation include using OpenMP, GPUs, and particle mesh Ewald. This paper details how the speed of this calculation can be increased by parallelizing it with MPI, where trajectory frames are divided among multiple processors. This requires very little communication between individual processes during trajectory processing, meaning the calculation scales well to large processor counts. This paper also details how the entropy calculation, which must happen after trajectory processing since it requires information from all trajectory frames, is parallelized via MPI. This parallelized GIST method has been implemented in the freely-available CPPTRAJ analysis software.

Keywords: Molecular dynamics, Trajectory Analysis, GIST, Parallel Analysis

Graphical Abstract


The GIST functionality in the software analysis tool CPPTRAJ has been MPI-parallelized, potentially providing orders of magnitude speedup. This will allow the application of GIST calculations to larger systems and longer molecular dynamics trajectories.

Introduction

The Grid Inhomogeneous Solvation Theory (GIST) method is a technique where various solvation properties are determined from energy calculations of snapshots from molecular dynamics trajectories mapped onto a grid, providing a statistical mechanical formalism for determining the thermodynamics of water around a solute.[1] This helps to indicate where solvent binding may be thermodynamically favorable or unfavorable, which is important in many areas of research such as rational drug design.[2]

The GIST method primarily consists of calculating the solvation energy and the first-order solvation entropy. The solvation energy calculation time scales with the number of atoms in the system, while the solvation entropy calculation scales with the number of simulation frames. For relatively short simulations (roughly 1,000 frames), the most time-consuming part (> 90% of the total calculation time) is the solvation energy, since it requires energy calculations between every atom in the system and every on-grid solvent atom. In its original implementation in the analysis software CPPTRAJ (https://github.com/Amber-MD/cpptraj), the solvation energy calculation was parallelized with OpenMP.[3] Subsequent work further accelerated this calculation via GPUs (CUDA).[4] More recently, this calculation was accelerated and made more scalable by using the particle mesh Ewald method.[5]

Here we describe a further acceleration of the GIST method in CPPTRAJ by parallelizing the energy calculation via MPI. It was found that after speeding up the energy calculation, for longer trajectories the entropy calculation time began to dominate, necessitating parallelization of the entropy calculation via MPI as well. The parallelization of the solvation energy calculation makes use of the existing across-trajectory parallel framework in CPPTRAJ,[6] while the parallelization of the entropy calculation required adding new framework to CPPTRAJ. The new MPI code scales well to large processor counts since it has minimal communication requirements, and is orthogonal to (and can be used in combination with) the existing OpenMP/CUDA code.

Methods

GIST Theory

The local density-weighted solvation free energy of grid voxel k in GIST is separated into energy and entropy contributions:

$\Delta A_k = \Delta E_k - T\,\Delta S_k$

Where the energy of voxel k (ΔEk) is separated into solvent–solvent (VV) and solute–solvent (UV) contributions:

$\Delta E_k = \Delta E_{k,VV} + \Delta E_{k,UV}$

The entropy of voxel k can be calculated in two ways. It can be separated into contributions from translational and orientational entropy:

$\Delta S_k \approx \Delta S_{k,UV}^{trans} + \Delta S_{k,UV}^{orient}$

Or it can be estimated using all six translational and orientational degrees of freedom:

$\Delta S_k \approx \Delta S_{k,UV}^{six}$

Only the solute-solvent entropy is considered; solvent-solvent correlations are neglected.

In CPPTRAJ, the GIST solvation energy term for a given voxel k is calculated as:

$E_k = \sum_{i}^{N} \sum_{j}^{M} E_{NB}(r_{ij})$

Where N is the total number of atoms in the system, M is the number of on-grid solvent atoms (i.e. all solvent atoms residing in a grid voxel for that frame), rij is the distance (using the minimum image convention) between atoms i and j, and ENB is the function representing the non-bonded energy calculation for on-grid atoms in voxel k. This means the time needed for the nonbonded energy calculation scales with both the total number of atoms and the number of on-grid solvent atoms (and hence with the size of the grid). For the original GIST implementation, ENB is a simple Coulombic term for electrostatics plus a Lennard-Jones 6-12 potential (LJ 6-12) for van der Waals with no distance cutoff. For the PME GIST implementation, ENB uses particle mesh Ewald[7] for electrostatics and an LJ 6-12 term with a long-range correction for van der Waals.[8]
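To make the structure of this double sum concrete, the following is a minimal, self-contained C++ sketch of the non-PME case: a Coulomb term plus an LJ 6-12 term with no cutoff, accumulated into the voxel of each on-grid solvent atom. It is illustrative only, not the CPPTRAJ source: the pair-distance helper ignores periodicity (the real code uses the minimum image convention), a single LJ pair type is assumed, and the separate solute-solvent/solvent-solvent bookkeeping is omitted.

#include <cmath>
#include <cstddef>
#include <vector>

struct Atom { double x, y, z, charge; int voxel; };  // voxel: grid index of the atom, -1 if off-grid

static double pairDist(const Atom& a, const Atom& b) {
  // Placeholder: the real code uses the minimum-image convention.
  const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
  return std::sqrt(dx * dx + dy * dy + dz * dz);
}

void accumulateVoxelEnergy(const std::vector<Atom>& atoms,
                           const std::vector<int>& onGridSolvent, // indices of on-grid solvent atoms
                           double ljA, double ljB,                // single LJ pair type (simplification)
                           std::vector<double>& voxelEnergy)
{
  for (int j : onGridSolvent) {                        // M on-grid solvent atoms
    double Ek = 0.0;
    for (std::size_t i = 0; i < atoms.size(); ++i) {   // N atoms in the system
      if (static_cast<int>(i) == j) continue;
      const double r  = pairDist(atoms[i], atoms[j]);
      const double r6 = r * r * r * r * r * r;
      Ek += atoms[i].charge * atoms[j].charge / r      // Coulomb (internal charge units)
          + ljA / (r6 * r6) - ljB / r6;                // Lennard-Jones 6-12
    }
    voxelEnergy[atoms[j].voxel] += Ek;                 // bin into atom j's voxel
  }
}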

The GIST entropy terms for a given voxel k are calculated using a first nearest-neighbor approach[9] as:

$S_{k,UV}^{trans} \approx R\left(\gamma + \frac{1}{N_k}\sum_{i=1}^{N_k} \ln\frac{N_f\,\rho_0\,4\pi\,d_{trans,i}^{3}}{3}\right)$
$S_{k,UV}^{orient} \approx R\left(\gamma + \frac{1}{N_k}\sum_{i=1}^{N_k} \ln\frac{N_k\,\Delta\omega_i^{3}}{6\pi}\right)$

Where R is the ideal gas constant, γ is the Euler–Mascheroni constant, which compensates for the asymptotic bias of the nearest-neighbor entropy estimator, Nk is the total number of solvent molecules in voxel k, Nf is the total number of frames, ρ0 is the reference density of the bulk solvent, dtrans,i is the Cartesian distance from solvent molecule i to its nearest neighbor, and Δωi is the angular distance from solvent molecule i to its nearest neighbor in the same voxel (calculated between the two quaternions representing the orientations of the solvent molecules).
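As an illustration of how these estimators are evaluated, the sketch below computes the translational and orientational terms for a single voxel from precomputed nearest-neighbor distances, with R taken in kcal mol−1 K−1. It mirrors the equations above rather than CPPTRAJ's actual implementation, and the function and variable names are illustrative only.

#include <cmath>
#include <vector>

constexpr double R_GAS = 1.9872041e-3;        // ideal gas constant, kcal mol^-1 K^-1
constexpr double EULER_GAMMA = 0.5772156649;  // Euler-Mascheroni constant
constexpr double PI = 3.14159265358979323846;

// Translational entropy estimate for one voxel from nearest-neighbor
// distances dTrans (one entry per solvent molecule found in the voxel).
double transEntropy(const std::vector<double>& dTrans, long nFrames, double rho0) {
  if (dTrans.empty()) return 0.0;
  double sum = 0.0;
  for (double d : dTrans)
    sum += std::log(static_cast<double>(nFrames) * rho0 * 4.0 * PI * d * d * d / 3.0);
  return R_GAS * (EULER_GAMMA + sum / static_cast<double>(dTrans.size()));
}

// Orientational entropy estimate from nearest-neighbor angular distances.
double orientEntropy(const std::vector<double>& dOrient) {
  if (dOrient.empty()) return 0.0;
  const double Nk = static_cast<double>(dOrient.size());
  double sum = 0.0;
  for (double w : dOrient)
    sum += std::log(Nk * w * w * w / (6.0 * PI));
  return R_GAS * (EULER_GAMMA + sum / Nk);
}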

The entropy using all six degrees of freedom (6D) of solvent molecules[10] is calculated using the following:

$S_{k,UV}^{six} \approx R\left(\gamma + \frac{1}{N_k}\sum_{i=1}^{N_k} \ln\frac{N_f\,\rho_0\,\pi\left(\Delta\omega_i^{2} + d_{trans,i}^{2}\right)^{3}}{48}\right)$

In CPPTRAJ, the translational entropy and the 6D entropy are calculated in the same function, and the Cartesian and angular distances are considered not just within the same voxel, but neighboring voxels as well. Typically, the search is only done for the first layer of neighboring voxels (26 total), but the number of layers can be increased if specified by the user. As a result, the Strans and Ssix calculations are much slower than the Sorient calculation. This also means that the time needed to perform the entropy calculation will scale with the number of incoming frames as well as the number of grid voxels.
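The neighbor-voxel search can be pictured with the sketch below, which scans the home voxel plus nLayers surrounding layers of voxels for the nearest translational neighbor. The data layout (solvent molecules binned per voxel, row-major voxel indexing) is an assumption made for illustration and differs in detail from the actual CPPTRAJ implementation; the same loop structure applies to the angular and 6D distances.

#include <limits>
#include <vector>

struct SolventOri { double x, y, z; double q[4]; };  // position + orientation quaternion

// solventInVoxel[v] holds all solvent molecules binned into voxel v over the
// whole trajectory; voxels are assumed indexed row-major as (ix*ny + iy)*nz + iz.
double nearestTransDistSq(const SolventOri& s, int ix, int iy, int iz,
                          int nx, int ny, int nz, int nLayers,
                          const std::vector<std::vector<SolventOri>>& solventInVoxel)
{
  double minD2 = std::numeric_limits<double>::max();
  for (int dz = -nLayers; dz <= nLayers; ++dz)
    for (int dy = -nLayers; dy <= nLayers; ++dy)
      for (int dx = -nLayers; dx <= nLayers; ++dx) {
        const int jx = ix + dx, jy = iy + dy, jz = iz + dz;
        if (jx < 0 || jx >= nx || jy < 0 || jy >= ny || jz < 0 || jz >= nz)
          continue;                                   // skip voxels outside the grid
        const int v = (jx * ny + jy) * nz + jz;
        for (const SolventOri& t : solventInVoxel[v]) {
          if (&t == &s) continue;                     // skip the molecule itself
          const double ddx = s.x - t.x, ddy = s.y - t.y, ddz = s.z - t.z;
          const double d2 = ddx * ddx + ddy * ddy + ddz * ddz;
          if (d2 < minD2) minD2 = d2;
        }
      }
  return minD2;  // squared distance to the nearest translational neighbor
}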

MPI Parallel GIST Implementation

In CPPTRAJ, previous efforts to speed up the GIST calculation have focused on speeding up the nonbonded energy calculation directly. For example, the OpenMP version divides the outer loop over atoms in the nonbonded energy calculation among all available OpenMP threads. The MPI parallel strategy aims to speed up GIST by dividing the incoming trajectory frames N among M MPI processes:

$F_R = \left\lfloor\frac{N}{M}\right\rfloor + \begin{cases} 1, & R < N\,\%\,M \\ 0, & R \geq N\,\%\,M \end{cases}$

Where FR is the number of frames assigned to a given process, R is the MPI rank of the process (starting from 0), and % represents the modulo (remainder) operator. After trajectory processing, the per-voxel sums and counts from all processes are combined on process 0 before the final calculations (grid normalizations and entropy) are performed and the results are written (only process 0 does I/O in CPPTRAJ).
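The sketch below illustrates this frame division and the final reduction to process 0 with plain MPI calls. It assumes a contiguous block of frames per rank and uses example values for the frame count and grid size; the array names and the loop body are placeholders, not the actual CPPTRAJ code.

#include <mpi.h>
#include <algorithm>
#include <vector>

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  int rank, nprocs;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

  const long nFrames = 1000;                      // N: total trajectory frames (example value)
  const long base    = nFrames / nprocs;          // floor(N / M)
  const long extra   = nFrames % nprocs;          // N % M
  const long myFrames = base + (rank < extra ? 1 : 0);              // F_R
  const long myStart  = rank * base + std::min<long>(rank, extra);  // first frame for this rank

  const int nVoxels = 62 * 50 * 46;               // example grid size
  std::vector<double> voxelEnergy(nVoxels, 0.0);
  for (long f = myStart; f < myStart + myFrames; ++f) {
    (void)f;  // ... read frame f and accumulate on-grid energies and counts ...
  }

  // Sum per-voxel data from all ranks onto rank 0, which performs the final
  // normalizations, the entropy calculation, and all output.
  std::vector<double> total(rank == 0 ? nVoxels : 0);
  MPI_Reduce(voxelEnergy.data(), rank == 0 ? total.data() : nullptr,
             nVoxels, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

  MPI_Finalize();
  return 0;
}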

To speed up the Strans and Ssix calculations, solvent molecule coordinates and quaternions are saved for each frame, then distributed to all processes prior to the entropy calculation step. To reduce the number of communication calls, the individual arrays from each voxel are combined on each process and sent as one large array. After communication (which typically takes less than 1% of the total calculation time), the total number of voxels is divided among all processes in the same manner as the incoming trajectory frames (i.e., N in the above equation now represents the total number of voxels and FR is the number of voxels assigned to rank R) and the entropy calculation is performed for each voxel. The results are then sent back to process 0 for I/O.
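A minimal sketch of this communication pattern is shown below: each rank packs its per-voxel arrays into one contiguous buffer, and a single MPI_Allgatherv distributes all buffers to all ranks. The packing scheme and function names are assumptions for illustration only; the actual CPPTRAJ code must also exchange the quaternions and the per-voxel counts/offsets needed to unpack the gathered data, which is omitted here.

#include <mpi.h>
#include <vector>

// Pack this rank's per-voxel solvent coordinates (flattened x,y,z triples)
// into one contiguous buffer so they can be exchanged with a single collective.
static std::vector<double> packLocal(const std::vector<std::vector<double>>& perVoxelXYZ) {
  std::vector<double> flat;
  for (const auto& v : perVoxelXYZ)
    flat.insert(flat.end(), v.begin(), v.end());
  return flat;
}

// Gather every rank's packed buffer onto all ranks with one MPI_Allgatherv.
static std::vector<double> gatherAll(const std::vector<double>& local, MPI_Comm comm) {
  int nprocs;
  MPI_Comm_size(comm, &nprocs);
  int myCount = static_cast<int>(local.size());
  std::vector<int> counts(nprocs), displs(nprocs);
  MPI_Allgather(&myCount, 1, MPI_INT, counts.data(), 1, MPI_INT, comm);
  int total = 0;
  for (int r = 0; r < nprocs; ++r) { displs[r] = total; total += counts[r]; }
  std::vector<double> all(total);
  MPI_Allgatherv(local.data(), myCount, MPI_DOUBLE,
                 all.data(), counts.data(), displs.data(), MPI_DOUBLE, comm);
  return all;
}
// After the gather, the voxels are divided among ranks exactly as the trajectory
// frames were, each rank computes the entropy for its voxels, and the per-voxel
// results are sent back to rank 0 for output.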

Test Systems

F. tularensis FabI

MD simulations were run on the FabI enoyl reductase protein from F. tularensis (PDB 3NRC). The PDB was preprocessed using the CPPTRAJ command ‘prepareforleap’[11]; all solvent molecules and the triclosan residue were removed. The system was then aligned along its principal axes using the CPPTRAJ principal command. CPPTRAJ scripts can be found in the Supporting Information.

The system was built with the Amber program tleap using protein parameters from the ff14SB force field along with contributed nicotinamide adenine dinucleotide (NAD) parameters.[12],[13] The atom names for the NAD parameters were modified by hand to be consistent with the input PDB; the modified library file can be found in the Supporting Information. The system was solvated with 18,613 OPC waters (12 Å buffer) and neutralized with 10 Na+ ions (82,246 total atoms).[14]

The solvated system was relaxed using a modified version of the Roe & Brooks protocol[15]: in steps 5 and 9, where no restraints would normally have been used, minimal (0.5 kcal·mol−1·Å−2) heavy-atom positional restraints (using the final coordinates of the previous step as the reference) were applied in order to maintain the alignment of the system. The final relaxation step was run for 2 ns without restraints to resolve any remaining strain in the protein. Based on inspection of both the best-fit and no-fit RMSD to the initial structure, the initial relaxation appeared complete at around frame 650 (corresponding to 650 ps), which was chosen as the reference structure for subsequent restrained production simulations.

Production MD simulations were run for 2 ns starting from the previously mentioned frame 650 structure in the NVT ensemble on a Tesla V100 using pmemd.cuda[16] from Amber 20.[17] Simulations used a time step of 2 fs; coordinate trajectory frames were written every 1 ps. Bonds to hydrogen were constrained with SHAKE.[18] Long-range electrostatics were handled using particle mesh Ewald with a cutoff of 8.0 Å. Temperature was maintained at 300 K using a Langevin scheme with a collision frequency of 2 ps−1. Positional restraints to the initial structure were applied to all protein heavy atoms using a force constant of 0.5 kcal·mol−1·Å−2.

Cucurbit[7]uril

MD simulations were also run on the small molecule cucurbit[7]uril (CB7). The initial structure was the same one used by Fenley et al.[19] Parameters were assigned with the Amber program antechamber using GAFF2,[20],[21] and the parmchk2 program was used to assign any missing parameters.

The system was aligned along principal axes with CPPTRAJ and then built using tleap and solvated with TIP4PEW waters (~12.5 Å buffer); 16 excess waters were removed for a total of 1,699 waters in order to match previous (unpublished) results. The final system size is 6,922 total atoms. The system was then minimized and relaxed in the same manner as the FabI system, with minimal positional restraints applied to maintain the alignment of the system.

Production simulations were run for 400 ns starting from the final relaxed structure in the NVT ensemble on a Tesla V100 using pmemd.cuda from Amber 20 in the same manner as the FabI system except coordinate trajectory frames were written every 0.5 ps.

Parallel Benchmarks

The GIST parallel code modifications were added to CPPTRAJ version 6.20.4. CPPTRAJ was compiled with Intel 17.0.4 compilers, Intel MPI 5.1, CUDA 11.8.0, NetCDF 4.9.2 (with HDF5 1.10.9), and parallel NetCDF 1.9.0. The Intel MKL was used for BLAS/LAPACK/FFTW. CPU calculations were run on Intel Xeon E5-2630 (Haswell) nodes with 40 Gb/s InfiniBand interconnects.

FabI Benchmarks

The GIST calculation was run both with and without PME on the first 1,000 frames of the production trajectory. The trajectory was imaged, and positioned so that both the GIST grid and one of the NAD residues (513) were centered at the origin. The order calculation was enabled. The grid dimensions were 62, 50, and 46 in the X, Y, and Z directions respectively, and the grid spacing was 0.5 Å. Grid dimensions were chosen by checking the boundaries of NAD 513 with an offset of 12 Å in each direction. The reference density was 0.0332 Å−3. The full CPPTRAJ input script is provided in Supporting Information.
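For reference, and assuming the stated dimensions are voxel counts, these settings correspond to a grid of roughly 142,600 voxels covering a 31 Å × 25 Å × 23 Å region:

$62 \times 50 \times 46 = 142{,}600\ \text{voxels}, \qquad (62, 50, 46) \times 0.5\ \text{Å} = 31\ \text{Å} \times 25\ \text{Å} \times 23\ \text{Å}$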

Figure 1 shows the total GIST calculation time vs. number of CPU cores for 1,000 frames of the FabI trajectory, both with and without PME, for execution in serial and with OpenMP, MPI, and on several models of GPU. In serial, PME GIST provides a significant speedup (~2.5x) over non-PME GIST; however, the OpenMP scaling of PME GIST is far worse. This is because the reciprocal-space part of the PME GIST nonbonded energy calculation is not yet OpenMP-parallelized and so becomes a bottleneck at higher thread counts. In contrast, the MPI versions of PME and non-PME GIST both scale well to higher processor counts since the parallelization is orthogonal to the non-bonded energy calculation. At 128 processes, a calculation that previously required on the order of an hour completes in less than a minute.

Figure 1. GIST calculation time vs. number of CPU cores for 1,000 frames of the FabI trajectory. Plot made with Grace 5.1.25.

The GPU version of non-PME GIST shows little to no speedup over the CPU version for the Kepler-based K40 and K20 GPUs. The Pascal-based TitanXP and P100, the Volta, and the Ampere GPUs do show speedups (2.1x, 2.5x, 4.4x, and 5.9x, respectively). This indicates that the GPU GIST CUDA code may benefit from some optimization, particularly on older GPU architectures. Like the CPU code, the GPU code benefits from MPI parallelization and scales well to higher GPU counts (one MPI process per GPU).

The MPI parallelization has no significant effect on the accuracy of the results; the energy and entropy grids match the serial results to within 0.0001 kcal·mol−1·Å−3.

CB7 Benchmarks

Cucurbit[7]uril was chosen to test the MPI parallelization of the entropy calculation since it is smaller than FabI. Since the nonbonded energy calculation scales with system size and the entropy calculations scale with the number of frames, for smaller systems with larger numbers of trajectory frames the entropy calculation begins to dominate the total calculation time (see Table 1).

Table 1.

CB7 MPI-parallelized GIST timings (128 MPI processes) with the entropy calculation done in serial vs. in parallel. The Energy column gives the percentage of total time spent processing trajectory frames, which is dominated (78%) by the non-bonded energy calculation; the Entropy column gives the percentage of total time spent in the entropy calculation, which occurs after all frames have been processed.

Entropy Type    Total      Energy    Entropy
Serial          1,146 s    7.7%      88%
Parallel        190 s      62%       9.5%

The GIST calculation was run with PME on every 10th frame of the production trajectory (80,000 frames total). The trajectory was imaged, and positioned so that both the GIST grid and CB7 were centered at the origin. The order calculation was enabled. The grid dimensions were 28, 44, and 44 in the X, Y, and Z directions respectively, and the grid spacing was 0.5 Å; this was chosen to match the grid in the original GIST paper.[1] The reference density was 0.0332 Å−3. The full CPPTRAJ input script is provided in Supporting Information.
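For reference, and again assuming the stated dimensions are voxel counts, this grid corresponds to roughly 54,200 voxels covering a 14 Å × 22 Å × 22 Å region:

$28 \times 44 \times 44 = 54{,}208\ \text{voxels}, \qquad (28, 44, 44) \times 0.5\ \text{Å} = 14\ \text{Å} \times 22\ \text{Å} \times 22\ \text{Å}$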

Figure 2 shows the total entropy and PME energy calculation time vs. number of CPU cores for 80,000 frames of the CB7 trajectory. Although the energy calculation still makes up the majority of the total time, by 16 cores the entropy calculation can become a bottleneck if it is not also parallelized. Both the energy and entropy calculations scale reasonably well; however, the energy calculation parallelizes more efficiently than the entropy calculation, with speedups of 99x and 27x, respectively. Since the time needed to communicate the entropy arrays is only about 1% of the total entropy calculation time, this indicates there is room for improving the parallelization of the calculation itself.

Figure 2. GIST entropy and PME energy calculation time vs. number of CPU cores for 80,000 frames of the CB7 trajectory. Plot made with Grace 5.1.25.

Conclusions

GIST is a useful framework for determining the thermodynamics of water around a solute of interest. We have added MPI parallelization of GIST to the analysis software CPPTRAJ; it can be used separately from or in conjunction with the existing acceleration schemes (OpenMP, PME, CUDA) and scales well to relatively high processor counts. The MPI-enabled GIST is implemented in CPPTRAJ version 6.22.0, and the code is freely available from https://github.com/Amber-MD/cpptraj. This will facilitate the application of GIST to larger systems and longer simulations.

Supplementary Material

Supinfo

Acknowledgments

This work was supported by the intramural research program of the National Heart, Lung, and Blood Institute (NHLBI) of the National Institutes of Health, NHLBI Z01 HL001051-25. Computational resources were provided by the LoBoS cluster (NIH/NHLBI).

Footnotes

Supporting Information

Additional Supporting Information may be found in the online version of this article.

References and Notes
