Abstract
The Grid Inhomogeneous Solvation Theory (GIST) method requires the often time-consuming calculation of water-water and water-solute energies on a grid. Previous efforts to speed up this calculation include using OpenMP, GPUs, and particle mesh Ewald. This paper details how this calculation can be further accelerated by parallelizing it with MPI, where trajectory frames are divided among multiple processors. This requires very little communication between individual processes during trajectory processing, so the calculation scales well to large processor counts. This paper also details how the entropy calculation, which must happen after trajectory processing since it requires information from all trajectory frames, is parallelized via MPI. This parallelized GIST method has been implemented in the freely available CPPTRAJ analysis software.
Keywords: Molecular dynamics, Trajectory Analysis, GIST, Parallel Analysis
Graphical Abstract

The GIST functionality in the software analysis tool CPPTRAJ has been MPI-parallelized, potentially providing orders of magnitude speedup. This will allow the application of GIST calculations to larger systems and longer molecular dynamics trajectories.
Introduction
The Grid Inhomogeneous Solvation Theory (GIST) method determines solvation properties from energy calculations on molecular dynamics trajectory snapshots mapped onto a grid, providing a statistical mechanical formalism for determining the thermodynamics of water around a solute.[1] This indicates where solvent binding may be thermodynamically favorable or unfavorable, which is important in many areas of research such as rational drug design.[2]
The GIST method primarily consists of calculating the solvation energy and first-order solvation entropy. The solvation energy calculation time scales with the number of atoms in the system, while the solvation entropy calculation scales with the number of simulation frames. For relatively short simulations (about 1,000 frames or so), the most time-consuming calculation (> 90% of the total calculation time) is the solvation energy, since it is composed of energy calculations between all atoms and every solvent atom that is on the grid. In its original implementation in the analysis software CPPTRAJ (https://github.com/Amber-MD/cpptraj), the solvation energy calculation was parallelized with OpenMP.[3] Subsequent work further accelerated this calculation via GPUs (CUDA).[4] More recently, this calculation was accelerated and made more scalable by using the particle mesh Ewald method.[5]
Here we describe a further acceleration of the GIST method in CPPTRAJ by parallelizing the energy calculation via MPI. It was found that after speeding up the energy calculation, for longer trajectories the entropy calculation time began to dominate, necessitating parallelization of the entropy calculation via MPI as well. The parallelization of the solvation energy calculation makes use of the existing across-trajectory parallel framework in CPPTRAJ,[6] while the parallelization of the entropy calculation required adding new infrastructure to CPPTRAJ. The new MPI code scales well to large processor counts since it has minimal communication requirements, and it is orthogonal to (and can be used in combination with) the existing OpenMP/CUDA code.
Methods
GIST Theory
The local density-weighted solvation free energy of grid voxel k in GIST is separated into energy and entropy contributions:
ΔGk = ΔEk − TΔSk
Where the energy of voxel k (ΔEk) is separated into contributions from solvent-solvent and solute-solvent interactions:
ΔEk = ΔEk,solute-solvent + ΔEk,solvent-solvent
The entropy of voxel k can be calculated in two ways. It can be separated into contributions from translational and orientational entropy:
ΔSk = Sk,trans + Sk,orient
Or it can be estimated using all six translational and orientational degrees of freedom:
ΔSk ≈ Sk,six
Only the solute-solvent entropy is considered; solvent-solvent correlations are neglected.
In CPPTRAJ, the GIST solvation energy term for a given voxel k is calculated as:
Ek,NB = Σ(i=1 to M) Σ(j=1 to N, j≠i) ENB(rij)
Where N is the total number of atoms in the system, M is the number of on-grid solvent atoms (i.e., all solvent atoms residing in a grid voxel for that frame), rij is the distance (using the minimum image convention) between atoms i and j, and ENB is the non-bonded pair energy function, so that Ek,NB represents the non-bonded energy calculation for the on-grid atoms in voxel k. This means the time needed to perform the nonbonded energy calculation scales with both the number of atoms and the number of grid voxels. For the original GIST implementation, ENB is a simple Coulombic term for electrostatics plus a Lennard-Jones 6-12 potential (LJ6-12) for van der Waals with no distance cutoff. For the PME GIST implementation, ENB uses particle mesh Ewald[7] for electrostatics and an LJ6-12 term with a long-range correction for van der Waals.[8]
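To illustrate the form of this sum, the following is a minimal C++ sketch of a non-PME voxel energy accumulation for a single frame. It is not the CPPTRAJ implementation: the PairTables interface, the use of precombined LJ pair coefficients, and the omission of exclusions, double-counting conventions, and frame averaging are all simplifying assumptions.

```cpp
#include <cmath>
#include <vector>

// Coulomb plus Lennard-Jones 6-12 energy for one atom pair: qq is the
// product of the two charges (internal units), A and B are the combined
// LJ pair coefficients, rij2 is the squared minimum-image distance.
inline double pairEnergyNB(double qq, double A, double B, double rij2) {
  double rij = std::sqrt(rij2);
  double r6  = 1.0 / (rij2 * rij2 * rij2);
  return qq / rij + (A * r6 * r6 - B * r6);   // elec + A/r^12 - B/r^6
}

// Accumulate, for one frame, the non-bonded energy between the M on-grid
// solvent atoms of a voxel and all N atoms in the system, mirroring the
// double sum in the text. PairTables is any object providing charge(i),
// ljA(i,j), ljB(i,j), and minImageDist2(i,j).
template <class PairTables>
double voxelEnergyNB(const std::vector<int>& onGridAtoms, int Natoms,
                     const PairTables& p)
{
  double Ek = 0.0;
  for (int i : onGridAtoms)                 // M on-grid solvent atoms
    for (int j = 0; j < Natoms; ++j)        // all N atoms in the system
      if (j != i)
        Ek += pairEnergyNB(p.charge(i) * p.charge(j),
                           p.ljA(i, j), p.ljB(i, j), p.minImageDist2(i, j));
  return Ek;
}
```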
The GIST entropy terms for a given voxel k are calculated using a first nearest-neighbor approach[9] as:
Where R is the ideal gas constant, γ is the Euler-Mascheroni constant that is meant to compensate for asymptotic bias in the entropy estimator, Nk is the total number of solvent molecules in voxel k, Nf is the total number of frames, ρ0 is the reference density of the bulk solvent, dtrans,i is the Cartesian distance of the solvent molecule i to its nearest neighbor, and Δωi is the angular distance of the solvent molecule i to its nearest neighbor in the same voxel (calculated between two quaternions representing the orientations of the solvent molecules).
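The angular distance between two orientations stored as unit quaternions can be obtained as twice the arccosine of the absolute value of their dot product. The short C++ sketch below shows this calculation; it is written independently of CPPTRAJ's internal routines, which may differ in detail.

```cpp
#include <array>
#include <cmath>

// Angular distance between two molecular orientations represented as unit
// quaternions: delta_omega = 2*acos(|q1 . q2|). The absolute value accounts
// for q and -q describing the same orientation.
double angularDistance(const std::array<double,4>& q1,
                       const std::array<double,4>& q2)
{
  double dot = 0.0;
  for (int k = 0; k < 4; ++k) dot += q1[k] * q2[k];
  dot = std::fabs(dot);
  if (dot > 1.0) dot = 1.0;   // guard against round-off before acos
  return 2.0 * std::acos(dot);
}
```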
The entropy using all six degrees of freedom (6D) of solvent molecules[10] is calculated using the following:
In CPPTRAJ, the translational entropy and the 6D entropy are calculated in the same function, and the Cartesian and angular distances are considered not just within the same voxel but in neighboring voxels as well. Typically, the search is only done for the first layer of neighboring voxels (26 total), but the number of layers can be increased if specified by the user. As a result, the Strans and Ssix calculations are much slower than the Sorient calculation. This also means that the time needed to perform the entropy calculation will scale with the number of incoming frames as well as the number of grid voxels.
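The neighbor-voxel search for the translational nearest-neighbor distance can be pictured with the C++ sketch below. It is a simplified stand-in for the CPPTRAJ routine: the per-voxel storage layout, the flattened (x*ny + y)*nz + z voxel index, and the handling of a water's own position are assumptions, and the quaternion/6D bookkeeping is omitted.

```cpp
#include <cmath>
#include <limits>
#include <vector>

// One stored solvent position (the orientational part is omitted here).
struct WaterXYZ { double x, y, z; };

// waters[v] holds all solvent positions binned into voxel v over every
// frame; nx, ny, nz are the grid dimensions; layers is the number of
// neighbor-voxel layers to search (1 layer = the 26 surrounding voxels).
// Returns the nearest-neighbor Cartesian distance for water w0 located
// in voxel (ix, iy, iz); a very large value is returned if no neighbor
// is found in the searched region.
double nearestTransDist(const std::vector<std::vector<WaterXYZ>>& waters,
                        int nx, int ny, int nz, int layers,
                        int ix, int iy, int iz, const WaterXYZ& w0)
{
  double best2 = std::numeric_limits<double>::max();
  for (int dx = -layers; dx <= layers; ++dx)
  for (int dy = -layers; dy <= layers; ++dy)
  for (int dz = -layers; dz <= layers; ++dz) {
    int jx = ix + dx, jy = iy + dy, jz = iz + dz;
    if (jx < 0 || jx >= nx || jy < 0 || jy >= ny || jz < 0 || jz >= nz)
      continue;                                    // stay on the grid
    for (const WaterXYZ& w : waters[(jx * ny + jy) * nz + jz]) {
      double ddx = w.x - w0.x, ddy = w.y - w0.y, ddz = w.z - w0.z;
      double d2  = ddx * ddx + ddy * ddy + ddz * ddz;
      if (d2 > 0.0 && d2 < best2)                  // skip w0 itself
        best2 = d2;
    }
  }
  return std::sqrt(best2);
}
```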
MPI Parallel GIST Implementation
In CPPTRAJ, previous efforts to speed up the GIST calculation have focused on speeding up the nonbonded energy calculation directly. For example, the OpenMP version divides the outer loop over atoms in the nonbonded energy calculation among all available OpenMP threads. The MPI parallel strategy aims to speed up GIST by dividing the incoming trajectory frames N among M MPI processes:
FR = floor(N/M) + 1 if R < (N % M); otherwise FR = floor(N/M)
Where FR is the number of frames for MPI rank R to process, R is the MPI rank of the process (starting from 0), and % represents the modulo (remainder) operator. After trajectory processing, the grid voxel quantities and counts from all processes are summed back to process 0 before the final calculations (grid normalizations and entropy) are performed and results are written (only process 0 does I/O in CPPTRAJ).
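A minimal standalone MPI sketch of this strategy (not the CPPTRAJ source) is shown below. The contiguous per-rank frame ranges, the GistGrids container, and the example grid sizes are illustrative assumptions; the pattern of independent per-rank frame processing followed by a single reduction onto rank 0 is the one described above.

```cpp
#include <mpi.h>
#include <vector>

// Hypothetical per-process GIST accumulators: one energy value and one
// population count per voxel (real CPPTRAJ grids hold more quantities).
struct GistGrids {
  std::vector<double> energy;  // size = number of voxels
  std::vector<double> count;
};

// Divide N frames among M ranks as evenly as possible and return the
// [begin, end) range for rank R: the first N % M ranks get one extra
// frame, matching FR = floor(N/M) + (R < N % M ? 1 : 0).
void myFrameRange(int N, int M, int R, int& begin, int& end) {
  int base = N / M, extra = N % M;
  begin = R * base + (R < extra ? R : extra);
  end   = begin + base + (R < extra ? 1 : 0);
}

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  int rank, nprocs;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

  const int Nframes = 1000, Nvoxels = 62 * 50 * 46;   // example sizes only
  GistGrids grids{std::vector<double>(Nvoxels, 0.0),
                  std::vector<double>(Nvoxels, 0.0)};

  int begin, end;
  myFrameRange(Nframes, nprocs, rank, begin, end);
  for (int frame = begin; frame < end; ++frame) {
    // ... read frame, bin on-grid solvent, accumulate voxel energies ...
  }

  // Sum the per-rank grids onto rank 0 before normalization, entropy
  // evaluation, and output (only rank 0 performs I/O).
  if (rank == 0) {
    MPI_Reduce(MPI_IN_PLACE, grids.energy.data(), Nvoxels, MPI_DOUBLE,
               MPI_SUM, 0, MPI_COMM_WORLD);
    MPI_Reduce(MPI_IN_PLACE, grids.count.data(), Nvoxels, MPI_DOUBLE,
               MPI_SUM, 0, MPI_COMM_WORLD);
  } else {
    MPI_Reduce(grids.energy.data(), nullptr, Nvoxels, MPI_DOUBLE,
               MPI_SUM, 0, MPI_COMM_WORLD);
    MPI_Reduce(grids.count.data(), nullptr, Nvoxels, MPI_DOUBLE,
               MPI_SUM, 0, MPI_COMM_WORLD);
  }

  MPI_Finalize();
  return 0;
}
```

Because each rank processes only its own frames, the only communication required during trajectory processing is the final reduction, which is why this approach scales well to large process counts.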
To speed up the Strans and Ssix calculations, solvent molecule coordinates and quaternions are saved for each frame, then distributed to all processes prior to the entropy calculation step. To reduce the number of communication calls, the individual arrays from each voxel are combined on each process and sent as one large array. After communication (which typically takes less than 1% of the total calculation time), the total number of voxels is divided among all processes in the same manner as the incoming trajectory frames (i.e., N in the above equation now represents the total number of voxels and FR is the number of voxels assigned to rank R), and the entropy calculation is performed for each voxel. The results are then sent back to process 0 for I/O.
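The communication pattern can be sketched in standalone MPI C++ as follows (again, not the actual CPPTRAJ code). The flattened coordinate buffer and helper names are assumptions, and the per-voxel offsets and quaternion arrays that the real implementation also exchanges are omitted for brevity.

```cpp
#include <mpi.h>
#include <vector>

// Gather every rank's concatenated (x,y,z) solvent positions so that all
// ranks see all frames' waters before the nearest-neighbor search.
std::vector<double> gatherAllWaters(const std::vector<double>& localXYZ,
                                    MPI_Comm comm)
{
  int nprocs; MPI_Comm_size(comm, &nprocs);
  int myCount = static_cast<int>(localXYZ.size());

  // 1) Every rank learns how much data the others hold.
  std::vector<int> counts(nprocs), displs(nprocs);
  MPI_Allgather(&myCount, 1, MPI_INT, counts.data(), 1, MPI_INT, comm);
  int total = 0;
  for (int r = 0; r < nprocs; ++r) { displs[r] = total; total += counts[r]; }

  // 2) One variable-length collective instead of many small sends.
  std::vector<double> allXYZ(total);
  MPI_Allgatherv(localXYZ.data(), myCount, MPI_DOUBLE,
                 allXYZ.data(), counts.data(), displs.data(), MPI_DOUBLE,
                 comm);
  return allXYZ;
}

// After the voxels are split among ranks and each rank fills in the
// entropy of its own voxel range (non-owned entries left at zero), the
// per-voxel results are summed onto rank 0; since each voxel is owned by
// exactly one rank, the sum simply merges the disjoint ranges for output.
void reduceEntropyToRank0(std::vector<double>& voxelEntropy, int rank,
                          MPI_Comm comm)
{
  int n = static_cast<int>(voxelEntropy.size());
  if (rank == 0)
    MPI_Reduce(MPI_IN_PLACE, voxelEntropy.data(), n, MPI_DOUBLE, MPI_SUM,
               0, comm);
  else
    MPI_Reduce(voxelEntropy.data(), nullptr, n, MPI_DOUBLE, MPI_SUM,
               0, comm);
}
```

Using a single MPI_Allgatherv on a combined buffer, rather than per-voxel messages, keeps the communication cost to the small fraction of the total time noted above.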
Test Systems
F. tularensis FabI
MD simulations were run on the FabI enoyl reductase protein from F. tularensis (PDB 3NRC). The PDB structure was preprocessed using the CPPTRAJ command ‘prepareforleap’;[11] all solvent molecules and the triclosan residue were removed. The system was then aligned along its principal axes using the CPPTRAJ principal command. CPPTRAJ scripts can be found in the Supporting Information.
The system was built with the Amber program tleap using protein parameters from the ff14SB force field along with contributed nicotinamide-adenine-dinucleotide (NAD) parameters.[12],[13] The atom names for the NAD parameters were modified by hand to be consistent with the input PDB; the modified library file can be found in the Supporting Information. The triclosan residues were removed. The system was solvated with 18,613 OPC waters (12 Å buffer) and neutralized with 10 Na+ ions (82,246 total atoms).[14]
The solvated system was relaxed using a modified version of the Roe & Brooks protocol,[15] except that in steps 5 and 9, where no restraints would otherwise have been used, minimal (0.5 kcal*mol−1*Å−1) heavy-atom positional restraints (using the final coordinates of the previous step as reference) were applied in order to maintain the alignment of the system. The final relaxation step was run for 2 ns without restraints to resolve any remaining strain in the protein. Based on inspection of both the best-fit and no-fit RMSD to the initial structure, the initial relaxation appeared complete at around frame 650 (corresponding to 650 ps), which was chosen as the reference structure for subsequent restrained production simulations.
Production MD simulations were run for 2 ns starting from the previously mentioned frame 650 structure in the NVT ensemble on a Tesla V100 using pmemd.cuda[16] from Amber 20.[17] Simulations used a time step of 2 fs; coordinate trajectory frames were written every 1 ps. Bonds to hydrogen were constrained with SHAKE.[18] Long-range electrostatics were handled using particle mesh Ewald with a cutoff of 8.0 Å. Temperature was maintained at 300 K using a Langevin scheme with a collision frequency of 2 ps−1. Positional restraints to the initial structure were applied to all protein heavy atoms using a force constant of 0.5 kcal*mol−1*Å−1.
Cucurbit[7]uril
MD simulations were also run on the small molecule cucurbit[7]uril (CB7). The initial structure was the same one used by Fenley et al.[19] Parameters were assigned with the Amber program antechamber using GAFF2,[20],[21] and the parmchk2 program was used to assign any missing parameters.
The system was aligned along principal axes with CPPTRAJ and then built using tleap and solvated with TIP4P-Ew waters (~12.5 Å buffer); 16 excess waters were removed for a total of 1,699 waters in order to match previous (unpublished) results. The final system size was 6,922 total atoms. The system was then minimized and relaxed in the same manner as the FabI system, with minimal positional restraints applied to maintain the alignment of the system.
Production simulations were run for 400 ns starting from the final relaxed structure in the NVT ensemble on a Tesla V100 using pmemd.cuda from Amber 20 in the same manner as the FabI system except coordinate trajectory frames were written every 0.5 ps.
Parallel Benchmarks
The GIST parallel code modifications were added to CPPTRAJ version 6.20.4. CPPTRAJ was compiled with Intel 17.0.4 compilers, Intel MPI 5.1, CUDA 11.8.0, NetCDF 4.9.2 (with HDF5 1.10.9), and parallel NetCDF 1.9.0. The Intel MKL was used for BLAS/LAPACK/FFTW. CPU calculations were run on Intel Xeon E5-2630 (Haswell) nodes with 40 Gb/s InfiniBand interconnects.
FabI Benchmarks
The GIST calculation was run both with and without PME on the first 1,000 frames of the production trajectory. The trajectory was imaged and positioned so that both the GIST grid and one of the NAD residues (513) were centered at the origin. The order calculation was enabled. The grid dimensions were 62, 50, and 46 in the X, Y, and Z directions, respectively, and the grid spacing was 0.5 Å. Grid dimensions were chosen by checking the boundaries of NAD 513 with an offset of 12 Å in each direction. The reference density was 0.0332 Å−3. The full CPPTRAJ input script is provided in the Supporting Information.
Figure 1 shows the total GIST calculation time vs number of CPU cores for 1,000 frames of the FabI trajectory, both with and without PME, for execution in serial, with OpenMP, with MPI, and on several models of GPU. In serial, PME GIST provides a significant speedup (~2.5x) over non-PME GIST; however, the OpenMP scaling of PME GIST is far worse. This is because the reciprocal-space part of the PME GIST nonbonded energy calculation is not yet OpenMP-parallelized and so becomes a bottleneck at higher thread counts. In contrast, the MPI versions of both PME and non-PME GIST scale well to higher processor counts since the parallelization is orthogonal to the non-bonded energy calculation. At 128 processes, a calculation that previously required on the order of an hour completes in less than a minute.
Figure 1.

GIST calculation time vs number of CPU cores for 1,000 frames of the FabI trajectory. Plot made with Grace 5.1.25.
The GPU version of non-PME GIST actually shows little to no speedup over the CPU version for Kepler-based GPUs (K40 and K20 respectively). The Pascal (TitanXP and P100), Volta, and Ampere architectures do show speedup (2.1x, 2.5x, 4.4x, and 5.9x respectively). This indicates that the GPU-GIST CUDA code may benefit from some optimization, particularly on older GPU architectures. Like the CPU code, the GPU code benefits from MPI parallelization and scales well to higher GPU counts (1 process per GPU).
The MPI-parallelization has no major effect on accuracy of the results; the energy and entropy grids match serial results to within 0.0001 kcal*mol−1*Å−3.
CB7 Benchmarks
Cucurbit[7]uril was chosen to test the MPI parallelization of the entropy calculation since it is smaller than FabI. Since the nonbonded energy calculation scales with system size and the entropy calculations scale with the number of frames, for smaller systems with larger numbers of trajectory frames the entropy calculation begins to dominate the total calculation time (see Table 1).
Table 1.
CB7 MPI-parallelized GIST timings when the entropy calculation is done in serial vs parallel (128 processes). Action is the total time to process trajectory frames and is dominated (78%) by the non-bonded energy calculation. The Entropy calculation occurs after all frames have been processed.
| Entropy Calculation | Total Time | Energy (% of total) | Entropy (% of total) |
|---|---|---|---|
| Serial | 1,146 s | 7.7% | 88% |
| Parallel | 190 s | 62% | 9.5% |
The GIST calculation was run with PME on every 10th frame of the production trajectory (80,000 frames total). The trajectory was imaged and positioned so that both the GIST grid and CB7 were centered at the origin. The order calculation was enabled. The grid dimensions were 28, 44, and 44 in the X, Y, and Z directions, respectively, and the grid spacing was 0.5 Å; this was chosen to match the grid in the original GIST paper.[1] The reference density was 0.0332 Å−3. The full CPPTRAJ input script is provided in the Supporting Information.
Figure 2 shows the total entropy and PME energy calculation time vs number of CPU cores for 80,000 frames of the CB7 trajectory. Although the energy calculation still makes up the majority of the total time, by 16 cores the entropy calculation can become a bottleneck if it is not also parallelized. Both the energy and entropy calculations scale reasonably well; however, the energy calculation parallelizes more efficiently than the entropy calculation, with speedups of 99x vs 27x, respectively. Since the time needed to communicate the entropy arrays is only about 1% of the total entropy calculation time, this indicates there is room for improving the parallelization of the calculation itself.
Figure 2.

GIST entropy and PME energy calculation time vs number of CPU cores for 80,000 frames of the CB7 trajectory. Plot made with Grace 5.1.25.
Conclusions
GIST is a useful framework for determining the thermodynamics of water around a solute of interest. We have added MPI parallelization of GIST to the analysis software CPPTRAJ; it scales well to relatively high processor counts and can be used separately from, or in conjunction with, the existing acceleration schemes (OpenMP, PME, CUDA). MPI-enabled GIST is implemented in CPPTRAJ version 6.22.0, and the code is freely available from https://github.com/Amber-MD/cpptraj. This will facilitate the application of GIST to larger systems and longer simulations.
Supplementary Material
Acknowledgments
This work was supported by the intramural research program of the National Heart, Lung and Blood Institute (NHLBI) of the National Institutes of Health, NHLBI Z01 HL001051-25. Computational resources provided by the LoBoS cluster (NIH/NHLBI).
Footnotes
Supporting Information
Additional Supporting Information may be found in the online version of this article.
References and Notes
- [1] Nguyen CN, Kurtzman Young T, Gilson MK, J. Chem. Phys., 2012, DOI: 10.1063/1.4733951.
- [2] Balius TE, Fischer M, Stein RM, Adler TB, Nguyen CN, Cruz A, Gilson MK, Kurtzman T, Shoichet BK, Proc. Natl. Acad. Sci. U. S. A., 2017, DOI: 10.1073/pnas.1703287114.
- [3] Roe DR, Cheatham TE, J. Chem. Theory Comput., 2013, DOI: 10.1021/ct400341p.
- [4] Kraml J, Kamenik AS, Waibl F, Schauperl M, Liedl KR, J. Chem. Theory Comput., 2019, DOI: 10.1021/acs.jctc.9b00742.
- [5] Chen L, Cruz A, Roe DR, Simmonett AC, Wickstrom L, Deng N, Kurtzman T, J. Chem. Theory Comput., 2021, DOI: 10.1021/acs.jctc.0c01185.
- [6] Roe DR, Cheatham TE, J. Comput. Chem., 2018, DOI: 10.1002/jcc.25382.
- [7] Darden T, York D, Pedersen L, J. Chem. Phys., 1993, DOI: 10.1063/1.464397.
- [8] Allen MP, Tildesley DJ, Computer Simulation of Liquids; Oxford University Press, Oxford, 1987.
- [9] Singh H, Misra N, Hnizdo V, Fedorowicz A, Demchuk E, Am. J. Math. Manag. Sci., 2003, DOI: 10.1080/01966324.2003.10737616.
- [10] Huggins DJ, J. Chem. Theory Comput., 2014, DOI: 10.1021/ct500415g.
- [11] Roe DR, Bergonzo C, J. Comput. Chem., 2022, DOI: 10.1002/jcc.26847.
- [12] Pavelites JJ, Gao J, Bash PA, Mackerell AD, J. Comput. Chem., 1997.
- [13] Walker RC, De Souza MM, Mercer IP, Gould IR, Klug DR, J. Phys. Chem. B, 2002, DOI: 10.1021/jp0261814.
- [14] Izadi S, Anandakrishnan R, Onufriev AV, J. Phys. Chem. Lett., 2014, DOI: 10.1021/jz501780a.
- [15] Roe DR, Brooks BR, J. Chem. Phys., 2020, DOI: 10.1063/5.0013849.
- [16] Salomon-Ferrer R, Götz AW, Poole D, Le Grand S, Walker RC, J. Chem. Theory Comput., 2013, DOI: 10.1021/ct400314y.
- [17] Case DA, Cheatham TE, Darden T, Gohlke H, Luo R, Merz KM, Onufriev A, Simmerling C, Wang B, Woods RJ, J. Comput. Chem., 2005, DOI: 10.1002/jcc.20290.
- [18] Ryckaert JP, Ciccotti G, Berendsen HJC, J. Comput. Phys., 1977, 23, 327–341.
- [19] Fenley AT, Henriksen NM, Muddana HS, Gilson MK, J. Chem. Theory Comput., 2014, DOI: 10.1021/ct5004109.
- [20] Wang J, Wolf RM, Caldwell JW, Kollman PA, Case DA, J. Comput. Chem., 2004, DOI: 10.1002/jcc.20035.
- [21] He X, Man VH, Yang W, Lee T-S, Wang J, J. Chem. Phys., 2020, DOI: 10.1063/5.0019056.