2024 Feb 8;20(5):1862–1877. doi: 10.1021/acs.jctc.3c01206

Kartograf: A Geometrically Accurate Atom Mapper for Hybrid-Topology Relative Free Energy Calculations

Benjamin Ries †,‡,*, Irfan Alibay, David W H Swenson, Hannah M Baumann, Michael M Henry ‡,§, James R B Eastwood, Richard J Gowers
PMCID: PMC10941767  PMID: 38330251

Abstract

Relative binding free energy (RBFE) calculations have emerged as a powerful tool that supports ligand optimization in drug discovery. Despite many successes, the use of RBFEs can often be limited by automation problems, in particular, the setup of such calculations. Atom mapping algorithms are an essential component in setting up automatic large-scale hybrid-topology RBFE calculation campaigns. Traditional algorithms typically employ a 2D subgraph isomorphism solver (SIS) in order to estimate the maximum common substructure. SIS-based approaches can be limited by time-intensive operations and issues with capturing geometry-linked chemical properties, potentially leading to suboptimal solutions. To overcome these limitations, we have developed Kartograf, a geometric-graph-based algorithm that uses primarily the 3D coordinates of atoms to find a mapping between two ligands. In free energy approaches, the ligand conformations are usually derived from docking or other previous modeling approaches, giving the coordinates a certain importance. By considering the spatial relationships between atoms related to the molecule coordinates, our algorithm bypasses the computationally complex subgraph matching of SIS-based approaches and reduces the problem to a much simpler bipartite graph matching problem. Moreover, Kartograf effectively circumvents typical mapping issues induced by molecule symmetry and stereoisomerism, making it a more robust approach for atom mapping from a geometric perspective. To validate our method, we calculated mappings with our novel approach using a diverse set of small molecules and used the mappings in relative hydration and binding free energy calculations. The comparison with two SIS-based algorithms showed that Kartograf offers a fast alternative approach. The code for Kartograf is freely available on GitHub (https://github.com/OpenFreeEnergy/kartograf). 
While developed for the OpenFE ecosystem, Kartograf can also be utilized as a standalone Python package.

Introduction

Drug design is a complex process that necessitates balancing cost, speed, and accuracy to achieve efficiency.1,2 In recent years, in silico methods have become increasingly important in enhancing these factors.1,3–7 A fundamental property in the multifaceted drug design process is potency, alongside ADME and synthetic accessibility.2,8–11 To evaluate the potency of potential drug candidates in silico, free energy calculation methods, including simulations at the theoretical level of molecular mechanics or quantum mechanics, are currently the state of the art and promise the most accurate estimates.7,12 Furthermore, large efforts are being undertaken to automate such approaches.13–22 To rank the most promising drug candidates by potency with in silico methods, ligands are typically ranked based on their calculated binding free energies.3,23

Binding free energy calculations can be categorized into two types depending on the calculated property: relative binding free energies (RBFEs) or absolute binding free energies (ABFEs).12 ABFEs provide direct insight into the binding free energy $\Delta G_{\mathrm{bind}}^{A}$ of ligand A but are computationally expensive as the entire ligand perturbation is simulated from a bound to an unbound state by turning all interactions of the ligand off and on again.3,24–27 In contrast, RBFEs leverage alchemical transformations of the ligands, representing a nonphysical transformation of a system between ligand A and ligand B (end-states), enabling fewer atomic mutations and leading to a smaller computational effort.3 The results describe the alchemical changes in water $\Delta\Delta G_{\mathrm{solv}}^{BA}$ and in the complex $\Delta\Delta G_{\mathrm{complex}}^{BA}$ with the target protein, which can be used to calculate the binding free energy difference between the two ligands, $\Delta\Delta G_{\mathrm{bind}}^{BA}$, required to rank the potential drug candidates (see Figure 1).12
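The relation implied by this thermodynamic cycle can be written out as a short worked equation. It uses the symbols of this section; the sign convention (B minus A) is an assumption inferred from the superscript BA.

```latex
\begin{align}
\Delta\Delta G_{\mathrm{bind}}^{BA}
  &= \Delta G_{\mathrm{bind}}^{B} - \Delta G_{\mathrm{bind}}^{A} \\
  &= \Delta\Delta G_{\mathrm{complex}}^{BA} - \Delta\Delta G_{\mathrm{solv}}^{BA}
\end{align}
```

Because the free energy is a state function, the contributions around the closed cycle sum to zero, so the two alchemical legs (complex and solvent) can replace the two absolute binding legs.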

Figure 1.

The binding free energy thermodynamic cycle can be employed to calculate RBFEs ($\Delta\Delta G_{\mathrm{RBFE}}^{BA}$) by comparing two molecules A and B. The absolute free energy differences $\Delta G_{\mathrm{bind}}^{A}$ and $\Delta G_{\mathrm{bind}}^{B}$ represent the change of molecules A and B’s environment from water to the protein complex and can be combined to calculate $\Delta\Delta G_{\mathrm{RBFE}}^{BA}$. Alternatively, calculating the relative free energy between A and B in a specific environment ($\Delta\Delta G_{\mathrm{solv}}^{BA}$ in water and $\Delta\Delta G_{\mathrm{complex}}^{BA}$ in the complex) can often be used to more efficiently calculate $\Delta\Delta G_{\mathrm{RBFE}}^{BA}$. The environments, such as water and the complex, can be exchanged with, e.g., the combination of vacuum and water in order to retrieve RHFEs.

A crucial component of an RBFE calculation is the system representation.28 In previous work, Ries et al. identified three distinct ways of representing the system from literature: single topology,29–31 hybrid topology,32,33 and dual topology.28,30,34

Dual-topology variants, such as linked dual topology and separated dual topology, represent the changing molecules in the end-states by independent sets of coordinates, allowing a straightforward representation of large ligand transformations with the respective restraint placement algorithm.22,35–40 The opposite extreme to dual topology is the single-topology approach, where a minimal set of coordinates represents both end-states. The single-topology approach reduces the separation of the phase space between the two end-states significantly and is proposed to give an efficiency win compared to dual-topology approaches but allows for less diverse molecule changes.30,31,41,42 The hybrid-topology approach offers an improved calculation efficiency compared to that of dual-topology approaches as the coordinates of shared parts of the molecules are merged to the so-called core region, and differing parts of the molecule are represented by dual-topology-like regions.32,33,43–48 This approach is an intermediate between dual- and single-topology approaches. Its primary challenge is to find the mapping of atoms for the core region (see Figure 2).32

Figure 2.

When considering end-state A (depicted in dark blue) and end-state B (depicted in beige), there exist three distinct approaches for representing the systems in free energy calculations: single topology, employing a minimal set of coordinates for both end-states (green merged atoms); hybrid topology, which merges shared parts into a core region and uses dual-topology-like regions for differing regions; and dual topology, where the coordinates of both end-states are kept separate. In the realm of dual topology, various approaches are found in literature, including the unconstrained case, the separated dual-topology approach using orientational restraints, and the linked dual-topology approach, connecting the end-states with restraints.

Many atom mapping algorithms for free energy calculations use cheminformatics-driven approaches focusing on 2D chemical properties and SIS searches in order to estimate the maximum common substructure (MCS) serving as the core region, as seen in Lomap,32 pmx,49 fkckombu,50 FESetup,51 ProtoCaller,52 or SMArt.53–55 Alternatively, atom mapping approaches could be focused on the 3D geometry of the system.56 One benefit of such a method is the potential to bypass expensive SISs, which are trying to solve an NP-complete problem.57

Some atom mapping approaches based on SIS, such as pmx and Lomap, actually combine the 2D cheminformatics and the 3D geometry aspects but focus on the SIS algorithm.32,49

From practical considerations, the optimal atom mapper algorithm for a given application might depend on the intention of the modeler. For instance, if a given set of coordinates needs to be preserved as much as possible because they represent a binding mode or a special design idea, then a 3D geometry-focused approach might be the preferred method. However, if an optimal overlap of the molecules and their chemical properties is important, cheminformatics-based approaches may be more suitable.56

In this work, we present Kartograf, a package that implements an optimal atom mapping algorithm based on 3D geometric information. The proposed algorithm essentially addresses a bipartite graph matching problem using the Jonker–Volgenant algorithm, which has a worst-case computational complexity of $O(N^3)$.58,59 We showcase features of our approach through theoretical examples illustrating its potential advantages. Furthermore, we validate our atom mapper with test data sets for relative hydration and binding free energies. Kartograf is seamlessly integrated into the OpenFE environment but can alternatively be installed as a standalone package via PyPI or conda-forge.60,61 The source code is readily available under the MIT license on GitHub at https://github.com/OpenFreeEnergy/kartograf.

Theory

Perturbations in Free Energy Calculations

The primary challenge in establishing RBFE calculations using a hybrid-topology approach is identifying the largest shared core region in both end-states. The complexity of the phase space can be reduced by merging the atom coordinates of the found core region and therefore enhance the sampling efficiency.30,31,33,62 Optimal merging of coordinates aims to minimize atomic perturbations and does not reduce the phase space overlap between end-states, therefore lowering the sampling cost of retrieving converged free energy estimations.27 The following objectives can be outlined for ligand atom mapping.

Minimizing the Number of Dummy Atoms

The noninteracting dummy atoms are used to make transformations possible between molecules that do not share the same number of atoms or do not overlap well in phase space.31,41 Atoms that transition from a dummy state to a physical state may need to displace surrounding atoms, such as solvent or protein components, in order to generate the necessary volume. Nonbonded interactions, such as van der Waals and electrostatic interactions, play a critical role in creating the necessary cavity.27,62 However, the more dummy atoms present in a state, the larger the volume or cavity that needs to be generated, making the transformation more challenging.27,62 Incorporating soft-core potentials into the calculations can help avoid singularities.63,64

Avoiding Flexibility Changes in Mapped Regions

If the flexibility of a mapped region varies between end-states, simulations are likely to lead to convergence issues. The difference in flexibility can be induced by bond-order changes, ring-breaking, or even ring-size changes. Distinct phase spaces may be explored by the different end-states, and coupling them through atom mapping might bias the states into nonoptimal phase space regions, resulting in poor representation of the end-states.65

Minimizing the Spatial Distance between Atoms Being Mapped

The final challenge identified pertains to the mapping of atoms that have large spatial distances from each other. This is particularly problematic when the input coordinates describe binding modes or specific desired 3D contexts. For small atom-pair distances, the perturbation might be minimal and acceptable. However, for larger distances, a modified binding mode or an incorrect stereochemistry might be sampled. This can be a significant issue in protein pockets, where the environment typically adapts slowly to ligands and steric hindrance limits binding mode sampling.

Additionally, some mapping algorithms might introduce coordinate shifts or further alignments to the molecules being mapped, which can potentially lead to clashes with the environment, further increasing the system perturbation.32 To mitigate this, SIS-based atom mappers like Lomap and pmx have introduced distance cutoffs to reduce the atom displacement caused by mapping.32,46 Still, stereochemistry remains a challenge for these SIS-based mappers.

It is important to note that all three goals, minimizing the number of dummy atoms, avoiding flexibility changes in mapped regions, and minimizing the spatial distance between atoms being mapped, can partially conflict with each other. The final mapping will need to find a balance among all the rules to be feasible.

Kartograf’s Atom Mapping Algorithm

Kartograf’s atom mapping algorithm is based on minimizing the spatial distance between the mapped atoms to identify the MCS (see the section “Minimizing the Spatial Distance between Atoms Being Mapped”). The initial assumption of the atom mapper is that the coordinate space of the end-state molecules has a high spatial overlap and is considered to be static for the mapping process.

This high overlap can be assumed because the input to the free energy pipelines typically comes from, for example, docking procedures into the same coordinate space, a previous ligand alignment, or manual modeling.

The atom mapping approach of Kartograf does not take into account atom types and atom properties in the atom mapping itself. This challenge is addressed by network generation and the mapping scoring method. These methods are the only ways to avoid unfavorable atom type changes, assuming that the positions of the molecules are fixed.

Procedure

Solving the Bipartite Graph Matching Problem for Atom Mappings

As input, the atom mapper uses two molecule conformations. First, the two conformations A and B are translated into independent sets of nodes $N_A$ and $N_B$ based on the coordinates indicating the atom positions of the molecules. Next, a complete weighted bipartite graph is calculated by generating a full distance matrix from each node in $N_A$ to each node in $N_B$ (Figure 3, step 1). This step is followed by a filtering step that removes edges that have a distance weight longer than a given threshold (default: 0.95 Å), resulting in a sparser bipartite graph (Figure 3, step 2).

Figure 3.

Kartograf’s atom mapping approach assumes as input a well-aligned pair of molecules A (dark blue) and B (beige). The approach calculates distances (red) between all atoms in both ligands. Subsequently, a distance prefilter is applied, significantly reducing the search scope by allowing only reasonable atom mapping distances (default cutoff: 0.95 Å). Optionally, users can mask atoms to narrow down the search space. An initial optimal mapping (green) is determined based on distances using the Kuhn–Munkres algorithm.66–68 Additionally, users can apply filter rules to the initial mapping, such as restricting hydrogen-to-heavy atom mapping and preventing ring breaks. From the resulting mapping, the largest connected set is returned as the mapped core region of the hybrid-topology approach.

The final perfect matching problem is solved with the Kuhn–Munkres algorithm66–68 for optimal linear sum assignments. The algorithm reduces the sparse bipartite graph edges to a perfect matching of this graph, where each node of the graph has a degree of one (Figure 3, step 3). The resulting atom-pair set $p_{AB}$ can be further reduced by rule-based filters (e.g., by not allowing ring-breaking or ring-size changes in the atom-pair sets; more below). We consider this algorithmic approach to be an exact maximum common edge subgraph approach if and only if the atom alignment is ideal for both molecules, as it returns the optimal matching for a given set of coordinates.54

The previously mentioned challenges of stereochemistry and molecule symmetries in atom mappings are intrinsically dealt with in this initial mapping due to the geometric matching approach as long as the atom alignment is optimal and the distance threshold is smaller than the smallest covalent bond.
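The three steps above (distance matrix, distance prefilter, and optimal assignment) can be illustrated with a minimal sketch built on NumPy and SciPy's `linear_sum_assignment`. This is not Kartograf's implementation: the function name `distance_based_mapping` and the large penalty used to block filtered edges are assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def distance_based_mapping(coords_a, coords_b, max_dist=0.95):
    """Map atoms of A onto atoms of B by minimal total distance (in Å).

    Hypothetical helper for illustration; assumes both molecules are
    already aligned in the same coordinate frame.
    """
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)

    # Step 1: full distance matrix between all atoms of A and all atoms of B.
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

    # Step 2: prefilter. Edges longer than the cutoff get a large penalty
    # (an assumed value) so the solver avoids them whenever it can.
    blocked = 1.0e6
    cost = np.where(dist <= max_dist, dist, blocked)

    # Step 3: optimal linear sum assignment on the filtered cost matrix.
    rows, cols = linear_sum_assignment(cost)

    # Drop assignments that were only possible through a blocked edge.
    return {int(i): int(j) for i, j in zip(rows, cols) if dist[i, j] <= max_dist}


coords_a = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [3.0, 0.0, 0.0]]
coords_b = [[0.1, 0.0, 0.0], [1.4, 0.0, 0.0], [5.0, 0.0, 0.0]]
mapping = distance_based_mapping(coords_a, coords_b)
```

In this toy example, the third atom of each molecule stays unmapped because the two atoms are 2.0 Å apart, which exceeds the 0.95 Å cutoff.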

Finding the Hybrid-Topology Core Region

The perfect matching, denoted as $p_{AB}$, serves as an optimal starting point for identifying the core region of the hybrid topology. A key requirement for this core region is that it should only consist of atoms connected by covalent bonds in both A ($a_{A,i}^{\mathrm{connected}}$) and B ($a_{B,j}^{\mathrm{connected}}$) to avoid the resonance effects of disjunctive mapping regions during the simulation. The perfect matching obtained in the previous step may result in several disconnected sets of atoms for both A and B in $p_{AB}$ (see Figure 3, step 3, which yields four connected sets for A and three in B). In the final step, the largest overlap of two sets $a_{A,i}^{\mathrm{connected}}$ and $a_{B,j}^{\mathrm{connected}}$ in $p_{AB}$ is found. The overlap is measured in the number of edges the two sets can form together in $p_{AB}$. The identified sets with the largest connected mapping region are then chosen as the core region for the hybrid-topology approach (see Figure 3, step 4).
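The selection of the largest connected overlap can be sketched in plain Python as follows. The function names and the representation of molecules as bond lists are assumptions for illustration and do not reflect Kartograf's internal data structures.

```python
from collections import Counter


def _component_labels(atoms, bonds):
    """Label each atom by its connected component.

    Only bonds whose two atoms are both in `atoms` are considered.
    """
    atoms = set(atoms)
    neighbors = {atom: set() for atom in atoms}
    for u, v in bonds:
        if u in atoms and v in atoms:
            neighbors[u].add(v)
            neighbors[v].add(u)

    labels = {}
    for start in sorted(atoms):
        if start in labels:
            continue
        labels[start] = start
        stack = [start]
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            for nxt in neighbors[node]:
                if nxt not in labels:
                    labels[nxt] = start
                    stack.append(nxt)
    return labels


def largest_connected_core(mapping, bonds_a, bonds_b):
    """Keep the mapped pairs of the largest overlap between one connected
    set of mapped atoms in A and one connected set of mapped atoms in B."""
    if not mapping:
        return {}
    labels_a = _component_labels(mapping.keys(), bonds_a)
    labels_b = _component_labels(mapping.values(), bonds_b)

    # Count how many mapped pairs each (set in A, set in B) combination shares.
    overlap = Counter((labels_a[i], labels_b[j]) for i, j in mapping.items())
    best = overlap.most_common(1)[0][0]
    return {i: j for i, j in mapping.items() if (labels_a[i], labels_b[j]) == best}


# Linear chain 0-1-2-3-4 in both molecules; atom 3 is not mapped.
bonds = [(0, 1), (1, 2), (2, 3), (3, 4)]
core = largest_connected_core({0: 0, 1: 1, 2: 2, 4: 4}, bonds, bonds)
```

Here the mapped atom 4 is separated from the rest by the unmapped atom 3, so only the connected region formed by atoms 0 to 2 is kept as the core.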

Additional Filters by Chemical Rules/Premapping

In an effort to optimize atom mappings concerning the previously mentioned objectives, we offer users the ability to filter the resulting atom mappings using rule-based filters (see Figure 3, steps 3 and 4). Several filters have already been implemented, such as those that prevent element changes in the mapping, hydrogen atoms being mapped onto heavy atoms, and alterations in molecule flexibility, for example, by ring-breaking or molecule ring-size changes. The latter can cause issues in many MD codes as bonded terms are typically left unscaled for efficiency reasons, leading to singularities during the simulations. The benefits of avoiding ring breaks have been described in the literature by Liu et al.65 One solution to this problem was introduced by Wang et al., who proposed softbond-bonded potentials that enable scaffold hops in core regions.69 Alternatively, such mappings can simply be avoided if a hybrid-topology core region remains.

Furthermore, users have the option to provide premappings, allowing for the iterative generation of the mappings or ensuring the mapping of specific substructures (see Figure 3, steps 1 and 2). A more detailed explanation of the practical usage is given in the documentation of Kartograf (https://kartograf.readthedocs.io/en/latest/tutorial/custom_filters.html).
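To make the idea of a rule-based filter concrete, the following minimal sketch shows two filters acting on a mapping stored as a dictionary of atom indices. The function names and the use of plain element-symbol lists are hypothetical simplifications; Kartograf's own filter interface is described in the documentation linked above.

```python
def filter_hydrogen_to_heavy(mapping, elements_a, elements_b):
    """Remove pairs that map a hydrogen onto a heavy atom or vice versa."""
    return {
        i: j
        for i, j in mapping.items()
        if (elements_a[i] == "H") == (elements_b[j] == "H")
    }


def filter_element_changes(mapping, elements_a, elements_b):
    """Remove pairs whose two atoms are different elements."""
    return {i: j for i, j in mapping.items() if elements_a[i] == elements_b[j]}


elements_a = ["C", "H", "N"]
elements_b = ["C", "C", "O"]
initial = {0: 0, 1: 1, 2: 2}

no_h_to_heavy = filter_hydrogen_to_heavy(initial, elements_a, elements_b)
same_element = filter_element_changes(initial, elements_a, elements_b)
```

Each filter takes a mapping and returns a subset of it, so filters of this shape can be chained in any order before the core region is determined.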

Mapping Scoring Metrics Taking Geometry into Account

The quantitative evaluation of a given mapping remains a critical question in free energy calculations beyond the mapping phase. The number of potential mappings for a set of N ligands is N(N – 1)/2.32 However, to generate a comprehensive ligand ranking, only a set of N – 1 mappings is necessary. An efficient network with N – 1 mappings can be constructed with the minimal spanning tree (MST) approach, which ideally uses the best N – 1 mappings of all possible mappings. Additional edges are often incorporated into perturbation networks to enhance the robustness of the free energy estimates. This strategy is evident in cycle closure approaches, for instance.69–71
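The reduction from N(N – 1)/2 candidate mappings to an N – 1 edge network can be sketched with Kruskal's algorithm. The function name, the ligand labels, and the toy scores below are assumptions for illustration; scores are taken to lie between 0 and 1 with higher values being better.

```python
def minimal_spanning_network(ligands, scores):
    """Select N - 1 edges connecting all ligands, preferring high scores.

    `scores` maps a ligand pair (a, b) to a mapping score (higher is better).
    """
    parent = {ligand: ligand for ligand in ligands}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    chosen = []
    # Visit edges from best to worst and keep those that join two components.
    for (a, b), _score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            chosen.append((a, b))
    return chosen


ligands = ["A", "B", "C", "D"]
toy_scores = {
    ("A", "B"): 0.9, ("A", "C"): 0.8, ("A", "D"): 0.2,
    ("B", "C"): 0.7, ("B", "D"): 0.6, ("C", "D"): 0.3,
}
network = minimal_spanning_network(ligands, toy_scores)
```

For the four toy ligands there are six candidate mappings, of which three are kept. Redundant edges for cycle closure would be added on top of such a tree.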

To automatically obtain a quality assessment of mappings, various scoring approaches have been proposed in the past, such as Lomap and machine-learning approaches by Scheen et al.32,70 In this study, we investigate geometry-focused metrics to analyze the mappings, which could be of potential interest to the development of mapping scoring functions, like the mapping-based root-mean-square deviation (RMSD), the mapped volume ratio (MVR), the mapped atom ratio (MAR), the shape mismatch score (SMS), and the shape overlap score (SOS).

The mapping-based RMSD calculates the distance atoms need to travel from their original position to the resulting mapped position (refer to Figure 4, mapping RMSD). This score seeks to penalize large atomic displacements that may require excess configurational sampling.
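A minimal sketch of the underlying quantity is given below. It computes the raw RMSD over the mapped atom pairs in the units of the input coordinates; how this value is converted into the normalized score reported later is not specified here, so no normalization is attempted. The function name is hypothetical.

```python
import math


def mapping_rmsd(coords_a, coords_b, mapping):
    """Root-mean-square distance between the mapped atom pairs."""
    if not mapping:
        raise ValueError("mapping must contain at least one atom pair")
    squared = [math.dist(coords_a[i], coords_b[j]) ** 2 for i, j in mapping.items()]
    return math.sqrt(sum(squared) / len(squared))


coords_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
coords_b = [(0.0, 0.0, 0.3), (1.0, 0.4, 0.0)]
rmsd = mapping_rmsd(coords_a, coords_b, {0: 0, 1: 1})
```

In the toy example, the two mapped pairs are displaced by 0.3 and 0.4, giving an RMSD of about 0.354.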

Figure 4.

Geometry-based scores are used to assess the mappings. The mapping RMSD describes the atom displacement due to the mapping from one state to the other. The average volume ratio quantifies the extent to which the mapped region’s volume overlaps with the entire molecules, averaged across both molecules. Lastly, the shape difference is calculated for the mappings, comparing the shape difference of the mapped region and the whole molecules. Two distinct approaches were tested with varying emphases: one centered on overlap and the other emphasizing differences of the molecule shapes.

MVR quantifies the shape overlap of the mapped region and the two molecules on average. In this case, a larger ratio is preferred as it indicates a larger overlap of the two ligands in the mapping. For planar molecules, this score is expected to highly correlate with the MAR or the maximum common substructure rule from the Lomap scorer. However, if a molecule with a fold or significant 3D structure is involved, it might diverge from the simpler metric (refer to Figure 4, average volume ratio).

The final metric added is shape-based and illustrates the difference between the mapped region and the original molecules (refer to Figure 4, difference). Here, we evaluate two different approaches to this score from the package rdkit:72 in the first mode, the focus is on molecule shape differences called SMS using the shape protrude distance difference of the molecules, and in the second mode, the focus is on molecule shape overlap called SOS by calculating the Tanimoto shape distance of the two molecules.

Comparing Mappings

To quantitatively compare mappings generated by different methods, correlations of the number of mapped atoms and the Jaccard similarity coefficient (JCS) were used. The JCS indicates the diversity of mapped atom pairs of mapping A and mapping B (see eq 1). A score of 1 translates to two identical mappings, and a score of 0 translates to two completely different mappings.

$$\mathrm{JCS}(A, B) = \frac{|A \cap B|}{|A \cup B|} \qquad (1)$$
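The Jaccard similarity coefficient of two mappings can be computed directly from their sets of mapped atom pairs. The sketch below assumes that each mapping is stored as a dictionary from atom indices of the first molecule to atom indices of the second; the function name and the convention for two empty mappings are assumptions.

```python
def jaccard_score(mapping_1, mapping_2):
    """Jaccard similarity of two mappings, based on their atom pairs."""
    pairs_1 = set(mapping_1.items())
    pairs_2 = set(mapping_2.items())
    union = pairs_1 | pairs_2
    if not union:
        # Assumed convention: two empty mappings count as identical.
        return 1.0
    return len(pairs_1 & pairs_2) / len(union)


score = jaccard_score({0: 0, 1: 1, 2: 2}, {0: 0, 1: 1, 2: 3})
```

The two toy mappings share two atom pairs out of four distinct pairs in total, so the score is 0.5.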

Methods

Implementation

Kartograf was written with Python 3 (refs 73, 74) in an object-oriented style. The source code and documentation are available on GitHub (https://github.com/OpenFreeEnergy/kartograf). The package can be used either as a standalone package installed from PyPI60 or the conda-forge project channel61 or from the OpenFE environment (https://github.com/OpenFreeEnergy/openfe).75 The Kartograf package uses NumPy76 for vector calculations and SciPy59 for the Kuhn–Munkres algorithm and convex hull volume determination77 in the MVR. The RDKit72 is heavily used for the representation of molecules and the score calculations, like the shape-difference scores or RMSD. The basic class types used in Kartograf were derived from the Grand Unified Free Energy (gufe) package, which provides standardized base types for classes in the OpenFE environment (https://github.com/OpenFreeEnergy/gufe).

Mapping Approaches

In this study, the performance of Kartograf’s atom mapper was tested along with two alternative approaches representing different algorithmic flavors: a 2D SIS-only variant of the Lomap atom mapper (2D Lomap) and a mixed variant using both SIS and geometry aspects (the Lomap default; called 3D Lomap here) to estimate the MCS. The Lomap settings were used as defined in the Lomap GitHub repository (https://github.com/OpenFreeEnergy/Lomap); for the 3D Lomap approach, the threed option was activated.

Simulation Details

The following simulations were all conducted with OpenFE release 0.10.1 (ref 75) and the included OpenFE CLI tools, which directly enable the use of the OpenFE Relative Free Energy (RFE) protocol.78 For the preparations and simulations, OpenFE used OpenMM 8.0.0 (ref 79) with the GPU code, OpenMMTools,80 and the OpenFF Toolkit 0.13.0 (ref 81). We note that the OpenFE RFE protocol is based on the Perses toolkit.78

The systems, except the vacuum systems, were first solvated using TIP3P waters82 up to a distance of 1.2 nm from the solute, directly defining the cubic periodic simulation box. Afterward, the systems were neutralized and set to an ion concentration of approximately 0.15 M with sodium and chloride ions.

For the ligand parameterization, OpenFF 2.1.0 (ref 83) was used, and for the protein, AMBER ff14SB.84 To allow an integration speedup, the masses of the hydrogens were set to 3.0 amu following the hydrogen mass repartitioning (HMR) scheme.85,86 During any system simulation or optimization of solvated systems, the short-range nonbonded cutoff was set to 1 nm with long-range interactions handled using the PME scheme.87 For vacuum simulations in the relative hydration free energy (RHFE) calculations, no nonbonded cutoff was applied. All simulations employed constraints for hydrogen-containing bonds using SHAKE for solutes and SETTLE for water molecules. The tolerance was set to 10⁻⁶ kJ mol⁻¹ nm⁻¹ for the SHAKE88 and SETTLE89 combination used by OpenMM.

After the system setup, an energy minimization with an implementation of the L-BFGS optimizer90 was performed for a maximum of 5000 steps. Following this, the system was replicated for 11 λ-values that form the transformation path between end-state A and end-state B. The transformation is described by a linear coupling between both end-states with the extreme points of 0 and 1.91 To avoid singularities during the transformations, LJ softcore interactions as defined by Gapsys et al. were applied for the Lennard-Jones nonbonded terms with an α of 0.85 and a σ of 1.0.64,92

Each replicate was equilibrated with the corresponding λ-value for 1 ns. For the equilibration and the final production run, a Langevin integrator93 was used with an integration step of 4 fs due to the HMR scheme and a collision frequency of 1.0 ps⁻¹. The temperature of the system was intrinsically kept at 298.15 K by the Langevin integrator. The pressure was kept at 1 bar with a Monte Carlo barostat94,95 that was coupled every 1 ps. Additionally, a λ-dependent Hamiltonian replica exchange96,97 scheme was applied. The number of replicas equaled the number of λ-points, and after each 1 ps, a Metropolis–Hastings–Monte Carlo move tried to perform an all-to-all exchange scheme on the replicas.98,99 The production runs were performed to generate 5 ns simulations.

Finally, the free energies of each transformation were estimated using MBAR100 via pymbar.101

The whole procedure was repeated three times with different random seeds to assess the sampling uncertainty of the approach. Cinnabar102 was used for the network-wide free energy analysis over a full set of ligands. The bootstrap error estimate for the statistics (such as the root-mean-square error (RMSE), mean unsigned error (MUE), Pearson correlation coefficient (ρPears), and Kendall’s tau (τKendall)) was estimated over all replicates with SciPy.59
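The sampling effort implied by these settings can be worked out with a few lines of arithmetic. The sketch below only restates the numbers given above (4 fs time step, 11 λ-values, 1 ns equilibration, 5 ns production, exchange attempts every 1 ps, three repeats); the function and key names are hypothetical, and "per leg" refers to one transformation in one environment.

```python
def sampling_budget(
    timestep_fs=4,
    n_lambda=11,
    equil_ns=1,
    prod_ns=5,
    exchange_interval_ps=1,
    n_repeats=3,
):
    """Derive step counts and total sampling time from the protocol settings."""
    fs_per_ps, fs_per_ns = 1_000, 1_000_000
    steps_per_exchange = exchange_interval_ps * fs_per_ps // timestep_fs
    prod_steps = prod_ns * fs_per_ns // timestep_fs
    return {
        "steps_per_exchange": steps_per_exchange,
        "production_steps_per_replica": prod_steps,
        "exchange_attempts_in_production": prod_steps // steps_per_exchange,
        "ns_per_leg": n_lambda * (equil_ns + prod_ns),
        "ns_per_leg_all_repeats": n_lambda * (equil_ns + prod_ns) * n_repeats,
    }


budget = sampling_budget()
```

With the stated settings, each replica performs 1,250,000 production steps with 5000 exchange attempts, and one transformation leg accumulates 66 ns of sampling per repeat, or 198 ns over the three repeats.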

Test Systems

RHFE System

In order to test the robustness and the outcome of the mapping approach, the RHFEs of the benzene set of Ries et al. were calculated.22,28,91,103,104 In the benzene RHFE data set, a large variety of transformation types occur, including R-Group changes, ring system growth, and ring hybridization changes (Figure 5).

Figure 5.

RHFE data set molecules represent a complex set of transformations that include R-group modifications, ring-system expansions, and ring-size changes. The shown radial network layout was used by Ries et al.,28 but ligands 7 and 4 were excluded from the data set as the current implementation of the RFE protocol in OpenFE does not allow ring-breaking. Adapted with permission from Ries et al. Copyright 2022 Springer Nature.28

Two different types of free energy perturbation networks were investigated: the original radial network by Ries et al. and MST networks generated using both the different mapping approaches and the Lomap scoring function. From the networks, ligands 7 and 4 were excluded as the current OpenFE RFE protocol implementation did not allow ring-breaking. The molecule coordinates were used from the work of Ries et al.28

Protein Complex Systems

The tyrosine kinase 2 (TYK2) and hypoxia-inducible factor 2 alpha (HIF2A) test set coordinates and experimental data were derived from the protein–ligand benchmark (PLB).69,105–109 The systems cover ring-breaking, R-group changes, and ring-size changes. In the case of TYK2, all molecule changes are localized in the same molecule region and all atoms have a very high overlap, making the system a very well-behaved system (Figure 6). In contrast, HIF2A spreads the molecule transformations over the whole molecule, leading to much more difficult transformations (Figure S1).

Figure 6.

TYK2 test set is a small benchmarking test set, containing R-group changes in one location with a large common scaffold. The starting structures for the complex were taken from Hahn and Wagner.105

The protein complex and the RHFE systems can be retrieved from the openfe-benchmarks release v.0.1.0 (https://github.com/OpenFreeEnergy/openfe-benchmarks).

Protein Mutation Mapping

As a potential outlook for an expanded use-case of Kartograf, the protein mapping mutation test system was added. In the protein mutation system, aspartate 153 of TYK2 was mutated to a tyrosine (Figure S2). Note that no free energy calculation was performed with the protein mutation test system.

Results and Discussion

Assessment of Mapping Generated by the Mapping Approaches

To evaluate the performance of each mapping approach, we applied them to each possible ligand pair of the four different test systems. All mappers were able to find mappings for the provided data sets, except for the protein single residue mutation system, in which only Kartograf was able to converge on a mapping.

Comparing Mappings of Different Approaches

Comparing the number of mapped atoms for the RHFE, TYK2, and HIF2A data sets, we found that in most cases, the atom mappers generated mappings with equal numbers of mapped atoms (see Figures S3–S5). In a limited number of cases, Kartograf generated mappings with fewer mapped atoms: 3% of the RHFE data set (5 mappings), 2% of the TYK2 data set (5 mappings), and 1% of the HIF2A data set (12 mappings). A visual inspection of these exceptional cases revealed a few reasons for the outliers. In a substantial number of cases, the SIS-based approaches “realigned” the ligands to maximize the number of mapped atoms (see Figures 7 and S6). This was quantified by counting the number of mappings that displaced more than 3 atoms by a distance larger than 1.2 Å. Of the total 1755 mappings, the 2D Lomap approach “realigned” 891 mappings and the 3D Lomap approach “realigned” 310 mappings, but no mapping was observed to be “realigned” by Kartograf’s atom mapping approach. In one case of the RHFE data set, the ligand alignment was nonoptimal, leading to a decreased number of atoms mapped by Kartograf (see Figure S7). This finding emphasizes the importance of the molecule input coordinates for Kartograf’s atom mapper.
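The “realignment” criterion described above can be expressed as a short check on a single mapping. The following is a minimal sketch under the stated thresholds (more than 3 mapped atoms displaced by more than 1.2 Å); the function name and the toy coordinates are assumptions and do not reproduce the analysis script used for this study.

```python
import math


def is_realigned(coords_a, coords_b, mapping, dist_cutoff=1.2, max_displaced=3):
    """Return True if more than `max_displaced` mapped atom pairs are
    farther apart than `dist_cutoff` (in Å) in the input coordinates."""
    displaced = sum(
        1
        for i, j in mapping.items()
        if math.dist(coords_a[i], coords_b[j]) > dist_cutoff
    )
    return displaced > max_displaced


# Five atoms on a line, 1.5 Å apart, identical in both molecules.
coords = [(1.5 * k, 0.0, 0.0) for k in range(5)]

identity = {k: k for k in range(5)}      # every atom mapped onto itself
shifted = {0: 1, 1: 2, 2: 3, 3: 4}       # every atom mapped onto its neighbor

identity_realigned = is_realigned(coords, coords, identity)
shifted_realigned = is_realigned(coords, coords, shifted)
```

The shifted mapping displaces four atoms by 1.5 Å each and is therefore flagged, whereas the identity mapping is not.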

Figure 7.

Mapping of the HIF2A ligands 124 and 163 demonstrates the differences in outputs from each mapper. Given the input coordinate set (top), the Kartograf mapper (middle) maps atoms such that the spatial distance is minimal (colored spheres). However, in this case, the Lomap mapping approaches (bottom) “realigned” the mappings, such that in the final hybrid coordinates, the ring was inverted (colored spheres). Such a “realignment” could potentially lead to an undesired alternative binding mode simulated in the free energy calculations.

In the next step, the diversity of mappings between the three different mappers was investigated by using the Jaccard score (see Figure 8). If the Jaccard score is 1 between two mappings, then both mappings are equal. If the score is not close to 1, then mappings do not share the same mapped atom pairs. The RHFE data set was the most diverse set of all mappings. 50% of mappings were shared between 3D Lomap and Kartograf, 60% were identical between 2D Lomap and 3D Lomap, and 70% were identical between 2D Lomap and Kartograf. We note that the molecules in the RHFE data set have a very small number of atoms. As a consequence, small changes in the mapping lead to very large differences in the Jaccard score. The three different mapping approaches generated the least diverse mappings in the TYK2 data set, where 95% of the mappings were identical across the different mapping approaches. Finally, in the HIF2A set, the mappings generated using the different approaches were again more diverse. Here, Kartograf shared 60% of identical mappings with 3D Lomap and 80% with 2D Lomap. Between 2D Lomap and 3D Lomap, 65% of mappings were shared. In very low Jaccard scores (i.e., very diverse mappings), the SIS approaches often “realigned” the molecules to map them. Such “realignments” could lead to larger changes, translating or rotating substructures and effectively modifying a given binding mode (see Figure 7).

Figure 8.

Jaccard score histograms illustrate the diversity between two mapping approaches. The Jaccard score compares two mappings by evaluating their selection of atom pairs. A score of 1 indicates identical mappings, while a score close to 0 suggests that the mappings consist of entirely different atom pairs. The comparisons are presented here for each data set and every combination of atom mapper used.
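
As an illustration of the Jaccard score used here, the following sketch treats each mapping as a set of (atom in A, atom in B) pairs and divides the size of the intersection by the size of the union. The function name and the dictionary representation are assumptions, and returning 1.0 for two empty mappings is a convention chosen for this sketch only.

```python
def jaccard_score(mapping_a, mapping_b):
    """Jaccard score of two atom mappings given as {atom_in_A: atom_in_B} dicts."""
    pairs_a = set(mapping_a.items())
    pairs_b = set(mapping_b.items())
    union = pairs_a | pairs_b
    if not union:
        return 1.0  # assumption: two empty mappings are treated as identical
    return len(pairs_a & pairs_b) / len(union)

# Hypothetical three-atom example: the second mapping swaps two atoms.
reference = {0: 0, 1: 1, 2: 2}
swapped = {0: 0, 1: 2, 2: 1}

print(jaccard_score(reference, reference))  # 1.0
print(jaccard_score(reference, swapped))    # 0.2
```

The second result shows the small-molecule effect noted for the RHFE data set: swapping only two atoms in a three-atom mapping already drops the score to 0.2.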

As a summary of this first analysis, we considered most mappings from the three different mapping approaches to be similar. Still, we observed a non-negligible portion of diverse mappings, which mainly were derived from so-called “realignments” in which atom positions were shifted in the mapping structures by the SIS approaches, ignoring the geometry.

Investigating Mappings with Mapping Scores

We next analyzed each mapping using six distinct scores: SOS, SMS, the mapping RMSD score, MAR, MVR, and the MCS difference score. For all mapping scores except the mapping RMSD score, the histograms of the different mapping methods overlapped strongly, hinting at very similar score performance. For the mapping RMSD score, the Lomap approaches performed worse than Kartograf’s atom mapper (see Figure 9). This difference is not surprising, as Kartograf searches for a mapping that is optimal with respect to geometric distances.

Figure 9.

Application of mapping scores to all generated mappings is summarized in histograms (2D Lomap in blue, 3D Lomap in orange, and Kartograf in green). The score ranges from 1, a “good” score, to 0, a “bad” score. There is generally a high overall overlap in the scores, with the exception of the mapping RMSD score. For the mapping RMSD scores, the two Lomap approaches generated mappings with lower scores.
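
The quantity underlying the mapping RMSD score can be sketched as the root-mean-square distance between the input coordinates of mapped atom pairs. The sketch below computes only this raw RMSD in Å; how the value is rescaled to the 0 to 1 score range shown in Figure 9 is not described in this section and is therefore left out. The function name and input layout are assumptions.

```python
import math

def mapping_rmsd(mapping, coords_a, coords_b):
    """Raw RMSD (in Å) over all mapped atom pairs of a {atom_in_A: atom_in_B} mapping."""
    if not mapping:
        raise ValueError("mapping must contain at least one atom pair")
    squared = [math.dist(coords_a[i], coords_b[j]) ** 2 for i, j in mapping.items()]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical two-atom example: one atom stays in place, one moves by 2 Å.
coords_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
coords_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(mapping_rmsd({0: 0, 1: 1}, coords_a, coords_b))
```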

In the next step, we aimed to determine whether the very diverse cases found with the Jaccard scores could be explained by a single mapping score of one of the two mappings. Upon calculating the correlation between all Jaccard scores and the different mapping scores applied to all mappings of one method, we observed poor correlation of all mapping scores with the Jaccard score (rPears < 0.35 and τKendall < 0.25), except for the mapping RMSD score. For the mapping RMSD score vs the Jaccard score, an overall rPears correlation of 0.95 and 0.82 was found for 2D Lomap and 3D Lomap, respectively (τKendall of 0.84 and 0.69). For Kartograf mappings, the correlation between the mapping RMSD score and the Jaccard score was relatively poor, with an rPears of 0.47 and a τKendall of 0.34. This outcome was anticipated, as the core of the Kartograf atom mapper is an algorithm that finds the optimal atom mapping with respect to geometric distances, while the Lomap approaches do not prioritize this aspect.
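
For readers unfamiliar with the two correlation measures, the following sketch computes a Pearson correlation coefficient and a Kendall rank correlation for paired score lists. It is a plain-Python illustration under stated assumptions: the Kendall variant shown is tau-a (no tie correction), which may differ from the variant used for the values reported here, and the input numbers are made up.

```python
import math
from itertools import combinations

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) / number of pairs, ignoring ties."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

# Hypothetical paired scores (e.g., a mapping score vs a Jaccard score).
scores_x = [1, 2, 3, 4]
scores_y = [1, 3, 2, 4]
print(pearson_r(scores_x, scores_y), kendall_tau_a(scores_x, scores_y))
```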

In terms of correlations between different mapping scores, we identified a strong correlation between the two shape-based scores (SMS and SOS), each of which correlated moderately with the mapping RMSD score. A strong correlation was also found between MVR and MAR, both of which correlated only weakly with the mapping RMSD score. Another strong correlation was observed between MVR and the MCS difference score. Interestingly, an anticorrelation was detected between SMS and SOS versus MVR, MAR, and the MCS difference score (rPears < −0.5 and τKendall < −0.5), indicating a partially mutually exclusive relationship.

In conclusion, the analysis of the mapping scores identified three correlating groups: the first group comprises SOS and SMS, the second group includes MVR, MAR, and the MCS difference score, and the final group consists solely of the mapping RMSD score. Concerning the mapping diversity found in the Jaccard scores, a high correlation of the mapping RMSD score with mappings of the Lomap approaches was observed. Further visual examination of the RMSD outliers revealed the previously mentioned “realignments” as the primary cause of low scores. The interpretation of the score depends on the expectations for the coordinates as it evaluates how close the final mapping coordinates are in comparison with the input. If the given coordinates originate from, for example, a ligand-based approach without any ligand–protein complex information, then the RMSD deviations detected could be considered an improvement to the coordinates. However, in a case where the molecules were modeled in an exact desired binding mode, the score indicates a deviation from that binding mode, which should be avoided. The MVR differences were mainly triggered by Kartograf for the RHFE data set due to the fewer mapped atoms observed in the comparative analysis of the number of mapped atoms. A visual inspection of the results shows that shape-based mapping scores picked up large shape changes in the mappings and input alignment deviations (which might lead to a shape overlap decrease in the hybrid-topology approach). These mapping scores could be an attractive addition to scoring the complexity of a transformation with given binding modes. As a final remark, future work may select one mapping score of the highly correlating score groups to represent different aspects in a mapping evaluation.

Time Consumption Comparison of Mapping Approaches

As anticipated in the Introduction, the time required to generate the mappings was found to be directly proportional to the combination of the number of states and the number of atoms involved in a mapping; that is, the more states and atoms a system contains, the longer the mapping generation takes.

The three atom mapping methods each required 3 s for the all-to-all mappings of the RHFE set, which translates to 91 mappings with an average of 38.6 atoms per mapping. However, for TYK2, with 120 all-to-all mappings and an average of 74 atoms per mapping, the time difference reached 18 s, with the Lomap approaches taking 26 s and the Kartograf approach taking 8 s.

For the HIF2A data set, with 666 mappings and an average of 73 atoms per mapping, the difference between the methods exceeded 10 min. Both Lomap approaches required more than 11 min, while the Kartograf approach resolved the all-to-all mappings in approximately 1 min (see Table 1).

As an extreme example, we also attempted to generate a mapping for a full protein mutation transformation of ASP to TYR in TYK2 with 9349 atoms. Here, Kartograf solved the mapping within 2 s, while the SIS-based algorithms were unable to converge to a solution within a 4 h limit (see Table 1). Note that the Lomap approaches with the isomorphic graph search were designed with small molecules in mind.

Table 1. Mapping Time Consumption Comparison among 3D Lomap, 2D Lomap, and Kartograf across Diverse Input System Complexities, Varying the Number of States and the Number of Atoms^a

system   n_states  avg. no. of atoms  mapper     gen. mappings  duration
RHFE     14        38.6               Kartograf  182            3 s
                                      3D Lomap                  3 s
                                      2D Lomap                  3 s
TYK2     16        74.0               Kartograf  240            8 s
                                      3D Lomap                  26 s
                                      2D Lomap                  26 s
HIF2A    35        73.0               Kartograf  1225           58 s
                                      3D Lomap                  11 min 33 s
                                      2D Lomap                  11 min 16 s
protein  2         9349.0             Kartograf  1              2 s
                                      3D Lomap                  >4 h
                                      2D Lomap                  >4 h

^a It should be noted that Lomap was not designed to tackle protein mutation challenges, yielding an unfair comparison.

Calculation of Relative Free Energies

To evaluate the atom mapping algorithm of Kartograf, we applied it along with two variants of the Lomap atom mappers to multiple relative free energy calculation approaches.

Calculation of RHFEs

In this study, we used a toy system to calculate the RHFEs. However, we had to exclude ring-size changes from the analysis as the hybrid-topology approach did not support bond-breaking during the transformations. Initially, the same radial layout from Ries et al. was used to assess the performance of the mapping algorithms; later, an MST approach was studied.28

For the radial network, all of the mapping approaches successfully led to the calculation of the 13 edges necessary to obtain the complete ranking of compounds. Comparing the free energies generated from the different starting points of Kartograf, 2D Lomap, and 3D Lomap with the experimental data, we observed no significant differences in terms of RMSE, MUE, ρPears, and τKendall between the different approaches (see Table 2). However, two clear outliers were observed in the ΔΔG correlation plots (see Table S1 and Figure S8). These outliers correspond to the ligand transformations 12–8 and 12–5, which consistently showed high deviations from the experimental values in all approaches. These deviations could potentially be attributed to sampling issues in the hybrid-topology approach, particularly for the complex transformations involving ligand 5, which were already identified as difficult in the previous work.28 Similarly, for transformation 12–8, the presence of a flexible and long aliphatic chain in ligand 8 may be causing sampling difficulties. An additional analysis of ring hybridization changes and the resulting torsion distributions of ligand transformation 12–2 is provided in the Supporting Information (see Section S2.1.1 and Figure S9).

Table 2. ΔΔG RHFE-Statistics Summary of Estimate Errors and Correlation Metrics to Experiment (ref 103) for Each Mapping Method and Network Type^a

network layout   radial                                 MST
approach         Kartograf    2D Lomap     3D Lomap     Kartograf    2D Lomap     3D Lomap
RMSE (kcal/mol)  1.21 ± 0.21  1.22 ± 0.22  1.20 ± 0.21  1.12 ± 0.14  1.06 ± 0.17  1.28 ± 0.18
MUE (kcal/mol)   1.00 ± 0.19  1.00 ± 0.20  1.00 ± 0.19  0.97 ± 0.16  0.92 ± 0.15  1.13 ± 0.17
ρPears           0.91 ± 0.05  0.91 ± 0.05  0.92 ± 0.05  0.95 ± 0.03  0.95 ± 0.03  0.92 ± 0.05
τKendall         0.81 ± 0.10  0.76 ± 0.12  0.77 ± 0.12  0.80 ± 0.13  0.82 ± 0.13  0.76 ± 0.13

^a Errors in statistical estimates are calculated via bootstrap resampling.
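
A bootstrap error estimate of the kind mentioned in the table footnote can be sketched as follows: the (calculated, experimental) pairs are resampled with replacement, the statistic is recomputed for each resample, and the standard deviation of the resampled statistics is reported. This is a generic sketch, not the analysis code used for the tables; the function names, the number of resamples, the fixed seed, and the toy values are assumptions.

```python
import math
import random

def rmse(calc, exp):
    """Root-mean-square error between calculated and experimental values."""
    return math.sqrt(sum((c - e) ** 2 for c, e in zip(calc, exp)) / len(calc))

def bootstrap_error(calc, exp, statistic, n_boot=1000, seed=0):
    """Standard deviation of `statistic` over resamples drawn with replacement."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    n = len(calc)
    samples = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        samples.append(statistic([calc[i] for i in idx], [exp[i] for i in idx]))
    mean = sum(samples) / n_boot
    return math.sqrt(sum((s - mean) ** 2 for s in samples) / (n_boot - 1))

# Hypothetical values in kcal/mol, for illustration only.
calc = [-1.0, 0.5, 2.0, -0.3, 1.2]
exp = [-0.6, 0.1, 2.9, -0.2, 0.4]
print(rmse(calc, exp), bootstrap_error(calc, exp, rmse))
```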

Using Cinnabar’s maximum likelihood estimation (MLE) approach, absolute ΔG values were predicted. The resulting statistical metrics showed trends similar to those of the ΔΔG values (see Figure 10).
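
The idea of turning relative ΔΔG values on network edges into per-ligand ΔG values can be illustrated with a weighted least-squares fit, which is what a maximum likelihood estimate reduces to when the edge errors are assumed to be independent and Gaussian. The sketch below is not Cinnabar's API or implementation; the function name, the edge tuple layout, the iterative solver, and the toy numbers are assumptions. Because only differences are measured, the values are determined up to a constant offset, which is fixed here by setting one reference ligand to zero.

```python
def estimate_dg(edges, reference, n_iter=500):
    """Weighted least-squares dG per ligand from relative edges.

    edges: list of (lig_a, lig_b, ddg, sigma) with ddg = dG[lig_b] - dG[lig_a].
    The reference ligand is pinned to 0.0; all others are updated one at a time
    to the weighted mean implied by their neighbors until convergence.
    """
    nodes = {lig for a, b, _, _ in edges for lig in (a, b)}
    dg = {lig: 0.0 for lig in nodes}
    for _ in range(n_iter):
        for node in nodes:
            if node == reference:
                continue
            num = den = 0.0
            for a, b, ddg, sigma in edges:
                weight = 1.0 / sigma ** 2
                if b == node:
                    num += weight * (dg[a] + ddg)
                    den += weight
                elif a == node:
                    num += weight * (dg[b] - ddg)
                    den += weight
            dg[node] = num / den
    return dg

# Hypothetical, internally consistent cycle of three ligands (kcal/mol).
toy_edges = [("A", "B", 1.0, 0.1), ("B", "C", 2.0, 0.1), ("A", "C", 3.0, 0.1)]
print(estimate_dg(toy_edges, reference="A"))
```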

Figure 10.

MLE-derived ABFEs for the radial RHFE data set plotted against the experimental results. A strong correlation with the experiment was observed for all approaches. The color gradient of the data points (blue: ≤0.5 kcal/mol; red: ≥2 kcal/mol) indicates the distance of the predictions from the experimental reference.

To explore the impact of different mappers on the network layout, we used a minimal spanning tree-ligand network layout instead of a radial map.

The usage of different mappers resulted in changes in the graph structure (Figure S10). Out of the 13 edges required to describe the complete ranking of all molecules, 5 edges were common across all networks. Two additional edges were shared between 3D Lomap and Kartograf, suggesting a closer relationship between the mappings in the data set for these two approaches.
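
How the choice of mapper can change a minimal spanning tree network may be easier to see with a small sketch. It assumes that every candidate edge carries a score between 0 and 1 (higher is better) and uses 1 − score as the edge cost in a generic Kruskal algorithm; this is not the network planner used in this study, and the function name, the score convention, and the toy scores are assumptions.

```python
def minimal_spanning_network(ligands, scored_edges):
    """Kruskal's algorithm on edge cost = 1 - score.

    scored_edges: dict {(lig_a, lig_b): score}, score in [0, 1], higher is better.
    Returns the selected edges; a connected set of n ligands yields n - 1 edges.
    """
    parent = {lig: lig for lig in ligands}

    def find(lig):
        # Follow parent links to the root of the component (with path halving).
        while parent[lig] != lig:
            parent[lig] = parent[parent[lig]]
            lig = parent[lig]
        return lig

    tree = []
    for (a, b), _score in sorted(scored_edges.items(), key=lambda item: 1.0 - item[1]):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # edge joins two separate components, so keep it
            parent[root_a] = root_b
            tree.append((a, b))
    return tree

# Hypothetical scores for four ligands.
toy_scores = {("A", "B"): 0.9, ("B", "C"): 0.8, ("A", "C"): 0.3,
              ("C", "D"): 0.7, ("A", "D"): 0.2}
print(minimal_spanning_network(["A", "B", "C", "D"], toy_scores))
```

Because the tree is built from per-edge scores, any mapper that changes those scores can change which edges are selected, while the number of edges stays at n − 1 (13 edges for the 14 RHFE molecules).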

With the MST networks, a slight improvement, within the error estimates, was seen in the statistics comparing the calculated values with the experimental values (see Table S2 and Figures S11 and S12). In all MST approaches, only one transformation edge (ligands 3–11), selected by the 3D Lomap atom mapper approach, deviated by more than 2 kcal/mol from the experimental result in all three replicates, leading to an average MUE of 2.55 kcal/mol.

Similar to ΔΔG, the overall ΔG performance of the approaches remained comparable, with no ligand deviating by more than 2 kcal/mol from the experimental values (see Figure S12).

Calculation of RBFEs

Multiple RBFE campaigns were conducted to assess the performance of the three different mapping approaches in more complex scenarios.

RBFE Calculations with the TYK2 System

We first investigated the TYK2 system, which consists of relatively small alchemical modifications (see Figure 6). The radial network layout for the 16 ligands was centered around ligand ejm31, which resulted in the best average Lomap score across all edges. MST networks with 14 edges were generated for all three mappers. Six common edges were found across the networks, with two extra edges shared between Kartograf and 2D Lomap and six extra edges shared between Kartograf and 3D Lomap. 2D Lomap and 3D Lomap shared one exclusive edge. One unique edge was found by 3D Lomap and five by 2D Lomap for the MST network layout. As observed for the RHFE data set, the overlap in identified edges was larger between Kartograf and 3D Lomap.
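
Selecting the center of a radial network by the best average score, as described above, can be sketched as follows. The scoring function is a stand-in for whichever mapping score is used (the Lomap score in this study); the function names, the lookup table, and the toy scores are hypothetical.

```python
def best_radial_hub(ligands, score):
    """Return the ligand with the highest mean score to all other ligands.

    score: callable (lig_a, lig_b) -> float, higher is better.
    """
    def mean_score(hub):
        others = [lig for lig in ligands if lig != hub]
        return sum(score(hub, other) for other in others) / len(others)

    return max(ligands, key=mean_score)

# Hypothetical symmetric score table for three ligands.
toy_table = {frozenset(("a", "b")): 0.9,
             frozenset(("a", "c")): 0.8,
             frozenset(("b", "c")): 0.1}

def toy_score(lig_a, lig_b):
    return toy_table[frozenset((lig_a, lig_b))]

print(best_radial_hub(["a", "b", "c"], toy_score))  # a
```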

Comparing the obtained ΔΔG values from the different approaches revealed no significant differences between them. Interestingly, in the case of TYK2, the average statistics seemed to worsen with the MST layout compared to the radial layout, although this change was not statistically significant (see Tables 3, S3, and S4). None of the approaches yielded any outliers exceeding 2 kcal/mol deviation from the experimental values (see Figures S14 and S15).

Table 3. ΔΔG TYK2 RBFE-Statistics Summary of Estimate Errors and Correlation Metrics to Experiment (ref 103) for Each Mapping Method and Network Type of the TYK2 System^a

network layout   radial                                 MST
approach         Kartograf    2D Lomap     3D Lomap     Kartograf    2D Lomap     3D Lomap
RMSE (kcal/mol)  0.65 ± 0.13  0.69 ± 0.15  0.65 ± 0.14  0.90 ± 0.17  0.79 ± 0.14  0.92 ± 0.15
MUE (kcal/mol)   0.53 ± 0.11  0.54 ± 0.12  0.48 ± 0.12  0.72 ± 0.16  0.64 ± 0.13  0.76 ± 0.16
ρPears           0.72 ± 0.13  0.74 ± 0.11  0.78 ± 0.11  0.41 ± 0.29  0.65 ± 0.18  0.40 ± 0.29
τKendall         0.46 ± 0.18  0.57 ± 0.16  0.50 ± 0.17  0.14 ± 0.18  0.41 ± 0.20  0.21 ± 0.20

^a Errors in statistical estimates are calculated via bootstrap resampling.

To evaluate the overall performance of the approaches, we compared the ΔG values, which did not exhibit any significant differences between the three approaches (see Figure 11 and Table S4).

Figure 11.

MLE-derived ABFEs for the radial TYK2 data set plotted against the experimental results. The color gradient of the data points (blue: ≤0.5 kcal/mol; red: ≥2 kcal/mol) indicates the distance of the predictions from the experimental results.

RBFE Calculations with the HIF2A System

Lastly, the HIF2A system with 35 ligands was used as a test system with changes in multiple regions of the ligands (see Figure S1).

The radial network layout was formed around ligand 163, which yielded a radial network with the best average Lomap score across all edges.

MST networks containing 34 transformations were generated for all three mapping methods; of these, 18 edges were shared across all three networks. Kartograf and 2D Lomap shared two additional edges; Kartograf and 3D Lomap shared 11 exclusive edges; and 2D Lomap and 3D Lomap shared five additional edges. The network constructed with Kartograf identified two unique edges, while 2D Lomap identified nine unique edges.
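
The edge comparison between networks reported above amounts to simple set operations on undirected edges. The following sketch shows one way to obtain the edges common to all networks and the edges unique to each one; the function name, the network labels, and the toy edges are assumptions and do not reproduce the actual networks.

```python
def compare_networks(networks):
    """Return (edges common to all networks, edges unique to each network).

    networks: dict {name: iterable of (lig_a, lig_b)}; edges are undirected.
    """
    edge_sets = {name: {frozenset(edge) for edge in edges}
                 for name, edges in networks.items()}
    common = set.intersection(*edge_sets.values())
    unique = {}
    for name, edges in edge_sets.items():
        others = set().union(*(e for n, e in edge_sets.items() if n != name))
        unique[name] = edges - others
    return common, unique

# Hypothetical networks over four ligands labeled 1 to 4.
toy_networks = {
    "Kartograf": [(1, 2), (2, 3), (3, 4)],
    "2D Lomap": [(2, 1), (2, 3), (1, 4)],
    "3D Lomap": [(1, 2), (3, 2), (3, 4)],
}
common, unique = compare_networks(toy_networks)
print(len(common), {name: len(edges) for name, edges in unique.items()})
```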

Once again, all approaches yielded comparable results without any significant differences being detected. However, in the HIF2A data set, the average statistics of the MST approach showed slight improvements in RMSE and MUE compared to the radial network layout. Nevertheless, these observations were not significant. Based on the analysis of the HIF2A data set, we can conclude that all three approaches (Kartograf, 2D Lomap, and 3D Lomap) demonstrate comparable performance. Despite the complexity of the data set, none of the approaches showed a significant advantage over the others in terms of ΔΔG performance (see Figure 12, Table 4, Tables S5 and S6, and Figures S16–18).

Figure 12.

MLE-derived ABFEs for the radial HIF2A data set plotted against the experimental results. The color gradient of the data points (blue: ≤0.5 kcal/mol; red: >2 kcal/mol) indicates the distance of the free energy calculation results from the experimental results.

Table 4. ΔΔG HIF2A RBFE-Statistics Summary of Estimate Errors and Correlation Metrics to Experiment (ref 109) for Each Mapping Method and Network Type of the HIF2A System^a

network layout   radial                                 MST
approach         Kartograf    2D Lomap     3D Lomap     Kartograf    2D Lomap     3D Lomap
RMSE (kcal/mol)  1.96 ± 0.18  2.16 ± 0.22  2.05 ± 0.22  1.58 ± 0.30  1.97 ± 0.29  1.59 ± 0.36
MUE (kcal/mol)   1.67 ± 0.18  1.81 ± 0.20  1.73 ± 0.19  1.20 ± 0.19  1.54 ± 0.21  1.17 ± 0.20
ρPears           0.33 ± 0.12  0.26 ± 0.13  0.25 ± 0.15  0.29 ± 0.22  0.15 ± 0.21  0.21 ± 0.31
τKendall         0.26 ± 0.11  0.22 ± 0.12  0.18 ± 0.12  0.29 ± 0.12  0.18 ± 0.14  0.22 ± 0.14

^a Errors in statistical estimates are calculated via bootstrap resampling.

Torsion Sampling Analysis of Ligand 163 in the Context of “Realignments” in Mappings

To investigate the impact of “realignments” by the Lomap mapping approaches that inverted the coordinates of a benzyl substituent, potentially impacting the binding mode during simulations, a torsion analysis was conducted for ligand 163. The analysis focused on the torsional distributions of all heavy-atom-related dihedrals in both water and complex simulations.

In the water case, it was observed that the bonds between atoms 4–5 and 5–6 were able to rotate fully during the simulations. However, in the complex environment with HIF2A, the behavior changed due to the steric hindrance of the protein. No rotation of the ring was observed in any of the three replicates (Figure 13). This suggests that for the atom mapping generated by 2D Lomap and 3D Lomap for this ligand pair, the binding mode might be altered (see Figure 7). In the simulated case, we found that the correct orientation was recovered upon energy minimization of the system, and therefore no change of the binding mode occurred.
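
A torsion analysis of this kind evaluates, for every simulation frame, the dihedral angle defined by four consecutive atoms. The sketch below computes a single dihedral angle from four positions using the standard atan2 formulation; reading trajectories and building the distributions are not shown. The function names and the example coordinates are assumptions.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def dihedral_deg(p0, p1, p2, p3):
    """Dihedral angle in degrees (range -180 to 180) around the p1-p2 bond."""
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b0, b1)  # normal of the plane spanned by p0, p1, p2
    n2 = _cross(b1, b2)  # normal of the plane spanned by p1, p2, p3
    x = _dot(n1, n2)
    y = _dot(_cross(n1, n2), b1) / math.sqrt(_dot(b1, b1))
    return math.degrees(math.atan2(y, x))

# Hypothetical coordinates: cis, perpendicular, and trans arrangements.
print(dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))
print(dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
print(abs(dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))))
```

Collecting this angle over all frames of the water and complex simulations would give distributions of the kind shown in Figure 13; a torsion that samples the full angular range indicates a freely rotating bond.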

Figure 13.

Torsion angle (Tors) distributions of ligand 124 sampled during the simulation. These results indicate that the rotation around Tors 4–5 and Tors 5–6 can occur in the water environment but not in the complex. Combined with our findings in Figure 7, this indicates the possibility for these realignments to lead to sampling issues should they not be resolved during system equilibration.

This finding emphasizes the importance of carefully considering atom mappings, especially for ligand pairs in complex environments. Ensuring that the input coordinates accurately represent the desired binding mode in a 3D coordinate system is crucial to avoid potential issues during simulations.

Conclusions

We have introduced Kartograf, a tool that offers an efficient algorithm for atom mappings based on 3D coordinates. This algorithm is suitable for both small molecules and potentially large biomolecules. Kartograf also provides useful features such as 3D-based mapping scores that could be further used in novel atom mapping scoring functions.

The evaluation of the atom mapper showed that Kartograf provides a fast mapping approach for large numbers of molecules and for mappings containing large numbers of atoms, often yielding mapping results similar to those of Lomap. If the coordinates should be preserved as much as possible, we suggest Kartograf’s atom mapper as a tool of choice. However, for Kartograf’s mapping algorithm, the alignment of the ligands is crucial, and it was found to be a possible source of error. To potentially overcome alignment problems, Kartograf offers shape and SIS alignment wrapping functions. Additionally, we want to mention that there is a plethora of alternative MCS finding algorithms, as described, for example, by Raymond and Willett, which could be further compared to the atom mapper presented here in order to get a better understanding of the advantages and disadvantages of our approach.54

To evaluate the performance of Kartograf’s atom mapper, multiple atom mappers and various network layouts were compared using the RFE protocol approach of OpenFE. Our results showed that the Kartograf algorithm produced comparable results to the Lomap atom mappers for the given test sets. In the future, we would like to continue to monitor the performance of Kartograf in the context of more diverse and larger benchmark sets, with the expectation of a much faster mapping phase and smaller geometric perturbations derived from the atom mappings for the calculations.

Kartograf’s mapping algorithm may also be useful for other applications that require mappings, such as product/educt mapping in reaction predictions with QM software or the protein mutation use case.

The presented geometry-based mapping scores, especially the mapping RMSD score, could show potential in the development of future mapping scorers.

The code for Kartograf is available on GitHub (https://github.com/OpenFreeEnergy/kartograf) and can be easily used within the OpenFE environment or as a standalone package, allowing for good interoperability between different packages. Installation as a standalone process can be performed using pip and conda.

Acknowledgments

We express our sincere gratitude to Roger Sayle, John Mayfield, Clara Christ, David Mobley, John Chodera, and Darrin York for their helpful discussions, which greatly enriched this research. We also acknowledge the contributions of Melissa Boby and Iván Pulido in preparing the protein–ligand benchmark systems used in this study. We also acknowledge the Open Free Energy Consortium and Technical Advisory Committee members for helpful discussions. Special thanks are extended to Aniket Magarkar for his support throughout the course of this study. We thank the partners in the Open Free Energy Fund and Boehringer Ingelheim Pharma GmbH & Co KG for funding. We also thank Boehringer Ingelheim for providing the necessary compute time to carry out the simulations included in this work.

Data Availability Statement

The code of Kartograf is available at https://github.com/OpenFreeEnergy/kartograf. For the relative free energy simulations, we used OpenFE, which is available at https://github.com/OpenFreeEnergy/openfe. The RHFE, TYK2, and HIF2A System start coordinates can be accessed at https://github.com/OpenFreeEnergy/openfe-benchmarks.

Supporting Information Available

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jctc.3c01206.

  • Additional descriptions of the HIF2A and protein mutation systems, correlation plots of the number of mapped atoms in test networks, visualization of edge case mappings, analysis of the sampled torsion distributions of ligand 2 from the RHFE data set, and individual free energy calculation results and additional correlation plots for the radial and MST networks (PDF)

The authors declare no competing financial interest.

Supplementary Material

ct3c01206_si_001.pdf (50.4MB, pdf)

References

  1. Michel J.; Foloppe N.; Essex J. W. Rigorous Free Energy Calculations in Structure-Based Drug Design. Mol. Inf. 2010, 29, 570–578. 10.1002/minf.201000051.
  2. Awale M.; Hert J.; Guasch L.; Riniker S.; Kramer C. The Playbooks of Medicinal Chemistry Design Moves. J. Chem. Inf. Model. 2021, 61, 729–742. 10.1021/acs.jcim.0c01143.
  3. Cournia Z.; Allen B.; Sherman W. Relative Binding Free Energy Calculations in Drug Discovery: Recent Advances and Practical Considerations. J. Chem. Inf. Model. 2017, 57, 2911–2937. 10.1021/acs.jcim.7b00564.
  4. Cournia Z.; Allen B. K.; Beuming T.; Pearlman D. A.; Radak B. K.; Sherman W. Rigorous Free Energy Simulations in Virtual Screening. J. Chem. Inf. Model. 2020, 60, 4153–4169. 10.1021/acs.jcim.0c00116.
  5. Armacost K. A.; Riniker S.; Cournia Z. Novel Directions in Free Energy Methods and Applications. J. Chem. Inf. Model. 2020, 60, 1–5. 10.1021/acs.jcim.9b01174.
  6. Meier K.; Bluck J. P.; Christ C. D. Use of Free Energy Methods in the Drug Discovery Industry. ACS Symp. Ser. 2021, 1397, 39–66. 10.1021/bk-2021-1397.ch002.
  7. Barros E. P.; Ries B.; Böselt L.; Champion C.; Riniker S. Recent Developments in Multiscale Free Energy Simulations. Curr. Opin. Struct. Biol. 2022, 72, 55–62. 10.1016/j.sbi.2021.08.003.
  8. Li A. P. Screening for Human ADME/Tox Drug Properties in Drug Discovery. Drug Discovery Today 2001, 6, 357–366. 10.1016/S1359-6446(01)01712-3.
  9. Butina D.; Segall M. D.; Frankcombe K. Predicting ADME Properties in Silico: Methods and Models. Drug Discovery Today 2002, 7, S83–S88. 10.1016/S1359-6446(02)02288-2.
  10. Ertl P.; Schuffenhauer A. Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions. J. Cheminf. 2009, 1, 8–11. 10.1186/1758-2946-1-8.
  11. Ertl P.; Altmann E.; McKenna J. M. The Most Common Functional Groups in Bioactive Molecules and How Their Popularity Has Evolved over Time. J. Med. Chem. 2020, 63, 8408–8418. 10.1021/acs.jmedchem.0c00754.
  12. Song L. F.; Merz K. M. J. Evolution of Alchemical Free Energy Methods in Drug Discovery. J. Chem. Inf. Model. 2020, 60, 5308–5318. 10.1021/acs.jcim.0c00547.
  13. Christ C. D.; Fox T. Accuracy Assessment and Automation of Free Energy Calculations for Drug Design. J. Chem. Inf. Model. 2014, 54, 108–120. 10.1021/ci4004199.
  14. Williams-Noonan B. J.; Yuriev E.; Chalmers D. K. Free Energy Methods in Drug Design: Prospects of “Alchemical Perturbation” in Medicinal Chemistry. J. Med. Chem. 2018, 61, 638–649. 10.1021/acs.jmedchem.7b00681.
  15. Gao Y.-D.; Hu Y.; Crespo A.; Wang D.; Armacost K. A.; Fells J. I.; Fradera X.; Wang H.; Wang H.; Sherborne B.; Verras A.; Peng Z. Workflows and Performances in the Ranking Prediction of 2016 D3R Grand Challenge 2: Lessons Learned from a Collaborative Effort. J. Comput. Aided Mol. Des. 2018, 32, 129–142. 10.1007/s10822-017-0072-z.
  16. Loeffler H. H.; Bosisio S.; Duarte Ramos Matos G.; Suh D.; Roux B.; Mobley D. L.; Michel J. Reproducibility of Free Energy Calculations Across Different Molecular Simulation Software Packages. J. Chem. Theory Comput. 2018, 14, 5567–5582. 10.1021/acs.jctc.8b00544.
  17. Jespers W.; Esguerra M.; Åqvist J.; Gutiérrez-de-Terán H. QligFEP: An Automated Workflow for Small Molecule Free Energy Calculations in Q. J. Cheminf. 2019, 11, 26. 10.1186/s13321-019-0348-5.
  18. Gapsys V.; Pérez-Benito L.; Aldeghi M.; Seeliger D.; van Vlijmen H.; Tresadern G.; de Groot B. L. Large Scale Relative Protein Ligand Binding Affinities Using Non-Equilibrium Alchemy. Chem. Sci. 2020, 11, 1140–1152. 10.1039/C9SC03754C.
  19. Lee T.-S.; Allen B. K.; Giese T. J.; Guo Z.; Li P.; Lin C.; McGee T. D. J.; Pearlman D. A.; Radak B. K.; Tao Y.; Tsai H.-C.; Xu H.; Sherman W.; York D. M. Alchemical Binding Free Energy Calculations in AMBER20: Advances and Best Practices for Drug Discovery. J. Chem. Inf. Model. 2020, 60, 5595–5623. 10.1021/acs.jcim.0c00613.
  20. Heinzelmann G.; Gilson M. Automation of Absolute Protein-Ligand Binding Free Energy Calculations for Docking Refinement and Compound Evaluation. Sci. Rep. 2021, 11, 1116. 10.1038/s41598-020-80769-1.
  21. Tielker N.; Eberlein L.; Beckstein O.; Güssregen S.; Iorga B. I.; Kast S. M.; Liu S. Free Energy Methods in Drug Discovery: Current State and Future Directions, 2021; Chapter 3, pp 67–107.
  22. Ries B.; Normak K.; Weiß R. G.; Rieder S.; Barros E. P.; Champion C.; König G.; Riniker S. Relative Free-Energy Calculations for Scaffold Hopping-Type Transformations with an Automated RE-EDS Sampling Procedure. J. Comput. Aided Mol. Des. 2022, 36, 117–130. 10.1007/s10822-021-00436-z.
  23. Borhani D. W.; Shaw D. E. The Future of Molecular Dynamics Simulations in Drug Discovery. J. Comput. Aided Mol. Des. 2012, 26, 15–26. 10.1007/s10822-011-9517-y.
  24. Aldeghi M.; Heifetz A.; Bodkin M. J.; Knapp S.; Biggin P. C. Accurate Calculation of the Absolute Free Energy of Binding for Drug Molecules. Chem. Sci. 2016, 7, 207–218. 10.1039/C5SC02678D.
  25. Alibay I.; Magarkar A.; Seeliger D.; Biggin P. C. Evaluating the Use of Absolute Binding Free Energy in the Fragment Optimisation Process. Commun. Chem. 2022, 5, 105. 10.1038/s42004-022-00721-4.
  26. Baumann H. M.; Gapsys V.; de Groot B. L.; Mobley D. L. Challenges Encountered Applying Equilibrium and Nonequilibrium Binding Free Energy Calculations. J. Phys. Chem. B 2021, 125, 4241–4261. 10.1021/acs.jpcb.0c10263.
  27. Cabeza de Vaca I.; Zarzuela R.; Tirado-Rives J.; Jorgensen W. L. Robust Free Energy Perturbation Protocols for Creating Molecules in Solution. J. Chem. Theory Comput. 2019, 15, 3941–3948. 10.1021/acs.jctc.9b00213.
  28. Ries B.; Rieder S.; Rhiner C.; Hünenberger P. H.; Riniker S. RestraintMaker: a graph-based approach to select distance restraints in free-energy calculations with dual topology. J. Comput. Aided Mol. Des. 2022, 36, 175–192. 10.1007/s10822-022-00445-6.
  29. Jorgensen W. L.; Ravimohan C. Monte Carlo Simulation of Differences in Free Energies of Hydration. J. Chem. Phys. 1985, 83, 3050–3054. 10.1063/1.449208.
  30. Pearlman D. A.; Kollman P. A. The Overlooked Bond-Stretching Contribution in Free Energy Perturbation Calculations. J. Chem. Phys. 1991, 94, 4532–4545. 10.1063/1.460608.
  31. Pearlman D. A. A Comparison of Alternative Approaches to Free Energy Calculations. J. Phys. Chem. 1994, 98, 1487–1493. 10.1021/j100056a020.
  32. Liu S.; Wu Y.; Lin T.; Abel R.; Redmann J. P.; Summa C. M.; Jaber V. R.; Lim N. M.; Mobley D. L. Lead Optimization Mapper: Automating Free Energy Calculations for Lead Optimization. J. Comput. Aided Mol. Des. 2013, 27, 755–770. 10.1007/s10822-013-9678-y.
  33. Jiang W.; Chipot C.; Roux B. Computing Relative Binding Affinity of Ligands to Receptor: An Effective Hybrid Single-Dual-Topology Free-Energy Perturbation Approach in NAMD. J. Chem. Inf. Model. 2019, 59, 3794–3802. 10.1021/acs.jcim.9b00362.
  34. Gao J.; Kuczera K.; Tidor B.; Karplus M. Hidden Thermodynamics of Mutant Proteins: A Molecular Dynamics Analysis. Science 1989, 244, 1069–1072. 10.1126/science.2727695.
  35. Mobley D. L.; Chodera J. D.; Dill K. A. On the Use of Orientational Restraints and Symmetry Corrections in Alchemical Free Energy Calculations. J. Chem. Phys. 2006, 125, 084902. 10.1063/1.2221683.
  36. Rocklin G. J.; Mobley D. L.; Dill K. A. Separated Topologies – A Method for Relative Binding Free Energy Calculations using Orientational Restraints. J. Chem. Phys. 2013, 138, 085104. 10.1063/1.4792251.
  37. Baumann H. M.; Dybeck E.; McClendon C. L.; Pickard F. C.; Gapsys V.; Pérez-Benito L.; Hahn D. F.; Tresadern G.; Mathiowetz A. M.; Mobley D. L.; et al. Broadening the Scope of Binding Free Energy Calculations using a Separated Topologies Approach. J. Chem. Theory Comput. 2023, 19, 5058–5076. 10.1021/acs.jctc.3c00282.
  38. Sidler D.; Schwaninger A.; Riniker S. Replica Exchange Enveloping Distribution Sampling (RE-EDS): A Robust Method to Estimate Multiple Free-Energy Differences From a Single Simulation. J. Chem. Phys. 2016, 145, 154114. 10.1063/1.4964781.
  39. Rieder S. R.; Ries B.; Schaller K.; Champion C.; Barros E. P.; Hünenberger P. H.; Riniker S. Replica-Exchange Enveloping Distribution Sampling Using Generalized AMBER Force-Field Topologies: Application to Relative Hydration Free-Energy Calculations for Large Sets of Molecules. J. Chem. Inf. Model. 2022, 62, 3043–3056. 10.1021/acs.jcim.2c00383.
  40. Rieder S. R.; Ries B.; Kubincová A.; Champion C.; Barros E. P.; Hünenberger P. H.; Riniker S. Leveraging the Sampling Efficiency of RE-EDS in OpenMM Using a Shifted Reaction-Field with an Atom-Based Cutoff. J. Chem. Phys. 2022, 157, 104117. 10.1063/5.0107935.
  41. Fleck M.; Wieder M.; Boresch S. Dummy Atoms in Alchemical Free Energy Calculations. J. Chem. Theory Comput. 2021, 17, 4403–4419. 10.1021/acs.jctc.0c01328.
  42. Donnini S.; Tegeler F.; Groenhof G.; Grubmüller H. Constant pH Molecular Dynamics in Explicit Solvent with λ-Dynamics. J. Chem. Theory Comput. 2011, 7, 1962–1978. 10.1021/ct200061r.
  43. Shobana S.; Roux B.; Andersen O. S. Free Energy Simulations: Thermodynamic Reversibility and Variability. J. Phys. Chem. B 2000, 104, 5179–5190. 10.1021/jp994193s.
  44. Petrov D. Perturbation Free-Energy Toolkit: An Automated Alchemical Topology Builder. J. Chem. Inf. Model. 2021, 61, 4382–4390. 10.1021/acs.jcim.1c00428.
  45. Eriksson M. A.; Nilsson L. Structure, Thermodynamics and Cooperativity of the Glucocorticoid Receptor DNA-binding Domain in Complex with Different Response Elements. Molecular Dynamics Simulation and Free Energy Perturbation Studies. J. Mol. Biol. 1995, 253, 453–472. 10.1006/jmbi.1995.0566.
  46. Gapsys V.; Michielssens S.; Seeliger D.; de Groot B. L. pmx: Automated Protein Structure and Topology Generation for Alchemical Perturbations. J. Comput. Chem. 2015, 36, 348–354. 10.1002/jcc.23804. [DOI] [PMC free article] [PubMed] [Google Scholar]
  47. Seeliger D.; de Groot B. L. Protein Thermostability Calculations Using Alchemical Free Energy Simulations. Biophys. J. 2010, 98, 2309–2316. 10.1016/j.bpj.2010.01.051. [DOI] [PMC free article] [PubMed] [Google Scholar]
  48. Zhang I.; Rufa D. A.; Pulido I.; Henry M. M.; Rosen L. E.; Hauser K.; Singh S.; Chodera J. D. Identifying and Overcoming the Sampling Challenges in Relative Binding Free Energy Calculations of a Model Protein:Protein Complex. J. Chem. Theory Comput. 2023, 19, 4863–4882. 10.1021/acs.jctc.3c00333. [DOI] [PMC free article] [PubMed] [Google Scholar]
  49. Gapsys V.; Michielssens S.; Peters J. H.; de Groot B. L.; Leonov H. Calculation of Binding Free Energies. Methods Mol. Biol. 2015, 1215, 173–209. 10.1007/978-1-4939-1465-4_9. [DOI] [PubMed] [Google Scholar]
  50. Kawabata T.; Nakamura H. 3D Flexible Alignment Using 2D Maximum Common Substructure: Dependence of Prediction Accuracy on Target-Reference Chemical Similarity. J. Chem. Inf. Model. 2014, 54, 1850–1863. 10.1021/ci500006d. [DOI] [PubMed] [Google Scholar]
  51. Loeffler H. H.; Michel J.; Woods C. FESetup: Automating Setup for Alchemical Free Energy Simulations. J. Chem. Inf. Model. 2015, 55, 2485–2490. 10.1021/acs.jcim.5b00368. [DOI] [PubMed] [Google Scholar]
  52. Suruzhon M.; Senapathi T.; Bodnarchuk M. S.; Viner R.; Wall I. D.; Barnett C. B.; Naidoo K. J.; Essex J. W. ProtoCaller: Robust Automation of Binding Free Energy Calculations. J. Chem. Inf. Model. 2020, 60, 1917–1921. 10.1021/acs.jcim.9b01158. [DOI] [PubMed] [Google Scholar]
  53. Petrov D. Perturbation Free-Energy Toolkit: An Automated Alchemical Topology Builder. J. Chem. Inf. Model. 2021, 61, 4382–4390. 10.1021/acs.jcim.1c00428. [DOI] [PMC free article] [PubMed] [Google Scholar]
  54. Raymond J. W.; Willett P. Maximum Common Subgraph Isomorphism Algorithms for the Matching of Chemical Structures. J. Comput. Aided Mol. Des. 2002, 16, 521–533. 10.1023/A:1021271615909. [DOI] [PubMed] [Google Scholar]
  55. Dalke A.; Hastings J. FMCS: A Novel Algorithm for the Multiple MCS Problem. J. Cheminf. 2013, 5, O6. 10.1186/1758-2946-5-s1-o6. [DOI] [Google Scholar]
  56. Mey A. S. J. S.; Allen B. K.; Bruce Macdonald H. E.; Chodera J. D.; Hahn D. F.; Kuhn M.; Michel J.; Mobley D. L.; Naden L. N.; Prasad S.; Rizzi A.; Scheen J.; Shirts M. R.; Tresadern G.; Xu H. Best Practices for Alchemical Free Energy Calculations [article v1.0]. LiveCoMS 2020, 2, 18378. 10.33011/livecoms.2.1.18378. [DOI] [PMC free article] [PubMed] [Google Scholar]
  57. Duesbury E.; Holliday J.; Willett P. Comparison of Maximum Common Subgraph Isomorphism Algorithms for the Alignment of 2D Chemical Structures. ChemMedChem 2018, 13, 588–598. 10.1002/cmdc.201700482. [DOI] [PubMed] [Google Scholar]
  58. Jonker R.; Volgenant T. A Shortest Augmenting Path Algorithm for Dense and Sparse Linear Assignment Problems. DGOR/NSOR: Papers of the 16th Annual Meeting of DGOR in Cooperation with NSOR/Vorträge der 16. Jahrestagung der DGOR zusammen mit der NSOR, 1988; pp 622.
  59. Virtanen P.; Gommers R.; Oliphant T. E.; Haberland M.; Reddy T.; Cournapeau D.; Burovski E.; Peterson P.; Weckesser W.; Bright J.; van der Walt S. J.; Brett M.; Wilson J.; Millman K. J.; Mayorov N.; Nelson A. R. J.; Jones E.; Kern R.; Larson E.; Carey C. J.; Polat İ.; Feng Y.; Moore E. W.; VanderPlas J.; Laxalde D.; Perktold J.; Cimrman R.; Henriksen I.; Quintero E. A.; Harris C. R.; Archibald A. M.; Ribeiro A. H.; Pedregosa F.; van Mulbregt P.; et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 2020, 17, 261–272. 10.1038/s41592-019-0686-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  60. Python Package Index-PyPI. https://pypi.org/ (accessed January 31, 2024).
  61. conda-forge community. The Conda-Forge Project: Community-Based Software Distribution Built on the Conda Package Format and Ecosystem, 2015.
  62. König G.; Glaser N.; Schroeder B.; Kubincová A.; Hünenberger P. H.; Riniker S. An Alternative to Conventional λ-Intermediate States in Alchemical Free Energy Calculations: λ-Enveloping Distribution Sampling. J. Chem. Inf. Model. 2020, 60, 5407–5423. 10.1021/acs.jcim.0c00520. [DOI] [PubMed] [Google Scholar]
  63. Beutler T. C.; Mark A. E.; van Schaik R. C.; Gerber P. R.; Van Gunsteren W. F. Avoiding Singularities and Numerical Instabilities in Free Energy Calculations Based on Molecular Simulations. Chem. Phys. Lett. 1994, 222, 529–539. 10.1016/0009-2614(94)00397-1. [DOI] [Google Scholar]
  64. Gapsys V.; Seeliger D.; de Groot B. L. New Soft-Core Potential Function for Molecular Dynamics Based Alchemical Free Energy Calculations. J. Chem. Theory Comput. 2012, 8, 2373–2382. 10.1021/ct300220p. [DOI] [PubMed] [Google Scholar]
  65. Liu S.; Wang L.; Mobley D. L. Is Ring Breaking Feasible in Relative Binding Free Energy Calculations? J. Chem. Inf. Model. 2015, 55, 727–735. 10.1021/acs.jcim.5b00057. [DOI] [PMC free article] [PubMed] [Google Scholar]
  66. Kuhn H. W. The Hungarian Method for the Assignment Problem. Nav. Res. Logist. 1955, 2, 83–97. 10.1002/nav.3800020109. [DOI] [Google Scholar]
  67. Munkres J. Algorithms for the Assignment and Transportation Problems. J. Soc. Ind. Appl. Math. 1957, 5, 32–38. 10.1137/0105003. [DOI] [Google Scholar]
  68. Crouse D. F. On Implementing 2D Rectangular Assignment Algorithms. IEEE Trans. Aerosp. Electron. Syst. 2016, 52, 1679–1696. 10.1109/TAES.2016.140952. [DOI] [Google Scholar]
  69. Wang L.; Wu Y.; Deng Y.; Kim B.; Pierce L.; Krilov G.; Lupyan D.; Robinson S.; Dahlgren M. K.; Greenwood J.; Romero D. L.; Masse C.; Knight J. L.; Steinbrecher T.; Beuming T.; Damm W.; Harder E.; Sherman W.; Brewer M.; Wester R.; Murcko M.; Frye L.; Farid R.; Lin T.; Mobley D. L.; Jorgensen W. L.; Berne B. J.; Friesner R. A.; Abel R. Accurate and Reliable Prediction of Relative Ligand Binding Potency in Prospective Drug Discovery by Way of a Modern Free-Energy Calculation Protocol and Force Field. J. Am. Chem. Soc. 2015, 137, 2695–2703. 10.1021/ja512751q. [DOI] [PubMed] [Google Scholar]
  70. Scheen J.; Wu W.; Mey A. S. J. S.; Tosco P.; Mackey M.; Michel J. Hybrid Alchemical Free Energy/Machine-Learning Methodology for the Computation of Hydration Free Energies. J. Chem. Inf. Model. 2020, 60, 5331–5339. 10.1021/acs.jcim.0c00600. [DOI] [PubMed] [Google Scholar]
  71. Pitman M.; Hahn D. F.; Tresadern G.; Mobley D. L. To Design Scalable Free Energy Perturbation Networks, Optimal Is Not Enough. J. Chem. Inf. Model. 2023, 63, 1776–1793. 10.1021/acs.jcim.2c01579. [DOI] [PMC free article] [PubMed] [Google Scholar]
  72. Landrum G.; Tosco P.; Kelley B.; Riniker S.; Ric; gedeck; Vianello R.; Schneider N.; Dalke A.; N D.; Eisuke K.; Cole B.; Turk S.; Swain M.; Alexander S.; Cosgrove D.; Vaucher A.; Wójcikowski M.; Jones G.; Probst D.; Godin G.; Scalfani V. F.; Pahl A.; Francois B.; Sforna G.; Jensen J. H. Rdkit/Rdkit: 2021_03_2 (Q1 2021) Release, 2023.
  73. van Rossum G.; Drake F. L. Python 3 Reference Manual, 2009.
  74. van Rossum G. Python Tutorial; Centrum voor Wiskunde en Informatica; CWI: Amsterdam, 1995.
  75. Gowers R. J.; Alibay I.; Swenson D. W.; Henry M. M.; Ries B.; Baumann H. M.; Eastwood J. R. B. The Open Free Energy Library, 2023. [DOI] [PMC free article] [PubMed]
  76. Harris C. R.; Millman K. J.; van der Walt S. J.; Gommers R.; Virtanen P.; Cournapeau D.; Wieser E.; Taylor J.; Berg S.; Smith N. J.; Kern R.; Picus M.; Hoyer S.; van Kerkwijk M. H.; Brett M.; Haldane A.; del Río J. F.; Wiebe M.; Peterson P.; Gérard-Marchant P.; Sheppard K.; Reddy T.; Weckesser W.; Abbasi H.; Gohlke C.; Oliphant T. E. Array Programming with NumPy. Nature 2020, 585, 357–362. 10.1038/s41586-020-2649-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  77. Barber C. B.; Dobkin D. P.; Huhdanpaa H. The Quickhull Algorithm for Convex Hulls. ACM Trans. Math. Software 1996, 22, 469–483. 10.1145/235815.235821. [DOI] [Google Scholar]
  78. Rufa D. A.; Zhang I.; Bruce Macdonald H. E.; Grinaway P. B.; Pulido I.; Henry M. M.; Rodríguez-Guerra J.; Wittmann M.; Albanese S. K.; Glass W. G.; Silveira A.; Schaller D.; Naden L. N.; Chodera J. D. Perses. Zenodo 2023, 10.5281/zenodo.8350218. [DOI] [Google Scholar]
  79. Eastman P.; Swails J.; Chodera J. D.; McGibbon R. T.; Zhao Y.; Beauchamp K. A.; Wang L.-P.; Simmonett A. C.; Harrigan M. P.; Stern C. D.; Wiewiora R. P.; Brooks B. R.; Pande V. S. OpenMM 7: Rapid Development of High Performance Algorithms for Molecular Dynamics. PLoS Comput. Biol. 2017, 13, e1005659. 10.1371/journal.pcbi.1005659. [DOI] [PMC free article] [PubMed] [Google Scholar]
  80. Chodera J.; Rizzi A.; Naden L.; Beauchamp K.; Grinaway P.; Fass J.; Pulido I.; Wade A.; Henry M.; Ross G. A.; Krämer A.; Macdonald H. B.; Rustenburg B.; Swenson D. W.; Zhang I.; Simmonett A.; Williamson M. J.; Fennick J.; Roet S.; Silveira A.; Rufa D. choderalab/openmmtools: 0.23.0. Zenodo 2023, 10.5281/zenodo.8030019. [DOI] [Google Scholar]
  81. Wagner J.; Thompson M.; Mobley D. L.; Chodera J.; Bannan C.; Rizzi A.; trevorgokey; Dotson D. L.; Mitchell J. A.; jaimergp; Camila; Behara P.; Bayly C.; JoshHorton; Wang L.; Pulido I.; Lim V.; Sasmal S.; SimonBoothroyd; Dalke A.; Smith D.; Horton J.; Wang L.-P.; Gowers R.; Zhao Z.; connordavel; Zhao Y. openforcefield/openff-toolkit: 0.14.4 minor feature release. GitHub, 2023.
  82. Jorgensen W. L.; Chandrasekhar J.; Madura J. D.; Impey R. W.; Klein M. L. Comparison of Simple Potential Functions for Simulating Liquid Water. J. Chem. Phys. 1983, 79, 926–935. 10.1063/1.445869. [DOI] [Google Scholar]
  83. Boothroyd S.; Behara P. K.; Madin O.; Hahn D.; Jang H.; Gapsys V.; Wagner J.; Horton J.; Dotson D.; Thompson M.; Maat J.; Gokey T.; Wang L.-P.; Cole D.; Gilson M.; Chodera J.; Bayly C.; Shirts M.; Mobley D. Development and Benchmarking of Open Force Field 2.0.0: The Sage Small Molecule Force Field. J. Chem. Theory Comput. 2023, 19, 3251–3275. 10.1021/acs.jctc.3c00039. [DOI] [PMC free article] [PubMed] [Google Scholar]
  84. Maier J. A.; Martinez C.; Kasavajhala K.; Wickstrom L.; Hauser K. E.; Simmerling C. ff14SB: Improving the Accuracy of Protein Side Chain and Backbone Parameters from ff99SB. J. Chem. Theory Comput. 2015, 11, 3696–3713. 10.1021/acs.jctc.5b00255. [DOI] [PMC free article] [PubMed] [Google Scholar]
  85. Feenstra K. A.; Hess B.; Berendsen H. J. Improving Efficiency of Large Time-Scale Molecular Dynamics Simulations of Hydrogen-Rich Systems. J. Comput. Chem. 1999, 20, 786–798. [DOI] [PubMed] [Google Scholar]
  86. Hopkins C. W.; Le Grand S.; Walker R. C.; Roitberg A. E. Long-Time-Step Molecular Dynamics through Hydrogen Mass Repartitioning. J. Chem. Theory Comput. 2015, 11, 1864–1874. 10.1021/ct5010406. [DOI] [PubMed] [Google Scholar]
  87. Essmann U.; Perera L.; Berkowitz M. L.; Darden T.; Lee H.; Pedersen L. G. A Smooth Particle Mesh Ewald Method. J. Chem. Phys. 1995, 103, 8577–8593. 10.1063/1.470117. [DOI] [Google Scholar]
  88. Ryckaert J.-P.; Ciccotti G.; Berendsen H. J. Numerical Integration of the Cartesian Equations of Motion of a System with Constraints: Molecular Dynamics of N-Alkanes. J. Comput. Phys. 1977, 23, 327–341. 10.1016/0021-9991(77)90098-5. [DOI] [Google Scholar]
  89. Miyamoto S.; Kollman P. A. Settle: An Analytical Version of the SHAKE and RATTLE Algorithm for Rigid Water Models. J. Comput. Chem. 1992, 13, 952–962. 10.1002/jcc.540130805. [DOI] [Google Scholar]
  90. Liu D. C.; Nocedal J. On the limited memory BFGS method for large scale optimization. Math. Program. 1989, 45, 503–528. 10.1007/bf01589116. [DOI] [Google Scholar]
  91. Kirkwood J. G. Statistical Mechanics of Fluid Mixtures. J. Chem. Phys. 1935, 3, 300–313. 10.1063/1.1749657. [DOI] [Google Scholar]
  92. Pham T. T.; Shirts M. R. Identifying Low Variance Pathways for Free Energy Calculations of Molecular Transformations in Solution Phase. J. Chem. Phys. 2011, 135, 034114. 10.1063/1.3607597. [DOI] [PubMed] [Google Scholar]
  93. Zhang Z.; Liu X.; Yan K.; Tuckerman M. E.; Liu J. Unified Efficient Thermostat Scheme for the Canonical Ensemble with Holonomic or Isokinetic Constraints via Molecular Dynamics. J. Phys. Chem. A 2019, 123, 6056–6079. 10.1021/acs.jpca.9b02771. [DOI] [PubMed] [Google Scholar]
  94. Chow K.-H.; Ferguson D. M. Isothermal-Isobaric Molecular Dynamics Simulations with Monte Carlo Volume Sampling. Comput. Phys. Commun. 1995, 91, 283–289. 10.1016/0010-4655(95)00059-O. [DOI] [Google Scholar]
  95. Åqvist J.; Wennerström P.; Nervall M.; Bjelic S.; Brandsdal B. O. Molecular Dynamics Simulations of Water and Biomolecules with a Monte Carlo Constant Pressure Algorithm. Chem. Phys. Lett. 2004, 384, 288–294. 10.1016/j.cplett.2003.12.039. [DOI] [Google Scholar]
  96. Hansmann U. H. Parallel Tempering Algorithm for Conformational Studies of Biological Molecules. Chem. Phys. Lett. 1997, 281, 140–150. 10.1016/S0009-2614(97)01198-6. [DOI] [Google Scholar]
  97. Sugita Y.; Kitao A.; Okamoto Y. Multidimensional Replica-Exchange Method for Free-Energy Calculations. J. Chem. Phys. 2000, 113, 6042–6051. 10.1063/1.1308516. [DOI] [Google Scholar]
  98. Hastings W. K. Monte Carlo Sampling Methods Using Markov Chains and Their Applications. Biometrika 1970, 57, 97–109. 10.1093/biomet/57.1.97. [DOI] [Google Scholar]
  99. Chodera J. D.; Shirts M. R. Replica Exchange and Expanded Ensemble Simulations as Gibbs Sampling: Simple Improvements for Enhanced Mixing. J. Chem. Phys. 2011, 135, 194110. 10.1063/1.3660669. [DOI] [PubMed] [Google Scholar]
  100. Shirts M. R.; Chodera J. D. Statistically Optimal Analysis of Samples from Multiple Equilibrium States. J. Chem. Phys. 2008, 129, 124105. 10.1063/1.2978177. [DOI] [PMC free article] [PubMed] [Google Scholar]
  101. Shirts M.; Beauchamp K.; Naden L.; Chodera J.; Rodríguez-Guerra J.; Martiniani S.; Stern C.; Henry M.; Fass J.; Gowers R.; McGibbon R. T.; Dice B.; Jones C.; Dotson D.; Burgin T. choderalab/pymbar: 3.1.1. Zenodo 2022, 10.5281/zenodo.7383197. [DOI] [Google Scholar]
  102. Macdonald H. B.; Henry M.; Chodera J.; Dotson D.; Glass W.; Pulido I. openforcefield/openff-arsenic: v0.2.1. Zenodo 2022, 10.5281/zenodo.6210305. [DOI] [Google Scholar]
  103. Stroet M.; Caron B.; Visscher K. M.; Geerke D. P.; Malde A. K.; Mark A. E. Automated Topology Builder Version 3.0: Prediction of Solvation Free Enthalpies in Water and Hexane. J. Chem. Theory Comput. 2018, 14, 5834–5845. 10.1021/acs.jctc.8b00768. [DOI] [PubMed] [Google Scholar]
  104. Sidler D.; Schwaninger A.; Riniker S. Replica Exchange Enveloping Distribution Sampling (RE-EDS): A Robust Method to Estimate Multiple Free-Energy Differences From a Single Simulation. J. Chem. Phys. 2016, 145, 154114. 10.1063/1.4964781. [DOI] [PubMed] [Google Scholar]
  105. Hahn D. F.; Wagner J. openforcefield/protein-ligand-benchmark: 0.1.2. Zenodo 2021, 10.5281/zenodo.6600875. [DOI] [Google Scholar]
  106. Liang J.; van Abbema A.; Balazs M.; Barrett K.; Berezhkovsky L.; Blair W.; Chang C.; Delarosa D.; DeVoss J.; Driscoll J.; Eigenbrot C.; Ghilardi N.; Gibbons P.; Halladay J.; Johnson A.; Kohli P. B.; Lai Y.; Liu Y.; Lyssikatos J.; Mantik P.; Menghrajani K.; Murray J.; Peng I.; Sambrone A.; Shia S.; Shin Y.; Smith J.; Sohn S.; Tsui V.; Ultsch M.; Wu L. C.; Xiao Y.; Yang W.; Young J.; Zhang B.; Zhu B.-y.; Magnuson S. Lead Optimization of a 4-Aminopyridine Benzamide Scaffold To Identify Potent, Selective, and Orally Bioavailable TYK2 Inhibitors. J. Med. Chem. 2013, 56, 4521–4536. 10.1021/jm400266t. [DOI] [PubMed] [Google Scholar]
  107. Liang J.; Tsui V.; Van Abbema A.; Bao L.; Barrett K.; Beresini M.; Berezhkovskiy L.; Blair W. S.; Chang C.; Driscoll J.; Eigenbrot C.; Ghilardi N.; Gibbons P.; Halladay J.; Johnson A.; Kohli P. B.; Lai Y.; Liimatta M.; Mantik P.; Menghrajani K.; Murray J.; Sambrone A.; Xiao Y.; Shia S.; Shin Y.; Smith J.; Sohn S.; Stanley M.; Ultsch M.; Zhang B.; Wu L. C.; Magnuson S. Lead Identification of Novel and Selective TYK2 Inhibitors. Eur. J. Med. Chem. 2013, 67, 175–187. 10.1016/j.ejmech.2013.03.070. [DOI] [PubMed] [Google Scholar]
  108. Schindler C. E. M.; Baumann H.; Blum A.; Böse D.; Buchstaller H.-P.; Burgdorf L.; Cappel D.; Chekler E.; Czodrowski P.; Dorsch D.; Eguida M. K. I.; Follows B.; Fuchß T.; Grädler U.; Gunera J.; Johnson T.; Jorand Lebrun C.; Karra S.; Klein M.; Knehans T.; Koetzner L.; Krier M.; Leiendecker M.; Leuthner B.; Li L.; Mochalkin I.; Musil D.; Neagu C.; Rippmann F.; Schiemann K.; Schulz R.; Steinbrecher T.; Tanzer E.-M.; Unzue Lopez A.; Viacava Follis A.; Wegener A.; Kuhn D. Large-Scale Assessment of Binding Free Energy Calculations in Active Drug Discovery Projects. J. Chem. Inf. Model. 2020, 60, 5457–5474. 10.1021/acs.jcim.0c00900. [DOI] [PubMed] [Google Scholar]
  109. Wallace E. M.; Rizzi J. P.; Han G.; Wehn P. M.; Cao Z.; Du X.; Cheng T.; Czerwinski R. M.; Dixon D. D.; Goggin B. S.; Grina J. A.; Halfmann M. M.; Maddie M. A.; Olive S. R.; Schlachter S. T.; Tan H.; Wang B.; Wang K.; Xie S.; Xu R.; Yang H.; Josey J. A. A Small-Molecule Antagonist of HIF2α Is Efficacious in Preclinical Models of Renal Cell Carcinoma. Cancer Res. 2016, 76, 5491–5500. 10.1158/0008-5472.CAN-16-0473. [DOI] [PubMed] [Google Scholar]

Associated Data

Supplementary Materials

ct3c01206_si_001.pdf (50.4MB, pdf)

Data Availability Statement

The Kartograf code is available at https://github.com/OpenFreeEnergy/kartograf. For the relative free energy simulations, we used OpenFE, which is available at https://github.com/OpenFreeEnergy/openfe. The starting coordinates for the RHFE, TYK2, and HIF2A systems can be accessed at https://github.com/OpenFreeEnergy/openfe-benchmarks.


Articles from Journal of Chemical Theory and Computation are provided here courtesy of American Chemical Society
