Abstract
Rational structure-based drug design relies on accurate predictions of protein–ligand binding affinity from structural molecular information. Although deep learning-based methods for predicting binding affinity have shown promise in computational drug design, certain approaches have faced criticism for their potential to inadequately capture the fundamental physical interactions between ligands and their macromolecular targets or for being susceptible to dataset biases. Herein, we propose to include bond-critical points based on the electron density of a protein–ligand complex as a fundamental physical representation of protein–ligand interactions. Employing a geometric deep learning model, we explore the usefulness of these bond-critical points to predict absolute binding affinities of protein–ligand complexes, benchmark model performance against existing methods, and provide a critical analysis of this new approach. The models achieved root-mean-squared errors of 1.4–1.8 log units on the PDBbind dataset, and 1.0–1.7 log units on the PDE10A dataset, not indicating significant advantages over benchmark methods, and thus rendering the utility of electron density for deep learning models context-dependent. The relationship between intermolecular electron density and corresponding binding affinity was analyzed, and Pearson correlation coefficients r > 0.7 were obtained for several macromolecular targets.
A deep learning approach centered on electron density is suggested for predicting the binding affinity between proteins and ligands. The approach is thoroughly assessed against several pertinent benchmarks.
Introduction
A key requirement for most drug candidates is a high binding affinity of the investigated ligand to its desired biological target.1 However, the experimental determination of binding affinity can be time- and resource-intensive, often requiring synthesis of the ligand and a suitable experimental assay. Aiming to limit the number of laboratory experiments and enable more rapid ligand design, several computational approaches have been developed for in silico binding affinity prediction.2 Structure-based deep learning has received particular attention for this specific task,3–5 as well as for binding site identification,6,7 molecular docking,8,9 and de novo molecular design.10
Although different deep learning-based techniques for predicting binding affinity have achieved success in computational drug design studies,11,12 certain deep learning models have been criticized for specific challenges regarding their generalization abilities and their potential to adequately capture the fundamental physical principles governing intermolecular interactions.2,13–15 Much simpler methods such as nearest-neighbor analysis or prediction of random values have been shown to achieve a similar predictive performance.16 Furthermore, existing datasets such as PDBbind17,18 commonly used for model development have been criticized as biased because similar prediction accuracies were achieved irrespective of whether the whole protein–ligand complex, only the protein, or only the ligand was considered.2,19,20 This potential bias in dataset composition makes it challenging to meaningfully assess and compare model performance.2,16,19,20 As a potential remedy to the challenge of learning underlying dataset biases rather than meaningfully capturing physical interactions, Rognan and coworkers suggested to consider “only noncovalent interactions while omitting their protein and ligand atomic environments”.2
In addition to geometric deep learning approaches,5,23–26 also simulation-based approaches (e.g., free energy perturbation,27,28 MM/PBSA29–31) are frequently used for binding affinity prediction. While simulation-based methods are, by design, rooted in physical principles, deep-learning methods often incur lower computational cost for predictions and can leverage pre-existing data. Herein, we combine a commonly used simulation technique, the Quantum Theory of Atoms in Molecules (QTAIM),32 with geometric deep learning neural networks.
The QTAIM analyzes the topology of the electron density surrounding a molecular geometry.32 The electron density is highest at the nuclei, decreases as one moves away from a nucleus, and eventually increases again as another nucleus is approached (Fig. 1). The QTAIM partitions the electron density in three-dimensional (3D) space into atomic basins, the boundaries between which are surfaces of zero flux of the gradient vector of the electron density. A bond path is formed between two interacting nuclei, following the line along which the electron density is maximal relative to its local surroundings. The point along this line with the lowest electron density (i.e., the point at which the bond path crosses the zero-flux interatomic surface) is termed a “bond-critical point” (BCP).32,34 A range of quantum-mechanical (QM) properties (Table 1) can be evaluated at BCPs and have been connected to physically observable quantities.32,34,41–43
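To make the BCP concept concrete, consider a deliberately simplified one-dimensional sketch: modeling the density along the internuclear axis as a sum of two Gaussians and locating its minimum. (A real QTAIM analysis finds (3,−1) saddle points of the full 3D electron density with dedicated software; the Gaussian model and all numerical choices here are illustrative assumptions only, not part of the pipeline described below.)

```python
import numpy as np

def toy_density(x, centers=(0.0, 3.0), alpha=1.0):
    """Toy 1D 'electron density': a sum of two Gaussians standing in
    for the atomic densities around two nuclei."""
    return sum(np.exp(-alpha * (x - c) ** 2) for c in centers)

def find_bcp_1d(centers=(0.0, 3.0), n=10001):
    """Grid-search the density minimum along the internuclear axis,
    the 1D analogue of a bond-critical point."""
    xs = np.linspace(centers[0], centers[1], n)
    rho = toy_density(xs, centers)
    i = int(np.argmin(rho))
    return float(xs[i]), float(rho[i])
```

For two identical nuclei, the minimum sits at the midpoint between them; its density value is the toy analogue of ρ evaluated at a BCP (Table 1).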
Quantum-mechanical properties at bond-critical points (BCPs) and their chemical interpretation.
Property | Formula | Interpretation | Ref. |
---|---|---|---|
Electron density | ρ | Bond order/strength of interaction | 33 and 34 |
Laplacian | ∇²ρ | Covalent character of interaction | 34 |
Electron localization function | ELF | Degree of electron localization | 35 and 36 |
Localized orbital locator | tσ | Degree of electron localization | 37 |
Reduced density gradient | RDG | Homogeneity of electron density & type of interactions | 38 |
sign(λ2)ρ | sign(λ2)ρ | Strength & type of interactions^a | 38 |
Gradient norm | |∇ρ| | Magnitude of the electron density gradient | 39 |
Bond ellipticity | ε | Cylindrical symmetry and π-character of interaction | 34 |
ETA index | η | Type of interaction | 40 |
^a λ2 denotes the second-largest eigenvalue of the Hessian matrix of the electron density.
Herein, we investigate the hypothesis that an electron density-based, interaction-centric view of protein–ligand complexes may be exploited with geometric deep learning to predict binding affinities. Our objective is to evaluate whether this approach offers advantages over established methods for predicting binding affinity. Choosing a molecular representation directly rooted in the electron density is motivated by three considerations: First, the electron density surrounding a molecule is uniquely mapped to all observable properties of that molecule.21 Second, the electron density is a fundamental description of physical and chemical phenomena.32 By providing deep learning models with a molecular description that is more closely connected to the observed phenomenon itself than commonly used model concepts (such as atomic nuclei as points in space), we hypothesized that the experimentally measured binding affinity might be predicted more accurately. Third, the electron density and other derived properties at intermolecular BCPs were successfully employed in previous quantitative structure–activity relationship (QSAR) studies.32,34,41–48 While the idea of combining QM with machine learning (ML) is not new (e.g., ML to predict QM-calculated properties,49–54 or QM-calculated features being used as ML inputs55–59), this work represents, to the best of our knowledge, the first combination of BCPs with 3D-aware neural networks.
Previous studies on BCP-based QSAR models mainly relied on aggregated information or scalar descriptors of structural information.41–43,60,61 For example, a recent study showcased a strong correlation (r = 0.891) between the sum of the electron density at BCPs and experimental protein–ligand binding affinity for 34 D2-dopamine receptor inhibitors.43 The authors argued that this relationship depends on, e.g., the number of rotatable bonds in the series of ligands, as a low number of rotatable bonds may indicate that entropic effects (which cannot be captured by the static picture of a QTAIM analysis) can be neglected or are very similar across the set of ligands.43
This study offers two key contributions: First, we investigated whether specific QM properties and their spatial distribution can be utilized to predict binding affinity, employing 3D message-passing neural networks (MPNNs) for this analysis. An automated Python-based pipeline was created to prepare protein structures, perform QM calculations, and conduct electron density analysis. We conducted a comprehensive evaluation of the suggested method and found that, for two extensive datasets, the selected representation did not offer any discernible advantages over established methods in the cases we examined. By sharing these findings, we aim to inform future research directions that focus on electron density-based descriptors for predicting protein–ligand binding affinity. Second, we analyzed the correlation between the sum of the electron density at BCPs and the measured binding affinities across two large-scale datasets of protein–ligand complexes (i.e., PDBbind17,18 and PDE10A62).
Data compilation & processing
We performed our analysis using two datasets of protein–ligand complex structures: the commonly used PDBbind (version 2019) dataset17,18 (as prepared in ref. 2) and a recently released collection of PDE10A inhibitors62 originating from a former discovery project at Roche. In addition to consistently measured binding affinities, crystal structures, and expert-curated docking poses, the PDE10A dataset contributes dataset splits inspired by real-world drug discovery programs. These splitting strategies include temporal splits and splits according to binding mode, enabling a thorough investigation of a model's ability to extrapolate to unseen types of interactions.62 The PDE10A dataset may help address some of the shortcomings and biases previously identified in using the PDBbind dataset to assess model performance. These shortcomings include similar performance when using ligand-only or protein-only representations and a failure of PDBbind-trained models to meaningfully capture physical interactions.2 In keeping with previous work63 and to reduce computational cost, we removed protein residues in which all atoms are farther away than 6 Å from the ligand (see details in Section S1†).
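The 6 Å residue truncation amounts to a simple distance filter; a minimal numpy sketch (the dict-of-arrays layout is an assumption for illustration, not the pipeline's actual data structures, which are detailed in Section S1†):

```python
import numpy as np

def prune_residues(residue_coords, ligand_coords, cutoff=6.0):
    """Keep only residues with at least one atom within `cutoff`
    angstroms of any ligand atom; residues in which all atoms are
    farther away are removed to reduce the QM cost.

    residue_coords: dict mapping residue id -> (n_atoms, 3) array
    ligand_coords:  (m_atoms, 3) array
    """
    kept = []
    for res_id, coords in residue_coords.items():
        # pairwise distances between this residue's atoms and the ligand
        d = np.linalg.norm(coords[:, None, :] - ligand_coords[None, :, :], axis=-1)
        if d.min() <= cutoff:
            kept.append(res_id)
    return kept
```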
QM & QTAIM calculations
While previous QTAIM studies of biomolecular systems have used density functional theory (DFT) to obtain electron densities for a few dozen structures,43,66 such an approach is challenging to apply to tens of thousands of protein–ligand complexes, each consisting of hundreds to thousands of atoms. Accordingly, the semiempirical method GFN2-xTB67–70 (version 6.4.1) was used for QM calculations using Analytical Linearized Poisson–Boltzmann (ALPB) implicit water solvent.71,72 GFN2-xTB has been successfully used in several biological applications,63,73 though the considerable speed-up with respect to DFT comes at the expense of lower accuracy.74 However, it has been observed that the electron density shows little sensitivity to the electronic structure method being used for computation and that key features are well preserved between different methods.38,75,76 In a preliminary study, we benchmarked GFN2-xTB against two DFT approaches, ωB97X-D/def2-QZVP77,78 and B3LYP-D3/6-31G*,79,80 calculated using Psi4 (ref. 81) (version 1.7). While we observed some discrepancies (Section S2†), the overall acceptable performance and dramatic speedup of GFN2-xTB (5–6 and 3 orders of magnitude versus ωB97X-D/def2-QZVP and B3LYP-D3/6-31G*, respectively [Table S1†]) suggested it as a suitable choice given the need for computational efficiency in such a large-scale investigation.
Following QM calculations, Multiwfn39 (version 3.8(dev), 03/2023) was used to find BCPs and compute their QM properties (Table 1) based on the wavefunction files obtained from the GFN2-xTB calculation. RDKit82 (version 2021.09.4) was used for general molecular processing tasks. At the end of this pipeline, 14 181 (out of 14 215) and 1162 (out of 1162) successfully processed graphs were obtained for PDBbind and PDE10A, respectively (Section S1†).
Molecular representation
While we primarily investigated the usefulness of BCP-based, interaction-centric graphs (Fig. 2), we additionally investigated a related graph setup based on nucleus-critical points (NCPs). Although this approach deviates slightly from the recommendation of Rognan and coworkers2 to use interaction-centric molecular representations, it appears as a natural alternative to BCP-based graphs for the use of MPNNs in the context of the QTAIM. Both graphs consist of nodes, edges, and node positional information and were constructed as follows.
BCP graphs
The nodes in the graph correspond to BCPs, and edges were added between any pair of nodes within 6 Å (similar to previous studies83,84). Initial node features v0i were the QM properties at the respective BCP (see details in Section S3†). Optionally, the identities of atoms connected by the BCP were added. Node-to-node distances dij described via a sinusoidal and cosinusoidal encoding (similar to other 3D-tasks85,86) were used as edge features eij.
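The BCP graph construction can be sketched as follows (the number of encoding frequencies and the frequency schedule are illustrative assumptions; see Section S3† for the actual featurization):

```python
import numpy as np

def build_bcp_graph(bcp_pos, bcp_qm_props, cutoff=6.0, n_freq=4):
    """Sketch of the BCP graph: nodes are BCPs carrying their QM
    properties as features; edges connect all BCP pairs within
    `cutoff` angstroms and carry a sinusoidal/cosinusoidal encoding
    of the node-to-node distance."""
    n = len(bcp_pos)
    # pairwise distance matrix between BCPs
    dmat = np.linalg.norm(bcp_pos[:, None] - bcp_pos[None, :], axis=-1)
    src, dst = np.where((dmat <= cutoff) & ~np.eye(n, dtype=bool))
    d = dmat[src, dst]
    freqs = 2.0 ** np.arange(n_freq)  # assumed frequency schedule
    edge_feat = np.concatenate(
        [np.sin(d[:, None] * freqs), np.cos(d[:, None] * freqs)], axis=1
    )
    return (src, dst), edge_feat, bcp_qm_props
```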
NCP graphs
Nodes were placed at NCPs (atomic nuclei) that participate in interactions between the protein and ligand (as identified by the presence of intermolecular BCPs). Initial node features v0i for NCPs were obtained from their QM properties (Table 1) and (optionally) atomic identities. Intermolecular edges were added between interacting NCPs (one on the protein side and one on the ligand side) and featurized using Fourier-like encoded distances and QM properties of the corresponding BCP. Additional intramolecular edges were introduced between all NCPs of the ligand and featurized using their respective BCPs for covalent bonds (if present). See Section S3† for full details.
Extreme QM values of individual BCPs or NCPs rendered common input normalization strategies unsuitable, prompting the use of a custom scaling method (Section S3†).
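The custom scaling used in the paper is detailed in Section S3†; as an illustrative stand-in, one common way to tame such extreme feature values is percentile clipping followed by standardization:

```python
import numpy as np

def robust_scale(x, low=1.0, high=99.0):
    """Clip each feature column to its [low, high] percentile range,
    then scale to zero mean / unit variance. This is only a plausible
    stand-in for the custom scaling described in Section S3."""
    lo, hi = np.percentile(x, [low, high], axis=0)
    x_clipped = np.clip(x, lo, hi)
    mu, sigma = x_clipped.mean(axis=0), x_clipped.std(axis=0)
    return (x_clipped - mu) / np.where(sigma > 0, sigma, 1.0)
```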
Model architecture & training
3D-MPNNs based on the EGNN architecture,85,91 which had previously shown good performance in several other 3D-based prediction tasks,55,86 were used to operate on the BCP-/NCP-based graphs. Initial node features were based on QM properties at the BCPs (for BCP-based graphs) or at the NCPs (for NCP-based graphs), respectively, and optionally combined with atom types. QM properties were transformed using multi-layer perceptrons (MLPs). Atom types were embedded using MLPs. For BCP-based graphs with atom types, the atom embeddings of both connected atoms were summed to achieve permutation invariance to the neighbor ordering. The transformed QM properties and atom embeddings were concatenated (in cases where both were present) to obtain initial node features v0i. Node features vil in layer l were iteratively updated via the message-passing scheme
mij = ϕe(vil,vjl,eij) | 1 |
mi = (1/|N(i)|)∑j∈N(i)mij | 2 |
vil+1 = ϕh(vil,mi) | 3 |
where the non-linear transformations on edge and node features, ϕe and ϕh, were described with SiLU-activated MLPs.92 Edge messages mij were obtained from the features of a pair of connected nodes and their (Fourier-encoded) distance eij (eqn (1)), achieving E(3)-invariance (to translation, rotation, and inversion of the input). Incoming edges to one node were mean-aggregated (eqn (2)), and the node's features were updated based on the aggregated message mi and the previous node features vil (eqn (3)). After five message-passing steps, the node features from each step were concatenated and transformed again using a SiLU-activated MLP ϕf to obtain final node-level features Vi:
Vi = ϕf(concatl=0…5(vil)). | 4 |
The final node-level features Vi were mean-pooled (achieving permutation invariance) as preliminary experiments did not indicate a benefit from using sum or multi-headed attention pooling (not shown). The predicted binding affinity was obtained using another MLP. In line with previous work,86 models were trained for 1000 epochs to minimize the mean squared error (MSE) loss using the Adam optimizer93 with an initial learning rate of 10−4, a learning rate decay factor of 0.7, and a patience of 20 epochs. Combinations of the hyperparameters batch size (∈ {16, 32, 64, 128}), kernel dimension (∈ {16, 32, 64, 128}), and MLP dimension (∈ {128, 256, 512}) were screened, and the configuration with the lowest root mean squared error (RMSE) on the validation set was used for testing. Models were built and trained using PyTorch94 (version 1.9.1), and PyTorch Geometric95 (version 2.0.3). Code for structure preparation, QM/QTAIM calculations, and model training is available at https://github.com/ETHmodlab/bcpaff (archived at https://doi.org/10.5281/zenodo.8097403).
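The message-passing scheme of eqns (1)–(4) can be condensed into a plain-numpy sketch (random, untrained weights; the layer sizes here are arbitrary illustrative choices, not the screened hyperparameters, and the actual models use the EGNN architecture in PyTorch):

```python
import numpy as np

rng = np.random.default_rng(0)

def silu(x):
    return x / (1.0 + np.exp(-x))

def make_mlp(d_in, d_hidden, d_out):
    """SiLU-activated two-layer perceptron with random, untrained weights."""
    w1 = rng.normal(0.0, 0.1, (d_in, d_hidden))
    w2 = rng.normal(0.0, 0.1, (d_hidden, d_out))
    return lambda x: silu(x @ w1) @ w2

def mp_layer(v, src, dst, e_feat, phi_e, phi_h):
    """One message-passing step:
    eqn (1): m_ij = phi_e(v_i, v_j, e_ij)
    eqn (2): m_i  = mean of incoming messages
    eqn (3): v_i' = phi_h(v_i, m_i)"""
    m = phi_e(np.concatenate([v[src], v[dst], e_feat], axis=1))
    agg = np.zeros((v.shape[0], m.shape[1]))
    cnt = np.zeros(v.shape[0])
    np.add.at(agg, dst, m)                  # sum messages arriving at each node
    np.add.at(cnt, dst, 1.0)
    agg /= np.maximum(cnt, 1.0)[:, None]    # mean aggregation, eqn (2)
    return phi_h(np.concatenate([v, agg], axis=1))

def forward(v0, src, dst, e_feat, n_layers=5):
    """Full pass: five message-passing steps, concatenation of all
    intermediate node features (eqn (4)), mean pooling, readout."""
    d = v0.shape[1]
    v, feats = v0, [v0]
    for _ in range(n_layers):
        phi_e = make_mlp(2 * d + e_feat.shape[1], 32, d)
        phi_h = make_mlp(2 * d, 32, d)
        v = mp_layer(v, src, dst, e_feat, phi_e, phi_h)
        feats.append(v)
    phi_f = make_mlp((n_layers + 1) * d, 32, d)
    V = phi_f(np.concatenate(feats, axis=1))  # final node-level features
    pooled = V.mean(axis=0)                   # permutation-invariant pooling
    readout = make_mlp(d, 32, 1)
    return readout(pooled[None, :])[0, 0]     # predicted affinity (untrained)
```

Because edge features are functions of interatomic distances only, the prediction is invariant to translation, rotation, and inversion of the input coordinates, mirroring the E(3)-invariance described above.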
Benchmarks
One recently published benchmark model was selected for each dataset. For the PDBbind dataset, we used the “merged protein, ligand and interaction graph”,2 a 3D-aware MPNN model that has shown strong performance on the PDBbind dataset and has undergone careful evaluation. This model uses pseudoatoms to represent protein–ligand interactions. For the PDE10A dataset, we used the 2D3D hybrid model which has previously performed well on this dataset.62 The 2D3D hybrid constitutes an ensemble model combining predictions from the RF-PLP88,89 (3D, protein–ligand structure-based) and AttentiveFP90 (2D, ligand-based) approaches.
Electron density-based models can predict binding affinity but show greater errors than benchmark models
Initial experiments suggested that using all the available QM properties (Table 1) as node features might be beneficial for model performance (see Section S4.4† for statistical analysis). Unless otherwise specified, all discussed models used the entire range of QM properties given in Table 1. Model performance is shown in Fig. 3 (Section S4†). These models were trained using only QM properties without access to atomic identities, providing the desired “interaction-centric view”. These electron density-based models (both BCP-/NCP-based graphs) achieved root-mean-squared errors (RMSEs) of 1.4–1.8 log units on the PDBbind dataset, and 1.0–1.7 log units on the PDE10A dataset, not outperforming the benchmark models. This finding suggests that the proposed electron density-based representation did not improve on existing methods for the prediction of absolute protein–ligand binding affinities. Particularly for the non-random splits of the PDE10A dataset, these deep-learning models even failed to outperform the mean absolute deviation (MAD) baseline model, which predicts the arithmetic mean of the training and validation sets for all compounds in the test set. This apparent failure to outperform the MAD highlights the more challenging extrapolation task in these cases. The errors on the random split of the PDE10A dataset were generally lower than those on the PDBbind dataset (∼1 log unit vs. ∼1.5–1.7 log units), which is potentially due to consistently measured binding affinities and the presence of only a single protein target across the entire dataset. However, the MAD was also lower for the PDE10A dataset than for the PDBbind dataset, indicating a higher bar for non-trivial performance.
While the PDBbind-trained BCP/NCP models achieved Pearson correlation coefficients r in the range of 0.45–0.76, the PDE10A-trained models achieved r > 0.5 only for the random and the temporal 2013 (NCP model only) splits, while showing very poor or no correlation to experimental values for other dataset splits (Section S4†). For the BCP models trained on binding mode splits, the predicted values were contained within a small range of values around the mean (Fig. S6†), mirroring previous observations.2 While the NCP-based models showed significantly lower test set errors than the BCP-based models for the PDBbind dataset (Section S4.3† for statistical analysis), this trend was less pronounced or even reversed for the individual splits of the PDE10A dataset. Specifically, for the splits by binding mode, which necessitate more complex generalization across various interaction types, the incorporation of NCPs increased the test set error, suggesting poorer generalizability of this approach in such cases.
To more closely assess the predictive performance of the proposed representation, we additionally used BCP/NCP-based models trained on the PDBbind dataset to make predictions for the CASF-2013 (ref. 96 and 97) and CASF-2016 (ref. 98) challenges, respectively. For model training, only structures that were not part of the test sets (CASF-2013 and CASF-2016) were used, in adherence to an established splitting strategy99 (see Section S1.3† for details). With Pearson correlation coefficients of 0.552 and 0.591, and RMSE values of ∼1.9 and ∼1.8 for the CASF-2013 and CASF-2016 benchmark sets, respectively, the BCP-based models did not exhibit any advantages over other benchmark models on these metrics (see details in Section S4.1.3†). A somewhat different scenario emerged for the models based on NCP descriptors, which ranked mid-field among the benchmark methods in terms of Pearson correlation coefficients and RMSE (see details in Section S4.1.3†). The finding that the NCP-based models showed higher correlations and lower RMSEs than the BCP-based models is in line with the findings for the test and core sets of PDBbind. This outcome reinforces the notion that, for the PDBbind dataset, details about the atomic characteristics of interacting atoms offer advantages in conjunction with interaction-centric information provided by the BCPs. However, no such pattern was identified for the PDE10A dataset splits, underscoring the possibility of a dataset bias. Overall, no substantial advantages in scoring performance were identified compared to the best benchmark model examined (GraphscoreDTA99).
Furthermore, we explored training the PDBbind-based models exclusively on the refined set and then making predictions for the test and core sets. For both BCP- and NCP-based models, prediction errors on the core set increased moderately (RMSEs rising from ∼1.8 to ∼2.1 log units, and from ∼1.4 to ∼1.6 log units, respectively [Section S4.1.2†]), suggesting that the larger amount of training data available in the PDBbind general set had a small positive impact on modeling performance. Similar trends were observed for the test set predictions (see details in Section S4.1.2†).
Based on these results, our initial hypothesis regarding the use of electron density-based graphs for binding affinity prediction via MPNNs was rejected. There are several potential explanations for this outcome, offering insights into why our deep models trained on electron density-based graphs did not provide more accurate binding affinity predictions than benchmark models.
Firstly, the (calculated) electron density was obtained by inputting atomic coordinates into the QM method of choice (i.e., GFN2-xTB). The output, in terms of BCP-/NCP-based graphs annotated with QM properties, was thus mapped 1 : 1 to the initial atomic coordinates. Accordingly, much of the information contained in the BCP-centric view was already implicitly contained in the atomic coordinates, which lack only the information contributed by the data GFN2-xTB was fitted to. Owing to this lack of genuinely new information, a BCP-centric view may not exhibit an advantage over more traditional atom position-based views, and the results of this study do not indicate that this alternative molecular representation renders the prediction task more feasible.
Secondly, although a generally acceptable agreement with more accurate QM calculations was confirmed (Section S2†), the lower accuracy of GFN2-xTB (compared to e.g., DFT) may still contribute to the limited predictive accuracy of the deep-learning models. A similar effect was observed when using electron densities calculated with DFTB+100 as an alternative semiempirical method (Section S4.2.1† for details). Given the substantial computational cost associated with running higher-accuracy QM calculations for thousands of protein–ligand interactions (Table S1†), one might consider turning towards recently developed ML-based methods that predict the electron density at a fraction of the cost of first-principles methods.101–109 In addition to the epistemological problem associated with stacking multiple models on top of one another, making existing approaches compatible with commonly used BCP/NCP calculation software39,110,111 is not straightforward.
Thirdly, when considering BCP-based representations (without access to ligand coordinates), evaluating the inherent strain in the ligand within a bound conformation may pose greater challenges than with alternative methods. Ligand strain, which refers to the energy penalty associated with a ligand potentially having to deviate from a more stable solution-phase conformation to fit into the binding pocket, can considerably impact the measured binding affinity.112,113 This hypothesis was tested by visualizing absolute per-structure model errors against ligand strain energies calculated for structures from the PDBbind dataset113 (Fig. S6 and S7†). Based on the lack of correlation (slope = −0.028 ± 0.010 log units kcal−1 mol and −0.008 ± 0.022 log units kcal−1 mol for the test and core sets, BCP-based models) between ligand strain and model error, we rejected the hypothesis that poorly capturing ligand strain is a key driver behind the unsatisfactory model performance.
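The strain hypothesis test described above reduces to a least-squares slope between per-structure errors and strain energies; a minimal sketch (the function arguments are placeholders, not data from the study):

```python
import numpy as np

def error_strain_slope(abs_errors, strain_energies):
    """Least-squares slope (log units per kcal mol^-1) of absolute
    per-structure model error vs. calculated ligand strain energy.
    A slope near zero argues against poorly captured ligand strain
    being a key driver of model error."""
    slope, intercept = np.polyfit(strain_energies, abs_errors, 1)
    return float(slope), float(intercept)
```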
Fourthly, “a more precise chemical description of the protein–ligand complex does not generally lead to a more accurate prediction of binding affinity”.114 Potential challenges associated with the use of a more precise description lie in the difficulty of extracting information from such a representation, and in additional modeling assumptions required to produce the representation, such as protonation states and the choice of method for electron density calculation.114
Finally, an inherent limitation of many current deep learning-based binding affinity prediction methods lies in their focus on fixed atom positions (respectively fixed BCP/NCP positions, as in the present study). This view fails to capture the relevance of entropic contributions and the dynamic process of protein–ligand binding.115–118 It has previously been argued that descriptors that remain relatively constant during the binding event (such as atomic identities) may alleviate the restrictions from this focus on the bound state to an extent.114 Accordingly, the strong dependence of BCP-based graphs on the underlying molecular structure42 may contribute to their inability to capture the dynamic binding process. This strong dependence of BCPs on the molecular structure may also amplify the effects of e.g., incorrectly assigned atom positions or protonation states.52 Other potentially contributing factors include the free energy of protein–ligand proton transfer or solvation effects.119
Using atom identities instead of quantum-mechanical properties achieves similar performance
To assess the impact of using atom identities in the initial node features (as is typically done in structure-based deep learning methods for binding affinity prediction3,4), we compared the test set errors of models trained using only QM properties to those of models trained using only atom types or a combination of both. This approach moves away from a purely interaction-focused view, as it directly includes information about the chemical composition of the protein and ligand, respectively. Fig. 4 shows the effect of varying node features on the performance of deep learning models. The effect of modifying the node features was limited, resulting in mostly overlapping 95% confidence intervals for RMSE estimates. For the “binding mode 1” and “binding mode 3” splits of the PDE10A dataset, the models trained on BCP-based graphs with QM features (Fig. 4A) performed significantly better (Section S4.3† for statistical analysis) than models trained using only atom types or a combination of QM features and atom types, with RMSE values differing by up to 0.6 log units. While one might hypothesize that using atom types instead of QM properties could be detrimental for BCP-based graphs (essentially placing pseudo-atoms at the BCP locations but not using their respective QM properties), this effect was not observed in other dataset splits. Because the binding mode splits require the model to make more challenging extrapolations than a random split, the increasing error when going from QM properties to atom types to a combination of both could be related to the models overfitting to this information. For NCP-based graphs (Fig. 4B), for which an augmentation with atom types appears more natural than for BCP-based graphs (as NCPs correspond to positions of atomic nuclei), minor improvements were observed for individual data splitting strategies (binding modes 2 and 3), though no decisive impact was measured.
Intermolecular electron density correlates with binding affinity for some, but not all, protein targets
To gain additional insights into the utility of electron density for binding affinity prediction, we turned to a simpler, ML-free method. Following a previously used approach,43,48 we assessed the correlation between the sum of the electron density at intermolecular BCPs and the binding affinity (Section S5†). Focusing this analysis on the PDBbind refined set and the PDE10A dataset was motivated by the goal of using high-quality structures and binding affinity measurements and ensuring comparability between individual binding affinity measurements (e.g., not comparing IC50 values to pKi or pKD values as present in the PDBbind general set). While a previous study43 used molecular dynamics-refined docking structures, we directly used crystal structures for the PDBbind dataset. We found no correlation (r = 0.006) or extremely limited correlation (r = 0.263) between the sum of electron density at intermolecular BCPs and binding affinity for the PDE10A and PDBbind datasets, respectively (Fig. 5A and B), indicating that no such general trend exists across these large sets of different protein–ligand interactions. This observation may be related to very different entropic contributions to protein–ligand binding across the datasets. When analyzing protein-specific correlations in the PDBbind dataset (Fig. 5C), large differences were observed between proteins. In this more detailed view, individual proteins emerged for which good correlations (r > 0.7) were observed (Fig. 5D). The three top-scoring protein targets featured sets of ligands with both a wide range of affinity values (8 log units for β-secretase 1) and with narrow ranges (1.5 log units for T4 lysozyme, 3 log units for β-lactamase). These ranges were also reflected in the respective ranges of electron densities.
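This ML-free analysis can be reproduced in a few lines (illustrative data layout; in the actual study the per-complex densities come from the Multiwfn output):

```python
import numpy as np

def density_affinity_correlation(bcp_rho_per_complex, affinities):
    """Pearson r between the summed electron density at the
    intermolecular BCPs of each complex and its measured binding
    affinity, i.e. the simple correlation analysis behind Fig. 5."""
    sums = np.array([float(np.sum(rho)) for rho in bcp_rho_per_complex])
    aff = np.asarray(affinities, dtype=float)
    return float(np.corrcoef(sums, aff)[0, 1])
```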
An additional analysis using a very simple measure, the number of ligand atoms instead of the sum of electron density, is shown in Table S15.† For T4 lysozyme, the intermolecular electron density and the number of intermolecular BCPs showed a substantially better correlation with binding affinity than the number of ligand atoms, though less pronounced trends were observed for the other protein targets. This result suggests that for the latter targets, the connection between intermolecular electron density and binding affinity is at least partly driven by ligand size.
Aiming to analyze to what degree the observed trends were affected by the accuracy of the semiempirical GFN2-xTB method, electron densities at the B3LYP-D3/6-31G* level of theory79,80 (analogous to a previous study43) were computed for three sets of ligands binding to different protein targets (Section S5†). These targets included a set of two high-correlation protein targets with medium-to-high affinity ranges (β-lactamase and β-secretase 1) and one low-correlation target (acetylcholine-binding protein) from the PDBbind refined set. This analysis revealed that the correlations identified using GFN2-xTB were also identified using B3LYP-D3/6-31G*, showing only minor deviations (Δr < 0.05).
To understand the impact of previously suggested43 structural factors that may contribute to the differing correlations between sets of ligands, we assessed the average number of atoms, number of rotatable bonds, solvent-accessible surface area (SASA), spatial dimensions, and mean pairwise ligand similarity for each set of ligands binding to one protein target (Fig. 5E). This analysis indicated a rather weak trend of smaller and more rigid ligands showing better correlations. Groups of ligands with higher mean pairwise similarity between them showed a slight tendency to have poorer correlation between electron density and binding affinity. While such weak trends were observed, none of the investigated properties fully rationalized the vastly different correlations observed between the electron density at intermolecular BCPs and the measured binding affinity, suggesting that more detailed studies on the applicability of simple correlation-based QTAIM approaches are warranted.
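The mean pairwise ligand similarity used in this analysis can be illustrated with a minimal Tanimoto sketch. The bit sets below are hypothetical stand-ins for, e.g., Morgan fingerprint on-bits,197 not actual descriptors from the datasets:

```python
from itertools import combinations

def tanimoto(a, b):
    """Tanimoto similarity between two fingerprint on-bit sets."""
    return len(a & b) / len(a | b) if (a | b) else 0.0

def mean_pairwise_similarity(fps):
    """Average Tanimoto similarity over all unordered pairs of fingerprints."""
    pairs = list(combinations(fps, 2))
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)

# Hypothetical on-bit sets for three ligands binding the same target
fps = [{1, 4, 7, 9}, {1, 4, 8, 9}, {2, 4, 7, 10}]
print(round(mean_pairwise_similarity(fps), 3))
```

In practice, the on-bit sets would come from a fingerprinting library (e.g., RDKit Morgan fingerprints), but the aggregation step is exactly this average over all ligand pairs within one target's set.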
Conclusion
Herein, we explored the use of a QM-based, interaction-centric descriptor of protein–ligand complexes, focusing on the electron density as a fundamental physical description of molecular systems. To this end, a computational pipeline was introduced that enables the large-scale computation of QM properties and the extraction of BCPs for protein–ligand complexes. Training geometric deep learning models on BCP-based descriptors does not seem to alleviate the “frustration to predict binding affinities from protein–ligand structures with deep neural networks”.2 The lack of competitive predictive performance of electron density-based deep neural networks may be driven by insufficient accuracy of the chosen QM method, the information already implicitly contained in atom-center-based representations, and the inability to account for entropic contributions to protein–ligand binding.
The findings regarding the correlation of the electron density at intermolecular BCPs with the binding affinity may help focus future QTAIM analyses on protein targets for which informative results can be obtained. Certain groups of ligands that bind to the same target show better correlations than others. However, the specific structural features of ligands and protein pockets that drive the utility of electron density as a predictor of binding affinity remain currently unclear.
To complement the incomplete picture emerging from the static view provided by a QTAIM analysis, a molecular dynamics (MD) approach might be suitable. Such an approach could use the ensemble of BCPs sampled from an MD trajectory of a protein–ligand complex, aiming to capture a physically more meaningful representation of the binding event. Specifics of the graph-construction process (“through-time” edges or other aggregation strategies) and whether such a strategy potentially provides benefits over a corresponding atom-centered view remain as questions for future work. An opportunity lies in the exploration of predicted (rather than calculated) electron densities using deep models trained on ab initio calculations.
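As a sketch of what such “through-time” edges could look like, the toy function below links each BCP in one MD frame to its nearest neighbour in the following frame. The function name, distance threshold, and coordinates are assumptions for illustration, not the method proposed here:

```python
import math

def nearest_neighbour_time_edges(frames, max_dist=1.0):
    """Connect each BCP in frame t to its nearest BCP in frame t+1.

    frames: list of frames, each a list of (x, y, z) BCP coordinates.
    Returns edges as ((frame_t, index_i), (frame_t+1, index_j)) tuples;
    pairs farther apart than max_dist (in Å) are skipped.
    """
    edges = []
    for t in range(len(frames) - 1):
        for i, p in enumerate(frames[t]):
            dists = [math.dist(p, q) for q in frames[t + 1]]
            j = min(range(len(dists)), key=dists.__getitem__)
            if dists[j] <= max_dist:
                edges.append(((t, i), (t + 1, j)))
    return edges

frames = [
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],  # frame 0: two intermolecular BCPs
    [(0.1, 0.0, 0.0), (2.4, 0.1, 0.0)],  # frame 1: both BCPs drifted slightly
]
print(nearest_neighbour_time_edges(frames))
```

Nearest-neighbour matching is only one conceivable aggregation strategy; BCPs can appear and vanish between frames, which is precisely why the graph-construction choices flagged above remain open questions.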
Author contributions
C. I.: conceptualization, data curation, formal analysis, investigation, methodology, software, validation, visualization, writing – original draft. K. A.: formal analysis, investigation, methodology, writing – review & editing. S. R.: formal analysis, investigation, methodology, supervision, writing – review & editing. G. S.: formal analysis, investigation, methodology, supervision, funding acquisition, project administration, writing – review & editing.
Conflicts of interest
G. S. declares a potential financial conflict of interest as a co-founder of inSili.com GmbH, Zurich, and in his role as a scientific consultant to the pharmaceutical industry.
Supplementary Material
Acknowledgments
We thank Dr Andreas Tosstorff for providing raw model predictions for the 2D3D hybrid model from ref. 62. This work was financially supported by the Swiss National Science Foundation (grant no. 205321_182176). C. I. acknowledges support from the Scholarship Fund of the Swiss Chemical Industry.
Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d3ra08650j
Notes and references
- Steinbrecher T., in Free Energy Calculations in Drug Lead Optimization, John Wiley & Sons, Ltd, 2012, ch. 11, pp. 207–236 [Google Scholar]
- Volkov M. Turk J.-A. Drizard N. Martin N. Hoffmann B. Gaston-Mathé Y. Rognan D. J. Med. Chem. 2022;65:7946–7958. doi: 10.1021/acs.jmedchem.2c00487. [DOI] [PubMed] [Google Scholar]
- Li S., Zhou J., Xu T., Huang L., Wang F., Xiong H., Huang W., Dou D. and Xiong H., Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 2021, pp. 975–985 [Google Scholar]
- Moon S. Zhung W. Yang S. Lim J. Kim W. Y. Chem. Sci. 2022;13:3661–3673. doi: 10.1039/d1sc06946b. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Yang Z. Zhong W. Lv Q. Dong T. Yu-Chian Chen C. J. Phys. Chem. Lett. 2023;14:2020–2033. doi: 10.1021/acs.jpclett.2c03906. [DOI] [PubMed] [Google Scholar]
- Jiménez J. Doerr S. Martínez-Rosell G. Rose A. S. De Fabritiis G. Bioinformatics. 2017;33:3036–3042. doi: 10.1093/bioinformatics/btx350. [DOI] [PubMed] [Google Scholar]
- Möller L. Guerci L. Isert C. Atz K. Schneider G. Mol. Inf. 2022;41:2200059. doi: 10.1002/minf.202200059. [DOI] [PubMed] [Google Scholar]
- Corso G., Stärk H., Jing B., Barzilay R. and Jaakkola T., arXiv, 2022, preprint, arXiv:2210.01776, 10.48550/arXiv.2210.01776 [DOI]
- Ketata M. A., Laue C., Mammadov R., Stärk H., Wu M., Corso G., Marquet C., Barzilay R. and Jaakkola T. S., arXiv, 2023, preprint, arXiv:2304.03889, 10.48550/arXiv.2304.03889 [DOI]
- Atz K., Muñoz L. C., Isert C., Håkansson M., Focht D., Nippa D. F., Hilleke M., Iff M., Ledergerber J., Schiebroek C. C., Hiss J. A., Merk D., Schneider P., Kuhn B., Grether U. and Schneider G., ChemRxiv, 2023, preprint, 10.26434/chemrxiv-2023-cbq9k [DOI] [Google Scholar]
- Shen C. Ding J. Wang Z. Cao D. Ding X. Hou T. Wiley Interdiscip. Rev.: Comput. Mol. Sci. 2020;10:e1429. [Google Scholar]
- Singh R. Sledzieski S. Bryson B. Cowen L. Berger B. Proc. Natl. Acad. Sci. U. S. A. 2023;120:e2220778120. doi: 10.1073/pnas.2220778120. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sieg J. Flachsenberg F. Rarey M. J. Chem. Inf. Model. 2019;59:947–961. doi: 10.1021/acs.jcim.8b00712. [DOI] [PubMed] [Google Scholar]
- Chen L. Cruz A. Ramsey S. Dickson C. J. Duca J. S. Hornak V. Koes D. R. Kurtzman T. PLoS One. 2019;14:e0220113. doi: 10.1371/journal.pone.0220113. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Scantlebury J. Brown N. Von Delft F. Deane C. M. J. Chem. Inf. Model. 2020;60:3722–3730. doi: 10.1021/acs.jcim.0c00263. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Janela T. Bajorath J. Nat. Mach. Intell. 2022;4:1246–1255. [Google Scholar]
- Liu Z. Li Y. Han L. Li J. Liu J. Zhao Z. Nie W. Liu Y. Wang R. Bioinformatics. 2015;31:405–412. doi: 10.1093/bioinformatics/btu626. [DOI] [PubMed] [Google Scholar]
- Wang R. Fang X. Lu Y. Wang S. J. Med. Chem. 2004;47:2977–2980. doi: 10.1021/jm030580l. [DOI] [PubMed] [Google Scholar]
- Yang J. Shen C. Huang N. Front. Pharmacol. 2020;11:69. doi: 10.3389/fphar.2020.00069. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kanakala G. C. Aggarwal R. Nayar D. Priyakumar U. D. ACS Omega. 2023;8:2389–2397. doi: 10.1021/acsomega.2c06781. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Matta C. F. Arabi A. A. Future Med. Chem. 2011;3:969–994. doi: 10.4155/fmc.11.65. [DOI] [PubMed] [Google Scholar]
- Ahrens J., Geveci B. and Law C., ParaView: An End-User Tool for Large-Data Visualization, in The Visualization Handbook, ed. C. D. Hansen and C. R. Johnson, Elsevier, 2005 [Google Scholar]
- Somnath V. R. Bunne C. Krause A. Adv. Neural Inf. Process. 2021;34:25244–25255. [Google Scholar]
- Moesser M. A., Klein D., Boyles F., Deane C. M., Baxter A. and Morris G. M., bioRxiv, 2022, preprint, bioRxiv:2022.03.04.483012, 10.1101/2022.03.04.483012 [DOI]
- Isert C. Atz K. Schneider G. Curr. Opin. Struct. Biol. 2023;79:102548. doi: 10.1016/j.sbi.2023.102548. [DOI] [PubMed] [Google Scholar]
- Atz K. Grisoni F. Schneider G. Nat. Mach. Intell. 2021;3:1023–1032. [Google Scholar]
- Wang L., Chambers J. and Abel R., in Protein–Ligand Binding Free Energy Calculations with FEP+, ed. M. Bonomi and C. Camilloni, Springer New York, New York, NY, 2019, pp. 201–232 [DOI] [PubMed] [Google Scholar]
- Steinbrecher T. Abel R. Clark A. Friesner R. J. Mol. Biol. 2017;429:923–929. doi: 10.1016/j.jmb.2017.03.002. [DOI] [PubMed] [Google Scholar]
- Huang K. Luo S. Cong Y. Zhong S. Zhang J. Z. Duan L. Nanoscale. 2020;12:10737–10750. doi: 10.1039/c9nr10638c. [DOI] [PubMed] [Google Scholar]
- Kuhn B. Kollman P. A. J. Med. Chem. 2000;43:3786–3791. doi: 10.1021/jm000241h. [DOI] [PubMed] [Google Scholar]
- Kuhn B. Gerber P. Schulz-Gasch T. Stahl M. J. Med. Chem. 2005;48:4040–4048. doi: 10.1021/jm049081q. [DOI] [PubMed] [Google Scholar]
- Bader R. F. Acc. Chem. Res. 1985;18:9–15. [Google Scholar]
- Bader R. F., Atoms in Molecules: A Quantum Theory, Clarendon Press, 1990 [Google Scholar]
- Matta C. F. and Boyd R. J., in An Introduction to the Quantum Theory of Atoms in Molecules, ed. C. F. Matta and R. J. Boyd, Wiley, 2007, pp. 1–34 [Google Scholar]
- Becke A. D. Edgecombe K. E. J. Chem. Phys. 1990;92:5397–5403. [Google Scholar]
- Lu T. Chen F.-W. Acta Phys.-Chim. Sin. 2011;27:2786–2792. [Google Scholar]
- Schmider H. Becke A. J. Mol. Struct.: THEOCHEM. 2000;527:51–61. [Google Scholar]
- Johnson E. R. Keinan S. Mori-Sánchez P. Contreras-García J. Cohen A. J. Yang W. J. Am. Chem. Soc. 2010;132:6498–6506. doi: 10.1021/ja100936w. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lu T. Chen F. J. Comput. Chem. 2012;33:580–592. doi: 10.1002/jcc.22885. [DOI] [PubMed] [Google Scholar]
- Niepötter B. Herbst-Irmer R. Kratzert D. Samuel P. P. Mondal K. C. Roesky H. W. Jerabek P. Frenking G. Stalke D. Angew. Chem., Int. Ed. 2014;53:2766–2770. doi: 10.1002/anie.201308609. [DOI] [PubMed] [Google Scholar]
- Sukumar N. and Breneman C. M., in QTAIM in Drug Discovery and Protein Modeling, ed. C. F. Matta and R. J. Boyd, Wiley, 2007, pp. 473–498 [Google Scholar]
- Tosso R. D. Vettorazzi M. Andujar S. A. Gutierrez L. J. Garro J. C. Angelina E. Rodríguez R. Suvire F. D. Nogueras M. Cobo J. Enriz R. D. J. Mol. Struct. 2017;1134:464–474. [Google Scholar]
- Rojas S. Parravicini O. Vettorazzi M. Tosso R. Garro A. Gutiérrez L. Andújar S. Enriz R. Eur. J. Med. Chem. 2020;208:112792. doi: 10.1016/j.ejmech.2020.112792. [DOI] [PubMed] [Google Scholar]
- Tosso R. D. Andujar S. A. Gutierrez L. Angelina E. Rodriguez R. Nogueras M. Baldoni H. Suvire F. D. Cobo J. Enriz R. D. J. Chem. Inf. Model. 2013;53:2018–2032. doi: 10.1021/ci400178h. [DOI] [PubMed] [Google Scholar]
- Vettorazzi M. Angelina E. Lima S. Gonec T. Otevrel J. Marvanova P. Padrtova T. Mokry P. Bobal P. Acosta L. M. Palma A. Cobo J. Bobalova J. Csollei J. Malik I. Alvarez S. Spiegel S. Jampilek J. Enriz R. D. Eur. J. Med. Chem. 2017;139:461–481. doi: 10.1016/j.ejmech.2017.08.017. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Firme C. L. Monteiro N. K. Silva S. R. Comput. Theor. Chem. 2017;1111:40–49. [Google Scholar]
- Luchi A. M. Villafañe R. N. Gomez Chavez J. L. Bogado M. L. Angelina E. L. Peruchena N. M. ACS Omega. 2019;4:19582–19594. doi: 10.1021/acsomega.9b01934. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gutiérrez L. J. Parravicini O. Sánchez E. Rodríguez R. Cobo J. Enriz R. D. J. Biomol. Struct. Dyn. 2019;37:229–246. doi: 10.1080/07391102.2018.1424036. [DOI] [PubMed] [Google Scholar]
- von Lilienfeld O. A. Müller K.-R. Tkatchenko A. Nat. Rev. Chem. 2020;4:347–358. doi: 10.1038/s41570-020-0189-9. [DOI] [PubMed] [Google Scholar]
- Unke O. T. Chmiela S. Sauceda H. E. Gastegger M. Poltavsky I. Schütt K. T. Tkatchenko A. Müller K.-R. Chem. Rev. 2021;121:10142–10186. doi: 10.1021/acs.chemrev.0c01111. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lemm D. von Rudorff G. F. von Lilienfeld O. A. Nat. Commun. 2021;12:4468. doi: 10.1038/s41467-021-24525-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Rai B. K. Sresht V. Yang Q. Unwalla R. Tu M. Mathiowetz A. M. Bakken G. A. J. Chem. Inf. Model. 2022;62:785–800. doi: 10.1021/acs.jcim.1c01346. [DOI] [PubMed] [Google Scholar]
- Musaelian A. Batzner S. Johansson A. Sun L. Owen C. J. Kornbluth M. Kozinsky B. Nat. Commun. 2023;14:579. doi: 10.1038/s41467-023-36329-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Isert C. Kromann J. C. Stiefl N. Schneider G. Lewis R. A. ACS Omega. 2023;8:2046–2056. doi: 10.1021/acsomega.2c05607. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Nippa D. F. Atz K. Hohler R. Müller A. T. Marx A. Bartelmus C. Wuitschik G. Marzuoli I. Jost V. Wolfard J. Binder M. Stepan A. F. Konrad D. B. Grether U. Martin R. E. Schneider G. Nat. Chem. 2023 doi: 10.1038/s41557-023-01360-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Stuyver T. Coley C. W. J. Chem. Phys. 2022;156:084104. doi: 10.1063/5.0079574. [DOI] [PubMed] [Google Scholar]
- Isert C. Atz K. Jiménez-Luna J. Schneider G. Sci. Data. 2022;9:273. doi: 10.1038/s41597-022-01390-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Neeser R. M. Isert C. Stuyver T. Schneider G. Coley C. W. Chem. Data Collect. 2023;46:101040. [Google Scholar]
- Nippa D. F. Atz K. Müller A. T. Wolfard J. Isert C. Binder M. Scheidegger O. Konrad D. B. Grether U. Martin R. E. et al. . Commun. Chem. 2023;6:256. doi: 10.1038/s42004-023-01047-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Eldred D. V. Weikel C. L. Jurs P. C. Kaiser K. L. Chem. Res. Toxicol. 1999;12:670–678. doi: 10.1021/tx980273w. [DOI] [PubMed] [Google Scholar]
- Breneman C. M. Sundling C. M. Sukumar N. Shen L. Katt W. P. Embrechts M. J. J. Comput.-Aided Mol. Des. 2003;17:231–240. doi: 10.1023/a:1025334310107. [DOI] [PubMed] [Google Scholar]
- Tosstorff A. Rudolph M. G. Cole J. C. Reutlinger M. Kramer C. Schaffhauser H. Nilly A. Flohr A. Kuhn B. J. Comput.-Aided Mol. Des. 2022;36:753–765. doi: 10.1007/s10822-022-00478-x. [DOI] [PubMed] [Google Scholar]
- Chen Y.-q. Sheng Y.-j. Ma Y.-q. Ding H.-m. Phys. Chem. Chem. Phys. 2022;24:14339–14347. doi: 10.1039/d2cp00161f. [DOI] [PubMed] [Google Scholar]
- Read J. A. Wilkinson K. W. Tranter R. Sessions R. B. Brady R. L. J. Biol. Chem. 1999;274:10213–10218. [PubMed] [Google Scholar]
- Schrödinger, LLC, PyMOL. The PyMOL Molecular Graphics System, Version 2.5.2, Schrödinger, LLC [Google Scholar]
- Angelina E. L. Andujar S. A. Tosso R. D. Enriz R. D. Peruchena N. M. J. Phys. Org. Chem. 2014;27:128–134. [Google Scholar]
- Grimme S. Bannwarth C. Shushkov P. J. Chem. Theory Comput. 2017;13:1989–2009. doi: 10.1021/acs.jctc.7b00118. [DOI] [PubMed] [Google Scholar]
- Bannwarth C. Ehlert S. Grimme S. J. Chem. Theory Comput. 2019;15:1652–1671. doi: 10.1021/acs.jctc.8b01176. [DOI] [PubMed] [Google Scholar]
- Grimme S. J. Chem. Theory Comput. 2019;15:2847–2862. doi: 10.1021/acs.jctc.9b00143. [DOI] [PubMed] [Google Scholar]
- Bannwarth C. Caldeweyher E. Ehlert S. Hansen A. Pracht P. Seibert J. Spicher S. Grimme S. Wiley Interdiscip. Rev.: Comput. Mol. Sci. 2020:e01493. [Google Scholar]
- Sigalov G. Fenley A. Onufriev A. J. Chem. Phys. 2006;124:124902. doi: 10.1063/1.2177251. [DOI] [PubMed] [Google Scholar]
- Ehlert S. Stahn M. Spicher S. Grimme S. J. Chem. Theory Comput. 2021;17:4250–4261. doi: 10.1021/acs.jctc.1c00471. [DOI] [PubMed] [Google Scholar]
- Schmitz S. Seibert J. Ostermeir K. Hansen A. Göller A. H. Grimme S. J. Phys. Chem. B. 2020;124:3636–3646. doi: 10.1021/acs.jpcb.0c00549. [DOI] [PubMed] [Google Scholar]
- Gundelach L. Fox T. Tautermann C. S. Skylaris C.-K. Phys. Chem. Chem. Phys. 2021;23:9381–9393. doi: 10.1039/d1cp00206f. [DOI] [PubMed] [Google Scholar]
- Matta C. F. J. Comput. Chem. 2010;31:1297–1311. doi: 10.1002/jcc.21417. [DOI] [PubMed] [Google Scholar]
- Matta C. F. J. Comput. Chem. 2014;35:1165–1198. doi: 10.1002/jcc.23608. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chai J.-D. Head-Gordon M. Phys. Chem. Chem. Phys. 2008;10:6615–6620. doi: 10.1039/b810189b. [DOI] [PubMed] [Google Scholar]
- Weigend F. Ahlrichs R. Phys. Chem. Chem. Phys. 2005;7:3297–3305. doi: 10.1039/b508541a. [DOI] [PubMed] [Google Scholar]
- Grimme S. Antony J. Ehrlich S. Krieg H. J. Chem. Phys. 2010;132:154104. doi: 10.1063/1.3382344. [DOI] [PubMed] [Google Scholar]
- Hehre W. J. Ditchfield R. Pople J. A. J. Chem. Phys. 1972;56:2257–2261. [Google Scholar]
- Smith D. G. A. Burns L. A. Simmonett A. C. Parrish R. M. Schieber M. C. Galvelis R. Kraus P. Kruse H. Di Remigio R. Alenaizan A. James A. M. Lehtola S. Misiewicz J. P. Scheurer M. Shaw R. A. Schriber J. B. Xie Y. Glick Z. L. Sirianni D. A. O'Brien J. S. Waldrop J. M. Kumar A. Hohenstein E. G. Pritchard B. P. Brooks B. R. Schaefer III H. F. Sokolov A. Y. Patkowski K. DePrince III A. E. Bozkaya U. King R. A. Evangelista F. A. Turney J. M. Crawford T. D. Sherrill C. D. J. Chem. Phys. 2020;152:184108. doi: 10.1063/5.0006002. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Landrum G., RDKit: Open-source cheminformatics and machine learning, https://www.rdkit.org/docs/index.html, accessed 19.06.23
- Lim J. Ryu S. Park K. Choe Y. J. Ham J. Kim W. Y. J. Chem. Inf. Model. 2019;59:3981–3988. doi: 10.1021/acs.jcim.9b00387. [DOI] [PubMed] [Google Scholar]
- Torng W. Altman R. B. J. Chem. Inf. Model. 2019;59:4131–4149. doi: 10.1021/acs.jcim.9b00628. [DOI] [PubMed] [Google Scholar]
- Satorras V. G., Hoogeboom E. and Welling M., ICML, 2021, pp. 9323–9332 [Google Scholar]
- Atz K. Isert C. Böcker M. N. Jiménez-Luna J. Schneider G. Phys. Chem. Chem. Phys. 2022;24:10775–10783. doi: 10.1039/d2cp00834c. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Jensen J. H. PeerJ Prepr. 2017;5:e2693v1. [Google Scholar]
- Tosstorff A. Cole J. C. Taylor R. Harris S. F. Kuhn B. J. Chem. Inf. Model. 2020;60:6595–6611. doi: 10.1021/acs.jcim.0c00858. [DOI] [PubMed] [Google Scholar]
- Tosstorff A. Cole J. C. Bartelt R. Kuhn B. ChemMedChem. 2021;16:3428–3438. doi: 10.1002/cmdc.202100387. [DOI] [PubMed] [Google Scholar]
- Xiong Z. Wang D. Liu X. Zhong F. Wan X. Li X. Li Z. Luo X. Chen K. Jiang H. Zheng M. J. Med. Chem. 2019;63:8749–8760. doi: 10.1021/acs.jmedchem.9b00959. [DOI] [PubMed] [Google Scholar]
- EGNN-PyTorch, https://github.com/lucidrains/egnn-pytorch, accessed 19.06.23
- Elfwing S. Uchibe E. Doya K. Neural Networks. 2018;107:3–11. doi: 10.1016/j.neunet.2017.12.012. [DOI] [PubMed] [Google Scholar]
- Kingma D. P. and Ba J., arXiv, 2014, preprint, arXiv:1412.6980, 10.48550/arXiv.1412.6980 [DOI]
- Paszke A. Gross S. Massa F. Lerer A. Bradbury J. Chanan G. Killeen T. Lin Z. Gimelshein N. Antiga L. Desmaison A. Köpf A. Yang E. DeVito Z. Raison M. Tejani A. Chilamkurthy S. Steiner B. Fang L. Bai J. Chintala S. Adv. Neural Inf. Process. 2019;32:8026–8037. [Google Scholar]
- Fey M. and Lenssen J. E., ICLR Workshop on Representation Learning on Graphs and Manifolds, 2019 [Google Scholar]
- Li Y. Liu Z. Li J. Han L. Liu J. Zhao Z. Wang R. J. Chem. Inf. Model. 2014;54:1700–1716. doi: 10.1021/ci500080q. [DOI] [PubMed] [Google Scholar]
- Li Y. Han L. Liu Z. Wang R. J. Chem. Inf. Model. 2014;54:1717–1736. doi: 10.1021/ci500081m. [DOI] [PubMed] [Google Scholar]
- Su M. Yang Q. Du Y. Feng G. Liu Z. Li Y. Wang R. J. Chem. Inf. Model. 2018;59:895–913. doi: 10.1021/acs.jcim.8b00545. [DOI] [PubMed] [Google Scholar]
- Wang K. Zhou R. Tang J. Li M. Bioinformatics. 2023;39:btad340. [Google Scholar]
- Hourahine B. Aradi B. Blum V. Bonafé F. Buccheri A. Camacho C. Cevallos C. Deshaye M. Y. Dumitrică T. Dominguez A. Ehlert S. Elstner M. van der Heide T. Hermann J. Irle S. Kranz J. J. Köhler C. Kowalczyk T. Kubař T. Lee I. S. Lutsker V. Maurer R. J. Min S. K. Mitchell I. Negre C. Niehaus T. A. Niklasson A. M. N. Page A. J. Pecchia A. Penazzi G. Persson M. P. Řezáč J. Sánchez C. G. Sternberg M. Stöhr M. Stuckenberg F. Tkatchenko A. Yu V. W.-z. Frauenheim T. J. Chem. Phys. 2020;152:124101. doi: 10.1063/1.5143190. [DOI] [PubMed] [Google Scholar]
- Chandrasekaran A. Kamal D. Batra R. Kim C. Chen L. Ramprasad R. npj Comput. Mater. 2019;5:22. [Google Scholar]
- Grisafi A. Fabrizio A. Meyer B. Wilkins D. M. Corminboeuf C. Ceriotti M. ACS Cent. Sci. 2018;5:57–64. doi: 10.1021/acscentsci.8b00551. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cuevas-Zuviría B. Pacios L. F. J. Chem. Inf. Model. 2020;60:3831–3842. doi: 10.1021/acs.jcim.0c00197. [DOI] [PubMed] [Google Scholar]
- Qiao Z. Christensen A. S. Welborn M. Manby F. R. Anandkumar A. Miller III T. F. Proc. Natl. Acad. Sci. U. S. A. 2022;119:e2205221119. doi: 10.1073/pnas.2205221119. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Unke O. Bogojeski M. Gastegger M. Geiger M. Smidt T. Müller K.-R. Adv. Neural Inf. Process. 2021;34:14434–14447. [Google Scholar]
- Jørgensen P. B. Bhowmik A. npj Comput. Mater. 2022;8:183. [Google Scholar]
- Cuevas-Zuviría B. Pacios L. F. J. Chem. Inf. Model. 2021;61:2658–2666. doi: 10.1021/acs.jcim.1c00227. [DOI] [PubMed] [Google Scholar]
- Rackers J. A., Tecot L., Geiger M. and Smidt T. E., arXiv, 2022, preprint, arXiv:2201.03726, 10.48550/arXiv.2201.03726 [DOI]
- Lee A. J. Rackers J. A. Bricker W. P. Biophys. J. 2022;121:3883–3895. doi: 10.1016/j.bpj.2022.08.045. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Keith T. A., AIMAll (Version 19.10.12), TK Gristmill Software, Overland Park KS, USA, 2019, https://aim.tkgristmill.com/, accessed 29.06.23 [Google Scholar]
- Cheeseman J. and Keith T. A. and Bader R. F. W., AIMPAC Program Package, McMaster University, Hamilton, Ontario, 1992, https://www.chemistry.mcmaster.ca/aimpac/imagemap/imagemap.htm, accessed 29.06.23 [Google Scholar]
- Gu S. Smith M. S. Yang Y. Irwin J. J. Shoichet B. K. J. Chem. Inf. Model. 2021;61:4331–4341. doi: 10.1021/acs.jcim.1c00368. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Jain A. N. Brueckner A. C. Cleves A. E. Reibarkh M. Sherer E. C. J. Med. Chem. 2023;66:1955–1971. doi: 10.1021/acs.jmedchem.2c01744. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ballester P. J. Schreyer A. Blundell T. L. J. Chem. Inf. Model. 2014;54:944–955. doi: 10.1021/ci500091r. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Schneider G. Nat. Rev. Drug Discovery. 2010;9:273–276. doi: 10.1038/nrd3139. [DOI] [PubMed] [Google Scholar]
- Chodera J. D. Mobley D. L. Annu. Rev. Biophys. 2013;42:121–142. doi: 10.1146/annurev-biophys-083012-130318. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wu S. Zhang W. Li W. Huang W. Kong Q. Chen Z. Wei W. Yan S. Biochemistry. 2022;61:433–445. doi: 10.1021/acs.biochem.1c00771. [DOI] [PubMed] [Google Scholar]
- Siebenmorgen T., Menezes F., Benassou S., Merdivan E., Kesselheim S., Piraud M., Theis F. J., Sattler M. and Popowicz G. M., bioRxiv, 2023, preprint, 10.1101/2023.05.24.542082 [DOI] [Google Scholar]
- Pecina A., Fanfrlík J., Lepšík M. and Řezáč J., ChemRxiv, 2023, preprint, 10.26434/chemrxiv-2023-zh03k [DOI] [Google Scholar]
- Morgan H. L. J. Chem. Doc. 1965;5:107–113. [Google Scholar]