Bioinformatics Advances. 2023 Feb 3;3(1):vbad011. doi: 10.1093/bioadv/vbad011

G-RANK: an equivariant graph neural network for the scoring of protein–protein docking models

Ha Young Kim 1, Sungsik Kim 2, Woong-Yang Park 3,4,5, Dongsup Kim 6
Editor: Michael Gromiha
PMCID: PMC9927558  PMID: 36818727

Abstract

Motivation

Protein complex structure prediction is important for many applications in bioengineering. A widely used method for predicting the structure of protein complexes is computational docking. Although many tools for scoring protein–protein docking models have been developed, it is still a challenge to accurately identify near-native models for unknown protein complexes. A recently proposed model called the geometric vector perceptron–graph neural network (GVP-GNN), a subtype of equivariant graph neural networks, has demonstrated success in various 3D molecular structure modeling tasks.

Results

Herein, we present G-RANK, a GVP-GNN-based method for the scoring of protein–protein docking models. When evaluated on two different test datasets, G-RANK achieved performance competitive with or better than state-of-the-art scoring functions. We expect G-RANK to be a useful tool for various applications in biological engineering.

Availability and implementation

Source code is available at https://github.com/ha01994/grank.

Contact

kds@kaist.ac.kr

Supplementary information

Supplementary data are available at Bioinformatics Advances online.

1 Introduction

Protein–protein interactions play key roles in many biological processes. Knowledge of the 3D structures of protein complexes is key to understanding protein–protein interactions and can be useful in various applications such as protein design or drug discovery. As experimental approaches pose limitations in terms of time and cost, computational docking is commonly used to predict the structures of protein complexes. Computational docking involves generating a large number of candidate structural models, from which final models are selected according to a scoring function. The scoring function is critical for successfully identifying near-native docking models among all generated candidates.

Various scoring functions have been developed based on physics-based, statistical potential-based, or machine learning-based methods (Geng et al., 2020). Examples of physics-based methods include HADDOCK (Dominguez et al., 2003), ZDock (Pierce et al., 2014) and pyDock (Cheng et al., 2007). Examples of methods based on knowledge-based statistical potentials include GOAP (Zhou and Skolnick, 2011) and ITScore (Huang and Zou, 2008). Finally, machine learning-based methods include iScore (Geng et al., 2020), DeepRank (Renaud et al., 2021), DOVE (Wang et al., 2020), GNN-DOVE (Wang et al., 2021), PAUL (Eismann et al., 2021) and DeepRank-GNN (Réau et al., 2023). Currently, there is no comprehensive benchmarking study on the performances of the more recently developed machine learning methods. However, these machine learning-based methods show high predictive performances in their own assessment studies, often outperforming the traditional scoring functions.

The models introduced above have achieved considerable success in scoring protein–protein docking models, but there is still room for improvement (Réau et al., 2023). Recently, a new type of neural network called the equivariant graph neural network (EGNN) has been proposed (Satorras et al., 2021). If a function is equivariant to some transformation, then a transformation of the input results in an equivalent transformation of the function’s output (Townshend et al., 2021). EGNNs are E(3)-equivariant; that is, they are equivariant to translation, rotation and reflection in 3D Euclidean space (Satorras et al., 2021). Equivariance is a desirable property because neural networks must recognize proteins regardless of their position and orientation. A subtype of EGNN, the geometric vector perceptron–graph neural network (GVP-GNN), has recently shown success in various 3D molecular structure modeling tasks (Jing et al., 2020, 2021).

Herein, we propose G-RANK, a new GVP-GNN-based method, for ranking protein–protein docking models. We compare our model with DeepRank-GNN (Réau et al., 2023), GNN-DOVE (Wang et al., 2021), iScore (Geng et al., 2020), HADDOCK (Van Zundert et al., 2016) and GOAP (Zhou and Skolnick, 2011). When tested on two different test datasets, our model’s performance is competitive with or better than the compared scoring functions. We expect this model to be a useful tool for predicting the structures of protein–protein complexes.

2 Materials and methods

2.1 Model architecture

We used the GVP-GNN model architecture from Jing et al. (2021). GVP-GNNs differ from typical EGNNs in that the node and edge embeddings are tuples of scalar and vector features. Multi-layer perceptrons in standard EGNNs are replaced with GVPs in GVP-GNNs. A GVP is a module with a scalar channel and a vector channel that can process scalar and vector features, respectively. This module involves information propagation from a scalar channel to a vector channel and vice versa. Details on the formulation of GVP can be found in the original article (Jing et al., 2021).

The architecture of the model is shown in Figure 1. Nodes are defined as atoms, and edges are defined for all pairs of atoms within 4.5 Å. The initial node embedding is the atom type in one-hot encoding. The initial edge embedding consists of the Gaussian radial basis function encoding of the edge length and the unit vector in the edge direction. These edge embeddings are computed from the atom coordinates. After the node and edge embeddings are each passed through a GVP layer, they are provided as input to a GVPConvLayer block which is repeated five times. This block consists of two layers: a graph propagation layer and a feed-forward layer.
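As a rough illustration of this graph construction (not the authors’ implementation; the function and parameter names below are ours), the node and edge featurization can be sketched in Python as follows:

```python
import numpy as np

def build_graph(coords, atom_types, cutoff=4.5, n_rbf=16):
    """Sketch of atom-graph construction (illustrative, not the G-RANK code).

    coords: (N, 3) array of atom coordinates in Angstroms.
    atom_types: list of element symbols, e.g. ["C", "N", "O"].
    """
    elements = ["C", "N", "O", "S", "H"]  # the five atom types used as inputs
    # One-hot initial node embeddings
    nodes = np.zeros((len(atom_types), len(elements)))
    for i, a in enumerate(atom_types):
        nodes[i, elements.index(a)] = 1.0

    # Edges: all atom pairs within the distance cutoff (self-edges excluded)
    diff = coords[:, None, :] - coords[None, :, :]      # (N, N, 3)
    dist = np.linalg.norm(diff, axis=-1)                # (N, N)
    src, dst = np.where((dist < cutoff) & (dist > 0))

    # Edge features: Gaussian RBF encoding of edge length + unit direction vector
    centers = np.linspace(0.0, cutoff, n_rbf)
    d = dist[src, dst]
    rbf = np.exp(-((d[:, None] - centers[None, :]) ** 2) / 0.5)
    unit = diff[src, dst] / d[:, None]
    return nodes, (src, dst), rbf, unit
```

The RBF width and number of basis functions here are assumptions; the paper specifies only that edge length is RBF-encoded and the edge direction is given as a unit vector.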

Fig. 1. Architecture of GVP-GNN developed by Jing et al. (2021). The GVPConvLayer block is shown in blue. Node embedding is indicated by h_V, and edge embedding is indicated by h_E.

The graph propagation layer first computes messages from the node and edge embeddings and then uses these messages to update the node embeddings. This layer can be formulated as:

$$m_{j \to i} = \phi\left(\mathrm{concat}\left(h_V^{(j)},\, h_E^{(j \to i)}\right)\right) \quad \text{for } j \in N(i),$$

$$h_V^{(i)} \leftarrow \mathrm{LayerNorm}\left(h_V^{(i)} + \frac{1}{k}\,\mathrm{Dropout}\left(\sum_{j \in N(i)} m_{j \to i}\right)\right), \tag{1}$$

where $m_{j \to i}$ is the message from node $j$ to node $i$, $\phi$ is a sequence of three GVP layers, $h_V^{(j)}$ is the embedding of node $j$, $h_E^{(j \to i)}$ is the embedding of edge $j \to i$, $N(i)$ is the set of neighboring nodes of node $i$, and $k$ is the size of $N(i)$.

The feed-forward layer then updates node embeddings in a point-wise manner. This layer can be formulated as:

$$h_V^{(i)} \leftarrow \mathrm{LayerNorm}\left(h_V^{(i)} + \mathrm{Dropout}\left(\phi\left(h_V^{(i)}\right)\right)\right), \tag{2}$$

where $\phi$ is a sequence of two GVP layers. The updated node embeddings serve as the input node embeddings for the next GVPConvLayer block.

After five repetitions of the GVPConvLayer block, the final node embeddings are passed through a GVP layer, after which all node embeddings are reduced to scalars. This is followed by a mean pooling layer that averages the embeddings across all nodes. The embeddings finally pass through two dense layers, thus resulting in a single scalar value.
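Ignoring the vector channel and the internals of the GVP modules, the scalar part of this update scheme can be sketched as follows (an illustrative numpy stand-in, not the authors’ implementation; `phi_msg` and `phi_ff` are placeholders for the GVP stacks, and dropout is omitted):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each node embedding across its feature dimension
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def gvp_conv_scalar(h_v, h_e, edges, phi_msg, phi_ff):
    """One GVPConvLayer pass on scalar features only (illustrative sketch).

    h_v: (N, D) node embeddings; h_e: (E, D_e) edge embeddings;
    edges: (src, dst) index arrays; phi_msg / phi_ff: stand-ins for GVP stacks.
    Dropout from Eqs. (1)-(2) is omitted for clarity.
    """
    src, dst = edges
    # Message: phi(concat(h_j, h_{j->i})) for every edge j -> i
    msg = phi_msg(np.concatenate([h_v[src], h_e], axis=-1))
    # Aggregate: mean of incoming messages per node (the 1/k factor in Eq. 1)
    agg = np.zeros_like(h_v)
    count = np.zeros(len(h_v))
    np.add.at(agg, dst, msg)
    np.add.at(count, dst, 1)
    agg = agg / np.maximum(count, 1)[:, None]
    # Residual update with LayerNorm (Eq. 1), then point-wise feed-forward (Eq. 2)
    h_v = layer_norm(h_v + agg)
    h_v = layer_norm(h_v + phi_ff(h_v))
    return h_v
```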

2.2 Datasets

2.2.1 BM5 cross-validation dataset

The authors of DeepRank (Renaud et al., 2021) conducted docking by using HADDOCK version 2.2 (Dominguez et al., 2003; Van Zundert et al., 2016) on the docking benchmark dataset version 5 (BM5) (Vreven et al., 2015). The authors of DeepRank obtained 232 non-redundant complexes from the BM5 dataset, from which they excluded antibody–antigen complexes and complexes involving more than two chains, which resulted in 142 complexes. The dataset was made available by the authors of DeepRank (available at https://data.sbgrid.org/dataset/843/), and the details on the dataset generation can be found in Supplementary Note S1. The dataset was used by the authors of DeepRank-GNN (Réau et al., 2023) for 10-fold cross-validation. The cross-validation is performed in the following manner: 15 complexes are left out as the test set. For each fold, 102 and 25 complexes are assigned to the training and validation sets, respectively. We use the same cross-validation dataset as DeepRank-GNN to ensure a fair comparison. Details on the size and composition of the 10-fold cross-validation dataset can be found in Supplementary Table S2.

2.2.2 Independent test dataset

We assessed our model on an independent test dataset called the CAPRI score set (Lensink and Wodak, 2014), which was also made available by the authors of DeepRank (available at https://data.sbgrid.org/dataset/843/). This dataset consists of more than 16 000 docking models for 13 protein–protein complexes, which were generated using a variety of docking methods by more than 40 research groups. We left out one complex containing no positive samples, resulting in 12 complexes. Details on the size and composition of this dataset can be found in Supplementary Table S3.

2.3 Model inputs and training

As in DeepRank-GNN (Réau et al., 2023), we constructed the graph only on the interface of the protein–protein complex, rather than the full complex. The interface was determined with pdb2sql (Renaud and Geng, 2020) at a cutoff of 8.5 Å. Some models for which no contact atoms were found were excluded. Five atoms, C, N, O, S and H, were included in the model inputs. The ATOM3D package (Townshend et al., 2020) was used to convert the PDB files into the LMDB data format required for model inputs.
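As a simplified stand-in for the pdb2sql-based interface extraction (the actual contact definition may differ in detail), the 8.5 Å interface selection can be sketched as:

```python
import numpy as np

def interface_atoms(coords_a, coords_b, cutoff=8.5):
    """Return indices of atoms in each chain that belong to the interface.

    An atom is kept if it lies within `cutoff` Angstroms of any atom of the
    partner chain (simplified version of the contact definition in the text).
    """
    dist = np.linalg.norm(coords_a[:, None, :] - coords_b[None, :, :], axis=-1)
    mask = dist < cutoff
    idx_a = np.where(mask.any(axis=1))[0]  # chain-A atoms with a contact
    idx_b = np.where(mask.any(axis=0))[0]  # chain-B atoms with a contact
    return idx_a, idx_b
```

Docking models where both index arrays come back empty would be the “no contact atoms found” cases excluded above.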

We trained G-RANK to predict the f-nat (fraction of native contacts) (Méndez et al., 2003) values of docking models. f-nat ranges from 0 to 1, with higher f-nat values indicating higher model quality. The cutoff for discriminating between near-native and wrong models was defined as 0.3, as done in DeepRank-GNN (Réau et al., 2023). The mean squared error was used as the loss function, and Adam optimizer was used for optimization. The models were trained by using a batch size of 64 and a learning rate of 0.0001. We trained the models for 50 epochs, saved the model in each epoch and retained the model with the lowest validation loss for final evaluation on the test set.
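f-nat itself is straightforward to compute once the residue contact pairs of the docking model and of the native structure are available; a minimal sketch (contact extraction is assumed to happen elsewhere):

```python
def fnat(model_contacts, native_contacts):
    """Fraction of native residue-residue contacts reproduced by a docking model.

    Both arguments are sets of residue-pair identifiers,
    e.g. {("A:12", "B:34"), ...}. Returns a value in [0, 1].
    """
    if not native_contacts:
        return 0.0
    return len(set(model_contacts) & set(native_contacts)) / len(native_contacts)
```

Under the cutoff used in the paper, a model with `fnat(...) >= 0.3` would be labeled near-native and anything below it wrong.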

2.4 Assessment metrics

We used the area under the receiver operating characteristic curve (ROC-AUC) and the precision-recall area under the curve (PR-AUC) to assess model performance. We also used a per-complex assessment metric called the hit rate (Renaud et al., 2021), which is defined as follows:

$$\text{hit rate}(k) = \frac{N_{\mathrm{hits}}(k)}{N_{\mathrm{pos}}}, \tag{3}$$

where $N_{\mathrm{hits}}(k)$ is the number of hits (positive models) among the top $k$ ranked models returned by the predictor, and $N_{\mathrm{pos}}$ is the total number of positive models for the complex. We also assess the success rate, defined as the number of docking cases with at least one near-native model among the top $k$ predicted models, divided by the total number of cases.
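Both metrics can be computed directly from predicted scores and binary labels; an illustrative sketch (assuming higher scores indicate better models, as with predicted f-nat):

```python
import numpy as np

def hit_rate(scores, labels, k):
    """Fraction of all positive models that appear among the top-k predictions."""
    order = np.argsort(scores)[::-1]  # rank models by descending predicted score
    n_pos = labels.sum()
    return labels[order[:k]].sum() / n_pos if n_pos else 0.0

def success_rate(per_case_scores, per_case_labels, k):
    """Fraction of docking cases with >= 1 near-native model in the top k."""
    hits = [labels[np.argsort(scores)[::-1][:k]].any()
            for scores, labels in zip(per_case_scores, per_case_labels)]
    return float(np.mean(hits))
```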

3 Results

3.1 Results on the BM5 test set

We trained G-RANK on the BM5 10-fold cross-validation dataset (training and validation loss curves shown in Supplementary Fig. S4). Afterwards, we used an ensemble method to compare the performance of the model with that of DeepRank-GNN by averaging the predictions of all models from the 10 folds for final evaluation on the left-out test set. DeepRank-GNN predictions were also obtained using the same ensemble method. The left-out test set consisted of 15 complexes with 22 099 positive models and 342 096 negative models. The boxplots in Figure 2 show the distributions of per-complex ROC-AUC values and per-complex PR-AUC values. Both G-RANK and DeepRank-GNN tended to achieve high ROC-AUC values. In terms of PR-AUC, G-RANK achieved a lower 75% quantile than DeepRank-GNN (0.864 < 0.918) but a markedly higher median value (0.785 > 0.581). Figure 3 shows the per-complex hit rates for the top-k ranked models, for values of k ranging from 0 to 1000. In terms of the median hit rate, G-RANK performs similarly to DeepRank-GNN, but it outperforms DeepRank-GNN in terms of the 75% quantile.
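The ensemble described here is a simple average of the per-model scores predicted by the 10 fold models; a trivial sketch (the array layout is an assumption on our part):

```python
import numpy as np

def ensemble_predict(fold_predictions):
    """Average predicted scores across cross-validation fold models.

    fold_predictions: (n_folds, n_docking_models) array of predicted scores,
    one row per fold model. Returns one ensembled score per docking model.
    """
    return np.asarray(fold_predictions).mean(axis=0)
```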

Fig. 2. Distributions of per-complex ROC-AUC and PR-AUC on the BM5 test set.

Fig. 3. Distributions of per-complex hit rates on the BM5 test set, where k ranges from 0 to 1000. The thick lines denote the median hit rates, and the shaded areas denote the 25–75% quantile interval of the hit rates.

3.2 Results on the independent test dataset

We used the ensemble method described above to evaluate our model on the CAPRI score set, which consists of 1970 positive models and 14 113 negative models. We compared our model with DeepRank-GNN, GOAP, GNN-DOVE, HADDOCK and iScore. Figure 4 shows the distributions of per-complex ROC-AUC and PR-AUC values. In terms of both metrics, G-RANK achieved higher score distributions than the other methods. Figure 5 shows the median of the per-complex hit rates for the top-k ranked models, for values of k ranging from 0 to 500, and Figure 6 shows the distributions of per-complex hit rates for k = 1, 10, 25, 50, 100. Both figures show that the median hit rates of G-RANK are overall higher than those of the other methods. Table 1 shows the success rates for k = 1, 10, 25, 50, 100. G-RANK demonstrates the highest success rate among all predictors for k = 1, 10, 25 and 50.

Fig. 4. Distributions of per-complex ROC-AUC and PR-AUC on the CAPRI score set.

Fig. 5. Median of the per-complex hit rates on the CAPRI score set, where k ranges from 0 to 500.

Fig. 6. Distributions of per-complex hit rates on the CAPRI score set (k = 1, 10, 25, 50, 100).

Table 1. Assessment of success rates on the CAPRI score set (k = 1, 10, 25, 50, 100)

            G-RANK   DeepRank-GNN   GOAP   GNN-DOVE   HADDOCK   iScore
Top 1        0.25        0.17       0.00     0.08       0.00      0.00
Top 10       0.50        0.42       0.42     0.50       0.17      0.17
Top 25       0.58        0.58       0.58     0.58       0.42      0.25
Top 50       0.75        0.75       0.75     0.58       0.50      0.50
Top 100      0.83        0.92       0.83     0.67       0.58      0.58

Note: Success rate refers to the fraction of cases in which a correct model is found within the top k ranked models.

Additionally, we assessed the models by labeling correct and incorrect models using the CAPRI scoring criteria (Lensink et al., 2018) instead of f-nat values. CAPRI classes are computed from iRMSD (interface RMSD), lRMSD (ligand RMSD) and f-nat values, and each docking model is classified as ‘high’, ‘medium’, ‘acceptable’ or ‘incorrect’ (Supplementary Note S5). In this assessment, we consider a model correct if it is at least of ‘acceptable’ quality. The per-complex ROC-AUC, PR-AUC, hit rates and success rates are shown in Supplementary Figures S6 and S7 and Supplementary Table S8. In terms of ROC-AUC and PR-AUC, G-RANK again achieved the highest performance. In terms of hit rates, G-RANK shows slightly lower performance than some other models when k is small, but generally good predictive performance. In terms of success rates, its performance is broadly similar to that of the other methods.

4 Discussion

In this study, we developed G-RANK, a new method based on GVP-GNN for assessing the quality of protein–protein docking models. When tested on two different test sets, G-RANK’s performance was competitive with or better than that of existing state-of-the-art methods. Analysis of hit rates and success rates demonstrated that G-RANK performs well in identifying near-native models among a large pool of candidate models. Our results confirm the previous finding (Jing et al., 2021) that GVP-GNNs perform well across prediction tasks involving 3D molecular structures. The equivariance of the network and the representation of 3D structures as both scalars and vectors are important factors behind G-RANK’s strong performance.

As shown by the generally low PR-AUC of the predictors on the CAPRI score set, the identification of near-native models among a large pool of candidate models remains a difficult problem. Development of more advanced equivariant neural network architectures may help overcome this problem. In addition, the continual accumulation of experimentally determined protein complex structures may enable better prediction in the future. Also, AlphaFold-Multimer (Evans et al., 2022), a recently developed fold-and-dock approach, has shown significant success in the prediction of protein complex structures. We expect that the scoring function developed in this work can complement such models and help advance our knowledge of protein complex structures.

The GVP-GNN model used in this study could be further modified and applied to other research objectives related to protein structure and function, such as ranking of protein–ligand docking models. We expect that G-RANK may contribute to various applications in biological engineering, such as protein design and drug discovery.

Supplementary Material

vbad011_Supplementary_Data

Contributor Information

Ha Young Kim, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, South Korea.

Sungsik Kim, GENINUS Inc., Seoul 05836, South Korea.

Woong-Yang Park, GENINUS Inc., Seoul 05836, South Korea; Samsung Genome Institute, Samsung Medical Center, Seoul 06351, South Korea; Department of Molecular Cell Biology, Sungkyunkwan University School of Medicine, Suwon 16419, South Korea.

Dongsup Kim, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, South Korea.

Author contributions

Ha Young Kim (Investigation [lead], Methodology [lead], Software [lead], Writing – original draft [lead]), Sungsik Kim (Software [supporting]), Woong-Yang Park (Software [supporting]), and Dongsup Kim (Conceptualization [lead], Funding acquisition [lead], Supervision [lead], Writing – review & editing [lead])

Funding

This work was supported by the National Research Foundation of Korea (NRF) [NRF-2022R1A2C1006609 and NRF-2021M3H9A2097443] funded by the Korean Government (MSIT).

Conflict of Interest: S.K. and W.-Y.P. report personal fees and other support from Geninus Inc.

Data availability

Data are made available by the authors of DeepRank and can be accessed at https://data.sbgrid.org/dataset/843/.

References

  1. Cheng T.M.K. et al. (2007) pyDock: electrostatics and desolvation for effective scoring of rigid‐body protein–protein docking. Proteins, 68, 503–515. [DOI] [PubMed] [Google Scholar]
  2. Dominguez C. et al. (2003) HADDOCK: a protein–protein docking approach based on biochemical or biophysical information. J. Am. Chem. Soc., 125, 1731–1737. [DOI] [PubMed] [Google Scholar]
  3. Eismann S. et al. (2021) Hierarchical, rotation‐equivariant neural networks to select structural models of protein complexes. Proteins, 89, 493–501. [DOI] [PubMed] [Google Scholar]
  4. Evans R. et al. (2022) Protein complex prediction with AlphaFold-Multimer. BioRxiv 10.1101/2021.10.04.463034. [DOI]
  5. Geng C. et al. (2020) iScore: a novel graph kernel-based function for scoring protein–protein docking models. Bioinformatics, 36, 112–121. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Huang S.Y., Zou X. (2008) An iterative knowledge‐based scoring function for protein–protein recognition. Proteins, 72, 557–579. [DOI] [PubMed] [Google Scholar]
  7. Jing B. et al. (2020) Learning from protein structure with geometric vector perceptrons. arXiv, arXiv:2009.01411, preprint: not peer reviewed. 10.48550/arXiv.2009.01411. [DOI]
  8. Jing B. et al. (2021) Equivariant graph neural networks for 3D macromolecular structure. arXiv, arXiv:2106.03843, preprint: not peer reviewed. 10.48550/arXiv.2106.03843. [DOI]
  9. Lensink M.F. et al. (2018) The challenge of modeling protein assemblies: the CASP12‐CAPRI experiment. Proteins, 86, 257–273. [DOI] [PubMed] [Google Scholar]
  10. Lensink M.F., Wodak S.J. (2014) Score_set: a CAPRI benchmark for scoring protein complexes. Proteins, 82, 3163–3169. [DOI] [PubMed] [Google Scholar]
  11. Méndez R. et al. (2003) Assessment of blind predictions of protein–protein interactions: current status of docking methods. Proteins, 52, 51–67. [DOI] [PubMed] [Google Scholar]
  12. Pierce B.G. et al. (2014) ZDOCK server: interactive docking prediction of protein–protein complexes and symmetric multimers. Bioinformatics, 30, 1771–1773. [DOI] [PMC free article] [PubMed] [Google Scholar]
  13. Réau M. et al. (2023) DeepRank-GNN: a graph neural network framework to learn patterns in protein-protein interfaces. Bioinformatics, 39, btac759. [DOI] [PMC free article] [PubMed] [Google Scholar]
  14. Renaud N., Geng C. (2020) The pdb2sql python package: parsing, manipulation and analysis of PDB files using SQL queries. J. Open Source Softw., 5, 2077. [Google Scholar]
  15. Renaud N. et al. (2021) DeepRank: a deep learning framework for data mining 3D protein-protein interfaces. Nat. Commun., 12, 1–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. Satorras V.G. et al. (2021) E(n) equivariant graph neural networks. In: International Conference on Machine Learning, PMLR, pp. 9323–9332.
  17. Townshend R.J. et al. (2021) Geometric deep learning of RNA structure. Science, 373, 1047–1051. [DOI] [PMC free article] [PubMed] [Google Scholar]
  18. Townshend R.J. et al. (2020) ATOM3D: tasks on molecules in three dimensions. arXiv, arXiv:2012.04035, preprint: not peer reviewed. 10.48550/arXiv.2012.04035. [DOI]
  19. Van Zundert G. et al. (2016) The HADDOCK2.2 web server: user-friendly integrative modeling of biomolecular complexes. J. Mol. Biol., 428, 720–725. [DOI] [PubMed] [Google Scholar]
  20. Vreven T. et al. (2015) Updates to the integrated protein–protein interaction benchmarks: docking benchmark version 5 and affinity benchmark version 2. J. Mol. Biol., 427, 3031–3041. [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Wang X. et al. (2021) Protein docking model evaluation by graph neural networks. Front. Mol. Biosci., 402. [DOI] [PMC free article] [PubMed] [Google Scholar]
  22. Wang X. et al. (2020) Protein docking model evaluation by 3D deep convolutional neural networks. Bioinformatics, 36, 2113–2118. [DOI] [PMC free article] [PubMed] [Google Scholar]
  23. Zhou H., Skolnick J. (2011) GOAP: a generalized orientation-dependent, all-atom statistical potential for protein structure prediction. Biophys. J., 101, 2043–2052. [DOI] [PMC free article] [PubMed] [Google Scholar]


