Assessing the functional impact of protein binding site definition

Prithviraj Nandigrami; Andras Fiser

doi:10.1002/pro.5026

2024 May 16;33(6):e5026. doi: 10.1002/pro.5026

PMCID: PMC11099757 PMID: 38757384

Abstract

Many biomedical applications, such as classification of binding specificities or bioengineering, depend on the accurate definition of protein binding interfaces. Depending on the choice of method used, substantially different sets of residues can be classified as belonging to the interface of a protein. A typical approach used to verify these definitions is to mutate residues and measure the impact of these changes on binding. Besides the lack of exhaustive data, this approach also suffers from the fundamental problem that a mutation introduces an unknown amount of alteration into an interface, which potentially alters the binding characteristics of the interface. In this study we explore the impact of alternative binding site definitions on the ability of a protein to recognize its cognate ligand using a pharmacophore approach, which does not affect the interface. The study also shows that methods for protein binding interface predictions should perform above approximately F‐score = 0.7 accuracy level to capture the biological function of a protein.

Keywords: pharmacophore modeling, protein interface, protein–protein interaction, PROTLID

1. INTRODUCTION

Protein–protein interactions are key players in many biological processes including metabolism, development, and regulation. Residue level descriptions of protein binding interfaces are essential for explaining, classifying, and modulating the formation of specific protein complexes. The biological function of a protein binding interface critically relies on its specificity, the ability to selectively recognize its cognate binding partners. However, the question of what exactly comprises a biological interface does not have a clear answer.

Recent advances in deep learning have delivered impressive results in modeling protein structures (Jumper et al., 2021), but have yet to achieve a similar breakthrough in modeling protein complexes. It was noted at the latest CAPRI meeting that docking performance remained very limited for complexes for which evolutionary relationships to known interfaces were lacking (Lensink et al., 2023; Wodak et al., 2023). This could be explained by the fact that the modularity and limited possible combinations of protein motifs, such as a finite number of super secondary structures (Fernandez‐Fuentes et al., 2010), can sample all possible protein folds (Skolnick et al., 2021) and therefore make the structure modeling problem suitable for deep learning approaches. Meanwhile, rules or an exhaustive repertoire of protein complexes or binding interfaces are not yet available (Bryant et al., 2022), although it is expected that the number of possible interfaces might also be limited (Gao & Skolnick, 2010). Predicting possible binding sites in proteins can greatly reduce the complexity and increase the success rate of docking approaches and can help to catalog interfaces. But identifying these sites with confidence remains difficult, and predicting protein interfaces remains an active field of study. Several studies reported the possibility of certain universal features, such as composition of residues, sequence entropy, and secondary structure (Laine & Carbone, 2015; Nadalin & Carbone, 2018; Raucci et al., 2018; Sacquin‐Mora et al., 2008; Seo & Kim, 2018; Seoane & Carbone, 2021; Xue et al., 2015), while others found no significant differences compared with non‐interfaces (Hamp & Rost, 2012; Mukherjee & Chakrabarti, 2021; Tonddast‐Navaei & Skolnick, 2015).
It is also difficult to reconcile the idea of universal features of binding sites with the fact that binding sites have continually emerged and dissolved in the same protein family over evolution, and that binding sites can be successfully engineered into proteins, sometimes with as little change as a single residue mutation (Garcia‐Seisdedos et al., 2017; Grueninger et al., 2008; Pillai et al., 2022). A fundamental input to all these classification, prediction, and docking methods is the correct definition of protein interfaces. A major conceptual issue that we sought to address here is that current approaches exclusively try to define protein interfaces by physical definitions, without testing whether an interface definition is biologically necessary and sufficiently well defined.

Many physical interface prediction approaches exist. Some of these employ a radial distance cutoff‐based approach (Bordner & Abagyan, 2005; Ofran & Rost, 2003), which can be combined with a requirement of compatibility of interactions (Sobolev et al., 1999), while others use Voronoi‐polyhedra based calculations, such as INTERCAAT (Grudman et al., 2022), QCONS (Fischer et al., 2006) and others (Cazals et al., 2006; McConkey et al., 2002). Another group of methods focuses on monitoring changes in the accessible surface area upon complex formation, such as NACCESS (Bahadur et al., 2003, 2004) or POPSCOMB (Cavallo et al., 2003), which are all based on the original method of Lee and Richards (1971). However, there is a lack of consensus among these methodologies, which employ various subjective and often system‐specific physical measures, regarding how to consistently define the biologically relevant protein–protein interaction interface (Aumentado‐Armstrong et al., 2015; Burgoyne & Jackson, 2006; de Vries & Bonvin, 2008; Esmaielbeiki et al., 2015; Ezkurdia et al., 2009; Gil & Fiser, 2019; Keskin et al., 2004; Mabonga & Kappo, 2019; Zhao & Gong, 2017). A recent study illustrated that even among approaches employing nearly identical algorithms to define interfaces in known protein complexes, a minimal difference in definition can reduce the agreement between them to about 80%, and in a significant number of cases the interface definitions could overlap by as little as 40% (Gil & Fiser, 2019). The differences of interface definitions among different approaches are even larger. In contrast, from a biological point of view, it is important to know what level of interface definition inaccuracy is acceptable while still maintaining a useful prediction.
Given the uncertainty in defining a protein interface, it would be important to know how much one can mis‐predict or mis‐define the cognate biological interface and still capture its biological function, that is, to successfully predict or identify an interface that is sufficient to selectively recognize its cognate partner and the underlying interaction be specific for the interface.

A major strategy, both computationally (Bradshaw et al., 2011; Kortemme et al., 2004; Laurini et al., 2020; Moreira et al., 2007; Ramadoss et al., 2016) and experimentally (Clackson & Wells, 1995; Dall'Acqua et al., 1996, 1998; Taylor et al., 1998; Williams et al., 2006) used to verify a biologically relevant interface is alanine scanning mutagenesis. Another similar but complementary strategy is to mutate interface positions with closely related amino acids (Dougan et al., 1998; Ito et al., 1993). However, the availability of complete datasets that exhaustively explore the interface of a given protein by measuring the impact of mutations on binding affinity is limited (Fischer et al., 2006). Besides this practical limitation, a more important, conceptual problem is that the effect of mutation on the overall stability of a receptor–ligand complex is not straightforward to address, in part, due to the conformational alterations associated with induced mutations (Gao & Skolnick, 2010). An additional consideration is that positions outside of the binding interface, that do not play a role in the direct recognition of the cognate ligand, can have an indirect effect on recognition if altered. Mutations at distal sites from the interface can either increase (Larsen et al., 2005; Ohtaka et al., 2003) or diminish the binding affinity (Li et al., 2014) of a receptor against a ligand. All these computational and experimental approaches inherently suffer from the problem that to probe the importance of any one position of an interface, an unknown amount of alteration is introduced into the very interface being studied. Taking this into account, one can conclude that it is not straightforward to define the binding interface of proteins by probing the impact of residue mutations, which are often associated with global structural modulation of the protein (Vaughan et al., 1999).

In this paper, we explore a novel approach to computationally assess the biological validity of various physical protein interface definitions by probing the minimum essential interface cohort of residues necessary to successfully recognize the cognate ligand of a protein receptor. Through this analysis we quantitatively investigate how much overlap alternative receptor interface patches must share with their true biological interface to still recognize their cognate partner reliably. Similarly, we investigate how many true interface residues can be lost without diminishing a protein's ability to accurately recognize its cognate partners.

Importantly, the analysis presented here does not perturb the integrity of the protein structure with mutations. Instead, we utilize a computational approach, ProtLID (Yap & Fiser, 2016), to obtain a residue‐specific pharmacophore (rs‐pharmacophore) description for the protein interface. A pharmacophore is an abstract description of the critical atoms, groups, charged regions, and their spatial distributions that are essential for the biological activity of a binding entity, such as a small‐molecule drug or a protein ligand (Goodford, 1985; Kier, 1967). In ProtLID, the preferred, complementary residue positions on the receptor interface are predicted via rigorous molecular dynamics sampling of single residue probes. The consolidated spatial preferences establish a unique fingerprint of the single‐residue probe preferences on a hypothetical ligand interface. The rs‐pharmacophore is then used to screen candidate ligands for the given protein receptor, where each candidate ligand is ranked according to the degree of match against the predicted rs‐pharmacophore. The candidates that are highly ranked are the predicted binding partner(s) of the given protein (Shrestha et al., 2021). ProtLID was successfully used to identify the cognate partners for a given receptor (Yap & Fiser, 2016), and to redesign various protein interfaces for ligand binding specificity (Shrestha et al., 2019, 2020). By employing ProtLID, we do not introduce any alterations into the protein but instead construct complementary rs‐pharmacophore descriptions for each alternatively defined binding interface. These alternative interfaces are then tested against a library of candidate ligands to see which alternatively defined interfaces retain their ability to recognize their cognate ligands from a set of decoys.

In this work, we study three protein interfaces: PD1:PD‐L1, CTLA4:CD80, and CTLA4:CD86. These interactions are critical in regulating cellular functions in the immune synapse, formed between antigen presenting cells and T cells. PD1 and PD‐L1 form a co‐inhibitory complex that can limit the development of T‐cell response. PD1‐PD‐L1 interaction helps ensure that the immune system is activated only at the appropriate time to minimize autoimmune inflammation (Han et al., 2020). CTLA4, on the other hand, is the first immune checkpoint receptor to be clinically targeted. It is expressed exclusively on T cells where it primarily regulates the amplitude of the early stages of T cell activation through its interactions with CD80 and CD86 ligands (Vandenborre et al., 1999). In addition to their biomedical importance, these interactions were chosen because pharmacophore modeling showed that the corresponding ligand partner can be identified with high confidence. The goal of this work is to monitor the ability of an interface to specifically recognize its cognate binding partner, therefore we had to pick high accuracy predictions to be able to monitor the adverse changes as the interface definition changes. Furthermore, the PD1 and CTLA4 interfaces have a range of possible cognate binding partners with known structures, which gives us a statistical power to observe changes in the binding specificity. PD1 and CTLA4 have 24 and 2 cognate binding partners with known structures, respectively.

Our results show that, on average, one can mis‐define a protein interface by about 20%–30% while still preserving its ability to recognize its cognate partner with statistically significant results. These results suggest that methods that predict protein interfaces should achieve above 70% accuracy to be useful, i.e. significantly higher than the best methods currently available (Dequeker et al., 2022; Walder et al., 2022). Our results also show that receptors with higher binding affinity (CTLA4:CD80) compared to receptors with lower binding affinity (CTLA4:CD86) are harder to “destroy” by removing true interface residues and adding false interface residues.

2. RESULTS

2.1. Generating and assessing alternative interface definitions of the same size

The goal of this work is to assess the impact of alternative interface definitions for a protein receptor by monitoring how well each alternatively defined interface can recognize its cognate binding partner(s) from an ensemble of candidates. We consider the receptor‐ligand interactions in three protein–protein complexes: PD1:PD‐L1, where the interface on PD1 comprises 14 residues, and CTLA4:CD86/CD80, with a smaller interface on CTLA4 of 9 and 10 residues, respectively (Figure 1). CTLA4 was selected because ProtLID was shown to accurately assign a high ranking to its two known cognate ligands, while PD1 was selected because it has several known cognate ligands, which can provide statistical power for the observations, although with somewhat lower accuracy. First, we generated a reasonably large number of alternative poses of the cognate protein–protein complex in question. From those poses, alternative interface patches were selected with the same interface size as, but with varying degrees of overlap with, the original interface. The number of residues in the interfaces was kept constant to avoid dealing with the confounding impact of varying interface sizes when recognizing the cognate ligand. To sample physically plausible alternative interface patches, we employed the docking software ZDOCK (Pierce et al., 2011) to generate 2000 top scoring docked poses by keeping the receptor PD1 or CTLA4 fixed, and allowing their biological binding partners, PD‐L1, CD80 or CD86 to dock on the surface of the receptor. The interfaces of the receptor in all the cognate poses were determined using the program CSU (Sobolev et al., 1999). From the 2000 conformations that ZDOCK generated (Figure S3), we extracted all complexes with interface patch sizes equal to the original interface and varying degree of overlap with the original interface on the receptor protein (Figure 2). 
In the case of PD1:PD‐L1, 181 of the 2000 docked poses had the same number of interface residues as the original interface; for CTLA4:CD80 and CTLA4:CD86, 230 and 255 such poses were found, respectively. Of these 181, 230, and 255 poses, 98, 44, and 64, respectively, had no overlap with the original interface at all (Figure 3).
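The pose filtering described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the study: the residue labels, the `poses` dictionary, and the function names are hypothetical, and it assumes that the interface residues of each docked pose have already been extracted (for example, from CSU output) into sets of residue identifiers.

```python
from collections import Counter


def overlap_count(original, alternative):
    """Number of residues shared by two interface definitions."""
    return len(set(original) & set(alternative))


def same_size_poses(original, poses):
    """Keep only poses whose interface has the same size as the original.

    `poses` maps a (hypothetical) pose identifier to its interface residues.
    Returns a list of (pose_id, overlap_with_original) tuples.
    """
    n_original = len(set(original))
    return [
        (pose_id, overlap_count(original, residues))
        for pose_id, residues in poses.items()
        if len(set(residues)) == n_original
    ]


def overlap_histogram(original, poses):
    """Count same-size poses per overlap level (cf. the X-axis of Figure 3)."""
    return Counter(overlap for _, overlap in same_size_poses(original, poses))


# Toy data only: a four-residue "original" interface and four docked poses.
original = {"A10", "A12", "A15", "A20"}
poses = {
    "pose1": {"A10", "A12", "A15", "A20"},  # full overlap
    "pose2": {"A10", "A12", "A30", "A31"},  # partial overlap
    "pose3": {"A40", "A41", "A42", "A43"},  # no overlap
    "pose4": {"A10", "A12", "A15"},         # different size, filtered out
}
hist = overlap_histogram(original, poses)
```

Grouping poses by overlap level in this way yields the cohorts that are then compared against the original interface.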

PD1:PD‐L1 (left), CTLA4:CD80 (middle) and CTLA4:CD86 (right) protein complexes. The receptors (PD1, CTLA4 and CTLA4) are shown in yellow while the cognate ligands (PD‐L1, CD80 and CD86) are shown in green ribbon presentation. The interface residues on the receptors and ligands are shown in orange and blue sticks, respectively. Visualization rendered in Pymol (DeLano, 2002).

Schematic representation of different alternative interfaces explored in this work. In the first approach, each alternative interface generated preserves the same number of residues as the starting interface but with varying degrees of overlap with it. In the second approach, residues from the original interface are removed combinatorically. Also shown is a flowchart schematic of the docking step using ZDOCK, filtering alternative poses with same number of residues as the cognate receptor, and introduction of candidate ligand library to screen the rs‐pharmacophore against. The schematic is created with BioRender.com.

Distribution of docked poses comprising the same number of residues as the cognate protein–protein complex for (a) PD1:PD‐L1, (b) CTLA4:CD80, and (c) CTLA4:CD86 interfaces, generated by ZDOCK. The X‐axis shows the number of residues from each alternative pose that overlap with the original interface.

For each alternative interface definition, a new rs‐pharmacophore was calculated, and a ligand search was performed to assess the accuracy of the pharmacophore. In other words, we tested how well each alternative pharmacophore was able to identify its cognate ligand from a library of 103 IgSF structures (Table S1). Figure 4 shows the number of cognate ligands identified in the top 20th percentile for the PD1 receptor when we consider interface patches with gradually reduced degree of overlap with the original interface. The original interface (14 out of 14 residues overlap with the original interface (Figure 4)), was able to capture 13 out of 24 ligands in the top 20% of all candidate ligands screened. None of the alternatively defined interfaces provided by ZDOCK surpassed this performance, which suggests that the original interface definition was the best available one. As we considered alternative interfaces with reduced numbers of residual overlaps with the original interface, a gradual drop in accuracy was observed, with fewer and fewer cognate ligands recognized in the top 20%. It appears that approximately 70% of the original interface needs to be correctly captured to reach a signal that is statistically significantly distinguishable from random ranking (Figure 4 and Table 1). If we consider only the rs‐pharmacophore results that were reliable (based on the skewness of the distribution of ProtLID scores) (Gil et al., 2021), they performed better on average than considering signal from both reliable and unreliable interface patches. This is shown by the higher levels of black data points compared to the blue dotted line. Statistically, if the 24 cognate ligands with known structures were uniformly distributed (i.e., randomly predicted) among the ligand candidates, then one would expect 20% of these to be identified by chance in the top 20% of all ligands. 
This theoretically expected level is marked by a horizontal orange line at approximately 4.67 in Figure 4 for reference. The random expectation of 4.67 is obtained in a hypothetical scenario in which all cognate ligands are ranked uniformly among the decoys, and we note that this cutoff number depends on the size of the decoy library and the number of cognate ligands in the library. In this case, the number of cognate ligands, out of the total of 24, expected in the top 20 percentile is 24 × (20/103) ≈ 4.67. To test this hypothetical expectation empirically, we randomly selected 10 interface patches from ZDOCK poses that had no overlap with the original interface at all (0 on the x‐axis) and tested how the corresponding pharmacophores recognized their cognate ligands. First, all predictions were correctly identified as not reliable (empty black circles); second, the average of these rankings faithfully reproduced the recognition of about 4.67 ligands, as expected by random chance. Hence, we did not calculate signals from poses that have less than 50% overlap with the original interface. The standard error of the mean of the resulting signal for each cohort of patches shows a narrower uncertainty when the number of constituent patches analyzed is higher, and the range remains reasonably above the random expectation down to about 70% overlap with the original interface. In agreement with the observations above, if we examine the residues that the pharmacophore most frequently identifies as part of a binding site, we see a strong correlation between the overlap of the alternative and cognate interface and the number of residues matched by the pharmacophore (Figure S4).
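The random reference level can be reproduced with a short calculation. The sketch below assumes, as the formula in the text suggests, that the "top 20 percentile" of the 103-member library corresponds to the top 20 ranked positions; the function name is hypothetical.

```python
def expected_random_hits(n_cognate, n_top, library_size):
    """Expected number of cognate ligands among the top-ranked positions
    when cognate ligands are placed uniformly at random in the ranking."""
    return n_cognate * n_top / library_size


# 24 cognate ligands of PD1, top 20 positions, library of 103 structures.
expected = expected_random_hits(n_cognate=24, n_top=20, library_size=103)
print(f"{expected:.2f}")
```

The value evaluates to about 4.66, in line with the approximately 4.67 quoted in the text. Because it scales with both the number of cognate ligands and the cutoff, the same function gives the reference level for any other library composition.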

Number of cognate ligands ranked in the top 20 percentile as we deviate away from the original interface of PD1:PD‐L1. The number of residues overlapping between the alternative and the original interface is shown on the X‐axis. Solid black circles are reliable and hollow circles are unreliable predictions. The blue circles and the corresponding blue dotted line show the average number of ligands recognized for all receptor patches explored, while the black dotted line connects the average results for reliable predictions only. The orange line marks a theoretically expected random reference assuming uniform ranking of ligands, which is approximately 4.67. The vertical blue lines for each cohort of complexes analyzed show the standard error of the mean.

TABLE 1.

Statistical p‐values calculated to assess the significance of differences in performances between the cohort of random interfaces and each of the cohorts of interfaces with various levels of overlaps with the original interface.

PD1:PD‐L1                            CTLA4:CD80                           CTLA4:CD86
% overlap with cognate    p‐value    % overlap with cognate    p‐value    % overlap with cognate    p‐value
86%                       0.022      90%                       0.038      89%                       0.026
78%                       0.003      80%                       0.0003     77%                       0.3
71%                       0.053      70%                       0.008      66%                       0.003
64%                       0.050      60%                       0.074      55%                       0.2

Note: These p‐values correspond to the cohorts displayed in Figures 4 and 5.

It is interesting to consider protein receptor interfaces with a smaller number of known cognate ligands as well. We consider two CTLA4 interfaces, in complex with CD80 and with CD86. We collected all the docked poses of CTLA4:CD80 and CTLA4:CD86 generated by ZDOCK that have the same interface size as CTLA4 in complex with its respective cognate binding partners (Figure 2). Because CTLA4 has only two cognate ligands, we did not monitor the fraction of the ligands found in the top 20% of all hits as we did before. Instead, we directly monitored the actual average ranking of the CTLA4 cognate ligands (Figure 5). Once again, cognate ligand rankings were calculated on several interface patches by systematically reducing the degree of overlap with the true cognate interface residues, while keeping the total number of interface residues constant. As we observed with the PD1:PD‐L1 complexes, the interface reliability tends to disappear if more than 20%–30% of the residues from the original interface are lost (Figure 5 and Table 1). Finally, as before, none of the alternatively defined interfaces produced higher accuracy results than the original interface. This may simply be a consequence of the fact that the cases we picked generally performed well in the original testing to rank cognate ligands, which requires that the interfaces be correctly defined. We picked better performing cases to start with because we wanted to monitor how the ability of an interface to distinguish its cognate binding partners decreased as the accuracy of its interface definition decreased.

The percentile rank of cognate ligands (Y‐axis) as a function of residual overlap between the alternative interface and the original interface. (a) The average percentile rank of cognate ligands as we deviate away from the original interface of CTLA4:CD80, and (b) of the CTLA4:CD86 interface. Each ProtLID output result is assessed for reliability: solid black circles are reliable and hollow circles are unreliable predictions. Blue circles show the average results for a given interface size for all interfaces, while pink circles show the average of only the reliable predictions. The corresponding standard error of the mean around the averages of all points and only reliable points are shown by vertical blue and red lines, respectively. The blue and red dashed lines connect the average signal for each cohort of complexes for all and reliable signals only, respectively. The solid orange line shows the random expectation, with the corresponding standard error of the mean shown by orange dashed lines.

The large degree of overlap between the two CTLA4 interfaces (8 residues out of 9 and 10, respectively), defined using the CTLA4:CD80 complex or the CTLA4:CD86 complex, enables us to assess the ability of the CTLA4:CD80 interface to recognize CD86, and the CTLA4:CD86 interface to recognize CD80. As shown in Figure S1, the ability of the CTLA4:CD80 interface to recognize CD86 as a cognate ligand is poorer than the average ability of it to recognize its own cognate ligands. In contrast, as shown in Figure S2, the ability of the CTLA4:CD86 interface to recognize CD80 as a cognate ligand is better than the average ability of it to recognize its own cognate ligands. This suggests that CD80 has greater specificity than CD86 for recognizing the CTLA4 interface.

In general, we find that the performance of the CTLA4:CD80 interface is better on average than that of the CTLA4:CD86 interface in recognizing cognate partners. This observation possibly reflects the fact that the binding affinity of CTLA4 for CD80 is markedly higher than its affinity for CD86 (Collins et al., 2002; Kennedy et al., 2022; Lander et al., 2001). The CTLA4:CD80 interface recognizes its two known cognate partners at ranks 1 and 2, while the CTLA4:CD86 interface recognizes its two known cognate binding partners at ranks 7 and 27.
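To put these ranks on the percentile scale used for CTLA4, one can convert them as in the sketch below. It rests on two assumptions that are not stated explicitly in the text: that percentile rank is defined as 100 × rank / library size, and that the same library of 103 candidate structures is used for CTLA4. The function names are hypothetical.

```python
def percentile_rank(rank, library_size):
    """Percentile position of a ligand ranked `rank` (1 = best) in the library.
    Assumed definition: 100 * rank / library_size."""
    return 100.0 * rank / library_size


def mean_percentile_rank(ranks, library_size):
    """Average percentile rank over all cognate ligands of one interface."""
    return sum(percentile_rank(r, library_size) for r in ranks) / len(ranks)


LIBRARY_SIZE = 103  # assumed to be the same IgSF library used for PD1

cd80_interface = mean_percentile_rank([1, 2], LIBRARY_SIZE)
cd86_interface = mean_percentile_rank([7, 27], LIBRARY_SIZE)
```

Under these assumptions the CTLA4:CD80 interface places its cognate ligands within roughly the top 2% of the library on average, and the CTLA4:CD86 interface within roughly the top 17%. Both are well above a random expectation of about 50%.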

2.2. Removing true positive residues from the interface

In an alternative approach, we explored what happens when we gradually eliminate residues from the original interface in a combinatorial way, without replacing them with non‐interface residues (thereby reducing the size of the interface). Starting with the cognate pharmacophore of the receptor (obtained from the starting receptor‐ligand complex), we systematically removed unique residues in cohorts. For example, the cognate pharmacophore of CTLA4–CD80 has 10 unique occurrences of receptor residues. First, we remove a single unique occurrence of each residue and calculate the average rank of cognate ligands for each scenario, which in this case gives 10 enumerations. Then, we remove these occurrences in pairs, which gives 45 enumerations (10 choose 2), and so on. The average rank for each cohort is calculated with the corresponding standard error of the mean. As we systematically removed residues in cohorts from the rs‐pharmacophore, we reassessed the ability of the reduced rs‐pharmacophore to recognize its cognate ligands. Figure 6 shows the ability of PD1 to recognize its cognate ligands. Two assessment metrics are used: the average number of cognate ligands in the top 20 percentile identified by a given pharmacophore and the overall average rank of cognate ligands identified by a given pharmacophore. First, considering only those data points that are predicted to be reliable improves the output signal. Second, when approximately 78% of the original interface residues remain in the pharmacophore, the average rank of cognate ligands and the average number of cognate ligands in the top 20 percentile approach the random limit. Once again, the best performance in recognizing cognate ligands is observed for the pharmacophore generated from the original interface. Next, we repeated the same exercise with CTLA4 in complex with CD80 and CTLA4 in complex with CD86.
Since there are only two cognate ligands for CTLA4 with known crystal structures, we used the metric of the average percentile rank of cognate ligands. One key difference between the performance of two systems (Figure 7) is that the CTLA4:CD80 interface results in a substantially stronger ranking for the cognate ligands compared to the CTLA4:CD86 interface. In addition, it appears that it is somewhat more difficult to “destroy” the cognate ligand recognition ability of the CTLA4:CD80 interface compared to the CTLA4:CD86 interface. Even at a 60% overlap, the CTLA4:CD80 interface recognizes its cognate ligands at a statistically significant level. Like the PD1:PD‐L1 interface, the results show that, on average, the output signal improves by considering only the reliable data points. When approximately 60%–78% of the true interface residues remain in the pharmacophore, the average rank of cognate ligands approaches the random limit. None of the alternative interfaces outperformed the original one, that is, initial receptor interfaces studied in this work faithfully represent a reasonably accurate description of the true biological interface because these show the best performance in recognizing their corresponding cognate ligands.
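The combinatorial removal scheme and the cohort statistics described in this section can be sketched as follows. The residue labels and function names are placeholders, and the ranking step itself (rescoring ligands with the reduced rs‐pharmacophore) is not reproduced here; only the enumeration and the mean and standard error calculation are shown.

```python
from itertools import combinations
from math import comb, sqrt
from statistics import mean, stdev


def reduced_interfaces(residues, n_removed):
    """Yield every interface obtained by removing `n_removed` residues."""
    residues = list(residues)
    for removed in combinations(residues, n_removed):
        yield [r for r in residues if r not in removed]


def mean_and_sem(values):
    """Mean and standard error of the mean for one cohort of results."""
    values = list(values)
    return mean(values), stdev(values) / sqrt(len(values))


# Placeholder labels for the 10 unique receptor residues of CTLA4-CD80.
residues = [f"R{i}" for i in range(1, 11)]

singles = list(reduced_interfaces(residues, 1))  # one residue removed
pairs = list(reduced_interfaces(residues, 2))    # two residues removed

print(len(singles), len(pairs), comb(10, 2))
```

The number of reduced interfaces grows quickly with the number of removed residues (10, 45, 120, ...), which is one reason the cohort averages are reported with their standard error rather than as individual results.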

Assessment of the ability of PD1 to recognize its cognate ligands as we gradually reduce the number of residues from the original receptor interface. (Left) Average number of cognate ligands in the top 20 percentile for all (blue) and reliable (green) data points. (Right) Average rank of cognate ligands for all (blue) and reliable (green) data points, with the corresponding standard error of the mean shown by vertical lines. The orange solid line represents the random expectation, with the orange dashed line showing the corresponding standard error of the mean: (left) the average number of cognate ligands in the top 20% is about 4.67 and (right) the average rank of all ligands is about 50%.

Assessment of recognizing the cognate ligand (CD80 (left) and CD86 (right)) of CTLA4 receptor interface. Average percentile rank of cognate ligands is shown (Y‐axis) as a function of gradually reduced interfaces (X‐axis). Blue and green points represent all and only reliable data points, respectively with the corresponding standard error of the mean shown by vertical lines. The orange solid line shows the random expectations for average rank of cognate ligands, and orange dashed line shows the standard deviation.

3. DISCUSSION

Protein interfaces can be defined using various approaches, which include using radial cutoffs, monitoring change in solvent accessible surface areas, or employing Voronoi polyhedra calculations. The disagreement among the results of these methods can be traced back to the fact that these approaches aim at identifying interfaces from a physical point of view using alternative criteria. In contrast, in this work we tried to assess the impact of alternative interface definitions from a biological point of view, that is, from the perspective of a protein maintaining its ability to selectively recognize its cognate binding partner. The central question we ask in this work is how imperfect the interface definition can be while retaining the ability to recognize the cognate ligand(s). We assessed this by pursuing alternative interface patches that had decreasing overlap with the original biological interface, iteratively recalculating the corresponding pharmacophore model, and testing its ability to identify the cognate ligands. We applied the approach to three interfaces only, mostly because of the computationally demanding nature of the approach. For instance, in the first strategy (gradually decreasing the overlap of alternative interfaces with the original one) we tested at least 35 alternatives for each interface, about 105 for all three studied. The second strategy (removing true positive residues from the original interface) requires ligand ranking calculations starting with at least an additional 20–25 pharmacophores for each protein. Meanwhile, each pharmacophore calculation requires at least about 2–3 days of processing with our own in‐house resources. Nevertheless, the results from these three interfaces turned out to be conclusive. In this work, none of the alternative interfaces sampled produced higher accuracy results than the original definition obtained from CSU.
This does not validate CSU as a uniformly accurate approach, but simply reflects the fact that we picked well performing cases, where the starting interface definition and the corresponding pharmacophore reliably identified the cognate ligand partners from a set of decoys. This means that the original definition of the interface had to be reasonably good to start with. The results show that, on average, we can safely lose approximately 20%–30% of the true biological interface and still recognize the cognate ligands of the receptor with reasonable confidence, albeit lower than with the original interface. Additionally, we observe that the skewness of ProtLID scores is informative for identifying reliable alternative interface definitions (Gil et al., 2021). These results also provide guidance for interface prediction methods. Current methods fall into two major categories: ab‐initio methods (Northey et al., 2017; Viswanathan et al., 2019) and template‐based methods (Gil & Fiser, 2019). Ab‐initio methods are more broadly applicable, but their accuracy ranges between 30% and 40%, while template‐based methods can reach higher prediction accuracies but rely on the availability of a known template (Gil & Fiser, 2019; Walder et al., 2022). Two recent combined approaches reported F‐score accuracies just above 0.5 (Dequeker et al., 2022; Walder et al., 2022). Meanwhile, the results from the current study suggest that prediction methods should reach an F‐score of approximately 0.7 to produce interface predictions that are biologically useful for practical applications, and they highlight the need to develop methods that close this performance gap. This is in agreement with a recent study that reported a high correlation (Pearson correlation coefficient = 0.86) between the accuracy of predicted interfaces and their ability to detect cognate ligands in cross‐docking exercises (Dequeker et al., 2022). 
It observed a particularly steep drop in accuracy (an area under the curve measure drop of 12%) when the first 10% of the interface was analytically redefined. While that study approached the question of the role of interface accuracy from a different technical perspective, it also concluded that current interface prediction methods, on average, fall short of providing a biologically useful prediction. Furthermore, experimental methods that rely on the precise determination of the interacting residues for a receptor‐ligand complex would benefit from our approach.
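To make the relation between interface overlap and F‐score concrete, the following is a minimal sketch in Python. It assumes that the alternative interface has the same number of residues as the reference interface, so that losing 30% of the true residues also introduces 30% false positives. The text does not state how overlap maps to F‐score, so this is an illustration rather than the procedure used in the study. The function name and the residue numbers are hypothetical.

```python
def interface_f_score(predicted, reference):
    """F-score (harmonic mean of precision and recall) between two residue sets."""
    predicted, reference = set(predicted), set(reference)
    true_pos = len(predicted & reference)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(reference)
    return 2 * precision * recall / (precision + recall)


# Hypothetical 20-residue reference interface (residue numbers are made up).
reference = set(range(1, 21))
# Assumed alternative patch of the same size: 14 true residues kept,
# 6 replaced by residues outside the reference interface.
alternative = set(range(1, 15)) | set(range(101, 107))

score = interface_f_score(alternative, reference)
print(round(score, 2))  # 0.7
```

Under this equal‐size assumption, a 30% loss of the true interface corresponds to precision = recall = 0.7 and therefore an F‐score of 0.7. If residues were removed without adding false positives, the F‐score for the same loss would be higher.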

In this work, we provide a novel computational approach to determine the most functionally meaningful interface definition for a given complex. The approach is computationally intensive, as it relies on iteratively recalculating pharmacophore descriptions of alternatively defined interfaces, which requires running thousands of short MD simulations with different pharmacophore probes, each of which can take from a few minutes up to 2 h. However, these MD calculations are independent and can be performed in a fully parallel computing setting.

4. METHODS

4.1. Protein interfaces

We studied three protein–protein heterodimer complexes in this work, PD1:PD‐L1 (PDB code: 4ZQK), CTLA4:CD80 (PDB code: 1I8L), and CTLA4:CD86 (PDB code: 1I85). A rs‐pharmacophore is calculated with ProtLID treating PD1 (4ZQK, chain B), CTLA4 (1I8L, chain C), and CTLA4 (1I85, chain D) as receptor proteins. Each receptor rs‐pharmacophore is screened against a library of 103 structures of the immunoglobulin superfamily (Table S1). There are 24 available structures of the cognate ligand of receptor PD1 (Protein Data Bank (Ohtaka et al., 2003) codes: 4ZQK.A, 3BIK.A, 3BIS.A, 3FN3.A, 3SBW.C, 4Z18.A, 5C3T.A, 5GGT.A, 5GRJ.A, 5IUS.C, 5J89.A, 5J8O.A, 5JDR.A, 5JDS.A, 5N2D.A, 5N2F.A, 5NIU.A, 5NIX.A, 5O45.A, 5O4Y.B, 5X8L.A, 5X8M.A, 5XJ4.A, 5XXY.A). On the other hand, CTLA4 has two cognate binding partners with a known structure: CD80 (1I8L.A and 1DR9.A) and CD86 (1I85.B and 1NCN.A).

The 24 cognate binding partners of PD1 are different structures of the same protein. Hence, the list of 103 decoys contains 77 unique proteins.

4.2. Docking

ZDOCK (Pierce et al., 2011) was used to perform rigid body docking. In each case the receptor was kept rigid, and the ligand was docked onto the receptor surface to generate 2000 top scoring alternative poses of the receptor–ligand complex.

4.3. Interface definition

CSU (Contacts of Structural Units) (Sobolev et al., 1999) was used to determine the interacting residues based on interatomic contact distances and the complementarity of interacting atomic groups in the complex structures. The following criteria were applied, as described and implemented in protein ligand interface design (ProtLID) (Yap & Fiser, 2016): contact distance ≤4 Å; contacting atoms are from legitimate CSU classes; solvent accessible surface area larger than 1 Å²; and carbon/sulfur atoms must represent hydrophobic residues.
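As a rough illustration of the distance criterion alone, the sketch below collects receptor residues that have at least one atom within 4 Å of any ligand atom. It is a deliberately simplified stand‐in and not CSU itself: the atom‐class legitimacy, solvent accessible surface area, and hydrophobicity criteria are not implemented. The function name, data layout, and coordinates are assumptions made for this example.

```python
from math import dist

CUTOFF = 4.0  # Å, the contact distance criterion quoted in the text


def contact_residues(receptor_atoms, ligand_atoms, cutoff=CUTOFF):
    """Return receptor residue IDs with at least one atom within `cutoff` of a ligand atom.

    Each atom is a (residue_id, (x, y, z)) tuple; this layout is hypothetical.
    """
    interface = set()
    for res_id, xyz in receptor_atoms:
        if any(dist(xyz, lig_xyz) <= cutoff for _, lig_xyz in ligand_atoms):
            interface.add(res_id)
    return interface


# Toy coordinates, invented for illustration only.
receptor = [
    ("A10", (0.0, 0.0, 0.0)),
    ("A11", (3.0, 0.0, 0.0)),
    ("A50", (20.0, 0.0, 0.0)),
]
ligand = [("B5", (3.5, 0.0, 0.0))]

interface = contact_residues(receptor, ligand)
print(sorted(interface))  # ['A10', 'A11']
```

A pure distance cutoff of this kind generally selects a different residue set than CSU, which is the kind of definition‐dependent disagreement this study examines.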

4.4. Protein ligand interface design

Protein ligand interface design (Yap & Fiser, 2016) is a computational method that generates a rs‐pharmacophore description for a given protein interface. This is achieved by running extensive molecular dynamics simulations of discrete single‐residue probes around a starting protein interface and tracking their preferences with respect to the interface residues. The approach treats both the receptor protein and the amino acid probe as flexible. Residue preferences are obtained by clustering residue occurrences and normalizing these by the geometrical artifacts of the surface. ProtLID essentially calculates the optimal complementary interface in the form of residue preferences. The resulting residue‐based pharmacophore (rs‐pharmacophore) comprises the residue types and location preferences on the complementary interface. This rs‐pharmacophore is subsequently used to find potential matches among candidate ligands using a pattern matching algorithm (Shrestha et al., 2021).

4.5. Assessing the reliability of ProtLID pharmacophore

The mathematical skewness of ProtLID scores is used to assess the reliability of the rs‐pharmacophore generated by ProtLID (Yap & Fiser, 2016). Skewness is defined as skewness = Pm3/Psd³, where Pm3 is the third moment of a ProtLID score distribution and Psd is the standard deviation of ProtLID scores. Once a pharmacophore for a given protein interface is generated, we enumerate all possible 5‐mer combinations of the calculated residue preferences to screen a ligand structure database. This results in a certain number of matches for each potential ligand out of all combinatorial possibilities. For instance, for a 15‐residue rs‐pharmacophore, the number of 5‐mer enumerations is 3003 (15C5). The ProtLID score is the number of 5‐mer hits for a particular ligand. Skewness is calculated over the distribution of scores obtained for all possible ligands. Only interface patches for which the skewness is above 2.5 are deemed reliable (Gil et al., 2021); all others are considered unreliable.
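The reliability check described above can be sketched in a few lines of Python. The sketch assumes that "third moment" means the third central moment and that the population (not sample‐corrected) standard deviation is used; the text does not specify either choice. The scores are made up for illustration and are not ProtLID output.

```python
from math import comb

SKEWNESS_THRESHOLD = 2.5  # reliability cutoff quoted in the text


def skewness(scores):
    """Third central moment divided by the cubed population standard deviation."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((s - mean) ** 2 for s in scores) / n  # variance
    m3 = sum((s - mean) ** 3 for s in scores) / n  # third central moment
    if m2 == 0:
        return 0.0  # all scores identical: no asymmetry to measure
    return m3 / m2 ** 1.5


# Number of 5-mer combinations for a 15-residue rs-pharmacophore.
n_5mers = comb(15, 5)
print(n_5mers)  # 3003

# Hypothetical scores (number of 5-mer hits) for 20 candidate ligands:
# most ligands match few 5-mers, one ligand matches many.
scores = [0] * 12 + [1] * 5 + [2] * 2 + [40]
skew = skewness(scores)
reliable = skew > SKEWNESS_THRESHOLD
print(reliable)  # True
```

A strongly right‐skewed distribution, in which one or a few ligands collect far more 5‐mer hits than the rest, is what the threshold is designed to detect.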

4.6. Assessing the significance of differences between ProtLID rank performance

Each cohort of complexes (c₀) shown in Figures 4 and 5 is assessed for significance against a corresponding random distribution by calculating a t‐score given by t = (c₀ − μ)/σ, where μ is the average of the random distribution (cohorts of complexes with no overlapping residues with the cognate interface) and σ is the standard error of the mean calculated from the random distribution. Using this equation, we obtain a t‐value for each cohort of complexes. We then look up the p‐value for each cohort from a t‐test table as a function of the calculated t‐value and the degrees of freedom (number of points within each cohort).
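The t‐score calculation can be illustrated with the short Python sketch below. It assumes that c₀ is the average percentile rank of a cohort and that the standard error of the mean is computed from the sample standard deviation (n − 1 denominator); neither detail is stated explicitly in the text. The numbers are invented for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def cohort_t_score(cohort_value, random_values):
    """t = (c0 - mu) / sigma, with sigma the standard error of the mean
    of the random distribution."""
    mu = mean(random_values)
    sem = stdev(random_values) / sqrt(len(random_values))
    return (cohort_value - mu) / sem


# Hypothetical percentile ranks from cohorts with no overlap with the cognate interface.
random_ranks = [40.0, 50.0, 60.0, 45.0, 55.0]
# Hypothetical average percentile rank of one cohort under test.
t = cohort_t_score(90.0, random_ranks)
print(round(t, 2))  # 11.31
```

The p‐value is then read from a t‐distribution table for the chosen degrees of freedom, as described above. Where SciPy is available, `scipy.stats.t.sf(t, df)` gives the corresponding one‐sided tail probability.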

AUTHOR CONTRIBUTIONS

Prithviraj Nandigrami: Investigation; writing – original draft; writing – review and editing; formal analysis. Andras Fiser: Conceptualization; funding acquisition; writing – original draft; writing – review and editing; supervision.

Supporting information

FIGURE S1. The percentile rank of cognate ligands as a function of overlap of the alternative interface definitions with the original interface of CTLA4:CD80. Solid black circles are reliable and hollow circles are unreliable predictions. Starting with the CTLA4:CD80 interface, we calculate the average and standard errors of ranking CD86 ligand (green circles and green vertical lines). The number of residues common between CTLA4:CD80 and CTLA4:CD86 interfaces is 8. Green and blue lines correspond to reliable and all data points, respectively. The orange solid line and dashed lines show the random expectation and corresponding standard error of the mean, respectively.

FIGURE S2. As in Figure S1, but we explored the ranking of CD80 for the alternative CTLA4:CD86 interface definitions.

FIGURE S3. Distribution of the number of residues in the interface of 2000 ZDOCK poses for (left) PD1, (middle) CTLA4 (with CD80 complex), and (right) CTLA4 (with CD86 complex) for the corresponding cognate ligand interface on PD‐L1, CD80, and CD86.

FIGURE S4. The accuracy of predicted binding site by the rs‐pharmacophores for (left) PD1, (middle) CTLA4 (with CD80 complex), and (right) CTLA4 (with CD86 complex) for the corresponding cognate ligand interface on PD‐L1, CD80, and CD86. The x‐axis shows the varying degree of overlap between alternative interfaces and the cognate interface on the receptor, and the y‐axis shows the number of common binding site residues on the ligand between the predicted (pharmacophore mapped) and the cognate ligand interface.

TABLE S1. List of 103 decoy structures used to screen rs‐pharmacophore for the protein receptors studied.

PRO-33-e5026-s001.docx (748.1 KB, docx)

ACKNOWLEDGMENTS

This work was supported by National Institutes of Health (NIH) grants GM136357 and AI141816.

Nandigrami P, Fiser A. Assessing the functional impact of protein binding site definition. Protein Science. 2024;33(6):e5026. 10.1002/pro.5026

Review Editor: Nir Ben‐Tal

REFERENCES

Aumentado‐Armstrong TT, Istrate B, Murgita RA. Algorithmic approaches to protein‐protein interaction site prediction. Algorithms Mol Biol. 2015;10:7.
Bahadur RP, Chakrabarti P, Rodier F, Janin J. Dissecting subunit interfaces in homodimeric proteins. Proteins. 2003;53(3):708–719.
Bahadur RP, Chakrabarti P, Rodier F, Janin J. A dissection of specific and non‐specific protein–protein interfaces. J Mol Biol. 2004;336(4):943–955.
Bordner AJ, Abagyan R. Statistical analysis and prediction of protein–protein interfaces. Proteins. 2005;60(3):353–366.
Bradshaw RT, Patel BH, Tate EW, Leatherbarrow RJ, Gould IR. Comparing experimental and computational alanine scanning techniques for probing a prototypical protein–protein interaction. Protein Eng Sel. 2011;24(1–2):197–207.
Bryant P, Pozzati G, Elofsson A. Improved prediction of protein‐protein interactions using AlphaFold2. Nat Commun. 2022;13(1):1265.
Burgoyne NJ, Jackson RM. Predicting protein interaction sites: binding hot‐spots in protein–protein and protein–ligand interfaces. Bioinformatics. 2006;22(11):1335–1342.
Cavallo L, Kleinjung J, Fraternali F. POPS: a fast algorithm for solvent accessible surface areas at atomic and residue level. Nucleic Acids Res. 2003;31(13):3364–3366.
Cazals F, Proust F, Bahadur RP, Janin J. Revisiting the Voronoi description of protein–protein interfaces. Protein Sci. 2006;15(9):2082–2092.
Clackson T, Wells JA. A hot spot of binding energy in a hormone‐receptor interface. Science. 1995;267(5196):383–386.
Collins AV, Brodie DW, Gilbert RJ, Iaboni A, Manso‐Sancho R, Walse B, et al. The interaction properties of costimulatory molecules revisited. Immunity. 2002;17(2):201–210.
Dall'Acqua W, Goldman ER, Eisenstein E, Mariuzza RA. A mutational analysis of the binding of two different proteins to the same antibody. Biochemistry. 1996;35(30):9667–9676.
Dall'Acqua W, Goldman ER, Lin W, Teng C, Tsuchiya D, Li H, et al. A mutational analysis of binding interactions in an antigen−antibody protein−protein complex. Biochemistry. 1998;37(22):7981–7991.
de Vries SJ, Bonvin AM. How proteins get in touch: interface prediction in the study of biomolecular complexes. Curr Protein Peptide Sci. 2008;9(4):394–406.
DeLano WL. Pymol: an open‐source molecular graphics tool. CCP4 Newsl Protein Crystallogr. 2002;40(1):82–92.
Dequeker C, Mohseni Behbahani Y, David L, Laine E, Carbone A. From complete cross‐docking to partners identification and binding sites predictions. PLoS Comput Biol. 2022;18(1):e1009825.
Dougan DA, Malby RL, Gruen LC, Kortt AA, Hudson PJ. Effects of substitutions in the binding surface of an antibody on antigen affinity. Protein Eng. 1998;11(1):65–74.
Esmaielbeiki R, Krawczyk K, Knapp B, Nebel JC, Deane CM. Progress and challenges in predicting protein interfaces. Brief Bioinform. 2015;17:117–131.
Ezkurdia I, Bartoli L, Fariselli P, Casadio R, Valencia A, Tress ML. Progress and challenges in predicting protein‐protein interaction sites. Brief Bioinform. 2009;10(3):233–246.
Fernandez‐Fuentes N, Dybas JM, Fiser A. Structural characteristics of novel protein folds. PLoS Comput Biol. 2010;6(4):e1000750.
Fischer TB, Holmes JB, Miller IR, Parsons JR, Tung L, Hu JC, et al. Assessing methods for identifying pair‐wise atomic contacts across binding interfaces. J Struct Biol. 2006;153(2):103–112.
Gao M, Skolnick J. Structural space of protein‐protein interfaces is degenerate, close to complete, and highly connected. Proc Natl Acad Sci U S A. 2010;107(52):22517–22522.
Garcia‐Seisdedos H, Empereur‐Mot C, Elad N, Levy ED. Proteins evolve on the edge of supramolecular self‐assembly. Nature. 2017;548(7666):244–247.
Gil N, Fiser A. The choice of sequence homologs included in multiple sequence alignments has a dramatic impact on evolutionary conservation analysis. Bioinformatics. 2019;35(1):12–19.
Gil N, Shrestha R, Fiser A. Estimating the accuracy of pharmacophore‐based detection of cognate receptor‐ligand pairs in the immunoglobulin superfamily. Proteins. 2021;89:632–638.
Goodford PJ. A computational‐procedure for determining energetically favorable binding‐sites on biologically important macromolecules. J Med Chem. 1985;28(7):849–857.
Grudman S, Fajardo JE, Fiser A. INTERCAAT: identifying interface residues between macromolecules. Bioinformatics. 2022;38(2):554–555.
Grueninger D, Treiber N, Ziegler MO, Koetter JW, Schulze MS, Schulz GE. Designed protein‐protein association. Science. 2008;319(5860):206–209.
Hamp T, Rost B. Alternative protein‐protein interfaces are frequent exceptions. 2012.
Han Y, Liu D, Li L. PD‐1/PD‐L1 pathway: current researches in cancer. Am J Cancer Res. 2020;10(3):727–742.
Ito W, Iba Y, Kurosawa Y. Effects of substitutions of closely related amino acids at the contact surface in an antigen‐antibody complex on thermodynamic parameters. J Biol Chem. 1993;268(22):16639–16647.
Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, et al. Applying and improving AlphaFold at CASP14. Proteins. 2021;89(12):1711–1721.
Kennedy A, Waters E, Rowshanravan B, Hinze C, Williams C, Janman D, et al. Differences in CD80 and CD86 transendocytosis reveal CD86 as a key target for CTLA‐4 immune regulation. Nat Immunol. 2022;23(9):1365–1378.
Keskin O, Tsai CJ, Wolfson H, Nussinov R. A new, structurally nonredundant, diverse data set of protein–protein interfaces and its implications. Protein Sci. 2004;13(4):1043–1055.
Kier LB. Molecular orbital calculation of preferred conformations of acetylcholine, muscarine, and muscarone. Mol Pharmacol. 1967;3(5):487–494.
Kortemme T, Joachimiak LA, Bullock AN, Schuler AD, Stoddard BL, Baker D. Computational redesign of protein‐protein interaction specificity. Nat Struct Mol Biol. 2004;11(4):371–379.
Laine E, Carbone A. Local geometry and evolutionary conservation of protein surfaces reveal the multiple recognition patches in protein‐protein interactions. PLoS Comput Biol. 2015;11(12):e1004580.
Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, et al. Initial sequencing and analysis of the human genome. Nature. 2001;409(6822):860–921.
Larsen CP, Pearson TC, Adams AB, Tso P, Shirasugi N, Strobert E, et al. Rational development of LEA29Y (belatacept), a high‐affinity variant of CTLA4‐Ig with potent immunosuppressive properties. Am J Transplant. 2005;5(3):443–453.
Laurini E, Marson D, Aulic S, Fermeglia M, Pricl S. Computational alanine scanning and structural analysis of the SARS‐CoV‐2 spike protein/angiotensin‐converting enzyme 2 complex. ACS Nano. 2020;14(9):11821–11830.
Lee B, Richards FM. The interpretation of protein structures: estimation of static accessibility. J Mol Biol. 1971;55(3):379‐IN4.
Lensink MF, Brysbaert G, Raouraoua N, Bates PA, Giulini M, Honorato RV, et al. Impact of AlphaFold on structure prediction of protein complexes: the CASP15‐CAPRI experiment. Proteins. 2023;91(12):1658–1683.
Li M, Petukh M, Alexov E, Panchenko AR. Predicting the impact of missense mutations on protein‐protein binding affinity. J Chem Theory Comput. 2014;10(4):1770–1780.
Mabonga L, Kappo AP. Protein‐protein interaction modulators: advances, successes and remaining challenges. Biophys Rev. 2019;11(4):559–581.
McConkey BJ, Sobolev V, Edelman M. Quantification of protein surfaces, volumes and atom–atom contacts using a constrained Voronoi procedure. Bioinformatics. 2002;18(10):1365–1373.
Moreira IS, Fernandes PA, Ramos MJ. Computational alanine scanning mutagenesis—an improved methodological approach. J Comput Chem. 2007;28(3):644–654.
Mukherjee I, Chakrabarti S. Co‐evolutionary landscape at the interface and non‐interface regions of protein‐protein interaction complexes. Comput Struct Biotechnol J. 2021;19:3779–3795.
Nadalin F, Carbone A. Protein–protein interaction specificity is captured by contact preferences and interface composition. Bioinformatics. 2018;34(3):459–468.
Northey T, Baresic A, Martin ACR. IntPred: a structure‐based predictor of protein‐protein interaction sites. Bioinformatics. 2017;34:223–229.
Ofran Y, Rost B. Analysing six types of protein‐protein interfaces. J Mol Biol. 2003;325(2):377–387.
Ohtaka H, Schön A, Freire E. Multidrug resistance to HIV‐1 protease inhibition requires cooperative coupling between distal mutations. Biochemistry. 2003;42(46):13659–13666.
Pierce BG, Hourai Y, Weng ZP. Accelerating protein docking in ZDOCK using an advanced 3D convolution library. PLoS One. 2011;6(9):e24657.
Pillai AS, Hochberg GKA, Thornton JW. Simple mechanisms for the evolution of protein complexity. Protein Sci. 2022;31(11):e4449.
Ramadoss V, Dehez F, Chipot C. AlaScan: a graphical user interface for alanine scanning free‐energy calculations. Washington: ACS Publications; 2016:1122–6.
Raucci R, Laine E, Carbone A. Local interaction signal analysis predicts protein‐protein binding affinity. Structure. 2018;26(6):905–915.e4.
Sacquin‐Mora S, Carbone A, Lavery R. Identification of protein interaction partners and protein‐protein interaction sites. J Mol Biol. 2008;382(5):1276–1289.
Seo M‐H, Kim PM. The present and the future of motif‐mediated protein–protein interactions. Curr Opin Struct Biol. 2018;50:162–170.
Seoane B, Carbone A. The complexity of protein interactions unravelled from structural disorder. PLoS Comput Biol. 2021;17(1):e1008546.
Shrestha R, Fajardo JE, Fiser A. Residue‐based pharmacophore approaches to study protein‐protein interactions. Curr Opin Struct Biol. 2021;67:205–211.
Shrestha R, Garrett SC, Almo SC, Fiser A. Computational redesign of PD‐1 interface for PD‐L1 ligand selectivity. Structure. 2019;27(5):829–836.
Shrestha R, Garrett‐Thomson SC, Liu W, Almo SC, Fiser A. Redesigning HVEM interface for selective binding to LIGHT, BTLA, and CD160. Structure. 2020;28:1197–1205.e2.
Skolnick J, Gao M, Zhou H, Singh S. AlphaFold 2: why it works and its implications for understanding the relationships of protein sequence, structure, and function. J Chem Inf Model. 2021;61(10):4827–4831.
Sobolev V, Sorokine A, Prilusky J, Abola EE, Edelman M. Automated analysis of interatomic contacts in proteins. Bioinformatics. 1999;15(4):327–332.
Taylor MG, Kirsch JF, Rajpal A. Kinetic epitope mapping of the chicken lysozyme. HyHEL‐10 Fab complex: delineation of docking trajectories. Protein Sci. 1998;7(9):1857–1867.
Tonddast‐Navaei S, Skolnick J. Are protein‐protein interfaces special regions on a protein's surface? J Chem Phys. 2015;143(24):243149.
Vandenborre K, Van Gool S, Kasran A, Ceuppens J, Boogaerts M, Vandenberghe P. Interaction of CTLA‐4 (CD152) with CD80 or CD86 inhibits human T‐cell activation. Immunology. 1999;98(3):413–421.
Vaughan CK, Buckle AM, Fersht AR. Structural response to mutation at a protein‐protein interface. J Mol Biol. 1999;286(5):1487.
Viswanathan R, Fajardo E, Steinberg G, Haller M, Fiser A. Protein‐protein binding supersites. PLoS Comput Biol. 2019;15(1):e1006704.
Walder M, Edelstein E, Carroll M, Lazarev S, Fajardo JE, Fiser A, et al. Integrated structure‐based protein interface prediction. BMC Bioinformatics. 2022;23(1):301.
Williams AD, Shivaprasad S, Wetzel R. Alanine scanning mutagenesis of Aβ (1‐40) amyloid fibril stability. J Mol Biol. 2006;357(4):1283–1294.
Wodak SJ, Vajda S, Lensink MF, Kozakov D, Bates PA. Critical assessment of methods for predicting the 3D structure of proteins and protein complexes. Annu Rev Biophys. 2023;52:183–206.
Xue LC, Dobbs D, Bonvin AM, Honavar V. Computational prediction of protein interfaces: a review of data driven methods. FEBS Lett. 2015;589(23):3516–3526.
Yap EH, Fiser A. ProtLID, a residue‐based pharmacophore approach to identify cognate protein ligands in the immunoglobulin superfamily. Structure. 2016;24(12):2217–2226.
Zhao Z, Gong X. Protein‐protein interaction interface residue pair prediction based on deep learning architecture. IEEE/ACM Trans Comput Biol Bioinform. 2017;16(5):1753–1759.


[pro5026-bib-0041] Lee B, Richards FM. The interpretation of protein structures: estimation of static accessibility. J Mol Biol. 1971;55(3):379–400.

[pro5026-bib-0042] Lensink MF, Brysbaert G, Raouraoua N, Bates PA, Giulini M, Honorato RV, et al. Impact of AlphaFold on structure prediction of protein complexes: the CASP15‐CAPRI experiment. Proteins. 2023;91(12):1658–1683.

[pro5026-bib-0043] Li M, Petukh M, Alexov E, Panchenko AR. Predicting the impact of missense mutations on protein‐protein binding affinity. J Chem Theory Comput. 2014;10(4):1770–1780.

[pro5026-bib-0044] Mabonga L, Kappo AP. Protein‐protein interaction modulators: advances, successes and remaining challenges. Biophys Rev. 2019;11(4):559–581.

[pro5026-bib-0045] McConkey BJ, Sobolev V, Edelman M. Quantification of protein surfaces, volumes and atom–atom contacts using a constrained Voronoi procedure. Bioinformatics. 2002;18(10):1365–1373.

[pro5026-bib-0046] Moreira IS, Fernandes PA, Ramos MJ. Computational alanine scanning mutagenesis—an improved methodological approach. J Comput Chem. 2007;28(3):644–654.

[pro5026-bib-0047] Mukherjee I, Chakrabarti S. Co‐evolutionary landscape at the interface and non‐interface regions of protein‐protein interaction complexes. Comput Struct Biotechnol J. 2021;19:3779–3795.

[pro5026-bib-0048] Nadalin F, Carbone A. Protein–protein interaction specificity is captured by contact preferences and interface composition. Bioinformatics. 2018;34(3):459–468.

[pro5026-bib-0049] Northey T, Baresic A, Martin ACR. IntPred: a structure‐based predictor of protein‐protein interaction sites. Bioinformatics. 2017;34:223–229.

[pro5026-bib-0050] Ofran Y, Rost B. Analysing six types of protein‐protein interfaces. J Mol Biol. 2003;325(2):377–387.

[pro5026-bib-0051] Ohtaka H, Schön A, Freire E. Multidrug resistance to HIV‐1 protease inhibition requires cooperative coupling between distal mutations. Biochemistry. 2003;42(46):13659–13666.

[pro5026-bib-0052] Pierce BG, Hourai Y, Weng ZP. Accelerating protein docking in ZDOCK using an advanced 3D convolution library. PLoS One. 2011;6(9):e24657.

[pro5026-bib-0053] Pillai AS, Hochberg GKA, Thornton JW. Simple mechanisms for the evolution of protein complexity. Protein Sci. 2022;31(11):e4449.

[pro5026-bib-0054] Ramadoss V, Dehez F, Chipot C. AlaScan: a graphical user interface for alanine scanning free‐energy calculations. J Chem Inf Model. 2016;56(6):1122–1126.

[pro5026-bib-0055] Raucci R, Laine E, Carbone A. Local interaction signal analysis predicts protein‐protein binding affinity. Structure. 2018;26(6):905–915.e4.

[pro5026-bib-0056] Sacquin‐Mora S, Carbone A, Lavery R. Identification of protein interaction partners and protein‐protein interaction sites. J Mol Biol. 2008;382(5):1276–1289.

[pro5026-bib-0057] Seo M‐H, Kim PM. The present and the future of motif‐mediated protein–protein interactions. Curr Opin Struct Biol. 2018;50:162–170.

[pro5026-bib-0058] Seoane B, Carbone A. The complexity of protein interactions unravelled from structural disorder. PLoS Comput Biol. 2021;17(1):e1008546.

[pro5026-bib-0059] Shrestha R, Fajardo JE, Fiser A. Residue‐based pharmacophore approaches to study protein‐protein interactions. Curr Opin Struct Biol. 2021;67:205–211.

[pro5026-bib-0060] Shrestha R, Garrett SC, Almo SC, Fiser A. Computational redesign of PD‐1 interface for PD‐L1 ligand selectivity. Structure. 2019;27(5):829–836.

[pro5026-bib-0061] Shrestha R, Garrett‐Thomson SC, Liu W, Almo SC, Fiser A. Redesigning HVEM interface for selective binding to LIGHT, BTLA, and CD160. Structure. 2020;28:1197–1205.e2.

[pro5026-bib-0062] Skolnick J, Gao M, Zhou H, Singh S. AlphaFold 2: why it works and its implications for understanding the relationships of protein sequence, structure, and function. J Chem Inf Model. 2021;61(10):4827–4831.

[pro5026-bib-0063] Sobolev V, Sorokine A, Prilusky J, Abola EE, Edelman M. Automated analysis of interatomic contacts in proteins. Bioinformatics. 1999;15(4):327–332.

[pro5026-bib-0064] Taylor MG, Kirsch JF, Rajpal A. Kinetic epitope mapping of the chicken lysozyme·HyHEL‐10 Fab complex: delineation of docking trajectories. Protein Sci. 1998;7(9):1857–1867.

[pro5026-bib-0065] Tonddast‐Navaei S, Skolnick J. Are protein‐protein interfaces special regions on a protein's surface? J Chem Phys. 2015;143(24):243149.

[pro5026-bib-0066] Vandenborre K, Van Gool S, Kasran A, Ceuppens J, Boogaerts M, Vandenberghe P. Interaction of CTLA‐4 (CD152) with CD80 or CD86 inhibits human T‐cell activation. Immunology. 1999;98(3):413–421.

[pro5026-bib-0067] Vaughan CK, Buckle AM, Fersht AR. Structural response to mutation at a protein‐protein interface. J Mol Biol. 1999;286(5):1487.

[pro5026-bib-0068] Viswanathan R, Fajardo E, Steinberg G, Haller M, Fiser A. Protein‐protein binding supersites. PLoS Comput Biol. 2019;15(1):e1006704.

[pro5026-bib-0069] Walder M, Edelstein E, Carroll M, Lazarev S, Fajardo JE, Fiser A, et al. Integrated structure‐based protein interface prediction. BMC Bioinformatics. 2022;23(1):301.

[pro5026-bib-0070] Williams AD, Shivaprasad S, Wetzel R. Alanine scanning mutagenesis of Aβ (1‐40) amyloid fibril stability. J Mol Biol. 2006;357(4):1283–1294.

[pro5026-bib-0071] Wodak SJ, Vajda S, Lensink MF, Kozakov D, Bates PA. Critical assessment of methods for predicting the 3D structure of proteins and protein complexes. Annu Rev Biophys. 2023;52:183–206.

[pro5026-bib-0072] Xue LC, Dobbs D, Bonvin AM, Honavar V. Computational prediction of protein interfaces: a review of data driven methods. FEBS Lett. 2015;589(23):3516–3526.

[pro5026-bib-0073] Yap EH, Fiser A. ProtLID, a residue‐based pharmacophore approach to identify cognate protein ligands in the immunoglobulin superfamily. Structure. 2016;24(12):2217–2226.

[pro5026-bib-0074] Zhao Z, Gong X. Protein‐protein interaction interface residue pair prediction based on deep learning architecture. IEEE/ACM Trans Comput Biol Bioinform. 2017;16(5):1753–1759.
