Skip to main content
eLife logoLink to eLife
. 2023 Feb 17;12:e81850. doi: 10.7554/eLife.81850

Coevolution-based prediction of key allosteric residues for protein function regulation

Juan Xie 1, Weilin Zhang 2, Xiaolei Zhu 3, Minghua Deng 1,4,5, Luhua Lai 1,2,6,
Editors: Shozeb Haider7, Volker Dötsch8
PMCID: PMC9981151  PMID: 36799896

Abstract

Allostery is fundamental to many biological processes. Due to the distant regulation nature, how allosteric mutations, modifications, and effector binding impact protein function is difficult to forecast. In protein engineering, remote mutations cannot be rationally designed without large-scale experimental screening. Allosteric drugs have raised much attention due to their high specificity and possibility of overcoming existing drug-resistant mutations. However, optimization of allosteric compounds remains challenging. Here, we developed a novel computational method KeyAlloSite to predict allosteric site and to identify key allosteric residues (allo-residues) based on the evolutionary coupling model. We found that protein allosteric sites are strongly coupled to orthosteric site compared to non-functional sites. We further inferred key allo-residues by pairwise comparing the difference of evolutionary coupling scores of each residue in the allosteric pocket with the functional site. Our predicted key allo-residues are in accordance with previous experimental studies for typical allosteric proteins like BCR-ABL1, Tar, and PDZ3, as well as key cancer mutations. We also showed that KeyAlloSite can be used to predict key allosteric residues distant from the catalytic site that are important for enzyme catalysis. Our study demonstrates that weak coevolutionary couplings contain important information of protein allosteric regulation function. KeyAlloSite can be applied in studying the evolution of protein allosteric regulation, designing and optimizing allosteric drugs, and performing functional protein design and enzyme engineering.

Research organism: None

Introduction

Allostery commonly refers to one type of distant regulation, that is, a perturbation at one site of a macromolecule can affect the function of another site (Dokholyan, 2016), which plays important roles in many biological processes, such as enzyme catalysis (Tsai et al., 2009) and signal transduction (Hilser et al., 2012). Compared to traditional orthosteric drugs, allosteric drugs have unique advantages, including higher specificity, fewer side effects, etc. (Changeux and Christopoulos, 2016; Thal et al., 2018; Wenthur et al., 2014). However, optimization of allosteric molecules faces great challenges as allosteric molecules usually have flat structure-activity relationships (flat SARs or shallow SARs, referring to the phenomenon that no robust SARs could be obtained as small modifications may destroy the activity), and higher binding affinity does not always correspond to better activity (Christopoulos, 2002; Jimenez et al., 2012; Lewis et al., 2008; Lindsley, 2014; Nussinov and Tsai, 2012). In an allosteric pocket, the contribution of each residue to the allosteric effect is different. Bi et al. found that the interactions between allosteric molecules and the target protein can be divided into two types, interactions that only contribute to binding and interactions that contribute to both binding and signaling. Based on these understandings, they rationally designed Tar variants and engineered Escherichia coli to sense new ligands by maintaining the interactions responsible for chemotaxis allosteric signaling while changing the interactions responsible only for ligand binding (Bi et al., 2013). Nussinov et al. proposed that atoms in allosteric effectors could be divided into anchor and driver atoms. The anchors docked into the allosteric pockets, which allowed the drivers to perform a ‘pull’ or ‘push’ action (Nussinov and Tsai, 2014). Both drivers and anchors showed specific interactions with their host proteins, with the former mainly responsible for the allosteric efficacy and the latter for binding affinity (Nussinov et al., 2014). Therefore, it is important to identify residues in the host proteins that form interactions with allosteric molecules and produce allosteric signaling, which we refer to as key allo-residues, so that allosteric molecules can be optimized and designed based on these key allo-residues. Unfortunately, identifying key allo-residues remains challenging. Currently available computational methods mainly focused on the prediction of allosteric sites (Ma et al., 2016; Qi et al., 2012; Wagner et al., 2016; Xie et al., 2022), allosteric pathways (Botello-Smith and Luo, 2019; Lake et al., 2020), and key residues in allosteric pathways (Wang et al., 2020). Kalescky et al. developed the rigid residue scan method to identify key residues for protein allostery, in which multiple molecular dynamics (MD) simulations need to be performed for unbound and bound proteins. As only one residue was regarded as a single rigid body in each simulation, many simulations were necessary, which are computationally expensive and time-consuming (Kalescky et al., 2015). Therefore, methods for systematically and rapidly identifying key allo-residues in protein allosteric pockets need to be developed.

During evolution, unrelated residues may evolve independently, while functionally coupled residues coevolve. Coevolution means that when a residue changes, the residues that are structurally or functionally coupled with it will also change accordingly to maintain the overall spatial structure and biological function (de Juan et al., 2013). In principle, since homologous sequences record the long-term evolution of a protein family, the coupling pattern between residues can be estimated from multiple sequence alignment (MSA; Reynolds et al., 2011).

Various methods have been developed to analyze residue-residue coupling during evolution, which greatly expedite the recent progress of protein structure prediction (Ekeberg et al., 2013; Marks et al., 2011; Morcos et al., 2011). Direct coupling analysis (DCA) is one of such approaches that can remove the indirect correlation between residues and reflect the direct coevolution between residues (Cocco et al., 2013). DCA mainly uses methods in statistical physics to infer the pairwise coupling Jij between positions, which can explain the observed correlation between residues in an MSA. In structure predictions, only the top couplings in Jij were used (Morcos et al., 2011; Weigt et al., 2009). Recent studies showed that the weak, non-contact couplings in Jij are significantly important for the prediction of protein function, although they play as noise in predicting structural contacts. Salinas et al. proposed that the information of allosteric energy interactions is included in the statistics of MSAs and therefore is part of the entire evolutionary constraints (Salinas and Ranganathan, 2018). Russ et al. found that the top coupling items in Jij alone cannot effectively reproduce the alignment statistics in the AroQ family or the functional effects of mutations. This implies that protein functions may depend on many weak, non-contact items in Jij . Although there is no simple physical explanation for these weak items at present, they seem to represent the collective global evolution of residues, and further research is needed to reveal the significance of these items (Russ et al., 2020). Similar findings have been reported in the DCA-based prediction of protein-protein interaction (PPI), where the quality of prediction depends on many weak couplings (Bitbol, 2018).

In the present study, we analyzed the evolutionary couplings (ECs) between residues in orthosteric and allosteric sites. We found that weak couplings in Jij contain allosteric information and developed the first systematic and efficient computational method KeyAlloSite to predict key allo-residues based on the EC model (ECM; see Materials and methods for details). For each protein in the allosteric protein data set, we first performed MSA (Figure 1A) and calculated the ECs between residues using the ECM (Figure 1B). We then studied the coevolution between orthosteric and allosteric sites and found that their EC strength (ECS) is stronger than that between orthosteric and other non-functional pockets. We further performed pairwise comparison of the differences in EC scores of residues in the allosteric pocket with orthosteric pocket (Figure 1C–E; see Materials and methods for details) and got the number of significant differences corresponding to each residue in the allosteric pocket (Figure 1F). After the numbers of significant differences were normalized into Z-scores, residues corresponding to Z-scores larger than a threshold were predicted as key allo-residues. We have applied KeyAlloSite to identify key allo-residues in several allosteric proteins, including BCR-ABL1, Tar, and PDZ3, and compared the prediction results with previously reported experimental data. KeyAlloSite was also used to predict cancer mutations as well as key distant residues for enzymatic catalysis. Our study provides essential information for understanding how allosteric regulations are evolved, for designing and optimizing allosteric drugs, and for designing highly efficient enzymes and other functional proteins.

Figure 1. Steps to identify key allo-residues.

Figure 1.

(A) Multiple sequence alignment. (B) Evolutionary coupling (EC) analysis. (C–D) Calculation of the EC values between residues in allosteric and orthosteric pockets. (E) Pairwise compared the difference of EC values corresponding to residues in allosteric pocket. (F) The number of significant differences corresponding to each residue in allosteric pocket.

Results

The evolutionary coupling between orthosteric and allosteric sites is stronger

We selected 23 allosteric proteins from the ‘Core Set’ of ASBench (Huang et al., 2015) as our data set, including 25 known allosteric sites (Supplementary file 1; see Materials and methods for details). The sequence lengths of the proteins in the data set range from 166 to 788 amino acid residues (Figure 2A), and the number of homologous sequences and effective homologous sequences corresponding to each protein were shown in Figure 2B. For each protein in the data set, we used CAVITY (Xu et al., 2018; Yuan et al., 2013) to identify all the potential ligand binding pockets on the protein surface and designated the mth pocket as cavity_m. Previous studies showed that motions of orthosteric and allosteric sites are highly correlated (Ma et al., 2016; Xie et al., 2022; Zhang et al., 2019), and the regulation between orthosteric and allosteric site is bidirectional (An et al., 2019). It would be interesting to see whether orthosteric and allosteric sites are coupled in evolution. We then explored the ECS (ECScavity_m) between the orthosteric and allosteric pocket, as well as all the other pockets. The ECS between the orthosteric pocket and the mth pocket is defined as the sum of the coupling strength between the residues in the two pockets (see Materials and methods for details). Among the 25 allosteric pockets in the data set, 23 have Z-scores greater than 0.5 (Figure 2C, Supplementary file 2), which means that the recall of KeyAlloSite on predicting allosteric sites is 0.92 (Supplementary file 8). The probabilities that the known allosteric pockets were ranked in the top 1, top 2, and top 3 of Z-scores were 56.0, 76.0, and 96.0%, respectively (Figure 2D, Supplementary file 2), indicating that orthosteric and allosteric pockets are more evolutionarily coupled to each other than the orthosteric and other pockets, which can be used to predict potential allosteric pockets. We further analyzed the two proteins with Z-scores less than 0.5, AR1 and CYP3A4. For AR1, as there were only 108 effective homologous sequences, we speculated that the number of homologous sequences may not be enough for evolutionary analysis. The protein sequence–based phylogenetic tree of AR1 homologous proteins showed that AR1 located near the tail of the tree (Figure 2—figure supplement 1), implying that this allosteric function did not exist in the early evolutionary period, and it may have appeared in the later stage of evolution. Due to the relatively large number of sequences in the early evolutionary period and relatively few sequences in the later stage of evolution, the allosteric signal was weak. As a cytochrome P450 protein, CYP3A4 can bind and catalyze the transformation of a variety of substrates (Williams et al., 2004). Sequence alignments showed that several positions in its orthosteric pocket are less conserved, which may lead to the difficulty in allosteric site prediction based on evolutional coupling analysis.

Figure 2. Z-scores of allosteric pockets and probabilities of ranking an allosteric pocket in the top 3.

(A) The sequence lengths of all proteins in our data set. (B) The number of homologous sequences. Neff represents the number of effective homologous sequences obtained under 80% reweighting. (C) Z-scores of allosteric pockets on proteins in the data set. Among the 25 allosteric pockets, the Z-scores of 23 allosteric pockets were greater than 0.5. (D) The probabilities that the known allosteric pockets were ranked in the top 1, top 2, and top 3.

Figure 2—source data 1. Raw data for Figure 2.

Figure 2.

Figure 2—figure supplement 1. Phylogenetic tree of the androgen receptor.

Figure 2—figure supplement 1.

Figure 2—figure supplement 2. Comparison of evolutionary coupling strength between pockets when all residue pairs and partial residue pairs were used.

Figure 2—figure supplement 2.

(A) Prediction accuracy of using different numbers of residue pairs. We defined that the criterion for successful prediction is that the Z-score of the allosteric pocket is greater than 0.5. (B) The probabilities that the known allosteric pockets were ranked in the top 3 when using different numbers of residue pairs. (C) The distribution of Z-scores for the allosteric pockets when different numbers of residue pairs were used.
Figure 2—figure supplement 2—source data 1. Raw data for Figure 2—figure supplement 2.
Figure 2—figure supplement 3. Difference between the evolutionary coupling between orthosteric and allosteric sites and the evolutionary coupling between two random patches.

Figure 2—figure supplement 3.

Two residues that are not part of the orthosteric and allosteric sites were randomly selected from the surface residues of proteins. Among them, one was taken as the first center, and the residues around it with the same number as the residues in orthosteric pocket were selected as patch1; and the other residue was taken as the second center, and the residues around it with the same number as the residues in allosteric pocket were selected as patch2. Then we calculated the evolutionary coupling strength between patch1 and patch2. The process was repeated four times, and the mean and standard deviation of the evolutionary coupling strength were calculated. Finally, we compared the evolutionary coupling strength between patch1 and patch2 with that between orthosteric and allosteric sites by using the student’s t-test. If the p value was less than 0.05, we marked it with an asterisk; if the p value was less than 0.01, we marked it with two asterisks.
Figure 2—figure supplement 3—source data 1. Raw data for Figure 2—figure supplement 3.

We used human Aurora A (AurA) kinase that is not included in the data set as a test case to further verify whether the ECS can be used to predict allosteric sites. AurA (PDB ID: 1OL5) is a Ser-Thr protein kinase that is essential for the cell cycle progression. Its abnormal levels can lead to inappropriate centrosome maturation, spindle formation, and enhanced cancer growth (Toji et al., 2004). AurA is known to be regulated by two distinct allosteric mechanisms, one is specific PPI, which binds TPX2 to its hydrophobic pocket, and the other is phosphorylation of the activation loop at T288 (pT288; Hadzipasic et al., 2020). We used CAVITY to find all of the potential ligand binding pockets on the surface of AurA, and a total of 12 pockets were found; cavity_3 is the known allosteric PPI pocket, and cavity_2 is the orthosteric pocket. For consistency, we chose residues within 6 Å around the ATP molecule as the orthosteric pocket. Then we calculated the ECS between the orthosteric pocket and each of the remaining 11 pockets. Cavity_3 ranked the second among the 11 pockets with a Z-score of 1.48 (Supplementary file 3), indicating that the ECS between the orthosteric and allosteric pockets is indeed stronger. Since phosphorylation mainly occurs on Serine/Threonine/Tyrosine (Ser/Thr/Tyr) residues, we then calculated the ECS between the orthosteric pocket and each of the 26 exposed Ser/Thr/Tyr residues and normalized to Z-scores. Among the 26 Ser/Thr/Tyr residues, 10 residues have Z-scores larger than 0.5, and T288 and T287 ranked the fifth and the fourth with a Z-score of 0.83 and 1.05, respectively (Supplementary file 3). This indicates that KeyAlloSite can also be used to predict post-translational modification (PTM) sites, and the predicted Ser/Thr/Tyr residues with Z-scores greater than 0.5 in addition to T288 and T287 are worth further investigation.

To exclude the possible influence of the pocket size, we further checked the dependence of ECScavity_m on the number of residues used in the calculation. For allosteric and other pockets in each protein, we summed the ECS of the top 200, top 300, and top 400 residue-residue pairs with the highest FNi,j corresponding to each pocket as the ECS between each pocket (except for orthosteric pocket) and orthosteric pocket. When different numbers of residue-residue pairs were used, the Z-scores corresponding to the ECS between allosteric and orthosteric pockets in most proteins were still greater than 0.5, which is weakly different from that when all residue-residue pairs were used (Figure 2—figure supplement 2). This indicates that the number of residues in the mth pocket does not play important role on the ECS between pockets. In other words, ECScavity_m revealed the intrinsic EC between allosteric and orthosteric pockets. These results indicate that the ECS between orthosteric and allosteric pockets is stronger than that between orthosteric and other pockets, which can be used to predict potential allosteric pockets.

We further checked whether the coevolutionary signal between allosteric and orthosteric pockets is significantly different from that between two random patches in proteins with the same number of residue pairs. For each protein in the data set, two residues that are not part of the orthosteric and allosteric sites were randomly selected from the surface residues. Among them, one was taken as the first center, and the residues around it with the same number as the residues in orthosteric pocket were selected as patch1; and the other residue was taken as the second center, and the residues around it with the same number as the residues in allosteric pocket were selected as patch2. Then we calculated the ECS between patch1 and patch2. The process was repeated four times, and then the mean and standard deviation of the ECS were calculated. We then compared the ECS between patch1 and patch2 with that between orthosteric and allosteric sites. The results showed that the ECS between orthosteric and allosteric sites was significantly higher than that between two random patches (Figure 2—figure supplement 3). In other words, there is intrinsic EC between orthosteric and allosteric sites, which is different from the EC between any two random patches.

Coevolution analysis revealed key allo-residues in allosteric pockets

For each protein, we calculated the EC values between the residues in the orthosteric and allosteric pockets by ECM and compared the corresponding pairwise EC values of the residues in the allosteric pocket (Figure 1C–E, Figure 3—figure supplement 2). Since orthosteric and allosteric pockets are two different pockets, the residues in the two pockets generally do not contact. Therefore, most of the EC values between residues in the two pockets are relatively small, that is, they correspond to the weak terms in Jij (Figure 1D). We then calculated the number of significant differences of each residue in the allosteric pocket and normalized to Z-scores. Finally, residues were predicted as key allo-residues if their corresponding Z-scores were greater than 0.8 (Supplementary file 4). We chose this threshold to ensure that the most known key allo-residues can be correctly predicted by KeyAlloSite, and at the same time, the number of predicted key allo-residues should be as small as possible. We calculated the number of known key allo-residues that could be predicted by KeyAlloSite (Supplementary file 5) and the ratios of the predicted key allo-residues in all residues of allosteric pockets in all proteins of the data set (Figure 3—figure supplement 1) for thresholds of 0.5, 0.6, 0.7, 0.8, 0.9, and 1.0. Therefore, we finally chose the threshold as 0.8. For the allosteric pockets, the average number of pocket residues is 43, and the average number of identified key allo-residues is 8, accounting for 18.6% of all the residues in the allosteric site (Figure 3).

Figure 3. The number of predicted key allo-residues.

Number of residues refers to the number of residues from allosteric pockets, including the number of all residues in allosteric pockets and predicted key allo-residues.

Figure 3—source data 1. Raw data for Figure 3.

Figure 3.

Figure 3—figure supplement 1. Distribution of the ratios of the number of key allo-residues predicted by KeyAlloSite in the number of all residues in allosteric pockets when using different cutoffs in all proteins.

Figure 3—figure supplement 1.

Figure 3—figure supplement 1—source data 1. Raw data for Figure 3—figure supplement 1.
Figure 3—figure supplement 2. Examples of distributions of the statistics corresponding to significant scores obtained from the t-test.

Figure 3—figure supplement 2.

These three distributions are the distributions of the statistics in BCR-ABL1, Tar, and PDZ3.
Figure 3—figure supplement 3. Random sampling of homologous sequences.

Figure 3—figure supplement 3.

For each of the seven proteins, we randomly sampled different numbers of homologous sequences such as 1 L, 2 L, and so on. The ratio refers to the proportion of identical key allo-residues identified by all and different numbers of homologous sequences. The process was repeated three times, and the mean and standard deviation of the ratios were calculated. L: length of protein.
Figure 3—figure supplement 3—source data 1. Raw data for Figure 3—figure supplement 3.

Since the number of homologous sequences is important in coevolution analysis, we selected seven proteins with a relatively large number of homologous sequences and randomly sampled different numbers of homologous sequences. Since the number of homologous sequences required might be related to the sequence length of the protein itself, the number of homologous sequences was divided by the length of the protein to obtain a ratio. Within this ratio, different ratios of homologous sequences were randomly sampled according to different gradients, and each gradient was repeated three times. Then we calculated how many key allo-residues determined by homologous sequences with different gradients were the same as those determined by all homologous sequences. Taking the key allo-residues determined by all homologous sequences as references, we calculated the proportion of the same residues (Figure 3—figure supplement 3). It can be seen that generally speaking, 7 L (±4L) number of effective homologous sequences is enough to give good and stable results. If the protein sequence was relatively short, the number of homologous sequences required could be less; otherwise, more homologous sequences were needed.

The predicted key allo-residues were supported by experimental results

We searched for literatures to see whether the key allo-residues we predicted were experimentally tested before. The first example is tyrosine-protein kinase ABL1 (BCR-ABL1), which is a fusion protein whose constitutive activity can cause chronic myeloid leukemia (CML). Tyrosine kinase inhibitors targeting the ABL1 ATP-binding site, such as imatinib (Gleevec) and nilotinib (Tasigna), significantly improved the overall survival of CML patients (Kalmanti et al., 2015; Miura, 2015). However, patients may develop drug resistance due to mutations in the ATP-site. The novel fourth generation ABL1 drug, asciminib (ABL001) was developed, which is an allosteric inhibitor that binds to the myristoyl pocket of BCR-ABL1 (Figure 4A). Asciminib was developed from fragment-based drug discovery approach. In the early stage of hit identification, compounds that bind BCR-ABL1 without inhibition activity were found. Among them, hit 4 binds BCR-ABL1 with a Kd of 6 μM. After changing the Cl atom at the para-position of the aniline to the CF3O- group, hit 5 showed inhibition activity with a slightly weakened Kd of 10 μM compared to hit 4 (Figure 4C; Schoepfer et al., 2018). This shows that the interaction between CF3O- and BCR-ABL1 is essential for the allosteric signaling and inhibitory activity. This group forms favorable hydrophobic interaction with L359, one of the key allo-residues predicted by our method (Figure 4B). In contrast, there is no favorable interaction between hit 4 and L359. These experimental evidences support that the predicted allo-residue L359 plays key role in allosteric signaling.

Figure 4. Key allo-residues predicted in BCR-ABL1.

Figure 4.

(A) The crystal structure of the kinase domain of BCR-ABL1. The allosteric inhibitor asciminib, represented by sticks, binds to the myristoyl pocket (marine). (B) Predicted key allo-residues in the myristoyl pocket. The predicted key allo-residues are represented by marine sticks. One of the predicted key allo-residues, L359, forms a favorable hydrophobic interaction with a fluorine atom in asciminib, represented by a red dashed line. Water is represented by a red sphere. (C) The structure of fragment-derived hit 4 and hit 5 and the final marketed drug asciminib.

Figure 4—source data 1. Raw data for Figure 4.

In the allosteric pocket of the asciminib binding site which contains 44 residues, we predicted 7 key allo-residues. In addition to L359, R479, V525, Y454, E450, T453, and T364 were also identified as potential key allo-residues (Supplementary file 6). T453 forms favorable hydrophobic interaction with the pyrazole ring in asciminib, and Y454 participates in the water-mediated H-bond with the oxygen atom in asciminib. Previous studies have shown that the conformational state of helix-I is important for functional activity, and V525 serves as a good indicator for the conformational change. Functional antagonists binding to the myristoyl pocket can bend helix-I and make the disordered region that V525 locates become ordered (Jahnke et al., 2010; Schoepfer et al., 2018). This indicates that V525 plays key role in allosterically regulating ABL1 kinase activity. At the same time, it also shows that coevolutionary information can help to capture multiple allosteric functional conformations of proteins (Morcos et al., 2013).

KeyAlloSite correctly identified key allo-residues in other proteins not in the data set

We further tested KeyAlloSite on proteins not included in the data set. The first protein is the E. coli aspartate (Asp) chemoreceptor Tar. Tar mediates the chemotaxis of bacteria toward attractants, such as Asp, and away from repellents. Tar is a homodimer during transmembrane signaling, and the signals are transmitted from the extracellular region of Tar to the cytoplasmic region through the transmembrane domain (Mise, 2016). Bi et al. discovered six new attractants and two new antagonists of Tar by computational virtual screening and experimental study. By comparing the binding patterns of attractants and antagonists, they found that the interactions between the chemoeffectors and Y149 and/or Q152 in Tar are critical for attractant chemotactic signaling (Bi et al., 2013). We chose the holo structure that binds Asp (PDB ID: 4Z9H) for analysis and used the CAVITY to identify all potential ligand binding pockets on the surface of chain B. Among the three pockets found, cavity_2 is the pocket where Asp binds, containing 21 residues, which we referred as the allosteric pocket. Since previous studies proposed that transmembrane signaling is triggered by the relative piston-like downward sliding of the α4 helix in the periplasmic domain (Mise, 2016), we chose the 16 residues (A166-T181) in the C-terminal of the α4 helix as the orthosteric site (Figure 5A). Through the coevolutionary analysis of residues in the orthosteric and allosteric sites, KeyAlloSite identified six key allo-residues in cavity_2, which were I65', F150', Y149', Q152', P153', and T154' (Figure 5B). It can be seen that our method could predict the key allo-residues Y149 and Q152. We also tested KeyAlloSite on the apo structure of Tar (PDB ID: 4Z9J). Of the eight pockets found by CAVITY, cavity_5 is the allosteric pocket, which contains 24 residues. As in the holo structure, the 16 residues (A166-T181) in the C-terminal of the α4 helix were chosen as the orthosteric site. In the prediction results, Y149' and Q152' ranked fourth and fifth among the 24 residues, with corresponding Z-scores of 1.01 and 0.66, respectively. Therefore, KeyAlloSite could also correctly predict the key allo-residue Y149 in the apo structure. For Q152, its Z-score is slightly smaller than the threshold of 0.8, though with a high ranking. This indicates that conformational changes do have subtle influence on the predicted results, probably mainly due to the change of residue composition in the allosteric pocket detected by CAVITY, which will lead to some fluctuations in the predicted key allo-residues. However, when the conformational changes between apo and holo states are not large, the influence on the results is small.

Figure 5. The key allo-residues predicted by our method in Tar and PDZ3.

Figure 5.

(A) The crystal structure of holo-Tar. Aspartate (Asp) is represented by magenta sticks, the allosteric pocket is represented by marine surface, and the salmon helix is selected as the orthosteric site. (B) The key allo-residues predicted at the Asp-binding site. The predicted key allo-residues in the allosteric cavity_2 are represented by marine sticks, among which Y149 and Q152 are the true key allo-residues that have been confirmed by experiments. Hydrogen bonds are shown as red dash lines. (C) The predicted key allo-residues in PDZ3. The peptide bound to the orthosteric site is represented by salmon sticks, the allosteric pocket is represented by marine surface, and the predicted key allo-residues are represented by marine sticks.

Figure 5—source data 1. Raw data for Figure 5.

The second protein is the PDZ3 domain, and its allosteric mechanism has been extensively studied. We selected the crystal structure of PDZ3 binding with a peptide in its orthosteric site for analysis (PDB ID: 1BE9; Doyle et al., 1996). Since the allosteric site of this structure does not bind an allosteric ligand, we used CAVITY to find the potential ligand binding pockets on its surface. Among the three pockets identified, cavity_1 is the orthosteric pocket, and cavity_2 contains the known allosteric sites, which were used for further analysis. KeyAlloSite predicted D306, S409, L302, A347, and L353 as key allo-residues (Figure 5C). Kalescky et al. used rigid residue scan to identify residues that are important for the allosteric effect of the PDZ3 domain. In the rigid residue scan, only one residue was regarded as a single rigid body in each MD simulation. They proposed that A347 is a ‘switch residue’, which is needed to turn on the allosteric effect (Kalescky et al., 2015). Lockless et al. used evolutionary data of protein families to measure the statistical coupling between amino acid positions. For the PDZ protein family, they found that there are strong statistical couplings between A347 and L353 and the key residue H372 of the orthosteric site, and verified using thermodynamic mutational studies (Lockless and Ranganathan, 1999). Moreover, McLaughlin et al. developed a high-throughput quantitative method that can individually replace a residue at each position with every other residue for comprehensive single-mutation studies. Their results showed that mutations of A347 and L353 caused significant functional loss (McLaughlin et al., 2012). These evidences all indicate that the key allo-residues A347 and L353 we predicted are important for the protein function by allosteric regulation.

KeyAlloSite identified pathogenetic mutations in human proteins

Previous studies have shown that allosteric mutation, that is, abnormal protein allosteric regulation caused by mutation is related to pathological processes such as cancer (Kurochkin et al., 2017). Shen et al. analyzed the dysfunction of allosteric proteins caused by somatic mutations in about 7000 cancer genomes across 33 cancer types, mapped these mutations to allosteric sites, orthosteric sites, and other sites in the Allosteric Database and established the Allo-Mutation data set (Shen et al., 2017). We searched for the somatic mutation data corresponding to the human proteins in our data set from the Allo-Mutation data set, and found that 11 of a total of 51 predicted key allo-residues in 7 human proteins were mutated in a variety of cancers (Table 1). Among them, cancers that contain a large number of mutations in key allo-residues are uterine corpus endometrial carcinoma and skin cutaneous melanoma. This indicates that the abnormal allosteric regulation caused by the mutation of key allo-residues plays key role in the occurrence and development of cancer. These key allo-residues can affect allosteric signal transduction and thus affect protein functions, suggesting that KeyAlloSite can be used to predict key pathogenetic mutations in proteins.

Table 1. Predicted key allo-residues that were mutated in cancers.

Protein Gene Predicted key allo-residues Mutation* Cancer type
AR1 AR D732 D732N SKCM
AR2 AR M832 M832I SKCM
PTP-1B PTPN1 M282 M282T COAD
CDK2 CDK2 P155 P155H UCEC
CK2alpha CSNK2A1 F54; A110 F54C; A110T UCEC; UCEC, GBM
MAPK14 MAPK14 P191; E192 P191S; P191H; E192Q SKCM; KIRC; BLCA
MAPK8 MAPK8 E195; M200 E195K; M200I UCEC; SKCM
CYP3A4 CYP3A4 F219 F219L UCEC
*

Mutation: confirmed disease mutations among the predicted key allo-residues.

Cancer type: COAD: colon adenocarcinoma; SKCM: skin cutaneous melanoma; UCEC: uterine corpus endometrial carcinoma; GBM: glioblastoma multiforme; KIRC: kidney renal clear cell carcinoma; BLCA: bladder urothelial carcinoma.

KeyAlloSite can also identify key allosteric functional residues of enzymes

Enzyme evolution studies mainly focus on mutations in the substrate binding pocket (Wu et al., 2013; Yi et al., 2011). However, covariant residues far from the substrate binding site may also play an important role in regulating catalysis, which are difficult to identify. Since KeyAlloSite can calculate the functional coupling between residues outside and inside the orthosteric pocket, we wonder whether it can also be used to identify key allo-residues for enzyme activity regulation. We used Candida antarctica lipase B (CALB, PDB ID: 1TCA), which has many annotated key functional residues (Wu et al., 2020), as a test case. CALB is one of the most widely used biocatalysts in academia and industry that is often applied in acylating kinetic resolution of racemic alcohols and amines and desymmetrization of diols and diacetates, and is robust and easy to express (Wu et al., 2013). We selected the six angstrom-cutoff orthosteric pocket and calculated the EC values between each residue outside the orthosteric pocket and the residues in the orthosteric pocket. Then we compared the difference in EC values corresponding to each residue outside the orthosteric pocket and calculated the significant difference number di corresponding to each residue. Finally, the significant difference number di was normalized to Z-score. Due to the large number of residues outside the orthosteric pocket, residues with Z-scores greater than 0.9 were referred as key allo-residues here. Our method predicted a total of 52 residues from the 296 residues outside the orthosteric pocket, of which 20 residues have been annotated as functional residues in literature according to mutagenesis experiments (Figure 6A, Supplementary file 7). For example, A225 ranked the third out of the 52 predicted residues with a Z-score of 3.04, and A225M improves the catalytic efficiency of the enzyme by about 11 folds. V37 ranked the 35th out of the 52 predicted residues with a Z-score of 1.42, and V37I improves the catalytic efficiency of the enzyme by about threefolds (Wu et al., 2020). Therefore, our method can also be used to predict allosteric functional residues that are important for enzyme catalysis, providing a new computational tool for identifying mutant enzyme with improved catalytic properties.

Figure 6. KeyAlloSite predicted key allo-residues for enzymes.

Figure 6.

(A) KeyAlloSite predicted key allo-residues for Candida antarctica lipase B. Among the predicted residues, the residues that have been annotated by the literature are shown as marine spheres, and the orthosteric pocket is represented by salmon surface. (B) KeyAlloSite predicted key allo-residues for Escherichia coli chorismate mutase (CM). Experimentally discovered key functional residues of CM are shown as marine spheres, the labels of key allo-residues predicted by KeyAlloSite are shown in marine, and the orthosteric pocket and ligand are represented by salmon surface and sticks.

Figure 6—source data 1. Raw data for Figure 6.

Russ et al. recently used an evolution-based model to design chorismate mutase enzymes (CMs). They used DCA to learn the constraints for specifying proteins purely from evolutionary sequence data and performed Monte Carlo sampling from this model to generate artificial sequences. They were able to obtain proteins with natural-like catalytic function with sequence diversity. Eight residues (L40, L41, R44, D50, D83, L92, Q93, and H95) at the periphery of the active site were found to be important for CM catalysis in E. coli specific function (Russ et al., 2020; Figure 6B). We wonder if some of these residues regulate the enzyme function by allostery. We used the 6 angstrom-cutoff orthosteric pocket and identified 46 residues on the surface of the protein outside the orthosteric pocket. We calculated the EC values between each of these 46 residues and the orthosteric pocket and compared the difference in EC values of these 46 residues. We then calculated the significant difference number di corresponding to each of these 46 residues and normalized to Z-scores. Using the criterion of Z-score >0.9, we predicted 10 key allo-residues from the 48 residues, which were L86, A9, R44, Q89, E57, E12, L40, D69, R58, and N5. Among them, two residues (R44 and L40) were shown to be important for CM catalysis by Russ et al. And Russ et al. used DCA to show that sequence-based statistical models contain protein function information, while our study used DCA to predict allosteric sites or key allosteric residues based on the significance analysis of EC between these sites with the orthosteric site. Their study showed that the function of proteins seems to depend on many weak terms in Jij , which lack simple physical interpretations. Our study demonstrated that weak terms in Jij contain information for protein allosteric function evolution and can be used to predict allosteric sites and key allosteric residues.

To further verify the effectiveness of KeyAlloSite, we made indirect comparisons with the statistical coupling analysis (SCA) method. SCA predicts several groups of coevolved residues (sectors) that form physically continuous networks, often able to connect major functional site and allosteric sites, that is, allosteric pathways (Lockless and Ranganathan, 1999; Rivoire et al., 2016; Salinas and Ranganathan, 2018; Shulman et al., 2004; Süel et al., 2003). We analyzed all the proteins in our data set using SCA. We first compared the performance of SCA and KeyAlloSite in predicting known key allo-residues in allosteric sites (Supplementary file 9). For the known key allo-residue L359 in the BCR-ABL1 protein, it was not present in the sectors predicted by SCA, despite that the sectors contain 68 residues, while it could be correctly predicted by KeyAlloSite. For the known key allo-residues Y149 and Q152 in the Tar receptor, KeyAlloSite could correctly predict both of them. However, although the sectors predicted by SCA contained Y149 and Q152, it also included two residues (R69 and R73) that have been experimentally verified to contribute only to ligand binding and not to allosteric signaling (Bi et al., 2013). For the PDZ3, the sectors predicted by SCA contained the key allo-residues A347 and L353, which were also successfully predicted by KeyAlloSite. We further compared the performance of SCA and KeyAlloSite in predicting the key allosteric functional residues of enzymes. For the CALB, the sectors predicted by SCA missed one of the key allo-residues A225, which has been experimentally shown to have a great impact on enzyme activity, and the predicted known key allo-residues account for 32.8% of all the residues in the sectors, while KeyAlloSite could predict the key allo-residue A225, and the predicted known key allo-residues account for 38.5% of all the predicted key allo-residues. For the CMs, the sectors predicted by SCA contained only one key allo-residue D83, while KeyAlloSite could predict key allo-residues R44 and L40. For the KeyAlloSite correctly predicted functional phosphorylation sites T288 and T287 in the AurA, SCA missed both of them. Thus, KeyAlloSite performs better than SCA in predicting key allo-residues.

Discussion

Identifying key allosteric residues responsible for allosteric signaling is important for the design and optimization of allosteric drugs, enzyme, and protein engineering studies. We developed, KeyAlloSite, a novel method for predicting allosteric sites and key allo-residues based on the ECM. To the best of our knowledge, this is the first systematic and efficient computational method to predict key allo-residues. Our study demonstrated that orthosteric and allosteric pockets are coupled in protein function evolution. Our predicted key allo-residues are in accordance with previously reported experimental studies in the BCR-ABL1, Tar, and PDZ3 systems, as well as key cancer mutations. We further showed that KeyAlloSite can be applied to predict key allosteric residues distant from the catalytic site that are important for enzyme catalysis. Our study also gives a possible physical explanation for the weak couplings in Jij , that is, they may represent allosteric functional couplings. The predicted key allo-residues can help us to understand the mechanism of allosteric regulation, to provide reference and guidance for the rational design and optimization of allosteric drugs and to facilitate enzyme engineering.

It should be noted that due to limited available experimental information, for the protein systems that we tested, although all the predicted known key allo-residues ranked among top 20%, there are other residues among the top list with unidentified function. Our analysis predicted that these residues should play important roles in allosteric signaling. Further experimental studies are needed to verify their functions in the future. At the same time, our predicted list of key allo-residues greatly reduces the number of residues that needs to be verified experimentally. We expect future experimental studies on the functions of these newly predicted key ‘allo-residues’ can not only verify and demonstrate the predictive power of KeyAlloSite but also offer more data to improve it.

Several methods were developed to infer protein function based on the coevolutionary couplings between residues. For example, Hopf et al. developed EV mutation to predict the effects of mutations based on the coevolutionary couplings between residues (Hopf et al., 2017). The theoretical formulation for extracting the coupling scores between residues in the initial step of our method is the same as that of Hopf et al., but the problem studied by us is different from that of Hopf et al., and the usage of the coupling scores between residues in the later steps of the two methods is different. Hopf et al. used the coevolutionary coupling scores between residues to predict the effects of mutations by calculating the difference in statistical energy between mutant and wild-type sequences. In contrast, we used the coevolutionary coupling scores between residues to predict the allosteric sites and key allo-residues in allosteric pockets that are mainly responsible for allosteric signaling by pairwise comparing the difference of the coevolutionary coupling scores of residues in allosteric pockets. Although Hopf et al. highlighted the significance of the pairwise coupling term for the prediction of mutation effects, we highlighted the importance of the weak pairwise coupling term for the study of allosteric function.

As KeyAlloSite attempts to capture coevolutionary coupling between residues from MSA, it requires that MSA should contain sufficient homologous and diversified sequences. For the MSAs with only a few homologous sequences, KeyAlloSite usually cannot give accurate predictions. How to reduce the number of homologous sequences required remains further research.

The scores of the key allo-residues predicted by KeyAlloSite depend not only on the coevolutionary information but also on the residues contained in allosteric pockets. When different conformations of the protein are used, pocket detection method may give allosteric sites with slightly or largely different residues (depending on the difference of the conformations) that may influence the final KeyAlloSite. For example, in the case of Tar, the ECs between residues are the same for the apo and holo conformations, while the allosteric pockets found by CAVITY in the two conformations contained a small number of different residues. Because the prediction of key allo-residues by KeyAlloSite requires pairwise comparison of residues in allosteric pockets, the predicted key allo-residues in the two conformations were slightly different. For the apo Tar, on the one hand, although the score of Q152 is 0.66, which is less than the threshold of 0.8, Q152 ranked high among all residues in the allosteric pocket with a ranking of 5/24. When we lower the threshold slightly, we will be able to correctly predict Q152. On the other hand, Bi et al. showed that although Y149 and Q152 are both key allo-residues, Y149 seems to be more important as allosteric signaling can be conducted when the allosteric molecule only interacts with Y149 but not Q152 (Bi et al., 2013). Although the holo Tar has some conformational changes compared to apo Tar, the key allo-residueY149 can be captured by KeyAlloSite when using either the holo or the apo structure. For applications, we recommend to use the holo structure whenever possible.

Although we used the three-dimensional structure of proteins and their binding ligands in our analysis, KeyAlloSite can also be applied in cases where no three-dimensional structures are available on condition that a certain number of homologous sequences of the protein under investigation and location of the functional site are known. Along with the rapid progress in recent years, protein structure prediction methods, such as AlphaFold (Jumper et al., 2021), can be used to predict the protein structure first. At the same time, as our method calculates the EC between any residue of the protein and residues of the orthosteric pocket, KeyAlloSite can be used to predict not only key allosteric residues but also PTM sites that have functional correlation with orthosteric sites, which will be further studied in the future.

Key residues in protein allosteric sites determine the direction and strength of allosteric signaling, while other residues in allosteric sites mainly contribute to binding. When optimizing allosteric molecules, we often face the challenge that simply increasing binding affinity cannot improve the efficacy of allosteric regulation. Therefore, to optimize the allosteric molecules, one can first use KeyAlloSite to predict the key allo-residues in the allosteric pocket and then maintain or enhance the interactions between the allosteric molecules and the key allo-residues while altering the interactions between the allosteric molecules and other residues in the allosteric pocket. In this way, the optimized allosteric molecules can be ensured to have both strong binding affinity and allosteric signal transduction ability.

Materials and methods

The allosteric protein data set

We selected allosteric proteins from the ‘Core Set’ of ASBench (Huang et al., 2015) and constructed our data set according to the following criteria: (1) the protein functions as a monomer; (2) the corresponding three-dimensional protein structure data should contain both allosteric ligand and orthosteric ligand; (3) the number of effective homologous sequences of the protein should be greater than 100. Finally, 23 allosteric proteins were selected, including 25 known allosteric sites (Supplementary file 1). We collected another two proteins from published literatures, including E. coli aspartate chemoreceptor Tar (PDB ID: 4Z9H) (Mise, 2016) and PDZ3 domain (PDB ID: 1BE9) (Doyle et al., 1996), as key allo-residues have been reported for these two proteins that can be used for comparative analysis. We used HMMER to search for the homologous sequences of each of the selected allosteric proteins from pfam (Finn et al., 2015). Due to the redundancy of homologous sequences, they were reweighted according to the standard of 80% sequence identity to obtain the effective homologous sequences.

The evolutionary coupling model

We used a global statistical model, the ECM, which can calculate the direct couplings between residues and remove the indirect couplings. The ECM we used here was mainly based on the work of Marks and her co-workers (Ekeberg et al., 2013; Weinreb et al., 2016). From the MSA of a protein family, we can calculate the observed frequency fia and pairwise co-occurrences fijab of residues (a, b) at position (i, j). From this first-order and second-order statistics, we can infer a model to account for the observed statistics optimally, which mainly includes two parameters: single-site propensities hia and direct coevolutionary couplings between residues Jij(a,b). This model defines a probability P for each protein sequence a = (a1,,aL) of length L:

Pa=1Zexp(i=1Lhiai+i=1L-1j=i+1LJij(ai,aj)) (1)
Z=aexp(i=1Lhiai+i=1L-1j=i+1LJij(ai,aj)) (2)

The parameters h and J of the model were estimated by pseudo-maximum likelihood (PLM) which approximates the full likelihood for each sequence a = (a1,,aL) by a product of conditional likelihoods for each site i:

P(a1,,aL |h,J) i=1LP(ai |aai,h,J) (3)

The global partition function Z is replaced by a number of local partition functions:

Pai|a\ai,h,J=exphiai+jiJijai,ajaexphia+jiJija,aj (4)

After modifying with L2-regularization and sample reweighting, the approximate likelihood function was optimized using a quasi-Newton method (Limited-memory BFGS).

Once the parameters h and J are fitted, the Frobenius norm FN(i,j) of Jij is used to measure the ECS between the two sites i and j.

FN(i,j)=Jij2=klJ,ijk,l2 (5)

J,ij was obtained after Jij was centralized.

Since the FN may include bias caused by phylogeny and undersampling, it can be corrected with average product correction (Dunn et al., 2008). The EC value is the EC score of the two sites.

EC(i,j)=FN(i,j)-i,iFNi,,jj,jFNi,j,i,j,iFNi,,j, (6)

Evolutionary coupling strength between orthosteric and other pockets

We first performed MSA (Figure 1A) and calculated the ECs between residues using the ECM (Figure 1B). We then used the CAVITY program (Xu et al., 2018; Yuan et al., 2013) to identify all the potential ligand binding pockets on the surface of a protein and designated the mth pocket as cavity_m. The ECS (ECScavity_m) between the orthosteric pocket and the mth pocket is defined as the sum of the coupling strength between the residues in the two pockets. The orthosteric pocket here is defined as all of the residues within 6 Å around the orthosteric ligand. If more than 50% of the residues in the mth pocket overlap with the orthosteric pocket, the pocket will be excluded. Otherwise, the overlapping residues between the mth pocket and orthosteric pocket will be removed from the mth pocket, and the remaining residues will be used to calculate the ECS between the mth pocket and orthosteric pocket.

ECScavity_m=icavity_mjorthostericpocketFNi,j (7)

Finally, we normalized the ECS of the pockets:

Z-scorecavity_m=ECScavity_m-μcavityσcavity (8)

μcavity is the mean value of the ECS between the orthosteric pocket and the other pockets, and σcavity is the standard deviation of the ECS.

Identification of key allo-residues

We first calculated the pairwise EC value between one residue ai (i=1,2,...,N) in the allosteric pocket and one residue bj (j=1,2,...,M) in the orthosteric pocket. An N × M matrix E was obtained, and each element Eij in the matrix E represents the EC value between residues in the allosteric and orthosteric pockets (Figure 1C–D). The allosteric pocket here refers to the allosteric pocket found by the CAVITY. In rare cases, if the CAVITY does not find the known allosteric pocket, all of the residues within 8 Å around the allosteric ligand are used as the allosteric pocket. After that, the difference of EC values corresponding to each residue in allosteric pocket was pairwisely compared by using the student’s t-test (α=0.05), that is, whether there was a difference between the mean values of each two rows of matrix E. The result of the comparison was expressed by Cmi (m=1,2,...,N, i=1,2,...,N), if there was a significant difference, Cmi was assigned a value of 1, if there was no significant difference, Cmi was assigned a value of 0. Finally, an N × N matrix C was obtained, and each element in C indicates whether there was a difference between the residues in the allosteric pocket (Figure 1E). On this basis, by adding up each column of C, we could get the number of significant differences di (i=1, 2,..., N) between each residue and the remaining residues in the allosteric pocket, that is, the number of significant differences corresponding to ith residue (Figure 1F).

di=m=1NCmi (9)

Finally, we normalized the number of significant differences of the residues:

Z-scoreresidue_i=di-μresidueσresidue (10)

μresidue is the mean value of the number of significant differences of the residues, and σresidue is the standard deviation of the number of significant differences.

In enzyme catalysis, in addition to residues in the active site, distant residues may also affect enzyme activity, which could not be rationally designed in enzyme engineering. We systematically studied the effect of residues outside of the active site on enzyme catalysis. We calculated the EC value between each residue outside the orthosteric pocket and residues in the orthosteric pocket (i.e. the rows in matrix E are all residues outside the orthosteric pocket here). Using the same steps as above, we determined the number of significant differences di for each residue outside the orthosteric pocket and normalized it to Z-score.

Data availability

The data that support the results of this study are in the Supplementary data, including information of the allosteric proteins in the data set (Supplementary file 1); list of the Z-scores and ranking of allosteric pockets in the data set (Supplementary file 2); KeyAlloSite prediction results of AurA kinase (Supplementary file 3); list of the predicted key allo-residues in allosteric pockets (Supplementary file 4); key allo-residues predicted by KeyAlloSite with different cutoffs (Supplementary file 5); KeyAlloSite prediction results of tyrosine-protein kinase ABL1 (Supplementary file 6); the key allo-residues predicted by our method on CALB (Supplementary file 7); the confusion matrices of KeyAlloSite in different scenarios (Supplementary file 8); comparison of KeyAlloSite and SCA methods (Supplementary file 9); phylogenetic tree of androgen receptor (Figure 2—figure supplement 1); comparison of ECS between pockets when all residue pairs and partial residue pairs were used (Figure 2—figure supplement 2); difference between the EC between orthosteric and allosteric sites and the EC between two random patches (Figure 2—figure supplement 3); distribution of the ratios of the number of key allo-residues predicted by KeyAlloSite in the number of all residues in allosteric pockets when using different cutoffs in all proteins (Figure 3—figure supplement 1); examples of distributions of the statistics corresponding to significant scores obtained from the t-test (Figure 3—figure supplement 2); and random sampling of homologous sequences (Figure 3—figure supplement 3). The homologous sequences of the proteins in the data set are available in the following GitHub repository: https://github.com/huilan1210/KeyAlloSite, Xie et al., 2023.

Code availability

KeyAlloSite is available at GitHub (https://github.com/huilan1210/KeyAlloSite, copy archived at swh:1:rev:8464b27b588af48d14033ab40d62f9eca4ed0051, Xie et al., 2023).

Acknowledgements

The authors thank Jintao Zhu from the Center for Quantitative Biology and Gaoxiang Pan from the College of Chemistry and Molecular Engineering, Peking University for helpful discussions. This study was supported in part by the National Key R&D Program of China (2022YFA1303700), the National Natural Science Foundation of China (21633001, 22237002) and the Chinese Academy of Medical Sciences (2021-I2M-5–014).

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

Luhua Lai, Email: lhlai@pku.edu.cn.

Shozeb Haider, University College London, United Kingdom.

Volker Dötsch, Goethe University, Germany.

Funding Information

This paper was supported by the following grants:

  • National Key R&D Program of China 2022YFA1303700 to Luhua Lai.

  • National Natural Science Foundation of China 21633001 to Luhua Lai.

  • Chinese Academy of Medical Sciences 2021-I2M-5-014 to Luhua Lai.

  • National Natural Science Foundation of China 22237002 to Luhua Lai.

Additional information

Competing interests

No competing interests declared.

Author contributions

Conceptualization, Data curation, Formal analysis, Investigation, Visualization, Methodology, Writing - original draft, Writing - review and editing.

Conceptualization, Formal analysis, Investigation, Methodology.

Methodology, Writing - review and editing.

Conceptualization, Formal analysis, Methodology.

Conceptualization, Supervision, Funding acquisition, Methodology, Writing - review and editing.

Additional files

Supplementary file 1. Information of the allosteric proteins in the data set.
elife-81850-supp1.docx (29.8KB, docx)
Supplementary file 2. List of the Z-scores and ranking of allosteric pockets in the data set.
elife-81850-supp2.docx (27.6KB, docx)
Supplementary file 3. KeyAlloSite prediction results of Aurora A kinase.
elife-81850-supp3.docx (28.8KB, docx)
Supplementary file 4. List of the predicted key allo-residues in allosteric pockets.
elife-81850-supp4.docx (28.8KB, docx)
Supplementary file 5. Key allo-residues predicted by KeyAlloSite with different cutoffs.
elife-81850-supp5.docx (25.4KB, docx)
Supplementary file 6. KeyAlloSite prediction results of tyrosine-protein kinase ABL1.
elife-81850-supp6.docx (27.2KB, docx)
Supplementary file 7. The key allo-residues predicted by our method on Candida antarctica lipase B.
elife-81850-supp7.docx (28.7KB, docx)
Supplementary file 8. The confusion matrices of KeyAlloSite in different scenarios.
elife-81850-supp8.docx (26.9KB, docx)
Supplementary file 9. Comparison of KeyAlloSite and SCA methods.
elife-81850-supp9.docx (25.9KB, docx)
MDAR checklist

Data availability

All data that support the results of this study are included in the manuscript, supplementary files, and GitHub repository (https://github.com/huilan1210/KeyAlloSite; copy archived at swh:1:rev:333d4a48d2570c74806f68a1247611ed64794b97). Source Data files have been provided for all Figures (except Figure 1 and Figure 2-figure supplement 1).

References

  1. An XL, Lu SY, Song K, Shen QC, Huang ML, Yao XJ, Liu HX, Zhang J. Are the apo proteins suitable for the rational discovery of allosteric drugs? Journal of Chemical Information and Modeling. 2019;59:597–604. doi: 10.1021/acs.jcim.8b00735. [DOI] [PubMed] [Google Scholar]
  2. Bi SY, Yu DQ, Si GW, Luo CX, Li TQ, Ouyang Q, Jakovljevic V, Sourjik V, Tu YH, Lai LH. Discovery of novel chemoeffectors and rational design of Escherichia coli chemoreceptor specificity. PNAS. 2013;110:16814–16819. doi: 10.1073/pnas.1306811110. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Bitbol AF. Inferring interaction partners from protein sequences using mutual information. PLOS Computational Biology. 2018;14:e1006401. doi: 10.1371/journal.pcbi.1006401. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Botello-Smith WM, Luo Y. Robust determination of protein allosteric signaling pathways. Journal of Chemical Theory and Computation. 2019;15:2116–2126. doi: 10.1021/acs.jctc.8b01197. [DOI] [PubMed] [Google Scholar]
  5. Changeux JP, Christopoulos A. Allosteric modulation as a unifying mechanism for receptor function and regulation. Cell. 2016;166:1084–1102. doi: 10.1016/j.cell.2016.08.015. [DOI] [PubMed] [Google Scholar]
  6. Christopoulos A. Allosteric binding sites on cell-surface receptors: novel targets for drug discovery. Nature Reviews. Drug Discovery. 2002;1:198–210. doi: 10.1038/nrd746. [DOI] [PubMed] [Google Scholar]
  7. Cocco S, Monasson R, Weigt M. From principal component to direct coupling analysis of coevolution in proteins: low-eigenvalue modes are needed for structure prediction. PLOS Computational Biology. 2013;9:e1003176. doi: 10.1371/journal.pcbi.1003176. [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. de Juan D, Pazos F, Valencia A. Emerging methods in protein co-evolution. Nature Reviews. Genetics. 2013;14:249–261. doi: 10.1038/nrg3414. [DOI] [PubMed] [Google Scholar]
  9. Dokholyan NV. Controlling allosteric networks in proteins. Chemical Reviews. 2016;116:6463–6487. doi: 10.1021/acs.chemrev.5b00544. [DOI] [PubMed] [Google Scholar]
  10. Doyle DA, Lee A, Lewis J, Kim E, Sheng M, MacKinnon R. Crystal structures of a complexed and peptide-free membrane protein-binding domain: molecular basis of peptide recognition by PDZ. Cell. 1996;85:1067–1076. doi: 10.1016/s0092-8674(00)81307-0. [DOI] [PubMed] [Google Scholar]
  11. Dunn SD, Wahl LM, Gloor GB. Mutual information without the influence of phylogeny or entropy dramatically improves residue contact prediction. Bioinformatics. 2008;24:333–340. doi: 10.1093/bioinformatics/btm604. [DOI] [PubMed] [Google Scholar]
  12. Ekeberg M, Lövkvist C, Lan Y, Weigt M, Aurell E. Improved contact prediction in proteins: using pseudolikelihoods to infer potts models. Physical Review. E, Statistical, Nonlinear, and Soft Matter Physics. 2013;87:012707. doi: 10.1103/PhysRevE.87.012707. [DOI] [PubMed] [Google Scholar]
  13. Finn RD, Clements J, Arndt W, Miller BL, Wheeler TJ, Schreiber F, Bateman A, Eddy SR. HMMER web server: 2015 update. Nucleic Acids Research. 2015;43:W30–W38. doi: 10.1093/nar/gkv397. [DOI] [PMC free article] [PubMed] [Google Scholar]
  14. Hadzipasic A, Wilson C, Nguyen V, Kern N, Kim C, Pitsawong W, Villali J, Zheng YJ, Kern D. Ancient origins of allosteric activation in a Ser-Thr kinase. Science. 2020;367:912–917. doi: 10.1126/science.aay9959. [DOI] [PMC free article] [PubMed] [Google Scholar]
  15. Hilser VJ, Wrabl JO, Motlagh HN. Structural and energetic basis of allostery. Annual Review of Biophysics. 2012;41:585–609. doi: 10.1146/annurev-biophys-050511-102319. [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. Hopf TA, Ingraham JB, Poelwijk FJ, Schärfe CPI, Springer M, Sander C, Marks DS. Mutation effects predicted from sequence co-variation. Nature Biotechnology. 2017;35:128–135. doi: 10.1038/nbt.3769. [DOI] [PMC free article] [PubMed] [Google Scholar]
  17. Huang WK, Wang GQ, Shen QC, Liu XY, Lu SY, Geng L, Huang ZM, Zhang J. ASBench: benchmarking sets for allosteric discovery. Bioinformatics. 2015;31:2598–2600. doi: 10.1093/bioinformatics/btv169. [DOI] [PubMed] [Google Scholar]
  18. Jahnke W, Grotzfeld RM, Pellé X, Strauss A, Fendrich G, Cowan-Jacob SW, Cotesta S, Fabbro D, Furet P, Mestan J, Marzinzik AL. Binding or bending: distinction of allosteric abl kinase agonists from antagonists by an NMR-based conformational assay. Journal of the American Chemical Society. 2010;132:7043–7048. doi: 10.1021/ja101837n. [DOI] [PubMed] [Google Scholar]
  19. Jimenez HN, Liu KG, Hong SP, Reitman MS, Uberti MA, Bacolod MD, Cajina M, Nattini M, Sabio M, Doller D. 4-(1-phenyl-1H-pyrazol-4-yl)quinolines as novel, selective and brain penetrant metabotropic glutamate receptor 4 positive allosteric modulators. Bioorganic & Medicinal Chemistry Letters. 2012;22:3235–3239. doi: 10.1016/j.bmcl.2012.03.032. [DOI] [PubMed] [Google Scholar]
  20. Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, Tunyasuvunakool K, Bates R, Žídek A, Potapenko A, Bridgland A, Meyer C, Kohl SAA, Ballard AJ, Cowie A, Romera-Paredes B, Nikolov S, Jain R, Adler J, Back T, Petersen S, Reiman D, Clancy E, Zielinski M, Steinegger M, Pacholska M, Berghammer T, Bodenstein S, Silver D, Vinyals O, Senior AW, Kavukcuoglu K, Kohli P, Hassabis D. Highly accurate protein structure prediction with alphafold. Nature. 2021;596:583–589. doi: 10.1038/s41586-021-03819-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Kalescky R, Liu J, Tao P. Identifying key residues for protein allostery through rigid residue scan. The Journal of Physical Chemistry. A. 2015;119:1689–1700. doi: 10.1021/jp5083455. [DOI] [PubMed] [Google Scholar]
  22. Kalmanti L, Saussele S, Lauseker M, Müller MC, Dietz CT, Heinrich L, Hanfstein B, Proetel U, Fabarius A, Krause SW, Rinaldetti S, Dengler J, Falge C, Oppliger-Leibundgut E, Burchert A, Neubauer A, Kanz L, Stegelmann F, Pfreundschuh M, Spiekermann K, Scheid C, Pfirrmann M, Hochhaus A, Hasford J, Hehlmann R. Safety and efficacy of imatinib in CML over a period of 10 years: data from the randomized CML-study IV. Leukemia. 2015;29:1123–1132. doi: 10.1038/leu.2015.36. [DOI] [PubMed] [Google Scholar]
  23. Kurochkin IV, Guarnera E, Wong JH, Eisenhaber F, Berezovsky IN. Toward allosterically increased catalytic activity of insulin-degrading enzyme against amyloid peptides. Biochemistry. 2017;56:228–239. doi: 10.1021/acs.biochem.6b00783. [DOI] [PubMed] [Google Scholar]
  24. Lake PT, Davidson RB, Klem H, Hocky GM, McCullagh M. Residue-level allostery propagates through the effective coarse-grained hessian. Journal of Chemical Theory and Computation. 2020;16:3385–3395. doi: 10.1021/acs.jctc.9b01149. [DOI] [PubMed] [Google Scholar]
  25. Lewis JA, Lebois EP, Lindsley CW. Allosteric modulation of kinases and GPCRs: design principles and structural diversity. Current Opinion in Chemical Biology. 2008;12:269–280. doi: 10.1016/j.cbpa.2008.02.014. [DOI] [PubMed] [Google Scholar]
  26. Lindsley CW. 2013 Philip S. Portoghese medicinal chemistry Lectureship: drug discovery targeting allosteric sites. Journal of Medicinal Chemistry. 2014;57:7485–7498. doi: 10.1021/jm5011786. [DOI] [PMC free article] [PubMed] [Google Scholar]
  27. Lockless SW, Ranganathan R. Evolutionarily conserved pathways of energetic connectivity in protein families. Science. 1999;286:295–299. doi: 10.1126/science.286.5438.295. [DOI] [PubMed] [Google Scholar]
  28. Ma XM, Meng H, Lai LH. Motions of allosteric and orthosteric ligand-binding sites in proteins are highly correlated. Journal of Chemical Information and Modeling. 2016;56:1725–1733. doi: 10.1021/acs.jcim.6b00039. [DOI] [PubMed] [Google Scholar]
  29. Marks DS, Colwell LJ, Sheridan R, Hopf TA, Pagnani A, Zecchina R, Sander C. Protein 3D structure computed from evolutionary sequence variation. PLOS ONE. 2011;6:e28766. doi: 10.1371/journal.pone.0028766. [DOI] [PMC free article] [PubMed] [Google Scholar]
  30. McLaughlin RN, Poelwijk FJ, Raman A, Gosal WS, Ranganathan R. The spatial architecture of protein function and adaptation. Nature. 2012;491:138–142. doi: 10.1038/nature11500. [DOI] [PMC free article] [PubMed] [Google Scholar]
  31. Mise T. Structural analysis of the ligand-binding domain of the aspartate receptor TAR from Escherichia coli. Biochemistry. 2016;55:3708–3713. doi: 10.1021/acs.biochem.6b00160. [DOI] [PubMed] [Google Scholar]
  32. Miura M. Therapeutic drug monitoring of imatinib, nilotinib, and dasatinib for patients with chronic myeloid leukemia. Biological & Pharmaceutical Bulletin. 2015;38:645–654. doi: 10.1248/bpb.b15-00103. [DOI] [PubMed] [Google Scholar]
  33. Morcos F, Pagnani A, Lunt B, Bertolino A, Marks DS, Sander C, Zecchina R, Onuchic JN, Hwa T, Weigt M. Direct-coupling analysis of residue coevolution captures native contacts across many protein families. PNAS. 2011;108:E1293–E1301. doi: 10.1073/pnas.1111471108. [DOI] [PMC free article] [PubMed] [Google Scholar]
  34. Morcos F, Jana B, Hwa T, Onuchic JN. Coevolutionary signals across protein lineages help capture multiple protein conformations. PNAS. 2013;110:20533–20538. doi: 10.1073/pnas.1315625110. [DOI] [PMC free article] [PubMed] [Google Scholar]
  35. Nussinov R, Tsai CJ. The different ways through which specificity works in orthosteric and allosteric drugs. Current Pharmaceutical Design. 2012;18:1311–1316. doi: 10.2174/138161212799436377. [DOI] [PMC free article] [PubMed] [Google Scholar]
  36. Nussinov R, Tsai CJ. Unraveling structural mechanisms of allosteric drug action. Trends in Pharmacological Sciences. 2014;35:256–264. doi: 10.1016/j.tips.2014.03.006. [DOI] [PubMed] [Google Scholar]
  37. Nussinov R, Tsai CJ, Liu J. Principles of allosteric interactions in cell signaling. Journal of the American Chemical Society. 2014;136:17692–17701. doi: 10.1021/ja510028c. [DOI] [PMC free article] [PubMed] [Google Scholar]
  38. Qi YF, Wang Q, Tang B, Lai LH. Identifying allosteric binding sites in proteins with a two-state go̅ model for novel allosteric effector discovery. Journal of Chemical Theory and Computation. 2012;8:2962–2971. doi: 10.1021/ct300395h. [DOI] [PubMed] [Google Scholar]
  39. Reynolds KA, McLaughlin RN, Ranganathan R. Hot spots for allosteric regulation on protein surfaces. Cell. 2011;147:1564–1575. doi: 10.1016/j.cell.2011.10.049. [DOI] [PMC free article] [PubMed] [Google Scholar]
  40. Rivoire O, Reynolds KA, Ranganathan R. Evolution-based functional decomposition of proteins. PLOS Computational Biology. 2016;12:e1004817. doi: 10.1371/journal.pcbi.1004817. [DOI] [PMC free article] [PubMed] [Google Scholar]
  41. Russ WP, Figliuzzi M, Stocker C, Barrat-Charlaix P, Socolich M, Kast P, Hilvert D, Monasson R, Cocco S, Weigt M, Ranganathan R. An evolution-based model for designing chorismate mutase enzymes. Science. 2020;369:440–445. doi: 10.1126/science.aba3304. [DOI] [PubMed] [Google Scholar]
  42. Salinas VH, Ranganathan R. Coevolution-based inference of amino acid interactions underlying protein function. eLife. 2018;7:e34300. doi: 10.7554/eLife.34300. [DOI] [PMC free article] [PubMed] [Google Scholar]
  43. Schoepfer J, Jahnke W, Berellini G, Buonamici S, Cotesta S, Cowan-Jacob SW, Dodd S, Drueckes P, Fabbro D, Gabriel T, Groell J-M, Grotzfeld RM, Hassan AQ, Henry C, Iyer V, Jones D, Lombardo F, Loo A, Manley PW, Pellé X, Rummel G, Salem B, Warmuth M, Wylie AA, Zoller T, Marzinzik AL, Furet P. Discovery of asciminib (ABL001), an allosteric inhibitor of the tyrosine kinase activity of BCR-ABL1. Journal of Medicinal Chemistry. 2018;61:8120–8135. doi: 10.1021/acs.jmedchem.8b01040. [DOI] [PubMed] [Google Scholar]
  44. Shen QC, Cheng FX, Song HL, Lu WQ, Zhao JF, An XL, Liu MY, Chen GQ, Zhao ZM, Zhang J. Proteome-scale investigation of protein allosteric regulation perturbed by somatic mutations in 7,000 cancer genomes. American Journal of Human Genetics. 2017;100:5–20. doi: 10.1016/j.ajhg.2016.09.020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  45. Shulman AI, Larson C, Mangelsdorf DJ, Ranganathan R. Structural determinants of allosteric ligand activation in RXR heterodimers. Cell. 2004;116:417–429. doi: 10.1016/s0092-8674(04)00119-9. [DOI] [PubMed] [Google Scholar]
  46. Süel GM, Lockless SW, Wall MA, Ranganathan R. Evolutionarily conserved networks of residues mediate allosteric communication in proteins. Nature Structural Biology. 2003;10:59–69. doi: 10.1038/nsb881. [DOI] [PubMed] [Google Scholar]
  47. Thal DM, Glukhova A, Sexton PM, Christopoulos A. Structural insights into G-protein-coupled receptor allostery. Nature. 2018;559:45–53. doi: 10.1038/s41586-018-0259-z. [DOI] [PubMed] [Google Scholar]
  48. Toji S, Yabuta N, Hosomi T, Nishihara S, Kobayashi T, Suzuki S, Tamai K, Nojima H. The centrosomal protein LATS2 is a phosphorylation target of Aurora-A kinase. Genes to Cells. 2004;9:383–397. doi: 10.1111/j.1356-9597.2004.00732.x. [DOI] [PubMed] [Google Scholar]
  49. Tsai CJ, Del Sol A, Nussinov R. Protein allostery, signal transmission and dynamics: a classification scheme of allosteric mechanisms. Molecular BioSystems. 2009;5:207–216. doi: 10.1039/b819720b. [DOI] [PMC free article] [PubMed] [Google Scholar]
  50. Wagner JR, Lee CT, Durrant JD, Malmstrom RD, Feher VA, Amaro RE. Emerging computational methods for the rational discovery of allosteric drugs. Chemical Reviews. 2016;116:6370–6390. doi: 10.1021/acs.chemrev.5b00631. [DOI] [PMC free article] [PubMed] [Google Scholar]
  51. Wang J, Jain A, McDonald LR, Gambogi C, Lee AL, Dokholyan NV. Mapping allosteric communications within individual proteins. Nature Communications. 2020;11:3862. doi: 10.1038/s41467-020-17618-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  52. Weigt M, White RA, Szurmant H, Hoch JA, Hwa T. Identification of direct residue contacts in protein-protein interaction by message passing. PNAS. 2009;106:67–72. doi: 10.1073/pnas.0805923106. [DOI] [PMC free article] [PubMed] [Google Scholar]
  53. Weinreb C, Riesselman AJ, Ingraham JB, Gross T, Sander C, Marks DS. 3D RNA and functional interactions from evolutionary couplings. Cell. 2016;165:963–975. doi: 10.1016/j.cell.2016.03.030. [DOI] [PMC free article] [PubMed] [Google Scholar]
  54. Wenthur CJ, Gentry PR, Mathews TP, Lindsley CW. Drugs for allosteric sites on receptors. Annual Review of Pharmacology and Toxicology. 2014;54:165–184. doi: 10.1146/annurev-pharmtox-010611-134525. [DOI] [PMC free article] [PubMed] [Google Scholar]
  55. Williams PA, Cosme J, Vinkovic DM, Ward A, Angove HC, Day PJ, Vonrhein C, Tickle IJ, Jhoti H. Crystal structures of human cytochrome P450 3A4 bound to metyrapone and progesterone. Science. 2004;305:683–686. doi: 10.1126/science.1099736. [DOI] [PubMed] [Google Scholar]
  56. Wu Q, Soni P, Reetz MT. Laboratory evolution of enantiocomplementary candida antarctica lipase b mutants with broad substrate scope. Journal of the American Chemical Society. 2013;135:1872–1881. doi: 10.1021/ja310455t. [DOI] [PubMed] [Google Scholar]
  57. Wu ZY, Liu H, Xu LS, Chen HF, Feng Y. Algorithm-based coevolution network identification reveals key functional residues of the α/β hydrolase subfamilies. FASEB Journal. 2020;34:1983–1995. doi: 10.1096/fj.201900948RR. [DOI] [PubMed] [Google Scholar]
  58. Xie J, Wang SW, Xu YJ, Deng MH, Lai LH. Uncovering the dominant motion modes of allosteric regulation improves allosteric site prediction. Journal of Chemical Information and Modeling. 2022;62:187–195. doi: 10.1021/acs.jcim.1c01267. [DOI] [PubMed] [Google Scholar]
  59. Xie J, Zhang W, Zhu X, Deng M, Lai L. KeyAlloSite. swh:1:rev:333d4a48d2570c74806f68a1247611ed64794b97Software Heritage. 2023 https://archive.softwareheritage.org/swh:1:dir:00c4f28732a97e144551a634e170b2fd52317fcd;origin=https://github.com/huilan1210/KeyAlloSite;visit=swh:1:snp:87ef094f109d5cca46c52a655b8547ec88444417;anchor=swh:1:rev:333d4a48d2570c74806f68a1247611ed64794b97
  60. Xu YJ, Wang SW, Hu QW, Gao SS, Ma XM, Zhang WL, Shen YH, Chen FJ, Lai LH, Pei JF. CavityPlus: a web server for protein cavity detection with pharmacophore modelling, allosteric site identification and covalent ligand binding ability prediction. Nucleic Acids Research. 2018;46:W374–W379. doi: 10.1093/nar/gky380. [DOI] [PMC free article] [PubMed] [Google Scholar]
  61. Yi ZL, Pei XQ, Wu ZL. Introduction of glycine and proline residues onto protein surface increases the thermostability of endoglucanase cela from clostridium thermocellum. Bioresource Technology. 2011;102:3636–3638. doi: 10.1016/j.biortech.2010.11.043. [DOI] [PubMed] [Google Scholar]
  62. Yuan YX, Pei JF, Lai LH. Binding site detection and druggability prediction of protein targets for structure-based drug design. Current Pharmaceutical Design. 2013;19:2326–2333. doi: 10.2174/1381612811319120019. [DOI] [PubMed] [Google Scholar]
  63. Zhang WL, Xie J, Lai LH. Correlation between allosteric and orthosteric sites. Advances in Experimental Medicine and Biology. 2019;1163:89–105. doi: 10.1007/978-981-13-8719-7_5. [DOI] [PubMed] [Google Scholar]

Editor's evaluation

Shozeb Haider 1

The manuscript reports on a useful tool to study protein allosteric regulation function. The work is based on inadequate experimental validation of the predicted residues implicated in mediating allosteric signaling. The study highlights the significance of the weak pairwise term for the prediction of the allosteric function.

Decision letter

Editor: Shozeb Haider1
Reviewed by: Sarath Dantu2

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Decision letter after peer review:

Thank you for submitting your article "Coevolution-based prediction of key allosteric residues for protein function regulation" for consideration by eLife. Your article has been reviewed by 2 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Volker Dötsch as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Sarath Dantu (Reviewer #2).

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

There seems to be a consensus between the reviewers that:

1) There is a lack of comparison of the proposed method against existing ones that do pretty much the same. This is the main weakness of the manuscript.

2) The authors should also provide a better metric of the true positives of their method – and of course report the false positives they have for the cases they've tested. AlloSite seems to predict several residues as key, some of which happen to be oncogenic (Table 1). It's not clear how many of the top 10, let's say, residues that it predicts have an allosteric effect.

3) In addition to the comparison, the authors need to establish that the signal coming from comparison of allosteric vs. orthosteric is different from comparing any two random patches with same number of residue pairs.

4) A complete picture of the predictive power with both strengths and weakness of the tool should be presented.

Reviewer #1 (Public Review):

Allostery refers to processes whereby a change at one site of a biological macromolecule affects the structure and dynamics at another distinct functional site, enabling the regulation of the corresponding function. Xie et al. developed an in-silico method to predict residues involved in allosteric regulation using a coevolution-based method. A fast and accurate method of identifying key residues responsible for allosteric signalling is important for drug design purposes and protein engineering.

Strengths:

1) The authors applied their method to multiple targets from different protein families to test their method.

2) The method is able to predict in a retrospective analysis certain residues involved in allosteric communication between orthosteric and allosteric binding sites.

Weaknesses:

1) There are several tools used in statistical genomics to predict allosteric communication pathways. Even though the paper tries to demonstrate the ability to predict residues that are involved in allosteric communication, KeyAllosite is not compared with any other state-of-the-art tool that does the same (1-3), which would highlight the strengths of this method with respect to existing ones.

2) The authors mention that the number of effective homologue sequences affects the probability of finding an allosteric site in the top 3 scored sites, however, Cdc4 and AR2 have also a low number of effective homologue sequences (Figure 1.A) yet a high Z-score. No sufficient explanation is given regarding this discrepancy. This also demonstrates that a threshold in the Neff under which KeyAlloSite becomes unreliable should be defined.

3) From the low Z-score of CYP3A4, the authors claim that the conservation of residues in the orthosteric site is important for getting an accurate prediction of the coupling strength. It is not clear though if and how is this aspect encoded in the KeyAlloSite. From the description of the method, it seems like the algorithm does not check for the level of conservation of the orthosteric residues. An explanation as to why has it not been incorporated into the algorithm is necessary. The conservation at a given site in an MSA defined as the overall deviance of amino acid frequencies at that site from their mean values, in combination with the statistical coupling of two sites has been shown to be important in the development of allosteric models (4).

4) In the case of AuroraA, the authors do not explain why other Ser/Thr/Tyr residues scored higher than T287 and T288, or if their higher scores are an artefact. However, in many cases, post-translational modifications take place by secondary partners and, therefore, the coupling of a post-translational modification site with the orthosteric site cannot be used to predict such sites. Post-translational modification sites are expected to have a strong coupling with residues of the upstream/downstream effector that is responsible for the modification, rather than with residues of the orthosteric site of the protein.

5) The authors defined allo-residues as the residues whose Z-score is >0.8, but there is no strong argument regarding the choice of this threshold. In the case of the allosteric pockets, for example, the authors use a threshold of 0 to identify pockets with strong coupling strength.

6) In the case of the Tar, it would be good if the authors reported the results starting from an apo structure as well and see if the method is able to find Y149 and Q152. That way, they could test how sensitive/biased the method is to the chosen conformation. If no apo structure is available, maybe another protein where both an apo and a holo structure exist could be used for this purpose.

7) The data in Table 1 is not sufficiently convincing. It would be helpful to also report how many of the identified residues by KeyAlloSite are indeed involved in oncogenic mutations or find another metric to quantify the success rate of KeyAlloSite.

8) The success rate of the method to predict key residues of the function of CALB (38% success rate based on the reported residues in the literature) and CMS (20%), should be interpreted with caution. It may be the case that the rest of the predicted residues have not been tested for their functional role, or that the method has indeed a low success rate in predicting key residues for allosteric communication.

In lines 38-41, the authors refer to SARs of allosteric drugs as "flat", which is not clear what they refer to.

It is not clear if the so-called, "allo-residues", that the authors define in lines 49-52 refer to residues that affect the binding or the signalling.

It would be helpful for the reader if the authors included a section where they describe the method itself and the key steps of developing their model before presenting the results. That way, the reader could follow the results easier.

The authors classified L359 of BCR-ABL1 as an allo-residue and justified the importance of this residue based on the fact that a weaker ligand does interact with this residue. For transparency, it would be good for the authors to report the allosteric scores of all 44 residues in the allosteric site to show that this residue was not cherry-picked and that the method can pick up all the residues that are important for the binding.

The manuscript has several typos and language mistakes (e.g. "screening" instead of "screen" in the Abstract, missing articles, etc.) that should be corrected prior to a resubmission.

Reviewer #2 (Public Review):

The authors designed a statistical approach to analyse coevolution scores from a protein and predict allosteric residues. This approach relies on comparison of residues from the two sites (allosteric vs. orthosteric) and omits the rest of the protein.

While the approach is logical, the predictive power has not been clearly established. To demonstrate the effectiveness of the approach a confusion matrix should be provided as it will show the predictive power of the method in the different scenarios presented in the manuscript (PTM's, enzymes, pathogenic mutations, etc.).

An effective method, along these lines, will have significant impact on protein and drug design.

It would be helpful if statistics on the distribution of significant scores from E row comparison after the t-test are provided. Further, you have so far compared allosteric with orthosteric, it would be interesting to compare the same distribution/statistics with allosteric vs. non-orthosteric residues. It may serve as a very good benchmark. The idea would be to see if the events of significance from allosteric vs. orthosteric are different/unique if we pick and compare any two random patches in the protein.

A boxplot of Zscores for All, top200, top300, top400 pairs from Figure 1—figure supplement 2 would be very informative.

As you have been using top zscore residues (>0.5 or >0.9 for enzymes), direct or indirect evidence for the statement in lines 306-307 is not very apparent from the provided data. This has to be flushed out further.

Please annotate residues for which experimental data is available to compare with keyallosite predictions for example in Supplementary files 4 and 5. In the main text (line 284) you only mention one residue out of 52 to demonstrate the effectiveness of the prediction. Again a confusion matrix would help here.

Can you please clarify if in Figure 2, number of residues refers to the number of residues from allosteric+orthosteric site?

Figure 6 may be ideal as Figure 1 to inform the readers about the protocol of this work.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Coevolution-based prediction of key allosteric residues for protein function regulation" for further consideration by eLife. Your revised article has been evaluated by Volker Dötsch (Senior Editor) and a Reviewing Editor.

The manuscript has been improved but there are some remaining issues that need to be addressed, as outlined below:

1. Can you please provide a detailed description and expand the discussion on the predicted amino acids and why the predicted residues are not the top ranking ones?

2. The theoretical formulation to extract the coupling score and a compare/contrast to Hopf et al. approach would be highly beneficial to the readers.

Reviewer #1 (Recommendations for the authors):

The authors did extensive work trying to address the revision comments, and their effort is much appreciated.

Nevertheless, in the absence of further experimental validation of the importance of the predicted so-called key "allo-residues" in mediating the signal between the orthosteric and allosteric binding site, it is still hard to assess the predictive power of the presented method. There are still residues for example that score higher than known functional residues (see for example residues T235, S245 and S249 in the case of Aurora A) whose implication in signal transduction is not confirmed, or residues whose importance depends on the conformation chosen for analysis despite the coevolutionary conservation metric used to score the residues (Q152 of Tar whose score is way below the 0.8 threshold when an apo structure is considered, which shouldn't be the case given that AlloSite uses only MSA and coevolutionary information for the scoring – it seems like other residues are predicted to be way more important based on the allosteric pocket definition given by CAVITY to decrease the score of Q152 to <0.8 after normalisation).

I fully understand that such experimental validation goes beyond the capacities of a computational group, however, as a reader, I might be hesitant to try this method.

Reviewer #2 (Recommendations for the authors):

The authors have carefully addressed all the comments from the previous review.

Few additional comments for authors to consider:

The Discussion section can be enriched further. At present it only discusses two points, i.e., ability of the method to predict key allosteric residues and requirement of sequence depth. For example: the last two paragraphs 565-580 are repetitive as they both highlight the need for depth in sequence alignments which is a known issue for MSA dependent methods, as again discussed by Hopf et al. Nature Biotechnology volume 35, pages 128-135 (2017). Further, the theoretical formulation to extract the coupling score is identical to Hopf et al. and a compare/contrast of the two approaches would be highly beneficial to the audience. Even Hopf et al. highlight the significance of the pairwise term, as done in this article.

eLife. 2023 Feb 17;12:e81850. doi: 10.7554/eLife.81850.sa2

Author response


Essential revisions:

There seems to be a consensus between the reviewers that:

1) There is a lack of comparison of the proposed method against existing ones that do pretty much the same. This is the main weakness of the manuscript.

We agree that comparison of our method with existing ones should be given whenever possible. However, most existing methods based on sequence statistical analysis were developed for predicting allosteric communication pathways. To the best of our knowledge, no systematic computational methods for identifying key allo-residues, residues that contribute most to allosteric signaling in allosteric sites are available now. As the main purpose of our current study is to predict key allo-residues in allosteric sites, direct comparisons with other methods are not possible. As suggested, we have performed indirect comparisons with the Statistical Coupling Analysis (SCA) method, which have been included in the revised manuscript (The last paragraph of KeyAlloSite can also identify key allosteric functional residues of enzymes).

SCA predicts several groups of coevolved residues (Sectors) that form physically continuous networks, often able to connect major functional site and allosteric sites, that is, allosteric pathways. We analyzed all the proteins in our data set using SCA. We first compared the performance of SCA and KeyAlloSite in predicting known key allo-residues in allosteric sites (Supplementary File 9). For the known key allo-residue L359 in the BCR-ABL1 protein, it was not present in the sectors predicted by SCA, despite that the sectors contain 68 residues, while it could be correctly predicted by KeyAlloSite. For the known key allo-residues Y149 and Q152 in the Tar receptor, KeyAlloSite could correctly predict both of them. However, although the sectors predicted by SCA contained Y149 and Q152, it also included two residues (R69 and R73) that have been experimentally verified to contribute only to ligand binding and not to allosteric signaling (PNAS 2013, 110, 16814). For PDZ3, the sectors predicted by SCA contained the key allo-residues A347 and L353, which were also successfully predicted by KeyAlloSite. We further compared the performance of SCA and KeyAlloSite in predicting the key allosteric functional residues of enzymes. For the CALB, the sectors predicted by SCA missed one of the key allo-residues A225, which has been experimentally shown to have a great impact on enzyme activity, and the predicted known key allo-residues account for 32.8% of all the residues in the sectors, while KeyAlloSite could predict the key allo-residue A225, and the predicted known key allo-residues account for 38.5% of all the predicted key allo-residues. For the CMs, the sectors predicted by SCA contained only one key allo-residue D83, while KeyAlloSite could predict key allo-residues R44 and L40. For the KeyAlloSite correctly predicted functional phosphorylation sites T288 and T287 in the AurA, SCA missed both of them. Thus, KeyAlloSite performs better than SCA in predicting key allo-residues.

2) The authors should also provide a better metric of the true positives of their method – and of course report the false positives they have for the cases they've tested. AlloSite seems to predict several residues as key, some of which happen to be oncogenic (Table 1). It's not clear how many of the top 10, let's say, residues that it predicts have an allosteric effect.

We have added the confusion matrices of our method in different scenarios (Supplementary file 8) in the revised manuscript. It should be noted that the known real positive data is very limited. At the same time, there is little known real negative data, and we regarded the data of unknown functions as negative data. It can be seen that KeyAlloSite has high recall (0.92 and 1.00) in the prediction of allosteric sites and the prediction of key allo-residues in allosteric pockets. The recall of KeyAlloSite in the prediction of functional post-translational modification (PTM) sites, pathogenic mutations and enzymes are not very high, mainly because there are limited known data on functional PTM sites and pathogenic mutations, and the functions of many key allo-residues predicted by KeyAlloSite are unknown. It remains to be further verified whether they have the function of allosteric regulation. Moreover, since we are now taking the predicted key allo-residues of unknown function as negative samples, our values for recall here are lower bounds of recall. The prediction list of KeyAlloSite provides a good starting point for further experimental studies and we believe that the prediction accuracy would be increased after the functions of predicted residues were experimentally identified.

3) In addition to the comparison, the authors need to establish that the signal coming from comparison of allosteric vs. orthosteric is different from comparing any two random patches with same number of residue pairs.

As suggested, we have added the difference between the evolutionary coupling between orthosteric and allosteric pockets and the evolutionary coupling between two random patches in the protein in the revised manuscript (The last paragraph of The evolutionary coupling between orthosteric and allosteric sites is stronger). For each protein in the data set, two residues that are not part of the orthosteric and allosteric sites were randomly selected from the surface residues. Among them, one was taken as the first center and the residues around it with the same number as the residues in orthosteric pocket were selected as patch1; and the other residue was taken as the second center and the residues around it with the same number as the residues in allosteric pocket were selected as patch2. Then we calculated the evolutionary coupling strength between the patch1 and patch2. The process was repeated four times, and then the mean and standard deviation of the evolutionary coupling strength were calculated. We then compared the evolutionary coupling strength between patch1 and patch2 with that between orthosteric and allosteric sites. The results showed that the evolutionary coupling strength between orthosteric and allosteric sites was significantly higher than that between two random patches (Figure 2—figure supplement 3). In other words, there is intrinsic evolutionary coupling between orthosteric and allosteric sites, which is different from the evolutionary coupling between any two random patches.

4) A complete picture of the predictive power with both strengths and weakness of the tool should be presented.

We have added discussions on the strengths and weaknesses of our approach in the revised manuscript (The second paragraph of Discussion). The main strength of our method is that it is the first systematic and efficient computational method to predict key allosteric residues in allosteric sites that are primarily responsible for allosteric signaling. The weakness of our method is that as KeyAlloSite attempts to capture coevolutionary coupling between residues from multiple sequence alignment (MSA), it requires that MSA should contain sufficient homologous and diversified sequences. For the MSAs with only a few homologous sequences, KeyAlloSite usually cannot give accurate predictions. How to reduce the number of homologous sequences required remains further research.

Reviewer #1 (Public Review):

Allostery refers to processes whereby a change at one site of a biological macromolecule affects the structure and dynamics at another distinct functional site, enabling the regulation of the corresponding function. Xie et al. developed an in-silico method to predict residues involved in allosteric regulation using a coevolution-based method. A fast and accurate method of identifying key residues responsible for allosteric signalling is important for drug design purposes and protein engineering.

We thank the reviewer for the positive remarks and encouragement.

Strengths:

1) The authors applied their method to multiple targets from different protein families to test their method.

2) The method is able to predict in a retrospective analysis certain residues involved in allosteric communication between orthosteric and allosteric binding sites.

Weaknesses:

1) There are several tools used in statistical genomics to predict allosteric communication pathways. Even though the paper tries to demonstrate the ability to predict residues that are involved in allosteric communication, KeyAllosite is not compared with any other state-of-the-art tool that does the same (1-3), which would highlight the strengths of this method with respect to existing ones.

We thank the reviewer for the comment. We have compared our method with statistical coupling analysis method that has been used to predict allosteric communication pathways. Please refer to our response to the first comment in Essential Revisions. We would like to emphasize that our method was developed mainly for predicting key allosteric residues and not for predicting allosteric communication pathways.

2) The authors mention that the number of effective homologue sequences affects the probability of finding an allosteric site in the top 3 scored sites, however, Cdc4 and AR2 have also a low number of effective homologue sequences (Figure 1.A) yet a high Z-score. No sufficient explanation is given regarding this discrepancy. This also demonstrates that a threshold in the Neff under which KeyAlloSite becomes unreliable should be defined.

We thank the reviewer for the comment. A high number of effective homologous sequences guarantee the high probability of the correct prediction of allosteric site. Furthermore, the prediction power also depends on several other factors, such as when the allosteric site appears in evolution, how many effective homologous sequences contain the allosteric site in a particular protein family, and the size of allosteric pockets. We have built the protein sequence based phylogenetic tree of AR1 homologous proteins and found that AR1 located near the tail of the phylogenetic tree (Figure 2—figure supplement 1 in the revised manuscript). This implies that this allosteric function did not exist in the early evolutionary period and the allosteric function may have appeared in the late stage of evolution. Due to the relatively large number of sequences in the early evolutionary period and relatively few sequences in the late stage of evolution, the allosteric signal was weak. For simplicity, we defined the threshold of the number of effective homologous sequences (Neff) as 100 (Section The allosteric protein data set in Methods and Materials). When Neff is less than 100, the prediction result may become unreliable, but a low number of Neff does not necessarily correspond to a small Z-score. For example, Cdc4 and AR2 have high Z-scores despite their small Neff.

3) From the low Z-score of CYP3A4, the authors claim that the conservation of residues in the orthosteric site is important for getting an accurate prediction of the coupling strength. It is not clear though if and how is this aspect encoded in the KeyAlloSite. From the description of the method, it seems like the algorithm does not check for the level of conservation of the orthosteric residues. An explanation as to why has it not been incorporated into the algorithm is necessary. The conservation at a given site in an MSA defined as the overall deviance of amino acid frequencies at that site from their mean values, in combination with the statistical coupling of two sites has been shown to be important in the development of allosteric models (4).

We thank the reviewer for the comment. Since orthosteric and allosteric sites are functionally coupled and coevolve, we mainly focus on the evolutionary coupling between residues in orthosteric and allosteric sites. We did not explicitly encode the conservation of orthosteric residues in KeyAlloSite as orthosteric residues of most proteins are relatively conserved and ECM already contains sequence conservation information. When the conservation of residues in the orthosteric site varies largely as in the case of CYP3A4, it will be hard to give reliable prediction. We have discussed this issue in the revised manuscript (The first paragraph of The evolutionary coupling between orthosteric and allosteric sites is stronger).

4) In the case of AuroraA, the authors do not explain why other Ser/Thr/Tyr residues scored higher than T287 and T288, or if their higher scores are an artefact. However, in many cases, post-translational modifications take place by secondary partners and, therefore, the coupling of a post-translational modification site with the orthosteric site cannot be used to predict such sites. Post-translational modification sites are expected to have a strong coupling with residues of the upstream/downstream effector that is responsible for the modification, rather than with residues of the orthosteric site of the protein.

We thank the reviewer for the comment. For the three residues T235, S245 and S249 with higher scores than T287 and T288, currently no experimental data are available to verify the effect of their phosphorylation on protein function. Further studies are needed to investigate whether these three residues have allosteric regulatory function, and we think these three residues are very worthy of further study.

It is true in many cases, post-translational modifications (PTMs) take place by secondary partners, and the coupling between a PTM site and residues of upstream/downstream effector determines whether this PTM reaction will happen. However, whether this PTM would influence protein function is encoded in its coupling with the protein orthosteric site as functionally coupled residues usually coevolve. A recent study has also shown that there are strong coevolutionary couplings between post-translational modification sites and orthosteric sites (Zhu, et al., J Chem Inf Model. 2022; 62:3331).

5) The authors defined allo-residues as the residues whose Z-score is >0.8, but there is no strong argument regarding the choice of this threshold. In the case of the allosteric pockets, for example, the authors use a threshold of 0 to identify pockets with strong coupling strength.

We thank the reviewer for the comment. We chose this threshold to ensure that most of the known key allo-residues can be correctly predicted by KeyAlloSite, and at the same time, the number of predicted key allo-residues should be as small as possible. We calculated the number of known key allo-residues that could be predicted by KeyAlloSite (Supplementary file 5) and the ratio of predicted key allo-residues in all residues of allosteric pockets of all proteins in the data set (Figure 3—figure supplement 1) for thresholds of 0.5, 0.6, 0.7, 0.8, 0.9, 1.0. It can be seen from the Supplementary file 5 that when taking 0.5, 0.6, 0.7 and 0.8 as thresholds, KeyAlloSite could correctly predict the known key allo-residues in BCR-ABL1, Tar and PDZ3. But when taking 0.9 and 1.0 as thresholds, several known key allo-residues could not be correctly predicted. Moreover, when the threshold was 0.8, the ratio of predicted key allo-residues among all residues of allosteric pockets is less than that of 0.5-0.7 (Figure 3—figure supplement 1). Therefore, we finally chose the threshold as 0.8. We have added this explanation in the revised manuscript (The first paragraph of Coevolution analysis revealed key allo-residues in allosteric pockets).

6) In the case of the Tar, it would be good if the authors reported the results starting from an apo structure as well and see if the method is able to find Y149 and Q152. That way, they could test how sensitive/biased the method is to the chosen conformation. If no apo structure is available, maybe another protein where both an apo and a holo structure exist could be used for this purpose.

We thank the reviewer for the suggestion. We have added the results using the apo structure of Tar in the revised manuscript (The first paragraph of KeyAlloSite correctly identified key allo-residues in other proteins not in the data set). We first used CAVITY to identify all the potential ligand binding pockets on the surface of chain B in the apo structure of Tar (PDB ID: 4Z9J). Of the eight pockets found, cavity_5 is the allosteric pocket, which contains 24 residues. As in the holo structure, the 16 residues (A166-T181) in the C-terminal of the α4 helix were chosen as the orthosteric site. In the prediction results, Y149' and Q152' ranked fourth and fifth among the 24 residues, with corresponding Z-scores of 1.01 and 0.66, respectively. Therefore, KeyAlloSite could also correctly predict the key allo-residue Y149 in the apo structure. For Q152, its Z-score is slightly smaller than the threshold of 0.8, though with a high ranking. This indicates that conformational changes do have subtle influence on the predicted results, probably mainly due to the change of residue composition in the allosteric pocket detected by CAVITY, which will lead to some fluctuations in the predicted key allo-residues. However, when the conformational changes between apo and holo states are not large, the influence on the results is small.

7) The data in Table 1 is not sufficiently convincing. It would be helpful to also report how many of the identified residues by KeyAlloSite are indeed involved in oncogenic mutations or find another metric to quantify the success rate of KeyAlloSite.

We thank the reviewer for the suggestion. Among the 51 KeyAlloSite predicted key allo-residues in 7 human proteins, 11 have been found as oncogenic mutations in the Allo-Mutation database. We have added related information in the revised manuscript (KeyAlloSite identified pathogenetic mutations in human proteins). With the increasing number of known allosteric residues involved in oncogenic mutations, we expect that KeyAlloSite can identify more pathogenetic mutations.

8) The success rate of the method to predict key residues of the function of CALB (38% success rate based on the reported residues in the literature) and CMS (20%), should be interpreted with caution. It may be the case that the rest of the predicted residues have not been tested for their functional role, or that the method has indeed a low success rate in predicting key residues for allosteric communication.

We thank the reviewer for the suggestion. Wu et al. identified 63 annotated functional residues in CALB from literature (FASEB J 2020, 34, 1983), of which 8 were in the substrate binding pocket and the remaining 55 annotated functional residues were outside the substrate binding pocket. There are 296 residues outside the substrate binding pocket, and the annotated functional residues account for 18.6% of all residues, so it is difficult to predict these annotated functional residues. Wu et al. developed the SCA.SIM method and found that the success rate of SCA.SIM was better than that of SCA. The top 27 ranked residues predicted by SCA.SIM contained 11 annotated functional residues, 6 of which are in the substrate binding pocket and 5 of which are outside the substrate binding pocket. This means that the predicted success rate of the annotated functional residues outside the substrate binding pocket is 18.5%. In comparison, the top 27 ranked residues predicted by KeyAlloSite contained 7 annotated functional residues outside of the substrate binding pocket with a success rate of 25.9%. The top 38 ranked residues predicted by SCA.SIM contained 12 annotated functional residues, 6 of which are in the substrate binding pocket and 6 of which are outside the substrate binding pocket with a prediction success rate of 15.8%. In comparison, the top 38 ranked residues predicted by KeyAlloSite contained 9 annotated functional residues outside of the substrate binding pocket with a success rate of 23.7%. Therefore, the success rates of KeyAlloSite was slightly higher than that of SCA.SIM in predicting annotated functional residues outside of the substrate binding pocket. For the rest of the predicted residues that have not been tested for their functional role, we reason that they are likely to allosterically regulate the activity of the enzyme, and further experiments are needed to test whether they really have allosteric regulatory functions.

In lines 38-41, the authors refer to SARs of allosteric drugs as "flat", which is not clear what they refer to.

“Flat” SARs (also sometimes referred to as “Shallow” SARs) means that SARs are not robust and a modest structural modification may destroy the activity of a potent allosteric modulator. More explanations on this have been added in the revised manuscript.

It is not clear if the so-called, "allo-residues", that the authors define in lines 49-52 refer to residues that affect the binding or the signalling.

“Allo-residues” refer to residues that mainly affect signaling. More explanations on this aspect have been added in the revised manuscript.

It would be helpful for the reader if the authors included a section where they describe the method itself and the key steps of developing their model before presenting the results. That way, the reader could follow the results easier.

We have added a section describing the method and the key steps of developing the model before presenting the results in the revised manuscript (The last paragraph of the Introduction).

The authors classified L359 of BCR-ABL1 as an allo-residue and justified the importance of this residue based on the fact that a weaker ligand does interact with this residue. For transparency, it would be good for the authors to report the allosteric scores of all 44 residues in the allosteric site to show that this residue was not cherry-picked and that the method can pick up all the residues that are important for the binding.

We have added the Supplementary file 6, which reported the allosteric scores of all 44 residues in the allosteric site.

The manuscript has several typos and language mistakes (e.g. "screening" instead of "screen" in the Abstract, missing articles, etc.) that should be corrected prior to a resubmission.

We have carefully checked the language and made necessary corrections in the revised manuscript.

Reviewer #2 (Public Review):

The authors designed a statistical approach to analyse coevolution scores from a protein and predict allosteric residues. This approach relies on comparison of residues from the two sites (allosteric vs. orthosteric) and omits the rest of the protein.

While the approach is logical, the predictive power has not been clearly established. To demonstrate the effectiveness of the approach a confusion matrix should be provided as it will show the predictive power of the method in the different scenarios presented in the manuscript (PTM's, enzymes, pathogenic mutations, etc.).

An effective method, along these lines, will have significant impact on protein and drug design.

We thank the reviewer for the comments and encouragement. To demonstrate the predictive power of our method, we have given the confusion matrices in different scenarios (Supplementary file 8) in the revised manuscript. In different scenarios, the known real positive data is very limited. At the same time, there is little real negative data, and we regarded data with unknown functions as negative data. It can be seen that KeyAlloSite has high recall in the prediction of allosteric sites and the prediction of key allo-residues in allosteric pockets. The recall of KeyAlloSite in the prediction of functional post-translational modification (PTM) sites, pathogenic mutations and enzymes are not very high, mainly because there are limited known data on functional PTM sites and pathogenic mutations, and the functions of many key allo-residues predicted by KeyAlloSite are unknown. It remains to be further verified whether they have the function of allosteric regulation. Moreover, since we are now taking the predicted key allo-residues of unknown function as negative samples, our values for recall here are lower bounds of recall. The prediction recall of KeyAlloSite will be higher if the residue of the unknown function we predicted is later determined to have an allosteric regulation function.

It would be helpful if statistics on the distribution of significant scores from E row comparison after the t-test are provided. Further, you have so far compared allosteric with orthosteric, it would be interesting to compare the same distribution/statistics with allosteric vs. non-orthosteric residues. It may serve as a very good benchmark. The idea would be to see if the events of significance from allosteric vs. orthosteric are different/unique if we pick and compare any two random patches in the protein.

As suggested, we have added some examples of distributions of the statistics corresponding to significant scores obtained from the t-test (Figure 3—figure supplement 2).

We also have added the difference between the evolutionary coupling between orthosteric and allosteric pockets and the evolutionary coupling between two random patches in the protein in the revised manuscript (The last paragraph of The evolutionary coupling between orthosteric and allosteric sites is stronger). For each protein in the data set, two residues that are not part of the orthosteric and allosteric sites were randomly selected from the surface residues. Among them, one was taken as the first center and the residues around it with the same number as the residues in orthosteric pocket were selected as patch1; and the other residue was taken as the second center and the residues around it with the same number as the residues in allosteric pocket were selected as patch2. Then we calculated the evolutionary coupling strength between the patch1 and patch2. The process was repeated four times, and then the mean and standard deviation of the evolutionary coupling strength were calculated. We then compared the evolutionary coupling strength between patch1 and patch2 with that between orthosteric and allosteric sites. The results showed that the evolutionary coupling strength between orthosteric and allosteric sites was significantly higher than that between two random patches (Figure 2—figure supplement 3). In other words, there is intrinsic evolutionary coupling between orthosteric and allosteric sites, which is different from the evolutionary coupling between any two random patches.

A boxplot of Zscores for All, top200, top300, top400 pairs from Figure 1—figure supplement 2 would be very informative.

We have added the boxplot of Z-scores for All, top200, top300, top400 pairs from Figure 1—figure supplement 2 as suggested (Figure 2—figure supplement 2C).

As you have been using top zscore residues (>0.5 or >0.9 for enzymes), direct or indirect evidence for the statement in lines 306-307 is not very apparent from the provided data. This has to be flushed out further.

Thank you for the suggestion. We feel sorry for not explaining clearly how weak terms in Jij were used in our prediction. Since orthosteric and allosteric pockets are two different pockets, the residues in the two pockets generally do not contact. Therefore, when we calculate the coevolutionary coupling between residues in the two pockets, most of the evolutionary coupling values between residues in the two pockets are relatively small, that is, they correspond to the weak terms in Jij, as can be seen in Figure 1D. Then we got the number of significant differences of each residue by comparing the differences of these weak evolutionary coupling values of each residue in the allosteric pocket, and normalized the number of significant differences to Z-scores. A larger Z-score indicates that the evolutionary coupling values of this residue are significantly different from the evolutionary coupling values of more residues. In other words, Jij and Z-score are two different metrics. We have added this detailed explanation of the use of the weak terms in Jij in the revised manuscript (The first paragraph of Coevolution analysis revealed key allo-residues in allosteric pockets).

Please annotate residues for which experimental data is available to compare with keyallosite predictions for example in Supplementary files 4 and 5. In the main text (line 284) you only mention one residue out of 52 to demonstrate the effectiveness of the prediction. Again a confusion matrix would help here.

Thank you for the suggestion. We have bolded the predicted key allo-residues that were annotated as functional residues by the experimental data in the previous literature in the revised Supplementary files 4 and 7 as suggested. In the main text (line 284), V225M is a representative of the functional residues already annotated with experimental data in CALB, and other functional residues annotated with experimental data are included in revised Supplementary file 7. We have added another example, V37I, and the prediction accuracy calculated based on confusion matrix was 74.0%. Although the prediction accuracy here is not very high, it is the lower bound of the prediction accuracy of KeyAlloSite. The prediction accuracy of KeyAlloSite will be higher if the residue of the unknown function we predicted is later determined to have an allosteric regulation function.

Can you please clarify if in Figure 2, number of residues refers to the number of residues from allosteric+orthosteric site?

Thank you for pointing this out. The number of residues refers to the number of residues from allosteric site. We have added this detailed description in the revised Figure which is now Figure 3.

Figure 6 may be ideal as Figure 1 to inform the readers about the protocol of this work.

Thank you for the suggestion. We have adjusted Figure 6 as Figure 1 in the revised manuscript as suggested.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

The manuscript has been improved but there are some remaining issues that need to be addressed, as outlined below:

We thank the Senior Editor, the Reviewing Editor and the reviewers for the comments and suggestions which are helpful to improve our work.

1. Can you please provide a detailed description and expand the discussion on the predicted amino acids and why the predicted residues are not the top ranking ones?

Thank you for the suggestion. We have added the detailed description and discussion on the predicted other top-ranking key allo-residues with unidentified function (The second paragraph of Discussion), the comparison to the approach of Hopf et al. (The third paragraph of Discussion), the possible effect of protein conformational changes on the prediction results (The fifth paragraph of Discussion) and possible applications of the predicted key allo-residues (The last paragraph of Discussion) in the revised manuscript.

2. The theoretical formulation to extract the coupling score and a compare/contrast to Hopf et al. approach would be highly beneficial to the readers.

We have added the theoretical formulation for extracting the coupling scores (The first paragraph of The Evolutionary Coupling Model (ECM)) and the comparison to the approach of Hopf et al. (The third paragraph of Discussion) in the revised manuscript. The theoretical formulation for extracting the coupling scores between residues in the initial step of our method is the same as that in the method of Hopf et al., but the problem studied by us is different from that of Hopf et al., and the usage of the coupling scores between residues in the later steps is different. Hopf et al. used the coevolutionary coupling scores between residues to predict the effects of mutations by calculating the difference in statistical energy between mutant and wild-type sequences. In contrast, we used the coevolutionary coupling scores between residues to predict allosteric sites and predict the key allo-residues in allosteric pockets that are mainly responsible for allosteric signaling by pairwise comparing the difference of the coevolutionary coupling scores of residues in allosteric pockets. Although Hopf et al. highlighted the significance of the pairwise term for the prediction of mutation effects, we highlighted the importance of the weak pairwise term for the study of allosteric function.

Reviewer #1 (Recommendations for the authors):

The authors did extensive work trying to address the revision comments, and their effort is much appreciated.

Nevertheless, in the absence of further experimental validation of the importance of the predicted so-called key "allo-residues" in mediating the signal between the orthosteric and allosteric binding site, it is still hard to assess the predictive power of the presented method. There are still residues for example that score higher than known functional residues (see for example residues T235, S245 and S249 in the case of Aurora A) whose implication in signal transduction is not confirmed, or residues whose importance depends on the conformation chosen for analysis despite the coevolutionary conservation metric used to score the residues (Q152 of Tar whose score is way below the 0.8 threshold when an apo structure is considered, which shouldn't be the case given that AlloSite uses only MSA and coevolutionary information for the scoring – it seems like other residues are predicted to be way more important based on the allosteric pocket definition given by CAVITY to decrease the score of Q152 to <0.8 after normalisation).

I fully understand that such experimental validation goes beyond the capacities of a computational group, however, as a reader, I might be hesitant to try this method.

We thank the reviewer for the comments. We agree that experimental validation of the newly predicted key "allo-residues" in mediating allosteric signaling will further verify the predictive power of our method. We are planning to carry out experimental validation studies in the future. Currently we only have a small number of proteins with limited experimental information about key allo-residues. Among these proteins, our predicted key allo-residues rank high among all the alloresidues. At the same time, there are residues with higher scores than these known functional residues, which may play important roles in allosteric signaling. Of course, further experimental validation will be necessary to study the function of these top-ranking residues. With more experimental data at hand, the method can be further validated and improved in the future.

The scores of the key allo-residues predicted by KeyAlloSite depend not only on the coevolutionary information, but also on the residues that constitute the allosteric pockets (Figure 1, see Materials and methods for details). For Tar, the evolutionary couplings between residues are the same for the apo and holo conformations, but the allosteric pockets found by CAVITY in the two conformations contain a number of different residues. Because the prediction of key allo-residues by KeyAlloSite requires pairwise comparison of residues in allosteric pockets, the predicted key allo-residues in the two conformations were slightly different. For the apo Tar, on the one hand, although the score of Q152 is 0.66, which is less than the threshold of 0.8, Q152 ranked high among all residues in the allosteric pocket with a ranking of 5/24. When we lower the threshold slightly, we will be able to correctly predict Q152. On the other hand, Bi et al. showed that although Y149 and Q152 are both key allo-residues, Y149 seems to be more important as allosteric signaling can be conducted when the allosteric molecule only interacts with it and not Q152 (PNAS 2013 110:16814-16819. DOI: https://doi.org/10.1073/pnas.1306811110). Therefore, although Q152 was not predicted in the apo Tar, allosteric molecules can still be optimized based on the Y149 predicted by KeyAlloSite.

Reviewer #2 (Recommendations for the authors):

The authors have carefully addressed all the comments from the previous review.

Few additional comments for authors to consider:

The Discussion section can be enriched further. At present it only discusses two points, i.e., ability of the method to predict key allosteric residues and requirement of sequence depth. For example: the last two paragraphs 565-580 are repetitive as they both highlight the need for depth in sequence alignments which is a known issue for MSA dependent methods, as again discussed by Hopf et al. Nature Biotechnology volume 35, pages 128-135 (2017). Further, the theoretical formulation to extract the coupling score is identical to Hopf et al. and a compare/contrast of the two approaches would be highly beneficial to the audience. Even Hopf et al. highlight the significance of the pairwise term, as done in this article.

We thank the reviewer for the suggestion. We have added the discussion on the predicted other topranking key allo-residues with unidentified function (The second paragraph of Discussion), the comparison to the approach of Hopf et al. (The third paragraph of Discussion), the possible effect of protein conformational changes on the prediction results (The fifth paragraph of Discussion) and the utility of the predicted key allo-residues (The last paragraph of Discussion) in the revised manuscript.

For the last two paragraphs, although they both mentioned the need for depth in sequence alignments, the two paragraphs emphasized different points. The first paragraph emphasized the need for sufficient homologous sequences, while the second paragraph emphasized that although our method needs to use the three-dimensional structure of a protein, when a protein does not have a known three-dimensional structure but has a certain number of homologous sequences, our method can also be applied.

The theoretical formulation for extracting the coupling scores between residues in the initial step of our method is the same as that in the method of Hopf et al., but the problem studied by us is different from that of Hopf et al., and the usage of the coupling scores between residues in the later steps is different. Hopf et al. used the coevolutionary coupling scores between residues to predict the effects of mutations by calculating the difference in statistical energy between mutant and wild-type sequences. In contrast, we used the coevolutionary coupling scores between residues to predict allosteric sites and predict the key allo-residues in allosteric pockets that are mainly responsible for allosteric signaling by pairwise comparing the difference of the evolutionary coupling scores of residues in allosteric pockets. Although Hopf et al. highlighted the significance of the pairwise term for the prediction of mutation effects, we highlighted the importance of the weak pairwise term for the study of allosteric function.

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    Figure 2—source data 1. Raw data for Figure 2.
    Figure 2—figure supplement 2—source data 1. Raw data for Figure 2—figure supplement 2.
    Figure 2—figure supplement 3—source data 1. Raw data for Figure 2—figure supplement 3.
    Figure 3—source data 1. Raw data for Figure 3.
    Figure 3—figure supplement 1—source data 1. Raw data for Figure 3—figure supplement 1.
    Figure 3—figure supplement 3—source data 1. Raw data for Figure 3—figure supplement 3.
    Figure 4—source data 1. Raw data for Figure 4.
    Figure 5—source data 1. Raw data for Figure 5.
    Figure 6—source data 1. Raw data for Figure 6.
    Supplementary file 1. Information of the allosteric proteins in the data set.
    elife-81850-supp1.docx (29.8KB, docx)
    Supplementary file 2. List of the Z-scores and ranking of allosteric pockets in the data set.
    elife-81850-supp2.docx (27.6KB, docx)
    Supplementary file 3. KeyAlloSite prediction results of Aurora A kinase.
    elife-81850-supp3.docx (28.8KB, docx)
    Supplementary file 4. List of the predicted key allo-residues in allosteric pockets.
    elife-81850-supp4.docx (28.8KB, docx)
    Supplementary file 5. Key allo-residues predicted by KeyAlloSite with different cutoffs.
    elife-81850-supp5.docx (25.4KB, docx)
    Supplementary file 6. KeyAlloSite prediction results of tyrosine-protein kinase ABL1.
    elife-81850-supp6.docx (27.2KB, docx)
    Supplementary file 7. The key allo-residues predicted by our method on Candida antarctica lipase B.
    elife-81850-supp7.docx (28.7KB, docx)
    Supplementary file 8. The confusion matrices of KeyAlloSite in different scenarios.
    elife-81850-supp8.docx (26.9KB, docx)
    Supplementary file 9. Comparison of KeyAlloSite and SCA methods.
    elife-81850-supp9.docx (25.9KB, docx)
    MDAR checklist

    Data Availability Statement

    The data that support the results of this study are in the Supplementary data, including information of the allosteric proteins in the data set (Supplementary file 1); list of the Z-scores and ranking of allosteric pockets in the data set (Supplementary file 2); KeyAlloSite prediction results of AurA kinase (Supplementary file 3); list of the predicted key allo-residues in allosteric pockets (Supplementary file 4); key allo-residues predicted by KeyAlloSite with different cutoffs (Supplementary file 5); KeyAlloSite prediction results of tyrosine-protein kinase ABL1 (Supplementary file 6); the key allo-residues predicted by our method on CALB (Supplementary file 7); the confusion matrices of KeyAlloSite in different scenarios (Supplementary file 8); comparison of KeyAlloSite and SCA methods (Supplementary file 9); phylogenetic tree of androgen receptor (Figure 2—figure supplement 1); comparison of ECS between pockets when all residue pairs and partial residue pairs were used (Figure 2—figure supplement 2); difference between the EC between orthosteric and allosteric sites and the EC between two random patches (Figure 2—figure supplement 3); distribution of the ratios of the number of key allo-residues predicted by KeyAlloSite in the number of all residues in allosteric pockets when using different cutoffs in all proteins (Figure 3—figure supplement 1); examples of distributions of the statistics corresponding to significant scores obtained from the t-test (Figure 3—figure supplement 2); and random sampling of homologous sequences (Figure 3—figure supplement 3). The homologous sequences of the proteins in the data set are available in the following GitHub repository: https://github.com/huilan1210/KeyAlloSite, Xie et al., 2023.

    All data that support the results of this study are included in the manuscript, supplementary files, and GitHub repository (https://github.com/huilan1210/KeyAlloSite; copy archived at swh:1:rev:333d4a48d2570c74806f68a1247611ed64794b97). Source Data files have been provided for all Figures (except Figure 1 and Figure 2-figure supplement 1).


    Articles from eLife are provided here courtesy of eLife Sciences Publications, Ltd

    RESOURCES