2025 May 5;292(16):4211–4231. doi: 10.1111/febs.70121

Electrostatic potential as a reactivity scoring function in computer‐assisted enzyme engineering

Aitor Vega 1, Antoni Planas 1,2, Xevi Biarnés 1
PMCID: PMC12366256  PMID: 40322838

Abstract

The high catalytic efficiency of enzymes is attained, in part, by their capacity to electrostatically stabilize the transition state of the chemical reaction. High‐throughput protocols for measuring this electrostatic contribution in computer‐assisted enzyme design are limited. We present here an easy‐to‐compute metric that captures the electrostatic complementarity of the enzyme to the charge distribution of the substrate at the transition state. We demonstrate such a complementarity for a representative dataset of glycoside hydrolases, a large family of enzymes responsible for the hydrolytic cleavage of glycosidic bonds in oligosaccharides, polysaccharides, and glycoconjugates. We have implemented this metric in BindScan, a computer‐based mutational analysis protocol to assist protein engineering. We demonstrate the predictive power of BindScan with this metric for two mechanistically distinct glycoside hydrolases: Spodoptera frugiperda β‐glucosidase (Sfβgly, operates via protein nucleophile catalysis) and Bifidobacterium bifidum lacto‐N‐biosidase (BbLnbB, operates via substrate‐assisted catalysis). In an experimental benchmark of 51 mutants of Sfβgly, the metric correctly predicts the sequence positions at which mutation modulates kcat/KM, with 77% classification efficiency, and it identifies variants of BbLnbB with improved transglycosylation yields (up to 32%). Based on electrostatic potential and ligand affinity calculations, as implemented in BindScan, we propose a rational strategy to design glycoside hydrolase variants with improved transglycosylation efficiency for the synthesis of added‐value glycoconjugates. The new reactivity metric may contribute to expanding the range of computational protocols available to assist enzyme engineering campaigns aimed at optimizing mechanistically relevant properties.

Keywords: binding affinity, computational protein engineering, electrostatic potential, glycoside hydrolases, transglycosylation


We present a computational metric that quantifies the electrostatic complementarity of enzymes to the charge distribution of substrates at the transition state. Implemented in BindScan, this metric accurately predicts functionally relevant mutations in glycoside hydrolases. In combination with ligand binding affinity measurements, this approach enhances enzyme engineering strategies, facilitating the rational design of glycoside hydrolases and transglycosidases for biotechnological applications.

Abbreviations

AUC, area under the curve
BbLnbB, Bifidobacterium bifidum lacto‐N‐biosidase
GH, glycoside hydrolase
Glc‐pNP, p‐nitrophenyl‐β‐d‐glucopyranoside
LNB, lacto‐N‐biose
LNT, lacto‐N‐tetraose
ROC, receiver operating characteristic
Sfβgly, Spodoptera frugiperda β‐glucosidase

Introduction

Enzymes, the actual machinery of life, catalyze a myriad of biochemical reactions with remarkable efficiency and specificity. Catalytic efficiency is significantly enhanced by the ability of enzymes to dynamically pre‐organize the reaction environment. The catalytic elements are precisely positioned in the enzymatic active site to (a) bind the substrate in a pre‐catalytic conformation by specific non‐covalent interactions, (b) electrostatically stabilize the transition state, and (c) favor the release of products for turnover. The molecular complementarity between each enzyme and its cognate substrate has naturally evolved in a specific manner for metabolically essential enzymes. Nevertheless, some degree of substrate promiscuity and sub‐optimal efficiency is also apparent for less evolutionarily constrained enzymes [1, 2]. Industrial biotechnology applications of enzymes take advantage of such promiscuity to catalyze the sustainable synthesis of a non‐natural landscape of chemicals and pharmaceuticals [3, 4, 5]. However, enzyme‐substrate interactions are not innately optimized in such cases. Therefore, there is expected to be considerable room for improving the catalytic efficiency of enzymes acting on non‐cognate substrates.

Directed enzyme evolution has emerged as a useful laboratory technique to engineer the catalytic properties of enzymes through the application of natural evolution rules [6, 7]. Properties such as thermostability, robustness in hostile organic solvents, and enantioselectivity, which were not selected for in the original organism, can be altered by inducing mutation, recombination, and selection in a controlled manner [8, 9, 10, 11, 12, 13]. An effective evolutionary strategy that reduces the necessary molecular biological work and the screening effort is to perform iterative cycles of saturation mutagenesis at rationally chosen sites in an enzyme [11, 14, 15]. This strategy induces high evolutionary pressure in a well‐chosen and confined region of the protein sequence space. The extensive availability of bioinformatic tools and their combination with molecular simulations have proven to be effective for selecting randomization sites aimed at improving catalytically relevant properties [16], such as increasing protein stability [17], enhancing regio‐ and enantioselectivity [18, 19, 20, 21], and tuning substrate specificity [22, 23, 24, 25, 26]. Many of these strategies are conceived as the generation of a virtual library of mutant enzyme structures followed by the evaluation of changes in ligand binding affinity or thermal stability at the active site as an indirect measure of enzymatic reactivity. Recent advances in artificial intelligence have also paved the way for deep learning algorithms that enhance protein stability [27, 28] or predict enzyme kinetic parameters [29, 30], and for generative models that guide protein and enzyme engineering with remarkable results. For recent reviews, see [31, 32].

The application of molecular simulation methods for assessing enzyme reactivity intrinsically, rather than protein stability or substrate binding, in a high‐throughput manner remains limited. The catalytic power of enzymes is largely attributed to the electrostatic stabilization of the transition state [33]. Such an effect can be analyzed by electronic structure calculations, which have proven successful in case‐by‐case studies to delineate catalytic itineraries of enzymes and assess the effect of a few point mutations [34, 35, 36]. There are a few examples for which protein electrostatics can explain the high catalytic efficiency achieved by some de novo designed enzymes, and therefore, it has been suggested that such measurements can guide enzyme design [37, 38]. Yet, the quantum mechanics calculations used in these studies are laborious, which hinders the screening of thousands of mutants in a protein engineering campaign. Nonetheless, the mechanistic features learned from such detailed methods can inspire the development of simpler scoring systems amenable to the high‐throughput analysis of large mutant libraries.

Here, we introduce a fast and easy‐to‐compute metric to guide computer‐assisted enzyme engineering focused on enzymatic reactivity. The metric is based on implicit solvent electrostatic potential calculations, and it evaluates the changes in electrostatic potential gradient at the catalytic site upon mutation of surrounding amino acids. We have implemented this metric in the BindScan computational pipeline [39], a protocol that combines standard modeling techniques, virtual docking calculations, and now, electrostatic potential calculations, on an ensemble of protein structures derived from an exhaustive library of enzyme mutants (Fig. 1). Similar pipelines that generate virtual libraries of mutant structures and subsequently evaluate their properties include PROSS [17], CASCO [40], FuncLib [24], and AsiteDesign [25], among others. These pipelines typically focus on evaluating the changes in substrate binding affinity or stability upon mutation as a guide to computational enzyme engineering. We propose here a systematic and automatic evaluation of protein electrostatic potential gradients to complement such pipelines.

Fig. 1.

BindScan computational pipeline extended with electrostatic potential calculations.

The application of the new electrostatic potential metric is exemplified here with two glycoside hydrolases (GH), a family of enzymes that catalyze the cleavage of glycosidic bonds in oligosaccharides, polysaccharides, and glycoconjugates [41, 42]. These enzymes have evolved towards well‐defined sets of regioselectivities and stereospecificities, being active on a wide repertoire of substrates. For this reason, many industrial applications of GHs have been devised in the biomass processing industry, biofuel production, food industry, and for the synthesis of added‐value compounds [43, 44]. Glycoside hydrolases are highly specific for their cognate substrate, and catalytic efficiency is attained in part by a proper control of the substrate distortions and charge distribution along the catalytic itinerary [45, 46]. However, controlling these features by enzyme redesign is challenging. The complex network of enzyme‐carbohydrate interactions in the active site of glycoside hydrolases makes it difficult to choose sites for randomization to modify reactivity or improve catalytic efficiency, beyond introducing novel substrate specificities. The effect of the surrounding protein environment on the electronic structure of the sugar ring, which is important for transition state stabilization, is hard to predict. We address these features via systematic protein side‐chain substitutions and evaluation of the resulting changes in the electrostatic potential complementarity to the charge distribution of the substrate at the transition state.

The two test GH enzymes are: Spodoptera frugiperda β‐glucosidase (Sfβgly) and Bifidobacterium bifidum lacto‐N‐biosidase (BbLnbB). The first enzyme was chosen given the large amount of experimental data available (single point mutation effects on kinetic parameters collected in the same study) [47]. This information was then used to assess the predictions made with our methodology. The second example illustrates how the new electrostatic potential metric can be applied to guide the engineering of glycosidases into transglycosidases. Under controlled conditions, retaining glycosidases can operate by forming glycosidic bonds between a properly chosen sugar donor and an acceptor. This process is known as transglycosylation, and it has been reported as a natural activity for some glycoside hydrolases [41, 48]. However, the product of the transglycosylation reaction is susceptible to being hydrolyzed again by the same enzyme, a process known as secondary hydrolysis (Fig. 2).

Fig. 2.

Hydrolysis and transglycosylation itineraries followed by retaining glycoside hydrolases. Double displacement mechanism with an enzyme nucleophile (A) or with the substrate's 2‐acetamido group as a nucleophile (B). Figure adapted from Ref. [41] with permission of the author.

Many authors have attempted the (semi)rational engineering of glycosidases into transglycosidases in order to achieve efficient catalysts for the synthesis of oligosaccharides and glycoconjugates [39, 49, 50, 51]. Recently, a strategy based on the analysis of differential sequence conservation patterns in a large dataset of glycoside hydrolases has proven to be successful in guiding the design of efficient transglycosylation variants, without requiring explicit structural knowledge [51]. Nonetheless, a general protocol for a structure‐based design of efficient transglycosylases is still lacking given the complexity of modulating the hydrolysis/transglycosylation reaction balance at the molecular level. We will show here how the enzyme reactivity metrics based on electrostatic potential calculations, together with substrate binding affinity measurements, can assist in enhancing the transglycosylation yield of BbLnbB.

Results

Protein electrostatic potential complementarity to the transition state in glycoside hydrolases

The oxocarbenium ion‐like nature of the transition state in glycoside hydrolases (GHs) has been well characterized [45, 46, 52]. A positive charge is developed at the C1–O endocyclic bond, and a negative charge at the departing O‐aglycon (Fig. 3A). In order to stabilize such a transition state, we wondered if the electrostatic potential exerted by the enzyme could be complementary to such charge distribution. The protein electrostatic potential at the active site of both Spodoptera frugiperda β‐glucosidase (Sfβgly) and Bifidobacterium bifidum lacto‐N‐biosidase (BbLnbB) was evaluated in the absence of ligands (see Materials and methods). Indeed, our measurements reveal that there is an electrostatic potential gradient precisely located along the glycosidic bond that is to be cleaved during hydrolysis (Fig. 3B,C). A negative electrostatic potential patch is concentrated around the positive charge to be developed at the anomeric carbon of the substrate, and a less negative electrostatic potential patch around the departing O‐aglycon. Thus, the charge separation at the transition state is stabilized by the electrostatic complementarity between the enzyme active site and the oxocarbenium ion‐like transient structure. More interestingly, the electrostatic field lines are locally more intense along the direction of the glycosidic bond (Fig. 3D,E). The gradient in electrostatic potential that finely overlaps the scissile glycosidic bond is even more evident. These features are presented here in detail for two different GH families (GH1 and GH20), but we have observed similar patterns across representatives of each of the 18 GH clans with a hydrolytic mechanism [53] (Table S1). For all cases under evaluation, there is an intense density of electrostatic field lines that precisely emerges from the active site of GHs (see Fig. 4). In turn, the direction of the field lines follows the direction of the scissile glycosidic bond of the substrate for each individual case. 
Thus, the electrostatic potential complementarity between the enzyme active site and the substrate is a common characteristic, at least for canonical GHs. This indicates that GH active sites are precisely organized to maximize the electrostatic potential gradient along the glycosidic bond and so favor catalysis, a feature not previously described from the perspective of the enzyme in GHs. Such a property may therefore be used as a descriptor to guide enzyme design.

Fig. 3.

Electrostatic properties of the transition state of retaining glycoside hydrolases. (A) Charge separation takes place at the substrate between the anomeric carbon (positive charge density distributed along endocyclic oxygen) and the departing group (negative charge density distributed along glycosidic oxygen and acid catalyst). (B) Protein electrostatic potential at the active site of Sfβgly. Ligand: p‐nitrophenyl‐β‐d‐glucopyranoside (Glc‐pNP) in ball and sticks, catalytic residues in sticks. Case of GH with catalytic nucleophile (as in Fig. 2A). (C) Protein electrostatic potential at the active site of BbLnbB. Ligand: lacto‐N‐tetraose (LNT) natural substrate. Case of GH with substrate‐assisted catalysis (as in Fig. 2B). (D) Protein electrostatic field lines at the active site of Sfβgly colored by electrostatic potential value as in (B). (E) Protein electrostatic field lines at the active site of BbLnbB colored by electrostatic potential value as in (C). Structures, electrostatic potential, and field lines rendered with VMD [75].

Fig. 4.

Electrostatic potential gradient at the active site of representatives of each of the 18 GH clans (panel letters). Field lines are colored by electrostatic potential value as in Fig. 3. Protein backbone is represented as a cartoon in transparent gray. Catalytic residues and ligands are shown in thick lines. Structures and electrostatic field lines rendered with VMD [75].

Based on these observations, we propose an easy‐to‐compute score to evaluate changes in enzymatic activity upon mutation in silico. This reactivity metric evaluates how the electrostatic potential gradient varies upon mutation, measured at two groups of atomic positions of the substrate (Fig. 5A). The first group of atoms (G+) includes a representative selection of adjacent atoms in the region where a partial positive charge is known to be developed at the transition state of the enzymatic reaction. The second group of atoms (G−) includes those atoms in the region where a partial negative charge is developed (see Materials and methods section for full details on the measurements). Although both groups of substrate atoms are in close proximity, the differences in protein electrostatic potential at their positions are statistically significant across the whole ensemble of protein structures (Fig. 5B). When combined with a mutant library generator, this metric makes it possible to identify mutations with improved or decreased transition state stabilization relative to the wild‐type enzyme, with an expected impact on catalytic efficiency.

Fig. 5.

Electrostatic potential measurements on reference atomic positions. (A) Selection of atoms for the electrostatic potential metrics evaluation in glycoside hydrolases. The electrostatic potential of the protein is measured at the coordinates of the enclosed atoms in the image: G+ group atoms are circled in blue, while G− group atoms are circled in red. (B) Distribution of protein electrostatic potential measurements at the active site of Sfβgly for both groups of atoms: Epot(G+) in the blue boxplot and Epot(G−) in the red boxplot. Both distributions are statistically distinct (P‐value < 2.2e–16, Welch two‐sample t‐test).
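The two-group measurement can be illustrated with a minimal Python sketch. The exact definition of the score is given in the Materials and methods section and is not reproduced here, so the sign convention below (mean potential at G− minus mean potential at G+), the function names, and the numerical values are assumptions made only for illustration.

```python
from math import sqrt
from statistics import mean, variance


def gradient_score(epot_g_plus, epot_g_minus):
    """Potential difference between the two atom groups for one structure.

    Assumed convention: mean potential at G- minus mean potential at G+,
    so a more negative patch around G+ gives a larger positive score.
    """
    return mean(epot_g_minus) - mean(epot_g_plus)


def ensemble_score(replicas):
    """Average the per-structure score over an ensemble of replicas."""
    return mean([gradient_score(gp, gm) for gp, gm in replicas])


def welch_t(a, b):
    """Welch's t statistic for two samples with unequal variances."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Illustrative values only (arbitrary units): two replicas of one variant,
# each given as (potentials at G+ atoms, potentials at G- atoms).
replicas = [
    ([-12.0, -11.0], [-6.0, -5.0]),
    ([-10.0, -11.0], [-4.0, -5.0]),
]
score = ensemble_score(replicas)
```

Comparing `ensemble_score` of a mutant with that of the wild-type would then indicate whether the mutation strengthens or weakens the gradient under this assumed convention.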

Application case 1: GH1 β‐glycosidase from Spodoptera frugiperda (Sfβgly)

A first example in which the reactivity metric is tested is the digestive enzyme β‐glycosidase from the fall armyworm Spodoptera frugiperda (Sfβgly). This enzyme catalyzes the hydrolysis of terminal monosaccharides from di‐ and oligosaccharides, with substrate preferences towards fucosides, glucosides, and galactosides [54]. It is a retaining glycosidase with an enzyme nucleophile (hydrolysis mechanism as in Fig. 2A). Sfβgly is classified as a family 1 glycoside hydrolase (GH1) [53, 55] with an experimentally determined 3D structure [47]. Substrate specificity of the enzyme has been addressed in different studies [56, 57]. The effects of 51 mutants at 37 different positions, involved in three functional regions of the Sfβgly active site, on the hydrolysis of p‐nitrophenyl‐β‐d‐glucopyranoside have been exhaustively characterized [47]. This large amount of mutational data obtained under equivalent experimental conditions is used here as a benchmark to assess the quality of the predictions on hydrolytic efficiency using the new reactivity metric. Practically all mutants in the experimental dataset show at least some residual activity (see Table S3), which implies that they are still able to recognize the substrate. The ability of the enzyme to recognize the substrate is an essential requirement for catalytic activity. Therefore, to complement the interpretation of the predictions, ligand binding affinities were also measured.

Identification of spots in Sfβgly to modulate enzyme reactivity

Amino acid positions along the sequence of Sfβgly that influence catalytic efficiency were identified with the new reactivity metrics implemented in BindScan. The test substrate was p‐nitrophenyl‐β‐d‐glucopyranoside (Glc‐pNP), a chromogenic ligand mimetic of the natural substrate used for the experimental determination of enzymatic activities [47]. The Michaelis complex between Sfβgly and Glc‐pNP was predicted with standard modeling techniques (see Materials and methods) and directly used as initial coordinates for the simulation (Fig. 6). This geometry is a good representative of the catalytic reaction initiation: the substrate adopts a distorted conformation in which the aglycon (pNP leaving group) is located axially, the glycosidic oxygen is properly oriented towards the general acid/base residue (E187), and the nucleophile residue (E399) is within attacking distance of the anomeric carbon. There are 145 amino acid residues within 15 Å of any atom of the substrate. For these, a single‐site virtual saturation library of 2900 variants was built (including the wild‐type), structures were predicted, and the reactivity metrics were evaluated. We followed an extended version of the BindScan simulation protocol, introduced in [39], that combines homology modeling with MODELLER [58], electrostatic potential calculations with APBS [59], binding affinity measurements with AutoDock Vina [60], and heatmap representations of the different metrics. To account for protein flexibility, at least of the sidechains, BindScan generates 30 different replicates of the structure for each single mutant. This ensures that the metrics to be evaluated are averaged on an ensemble of representative structures (see Materials and methods for the detailed protocol and explanation of the plots).
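The size of the virtual library follows directly from the numbers above: 145 positions × 20 amino acids = 2900 variants, which suggests that the wild‐type residue is also modeled at each site. The Python sketch below enumerates such a library under that assumption; the function names and the placeholder residue selection are hypothetical and are not part of BindScan.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
REPLICAS_PER_VARIANT = 30  # replicate structures per variant, as stated in the text


def saturation_library(wild_type):
    """Enumerate every single-site variant as (position, wt_residue, new_residue).

    `wild_type` maps residue number -> one-letter wild-type code. The wild-type
    residue is kept at each site (assumption), giving 20 entries per position.
    """
    return [
        (pos, wt, aa)
        for pos, wt in sorted(wild_type.items())
        for aa in AMINO_ACIDS
    ]


def label(variant):
    """Format a variant in the usual notation, e.g. (187, 'E', 'A') -> 'E187A'."""
    pos, wt, aa = variant
    return f"{wt}{pos}{aa}"


# Placeholder selection: 145 hypothetical positions, all set to alanine.
selection = {pos: "A" for pos in range(1, 146)}
library = saturation_library(selection)
n_models = len(library) * REPLICAS_PER_VARIANT
```

With 30 replicates per variant, this bookkeeping implies 87 000 structural models for the full Sfβgly scan.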

Fig. 6.

Predicted structure of Sfβgly in complex with Glc‐pNP. Positions mutated during the BindScan simulation are shown in thick lines (all residues within 15 Å of Glc‐pNP). Protein backbone is represented as a cartoon in transparent gray. A magnified view shows the ligand Glc‐pNP and the catalytic residues in a catalytically competent orientation. Catalytic residues are labeled as E187 (acid/base) and E399 (nucleophile). Structure rendered with VMD [75].

The predicted effects on enzymatic reactivity of each Sfβgly mutant are shown in Fig. 7. At the top of the figure, a bar plot identifies the sequence positions whose electrostatic potential gradient deviates strongly from that of the wild‐type (dotted lines). For instance, positions N186 and E187 show large and intense red and blue bars, respectively, suggesting that practically any mutation at those positions will influence the hydrolytic activity of the enzyme. However, electrostatic potential gradient measurements for any mutation at the nearby position F185 do not present significant deviations from the reference wild‐type, suggesting no effect on catalytic efficiency when mutated. The particular set of mutants with a higher contribution to this gain or loss of reactivity can then easily be identified in the heatmap.

Fig. 7.

Heatmap representation of the BindScan mutational analysis on Sfβgly targeting Glc‐pNP using the electrostatic potential gradient metric. Mutants showing an increase or decrease in the electrostatic potential gradient are shown in red and blue spots, respectively. For each position, deviations from the reference gradient of the wild‐type are shown in a bar plot on top of the heatmap. Dotted lines in the bar plot represent the thresholds for significant deviations (±2.5 times standard deviation of wild‐type affinity). For the sake of clarity, only those sequence positions located within 10 Å of the substrate are shown.

From these plots, a list of hotspots (gain of function) and cold‐spots (loss of function) for protein engineering campaigns aimed at controlling enzymatic reactivity (or ligand recognition, see below) can be suggested. Two cold‐spots are clearly identified from this plot: E187 and E399, which indeed correspond to the acid/base and nucleophile catalytic residues. Any virtual mutation at these two sites strongly decreases the electrostatic potential gradient along the glycosidic bond that is to be cleaved at the active site. Indeed, mutations at these catalytic residues yield inactive enzymes. Besides the catalytic residues, other neighboring residues contribute to the electrostatic potential that stabilizes the charge development at the transition state. Positions W143, Y331, and E451 also appear as clear cold‐spots, and these positions have also been shown experimentally to significantly decrease enzymatic reactivity when mutated (Table S3) [47]. Other positions can be identified as either cold‐spots (W142 and W444) or hotspots (N186). There is no experimental information on the effect of mutations at these positions in Sfβgly, but according to our simulations, they should also affect the catalytic efficiency of the enzyme. Indeed, for a close ortholog of this enzyme in Thermus thermophilus, alanine substitution at position N163 (equivalent to N186 in Sfβgly) leads to strongly reduced hydrolytic activity [61], and H119 and W385 (equivalent to W142 and W444 in Sfβgly) have a strong impact on catalysis by stabilizing the transition state via a conserved hydrophobic platform [62].

Identification of spots in Sfβgly to modulate ligand recognition

Substrate binding affinities were also evaluated on the same pool of mutants from the generated virtual library. Two different binding metrics were considered: rigid ligand affinity, in which the ligand is constrained to the original orientation predicted for the wild‐type enzyme (Fig. 6), and flexible ligand affinity, in which the location of the ligand is optimized after mutation (see Materials and methods for full details on the calculations).

Regarding the rigid ligand affinity metric, only two clear hotspots can be derived from Fig. 8A (intense red spots in the heatmap and larger red bars beyond the threshold in the top bar plot): T332 and S369. Variants at these positions are predicted to increase binding affinity towards Glc‐pNP. The full list of spots is provided in Table S2. When mutated, these two positions would probably yield enzyme variants with improved catalytic efficiencies with respect to the test ligand in Sfβgly. Unfortunately, there are no experimental data available to confirm or disprove this hypothesis. In contrast, variants predicted to decrease the binding affinity towards Glc‐pNP (cold‐spots, intense blue spots in the heatmap and larger blue bars beyond the threshold in the top bar plot) are in agreement with experimentally determined mutagenesis data. Mutants at positions Q39, W143, E190, E451, W452, and F460 resulted in increased KM values for the hydrolytic reaction of Glc‐pNP (Table S3). Although a direct relationship between substrate binding affinity and KM cannot be drawn for a two‐step reaction mechanism (see Fig. 2), the lower calculated binding affinities of these mutants together with increased experimental KM values suggest here that these mutants mainly compromise the glycosylation step. Furthermore, both the nucleophile (E399) and acid/base residues (E187) are among the list of cold‐spots. Indeed, mutations at these positions yield inactive mutants.

Fig. 8.

Heatmap representations of the BindScan mutational analysis on Sfβgly targeting Glc‐pNP using the (A) rigid ligand binding affinity metric and (B) flexible ligand binding affinity. Mutants showing a gain or loss of affinity for the ligand are shown in red and blue spots, respectively. For each position, deviations from the reference binding affinity of the wild‐type are shown in a bar plot on top of each heatmap. Dotted lines in the bar plot represent the thresholds for significant deviations (±2.5 times standard deviation of wild‐type affinity). For the sake of clarity only those sequence positions located within 10 Å of the substrate are shown.

When ligand flexibility is considered, predicted changes in binding affinity are smoother because the ligand can accommodate the presence of new amino acid sidechains upon mutation. This makes the identification of spots with the flexible ligand affinity metric less evident (see Fig. 8B). Nonetheless, one additional hotspot can be identified: S247. Interestingly, the sensitivity of this position for increasing ligand recognition is confirmed experimentally. The S247A mutant yields a more efficient enzyme variant with decreased KM value for the hydrolytic reaction of Glc‐pNP in Sfβgly (Table S3). Our calculations predict indeed a gain in ligand affinity for most mutants at S247.

BindScan as a classifier between active and inactive spots in glycoside hydrolases

We have just shown that the hot/cold‐spots identified with either the electrostatic potential or the binding affinity score, following the BindScan simulation protocol, properly predict the effect on catalytic efficiency. Taken together, both metrics discriminate between (a) ‘sensible’ positions, predicted to influence catalysis when mutated, and (b) ‘neutral’ positions, predicted to have no effect. Predicted ‘sensible’ positions are those showing either a significant increase or decrease of the metric for any (or some) virtual mutant with respect to the wild‐type. The remaining positions are considered ‘neutral’ since all metric values of the mutants are similar to the wild‐type. In this scenario, we have explored the classification performance of both protein electrostatics and ligand affinity metrics by analyzing both sensitivity and specificity for the whole experimental dataset [47]. The true‐positive and false‐positive rates have been evaluated as indicated in the Materials and methods section (Table 1). The free parameter that discriminates between ‘sensible’ and ‘neutral’ positions is the threshold T: the number of wild‐type standard deviations (SD) by which the metric value of a mutant must deviate from the reference wild‐type value for the position to be classified as ‘sensible’. The influence of the threshold parameter T on the predictive performance is assessed by analyzing receiver operating characteristic (ROC) curves and area under the curve (AUC) values for different metrics and kinetic parameters (Fig. 9 and Fig. S2). The rigid ligand binding affinity metric is a good discriminator between sensible and neutral positions in terms of KM (AUC = 0.70, Fig. 9A). The classification performance decreases notably when using the electrostatic potential gradient to discriminate between sensible and neutral positions in terms of kcat alone (AUC = 0.61, Fig. S2). Interestingly, this metric performs much better at predicting positions sensitive to kcat/KM (AUC = 0.77, Fig. 9B). For all metrics, a threshold value T = 2.5 (meaning a value outside ±2.5 × SD of the wild‐type metric value) is adequate for an optimal balance between sensitivity and specificity (Fig. S2). In conclusion, any sequence position presenting a virtual mutant whose ligand binding affinity or electrostatic potential gradient deviates from the wild‐type reference value by more than 2.5 times the SD of the wild‐type is expected to have an impact on enzymatic activity upon mutation. This simple classification approach allows sequence positions to be picked with confidence from electrostatics and affinity scores evaluated on a full‐saturation library of mutants (such as the one generated with BindScan) for experimental testing, with the aim of modulating the kinetic parameters of an enzyme. It should be noted that ionizable amino acid sites tend to appear as hotspots or cold‐spots (e.g., Arg97 in Fig. 7), which may correspond to false‐positive or false‐negative predictions. Also, the effect of pKa shifts upon substitution of surrounding amino acids is not considered during the BindScan simulation in the current implementation.
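The classification rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the position labels, and all numerical values are hypothetical, and the actual evaluation procedure is the one described in the Materials and methods section.

```python
def is_sensible(mutant_values, wt_mean, wt_sd, threshold=2.5):
    """A position is 'sensible' if any mutant deviates by more than T x SD."""
    return any(abs(v - wt_mean) > threshold * wt_sd for v in mutant_values)


def recall_and_specificity(predicted, actual):
    """Return (TPR, TNR) from parallel lists of boolean labels."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    positives = sum(1 for a in actual if a)
    negatives = len(actual) - positives
    return tp / positives, tn / negatives


# Toy data (hypothetical): metric values of a few mutants at four positions.
wt_mean, wt_sd = 10.0, 1.0
positions = {
    "pos_A": [10.5, 13.0, 9.0],   # one mutant deviates by 3 SD
    "pos_B": [10.2, 9.5, 11.0],   # all within 1 SD
    "pos_C": [6.0, 7.0, 10.0],    # deviations of 4 and 3 SD
    "pos_D": [10.1, 12.0, 9.9],   # largest deviation is 2 SD
}
# Hypothetical experimental outcome: does mutation change the kinetic parameter?
experimental = {"pos_A": True, "pos_B": False, "pos_C": True, "pos_D": True}

predicted = [is_sensible(v, wt_mean, wt_sd) for v in positions.values()]
actual = [experimental[name] for name in positions]
tpr, tnr = recall_and_specificity(predicted, actual)
```

In this toy example, `pos_D` is experimentally sensitive but falls below T = 2.5, so it counts as a false negative and lowers the recall to 2/3 while the specificity stays at 1.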

Table 1.

Confusion matrix for the classification performance analysis. FN, false negative; FP, false positive; N, actual negative instances; P, actual positive instances; PN, predicted negative instances; PP, predicted positive instances; TN, true negative; TNR, true negative rate; TP, true positive; TPR, true‐positive rate.

Output classifier | Actual condition: “Sensible” to kinetic parameter (P) | Actual condition: “Neutral” to kinetic parameter (N)
Classified as “sensible” (PP) | TP | FP
Classified as “neutral” (PN) | FN | TN
 | TPR (recall) = TP/P | TNR (specificity) = TN/N
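As an illustration of how the entries of Table 1 translate into rates, the following Python sketch counts TP, FP, FN, and TN from two sets of labels and derives TPR and TNR. The function name and the example labels are hypothetical and are not taken from the benchmark dataset.

```python
def confusion_rates(actual, predicted):
    """Return the counts and rates laid out in Table 1.

    actual / predicted: dicts mapping position -> True ('sensible')
    or False ('neutral'). Both dicts must share the same keys.
    """
    tp = sum(1 for p in actual if actual[p] and predicted[p])
    fn = sum(1 for p in actual if actual[p] and not predicted[p])
    fp = sum(1 for p in actual if not actual[p] and predicted[p])
    tn = sum(1 for p in actual if not actual[p] and not predicted[p])
    positives, negatives = tp + fn, fp + tn
    return {
        "TP": tp, "FP": fp, "FN": fn, "TN": tn,
        # Rates are undefined when a class is empty.
        "TPR": tp / positives if positives else float("nan"),
        "TNR": tn / negatives if negatives else float("nan"),
    }

# Invented labels for five positions, for illustration only.
actual = {10: True, 20: True, 30: False, 40: False, 50: False}
predicted = {10: True, 20: False, 30: False, 40: True, 50: False}
result = confusion_rates(actual, predicted)
```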
Fig. 9.

Receiver operating characteristic curve for BindScan simulations for 20 different classification thresholds. (A) Predictions of the effect on K M based on rigid ligand affinity. (B) Predictions of the effect on k cat/K M based on electrostatic potential gradient. FPR, False‐positive rate; TPR, True‐positive rate.

Application case 2: Engineering the glycosidase BbLnbB into a transglycosidase

We applied the same protocol to Bifidobacterium bifidum lacto‐N‐biosidase (BbLnbB) as a test case to show how the affinity and reactivity metrics can assist in enhancing the transglycosylation activity of glycoside hydrolases. BbLnbB hydrolyzes lacto‐N‐tetraose into lactose and lacto‐N‐biose in the human intestinal tract. BbLnbB is a GH20 enzyme operating by substrate‐assisted catalysis, where the N‐acetamido group acts as a catalytic nucleophile, leading to an oxazoline intermediate (Fig. 2B) [63, 64]. This enzyme can catalyze the synthesis of lacto‐N‐tetraose by transglycosylation between an activated sugar donor (for example p‐nitrophenyl lacto‐N‐bioside, LNB‐pNP) and high concentrations of the sugar acceptor lactose. However, the transglycosylation yield of the wild‐type enzyme is very low (< 1% yield) [65], as commonly observed for natural glycosyl hydrolases, mainly due to the secondary hydrolysis of the formed product (Fig. 2). We had previously performed a mutational analysis of BbLnbB aimed at improving transglycosylation activity using LNB‐pNP as donor and lactose as acceptor, finding efficient variants with lacto‐N‐tetraose yields of up to 32% (mutant W394F) [63]. Likewise, BbLnbB mutants with compromised hydrolase activity also synthesized lacto‐N‐tetraose from LNB‐oxazoline donor and lactose acceptor with up to 30% yield [66].

Our rational approach to transglycosylation design is to use the reactivity and affinity scores to predict variants of the enzyme that reduce the hydrolytic activity (to disfavor secondary hydrolysis) and that, at the same time, enhance the recognition of the acceptor substrate (lactose in this case) (Fig. 10). For the first criterion, the electrostatic potential gradient metric has been measured along the axis defined by the scissile glycosidic bond of lacto‐N‐tetraose. For the second criterion, the rigid and flexible ligand binding affinity metrics, evaluated on the acceptor compound, have been tested using a reference structure of the ternary complex between BbLnbB, lacto‐N‐biose‐thiazoline (an inhibitor mimicking the oxazoline intermediate within the X‐ray structure) and lactose (acceptor) in a configuration compatible for the transglycosylation reaction (Fig. 11). The starting structure of the ternary complex has been modeled as described in Materials and methods. The selected conformation places the oxygen from the acceptor β‐lactose at close distance of the general base and anomeric carbon of LNB intermediate to favor proton abstraction and nucleophilic attack, respectively (see Fig. 2B). This model is compatible with the catalytic itinerary of this enzyme [64].

Fig. 10.

Rational strategy for improving transglycosylation efficiency in GHs by ligand affinity and protein electrostatic potential measurements: increase binding affinity for the acceptor compound and destabilize the transition state of the hydrolytic reaction. The transglycosylation reaction mechanism for BbLnbB (substrate‐assisted GH20 hexosaminidases) is shown starting from the enzyme·oxazoline intermediate to form the glycosidic bond with the acceptor, followed by secondary hydrolysis of the transglycosylation product (Fig. 2B). An equivalent rationale would apply to GHs with a catalytic nucleophile (Fig. 2A).

Fig. 11.

Initial structure of BbLnbB in complex with LNB‐thiazoline and lactose in a catalytically competent configuration. Positions mutated during the BindScan simulation are shown in thick lines (all residues within 10 Å of LNB and lactose). Protein backbone is represented as a cartoon in transparent gray. Magnified representation of ligands LNB‐thiazoline and lactose and catalytic residues in a catalytically competent orientation. Catalytic residues are labeled as D321 (nucleophile) and D320 (assisting residue). Structure rendered with vmd [75].

The mutational effects on enzymatic reactivity of BbLnbB are shown in Fig. 12A. In this case, three clear cold‐spots can be identified: Asp320, Glu321, and Trp394; and to a lesser extent, His263, Asp375, Tyr427, and Asp467. As compared to the wild‐type, any virtual mutation at these sites strongly decreases the electrostatic potential gradient along the glycosidic bond that is to be cleaved at the active site. Thus, according to the classification criteria defined above, these sequence positions should be considered as mutational spots to modulate the hydrolytic activity of BbLnbB, presumably decreasing it. Asp320 and Glu321 are the actual catalytic residues of the enzyme. Indeed, mutations at these catalytic residues yield practically inactive enzymes with residual hydrolytic activities of 3% and below 0.04% of the wild‐type, respectively, when mutated into alanine [63]. Interestingly, mutations to alanine at His263, Trp394, and Asp467 positions reduce the hydrolytic activity to 0.6%, 0.13%, and 1.4% of the wild‐type, respectively, whereas phenylalanine substitution at Tyr427 retains 64% of the activity [63]. There is no obvious a priori reason to select these three positions for experimental mutagenesis as important for catalysis, since they are not directly involved in the catalytic mechanism of BbLnbB. Nonetheless, they have been correctly predicted as important for catalysis by means of the protein electrostatic potential metric introduced here. The effect on hydrolytic activity of mutations at Asp375 is not yet known, but according to our simulations, they should also affect the catalytic efficiency of BbLnbB.

Fig. 12.

Heatmap representation of the BindScan mutational analysis on BbLnbB using the (A) electrostatic potential gradient metric, and (B) flexible ligand binding affinity metric targeting lactose. Mutants showing an increase or decrease in the metric are shown in red and blue spots respectively. For each position, deviations from the reference value of the wild‐type are shown in a bar plot on top of the heatmap. Dotted lines in the bar plot represent the thresholds for significant deviations (±2.5 times standard deviation of wild‐type affinity). For the sake of clarity only those sequence positions located within 10 Å of the substrate are shown.

The sensitivity of BbLnbB mutants to lactose binding is shown in Fig. 12B. One sequence position clearly emerges from the rest, corresponding to Trp394. All mutants at this position exhibit lower values of average binding free energy compared to the wild‐type (below the threshold). Thus, Trp394 can be classified as a spot position that modulates the binding of BbLnbB to lactose, presumably increasing its affinity. Other spot positions that presumably increase the binding of the acceptor ligand are Ile324, Tyr395, Ser421, Ala424, and Tyr427, although to a lower extent compared to Trp394.

Taken together, Trp394 is the only spot identified by both affinity and reactivity metrics (Fig. 12). Mutants at this position may have a double and complementary effect: potentially increasing the affinity for the lactose acceptor and reducing hydrolytic activity. They would favor the binding of the acceptor molecule (lactose) and at the same time hamper the secondary hydrolysis of the product (lacto‐N‐tetraose). Indeed, we reported that the Trp394Phe mutant is the most efficient BbLnbB variant for transglycosylation with a maximum yield of lacto‐N‐tetraose formation of 32% [63]. Also, the Tyr427Phe substitution achieves a modest transglycosylation yield (2%), but as indicated, this mutant partially retains the secondary hydrolytic activity on the product. This supports our approach to the computational design of transglycosylation: reducing hydrolytic activity while increasing the affinity for the acceptor ligand (Fig. 10). In contrast, mutants at Asp467 and Asp320 do not improve transglycosylation efficiency, showing maximum yields of lacto‐N‐tetraose formation below 3.3% [63]. These spots were correctly predicted to decrease hydrolytic activity but not to improve the binding of lactose; this could explain the low transglycosylation yields. In summary, all spots identified by both reactivity and affinity metrics for the engineering of BbLnbB have an impact on enzymatic catalysis, either for hydrolysis or transglycosylation. Ligand binding sensitivity spots and reactive propensity spots must be interpreted together to guide the rational selection of sequence positions for the engineering of glycosyl hydrolases into efficient transglycosylases.

Discussion

We have presented here an easy‐to‐implement score that appropriately discriminates positions along a protein sequence that have an impact on catalysis upon mutation. The score aims to account for intrinsic enzymatic activity, and it is based on implicit solvent electrostatic potential measurements of the protein alone. The reactivity metric has been tested here for glycoside hydrolases. This family of enzymes is highly specific for their cognate substrate, and catalytic efficiency is attained in part by a proper control of the substrate conformational itinerary and charge distribution along the catalytic reaction. We have shown that glycoside hydrolases favor the formation of the oxocarbenium ion‐like transition state by exerting a complementary electrostatic potential gradient in the direction of the glycosidic bond that is to be cleaved. The proposed reactivity metric evaluates such gradient in electrostatic potential, which, together with ligand binding affinity scores, both capture the mechanistic features of these enzymes.

Our unified protocol implementation in BindScan allows identifying positions along a protein structure that are simultaneously sensible to key metrics in enzymatic catalysis: ligand recognition and electrostatic stabilization of the transition state. Considering both metrics is crucial for a proper understanding of the mutational effects on catalytic efficiency. For instance, if a favorable impact on the electrostatic potential gradient is predicted for a given mutation, this must be accompanied by a proper binding of the ligand in the active site. In the same way, if a mutation strongly disfavors the binding of the ligand, or even worse, prevents it from binding to the active site in a catalytically relevant orientation, the value of the electrostatic potential gradient, or any other property, may be irrelevant. The robustness of the protocol lies in exhaustively casting all natural variants at each position, rather than focusing on single amino‐acid substitutions such as in alanine‐scanning‐based methodologies. This allows the emergence of hidden variants not detectable by visual inspection of protein structures or computational alanine‐scanning, such as the insertion of bulkier amino acids to substitute glycine positions or unforeseen glutamate to arginine substitutions. This is similar to other computer‐guided enzyme engineering protocols, but complementing those, we believe the collection of structures modeled for each individual mutant provides a representative conformational ensemble on which to average the evaluated metrics in a robust manner, rather than a single point calculation at the end of an equilibrated structure. The large mutational data generated is presented in the form of heat maps and associated sensitivity bar plots. These representations allow intuitively identifying spots of interest which can guide experimental design. Individual mutants can be suggested for site‐directed mutagenesis in specific protein engineering experiments, and whole sequence positions (spots) for iterative saturation mutagenesis in directed evolution campaigns.

Concluding remarks

Engineering both enzyme reactivity and substrate recognition in glycoside hydrolases is of current interest for tuning substrate specificity and catalytic efficiency with broad applications in the biomass processing industry, biofuel production, food industry, and for the synthesis of added‐value compounds. The lack of a clear protocol to engineer a glycosyl hydrolase into an efficient transglycosylase hampers the emergence of new biocatalysts for the efficient synthesis of oligosaccharides and glycoconjugates of biotechnological interest. The suggested metrics can easily be implemented in computational enzyme engineering protocols based on molecular modeling tools (such as CASCO, FuncLib, AsiteDesign, BindScan, …). This will expand the range of computational methods available to assist the engineering campaigns of not only carbohydrate‐active enzymes, but also other enzyme classes with similar mechanistic features.

Materials and methods

Initial structures

The initial protein structures were taken from PDB codes 5CG0 (Sfβgly) and 4JAW (BbLnbB). The corresponding Michaelis complex structures with ligands p‐nitrophenyl‐β‐d‐glucopyranoside in 1S3 conformation (for Sfβgly) and lacto‐N‐biose‐thiazoline and lactose (for BbLnbB) were computed with autodock vina [60] as detailed below. The structure of BbLnbB in complex with the natural substrate lacto‐N‐tetraose derived from QM/MM MD simulations was kindly provided by Carme Rovira [64]. Substrate geometries in both systems correspond to reactant states in distorted sugar ring conformations, which have been described to be stable and preactivated conformations for catalysis in GHs [46].

The protein structure of Sfβgly was extracted from PDB: 5CG0, chain A. All water molecules and ligands were removed. autodock 4 [67] was used for a preliminary clustering study performing 200 dockings with p‐nitrophenyl‐β‐d‐glucopyranoside (Glc‐pNP). Clusters showed that using a skewed glucopyranose ring conformation ensures a catalytically competent binding mode in the active site of the enzyme. The skewed 1S3 conformation was built by joining two structures: the aglycon moiety was extracted from the chromogenic substrate of an exo‐β‐glucanase (from PDB: 4M81) and the skewed β‐d‐glucopyranoside was extracted from the natural substrate of a β‐glucosidase (PDB: 1E56). autodock vina [60] was then used to generate the final enzyme‐substrate complex. A grid search space of 22.5 × 22.5 × 22.5 Å³ centered at the enzyme catalytic site was defined. All exocyclic chemical bonds of the ligand, except those involving H atoms or double bonds, were defined as flexible. Amino acid side chains were kept fixed during the docking evaluation. The exhaustiveness search criterion was set to 24, which provides an intensive exploration of the search space. Twenty different ligand binding poses were requested, and the lowest binding energy pose with a ligand orientation compatible with catalysis was selected. The resulting structure of Sfβgly in complex with Glc‐pNP was used as the template structure for the BindScan calculations of Spodoptera frugiperda β‐glucosidase.

The structure of BbLnbB in complex with lacto‐N‐biose‐thiazoline (LNB‐thiazoline, a mimic of the oxazoline intermediate) was extracted from PDB: 4JAW. All water molecules and ions were removed. The β‐lactose molecule was extracted from PDB: 6FOF. autodock vina was used to generate the ternary complex between BbLnbB‐LNB‐thiazoline and β‐lactose. A grid search space of 10 × 13 × 10 Å³ centered at the enzyme catalytic site was defined. All exocyclic chemical bonds of the ligand, except those involving H atoms or double bonds, were defined as flexible. Amino acid side chains were kept fixed during the docking evaluation. The exhaustiveness search criterion was set to 24, which provides an intensive exploration of the search space. Twenty different ligand binding poses were requested, and the lowest binding energy pose with a ligand orientation compatible with catalysis was selected. The resulting structure of BbLnbB in complex with LNB‐thiazoline and β‐lactose was used as the template structure for the BindScan calculations of Bifidobacterium bifidum lacto‐N‐biosidase.

Mutant library

The virtual mutant library was restricted to protein positions located up to 15 Å (Sfβgly) or 10 Å (BbLnbB) from the ligand binding pocket. This threshold is assumed to be sufficient to capture both direct protein‐ligand interactions and distant electrostatic effects. The mutant library is composed of N = 145 positions for Sfβgly and N = 41 positions for BbLnbB. Each sequence position was individually mutated to each of the 20 natural amino acids (including the wild‐type). The virtual library containing the 20 × N variants of each native sequence was generated with basic bash scripting and stored in a fasta format file.
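The library generation step can be sketched as follows. The original library was produced with bash scripting; this Python version is only an equivalent illustration, and the function names, the record naming scheme (wild‐type residue, position, new residue), and the four‐residue toy sequence are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 natural amino acids

def saturation_library(sequence, positions):
    """Yield (name, sequence) for every single substitution.

    positions are 1-based indices into `sequence`. The wild-type residue
    is included at each position, so 20 records are produced per position.
    """
    for pos in positions:
        wild_type = sequence[pos - 1]
        for aa in AMINO_ACIDS:
            name = f"{wild_type}{pos}{aa}"  # hypothetical naming scheme
            yield name, sequence[:pos - 1] + aa + sequence[pos:]

def to_fasta(records):
    """Format (name, sequence) pairs as a FASTA string."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)

# Toy sequence for illustration only; real runs would use the enzyme sequence.
records = list(saturation_library("MKTW", [2, 4]))
fasta = to_fasta(records)
```

For N selected positions this produces 20 × N records, so the toy example with two positions gives 40 sequences, two of which are identical to the wild‐type.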

Ensemble of protein‐ligand structures

The three‐dimensional structure of each single mutant enzyme variant was reconstructed from the provided initial structure of the wild‐type protein‐ligand complex by means of homology modeling with modeller [58]. Models are first optimized with the variable target function method and then assessed with the Discrete Optimized Protein Energy (assess.DOPE) and GA341 (assess.GA341) methods. The original ligand geometry is preserved in the model using intra‐ligand, inter‐ligand, and ligand‐protein distance constraints as implemented in the modeller AutoModel.nonstd_restraints method. To account for protein flexibility, at least that of the side chains, the structure of each single mutant was modeled 30 times. This ensures that the metrics to be evaluated are averaged on an ensemble of representative structures. For the application cases presented here, this was enough to satisfy convergence on the evaluated metrics (see Fig. S1). Structures of the full mutant library of enzyme variants were stored in independent PDB files.

Enzymatic reactivity metrics: electrostatic potential gradient

A metric for enzymatic reactivity is defined here as the electrostatic potential gradient of the protein measured along the ligand axes with the highest charge separation at the transition state of the catalytic reaction. The electrostatic potential of the enzyme structure is evaluated in implicit solvent with apbs [59]. From each PDB file of the modeled mutants, protein coordinates are extracted, and their atoms parametrized with PDB2PQR following amber atom typing [68]. The electrostatic potential is evaluated using automatically configured finite difference Poisson‐Boltzmann calculations (mg‐auto). The electrostatic potential is evaluated on coarse‐grid and fine‐grid meshes of 90 × 90 × 90 Å³ for Sfβgly and 30 × 30 × 30 Å³ for BbLnbB, with 97 × 97 × 97 points centered at the enzyme active site (location of the C1 atom of the ligand). Resulting electrostatic potentials are stored in separate volumetric data files in DX format for each modeled structure.

Two atom groups of the ligand are then defined to measure the electrostatic potential gradient. The first group of atoms (G+) includes a representative selection of adjacent atoms in the region where a partial positive charge is known to be developed at the transition state of the enzymatic reaction. The second group of atoms (G−) includes those atoms in the region where a partial negative charge is developed. For glycoside hydrolases, these atom groups can generally be defined as: the anomeric carbon together with the adjacent endocyclic oxygen and carbon atoms (G+, C2‐C1‐O5), and the glycosidic oxygen together with the adjacent carbon atom of the aglycon and two additional carbon atoms adjacent to it (G−, O‐C4'‐C3'‐C5') (see Fig. 5A). Similar atom group selections can be defined for other enzymatic activities. The electrostatic potential values at the location of each atom in the group are extracted from the computed volumetric data with the apbs multivalue tool. Please note, the ligand is not present in the electrostatic potential calculation, but its cartesian coordinates are known as they are available from each modeled structure. These values are then averaged per group of atoms, and the electrostatic potential gradient is evaluated as $E_{pot}^{G+} - E_{pot}^{G-}$. Since the distance between the two atom groups is constant for all modeled structures, it is not included in the equation for the evaluation of the gradient. The final reactivity score assigned to a variant in the mutant library corresponds to the average of this subtraction for all the 30 model repetitions, and it is expressed in kT/ec.
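The arithmetic of the reactivity score can be sketched in Python as shown below. The sketch assumes that the potential values at the atom positions have already been extracted (for instance with the apbs multivalue tool); the function names and the toy values are hypothetical and only illustrate the averaging and subtraction steps.

```python
from statistics import mean

def potential_gradient(potentials_plus, potentials_minus):
    """Difference between the mean potential at G+ and at G- atoms.

    The distance between groups is omitted because it is constant
    across all modeled structures.
    """
    return mean(potentials_plus) - mean(potentials_minus)

def reactivity_score(replicates):
    """Average the gradient over all modeled replicates of one variant.

    replicates: list of (G+ potentials, G- potentials) pairs,
    one pair per modeled structure, with values in kT/ec.
    """
    return mean([potential_gradient(gp, gm) for gp, gm in replicates])

# Two toy replicates (three G+ atoms, four G- atoms); invented values.
replicates = [
    ([1.0, 2.0, 3.0], [-1.0, -2.0, -3.0, -2.0]),
    ([2.0, 2.0, 2.0], [0.0, 0.0, 0.0, 0.0]),
]
score = reactivity_score(replicates)
```

In the protocol described here the list would contain 30 replicates per variant rather than the two shown.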

Ligand recognition metrics (I): rigid ligand affinity

Protein‐ligand binding affinities were measured with autodock vina [60] and are expressed in kcal·mol−1. From each PDB file of the modeled mutants, both protein and ligand coordinates were extracted, and their atoms parametrized following autodock 4.2 atom typing with autodocktools [67]. The rigid ligand affinity metric was defined as the autodock vina scoring function directly evaluated on the coordinates of the modeled mutant, without optimization of the ligand position. Thus, this metric evaluates the affinity of the ligand in the exact original orientation provided in the starting structure.

Ligand recognition metrics (II): relaxed ligand affinity

Binding affinities were also measured after allowing the ligand to re‐accommodate in the active site by means of virtual docking with autodock vina. A grid search space of 22.5 × 22.5 × 22.5 Å³ for Sfβgly and 10 × 13 × 10 Å³ for BbLnbB was defined. All simulation boxes were centered at the corresponding substrate binding pocket. All exocyclic chemical bonds of the ligand, except those involving H atoms or double bonds, were defined as flexible. Amino acid side chains were kept fixed during the docking evaluation. The exhaustiveness search criterion was set to 8, which provides enough exploration of the search space while keeping the simulation time short. Docking calculations were repeated for each of the protein models (30 in the test cases presented here), requesting 20 different ligand binding poses per model. This renders an ensemble of 600 predicted protein‐ligand complexes per mutant, which accounts for both receptor and ligand flexibility. From this collection of affinity data, two metrics were defined: (a) ‘highest affinity’ metric, which corresponds to the lowest binding free energy score from the ensemble; and (b) ‘average affinity’ metric, which corresponds to the average of computed binding free energy scores for those docking poses that preserve the catalytically competent geometry of the enzyme‐substrate complex. The latter was achieved by filtering docking poses to those ligand orientations geometrically close to the ligand orientation provided as starting structure (chosen as catalytically competent in the Initial structures section, see above). The filtering is performed by measuring the root mean square deviation (RMSD) of the atoms of the ligand with respect to the initial orientation in the starting structure. Only those predicted ligand geometries with an RMSD value below 2.5 Å were considered. Additional ligand recognition metrics that can be derived from these data are described below.
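The two relaxed ligand metrics can be illustrated with the following Python sketch. It assumes that docking poses and the reference ligand share the same coordinate frame and atom ordering, so the RMSD is computed without superposition; this is an assumption of the sketch, as are the function names and the two‐atom toy ligand.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equally ordered lists of (x, y, z) coordinates."""
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))

def relaxed_affinity_metrics(poses, reference, cutoff=2.5):
    """Return ('highest affinity', 'average affinity') for one mutant.

    poses: list of (score in kcal/mol, ligand coordinates) over all models.
    reference: ligand coordinates in the starting structure.
    """
    highest = min(score for score, _ in poses)  # lowest energy = best score
    kept = [s for s, coords in poses if rmsd(coords, reference) < cutoff]
    average = sum(kept) / len(kept) if kept else None  # None if no pose passes
    return highest, average

# Toy two-atom ligand and three poses; invented values.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
poses = [
    (-7.5, [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]),  # identical to the reference
    (-8.0, [(5.0, 0.0, 0.0), (6.0, 0.0, 0.0)]),  # shifted by 5 Å, filtered out
    (-6.5, [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]),  # shifted by 1 Å, kept
]
highest, average = relaxed_affinity_metrics(poses, reference)
```

In the toy example the best‐scoring pose lies far from the reference orientation, so it defines the ‘highest affinity’ value but does not contribute to the ‘average affinity’.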

Classification performance analysis

The classification performance of the different metrics to identify positions sensible to catalytic efficiency was assessed by comparing the outcome of the predictions with experimental data. The benchmark dataset consists of kinetic parameters, k cat, K M, and k cat/K M, for 51 mutants from 37 different positions of Sfβgly evaluated against p‐nitrophenyl‐β‐d‐glucopyranoside under uniform conditions [47] (see Table S3). Enzyme sequence positions for which at least one mutant has an impact on experimental catalytic performance (either activation or deactivation) were defined as ‘sensible’ positions. These sensible positions were defined as those with at least a 2‐fold increase or decrease in terms of K M, and/or those with at least a 1.5‐fold increase or 0.25‐fold decrease in terms of k cat or k cat/K M with respect to the wild‐type. The rest of the sequence positions were defined as ‘neutral’ positions. In terms of the different computed metrics (affinities and electrostatic potential gradient), ‘sensible’ positions were defined as those sequence positions for which at least one virtual mutant shows a significant increase or decrease in the property value with respect to the reference wild‐type value. The rest of the positions were considered ‘neutral’. The threshold value to consider an increase or decrease in the property to be significant was defined as T times the standard deviation of the property for all the measurements on wild‐type versions of the enzyme in the library. The free parameter T was optimized by analyzing receiver operating characteristic curves (ROC) and area under the curve values (AUC) [69] for different metrics and kinetic parameters according to the confusion matrix depicted in Table 1. 20 different T threshold values were tested ranging from 1 to 5 in steps of 0.25.
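The threshold sweep behind the ROC analysis can be sketched in Python as follows. The sketch takes, for each position, the largest deviation of any virtual mutant from the wild‐type reference expressed in SD units, and the experimental label of that position. The function names, the trapezoidal AUC with added (0, 0) and (1, 1) end points, and the four toy positions are assumptions of this illustration, not details taken from the original analysis.

```python
def roc_points(max_deviation, actual, thresholds):
    """Return one (FPR, TPR) point per threshold T.

    max_deviation: position -> largest |metric - reference| / SD
                   over all virtual mutants at that position.
    actual: position -> True if experimentally 'sensible'.
    """
    positives = sum(1 for p in actual if actual[p])
    negatives = len(actual) - positives
    points = []
    for T in thresholds:
        tp = sum(1 for p in actual if actual[p] and max_deviation[p] > T)
        fp = sum(1 for p in actual if not actual[p] and max_deviation[p] > T)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Trapezoidal area under the ROC curve, with end points added."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))

# Toy data for four positions; invented values.
max_deviation = {1: 4.0, 2: 1.2, 3: 3.0, 4: 0.5}
actual = {1: True, 2: True, 3: False, 4: False}
thresholds = [1 + 0.25 * i for i in range(17)]  # T from 1 to 5, step 0.25
area = auc(roc_points(max_deviation, actual, thresholds))
```

Each threshold yields one point of the curve, and the value of T closest to the upper left corner gives the best balance between sensitivity and specificity.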

Updated BindScan protocol

The BindScan protocol [39] is built on three main sequential steps (Fig. 1): (a) definition of the mutant library, (b) generation of the ensemble of protein‐ligand structures, (c) evaluation of the metrics. The only initial requirement for the execution of the protocol is the availability of a reliable three‐dimensional structure of the protein‐ligand complex and a list of amino acid positions to be cast. The protocol can be implemented with different software, tools, and programming languages. Our implementation is mainly based on the BASH scripting language to automate the execution of the following standard molecular modeling tools: modeller [58], autodock4 [67] and autodock vina [60], PDB2PQR, and apbs [59].

The initial geometry of the protein‐ligand complex must be reminiscent of the catalytic mechanism of the enzyme. It can either correspond to the enzyme in complex with the substrate, the final products, or even the transition state. When available, structures from the Protein Data Bank can directly be used. In other cases, previous modeling may be required, which might include prediction of the protein structure itself or prediction of the ligand binding mode. Current machine learning‐derived techniques [70, 71] may assist in this previous task.

In the first step of the protocol, a generator of mutant enzyme sequences is required to render a full‐saturation mutagenesis library at each individual position along the protein sequence. Reduced library sets can be considered for computational efficiency purposes, or cooperative effects included by simultaneous mutation at more than one sequence position. This mutant generator can be implemented in scripting languages such as Python or BASH. We have chosen BASH for its simplicity for this task and easy transferability.

In the second step of the protocol, an automatic 3D structure modeling pipeline needs to be implemented to generate an ensemble of protein structures for each single mutant in the library. This can be implemented with standard molecular modeling tools, such as modeller, our choice [58], rosetta [72], alphafold [70], or molecular dynamics. This step renders different replicates of protein structures for each individual mutant. These replicates are aimed at covering different conformational states of the protein sidechains. We have tested different numbers of structure replicates to warrant a statistical convergence in the results (see Fig. S1).

In the last module of the protocol, different automatic pipelines need to be implemented to evaluate different molecular properties of interest, such as ligand recognition and enzymatic reactivity. These properties are evaluated on each structure of the ensemble coming from the second module of BindScan and are averaged into a metric score for each mutant. Computational details on the affinity metrics (rigid and flexible ligand binding affinities) and reactivity metrics (electrostatic potential gradient) are given in the main Materials and methods section.

Ligand recognition metrics are an estimation of the binding free energy between the protein variants and the target ligand by means of docking scoring functions. Some popular scoring functions are: autodock vina, our choice [60], autodock 4 [67], gold [73] or glide [74] among others. The protein‐ligand interaction energy can be evaluated either at the exact location of the ligand in the protein‐ligand complex provided as starting structure, or after optimizing the ligand position by virtual docking on each modeled structure replicate. For the ‘rigid ligand’ metric, the ligand is kept fixed in the original orientation as provided by the initial structure of the enzyme‐substrate complex. This is an underestimation of the actual binding free energy, as the ligand may need to accommodate the presence of new amino acid side chains, but it is very fast to compute. This simple procedure aims to guarantee that the interaction energy of the ligand with the enzyme is measured on a catalytically competent geometry of the enzyme‐substrate complex provided as starting structure. Thus, this metric is very sensitive to the original structure, which should be derived from experimental data or previously validated by other computational methods. If ligand flexibility is to be considered, ‘relaxed ligand’ metrics described in the main Materials and methods section can be used: ‘highest affinity’ and ‘average affinity’.

Additional ligand recognition metrics, not used in this work, can easily be obtained for each mutant by analyzing the data collected in the ensemble of predicted protein‐ligand complexes. For example, the RMSD of the ligand in its highest affinity binding mode can be measured with respect to the initial geometry of the protein‐ligand complex. This metric can be used to check whether the most stable docking predictions retain the catalytically competent geometry assumed in the initial protein‐ligand complex. Alternatively, the binding free energy of the ligand in the geometry closest to the reference initial orientation (lowest RMSD value) can be retrieved from the computed data. This metric (‘closest’ affinity) is convenient to evaluate the actual stability of the ligand in an orientation as similar as possible to the initial structure.
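Both additional metrics reduce to a selection over the docking ensemble, as the following Python sketch shows. It assumes that each pose is already annotated with its docking score and its RMSD to the initial ligand geometry; the function name, the dictionary keys, and the toy values are hypothetical.

```python
def additional_metrics(poses):
    """Derive two extra ligand recognition metrics for one mutant.

    poses: list of (score in kcal/mol, RMSD in Å to the initial geometry).
    """
    best = min(poses, key=lambda pose: pose[0])     # lowest energy pose
    closest = min(poses, key=lambda pose: pose[1])  # lowest RMSD pose
    return {
        "rmsd_of_highest_affinity": best[1],
        "closest_affinity": closest[0],
    }

# Toy ensemble of three poses; invented values.
poses = [(-8.0, 5.0), (-7.5, 0.4), (-6.5, 1.0)]
metrics = additional_metrics(poses)
```

Here the most stable pose sits 5 Å away from the initial geometry, which would flag this toy mutant as not retaining the assumed catalytically competent binding mode.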

Description of the BindScan plots

At the end of the protocol, all generated information is gathered and graphically represented by means of heatmaps and bar plots. Heatmaps provide a complete view of all the tested mutants and their BindScan metric values (see for instance Fig. 12 in the main Results section). The full library of mutants is represented in two dimensions: protein sequence positions on the horizontal axis and amino acid substitutions on the vertical axis. For each amino acid substitution at each sequence position, the value of the metric is represented in a three‐color scale ranging from red (gain of value) to blue (loss of value) passing through white, which indicates no change in the BindScan metric value with respect to wild‐type. This representation allows easily identifying individual mutants strongly affecting the property of interest. Sensitivity bar plots are drawn on top of the heatmap to allow inspecting sequence positions sensible to the measured property, regardless of the individual mutation. The average value of the BindScan metric for all the wild‐type variants in the library is evaluated and set to the 0 level in the plot. The standard deviation is also evaluated and the levels corresponding to ±2.5 times the standard deviation are represented with horizontal dotted lines. The full library of mutants is represented in one single dimension: for each amino acid position (horizontal axis), translucent bars for each metric value of all 20 mutants are grouped together. Red bars indicate a gain of value in the computed BindScan metric with respect to the wild‐type, whereas blue bars indicate a loss of value. Since the bars are translucent, the more intense the color at a given position, the more prone any mutation at that amino acid position is to affect the evaluated metric. The effect of individual mutants can be checked in the heatmaps.

Conflict of interest

The authors declare no conflict of interest.

Author contributions

AV: computational investigation, data analysis, and writing the first draft of the manuscript; XB and AP: conceptualization, project design, data analysis, and writing/reviewing the manuscript; AP: funding acquisition. All authors have read and agreed to the published version of the manuscript.

Peer review

The peer review history for this article is available at https://www.webofscience.com/api/gateway/wos/peer-review/10.1111/febs.70121.

Supporting information

Fig. S1. Running average measurements of (A) rigid ligand binding affinities and (B) electrostatic potential gradients on the ensemble of structure models generated for each BbLnbB mutant at position W394. Ensembles containing from 1 to 30 replicates.

Fig. S2. BindScan specificity and sensitivity classification performance as a function of the threshold parameter T (multiple of standard deviations from the wild‐type).

Table S1. List of representative glycoside hydrolases of each clan selected from the CAZy database.

Table S2. List of hot‐spots and cold‐spots identified by BindScan for Spodoptera frugiperda β‐glucosidase, considering all evaluated metrics.

Table S3. Kinetic parameters of Sfβgly hydrolysis on Glc‐pNP and BindScan scores.

FEBS-292-4211-s001.pdf (617.5KB, pdf)

Acknowledgements

This work was supported by grants GLYCODESIGN (PID2019‐104350RB‐I00) and GLYCOENGIN (PID2022‐138252OB‐I00) from MICINN, Spain (to AP), and the Ramon Llull University/Obra Social la Caixa Grant BINDSCAN 2.0'2020 (to XB). AV acknowledges a predoctoral fellowship from IQS. The authors acknowledge BSc student Pablo Sánchez‐Izquierdo for the calculations on Sfβgly.

Contributor Information

Antoni Planas, Email: antoni.planas@iqs.url.edu.

Xevi Biarnés, Email: xevi.biarnes@iqs.url.edu.

Data availability statement

Data available within this article and/or the Supporting Information. Additional data is available on request from the authors.

References

  • 1. Nam H, Lewis NE, Lerman JA, Lee DH, Chang RL, Kim D & Palsson BO (2012) Network context and selection in the evolution to enzyme specificity. Science 337, 1101–1104. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2. Peracchi A (2018) The limits of enzyme specificity and the evolution of metabolism. Trends Biochem Sci 43, 984–996. [DOI] [PubMed] [Google Scholar]
  • 3. Wu S, Snajdrova R, Moore JC, Baldenius K & Bornscheuer UT (2021) Biocatalysis: enzymatic synthesis for industrial applications. Angew Chem Int Ed 60, 88–119. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4. Bell EL, Finnigan W, France SP, Green AP, Hayes MA, Hepworth LJ, Lovelock SL, Niikura H, Osuna S, Romero E et al. (2021) Biocatalysis. Nat Rev Methods Primers 1, 1–21. [Google Scholar]
  • 5. Hammer SC, Knight AM & Arnold FH (2017) Design and evolution of enzymes for non‐natural chemistry. Curr Opin Green Sustain Chem 7, 23–30. [Google Scholar]
  • 6. Voigt CA, Mayo SL, Arnold FH & Wang ZG (2001) Computational method to reduce the search space for directed protein evolution. Proc Natl Acad Sci U S A 98, 3778–3783. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7. Chen K & Arnold FH (2020) Engineering new catalytic activities in enzymes. Nat Catal 3, 203–213. [Google Scholar]
  • 8. Arnold FH (1998) When blind is better: protein design by evolution. Nat Biotechnol 16, 617–618. [DOI] [PubMed] [Google Scholar]
  • 9. Arnold FH & Georgiou G (2003) Directed enzyme evolution: screening and selection methods. Methods Mol Biol 13, 2836–2837. [Google Scholar]
  • 10. Eijsink VGH, Gåseidnes S, Borchert TV & van den Burg B (2005) Directed evolution of enzyme stability. Biomol Eng 22, 21–30. [DOI] [PubMed] [Google Scholar]
  • 11. Reetz MT (2004) Controlling the enantioselectivity of enzymes by directed evolution: practical and theoretical ramifications. Proc Natl Acad Sci U S A 101, 5716–5722. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12. Alcalde M (2017) Directed Enzyme Evolution: Advances and Applications. Springer International Publishing, New York, NY. [Google Scholar]
  • 13. Gargiulo S & Soumillion P (2021) Directed evolution for enzyme development in biocatalysis. Curr Opin Chem Biol 61, 107–113. [DOI] [PubMed] [Google Scholar]
  • 14. Reetz MT, Prasad S, Carballeira JD, Gumulya Y & Bocola M (2010) Iterative saturation mutagenesis accelerates laboratory evolution of enzyme stereoselectivity: rigorous comparison with traditional methods. J Am Chem Soc 132, 9144–9152. [DOI] [PubMed] [Google Scholar]
  • 15. Acevedo‐Rocha CG, Hoebenreich S & Reetz MT (2014) Iterative saturation mutagenesis: a powerful approach to engineer proteins by systematically simulating Darwinian evolution. Methods Mol Biol 1179, 103–128. [DOI] [PubMed] [Google Scholar]
  • 16. Vega A, Planas A & Biarnés X (2025) A practical guide to computational tools for engineering biocatalytic properties. Int J Mol Sci 26, 980. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17. Goldenzweig A, Goldsmith M, Hill SE, Gertman O, Laurino P, Ashani Y, Dym O, Unger T, Albeck S, Prilusky J et al. (2016) Automated structure‐ and sequence‐based Design of Proteins for high bacterial expression and stability. Mol Cell 63, 337–346. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18. Reetz MT (2009) Directed evolution of enantioselective enzymes: an unconventional approach to asymmetric catalysis in organic chemistry. J Org Chem 74, 5767–5778. [DOI] [PubMed] [Google Scholar]
  • 19. Arabnejad H, Bombino E, Colpa DI, Jekel PA, Trajkovic M, Wijma HJ & Janssen DB (2020) Computational Design of Enantiocomplementary Epoxide Hydrolases for asymmetric synthesis of aliphatic and aromatic diols. Chembiochem 21, 1893–1904. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20. Wijma HJ, Marrink SJ & Janssen DB (2014) Computationally efficient and accurate enantioselectivity modeling by clusters of molecular dynamics simulations. J Chem Inf Model 54, 2079–2092. [DOI] [PubMed] [Google Scholar]
  • 21. Wijma HJ, Floor RJ, Bjelic S, Marrink SJ, Baker D & Janssen DB (2015) Enantioselective enzymes by computational design and in silico screening. Angew Chem Int Ed Engl 127, 3797–3801. [DOI] [PubMed] [Google Scholar]
  • 22. Babkova P, Sebestova E, Brezovsky J, Chaloupkova R & Damborsky J (2017) Ancestral haloalkane dehalogenases show robustness and unique substrate specificity. Chembiochem 18, 1448–1456. [DOI] [PubMed] [Google Scholar]
  • 23. Ding X, Tang XL, Zheng RC & Zheng YG (2019) Identification and engineering of the key residues at the crevice‐like binding site of lipases responsible for activity and substrate specificity. Biotechnol Lett 41, 137–146. [DOI] [PubMed] [Google Scholar]
  • 24. Khersonsky O, Lipsh R, Avizemer Z, Ashani Y, Goldsmith M, Leader H, Dym O, Rogotner S, Trudeau DL, Prilusky J et al. (2018) Automated design of efficient and functionally diverse enzyme repertoires. Mol Cell 72, 178–186.e5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25. Roda S, Terholsen H, Meyer JRH, Cañellas‐Solé A, Guallar V, Bornscheuer U & Kazemi M (2023) AsiteDesign: a Semirational algorithm for an automated enzyme design. J Phys Chem B 127, 2661–2670. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26. Mateljak I, Monza E, Lucas MF, Guallar V, Aleksejeva O, Ludwig R, Leech D, Shleev S & Alcalde M (2019) Increasing redox potential, redox mediator activity, and stability in a fungal laccase by computer‐guided mutagenesis and directed evolution. ACS Catal 9, 4561–4572. [Google Scholar]
  • 27. Shroff R, Cole AW, Morrow BR, Diaz DJ, Donnell I, Gollihar J, Ellington AD & Thyer R (2019) A structure‐based deep learning framework for protein engineering. bioRxiv. 10.1101/833905. [DOI] [PubMed]
  • 28. Blaabjerg LM, Kassem MM, Good LL, Jonsson N, Cagiada M, Johansson KE, Boomsma W, Stein A & Lindorff‐Larsen K (2023) Rapid protein stability prediction using deep learning representations. Elife 12, e82593. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29. Kroll A, Engqvist MKM, Heckmann D & Lercher MJ (2021) Deep learning allows genome‐scale prediction of Michaelis constants from structural features. PLoS Biol 19, e3001402. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30. Li F, Yuan L, Lu H, Li G, Chen Y, Engqvist MKM, Kerkhoven EJ & Nielsen J (2022) Deep learning‐based kcat prediction enables improved enzyme‐constrained model reconstruction. Nat Catal 5, 662–672. [Google Scholar]
  • 31. Listov D, Goverde CA, Correia BE & Fleishman SJ (2024) Opportunities and challenges in design and optimization of protein function. Nat Rev Mol Cell Biol 25, 639–653. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32. Yang J, Li FZ & Arnold FH (2024) Opportunities and challenges for machine learning‐assisted enzyme engineering. ACS Cent Sci 10, 226–241. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33. Warshel A, Sharma PK, Kato M, Xiang Y, Liu H & Olsson MHM (2006) Electrostatic basis for enzyme catalysis. Chem Rev 106, 3210–3235. [DOI] [PubMed] [Google Scholar]
  • 34. Dubey KD, Wang B, Vajpai M & Shaik S (2017) MD simulations and QM/MM calculations show that single‐site mutations of cytochrome P450BM3 alter the active site's complexity and the chemoselectivity of oxidation without changing the active species. Chem Sci 8, 5335–5344. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35. Jafari S, Ryde U & Irani M (2021) QM/MM study of the catalytic reaction of Myrosinase; importance of assigning proper protonation states of active‐site residues. J Chem Theory Comput 17, 1822–1841. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36. Koulgi S, Achalere A, Sharma N, Sonavane U & Joshi R (2013) QM‐MM simulations on p53‐DNA complex: a study of hot spot and rescue mutants. J Mol Model 19, 5545–5559. [DOI] [PubMed] [Google Scholar]
  • 37. Ruiz‐Pernía JJ, Świderek K, Bertran J, Moliner V & Tuñón I (2024) Electrostatics as a guiding principle in understanding and designing enzymes. J Chem Theory Comput 20, 1783–1795. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38. Welborn VV, Ruiz Pestana L & Head‐Gordon T (2018) Computational optimization of electric fields for better catalysis design. Nat Catal 1, 649–655. [Google Scholar]
  • 39. Bissaro B, Durand J, Biarnés X, Planas A, Monsan P, O'Donohue MJ & Fauré R (2015) Molecular Design of non‐Leloir Furanose‐Transferring Enzymes from an α‐L‐Arabinofuranosidase: a rationale for the engineering of evolved Transglycosylases. ACS Catal 5, 4598–4611. [Google Scholar]
  • 40. Wijma HJ, Floor RJ, Bjelic S, Marrink SJ, Baker D & Janssen DB (2015) Enantioselective enzymes by computational design and in silico screening. Angew Chem Int Ed 54, 3726–3730. [DOI] [PubMed] [Google Scholar]
  • 41. Planas A (2023) Glycoside hydrolases: Mechanisms, specificities, and engineering. In Glycoside Hydrolases. Biochemistry, Biophysics, and Biotechnology (Goyal A & Sharma K, eds), pp. 25–53. Academic Press, Elsevier, Amsterdam. [Google Scholar]
  • 42. Li D, dan, Wang J, lan, Liu Y, Li Y, zhong & Zhang Z (2021) Expanded analyses of the functional correlations within structural classifications of glycoside hydrolases. Comput Struct Biotechnol J 19, 5931–5942. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43. Srivastava J, Sunthar P & Balaji PV (2021) Monosaccharide biosynthesis pathways database. Glycobiology 31, 1636–1644. [DOI] [PubMed] [Google Scholar]
  • 44. Chettri D, Verma AK & Verma AK (2020) Innovations in CAZyme gene diversity and its modification for biorefinery applications. Biotechnol Rep 28, e00525. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45. Morais MAB, Nin‐Hill A & Rovira C (2023) Glycosidase mechanisms: sugar conformations and reactivity in endo‐ and exo‐acting enzymes. Curr Opin Chem Biol 74, 102282. [DOI] [PubMed] [Google Scholar]
  • 46. Davies GJ, Planas A & Rovira C (2012) Conformational analyses of the reaction coordinate of glycosidases. Acc Chem Res 45, 308–316. [DOI] [PubMed] [Google Scholar]
  • 47. Tamaki FK, Souza DP, Souza VP, Ikegami CM, Farah CS & Marana SR (2016) Using the amino acid network to modulate the hydrolytic activity of β‐glycosidases. PLoS One 11, e0167978. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48. Bissaro B, Monsan P, Fauré R & O'Donohue MJ (2015) Glycosynthesis in a waterworld: new insight into the molecular basis of transglycosylation in retaining glycoside hydrolases. Biochem J 467, 17–35. [DOI] [PubMed] [Google Scholar]
  • 49. Franceus J, Lormans J & Desmet T (2022) Building mutational bridges between carbohydrate‐active enzymes. Curr Opin Biotechnol 78, 102804. [DOI] [PubMed] [Google Scholar]
  • 50. Jian X, Li C & Feng X (2023) Strategies for modulating transglycosylation activity, substrate specificity, and product polymerization degree of engineered transglycosylases. Crit Rev Biotechnol 43, 1284–1298. [DOI] [PubMed] [Google Scholar]
  • 51. Teze D, Zhao J, Wiemann M, Kazi ZGA, Lupo R, Zeuner B, Vuillemin M, Rønne ME, Carlström G, Duus J et al. (2021) Rational enzyme design without structural knowledge: a sequence‐based approach for efficient generation of Transglycosylases. Chem A Eur J 27, 10323–10334. [DOI] [PubMed] [Google Scholar]
  • 52. Biarnés X, Ardèvol A, Iglesias‐Fernández J, Planas A & Rovira C (2011) Catalytic itinerary in 1,3‐1,4‐β‐glucanase unraveled by QM/MM metadynamics. Charge is not yet fully developed at the oxocarbenium ion‐like transition state. J Am Chem Soc 133, 20301–20309. [DOI] [PubMed] [Google Scholar]
  • 53. Drula E, Garron ML, Dogan S, Lombard V, Henrissat B & Terrapon N (2022) The carbohydrate‐active enzyme database: functions and literature. Nucleic Acids Res 50, D571–D577. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54. Marana SR, Terra WR & Ferreira C (2002) The role of amino‐acid residues Q39 and E451 in the determination of substrate specificity of the Spodoptera frugiperda β‐glycosidase. Eur J Biochem 269, 3705–3714. [DOI] [PubMed] [Google Scholar]
  • 55. Cantarel BI, Coutinho PM, Rancurel C, Bernard T, Lombard V & Henrissat B (2009) The carbohydrate‐active EnZymes database (CAZy): an expert resource for Glycogenomics. Nucleic Acids Res 37, D233–D238. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56. Mendonça MFL, Marana SR & Marana SR (2008) The role in the substrate specificity and catalysis of residues forming the substrate aglycone‐binding site of a β‐glycosidase. FEBS J 275, 2536–2547. [DOI] [PubMed] [Google Scholar]
  • 57. Marana SR, Andrade EHP, Ferreira C & Terra WR (2004) Investigation of the substrate specificity of a β‐glycosidase from Spodoptera frugiperda using site‐directed mutagenesis and bioenergetics analysis. Eur J Biochem 271, 4169–4177. [DOI] [PubMed] [Google Scholar]
  • 58. Šali A & Blundell TL (1993) Comparative protein modelling by satisfaction of spatial restraints. J Mol Biol 234, 779–815. [DOI] [PubMed] [Google Scholar]
  • 59. Jurrus E, Engel D, Star K, Monson K, Brandi J, Felberg LE, Brookes DH, Wilson L, Chen J, Liles K et al. (2018) Improvements to the APBS biomolecular solvation software suite. Protein Sci 27, 112–128. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60. Trott O & Olson AJ (2010) AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J Comput Chem 31, 455–461. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61. Teze D, Hendrickx J, Czjzek M, Ropartz D, Sanejouand YH, Tran V, Tellier C & Dion M (2014) Semi‐rational approach for converting a GH1 β‐glycosidase into a β‐transglycosidase. Protein Eng Des Sel 27, 13–19. [DOI] [PubMed] [Google Scholar]
  • 62. Nerinckx W, Desmet T & Claeyssens M (2003) A hydrophobic platform as a mechanistically relevant transition state stabilising factor appears to be present in the active centre of all glycoside hydrolases. FEBS Lett 538, 1–7. [DOI] [PubMed] [Google Scholar]
  • 63. Castejón‐vilatersana M, Faijes M & Planas A (2021) Transglycosylation activity of engineered Bifidobacterium Lacto‐N‐Biosidase mutants at donor subsites for Lacto‐N‐Tetraose synthesis. Int J Mol Sci 22, 3230. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64. Cuxart I, Coines J, Esquivias O, Faijes M, Planas A, Biarnés X & Rovira C (2022) Enzymatic hydrolysis of human Milk oligosaccharides. The molecular mechanism of Bifidobacterium bifidum Lacto‐N‐biosidase. ACS Catal 12, 4737–4743. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65. Wada J, Ando T, Kiyohara M, Ashida H, Kitaoka M, Yamaguchi M, Kumagai H, Katayama T & Yamamoto K (2008) Bifidobacterium bifidum lacto‐N‐biosidase, a critical enzyme for the degradation of human milk oligosaccharides with a type 1 structure. Appl Environ Microbiol 74, 3996–4004. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66. Schmölzer K, Weingarten M, Baldenius K & Nidetzky B (2019) Lacto‐N‐tetraose synthesis by wild‐type and glycosynthase variants of the β‐N‐hexosaminidase from Bifidobacterium bifidum . Org Biomol Chem 17, 5661–5665. [DOI] [PubMed] [Google Scholar]
  • 67. Morris GM, Ruth H, Lindstrom W, Sanner MF, Belew RK, Goodsell DS & Olson AJ (2009) AutoDock4 and AutoDockTools4: automated docking with selective receptor flexibility. J Comput Chem 30, 2785–2791. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 68. Dolinsky TJ, Czodrowski P, Li H, Nielsen JE, Jensen JH, Klebe G & Baker NA (2007) PDB2PQR: expanding and upgrading automated preparation of biomolecular structures for molecular simulations. Nucleic Acids Res 35, W522–W525. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69. Bradley AP (1997) The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognit 30, 1145–1159. [Google Scholar]
  • 70. Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, Tunyasuvunakool K, Bates R, Žídek A, Potapenko A et al. (2021) Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71. Landrum G (2010) RDKit: Open‐source cheminformatics. https://www.rdkit.org.
  • 72. Simons KT, Kooperberg C, Huang E & Baker D (1997) Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and bayesian scoring functions. J Mol Biol 268, 209–225. [DOI] [PubMed] [Google Scholar]
  • 73. Jones G, Willett P, Glen RC, Leach AR & Taylor R (1997) Development and validation of a genetic algorithm for flexible docking. J Mol Biol 267, 727–748. [DOI] [PubMed] [Google Scholar]
  • 74. Friesner RA, Banks JL, Murphy RB, Halgren TA, Klicic JJ, Mainz DT, Repasky MP, Knoll EH, Shelley M, Perry JK et al. (2004) Glide: a new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy. J Med Chem 47, 1739–1749. [DOI] [PubMed] [Google Scholar]
  • 75. Humphrey W, Dalke A & Schulten K (1996) VMD: Visual molecular dynamics. J Mol Graph 14, 33–38. [DOI] [PubMed] [Google Scholar]


Articles from The FEBS Journal are provided here courtesy of Wiley
