Author manuscript; available in PMC: 2024 Apr 1.
Published in final edited form as: Nat Chem Biol. 2022 Dec 12;19(4):460–467. doi: 10.1038/s41589-022-01206-0

Designer installation of a substrate recruitment domain to tailor enzyme specificity

Rodney Park 1, Chayanid Ongpipattanakul 2, Satish K Nair 2,3,4, Albert A Bowers 5,*, Brian Kuhlman 1,6,*
PMCID: PMC10065947  NIHMSID: NIHMS1873900  PMID: 36509904

Abstract

Promiscuous enzymes that modify peptides and proteins are powerful tools for labeling biomolecules; however, directing these modifications to desired substrates can be challenging. Here, we use computational interface design to install a substrate recognition domain adjacent to the active site of a promiscuous enzyme, catechol O-methyltransferase (COMT). This design approach effectively decouples substrate recognition from the site of catalysis and promotes modification of peptides recognized by the recruitment domain. We determined the crystal structure of this novel multi-domain enzyme, SH3-588, which shows that it closely matches our design. SH3-588 methylates directed peptides with catalytic efficiencies exceeding the WT enzyme by over 1000-fold, whereas peptides lacking the directing recognition sequence do not display enhanced efficiencies. In competition experiments, the designer enzyme preferentially modifies directed substrates over undirected, suggesting that we can use designed recruitment domains to direct posttranslational modifications to specific sequence motifs on target proteins in complex multi-substrate environments.

Keywords: Computational protein design, Rosetta, enzyme engineering, substrate recruitment, biocatalysis, post-translational modification, molecular modeling, COMT

Introduction

Enzymes are incredibly proficient at modifying specific proteins and peptides in complex biological environments [1–3]. Specificity is often conferred by recognition surfaces or domains that are physically distinct from the active site. Src family kinases are an excellent example of this paradigm. They contain three domains: the kinase domain, a Src homology 2 (SH2) domain, and a Src homology 3 (SH3) domain. In some cases, the kinase domain alone is sequence promiscuous and phosphorylates diverse tyrosine-containing peptides, while the SH2 and SH3 domains can enhance specificity by binding to unique sequence motifs on substrates [4–7] (Fig. 1A). Separation of substrate recognition and catalysis plays a similarly important role in natural product biosynthesis, where ribosomally synthesized and post-translationally modified peptides (RiPPs) undergo a series of directed enzymatic modifications to yield mature natural products [8,9]. Recruitment occurs when a separate domain of the enzyme binds a recognition sequence (RS) in the peptide substrate and directs the ‘core’ to the active site for modification (Fig. 1B). An evolutionary advantage of separating binding and catalysis is that enzymes with new specificity profiles can be readily generated by combining preexisting promiscuous catalytic domains with preexisting peptide-binding domains [3,10].

Fig. 1.

General scaffolding diagrams and Rosetta design approach. (A) Flexible scaffolding approach used by our naïve fusion enzyme and Src family kinases. (B) Diagram of a specific interaction between a recruitment domain and promiscuous enzyme, similar to how many RiPPs enzymes are constructed. RS indicates the recognition sequence on the peptide that is bound by the peptide binding domain. (C) Overview of the Rosetta design pipeline. (D) Multistage Rosetta Scripts protocol (Design protocol in SI “dock_design_sh3.xml”) and follow-up design steps.

Protein engineers frequently take advantage of the modular nature of enzymes. This is exemplified by the many engineered proteins that have been created for genome editing [11,12]. Typically, a protein domain with high binding specificity for a unique DNA sequence is genetically fused to a promiscuous DNA modifying enzyme [13]. Similar strategies have been employed to target protein substrates [10,14–16]. In these studies, fusion proteins have been created by connecting a substrate recruitment domain to the catalytic domain with a simple, flexible linker [15,16]. This is a straightforward approach employed by nature, as evidenced by the flexible linkers connecting the domains in Src family kinases [4]. However, some natural enzymes, such as many RiPPs enzymes, use specifically positioned recognition domains to anchor and direct the peptide substrate to the active site [17,18]. The role of this kind of recruitment domain placement in catalysis has not been extensively studied, in part because methods have not been available for designing multi-domain proteins that adopt well-defined tertiary structure. Strategic positioning of a recruitment domain relative to an active site may influence catalysis in a few ways: (1) affinity for the substrate (KM) may be enhanced by increasing the local concentration near the active site [3,5,7], (2) kcat may be perturbed if specific binding orientations are required for catalysis or if binding and unbinding rates are rate limiting [15], and (3) modification site selectivity could be controlled by which substrate residues are directed towards the active site.

Here, we test whether the catalytic efficiency and specificity of a promiscuous enzyme can be enhanced by installing a substrate scaffolding domain in a specific position relative to its active site (Fig. 1C). Our approach takes advantage of developments in the field of computational protein design, which now enable engineering of predefined interfaces between proteins [19–23]. As a model system we use catechol O-methyltransferase (COMT) as the promiscuous enzyme and the SH3 domain of human Fyn tyrosine kinase (Fyn-SH3) as the peptide-binding domain [24]. In nature, COMT transfers a methyl group from S-adenosyl methionine (SAM) to catechols such as L-dihydroxyphenylalanine (L-DOPA) or dopamine [25]. However, COMT can be repurposed to install a variety of chemical groups on DOPA-containing peptides [26]. SH3 domains bind to well-defined poly-proline motifs, and the Fyn-SH3 domain is robust to mutation, making it amenable to design [27]. We show that we can create a multi-domain protein with the SH3 domain placed as designed, and that the engineered protein enhances catalytic efficiency for poly-proline containing peptides by over 1000-fold. Further, we test and model the performance of our designed enzyme in multi-substrate environments and find that it largely prefers directed substrates.

Results

Rosetta interface design of multidomain proteins.

To create a multidomain protein from Fyn-SH3 and COMT, we employed a two-step computational protocol that first redesigned the surfaces of the two proteins to create a favorable interaction and then connected their termini to produce a single-chain construct (Fig. 1C). Interface design was performed using iterative rounds of protein-protein docking and sequence optimization with the molecular modeling program Rosetta. We biased docking to favor inter-domain contacts between well-ordered secondary structural elements on the two domains, in particular along the solvent exposed faces of helices 1 and 2 of COMT and β-strands 2-4 of Fyn-SH3. Placing these structural elements in proximity brought the domain termini near each other and directed the C-terminus of peptides bound to the SH3 domain toward the COMT active site. To stabilize the docked conformation, we employed two-sided interface design, where contacting residues on both protein surfaces were allowed to mutate to form well-packed hydrophobic interactions and hydrogen bonds. Because both the docking and design protocols in Rosetta are stochastic, thousands of independent “dock and design” trajectories with stages of filtering and focusing (Fig. 1D) were used to identify low energy models.

After sorting the models by calculated energies and removing designs that did not satisfy a set of quality metrics (see methods), we evaluated the top ~50 designs by manual inspection and selected 5 designs for experimental studies. The designs averaged 11 mutations across the two domains (ranging from 3 to 7 mutations on each domain). In a last design step, a short glycine-serine linker was added to connect the C-terminus of Fyn-SH3 to COMT to create a single multidomain enzyme. All 5 of the selected designs favored similar domain placement and peptide proximity to the active site, with the C-terminus of the poly-proline motif ~30 Å from the sulfur on SAM. The designed interfaces were predominately hydrophobic, with small hydrogen bond networks in three of the models.

Biophysical and structural characterization of the designs.

All five designed enzymes expressed and were soluble. Of these, two designs, SH3-588 and SH3-003, did not aggregate and were found to be monomeric by SEC-MALS (Supplementary Fig. 1), while the remainder were oligomeric or precipitated during purification. For comparative studies, we additionally expressed and purified a naïve fusion of the two proteins in which the C-terminus of WT Fyn-SH3 was connected to the N-terminus of WT COMT with a glycine-serine linker (Fig. 1A).

Binding studies with a TAMRA-labelled poly-proline peptide, TAMRA-APP12 (peptide S1), confirmed that the naïve fusion, SH3-588, and SH3-003 all bind to the recognition sequence with similar affinities. The measured Kds were 330 nM, 690 nM, and 670 nM, respectively, with no binding to WT COMT (Supplementary Fig. 2). Similar affinities were also measured using a competitive binding experiment that did not require direct labeling of the peptide with TAMRA. To verify that the design process did not disrupt catalytic activity, we measured methylation rates with a peptide substrate, MKEARSAEAKEAGRGSGGSKNFLDYOH (1), that includes a C-terminal DOPA residue but lacks a recognition sequence. We used a luciferase coupled assay that monitors production of SAH (S-adenosylhomocysteine), a byproduct of the reaction (Fig. 2A). Similar kcat (between 0.27 and 0.37 s−1) and KM (between 19 and 42 μM) values were measured for SH3-588, SH3-003, the naïve fusion, and WT COMT (Table 1, Fig. 2B). This indicates that the active site residues remain undisturbed after design and suggests that all the enzymes engage peptide 1 directly through their active sites.

Fig. 2.

Structural characterization and background activity of designed enzymes. (A) Conversion of DOPA to methylated DOPA by COMT. (B) Plot of rate vs. [substrate] for WT COMT, naïve fusion, SH3-588, and SH3-003 for an undirected peptide substrate (1). All four enzymes have similar catalytic parameters with undirected substrate, indicating that catalytic activity has not been compromised by the design process. The substrate residue, L-DOPA (YOH), is underlined. Rates are presented as points calculated from slopes of best fit calculated across 3 distinct reactions (n=3). Error bars are +/− SE of best fit centered on the calculated rate (example shown in Supplementary Fig. 3A–D). (C) Crystal structure of SH3-588 (blue, PDB code: 7UD6) structurally aligned to its design model (gray). SAH (green) was observed in the crystal structure. The APP12 peptide is modeled bound to the SH3 domain (pink) to show relative placement near the active site. (D) Alignment of interface residues between SH3 and COMT domains showing that packing at the interface is very similar in the crystal structure and design model. (E) Substrate peptide modeling on the crystal structure (a short peptide 7 (teal); medium peptide 2 (pink); and misdirected peptide 3 (orange)) indicating the stretch of residues from the peptide binding domain to the active site. A magnesium ion (lime green) indicates the location of the active site but is not present in the SH3-588 crystal structure or during modeling.

Table 1.

Fitted parameters for kcat, KM, and kcat/KM, where heatmap indicates kcat/KM on a logarithmic scale. Error values indicate +/− SE of best fit for the traditional Michaelis-Menten parameters.

| Peptide | Sequence |
|---|---|
| 1 | MKEARSAEAKEAGRGSGGSKNFLDYOH |
| 2 | EWAPPLPPRNRPRRSGGSKETYOHSK |
| 3 | YOHDLFNKGSGGAPPLPPRNRPRRS |
| 4 | APPLPPRNRPRRSGKNFLDYOH |
| 5 | APPAPPRNRPRRSGKNFLDYOH |
| 6 | EWAPPLPPRNRPRRS(GGS)3GKETYOHSK |
| 7 | EWAPPLPPRNRPRRSYOHSK |

kcat (s⁻¹)

| Peptide | WT COMT | Naïve fusion | SH3-588 | SH3-003 |
|---|---|---|---|---|
| 1 | 0.37 ± 0.007 | 0.27 ± 0.01 | 0.35 ± 0.02 | 0.3 ± 0.01 |
| 2 | 0.213 ± 0.002 | 0.073 ± 0.007 | 0.37 ± 0.02 | 0.212 ± 0.005 |
| 3 | 0.34 ± 0.02 | 0.28 ± 0.03 | 0.22 ± 0.01 | 0.27 ± 0.01 |
| 4 | 0.371 ± 0.005 | 0.103 ± 0.007 | 0.31 ± 0.02 | 0.18 ± 0.01 |
| 5 | 0.22 ± 0.01 | 0.198 ± 0.009 | 0.28 ± 0.01 | 0.238 ± 0.009 |
| 6 | 0.246 ± 0.004 | 0.166 ± 0.008 | 0.45 ± 0.03 | 0.319 ± 0.005 |
| 7 | 0.3154 ± 0.0006 | 0.35 ± 0.03 | 0.32 ± 0.01 | 0.25 ± 0.01 |

KM (μM)

| Peptide | WT COMT | Naïve fusion | SH3-588 | SH3-003 |
|---|---|---|---|---|
| 1 | 42 ± 3 | 40 ± 6 | 19 ± 3 | 30 ± 4 |
| 2 | 132 ± 4 | 0.4 ± 0.2 | 0.15 ± 0.04 | 0.8 ± 0.1 |
| 3 | 45 ± 7 | 0.2 ± 0.1 | 6 ± 2 | 0.18 ± 0.04 |
| 4 | 24 ± 1 | 0.18 ± 0.05 | 0.03 ± 0.01 | 0.07 ± 0.03 |
| 5 | 15 ± 4 | 6 ± 1 | 1.5 ± 0.4 | 1.0 ± 0.2 |
| 6 | 220 ± 10 | 1.7 ± 0.4 | 0.18 ± 0.08 | 0.75 ± 0.05 |
| 7 | 183.1 ± 1.0 | 400 ± 90 | 130 ± 20 | 0.9 ± 0.3 |

kcat/KM (× 10⁴ M⁻¹s⁻¹)

| Peptide | WT COMT | Naïve fusion | SH3-588 | SH3-003 |
|---|---|---|---|---|
| 1 | 0.9 ± 0.3 | 0.7 ± 0.2 | 1.9 ± 0.5 | 1.0 ± 0.3 |
| 2 | 0.16 ± 0.05 | 17 ± 4 | 250 ± 40 | 27 ± 5 |
| 3 | 0.8 ± 0.3 | 140 ± 30 | 3.8 ± 0.8 | 150 ± 30 |
| 4 | 1.6 ± 0.4 | 60 ± 10 | 1200 ± 100 | 270 ± 30 |
| 5 | 1.5 ± 0.3 | 3.5 ± 0.7 | 19 ± 3 | 23 ± 4 |
| 6 | 0.11 ± 0.03 | 10 ± 2 | 250 ± 40 | 43 ± 8 |
| 7 | 0.17 ± 0.06 | 0.09 ± 0.04 | 0.25 ± 0.08 | 29 ± 5 |

To determine whether SH3-588 adopts the designed conformation, we solved its crystal structure to a resolution of 2.6 Å. The crystal structure closely matches our design model (backbone RMSD = 0.543 Å), with identical orientation of the catalytic and binding domains (Fig. 2C) and similar interface surface areas between the design model and crystal structure (904 and 848 Å², respectively). Many of the side chain conformations at the interface were accurately predicted (Fig. 2D), including a designed hydrogen bond between residues Gln46 (from the SH3 domain) and Gln93 (from COMT). Meanwhile, the glycine-serine linker connecting the two domains is not resolved in the structure. This, along with the well-ordered interface contacts, indicates that the designed interface, not the linker, predominantly stabilizes the multidomain enzyme. Critically, the peptide-binding site in the SH3 domain is accessible, and alignment of the substrate peptide onto SH3-588 shows that the C-terminus of the peptide is pointed towards the active site. We were unable to crystallize our second design, SH3-003.

KM enhancement by substrate scaffolding.

Molecular modeling with SH3-588 and peptide substrates suggested that at least 9 residues are necessary to span between the SH3 domain and the active site (Fig. 2E); therefore, we designed a directed substrate EWAPPLPPRNRPRRSGGSKETYOHSK (2) with 12 residues between the poly-proline motif and the DOPA residue to ensure sufficient flexibility for DOPA to productively engage the active site. With peptide 2, we found that SH3-588 dramatically enhanced methylation activity compared to WT COMT (Table 1, Fig. 3A). Specifically, we measured a >1000-fold increase in catalytic efficiency (kcat/KM) from 1600 to 2.5 × 10⁶ M⁻¹s⁻¹ for WT COMT and SH3-588, respectively. This improvement was largely afforded by tighter substrate binding, with KM lowering from 132 μM with WT COMT to 150 nM with SH3-588. Similarly, SH3-003 reacts with peptide 2 with a kcat/KM of 2.7 × 10⁵ M⁻¹s⁻¹, which is a 170-fold increase relative to WT COMT, largely due to having a lower KM for the peptide. The kcat values for WT COMT, SH3-588 and SH3-003 are similar (ranging from 0.212 s⁻¹ to 0.37 s⁻¹), suggesting that the DOPA side chain engages the active site in a similar manner for all enzymes.
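The fold enhancements quoted above follow directly from the fitted kcat and KM values for peptide 2 in Table 1. The short sketch below reproduces that arithmetic; the function and variable names are illustrative and not part of the original analysis, and small differences from the quoted numbers reflect rounding of the tabulated parameters.

```python
# Table 1 values for peptide 2: (kcat in s^-1, KM in micromolar)
PEPTIDE_2 = {
    "WT COMT": (0.213, 132.0),
    "Naive fusion": (0.073, 0.4),
    "SH3-588": (0.37, 0.15),
    "SH3-003": (0.212, 0.8),
}


def catalytic_efficiency(kcat_per_s, km_micromolar):
    """Return kcat/KM in M^-1 s^-1 (KM is converted from micromolar to molar)."""
    return kcat_per_s / (km_micromolar * 1e-6)


def fold_enhancement(params, enzyme, reference="WT COMT"):
    """Ratio of kcat/KM for `enzyme` relative to `reference`."""
    return catalytic_efficiency(*params[enzyme]) / catalytic_efficiency(*params[reference])


if __name__ == "__main__":
    for name, (kcat, km) in PEPTIDE_2.items():
        eff = catalytic_efficiency(kcat, km)
        fold = fold_enhancement(PEPTIDE_2, name)
        print(f"{name:12s} kcat/KM = {eff:.2e} M^-1 s^-1, fold vs WT = {fold:.0f}")
```

With these inputs, SH3-588 comes out at roughly 1500-fold and SH3-003 at roughly 160 to 170-fold over WT COMT, in line with the ">1000-fold" and "170-fold" figures in the text.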

Fig. 3.

Steady state kinetic analysis of the designed enzymes. The recognition sequence (RS) and L-DOPA substrate residue (YOH) are underlined in peptides. (A) Plot of rate vs. [substrate] for WT COMT, naïve fusion, SH3-588, and SH3-003 for a directed substrate (2). Boxed area indicates points that were not fit for the naïve fusion and suggest direct binding of substrate to the active site. (B) Plot of rate vs. [substrate] for WT COMT, naïve fusion, SH3-588, and SH3-003 for N-terminal misdirected substrate (3). (C) Plot of rate vs. [substrate] for SH3-588 for variable affinity recognition sequences (4,5,1). (D) Plot of rate vs. [substrate] for SH3-588 for variable length substrates (6,2,7). Rates are presented as points calculated from slopes of best fit calculated across 3 distinct reactions (n=3). Error bars are +/− SE of best fit centered on the calculated rate (example shown in Supplementary Fig. 3A–D).

Interestingly, the naïve fusion displays a biphasic response in rate with increasing substrate concentration. The first phase fits to a KM of 400 nM with a kcat of 0.073 s−1; however, at high substrate concentration, the rates match those of WT COMT (Fig. 3A boxed rate points). The intersection of these two phases occurs near the KM of WT COMT, 132 μM. This suggests the naïve fusion recruits substrate to the active site through SH3 domain binding at low substrate concentrations; however, at higher peptide concentrations near KM for undirected substrates, substrates bind directly to the active site.

Sensitivity to substrate directionality, affinity, length.

We next tested the sensitivity of the enzymes to three substrate features: (1) directionality, (2) recognition sequence affinity, and (3) linker length. In the directionality experiments, we placed the DOPA residue N-terminal to the poly-proline motif as opposed to C-terminal. Based on our model of SH3-003 and structure of SH3-588, this should direct the substrate away from the active site, disfavoring catalysis. As anticipated, the catalytic efficiency of SH3-588 dropped considerably with the misdirected peptide YOHDLFNKGSGGAPPLPPRNRPRRS (3) (Fig. 3B), with a kcat/KM of 3.8 × 10⁴ M⁻¹s⁻¹. For comparison, peptide APPLPPRNRPRRSGKNFLDYOH (4), which directs the DOPA towards the active site, has a kcat/KM of 1.2 × 10⁷ M⁻¹s⁻¹ (an increase of ~300-fold over misdirected substrate). The lower catalytic efficiency with the misdirected peptide is primarily due to an increase in KM (6 μM vs. 30 nM for peptides 3 and 4, respectively), suggesting that the DOPA residue interacts directly with the active site without simultaneous engagement of the poly-proline motif to the SH3 domain. In contrast, both the naïve fusion and SH3-003 displayed similar catalytic efficiencies with peptides 3 and 4. For the naïve fusion, this is consistent with a flexible connection between the SH3 domain and COMT and suggests that SH3-003 can adopt conformations different from our structural model. In line with this, the designed interface of SH3-003 contains a series of polar contacts that may not be forming as designed, allowing it to behave like the naïve fusion with flexibility about its linker.

Next, we measured reaction rates between SH3-588 and three substrates with varied affinity for the SH3 domain: APPLPPRNRPRRSGKNFLDYOH (4), APPAPPRNRPRRSGKNFLDYOH (5), and MKEARSAEAKEAGRGSGGSKNFLDYOH (1). Binding and kinetic studies showed a clear correlation between recognition sequence affinity (Kd) and KM (Fig. 3C, Supplementary Fig. 2). For peptide 4 the Kd was 690 nM and the KM 30 nM, while for peptide 5 the Kd was 22 μM and the KM 1.5 μM. Meanwhile, peptide 1, which lacks the poly-proline motif, had the weakest KM of 19 μM. Interestingly, the KM values for peptides 2, 4, and 5 are lower than the Kd values measured between the recognition sequence and SH3-588. As KM is frequently an upper limit on the Kd between a substrate and enzyme, this result indicates that full-length peptide binds to SH3-588 more tightly than the poly-proline sequence alone, suggesting cooperative binding between the peptide and enzyme. Indeed, the difference between KM and recognition sequence Kd is greater for peptides 4 and 5 (~20-fold) than it is for peptide 2 (~4-fold), reflecting the stronger binding of a C-terminal DOPA (peptides 4 and 5) to COMT compared to an internal DOPA (peptide 2). Alternatively, the Kd for substrate binding can be larger than KM in cases where product release is slower than substrate release [28], but this typically involves significant conformational changes associated with catalysis that slow product release. Conformational changes of this type have not been reported for COMT.
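The statement that KM usually bounds Kd from above can be made explicit with the simplest Michaelis-Menten scheme. This is a textbook sketch under the assumption of a single substrate-bound intermediate; the rate constants k_1 and k_{-1} are generic symbols introduced here for illustration and were not measured in this study.

```latex
% Minimal scheme: E + S <-> ES -> E + P
\[
E + S \;\underset{k_{-1}}{\overset{k_{1}}{\rightleftharpoons}}\; ES
\;\xrightarrow{\;k_{\mathrm{cat}}\;}\; E + P
\]
\[
K_d = \frac{k_{-1}}{k_{1}}, \qquad
K_M = \frac{k_{-1} + k_{\mathrm{cat}}}{k_{1}}
    = K_d + \frac{k_{\mathrm{cat}}}{k_{1}} \;\ge\; K_d
\]
```

Under this scheme the true Kd of the full-length peptide cannot exceed its KM, so a KM of 30 nM for peptide 4 implies tighter binding than the 690 nM measured for the recognition sequence alone.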

Despite the widely varied KM values, kcat remained largely similar with all three substrates and, more broadly, was consistent across the entire substrate scope tested in this study (Table 1). This highlights two features of our system: (1) substrate recruitment predominantly improves substrate recognition (KM), and (2) catalysis, as opposed to product release, appears to be the rate-limiting step in the reaction. If the release of the methylated peptide were rate-limiting, then we would expect the observed kcat to become dramatically slower as affinity for the recognition sequence was increased.
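The rate-limiting-step argument can be illustrated with a two-step turnover scheme in which chemistry is followed by product release. This is a generic sketch, not the kinetic model fitted in this study; k_chem and k_rel are hypothetical symbols for the methyl-transfer and product-release rate constants.

```latex
% ES -> EP (chemistry), EP -> E + P (product release)
\[
ES \;\xrightarrow{\;k_{\mathrm{chem}}\;}\; EP
\;\xrightarrow{\;k_{\mathrm{rel}}\;}\; E + P,
\qquad
k_{\mathrm{cat}} = \frac{k_{\mathrm{chem}}\,k_{\mathrm{rel}}}{k_{\mathrm{chem}} + k_{\mathrm{rel}}}
\]
\[
k_{\mathrm{rel}} \gg k_{\mathrm{chem}} \;\Rightarrow\; k_{\mathrm{cat}} \approx k_{\mathrm{chem}},
\qquad
k_{\mathrm{rel}} \ll k_{\mathrm{chem}} \;\Rightarrow\; k_{\mathrm{cat}} \approx k_{\mathrm{rel}}
\]
```

Because the methylated product still carries the recognition sequence, a tighter recognition sequence would be expected to lower k_rel. A kcat that stays roughly constant across recognition sequences of different affinity is therefore what the first limit predicts.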

Lastly, we investigated the consequence of variable length linkers between the recognition sequence and substrate motif (Fig. 3D diagram). We designed three variable length substrates: “long” peptide 6, “medium” peptide 2, and “short” peptide 7. Both the long and medium peptides displayed comparable, tight KM values of 180 and 150 nM, respectively, for SH3-588 (Fig. 3D). Meanwhile, the short peptide had a KM of 130 μM which is close to that of WT COMT indicating that the substrate interacts directly with the enzyme active site, not through the peptide binding domain. These results suggest longer linkers, including ones with unique functional sites, may be incorporated without sacrifice to activity. Meanwhile, peptides with short linkers disallow simultaneous binding of the recognition sequence and the DOPA residue.

We also measured the reactivity of the variable length peptides with the naïve fusion. Due to the enzyme’s flexibility, we anticipated that the short peptide might be an effective substrate; however, as with SH3-588 the KM was large (>100 μM), indicating that the peptide could not simultaneously engage the SH3 domain and the active site. As an additional control, we designed a second naïve fusion (naïve fusion 2) that has a longer linker (8 residues) between the SH3 domain and COMT. Naïve fusion 2 has similar kinetic parameters as the first naïve fusion with peptides 1-6, but with short peptide substrate (7) naïve fusion 2 has a lower KM (600 nM, Supplementary Table 1). This result indicates that the longer linker between the SH3 domain and COMT in naïve fusion 2 allows simultaneous engagement of peptide 7 with the active site and the SH3 domain. Notably, naïve fusion 2 and SH3-003 have similar catalytic parameters for peptides 1-7 (Supplementary Table 1), providing further evidence that SH3-003 does not form the designed interaction and instead samples a variety of conformations.

SH3-588 preference in substrate competition experiments.

Many bioconjugation enzymes are selective, discriminating between multiple potential reactive substrates and modifying a desired target. To probe the substrate selectivity of SH3-588, the naïve fusion, and WT COMT, we performed competition assays with two peptide substrates at equimolar concentrations: a directed peptide EWAPPLPPRNRPRRSGGSKETYOHSK (2) and an undirected peptide containing the same substrate motif but missing the poly-proline motif, MKEWRSPELKEFGRGSGGSGGKETYOHSK (peptide S2, Fig. 4A). Product formation was monitored with LCMS.

Fig. 4.

Substrate competition experiment. (A) Diagram showing competing substrates (peptides 2 and S2). (B) Percent conversion of directed and undirected peptides in competition reactions at 1, 10, and 100 μM (2, 20, and 200 μM total peptide) by indicated enzyme. Raw data points are represented by black dots superimposed on plots. Data bars present mean values +/− SEM across three distinct reactions (n=3). Error bars are centered on the mean. (C) Model of alternative modes of interaction between the enzymes and substrate at varying concentrations of substrate. KM(Active site) is the affinity (KM) of substrate binding directly to the active site.

For all three substrate concentrations that were tested, WT COMT modified the directed and undirected peptides equally (Fig. 4B, Extended Data Fig. 1). As expected, given the lower KM values for the directed substrate, the naïve fusion and SH3-588 preferentially modified the directed peptide at substrate concentrations less than 10 μM. At 100 μM substrate, near the KM for undirected peptide, the naïve fusion lost selectivity and modified both peptides comparably, displaying similar yields to those of WT COMT. In contrast, SH3-588 retained preference for the target peptide by a factor of >6-fold at high substrate concentration. For all conditions that were studied, SH3-588 modified greater than 90% of the directed peptide, while WT COMT and the naïve fusion modified at most 50% of the substrate.

Substrate occlusion explains enzyme specificity.

Peptides can interact with the SH3/COMT fusions by two mechanisms: directly through the active site or through recruitment by the peptide binding domain. The competition experiment demonstrated that at high substrate concentrations the naïve fusion shows nonspecific modification, while SH3-588 maintains preference for the directed substrate. A possible explanation for the SH3-588 result is that the directed peptide is simultaneously binding to the SH3 domain and the active site, thus occluding undirected peptides from reacting with the enzyme. Indeed, the KM for many of the poly-proline directed peptides is tighter than the Kd of the recognition sequence for the SH3 domain. To further explore this concept, we constructed a mathematical model that uses differential equations and our experimental catalytic parameters to simulate competition for directed and undirected substrate at the active site.

The model uses three primary routes to simulate substrate processivity (Extended Data Fig. 2). The directed route models a tethered substrate bound to the SH3 domain. The undirected route models the substrate motif interacting exclusively through the active site. A third pathway involves two peptides interacting with the enzyme: one peptide bound to the SH3 domain and a second, undirected peptide engaged at the active site (Fig. 5A). The relevance of this third pathway depends on how efficiently SH3-bound peptides occlude the active site. In our model, the equilibrium constant Kopen is derived from the parameter “fraction_open,” which specifies the fraction of SH3-bound peptides that are in an open conformation with the active site accessible to undirected peptides.

Fig. 5.

Numerical modeling of competing substrates reacting with the engineered enzymes. (A) In the numerical model, directed substrates (i.e. containing the poly-proline recognition sequence) engage the SH3 domain and then exist in an equilibrium between an open and closed conformation (equilibrium constant = Kopen). In the closed form, the DOPA residue is bound to the active site and is available for catalysis. The fraction of bound peptide in the open conformation is referred to as fraction_open, which is directly related to Kopen. Undirected peptide can only engage the active site when the directed peptide is in the open state, or when the enzyme is not bound to directed peptide. Varying fraction_open in the numerical model changes the accessibility of the active site to undirected peptides and changes the predicted specificity during a simulated reaction with directed and undirected peptide (the full mathematical model is presented in Extended Data Fig. 2). (B) A comparison of simulated results (hashed bars) and experimental results (solid bars) for a competitive reaction in which directed and undirected peptide are both at a concentration of 100 μM. Numerical simulations performed with different values assigned to fraction_open indicate that directed peptide is in the open conformation ~10% of the time when bound to SH3-588 and ~30% of the time when bound to the naïve fusion. Raw data points are represented by black dots superimposed on plots. Data bars present mean values +/− SEM across three distinct reactions (n=3). Error bars are centered on the mean. Hashed bars are a single computational result.

We performed simulations with equimolar concentrations of directed peptide (2) and undirected peptide (S2) and predicted conversions while varying the fraction_open parameter. Because of the identical substrate sequence motifs (KETYOHSK), we used the catalytic parameters of WT COMT as a surrogate for undirected peptide at the SH3-588 active site. With perfect substrate occlusion, fraction_open = 0, we predict very high specificity (~1000-fold) for the directed peptide over the undirected peptide (Fig. 5B). This high specificity reflects the ratio of the specificity constants used in modeling SH3-588 with directed peptide 2 (kcat/KM = 2.5 × 10⁶ M⁻¹s⁻¹) and undirected peptide S2 (kcat/KM = 1.6 × 10³ M⁻¹s⁻¹). With no substrate occlusion, fraction_open = 1, we predict specificity for the directed peptide at low substrate concentrations, but this specificity is lost at high substrate concentrations (Extended Data Fig. 1). The model most closely matches the experimental results when fraction_open is set to 0.1 and 0.3 for SH3-588 and the naïve fusion, respectively, suggesting that when SH3-bound, the DOPA residue is engaged at the active site ~90% and ~70% of the time. These fraction_open values indicate that less substrate occlusion occurs with the naïve fusion compared to SH3-588, likely due to the added flexibility between the SH3 domain and COMT.
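The perfect-occlusion limit (fraction_open = 0) can be sketched with textbook competitive Michaelis-Menten kinetics, in which two substrates share one enzyme and binding is mutually exclusive. This is not the authors' full model from Extended Data Fig. 2: it ignores the open/closed equilibrium and product inhibition, and the enzyme concentration (0.1 μM) and reaction time (600 s) are hypothetical values chosen only for illustration. The kinetic parameters are the Table 1 values for peptide 2 with SH3-588 (directed) and WT COMT (surrogate for undirected).

```python
def competition_rates(e_total, s_dir, s_und, kcat_dir, km_dir, kcat_und, km_und):
    """Rates (uM/s) for two substrates competing for one enzyme, mutually exclusive binding."""
    denom = 1.0 + s_dir / km_dir + s_und / km_und
    v_dir = kcat_dir * e_total * (s_dir / km_dir) / denom
    v_und = kcat_und * e_total * (s_und / km_und) / denom
    return v_dir, v_und


def simulate(e_total, s0, params, t_end, dt=0.1):
    """Forward-Euler integration; returns fractional conversion (directed, undirected)."""
    s_dir = s_und = s0
    for _ in range(int(round(t_end / dt))):
        v_dir, v_und = competition_rates(e_total, s_dir, s_und, *params)
        s_dir = max(s_dir - v_dir * dt, 0.0)
        s_und = max(s_und - v_und * dt, 0.0)
    return 1.0 - s_dir / s0, 1.0 - s_und / s0


# (kcat_dir [s^-1], KM_dir [uM], kcat_und [s^-1], KM_und [uM]) from Table 1, peptide 2
PARAMS = (0.37, 0.15, 0.213, 132.0)
E_TOTAL_UM = 0.1   # hypothetical enzyme concentration
S0_UM = 100.0      # equimolar peptides, as in the 100 uM competition condition
T_END_S = 600.0    # hypothetical reaction time

if __name__ == "__main__":
    v_dir, v_und = competition_rates(E_TOTAL_UM, S0_UM, S0_UM, *PARAMS)
    print(f"initial rate ratio (directed/undirected): {v_dir / v_und:.0f}")
    conv_dir, conv_und = simulate(E_TOTAL_UM, S0_UM, PARAMS, T_END_S)
    print(f"conversion: directed {conv_dir:.3f}, undirected {conv_und:.5f}")
```

At equimolar substrate the rate ratio in this scheme equals the ratio of the two kcat/KM values regardless of concentration, which is why full occlusion preserves specificity even at 100 μM.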

It is also conceivable that binding of the directed peptide is altering binding preferences for SAM versus SAH, which could perturb selectivity for the directed peptide; however, this is unlikely considering the ordered binding mechanism used by COMT: SAM binds first, then Mg²⁺, followed by L-DOPA substrate [25]. The ordered binding mechanism is inconsistent with preferential SAM binding, as SAH-SAM exchange occurs in the absence of DOPA peptide binding to the active site.

Given the value for fraction_open and affinity of DOPA for the active site, it is possible to estimate the effective concentration of DOPA when peptide is anchored to the SH3 domain (see methods). With a binding affinity of 130 μM for DOPA (KM of peptide 2 for the WT COMT), an effective concentration of 1.1 mM results in 90% occupancy at the active site. Effective concentration can also be estimated by calculating the volume accessible to a tethered linker (see methods). With 12 residues between the recognition sequence and DOPA and assuming half of the space is occupied by enzyme, the volumetric approach predicts an effective concentration of ~2 mM, similar to the value derived from fraction_open.
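The link between fraction_open and effective concentration can be sketched with a simple two-state binding isotherm, where occupancy of the active site is C_eff / (C_eff + Kd). This is an assumption made for illustration; the authors' exact procedure is in their methods, and the function names below are hypothetical.

```python
def active_site_occupancy(c_eff_um, kd_um):
    """Fraction of tethered DOPA bound at the active site (two-state isotherm)."""
    return c_eff_um / (c_eff_um + kd_um)


def effective_concentration(fraction_open, kd_um):
    """Effective concentration (uM) that gives occupancy = 1 - fraction_open."""
    closed = 1.0 - fraction_open
    return kd_um * closed / fraction_open


if __name__ == "__main__":
    kd_um = 130.0  # affinity of DOPA for the active site quoted in the text
    # Occupancy at the quoted effective concentration of 1.1 mM
    print(f"occupancy at 1.1 mM: {active_site_occupancy(1100.0, kd_um):.3f}")
    # Effective concentration implied by fraction_open = 0.1 (SH3-588)
    print(f"C_eff for fraction_open = 0.1: {effective_concentration(0.1, kd_um):.0f} uM")
```

Under this isotherm, 1.1 mM gives about 89% occupancy, and exactly 90% occupancy corresponds to nine times Kd, about 1.2 mM. Both are consistent with the ~1.1 mM estimate in the text and the ~2 mM volumetric estimate to within the precision of either approach.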

SH3-588 is highly active with diverse types of substrates.

As evidenced by naturally occurring kinases, physical separation of substrate recognition and catalysis provides a powerful strategy for creating systems that can modify a diverse set of substrates. Kinases can often phosphorylate peptides and proteins with varied amino acids located adjacent to the site of modification. To allow for rapid testing of a large set of sequences with SH3-588, we coupled in vitro transcription/translation (IVT) of peptide substrates with an enzymatic activity assay that used mass spectrometry to probe product formation (Extended Data Fig. 3A, see methods). For almost all substrates tested with SH3-588, the reaction reached completion within 15 minutes. WT COMT, however, was less active against all substrates and showed a preference for peptides with hydrophobic residues adjacent to DOPA (Extended Data Fig. 3B, IVT peptides 1-5). Similarly, we tested variations in the recognition sequence and found that a minimal sequence, PALPAR, still results in complete conversion by SH3-588 (IVT peptides 7-10), demonstrating plasticity and tunability in the recognition sequence.

Discussion

Our results demonstrate the feasibility and utility of designing substrate scaffolding enzymes from existing protein domains. To the best of our knowledge, our use of computational protein design is the first to combine the functions of two domains to generate enhanced enzymatic activity for specially tailored substrates. Studies with our design, SH3-588, show kinetic and selectivity advantages over both WT and naïve fusion enzymes. The naïve fusion alone improved efficiency (kcat/KM), but greater boosts of more than 1000-fold were seen with our design, SH3-588, which also enhanced selectivity for poly-proline directed peptides. Notably, this selectivity was maintained at high substrate concentrations, where the naïve fusion lost selectivity. This places our design among the top examples of exclusively computationally improved enzymes29.

The crystal structure of SH3-588 suggests a few general principles for future substrate scaffolding design. First, we chose protein domains that are structurally well characterized and lend themselves well to interface design. These domains have exposed, well-ordered secondary structural elements that readily tolerate mutation and design, emphasizing the need to select stable domains. Second, correct orientation of these designable surfaces immediately puts the N- and C-termini in proximity for short linker design. Last, this orientation also allows unobstructed access to the active site from the poly-proline binding site. The success of the SH3-588 design is underscored, though, by some of the challenges that we faced in protein engineering. Several of the designs purified as dimers or oligomers, perhaps because of non-specific interactions between hydrophobic residues that were added to the domain surfaces. Additionally, the SH3-003 design behaves more like a naïve fusion, suggesting that the interface did not form as strongly as expected. These results indicate that some challenges remain in coupling interface design and viable folding and expression behavior. Nevertheless, we anticipate that the principles described above can be applied to other enzymes and peptide binding domains to create new scaffolding enzymes when the domains are well chosen.

Specific placement of the substrate recruitment domain next to the active site resulted in a significant kinetic payoff over both WT and naïve fusion enzymes. For example, due to improvements in KM, both naïve fusion and SH3-588 see significant improvements in efficiency, with the interface design having more than 10-fold greater efficiency than the naïve fusion. These improvements track with what has previously been seen in RiPPs systems, which seem to be well-suited for overcoming the detriments of low substrate concentrations in a complex cellular environment8,30. SH3-588 had additional advantages over the naïve fusion in directionality and specificity. While the naïve fusion modified substrate residues placed both before and after the recognition sequence, the design was less competent with DOPA residues placed N-terminal to the poly-proline motif (Fig. 3B), providing an additional level of control over modification. Also, whereas specificity eroded at higher concentrations with the naïve fusion, it was largely sustained with SH3-588. Substrate occlusion appears to play a strong role in this sustained specificity, as our model suggests that a change from approximately 30% residual accessibility for the naïve fusion to only 10% for the SH3-588 design (fraction_open values of 0.3 and 0.1, respectively) can account for the observed differences. These values correlate with effective concentrations of 300 μM and 1.1 mM for the naïve fusion and SH3-588, respectively, in agreement with geometric estimates16,31.

Initially, we hypothesized that scaffolding might perturb turnover rates, as kcat can be sensitive to both binding kinetics15 and orientation of substrate in the active site. Interestingly, we observed similar kcat (< 5-fold differences) across all substrates and enzymes tested in the study, suggesting that the rate of the chemical reaction is an upper bound on the observed rates. Indeed, our mathematical model requires fast on (kon > 1×10⁶ M⁻¹ sec⁻¹) and off (koff > 0.4 sec⁻¹) peptide binding rates to recapitulate our kinetic results. The SH3 domain is well suited for this application because it binds and releases peptides quickly. The FYN-SH3 domain binds to polyproline peptides with off rates faster than 200 sec⁻¹ and association rate constants > 5×10⁷ M⁻¹ sec⁻¹ (ref. 32). Additional lowering of KM by tightening the affinity of either the poly-proline leader sequence or the core may further improve efficiency and selectivity by minimizing fraction_open; however, maintaining fast koff (where koff > kcat) will be a critical restraint on achieving high catalytic efficiencies.

In conclusion, we have demonstrated that a computationally designed inter-domain interaction can enhance enzyme affinity and specificity. We anticipate that this design approach will be generalizable and provide a structural and kinetic framework allowing the redesign of other existing, promiscuous enzymes for applications in bioconjugation or biosynthesis. The affinity enhancement we see with relatively small peptide substrates should translate advantageously to more complex protein substrates, such as antibodies or enzymes. COMT itself could be useful in labeling applications because it can install SAM analogs with orthogonal functional groups, such as alkynes26. Similarly, the expansion of this approach to new catalytic domains effecting different chemistries could allow new labelling strategies or even more extensive modifications, leading to new therapeutic peptide derivatives or natural product analogs – O-methyl DOPA itself occurs in pepticinnamin E and related antibiotics. Lastly, the improved affinity and specificity afforded by the designed interaction, especially at low substrate concentrations, should lend itself to the targeted modification of peptide substrates in much more complex environments. Examples include the chemical diversification of combinatorial libraries in peptide and protein display technologies, such as mRNA display, and synthetic and chemical biology applications in a cellular context. Indeed, naïve fusion and recruitment strategies have already been used to probe subcellular interactions33 and signaling cascades10,14,34,35. Clearly there is much more to be done with these designed substrate scaffolding enzymes.

Methods

Computational Methods

Selection of enzyme, peptide binding domain, and designable interfaces.

Catechol O-methyltransferase (COMT) was chosen as a catalytic domain because of its well understood chemical mechanism25,26, high-resolution structural characterization (PDB ID: 5p9v), and surface exposed active site36. Furthermore, peptide substrates can readily be synthesized that incorporate the unnatural amino acid substrate residue, L-DOPA26. The Fyn SH3 domain (PDB ID: 4znx) has relatively high thermostability and has been used in the past for protein folding studies27. Binding studies indicate that Fyn SH3 binds its cognate APP12 peptide with a Kd = 260 nM24. Designable interface residues were chosen from large patches of residues on well-defined secondary structural elements (i.e. α-helices on COMT and β-sheet residues on the SH3 domain). Input PDBs were pre-relaxed with energy minimization in Rosetta and oriented in a single PDB file such that the designable interfaces of COMT and the SH3 domain faced each other, but were separated by nearly 70 Å.

Multistage interface design.

The Multistage Rosetta Script (MSRS) design protocol is summarized in Fig. 1D. Briefly, four separate stages were used to dock (stages 1 and 2) and subsequently design (stages 3 and 4) two domains into a single multi-domain enzyme (example script “dock_design_sh3.xml”). In the first stage, rigid body global docking brought the two designable interfaces into proximity. Small rotational and translational perturbations were used to sample the docking landscape. To enrich for a designable population of models, the 100,000 generated models were sorted by the number of designable interface contacts and the top 10,000 were kept. The second stage, local docking, perturbed the conformation further by allowing small rotational and translational movements about the initial docked conformation. Inputs to this stage were used for 10 independent local docking simulations and the total stage 2 model set was again filtered by number of interface contacts. Docking constraints were used through both the first and second stages to minimize the distances between the termini of the two domains (necessary for short linker design) and to minimize the gap between the active site and peptide binding site.

Two-sided interface design was then used to stabilize the resulting docked conformations. The 3rd stage consisted of a short round of FastDesign (repeats=1) which iterates between fast rotamer packing across the interface and energetic minimization of side chain and backbone atoms23. This stage largely removed clashing side chains between the two domains to assess models for further design. Out of the 10,000 input models, 2500 were kept for further design after sorting according to a combined energy calculation that included the total energy of the complex and a normalized interface energy. Stage 4 involved more extensive interface design (FastDesign repeats=3) and sorting according to the previously mentioned metric, saving the top 200 designs for further analysis. A final sorting step was applied to save the top 10% of output models by interface energy. The entire multistage script was repeated till >1000 design models were generated. Several different design scripts were run with custom distance constraints to generate multiple populations of docked models.
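The sort-and-keep logic of the four stages can be sketched in a few lines of Python. This is only an illustration of the filtering funnel, not the Rosetta protocol itself: the `DockedModel` record, the helper names, and in particular the unweighted sum used for the combined energy are assumptions made for the example, since the exact form of the combined metric is not given in this section.

```python
from dataclasses import dataclass


@dataclass
class DockedModel:
    """Hypothetical record holding the per-model metrics used for sorting."""
    name: str
    interface_contacts: int   # number of designable interface contacts
    total_energy: float       # total energy of the complex (lower is better)
    interface_energy: float   # normalized interface energy (lower is better)


def keep_most_contacts(models, n):
    """Stages 1 and 2: keep the n models with the most interface contacts."""
    return sorted(models, key=lambda m: m.interface_contacts, reverse=True)[:n]


def combined_energy(model, interface_weight=1.0):
    """Assumed combined score: total energy plus weighted interface energy."""
    return model.total_energy + interface_weight * model.interface_energy


def keep_lowest_energy(models, n):
    """Stages 3 and 4: keep the n models with the lowest combined energy."""
    return sorted(models, key=combined_energy)[:n]


def keep_top_fraction_by_interface(models, fraction=0.10):
    """Final sort: keep the best fraction of models by interface energy."""
    n = max(1, round(len(models) * fraction))
    return sorted(models, key=lambda m: m.interface_energy)[:n]
```

With the counts quoted in the text, the calls would be `keep_most_contacts(models, 10_000)` after each docking stage, `keep_lowest_energy(models, 2_500)` after stage 3, `keep_lowest_energy(models, 200)` after stage 4, and `keep_top_fraction_by_interface(models, 0.10)` as the final step.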

Forward docking and final model selection.

Selected design sequences were further evaluated with protein docking (“forward docking”) to determine if the designed pose was strongly preferred over alternative conformations. Designed models underwent random perturbation and docking to attempt to reclose the interface. An example forward docking script is provided in “forward_dock.xml”. Forward docking produced approximately 20-50 designs from which manual selection was used to pick unique interfaces that varied interface composition (i.e. bulky vs. small residues and polar vs. hydrophobic residues at the interface periphery) and peptide directionality toward the active site. Ultimately, 5 unique design models were selected for further linker design and experimental characterization.

Glycine-Serine linker design.

We fused the newly installed peptide binding domain to the COMT catalytic domain by linking the C-terminus of the SH3 domain to the N-terminus of COMT. Linker design was performed using FloppyTail, a Rosetta application that models a linear stretch of amino acids, to extend the C-terminus of the SH3 binding domain towards the N-terminus of COMT. The minimum number of GS residues was estimated from this approach and an extra 2 residues were added to the linker before ordering the genes as described below.

Experimental Methods

Cloning and expression of enzymes.

Enzyme (COMT native, naïve fusion, SH3-588, and SH3-003) expression constructs were either ordered and synthesized from Twist Bioscience (Supplementary Table 4) or manually cloned using forward primer CTCTAGAAATAATTTTGTTTAACTTTAAGAAGGAGTCTCTCCC and reverse primer GGATTGGAAGTACAGGTTCTCCCC. These constructs were transformed into chemically competent E. coli and grown on LB agar plates (ampicillin 100 mg/L). Colonies were selected and inoculated into 50 mL LB cultures for growth overnight (37 °C, 250 rpm agitation). Approximately 8 mL of overgrown culture was introduced into fresh Terrific broth (TB) media with an additional 8 mL of 50% glycerol solution. Cells were grown at 37 °C until OD600 = 1.0 – 1.2, upon which culture temperature was dropped to 16 °C and protein production was induced via introduction of isopropyl β-D-1-thiogalactopyranoside (IPTG) to 600 μM. Cells were grown overnight at 16 °C at 200 rpm agitation for typically 18 to 20 hrs, after which they were pelleted by centrifugation (3148 g for 20 mins), frozen, and stored at −80 or −20 °C.

Cell pellets were solubilized in lysis buffer (50 mM PBS pH 7.4, 300 mM NaCl, 10 mM imidazole) and sonicated (5 min, 5 sec on:5 sec off, 70% amplitude) prior to homogenization via 4 passes on an EmulsiFlex homogenizer. Cell lysates were clarified by centrifugation (24500 g, 30 mins) and filtered (5.0 μm nylon membrane) before loading onto a Ni-NTA gravity flow column for affinity chromatography. The column was washed first with 20 column volumes (CV) of high salt buffer (same as lysis buffer with 1 M NaCl) and second with 20 CV of low salt buffer (same as lysis buffer) before elution (same as lysis buffer with 400 mM imidazole). Affinity purified enzymes were further purified by size exclusion chromatography (SEC) and simultaneously buffer exchanged into storage buffer (50 mM PBS pH 7.4, 300 mM NaCl). Enzymes were concentrated to approximately 100 μM and flash frozen with 10% glycerol for long term storage at −80 °C.

Protein crystallization and data acquisition.

For protein crystallization, pMCSG28-COMT-SH3-588 was transformed into Rosetta(DE3) cells on LB agar plates (ampicillin 100 mg/L) and colonies were selected for overnight growth. Cells were grown in 1 L LB media at 37 °C to OD600 = 0.6. Cultures were cooled in an ice bath for 10 minutes before induction with IPTG at a final concentration of 0.4 mM. Cells were allowed to grow overnight at 18 °C before harvesting. The cell pellet was resuspended in suspension buffer (500 mM NaCl, 20 mM Tris pH 8.0, 10% glycerol).

Cells were lysed and clarified as above, and clarified lysate was passed through a 5 mL HisTrap HP column (Cytiva Life Sciences) pre-equilibrated with suspension buffer. The column was washed with 40 mL column volumes (CV) of wash buffer (1 M NaCl, 20 mM Tris pH 8.0, 30 mM imidazole). SH3-588-His6 was eluted on an ÄKTAprime plus system (Cytiva Life Sciences) using a linear gradient from 25 mM imidazole to 250 mM imidazole over 40 mL at a flow rate of 2 mL/min. The purest fractions as judged by SDS-PAGE were pooled and cleaved with TEV protease during dialysis (100 mM NaCl, 20 mM Tris pH 8.0, 3 μM 2-mercaptoethanol). A subtractive nickel purification was performed to remove uncut material. SH3-588 was concentrated and purified on a 16/60 Superdex® 200 column (Cytiva Life Sciences) equilibrated in 100 mM KCl, 20 mM Tris pH 8.0, 5 mM DTT.

COMT-SH3-588 was incubated with 2 mM of S-adenosyl homocysteine (SAH) prior to laying drops. Crystallization conditions for COMT-SH3-588 were established using sitting-drop sparse matrix screening (using the Crystal Gryphon LCP, Art Robbins Instruments). Crystals were obtained in 0.7 M magnesium formate, 0.1 M bis-tris propane pH 7.0. Initial crystals were optimized in hanging drop trays using 10–15 mg/mL of protein. Crystals resembling grains of rice formed over a week at 24 °C. Crystals did not form if the reducing agent was absent or at temperatures lower than 24 °C.

Structure determination.

Crystals were transferred to a cryoprotectant consisting of the reservoir solution supplemented with 20% glycerol prior to vitrification by direct immersion into liquid nitrogen. All crystallographic measurements were collected at Sector 21-ID (LS-CAT, Advanced Photon Source, Argonne National Labs, IL). Data were indexed, scaled, and integrated using XDS37 as implemented in autoPROC38. Phases were determined by the molecular replacement method as implemented in Phaser39 using the coordinates of PDB 1M27 and PDB 4PYN as search probes. There was one molecule in the crystallographic asymmetric unit. A high-resolution cutoff of 2.59 Å was used for the data. Although the crystallographic data statistics suggest that data of usable quality may extend beyond this cutoff, this was not the case, as the signal to noise dropped rapidly beyond this resolution. For example, at a slightly higher cutoff of 2.5 Å resolution, the merging R value in the highest resolution shell was 200% and the I/σ(I) was below 1. Moreover, the quality of the resultant maps did not improve. Hence, the chosen cutoff of 2.59 Å was deemed to be appropriate.

Following one cycle of refinement using REFMAC540, electron density could be observed for regions of the polypeptide that were not included in the search probes, including residues at the designed interface. While density for the bound ligand and the active site was continuous and obvious, other regions of the polypeptide lacked density at the main chain and were not modelled. These include the Gly-Ser linker joining the SH3 domain with the COMT domain and a loop region in the COMT domain between Ser256 and Val266. In some instances for the COMT domain, electron density for side chains was minimal, and the rotamers were modeled based on their common position across 6 different structures of the isolated COMT domain. The overall B factors are higher than would be expected for other structures of similar resolution, but the regions of the active site and the domain interface are clearly defined in omit maps. Manual and automated rebuilding using COOT41 and Buccaneer42, interspersed with cycles of refinement, resulted in the final model. Crystallographic statistics may be found in Supplementary Table 5. The PDB accession code for SH3-588 is 7UD6.

Peptide synthesis and purification.

Peptides were synthesized by UNC’s High-Throughput Peptide Synthesis and Array Facility via standard Fmoc solid phase peptide synthesis with a C-terminal amide. DOPA residues were incorporated as Fmoc protected unnatural amino acids in peptide substrates and all peptides were purified to >90% purity as determined by HPLC (Supplementary Fig. 4). All peptides used in this study are listed in Supplementary Table 2.

Fluorescence polarization assays.

Fluorescence polarization assays were conducted by incubating TAMRA-APP12 (S1) (200 nM) with varying concentrations of enzyme (5 nM to 50 μM) in phosphate buffer (50 mM) pH 7.5, NaCl (300 mM), 0.005% Tween20. Fluorescence polarization was measured on a Molecular Devices SpectraMax M5 plate reader. Recognition sequence binding affinity was determined by best fit to a single site binding quadratic model.
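A single-site quadratic binding model of this kind accounts for depletion of free enzyme when the labelled peptide concentration is comparable to Kd. The sketch below shows the standard closed-form solution; the function names and the free/bound polarization baselines are illustrative assumptions, not values or code from this study.

```python
import math


def fraction_ligand_bound(protein_total, ligand_total, kd):
    """Fraction of labelled peptide bound in a 1:1 complex.

    Solves the quadratic for the complex concentration C from
    Kd = (P_total - C) * (L_total - C) / C. All inputs share one unit.
    """
    b = protein_total + ligand_total + kd
    complex_conc = (b - math.sqrt(b * b - 4.0 * protein_total * ligand_total)) / 2.0
    return complex_conc / ligand_total


def polarization(protein_total, ligand_total, kd, fp_free, fp_bound):
    """Observed polarization as a population-weighted mix of free and bound signal.

    fp_free and fp_bound are fitted baselines (hypothetical placeholders here).
    """
    frac = fraction_ligand_bound(protein_total, ligand_total, kd)
    return fp_free + (fp_bound - fp_free) * frac
```

In a fit, `kd`, `fp_free`, and `fp_bound` would be adjusted to minimize the residuals between `polarization(...)` and the measured signal across the enzyme titration, with `ligand_total` fixed at the 200 nM probe concentration.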

Competitive binding was accomplished by incubating increasing concentrations of competitor peptide (50 nM to 500 μM) with constant fluorophore labelled peptide (200 nM) and enzyme (2 μM) in the previously described buffer with MgCl2 (3 mM) and SAH (70 μM) to facilitate binding of the DOPA side chain to the active site. The competitive binding data was fit by modeling the competitive fluorescence signal according to Hussain et al. Briefly, since the Kd of the TAMRA-labelled peptide (peptide S1) and its relationship to the monomer concentrations are known, the observed FP signal can be simulated for a given binding affinity for a competitor43.

Kinetic Assays.

Steady-state kinetic assays were conducted using Promega’s MTase-Glo Methyltransferase Assay. Briefly, methylation was initiated via pipette-mixing of 2x enzyme (typically 300 pM final concentration for designed enzyme / naïve fusion or 10 nM for WT COMT; this concentration was adjusted depending on the ease of conversion of the peptide substrate) into a 2x substrate solution (typically varying substrate from 70 nM to 1 mM) in a Tris buffer (20 mM) pH 8.0 with SAM (45 μM), NaCl (300 mM), MgCl2 (3 mM), EDTA (1 mM), BSA (0.2 mg/mL), Tween20 (0.005%), and DTT (1 mM). Reactions were incubated at 25 °C for the desired reaction period and terminated via introduction of a reaction aliquot to 0.5% TFA (final concentration of 0.1% TFA). The sequential assay development steps were completed per Promega assay guidelines following all timepoint collections. Luminescence intensity was analyzed on a BMG Labtech CLARIOstar plate reader. Kinetic parameters were determined by fitting initial rates to the Michaelis-Menten equation in Matlab (R2020a), and data processing and plotting were done with Microsoft Excel.
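For readers without Matlab, the extraction of kinetic parameters from initial rates can be sketched in standard-library Python. Note that this example substitutes a Hanes-Woolf linearization ([S]/v = [S]/Vmax + KM/Vmax) fitted by ordinary least squares for the direct fit to the Michaelis-Menten equation described above; it is a simpler stand-in, and all function names are hypothetical.

```python
def michaelis_menten(s, vmax, km):
    """Initial rate predicted by the Michaelis-Menten equation."""
    return vmax * s / (km + s)


def fit_hanes_woolf(substrate, rates):
    """Estimate (vmax, km) by least squares on S/v = S/Vmax + Km/Vmax."""
    xs = list(substrate)
    ys = [s / v for s, v in zip(substrate, rates)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                     # equals 1 / Vmax
    intercept = mean_y - slope * mean_x   # equals Km / Vmax
    return 1.0 / slope, intercept / slope


def catalytic_efficiency(vmax, km, enzyme_conc):
    """kcat/KM, with kcat = Vmax / [E]; units follow the inputs."""
    return (vmax / enzyme_conc) / km
```

Linearized fits weight experimental error differently from a direct nonlinear fit, so with noisy data the two approaches can give somewhat different estimates.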

Substrate competition LCMS assays.

Competition assays were conducted similarly to the above kinetic assays. Reactions were initiated via addition of 2x enzyme (final concentrations of 5, 50, and 300 nM) into 2x substrate (final concentrations of 1, 10, and 100 μM, respectively), where target and off-target peptides were in equimolar concentration (total peptide concentrations of 2, 20, and 200 μM). Reactions were incubated at 25 °C for indicated reaction period and terminated by addition of 25 μL sample into 25 μL methanol. Reaction aliquots were cleared by centrifugation (>16000 g, 10 mins) and stored at −20 °C till analyzed. Reaction aliquots were analyzed by LCMS ESI (Kinetex 2.6 u C18 column and Agilent 6520 Accurate-Mass Q-TOF ESI) where resulting ratios of integrations of product and substrates were used to calculate product conversion for each enzyme. Ions were extracted with a 50 ppm window width for the substrate and product ions as shown in Supplementary Table 6.

MALDI substrate screen.

DNA encoding peptide substrates of interest was ordered from Twist Bioscience (Supplementary Table 7). Peptide substrates were generated with the NEB PURExpress cell-free transcription/translation system. DNA was introduced into the custom PURExpress kit (using a custom amino acid mix which includes L-DOPA instead of tyrosine) to directly translate L-DOPA into our peptides. After incubating for 1 hr at 37 °C, completed IVT reactions were introduced to a 2X enzyme solution (1 μM enzyme of interest, 2 mM SAM, 3 mM MgCl2, 12.5 mM ascorbic acid, 300 mM NaCl, and 50 mM PBS pH 7.4, final concentrations). Enzyme reactions were run for 15 minutes at 30 °C. To prep samples for MALDI-TOF-MS (AB SCIEX TOF / TOF 5800 in positive reflector mode), enzyme reactions were zip-tipped (C18 stationary tips, Thermo Fisher Scientific) to purify peptides with 4% acetonitrile and eluted onto a MALDI plate with 80% acetonitrile with half saturated MALDI matrix (α-cyano-4-hydroxycinnamic acid). Enzyme conversions were calculated as the product integral over the summation of the product and substrate integrals.
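The conversion calculation described above, together with the mean and SEM reported across replicates, can be written as a short helper. The function names are illustrative, and the sketch assumes that product and substrate peaks have already been integrated.

```python
import math
import statistics


def conversion(product_integral, substrate_integral):
    """Fractional conversion: product / (product + substrate)."""
    total = product_integral + substrate_integral
    if total <= 0:
        raise ValueError("no signal: product and substrate integrals sum to zero")
    return product_integral / total


def mean_and_sem(values):
    """Mean and standard error of the mean across replicate conversions."""
    values = list(values)
    mean = statistics.mean(values)
    if len(values) < 2:
        return mean, float("nan")  # SEM is undefined for a single replicate
    return mean, statistics.stdev(values) / math.sqrt(len(values))
```

This ratio treats product and substrate as ionizing equally well, which is a common simplification for peptides differing by a single methyl group but is an assumption rather than something established in this section.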

Numerical integration model.

Numerical integration models were constructed and tested in MATLAB (R2020a) to simulate the conversion of a single substrate via a recruitment mechanism (Extended Data Fig. 2A and “substrate_recruitment.m” script) and to simulate a competition reaction where a directed and undirected substrate compete for the active site (Extended Data Fig. 2B and “substrate_competition.m” script). The directed and undirected routes used binding and kinetic rates derived from experimental values (i.e. directed rate kcatdir uses kcat from SH3-588, while undirected rate, kcatundir, uses kcat from WT enzyme). In the “substrate_recruitment.m” model, a substrate can bind to the SH3 domain before engaging the active site or it can bind directly to the active site. For the “substrate_competition.m” experiments a series of fraction_open parameters were tested (from 0 to 1) to reproduce experimental data.
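To illustrate what numerical integration of a mass-action kinetic scheme involves, the sketch below integrates the simplest possible case, E + S ⇌ ES → E + P, with a fixed-step forward Euler method in Python. It is deliberately not the two-site recruitment or competition scheme of the MATLAB scripts, which additionally track SH3-bound states and the fraction_open parameter. The function name, the step size, and the choice of Euler integration are assumptions made for the example.

```python
def simulate_single_site(e0, s0, kon, koff, kcat, t_end, dt=1e-3):
    """Forward-Euler integration of E + S <-> ES -> E + P.

    Concentrations in uM, time in seconds; kon in uM^-1 s^-1,
    koff and kcat in s^-1. dt must be small relative to the fastest rate.
    """
    e, s, es, p = e0, s0, 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        bind = kon * e * s        # E + S -> ES
        unbind = koff * es        # ES -> E + S
        turnover = kcat * es      # ES -> E + P
        e += dt * (unbind + turnover - bind)
        s += dt * (unbind - bind)
        es += dt * (bind - unbind - turnover)
        p += dt * turnover
    return {"E": e, "S": s, "ES": es, "P": p}
```

For example, `simulate_single_site(0.3, 10.0, 1.0, 0.4, 0.1, 10.0)` uses the lower bounds on kon (1×10⁶ M⁻¹ sec⁻¹, i.e. 1 μM⁻¹ sec⁻¹) and koff (0.4 sec⁻¹) quoted in the Discussion, with a placeholder kcat of 0.1 sec⁻¹ that is not a measured value. Extending the state vector with SH3-bound species and a second substrate follows the same pattern.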

The effective concentration of tethered substrate was calculated using a geometric approach or by using the fraction_open parameter derived from numerical modeling of the competition reactions. The geometric approach estimates the volume of a sphere from the calculated radius of an amino acid chain bound to the enzyme16. Fraction_open can be used to infer effective concentration because it is related to the relative population of tethered substrate that is engaging the active site at any time. A fraction_open of 0.0 would indicate that the DOPA residue is always bound to the active site when the peptide is bound to the SH3 domain and thus would indicate a very high effective concentration. A fraction_open of 1.0 would indicate that the DOPA residue is not engaging the active site and thus the effective concentration is very low. Equation (1) was used to estimate effective concentration (i.e. [substrate]) from the fitted fraction_open parameter and the Kd of DOPA for the active site. Kd was set equal to the KM value measured for peptides that did not include a poly-proline motif.

$$1 - \mathrm{fraction\_open} = \frac{[\mathrm{substrate}]}{K_d + [\mathrm{substrate}]} \tag{1}$$
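Equation (1) equates active-site occupancy, 1 − fraction_open, with [substrate]/(Kd + [substrate]); solving for [substrate] gives [substrate] = Kd × (1 − fraction_open)/fraction_open. The short Python sketch below applies this rearrangement to the values quoted in the text (Kd = 130 μM; fraction_open of 0.3 and 0.1). The function names are illustrative.

```python
def effective_concentration(fraction_open, kd):
    """Solve 1 - fraction_open = S / (Kd + S) for S (same units as kd)."""
    if not 0.0 < fraction_open <= 1.0:
        raise ValueError("fraction_open must be in (0, 1]")
    return kd * (1.0 - fraction_open) / fraction_open


def occupancy(substrate, kd):
    """Fraction of active sites occupied at a given substrate concentration."""
    return substrate / (kd + substrate)


KD_DOPA_UM = 130.0  # KM of peptide 2 for WT COMT, used as Kd (from the text)

for label, f_open in (("naive fusion", 0.3), ("SH3-588", 0.1)):
    c_eff = effective_concentration(f_open, KD_DOPA_UM)
    print(f"{label}: fraction_open={f_open} -> ~{c_eff:.0f} uM")
```

The results, roughly 0.3 mM for the naïve fusion and 1.2 mM for SH3-588, are in line with the 300 μM and 1.1 mM reported in the main text; the small difference for SH3-588 presumably reflects rounding of the fitted fraction_open.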

Data availability.

Atomic coordinates for the designed SH3-COMT fusion, SH3-588, have been deposited to the Protein Data Bank under the accession number 7UD6. The plasmid for SH3-588 has been deposited to AddGene under plasmid number 185920 (pMCSG28-SH3-588). Source data is provided in the supplementary information.

Code Availability.

Code used to model kinetic and substrate competition data is available in the supplemental information. Rosetta scripts used to design the protein interfaces are also provided in the supplemental information and a demo folder containing scripts and input files can also be found online.

Extended Data

Extended Data Fig. 1.


Expanded set of timepoints for substrate competition assay for target peptide (red) and off-target (light blue) at 1 (A), 10 (B), and 100 (C) μM peptide substrate (2, 20, and 200 μM total peptide respectively). Raw data points are represented by black dots superimposed on plots. Data bars present mean values +/− SEM across three distinct reactions (n=3, frequently hidden under data points). Error bars are centered on the mean. (D) Exact timepoints collected for 1, 10, and 100 μM. Timepoint 2 is displayed in Fig. 4 in main text and t1 at 100 μM is used in competition reaction mathematical modeling.

Extended Data Fig. 2.


Full computational model diagram. (A) Single, directed substrate simulation, where substrate can bind to the active site and peptide binding site. (B) Multi-substrate (directed and undirected competition) reaction simulation diagram.

Extended Data Fig. 3.


Substrate diversity assay. (A) Diagram of assay procedure. Briefly, kit components were combined with genes encoding peptides. After IVT incubation, the crude mixture was split, respective enzyme added, and reactions were run at 30°C for 15 minutes. Samples were prepped and run for MALDI-TOF-MS analysis. (B) Table of IVT peptide substrates tested and their corresponding conversions. SH3-588 largely reached completion for most peptides tested (marked with 100%); remaining substrate peaks were hardly above noise (IVT peptides 1-9). IVT peptides 1-10 were analyzed by MALDI-TOF-MS. Error values indicate +/− SEM across three distinct replicates (n=3) for IVT peptides 1-8 and two distinct replicates (n=2) for IVT peptides 9 and 10 centered on the mean.

Supplementary Material


Acknowledgements

We would like to thank the Bowers and Campbell labs for sharing advice and equipment. In addition, we would like to thank David, Jack, and Andrew for their invaluable advice in creating the design protocol for the multi-domain enzyme. This work was supported by NIH Grant R35GM131923 (B.K.) and is based upon work supported in part by a discovery grant from the Eshelman Institute for Innovation and the National Science Foundation under Grant No. 2204094 (A.A.B.).

Footnotes

Competing Interests Statement

The authors declare no competing financial interests.

References

  • 1.Ho SH & Tirrell DA Enzymatic Labeling of Bacterial Proteins for Super-resolution Imaging in Live Cells. ACS Cent Sci 5, 1911–1919 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Chen I, Howarth M, Lin W & Ting AY Site-specific labeling of cell surface proteins with biophysical probes using biotin ligase. Nat Methods 2, 99–104 (2005). [DOI] [PubMed] [Google Scholar]
  • 3.Bhattacharyya RP, Reményi A, Yeh BJ & Lim WA Domains, motifs, and scaffolds: The role of modular interactions in the evolution and wiring of cell signaling circuits. Annual Review of Biochemistry vol. 75 655–680 Preprint at 10.1146/annurev.biochem.75.103004.142710 (2006). [DOI] [PubMed] [Google Scholar]
  • 4.Miller WT Determinants of substrate recognition in nonreceptor tyrosine kinases. Accounts of Chemical Research vol. 36 393–400 Preprint at 10.1021/ar020116v (2003). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Pellicena P, Stowell KR & Miller WT Enhanced phosphorylation of Src family kinase substrates containing SH2 domain binding sites. Journal of Biological Chemistry 273, 15325–15328 (1998). [DOI] [PubMed] [Google Scholar]
  • 6.Scott MP & Miller WT A peptide model system for processive phosphorylation by Src family kinases. Biochemistry 39, 14531–14537 (2000). [DOI] [PubMed] [Google Scholar]
  • 7.Qiu H & Miller WT Role of the Brk SH3 domain in substrate recognition. Oncogene 23, 2216–2223 (2004). [DOI] [PubMed] [Google Scholar]
  • 8.Ortega MA & van der Donk WA New Insights into the Biosynthetic Logic of Ribosomally Synthesized and Post-translationally Modified Peptide Natural Products. Cell Chem Biol 23, 31–44 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Arnison PG et al. Ribosomally synthesized and post-translationally modified peptide natural products: overview and recommendations for a universal nomenclature. Nat. Prod. Rep. 30, (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Park S-H, Zarrinpar A & Lim WA Rewiring MAP Kinase Pathways Using Alternative Scaffold Assembly Mechanisms. Science (1979) 299, 1061–1064 (2003). [DOI] [PubMed] [Google Scholar]
  • 11.Adli M The CRISPR tool kit for genome editing and beyond. Nat Commun 9, 1911 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Urnov FD, Rebar EJ, Holmes MC, Zhang HS & Gregory PD Genome editing with engineered zinc finger nucleases. Nat Rev Genet 11, 636–646 (2010). [DOI] [PubMed] [Google Scholar]
  • 13.Bolukbasi MF et al. DNA-binding-domain fusions enhance the targeting range and precision of Cas9. Nat Methods 12, 1150–1156 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Bashor CJ, Helman NC, Yan S & Lim WA Using Engineered Scaffold Interactions to Reshape MAP Kinase Pathway Signaling Dynamics. Science (1979) 319, 1539–1543 (2008). [DOI] [PubMed] [Google Scholar]
  • 15.Dyla M & Kjaergaard M Intrinsically disordered linkers control tethered kinases via effective concentration. Proceedings of the National Academy of Sciences 117, 21413–21419 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Speltz EB & Zalatan JG The Relationship between Effective Molarity and Affinity Governs Rate Enhancements in Tethered Kinase-Substrate Reactions. Biochemistry 59, 2182–2193 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Burkhart BJ, Hudson GA, Dunbar KL & Mitchell DA A prevalent peptide-binding domain guides ribosomal natural product biosynthesis. Nat Chem Biol 11, (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Grove TL et al. Structural Insights into Thioether Bond Formation in the Biosynthesis of Sactipeptides. J Am Chem Soc 139, (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Leaver-Fay A et al. Rosetta3. in 545–574 (2011). doi: 10.1016/B978-0-12-381270-4.00019-6. [DOI] [Google Scholar]
  • 20.Cao L et al. De novo design of picomolar SARS-CoV-2 miniprotein inhibitors. Science (1979) 370, 426–431 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Alford RF et al. The Rosetta All-Atom Energy Function for Macromolecular Modeling and Design. J Chem Theory Comput 13, 3031–3048 (2017).
  • 22.Karanicolas J et al. A De Novo Protein Binding Pair By Computational Design and Directed Evolution. Mol Cell 42, 250–260 (2011).
  • 23.Maguire JB et al. Perturbing the energy landscape for improved packing during computational protein design. Proteins: Structure, Function, and Bioinformatics 89, 436–449 (2021).
  • 24.Camara-Artigas A, Ortiz-Salmeron E, Andujar-Sánchez M, Bacarizo J & Martin-Garcia JM The role of water molecules in the binding of class I and II peptides to the SH3 domain of the Fyn tyrosine kinase. Acta Crystallogr F Struct Biol Commun 72, 707–712 (2016).
  • 25.Lotta T et al. Kinetics of Human Soluble and Membrane-Bound Catechol O-Methyltransferase: A Revised Mechanism and Description of the Thermolabile Variant of the Enzyme. Biochemistry 34, 4202–4210 (1995).
  • 26.Struck A-W et al. An Enzyme Cascade for Selective Modification of Tyrosine Residues in Structurally Diverse Peptides and Proteins. J Am Chem Soc 138, 3038–3045 (2016).
  • 27.Plaxco KW et al. The Folding Kinetics and Thermodynamics of the Fyn-SH3 Domain. Biochemistry 37, 2529–2537 (1998).
  • 28.Johnson KA New standards for collecting and fitting steady state kinetic data. Beilstein Journal of Organic Chemistry 15, 16–29 (2019).
  • 29.Goldsmith M & Tawfik DS Enzyme engineering: reaching the maximal catalytic efficiency peak. Curr Opin Struct Biol 47, 140–150 (2017).
  • 30.Tianero Ma. D. et al. Metabolic model for diversity-generating biosynthesis. Proceedings of the National Academy of Sciences 113, 1772–1777 (2016).
  • 31.Krishnamurthy VM, Semetey V, Bracher PJ, Shen N & Whitesides GM Dependence of Effective Molarity on Linker Length for an Intramolecular Protein−Ligand System. J Am Chem Soc 129, 1312–1320 (2007).
  • 32.Meneses E & Mittermaier A Electrostatic Interactions in the Binding Pathway of a Transient Protein Complex Studied by NMR and Isothermal Titration Calorimetry. Journal of Biological Chemistry 289, 27911–27923 (2014).
  • 33.Cho KF et al. Split-TurboID enables contact-dependent proximity labeling in cells. Proceedings of the National Academy of Sciences 117, 12143–12154 (2020).
  • 34.Rivera VM et al. A humanized system for pharmacologic control of gene expression. Nat Med 2, (1996).
  • 35.Yazawa M, Sadaghiani AM, Hsueh B & Dolmetsch RE Induction of protein-protein interactions in live cells using light. Nat Biotechnol 27, 941–945 (2009).

Methods-only References

  • 36.Lerner C et al. Design of Potent and Druglike Nonphenolic Inhibitors for Catechol O-Methyltransferase Derived from a Fragment Screening Approach Targeting the S-Adenosyl-l-methionine Pocket. J Med Chem 59, 10163–10175 (2016).
  • 37.Kabsch W XDS. Acta Crystallogr D Biol Crystallogr 66, 125–132 (2010).
  • 38.Vonrhein C et al. Data processing and analysis with the autoPROC toolbox. Acta Crystallogr D Biol Crystallogr 67, 293–302 (2011).
  • 39.McCoy AJ et al. Phaser crystallographic software. J Appl Crystallogr 40, 658–674 (2007).
  • 40.Murshudov GN et al. REFMAC5 for the refinement of macromolecular crystal structures. Acta Crystallogr D Biol Crystallogr 67, 355–367 (2011).
  • 41.Emsley P, Lohkamp B, Scott WG & Cowtan K Features and development of Coot. Acta Crystallogr D Biol Crystallogr 66, 486–501 (2010).
  • 42.Cowtan K The Buccaneer software for automated model building. 1. Tracing protein chains. Acta Crystallogr D Biol Crystallogr 62, 1002–1011 (2006).
  • 43.Hussain M, Cummins MC, Endo-Streeter S, Sondek J & Kuhlman B Designer proteins that competitively inhibit Gαq by targeting its effector site. Journal of Biological Chemistry 297, 101348 (2021).

Data Availability Statement

Atomic coordinates for the designed SH3-COMT fusion, SH3-588, have been deposited in the Protein Data Bank under accession number 7UD6. The plasmid for SH3-588 has been deposited with Addgene under plasmid number 185920 (pMCSG28-SH3-588). Source data are provided in the supplementary information.
