Author manuscript; available in PMC: 2023 May 23.
Published in final edited form as: Angew Chem Int Ed Engl. 2022 Mar 30;61(22):e202202657. doi: 10.1002/anie.202202657

Spontaneous Orthogonal Protein Crosslinking via a Genetically Encoded 2-Carboxy-4-Aryl-1,2,3-Triazole

Yali Xu [a], Abdur Rahim [a], Qing Lin [a]
PMCID: PMC9117480  NIHMSID: NIHMS1790519  PMID: 35290708

Abstract

Here we report the design of N2-carboxy-4-aryl-1,2,3-triazole-lysines (CATKs) and their site-specific incorporation into proteins via genetic code expansion. When introduced into the protein dimer interface, CATKs permitted spontaneous, proximity-driven, site-selective crosslinking to generate covalent protein dimers in living cells, with phenyl-bearing CATK-1 exhibiting high reactivity toward the proximal Lys and Tyr. Furthermore, when introduced into the N-terminal β-strand of either a single-chain VHH antibody or a supercharged monobody, CATK-1 enabled site-specific, inter-strand, orthogonal crosslinking with a proximal Tyr located on the opposing β-strand. Compared with a non-crosslinked monobody, the orthogonally crosslinked monobody displayed improved cellular uptake and enhanced proteolytic stability against an endosomal enzyme. The robust crosslinking reactivity of CATKs should facilitate the design of novel protein topologies with improved physicochemical properties.

Keywords: Orthogonal crosslinking, Genetic code expansion, Proximity-driven reaction, Electrophilic amino acid, Antibody mimics

Graphical Abstract


A genetically encoded N2-carboxy-4-aryl-1,2,3-triazole-lysine (CATK) permits a spontaneous, quantitative, site-specific inter-strand crosslinking with a proximal Tyr in a small antibody-mimetic protein, generating a novel protein topology accompanied by improved cellular uptake and enhanced proteolytic resistance.


The disulfide bond in protein structures offers a redox-active covalent crosslink for regulating protein stability and function. Until the recent discovery of an endogenous N−O−S crosslink between a discrete pair of lysine and cysteine residues in a transaldolase from Neisseria gonorrhoeae,[1] the disulfide bond had been the only natural crosslink known in protein structures. Although exogenous disulfide bonds have been engineered into proteins to enhance stability,[2] this approach has two major limitations: 1) recombinant expression of cysteine-rich proteins in bacteria frequently leads to misfolding and inclusion-body formation, requiring a lengthy refolding process to recover the native structure; and 2) the disulfide bond is labile in the reducing environment of the mammalian cytosol, rendering it unsuitable for intracellular applications. To circumvent these limitations, we report the design of an exogenous crosslink that is orthogonal to the disulfide bond and forms spontaneously via a proximity-driven acyl transfer reaction inside bacterial cells (Figure 1a).

Figure 1.


Orthogonal protein crosslinking via a proximity-driven acyl transfer reaction. (a) Reaction scheme showing orthogonal crosslinking mediated by a genetically encoded amino acid. LG = leaving group. (b) Structures of noncanonical electrophilic amino acids designed in this study.

Our design involves the site-specific introduction of a genetically encoded electrophilic amino acid into a protein of interest, which then undergoes spontaneous, intramolecular, proximity-driven crosslinking with a nearby nucleophilic residue. Several electrophilic amino acids have been incorporated into proteins site-specifically through genetic code expansion,[3] including p-2’-fluoroacetyl-phenylalanine,[4] the bromoalkyl amino acids BprY[5] and BrC6K,[6] fluorosulfate-modified tyrosine (FSY)[7] and lysine (FSK),[8] and noncanonical amino acids containing perfluorobenzene[9] and vinyl sulfonamide[10] groups; however, they preferentially react with cysteine and therefore lack orthogonality to the disulfide bond. Recently, Schultz and coworkers reported the site-specific incorporation of 4-fluorophenylcarbamate-lysine (FPheK) into thioredoxin.[11] After the protein was incubated in HEPES buffer (pH 8.5) at 37 °C for 8–12 hours, intramolecular crosslinking with a nearby nucleophilic amino acid (Lys, Cys, or Tyr) was observed in good yields. Inspired by this work, we envisioned that acyl transfer-based crosslinking could proceed under neutral conditions if an appropriate genetically encodable leaving group could be identified. To this end, we considered a panel of azoles with pKa values ranging from 19.8 to 8.2,[12] which should display correspondingly varied leaving-group ability (Figure 1a). We were particularly attracted to 1,2,3-triazoles because: 1) 2H-1,2,3-triazole is fairly acidic (pKa 9.4), making it an excellent leaving group in the acyl transfer reaction; and 2) N2-carboxy-1,2,3-triazoles have been used in the literature as stable electrophiles in chemical proteomics studies.[13] We therefore designed a series of N2-carboxy-4-aryl-1,2,3-triazole-containing lysines (CATK-1–7), as well as three analogous triazolyl lysines (CATK-8, -8a, and -9) for comparison (Figure 1b).
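The link between azole acidity and leaving-group ability can be illustrated with a Brønsted-type linear free-energy relationship, in which log(k/k_ref) falls linearly with the pKa of the leaving group's conjugate acid. The minimal Python sketch below is not from this work. Only the three pKa values (19.8, 9.4, and 8.2) come from the text. The coefficient BETA_LG = 0.5 is a hypothetical placeholder chosen to show the trend, not a measured value for CATKs.

```python
# pKa values quoted in the text: the two ends of the azole panel and 2H-1,2,3-triazole.
PKA = {
    "least acidic azole in panel": 19.8,
    "2H-1,2,3-triazole": 9.4,
    "most acidic azole in panel": 8.2,
}

# Hypothetical magnitude of the Brønsted leaving-group coefficient (assumption);
# the real value depends on the acyl donor, nucleophile, and mechanism.
BETA_LG = 0.5


def relative_rate(pka: float, pka_ref: float, beta_lg: float = BETA_LG) -> float:
    """Acyl-transfer rate relative to a reference leaving group.

    Brønsted-type relation: log10(k / k_ref) = -beta_lg * (pKa - pKa_ref),
    so a more acidic conjugate acid (lower pKa) gives a faster reaction.
    """
    return 10 ** (-beta_lg * (pka - pka_ref))


if __name__ == "__main__":
    ref = PKA["least acidic azole in panel"]
    for name, pka in sorted(PKA.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:30s} pKa {pka:5.1f}  k/k_ref ~ {relative_rate(pka, ref):.3g}")
```

With this placeholder coefficient, 2H-1,2,3-triazole (pKa 9.4) would react roughly 10^5-fold faster than the least acidic azole in the panel. The real ratio depends on the true coefficient and on steric and solvation effects that this relation ignores.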
For the synthesis of CATK-1–9, the critical step was the triphosgene-mediated coupling of aryl- or alkyl-substituted triazoles with a protected lysine (Supporting Information (SI), Schemes S1–S3). Although the coupling showed no apparent selectivity for the N2-carbamoylated regioisomer, the N1 and N2 isomers could be readily separated by flash chromatography. After deprotection, CATK-1–8 were obtained in 7–52% yields. Because the N1 isomers showed poor water solubility, we used the N2 isomers in all subsequent studies. Since analogous 1,2,4-triazoles have also been used to design small-molecule probes for serine hydrolases,[14] we synthesized the 1,2,4-triazole-based CATK-9 in four steps with an overall yield of 43% (SI, Scheme S3). Importantly, in NMR-based stability assays, the CATKs exhibited excellent stability toward reduced glutathione (GSH) (SI, Figure S1).
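The overall yield reported for CATK-9 (43% over four steps) can be translated into an average per-step yield. The sketch below assumes, purely for illustration, that all four steps have equal yields, which the text does not state.

```python
from typing import Sequence


def mean_step_yield(overall_yield: float, n_steps: int) -> float:
    """Geometric-mean yield per step, assuming every step has the same yield."""
    if not 0 < overall_yield <= 1 or n_steps < 1:
        raise ValueError("overall_yield must be in (0, 1] and n_steps >= 1")
    return overall_yield ** (1 / n_steps)


def overall_yield(step_yields: Sequence[float]) -> float:
    """Overall yield of a linear sequence is the product of its step yields."""
    result = 1.0
    for y in step_yields:
        result *= y
    return result


if __name__ == "__main__":
    per_step = mean_step_yield(0.43, 4)  # CATK-9: four steps, 43% overall (from the text)
    print(f"average per-step yield ~ {per_step:.1%}")
```

Here mean_step_yield(0.43, 4) returns about 0.81, an average of roughly 81% per step. Multiplying four such steps back together with overall_yield recovers 43%.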

To identify pyrrolysyl-tRNA synthetase (PylRS) variants that can charge CATKs, we co-transformed BL21(DE3) cells with two plasmids: pEVOL-PylRS, encoding PylRS and tRNA_CUA, and pET-sfGFP-Q204TAG, encoding sfGFP bearing an amber codon at position 204. After screening a panel of PylRS variants (Figure 2a; SI, Table S1),[15] we found that a variant carrying the Y306V, L309A, C348F, and Y384F mutations, hereafter referred to as CATKRS, can charge CATK-1, -2, -4, and -7 site-specifically into sfGFP (Figure 2b; SI, Figure S3). Incorporation was further confirmed by SDS-PAGE (SI, Figure S4) and QTOF-LC/MS analyses (Figure 2c; SI, Figure S5). GSH adducts (~30%) were also detected, presumably owing to the high reactivity of CATKs at position 204, a completely solvent-exposed site on sfGFP (SI, Figure S4a). However, no hydrolysis products were observed, indicating that CATKs are stable under bacterial culture conditions.
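A fluorescence-based amber-suppression screen is commonly scored as the fold change in sfGFP signal with versus without the noncanonical amino acid (ncAA). This ratio separates ncAA-dependent incorporation from background read-through. The text does not describe the scoring used here, so the sketch below assumes a simple fold-change ranking with an arbitrary 5-fold cutoff. All fluorescence values, and the labels "variant B" and "variant C", are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScreenWell:
    variant: str      # PylRS variant label
    fl_plus: float    # sfGFP fluorescence with the ncAA (arbitrary units)
    fl_minus: float   # sfGFP fluorescence without the ncAA (background read-through)


def rank_hits(wells, min_fold=5.0):
    """Return (variant, fold change) pairs at or above min_fold, highest first."""
    hits = []
    for w in wells:
        fold = w.fl_plus / max(w.fl_minus, 1e-9)  # avoid division by zero
        if fold >= min_fold:
            hits.append((w.variant, fold))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical readings for illustration only; not data from this study.
wells = [
    ScreenWell("CATKRS (Y306V/L309A/C348F/Y384F)", 5200.0, 310.0),
    ScreenWell("variant B (hypothetical)", 900.0, 280.0),
    ScreenWell("variant C (hypothetical)", 2400.0, 300.0),
]

if __name__ == "__main__":
    for variant, fold in rank_hits(wells):
        print(f"{variant:36s} {fold:6.1f}-fold")
```

With these made-up numbers, CATKRS (about 17-fold) and variant C (8-fold) pass the cutoff, and variant B (about 3-fold) does not.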

Figure 2.


Identification of CATKRS and validation of its activity. (a) Crystal structure of MmPylRS in complex with Pyl-AMP (PDB code: 2ZIM), with five contact residues shown as green tubes and Pyl-AMP as yellow tubes. (b) Fluorescence-based detection of CATK incorporation into sfGFP in BL21(DE3) cells expressing CATKRS. (c) Deconvoluted intact mass spectrum of the sfGFP-204CATK-1 mutant analyzed by QTOF-LC/MS.

To assess CATK crosslinking reactivity, we chose glutathione S-transferase (GST) as a model because GST exists naturally as a homodimer and has been used previously to evaluate the electrophilicity of noncanonical amino acids.[15a] We therefore expressed GST mutants carrying CATK at position 52 and Lys at position 92, anticipating that the flexible alkylamine side chain of Lys-92 would displace the triazole in a proximity-driven acyl transfer reaction to generate a covalent GST dimer (Figure 3a). The GST mutants encoding CATK-1, -2, -4, and -7 at position 52 were obtained in good yields (3.0–7.3 mg L−1). To our satisfaction, prominent dimer bands were detected by SDS-PAGE for all four CATK-encoded GST mutants (Figure 3b), which was corroborated by western blot analysis (SI, Figure S7a). Neither buffer exchange nor prolonged incubation was needed (SI, Figure S7b), suggesting that the crosslinking occurred inside the bacterial cells. Notably, the four cysteines present in each GST monomer (Figure 3a) did not interfere with CATK-1-mediated orthogonal crosslinking. As a control, the GST mutant encoding Nε-(tert-butoxycarbonyl)lysine (BocK) did not produce any covalent dimer, indicating that CATK is responsible for the crosslinking (Figure 3b). Moreover, GST mutants encoding FPheK or FSY at position 52 showed less covalent dimer formation under the same conditions, suggesting that CATK is a superior crosslinking motif (Figure 3b; SI, Figures S9–S11).

Figure 3.


Assessment of the CATK crosslinking reactivity in SjGST dimers. (a) Scheme for covalent crosslinking of the GST-CATK dimer. Crosslinks are marked as red lines between the two monomers. The glutathione S-transferase structure (PDB code: 1Y6E) was rendered using PyMOL. The four free cysteines in one monomer are shown as CPK models. (b) Coomassie blue-stained SDS-PAGE gels of the CATK-, FPheK-, and FSY-encoded GST proteins showing different amounts of covalent GST dimer formation. For the full gel image of the FSY-encoded GST proteins, see SI, Figure S11a.

To identify the residues responsible for crosslinking with CATK, we built a model of GST-E52CATK-1-K92 based on the GST dimer structure. Considering the distance and orientation of the residues surrounding CATK-1, we identified K92 and K141 as plausible reaction partners (Figure 4a). Mutating K92 to either Ala or Glu completely abolished dimer formation on SDS-PAGE (Figure 4b), indicating that K92 is responsible for the proximity-driven crosslinking. Finally, to examine whether amino acids other than Lys can participate in this proximity-driven crosslinking, we mutated Lys-92 to Tyr, Cys, Gln, Met, Asp, Thr, His, and Ser. Among the six GST mutants that expressed successfully, only the Tyr mutant gave a crosslinking yield comparable to that of Lys, whereas the Cys and His mutants afforded modest crosslinking (Figure 4c). Together, lysine and tyrosine appear to be the two most suitable reaction partners for CATK-1 in the nucleophilic acyl transfer reaction, likely owing to their extended side chains and high intrinsic reactivity.
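The distance part of the partner-identification step can be sketched as a simple geometric filter. It lists the nucleophilic side-chain atoms of the opposing monomer that lie within a cutoff of the CATK-1 carbonyl carbon. In this sketch the coordinates, the 8 Å cutoff, and the two "distal" atoms are hypothetical and are not taken from PDB 1Y6E. Only the labels K92 and K141 come from the text. A real analysis would parse the model coordinates and would also weigh side-chain orientation (for example, the approach angle to the carbonyl), which this distance-only filter ignores.

```python
import math

# Hypothetical coordinates in Å, for illustration only; NOT taken from PDB 1Y6E.
CATK_CARBONYL_C = (0.0, 0.0, 0.0)  # electrophilic carbonyl carbon of CATK-1 at position 52

NUCLEOPHILE_ATOMS = {
    "K92 NZ": (3.1, 2.4, 1.0),
    "K141 NZ": (-4.0, 3.5, 2.2),
    "distal Lys NZ (hypothetical)": (9.5, -6.0, 4.0),
    "distal Tyr OH (hypothetical)": (0.5, 12.0, -3.0),
}


def candidates_within(cutoff, center=CATK_CARBONYL_C, atoms=NUCLEOPHILE_ATOMS):
    """List (label, distance) for nucleophilic atoms within `cutoff` Å of `center`, nearest first."""
    found = []
    for label, xyz in atoms.items():
        d = math.dist(center, xyz)  # Euclidean distance (Python 3.8+)
        if d <= cutoff:
            found.append((label, round(d, 2)))
    return sorted(found, key=lambda item: item[1])


if __name__ == "__main__":
    for label, d in candidates_within(8.0):
        print(f"{label:10s} {d:5.2f} Å")
```

With these made-up coordinates, candidates_within(8.0) returns K92 NZ (about 4.05 Å) followed by K141 NZ (about 5.75 Å). Both distal atoms fall outside the cutoff.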

Figure 4.


Assessment of CATK-mediated crosslinking specificity. (a) Close-up view of residues from the opposing GST monomer (gray) surrounding CATK-1. PDB code: 1Y6E. (b) SDS-PAGE analysis of CATK-1-encoded GST mutants lacking certain adjacent nucleophilic residues. (c) Western blot analysis of the crosslinking specificity of GST-E52CATK-1 mutants bearing potential nucleophilic residues at position 92. The covalent GST dimer was probed using an anti-His6 antibody. Crosslinking yields are listed beneath each lane. The higher crosslinking yield of the K92 mutant relative to that in Figure 3b resulted from continued crosslinking during sample storage.

To probe whether CATK-1 is suitable for inter-strand crosslinking in proteins containing a disulfide bond, we selected nanobody NB1, a small prototypical single-chain VHH antibody that binds specifically to GFP.[16] In the NB1 structure, a disulfide bond between Cys-24 and Cys-98 lies close to the proposed orthogonal crosslinking site formed by Val-4 and Tyr-106 (Figure 5a, left). To test orthogonal crosslinking, we placed CATK-1 at position 4 (Val-4) to target Tyr-106, located 5.6 Å away on the opposing strand. The BocK- and CATK-1-encoded NB1 proteins were expressed in yields of 14.3 mg L−1 and 3.5 mg L−1, respectively (Figure 5b, left). Deconvoluted intact mass analysis showed a 42% crosslinking yield (Figure 5c, left; SI, Figure S12). Notably, no GSH adduct, hydrolysis product, or side product from reaction of a cysteine with CATK-1 was detected, indicating that CATK-1-mediated crosslinking is orthogonal to the disulfide bond. Separately, we examined the utility of CATK-1 for intramolecular crosslinking in an antibody mimic called a monobody. Owing to their lack of cysteine residues, small size (~10 kDa), and evolvable binding affinity and specificity, monobodies are an ideal protein scaffold for targeting protein-protein interactions in the cytosol of mammalian cells.[17] However, monobodies are cell-impermeable, which severely limits their potential. One strategy to overcome this limitation is to combine protein surface supercharging[18] with orthogonal crosslinking, which could increase stability in the endosomes and thus improve cytosolic delivery. To this end, we used the Supercharge protocol on the ROSIE Rosetta Online Server[20] to design a variant of monobody NSa1[19] with an overall charge of +10, termed NSa1(+10), and introduced an amber codon at Ala-13. Based on the NSa1 structure,[19] A13CATK-1 is well positioned to react with the proximal Tyr-92 on the opposing C-terminal strand (Figure 5a, right).
Accordingly, wild-type NSa1 and the NSa1(+10) mutants encoding CATK-1 or BocK were expressed and purified in good yields (4.1–6.9 mg L−1; Figure 5b, right). To our delight, mass spectrometry analysis indicated that the inter-strand crosslinking yield between CATK-1 and Tyr-92 was essentially quantitative (Figure 5c, right; SI, Figure S13), substantially higher than the 27.5% yield obtained with the corresponding FSY mutant (SI, Figure S14). The crosslink-containing fragment was identified by LC/MS after trypsin digestion (SI, Figure S15). Furthermore, mutating Tyr-92 to Phe reduced the crosslinking yield to 9.5% (SI, Figure S13d), indicating that Tyr-92 is the primary site of the proximity-driven crosslinking.
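The text reports crosslinking yields from deconvoluted intact mass spectra without stating the calculation. One common approach, assumed here, divides the intensity of the crosslinked species by the summed intensity of all protein species. This treats their ionization efficiencies as equal. The peak intensities below are hypothetical values chosen to reproduce the 42% yield reported for NB1-V4CATK-1.

```python
def crosslink_yield(intensities, crosslinked_key="crosslinked"):
    """Fraction of summed deconvoluted peak intensity assigned to the crosslinked species.

    Assumes all protein species ionize with similar efficiency (an approximation).
    """
    total = sum(intensities.values())
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return intensities.get(crosslinked_key, 0.0) / total


# Hypothetical peak intensities chosen to give a 42% yield; not the actual spectrum.
nb1_peaks = {"crosslinked": 4.2e5, "uncrosslinked": 5.8e5}

if __name__ == "__main__":
    print(f"crosslinking yield ~ {crosslink_yield(nb1_peaks):.0%}")
```

Additional species, such as a GSH adduct or a hydrolysis product, would simply be added as further keys and would enlarge the denominator.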

Figure 5.


Inter-strand crosslinking of nanobody NB1 and monobody NSa1 mediated by CATK-1. (a) Structures of nanobody NB1 (PDB code: 3OGO, left) and wild-type NSa1 (PDB code: 4JE4, right) showing the crosslinking sites. Cys-24 and Cys-98 are rendered as blue CPK models. (b) Coomassie blue-stained SDS-PAGE gels of NB1-V4BocK and NB1-V4CATK-1 (left), and of NSa1, NSa1(+10)-A13BocK, and NSa1(+10)-A13CATK-1 (right). The asterisk indicates an impurity derived from Ni-NTA affinity purification. (c) Deconvoluted mass spectra of NB1-V4CATK-1 (left) and NSa1(+10)-A13CATK-1 (right). Neither the un-crosslinked starting material [M − Met + H]+ (calcd 12990.43 Da) nor the potential GSH adduct (calcd 13152.60 Da) was observed for NSa1(+10)-A13CATK-1.
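Because crosslinking proceeds by acyl transfer, the expected product masses follow from simple bookkeeping. Intramolecular attack by Tyr or Lys releases 4-phenyl-2H-1,2,3-triazole (C8H7N3). Attack by GSH releases the same triazole and adds one glutathione (C10H17N3O6S). The sketch below uses average atomic masses and assumes this mechanism. Starting from the caption's 12990.43 Da value, it reproduces the calculated GSH-adduct mass in the Figure 5 caption to within 0.02 Da. The crosslinked-product mass it prints (about 12845.3 Da) is derived here, not reported in the text.

```python
# Standard (abridged) atomic weights in Da, giving average, not monoisotopic, masses.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}


def avg_mass(formula):
    """Average molecular mass from an element -> count mapping."""
    return sum(ATOMIC_MASS[el] * n for el, n in formula.items())


PHENYLTRIAZOLE = {"C": 8, "H": 7, "N": 3}                 # 4-phenyl-2H-1,2,3-triazole (leaving group)
GLUTATHIONE = {"C": 10, "H": 17, "N": 3, "O": 6, "S": 1}  # reduced GSH


def expected_masses(uncrosslinked):
    """Expected masses of the acyl-transfer products of a CATK-1 protein of mass `uncrosslinked`."""
    lg = avg_mass(PHENYLTRIAZOLE)
    return {
        "uncrosslinked": uncrosslinked,
        "crosslinked": uncrosslinked - lg,                         # Tyr/Lys attack, triazole released
        "GSH adduct": uncrosslinked - lg + avg_mass(GLUTATHIONE),  # GSH displaces the triazole
    }


if __name__ == "__main__":
    for species, mass in expected_masses(12990.43).items():  # calcd value from the Figure 5 caption
        print(f"{species:14s} {mass:10.2f} Da")
```

The same offsets (−145.17 Da for crosslinking, +162.16 Da for GSH capture relative to the uncrosslinked protein) apply to any CATK-1-containing protein. Other CATKs would need the formula of their own triazole substituted.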

To assess cellular uptake of the supercharged NSa1 proteins, we first removed the N-terminal His-tag by TEV cleavage to obtain the two tag-free NSa1(+10) mutants encoding either BocK or CATK-1. We then reacted the mutants with Alexa Fluor 488-NHS overnight to obtain the fluorescently labeled NSa1(+10) proteins (Figure 6a). Mass spectrometry analysis revealed a modest labeling yield of 20–23% (SI, Figure S16). We then carried out a flow cytometry assay to quantify the uptake efficiency of the NSa1(+10) mutants. In brief, HeLa cells were treated with 100 or 500 nM NSa1(+10) proteins at 37 °C for 4 hours. After the cells were washed three times with PBS containing 20 U/mL heparin to remove surface-bound proteins, they were collected and analyzed by flow cytometry (SI, Figure S17). Significant monobody uptake was observed at 500 nM. Although there was no significant difference in the percentage of fluorescent cells (13.6% vs. 14.8%; Figure 6b), cells treated with NSa1(+10)-A13CATK-1 showed a 40% higher mean fluorescence intensity than those treated with NSa1(+10)-A13BocK (Figure 6c; SI, Figure S17), indicating more efficient uptake of the CATK-1-crosslinked monobody. Because kinetically stable protein folds show enhanced proteolytic resistance owing to rigid conformations that limit local unfolding,[21] we assessed the effect of orthogonal crosslinking on the proteolytic stability of the monobody. We incubated the CATK-1-crosslinked NSa1(+10) with cathepsin B, an enzyme responsible for degrading protein cargoes in the endosomes, and monitored monobody stability by mass spectrometry. The CATK-1-crosslinked NSa1(+10) mutant showed a half-life of 126 min, three times longer than that of the non-crosslinked NSa1(+10)-A13BocK (Figure 6d), indicating enhanced kinetic stability afforded by orthogonal crosslinking.
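The percentage of fluorescent cells and the mean fluorescence intensity (MFI) can diverge because they answer different questions. The percentage counts cells above a gate set from untreated cells. MFI also reflects how much labeled protein each cell takes up. The sketch below uses synthetic per-cell values, not the measured data. It assumes a gate at the 99th percentile of the untreated control, a common choice that is hypothetical here.

```python
import math
import statistics


def percentile(values, q):
    """Nearest-rank percentile (q between 0 and 100) of a sequence of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(sample, gate):
    """Return (% of cells above the gate, mean fluorescence intensity of all cells)."""
    positive = sum(1 for v in sample if v > gate)
    return 100 * positive / len(sample), statistics.fmean(sample)


# Synthetic per-cell intensities (arbitrary units); not data from this study.
untreated = [10 + i for i in range(100)]  # autofluorescence spread, 10..109
sample_a = [50] * 86 + [400] * 14         # 14 bright cells, moderate uptake each
sample_b = [50] * 86 + [600] * 14         # same 14 bright cells, more uptake each

gate = percentile(untreated, 99)          # gate derived from the untreated control

if __name__ == "__main__":
    for name, sample in (("A", sample_a), ("B", sample_b)):
        pct, mfi = summarize(sample, gate)
        print(f"sample {name}: {pct:.1f}% positive, MFI {mfi:.1f}")
```

Both synthetic samples give 14% positive cells but MFIs of 99 and 127. A larger per-cell uptake therefore shows up only in the MFI.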

Figure 6.


Assessment of the effect of CATK-1-mediated crosslinking on cellular uptake and endosomal stability. (a) SDS-PAGE analysis of the AF488-labeled NSa1(+10) monobodies encoding either CATK-1 or BocK. The in-gel fluorescence image is shown at the top and the silver-stained image at the bottom; the design of the NSa1 expression construct is shown on the right. (b) Scatter plots of HeLa cells without or with NSa1(+10) treatment. A total of 10,000 events were recorded in each measurement. (c) Mean fluorescence intensity of HeLa cells after treatment with NSa1(+10). Error bars represent standard deviations from three independent measurements. (d) Stability of the supercharged NSa1 mutants against cathepsin B. Total ion counts of the intact proteins were used for quantification. Data at each time point represent mean ± SEM of three independent experiments. The data were fitted to a one-phase decay equation using GraphPad Prism 9.2.
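GraphPad Prism's one-phase decay model is Y = (Y0 − Plateau)·exp(−K·t) + Plateau, with half-life ln(2)/K. The standard-library sketch below is a simplified stand-in, not the authors' analysis. It fixes the plateau at zero so that the model becomes linear in ln(Y) and can be fitted by ordinary least squares. The data are synthetic and noise-free, generated with a 126 min half-life only to show that the fit recovers it.

```python
import math


def fit_half_life(times, signal):
    """Fit ln(signal) = ln(Y0) - k*t by ordinary least squares; return (Y0, k, t_half).

    Matches a one-phase decay only when the plateau is fixed at 0 (simplifying assumption).
    """
    logs = [math.log(y) for y in signal]
    n = len(times)
    t_mean = sum(times) / n
    l_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (l - l_mean) for t, l in zip(times, logs))
    slope = sxy / sxx                        # slope of ln(signal) vs. time equals -k
    k = -slope
    y0 = math.exp(l_mean - slope * t_mean)   # intercept back-transformed to signal units
    return y0, k, math.log(2) / k


# Synthetic, noise-free remaining-signal data (%) with t1/2 = 126 min; not measured ion counts.
times = [0, 15, 30, 60, 120, 240]
signal = [100 * 0.5 ** (t / 126) for t in times]

if __name__ == "__main__":
    y0, k, t_half = fit_half_life(times, signal)
    print(f"Y0 = {y0:.1f}, k = {k:.5f} per min, t1/2 = {t_half:.1f} min")
```

Least squares on ln(Y) weights low-signal points differently from a nonlinear fit on Y, so with real, noisy data the two approaches can give different half-lives. A free plateau requires a nonlinear fit such as the one Prism performs.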

To examine whether CATKs are compatible with genetic code expansion in mammalian cells, we first performed a cell viability assay by treating HEK293T cells with CATK-1 and -2 at various concentrations. No cytotoxicity was detected at concentrations ≤ 500 μM (SI, Figure S18). We then co-transfected HEK293T cells with two plasmids: one encoding CATKRS/tRNA^Pyl and the other encoding the mCherry-TAG-EGFP-HA reporter. The transfected cells were grown in DMEM supplemented with 10% FBS in the absence or presence of CATK-1. Fluorescence microscopy showed green fluorescence when CATK-1 was present (SI, Figure S19b), indicating successful incorporation of CATK-1 into mCherry-TAG-EGFP-HA, which was further confirmed by western blot (SI, Figure S19c).

In summary, we have designed a panel of N2-carboxy-4-aryl-1,2,3-triazole-lysines (CATKs) that can be incorporated into proteins site-specifically via genetic code expansion in E. coli and mammalian cells. When introduced into the GST dimer interface, CATK-1, -2, -4, and -7 permitted spontaneous, proximity-driven, site-selective crosslinking of the GST dimer in E. coli. Owing to its enhanced leaving-group ability, phenyl-bearing CATK-1 exhibited higher crosslinking reactivity toward the proximal Lys and Tyr at neutral pH than FPheK and FSY, two recently reported genetically encoded noncanonical amino acids. When introduced into the N-terminal β-strand of either a single-chain VHH antibody or a supercharged monobody, CATK-1 enabled efficient site-specific, inter-strand, orthogonal crosslinking with a proximal Tyr located on the opposing β-strand. Compared with a non-crosslinked monobody, the orthogonally crosslinked monobody displayed improved cellular uptake and enhanced proteolytic resistance against an endosomal enzyme. These triazole-based, genetically encodable crosslinkers should facilitate the design of novel protein topologies containing orthogonal crosslinks akin to disulfide bonds, potentially opening new applications for protein-based materials.

Supplementary Material

supinfo

Acknowledgments

We gratefully acknowledge the National Institutes of Health (R35 GM130307) and National Science Foundation (CHE-1904558) for financial support.

Footnotes

The authors declare a potential conflict of interest. The Research Foundation for The State University of New York has filed a provisional patent based on this work.

References
