Abstract
The tobacco etch virus (TEV) protease is a commonly used reagent for removal of solubility and purification tags from recombinant proteins and is cited as being highly specific for its canonical cleavage site. Flexibility in some amino acids within this recognition sequence has been described in the literature but researchers generally assume few native human proteins will carry off-target sequences for TEV cleavage. We report here the aberrant cleavage of three human proteins with non-canonical TEV protease cleavage sites and identify broader sequence specificity rules that can be used to predict unwanted cleavage of recombinant proteins. Using these rules, 456 human proteins were identified that could be substrates for unwanted TEV protease cleavage.
Keywords: Tobacco etch virus protease, TEV, solubility tags, recombinant protein, protein purification
Introduction
Our laboratory produces hundreds of proteins per year in support of early-stage drug discovery and structural biology efforts. Although the types of proteins being produced vary widely, we have generally found that a standard process, in which proteins of interest are fused to purification and solubility tags prior to expression in bacterial, insect, or mammalian systems, provides the highest likelihood of success. In this scheme, large solubility enhancers such as maltose-binding protein (MBP) need to be removed from the target protein during purification, requiring the use of exogenous proteases. The tobacco etch virus (TEV) protease has become the standard reagent for this purpose due to its high degree of target site specificity (reducing aberrant cleavage in proteins of interest), ease of production, and high activity in many buffer conditions [1, 2].
TEV protease was initially characterized by its specific recognition and cleavage of the canonical consensus sequence ENLYFQ/[G/S], but later work suggested that there was more flexibility in the consensus sequence than originally reported. Work from the Waugh laboratory showed that TEV protease cleaved a variety of substrates with nearly any amino acid except proline at the P1’ position, although the canonical glycine and serine were preferred [3]. Further work utilizing whole-cell fluorescence-based library assays also revealed a reduced specificity at the P6 position, where glutamic acid was originally suggested to be strongly favored [4]. However, many of these approaches utilized either peptide substrates, which do not necessarily mimic the more structured protein substrates commonly used in protein production, or library approaches, which are not able to cover the entire space of the 6-7 amino acid substrate recognition site. As a result, the findings of these experiments continued to suggest highly stringent specificity requirements in several positions of the substrate recognition sequence, including the absolute requirement for glutamine at the P1 position. Structural results, again utilizing peptide substrates, also proposed rationales for the strict requirement of the P1 glutamine and P6 glutamic acid based on observed contacts in the crystal structures [5].
Because of these proposed stringent sequence requirements, TEV protease has become the standard protease used to remove tags from recombinant fusion proteins. While other proteases such as thrombin and enterokinase were originally used for this purpose, these proteases had very low sequence specificity and often cleaved the recombinant protein of interest [6]. The argument that TEV protease would ameliorate these concerns was based on the limited number of human protein sequences with the canonical TEV protease site present. However, in the span of several years, our laboratory has encountered a variety of cases where TEV protease cleaved proteins at non-canonical sites. By examining the impact of these non-canonical sites on recombinant protein production of a variety of substrates, we demonstrate here that the specificity requirements for TEV protease are less stringent than anticipated by previous studies. We also propose search methods that improve the chance of identifying aberrant cleavage before proteins are purified, in order to save time and improve the chances of successful production.
Experimental Methods
Cloning of recombinant proteins for protein purification
Genes for production of recombinant proteins were synthesized by ATUM, Inc. as Gateway Entry clones. Human CP110(91-638) and human MAPKAP1 (2-510) were produced with an upstream canonical TEV protease site (ENLYFQG) and optimized for expression in insect cells (CP110) or E. coli (MAPKAP1). Human CEP97 (full length) contained native human DNA sequence without a TEV protease site. All genes were subcloned into Gateway Destination vectors to generate expression clones. MAPKAP1 was subcloned into pDest-566 (Addgene #11517) for bacterial expression, while CP110 was subcloned into pDest-636 (Addgene #159574) and CEP97 into pDest-633 (Addgene #161877) for insect expression. Standard protocols were used for generation of bacmid DNA for CP110 and CEP97 [7]. For test constructs, PCR was used to add the TEV protease cleavage sites to the N-terminus of the genes of interest, eGFP, KRAS4b(1-169), or CAT. Using Gateway cloning, PCR products were recombined into Entry clones, and subsequently Expression clones in pDest-566 for bacterial expression.
Production of TEV protease
TEV protease expression clones contained L56V/S135G/S219V mutations and were placed downstream of a maltose-binding protein fusion with an ENLYFQS cleavage site and His6 tag in a standard pET-style T7 expression vector. TEV protease was produced in E. coli BL21*[pRare] by in vivo cleavage, with expression as described [8]. Cells were lysed using microfluidization in 20 mM HEPES, pH 7.4, 300 mM NaCl, 1 mM TCEP, and the protein was purified on an XK 50/20 column packed with ~250 ml of Ni Sepharose™ High Performance resin (Cytiva) followed by size exclusion on a HiLoad 16/600 Superdex 75 column (Cytiva). Final samples were concentrated using an Amicon stirred cell to a final concentration of 5 mg/ml.
Protein expression and purification
Bacterial expression constructs (MAPKAP1, eGFP, CAT, KRAS4b) were transformed into E. coli BL21*[pRare] cells and plated on LB agar with 100 μg/ml ampicillin and 15 μg/ml chloramphenicol at 37°C. Seed cultures were started from single colonies in 2 ml of MDAG-135 medium and grown overnight at 37°C. One milliliter of seed culture was used to inoculate 50 ml of Dynamite medium [8] in a 250-ml baffled shake flask. Cells were grown at 37°C to an OD600 of 6-8, before adding 0.5 mM IPTG to induce protein expression at 16°C for 18 hours. Cells were collected by centrifugation at 4,000 x g for 10 minutes and frozen at −80°C. Baculovirus expression constructs (CEP97, CP110) were transfected into Sf9 cells to generate baculovirus, and virus was used to infect Tni-FNL cells for protein expression using standard processes [9].
Bacterial cell pellets were thawed and resuspended in lysis buffer (20 mM HEPES, pH 7.4, 300 mM NaCl, 1 mM TCEP, 1:100 v:v Sigma Protease Inhibitor Cocktail #P8849) using a volume of 1 ml/100 OD600 units. Insect cell pellets were resuspended in a volume of one-tenth of the culture volume. Cells were lysed in a microfluidizer with 2 passes at 10,000 psi (bacterial) or 7,000 psi (insect). Lysates were clarified by ultracentrifugation at 100,000 x g for 30 minutes at 4°C, followed by filtration of the soluble fraction with a 0.45 μm PES filter. Proteins were purified by IMAC using 160-μl Nickel Sepharose tips (Biotage) and an MEA2 system (Phynexus). Eluted proteins were quantitated and adjusted to a concentration of 0.5 mg/ml with 20 mM HEPES, pH 7.4, 300 mM NaCl, 1 mM TCEP. To test protease cleavage efficiency, 20 μl of each protein stock was digested with 1 μl of diluted TEV protease (molar ratio of substrate to TEV protease of approximately 10:1) for 3 hours at room temperature. For each sample, 1 μg of undigested protein and 1 μg of digested protein were analyzed side by side using SDS-PAGE.
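The amount of protease needed for an approximately 10:1 substrate-to-protease molar ratio can be estimated with simple arithmetic. The sketch below is illustrative only: the function name is hypothetical, and the molecular weights used in the example (about 70 kDa for a His6-MBP fusion substrate and about 28 kDa for TEV protease) are assumed round numbers, not values reported here.

```python
def tev_mass_for_ratio(substrate_ug, substrate_kda, tev_kda, molar_ratio):
    """Return the mass of TEV protease (in micrograms) that gives a
    substrate:protease molar ratio of `molar_ratio`:1.

    Micrograms divided by kDa gives nanomoles, because 1 kDa = 1 ug/nmol.
    """
    substrate_nmol = substrate_ug / substrate_kda
    tev_nmol = substrate_nmol / molar_ratio
    return tev_nmol * tev_kda


# 20 ul of a 0.5 mg/ml stock contains 20 * 0.5 = 10 ug of substrate.
substrate_ug = 20 * 0.5

# Assumed molecular weights (hypothetical round numbers for illustration).
tev_ug = tev_mass_for_ratio(substrate_ug, substrate_kda=70.0,
                            tev_kda=28.0, molar_ratio=10.0)
print(f"TEV protease needed: {tev_ug:.2f} ug")
```

Because the fusion proteins differ in size, the protease mass needed for a fixed molar ratio changes with each substrate; larger substrates need proportionally less protease per microgram.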
Protein samples were analyzed by electrospray ionization mass spectrometry (ESI-MS). Proteins were diluted to 0.1 mg/ml in a total volume of 50 μl using 5% acetonitrile/0.2% formic acid. Reverse-phase separation was performed on a Vanquish UHPLC (Thermo Fisher Scientific) using a MabPac HPLC Column (Thermo Fisher Scientific) maintained at 50°C. High-resolution intact protein mass (MS1) spectra were acquired in an Exactive Plus EMR Orbitrap MS (Thermo Fisher Scientific) and spectra were analyzed by MagTran (Zhongqi Zhang, Amgen Inc).
Results and Discussion
Unexpected cleavage of recombinant human proteins by TEV protease during purification
Based on the stringent TEV protease specificity requirements, we had not generally been concerned with aberrant proteolytic activity. However, we observed several target proteins which had unusual results during purification suggesting promiscuous TEV cleavages. The initial observation arose during production of human MAPKAP1 (Figure 1A), in which protein was generated in the form of His6-MBP-tev-MAPKAP1. From this 110 kDa fusion protein, the expected final cleaved protein size was 66 kDa. TEV protease cleavage of this protein led to the appearance of a 55 kDa protein band and ESI-MS confirmed that the product observed was MAPKAP1 missing its initial 68 amino acids. Cleavage at the normal TEV site also occurred based on analysis of the size of the cleaved His6-MBP fusion tag—likely the intervening 68 amino acid peptide was not observed on the SDS-PAGE gel due to migration rate and a lack of stainable lysine residues. The sequence at the aberrant cleavage junction was GYVYAQ/S which matched the canonical TEV protease cleavage site at the P3, P1, and P1′ positions. In addition, the glycine at P6 was consistent with previous reported results [4].
Figure 1. Unanticipated TEV protease cleavage of human proteins during purification.

The three panels represent SDS-PAGE gels of IMAC-purified fusion proteins of His6-MBP with MAPKAP1 (panel A) and CP110 (panel B), or His6-GST with CEP97 (panel C). MAPKAP1 and CP110 constructs had canonical TEV protease cleavage sites (ENLYFQ/G) between the tag and protein of interest, while CEP97 was a direct fusion of the tag and protein of interest without any TEV protease site. IMAC-purified pools (−) were treated with TEV protease (+) to cleave off the solubility tags where appropriate. Arrows indicate protein species of interest discussed in the text. Note that both MAPKAP1 and CP110 migrate anomalously on SDS-PAGE, likely due to disordered regions and non-sphericity. The schematic below each panel shows the arrangement of the tag and protein and denotes the desired TEV protease cleavage sites with a black arrow and the unanticipated TEV protease cleavage sites with a red arrow. Molecular weight standards (S) are shown in kilodaltons.
The amino-terminal domain of human CP110 (Figure 1B) also produced unusual results during purification of His6-MBP-tev-CP110(91-638) fusion protein. Two bands were observed after TEV protease digestion, and ESI-MS confirmed that the slower migrating band was the correct CP110(91-638) sequence, while the lower band (representing 75% of the total protein) was CP110(112-638). This aberrantly cleaved product was the result of cleavage at the sequence ETVYSN/S, which matched the canonical TEV protease cleavage site at P6, P3, and P1′ but surprisingly did not contain the reportedly invariant glutamine residue at P1 [5], but rather had an asparagine at that location.
Finally, a third aberrantly cleaved protein was observed during attempts to produce human CEP97 (Figure 1C). In this case, the protein generated was a direct fusion of His6-GST to CEP97 without any protease site for tag removal. However, during purification, an accidental addition of TEV protease showed efficient cleavage of the 160 kDa fusion protein into a fragment containing the GST tag and N-terminus of CEP97, and another fragment containing the C-terminal region of CEP97. This cleavage point was mapped to the location of a putative TEV protease cleavage site, EWLYSQ/G which matches the canonical sequence except for the P5 and P2 positions, making it a very strong candidate for efficient TEV protease cleavage.
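The position-by-position comparisons described for the three aberrant sites can be reproduced with a short script. This is a minimal sketch for illustration, not part of the analysis performed in this work; the function and variable names are hypothetical, and the canonical site is taken as ENLYFQ/[G/S] as stated in the Introduction.

```python
# Position labels for a 7-residue TEV site written as P6..P1 / P1'
POSITIONS = ["P6", "P5", "P4", "P3", "P2", "P1", "P1'"]

# Residues accepted at each position of the canonical site ENLYFQ/[G/S]
CANONICAL = ["E", "N", "L", "Y", "F", "Q", "GS"]


def matching_positions(site):
    """Return the positions at which `site` agrees with the canonical site.

    `site` is written as in the text, e.g. "GYVYAQ/S"; the slash marks the
    scissile bond and is ignored for the comparison.
    """
    residues = site.replace("/", "")
    if len(residues) != len(POSITIONS):
        raise ValueError(f"expected 7 residues, got {len(residues)}")
    return [pos for pos, res, allowed in zip(POSITIONS, residues, CANONICAL)
            if res in allowed]


for name, site in [("MAPKAP1", "GYVYAQ/S"),
                   ("CP110", "ETVYSN/S"),
                   ("CEP97", "EWLYSQ/G")]:
    print(name, site, matching_positions(site))
```

For CEP97 the only positions missing from the returned list are P5 and P2, in line with the description above.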
TEV cleavage at non-canonical sites is partially dependent on surrounding protein sequences
To investigate whether the aberrant TEV protease cleavage events were specific to the context of the proteins, we decided to engineer these “alternative” TEV cleavage sites from the three proteins in Figure 1 as cleavage sites for a series of standard test proteins used in our lab. The data in Figure 2 consist of three proteins, green fluorescent protein (eGFP, panel A), the small GTPase KRAS4b (panel B), and chloramphenicol acetyltransferase (CAT, panel C). All proteins were His6-MBP fusions separated with five different TEV protease cleavage sequences, including the canonical ENLYFQ/G, a modified canonical sequence replacing the Q at the P1 position with N, or one of the three putative cleavage sequences from the proteins in Figure 1. Proteins were purified and subjected to cleavage under identical conditions with TEV protease prior to analysis on SDS-PAGE.
Figure 2. Function of alternative TEV protease cleavage sites with representative recombinant proteins.

The three panels represent SDS-PAGE gels of three representative test proteins: monomeric enhanced green fluorescent protein (eGFP, panel A), human KRAS4b (panel B), and chloramphenicol acetyltransferase (CAT, panel C). Each protein is preceded by a His6-MBP fusion tag separated by a canonical TEV site (TQ, ENLYFQ/G) or a variant based on sequences from this paper including CP110 (CP, ETVYSN/S), MAPKAP1 (MK, GYVYAQ/S), CEP97 (CE, EWLYSQ/G), or a canonical site with Q to N mutation (TN, ENLYFN/G). Pairs of lanes represent 1 microgram of purified protein digested with TEV protease (+) or in the absence of TEV protease (−). Arrows indicate protein species of interest as a reference: His6-MBP tag (M, green arrows), TEV protease (T, blue arrows), cleaved protein of interest (P, red arrows). Molecular weight standards (S) are shown in kilodaltons.
The results in Figure 2 demonstrate that the primary amino acid sequence is not the only determinant of TEV protease cleavage site specificity. In all three fusions, the CP110 cleavage site, which produced significant levels of cleavage in the production of CP110 in Figure 1B, showed little cleavage when utilized between MBP and the test proteins (CP). This suggests that the region of CP110 where that sequence lies may have an unusual conformation which favors TEV protease cleavage. This may reflect the high level of intrinsic disorder predicted in the N-terminal region of the CP110 protein, which could permit easier access to the cleavage site than in the test proteins. Similarly, while the MAPKAP1 site (MK) does show appreciable cleavage in the eGFP and CAT fusions, the level of cleavage is quite low compared to the nearly 100% seen with MAPKAP1 itself in Figure 1A. MAPKAP1 also contains a predicted disordered region around the area of the aberrant cleavage, consistent with the results with CP110. In contrast to CP110 and MAPKAP1, the CEP97 site (CE) functions nearly as well as the canonical site, matching the high level of cleavage seen in Figure 1C. Perhaps most notably, the modified ENLYFN/G site (TN), which alters the supposedly invariant Q at P1, produces nearly as efficient cleavage as CEP97 and the canonical site, again highlighting that P1 specificity is not restricted to glutamine.
Prediction methods can identify potentially broader TEV specificity
To investigate potential TEV cleavage across the human proteome we utilized the MOTIF algorithm developed in the Kanehisa lab (https://www.genome.jp/tools/motif/MOTIF2.html) and available as part of the Kyoto Encyclopedia of Genes and Genomes (KEGG) database [10]. Searches against the KEGG database with the canonical sequence, or the extended version where G/S/A/C are allowed at the P1’ position, identify only one human protein with a TEV protease cleavage site (CTNNAL1, alpha catenin-like 1). The MOTIF search can be expanded to use information found in the literature regarding the broader specificity of TEV protease, for instance data showing that any amino acid can be placed at the P5 position and that either hydrophobic residues or any amino acid can be found at the P2 position [4, 5]. Our data also confirm previous work demonstrating that the P6 site specificity can be relaxed to include other amino acids [4]. Varying the search options at P6 and P2 (see Table 1) produces results containing more than 200 human genes with protein products potentially cleavable by TEV protease.
Table 1: Optimal pattern search definitions can identify potential TEV protease cleavage sites in human protein sequences.
The total number of gene protein products (# prot) in the KEGG database containing various putative TEV protease cleavage sites identified by MOTIF. The amino acids at the P6, P2, and P1 sites are highlighted, along with whether the three test proteins (CP110, CEP97, MAPKAP1) discussed in this work would be detected using these search patterns.
| P6 | P2 | P1 | MOTIF Pattern Search | # prot | CP110 | CEP97 | MAPKAP1 |
|---|---|---|---|---|---|---|---|
| E | F | Q | E-N-L-Y-F-Q-G | 0 | No | No | No |
| E | F | Q | E-N-L-Y-F-Q-[GSAC] | 1 | No | No | No |
| E | FWLVAI | Q | E-x-[LVAIF]-Y-[FWLVAI]-Q-[GSAC] | 16 | No | No | No |
| E | any | Q | E-x-[LVAIF]-Y-x-Q-[GSAC] | 51 | No | Yes | No |
| E | any | Q or N | E-x-[LVAIF]-Y-x-[QN]-[GSAC] | 128 | Yes | Yes | No |
| EDGMQ | FWLVAI | Q | [EDGMQ]-x-[LVAIF]-Y-[FWLVAI]-Q-[GSAC] | 67 | No | No | Yes |
| EDGMQ | FWLVAI | Q or N | [EDGMQ]-x-[LVAIF]-Y-[FWLVAI]-[QN]-[GSAC] | 136 | No | No | Yes |
| EDGMQ | any | Q | [EDGMQ]-x-[LVAIF]-Y-x-Q-[GSAC] | 232 | No | Yes | Yes |
| EDGMQ | any | Q or N | [EDGMQ]-x-[LVAIF]-Y-x-[QN]-[GSAC] | 456 | Yes | Yes | Yes |
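The detection columns of Table 1 can be checked locally by translating each pattern into a regular expression and testing it against the three cleavage sites reported above. The sketch below is an assumption-laden illustration rather than the MOTIF implementation: `prosite_to_regex` and `detected` are hypothetical helper names, and the converter handles only the pattern elements that appear in Table 1 (single residues, `x`, and bracketed residue sets).

```python
import re


def prosite_to_regex(pattern):
    """Convert a simple pattern such as 'E-x-[LVAIF]-Y-x-[QN]-[GSAC]'
    into a regular expression string."""
    parts = []
    for element in pattern.split("-"):
        if element == "x":                       # any residue
            parts.append(".")
        elif element.startswith("[") and element.endswith("]"):
            parts.append(element)                # residue set, same syntax
        elif len(element) == 1 and element.isalpha():
            parts.append(element)                # single fixed residue
        else:
            raise ValueError(f"unsupported pattern element: {element!r}")
    return "".join(parts)


def detected(pattern, site):
    """True if the 7-residue `site` matches `pattern` over its full length."""
    return re.fullmatch(prosite_to_regex(pattern), site) is not None


# Cleavage sites from the text, written P6..P1' without the slash
SITES = {"CP110": "ETVYSNS", "CEP97": "EWLYSQG", "MAPKAP1": "GYVYAQS"}

# Patterns in the order of the rows of Table 1
TABLE_PATTERNS = [
    "E-N-L-Y-F-Q-G",
    "E-N-L-Y-F-Q-[GSAC]",
    "E-x-[LVAIF]-Y-[FWLVAI]-Q-[GSAC]",
    "E-x-[LVAIF]-Y-x-Q-[GSAC]",
    "E-x-[LVAIF]-Y-x-[QN]-[GSAC]",
    "[EDGMQ]-x-[LVAIF]-Y-[FWLVAI]-Q-[GSAC]",
    "[EDGMQ]-x-[LVAIF]-Y-[FWLVAI]-[QN]-[GSAC]",
    "[EDGMQ]-x-[LVAIF]-Y-x-Q-[GSAC]",
    "[EDGMQ]-x-[LVAIF]-Y-x-[QN]-[GSAC]",
]

for pattern in TABLE_PATTERNS:
    row = {name: detected(pattern, site) for name, site in SITES.items()}
    print(pattern, row)
```

The protein counts in Table 1 come from searching the KEGG database and are not reproduced by this script; it only confirms which patterns match the three sites.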
Perhaps the most important result obtained here is the reduced specificity at the P1 position. Previous reports, including the thorough library search by Samuelson [4], and structural analysis using X-ray crystallography [5] suggested that the glutamine at P1 was invariant. However, Samuelson’s study only looked at alteration of P1 in the initial libraries, where it is possible that the higher activity of Q at P1, and the limited number of clones chosen for analysis, may have overwhelmed any signal from peptides containing asparagine. The structural analysis of TEV protease bound to a peptide substrate also suggested that the glutamine at P1 would be essential due to necessary hydrogen bonds from both the amide nitrogen and epsilon oxygen. However, asparagine also contains these potential hydrogen bond donors, differing only in the reduction of one carbon length in the side chain. It is reasonable to assume that flexibility within the TEV protease active site would permit hydrogen bonding of asparagine in a similar manner, although with potentially reduced affinity or catalytic activity. If the P1 specificity is extended to permit either N or Q, the number of potential human protein products which are cleavable by TEV protease increases to 456 (Table 1). Notably, only in this broadest specificity search would all three of the proteins we identified in Figure 1 have been detected. The complete list of 456 human genes along with the predicted cleavage sites of their predominant protein products can be found in Supplemental Table 1.
Conclusions
In summary, these data argue for the use of MOTIF searching of any protein which will be used with TEV protease for recombinant protein production. While it is unlikely that all 456 predicted proteins in Table 1 will cleave with significant efficiency using TEV protease, there is the potential for at least limited cleavage activity which could lead to issues in downstream protein purification. Awareness of this possibility will potentially save time and effort or permit smaller scale testing to identify issues before large scale failures. If a protein is deemed risky due to a potential TEV cleavage site, an alternative protease like HRV-3C could be chosen to avoid concerns. More work needs to be done to investigate whether efficiency of cleavage is predictable based on sequence and to identify the structural basis for alternative cleavage across different substrates.
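For a single construct, a quick local pre-screen with the broadest pattern from Table 1 can complement a MOTIF search. The following is a minimal sketch under stated assumptions: the function name and the example sequence are hypothetical, the regular expression is a direct transcription of [EDGMQ]-x-[LVAIF]-Y-x-[QN]-[GSAC], and a match indicates only a candidate site, not that cleavage will occur.

```python
import re

# Broadest pattern from Table 1 written as a regular expression. The
# lookahead lets overlapping candidate sites be reported as well.
BROAD_SITE = re.compile(r"(?=([EDGMQ].[LVAIF]Y.[QN][GSAC]))")


def find_candidate_sites(sequence, first_residue=1):
    """Return a list of (site, p1_number) tuples for `sequence`.

    `site` is written with a slash at the predicted scissile bond and
    `p1_number` is the residue number of P1 (cleavage would follow it).
    `first_residue` is the residue number of the first character, which is
    useful for truncated constructs.
    """
    hits = []
    for match in BROAD_SITE.finditer(sequence.upper()):
        site = match.group(1)
        p1_number = match.start() + 5 + first_residue  # P1 is the 6th residue
        hits.append((site[:6] + "/" + site[6], p1_number))
    return hits


# Hypothetical toy sequence containing two of the sites discussed in the text
toy = "MKTAGSGYVYAQSVDITAAETVYSNSLLK"
for site, p1 in find_candidate_sites(toy):
    print(f"candidate site {site}, cleavage after residue {p1}")
```

Any hit would then be a reason to run a small-scale digestion test, or to consider an alternative protease, before committing to large-scale production.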
Supplementary Material
Highlights
- Tobacco etch virus (TEV) protease has broader specificity than previously identified
- MOTIF searching can predict potential aberrant TEV protease cleavage sites in proteins
- TEV protease specificity involves both sequence- and context-dependent features
Acknowledgments
We thank Matthew Drew, Vijaya Gowda, Carissa Grose, Simon Messing, Nitya Ramakrishnan, Troy Taylor, Jane Jones, and Bill Gillette for technical assistance with cloning, expression, and purification as well as review and editing of the manuscript. This work has been funded in whole or in part with federal funds from the National Cancer Institute, National Institutes of Health, under contract 75N91019D00024. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government.
Abbreviations
- ESI-MS: electrospray ionization mass spectrometry
- eGFP: enhanced green fluorescent protein
- GST: glutathione-S-transferase
- IMAC: immobilized metal affinity chromatography
- MBP: maltose-binding protein
- PCR: polymerase chain reaction
- SDS-PAGE: sodium dodecyl sulfate-polyacrylamide gel electrophoresis
- TEV: tobacco etch virus
Conflicts of Interest
The authors declare they have no conflicts of interest.
References
- [1] Kapust RB, Tozser J, Fox JD, Anderson DE, Cherry S, Copeland TD, Waugh DS, Tobacco etch virus protease: mechanism of autolysis and rational design of stable mutants with wild-type catalytic proficiency. Protein Eng 14 (2001) 993–1000.
- [2] Kapust RB, Waugh DS, Controlled intracellular processing of fusion proteins by TEV protease. Protein Expr Purif 19 (2000) 312–318.
- [3] Kapust RB, Tozser J, Copeland TD, Waugh DS, The P1’ specificity of tobacco etch virus protease. Biochem Biophys Res Commun 294 (2002) 949–955.
- [4] Kostallas G, Lofdahl PA, Samuelson P, Substrate profiling of tobacco etch virus protease using a novel fluorescence-assisted whole-cell assay. PLoS One 6 (2011) e16136.
- [5] Phan J, Zdanov A, Evdokimov AG, Tropea JE, Peters HK 3rd, Kapust RB, Li M, Wlodawer A, Waugh DS, Structural basis for the substrate specificity of tobacco etch virus protease. J Biol Chem 277 (2002) 50564–50572.
- [6] Waugh DS, An overview of enzymatic reagents for the removal of affinity tags. Protein Expr Purif 80 (2011) 283–293.
- [7] Gillette WK, Esposito D, Taylor TE, Hopkins RF, Bagni RK, Hartley JL, Purify First: rapid expression and purification of proteins from XMRV. Protein Expr Purif 76 (2011) 238–247.
- [8] Taylor T, Denson JP, Esposito D, Optimizing Expression and Solubility of Proteins in E. coli Using Modified Media and Induction Parameters. Methods Mol Biol 1586 (2017) 65–82.
- [9] Snead K, Wall V, Ambrose H, Esposito D, Drew M, Polycistronic baculovirus expression of SUGT1 enables high-yield production of recombinant leucine-rich repeat proteins and protein complexes. Protein Expr Purif 193 (2022) 106061.
- [10] Kanehisa M, Goto S, KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res 28 (2000) 27–30.