Abstract
Background
The Escherichia coli expression system is widely used for recombinant protein production, but its utility is often limited by the formation of insoluble inclusion bodies. Although fusion tags can enhance solubility, their effectiveness varies unpredictably across different target proteins, and the optimal tag must typically be determined empirically.
Results
We developed a standardized pX fusion tag vector system for high-throughput screening of soluble protein expression in E. coli. The system consists of eight medium-sized, TEV-cleavable fusion tags (ArsC, Crr, DsbA, Ecotin, MsyB, SlyD, Snut, and YjgD) cloned into a standardized pET-28b(+) backbone. We systematically evaluated the impact of these tags on the solubility and function of three model proteins (eGFP, EcFabG, and Mals) and six proteins known to be difficult to express in E. coli (PulA, NodE, FabF1XL, FabF2XL, FabZXL, and FabGXL). The efficacy of each tag was highly protein-dependent. Notably, the MsyB and Snut tags increased the soluble proportion of eGFP from 15% to over 85%, while the SlyD tag significantly enhanced both the solubility and the activity of Mals. For several difficult-to-express proteins, soluble expression was achieved only with specific tags, highlighting the critical importance of tag selection.
Conclusions
Our study presents a versatile and efficient parallel cloning and screening system for the rapid production of soluble recombinant proteins. By enabling parallel screening of multiple fusion partners, this system facilitates the identification of optimal conditions for enhancing protein solubility and function, thereby addressing a key bottleneck in recombinant protein applications.
Supplementary Information
The online version contains supplementary material available at 10.1186/s12896-026-01109-1.
Keywords: Recombinant protein expression, Solubility enhancement, Fusion tag screening, Escherichia coli, TEV protease, Inclusion bodies, pX vector series
Introduction
The Escherichia coli expression system is a cornerstone in biotechnology for producing recombinant proteins, prized for its well-characterized genetics, rapid growth, high yield, and cost-effectiveness [1]. Despite these advantages, the heterologous expression of many proteins in E. coli is often hampered by several challenges, including codon bias, proteolytic degradation, and—most notably—the aggregation of target proteins into insoluble inclusion bodies [2]. The formation of these inactive aggregates represents a major bottleneck, influenced by factors ranging from cultivation conditions (e.g., temperature, pH, ionic strength) to intrinsic properties of the target protein itself, such as a high proportion of hydrophobic residues, the presence of intrinsically disordered regions, complex multidomain architectures, or the requirement for specific eukaryotic chaperones or post-translational modifications absent in the prokaryotic host [2, 3].
Various strategies have been employed to improve the solubility and yield of biologically active proteins, including optimization of expression conditions, engineering of host strains, co-expression of molecular chaperones, and the use of fusion tags [3, 4]. Among these, fusion tags, typically derived from highly soluble proteins, have become a mainstream solution. Fused to the N- or C-terminus of the target protein, these tags can promote solubility, enhance stability, and facilitate purification. N-terminal fusions are more common because they can enhance translation initiation efficiency, protect the nascent polypeptide from N-terminal degradation, and are compatible with many affinity tag designs [5, 6]. Several such tags from E. coli and other organisms have been widely adopted, including maltose-binding protein (MBP), thioredoxin (TrxA), N-utilization substance A (NusA), and small ubiquitin-like modifier (SUMO) [7–10]. Although these established tags are invaluable, their effectiveness is not universal, and their relatively large size (e.g., ~40 kDa for MBP) can interfere with downstream structural or functional studies. A systematic, comparative analysis of alternative, less-characterized solubility tags within a unified cloning framework remains underexplored. This study aims to fill this gap by constructing and evaluating a parallelized suite of eight medium-sized tags, providing a direct empirical comparison of their performance across diverse protein targets.
Beyond these well-established tags, a diverse array of other partners with solubilizing potential has been explored. For instance, arsenate reductase (ArsC) exhibits high cytoplasmic solubility and folding capacity and serves as an effective fusion partner [11]. The glucose-specific phosphotransferase IIA component (Crr) and the spermidine/putrescine-binding periplasmic protein (PotD) have also been shown to significantly increase the solubility of heterologous proteins [12]. Disulfide bond formation protein A (DsbA) is particularly useful for promoting the soluble expression of proteins that require disulfide bonds [13], while the periplasmic trypsin inhibitor Ecotin has facilitated the expression of challenging proteins such as human pepsinogen [14]. Furthermore, highly acidic polypeptides such as MsyB and YjgD are thought to improve solubility by increasing the net negative charge and hydrophilicity of the fusion protein [15, 16]. The anti-aggregation protein SlyD has proven effective in solubilizing aggregation-prone proteins in the cytoplasm [17], and the Solubility ‘eNhancing’ Ubiquitous Tag (SNUT), derived from a portion of the Staphylococcus aureus transpeptidase sortase, confers favorable solubility on target proteins [3].
However, no single fusion tag is universally effective, as tag performance depends strongly on the target protein [15, 18]. Moreover, the tag itself can interfere with the structure, function, or immunogenicity of the protein of interest, so removal of the fusion tag is often essential for therapeutic and functional applications [18]. Tag removal is typically achieved by incorporating a specific protease cleavage site between the tag and the target protein. Proteases such as Factor Xa, thrombin, and enterokinase have been used, but they often suffer from low efficiency, high cost, or non-specific cleavage [2, 19]. In contrast, Tobacco Etch Virus (TEV) protease is highly specific and efficient: it recognizes the sequence ENLYFQ/G and cleaves between Q and G, making it an excellent tool for tag removal [20, 21]. Other platforms also offer efficient, traceless tag removal, including the CASPON platform, which is based on an engineered, circularly permuted caspase-2 [20], and intein-based self-cleaving tags [21, 22]. The standardized design of our system could potentially be adapted to incorporate such cleavage modules in the future.
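The TEV cleavage rule described above can be illustrated with a short sketch. It matches only the canonical ENLYFQ/G site (TEV tolerates some other residues at the position after Q, which this sketch ignores), and the fusion sequence is hypothetical rather than one of the pX constructs.

```python
from typing import Optional, Tuple

TEV_SITE = "ENLYFQG"  # canonical recognition sequence; TEV cuts the Q-G peptide bond


def tev_cleave(protein: str) -> Optional[Tuple[str, str]]:
    """Split a protein sequence at the first canonical TEV site.

    Returns (N-terminal product, C-terminal product), or None if no site is found.
    """
    start = protein.find(TEV_SITE)
    if start == -1:
        return None
    cut = start + len(TEV_SITE) - 1  # index of the G that begins the C-terminal product
    return protein[:cut], protein[cut:]


# Hypothetical fusion: tag residues + (GGGGS)2 linker + TEV site + target residues
fusion = "MSKGE" + "GGGGS" * 2 + "ENLYFQG" + "AMASK"
print(tev_cleave(fusion))  # ('MSKGEGGGGSGGGGSENLYFQ', 'GAMASK')
```

Because the cut falls before the G, the released target protein carries one extra N-terminal glycine from the recognition site.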
To address the need for a systematic and efficient method for identifying the optimal fusion tag for a given protein, we designed and constructed a parallelized expression system for E. coli. Rather than testing tags sequentially in disparate vectors, the system allows a target gene to be cloned in parallel into eight unified vectors. Each vector carries one of a curated panel of medium-sized solubility tags (ArsC, Crr, DsbA, Ecotin, MsyB, SlyD, Snut, or YjgD; Table 1), selected to balance potent solubilization with minimal structural encumbrance, together with a standardized architecture that includes a TEV protease site for tag removal. We systematically investigated the effects of these tags on the solubility and function of three model proteins: enhanced green fluorescent protein (eGFP) [21, 22], 3-ketoacyl-ACP reductase (EcFabG) [23], and maltogenic amylase (Mals). To demonstrate the system's broad utility, we further evaluated it on six difficult-to-express proteins (PulA, NodE, FabF1XL, FabF2XL, FabZXL, and FabGXL). These proteins were selected because prior empirical evidence from our laboratory and literature reports indicated poor solubility or expression in E. coli [24], likely owing to factors such as complex oligomerization or membrane association. Our findings establish this system as a robust and versatile tool for enhancing the soluble expression of diverse recombinant proteins.
Table 1.
Solubility enhancer tags used in this study
| Tags | Source | Full Name | Size (kDa) | Reference |
|---|---|---|---|---|
| ArsC | Escherichia coli | Arsenate reductase | 15.7 | [11] |
| Crr | Escherichia coli | Glucose-specific phosphotransferase (PTS) enzyme IIA | 18.1 | [12] |
| DsbA | Escherichia coli | Disulfide bond formation protein A | 23.0 | [13] |
| Ecotin | Escherichia coli | E. coli trypsin inhibitor | 18.1 | [14] |
| MsyB | Escherichia coli | Acidic E. coli protein | 14.1 | [16] |
| SlyD | Escherichia coli | Aggregation-resistant protein | 20.7 | [17] |
| Snut | Staphylococcus aureus | Solubility ‘eNhancing’ Ubiquitous Tag | 16.8 | [3] |
| YjgD | Escherichia coli | Hypothetical E. coli ORF | 15.5 | [15] |
Materials and methods
Bacterial strains, plasmids, and growth conditions
The bacterial strains and plasmids used in this study are listed in Table S1. E. coli strains were cultivated in Luria-Bertani (LB) medium at 37 °C. Antibiotics and inducers were supplemented as needed at the following final concentrations: ampicillin, 100 µg/mL; kanamycin, 30 µg/mL; and isopropyl-β-D-thiogalactoside (IPTG), 240 µg/mL. Bacterial growth was monitored by measuring the optical density at 600 nm (OD600). Restriction enzymes, high-fidelity DNA polymerase, T4 DNA ligase, and other molecular biology reagents were obtained from Takara Biotechnology (Dalian, China). Primers were synthesized, and DNA sequencing was performed by Sangon Biotech (Shanghai, China). All other chemicals were of molecular biology grade.
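IPTG concentrations are more often reported in molar units, so the mass concentration used here can be converted with a one-line calculation. The only assumption is IPTG's molar mass (238.30 g/mol); the result shows that 240 µg/mL corresponds to roughly 1 mM IPTG.

```python
IPTG_MW_G_PER_MOL = 238.30  # molar mass of IPTG (C9H18O5S)


def ug_per_ml_to_mM(conc_ug_per_ml: float, mw_g_per_mol: float) -> float:
    """Convert a mass concentration in µg/mL (= mg/L) to millimolar.

    Dividing mg/L by g/mol gives mmol/L, i.e. mM.
    """
    return conc_ug_per_ml / mw_g_per_mol


iptg_mM = ug_per_ml_to_mM(240, IPTG_MW_G_PER_MOL)
print(f"240 µg/mL IPTG is about {iptg_mM:.2f} mM")  # 240 µg/mL IPTG is about 1.01 mM
```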
Vector construction
All expression vectors were derived from the pET-28b(+) backbone (Novagen). The general structure of the fusion constructs is depicted in Fig. 1. Briefly, the gene encoding ArsC was amplified from E. coli MG1655 genomic DNA using the primer pair Arsc-P1/Arsc-P2 (Table S2). Sequential overlap extension PCR was used to assemble the final cassette containing the tag, a synthetic (GGGGS)₂ linker, the TEV protease recognition site (ENLYFQ/G), and a 6×His tag. The resulting fragment was digested with NcoI and BamHI and ligated into similarly digested pET-28b(+) to generate the initial fusion tag vector, pArsC. The other seven tag fragments (Crr, DsbA, Ecotin, MsyB, SlyD, Snut, and YjgD) were amplified, digested with NcoI and KpnI, and cloned into the equivalently digested pArsC backbone, replacing the ArsC tag to generate the remaining pX vectors (pCrr, pDsbA, pEcotin, pMsyB, pSlyD, pSnut, and pYjgD). All constructs were verified by DNA sequencing (Sangon Biotech, Guangzhou).
Fig. 1.
Schematic diagram of the pX vector series with various N-terminal fusion tags. All vectors share a conserved structure featuring a standardized multiple cloning site (MCS) flanked by dual 6×His tags, a synthetic linker peptide, and a TEV protease cleavage site. A panel of fusion tags (ArsC, Crr, DsbA, Ecotin, MsyB, SlyD, Snut, and YjgD) is incorporated upstream of the MCS to facilitate diverse protein expression and purification strategies
The genes encoding Mals and PulA were synthesized with GenSmart™ codon optimization for E. coli, based on sequences from Bacillus sp. WPD616 and Thermotoga neapolitana, respectively. The eGFP gene was amplified from pET28b-egfp (a kind gift from Associate Professor Wang [21]) using the primer pairs listed in Table S2. The EcFabG gene was amplified from E. coli MG1655 genomic DNA [23]. The genes for NodE, FabF1XL, FabF2XL, FabZXL, and FabGXL were amplified from Sinorhizobium meliloti Rm1021 genomic DNA [24]. All target genes were then cloned in parallel into the pX vectors using the same NdeI/HindIII digestion. All final expression constructs were confirmed by DNA sequencing.
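Parallel cloning through a single NdeI/HindIII site pair presumes that no target ORF contains an internal NdeI or HindIII site. A minimal pre-cloning check could look like the sketch below; the ORF fragment is made up for illustration and is not one of the study's genes. Both recognition sequences are palindromic, so scanning one strand also covers the reverse strand.

```python
from typing import Dict, List, Mapping

# Recognition sequences of the two enzymes used for parallel cloning.
# Both are palindromic, so a single-strand scan also covers the reverse strand.
CLONING_SITES: Mapping[str, str] = {"NdeI": "CATATG", "HindIII": "AAGCTT"}


def internal_sites(orf: str, sites: Mapping[str, str] = CLONING_SITES) -> Dict[str, List[int]]:
    """Return {enzyme: [0-based positions]} for every site found inside the ORF."""
    orf = orf.upper()
    hits: Dict[str, List[int]] = {}
    for enzyme, motif in sites.items():
        positions = [
            i for i in range(len(orf) - len(motif) + 1)
            if orf[i:i + len(motif)] == motif
        ]
        if positions:
            hits[enzyme] = positions
    return hits


# Hypothetical ORF fragment containing an internal HindIII site
print(internal_sites("ATGAAGCTTGGCTAA"))  # {'HindIII': [3]}
```

A target that returns any hit would need a different site pair or a silent mutation before it could join the same parallel cloning workflow.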
Protein expression and solubility analysis
Recombinant vectors were transformed into E. coli BL21(DE3) competent cells; this strain was chosen for its robust protein expression capability and compatibility with the T7 promoter system [25]. Single transformants were inoculated into 5 mL of LB medium and grown overnight at 37 °C. The overnight cultures were diluted 1:20 into fresh LB medium and grown at 37 °C until the OD₆₀₀ reached approximately 0.6–0.8. Protein expression was induced by adding IPTG to a final concentration of 240 µg/mL, followed by incubation for 4 h at 37 °C or 16 h at 20 °C [26]. Cells were harvested by centrifugation, resuspended in Lysis Buffer (50 mM NaH₂PO₄, 300 mM NaCl, 10 mM imidazole, pH 8.0), and disrupted by sonication on ice (5 cycles of 30 s pulse and 30 s rest). The soluble (supernatant) and insoluble (pellet) fractions were separated by centrifugation at 12,000 rpm for 30 min at 4 °C; the pellet was washed once with Lysis Buffer and resuspended in an equal volume of the same buffer. All fractions were then analyzed by 12.5% SDS-PAGE to assess overall expression level and solubility. All protein expression experiments were performed in three independent biological replicates (n = 3), and representative SDS-PAGE images are shown. Protein solubility was assessed from the distribution of the target protein between the supernatant and the pellet. Solubility ratios were quantified with ImageJ software from triplicate gels, and data are presented as mean ± standard deviation (SD). For proteins found primarily in the insoluble fraction, induction conditions were optimized by lowering the temperature and extending the induction time. Mals and EcFabG proteins were purified by nickel-nitrilotriacetic acid (Ni-NTA) affinity chromatography as previously described [23]. Protein concentrations were quantified by densitometric analysis of SDS-PAGE gels using ImageJ software.
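The solubility ratio described above reduces to a simple band-intensity calculation. A minimal sketch, assuming that the S and P lanes were loaded from equal volumes (the pellet was resuspended in the same volume), so ImageJ band intensities can be compared directly; the intensities below are hypothetical.

```python
from statistics import mean, stdev


def solubility_ratio(soluble: float, insoluble: float) -> float:
    """Fraction of the target band found in the soluble (S) lane.

    Assumes S and P were loaded from equal volumes, so intensities are comparable.
    """
    return soluble / (soluble + insoluble)


def summarize(replicates):
    """Mean and sample SD of solubility ratios over biological replicates."""
    ratios = [solubility_ratio(s, p) for s, p in replicates]
    return mean(ratios), stdev(ratios)


# Hypothetical ImageJ band intensities (soluble, pellet) from three gels
m, sd = summarize([(850, 150), (900, 100), (800, 200)])
print(f"{m:.2f} ± {sd:.2f}")  # 0.85 ± 0.05
```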
Fluorescence analysis of eGFP fusion tags
The purified eGFP fusion proteins were diluted to a final concentration of 1 µM in 50 mM sodium phosphate buffer (50 mM NaH₂PO₄, 300 mM NaCl, pH 8.0). eGFP fluorescence was measured using a SpectraMax i3x Multi-Mode Microplate Reader (Molecular Devices) [22] with excitation at 460 nm; emission spectra were recorded, and fluorescence intensity was reported at 511 nm (Fig. 2C) to compare the effects of the eight fusion tags on eGFP fluorescence. Measurements were performed in triplicate for each independent protein preparation.
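Because the fusions differ in size, diluting every construct to the same 1 µM requires converting the densitometry-derived mass concentration into a molar one using each fusion's molar mass. The sketch below shows that conversion followed by a C1·V1 = C2·V2 dilution; the 4.3 mg/mL stock, the ~43 kDa fusion mass, and the 200 µL well volume are hypothetical values chosen for illustration.

```python
def mg_per_ml_to_uM(conc_mg_per_ml: float, mw_da: float) -> float:
    """Convert mg/mL (= g/L) to µM: divide by molar mass (g/mol), then scale mol/L to µmol/L."""
    return conc_mg_per_ml / mw_da * 1e6


def stock_volume_ul(stock_uM: float, final_uM: float, final_volume_ul: float) -> float:
    """Solve C1*V1 = C2*V2 for V1, the volume of stock to pipette."""
    return final_uM * final_volume_ul / stock_uM


# Hypothetical example: a ~43 kDa tag-eGFP fusion purified at 4.3 mg/mL
stock = mg_per_ml_to_uM(4.3, 43_000)
v_stock = stock_volume_ul(stock, final_uM=1.0, final_volume_ul=200.0)
print(f"stock {stock:.1f} µM: mix {v_stock:.2f} µL protein with {200.0 - v_stock:.2f} µL buffer")
```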
Enzymatic activity assay of EcFabG fusions
The activity of EcFabG and its fusion variants was assessed in vitro by reconstituting the fatty acid synthesis pathway as described previously [23]. The assay mixture contained 0.1 M sodium phosphate (pH 7.0); 0.1 µg each of EcFabD, EcFabH, EcFabG (or its fusion variant), EcFabZ, and EcFabI; 50 µM NADH; 50 µM NADPH; 1 mM β-mercaptoethanol; 100 µM malonyl-CoA; 50 µM holo-ACP; and 100 µM acetyl-CoA in a final volume of 40 µL. The assay mixtures were incubated at 37 °C for 1 h, and the products were resolved by conformation-sensitive gel electrophoresis on 20% polyacrylamide gels containing 2.5 M urea. The gels were stained with Coomassie brilliant blue R250 to visualize the acyl-ACP products.
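The final concentrations listed above translate into pipetting volumes for a single 40 µL reaction via C1·V1 = C2·V2. The stock concentrations and the 0.1 µg/µL working dilution of each enzyme in this sketch are hypothetical placeholders, not values reported in this study.

```python
FINAL_VOLUME_UL = 40.0

# Final concentrations from the assay description (µM)
FINAL_UM = {
    "sodium phosphate pH 7.0": 100_000.0,  # 0.1 M
    "NADH": 50.0,
    "NADPH": 50.0,
    "beta-mercaptoethanol": 1_000.0,       # 1 mM
    "malonyl-CoA": 100.0,
    "holo-ACP": 50.0,
    "acetyl-CoA": 100.0,
}

# Hypothetical stock concentrations (µM); replace with the real ones
STOCK_UM = {
    "sodium phosphate pH 7.0": 1_000_000.0,  # 1 M
    "NADH": 1_000.0,
    "NADPH": 1_000.0,
    "beta-mercaptoethanol": 100_000.0,
    "malonyl-CoA": 2_000.0,
    "holo-ACP": 1_000.0,
    "acetyl-CoA": 2_000.0,
}

ENZYMES = ["EcFabD", "EcFabH", "EcFabG or fusion", "EcFabZ", "EcFabI"]
ENZYME_UG = 0.1                 # µg of each enzyme per reaction
ENZYME_STOCK_UG_PER_UL = 0.1    # hypothetical working dilution of each enzyme


def pipetting_volumes():
    """Volume (µL) of each stock per reaction; water makes up the remainder."""
    vols = {name: FINAL_UM[name] * FINAL_VOLUME_UL / STOCK_UM[name] for name in FINAL_UM}
    for enzyme in ENZYMES:
        vols[enzyme] = ENZYME_UG / ENZYME_STOCK_UG_PER_UL
    vols["water"] = FINAL_VOLUME_UL - sum(vols.values())
    if vols["water"] < 0:
        raise ValueError("stocks are too dilute for a 40 µL reaction")
    return vols


for name, ul in pipetting_volumes().items():
    print(f"{name:>24}: {ul:5.2f} µL")
```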
TEV protease cleavage and Mals activity assay
Purified Tag-Mals fusion proteins were treated with TEV protease (a gift from Associate Professor Wang) at a 1:10 (w/w) protease-to-protein ratio in reaction buffer (50 mM Tris-HCl, pH 8.0, 0.5 mM EDTA, 1 mM DTT) for 2 h at 30 °C to remove the fusion tags [21, 27]. The cleavage reaction was then briefly centrifuged (5,000 rpm, 1 min), yielding the released fusion tag and the tag-free protein. A second Ni-NTA purification step can then be applied to obtain high-purity, tag-free target protein. The enzymatic activity of both tagged and tag-free Mals was determined using the 3,5-dinitrosalicylic acid (DNS) method, which measures the reducing sugars released by starch hydrolysis [28]. Briefly, a reaction mixture containing 1% (w/v) soluble starch in 50 mM sodium citrate buffer (pH 5.5) and 0.1 nmol of enzyme (normalized to total protein for tagged Mals or to soluble protein after cleavage for tag-free Mals) was incubated at 60 °C for 10 min. The reaction was stopped by adding DNS reagent, followed by boiling for 10 min, and the released reducing sugars were measured spectrophotometrically at 540 nm. One unit (U) of enzyme activity was defined as the amount of enzyme required to produce 1 µmol of reducing sugar (expressed as glucose equivalents) per minute under the assay conditions. A standard curve was prepared using glucose solutions of known concentrations (Figure S1). Activity assays were performed in triplicate.
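Converting DNS absorbance readings into activity units follows directly from the glucose standard curve and the unit definition above. A minimal sketch, assuming a linear standard curve expressed as µmol glucose per assay tube and a no-enzyme starch blank; the standards and readings are illustrative numbers, not the data behind Figure S1.

```python
def fit_standard_curve(glucose_umol, a540):
    """Ordinary least-squares fit of A540 = slope * glucose + intercept."""
    n = len(glucose_umol)
    mean_x = sum(glucose_umol) / n
    mean_y = sum(a540) / n
    sxx = sum((x - mean_x) ** 2 for x in glucose_umol)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(glucose_umol, a540))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def activity_units(a540_sample, a540_no_enzyme, slope, minutes=10.0):
    """U = µmol reducing sugar (glucose equivalents) released per minute.

    Subtracting a no-enzyme blank removes background reducing ends in the starch;
    the curve intercept cancels in the difference.
    """
    released_umol = (a540_sample - a540_no_enzyme) / slope
    return released_umol / minutes


# Illustrative standards (µmol glucose per tube vs. A540), not Figure S1
slope, intercept = fit_standard_curve([0.0, 0.5, 1.0, 1.5, 2.0],
                                      [0.02, 0.22, 0.42, 0.62, 0.82])
units = activity_units(0.62, 0.02, slope)  # 10 min reaction
specific = units / 0.1                     # per nmol enzyme (0.1 nmol per assay)
print(f"{units:.3f} U, {specific:.2f} U/nmol")
```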
Results
Construction of a versatile fusion tag expression system
We constructed a series of pET-28b(+)-derived expression vectors, designated the pX series. As illustrated in Fig. 1, each vector fuses one of eight different tags (ArsC, Crr, DsbA, Ecotin, MsyB, SlyD, Snut, or YjgD) to the N-terminus of a target protein via a synthetic linker. Each construct also includes a TEV protease cleavage site and a standardized multiple cloning site (MCS) flanked by dual 6×His affinity tags. In total, 81 recombinant expression vectors (nine target proteins, each cloned into the eight pX vectors and the untagged pET-28b(+) control) were constructed in batches to cover all tag-target combinations for eGFP, EcFabG, Mals, PulA, NodE, FabF1XL, FabF2XL, FabZXL, and FabGXL (Table S1). All ‘No-tag’ (wild-type) controls consist of the target gene, with start and stop codons, cloned directly into the standard pET-28b(+) vector, providing a common baseline for comparison. The theoretical molecular weights of the untagged proteins are as follows: eGFP (~27 kDa), EcFabG (~26 kDa), Mals (~68 kDa), NodE (~42 kDa), FabGXL (~37 kDa), FabF1XL (~45 kDa), FabF2XL (~42 kDa), and FabZXL (~17 kDa).
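With nine constructs per target, it helps to know roughly where each fusion should migrate on SDS-PAGE. The sketch adds the Table 1 tag sizes, the target sizes listed above, and a connector mass computed from average residue masses. It assumes the connector consists only of the parts named under Vector construction ((GGGGS)₂ linker, TEV site, one 6×His tag) and ignores residues encoded by the MCS or the pET-28b(+) backbone; apparent SDS-PAGE mobility can also deviate from calculated mass.

```python
# Average (not monoisotopic) residue masses in Da for the connector residues
RESIDUE_MASS_DA = {
    "G": 57.0519, "S": 87.0782, "E": 129.1155, "N": 114.1038, "L": 113.1594,
    "Y": 163.1760, "F": 147.1766, "Q": 128.1307, "H": 137.1411,
}
WATER_DA = 18.0153

# Assumed connector: (GGGGS)2 linker + TEV site + one 6xHis tag.
# Residues encoded by the MCS or the pET-28b(+) backbone are ignored.
CONNECTOR = "GGGGS" * 2 + "ENLYFQG" + "H" * 6

# Tag sizes from Table 1 (kDa)
TAG_KDA = {"ArsC": 15.7, "Crr": 18.1, "DsbA": 23.0, "Ecotin": 18.1,
           "MsyB": 14.1, "SlyD": 20.7, "Snut": 16.8, "YjgD": 15.5}


def peptide_mass_kda(seq: str) -> float:
    """Average mass of a linear peptide: sum of residue masses plus one water."""
    return (sum(RESIDUE_MASS_DA[aa] for aa in seq) + WATER_DA) / 1000


def expected_fusion_kda(tag: str, target_kda: float) -> float:
    """Rough expected size of tag + connector + target, in kDa."""
    return TAG_KDA[tag] + peptide_mass_kda(CONNECTOR) + target_kda


for tag in TAG_KDA:
    print(f"{tag}-eGFP: ~{expected_fusion_kda(tag, 27.0):.1f} kDa")
```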
Fusion tags differentially modulate eGFP solubility and fluorescence
Green fluorescent protein (GFP) is a widely used reporter for in vivo expression and is known to be only partially soluble when expressed in E. coli [14]. We therefore first used enhanced GFP (eGFP) as a model protein to evaluate the solubility-enhancing effects of the eight fusion tags. SDS-PAGE analysis showed that all eGFP fusions achieved excellent soluble expression except those with the Crr and DsbA tags (Fig. 2A). Densitometric quantification revealed that eGFP fused with ArsC, Ecotin, MsyB, SlyD, Snut, or YjgD was predominantly soluble. The MsyB and Snut tags were particularly effective, increasing the soluble proportion of eGFP from 15% (wild type, WT) to 87% and 92%, respectively. Furthermore, the ArsC tag not only enhanced solubility but also increased the overall expression level of eGFP (Fig. 2B). Thus, MsyB, ArsC, Ecotin, SlyD, YjgD, and Snut all effectively promoted the soluble expression of eGFP.
Fig. 2.
Analysis of solubility and fluorescence of eGFP fused with different tags. (A) Recombinant protein expression in E. coli BL21 (DE3) harboring different fusion tag constructs. Cultures were grown in LB medium and induced with 240 µg/mL IPTG at 37 °C for 4 h. SDS-PAGE analysis of the soluble fraction (supernatants, S) and insoluble fraction (pellets, P) of eGFP fused with various tags. M, protein molecular weight marker. (B) Quantification of protein solubility ratio (soluble fraction vs. total) based on grayscale density analysis for eGFP. (C) Comparison of fluorescence intensity (RLU) between tagged and No-tag eGFP. Fluorescence of eGFP was measured at excitation 460 nm/emission 511 nm. Data represent the mean ± SD from three independent experiments (n = 3); ns, not significant (P > 0.05, Student's t-test vs. No-tag control); *, P < 0.05; **, P < 0.01; ***, P < 0.001; ****, P < 0.0001
We next assessed whether the tags interfered with eGFP function by measuring fluorescence intensity. As summarized in Fig. 2C, the fluorescence intensity of the DsbA-tagged eGFP was significantly reduced, suggesting that this tag may interfere with the proper folding of eGFP. In contrast, the other seven tagged proteins exhibited fluorescence intensities comparable to that of wild-type eGFP, indicating that these tags promoted soluble expression without compromising functional integrity.
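The figure legends compare each tagged construct with the No-tag control by Student's t-test with three replicates per group. The stdlib-only sketch below computes the pooled-variance t statistic and compares it with the two-sided critical value for df = 4 (2.776 at α = 0.05); it does not compute an exact P value or correct for the eight comparisons against one control, and the fluorescence readings are hypothetical.

```python
import math
from statistics import mean, stdev

# Student's t critical value, two-sided alpha = 0.05, df = 4 (valid only for n = 3 vs. n = 3)
T_CRIT_TWO_SIDED_005_DF4 = 2.776


def students_t(a, b):
    """Two-sample Student's t statistic (pooled variance) and its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical fluorescence readings (RLU) from three independent preparations each
tagged = [100.0, 102.0, 98.0]
no_tag = [80.0, 82.0, 78.0]
t, df = students_t(tagged, no_tag)
print(f"t = {t:.2f}, df = {df}, significant at 0.05: {abs(t) > T_CRIT_TWO_SIDED_005_DF4}")
```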
Impact of fusion tags on EcFabG solubility and enzyme activity
We next examined the effect of the fusion tags on 3-ketoacyl-ACP reductase (EcFabG), a soluble enzyme of the bacterial fatty acid synthesis pathway. SDS-PAGE analysis showed that EcFabG fusions with YjgD, MsyB, DsbA, and ArsC were largely soluble (Fig. 3A). Densitometric quantification indicated that the solubility of EcFabG fused with Crr or SlyD decreased to 38% and 35%, respectively, compared with 93% for the wild-type enzyme (Fig. 3B), showing that fusion with Crr or SlyD led predominantly to inclusion body formation. In contrast, the Ecotin and Snut tags substantially reduced the overall expression level of EcFabG. Thus, the Crr and SlyD tags mainly compromised EcFabG solubility, whereas the Ecotin and Snut tags mainly reduced its expression.
Fig. 3.
Evaluation of solubility and enzyme activity for EcFabG fusion proteins. (A) SDS-PAGE analysis of soluble (S) and insoluble (P) fractions of EcFabG fused with different tags expressed in E. coli BL21 (DE3). Cultures were induced at 37 °C for 4 h. M, protein molecular weight marker. (B) Quantification of solubility ratio (soluble fraction vs. total) based on grayscale density for EcFabG. Data represent the mean ± SD from three independent experiments (n = 3); ns, not significant (P > 0.05, Student's t-test vs. No-tag control); *, P < 0.05; **, P < 0.01; ***, P < 0.001. (C) In vitro enzymatic activity assay of EcFabG fusions using conformationally sensitive gel electrophoresis. The reaction shows the conversion of malonyl-ACP (Mal-ACP) and octanoyl-ACP (C8-ACP) to decanoyl-ACP (C10-ACP) and dodecanoyl-ACP (C12-ACP). The No-tag EcFabG reaction serves as the activity control
To evaluate the functional integrity of the tagged EcFabG proteins, we reconstituted the fatty acid synthesis pathway in vitro. EcFabG catalyzes the reduction of 3-ketoacyl-ACP to 3-hydroxyacyl-ACP, and enzymatic activity was assessed by conformationally sensitive gel electrophoresis (Fig. 3C). EcFabG fused with YjgD, MsyB, or ArsC retained significant reductase activity. In contrast, the activity of EcFabG carrying the Crr, Ecotin, DsbA, or SlyD tag was markedly impaired, indicating that these tags interfered with the enzyme's catalytic function.
TEV protease cleavage and its effect on Mals activity
Because some tags compromised the activity of EcFabG or the fluorescence of eGFP, we next investigated the effect of tag removal on maltogenic amylase (Mals). As shown in Fig. 4A, SDS-PAGE analysis of the soluble (S) and insoluble (P) fractions revealed distinct solubility profiles for Mals fused with different tags. Mals fused with ArsC, SlyD, DsbA, MsyB, Crr, or Snut showed prominent bands in the soluble fractions, indicating significantly improved soluble expression. In contrast, Ecotin-tagged Mals was detected predominantly in the insoluble pellet, indicating poor solubility under the tested conditions. Grayscale densitometry supported these observations (Fig. 4B): most fusion tags significantly improved Mals solubility compared with the no-tag control, with Crr, MsyB, SlyD, and Snut showing the most pronounced effects.
Fig. 4.
Functional analysis of tagged and tag-free Mals proteins. (A) SDS-PAGE analysis of soluble (S) and insoluble (P) fractions of Mals with different fusion tags expressed in E. coli BL21 (DE3). M, protein molecular weight marker. (B) Quantification of solubility ratio (soluble fraction vs. total) based on grayscale density for Mals. Data represent the mean ± SD from three independent experiments (n = 3); ns, not significant (P > 0.05, Student's t-test vs. No-tag control); *, P < 0.05; **, P < 0.01; ***, P < 0.001; ****, P < 0.0001. (C) Quantitative analysis of maltogenic amylase activity before and after TEV protease cleavage (µmol/mL/min). Data represent mean ± SD from three independent assays. Statistical significance refers to comparisons between pre- and post-cleavage activities for each construct
Enzymatic activity assays conducted before and after TEV protease cleavage revealed the functional influence of the fusion tags (Fig. 4C). Before cleavage, several tagged Mals variants, particularly those with SlyD, MsyB, and Crr, displayed high maltogenic amylase activity. Tag removal by TEV protease generally reduced activity across most constructs, implying that the fusion tags substantially enhanced Mals activity relative to the tag-free protein. Notably, the SlyD construct retained the highest activity both before and after cleavage, highlighting the dual benefit of SlyD as a solubility enhancer and a positive modulator of Mals activity. Together, these results indicate that the fusion tags differentially influenced the solubility and activity of Mals.
Enhancement of soluble expression for challenging target proteins
Previous studies found that several enzymes involved in fatty acid synthesis in S. meliloti Rm1021 are difficult to express and purify, which has precluded their biochemical characterization [24]. To further validate the broad applicability of our system, we tested its efficacy on five difficult-to-express proteins from Sinorhizobium meliloti (NodE, FabGXL, FabF1XL, FabF2XL, and FabZXL), which are typically insoluble or poorly expressed in E. coli. As shown in Fig. 5, the fusion tags markedly improved their solubility profiles. NodE showed improved soluble expression when fused with the MsyB or Snut tag, as evidenced by distinct protein bands in the soluble fractions (Fig. 5a, band enclosed in the red box). FabGXL, which was largely insoluble without a tag, achieved soluble expression only with the YjgD tag (Fig. 5b). For FabF1XL and FabF2XL, the YjgD and MsyB tags were the most effective, while other tags were less effective (Fig. 5c, d). In contrast, all eight fusion tags promoted the soluble expression of FabZXL (Fig. 5e). These results underscore the target-dependent nature of tag efficacy and highlight the utility of our multi-tag system for identifying optimal solubility conditions for a wide range of recalcitrant proteins.
Fig. 5.
Enhancement of expression and solubility of NodE, FabGXL, FabF1XL, FabF2XL and FabZXL using the pX vector series. SDS-PAGE analysis of NodE (A), FabGXL (B), FabF1XL (C), FabF2XL (D) and FabZXL (E) expression from the soluble fraction (supernatants, S) and insoluble fraction (pellets, P) in E. coli BL21 (DE3) using the pX medium-sized fusion tag vector series. M, molecular weight marker. Arrows indicate the expected positions of the target proteins. The region enclosed in the red box indicates the soluble fraction. In (A), the expected band for NodE fusion proteins is indicated by an arrow or the red box. Some smearing is observed, which is not uncommon for membrane-associated or fatty acid biosynthetic enzymes such as NodE
Discussion
The formation of insoluble inclusion bodies remains a major obstacle to the widespread application of prokaryotic expression systems for recombinant protein production [2, 29]. To address this challenge, we developed a versatile fusion tag vector system that enables rapid screening for optimal solubility enhancers. In this study, the pullulanase PulA could not be expressed in E. coli BL21(DE3) (Figure S2), possibly because it originates from the hyperthermophilic anaerobe Thermotoga neapolitana. Because the PulA gene had been codon-optimized for E. coli, its failure to express suggests that factors beyond codon usage, such as protein folding kinetics or compatibility with the host proteostasis network, may be responsible [2, 30]. Similarly, proteins such as Mals and NodE, which may contain rare codons or have complex folding requirements, achieved soluble expression only when fused with specific tags. For instance, MsyB markedly enhanced the solubility of NodE (Fig. 5a), while ArsC, SlyD, DsbA, MsyB, Crr, and Snut all significantly improved the solubility of Mals (Fig. 4).
Other target proteins, including eGFP, FabGXL, FabF1XL, FabF2XL, and FabZXL, were predominantly expressed as inclusion bodies in the absence of fusion tags. Although soluble expression was achievable with tag assistance, the efficacy of each tag varied considerably with the target protein. For example, all eight tags improved FabZXL solubility (Fig. 5e), whereas only YjgD enabled soluble expression of FabGXL (Fig. 5b). Similarly, MsyB and YjgD were the most effective for FabF1XL and FabF2XL (Fig. 5c–d), and ArsC, Ecotin, MsyB, SlyD, Snut, and YjgD promoted the soluble expression of eGFP (Fig. 2). These observations highlight the target-specific nature of fusion tag efficacy and underscore the importance of a multi-tag screening approach [18, 31].
Our study focuses on a set of medium-sized tags (14–23 kDa), distinct from widely used larger tags such as MBP (~40 kDa) and GST (~26 kDa). Although MBP and GST are renowned for their high success rates, particularly with eukaryotic proteins [10, 32], their size can interfere with downstream structural or functional studies and may complicate purification. The tags in the pX system are intended to balance effective solubilization against minimal structural encumbrance. A direct side-by-side comparison with MBP or GST was beyond the scope of this initial characterization, but our data show that several tags (e.g., YjgD and MsyB) achieve solubilization efficacy that appears comparable to that commonly reported for traditional tags [15, 16]. Future work will include direct experimental benchmarking against MBP for selected challenging targets to provide definitive comparative data. Importantly, our system provides a curated panel of alternatives: the DsbA tag offers specific utility for disulfide-bonded proteins, while SlyD may act as an intrinsic chaperone. This expands the toolbox available to researchers facing expression challenges with conventional tags. Collectively, the pX system improves operational efficiency by enabling the parallel assessment of multiple solubility-enhancement strategies within a unified workflow. Its medium-sized tags may reduce steric interference relative to larger partners while, as our data show, retaining substantial solubilizing power, and the standardized TEV protease site streamlines tag removal. Together with its demonstrated efficacy across a diverse set of challenging proteins, these features make the system a versatile and practical tool for addressing common expression bottlenecks.
The solubilization mechanism of fusion tags remains incompletely understood, although it is often suggested to relate to the tags' own folding properties and biophysical characteristics, such as surface charge or hydrophilicity [15, 33]. To gain preliminary insight into the mechanisms underlying solubility enhancement, we analyzed the core biophysical properties of our eight fusion tags (Supplementary Table S3). We observed a general trend in which tags with a high negative net charge and a hydrophilic character (negative GRAVY index), particularly highly acidic tags (pI < 5.0) such as MsyB and YjgD, tended to be the most effective and versatile solubility enhancers. This finding is consistent with prior studies on acidic fusion partners [15, 16]. However, notable exceptions highlight the complexity of the mechanism. For instance, the superior performance of SlyD with Mals may stem from its intrinsic chaperone activity rather than from its charge properties alone [17]. Interestingly, the less acidic Snut (pI 6.324, GRAVY −1.106, net charge −1.79) still showed considerable solubilization efficacy, performing well with eGFP, Mals, NodE, and FabZXL. Furthermore, YjgD solubilized FabGXL, a particularly recalcitrant protein for which all other tags, including MsyB, failed. This underscores that optimal tag selection reflects a complex, individual match between the tag's biophysical properties and the target protein's specific folding pathway and structural requirements.
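The tag properties discussed here (GRAVY, net charge, pI) can be approximated directly from sequence. A minimal sketch: GRAVY uses the Kyte-Doolittle hydropathy scale, while net charge and pI use one common textbook pKa set (an assumption), so the numbers will differ somewhat from tools that use other pKa values. The example peptide is hypothetical, not one of the eight tags.

```python
# Kyte-Doolittle hydropathy scale used for the GRAVY index
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# One common textbook pKa set (assumption); other tools use different values.
PKA_BASIC = {"Nterm": 9.69, "K": 10.5, "R": 12.4, "H": 6.0}
PKA_ACIDIC = {"Cterm": 2.34, "D": 3.86, "E": 4.25, "C": 8.33, "Y": 10.07}


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def net_charge(seq: str, ph: float = 7.0) -> float:
    """Henderson-Hasselbalch estimate of net charge at the given pH."""
    basic = {"Nterm": 1, **{aa: seq.count(aa) for aa in "KRH"}}
    acidic = {"Cterm": 1, **{aa: seq.count(aa) for aa in "DECY"}}
    positive = sum(n / (1 + 10 ** (ph - PKA_BASIC[g])) for g, n in basic.items())
    negative = sum(n / (1 + 10 ** (PKA_ACIDIC[g] - ph)) for g, n in acidic.items())
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisection for the pH at which the estimated net charge is zero."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid  # still positive, so the pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical acidic peptide, for illustration only
pep = "MSDEEDLEEAGDE"
print(round(gravy(pep), 3), round(net_charge(pep), 2), round(isoelectric_point(pep), 2))
```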
Based on our results, we propose preliminary, practical guidelines for tag screening. For a new aggregation-prone target, starting with highly acidic, hydrophilic tags such as MsyB or YjgD is advisable, as their negative charge may prevent non-specific aggregation [13, 14]. For proteins suspected to require disulfide bonds, DsbA is a logical first choice. Tags with chaperone-like activity, such as SlyD, may be particularly effective for proteins with complex folding requirements. A two-tag screen (YjgD and MsyB) would have identified a soluble candidate for every target in this study (Supplementary Table S3). However, the optimal tag for function (e.g., SlyD for Mals activity) or yield might differ. Therefore, a small panel of three or four tags representing different mechanisms (e.g., one acidic, one chaperone-like, and one disulfide-promoting tag) could balance screening efficiency against success rate (Supplementary Table S4). In the future, integrating bioinformatic predictions of target protein properties with our empirical tag-performance data could further refine tag selection.
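Choosing the smallest tag panel that yields at least one soluble construct per target is a small set-cover problem. The sketch below uses a greedy heuristic, which is simple but not guaranteed to find the smallest panel. The function name is hypothetical, and the tag names and outcomes in the usage example are placeholders, not results from this study.

```python
# Sketch: choose a small tag panel covering every target (greedy set cover).
# Input is a HYPOTHETICAL table: tag -> set of targets soluble with that tag.
from typing import Dict, List, Set


def greedy_tag_panel(hits: Dict[str, Set[str]], targets: Set[str]) -> List[str]:
    """Return tags in the order picked; stop when no tag adds coverage."""
    uncovered = set(targets)
    panel: List[str] = []
    while uncovered and hits:
        # Pick the tag that rescues the most still-uncovered targets.
        # max() keeps the first maximum, so iterating over sorted names
        # breaks ties alphabetically and keeps the result deterministic.
        best = max(sorted(hits), key=lambda tag: len(hits[tag] & uncovered))
        gained = hits[best] & uncovered
        if not gained:
            break  # remaining targets were not soluble with any tag
        panel.append(best)
        uncovered -= gained
    return panel


if __name__ == "__main__":
    hits = {  # hypothetical screening outcomes, not data from this study
        "tagA": {"p1", "p2", "p3"},
        "tagB": {"p3", "p4"},
        "tagC": {"p5"},
        "tagD": {"p1", "p4"},
    }
    print(greedy_tag_panel(hits, {"p1", "p2", "p3", "p4", "p5"}))
    # -> ['tagA', 'tagB', 'tagC']
```

The greedy criterion counts only soluble hits. It would therefore not select a tag chosen for function, such as SlyD for Mals activity, unless a weighting for activity or yield were added.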
However, as our experimental data show, the size and structural properties of fusion tags can also interfere with the folding and function of target proteins. A notable example is the DsbA tag, which substantially quenched eGFP fluorescence despite maintaining reasonable solubility (Fig. 2C). EcFabG, which is soluble without a tag, was deliberately included to assess how fusion tags affect the solubility and function of an already soluble protein. As anticipated, the tags reduced its soluble proportion to varying degrees, with the largest decrease reaching 58%. Furthermore, the Crr, Ecotin, DsbA, and SlyD tags impaired EcFabG enzymatic activity (Fig. 3C). These results emphasize that fusion tags can enhance solubility but may also interfere with protein function. It is therefore common practice to remove fusion tags after purification [34]. Interestingly, however, several tags, including SlyD, MsyB, and Crr, enhanced the enzymatic activity of Mals even before cleavage (Fig. 4C). This suggests that in some cases fusion partners do more than improve solubility: they may also assist folding or stabilize the active conformation of certain target proteins [17, 18]. The underlying mechanisms warrant further investigation.
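Two quantities recur in this comparison: the soluble proportion of a protein and its activity with and without a tag. The sketch below shows one common way to compute both. It is an assumption for illustration, not a description of the authors' assays, and the function names are hypothetical. It also shows why activity per milligram of a fusion protein can understate activity per target molecule: the tag adds mass without adding catalytic sites. All numbers in the example are hypothetical.

```python
# Sketch with HYPOTHETICAL inputs: soluble fraction from gel densitometry,
# and conversion of specific activity from per-mg to per-mole of protein.
# Assumption: soluble and insoluble lanes were loaded from equal culture
# equivalents, so their band intensities are directly comparable.


def soluble_fraction(soluble: float, insoluble: float) -> float:
    """Fraction of the expressed protein found in the soluble lysate."""
    total = soluble + insoluble
    if soluble < 0 or insoluble < 0 or total == 0:
        raise ValueError("intensities must be non-negative and not both zero")
    return soluble / total


def activity_per_umol(activity_per_mg: float, mass_kda: float) -> float:
    """Convert U/mg to U/µmol: 1 µmol of an M-kDa protein weighs M mg."""
    if mass_kda <= 0:
        raise ValueError("mass must be positive")
    return activity_per_mg * mass_kda


if __name__ == "__main__":
    # Hypothetical numbers, NOT measurements from this study:
    # a 26 kDa enzyme alone vs. the same enzyme fused to an 18 kDa tag.
    untagged = activity_per_umol(10.0, 26.0)  # 260 U/µmol
    fused = activity_per_umol(6.0, 26.0 + 18.0)  # 264 U/µmol
    print(f"soluble fraction: {soluble_fraction(15.0, 85.0):.2f}")
    print(f"per mg: fused/untagged = {6.0 / 10.0:.2f}")
    print(f"per µmol: fused/untagged = {fused / untagged:.2f}")
```

In this hypothetical example, a 40% drop in activity per mg (ratio 0.60) corresponds to essentially unchanged activity per µmol (ratio about 1.02). Normalizing by molar amount may therefore matter when judging whether a tag impairs function.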
In summary, our results confirm that no single fusion tag is universally effective for all target proteins. The optimal tag must be empirically determined through parallel screening [18]. The pX vector system developed in this study provides a convenient and efficient framework for such screening, enabling rapid identification of the most suitable fusion tag to enhance both the solubility and functional yield of diverse recombinant proteins. The standardized design also holds potential for future adaptation to alternative cleavage systems (e.g., inteins) or high-throughput cloning methodologies [35–37].
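Tag removal at a sequence-specific protease site can be modeled as a search for the recognition motif followed by a cut at a fixed position. The sketch below assumes the canonical TEV site ENLYFQ followed by G or S at P1′. Cleavage occurs between Q (P1) and the P1′ residue, which remains on the target's N-terminus. TEV is known to accept a range of P1′ residues with varying efficiency [20], so restricting P1′ to G/S here is a simplification. The site pattern is a parameter, so another sequence-specific protease rule could in principle be substituted. Self-cleaving systems such as inteins work differently and are not modeled. The function name and the example fusion sequence are hypothetical.

```python
# Sketch: simulate protease removal of an N-terminal fusion tag.
# Assumption: canonical TEV site ENLYFQ followed by G or S (P1'); the cut
# falls between Q (P1) and P1', leaving P1' on the released target.
import re
from typing import Optional, Tuple

# Lookahead requires G/S at P1' without consuming it, so the match ends at Q.
TEV_SITE = re.compile(r"ENLYFQ(?=[GS])")


def cleave(fusion: str, site: re.Pattern = TEV_SITE) -> Optional[Tuple[str, str]]:
    """Split at the first recognition site; return (tag part, target part) or None."""
    match = site.search(fusion)
    if match is None:
        return None  # no cleavable site found
    cut = match.end()  # position right after P1
    return fusion[:cut], fusion[cut:]


if __name__ == "__main__":
    # Hypothetical fusion: His-tag, short spacer, TEV site, then target start
    print(cleave("MHHHHHHAAAKKENLYFQGMVSK"))
    # -> ('MHHHHHHAAAKKENLYFQ', 'GMVSK')
```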
Conclusions
In this study, we established a versatile parallel screening platform (the pX system), based on pET-28b(+), for enhancing the soluble expression of recombinant proteins in E. coli. The system integrates eight medium-sized, TEV-cleavable fusion tags into a standardized backbone, enabling high-throughput parallel cloning and screening. Our evaluation using three model proteins and six difficult-to-express target proteins demonstrates that fusion tags exert protein-specific effects on solubility, expression yield, and biological activity. No single tag was universally optimal, underscoring the need for empirical screening. The pX system addresses this need by facilitating rapid identification of the most suitable fusion partner for a given protein. Given its demonstrated ability to promote the soluble and functional expression of even recalcitrant proteins, this vector series is a valuable and streamlined tool for both academic and industrial protein research.
Supplementary Information
Supplementary material is available in the online version of this article.
Acknowledgements
Not applicable.
Abbreviations
- E. coli: Escherichia coli
- LB: Luria–Bertani
- IPTG: Isopropyl-β-D-1-thiogalactopyranoside
- MCS: Multiple cloning site
- TEV: Tobacco etch virus
- eGFP: Enhanced green fluorescent protein
- EcFabG: E. coli 3-ketoacyl-ACP reductase (FabG)
- Mals: Maltogenic amylase
- ArsC: Arsenate reductase
- Crr: Glucose-specific phosphotransferase system (PTS) enzyme IIA
- DsbA: Disulfide bond formation protein A
- Ecotin: E. coli trypsin inhibitor
- MsyB: An acidic E. coli protein
- SlyD: An aggregation-resistant protein
- Snut: Solubility ‘eNhancing’ Ubiquitous Tag
- YjgD: A hypothetical E. coli ORF
Author contributions
LLZ and MJC: Writing–original draft, Writing–review & editing, Data curation, Visualization. LLZ, CJT, TZY and CYQ: carried out all experiments. ZWB and MJC: planning, design, and coordination of the research. HZ and WHH: Project administration. All authors reviewed the manuscript.
Funding
This study was supported by the following projects: Industry-University Cooperation Project (H20230317) and National Natural Science Foundation of China (32570032).
Data availability
All of the data generated and used in this work are included in the manuscript and are available as supplementary material.
Declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Wen-Bin Zhang, Email: zhangwenbin@scau.edu.cn.
Jin-Cheng Ma, Email: majincheng@scau.edu.cn.
References
- 1. Hayat SMG, Farahani N, Golichenari B, Sahebkar A. Recombinant protein expression in Escherichia coli (E. coli): what we need to know. Curr Pharm Des. 2018;24(6):718–25. 10.2174/1381612824666180131121940.
- 2. Rosano GL, Ceccarelli EA. Recombinant protein expression in Escherichia coli: advances and challenges. Front Microbiol. 2014;5:172. 10.3389/fmicb.2014.00172.
- 3. Caswell J, Snoddy P, McMeel D, Buick RJ, Scott CJ. Production of recombinant proteins in Escherichia coli using an N-terminal tag derived from sortase. Protein Expr Purif. 2010;70(2):143–50. 10.1016/j.pep.2009.10.012.
- 4. Paraskevopoulou V, Falcone FH. Polyionic tags as enhancers of protein solubility in recombinant protein expression. Microorganisms. 2018;6(2):47. 10.3390/microorganisms6020047.
- 5. Waugh DS. An overview of enzymatic reagents for the removal of affinity tags. Protein Expr Purif. 2011;80(2):283–93. 10.1016/j.pep.2011.08.005.
- 6. Terpe K. Overview of tag protein fusions: from molecular and biochemical fundamentals to commercial systems. Appl Microbiol Biotechnol. 2003;60(5):523–33. 10.1007/s00253-002-1158-6.
- 7. di Guan C, Li P, Riggs PD, Inouye H. Vectors that facilitate the expression and purification of foreign peptides in Escherichia coli by fusion to maltose-binding protein. Gene. 1988;67(1):21–30. 10.1016/0378-1119(88)90004-2.
- 8. Malakhov MP, Mattern MR, Malakhova OA, Drinker M, Weeks SD, Butt TR. SUMO fusions and SUMO-specific protease for efficient expression and purification of proteins. J Struct Funct Genomics. 2004;5(1–2):75–86. 10.1023/b:Jsfg.0000029237.70316.52.
- 9. LaVallie ER, DiBlasio EA, Kovacic S, Grant KL, Schendel PF, McCoy JM. A thioredoxin gene fusion expression system that circumvents inclusion body formation in the E. coli cytoplasm. Nat Biotechnol. 1993;11(2):187–93. 10.1038/nbt0293-187.
- 10. Jeon WB. Retrospective analyses of the bottleneck in purification of eukaryotic proteins from Escherichia coli as affected by molecular weight, cysteine content and isoelectric point. BMB Rep. 2010;43(5):319–24. 10.5483/bmbrep.2010.43.5.319.
- 11. Song JA, Lee DS, Park JS, Han KY, Lee J. A novel Escherichia coli solubility enhancer protein for fusion expression of aggregation-prone heterologous proteins. Enzyme Microb Technol. 2011;49(2):124–30. 10.1016/j.enzmictec.2011.04.013.
- 12. Han K, Seo H, Song J, Ahn K, Park J, Lee J. Transport proteins PotD and Crr of Escherichia coli, novel fusion partners for heterologous protein expression. Biochim Biophys Acta. 2007;1774(12):1536–43. 10.1016/j.bbapap.2007.09.012.
- 13. Zhang Y, Olsen DR, Nguyen KB, Olson PS, Rhodes ET, Mascarenhas D. Expression of eukaryotic proteins in soluble form in Escherichia coli. Protein Expr Purif. 1998;12(2):159–65. 10.1006/prep.1997.0834.
- 14. Malik A, Rudolph R, Söhling B. A novel fusion protein system for the production of native human pepsinogen in the bacterial periplasm. Protein Expr Purif. 2006;47(2):662–71. 10.1016/j.pep.2006.02.018.
- 15. Zou Z, Cao L, Zhou P, Su Y, Sun Y, Li W. Hyper-acidic protein fusion partners improve solubility and assist correct folding of recombinant proteins expressed in Escherichia coli. J Biotechnol. 2008;135(4):333–9. 10.1016/j.jbiotec.2008.05.007.
- 16. Su Y, Zou Z, Feng S, Zhou P, Cao L. The acidity of protein fusion partners predominantly determines the efficacy to improve the solubility of the target proteins expressed in Escherichia coli. J Biotechnol. 2007;129(3):373–82. 10.1016/j.jbiotec.2007.01.015.
- 17. Han KY, Song JA, Ahn KY, Park JS, Seo HS, Lee J. Solubilization of aggregation-prone heterologous proteins by covalent fusion of stress-responsive Escherichia coli protein, SlyD. Protein Eng Des Sel. 2007;20(11):543–9. 10.1093/protein/gzm055.
- 18. Köppl C, Lingg N, Fischer A, Kröß C, Loibl J, Buchinger W, Schneider R, Jungbauer A, Striedner G, Cserjan-Puschmann M. Fusion tag design influences soluble recombinant protein production in Escherichia coli. Int J Mol Sci. 2022;23(14):7678. 10.3390/ijms23147678.
- 19. Cserjan-Puschmann M, Lingg N, Engele P, Kröß C, Loibl J, Fischer A, Bacher F, Frank A-C, Öhlknecht C, Brocard C, et al. Production of circularly permuted caspase-2 for affinity fusion-tag removal: cloning, expression in Escherichia coli, purification, and characterization. Biomolecules. 2020;10(12):1592. 10.3390/biom10121592.
- 20. Kapust RB, Tözsér J, Copeland TD, Waugh DS. The P1′ specificity of tobacco etch virus protease. Biochem Biophys Res Commun. 2002;294(5):949–55. 10.1016/S0006-291X(02)00574-0.
- 21. Wang HZ, Chu ZZ, Chen CC, Cao AC, Tong X, Ouyang CB, Yuan QH, Wang MN, Wu ZK, Wang HH, et al. Recombinant passenger proteins can be conveniently purified by one-step affinity chromatography. PLoS ONE. 2015;10(12):e0143598. 10.1371/journal.pone.0143598.
- 22. Cha HJ, Wu CF, Valdes JJ, Rao G, Bentley WE. Observations of green fluorescent protein as a fusion partner in genetically engineered Escherichia coli: monitoring protein expression and solubility. Biotechnol Bioeng. 2000;67(5):565–74. 10.1002/(SICI)1097-0290(20000305)67:5<565::AID-BIT7>3.0.CO;2-P.
- 23. Hu Z, Ma J, Chen Y, Tong W, Zhu L, Wang H, Cronan JE. Escherichia coli FabG 3-ketoacyl-ACP reductase proteins lacking the assigned catalytic triad residues are active enzymes. J Biol Chem. 2021;296:100365. 10.1016/j.jbc.2021.100365.
- 24. Haag AF, Wehmeier S, Muszyński A, Kerscher B, Fletcher V, Berry SH, Hold GL, Carlson RW, Ferguson GP. Biochemical characterization of Sinorhizobium meliloti mutants reveals gene products involved in the biosynthesis of the unusual lipid A very long-chain fatty acid. J Biol Chem. 2011;286(20):17455–66. 10.1074/jbc.M111.236356.
- 25. Studier FW, Moffatt BA. Use of bacteriophage T7 RNA polymerase to direct selective high-level expression of cloned genes. J Mol Biol. 1986;189(1):113–30. 10.1016/0022-2836(86)90385-2.
- 26. Studier FW. Protein production by auto-induction in high density shaking cultures. Protein Expr Purif. 2005;41(1):207–34. 10.1016/j.pep.2005.01.016.
- 27. Raran-Kurussi S, Cherry S, Zhang D, Waugh DS. Removal of affinity tags with TEV protease. Methods Mol Biol. 2017;1586:221–30. 10.1007/978-1-4939-6887-9_14.
- 28. Li Y, Su L, Wu J, Wu D. Recombinant expression and fermentation optimization of B. stearothermophilus maltogenic amylases in Bacillus subtilis. J Food Sci Biotechnol. 2020;39(02):1–9. 10.3969/j.issn.1673-1689.2020.02.001.
- 29. Zhao L, Cao J, Liu X, Li Y, Wu J, Su L. Optimizing protein folding in prokaryotes: strategies to enhance soluble expression of recombinant proteins. Bioresour Technol. 2026;439:133266. 10.1016/j.biortech.2025.133266.
- 30. Fang J, Zou L, Zhou X, Cheng B, Fan J. Synonymous rare arginine codons and tRNA abundance affect protein production and quality of TEV protease variant. PLoS ONE. 2014;9(11):e112254. 10.1371/journal.pone.0112254.
- 31. Park J-S, Han K-Y, Lee J-H, Song J-A, Ahn K-Y, Seo H-S, Sim S-JJ, Kim S-W, Lee J. Solubility enhancement of aggregation-prone heterologous proteins by fusion expression using stress-responsive Escherichia coli protein, RpoS. BMC Biotechnol. 2008;8:15. 10.1186/1472-6750-8-15.
- 32. Smith DB, Corcoran LM. Expression and purification of glutathione-S-transferase fusion proteins. Curr Protoc Mol Biol. 2001;Chapter 16:Unit 16.7. 10.1002/0471142727.mb1607s28.
- 33. Chen J-P, Gong J-S, Su C, Li H, Xu Z-H, Shi J-S. Improving the soluble expression of difficult-to-express proteins in prokaryotic expression system via protein engineering and synthetic biology strategies. Metab Eng. 2023;78:99–114. 10.1016/j.ymben.2023.05.007.
- 34. Raran-Kurussi S, Waugh DS. Expression and purification of recombinant proteins in Escherichia coli with a His(6) or dual His(6)-MBP tag. Methods Mol Biol. 2017;1607:1–15. 10.1007/978-1-4939-7000-1_1.
- 35. Prabhala SV, Gierach I, Wood DW. The evolution of intein-based affinity methods as reflected in 30 years of patent history. Front Mol Biosci. 2022;9:857566. 10.3389/fmolb.2022.857566.
- 36. Prabhala SV, Mayone SA, Moody NM, Kanu CB, Wood DW. A convenient self-removing affinity tag method for the simple purification of tagless recombinant proteins. Curr Protoc. 2023;3(10):e901. 10.1002/cpz1.901.
- 37. Lingg N, Kröß C, Engele P, Öhlknecht C, Köppl C, Fischer A, Lier B, Loibl J, Sprenger B, Liu J, et al. CASPON platform technology: ultrafast circularly permuted caspase-2 cleaves tagged fusion proteins before all 20 natural amino acids at the N-terminus. New Biotechnol. 2022;71:37–46. 10.1016/j.nbt.2022.07.002.