Protein Science: A Publication of the Protein Society. 2024 Oct 29;33(11):e5193. doi: 10.1002/pro.5193

Unique Fn3‐like biosensor in σI/anti‐σI factors for regulatory expression of major cellulosomal scaffoldins in Pseudobacteroides cellulosolvens

Sheng Dong 1,2,3,4, Chao Chen 1,2,3,4, Jie Li 1,2,3,4,9, Ya‐Jun Liu 1,2,3,4, Edward A Bayer 5,6, Raphael Lamed 7, Itzhak Mizrahi 6, Qiu Cui 1,2,3,4,8, Yingang Feng 1,2,3,4,
PMCID: PMC11520246  PMID: 39470320

Abstract

Lignocellulolytic clostridia employ multiple pairs of alternative σ/anti‐σ (SigI/RsgI) factors to regulate cellulosomal components for substrate‐specific degradation of cellulosic biomass. The current model has proposed that RsgIs use a sensor domain to bind specific extracellular lignocellulosic components and activate cognate SigIs to initiate expression of corresponding cellulosomal enzyme genes, while expression of scaffoldins can be initiated by several different SigIs. Pseudobacteroides cellulosolvens contains the most complex known cellulosome system and the highest number of SigI–RsgI regulons yet discovered. However, the function of many RsgI sensor domains and their relationship with the various enzyme types are not fully understood. Here, we report that RsgI4 from P. cellulosolvens employs a C‐terminal module that bears distant similarity to the fibronectin type III (Fn3) domain and serves as the sensor domain. Substrate‐binding analysis revealed that the Fn3‐like domain of RsgI4 represents a novel carbohydrate‐binding module (CBM) that binds to a wide range of polysaccharide types. Structure determination further revealed that the Fn3‐like domain belongs to the type B group of CBMs with a predicted concave face for substrate binding. Promoter sequence analysis of cellulosomal genes revealed that SigI4 is responsible for cellulosomal regulation of major scaffoldins rather than enzymes, consistent with the broad substrate specificity of the RsgI4 sensor domain. Notably, scaffoldins are invariably required as cellulosome components regardless of the substrate type. These findings suggest that the intricate cellulosome system of P. cellulosolvens comprises a more elaborate regulation mechanism than other bacteria and thus expands the paradigm of cellulosome regulation.

Keywords: carbohydrate‐binding module, cellulosome, NMR, regulation, X‐ray crystallography

1. INTRODUCTION

Lignocellulose is the most abundant renewable organic compound on Earth (Perez et al., 2002), and its conversion to soluble sugars is of global importance in terms of the production of biofuels and valuable chemicals (Gronenberg et al., 2013; Liu et al., 2019). Some cellulolytic bacteria produce large multi‐enzyme complexes (termed cellulosomes) for the efficient degradation of lignocellulose (Artzi et al., 2017; Bayer et al., 2008; Xiao et al., 2024). The composition of cellulosomes in anaerobic bacteria can change in response to the structural complexity and composition of the extracellular biomass substrates (Raman et al., 2009), and a group of alternative σI (SigI) and anti‐σI (RsgI) factors have been proposed to sense the extracellular substrates and to regulate the expression of the relevant cellulosomal genes (Kahel‐Raifer et al., 2010; Nataf et al., 2010; Ortiz de Ora et al., 2018) (Figure 1a). The mesophilic anaerobic bacterium Pseudobacteroides cellulosolvens, previously named Bacteroides cellulosolvens, produces the most extensive and intricate cellulosomal system known in nature (Zhivin et al., 2017; Zhivin‐Nissan et al., 2019), and its regulatory system is correspondingly elaborate. It contains 16 paralogous SigI factors, twice as many as the well‐studied cellulolytic, cellulosome‐producing bacterium Clostridium thermocellum (Kahel‐Raifer et al., 2010; Ortiz de Ora et al., 2018) (Figure 1b).

FIGURE 1.

The σI (SigI) and anti‐σI (RsgI) factors for cellulosome regulation in Pseudobacteroides cellulosolvens. (a) Schematic illustration of the functional mechanism of SigI/RsgI‐based cellulosome regulation. In the absence of extracellular substrate, the intracellular N‐terminal domain of RsgI (RsgI_N) binds to SigI and inhibits its activity. When the surface‐positioned sensor domain of RsgI binds to a specific substrate, the signal will be transferred into the cell through the periplasmic domain and the transmembrane helix by a unique regulated intramembrane proteolysis (RIP) mechanism. The RsgI–bound SigI is then released and recruits RNA polymerase to transcribe a specific set of cellulosomal genes. (b) RsgIs and their sensor domains in P. cellulosolvens. The function of the Fn3‐like domain in RsgI4 is unknown and the domain comprises the research objective in this study.

SigI–RsgI regulons resemble extra‐cytoplasmic function (ECF) σ factors, but they have distinct sequences, unique domain structures, and multiple paralogues in single bacterial species (Ortiz de Ora et al., 2018; Staron et al., 2009; Wei et al., 2019). SigI factors represent a novel group of σ70 factors according to their structures and promoter recognition modes (Li et al., 2023), while their RsgI counterparts contain a conserved N‐terminal cytoplasmic domain as an inhibitory domain for their respective SigIs (Wei et al., 2019) as well as a novel periplasmic domain which undergoes essential autoproteolysis for transmembrane signal transduction (Brogan et al., 2023; Chen et al., 2023; Takayesu et al., 2024) (Figure 1a). The C‐terminal modular elements of many of the anti‐σI factors comprise non‐conserved carbohydrate‐binding modules (CBMs) or glycoside hydrolases (GHs), which have been proposed to be sensor domains specific for different types of extracellular polysaccharide substrates (Bahari et al., 2011; Grinberg et al., 2019; Mahoney et al., 2022; Nataf et al., 2010; Yaniv et al., 2014). In addition, although RsgI sensor domains were proposed to have polysaccharide‐binding functions (Ortiz de Ora et al., 2018), many of them lack sequence homology with any known CBM or GH domain. For P. cellulosolvens, only five of the 16 SigI–RsgI paralogues contain sensor domains of known CBM/GH families, including a recently characterized PA14 domain (Grinberg et al., 2019; Ortiz de Ora et al., 2018) (Figure 1b). According to a previous report, the RsgI4 in P. cellulosolvens (PcRsgI4) contains a fibronectin type III (Fn3) domain as the sensor domain (Grinberg et al., 2019). However, its presumed Fn3 exhibits very low sequence identity (<25%) to Fn3 sequences with known structures, and its position on a phylogenetic tree is displaced from those of confirmed Fn3 domains (Figure S1). 
Hence, we herein refer to this domain as an “Fn3‐like domain”, and its abbreviated name, PcRsgI4_Fn3′ (where Pc stands for P. cellulosolvens), contains an apostrophe (′) to designate it as such. Because the polysaccharide‐binding functions of most Fn3 domains have not, to our knowledge, been experimentally established, the structure and function of PcRsgI4_Fn3′, which may represent a novel CBM, are worth studying.

CBMs are often found as modular appendages, associated with many GHs, and they interact specifically with polysaccharides to promote substrate hydrolysis through a targeting effect that delivers the catalytic GH module to the immediate proximity of the specific target polysaccharide substrate (Boraston et al., 2004; Gilbert et al., 2013; Guillen et al., 2010). Because CBMs generally exhibit autonomous folding, high substrate‐binding specificity and diversity, and owing to the abundantly available substrates (such as cellulose) with excellent chemical and physical properties (such as the strength, rigidity and recalcitrance characteristics of cellulose), CBMs have wide applications in the development of biotechnology and are of great interest in molecular engineering (Armenta et al., 2017; Shoseyov et al., 2006).

Thousands of CBMs have now been identified experimentally and are currently divided into 103 families, based on their amino acid sequence similarities in the CAZy (Carbohydrate‐Active enZYme) database (http://www.cazy.org/, accessed on May 25, 2024) (Lombard et al., 2014). Furthermore, some known CBMs that are not associated with catalytic CAZy modules, such as lectins, are registered as CBM_nc (CBMs non‐classified in any family) in the CAZy database. CBMs have also been grouped into three types (type A, type B, and type C) according to their structural and functional similarity (Boraston et al., 2004; Gilbert et al., 2013). Type A CBMs have flat or platform‐like binding sites rich in aromatic amino acid residues, complementary to the flat surfaces presented by crystalline polysaccharides like cellulose and chitin. The carbohydrate‐binding sites of type B CBMs are extended, often described as grooves or clefts, and comprise several subsites able to accommodate the individual sugar chain internally. Type C CBMs bind to the termini of glycan chains, and thus lack the extended binding‐site grooves and are generally able to bind to small sugars, such as mono‐, di‐, or trisaccharides. The structures of individual CBMs are of great importance for understanding their mechanism of substrate specificity and function, particularly for members of novel CBM families (Abbott & van Bueren, 2014; Gilbert et al., 2013).

Previous studies have suggested that the polysaccharides recognized by the sensor domain of RsgI reflect the substrates of the cellulosomal enzymes regulated by the cognate SigI (Bahari et al., 2011; Kahel‐Raifer et al., 2010; Nataf et al., 2010; Ortiz de Ora et al., 2018). Ortiz de Ora et al. (2018) identified 140 putative σI‐dependent promoters in P. cellulosolvens, analyzed the function of downstream cellulosomal genes, and proposed, for example, that PcSigI11 is likely involved in the regulation of genes encoding pectin‐degrading enzymes. This contention is indeed consistent with the function of the pectin‐binding sensor domain of PcRsgI11 (Grinberg et al., 2019; Ortiz de Ora et al., 2018). However, the functions of other SigI–RsgI factors in P. cellulosolvens are largely unknown, especially those with sensor domains that have hitherto unknown functions. In this work, we re‐analyzed the previously‐identified SigI‐dependent promoters in P. cellulosolvens and found that one of them, PcSigI4, is responsible for the regulation of genes encoding major scaffoldins rather than those encoding catalyzing enzymes. Furthermore, the structural and functional analysis of its cognate RsgI sensor domain, PcRsgI4_Fn3′, revealed its role as a novel CBM with broad polysaccharide‐binding specificity. The implications of these findings in cellulosome regulation are discussed.

2. MATERIALS AND METHODS

2.1. Materials and strains

Avicel, chitosan, and xylan (from beechwood) were purchased from Sigma Chemical Co. (St. Louis, MO). Wheat arabinoxylan (insoluble form) was purchased from Megazyme International, Ltd. (Wicklow, Ireland). Phosphoric acid‐swollen cellulose (amorphous cellulose) was prepared according to Lamed et al. (1985). Chitosan‐oligosaccharide (MW 800–1000) was purchased from Shanghai Macklin Biochemical Technology Co., Ltd. Escherichia coli DH5α was used for plasmid constructions, and E. coli BL21 (DE3) was used for protein expression.

2.2. Sequence analysis

Transmembrane helix prediction was performed using the TMHMM server (Krogh et al., 2001). The domains of PcRsgI4 (gene locus: Bccel_2225) were predicted by searching in the NCBI Conserved Domain Database (CDD) (https://www.ncbi.nlm.nih.gov/cdd) (Marchler‐Bauer et al., 2017) and Pfam database (http://pfam.xfam.org/) (El‐Gebali et al., 2019). Domain linkers were predicted using the DLP‐SVM server (http://domserv.lab.tuat.ac.jp/dlpsvm.html) (Ebina et al., 2009). Secondary structures were predicted using the PSIPRED server (Jones, 1999). The similarity of promoters was analyzed using a Perl script that counts the number of identical nucleotides. DNA sequence logos were generated with the WebLogo server (Crooks et al., 2004).
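The Perl script itself is not given in the text. As a rough illustration of the kind of comparison it describes, the following Python sketch counts identical nucleotides at matching positions of two pre‐aligned, equal‐length promoter regions. The function names and the example sequences are hypothetical and are not taken from the study.

```python
def count_identical(seq_a: str, seq_b: str) -> int:
    """Count positions at which two equal-length DNA sequences carry the same nucleotide."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    return sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b)


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Express the identical-position count as a percentage of the sequence length."""
    if not seq_a:
        raise ValueError("sequences must not be empty")
    return 100.0 * count_identical(seq_a, seq_b) / len(seq_a)


# Made-up sequences for illustration only (not real promoter elements).
promoter_1 = "ACGTTGACA"
promoter_2 = "ACGATGTCA"
identical = count_identical(promoter_1, promoter_2)
print(identical, round(percent_identity(promoter_1, promoter_2), 1))
```

A position‐by‐position count like this assumes the two regions are already aligned without gaps; handling insertions or deletions would require an explicit alignment step first.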

2.3. Gene cloning, protein expression, and purification

The DNA fragments of PcRsgI4_Fn3′L and PcRsgI4_Fn3′ (residues 416–538 and 413–516 of full‐length PcRsgI4, respectively) were amplified separately by PCR, using the primers listed in Table S1, and genomic DNA of P. cellulosolvens was used as the template. The PcRsgI4_Fn3′L fragment was cloned into pET30a between the NdeI and XhoI restriction sites, and the encoded recombinant protein contained a C‐terminal His‐tag for protein purification. The PcRsgI4_Fn3′ fragment was cloned into the pET28a‐SMT3 plasmid between the BamHI and XhoI restriction sites, and the protein encoded in the construct contained an N‐terminal His6‐SMT3 tag which could be removed by Ulp1 protease cleavage. All constructs were verified by DNA sequencing and transformed into BL21 (DE3) to produce recombinant proteins. Mutants of PcRsgI4_Fn3′ were constructed by the QuikChange method using designed primers and appropriate templates (Table S1).

Cells harboring the recombinant plasmids were grown to an OD600 of 0.8 in LB broth containing 30 μg/mL kanamycin at 37°C with shaking at 200 rpm. The expression of the target gene was induced by adding 0.2 mM IPTG, and the cultivation was continued at 16°C with shaking at 100 rpm overnight. Recombinant proteins were purified using Ni Sepharose High Performance resin (GE Healthcare), followed by gel filtration using a Superdex 75 column (GE Healthcare) according to the manufacturer's instructions. For proteins containing an N‐terminal His6‐SMT3 tag, samples were treated with Ulp1 protease for 3 h at 37°C before the gel filtration step. The purity of target proteins was assessed by Tricine–SDS–PAGE. Protein concentrations were determined by UV absorbance at 280 nm, using the theoretical molar extinction coefficient of each protein, calculated from its amino acid composition with the ProtParam tool on the ExPASy server (https://web.expasy.org/protparam/). The purified target proteins were stored at −80°C for further use.
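The concentration estimate from A280 can be sketched as follows. ProtParam documents its extinction coefficient as a weighted count of Trp, Tyr, and cystine residues (5500, 1490, and 125 M⁻¹ cm⁻¹, respectively), and the concentration then follows from the Beer–Lambert law. The sequence, absorbance reading, and function names below are hypothetical placeholders, not values from the study.

```python
def extinction_coefficient(sequence: str, cystines: int = 0) -> int:
    """Theoretical molar extinction coefficient at 280 nm (M^-1 cm^-1).

    Uses the per-residue values documented for ProtParam:
    Trp 5500, Tyr 1490, cystine (disulfide pair) 125.
    """
    seq = sequence.upper()
    return 5500 * seq.count("W") + 1490 * seq.count("Y") + 125 * cystines


def molar_concentration(a280: float, epsilon: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert law: c = A / (epsilon * l), returned in mol/L."""
    if epsilon <= 0 or path_cm <= 0:
        raise ValueError("epsilon and path length must be positive")
    return a280 / (epsilon * path_cm)


# Hypothetical peptide and absorbance reading, for illustration only.
example_sequence = "MWYKWFY"          # 2 Trp, 2 Tyr, no disulfides assumed
epsilon = extinction_coefficient(example_sequence)
concentration = molar_concentration(0.699, epsilon)
print(epsilon, concentration)
```

A protein without Trp or Tyr would give a coefficient near zero, in which case A280 is not a usable readout and the function raises an error instead of dividing by zero.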

The 15N‐ and [15N, 13C]‐labeled proteins for NMR experiments were obtained by cell cultivation using M9 minimal medium containing 15NH4Cl and [13C]‐glucose as sole nitrogen and carbon sources, respectively. The labeled proteins were purified using the same procedures as the unlabeled proteins.

2.4. NMR spectroscopy and structural calculations

The recombinant PcRsgI4_Fn3′L protein containing a C‐terminal His‐tag was used in the NMR study. NMR samples contained ~1.0 mM PcRsgI4_Fn3′L protein in 50 mM sodium acetate buffer (pH 5.0) with 90% H2O/10% D2O (v/v), 0.02% (w/v) NaN3, and 0.01% (w/v) sodium 2,2‐dimethylsilapentane‐5‐sulfonate (DSS). All NMR experiments were performed at 310 K on a Bruker Avance III 600 MHz NMR spectrometer using a triple‐resonance TCI cryoprobe equipped with a z‐gradient. Chemical shift assignments were derived from the 2D 1H–15N HSQC and 1H–13C HSQC, 3D 1H–13C–15N HNCACB, CBCA(CO)NH, HNCO, HN(CA)CO, HBHA(CBCA)(CO)NH, HBHA(CBCA)NH, H(C)CH‐TOCSY, and (H)CCH‐TOCSY spectra. The NOESY spectra for distance restraints of the structure calculation were obtained from the 3D 15N‐ and 13C‐edited NOESY‐HSQC spectra (mixing time 120 ms). All the spectra were processed using NMRPipe (Delaglio et al., 1995) and analyzed using NMRViewJ (Johnson, 2004). The backbone chemical shift assignments were obtained using the program MARS (Jung & Zweckstetter, 2004) with manual verification. Side chain assignments were obtained manually in NMRViewJ (Johnson, 2004). Chemical shifts were referenced according to IUPAC recommendations using the internal DSS (Markley et al., 1998). Backbone 15N T1, T2, and 1H–15N steady‐state heteronuclear NOE experiments were performed using standard pulse programs (Feng et al., 2014), and the 15N relaxation rate constants R1 and R2 and the 1H–15N heteronuclear NOE values were calculated using NMRViewJ (Johnson, 2004).
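The text does not describe how NMRViewJ fits the relaxation decays. As a swapped‐in illustration of how a rate constant such as R1 or R2 can be extracted from peak intensities, the sketch below assumes a single‐exponential decay, I(t) = I0·exp(−R·t), and fits it by linear least squares on ln(I). The delays and intensities are synthetic, and the function name is hypothetical.

```python
import math


def fit_relaxation_rate(delays_s, intensities):
    """Fit I(t) = I0 * exp(-R * t) by least squares on ln(I); return R in s^-1."""
    if len(delays_s) != len(intensities) or len(delays_s) < 2:
        raise ValueError("need at least two (delay, intensity) pairs of equal length")
    log_i = [math.log(value) for value in intensities]
    n = len(delays_s)
    mean_t = sum(delays_s) / n
    mean_y = sum(log_i) / n
    sxx = sum((t - mean_t) ** 2 for t in delays_s)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(delays_s, log_i))
    # The slope of ln(I) versus t is -R.
    return -sxy / sxx


# Synthetic, noise-free decay generated with a known rate of 12.5 s^-1.
delays = [0.01, 0.03, 0.05, 0.09, 0.13]
peaks = [100.0 * math.exp(-12.5 * t) for t in delays]
rate = fit_relaxation_rate(delays, peaks)
print(round(rate, 3), round(1.0 / rate, 4))  # rate constant and the time constant T = 1/R
```

A log‐linear fit weights noisy low‐intensity points more heavily than a direct nonlinear fit would, so dedicated NMR software generally fits the exponential itself; the sketch only shows the relationship between the decay and the rate constant.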

Initial PcRsgI4_Fn3′L structures were calculated using the CANDID module of the CYANA program (Herrmann et al., 2002) with the NOE peak lists and the chemical shift‐derived dihedral angle restraints from the program TALOS‐N (Shen & Bax, 2015). The PcRsgI4_Fn3′L structures were refined using the CNS program (Brünger et al., 1998) and scripts from RECOORDScript (Nederveen et al., 2005), with the distance restraints derived by the semi‐automatic assignment program SANE (Duggan et al., 2001) and the dihedral angle restraints. Hydrogen bond restraints were also introduced according to the secondary structure elements in the late stage of the refinement. A family of 100 structures was generated by CNS, and 50 structures with the lowest energies were subjected to the refinement in explicit water using CNS and RECOORDScript, from which a final set of 20 structures with the lowest energies was selected to represent the final ensemble of PcRsgI4_Fn3′L structures. The final structures were analyzed using PROCHECK_NMR (Laskowski et al., 1996), MOLMOL (Koradi et al., 1996), PyMol (DeLano Scientific LLC; http://www.pymol.org), and WHAT_CHECK (Hooft et al., 1996). PyMol was used for the visualization of the structures. The structural similarity was analyzed using the Dali (Holm & Rosenstrom, 2010) and SSM servers (Krissinel & Henrick, 2004).

2.5. Crystallization, data collection, structure determination, and refinement

The recombinant PcRsgI4_Fn3′L and PcRsgI4_Fn3′ proteins were subjected to crystallization. The PcRsgI4_Fn3′L (~30 mg/mL) and PcRsgI4_Fn3′ protein (~43 mg/mL) in the buffer (10 mM Tris–HCl, 100 mM NaCl) were crystallized at 18°C by using commercial high‐throughput screening kits (Hampton Research) and a Formulatrix NT8 liquid handling robot operating with 96‐well plates. The crystallization conditions were further optimized in 24‐well crystallization plates. The PcRsgI4_Fn3′L crystals used for X‐ray data collection were obtained in 0.1 M phosphate‐citrate (pH 4.2), 0.2 M lithium sulfate, and 19% PEG1000. The PcRsgI4_Fn3′ crystals were obtained in 0.2 M ammonium acetate, 0.1 M Bis–Tris (pH 6.5) and 25% PEG3350. All the diffraction‐quality crystals were cryoprotected by soaking in a well solution, supplemented with 30% (v/v) glycerol for 10 s, and then flash‐cooled to 100 K in liquid nitrogen.

The data were collected at the Shanghai Synchrotron Radiation Facility (SSRF), Beamlines BL17U1 and BL19U1, in a 100 K nitrogen stream (Wang et al., 2018; Zhang et al., 2019). Data indexing, integration, and scaling were conducted using XDS (Kabsch, 2010). The crystal structures of PcRsgI4_Fn3′L and PcRsgI4_Fn3′ were determined by molecular replacement using Phaser (McCoy et al., 2007) with the solution structure of PcRsgI4_Fn3′L as the search model. Refinements of the structures were performed iteratively, using the programs COOT (Emsley et al., 2010) and PHENIX (Adams et al., 2010). All molecular graphics were created using PyMOL.

2.6. Structure deposition

The solution structures and the chemical shift assignments of PcRsgI4_Fn3′L have been deposited into the Protein Data Bank and the BioMagResBank under accession numbers 7CG1 and 36358, respectively. The crystal structures of PcRsgI4_Fn3′L and PcRsgI4_Fn3′ have been deposited into the Protein Data Bank under accession numbers 7CG5 and 7CG8, respectively.

2.7. Polysaccharide binding assay

Qualitative assessment of binding to the insoluble polysaccharides was achieved using Avicel, amorphous cellulose, chitosan, insoluble arabinoxylan, and xylan. Insoluble xylan was pretreated as described previously (Moraïs et al., 2010). Proteins (40 μg) were mixed with 50 mg insoluble polysaccharides in 80 μL of 50 mM Tris–HCl buffer (pH 7) and incubated on ice for 5 h with occasional stirring. After centrifugation at 15,000 × g, 4°C for 10 min, supernatant fluids (unbound proteins) were collected, and pellets were washed twice with 1 mL Tris–HCl buffer (pH 7) to reduce nonspecific binding. Proteins that bound to polysaccharides were then boiled with sample buffer (80 μL containing SDS and DTT) for 10 min to dissociate the bound protein from the pellets. Bound and unbound fractions were analyzed by Tricine–SDS–PAGE. Controls with proteins without polysaccharides were included to ensure that precipitation did not occur during the assay.

3. RESULTS

3.1. Sequence analysis of PcRsgI4

The PcRsgI4 protein, encoded by the gene Bccel_2225, contains 597 amino acid residues and includes an intracellular N‐terminal RsgI_N domain (residues 1–58), a transmembrane helix (residues 59–82), a periplasmic domain (residues 83–275), a disordered linker (residues 276–414), a fibronectin type III‐like (Fn3‐like) domain (residues 416–516), and a disordered C‐terminal region (residues 517–597) (Figures 2a and S2). According to previous functional studies of RsgI (Bahari et al., 2011; Kahel‐Raifer et al., 2010; Munoz‐Gutierrez et al., 2016; Nataf et al., 2010; Yaniv et al., 2014), the C‐terminal Fn3‐like domain of PcRsgI4 (PcRsgI4_Fn3′) was considered the potential sensor domain for binding carbohydrate components of lignocellulose.
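The residue ranges listed above can be captured in a small lookup table, which makes it easy to check which region a given residue or construct boundary falls in. This is only a bookkeeping sketch: the ranges and construct boundaries are copied from the text, while the variable and function names are hypothetical.

```python
# Domain boundaries of PcRsgI4 as listed in the text (1-based, inclusive).
DOMAINS = [
    ("RsgI_N domain", 1, 58),
    ("transmembrane helix", 59, 82),
    ("periplasmic domain", 83, 275),
    ("disordered linker", 276, 414),
    ("Fn3-like domain", 416, 516),
    ("disordered C-terminal region", 517, 597),
]

# Constructs used in the study (residue ranges from the text).
CONSTRUCTS = {
    "PcRsgI4_Fn3'L": (416, 538),
    "PcRsgI4_Fn3'": (413, 516),
}


def domain_of(residue: int):
    """Return the name of the region containing the residue, or None if unlisted."""
    for name, start, end in DOMAINS:
        if start <= residue <= end:
            return name
    return None


def construct_length(name: str) -> int:
    """Number of PcRsgI4 residues covered by a construct (tags not included)."""
    start, end = CONSTRUCTS[name]
    return end - start + 1


print(domain_of(497))  # a residue inside the Fn3-like range
print(domain_of(415))  # residue 415 is not covered by any range as listed
print({name: construct_length(name) for name in CONSTRUCTS})
```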

FIGURE 2.

Sequence analysis of PcRsgI4. (a) Schematic diagram of the domain organization of PcRsgI4. The positions of the two constructs of the Fn3‐like domain used in this study are also indicated. (b) Secondary structures of PcRsgI4_Fn3′. The results of secondary structure prediction using the PSIPRED server (Jones, 1999) are shown as black arrows. The secondary structures according to the crystal and NMR structures of PcRsgI4_Fn3′L and PcRsgI4_Fn3′ are shown as colored arrows, where the strands in magenta and cyan form the two respective β‐sheets of the β‐sandwich structures. The start and end points for the constructs are indicated by small arrows.

We identified several proteins with homologous sequences of PcRsgI4_Fn3′ using the BLAST search in the NCBI non‐redundant protein sequences database. However, the sequence identity of PcRsgI4_Fn3′ to these proteins is low (<30%), and all of the proteins were annotated as either hypothetical proteins or predicted proteins that have no relationship with cellulases or CBMs, indicating the unique properties of PcRsgI4_Fn3′ among known proteins. According to the secondary structure prediction by the PSIPRED server, PcRsgI4_Fn3′ is rich in β‐strands (Figures 2b and S2), which is in agreement with known Fn3 structures. However, we did not identify any homology with the known protein structures using a BLAST search against the PDB database. These results indicated that PcRsgI4_Fn3′ represents a unique Fn3‐like domain with a potential carbohydrate‐binding function that warrants further elucidation.

Notably, PcRsgI4 contains two additional predicted β‐strands (residues 521–526, 534–538) after the C‐terminus of the predicted Fn3‐like domain (Figure 2b). Therefore, we constructed two fragments PcRsgI4_Fn3′L and PcRsgI4_Fn3′, containing residues 416–538 and 413–516 of PcRsgI4 (Figure 2a), respectively, for the following structure and function study.

3.2. Binding of PcRsgI4_Fn3′ to polysaccharides

During protein purification, we found that the retention volume of PcRsgI4_Fn3′ and PcRsgI4_Fn3′L in gel filtration chromatography was much larger than the expected volume (about 19 mL on a 24‐mL Superose 6 column and 96 mL on a 110‐mL Superdex 75 column) according to the deduced molecular weight. The retention volume on the Superose 6 column is close to the column volume (CV) and that on the Superdex 75 column is even much greater than one CV (Figures 3a,b and S3a). In addition, the elution peak of the PcRsgI4_Fn3′ fractions was broader than that anticipated for a homogeneous monomeric species (Figures 3a and S3a). One potential rationale for the unusual retention volume and the broad elution peak is the protein's retarded retention on the column due to interaction with the polysaccharide‐based column resin (cross‐linked agarose and dextran in the Superdex 75 column and cross‐linked porous agarose in the Superose 6 column), which is consistent with the proposed polysaccharide‐binding function of PcRsgI4_Fn3′. To corroborate the carbohydrate‐binding function of PcRsgI4_Fn3′, we tested its binding capacity to different insoluble polysaccharides, including crystalline cellulose (Avicel), amorphous cellulose, chitosan, arabinoxylan, and xylan from beechwood. As shown in Figure 3c, PcRsgI4_Fn3′ exhibited extensive binding to chitosan and arabinoxylan, medium binding to amorphous cellulose and xylan, and weak binding to Avicel. PcRsgI4_Fn3′L showed a carbohydrate‐binding profile similar to that of PcRsgI4_Fn3′ (Figure S3c). The broad polysaccharide‐binding profile of PcRsgI4_Fn3′ is similar to that of many type B CBMs, which exhibit weak binding to crystalline cellulose but strong binding to various glycan chains (Boraston et al., 2004). Since PcRsgI4_Fn3′ does not show significant sequence identity to any known CBMs in a BLAST search, these data indicated that PcRsgI4_Fn3′ is a novel CBM with broad polysaccharide‐binding specificity.

FIGURE 3.

Binding of PcRsgI4_Fn3′ to polysaccharides. (a) Gel filtration chromatography of SMT3‐tagged PcRsgI4_Fn3′ after Ulp1 protease treatment. A Superose 6 gel filtration column with a column volume (CV) of 24 mL was used. (b) Tricine–SDS–PAGE of samples during purification. M represents protein molecular weight standards; E and E + Ulp1 are the samples before and after Ulp1 protease treatment; the other lanes are labeled according to the eluted fraction numbers indicated in panel (a). (c) Binding of PcRsgI4_Fn3′ to insoluble polysaccharides. BSA was used as a negative control. The same amount of protein used in the binding assay but without polysaccharide was included as a control. The bound (B) and unbound (U) fractions were analyzed by Tricine–SDS–PAGE.

3.3. Structure of PcRsgI4_Fn3′

We first solved the solution structure of PcRsgI4_Fn3′L (RMSD 0.38 ± 0.06 Å for backbone atoms of all residues) using NMR spectroscopy (Table 1, Figure 4a). Crystal structures of both the extended and core Fn3‐like constructs, PcRsgI4_Fn3′L (2.85 Å) and PcRsgI4_Fn3′ (1.50 Å), were then obtained by molecular replacement using the NMR structure as the search model (Table 2 and Figure 4b,c). The crystal structures of PcRsgI4_Fn3′L and PcRsgI4_Fn3′ showed one and four molecules in an asymmetric unit, respectively. Two molecules of polyethylene glycol (PEG, a component of the crystallization buffer) were observed in the structure of PcRsgI4_Fn3′ (Figure 4c). The packing interfaces in the two crystal structures are completely different, and NMR relaxation experiments revealed that PcRsgI4_Fn3′L is largely monomeric in the solution state (Figure S4), so the functional state of PcRsgI4_Fn3′ is also likely a monomer. The crystal and NMR structures are essentially the same, except for some flexible loop regions (most visible in the loop between β2 and β3) and the margins of the β‐strands (Figures 4d and S5). The C‐terminal region (residues 517–538) of PcRsgI4_Fn3′L is disordered in both the NMR and crystal structures, which is different from the results of the secondary structure prediction (Figure 2b). Without the C‐terminal disordered region, the crystal structure of PcRsgI4_Fn3′ was obtained at high resolution (1.50 Å). PcRsgI4_Fn3′ adopted a typical β‐sandwich consisting of two β‐sheets, wherein one is composed of three antiparallel β‐strands, β1, β2, and β5, and the other is formed by four antiparallel β‐strands, β3, β4, β6, and β7 (Figure 4b,d). The crystal structure of PcRsgI4_Fn3′L has an additional β‐strand (β7′), which forms a loop in the PcRsgI4_Fn3′ structure, owing to the binding of PEG (Figure 4b,d). PcRsgI4_Fn3′/Fn3′L has an extended cleft (approximately 15 Å) on the second β‐sheet (Figure 4b), which comprises several polar and aromatic amino acid residues. 
In the crystal structure of PcRsgI4_Fn3′, two PEG molecules are bound in the clefts of two adjacent PcRsgI4_Fn3′ molecules (Figure 4c).

TABLE 1.

The experimental restraints and structural statistics for the solution NMR structure of PcRsgI4_Fn3′L.

| Parameter | PcRsgI4_Fn3′L |
| --- | --- |
| PDB number | 7CG1 |
| NOE restraints | |
| Intra‐residues | 838 |
| Sequential | 582 |
| Medium‐range | 181 |
| Long‐range | 693 |
| Ambiguous | 1039 |
| Total | 3333 |
| Number per residue | 25.25 (132 a.a.) |
| Hydrogen bond restraints | 74 |
| Torsion angle restraints | |
| Phi angle restraints | 81 |
| Psi angle restraints | 81 |
| Chi angle restraints | 56 |
| Violations | |
| Max. NOE violation (Å) | 0.177 |
| Max. torsion angle violation (°) | 3.73 |
| RMSD from mean structure (Å) | |
| Residues in regular secondary structure (a) | |
| Backbone heavy atoms | 0.26 ± 0.05 |
| All heavy atoms | 0.54 ± 0.05 |
| All residues (b) | |
| Backbone heavy atoms | 0.38 ± 0.06 |
| All heavy atoms | 0.75 ± 0.08 |
| Ramachandran statistics (b) | |
| Most favored region (%) | 87.0 |
| Additionally allowed (%) | 12.9 |
| Generously allowed (%) | 0.2 |
| Disallowed (%) | 0.0 |
| WHAT_CHECK Z‐scores (c) | |
| 1st generation packing quality | −1.737 |
| 2nd generation packing quality | −1.711 |
| Ramachandran plot appearance | −3.544 |
| chi‐1/chi‐2 rotamer normality | −2.583 |
| Backbone conformation | −2.067 |
| Inside/outside distribution | 1.101 |

(a) The residues in regular secondary structure include 3–10, 17–25, 37–45, 53–60, 65–71, 79–86, 90–91, and 95–98 of PcRsgI4_Fn3′L.

(b) Residues 2–27 and 37–102 are included and the flexible regions are excluded.

(c) For the Z‐scores, more positive values are better.
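As a quick consistency check on the restraint counts in Table 1, the short sketch below sums the NOE restraint classes and divides by the 132 residues stated in the table. The numbers are copied from the table; only the variable names are introduced here.

```python
# NOE restraint counts as listed in Table 1.
noe_restraints = {
    "intra-residue": 838,
    "sequential": 582,
    "medium-range": 181,
    "long-range": 693,
    "ambiguous": 1039,
}
residue_count = 132  # "132 a.a." as stated in the table

total_noe = sum(noe_restraints.values())
noe_per_residue = total_noe / residue_count
print(total_noe, round(noe_per_residue, 2))
```

The sum and the per‐residue value agree with the "Total" and "Number per residue" rows of the table.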

FIGURE 4.

Overall structure of PcRsgI4_Fn3′. (a) Cartoon representation of the ensemble of 20 PcRsgI4_Fn3′L solution structures. The structures are colored in blue‐to‐red rainbow from the N‐terminus to the C‐terminus, except that the C‐terminal disordered residues (only six residues shown for clarity) were colored in gray. (b) Crystal structure of PcRsgI4_Fn3′L in a crystal asymmetric unit. (c) Crystal structure of PcRsgI4_Fn3′ in a crystal asymmetric unit. PEG molecules are shown as blue sticks. (d) Superimposition of four PcRsgI4_Fn3′ molecules in an asymmetric unit and PcRsgI4_Fn3′L in the crystal structures. (e) Structural alignments of PcRsgI4_Fn3′L and some proteins exhibiting similarity, as identified by the SSM server. PcRsgI4_Fn3′L is shown in marine blue while other proteins are shown in various colors. The substrate‐binding residues and the bound carbohydrates are shown as sticks. The Z‐scores and RMSDs reported by the SSM server are labeled.

TABLE 2.

Crystallographic data collection and refinement statistics of PcRsgI4_Fn3′L and PcRsgI4_Fn3′.

| Parameter | PcRsgI4_Fn3′L | PcRsgI4_Fn3′ |
| --- | --- | --- |
| PDB number | 7CG5 | 7CG8 |
| Data collection | | |
| Space group | R 32:h | P 21 21 21 |
| a, b, c (Å) | 71.23, 71.23, 127.37 | 54.59, 71.05, 100.44 |
| α, β, γ (°) | 90, 90, 120 | 90, 90, 90 |
| Wavelength (Å) | 0.97849 | 0.979183 |
| Resolution (Å) | 44.31–2.85 (2.693–2.85) (a) | 47.96–1.50 (1.54–1.50) (a) |
| Unique reflections | 5638 (401) (a) | 120,016 (8817) (a) |
| Completeness (%) | 100.0 (100.0) (a) | 99.3 (98.6) (a) |
| Redundancy | 10.1 (10.2) (a) | 3.5 (3.3) (a) |
| Mean I/σ(I) | 9.12 (2.58) (a) | 11.44 (1.80) (a) |
| Rmeas | 0.174 (0.672) | 0.074 (0.837) |
| CC(1/2) | 0.995 (0.971) | 0.998 (0.664) |
| Refinement | | |
| Rwork/Rfree (%) | 24.06/28.59 | 15.99/18.79 |
| MolProbity clash score | 3.23 | 2.40 |
| No. of non‐H atoms | | |
| Protein | 766 | 3204 |
| Ligand | 0 | 107 |
| Water | 2 | 580 |
| B‐factors | | |
| Average B‐factor | 66.64 | 22.13 |
| Proteins | 66.70 | 20.40 |
| Ligands | | 26.91 |
| Solvent | 42.90 | 30.84 |
| RMSD | | |
| Bond length (Å) | 0.007 | 0.008 |
| Bond angles (°) | 0.94 | 0.90 |
| Ramachandran statistics | | |
| Favored (%) | 97.92 | 97.78 |
| Allowed (%) | 2.08 | 2.22 |
| Outliers (%) | 0.00 | 0.00 |

(a) Numbers in parentheses refer to data in the highest resolution shell.

Because β‐sandwich structures exist widely in many proteins, searches using the Dali or SSM servers identified hundreds of structures with significant similarity to PcRsgI4_Fn3′. Most structures with high scores are Fn3‐domain‐containing proteins, indicating that PcRsgI4_Fn3′ is indeed structurally related to Fn3 domains. The structure with the highest score is that of a human cell‐surface receptor Fn3 domain (PDB 3S8W) (Thomas et al., 2011), but this protein and PcRsgI4_Fn3′ share very low (~15%) sequence identity. Several carbohydrate‐related proteins were identified in the search, including a chitinase Fn3 domain (Jee et al., 2002), a non‐cellulosomal cohesin of a family 84 glycoside hydrolase from Clostridium perfringens (Ficko‐Blean et al., 2009), and several CBMs (Boraston et al., 2006; Hettle et al., 2017; Montanier et al., 2010; Sauve et al., 2007; Venditto et al., 2016) (Figure 4e). Although all of these proteins have β‐sandwich structures, the length, position, and number of β‐strands vary to some extent. The residues of the substrate‐binding sites in the CBMs are completely different and are not conserved in PcRsgI4_Fn3′. In known type B CBMs with β‐sandwich structures (Abbott & van Bueren, 2014), there are two major substrate‐binding sites: the variable loop site (VLS) and the concave face site (CFS). VLS is located at one end of the β‐sandwich (such as CBM60 in Figure 4e) while CFS is located at the center of one β‐sheet (such as CBM25, CBM41, and CBM80 in Figure 4e). The cleft bound with PEG in the PcRsgI4_Fn3′ structure is also located at the position corresponding to CFS of CBMs, so we speculate that the cleft of PcRsgI4_Fn3′ is the substrate‐binding site.

3.4. Critical amino acid residues in ligand binding

To determine the key amino acid residues in polysaccharide binding, we performed NMR titration experiments using xylose, glucose, fructose, cellobiose, cellotetraose, and chitosan‐oligosaccharide (MW 800–1000). No chemical shift perturbation was observed in the 1H–15N‐HSQC spectra of PcRsgI4_Fn3′ during the NMR titration experiments (data not shown). This suggested that PcRsgI4_Fn3′ probably binds only glycan chains or longer oligosaccharides. Interestingly, two PEG molecules with an unambiguous electron density map occupy the potential carbohydrate‐binding site in the crystal structure of PcRsgI4_Fn3′ (Figures 4c and 5a). Analysis of the PEG‐binding site in the PcRsgI4_Fn3′ structure revealed that K497 forms hydrogen bonds with three oxygen atoms of PEG (with distances approximately 2.9–3.4 Å) and the aromatic rings of W458, F502, and Y503 form CH/π interactions with PEG (Figure 5b). Furthermore, several water molecules form bridging hydrogen bonds between PEG and residues Q495 and S505. We speculate that these PEG‐binding residues are potential polysaccharide‐binding sites.
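As a rough illustration of how contacts such as the K497–PEG hydrogen bonds can be listed from atomic coordinates, the following minimal sketch flags donor–acceptor pairs that lie within a distance cutoff. The coordinates, the atom labels, and the 3.5 Å cutoff are hypothetical placeholders chosen for illustration; they are not taken from the deposited structure, and this is not the procedure used by the authors.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


def contacts_within(
    donors: Dict[str, Coord],
    acceptors: Dict[str, Coord],
    cutoff: float = 3.5,  # assumed distance criterion in angstroms
) -> List[Tuple[str, str, float]]:
    """Return (donor, acceptor, distance) for every pair within the cutoff."""
    hits = []
    for d_name, d_xyz in donors.items():
        for a_name, a_xyz in acceptors.items():
            dist = math.dist(d_xyz, a_xyz)  # Euclidean distance (Python 3.8+)
            if dist <= cutoff:
                hits.append((d_name, a_name, round(dist, 2)))
    return hits


# Hypothetical toy coordinates (angstroms), NOT from PDB 7CG8.
toy_donors = {"LYS_NZ": (0.0, 0.0, 0.0)}
toy_acceptors = {
    "PEG_O1": (3.0, 0.0, 0.0),
    "PEG_O2": (0.0, 3.4, 0.0),
    "PEG_O3": (4.0, 3.0, 0.0),
}

toy_hits = contacts_within(toy_donors, toy_acceptors)
for donor, acceptor, dist in toy_hits:
    print(f"{donor} -> {acceptor}: {dist} A")
```

A distance filter of this kind only nominates candidate hydrogen bonds; geometry (donor and acceptor angles) and the electron density would still need to be inspected.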

FIGURE 5.

Analysis of the important amino acid residues in the polysaccharide‐binding cleft of PcRsgI4_Fn3′. (a) A PEG molecule bound at the proposed binding cleft of PcRsgI4_Fn3′. The 2mFo−DFc density for the PEG molecule is shown as a mesh at the 1σ level. (b) Analysis of the amino acid residues in the PEG‐binding site of PcRsgI4_Fn3′. Residues from the two protein molecules A and B are shown as green and cyan sticks, respectively. Water molecules are shown as red balls. Black and red dashed lines denote hydrogen bonds and CH/π interactions, respectively. Distances between the atoms involved are shown in angstroms. The right panels show two regions in a different orientation for clarity. (c) Gel filtration analysis of PcRsgI4_Fn3′ and its mutants. A Superose 6 gel filtration column with a column volume (CV) of 24 mL was used in these experiments. (d) Binding of PcRsgI4_Fn3′ mutants to various insoluble polysaccharides. The same amount of protein without polysaccharide was included as a control. The bound (B) and unbound (U) fractions were analyzed by Tricine–SDS–PAGE. M represents the protein molecular weight standard, and the displayed band is 14.4 kDa.

Alanine substitutions were introduced at the potential polysaccharide‐binding residues N452, W458, Q495, K497, F502, and Y503 to examine their effects on polysaccharide binding. The mutant W458A was expressed in insoluble inclusion bodies, which suggested that the side chain of W458 is critical for protein folding. We then constructed the mutants W458L, W458F, and W458Y; W458L was still insoluble, whereas W458F and W458Y were soluble, indicating that an aromatic ring at this position is indispensable for folding. The matrix‐binding ability of the mutants was revealed by the change in retention volume in gel filtration. Most mutants had smaller retention volumes on the Superose 6 gel filtration column than the wild‐type protein, except for N452A, which showed the same retention volume as the wild‐type protein (Figure 5c). This result demonstrated different levels of reduction in their capacity to bind the gel matrix. The binding of PcRsgI4_Fn3′ mutants to specific insoluble polysaccharides was then examined (Figure 5d). All of the mutants showed reduced binding to polysaccharides, especially F502A and Y503A, which were found mainly in the unbound fractions. These results demonstrated that the cleft is the carbohydrate‐binding site and that the aromatic residues are critical for carbohydrate binding. The non‐aromatic hydrophilic residues in the cleft also contribute to the binding affinity for different substrates. Furthermore, K497A showed significantly weakened xylan binding but only a slight change in chitosan and arabinoxylan binding, suggesting that the hydrophilic residues may play different roles in the binding of various substrates.

3.5. The PcRsgI4/PcSigI4 complex is responsible for regulating the expression of scaffoldins

According to previous studies (Bahari et al., 2011; Kahel‐Raifer et al., 2010; Munoz‐Gutierrez et al., 2016; Nataf et al., 2010; Ortiz de Ora et al., 2018; Wei et al., 2019), the interactions between SigI and RsgI and between SigI and its target promoters are both specific. The substrate recognized by the RsgI sensor domain is generally of the same type as that acted on by the enzymes under the transcriptional control of the cognate SigI, although some crosstalk may occur (Bahari et al., 2011; Grinberg et al., 2019; Kahel‐Raifer et al., 2010; Munoz‐Gutierrez et al., 2016; Nataf et al., 2010; Ortiz de Ora et al., 2018; Yaniv et al., 2014). We therefore asked which genes are regulated by PcSigI4, given that PcRsgI4 can recognize various types of polysaccharides. To this end, we re‐analyzed the cellulosomal promoters in P. cellulosolvens reported in a previous study (Ortiz de Ora et al., 2018). The promoter sequences were sorted according to their similarity (number of identical nucleotides) to the promoter sequences of the PcSigI genes (Table S2), except those of PcSigI5, PcSigI7, PcSigI10, and three SigI–RsgI fused proteins, whose promoters have not been identified. The results indicated that, with the promoters of PcSigI1 and PcSigI11 as the reference, the top‐ranked sequences include the promoters of xylanases and a pectin esterase, respectively. These results are in agreement with previous analyses, which indicated that the sensor domains of PcRsgI1 and PcRsgI11 bind to xylan and pectin, respectively, although both RsgIs contain PA14 domains as the sensor domain (Zhivin‐Nissan et al., 2019). This suggested a close relationship between the function of the RsgI sensor domain and the regulated targets of the cognate SigI and also demonstrated the validity of our analysis.
Interestingly, several of the top‐ranked sequences with high similarity to the PcSigI4 promoter were promoters of genes that encode major scaffoldins (Figure 6), including (i) the largest primary scaffoldin ScaA1, (ii) the largest cell‐free secondary scaffoldin ScaE, (iii) the largest cell‐free primary scaffoldin ScaM1, and (iv) a primary scaffoldin ScaL2. Notably, the cohesin types of P. cellulosolvens are reversed relative to those of all other clostridial species, such that the type II cohesins appear in the primary scaffoldins whereas the type I cohesins occur in the secondary scaffoldins (Ding et al., 2000; Xu et al., 2004). Another large, abundant primary scaffoldin, ScaA2 (Bccel_1393), was not identified here, because no σI‐dependent promoter was identified for this gene (Ortiz de Ora et al., 2018). Scaffoldins ScaA1, ScaE, and ScaL2 are the most abundant scaffoldins, according to proteomic analysis of the P. cellulosolvens cellulosomal components (Zhivin‐Nissan et al., 2019). Therefore, the major scaffoldins in P. cellulosolvens are likely regulated by PcSigI4. The consensus sequence of the scaffoldin promoters contains a conserved CGTT motif in the −10 region and a conserved A at the 5′ end of the −35 region, thus distinguishing it from the consensus sequences recognized by other PcSigIs (Table S2).

FIGURE 6.

Putative σI4‐dependent promoters in P. cellulosolvens. The promoters are named after the gene locus and the serial number of the promoter (0 and 1 stand for the first and second σI‐dependent promoters of the gene, respectively). The column Id(s) is the number of nucleotides identical to the promoter of σI4 in the −10 and −35 regions (nucleotides in bold in the column sequence). The column Id(a) is the number of nucleotides identical to the promoter of σI4 in all nucleotides in the column sequence.

In summary, by analyzing the similarity between the promoter of PcSigI4 and the promoters of cellulosomal genes, we found that the promoters of the major scaffoldins share the highest similarity to that of PcSigI4, which suggested that the PcSigI4/RsgI4 pair is instrumental in regulating the expression of these major scaffoldins. To achieve efficient lignocellulose degradation, the major scaffoldins would be expected to be highly expressed, independently of the type of extracellular lignocellulose. Indeed, we observed broad substrate specificity of the presumed biosensor, PcRsgI4_Fn3′, which is consistent with a regulatory function of PcSigI4 directed towards scaffoldins rather than cellulosomal enzymes.
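The promoter ranking described in this section can be sketched as follows. This is a minimal illustration, assuming that the promoter sequences are already aligned to equal length and that the −35 and −10 regions are given as index ranges; the sequences, names, and region boundaries below are invented toy values, not entries from Table S2. The two scores mirror the Id(s) and Id(a) columns of Figure 6, and sorting first by Id(s) and then by Id(a) is our assumption, since the text does not specify how ties are resolved.

```python
from typing import Dict, List, Tuple

Region = Tuple[int, int]  # half-open [start, end) slice into the aligned sequence


def count_identical(a: str, b: str) -> int:
    """Count positions carrying the same nucleotide in two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return sum(x == y for x, y in zip(a.upper(), b.upper()))


def score(reference: str, candidate: str, regions: List[Region]) -> Tuple[int, int]:
    """Return (Id(s), Id(a)): identities in the -35/-10 regions and over all positions."""
    id_s = sum(count_identical(reference[s:e], candidate[s:e]) for s, e in regions)
    id_a = count_identical(reference, candidate)
    return id_s, id_a


def rank_promoters(
    reference: str, candidates: Dict[str, str], regions: List[Region]
) -> List[Tuple[str, int, int]]:
    """Sort candidates by Id(s), then Id(a), most similar first (assumed ordering)."""
    scored = [(name, *score(reference, seq, regions)) for name, seq in candidates.items()]
    return sorted(scored, key=lambda row: (row[1], row[2]), reverse=True)


# Hypothetical toy data: a 16-nt "reference promoter" with two 4-nt regions.
toy_reference = "AACCGGTTAACCGGTT"
toy_regions = [(0, 4), (12, 16)]  # stand-ins for the -35 and -10 regions
toy_candidates = {
    "toy_C": "TTCCGGTTAACCGGAA",
    "toy_A": "AACCGGTTAACCGGTT",
    "toy_B": "AACCTTTTTTTTGGTT",
}

toy_ranking = rank_promoters(toy_reference, toy_candidates, toy_regions)
for name, id_s, id_a in toy_ranking:
    print(name, id_s, id_a)
```

In the toy data, toy_B matches the reference perfectly in both regions but poorly in the spacer, so it ranks above toy_C despite having fewer identities overall; this is the kind of distinction that the separate Id(s) and Id(a) columns make visible.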

4. DISCUSSION

Our previous studies have identified the promoters of many SigI and cellulosomal genes in P. cellulosolvens (Ortiz de Ora et al., 2018). In the current study, we have concentrated on SigI4 and its counterpart RsgI4, focusing on the structural characteristics of the anti‐σI sensor and the specificity of SigI4 for the promoters of the various genes that are potentially activated by this system. Our structural analyses indicate that the C‐terminal sensor domain of PcRsgI4 is an Fn3‐like domain with a substrate‐binding cleft. PcRsgI4_Fn3′ can bind various polysaccharides, as demonstrated by the binding assays and mutagenesis analysis, and the novel Fn3‐like domain thus represents a new type of CBM with broad polysaccharide‐binding specificity. The cellulosomal gene promoters recognized by PcSigI4 were inferred from their similarity to its own promoter, and the results show that the major scaffoldins account for most of the genes regulated by this σ factor. The recognition of various polysaccharides by PcRsgI4_Fn3′ and the regulation of scaffoldins by PcSigI4 are consistent with each other, implying functional co‐evolution of SigI and RsgI.

The investigation of PcSigI4/PcRsgI4 function expands the previously proposed model, in which each SigI–RsgI pair regulates a subset of cellulosomal enzymes that are involved in the degradation of a specific type of substrate, whereas the scaffoldins are regulated by multiple SigI–RsgI pairs (Kahel‐Raifer et al., 2010; Ortiz de Ora et al., 2017, 2018). Here, we propose that the major cellulosomal scaffoldins in P. cellulosolvens are mainly regulated by a single SigI–RsgI pair (Figure 7). Our results illuminate the ecology of P. cellulosolvens and its ability to adapt to its environment. Isolated from sewage sludge, which potentially contains multiple sugar polymers, P. cellulosolvens demonstrates a remarkable capacity to activate the transcription of its major scaffolding proteins in response to various polysaccharides. These scaffolding proteins facilitate the docking of specific enzymes transcribed and tailored to the relevant polymers, providing functional flexibility and enhancing combinatorial potential. This ensures that the scaffolding is present under different conditions, with polymer specificity provided by the corresponding enzymes. This arrangement showcases high molecular efficiency and enables a portfolio effect (Schindler et al., 2015), in which diverse enzymes are deployed in response to varying polymer conditions, each having an appropriate scaffolding to dock into, thereby maximizing the organism's adaptive capacity. This does not exclude the possibility that the scaffoldins could be weakly regulated by other SigI–RsgI pairs, as the promoter of ScaA1 is also in the list of promoters recognized by SigI8 and SigI12, albeit at a relatively lower rank (Table S2). Further studies are still needed to elucidate the overall regulation paradigm in the newly proposed model, because the sensor domains of 10 RsgIs are still of unknown function, and the promoter sequences of 6 SigIs have not been identified.

FIGURE 7.

Currently proposed model of SigI–RsgI regulons for cellulosome regulation in P. cellulosolvens. The previous model (Kahel‐Raifer et al., 2010) proposed that each RsgI bears CBM(s) that recognize(s) a specific type of polysaccharide, and the cognate SigI initiates the transcription of a specific subset of cellulosomal CAZyme genes for degradation of the target polysaccharide. The present work expands the model, whereby the Fn3‐like domain of RsgI4 recognizes broad types of polysaccharides, and the cognate SigI4 initiates the transcription of major scaffoldin genes. There are still 10 SigI–RsgI regulons of unknown function, which require further study and might further expand the model.

The function of the PcSigI4/RsgI4 regulon is unique among the known SigI–RsgI factors. Previous phylogenetic analysis indicated that PcSigI4 is in the same clade as CtSigI2 (SigI2 from C. thermocellum) (Grinberg et al., 2019). However, unlike PcRsgI4_Fn3′ with its broad substrate‐binding specificity, the sensor domain of CtRsgI2 is a CBM3 domain and mainly binds to cellulose (Yaniv et al., 2014). Additionally, CtRsgI2 was shown not to recognize the promoters of the major scaffoldin genes CipA and SdbA in C. thermocellum according to previous studies based on a Bacillus subtilis heterologous expression system (Ortiz de Ora et al., 2017, 2018). Therefore, the function of the PcSigI4/RsgI4 regulon is different from those of the other regulons in the clade. The regulation of scaffoldins in P. cellulosolvens is likely very different from that in C. thermocellum, which is supported by the previous finding that C. thermocellum shares similar regulation of the major scaffoldin CipA with several other bacterial species that produce a complex cellulosome system, except for P. cellulosolvens (Ortiz de Ora et al., 2017). Therefore, P. cellulosolvens appears to be unique among cellulosome‐producing bacteria, not only due to its reversed type I and II assembly in hierarchical cellulosomes (Zhivin et al., 2017) but also due to the distinct regulation of its scaffoldins.

RsgI sensor domains are generally CBMs or functional equivalents, such as GHs, PA14 domains, and the peptidase‐ and NTF2‐like domains that recognize and bind to various substrates (Bahari et al., 2011; Grinberg et al., 2019; Mahoney et al., 2022; Nataf et al., 2010; Ortiz de Ora et al., 2018; Yaniv et al., 2014). Our study confirms that the Fn3‐like domain of PcRsgI4 also functions as a sensor domain for polysaccharide binding with broad specificity. Fn3 domains and related protein domains, which are small globular domains of 90–100 amino acids, have been found in thousands of proteins from bacteria to humans (Bloom & Calabro, 2009; Bork & Doolittle, 1992; Little et al., 1994). Nature has already exploited the functional plasticity of the Fn3 fold for a variety of roles, such as ligand‐binding modules, compact forms of peptide linkers or spacers between other domains, or protein domains that assist in maintaining the solubility of large enzyme complexes (Alahuhta et al., 2010). Although Fn3s and related domains have been discovered in many bacterial glycoside hydrolases (Alahuhta et al., 2010; Ficko‐Blean et al., 2009; Kataeva et al., 2002; Little et al., 1994), their capacity to serve as carbohydrate‐binding elements is not generally known. Therefore, PcRsgI4_Fn3′, as a specialized Fn3‐like domain with very low sequence identity to other proteins, represents a novel CBM with sensor and regulatory roles. Since only CBMs that are associated with catalytic CAZy modules can be considered to form a new CBM family in the CAZy database, PcRsgI4_Fn3′ currently does not meet this criterion and should therefore be classified into the CBM_nc group in the CAZy database.

Besides RsgI4, the set of SigI–RsgI mega‐regulons of P. cellulosolvens includes several novel sensor domains which represent either domains of unknown function or domains unrelated to polysaccharide‐binding functions (Ortiz de Ora et al., 2018). SigI–RsgI regulons widely exist in cellulosome‐producing bacteria, notably Acetivibrio cellulolyticus, Clostridium clariflavum, Clostridium straminisolvens, Clostridium thermocellum, and of course Pseudobacteroides (Bacteroides) cellulosolvens, many of which contain domains of unknown function at the C‐terminus of the respective RsgI factor (Grinberg et al., 2019). Promoters of many SigI–RsgI operons and cellulosomal genes in P. cellulosolvens cannot be identified using the currently known SigI‐dependent promoter motif (Ortiz de Ora et al., 2018). For example, no SigI‐dependent promoter was identified for the genes of ScaA2 and two abundant mini‐scaffoldins ScaF1 and ScaH2—each of which contains only one cohesin module (Ortiz de Ora et al., 2018; Zhivin‐Nissan et al., 2019). Further research should be carried out that combines the functional analyses of sensor domains and the target genes of corresponding SigI–RsgI regulons, which will provide a comprehensive understanding of the intricate regulation system of cellulosomes, and eventually will promote the application of cellulosomes and cellulosome‐producing bacteria in biofuel production and biotechnology development.

AUTHOR CONTRIBUTIONS

Sheng Dong: Investigation; writing – original draft; visualization; funding acquisition. Chao Chen: Investigation; visualization; writing – original draft. Jie Li: Investigation; visualization. Ya‐Jun Liu: Writing – review and editing; funding acquisition. Edward A. Bayer: Writing – review and editing. Raphael Lamed: Writing – review and editing. Itzhak Mizrahi: Writing – review and editing; funding acquisition. Qiu Cui: Writing – review and editing; funding acquisition. Yingang Feng: Conceptualization; investigation; visualization; writing – review and editing; funding acquisition.

CONFLICT OF INTEREST STATEMENT

The authors declare no conflicts of interest.

Supporting information

APPENDIX S1: Supporting information.

ACKNOWLEDGMENTS

We thank the staff at the BL17U1 beamline at the Shanghai Synchrotron Radiation Facility (SSRF) and the staff at the BL19U1 beamline of SSRF of the National Facility for Protein Science in Shanghai (NFPS), Shanghai Advanced Research Institute, Chinese Academy of Sciences, for X‐ray diffraction data collection. This work is supported by the National Key Research and Development Program of China (2023YFC3402300 to Y. F.), the National Natural Science Foundation of China (32070125 to Y. F., 32171203 to S. D., 32070028 to Y.‐J. L., 32170051 to Q. C.), the QIBEBT International Cooperation Project (QIBEBT ICP202304 to Y. F.), the State Key Laboratory of Microbial Technology Open Projects Fund (M2022‐01 to Y. F.), German‐Israeli Project Cooperation (DIP 2476/2‐1 to I. M.), the European Research Council (ERC 866530 to I. M.), and the Israel Science Foundation (ISF 1947/19 to I. M.). E. A. B. is the incumbent of The Maynard I. and Elaine Wishner Chair of Bio‐organic Chemistry.

Dong S, Chen C, Li J, Liu Y‐J, Bayer EA, Lamed R, et al. Unique Fn3‐like biosensor in σI/anti‐σI factors for regulatory expression of major cellulosomal scaffoldins in Pseudobacteroides cellulosolvens. Protein Science. 2024;33(11):e5193. 10.1002/pro.5193

Review Editor: Aitziber L. Cortajarena

DATA AVAILABILITY STATEMENT

The coordinates of the crystal structures of PcRsgI4_Fn3′L and PcRsgI4_Fn3′ have been deposited in the Protein Data Bank (PDB) under accession codes 7CG5 and 7CG8, respectively. The coordinates of solution structures of PcRsgI4_Fn3′L and the NMR restraints have been deposited in the PDB under accession code 7CG1. The chemical shift assignments of PcRsgI4_Fn3′L have been deposited into the BioMagResBank under accession number 36358.

REFERENCES

  1. Abbott DW, van Bueren AL. Using structure to inform carbohydrate binding module function. Curr Opin Struct Biol. 2014;28:32–40.
  2. Adams PD, Afonine PV, Bunkoczi G, Chen VB, Davis IW, Echols N, et al. PHENIX: a comprehensive Python‐based system for macromolecular structure solution. Acta Crystallogr Sect D Biol Crystallogr. 2010;66:213–221.
  3. Alahuhta M, Xu Q, Brunecky R, Adney WS, Ding SY, Himmel ME, et al. Structure of a fibronectin type III‐like module from Clostridium thermocellum. Acta Cryst F Struct Biol Commun. 2010;66:878–880.
  4. Armenta S, Moreno‐Mendieta S, Sanchez‐Cuapio Z, Sanchez S, Rodriguez‐Sanoja R. Advances in molecular engineering of carbohydrate‐binding modules. Proteins. 2017;85:1602–1617.
  5. Artzi L, Bayer EA, Moraïs S. Cellulosomes: bacterial nanomachines for dismantling plant polysaccharides. Nat Rev Microbiol. 2017;15:83–95.
  6. Bahari L, Gilad Y, Borovok I, Kahel‐Raifer H, Dassa B, Nataf Y, et al. Glycoside hydrolases as components of putative carbohydrate biosensor proteins in Clostridium thermocellum. J Ind Microbiol Biotechnol. 2011;38:825–832.
  7. Bayer EA, Lamed R, White BA, Flint HJ. From cellulosomes to cellulosomics. Chem Rec. 2008;8:364–377.
  8. Bloom L, Calabro V. FN3: a new protein scaffold reaches the clinic. Drug Discov Today. 2009;14:949–955.
  9. Boraston AB, Bolam DN, Gilbert HJ, Davies GJ. Carbohydrate‐binding modules: fine‐tuning polysaccharide recognition. Biochem J. 2004;382:769–781.
  10. Boraston AB, Healey M, Klassen J, Ficko‐Blean E, van Bueren AL, Law V. A structural and functional analysis of alpha‐glucan recognition by family 25 and 26 carbohydrate‐binding modules reveals a conserved mode of starch recognition. J Biol Chem. 2006;281:587–598.
  11. Bork P, Doolittle RF. Proposed acquisition of an animal protein domain by bacteria. Proc Natl Acad Sci U S A. 1992;89:8990–8994.
  12. Brogan AP, Habib C, Hobbs SJ, Kranzusch PJ, Rudner DZ. Bacterial SEAL domains undergo autoproteolysis and function in regulated intramembrane proteolysis. Proc Natl Acad Sci U S A. 2023;120:e2310862120.
  13. Brünger AT, Adams PD, Clore GM, DeLano WL, Gros P, Grosse‐Kunstleve RW, et al. Crystallography & NMR System: a new software suite for macromolecular structure determination. Acta Crystallogr Sect D Biol Crystallogr. 1998;54:905–921.
  14. Chen C, Dong S, Yu Z, Qiao Y, Li J, Ding X, et al. Essential autoproteolysis of bacterial anti‐σ factor RsgI for transmembrane signal transduction. Sci Adv. 2023;9:eadg4846.
  15. Crooks GE, Hon G, Chandonia JM, Brenner SE. WebLogo: a sequence logo generator. Genome Res. 2004;14:1188–1190.
  16. Delaglio F, Grzesiek S, Vuister GW, Zhu G, Pfeifer J, Bax A. NMRPipe: a multidimensional spectral processing system based on Unix pipes. J Biomol NMR. 1995;6:277–293.
  17. Ding SY, Bayer EA, Steiner D, Shoham Y, Lamed R. A scaffoldin of the Bacteroides cellulosolvens cellulosome that contains 11 type II cohesins. J Bacteriol. 2000;182:4915–4925.
  18. Duggan BM, Legge GB, Dyson HJ, Wright PE. SANE (structure assisted NOE evaluation): an automated model‐based approach for NOE assignment. J Biomol NMR. 2001;19:321–329.
  19. Ebina T, Toh H, Kuroda Y. Loop‐length‐dependent SVM prediction of domain linkers for high‐throughput structural proteomics. Biopolymers. 2009;92:1–8.
  20. El‐Gebali S, Mistry J, Bateman A, Eddy SR, Luciani A, Potter SC, et al. The Pfam protein families database in 2019. Nucleic Acids Res. 2019;47:D427–D432.
  21. Emsley P, Lohkamp B, Scott WG, Cowtan K. Features and development of Coot. Acta Crystallogr Sect D Biol Crystallogr. 2010;66:486–501.
  22. Feng Y, Song X, Lin J, Xuan J, Cui Q, Wang J. Structure determination of archaea‐specific ribosomal protein L46a reveals a novel protein fold. Biochem Biophys Res Commun. 2014;450:67–72.
  23. Ficko‐Blean E, Gregg KJ, Adams JJ, Hehemann JH, Czjzek M, Smith SP, et al. Portrait of an enzyme, a complete structural analysis of a multimodular β‐N‐acetylglucosaminidase from Clostridium perfringens. J Biol Chem. 2009;284:9876–9884.
  24. Gilbert HJ, Knox JP, Boraston AB. Advances in understanding the molecular basis of plant cell wall polysaccharide recognition by carbohydrate‐binding modules. Curr Opin Struct Biol. 2013;23:669–677.
  25. Grinberg IR, Yaniv O, Ortiz de Ora L, Muñoz‐Gutiérrez I, Hershko A, Livnah O, et al. Distinctive ligand‐binding specificities of tandem PA14 biomass‐sensory elements from Clostridium thermocellum and Clostridium clariflavum. Proteins. 2019;87:917–930.
  26. Gronenberg LS, Marcheschi RJ, Liao JC. Next generation biofuel engineering in prokaryotes. Curr Opin Chem Biol. 2013;17:462–471.
  27. Guillen D, Sanchez S, Rodriguez‐Sanoja R. Carbohydrate‐binding domains: multiplicity of biological roles. Appl Microbiol Biotechnol. 2010;85:1241–1249.
  28. Herrmann T, Guntert P, Wuthrich K. Protein NMR structure determination with automated NOE assignment using the new software CANDID and the torsion angle dynamics algorithm DYANA. J Mol Biol. 2002;319:209–227.
  29. Hettle A, Fillo A, Abe K, Massel P, Pluvinage B, Langelaan DN, et al. Properties of a family 56 carbohydrate‐binding module and its role in the recognition and hydrolysis of beta‐1,3‐glucan. J Biol Chem. 2017;292:16955–16968.
  30. Holm L, Rosenstrom P. Dali server: conservation mapping in 3D. Nucleic Acids Res. 2010;38:W545–W549.
  31. Hooft RWW, Vriend G, Sander C, Abola EE. Errors in protein structures. Nature. 1996;381:272.
  32. Jee JG, Ikegami T, Hashimoto M, Kawabata T, Ikeguchi M, Watanabe T, et al. Solution structure of the fibronectin type III domain from Bacillus circulans WL‐12 chitinase A1. J Biol Chem. 2002;277:1388–1397.
  33. Johnson BA. Using NMRView to visualize and analyze the NMR spectra of macromolecules. Methods Mol Biol. 2004;278:313–352.
  34. Jones DT. Protein secondary structure prediction based on position‐specific scoring matrices. J Mol Biol. 1999;292:195–202.
  35. Jung YS, Zweckstetter M. Mars—robust automatic backbone assignment of proteins. J Biomol NMR. 2004;30:11–23.
  36. Kabsch W. XDS. Acta Crystallogr Sect D Biol Crystallogr. 2010;66:125–132.
  37. Kahel‐Raifer H, Jindou S, Bahari L, Nataf Y, Shoham Y, Bayer EA, et al. The unique set of putative membrane‐associated anti‐sigma factors in Clostridium thermocellum suggests a novel extracellular carbohydrate‐sensing mechanism involved in gene regulation. FEMS Microbiol Lett. 2010;308:84–93.
  38. Kataeva IA, Seidel RD, Shah A, West LT, Li XL, Ljungdahl LG. The fibronectin type 3‐like repeat from the Clostridium thermocellum cellobiohydrolase CbhA promotes hydrolysis of cellulose by modifying its surface. Appl Environ Microbiol. 2002;68:4292–4300.
  39. Koradi R, Billeter M, Wuthrich K. MOLMOL: a program for display and analysis of macromolecular structures. J Mol Graph. 1996;14:51–55.
  40. Krissinel E, Henrick K. Secondary‐structure matching (SSM), a new tool for fast protein structure alignment in three dimensions. Acta Crystallogr Sect D Biol Crystallogr. 2004;60:2256–2268.
  41. Krogh A, Larsson B, von Heijne G, Sonnhammer ELL. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol. 2001;305:567–580.
  42. Lamed R, Kenig R, Setter E, Bayer EA. Major characteristics of the cellulolytic system of Clostridium thermocellum coincide with those of the purified cellulosome. Enzyme Microb Technol. 1985;7:37–41.
  43. Laskowski RA, Rullmann JAC, MacArthur MW, Kaptein R, Thornton JM. AQUA and PROCHECK‐NMR: programs for checking the quality of protein structures solved by NMR. J Biomol NMR. 1996;8:477–486.
  44. Li J, Zhang H, Li D, Liu Y‐J, Bayer EA, Cui Q, et al. Structure of the transcription open complex of distinct σI factors. Nat Commun. 2023;14:6455.
  45. Little E, Bork P, Doolittle RF. Tracing the spread of fibronectin type‐III domains in bacterial glycohydrolases. J Mol Evol. 1994;39:631–643.
  46. Liu S, Liu Y‐J, Feng Y, Li B, Cui Q. Construction of consolidated bio‐saccharification biocatalyst and process optimization for highly efficient lignocellulose solubilization. Biotechnol Biofuels. 2019;12:35.
  47. Lombard V, Ramulu HG, Drula E, Coutinho PM, Henrissat B. The carbohydrate‐active enzymes database (CAZy) in 2013. Nucleic Acids Res. 2014;42:D490–D495.
  48. Mahoney BJ, Takayesu A, Zhou AQ, Cascio D, Clubb RT. The structure of the Clostridium thermocellum RsgI9 ectodomain provides insight into the mechanism of biomass sensing. Proteins. 2022;90:1457–1467.
  49. Marchler‐Bauer A, Bo Y, Han LY, He JE, Lanczycki CJ, Lu SN, et al. CDD/SPARCLE: functional classification of proteins via subfamily domain architectures. Nucleic Acids Res. 2017;45:D200–D203.
  50. Markley JL, Bax A, Arata Y, Hilbers CW, Kaptein R, Sykes BD, et al. Recommendations for the presentation of NMR structures of proteins and nucleic acids—(IUPAC recommendations 1998). Pure Appl Chem. 1998;70:117–142.
  51. Mccoy AJ, Grosse‐Kunstleve RW, Adams PD, Winn MD, Storoni LC, Read RJ. Phaser crystallographic software. J Appl Cryst. 2007;40:658–674.
  52. Montanier C, Flint JE, Bolam DN, Xie HF, Liu ZY, Rogowski A, et al. Circular permutation provides an evolutionary link between two families of calcium‐dependent carbohydrate binding modules. J Biol Chem. 2010;285:31742–31754.
  53. Moraïs S, Barak Y, Caspi J, Hadar Y, Lamed R, Shoham Y, et al. Contribution of a xylan‐binding module to the degradation of a complex cellulosic substrate by designer cellulosomes. Appl Environ Microbiol. 2010;76:3787–3796.
  54. Munoz‐Gutierrez I, Ortiz de Ora L, Rozman Grinberg I, Garty Y, Bayer EA, Shoham Y, et al. Decoding biomass‐sensing regulons of Clostridium thermocellum alternative sigma‐I factors in a heterologous Bacillus subtilis host system. PLoS One. 2016;11:e0146316.
  55. Nataf Y, Bahari L, Kahel‐Raifer H, Borovok I, Lamed R, Bayer EA, et al. Clostridium thermocellum cellulosomal genes are regulated by extracytoplasmic polysaccharides via alternative sigma factors. Proc Natl Acad Sci U S A. 2010;107:18646–18651.
  56. Nederveen AJ, Doreleijers JF, Vranken W, Miller Z, Spronk CAEM, Nabuurs SB, et al. RECOORD: a recalculated coordinate database of 500+ proteins from the PDB using restraints from the BioMagResBank. Proteins. 2005;59:662–672. [DOI] [PubMed] [Google Scholar]
  57. Ortiz de Ora L, Munoz‐Gutierrez I, Bayer EA, Shoham Y, Lamed R, Borovok I. Revisiting the regulation of the primary scaffoldin gene in Clostridium thermocellum . Appl Environ Microbiol. 2017;83:e03088‐16. [DOI] [PMC free article] [PubMed] [Google Scholar]
  58. Ortiz de Ora L, Lamed R, Liu YJ, Xu J, Cui Q, Feng Y, et al. Regulation of biomass degradation by alternative sigma factors in cellulolytic clostridia. Sci Rep. 2018;8:11036. [DOI] [PMC free article] [PubMed] [Google Scholar]
  59. Perez J, Munoz‐Dorado J, de la Rubia T, Martinez J. Biodegradation and biological treatments of cellulose, hemicellulose and lignin: an overview. Int Microbiol. 2002;5:53–63.
  60. Raman B, Pan C, Hurst GB, Rodriguez M Jr, McKeown CK, Lankford PK, et al. Impact of pretreated switchgrass and biomass carbohydrates on Clostridium thermocellum ATCC 27405 cellulosome composition: a quantitative proteomic analysis. PLoS One. 2009;4:e5271.
  61. Sauve V, Bruno S, Berks BC, Hemmings AM. The SoxYZ complex carries sulfur cycle intermediates on a peptide swinging arm. J Biol Chem. 2007;282:23194–23204.
  62. Schindler DE, Armstrong JB, Reed TE. The portfolio concept in ecology and evolution. Front Ecol Environ. 2015;13:257–263.
  63. Shen Y, Bax A. Protein structural information derived from NMR chemical shift with the neural network program TALOS‐N. Methods Mol Biol. 2015;1260:17–32.
  64. Shoseyov O, Shani Z, Levy I. Carbohydrate binding modules: biochemical properties and novel applications. Microbiol Mol Biol Rev. 2006;70:283–295.
  65. Staron A, Sofia HJ, Dietrich S, Ulrich LE, Liesegang H, Mascher T. The third pillar of bacterial signal transduction: classification of the extracytoplasmic function (ECF) sigma factor protein family. Mol Microbiol. 2009;74:557–581.
  66. Takayesu A, Mahoney BJ, Goring AK, Jessup T, Loo RRO, Loo JA, et al. Insight into the autoproteolysis mechanism of the RsgI9 anti‐σ factor from Clostridium thermocellum. Proteins. 2024;92:946–958.
  67. Thomas C, Moraga I, Levin D, Krutzik PO, Podoplelova Y, Trejo A, et al. Structural linkage between ligand discrimination and receptor activation by type I interferons. Cell. 2011;146:621–632.
  68. Venditto I, Luis AS, Rydahl M, Schuckel J, Fernandes VO, Vidal‐Melgosa S, et al. Complexity of the Ruminococcus flavefaciens cellulosome reflects an expansion in glycan recognition. Proc Natl Acad Sci U S A. 2016;113:7136–7141.
  69. Wang Q‐S, Zhang K‐H, Cui Y, Wang Z‐J, Pan Q‐Y, Liu K, et al. Upgrade of macromolecular crystallography beamline BL17U1 at SSRF. Nucl Sci Tech. 2018;29:68.
  70. Wei Z, Chen C, Liu YJ, Dong S, Li J, Qi K, et al. Alternative σI/anti‐σI factors represent a unique form of bacterial σ/anti‐σ complex. Nucleic Acids Res. 2019;47:5988–5997.
  71. Xiao M, Liu Y‐J, Bayer EA, Kosugi A, Cui Q, Feng Y. Cellulosomal hemicellulases: indispensable players for ensuring effective lignocellulose bioconversion. Green Carbon. 2024;2:57–69.
  72. Xu Q, Bayer EA, Goldman M, Kenig R, Shoham Y, Lamed R. Architecture of the Bacteroides cellulosolvens cellulosome: description of a cell surface‐anchoring scaffoldin and a family 48 cellulase. J Bacteriol. 2004;186:968–977.
  73. Yaniv O, Fichman G, Borovok I, Shoham Y, Bayer EA, Lamed R, et al. Fine‐structural variance of family 3 carbohydrate‐binding modules as extracellular biomass‐sensing components of Clostridium thermocellum anti‐sigma(I) factors. Acta Crystallogr Sect D Biol Crystallogr. 2014;70:522–534.
  74. Zhang WZ, Tang JC, Wang SS, Wang ZJ, Qin WM, He JH. The protein complex crystallography beamline (BL19U1) at the Shanghai synchrotron radiation facility. Nucl Sci Tech. 2019;30:170.
  75. Zhivin O, Dassa B, Moraïs S, Utturkar SM, Brown SD, Henrissat B, et al. Unique organization and unprecedented diversity of the Bacteroides (Pseudobacteroides) cellulosolvens cellulosome system. Biotechnol Biofuels. 2017;10:211.
  76. Zhivin‐Nissan O, Dassa B, Morag E, Kupervaser M, Levin Y, Bayer EA. Unraveling essential cellulosomal components of the (pseudo)Bacteroides cellulosolvens reveals an extensive reservoir of novel catalytic enzymes. Biotechnol Biofuels. 2019;12:115.

Associated Data

Supplementary Materials

APPENDIX S1: Supporting information.

PRO-33-e5193-s001.pdf (2.1MB, pdf)

Data Availability Statement

The coordinates of the crystal structures of PcRsgI4_Fn3′L and PcRsgI4_Fn3′ have been deposited in the Protein Data Bank (PDB) under accession codes 7CG5 and 7CG8, respectively. The coordinates of the solution structures of PcRsgI4_Fn3′L, together with the NMR restraints, have been deposited in the PDB under accession code 7CG1. The chemical shift assignments of PcRsgI4_Fn3′L have been deposited in the BioMagResBank under accession number 36358.


Articles from Protein Science : A Publication of the Protein Society are provided here courtesy of The Protein Society