Protein Science : A Publication of the Protein Society. 2019 Feb 18;28(4):794–799. doi: 10.1002/pro.3582

A 2.08 Å resolution structure of HLB5, a novel cellulase from the anaerobic gut bacterium Parabacteroides johnsonii DSM 18315

Changsoo Chang 1,2, Charles Brooke 3, Hailan Piao 4, Jamey Mack 1, Gyorgy Babnigg 1, Andrzej Joachimiak 1,2, Matthias Hess 3
PMCID: PMC6423722  PMID: 30687968

Abstract

Cellulases play a significant role in the degradation of complex carbohydrates. In the human gut, anaerobic bacteria are essential to the well‐being of the host because they produce these enzymes, which convert plant polymers into simple sugars that the host can then metabolize further. Here, we report the 2.08 Å resolution structure of HLB5, a chemically verified cellulase that was identified previously from an anaerobic gut bacterium, that has no structural homologues among the cellulases in the PDB, and that lacks any conserved region typical of glycosidases. We anticipate that the information presented here will facilitate the identification of additional cellulases for which no homologues have been identified to date and enhance our understanding of how these novel cellulases bind and hydrolyze their substrates.

Keywords: cellulase, Parabacteroides johnsonii, CAZyme, carbohydrate metabolism, human gut

Short abstract

PDB Code(s): 5IR2

Introduction

Plant carbohydrates composed of large polymers such as cellulose, hemi‐cellulose, and lignin are a major energy source for micro‐organisms that inhabit the digestive tract of animals and humans. To efficiently decompose carbohydrate polymers, these microbes utilize a diverse set of carbohydrate active enzymes (CAZymes; www.cazy.org).1 Although CAZymes account for more than 10% of the encoded genes in some bacterial genomes, it is believed that many are still to be discovered. Cellulose, a polymer of glucose, represents a major fraction of plant carbohydrates, and three classes of cellulases are needed to decompose it: 1,4‐β‐endoglucanases, cellobiohydrolases, and β‐glucosidases. Among these, the 1,4‐β‐endoglucanases (endoglucanase; EC 3.2.1.4), which randomly hydrolyze 1,4‐β‐glucosidic bonds, represent a key group in the degradation of cellulose.2 Besides their physiological importance, CAZymes have recently received increased attention due to their industrial potential.3, 4, 5, 6, 7

In a previous study, we utilized a “guilty by association” strategy to discover novel cellulases.8 We identified and cloned 17 putative cellulases with too little sequence similarity to known cellulases to be identified as such. From this set, 11 (~65%) were verified to possess cellulolytic activity against carboxymethylcellulose (CMC) and pretreated Miscanthus.8 Of these cellulases, HLB5 possessed high activity toward both CMC and Miscanthus. In the genome of Parabacteroides johnsonii DSM 18315, hlb5 is part of a gene cluster, containing among others a β‐galactosidase/glucuronidase, two β‐1,4‐xylanases, a sugar phosphate permease, and three bacterial outer membrane proteins. Further analysis of the 247 amino acid long sequence of HLB5 failed to detect any signal peptide or transmembrane helices but identified a region (PF03737) conserved in proteins belonging to the RraA and RraA‐like family.

Here, we present the crystal structure of HLB5 at 2.08 Å resolution. The protein has no significant sequence identity to known cellulases. The structure shows a tight hexamer and provides information about the three‐dimensional arrangement and functional sites. HLB5 has several structural homologues in the PDB, but none of them has cellulase activity, and the protein does not possess any conserved regions typical of CAZymes while still being capable of degrading cellulosic biomass.

Materials and Methods

Sub‐cloning, expression, and purification

Residues 3–243 of hlb5 were amplified by PCR from the previously constructed pET102‐hlb5 8 with KOD Hot Start DNA polymerase (MilliporeSigma, Hayward, CA, USA) using forward (5′‐TACTTCCAATCCAATGCCGTAGATCAGTATAAGAAAGAAATCGGAATGATGA‐3′) and reverse (5′‐TTATCCACTTCCAATGTTATTTTGCTGTAATTTCCTGAACACTGTCTC‐3′) primers. The PCR product was purified, cloned into pMCSG68 (MCSG, Argonne, IL, USA) using a modified ligation‐independent cloning procedure,9 and transformed into the Escherichia coli BL21(DE3)‐Gold strain (MilliporeSigma, Hayward, CA, USA). The pMCSG68 vector provides an N‐terminal, TEV‐cleavable His6 affinity purification tag. After cloning and sequencing of the insert, a point mutation of Pro191 to Ser was detected. Cells were grown at 37°C to an OD600 of 1 in enriched M9 medium containing selenomethionine (SeMet), under conditions known to inhibit methionine biosynthesis. Protein expression was induced with 0.5 mM IPTG and conducted overnight under aeration (200 rpm) at 18°C. Cells were harvested by centrifugation (2200 g), resuspended in five volumes of lysis buffer (50 mM HEPES pH 8.0, 500 mM NaCl, 20 mM imidazole, 10 mM 2‐mercaptoethanol, and 5% v/v glycerol), and stored at −20°C until processed further. Harvested cells were thawed and treated with a protease inhibitor cocktail (P8849, MilliporeSigma, Hayward, CA, USA) and 1 mg/mL lysozyme prior to sonication. The lysate was clarified by centrifugation at 30,000 g (Sorvall RC5C‐Plus, ThermoFisher, West Sacramento, CA, USA) for 60 min, followed by filtration through 0.45 and 0.22 μm in‐line filters (Gelman, Pall Corporation, Westborough, MA, USA). The protein was purified by immobilized metal affinity chromatography (IMAC‐I) on a 5‐mL HiTrap Chelating HP column charged with Ni2+ ions, followed by buffer‐exchange chromatography on a HiPrep 26/10 desalting column (both GE Healthcare Life Sciences, Pittsburgh, PA, USA) using an ÄKTAxpress system (GE Healthcare Life Sciences).
The His6‐tag was cleaved using recombinant TEV protease expressed from the vector pRK508 (MCSG, Argonne, IL, USA). The protease was added to the target protein at a ratio of 1:30 and the mixture was incubated at 4°C for 48 h. The protein was then purified on a 5‐mL HiTrap Chelating column charged with Ni2+ ions, dialyzed against 20 mM HEPES pH 8.0, 250 mM NaCl, 2 mM DTT, and concentrated to 57 mg/mL using a Centricon Plus‐20 Centrifugal Concentrator (MilliporeSigma, Hayward, CA, USA).
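As a quick sanity check on the primer design, the following minimal sketch translates the gene‐specific part of the forward primer quoted above. It assumes that the first 18 nucleotides of the primer are the ligation‐independent cloning overhang of the vector and that the remainder is in frame with the hlb5 coding sequence; both are assumptions of this sketch, not statements from the study, and the names used (`ASSUMED_OVERHANG_LENGTH`, `translate`, `gc_fraction`) are illustrative only.

```python
# Standard genetic code, built from the compact TCAG-ordered table.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Forward primer exactly as given in the text (5' to 3').
FORWARD_PRIMER = "TACTTCCAATCCAATGCCGTAGATCAGTATAAGAAAGAAATCGGAATGATGA"

# Assumption: the first 18 nt are the vector-specific cloning overhang.
ASSUMED_OVERHANG_LENGTH = 18


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def translate(seq: str) -> str:
    """Translate complete codons of a DNA sequence, ignoring trailing bases."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


gene_specific = FORWARD_PRIMER[ASSUMED_OVERHANG_LENGTH:]
peptide = translate(gene_specific)

print(f"primer length: {len(FORWARD_PRIMER)} nt")
print(f"GC fraction of gene-specific part: {gc_fraction(gene_specific):.2f}")
print(f"encoded peptide (assumed frame): {peptide}")
```

Under these assumptions, the gene‐specific part encodes a short peptide beginning with Val‐Asp‐Gln, which would correspond to the start of the cloned fragment (residue 3 onward).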

Protein crystallization and data collection

Crystallization conditions were determined using the MCSG crystallization suite (Microlytic) with the help of a Mosquito robot (TTP Labtech) and the sitting drop vapor diffusion technique in 96‐well CrystalQuick plates (Greiner). Crystals suitable for X‐ray diffraction data collection were grown at 24°C in MCSG3 condition 43 (0.2 M Li2SO4, 0.1 M CHES, pH 9.5, 0.1 M Na/K tartrate). Single‐wavelength anomalous diffraction data were collected near the selenium absorption peak from a crystal of SeMet‐substituted protein. The crystal growth buffer was supplemented with 25% ethylene glycol for cryoprotection. The crystal was picked up using a Litholoop (Molecular Dimensions, Maumee, OH, USA) and flash‐cooled in liquid nitrogen. Data were collected at 100 K on an ADSC Quantum Q315r charge‐coupled device detector (Poway, CA, USA) at the 19ID beamline of the Structural Biology Center at the Advanced Photon Source, Argonne National Laboratory. The crystal belongs to the hexagonal space group P6322 with cell parameters of a = b = 100.2 Å, c = 129.7 Å, α = β = 90°, γ = 120°. The diffraction data were processed with the HKL3000 suite of programs.10 Data collection statistics are presented in Table 1.
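The cell parameters above are enough for a rough packing estimate. The sketch below computes the volume of the hexagonal cell and a Matthews coefficient for one molecule per asymmetric unit. The molecular weight is not reported in the text, so it is approximated here as 241 residues at about 110 Da each; the 12 general positions for the space group and the 1.23 Å3/Da constant in the solvent estimate are standard textbook values, not figures from this study.

```python
import math


def hexagonal_cell_volume(a: float, c: float) -> float:
    """Volume (in cubic angstroms) of a cell with a = b, alpha = beta = 90, gamma = 120."""
    return a * a * c * math.sin(math.radians(120.0))


def matthews_coefficient(volume: float, n_asym_units: int, mol_weight_da: float,
                         copies_per_asym_unit: int = 1) -> float:
    """Crystal volume per unit of protein mass (cubic angstroms per dalton)."""
    return volume / (n_asym_units * copies_per_asym_unit * mol_weight_da)


A_AXIS, C_AXIS = 100.2, 129.7        # cell parameters from the text
N_GENERAL_POSITIONS = 12             # general positions of the hexagonal space group
ASSUMED_MW_DA = 241 * 110.0          # assumption: ~110 Da per residue, 241 residues

volume = hexagonal_cell_volume(A_AXIS, C_AXIS)
vm = matthews_coefficient(volume, N_GENERAL_POSITIONS, ASSUMED_MW_DA)
solvent_fraction = 1.0 - 1.23 / vm   # common approximation for protein crystals

print(f"cell volume: {volume:,.0f} cubic angstroms")
print(f"Matthews coefficient: {vm:.2f} cubic angstroms per Da")
print(f"estimated solvent content: {solvent_fraction:.0%}")
```

With these inputs the estimate comes out at roughly 3.5 Å3/Da, which is compatible with a single molecule in the asymmetric unit; the exact value shifts with the true molecular weight of the SeMet‐labeled construct.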

Table 1.

Data Collection and Refinement Statistics

Data collection statistics
Wavelength (Å) 0.9792
Resolution (Å) 50–2.08 (2.12–2.08)a
Space group P6322
Unique reflections 23,828
Completeness (%) 100 (100)a
I/sigma 39.1 (3.09)a
R merge 0.11 (0.98)a
Cell parameters (Å, °) a = b = 100.2, c = 129.7, α = β = 90, γ = 120
Refinement statistics
Resolution range (Å) 41.2–2.08 (2.12–2.08)a
Reflections working/test 20,483/2,177 (1,538/85)a
R work/R free (%) 14.2/18.7 (19.2/23.2)a
Number of atoms 1996
Protein atoms 1767
Hetero atoms 31
Water atoms 198
Root mean square deviation
Bond lengths (Å) 0.002
Angles (°) 0.47
Ramachandran plot
Favored 98.20%
Outliers 0%
Average B factor
All atoms (Å2) 35.5
Protein atoms (Å2) 34.4
Ligand atoms (Å2) 42.6
Solvent atoms (Å2) 45.2
a Values in the highest resolution shell.
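The entries of Table 1 can be cross‐checked against each other with a few lines of arithmetic. This minimal sketch uses only numbers transcribed from the table; the variable names and the choice of checks are ours.

```python
# Values transcribed from Table 1.
atom_counts = {"protein": 1767, "hetero": 31, "water": 198}
reported_total_atoms = 1996

working_reflections = 20483
test_reflections = 2177
unique_reflections = 23828

r_work_percent, r_free_percent = 14.2, 18.7

# The atom categories should add up to the reported total.
atom_sum = sum(atom_counts.values())

# Share of refinement reflections set aside for cross-validation (R free).
free_fraction = test_reflections / (working_reflections + test_reflections)

# Share of the unique reflections that entered refinement.
refined_fraction = (working_reflections + test_reflections) / unique_reflections

# Gap between R free and R work, a common indicator of overfitting.
r_gap = r_free_percent - r_work_percent

print(f"atom sum matches table: {atom_sum == reported_total_atoms}")
print(f"test set fraction: {free_fraction:.1%}")
print(f"fraction of unique reflections refined: {refined_fraction:.1%}")
print(f"R free - R work: {r_gap:.1f} percentage points")
```

The atom counts add up to the reported total, and just under 10% of the reflections used in refinement belong to the test set.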

Structure determination and refinement

The crystal structure of HLB5 was solved by the single‐wavelength anomalous diffraction (SAD) method. All procedures for SAD phasing, phase improvement by density modification, and initial protein model building were carried out using the structure module of the HKL3000 software package.10 There is one molecule in the asymmetric unit, and the structure was refined to an R/R free of 0.142/0.187 in the 41.2–2.08 Å resolution range. Seven selenium sites were found using SHELXD.11 The mean figure of merit of the phase set from MLPHARE12 was 0.213 for the 50–2.08 Å data and improved to 0.856 after density modification (DM).13 The structure‐building module, using ARP/wARP,14 built 199 out of 241 residues and placed the side chains of 187 residues. The initial model was rebuilt and refined manually with the program COOT15 using electron density maps based on the DM‐phased reflection file. After each cycle of rebuilding, the model was refined further with PHENIX.16 The geometrical properties of the model were assessed with COOT and MolProbity;17 the results indicate an acceptable root mean square deviation from ideal geometry and a reasonable clash score. Detailed refinement statistics are shown in Table 1.

Results

Structure of HLB5

The monomer adopts an α/β/β/α sandwich fold composed of eight helices and 13 strands [Fig. 1(A)]. The first layer is composed of α‐helices α5, α6, and α7 and a small β‐sheet made of the short β‐strands β4, β6, and β8. These small β‐strands are usually composed of two amino acids. The second layer has a β‐sheet composed of β‐strands in the following order: β1, β7, β5, β3, β2, and β9. The central four β‐strands are parallel, while the outer strands β1 and β9 are anti‐parallel. Strand β2 is 12 amino acids long and bends into the third layer, where it is part of one of the β‐sheets. The third layer is composed of five β‐strands that form two small β‐sheets: strands β2, β12, and β13 make one anti‐parallel β‐sheet, and strands β11 and β10 make the other. The fourth layer is composed of five α‐helices (α1, α2, α3, α8, and α4). Among these, the first two helices protrude toward an adjacent molecule to make contact.

Figure 1.

Crystal structure of HLB5. (A) Stereo view of the monomeric structure, colored blue to red from N‐terminus to C‐terminus. Secondary structure elements are labeled. (B) Dimer formed by two monomers, colored green and cyan. (C) View of the HLB5 hexamer along the two‐fold axis. Each chain is represented in a different color. Surface clefts and internal cavities are shown as follows: the big clefts along the two‐fold axis are shown in yellow, the channel running along the three‐fold axis is shown in green, and the other cavities and pockets are shown in grey. (D) View of HLB5 along the three‐fold axis. The same color scheme as in (C) was used.

The crystal structure shows that HLB5 forms a hexamer. As shown in Figure 1(B), the α1 and α2 helices are swapped in three dimensions between the two chains of an HLB5 dimer. More specifically, the HLB5 hexamer, generated by the symmetry operators (x, y, z), (x, x−y, 1/2−z), (1−x+y, 1−x, z), (1−y, 1−x, 1/2−z), (1−y, x−y, z), and (1−x+y, y, 1/2−z), displays perpendicular three‐fold and two‐fold symmetry, in which two monomers bind each other tightly through the second and third layers of one polypeptide chain and the α1 and α2 helices of the other [Fig. 1(B)]. The dimeric interface area was determined to be 3630 Å2, accounting for 30% of the monomer's total surface area (12,300 Å2). Three of these dimers are arranged in a ring around a three‐fold axis to make a hexamer with overall dimensions of ~60 Å × 80 Å. Extensive interactions between neighboring dimers involving several helices and loop regions were also identified. The buried area of the hexamer is 27,800 Å2 and its surface area is 49,600 Å2, indicating extensive interactions and suggesting that the hexamer is very stable. PISA18 calculates an energy gain of 64.7 kcal/mol upon formation of the hexamer.
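A list of six operators describes a single closed assembly only if combining any two of them yields another member of the list. The minimal sketch below checks this for the operators read as (x, y, z), (x, x−y, 1/2−z), (1−x+y, 1−x, z), (1−y, 1−x, 1/2−z), (1−y, x−y, z), and (1−x+y, y, 1/2−z). The transcription into rotation and translation parts acting on fractional coordinates is ours and should be treated as an assumption about the intended list.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)

# Each operator: (rotation rows, translation) acting on fractional (x, y, z).
OPERATORS = {
    "x, y, z":         (((1, 0, 0), (0, 1, 0), (0, 0, 1)), (0, 0, 0)),
    "x, x-y, 1/2-z":   (((1, 0, 0), (1, -1, 0), (0, 0, -1)), (0, 0, HALF)),
    "1-x+y, 1-x, z":   (((-1, 1, 0), (-1, 0, 0), (0, 0, 1)), (1, 1, 0)),
    "1-y, 1-x, 1/2-z": (((0, -1, 0), (-1, 0, 0), (0, 0, -1)), (1, 1, HALF)),
    "1-y, x-y, z":     (((0, -1, 0), (1, -1, 0), (0, 0, 1)), (1, 0, 0)),
    "1-x+y, y, 1/2-z": (((-1, 1, 0), (0, 1, 0), (0, 0, -1)), (1, 0, HALF)),
}


def compose(first, second):
    """Operator equivalent to applying `first`, then `second`."""
    (r1, t1), (r2, t2) = first, second
    rotation = tuple(
        tuple(sum(r2[i][k] * r1[k][j] for k in range(3)) for j in range(3))
        for i in range(3)
    )
    translation = tuple(
        sum(r2[i][k] * t1[k] for k in range(3)) + t2[i] for i in range(3)
    )
    return rotation, translation


def is_closed(operators) -> bool:
    """True if every pairwise product is itself in the operator list."""
    members = list(operators.values())
    return all(compose(a, b) in members for a, b in product(members, repeat=2))


def order(op, identity, limit=12) -> int:
    """Number of applications of `op` needed to return to the identity."""
    current, n = op, 1
    while current != identity:
        current, n = compose(current, op), n + 1
        if n > limit:
            raise ValueError("operator does not return to identity")
    return n


identity = OPERATORS["x, y, z"]
orders = sorted(order(op, identity) for op in OPERATORS.values())

print("closed under composition:", is_closed(OPERATORS))
print("operator orders:", orders)
```

Read this way, the set is closed without any additional lattice translation and contains the identity, two three‐fold rotations, and three two‐fold rotations, which matches the perpendicular three‐fold and two‐fold symmetry described for the hexamer.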

Inspection of the hexamer revealed several clefts and a large channel located at the interfaces between subunits. The most prominent feature of the hexamer is a channel ~60 Å long and around 6 Å in diameter that runs across the entire structure along the three‐fold axis [Fig. 1(D)]. In the middle of the hexamer, this channel expands and connects to a large cavity that appears to have three wide connections to the protein surface, which are lined with acidic residues. There are also three narrow channels. All these channels and cavities are strongly hydrated and contain a number of ordered water molecules. The large clefts located near a two‐fold axis at the interface of four peptide chains [Fig. 1(C)] have an entrance about 15 Å wide and 19 Å long and a depth of about 15 Å, leading to the central channel. In addition to these openings, there is a smaller cleft that is not related to a rotation axis. It lies on the concave side of the sheet in Layer 2 and is surrounded by helices α1, α4, and α7; its entrance is slightly covered by helix α1 of an adjacent molecule. This cleft appears to include a ligand binding site.

Homologous structures

When the structure of HLB5 was submitted to the Dali server,19 no significant similarity to known cellulases was detected. The uncharacterized protein PSPTO_3204 from Pseudomonas syringae pv. tomato str. DC3000 (PDB 3k4i) had the highest Z score (26.3). It was followed, with Z scores over 15, by the RNA‐processing inhibitor RraA‐like protein YER010CP from yeast (PDB 2c5q), 4‐hydroxy‐4‐methyl‐2‐oxoglutarate/4‐carboxy‐4‐hydroxy‐2‐oxoadipate (HMG/CHA) aldolase from Pseudomonas putida (PDB 3noj), S‐adenosylmethionine/2‐demethylmenaquinone methyltransferase from Geobacillus kaustophilus (PDB 2pcn), the RNA‐processing inhibitor RraA proteins from Streptomyces coelicolor (PDB 5x15), Thermus thermophilus (PDB 1j3l), E. coli (PDB 2yjv, 1q5x), Vibrio cholerae (PDB 1vi4), and P. aeruginosa (PDB 3c8o), and a putative methyltransferase from Mycobacterium tuberculosis (PDB 1nxj). Among these, HMG/CHA aldolase had a Z score of 22.9. Interestingly, this HMG/CHA aldolase also has an α/β/β/α sandwich fold and exists as a hexamer,20 and when it is compared to HLB5, the two structures show a nearly identical fold (r.m.s.d. ~1.9 Å). The structures differ at their N‐ and C‐termini. More specifically, the N‐terminus of HLB5 protrudes to interact with an adjacent molecule and generate dimers, while the N‐terminus of HMG/CHA aldolase does not. The C‐terminus of HMG/CHA aldolase is 35 residues longer than that of HLB5 and extends out to wrap around an adjacent protomer of the trimer.

Putative ligand binding site

The catalytic site of HMG/CHA aldolase is located in the cleft between adjacent protomers of the trimer, and equivalent clefts are found in the structure of HLB5 [Fig. 2(A)]. As described above, this cleft lies in the region partially covered by the N‐terminal helix of an adjacent polypeptide and is occupied by a tartrate molecule from the crystallization solution. This tartrate makes hydrogen bonds with Gly125, Val127, Mse128, and Ser169, and hydrophobic interactions with Thr122, Trp124, Gly126, Arg147, and Gly168 [Fig. 2(B)]. In the HMG/CHA aldolase structure, pyruvate and magnesium ions are bound to the active site. A pyruvate interacts with a magnesium ion, with the side chain of Arg123 (Arg147 in HLB5), and with the backbone N of residues Asp102, Leu103, and Leu104 (Gly126, Val127, and Met128 in HLB5), while the magnesium ion interacts with Asp124 (Asp148 in HLB5). This spatial environment is well conserved, although no metal ion is found in HLB5 and different ligands are present. In addition, residues such as Gly125 and Gly168 are spatially conserved in this cleft. Attempts to crystallize the protein with a putative substrate bound have failed so far, and the exact active site residues of HLB5 remain unknown. A direct comparison of the putative binding sites of HLB5 and HMG/CHA aldolase is shown in Figure 2(C).
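The residue correspondences quoted above can be collected into a small lookup table, which also makes it easy to see how the numbering of the two proteins relates. This is a minimal sketch that uses only the pairs named in the text; the dictionary and helper names are illustrative.

```python
# HMG/CHA aldolase residue -> spatially equivalent HLB5 residue, as quoted in the text.
ALDOLASE_TO_HLB5 = {
    "Arg123": "Arg147",
    "Asp102": "Gly126",
    "Leu103": "Val127",
    "Leu104": "Met128",
    "Asp124": "Asp148",
}


def residue_number(label: str) -> int:
    """Extract the sequence number from a label such as 'Arg147'."""
    return int(label[3:])


def residue_type(label: str) -> str:
    """Extract the three-letter residue name from a label such as 'Arg147'."""
    return label[:3]


# Numbering offset between each aldolase residue and its HLB5 counterpart.
offsets = {
    aldolase: residue_number(hlb5) - residue_number(aldolase)
    for aldolase, hlb5 in ALDOLASE_TO_HLB5.items()
}

# Positions where the residue type is identical in both proteins.
identical = [
    (aldolase, hlb5)
    for aldolase, hlb5 in ALDOLASE_TO_HLB5.items()
    if residue_type(aldolase) == residue_type(hlb5)
]

print("numbering offsets:", offsets)
print("identical residue types:", identical)
```

For the pairs listed, the HLB5 numbering is shifted by a constant 24 positions, and the arginine and the metal‐coordinating aspartate are the two positions where the residue type itself is retained; the other three pairs interact through backbone atoms.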

Figure 2.

Putative ligand binding site of HLB5. (A) Stereo view of surface model with ligand tartrate. Only one of the six putative ligand binding clefts is highlighted. Tartrate molecule as a putative ligand is presented as stick while protein hexamer is presented as surface model. (B) Putative ligand binding site of HLB5. Tartrate molecule is represented as thick stick model, while interacting residues are shown as thin stick. Remaining protein chain is shown as ribbon. Hydrogen bonds are presented as cyan lines. Tartrate and interacting residues are labeled. (C) Comparison of putative binding sites. Residues interacting with tartrate as ligand are presented as thin sticks. Ligands are presented as thick sticks. HLB5 structure is presented in pale blue. HMG/CHA aldolase is presented in orange.

Structure data availability

The structure of HLB5 and its associated data can be accessed using PDB id: 5ir2.

Conclusions

We have determined the crystal structure of HLB5, a protein previously shown to hydrolyze CMC and pretreated Miscanthus,8 from P. johnsonii DSM 18315 using single wavelength anomalous diffraction. Supported by PISA analysis and size exclusion chromatography, we have determined that the biological unit of this protein is a hexamer consisting of three dimers. By comparing HLB5 with a homologous structure in PDB, HMG/CHA aldolase from P. putida, we propose a ligand binding site for this cellulase. Further experiments will be required to fully understand how this novel cellulase binds and hydrolyzes its substrate.

Conflict of Interest

The authors declare no competing interests.

Acknowledgments

This work was supported by National Institutes of Health Grants GM094585 and GM115586 (to A.J.), and the use of Structural Biology Center beamlines was supported by the U.S. Department of Energy, Office of Biological and Environmental Research, under contract DE‐AC02‐06CH11357.

Contributor Information

Andrzej Joachimiak, Email: andrzejj@anl.gov.

Matthias Hess, Email: mhess@ucdavis.edu.

References

  • 1. Lombard V, Golaconda Ramulu H, Drula E, Coutinho PM, Henrissat B (2014) The carbohydrate‐active enzymes database (CAZy) in 2013. Nucleic Acids Res 42:D490–D495.
  • 2. Lynd LR, Weimer PJ, van Zyl WH, Pretorius IS (2002) Microbial cellulose utilization: fundamentals and biotechnology. Microbiol Mol Biol Rev 66:506–577.
  • 3. Harris D, DeBolt S (2010) Synthesis, regulation and utilization of lignocellulosic biomass. Plant Biotechnol J 8:244–262.
  • 4. Dashtban M, Schraft H, Syed TA, Qin W (2010) Fungal biodegradation and enzymatic modification of lignin. Int J Biochem Mol Biol 1:36–50.
  • 5. Lynd LR, Liang X, Biddy MJ, Allee A, Cai H, Foust T, Himmel ME, Laser MS, Wang M, Wyman CE (2017) Cellulosic ethanol: status and innovation. Curr Opin Biotechnol 45:202–211.
  • 6. Kuhad RC, Gupta R, Singh A (2011) Microbial cellulases and their industrial applications. Enzyme Res 2011:280696.
  • 7. Tan K, Heo S, Foo M, Chew IM, Yoo C (2019) An insight into nanocellulose as soft condensed matter: challenge and future prospective toward environmental sustainability. Sci Total Environ 650:1309–1326.
  • 8. Piao H, Froula J, Du C, Kim TW, Hawley ER, Bauer S, Wang Z, Ivanova N, Clark DS, Klenk HP, Hess M (2014) Identification of novel biomass‐degrading enzymes from genomic dark matter: populating genomic sequence space with functional annotation. Biotechnol Bioeng 111:1550–1565.
  • 9. Stols L, Gu M, Dieckman L, Raffen R, Collart FR, Donnelly MI (2002) A new vector for high‐throughput, ligation‐independent cloning encoding a tobacco etch virus protease cleavage site. Protein Expr Purif 25:8–15.
  • 10. Minor W, Cymborowski M, Otwinowski Z, Chruszcz M (2006) HKL‐3000: the integration of data reduction and structure solution – from diffraction images to an initial model in minutes. Acta Cryst D62:859–866.
  • 11. Sheldrick GM (2010) Experimental phasing with SHELXC/D/E: combining chain tracing with density modification. Acta Cryst D66:479–485.
  • 12. Otwinowski Z (1991) Proceedings of the CCP4 study weekend. In: Wolf W, Evans PR, Leslie AGW, Eds, Isomorphous replacement and anomalous scattering. Warrington: Daresbury Laboratory; p. 80–86.
  • 13. Cowtan K (1994) An automated procedure for phase improvement by density modification. Joint CCP4 and ESF‐EACBM Newsletter on Protein Crystallography. Volume 31, Warrington: Daresbury Laboratory; p. 34–38.
  • 14. Langer G, Cohen SX, Lamzin VS, Perrakis A (2008) Automated macromolecular model building for X‐ray crystallography using ARP/wARP version 7. Nat Protoc 3:1171–1179.
  • 15. Emsley P, Cowtan K (2004) Coot: model‐building tools for molecular graphics. Acta Cryst D60:2126–2132.
  • 16. Adams PD, Afonine PV, Bunkoczi G, Chen VB, Davis IW, Echols N, Headd JJ, Hung LW, Kapral GJ, Grosse‐Kunstleve RW, McCoy AJ, Moriarty NW, Oeffner R, Read RJ, Richardson DC, Richardson JS, Terwilliger TC, Zwart PH (2010) PHENIX: a comprehensive Python‐based system for macromolecular structure solution. Acta Cryst D66:213–221.
  • 17. Davis IW, Leaver‐Fay A, Chen VB, Block JN, Kapral GJ, Wang X, Murray LW, Arendall WB 3rd, Snoeyink J, Richardson JS, Richardson DC (2007) MolProbity: all‐atom contacts and structure validation for proteins and nucleic acids. Nucleic Acids Res 35:W375–W383.
  • 18. Krissinel E, Henrick K (2007) Inference of macromolecular assemblies from crystalline state. J Mol Biol 372:774–797.
  • 19. Holm L, Laakso LM (2016) Dali server update. Nucleic Acids Res 44:W351–W355.
  • 20. Wang W, Mazurkewich S, Kimber MS, Seah SY (2010) Structural and kinetic characterization of 4‐hydroxy‐4‐methyl‐2‐oxoglutarate/4‐carboxy‐4‐hydroxy‐2‐oxoadipate aldolase, a protocatechuate degradation enzyme evolutionarily convergent with the HpaI and DmpG pyruvate aldolases. J Biol Chem 285:36608–36615.
