2025 Aug 18;482(16):1181–1195. doi: 10.1042/BCJ20253186

Functional and structural characterization of AtAbf43C: an exo-1,5-α-L-arabinofuranosidase from Acetivibrio thermocellus DSM1313

Joey L Galindo 1, Philip D Jeffrey 2, Angela Zhu 1, A James Link 1,2,3,4,5, Jonathan M Conway 1,2,3,4,6
PMCID: PMC12493170  PMID: 40785600

Abstract

Acetivibrio thermocellus degrades diverse polysaccharides found in plant biomass using an array of glycoside hydrolase (GH) enzymes. Here, we describe the structure and function of AtAbf43C, an uncharacterized GH family 43 subfamily 26 (GH43_26) α-L-arabinofuranosidase (EC 3.2.1.55) from A. thermocellus. AtAbf43C is optimally active on para-nitrophenol-α-L-arabinofuranoside at pH 5.5 and 65 °C, making it the most thermophilic bacterial GH43_26 enzyme characterized to date. We solved high-resolution crystal structures of full-length AtAbf43C and its individual carbohydrate binding module family 42 (CBM42) and GH43 domains, including a structure with L-arabinofuranose molecules bound to the CBM42. The CBM42 domain adopts a typical β-trefoil fold, and the GH43 domain forms a canonical 5-bladed β-propeller, each resembling those in the mesophilic GH43_26 enzyme SaAraf43A from Streptomyces avermitilis (PDB 3AKH). However, AtAbf43C exhibits a unique domain organization, with the CBM42 at the N-terminus and the GH43 domain at the C-terminus, the reverse of the arrangement observed in SaAraf43A. Structural alignment enabled identification of the conserved catalytic triad (D168, D283, and E344) in AtAbf43C, which we confirmed experimentally with site-directed mutagenesis. The deep-narrow topology of the AtAbf43C GH43 binding pocket is consistent with exo activity on arabino-oligosaccharide (AOS) substrates. Indeed, liquid chromatography-mass spectrometry (LC-MS) analysis of polysaccharides and oligosaccharides hydrolyzed by AtAbf43C confirmed exo activity primarily toward α-1,5-linked AOSs. This suggests AtAbf43C contributes to the degradation of AOS released from arabinose-rich polysaccharides by other A. thermocellus enzymes. Together, these results expand our understanding of the structure–function of GH43_26 enzymes and their role in plant biomass deconstruction.

Keywords: Acetivibrio thermocellus, arabino-oligosaccharides, α-L-arabinofuranosidase, carbohydrate binding protein, Clostridium thermocellum, glycoside hydrolase

Introduction

Plant cell walls are comprised of large biopolymers, primarily cellulose, hemicellulose, and lignin, which form the basis for lignocellulosic biomass [1-4]. Efficient degradation of the various components of lignocellulose has important applications in many bioprocesses, most notably for the sustainable bioproduction of fuels and chemicals from renewable lignocellulosic feedstocks but also in commercial food and beverage processing [2,3,5]. Complete deconstruction of lignocellulose requires a diverse array of carbohydrate active enzymes (CAZymes) [3-8]. The domains from these CAZymes have been categorized into various families by sequence homology in the CAZy database [7]. Glycoside hydrolase (GH) domains, which cleave O-glycosidic bonds, are the dominant catalytic players in lignocellulose degradation [4,6-9]. Additionally, carbohydrate binding module (CBM) domains are a type of noncatalytic domain, commonly encoded in the same protein as catalytic CAZy domains, that bind to carbohydrate substrates and heavily influence enzymatic activity and specificity [4,6-9].

α-L-arabinofuranosidases (EC 3.2.1.55), GH enzymes that hydrolyze terminal nonreducing α-1,2-, α-1,3-, or α-1,5-linked arabinofuranose residues, play an important role in the degradation of arabinose-containing hemicelluloses such as arabinoxylan, arabinan, and arabinogalactan [3-5,10]. Owing to the diversity of these polysaccharides, α-L-arabinofuranosidases vary widely in their substrate specificities. Some α-L-arabinofuranosidases are only active on small substrates such as short arabino-oligosaccharides (AOSs) or arabino-xylo-oligosaccharides (AXOS) as well as the synthetic compound para-nitrophenol-α-L-arabinofuranoside (pNPAra), while other α-L-arabinofuranosidases are active primarily on large polysaccharides like arabinan or arabinoxylan [3,5,10,11]. Additionally, a subset of this latter category of α-L-arabinofuranosidases, also known as arabinoxylan arabinofuranohydrolases, specifically cleave arabinofuranosyl residues from arabinoxylans [3,5,10,11]. α-L-arabinofuranosidases can be found across several GH families including GH2, 3, 43, 51, 54, 62, and 159 [3,5]. The large GH family 43, which was recently divided into 37 subfamilies, commonly contains galactan 1,3-β-galactosidases (EC 3.2.1.145), β-xylosidases (EC 3.2.1.37), and endo-α-L-arabinanases (EC 3.2.1.99), in addition to α-L-arabinofuranosidases [8,12,13]. Several families of CBMs are also commonly found associated with GH43 α-L-arabinofuranosidases, most notably CBM6 and CBM42 [8,12-14]. CBM6 domains commonly bind directly to polysaccharides such as xylan and amorphous cellulose, while CBM42 domains typically recognize small AOSs or the terminal nonreducing arabinofuranosyl residues of arabinan or arabinoxylan [8,13,14].

Acetivibrio thermocellus (basionym: Clostridium thermocellum) is a thermophilic Gram-positive obligate anaerobic bacterium that has been heavily studied for its ability to efficiently break down lignocellulose by natively producing a variety of cellulose- and hemicellulose-degrading enzymes [2,4,12,15]. To date, three α-L-arabinofuranosidases from A. thermocellus have been characterized, including an intracellular GH51 family α-L-arabinofuranosidase active toward AOS and AXOS, and two extracellular cellulosomal GH43 α-L-arabinofuranosidases [4,12,16,17]. The first of these GH43 α-L-arabinofuranosidases (Ct43Araf; subfamily 16) was most active toward arabinoxylan polysaccharides, while the second (AxB8; subfamily 29) acted as a bifunctional β-xylosidase/α-L-arabinofuranosidase with primary activity toward small AXOS [4,12,17]. Furthermore, of the seven predicted GH43 genes in the A. thermocellus genome, only three have been characterized: the two aforementioned α-L-arabinofuranosidases and a 1,3-β-galactosidase (1,3Gal43A; subfamily 24) [4,12,17,18]. Additionally, three of the four uncharacterized A. thermocellus GH43 proteins contain CBM42 domains [4,14]. Previously, Ribeiro et al. expressed these three CBM42 domains as isolated truncations and tested their binding to various natural substrates, finding that they bind most strongly to arabinoxylan and arabinan [14].

In this work, we describe the unique crystal structure and function of a fourth A. thermocellus GH43 α-L-arabinofuranosidase (AtAbf43C; subfamily 26) that acts primarily in an exo manner toward the α-1,5 linkages present in arabinan and smaller AOS substrates rather than those of arabinoxylan and associated AXOS. We solved the crystal structure of the full enzyme and each of its domains, including a structure of its CBM42 bound to arabinose. We characterized the optimal activity of AtAbf43C on a variety of substrates and demonstrated that its activity is exo in nature with AOSs as its primary substrate. Taken together, our work provides new insight into the structure and function of a thermophilic α-L-arabinofuranosidase.

Results

Sequence analysis and diversity of characterized GH43_26 enzymes

The gene encoding the AtAbf43C protein in A. thermocellus DSM1313 (Locus Tag: Clo1313_2794; GenBank Protein accession #: ADU75776.1) has an open reading frame of 1743 bp and is identical in sequence to the Cthe_2138 gene in A. thermocellus ATCC27405. The predicted 580 amino acid protein consists of an N-terminal signal peptide (residues 1–19), a CBM42 domain (residues 29–160), a GH43 domain (residues 179–488), and a C-terminal dockerin I domain (residues 511–568) (Figure 1a). This suggested that Clo1313_2794 probably codes for a secreted enzyme that is part of the extracellular A. thermocellus cellulosome. The top nonidentical BLASTp hits (67–93% identity) to AtAbf43C are almost all GH43 enzymes from other Acetivibrio species, many of which are predicted to be putative β-xylosidases or α-L-arabinofuranosidases (Supplementary Table S1). Ten bacterial GH43_26 enzymes (Supplementary Table S2) cataloged in the CAZy database have previously been characterized [19-25]. All of these GH43_26 enzymes are α-1,5-arabinofuranosidases active toward AOS or arabinan. Individually, these enzymes have 49–63% identity to AtAbf43C, and a phylogenetic tree constructed from a multiple protein sequence alignment shows their diversity (Figure 1b; Supplementary Table S2).
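The reported lengths can be cross-checked with a little arithmetic. The sketch below assumes the 1743 bp open reading frame includes a single stop codon, which the text does not state explicitly. The function and variable names are hypothetical and introduced only for illustration.

```python
# Cross-check of the ORF length, protein length, and domain sizes quoted in the text.
# Assumption (not stated in the source): the 1743 bp ORF includes one stop codon.

ORF_LENGTH_BP = 1743            # from the text
REPORTED_PROTEIN_LENGTH = 580   # from the text

# Domain boundaries (1-based, inclusive) as given in the text.
DOMAINS = {
    "signal peptide": (1, 19),
    "CBM42": (29, 160),
    "GH43": (179, 488),
    "dockerin I": (511, 568),
}


def protein_length_from_orf(orf_bp: int) -> int:
    """Return the number of encoded residues, excluding one stop codon."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_bp // 3 - 1


def domain_length(start: int, end: int) -> int:
    """Length of a domain given 1-based inclusive boundaries."""
    return end - start + 1


if __name__ == "__main__":
    print("Encoded residues:", protein_length_from_orf(ORF_LENGTH_BP))
    for name, (start, end) in DOMAINS.items():
        print(f"{name}: {domain_length(start, end)} residues")
```

Under that assumption, the 581 codons correspond to 580 residues plus a stop codon, in agreement with the predicted protein length.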

Figure 1. Sequence analysis and purification of AtAbf43C.


(a) The general architecture of the Clo1313_2794 gene. (b) A phylogenetic tree constructed after a multiple protein sequence alignment using ClustalOmega of the GenBank protein accession # (ADU75776.1) corresponding to Clo1313_2794 (★) with the other characterized members of the GH43_26 subfamily. (c) Protein gel of the purified full-length AtAbf43C protein and its truncated versions: AtAbf43C_CBM42 and AtAbf43C_GH43. CBM, carbohydrate binding module; GH, glycoside hydrolase.

AtAbf43C enzymatic activity

To determine the biochemical properties of AtAbf43C, recombinant AtAbf43C (residues 21–511, lacking the signal peptide and dockerin I domain) and truncation mutants AtAbf43C_CBM42 (residues 21–167) and AtAbf43C_GH43 (residues 167–494) were expressed and purified (Figure 1a & c; Supplementary Table S3). Using pNPAra as the substrate, we investigated the optimal reaction conditions for AtAbf43C. AtAbf43C showed optimal activity at pH 5.5 and 65 °C, though the enzyme retained >75% of its maximum activity across pH 5–6.5 and 55–70 °C (Figure 2a & b). After 6 hours of incubation at elevated temperatures, AtAbf43C retained at least 46–48% of its initial activity at 55–60 °C and at least 28% at 65 °C (Figure 2c). This is consistent with the results of the temperature optimization assay, in which enzyme activity begins to diminish at 70 °C and above (Figure 2b). The effects of additives on the activity of AtAbf43C were also tested (Figure 2d). Some divalent ions, including Ca2+, Mg2+, and Co2+, appeared to have a positive effect on enzymatic activity, with Ca2+ addition having the largest effect. In contrast, Zn2+ and Cu2+ caused a significant decrease in activity. The effects observed with the other additives tested were not statistically significant at the P ≤ 0.05 threshold (Figure 2d, Supplementary Table S4).

Figure 2. Effect of environmental conditions on AtAbf43C activity on pNPAra.


(a) pH optimization performed at 55 °C. (b) Temperature optimization performed at pH 5.5. (c) Thermostability test of AtAbf43C. (d) Effect of various 10 mM additives on enzymatic activity. Asterisks indicate statistical significance (P ≤ 0.05) compared with the untreated condition. Error bars represent standard deviations between triplicate technical replicates at each reaction condition. pNPAra, para-nitrophenol-α-L-arabinofuranoside.

Substrate specificity

At the optimal pH and temperature, AtAbf43C and AtAbf43C_GH43 had specific activities of 4.95 ± 0.28 U/mg and 5.67 ± 0.63 U/mg, respectively, on pNPAra (with U defined as mM/s of released pNP), while AtAbf43C_CBM42 showed no activity on pNPAra (Supplementary Table S5A). This confirmed that AtAbf43C_GH43 contained the active catalytic domain. In addition to pNPAra, the activity of AtAbf43C and its truncated versions was tested on several other substrates. These included three natural substrates, wheat arabinoxylan (WAX), beechwood xylan (BX), and sugar beet arabinan (SBA), which were tested using the dinitrosalicylic acid (DNS) reducing sugar assay, as well as three additional synthetic pNP glycosides: para-nitrophenol-β-D-xylopyranoside (pNPXy), para-nitrophenol-α-D-galactopyranoside (pNPαGal), and para-nitrophenol-β-D-galactopyranoside (pNPβGal). However, no activity on any of these other substrates was detected (Supplementary Table S5A & B). Kinetic parameters were then determined at optimal conditions on pNPAra for AtAbf43C and AtAbf43C_GH43 (Table 1, Supplementary Table S6). Full substrate saturation could not be achieved for either protein, as the Km values were high enough to approach the solubility limit of pNPAra in aqueous solution.
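The consequence of a Km near the substrate solubility limit can be made explicit with the Michaelis–Menten rate law. This is a sketch that assumes the enzyme follows simple Michaelis–Menten kinetics, as the reported Km and kcat values imply. The symbols v (initial rate), [E]_0 (total enzyme concentration), and [S] (pNPAra concentration) are introduced here for illustration.

```latex
v = \frac{k_{cat}\,[E]_0\,[S]}{K_m + [S]},
\qquad
v \approx \frac{k_{cat}}{K_m}\,[E]_0\,[S] \quad \text{when } [S] \ll K_m
```

When [S] cannot be raised well above Km, the measured rates fall mostly in the near-linear region of the curve. In that regime the ratio kcat/Km is better constrained by the data than kcat or Km individually.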

Table 1. Kinetic parameters determined for AtAbf43C and AtAbf43C_GH43 on pNPAra at optimal conditions.

Protein | Km (mM) | kcat (s⁻¹) | kcat/Km (s⁻¹ mM⁻¹)
AtAbf43C | 22.7 ± 4.1 | 269.2 ± 25.5 | 11.9 ± 2.4
AtAbf43C_GH43 | 59.6 ± 12.1 | 237.5 ± 33.4 | 4.0 ± 1.0

CBM, carbohydrate binding module. GH, glycoside hydrolase. pNPAra, para-nitrophenol-α-L-arabinofuranoside.
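The catalytic efficiency column of Table 1 can be reproduced from the Km and kcat columns. The sketch below assumes first-order error propagation with independent uncertainties in kcat and Km. The source does not state how its uncertainties were derived, so this is an illustrative assumption, and the function and dictionary names are hypothetical.

```python
import math

# (kcat, kcat_sd, km, km_sd) transcribed from Table 1; units are s^-1 and mM.
TABLE_1 = {
    "AtAbf43C": (269.2, 25.5, 22.7, 4.1),
    "AtAbf43C_GH43": (237.5, 33.4, 59.6, 12.1),
}


def catalytic_efficiency(kcat, kcat_sd, km, km_sd):
    """Return (kcat/Km, propagated uncertainty).

    Assumes independent errors, so relative uncertainties add in quadrature.
    This is an assumption for illustration, not the authors' stated method.
    """
    ratio = kcat / km
    relative_sd = math.sqrt((kcat_sd / kcat) ** 2 + (km_sd / km) ** 2)
    return ratio, ratio * relative_sd


if __name__ == "__main__":
    for protein, values in TABLE_1.items():
        efficiency, sd = catalytic_efficiency(*values)
        print(f"{protein}: {efficiency:.1f} ± {sd:.1f} s^-1 mM^-1")
```

Under these assumptions, the computed values agree with the tabulated 11.9 ± 2.4 and 4.0 ± 1.0 s⁻¹ mM⁻¹ to the reported precision, a roughly threefold difference between the full-length enzyme and the isolated GH43 domain.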

Crystal structure and mutagenesis study of AtAbf43C

Crystal structures were obtained for the full-length AtAbf43C protein (PDB code: 9NXG) at a resolution of 1.32 Å as well as individual domains AtAbf43C_CBM42 (PDB code: 9NXI) and AtAbf43C_GH43 (PDB code: 9NXJ) at 1.75 Å and 2.32 Å, respectively. Additionally, a 1.75 Å crystal structure (PDB code: 9NXH) was solved for AtAbf43C soaked in L-arabinose immediately prior to freezing and mounting in which two L-arabinofuranose molecules were found bound to the CBM42 domain. A summary of refinement statistics for all four structures can be found in Table 2.

Table 2. Summary of data and refinement statistics for the structures of AtAbf43C.

Protein | AtAbf43C | AtAbf43C + Arabinose | AtAbf43C_CBM42 | AtAbf43C_GH43
PDB code | 9NXG | 9NXH | 9NXI | 9NXJ
Resolution range for data (Å) | 29–1.32 (1.35–1.32) | 30–1.75 (1.84–1.75) | 28–1.75 (1.80–1.75) | 30–2.32 (2.38–2.32)
Space group | P 1 21 1 | C 1 2 1 | P 21 21 2 | P65
a, b, c (Å) | 80.3, 81.6, 82.1 | 86.9, 81.0, 76.9 | 48.2, 98.0, 37.9 | 105.3, 105.3, 128.9
α, β, γ (°) | 90, 108.4, 90 | 90, 119.9, 90 | 90, 90, 90 | 90, 90, 120
Unique reflections | 232,133 (15,790) | 46,612 (6,764) | 18,790 (1,362) | 35,192 (2,503)
Completeness (%) | 98.9 (91.6) | 100.0 (100.0) | (100.) | 99.7 (96.1)
Multiplicity | 3.4 (2.6) | 7.0 (6.7) | 6.6 (6.6) | 20.6 (20.2)
CC1/2 | 0.999 (0.776) | 0.996 (0.520) | 0.999 (0.880) | 0.999 (0.867)
I/σ(I) | 16.7 (2.7) | 8.4 (2.1) | 16.7 (2.5) | 21.2 (3.3)
Rmerge | 0.046 (0.359) | 0.115 (0.807) | 0.078 (0.715) | 0.118 (0.852)
Rmeas | 0.054 (0.452) | 0.168 (0.875) | 0.085 (0.777) | 0.121 (0.874)
Resolution range for refinement (Å) | 29–1.32 (1.34–1.32) | 30–1.75 (1.79–1.75) | 28–1.75 (1.84–1.75) | 30–2.32 (2.35–2.32)
Reflections used in refinement | 228,831 | 46,265 | 18,404 | 34,508
Reflections used for R-free | 11,627 | 2,267 | 917 | 1,644
R-work | 0.158 (0.246) | 0.180 (0.288) | 0.197 (0.248) | 0.205 (0.292)
R-free | 0.172 (0.265) | 0.220 (0.304) | 0.217 (0.337) | 0.247 (0.367)
Number of nonhydrogen atoms | 8695 | 4049 | 1215 | 5337
 Macromolecules | 7680 | 3781 | 1160 | 5209
 Ligands/ions | 81 | 39 | 6 | 40
 Water | 934 | 229 | 49 | 88
Protein residues | 940 | 468 | 137 | 654
RMS (bonds) (Å) | 0.005 | 0.007 | 0.007 | 0.008
RMS (angles) (°) | 0.874 | 0.88 | 0.921 | 0.965
Ramachandran favored (%) | 96.6 | 96.6 | 94.1 | 95.8
Ramachandran allowed (%) | 3.4 | 3.2 | 5.9 | 4.2
Ramachandran outliers (%) | 0.0 | 0.2 | 0.0 | 0.0
Rotamer outliers (%) | 0.2 | 0.2 | 2.4 | 1.4
Clash score | 0.7 | 1.6 | 2.2 | 3.1
Wilson B-factor (Å²) | 9.9 | 16.3 | 17.9 | 41.4
Average B-factor (Å²) | 13.3 | 17.2 | 23.2 | 44.0
 Macromolecules (Å²) | 12.1 | 16.9 | 22.9 | 44.0
 Ligands/ions (Å²) | 16.4 | 33.0 | 31.3 | 53.8
 Water (Å²) | 22.8 | 21.9 | 29.9 | 39.1

CBM, carbohydrate binding module. GH, glycoside hydrolase.
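A few of the refinement statistics in Table 2 lend themselves to simple internal consistency checks. The sketch below uses values transcribed from the table. The dictionary layout and function names are hypothetical, and the free-set fraction is expressed relative to the refinement reflection count, which is an assumption about how the two counts relate.

```python
# Selected values transcribed from Table 2; keys are PDB codes.
TABLE_2 = {
    "9NXG": dict(total=8695, macromolecules=7680, ligands=81, water=934,
                 refl_refinement=228831, refl_rfree=11627,
                 r_work=0.158, r_free=0.172),
    "9NXH": dict(total=4049, macromolecules=3781, ligands=39, water=229,
                 refl_refinement=46265, refl_rfree=2267,
                 r_work=0.180, r_free=0.220),
    "9NXI": dict(total=1215, macromolecules=1160, ligands=6, water=49,
                 refl_refinement=18404, refl_rfree=917,
                 r_work=0.197, r_free=0.217),
    "9NXJ": dict(total=5337, macromolecules=5209, ligands=40, water=88,
                 refl_refinement=34508, refl_rfree=1644,
                 r_work=0.205, r_free=0.247),
}


def atom_counts_consistent(entry):
    """True if macromolecule + ligand/ion + water atoms equal the reported total."""
    return entry["macromolecules"] + entry["ligands"] + entry["water"] == entry["total"]


def rfree_fraction(entry):
    """Free-set reflections as a fraction of the refinement reflection count."""
    return entry["refl_rfree"] / entry["refl_refinement"]


def r_gap(entry):
    """Difference between overall R-free and R-work."""
    return entry["r_free"] - entry["r_work"]


if __name__ == "__main__":
    for code, entry in TABLE_2.items():
        print(code,
              "atoms consistent:", atom_counts_consistent(entry),
              f"free set: {100 * rfree_fraction(entry):.1f}%",
              f"R-free - R-work: {r_gap(entry):.3f}")
```

For all four structures the atom subtotals sum to the reported totals, and the free set amounts to roughly 5% of the refinement reflections.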

The full-length AtAbf43C arabinose-soaked structure (PDB 9NXH) consists of the smaller N-terminal CBM42 domain connected via a short 19 amino acid linker to the larger C-terminal catalytic GH43 domain (Figure 3a). In addition to the two arabinose molecules bound to the CBM42 domain, the structure contains three glycerol molecules and a central magnesium ion in the GH43 domain. The catalytic GH43 domain displays the five-bladed β-propeller fold typical of GH43 enzymes [11,13,24-29] (Figure 3b). The GH43 domain of AtAbf43C does not have the C-terminal β-jelly roll domain found in some subfamilies of GH43 proteins; however, the N-terminal strand of the domain appears to form part of the fifth blade (blade V), in what is colloquially termed ‘molecular velcro’ [11,25-27,29] (Figure 3b). This closure, thought to provide extra structural stability, is absent from many GH43 enzymes, in which the fifth blade of the β-propeller consists only of residues from the C-terminal strand [11,27,29].

Figure 3. The major structural features of AtAbf43C shown using the arabinose-soaked structure (PDB code: 9NXH).


Magnesium (Mg2+) is shown as a purple sphere, glycerol molecules are shown as green stick structures, and arabinose molecules are shown as dark gray stick structures. (a) The overall structure of AtAbf43C with the GH43 domain shown in yellow and the CBM42 domain shown in red. (b) The five-bladed β-propeller structure of the GH43 domain, with each blade shown in a distinct color. (c) The three catalytic residues labeled and shown as stick structures within the GH43 structure. (d) The β-trefoil structure of the CBM42 domain, with each subdomain shown in a distinct color. (e) Arabinose bound in the β-pocket of the CBM42 domain. (f) Arabinose bound in the γ-pocket of the CBM42 domain. Contacted residues in (e & f) are labeled and shown as stick structures. Hydrogen bonds between the residues and arabinose molecule bonds are shown as dashed cyan lines, with bond lengths labeled in Å. CBM, carbohydrate binding module; GH, glycoside hydrolase.

GH43 enzymes operate via an inverting mechanism in which three catalytic residues are highly conserved: an aspartate acting as a base, a glutamate acting as an acid, and a second aspartate acting as a pKa modulator [9,13,24-29]. In AtAbf43C, these residues were identified as D168, D283, and E344, found closely grouped at the base of the β-propeller structure (Figure 3c). To demonstrate the importance of these residues to catalytic activity, site-directed mutagenesis was used to generate versions of AtAbf43C (AtAbf43C_D168A, AtAbf43C_D283A, AtAbf43C_E344A, and AtAbf43C_H408A) in which each of the three active site residues, as well as a histidine (H408) initially thought to interact with the magnesium ion, was individually mutated to alanine (Supplementary Figure S1A & B). When the mutants were purified and tested alongside wildtype AtAbf43C, activity on pNPAra was completely eliminated in every version of AtAbf43C in which one of the three catalytic residues (D168, D283, and E344) was mutated to alanine (Supplementary Figure S1B; Supplementary Table S7), indicating these sites are critical to enzymatic function. The H408A mutant appeared to have a small but statistically significant (P value ≤ 0.05) increase in activity relative to the wildtype enzyme, suggesting this histidine residue is not critical for activity (Supplementary Table S7).

The CBM42 domain of AtAbf43C displays the typical β-trefoil structure found in other CBM42 and similar CBM13 family proteins, consisting of three 40–50 amino acid subdomains (α, β, and γ), each of which harbors a potential sugar-binding pocket (Figure 3d) [13,14,24]. However, it has been observed in other CBM42 proteins that one of these three pockets may become nonfunctional [13,14,24]. In the solved structure of arabinose-soaked AtAbf43C (PDB 9NXH), arabinose was only found bound in the β and γ pockets of the CBM42 domain (Figure 3d). In the β pocket, the arabinose molecule formed hydrogen bonds with three residues, Y73, H70, and D89 (Figure 3e). Similarly, the arabinose in the γ pocket contacted Y121, H118, and D136, as well as an additional residue N120 (Figure 3f). In the previous study of this CBM42 by Ribeiro et al., residues D39, D91, and D136 were individually altered to alanine [14]. While versions with the D91A and D136A substitutions showed significantly decreased binding affinity for arabinoxylan and arabinose relative to the wildtype protein, the D39A mutation, which corresponds to the α pocket, did not significantly affect binding [14]. The binding pattern of arabinose observed in our structure would thus appear to support the conclusion that the α pocket is nonfunctional or does not contribute to arabinose binding in the AtAbf43C CBM42 domain.

Of the ten previously characterized GH43_26 enzymes (Figure 1b; Supplementary Table S2), SaAraf43A from the bacterium Streptomyces avermitilis is the only GH43_26 enzyme other than AtAbf43C with a crystal structure containing both the CBM42 and GH43 domains. Previously, SaAraf43A was extensively characterized by Ichinose et al. and Fujimoto et al., including multiple crystal structures (PDB: 3AKF, 3AKG, 3AKH, 3AKI) of the full-length protein complexed with various substrates, and a mutagenesis study identifying its catalytic residues [24,30]. SaAraf43A is an exo-1,5-α-L-arabinofuranosidase, with activity primarily toward AOSs such as arabinotriose, arabinotetraose, and arabinopentose [24,30]. Like AtAbf43C, SaAraf43A contains both a GH43 and a CBM42 domain, but in SaAraf43A, the GH43 domain resides at the N-terminus of the protein followed by a C-terminal CBM42 [24]. Interestingly, despite the reverse ordering of these domains between SaAraf43A and AtAbf43C, the individual GH43 and CBM42 domains closely align. Structural alignment of the GH43 and CBM42 domains of AtAbf43C individually to the arabinotriose-complexed structure of SaAraf43A (PDB code: 3AKH) results in RMSDs of 0.475 Å (269 common Cα atoms) and 0.532 Å (111 common Cα atoms), respectively (Figure 4a; Supplementary Figure S2A). As such, residues D168, D283, and E344 in the GH43 domain of AtAbf43C closely aligned with the corresponding catalytic residues D20, D135, and E196 in SaAraf43A (Figure 4b), which, when individually changed to alanine by Fujimoto et al., eliminated enzymatic activity on pNPAra [24]. Furthermore, this structure of SaAraf43A contained an arabinose and an arabinobiose molecule within the binding pocket of its GH43 domain. The topology of this binding pocket, characteristic of exo-acting GH43 enzymes, is such that the three catalytic residues are positioned at the bottom of a deep, narrow opening that sterically limits access to larger or branched substrates [24].
This contrasts with endo-acting GH43s that are active on polysaccharides such as arabinoxylan and arabinan, which possess a much more exposed binding cleft [24-27,29]. The narrowed binding pocket results from an extended loop structure in the fifth blade of the β-propeller, which AtAbf43C appears to possess (Figure 3a & b) [24,25]. When overlaid with the apo structure of the AtAbf43C GH43 domain, the bound arabinose and arabinobiose from SaAraf43A fit neatly within the surface structure of the AtAbf43C binding pocket (Figure 4c), suggesting AtAbf43C may be active on AOS substrates similar to those of SaAraf43A. The CBM42 domain of AtAbf43C also closely aligned with the CBM42 domain in SaAraf43A (Supplementary Figure S2A). However, while both structures had sugars in their β-subdomains, the SaAraf43A CBM42 had an arabinobiose bound in its α-pocket and an unliganded γ-pocket (Supplementary Figure S2B-D). Furthermore, the binding residues in the α-pocket differed significantly between the two proteins, with AtAbf43C possessing a proline and two glutamines at positions where SaAraf43A possesses glutamine, histidine, and aspartate residues, respectively (Supplementary Figure S2B). These differences could account for a nonfunctional α-binding pocket in the AtAbf43C CBM42.
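The correspondence between the catalytic residues of the two enzymes can be illustrated with a small numbering check. The sketch below uses only the three residue pairs reported in the text, and the variable and function names are hypothetical. It says nothing about residues outside these three pairs, where insertions or deletions could shift the numbering.

```python
# Catalytic residue pairs reported in the text:
# (residue type, AtAbf43C position, SaAraf43A position)
CATALYTIC_PAIRS = [
    ("D", 168, 20),
    ("D", 283, 135),
    ("E", 344, 196),
]


def numbering_offsets(pairs):
    """Return the set of AtAbf43C-minus-SaAraf43A position differences."""
    return {at_pos - sa_pos for _, at_pos, sa_pos in pairs}


if __name__ == "__main__":
    for residue, at_pos, sa_pos in CATALYTIC_PAIRS:
        print(f"AtAbf43C {residue}{at_pos} <-> SaAraf43A {residue}{sa_pos} "
              f"(offset {at_pos - sa_pos})")
    print("Distinct offsets:", sorted(numbering_offsets(CATALYTIC_PAIRS)))
```

All three pairs share the same offset of 148 residues. This is what one would expect if the shift in numbering mainly reflects the N-terminal CBM42 domain and linker that precede the GH43 domain in AtAbf43C but follow it in SaAraf43A.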

Figure 4. Structural comparison of AtAbf43C and SaAraf43A.


(a) Structural alignment of the GH43 domain of the arabinose-soaked AtAbf43C structure (GH43 domain in yellow, CBM42 domain in red) with the structure of SaAraf43A complexed with arabinotriose (PDB code: 3AKH) shown in light blue. (b) Alignment of the catalytic residues in SaAraf43A (light blue) and the co-crystallized arabinose and arabinobiose molecules in the SaAraf43A structure with the active site residues of AtAbf43C (yellow). (c) Superimposition of the arabinose and arabinobiose molecules from the SaAraf43A structure onto the surface structure of AtAbf43C, with the AtAbf43C catalytic residues shown in yellow. In (a–c), sodium (Na+) is shown as an orange sphere, chlorine (Cl-) is shown as a green sphere, glycerol molecules are shown as green stick structures, and sugar molecules are shown as dark gray stick structures. CBM, carbohydrate binding module; GH, glycoside hydrolase.

LC-MS analysis of natural polysaccharide and oligosaccharide hydrolysis

Finally, using liquid chromatography-mass spectrometry (LC-MS), the activity of AtAbf43C at optimal conditions was tested on the three natural substrates used previously (WAX, BX, SBA), as well as the following oligosaccharides: arabinobiose (A2), arabinotriose (A3), arabinotetraose (A4), arabinopentose (A5), 2³-α-L-arabinofuranosyl-xylotriose (A2XXX), and 3³-α-L-arabinofuranosyl-xylotetraose (XA3XXX). Based on these results, AtAbf43C appears to be primarily active toward α-1,5-linked AOS and to a lesser extent on arabinan, where it seems to act in an exo manner to release free arabinose (Table 3; Supplementary Figure S3). While marginal activity was detected on WAX, no significant activity was detected on BX or the AXOS tested, suggesting AtAbf43C does not degrade xylan-based substrates (Table 3).

Table 3. Summary of results from LC-MS analysis of AtAbf43C hydrolysis of various natural substrates and oligosaccharides. Mass spectra produced from the LC-MS-based experiment and used for subsequent analysis can be found in the supporting information (Supplementary Figure S3).

Substrate | WAX | BX | SBA | A2 | A3 | A4 | A5 | A2XXX | XA3XXX
Activity | +/- | - | + | + | + | + | + | - | -

A2, arabinobiose. A2XXX, 2³-α-L-arabinofuranosyl-xylotriose. A3, arabinotriose. A4, arabinotetraose. A5, arabinopentose. BX, beechwood xylan. SBA, sugar beet arabinan. WAX, wheat arabinoxylan. XA3XXX, 3³-α-L-arabinofuranosyl-xylotetraose.

Discussion

AtAbf43C is the fourth GH43 enzyme to be characterized from A. thermocellus and only the third with an experimentally determined structure [12,17,18,28,29]. Furthermore, AtAbf43C is the first enzyme in the GH43_26 subfamily from A. thermocellus, and only the second crystal structure of a GH43_26 enzyme that includes the CBM domain [12,17,18,24,25] (Supplementary Table S2). Natively, AtAbf43C is likely an extracellular cellulosomal enzyme based on the presence of an N-terminal signal peptide and a C-terminal dockerin I domain [4,12,17,18]. Characterization of the activity of AtAbf43C shows that it is an α-arabinofuranosidase with activity toward pNPAra. The optimal pH and temperature for AtAbf43C on pNPAra (pH 5.5 and 65 °C, Figure 2a & b) are generally consistent with other characterized A. thermocellus enzymes [4,12,17]. Kinetic parameters determined on pNPAra show that AtAbf43C had a Km high enough to approach the solubility limit of the substrate (Table 1), a result that was also observed for the previously characterized A. thermocellus GH43 enzyme AxB8, an α-arabinofuranosidase that was likewise primarily active on oligosaccharides [12]. The larger kcat/Km value observed for the full-length AtAbf43C protein versus the AtAbf43C_GH43 domain would suggest the CBM42 domain may aid in substrate specificity toward pNPAra; however, this is not definitive because substrate saturation could not be achieved within the solubility limit of pNPAra. AtAbf43C was not active on the other pNP glycosides tested (Supplementary Table S5A), indicating it acts primarily as an α-arabinofuranosidase and does not have a secondary function as a β-xylosidase, as has been reported for other GH43 enzymes [3-5,12]. Meanwhile, testing on natural hemicellulose substrates (WAX, BX, and SBA) using the DNS assay suggested that AtAbf43C was inactive toward xylan and arabinan polysaccharides (Supplementary Table S5B).

The solution of multiple AtAbf43C crystal structures provided some insight into the enzyme’s preferred substrate and mode of action. First, the arabinose-bound structure of full-length AtAbf43C builds upon previous work by Ribeiro et al., by providing structural evidence for a nonfunctional α-binding pocket, with arabinose only bound in the β- and γ- subdomains [14]. Subsequent alignment with the homologous CBM42 structure in SaAraf43A further supports this, as significant differences in corresponding binding residues were observed between the two proteins. Next, alignment of the GH43 domain in AtAbf43C with SaAraf43A allowed for identification of its three conserved active site residues in AtAbf43C (D168, D283, and E344) and insight into the binding modality of its binding pocket [24]. Mutation of these catalytic residues in AtAbf43C confirms their involvement in its activity with all single point mutants losing activity (Supplementary Figure S1, Supplementary Table S7). The deep-narrow topology of this pocket (Figure 3c), which limits access to the active site, is very similar to that of SaAraf43A (Figure 4b & c), which acts in an exo manner toward α-1,5-linked AOS. Subsequent LC-MS analysis of the hydrolysis products of AtAbf43C on natural substrates shows that this enzyme is capable of liberating some arabinose from SBA and, to a lesser extent, from WAX (Table 3, Supplementary Figure S3), though notably not enough to be detected by the DNS assay, indicating that it probably acts in an exo manner at the ends of polysaccharides. LC-MS analysis of oligosaccharide hydrolysis (Table 3, Supplementary Figure S3) showed AtAbf43C is active toward α-1,5-linked AOS but not AXOS. This is consistent with activities observed in other members of the GH43_26 subfamily, which all act as arabinan- or AOS-degrading α-1,5-arabinofuranosidases (Supplementary Table S2). 
Taken together, we demonstrate through structural and functional characterization that AtAbf43C is an active arabinofuranosidase, specialized in degrading AOSs to arabinose in an exo manner. AtAbf43C from A. thermocellus is the most thermophilic bacterial GH43_26 enzyme characterized to date, thus deepening our understanding of this important subfamily of arabinofuranosidase enzymes.

Methods

Sequence and phylogenetic analysis

The nucleotide and protein sequences for AtAbf43C (Locus tag: Clo1313_2794, GenBank Protein accession: ADU75776.1) as well as its predicted domains were found using the CAZy database and the National Center for Biotechnology Information (NCBI) database from the A. thermocellus DSM1313 genome [7,31]. The N-terminal signal peptide was predicted using the SignalP 6.0 software [32]. The protein–protein BLAST search and pairwise alignment were conducted with the NCBI blastp tool using the full amino acid sequence of Clo1313_2794. Characterized GH43 subfamily 26 proteins were found as annotated in the CAZy database, and their protein sequences were obtained using their primary accession number in the NCBI database. Multiple sequence alignment of these proteins was performed using the ClustalOmega algorithm, from which the phylogenetic tree was constructed using the IQTREE webserver and visualized using the Interactive Tree of Life (iTOL) online tool [33-35].

Cloning of AtAbf43C

A table of oligonucleotide primers used to construct the plasmids in this study can be found in the supporting information (Supplementary Table S8). The gene for AtAbf43C was amplified via PCR from A. thermocellus DSM1313 genomic DNA purchased from the Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures (Supplementary Table S8; Primers JLG005-006). Amplification removed a predicted N-terminal signal peptide and C-terminal dockerin I domain from the Clo1313_2794 gene encoding the AtAbf43C module while adding overlap regions for subsequent Gibson assembly into the pET28b(+) expression vector (EMD Millipore), which added a C-terminal -LEHHHHHH purification tag. Truncated versions of AtAbf43C (AtAbf43C_CBM42 and AtAbf43C_GH43) were amplified from this resulting vector with overlaps for Gibson assembly into the pET28b(+) vector (Supplementary Table S8; Primers JLG005 & JLG036-38). Site-directed mutagenesis of AtAbf43C was carried out via PCR amplification of the full-length expression plasmid using mutagenic primers that added overlap regions for subsequent recircularization via Gibson assembly (Supplementary Table S8; Primers JLG195-202). Gibson assemblies were performed by incubating amplified DNA fragments at 50 °C for 1 hour with NEBuilder HiFi DNA Assembly Master Mix (New England Biolabs) as per the manufacturer’s recommended protocol.

Bacterial strains and culture conditions

Plasmids were cloned in NEB chemically competent Escherichia coli DH5α (New England Biolabs). Plasmids were isolated using ZymoPURE miniprep kits (Zymo Research), and plasmid sequences were confirmed by sequencing (Azenta Genewiz). Sequence-confirmed plasmids were then transformed into chemically competent E. coli BL21 (DE3) pRosetta2 (EMD Millipore) for protein expression. E. coli cultures were maintained in enriched Luria-Bertani (LB) medium (24 g/l yeast extract, 10 g/l tryptone, 5 g/l NaCl) or on LB agar plates (5 g/l yeast extract, 10 g/l tryptone, 5 g/l NaCl, 15 g/l [1.5% w/v] agar) with 50 µg/ml kanamycin (IBI Scientific), or with 50 µg/ml kanamycin (IBI Scientific) and 33 µg/ml chloramphenicol (RPI), as appropriate.

Protein expression and purification

Protein expression was induced by inoculating ZYM-5052 autoinduction media with overnight cultures of the transformed E. coli BL21 (DE3) Rosetta strains [36]. Cells were harvested by centrifugation at 6000×g for 10 minutes after 18–22 hours of growth at 37 °C in a shaking incubator at 250 rpm. Cell pellets were resuspended in lysis buffer (20 mM sodium phosphate pH 7.4, 500 mM NaCl, 10 mM imidazole) before being lysed via sonication on ice using a Branson SFX 550 Sonifier® in cycles of 10 s on at 20 kHz and 10 s off for 10 minutes total. The lysate was then centrifuged for 30 minutes at 30,000×g at 4 °C, and the resulting supernatant was passed through a 0.22 µm filter. Full-length AtAbf43C and its truncated versions were then purified via immobilized metal affinity chromatography (IMAC) using 5 ml EconoFit IMAC columns (Bio-Rad) on an NGC Chromatography System (Bio-Rad) and fractionated in elution buffer (20 mM sodium phosphate pH 7.4, 500 mM NaCl, 250 mM imidazole). Fractions of IMAC-purified full-length AtAbf43C protein were combined and further purified via size exclusion chromatography via FPLC using a HiLoad™ 26/600 Superdex™ 200 pg column (Cytiva) in a mobile phase of 50 mM sodium phosphate pH 7.0, 150 mM NaCl buffer. Mutant versions of AtAbf43C and the wildtype AtAbf43C protein used to test the effect of mutagenesis were purified via IMAC using His-Spin Protein Mini Prep Kits (Zymo Research) as per the manufacturer’s instructions. After purification, all proteins were buffer exchanged into 100 mM sodium phosphate buffer (pH 6.0) using 30 kDa MWCO Centrifugal Filter Units (CELLTREAT®) or 10 kDa MWCO Spin-X® UF 20 ml Centrifugal Concentrators (Corning®). Purity of the resulting proteins was assessed by SDS-PAGE using 4–20% Mini-PROTEAN® TGX Stain-Free™ Protein Gels (Bio-Rad) with Precision Plus™ unstained protein standards (Bio-Rad).
Protein concentration was determined by measuring the A280 of the buffer-exchanged proteins using a NanoDrop One spectrophotometer (Thermo Scientific) and applying Beer’s law with the calculated 280 nm extinction coefficient of each individual protein [37-39]. Aliquots of full-length AtAbf43C at a protein concentration of 9 mg/ml were frozen at −80 °C prior to further analysis.
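The A280-based concentration estimate can be illustrated with a minimal sketch. It assumes the widely used per-residue coefficients of 5500, 1490, and 125 M⁻¹ cm⁻¹ for tryptophan, tyrosine, and cystine (the original Gill and von Hippel values differ slightly, and the calculator used in this study may apply either set), a 1 cm equivalent path length, and a hypothetical toy sequence and molar mass that are not those of AtAbf43C.

```python
# Sketch only: protein concentration from A280 via Beer's law.
# Per-residue molar extinction coefficients at 280 nm (M^-1 cm^-1).
# These are commonly used literature values and are an assumption here;
# the tool used in the study may use slightly different numbers.
EPS_TRP = 5500
EPS_TYR = 1490
EPS_CYSTINE = 125  # per disulfide bond


def extinction_coefficient(sequence: str, n_disulfides: int = 0) -> int:
    """Estimate the 280 nm molar extinction coefficient from a sequence."""
    seq = sequence.upper()
    return (seq.count("W") * EPS_TRP
            + seq.count("Y") * EPS_TYR
            + n_disulfides * EPS_CYSTINE)


def concentration_mg_per_ml(a280: float, epsilon: float,
                            molar_mass_da: float, path_cm: float = 1.0) -> float:
    """Beer's law: c = A / (epsilon * l), converted from mol/L to mg/ml."""
    molar = a280 / (epsilon * path_cm)   # mol/L
    return molar * molar_mass_da         # g/L, numerically equal to mg/ml


# Hypothetical toy sequence and molar mass, for illustration only.
toy_sequence = "MWYWYAC"
toy_epsilon = extinction_coefficient(toy_sequence)   # 2 Trp + 2 Tyr
toy_conc = concentration_mg_per_ml(1.398, toy_epsilon, 50_000.0)
print(toy_epsilon, round(toy_conc, 3))
```

In practice the full construct sequence, including the purification tag, would be passed in, since every Trp and Tyr contributes to the absorbance.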

Substrates

pNPAra and pNPXy were obtained from Sigma-Aldrich and EMD Millipore, respectively. pNPαGal and pNPβGal were obtained from TCI Chemicals. Natural substrates with purities >95%, including WAX, BX, and SBA, were obtained from Megazyme (Neogen). Oligosaccharides including arabinobiose, arabinotriose, arabinotetraose, arabinopentose, 2³-α-L-arabinofuranosyl-xylotriose (A2XXX), and 3³-α-L-arabinofuranosyl-xylotetraose (XA3XXX) were obtained from Megazyme (Neogen). Arabinose and xylose were purchased from Fisher Scientific.

Temperature and pH optimization

Prior to all assays, AtAbf43C protein stored at 9 mg/ml in 100 mM sodium phosphate buffer (pH 6) was diluted to 0.025 mg/ml in the appropriate reaction buffer, described further below. Reactions were initiated by adding 45 µl of 5 mM pNPAra dissolved in the appropriate buffer to 5 µl of the 0.025 mg/ml enzyme solution in PCR strip tubes, which were immediately moved to a thermocycler for incubation. After 10 minutes, reactions were stopped by adding 100 µl of 1 M sodium carbonate. The absorbance at 405 nm of 100 µl of each reaction was measured in a flat-bottomed clear 96-well plate using a BioTek Synergy H1 microplate reader (Agilent). pH optimization was first performed at 55 °C, with 100 mM sodium acetate buffer used for pH 4–5.5 conditions and 100 mM sodium phosphate buffer used for pH 6–8 conditions. Temperature optimization was then performed at the optimal observed pH of 5.5 in 100 mM sodium acetate buffer. Activity was calculated as a percentage relative to the highest observed activity. All reaction conditions were performed in triplicate.
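The mixing arithmetic implied by these volumes can be checked with a short sketch. The volumes and stock concentrations are taken from the text; the helper name `final_concentration` is introduced here purely for illustration.

```python
# Sketch only: concentrations in the 50 µl pNPAra reaction described above.

def final_concentration(stock: float, volume_added_ul: float,
                        total_volume_ul: float) -> float:
    """Concentration after diluting a stock into a larger total volume."""
    return stock * volume_added_ul / total_volume_ul


ENZYME_UL = 5.0       # 0.025 mg/ml enzyme solution
SUBSTRATE_UL = 45.0   # 5 mM pNPAra solution
STOP_UL = 100.0       # 1 M sodium carbonate

reaction_ul = ENZYME_UL + SUBSTRATE_UL

enzyme_mg_ml = final_concentration(0.025, ENZYME_UL, reaction_ul)
substrate_mM = final_concentration(5.0, SUBSTRATE_UL, reaction_ul)

# Fold dilution of the 9 mg/ml stock needed to reach 0.025 mg/ml.
stock_dilution = 9.0 / 0.025

# Adding the stop solution dilutes released pNP before it is read at 405 nm.
stop_dilution = (reaction_ul + STOP_UL) / reaction_ul

print(enzyme_mg_ml, substrate_mM, stock_dilution, stop_dilution)
```

The resulting 0.0025 mg/ml enzyme concentration matches the final reaction concentration quoted for the mutant assays, and the threefold dilution by the stop solution is worth keeping in mind if absorbance readings are converted back to concentrations in the reaction.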

Thermostability and effect of additives

To test the thermostability of AtAbf43C, protein was first diluted as described previously in 100 mM sodium acetate buffer (pH 5.5) before being incubated in a thermocycler at temperatures of 55–70 °C. Aliquots of protein were then removed at 30 minute, 1 hour, 3 hour, and 6 hour timepoints. Activity was then tested as described above at the optimal observed temperature of 65 °C using 5 mM pNPAra dissolved in 100 mM sodium acetate buffer (pH 5.5). Residual activity was calculated as a percentage relative to that of unincubated AtAbf43C. To test the effects of various salts and chelating agents, AtAbf43C was first diluted in 100 mM sodium acetate buffer (pH 5.5) containing the additive at 10 mM and preincubated at room temperature for 1 hour prior to adding substrate. Activity was then tested as described previously at optimal temperature and pH, except that the 5 mM pNPAra substrate solutions also contained 10 mM of the specific additive. Additives were as follows: NaCl, CaCl2, KCl, MgSO4·7H2O, MnCl2·4H2O, ZnSO4·7H2O, CuCl2·2H2O, FeCl3·6H2O, CoCl2, NiCl2·6H2O, and EDTA tetrasodium dihydrate. Activity was calculated as a percentage relative to AtAbf43C incubated and tested with no additive. Additionally, the absorbance at 405 nm measured from blank solutions containing additives was subtracted from that observed in the corresponding enzymatic reactions to control for variation in absorbance due to the specific metal salt or chelating agent. All reaction conditions were performed in triplicate. Statistical significance of additive effects was determined by running a Brown-Forsythe and Welch ANOVA test on the collected data in GraphPad Prism version 10.0 (GraphPad Software, Boston, Massachusetts, U.S.A.).
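The blank subtraction and normalization described above can be sketched as follows. This is an illustrative reading of the procedure rather than the analysis actually used; the absorbance values are hypothetical, and the statistical test (run in GraphPad Prism) is not reproduced.

```python
# Sketch only: blank-corrected activity relative to a no-additive control.
from statistics import mean, stdev


def relative_activity(sample_a405, sample_blank_a405,
                      control_a405, control_blank_a405):
    """Return (mean %, SD %) of triplicate activity relative to the control.

    Each condition's blank (buffer, substrate, and additive without enzyme)
    is subtracted before normalizing to the mean corrected control signal.
    """
    corrected = [a - sample_blank_a405 for a in sample_a405]
    control_mean = mean(a - control_blank_a405 for a in control_a405)
    percents = [100.0 * c / control_mean for c in corrected]
    return mean(percents), stdev(percents)


# Hypothetical triplicate A405 readings, for illustration only.
no_additive = [1.04, 1.05, 1.06]
with_additive = [0.54, 0.55, 0.56]

avg, sd = relative_activity(with_additive, 0.05, no_additive, 0.05)
print(f"{avg:.1f} % ± {sd:.1f} %")
```

The same function covers the thermostability experiment if the unincubated enzyme is passed as the control and each timepoint as the sample.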

Substrate specificity

The activity of AtAbf43C, AtAbf43C_GH43, and AtAbf43C_CBM42 against pNP-glycosides (pNPAra, pNPXy, pNPαGal, and pNPβGal) was tested at optimal pH and temperature as described above for pNPAra. Released 4-nitrophenol (pNP) was quantified via Beer’s law with a 405 nm extinction coefficient of 18,500 L/(mol*cm) and a path length calculated empirically as per the manufacturer’s recommendation [40,41]. Specific activity was then calculated as U/mg of protein, where U is defined as µM/s of released pNP. Activity against the natural substrates WAX, BX, and SBA was then tested by incubating the proteins as described above at optimal conditions, except that substrate solutions contained natural substrate dissolved at 1% (w/v) in buffer and the incubation time was lengthened to 1 hour. Activity was detected as described previously by Conway et al. using the dinitrosalicylic acid (DNS) reducing sugar assay, with L-arabinose used as a standard for oligosaccharide release [42,43].
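The Beer's law step can be written out as a brief sketch. The extinction coefficient is the value given above; the path length of 0.3 cm, the absorbance reading, and the choice to correct for the threefold dilution by the stop solution are assumptions made for illustration, since the empirically determined path length and the exact normalization are not stated here.

```python
# Sketch only: converting an A405 reading into a pNP release rate.
EPSILON_PNP = 18_500.0   # L/(mol*cm) at 405 nm, as given in the text


def pnp_micromolar_in_well(a405: float, path_cm: float) -> float:
    """Beer's law, c = A / (epsilon * l), returned in µM."""
    return a405 / (EPSILON_PNP * path_cm) * 1e6


def release_rate_uM_per_s(a405: float, path_cm: float,
                          dilution_factor: float = 3.0,
                          reaction_s: float = 600.0) -> float:
    """pNP release rate in the reaction, in µM/s.

    dilution_factor accounts for 50 µl of reaction being mixed with 100 µl
    of stop solution before reading (an assumption about the correction).
    reaction_s defaults to the 10 minute incubation.
    """
    in_reaction = pnp_micromolar_in_well(a405, path_cm) * dilution_factor
    return in_reaction / reaction_s


# Hypothetical blank-corrected reading and path length.
rate = release_rate_uM_per_s(a405=0.555, path_cm=0.3)
print(round(rate, 3))
```

Dividing such a rate by the amount of protein in the reaction would then give a specific activity in the units defined above.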

Kinetic parameters were determined for AtAbf43C and AtAbf43C_GH43 by incubating the proteins as described above, except with varying concentrations of pNPAra (0.9–36 mM) and with reaction times shortened to 2 minutes to ensure linear initial rates of reaction. pNP released was quantified as described above, with velocities calculated as µM pNP/s. Kinetic parameters were then calculated via nonlinear regression using the predicted molar mass of each protein and the ‘determine kcat’ model in GraphPad Prism version 10.0 (GraphPad Software, Boston, Massachusetts, U.S.A.).
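For readers without access to Prism, the idea behind the fit can be sketched in plain Python. This is not the Prism routine: it fits the Michaelis-Menten equation by least squares using a golden-section search over Km, with Vmax solved in closed form at each trial Km, and then converts Vmax to kcat using the enzyme concentration. The synthetic data, the molar mass of 60,000 g/mol, and the enzyme concentration of 0.0025 mg/ml are hypothetical values chosen for illustration.

```python
# Sketch only: least-squares Michaelis-Menten fit and kcat calculation.
import math


def mm_rate(s, vmax, km):
    """Michaelis-Menten velocity for substrate concentration s."""
    return vmax * s / (km + s)


def _best_vmax(km, s_vals, v_vals):
    # For a fixed Km the model is linear in Vmax, so the optimum is closed form.
    f = [s / (km + s) for s in s_vals]
    return sum(v * fi for v, fi in zip(v_vals, f)) / sum(fi * fi for fi in f)


def _sse(km, s_vals, v_vals):
    vmax = _best_vmax(km, s_vals, v_vals)
    return sum((v - mm_rate(s, vmax, km)) ** 2 for s, v in zip(s_vals, v_vals))


def fit_michaelis_menten(s_vals, v_vals, log_km_lo=-3.0, log_km_hi=3.0, tol=1e-10):
    """Golden-section search over log10(Km); returns (Vmax, Km)."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = log_km_lo, log_km_hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc = _sse(10 ** c, s_vals, v_vals)
    fd = _sse(10 ** d, s_vals, v_vals)
    while b - a > tol:
        if fc < fd:          # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = _sse(10 ** c, s_vals, v_vals)
        else:                # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = _sse(10 ** d, s_vals, v_vals)
    km = 10 ** ((a + b) / 2)
    return _best_vmax(km, s_vals, v_vals), km


def enzyme_micromolar(mg_per_ml, molar_mass_g_per_mol):
    """Convert mg/ml (equal to g/L) into µM using the molar mass."""
    return mg_per_ml / molar_mass_g_per_mol * 1e6


# Synthetic, noise-free data (mM substrate, µM/s velocity), illustration only.
substrate_mM = [0.9, 1.8, 4.5, 9.0, 18.0, 36.0]
velocity = [mm_rate(s, vmax=2.0, km=4.0) for s in substrate_mM]

vmax_fit, km_fit = fit_michaelis_menten(substrate_mM, velocity)
kcat = vmax_fit / enzyme_micromolar(0.0025, 60_000.0)   # s^-1
print(round(vmax_fit, 4), round(km_fit, 4), round(kcat, 2))
```

Fitting the untransformed velocities, rather than a linearized plot, keeps the error weighting comparable to a standard nonlinear regression; real data would also call for confidence intervals, which this sketch does not compute.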

Testing of AtAbf43C mutants

To test the importance of certain residues to catalytic activity, purified AtAbf43C with residues D168, D283, E344, and H408 individually mutated to alanine was tested alongside wildtype AtAbf43C on pNPAra. Proteins were prepared and incubated in pH 5.5 sodium acetate with 5 mM pNPAra substrate as described above for 10 minutes at 65 °C at a final reaction concentration of 0.0025 mg/ml protein. Activity was calculated as a percentage relative to that of wildtype AtAbf43C. Statistical significance was determined by running a Brown-Forsythe and Welch ANOVA test on the collected data in GraphPad Prism version 10.0 (GraphPad Software, Boston, Massachusetts U.S.A.).

LC-MS analysis of hydrolyzed products

To further investigate the activity of AtAbf43C, reactions were initiated by adding 90 µl of substrate solution dissolved in pH 5.5 100 mM sodium acetate buffer to 10 µl of enzyme diluted in the same buffer as described above at 0.025 mg/ml or 10 µl of blank buffer. Samples were incubated at 65 °C for 1 hour in a thermocycler, after which the reactions were stopped by heating at 95 °C for 5 minutes to inactivate the enzyme. Natural substrates WAX, BX, and SBA were dissolved at a concentration of 1% (w/v), while oligosaccharides and sugars were dissolved at 0.1% w/v. LC-MS analysis was performed using an Agilent 6530 QTOF connected to an Agilent 1260 LC system. Mass spectra were acquired using electrospray ionization (ESI) with the instrument in positive ion mode. Reaction samples were run on an Agilent HI-PLEX Na (Octo) 300 × 7.7 mm column heated to 80 °C. The mobile phase was ultrapure water, and sample runs were 30 minutes long with a flow rate of 0.5 ml/min. Data were analyzed using Agilent MassHunter software; mass spectra and extracted ion chromatograms (EICs) for species of interest ([M + Na]+ adducts) were then obtained.
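The [M + Na]+ values used to extract ion chromatograms can be estimated from monoisotopic masses, as in the sketch below. It assumes linear oligomers of pentose units (arabinose and xylose share the formula C5H10O5) joined with loss of one water per glycosidic bond; the function names are introduced here for illustration, and the actual extraction windows used in MassHunter are not specified in the text.

```python
# Sketch only: expected [M + Na]+ m/z for pentose oligosaccharides.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}
SODIUM = 22.98976928
ELECTRON = 0.00054858

# Pentose monomer (arabinose or xylose), C5H10O5, and water, H2O.
PENTOSE = (5 * MONOISOTOPIC["C"] + 10 * MONOISOTOPIC["H"]
           + 5 * MONOISOTOPIC["O"])
WATER = 2 * MONOISOTOPIC["H"] + MONOISOTOPIC["O"]


def pentose_oligomer_mass(n_units: int) -> float:
    """Neutral monoisotopic mass of a linear oligomer of n pentose units."""
    if n_units < 1:
        raise ValueError("n_units must be at least 1")
    return n_units * PENTOSE - (n_units - 1) * WATER


def sodium_adduct_mz(neutral_mass: float) -> float:
    """m/z of the singly charged sodium adduct, [M + Na]+."""
    return neutral_mass + SODIUM - ELECTRON


for n, name in enumerate(["arabinose", "arabinobiose", "arabinotriose"], start=1):
    print(name, round(sodium_adduct_mz(pentose_oligomer_mass(n)), 4))
```

Because arabinose and xylose are isomers, mass alone cannot distinguish them; the chromatographic separation provides that information.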

Crystallization, data collection, and structure refinement

Purified AtAbf43C, AtAbf43C_CBM42, and AtAbf43C_GH43 proteins at concentrations of 9.0 mg/ml, 4.4 mg/ml, and 8.5 mg/ml, respectively, were crystallized over the course of several days via the sitting drop vapor diffusion method. AtAbf43C was crystallized in the form of thin needle-like blades in space group P21 with two molecules in the asymmetric unit from a solution of 15–20% v/v propanol, 25% w/v PEG3350, 50 mM (NH4)2SO4, and 0.1 M HEPES pH 7.7. AtAbf43C crystals soaked in L-arabinose, crystallized in similar conditions, were in space group C2 with one molecule in the asymmetric unit. AtAbf43C_CBM42 was crystallized in space group P21212 with one molecule in the asymmetric unit in the form of small flat plates from a solution of 25% w/v PEG3350 and 0.1 M citric acid pH 3.0. AtAbf43C_GH43 was crystallized in space group P65 with two molecules in the asymmetric unit in the form of bi-pyramidal crystals in 25% w/v polyethylene glycol monomethyl ether (pegM) 5000, 0.25 M (NH4)2SO4, and 0.1 M Bis-Tris pH 5.5. Crystals were mounted in nylon loops (Hampton Research) and flash-cooled in liquid nitrogen after brief (<30 seconds) equilibration in cryoprotectant solutions. Cryoprotectant solutions corresponded to crystallization solutions supplemented with 27–30% v/v glycerol. For the arabinose soak of AtAbf43C crystals, the cryoprotectant solution was also supplemented with 5% w/v arabinose, and the crystal equilibration time in this solution was extended to 3 minutes before flash-cooling.

Data were collected on beam lines 17-ID-1 (AMX) and 17-ID-2 (FMX) at Brookhaven National Lab, Upton, New York, U.S.A. Data were processed using XDS and scaled using AIMLESS [44,45]. Data collection and processing statistics are shown in Table 2. Structures of AtAbf43C, AtAbf43C_CBM42, and AtAbf43C_GH43 proteins were determined by molecular replacement using the program PHASER starting from the AlphaFold model of AtAbf43C and partially refined versions thereof [46]. AtAbf43C_GH43 was first solved using an edited AlphaFold model, and the partially refined structure was used as a model for AtAbf43C which was then completed using the AlphaFold model for AtAbf43C_CBM42. Partial refinement of this full-length AtAbf43C model at 1.3 Å resolution was then used as the starting point for the solution of AtAbf43C_CBM42. Initial models were iteratively rebuilt and refined using COOT and PHENIX.REFINE, respectively [46,47]. An apparent metal binding site was added based on local site geometry and surrounding ligands and assigned as Mg2+. Arabinose binding to AtAbf43C was assessed by model-phased difference (||Fobs|-|Fcalc||) maps revealing two arabinose molecules in the furanose conformation bound to AtAbf43C. We were not successful in obtaining arabinose bound to AtAbf43C_GH43 in soaking experiments. Persistent difference density at the expected active site of AtAbf43C_GH43, too large to be the cryoprotectant used (glycerol), was modeled using a generic carbohydrate ring and formally designated as an unknown ligand (residue name UNL). No oligomerization of the AtAbf43C, AtAbf43C_CBM42, and AtAbf43C_GH43 proteins was observed within the crystal lattices of the four crystal forms. Final refinement statistics are shown in Table 2. The four structures have been deposited in the Protein Data Bank (AtAbf43C with id 9NXG, AtAbf43C:Arabinose with id 9NXH, AtAbf43C_CBM42 with id 9NXI, and AtAbf43C_GH43 with id 9NXJ) [48-52].
Visualizations were prepared and structural alignments were performed using the PyMOL Molecular Graphics System, Version 2.5.8 Schrödinger, LLC.

Supplementary Material

Online supplementary figure 1
Online supplementary table 1

Acknowledgments

This research used resources of the National Synchrotron Light Source II, a U.S. Department of Energy (DOE) Office of Science User Facility operated for the DOE Office of Science by Brookhaven National Laboratory under Contract No. DE-SC0012704. The Center for BioMolecular Structure (CBMS) is primarily supported by the National Institutes of Health, National Institute of General Medical Sciences (NIGMS) through a Center Core P30 Grant (P30GM133893), and by the DOE Office of Biological and Environmental Research (KP1605010).

Abbreviations

A2: arabinobiose
A3: arabinotriose
A4: arabinotetraose
A5: arabinopentose
AOS: arabino-oligosaccharides
AXOS: arabino-xylo-oligosaccharides
A2XXX: 2³-α-L-arabinofuranosyl-xylotriose
BX: beechwood xylan
CAZymes: carbohydrate active enzymes
CBM: carbohydrate binding module
DNS: dinitrosalicylic acid
GH: glycoside hydrolase
SBA: sugar beet arabinan
WAX: wheat arabinoxylan
XA3XXX: 3³-α-L-arabinofuranosyl-xylotetraose
pNP: 4-nitrophenol
pNPAra: para-nitrophenol-α-L-arabinofuranoside
pNPXy: para-nitrophenol-β-D-xylopyranoside
pNPαGal: para-nitrophenol-α-D-galactopyranoside
pNPβGal: para-nitrophenol-β-D-galactopyranoside

Contributor Information

Joey L. Galindo, Email: jg5406@princeton.edu.

Philip D. Jeffrey, Email: pjeffrey@princeton.edu.

Angela Zhu, Email: az1864@princeton.edu.

A. James Link, Email: ajlink@princeton.edu.

Jonathan M. Conway, Email: jmconway@princeton.edu.

Data Availability

Protein crystal structures obtained in our study are deposited in the Protein Data Bank with accession numbers 9NXG [49], 9NXH [50], 9NXI [51], and 9NXJ [52]. All other data are contained within the manuscript and supplementary information.

Competing Interests

The authors declare that they have no conflicts of interest with the contents of this article.

Funding

This work was supported by the Energy Research Fund administered by the Andlinger Center for Energy and the Environment at Princeton University and startup funds from the Department of Chemical and Biological Engineering at Princeton University to J.M.C. Mass spectrometry data were collected on an instrument purchased with a supplement to NIH grant [GM107036] to A.J.L. A.Z. was supported by an NSF Graduate Research Fellowship Program under grant [DGE-2039656].

Open Access

This article has been published open access under our Subscribe to Open programme, made possible through the support of our subscribing institutions, learn more here: https://portlandpress.com/pages/open_access_options_and_prices#conditional

CRediT Author Contribution

Joey L. Galindo - Conceptualization, Methodology, Formal analysis, Investigation, Visualization, Writing - Original Draft, Writing - Review & Editing; Philip D. Jeffrey - Formal analysis, Investigation, Data Curation, Visualization, Writing - Review & Editing; Angela Zhu - Formal analysis, Investigation, Visualization, Writing - Review & Editing; A. James Link - Conceptualization, Visualization, Resources, Supervision, Writing - Review & Editing; Jonathan M. Conway - Conceptualization, Resources, Supervision, Visualization, Writing - Original Draft, Writing - Review & Editing, Project administration, Funding acquisition

Ethics Approval

Ethical approval was not required for this study.

References

  • 1. Chen H. (2014) Chemical Composition and Structure of Natural Lignocellulose. In Biotechnology of Lignocellulose: Theory and Practice (Chen H., ed), pp. 25–71. doi: 10.1007/978-94-007-6898-7_2.
  • 2. Bing R.G., Sulis D.B., Wang J.P., Adams M.W.W., Kelly R.M. Thermophilic microbial deconstruction and conversion of natural and transgenic lignocellulose. Environ. Microbiol. Rep. 2021;13:272–293. doi: 10.1111/1758-2229.12943.
  • 3. Poria V., Saini J.K., Singh S., Nain L., Kuhad R.C. Arabinofuranosidases: Characteristics, microbial production, and potential in waste valorization and industrial applications. Bioresour. Technol. 2020;304:123019. doi: 10.1016/j.biortech.2020.123019.
  • 4. Hamann P.R.V., Noronha E.F. Xylan-breakdown apparatus of Clostridium thermocellum. Cellulose. 2022;29:7535–7553. doi: 10.1007/s10570-022-04741-0.
  • 5. Numan M.T., Bhosle N.B. α-L-arabinofuranosidases: the potential applications in biotechnology. J. Ind. Microbiol. Biotechnol. 2006;33:247–260. doi: 10.1007/s10295-005-0072-1.
  • 6. Gilbert H.J. The biochemistry and structural biology of plant cell wall deconstruction. Plant Physiol. 2010;153:444–455. doi: 10.1104/pp.110.156646.
  • 7. Drula E., Garron M.-L., Dogan S., Lombard V., Henrissat B., Terrapon N. The carbohydrate-active enzyme database: functions and literature. Nucleic Acids Res. 2022;50:D571–D577. doi: 10.1093/nar/gkab1045.
  • 8. Mewis K., Lenfant N., Lombard V., Henrissat B. Dividing the large glycoside hydrolase family 43 into subfamilies: a motivation for detailed enzyme characterization. Appl. Environ. Microbiol. 2016;82:1686–1692. doi: 10.1128/AEM.03453-15.
  • 9. Vuong T.V., Wilson D.B. Glycoside hydrolases: catalytic base/nucleophile diversity. Biotechnol. Bioeng. 2010;107:195–205. doi: 10.1002/bit.22838.
  • 10. Sturgeon R.J. (1997) Advances in Macromolecular Carbohydrate Research, Greenwich, Connecticut, JAI Press.
  • 11. Vandermarliere E., Bourgois T.M., Winn M.D., Van Campenhout S., Volckaert G., Delcour J.A., et al. Structural analysis of a glycoside hydrolase family 43 arabinoxylan arabinofuranohydrolase in complex with xylotetraose reveals a different binding mechanism compared with other members of the same family. Biochem. J. 2009;418:39–47. doi: 10.1042/BJ20081256.
  • 12. De Camargo B.R., Claassens N.J., Quirino B.F., Noronha E.F., Kengen S.W.M. Heterologous expression and characterization of a putative glycoside hydrolase family 43 arabinofuranosidase from Clostridium thermocellum B8. Enzyme Microb. Technol. 2018;109:74–83. doi: 10.1016/j.enzmictec.2017.09.014.
  • 13. The CAZypedia Consortium. Ten years of CAZypedia: a living encyclopedia of carbohydrate-active enzymes. Glycobiology. 2018;28:3–8. doi: 10.1093/glycob/cwx089.
  • 14. Ribeiro T., Santos-Silva T., Alves V.D., Dias F.M.V., Luís A.S., Prates J.A.M., et al. Family 42 carbohydrate-binding modules display multiple arabinoxylan-binding interfaces presenting different ligand affinities. Biochimica et Biophysica Acta (BBA) - Proteins and Proteomics. 2010;1804:2054–2062. doi: 10.1016/j.bbapap.2010.07.006.
  • 15. Blumer-Schuette S.E., Brown S.D., Sander K.B., Bayer E.A., Kataeva I., Zurawski J.V., et al. Thermophilic lignocellulose deconstruction. FEMS Microbiol. Rev. 2014;38:393–448. doi: 10.1111/1574-6976.12044.
  • 16. Taylor E.J., Smith N.L., Turkenburg J.P., D’Souza S., Gilbert H.J., Davies G.J. Structural insight into the ligand specificity of a thermostable family 51 arabinofuranosidase, Araf51, from Clostridium thermocellum. Biochem. J. 2814490–4497. doi: 10.1042/BJ20051780.
  • 17. Ahmed S., Luis A.S., Bras J.L.A., Ghosh A., Gautam S., Gupta M.N., et al. A Novel α-L-Arabinofuranosidase of Family 43 Glycoside Hydrolase (Ct43Araf) from Clostridium thermocellum. PLOS ONE. 2013;8:e73575. doi: 10.1371/journal.pone.0073575.
  • 18. Ichinose H., Kuno A., Kotake T., Yoshida M., Sakka K., Hirabayashi J., et al. Characterization of an Exo-β-1,3-Galactanase from Clostridium thermocellum. Appl. Environ. Microbiol. 2006;72:3515–3523. doi: 10.1128/AEM.72.5.3515-3523.2006.
  • 19. Liu Y., Angelov A., Feiler W., Baudrexl M., Zverlov V., Liebl W., et al. Arabinan saccharification by biogas reactor metagenome-derived arabinosyl hydrolases. Biotechnol. Biofuels Bioprod. 2022;15:121. doi: 10.1186/s13068-022-02216-9.
  • 20. Matsuo N., Kaneko S., Kuno A., Kobayashi H., Kusakabe I. Purification, characterization and gene cloning of two alpha-L-arabinofuranosidases from Streptomyces chartreusis GS901. Biochem. J. 2000;346 Pt 1:9–15. doi: 10.1042/bj3460009.
  • 21. Michlmayr H., Hell J., Lorenz C., Böhmdorfer S., Rosenau T., Kneifel W. Arabinoxylan oligosaccharide hydrolysis by family 43 and 51 glycosidases from Lactobacillus brevis DSM 20054. Appl. Environ. Microbiol. 2013;79:6747–6754. doi: 10.1128/AEM.02130-13.
  • 22. Cartmell A., McKee L.S., Peña M.J., Larsbrink J., Brumer H., Kaneko S., et al. The structure and function of an arabinan-specific alpha-1,2-arabinofuranosidase identified from screening the activities of bacterial GH43 glycoside hydrolases. J. Biol. Chem. 2011;286:15483–15495. doi: 10.1074/jbc.M110.215962.
  • 23. Kang Y., Choi C.-Y., Kang J., Ju Y.-R., Kim H.B., Han N.S., et al. Functional Characterization of Endo- and Exo-Hydrolase Genes in Arabinan Degradation Gene Cluster of Bifidobacterium longum subsp. suis. Int. J. Mol. Sci. 2024;25:3175. doi: 10.3390/ijms25063175.
  • 24. Fujimoto Z., Ichinose H., Maehara T., Honda M., Kitaoka M., Kaneko S. Crystal Structure of an Exo-1,5-α-l-arabinofuranosidase from Streptomyces avermitilis Provides Insights into the Mechanism of Substrate Discrimination between Exo- and Endo-type Enzymes in Glycoside Hydrolase Family 43. J. Biol. Chem. 2010;285:34134–34143. doi: 10.1074/jbc.M110.164251.
  • 25. Linares-Pastén J.A., Falck P., Albasri K., Kjellström S., Adlercreutz P., Logan D.T., et al. Three-dimensional structures and functional studies of two GH43 arabinofuranosidases from Weissella sp. strain 142 and Lactobacillus brevis. FEBS J. 2017;284:2019–2036. doi: 10.1111/febs.14101.
  • 26. Falck P., Linares-Pastén J.A., Adlercreutz P., Karlsson E.N. Characterization of a family 43 β-xylosidase from the xylooligosaccharide utilizing putative probiotic Weissella sp. strain 92. Glycobiology. 2016;26:193–202. doi: 10.1093/glycob/cwv092.
  • 27. Till M., Goldstone D., Card G., Attwood G.T., Moon C.D., Arcus V.L. Structural analysis of the GH43 enzyme Xsa43E from Butyrivibrio proteoclasticus. Acta Crystallogr. F. Struct. Biol. Commun. 2014;70:1193–1198. doi: 10.1107/S2053230X14014745.
  • 28. Jiang D., Fan J., Wang X., Zhao Y., Huang B., Liu J., et al. Crystal structure of 1,3Gal43A, an exo-β-1,3-galactanase from Clostridium thermocellum. J. Struct. Biol. 2012;180:447–457. doi: 10.1016/j.jsb.2012.08.005.
  • 29. Goyal A., Ahmed S., Sharma K., Gupta V., Bule P., Alves V.D., et al. Molecular determinants of substrate specificity revealed by the structure of Clostridium thermocellum arabinofuranosidase 43A from glycosyl hydrolase family 43 subfamily 16. Acta Crystallogr. D. Struct. Biol. 2016;72:1281–1289. doi: 10.1107/S205979831601737X.
  • 30. Ichinose H., Yoshida M., Fujimoto Z., Kaneko S. Characterization of a modular enzyme of exo-1,5-alpha-L-arabinofuranosidase and arabinan binding module from Streptomyces avermitilis NBRC14893. Appl. Microbiol. Biotechnol. 2008;80:399–408. doi: 10.1007/s00253-008-1551-x.
  • 31. Sayers E.W., Beck J., Bolton E.E., Brister J.R., Chan J., Comeau D.C., et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2024;52:D33–D43. doi: 10.1093/nar/gkad1044.
  • 32. Teufel F., Almagro Armenteros J.J., Johansen A.R., Gíslason M.H., Pihl S.I., Tsirigos K.D., et al. SignalP 6.0 predicts all five types of signal peptides using protein language models. Nat. Biotechnol. 2022;40:1023–1025. doi: 10.1038/s41587-021-01156-3.
  • 33. Letunic I., Bork P. Interactive Tree of Life (iTOL) v6: recent updates to the phylogenetic tree display and annotation tool. Nucleic Acids Res. 2024;52:W78–W82. doi: 10.1093/nar/gkae268.
  • 34. Trifinopoulos J., Nguyen L.T., Von Haeseler A., Minh B.Q. W-IQ-TREE: a fast online phylogenetic tool for maximum likelihood analysis. Nucleic Acids Res. 2016;44:W232–5. doi: 10.1093/nar/gkw256.
  • 35. Sievers F., Higgins D.G. Clustal Omega for making accurate alignments of many protein sequences. Protein Sci. 2018;27:135–145. doi: 10.1002/pro.3290.
  • 36. Studier F.W. Protein production by auto-induction in high density shaking cultures. Protein Expr. Purif. 2005;41:207–234. doi: 10.1016/j.pep.2005.01.016.
  • 37. Noble J.E., Bailey M.J.A. (2009) Chapter 8 Quantitation of Protein. In Methods in Enzymology (Burgess R.R., Deutscher M.P., eds), pp. 73–95, Academic Press. doi: 10.1016/S0076-6879(09)63008-1.
  • 38. NovoPro Labs (2025) Protein Extinction Coefficient and Concentration Calculation. https://www.novoprolabs.com/tools/protein-extinction-coefficient-calculation
  • 39. Gill S.C., von Hippel P.H. Calculation of protein extinction coefficients from amino acid sequence data. Anal. Biochem. 1989;182:319–326. doi: 10.1016/0003-2697(89)90602-7.
  • 40. Bowers G.N., McComb R.B., Christensen R.G., Schaffer R. High-purity 4-nitrophenol: purification, characterization, and specifications for use as a spectrophotometric reference material. Clin. Chem. 1980;26:724–729. doi: 10.1093/clinchem/26.6.724.
  • 41. Agilent (2012) Multi-Volume Based Protein Quantification Methods. https://www.agilent.com/cs/library/applications/multivolume-based-protein-quantification-methods-5994-3315EN-agilent.pdf
  • 42. Conway J.M., Pierce W.S., Le J.H., Harper G.W., Wright J.H., Tucker A.L., et al. Multidomain, surface layer-associated glycoside hydrolases contribute to plant polysaccharide degradation by Caldicellulosiruptor species. J. Biol. Chem. 2016;291:6732–6747. doi: 10.1074/jbc.M115.707810.
  • 43. Miller G.L. Use of dinitrosalicylic acid reagent for determination of reducing sugar. Anal. Chem. 1959;31:426–428. doi: 10.1021/ac60147a030.
  • 44. Kabsch W. XDS. Acta Crystallogr. D Biol. Crystallogr. 2010;66:125–132. doi: 10.1107/S0907444909047337.
  • 45. Evans P.R., Murshudov G.N. How good are my data and what is the resolution? Acta Crystallogr. D Biol. Crystallogr. 2013;69:1204–1214. doi: 10.1107/S0907444913000061.
  • 46. Emsley P., Lohkamp B., Scott W.G., Cowtan K. Features and development of Coot. Acta Crystallogr. D Biol. Crystallogr. 2010;66:486–501. doi: 10.1107/S0907444910007493.
  • 47. Afonine P.V., Grosse-Kunstleve R.W., Echols N., Headd J.J., Moriarty N.W., Mustyakimov M., et al. Towards automated crystallographic structure refinement with phenix.refine. Acta Crystallogr. D Biol. Crystallogr. 2012;68:352–367. doi: 10.1107/S0907444912001308.
  • 48. Berman H.M., Westbrook J., Feng Z., Gilliland G., Bhat T.N., Weissig H., et al. The Protein Data Bank. Nucleic Acids Res. 2000;28:235–242. doi: 10.1093/nar/28.1.235.
  • 49. Galindo J.L., Jeffrey P.D., Conway J.M. (2025) Protein Data Bank (PDB). An alpha-l-arabinofuranosidase (AtAbf43C) from Acetivibrio thermocellus DSM1313. doi: 10.2210/pdb9NXG/pdb.
  • 50. Galindo J.L., Jeffrey P.D., Conway J.M. (2025) Protein Data Bank (PDB). An alpha-l-arabinofuranosidase (AtAbf43C) from Acetivibrio thermocellus DSM1313 bound to arabinofuranose. doi: 10.2210/pdb9NXH/pdb.
  • 51. Galindo J.L., Jeffrey P.D., Conway J.M. (2025) Protein Data Bank (PDB). CBM42 domain of alpha-l-arabinofuranosidase (AtAbf43C) from Acetivibrio thermocellus DSM1313. doi: 10.2210/pdb9NXI/pdb.
  • 52. Galindo J.L., Jeffrey P.D., Conway J.M. (2025) The GH43 domain of an alpha-l-arabinofuranosidase (AtAbf43C_GH43) from Acetivibrio thermocellus DSM1313. doi: 10.2210/pdb9NXJ/pdb.

Articles from Biochemical Journal are provided here courtesy of Portland Press Ltd
