Abstract
Resistant starch is a prebiotic accessed by gut bacteria with specialized amylases and starch-binding proteins. The human gut symbiont Ruminococcus bromii expresses Sas6 (Starch Adherence System member 6), which consists of two starch-specific carbohydrate-binding modules from family 26 (RbCBM26) and family 74 (RbCBM74). Here, we present the crystal structures of Sas6 and of RbCBM74 bound with a double helical dimer of maltodecaose. The RbCBM74 starch-binding groove complements the double helical α-glucan geometry of amylopectin, suggesting that this module selects this feature in starch granules. Isothermal titration calorimetry and native mass spectrometry demonstrate that RbCBM74 recognizes longer single and double helical α-glucans, while RbCBM26 binds short maltooligosaccharides. Bioinformatic analysis supports the conservation of the amylopectin-targeting platform in CBM74s from resistant-starch degrading bacteria. Our results suggest that RbCBM74 and RbCBM26 within Sas6 recognize discrete aspects of the starch granule, providing molecular insight into how this structure is accommodated by gut bacteria.
The gut microbiota, the consortium of microbes that resides in the human gastrointestinal tract, influences many aspects of host physiology, including digestive health1. The composition of this community is modulated by diet, especially fiber2–4. Bacterial fermentation of dietary carbohydrates produces beneficial short-chain fatty acids including butyrate, a primary carbon source for colonocytes that has systemic anti-inflammatory and anti-tumorigenic properties3,5.
Resistant starch, defined as starch that is resistant to digestion in the upper gastrointestinal tract, is a prebiotic fiber that tends to increase butyrate in the large intestine6. Starch is composed of the glucose polymers amylopectin and amylose, which are layered within granules7,8. Amylose is a predominantly α1,4-linked glucan with infrequent α1,6 branching, forming the amorphous layers of the granule7. Amylopectin is an α1,4-linked polymer with α1,6-linked branches that allow the formation of parallel α1,4-glucan chains that form double helices that pack together in the crystalline region of the granule7,8. Raw, uncooked starch granules are resistant to host digestion owing to their semi-crystalline structure and are classified as resistant starch type 2 (RS2) (ref. 7).
Human gut bacteria that degrade RS2 in vitro include Bifidobacterium adolescentis and Ruminococcus bromii9–13. R. bromii is a Gram-positive anaerobe that increases in relative abundance in the gut upon host consumption of resistant potato or corn starch9,14,15. R. bromii is a keystone species for RS2 degradation because it cross-feeds butyrate-producing bacteria9. R. bromii synthesizes multi-protein starch-degrading complexes called amylosomes through protein–protein interactions between dockerin and complementary cohesin domains16–18. As many as 32 R. bromii proteins have predicted cohesin or dockerin domains, including amylases, pullulanases, starch-binding proteins and proteins of unknown function16,19. Many have carbohydrate-binding modules (CBMs) that presumably aid in binding starch and tether the bacterium to its food source20.
CBM family 74 (CBM74) was discovered as a discrete domain (MaCBM74) of an amylase from the potato starch-degrading bacterium Microbacterium aurum21. MaCBM74 binds amylose, amylopectin, and wheat, corn and potato starch granules21. CBM74s have ~300 amino acids, two to three times as many as most starch-binding CBMs20. CBM74s are found in multimodular glycoside hydrolase family 13 (GH13) enzymes that hydrolyze starch and are flanked by starch-binding CBMs from family 25 or 26 (CBM25 or CBM26) (refs. 20,21). Most CBM74s are encoded by gut microbes and 70% are found in Bifidobacteria21. The genomes of R. bromii and B. adolescentis each encode one putative CBM74-containing protein. The prevalence of CBM74s encoded within the genomes of RS2-degrading bacteria, and their increased representation in metagenomic and metatranscriptomic analyses from host diet studies, suggest a role for this module in RS2 recognition in the distal gut22–24.
The R. bromii starch adherence system protein 6 (Sas6) is a secreted protein that contains both a CBM26 and CBM74 followed by a carboxy-terminal dockerin type 1 domain25–27. Here, we present the biochemical characterization and crystal structure of Sas6, providing a view of the CBM74 and its juxtaposition with the CBM26. We captured the structure of RbCBM74 with a double helical dimer of maltodecaose, which mimics the architecture of double helical amylopectin in starch granules, revealing that this module selects for this motif through an elongated binding groove. RbCBM74 binds longer maltooligosaccharides (≥8 glucose units), and native mass spectrometry suggests that both single and double helical α-glucans that adopt the geometry of double helical amylopectin are recognized. Our biochemical data demonstrate that CBM26 and CBM74 recognize different α-glucan moieties within starch granules, leading to overall enhanced granule binding.
Results
Sas6 cell localization
Sas6 consists of five discrete domains: an amino-terminal CBM26 (RbCBM26), a CBM74 (RbCBM74) flanked by bacterial immunoglobulin-like (BIg) domains, and a C-terminal type I dockerin (Fig. 1a)26. Although Sas6 has a signal peptide, it is unknown whether it is a constituent of a cell-bound amylosome or part of a freely secreted complex19. Sas6 is detected in the cell-free supernatant of R. bromii cultures in stationary phase but also elutes from the surface of exponentially growing cells with EDTA, which disrupts the calcium-dependent cohesin–dockerin interaction16,28,29. To determine the localization of Sas6, we grew cells to mid-log phase on potato amylopectin and performed a western blot with custom antibodies against Sas6 (Fig. 1b). Sas6 was detected only in the cell fraction (Fig. 1b) and was visualized on the cell surface by immunofluorescence (Fig. 1c). Therefore, we conclude that Sas6 is a component of a cell-surface amylosome in actively growing cells.
Fig. 1 |. Ruminococcus bromii Sas6 is a starch-binding protein that contains two CBMs.

a, Domain architecture of Sas6 annotated according to the carbohydrate-active enzyme database (www.cazy.org). SP, signal peptide; Doc, dockerin. Sas6T is the recombinantly expressed truncated version of Sas6 lacking the C-terminal dockerin. b, Western blot with anti-Sas6 antibody showing localization of Sas6 in the cell fraction (top panel). Parallel western blot with custom rabbit antiserum against glutamic acid decarboxylase (anti-GAD) to control for cell lysis (bottom panel). Lane 1, ladder; lane 2, R. bromii cell lysate; lane 3, cell-free culture supernatant; lane 4, trichloroacetic acid precipitated cell-free culture supernatant; lane 5, recombinant Sas6T. Blot is representative of two experiments. c, Anti-Sas6 immunofluorescent staining of fixed R. bromii cells grown in potato amylopectin. d, SDS–PAGE gel from Sas6 adsorption to potato, corn and wheat starch, and Avicel (cellulose) control. U, unbound protein; B, bound protein. Gel is representative of two experiments. e, Affinity PAGE with 0.1% of the indicated polysaccharide incorporated into the gel matrix. For each column, left lane is bovine serum albumin; right lane is Sas6T. NA, native gel; Amy, potato amylose; PAp, potato amylopectin; CAp, corn amylopectin; Gly, glycogen; Pul, pullulan; Dex, dextran. Gel is representative of three experiments.
Sas6 starch binding
We used a truncated construct of Sas6 (Sas6T, residues 31–665), lacking the C-terminal dockerin domain, to test binding to starch polysaccharides. Sas6T binds potato, corn and wheat starch granules, with the highest fraction of protein bound to corn starch, which has a larger surface area to mass ratio8, and no binding to Avicel (crystalline cellulose) (Fig. 1d). We tested Sas6T binding to amylopectin, amylose, glycogen and pullulan by using affinity PAGE. Glycogen is similar to amylopectin with more frequent α1,6 branching, approximately every 6–15 residues for liver glycogen30,31. Pullulan is a fungal α-glucan composed of repeating α1,6-linked maltotriose units32. Sas6T binds amylose, amylopectin (potato and corn) and glycogen but displays poor recognition of pullulan, suggesting a preference for longer α1,4-linked glucans (Fig. 1e). Sas6T does not bind dextran, a bacterial exopolysaccharide of α1,6-linked glucose33.
Structure of Sas6
The structure of Sas6T with α-cyclodextrin (ACX) was determined by single-wavelength anomalous dispersion of intrinsic sulfur-containing residues (1.6 Å resolution, Rwork = 16.8%, Rfree = 21.2%; Table 1). The final model contained two molecules of Sas6T in the asymmetric unit and one ACX bound at each RbCBM26. The Sas6T–ACX structure was used to phase a dataset from unliganded crystals (2.2 Å resolution, Rwork = 19.7%, Rfree = 25.5%; Table 1). The crystal structure of Sas6T is compact, with RbCBM26, BIgA and BIgB forming an arc over RbCBM74 (Fig. 2a). The two chains in the asymmetric unit exhibit some flexibility, resulting in different positioning between the RbCBM26 binding site and the RbCBM74 (Fig. 2b). Small-angle X-ray scattering (SAXS) indicated that Sas6T has little conformational flexibility in solution (Extended Data Fig. 1a–f). Modeling of the SAXS data by MultiFoXS suggests a slightly more extended conformation in solution, although the structure remains compact (Fig. 2c). This compactness is consistent with the extensive hydrogen bonding between BIgA (light gray) and BIgB (dark gray), which buries 354 Å² of surface area (Extended Data Fig. 1g,h)34. Ig-like domains act as spacers in multimodular enzymes and provide structural stability35. Here, the BIg domains may keep RbCBM26 and RbCBM74 properly oriented.
Table 1 |. Data collection and refinement statistics
| | Sas6T + ACX | Sas6T unliganded | BIg–RbCBM74–BIg + G10 |
|---|---|---|---|
| Data collection | | | |
| Space group | P 21 21 21 | P 21 21 21 | P 21 21 2 |
| Cell dimensions | | | |
| a, b, c (Å) | 69.45, 82.53, 213.47 | 69.17, 82.37, 213.32 | 69.69, 160.07, 67.86 |
| α, β, γ (°) | 90, 90, 90 | 90, 90, 90 | 90, 90, 90 |
| Resolution (Å) | 35.0–1.61 (1.67–1.61)a | 44.77–2.19 (2.268–2.19) | 62.48–1.70 (1.76–1.70) |
| Rsym or Rmerge | 0.092 (1.799) | 0.096 (0.192) | 0.052 (0.75) |
| I / σI | 16.0 (1.1) | 22.7 (14.3) | 17.2 (2.2) |
| Completeness (%) | 76.8 (24.8) | 99.3 (99.9) | 99.9 (100.0) |
| Redundancy | 13.8 (10.7) | 16.6 (16.9) | 6.1 (6.2) |
| Refinement | | | |
| Resolution (Å) | 1.61 | 2.19 | 1.70 |
| No. reflections | 122,315 | 63,244 | 84,061 |
| Rwork/Rfree | 0.168/0.212 | 0.197/0.255 | 0.179/0.199 |
| No. atoms | 11,425 | 10,523 | 4,954 |
| Protein | 9,721 | 9,527 | 4,043 |
| Ligand/ion | 253 | 38 | 237 |
| Water | 1,451 | 964 | 674 |
| B-factors | 22.2 | 25.6 | 33.2 |
| Protein | 20.7 | 25.4 | 31.8 |
| Ligand/ion | 35.8 | 29.9 | 31.5 |
| Water | 30.0 | 27.8 | 41.8 |
| R.m.s. deviations | | | |
| Bond lengths (Å) | 0.013 | 0.001 | 0.013 |
| Bond angles (°) | 1.43 | 0.44 | 1.7 |
a, Values in parentheses are for the highest-resolution shell.
Fig. 2 |. Sas6 is a compact protein with two BIg domains that orient RbCBM26 and RbCBM74.

a, Semi-transparent surface rendition and cartoon of Sas6T (PDB 7UWW) with RbCBM26 domain in green, BIgA in light gray, RbCBM74 in blue, and BIgB in dark gray. The ACX bound to RbCBM26 is shown in wheat sticks and Ca2+ atoms are shown as yellow spheres. b, Overlay of chain A (purple) and chain B (cyan) within the asymmetric unit of 7UWW, anchored on the CBM74 domain, showing variation in the position of ACX relative to RbCBM74. c, Overlay of chain A of 7UWW (purple) and SAXS-derived MultiFoXS model (yellow) anchored on the CBM74 domain; r.m.s.d. of 1.2 Å over 347 pruned atom pairs. d, Side view of RbCBM74 with the central β-sandwich sheets in orange and cyan. A third β-sheet is shown in magenta and the protruding pairs of β-strands are in dark blue. β-strands connecting the beginning and end of the RbCBM74 domain are colored green. Ca2+ atoms are shown as yellow spheres. e, ACX bound at RbCBM26 (green) in chain A (left) and chain B (right), demonstrating minor conformational flexibility that places S286 from RbCBM74 (blue) within the binding site. Side chains involved in ligand binding are shown as green sticks with a hydrogen bond cutoff of 3.2 Å. ACX is displayed as wheat sticks. Omit map is contoured to 2.0σ and carved within 1.6 Å of ACX ligand. f, RbCBM74 binding to granular corn and potato starch was determined by adsorption depletion. The μmoles of protein bound per gram of starch was plotted against [free protein] to determine Kd and Bmax using a one-site specific binding model in GraphPad Prism. Graphs show nonlinear fit of three experiments with points indicating the mean and standard deviation. g, Affinity PAGE of Sas6T or individual domains, RbCBM26 and BIg–RbCBM74–BIg, with 0.1% polysaccharide. Gel is representative of two experiments.
Structure of RbCBM74
RbCBM74 (357 residues) has 21 β-strands and 13 short α-helices with a core β-sandwich fold of two sheets with five antiparallel β-strands (Fig. 2d and Extended Data Fig. 2a). A third short β-sheet forms a convex face, and two pairs of β-strands (residues 356–369 and 412–423) protrude from the region between the β-sandwich and the third β-sheet. Two short β-strands begin and end CBM74, marking the domain boundaries (Extended Data Fig. 2b).
The central fold of RbCBM74 resembles CBM9 from Thermotoga maritima Xylanase10A (PDB 1I82-A; Z-score, 9.8; r.m.s.d., 3.2 Å; identity, 17%)36,37 (Extended Data Fig. 2c). TmCBM9 binds glucose, cellobiose, cello-oligomers and xylo-oligomers at the reducing ends, and amorphous and crystalline cellulose37. The ligand binding site of TmCBM9 is formed by two tryptophans that create an aromatic clamp around cellobiose. In RbCBM74, W373 is conserved with one of these tryptophans in an extended channel partially covered by residues 374–384 that form a flexible loop resolved in one monomer (Extended Data Fig. 2d). There are three putative structural Ca2+ in the TmCBM9 structure and four in RbCBM74, one of which (Ca2+−4) aligns with a Ca2+ in TmCBM9 (Extended Data Fig. 2e and Supplementary Fig. 1)38,39. Ca2+−1 and Ca2+−2 are separated by 3.8 Å and share three coordinating residues but only Ca2+−2 is surface-exposed. Like TmCBM9, the Ca2+ ions in RbCBM74 probably provide structural stability40.
Molecular basis of RbCBM26 binding
RbCBM26 displays a β-sandwich fold like other CBM26 members20. CH–π stacking with ACX is provided by W63 and Y55, with hydrogen bonding from Y53, K101, Q103, the peptidic oxygen of A107 and K97 (chain A only; Fig. 2e). In chain B, S286 of RbCBM74 lies 3.2 Å from ACX and hydrogen bonds with O2 and O3 of Glc3. By contrast, S286 is 9.5 Å from ACX in chain A. The top structural homologs of RbCBM26 from the protein structure comparison server DALI are the CBM25 and CBM26 of α-amylase G-6 from Bacillus halodurans C-125 (BhCBM25 and BhCBM26) and ErCBM26b of Amy13K from Eubacterium rectale (Extended Data Fig. 3a,b)36,41–43. RbCBM26 has a long loop containing K97 and K101 that provide additional hydrogen bonding with ACX. Unlike BhCBM26, RbCBM26 does not undergo a conformational change upon ligand binding (Extended Data Fig. 3c)41. A sequence alignment with CBM26 members BhCBM26, ErCBM26 and the Lactobacillus amylovorus α-amylase CBM26 (LaCBM26) demonstrates conservation of the aromatic platform but variation in the hydrogen bonding network (Extended Data Fig. 3a). Sas6 W63 corresponds to LaCBM26 W32, which, when mutated, results in complete loss of binding44,45. Although the previously studied CBM26 modules have not been assayed for binding to ACX, many bind β-cyclodextrin, which has a similar geometry, with dissociation constants (Kd) in the range of ~350–600 μM (refs. 41,43).
Binding mechanism of Sas6
Adsorption depletion (that is, pulldown assay) demonstrated that Sas6T bound corn starch (Kd = 0.95 ± 0.15 μM; binding capacity, Bmax = 0.102 ± 0.001 μmol g–1) with modestly better affinity than potato starch granules (Kd = 1.66 ± 0.40 μM, Bmax = 0.026 ± 0.004 μmol g–1) (Fig. 2f). The Bmax for corn starch is ~fourfold higher. We found that BIg–RbCBM74–BIg bound both corn starch (Kd = 1.57 ± 0.40 μM, Bmax = 0.114 ± 0.01 μmol g–1) and potato starch (Kd = 2.69 ± 1.59 μM, Bmax = 0.031 ± 0.006 μmol g–1) with similar affinity to Sas6T, while RbCBM26 did not display measurable binding to either (Fig. 2f). These results support that RbCBM74 drives insoluble starch binding by Sas6.
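The one-site specific binding model used for these fits has the form B = Bmax·[P]/(Kd + [P]). As a minimal sketch, the snippet below only evaluates that equation with the Sas6T parameters reported above; it does not reproduce the GraphPad Prism fitting, and the function and dictionary names are illustrative rather than taken from the study.

```python
# Evaluate the one-site specific binding model with the reported Sas6T values.
# All names are illustrative; this is not the Prism fitting routine itself.

def one_site_bound(free_protein_um, kd_um, bmax_umol_per_g):
    """Protein bound (umol per g starch): B = Bmax * [P] / (Kd + [P])."""
    return bmax_umol_per_g * free_protein_um / (kd_um + free_protein_um)

# (Kd in uM, Bmax in umol per g starch) for Sas6T, taken from the text.
SAS6T_PARAMS = {"corn": (0.95, 0.102), "potato": (1.66, 0.026)}

def bmax_ratio(params, numerator="corn", denominator="potato"):
    """Ratio of binding capacities (Bmax) between two starches."""
    return params[numerator][1] / params[denominator][1]

for starch, (kd, bmax) in SAS6T_PARAMS.items():
    # When [free protein] equals Kd, the model predicts half-maximal binding.
    half = one_site_bound(kd, kd, bmax)
    print(f"{starch}: B at [P] = Kd is {half:.4f} umol/g (Bmax = {bmax})")

print(f"corn/potato Bmax ratio: {bmax_ratio(SAS6T_PARAMS):.1f}")
```

Because Bmax scales the whole curve while Kd sets its midpoint, two starches can show similar affinities yet very different amounts of protein bound per gram, as seen here for corn versus potato.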
We also screened these constructs for binding to amylose and amylopectin via affinity PAGE. RbCBM26 displays poor recognition of both polysaccharides based upon the relatively small change in migration (Fig. 2g). Using isothermal titration calorimetry (ITC), we found that Sas6T and BIg–RbCBM74–BIg bound amylopectin with sub-micromolar affinity, whereas binding was not detectable for RbCBM26 (Table 2 and Extended Data Fig. 4a)46. Sas6T binds maltotriose (G3), maltoheptaose (G7) and maltooctaose (G8) with a Kd in the hundreds of μM but exhibits a Kd of 6.2 ± 2 μM for maltodecaose (G10). Titrations with higher concentrations of G10 allowed for full saturation of both the RbCBM74 site (Kd = 8.6 ± 2.7 μM) and RbCBM26 (Kd = 730 ± 59 μM) (Supplementary Fig. 2). Binding of G10 to RbCBM26 when this domain was expressed separately resulted in modestly better binding (Kd = 252 ± 128 μM), perhaps owing to better access to the binding site (Table 2 and Extended Data Fig. 4b). RbCBM26 binds G7 and ACX, while BIg–RbCBM74–BIg had no detectable affinity for these sugars (Table 2 and Extended Data Fig. 4c). None of the constructs bound glucosyl-α1,6-maltotriosyl-α1,6-maltotriose, an oligosaccharide of pullulan, suggesting that α1,6 linkages are not specifically recognized by either domain. We determined that BIg–RbCBM74–BIg binds exclusively longer α-glucans of at least eight residues. Notably, α1,4-linked glucose polymers form double helices at ten glucose units, so we reasoned that RbCBM74 accommodates a double helical structure8.
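To put the ITC dissociation constants on an energy scale, a Kd can be converted to a standard binding free energy with ΔG° = RT ln(Kd / 1 M). The sketch below applies this to the Sas6T values for G8 and G10 quoted above. The temperature of 298.15 K is an assumption, as the experimental temperature is not stated in this section, and the function name is illustrative.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1

def delta_g_kj_per_mol(kd_um, temp_k=298.15):
    """Standard binding free energy, dG = R*T*ln(Kd / 1 M).

    kd_um  : dissociation constant in micromolar
    temp_k : temperature in kelvin (298.15 K is an assumed value)
    """
    kd_molar = kd_um * 1e-6
    return R_KJ_PER_MOL_K * temp_k * math.log(kd_molar)

# Sas6T dissociation constants (uM) for G8 and G10, taken from the text.
SAS6T_KD_UM = {"G8": 496.0, "G10": 6.2}

for ligand, kd in SAS6T_KD_UM.items():
    print(f"{ligand}: Kd = {kd} uM, dG = {delta_g_kj_per_mol(kd):.1f} kJ/mol")

gain = delta_g_kj_per_mol(SAS6T_KD_UM["G10"]) - delta_g_kj_per_mol(SAS6T_KD_UM["G8"])
print(f"G10 relative to G8: {gain:.1f} kJ/mol")
```

A more negative value means tighter binding, so the roughly 80-fold lower Kd for G10 corresponds to a more favorable free energy than that for G8.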
Table 2 |. Sas6 domain binding parameters
| Construct | G3 n | G3 Kd (μM) | ACX n | ACX Kd (μM) | G7 n | G7 Kd (μM) | G8 n | G8 Kd (μM) | G10 n | G10 Kd (μM) | Potato APc Kd (μM) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Sas6T | 1a | 880 ± 25 | 1.0 ± 0.4 | 178 ± 26 | 1a | 332 ± 15 | 1a | 496 ± 260 | 0.8 ± 0.07 | 6.2 ± 2b | 0.3 ± 0.1 |
| RbCBM26 | − | NB | 0.8 ± 0.13 | 169 ± 16 | 1a | 310 ± 34 | 1a | 285 ± 84 | 0.7 ± 0.21 | 252 ± 128 | NB |
| BIg–RbCBM74–BIg | − | NB | − | NB | − | NB | 1a | 393 ± 136 | 0.9 ± 0.20 | 5.2 ± 1.1 | 0.7 ± 0.1 |
| W373Ad | | | | | | | | | − | NB | 13.4 ± 5.4 |
| H289Ad | | | | | | | | | 0.5 ± 0.09 | 73.1 ± 7.7 | 21.4 ± 3.1 |
| F326Ad | | | | | | | | | 0.7 ± 0.13 | 100 ± 11 | 3.9 ± 1.4 |
Experiments were performed in triplicate with mean ± s.d. reported. NB, no binding.
a, n was set to 1.
b, Titrations with 2 mM G10 and modeling to a multi-site binding model gave Kd1 = 8.6 ± 2.7 μM and Kd2 = 730 ± 59 μM. See Supplementary Fig. 2.
c, For amylopectin, curves were modeled for total binding (n = 1).
d, W373A, H289A and F326A refer to site-directed mutants within the BIg–RbCBM74–BIg construct.
Molecular basis of RbCBM74 binding
BIg–RbCBM74–BIg co-crystallized with G10 (1.70 Å resolution, Rwork = 17.9%, Rfree = 19.9%) revealed two molecules of G10 as an extended double helix of ~42 Å along RbCBM74 from S286 (reducing ends) to W373 (non-reducing ends) (Fig. 3a). There was strong electron density for twelve glucoses in one chain and nine in the other, probably the result of slight variation in the occupancy of G10 along the binding cleft among different monomers in the unit cell; the electron density thus provides us with a composite of the ligand placement (Fig. 3b). There is little global change in the CBM74 upon binding, with the exception of G374 to K381 (Extended Data Fig. 5a). In the unliganded structure, this loop occludes surface exposure of W373, and in the G10 bound structure, the loop opens to create a continuous binding surface (Extended Data Fig. 5b). Additionally, Ca2+−4 is exchanged for Na+, suggesting flexibility in ion identity at that site (Supplementary Fig. 3).
Fig. 3 |. RbCBM74 has an extended groove that accommodates starch double helices.

a, The BIg–RbCBM74–BIg (PDB 7UWV) starch-binding site is an extended groove that spans nearly the length of the domain. A cartoon representation of BIgA in light gray, CBM74 in teal and BIgB in dark gray with two chains of maltodecaose (G10) wrapped around one another is shown in magenta and gray sticks. b, RbCBM74 is co-crystallized with G10 in a double helical conformation. Electron density for G10 is demonstrated by an omit map contoured to 2.0σ and carved to 1.6 Å with one chain of modeled Glc in magenta and the other in gray. c, Double helical G10 structure with Glc residues labeled from non-reducing to reducing ends. Chain G10A (Glc A1–A12) is shown in magenta and chain G10B (Glc B1–B9) in gray sticks. d, Corresponding hydrogen bonding network (3.2 Å cutoff) between RbCBM74 and G10. Side chains involved in hydrogen bonding are shown in teal sticks with nitrogens indicated in blue and oxygens in red. Hydrogen bonds are indicated by yellow dashed lines, and G10 residues directly involved in binding are shown in magenta (G10A) and gray (G10B) sticks. e, Surface representation of RbCBM74 with peptides protected from deuterium exchange in the presence of G10 colored in light cyan as determined by HDX–MS.
G10 is arranged as two parallel left-handed helices (G10A and G10B) stabilized by hydrogen bonds within each chain (most as Glcn O2–Glcn+1 O3) and between chains (most as O2 to O6) (Extended Data Fig. 5c). The Φ (O5-C1-O4′-C4′), Ψ (C1-O4′-C4′-C5′) angles of G10A and G10B approximate those observed in crystal structures of double helical A (88.8° ± 3°, −149.2° ± 4°) and B type starch (84.1° ± 0.3°, −144.4° ± 0.3°) (Extended Data Table 1)47,48. The Φ and Ψ angles vary along the length of the G10 double helix as predicted in double helical models of amylopectin49 (Extended Data Fig. 5c and Extended Data Table 1). The average period of one complete helical turn, or pitch, of G10A and G10B is 17.1 ± 1.7 Å, which deviates from the ~8 Å pitch of left-handed single helical V-type amylose that features Φ = 91–115° and Ψ = −97–131° (ref. 50). The substantially different dimensions of single helical α-glucan prevent its selection within the RbCBM74 binding site, leading us to conclude that this domain selects for double helical geometry (Extended Data Fig. 5d).
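The Φ and Ψ values discussed above are dihedral (torsion) angles, each defined by four atoms around the glycosidic bond. As a minimal sketch of how such an angle is obtained from atomic coordinates, the function below uses the standard atan2 formulation with the IUPAC sign convention. The coordinates in the example are synthetic points chosen so that the answer is obvious; they are not taken from the deposited structures.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral_deg(p1, p2, p3, p4):
    """Signed dihedral angle (degrees) for four points, in (-180, 180].

    For the glycosidic torsion phi, the points would be O5, C1, O4' and C4';
    for psi, they would be C1, O4', C4' and C5'.
    """
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Synthetic example: a right-angle twist about the z axis.
print(dihedral_deg((1.0, 0.0, 0.0), (0.0, 0.0, 0.0),
                   (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)))
```

Using atan2 rather than an arccosine of the plane normals keeps the sign of the angle, which is what distinguishes, for example, a Ψ near −145° from one near +145°.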
Each G10 molecule interacts with the protein as a stretch of three Glcs at a time, before the natural helical curvature brings the chain out of contact with the protein (Fig. 3c,d). For example, at the non-reducing end, Glcs 1–3 of G10A fit into the ligand-binding groove, while Glcs 4–6 of G10A are solvent exposed and Glcs 1–3 of G10B then fill the cavity. Along the length of the cavity, from the non-reducing to reducing end, Glcs 1–3 and 7–9 of both G10A and G10B alternate to fill this binding site. The binding cleft features a network of residues that hydrogen bond to the hydroxyl groups of glucose (Fig. 3d), with only three aromatics that provide CH–π stacking. Glc A2 stacks with W373, Glc A9 stacks with F326 and Glc B8 stacks with H289.
To define the starch-binding properties of RbCBM74 in solution, we employed hydrogen–deuterium exchange–mass spectrometry (HDX–MS). The conformational dynamics of BIg–RbCBM74–BIg alone and in the presence of G10 were measured over a 4-log timescale (Extended Data Fig. 6a,b). The conformational dynamics of the apo protein were consistent with the crystal structure, in terms of well-ordered domains and associated loops or flexible regions. The flanking BIg domains showed higher exchange rates than the core CBM74. The linker regions between domains do not show differentially high dynamic exchange, as would be expected for flexibly tethered independent domains, further supporting the integral nature of the BIg–RbCBM74–BIg motif.
The binding of G10 to RbCBM74 was explored by differential protection from exchange in the absence and presence of G10. Significant protection was observed in the presence of G10, while no significant increases in exchange were observed (Extended Data Fig. 6c). The protected regions upon G10 binding were localized to a single surface binding region that directly overlaps with the G10 binding site in the co-crystal structure (Fig. 3d,e). With the exception of the peptide from A314–Y318 (ANTTY), each of the protected peptides identified by HDX–MS contains at least one key binding residue identified from the co-crystal structure (Fig. 3e). These data provide a comprehensive picture of the structural dynamics of RbCBM74 binding to long maltooligosaccharides by way of an extended starch-binding cleft.
CBM74 conservation
An alignment of all 99 CBM74 sequences demonstrates six distinct clades (Extended Data Fig. 7, Supplementary Fig. 4 and Supplementary Table 1). RbCBM74 (No. 28) is in a cluster (blue) that invariably includes a dockerin domain as part of the full-length protein. However, there are other CBM74s originating from dockerin-containing proteins found in three more groups (green, cyan and magenta) (Extended Data Fig. 7). The prototypical CBM74 of the subfamily GH13_32 α-amylase from Microbacterium aurum (No. 52) bins into a clade (cyan) with its GH13_32 counterpart from Sanguibacter sp. (No. 54) and the CBM74-containing α-amylase from Clostridium bornimense (No. 58). A similar GH13_28 α-amylase from Streptococcus suis (No. 68) is in the adjacent cluster (magenta) near the CBM74s from two other hypothetical dockerin-containing proteins from Ruminococcus bovis (No. 67) and Ruminococcaceae bacterium (No. 70). Most CBM74s appended to the subfamily GH13_28, predominantly from Bifidobacteria, group together separately (red). The sixth cluster (walnut) covers CBM74s found in GH13_19 α-amylases. In total, CBM74s occur in α-amylases from several subfamilies or non-catalytic dockerin-containing proteins and are widely represented among Bifidobacteria.
We aligned representative sequences from each clade to highlight the similarities within these binding sites (Fig. 4a). Here and within the full alignment, W373 from RbCBM74 is 100% conserved (Fig. 4b). H289 is shared with 78 sequences or substituted with a tyrosine (18 out of 99) in Bifidobacteria and Candidatus scatavimonas (No. 25) and a tryptophan (3 out of 99) in Pseudoscardovia species (Fig. 4b and Supplementary Fig. 4). F326 is the most variable, sharing sequence identity or similarity with three of the six clades (phenylalanine, 19/99; tyrosine, 43/99), while the other clades feature a glycine or alanine (36 out of 99). Mapping all 99 CBM74s onto our structure using CONSURF demonstrates that the binding site is dominated by hydrogen bonding acceptors and donors51–53 (Fig. 4c). The center of the cleft including K556 (80 out of 99), D549 (63 out of 99) and E290 (99 out of 99) exhibits the highest conservation (Fig. 4c). The ends of the cleft are more varied, including S286 (22 out of 99), which interacts with the RbCBM26 ligand. A number of the sequences have an aromatic residue at K556 (tryptophan, 19 out of 99) and Y524 (tyrosine, 12 out of 99; phenylalanine, 45 out of 99) that could provide π stacking (Fig. 4a,c and Supplementary Fig. 4). Moderate variability in the putative binding site may suggest that CBM74 members have different affinities for starch.
Fig. 4 |. Conservation of binding residues among select CBM74 family members.

a, Sequence alignment of six representative sequences, one from each of the six clades of the CBM74 family (see full alignment in Supplementary Fig. 4 and phylogenetic tree in Extended Data Fig. 7). The residues responsible for stacking interactions are highlighted in red and those involved in hydrogen bonding with glucose moieties of the bound α-glucan are highlighted in yellow, based upon the RbCBM74 structure (blue). Identical and similar positions are signified by asterisks and dots or semicolons under the alignment blocks. b, Conservation of H289, F326 and W373 among the 99 CBM74 members is displayed as a bar graph. c, Binding residues of RbCBM74 (PDB 7UWV) colored by conservation score from least conserved (green) to most conserved (purple) generated using CONSURF from the alignment in Supplementary Fig. 4.
RbCBM74 mutational studies
Given that most CBM binding is mediated by aromatics, we reasoned that mutation of W373, F326 or H289 to alanine would dramatically decrease or eliminate binding. W373A and H289A constructs lost the ability to bind to insoluble corn starch, while binding of the F326A construct was greatly reduced (Fig. 5a). For potato starch, a lower percentage of H289A bound compared to the F326A and W373A mutants (Fig. 5b). By affinity PAGE, the W373A and F326A mutants retained binding to amylopectin while the H289A mutant had a modest decrease in binding to potato amylopectin (Fig. 5c). When we quantified binding by means of ITC, W373A lost binding for G10 while H289A and F326A had a ~10–20-fold decrease in affinity (Table 2 and Extended Data Fig. 8a). On potato amylopectin, F326A had a tenfold reduction in affinity while H289A and W373A exhibited a ~20-fold reduction (Extended Data Fig. 8b). That single mutations do not eliminate binding is not surprising given the extensive binding platform. The somewhat staggered double helical G10 bound in our crystal structure suggests that at least 12 glucose units contribute to binding (Fig. 3d).
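The fold changes in G10 affinity quoted above follow directly from the Kd values in Table 2. The short sketch below recomputes them as the ratio of mutant to wild-type Kd; the dictionary layout and the use of None for "no binding" are illustrative choices, not part of the original analysis.

```python
# G10 dissociation constants (uM) from Table 2 for the BIg-RbCBM74-BIg construct.
WILD_TYPE_G10_KD_UM = 5.2
MUTANT_G10_KD_UM = {"W373A": None, "H289A": 73.1, "F326A": 100.0}  # None = no binding

def fold_change(mutant_kd_um, wild_type_kd_um):
    """Fold decrease in affinity (mutant Kd / wild-type Kd); None if no binding."""
    if mutant_kd_um is None:
        return None
    return mutant_kd_um / wild_type_kd_um

for mutant, kd in MUTANT_G10_KD_UM.items():
    fc = fold_change(kd, WILD_TYPE_G10_KD_UM)
    label = "no detectable binding" if fc is None else f"{fc:.1f}-fold weaker"
    print(f"{mutant}: {label}")
```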
Fig. 5 |. W373, F326 and H289 mediate starch binding by RbCBM74.

a,b, Binding to insoluble starch is eliminated or greatly reduced when W373, H289 or F326 is mutated. The amount of protein bound to starch granules was determined by quantitation of protein remaining in solution after binding. Graph shows the mean ± s.d. of n = 3 independent experiments. P values (*P < 0.05; ****P < 0.0001) determined by one-way ANOVA followed by a Dunnett’s multiple comparison post-hoc test comparing each construct to the control (BSA). c, Mutation of aromatic residues decreases but does not eliminate binding to amylopectin. Affinity PAGE with 0.1% potato amylopectin or maize amylopectin added to the gel matrix. Binding is indicated by reduced migration through the gel. Gels are representative of two separate experiments.
Native mass spectrometry
ITC revealed a binding stoichiometry of 1:1 between BIg–RbCBM74–BIg and G10, while the co-crystal structure demonstrates that two molecules of G10 are accommodated. To better determine stoichiometry and the proportion of single versus double helical maltooligosaccharide in solution, we employed native mass spectrometry (MS) in the presence of varying concentrations of G10 (Fig. 6). Each observed state differed by ~1,639 Da, the theoretical mass of G10 (Supplementary Table 2). To obtain binding affinities, we summed the peak intensities of all abundant charge states in our spectra and analyzed these intensity values as described previously54 (Extended Data Fig. 9). The Kd for BIg–RbCBM74–BIg was determined to be 2.16 ± 0.53 μM, which agrees with our ITC data (Fig. 6a,b and Extended Data Fig. 9a). As the concentration of ligand is increased, ligand molecules can bind nonspecifically during the nano-electrospray ionization (nESI) process, generating artifactual peaks in the mass spectra corresponding to a two-ligand-bound complex. This step is given by Kn, which corresponds to the dissociation constant for the nonspecific binding step during the nESI process; this variable also captures multimers of the ligand itself or nESI artifacts that encompass high concentrations of ligand trapped within individual droplets. Our Kn of 922.7 ± 259.9 μM suggests that an additional binding site on BIg–RbCBM74–BIg is highly unlikely (Fig. 6a).
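Two pieces of arithmetic underlie this paragraph: the ~1,639 Da spacing between bound states and the relation between Kd and the fraction of protein bound. The sketch below computes the average mass of a linear maltooligosaccharide from standard atomic masses and solves the exact 1:1 mass-action equation for the complex concentration. It is an illustration only; it does not implement the nonspecific-binding (Kn) treatment used in the analysis, and all names are hypothetical.

```python
import math

# Standard average atomic masses (Da).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}
GLUCOSE = {"C": 6, "H": 12, "O": 6}
WATER = {"H": 2, "O": 1}

def formula_mass(formula):
    """Average mass (Da) of a formula given as {element: count}."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())

def maltooligosaccharide_mass(n_glc):
    """Mass of n glucose units joined by n - 1 glycosidic bonds (one water lost per bond)."""
    return n_glc * formula_mass(GLUCOSE) - (n_glc - 1) * formula_mass(WATER)

def complex_conc_1to1(p_total, l_total, kd):
    """Exact [PL] for P + L <-> PL, from the quadratic mass-action solution.

    All three arguments share one concentration unit (for example uM).
    """
    s = p_total + l_total + kd
    return (s - math.sqrt(s * s - 4.0 * p_total * l_total)) / 2.0

print(f"G10 average mass: {maltooligosaccharide_mass(10):.1f} Da")

# 5 uM protein with 5 uM G10 and the native MS Kd of 2.16 uM from the text.
pl = complex_conc_1to1(5.0, 5.0, 2.16)
print(f"fraction of protein bound: {pl / 5.0:.2f}")
```

The quadratic form matters here because the protein concentration (5 μM) is comparable to the Kd, so the free ligand cannot be approximated by the total ligand.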
Fig. 6 |. RbCBM74 and RbCBM26 bind separate molecules of G10 in solution.

Native MS of 5 μM protein and ligand. Intensities of each species, combined across multiple charge states, were extracted from the mass spectra and used to calculate the fractional abundance of unbound and bound states at equilibrium. a. Binding affinities (Kd) calculated from the fractional intensity of each species for G10. N/A, not available. b,c, Nonlinear least-squares fitting of fractional abundance of unbound and bound states for 0–300 μM G10 with BIg–RbCBM74–BIg (b) and Sas6T (c). Error bars represent standard deviation of three technical replicates. d, Mean of isotopic distribution of single, double and triple helices over different concentrations of G10. Error bars represent standard deviation of six scans. e, Kd calculated from the fractional intensity of each species for G14. f,g, Nonlinear least-squares fitting of fractional abundance of unbound and bound states for 0–300 μM G14 with BIg–RbCBM74–BIg (f) and Sas6T (g). Error bars represent standard deviation of three technical replicates. h, Mean of isotopic distribution of single, double and triple helices over different concentrations of G14. Error bars represent standard deviation of six scans. CL, concentration of the ligand; P, protein; PL, protein–ligand.
For Sas6T, the binding state distribution was markedly different (Fig. 6a,c and Extended Data Fig. 9b). At low G10 concentrations, there is a mix of one-bound and two-bound states, and as G10 increases, the two-bound fraction dominates. Kd values for 1:1 and 1:2 protein:ligand complexes were calculated to be 2.30 ± 0.25 μM and 104.64 ± 8.63 μM, respectively, in reasonable agreement with ITC data for BIg–RbCBM74–BIg and RbCBM26 alone (Fig. 6a). These data best support a model whereby RbCBM26 and RbCBM74 each bind one molecule of G10 independently.
Longer maltooligosaccharides form double helices in solution, and high-resolution MS with G10 showed a wide range of charge state distributions corresponding to single, double and triple helical structures depending on concentration (Fig. 6d and Supplementary Fig. 5a). G10 forms double and triple helices at high concentrations (300 μM); the triple helices may be an artifact of the ESI process or may arise from double helix formation by overlapping G10 molecules (Supplementary Fig. 5a). Although we could not resolve peaks from higher concentrations of G10, we can conclude that at 1 mM, the concentration used for crystallization, most of the ligand forms double helices. However, as RbCBM74 does not absolutely require a double helix but rather an α-glucan that adopts the correct geometry, it is not surprising to see our high-affinity binding site saturated by a single G10 by both ITC and native MS, as this is the more abundant species at low concentrations.
Given that this protein binds solubilized potato amylopectin with ~tenfold better affinity than G10, we performed native MS with maltotetradecaose (G14). BIg–RbCBM74–BIg exhibits a modestly higher affinity for G14 (Kd = 1.29 ± 0.10 μM) than G10 (Kd = 2.16 ± 0.53 μM) (Fig. 6e,f). The binding state distribution for BIg–RbCBM74–BIg demonstrated that the two-bound state becomes the dominant species at higher G14 concentrations (Extended Data Fig. 9c). For Sas6T, we observed a binding state distribution similar to that with G10 (Fig. 6g). Notably, we observed higher affinities for G14 for both one-bound and two-bound states (Kd values of 0.17 ± 0.03 μM and 64.44 ± 4.83 μM, respectively) (Fig. 6e). This suggests that the RbCBM74 binding platform extends beyond what we see in our crystal structure. Our lower Kn for both constructs with G14 may be an artifact of the nESI process, as described above, rather than an additional binding site, although we cannot completely rule out this possibility. The higher affinity observed for Sas6T over the RbCBM74 may be because binding by the longer ligand is aided by the juxtaposition of the CBM26, which then becomes saturated at higher concentrations. It is possible that there is some synergy in binding at the two sites with longer ligands, although further work is needed to investigate this possibility. High-resolution MS of G14 alone demonstrates both single and double helical populations, with more than half of the ligand forming double helices at 150 μM (Fig. 6h and Supplementary Fig. 5b). As these states are in equilibrium, we cannot test binding to a single versus a double helical structure. However, the RbCBM74 site clearly selects maltooligosaccharides that adopt the geometry found in double helical α1,4-linked glucose.
Discussion
CBMs are distinct protein domains that assist with substrate breakdown by specifically binding polysaccharide targets. These domains are especially important for binding to insoluble substrates like crystalline cellulose and semi-crystalline starch granules. The CBM74 family binds insoluble starch and its constituents, amylose and amylopectin. CBM74s are frequently (81 out of 99 sequences) encoded adjacent to CBM25s or CBM26s21 (Supplementary Table 1). RbCBM26 has a canonical binding platform that accommodates motifs found in linear and circular maltooligosaccharides. By contrast, RbCBM74 has an extended ligand binding groove that requires at least eight glucose residues and accommodates the geometry specific to double helices found in amylopectin. Although our data suggest binding to single helices as well, the dimensions of the RbCBM74 binding platform preclude binding of single helices that adopt the wider V-amylose geometry (Extended Data Fig. 5d).
As it is on the cell surface, the CBM74 of Sas6 may target R. bromii to the crystalline amylopectin regions of starch granules that are not easily accessible to human or other bacterial amylases. Amylopectin within starch granules is so tightly packed that multiple hydrogen bonding interactions stabilize interactions between adjacent double helices8,47,48. The binding cleft on RbCBM74 is quite shallow (Fig. 3a), such that we speculate that RbCBM74 recognition of these double helices could occur without disrupting the crystalline architecture. Our data using maltooligosaccharides as long as 14 glucose units suggest that individual molecules bind at RbCBM26 and RbCBM74. However, it is possible that longer α-glucan chains, such as amylopectin α1,4-linked regions that can span up to 60 glucose units or amylose chains of the proper geometry, would allow these modules to dock to the same helical structure55. Although the CBM26 and CBM74 do not recognize α1,6 branch points, these motifs make up a relatively small proportion of amylopectin and thus are unlikely to interfere with protein docking.
Whether Sas6 aids in localizing the organism and its enzymatic machinery to the granule or whether the protein has a more integral role in aiding catalysis by unwinding or disrupting the crystalline structure of the granule is unknown. At this moment, we favor the idea that the primary function of CBM74 is docking to starch granules, as RbCBM74 seems to recognize the native shape of the double helical amylopectin. However, it is possible that as the protein docks to the granule, this results in local disruption of the crystalline network that aids in starch degradation.
Unlike R. bromii, resistant starch-using Bifidobacteria encode CBM74-containing multimodular extracellular amylases56. A recent study looked at the amylases that were differentially encoded between Bifidobacterial strains that could or could not bind starch granules57,58. Resistant starch degrading enzyme 3 (RSD3) was differentially encoded in the resistant starch-binding strains. It contains a CBM74 and has high activity on high-amylose corn starch. The CBM74–CBM26 motif is present in RSD3, so the structural and functional insights from Sas6 may suggest how these CBMs structurally assist with granular starch hydrolysis. CBM74s might serve as a molecular marker for the ability to break down resistant starch in metagenomic samples21. CBM74s might also make attractive additions to engineered enzymes for enhanced starch degradation on the industrial scale or as an adjunct to starch prebiotics. The structural and functional picture of RbCBM74 here will accelerate the targeted use of this domain for various health and industrial applications.
Online content
Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41594-023-01166-6.
Methods
Recombinant protein cloning and expression
We used a previously described cloning and expression protocol to generate each of the recombinant protein constructs used in this study59. Genomic DNA was isolated from R. bromii strain L2–63, and the constructs for Sas6 without the signal peptide were amplified using the primers listed in Supplementary Table 3 with overhangs complementary to the Expresso T7 Cloning & Expression System N-His pETite vector (Lucigen). The forward primers were engineered to include a tobacco etch virus (TEV) protease recognition site. PCR was performed with Flash PHUSION polymerase (ThermoFisher). The amplified products and the linearized N-His pETite vector were transformed into HI-Control10G Chemically Competent Cells (Lucigen) and plated on LB plates supplemented with 50 μg ml−1 kanamycin (Kan). Transformants were screened for the insertion of Sas6 and validated by sequencing. The Sas6-pETite plasmids were transformed into chloramphenicol (Chl)-resistant E. coli Rosetta (DE3) pLysS cells and plated on LB plates supplemented with 50 μg ml−1 Kan and 20 μg ml−1 Chl. E. coli cells were grown at 37 °C to an optical density at 600 nm of 0.6–0.8 in Terrific Broth supplemented with 50 μg ml−1 Kan and 20 μg ml−1 Chl, after which time the temperature was lowered to 20 °C and 0.5 mM isopropyl β-d-1-thiogalactopyranoside (IPTG) was added. After 16 h of growth, 1 l of cells was centrifuged, resuspended in 40 ml of Buffer A (20 mM Tris pH 8.0, 300 mM NaCl) and lysed by sonication. Cell lysate was separated from cell debris by centrifugation for 30 min at 30,000×g. Then, 3 ml of Ni-NTA resin was packed into Econo-Pac Chromatography Columns (Bio-Rad) and equilibrated with Buffer A. Lysate was passed through the packed columns, which were then washed with 70 ml of Buffer A. Proteins were eluted from the columns by a stepwise increase in Buffer B (20 mM Tris pH 8.0, 300 mM NaCl, 500 mM imidazole) at 10–25% Buffer B.
TEV protease (1 mg) was added to each protein and the mix was dialyzed overnight using dialysis tubing (SpectraPor) in 1 l of storage buffer (20 mM HEPES pH 7, 100 mM NaCl). The dialyzed protein–TEV mixture was applied to Ni-NTA resin and the flow-through was collected and concentrated using a VivaSpin 20 concentrator (Fisher Scientific).
Sas6 immunofluorescence
Custom anti-Sas6T antiserum was generated by rabbit immunization with purified recombinant Sas6T (Lampire Biological Laboratories). Antiserum was used for western blotting and cell staining. R. bromii cells were grown to mid-log phase on RUM media16 with 0.1% potato amylopectin. A 1 ml sample of R. bromii culture was centrifuged for 1 min at 13,000×g and washed three times with 1× PBS pH 7.4. Then, 2 μl of cells were spread on a glass slide and fixed with 10% formaldehyde in PBS. Slides were washed 3× in PBS but were not permeabilized. Cells were blocked for 30 min with 10% goat serum (Jackson ImmunoResearch). The anti-Sas6T antiserum was diluted 1:1,000 in 10% goat serum and applied for 1 h to cells at 20 °C. The slides were washed three times for 5 min in PBS before the application of 1:500 goat anti-rabbit Alexa Fluor 488 antibody (ThermoFisher) for 30 min. Slides were washed three times for 5 min in PBS and preserved with Prolong Gold Antifade reagent and then dried overnight. Cells were imaged at the University of Michigan Microscopy Core on a Leica Stellaris Light Scanning Confocal microscope with a ×100 objective using LASX software.
Western blotting
R. bromii was grown to mid-log phase overnight on RUM media containing 0.1% potato amylopectin16. A total of 1 ml of cells was pelleted and washed twice in PBS pH 7.4, then resuspended to a final volume of 50 μl in 5 mM Tris-HCl pH 8.5. The culture supernatant was passed through a 0.2 μm filter and 50 μl was reserved for analysis. Proteins were precipitated from the remaining supernatant by the addition of ¼ volume of 100% trichloroacetic acid and incubated for 30 min on ice. The precipitate was collected by centrifugation and washed twice with 200 μl cold acetone. The resulting pellet was dried and resuspended in 50 μl of 5 mM Tris-HCl pH 8.5. Samples were separated by SDS–PAGE on two 10% Tris-glycine gels and then transferred to a PVDF membrane. Blots were blocked in EveryBlot Blocking Buffer (Bio-Rad) for 30 min and then washed with PBS pH 7.4 + 0.05% Tween 20 (PBST). To detect Sas6, one membrane was incubated with custom rabbit anti-Sas6 antiserum (Lampire) diluted 1:500 and the other with custom rabbit α-glutamic acid decarboxylase from R. bromii (Lampire) diluted 1:10,000 in PBST + 5% non-fat dry milk (PBST-milk) for 1 h. Blots were washed in PBST and incubated in horseradish peroxidase-conjugated goat anti-rabbit antibody (ThermoFisher) diluted 1:5,000 in PBST-milk and the signal was detected by ECL chemiluminescence (ThermoFisher) via GeneSys on a Syngene Pxi6 scanner.
Granular starch-binding assays and adsorption depletion
Granular starch-binding assays were conducted with potato starch (Bob’s Red Mill), corn starch (Sigma), wheat starch (Sigma) or Avicel (Fluka). All polysaccharides were washed three times with an excess of assay buffer (20 mM HEPES pH 7.0, 100 mM NaCl) to remove soluble starch and oligosaccharides and prepared as a 50 mg ml−1 slurry. A total of 1 mg (corn) or 5 mg (potato) of the starch slurry was aliquoted into 0.2 ml tubes in triplicate, centrifuged at 2,000×g for 2 min and the supernatant was carefully removed. Then, 100 μl of protein at concentrations ranging from 0.5 μM to 10 μM was added to each starch and the tubes were agitated by end-over-end rotation at room temperature for 1 h. After centrifugation at 2,000×g for 2 min, 20 μl of the supernatant was removed to determine the unbound protein concentration by absorbance at 280 nm using a ThermoFisher NanodropOne with three replicate measurements per sample. The remaining 80 μl of supernatant was removed for SDS–PAGE gel analysis. The concentration of unbound protein remaining in the supernatant was used to determine the μmoles of protein bound per gram of starch, which was plotted against the concentration of initial (free) protein to generate a binding curve41. Overall Kd and Bmax were determined with a one-site binding model (specific binding) using GraphPad Prism version 9.2.0 for Windows (GraphPad Software, San Diego, California USA; www.graphpad.com).
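The depletion arithmetic and the one-site fit described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the GraphPad Prism procedure used in the study: the function names are hypothetical, the fit is a simple grid search over Kd with a closed-form Bmax at each grid point, and the example data are synthetic.

```python
def bound_per_gram(initial_uM, unbound_uM, volume_ul, starch_mg):
    """Return umol of protein bound per gram of starch (depletion assay)."""
    # uM x ul = 1e-6 umol, because 1 ul = 1e-6 l
    bound_umol = (initial_uM - unbound_uM) * volume_ul * 1e-6
    return bound_umol / (starch_mg * 1e-3)


def one_site(x, bmax, kd):
    """One-site specific binding: B = Bmax * x / (Kd + x)."""
    return bmax * x / (kd + x)


def fit_one_site(xs, ys, kd_grid):
    """Least-squares fit by grid search over Kd (assumed approach, not Prism's)."""
    best = None
    for kd in kd_grid:
        f = [x / (kd + x) for x in xs]
        denom = sum(v * v for v in f)
        if denom == 0:
            continue
        # For a fixed Kd the model is linear in Bmax, so Bmax has a closed form.
        bmax = sum(y * v for y, v in zip(ys, f)) / denom
        sse = sum((y - bmax * v) ** 2 for y, v in zip(ys, f))
        if best is None or sse < best[0]:
            best = (sse, bmax, kd)
    return best[1], best[2]


# Synthetic example (hypothetical values, not data from the study)
xs = [0.5, 1.0, 2.0, 5.0, 10.0]            # protein concentration, uM
ys = [one_site(x, 0.2, 2.0) for x in xs]   # umol bound per g starch
bmax_fit, kd_fit = fit_one_site(xs, ys, [0.1 * i for i in range(1, 201)])
```

In practice a nonlinear least-squares routine would replace the grid search; the grid keeps the sketch dependency-free and makes the role of each parameter explicit.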
To assess starch granules for bound protein, the granules were washed three times with an excess of assay buffer by mixing and centrifugation, the final wash supernatant was removed and 100 μl of Laemmli buffer containing 1 M urea was added to the starch pellet to denature any bound protein but keep the original volume consistent. To qualitatively determine the amount of unbound and bound protein, 10 μl each of the wash supernatant and solubilized pellet fraction were run separately using SDS–PAGE. Bovine serum albumin (BSA) was used as a negative control and to confirm that the unbound protein was sufficiently washed from the starch granules.
Polysaccharide affinity PAGE
Non-denaturing polyacrylamide gels with and without potato amylopectin (Sigma), corn amylopectin (Sigma), potato amylose (Sigma), bovine liver glycogen (Sigma), pullulan (Sigma) or dextran (Sigma) to a final concentration of 0.1% polysaccharide were cast. All polysaccharides were autoclaved, and amylose was solubilized by alkaline solubilization with 1 M NaOH and acid neutralization to pH 7 with HCl60. Sas6 protein samples were mixed with 6X loading dye lacking SDS. Gels were run concurrently for 4 h on ice and subsequently stained with Coomassie (0.025% Coomassie blue R350, 10% acetic acid and 45% methanol). Gels were imaged on a Bio-Rad Gel Doc Go imaging system. The distance between each band and the top of the separating gel was measured using ImageJ61. The ratio of the distance migrated by each band to the distance the BSA band traveled was determined. Binding was considered positive if the ratio was less than 0.85, as previously described43.
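The migration criterion can be written as a short Python sketch. The function names and the example distances are hypothetical; the only values taken from the text are the use of BSA as the reference band and the 0.85 cutoff.

```python
def relative_migration(band_mm, bsa_mm):
    """Ratio of the distance migrated by a band to that of the BSA band."""
    if bsa_mm <= 0:
        raise ValueError("BSA migration distance must be positive")
    return band_mm / bsa_mm


def is_binding(band_mm, bsa_mm, threshold=0.85):
    """Binding is scored positive when the ratio falls below the threshold."""
    return relative_migration(band_mm, bsa_mm) < threshold


# Hypothetical distances measured from the top of the separating gel (mm)
retarded = is_binding(20.0, 40.0)    # ratio 0.50, scored as binding
unaffected = is_binding(36.0, 40.0)  # ratio 0.90, scored as no binding
```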
Isothermal titration calorimetry
All ITC experiments were carried out using a TA Instruments standard volume NanoITC. For each experiment, 1,300 μl of 25 μM protein was added to the sample cell and the reference cell was filled with distilled water. The sample injection syringe was loaded with 250 μl of the appropriate ligand concentration (0.5–5 mM) to fully saturate the protein by the end of 25 injections of 10 μl. Titrations were performed at 25 °C with a stirring speed of 250 rpm. The resulting data were modeled using TA Instruments NanoAnalyze software employing the pre-set models for independent binding and blank (constant) to subtract the heat of dilution. Note that exothermic heat release is denoted with an upward peak on this machine. For interactions with high affinity (c-value of >5 at 25 μM protein), no alterations were made to the model. If the calculated c-value of an interaction fell below five, the n-value was set to one, as indicated in the figure legend following the guidance for modeling low-affinity interactions62. For polysaccharide titrations, curves were modeled by varying the substrate concentration until n = 1, such that the Kd represents the overall affinity for the construct46.
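The c-value decision rule can be illustrated with a short Python sketch. It assumes the standard Wiseman definition, c = n × [protein] / Kd, which the text does not spell out; the function names and example Kd values are hypothetical.

```python
def c_value(protein_uM, kd_uM, n=1.0):
    """Wiseman c parameter, c = n * [protein] / Kd (same concentration units)."""
    return n * protein_uM / kd_uM


def should_fix_n(protein_uM, kd_uM, cutoff=5.0):
    """True when c falls below the cutoff, i.e. n would be fixed to one."""
    return c_value(protein_uM, kd_uM) < cutoff


# At 25 uM protein in the cell, a Kd of 5 uM sits exactly at c = 5.
tight = should_fix_n(25.0, 2.0)    # c = 12.5, model left unaltered
weak = should_fix_n(25.0, 100.0)   # c = 0.25, n fixed to one
```

With 25 μM protein in the cell, the cutoff of five corresponds to a Kd of 5 μM, so only interactions weaker than that are fitted with n fixed.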
Protein crystallization
Crystallization conditions for ACX (2 mM) bound (PDB 7UWW) and unliganded (PDB 7UWU) crystals of Sas6T were screened by means of a 96-well sparse matrix screen (Peg Ion HT, Hampton Research) in a sitting drop vapor diffusion experiment at room temperature. Screens were set up using an Art Robbins Gryphon robot with 20 mg ml−1 protein in a three-well tray (Art Robbins no. 102-0001-13) using protein-to-well solution ratios of 2:1, 1:1 and 1:2. Small crystals were observed in 0.2 M potassium thiocyanate pH 7.0, 20% w/v polyethylene glycol (PEG) 3350 (condition B2) and were further optimized by varying pH, PEG 3350 percentage and potassium thiocyanate concentration. Crystals were microseeded with a crystal seeding tool (Hampton) in a sitting drop setup of 1.5 μl drops with 2:1, 1:1 or 1:2 protein:well solution ratios. The optimal crystallization solution contained 0.3 M potassium thiocyanate pH 7.0, 24% PEG 3350 and 1 mM Anderson–Evans polyoxotungstate [TeW6O24]6− (TEW) (Jena Biosciences) to improve crystal diffraction. Before data collection, crystals were cryoprotected in a mixture of 80% crystallization solution supplemented with 20% ethylene glycol and then plunged into liquid nitrogen.
Crystallization conditions for maltodecaose-bound RbCBM74 structure (PDB 7UWV) were generated from BIg–RbCBM74–BIg construct (residues 134–665) using 96-well sparse matrix screens. A crystalline mass observed in 60% v/v Tacsimate pH 7.0, 0.1 M BIS-TRIS propane pH 7.0 (Hampton Salt-Rx HT-well H12) was used to microseed an optimized solution containing 30% Tacsimate, 0.1 M HEPES pH 7.0 and 2 mM maltodecaose (CarboExpert). No additional cryoprotection was required before plunge-freezing into liquid nitrogen.
Structure determination and refinement
X-ray data were collected at the Life Sciences Collaborative Access Team (LS-CAT) at Argonne National Laboratory’s Advanced Photon Source (APS) in Argonne, IL, USA. Data were processed at APS using autoPROC with XDS for spot finding, indexing and integration followed by Aimless for scaling and merging63–65. Intrinsic sulfur SAD phasing was used to determine the structure of Sas6T–ACX (PDB 7UWW) using AutoSol in Phenix66,67. Those coordinates were then used for molecular replacement in Phaser to determine the unliganded Sas6T (PDB 7UWU) and BIg–RbCBM74–BIg/G10 (PDB 7UWV) structures68. All three structures were refined by manual model building in Coot and refinement in Phenix.refine69,70. Ramachandran plots showed angles within the favored (>96%) or allowed region, except for the Sas6T–ACX structure which had 0.08% in the outlier region. Metal ion identities were validated using the web-based CheckMyMetal (CMM) tool71 (https://cmm.minorlab.org). Carbohydrate models were validated using Privateer72. The X-ray diffraction and structure refinement statistics are summarized in Table 1.
Size exclusion chromatography–SAXS experiment
SAXS was performed at the Biophysics Collaborative Access Team (BioCAT; beamline 18ID at APS) with in-line size exclusion chromatography. Data collection was performed via the BioCon software developed at the BioCAT beamline. The sample was loaded onto a Superdex 200 Increase 10/300 GL column (Cytiva) at 0.6 ml min−1 by an AKTA Pure FPLC (GE), and the eluate, after it passed through the UV monitor, was flowed through the SAXS flow cell. The flow cell consists of a 1.0 mm internal diameter quartz capillary with ~20 μm walls with a coflowing buffer sheath to separate the sample from the capillary walls and prevent radiation damage73. Scattering intensity was recorded using a Pilatus3 X 1M (Dectris) detector at 3.6 m from the sample giving a q-range of 0.003 Å−1 to 0.35 Å−1. Exposures (0.7 s) were acquired every 1 s during elution, and data were reduced using BioXTAS RAW 2.1.1 (ref. 74). Within RAW, the volume of correlation, molecular weight and oligomeric state were determined75,76. Buffer blanks were created by averaging regions flanking the elution peak and were subtracted from exposures selected from the elution peak to create the I(q) vs q curves used for subsequent analyses. The molecular weight was calculated by comparison to known structures (Shape&Size)77. The P(r) function was determined using GNOM78. GNOM and Shape&Size are part of the ATSAS package (version 3.0)79. High-resolution structures were fit to the SAXS data using FoXS, and flexibility in the high-resolution structures was modeled against the SAXS data using MultiFoXS80. Supplementary Tables 4a–c list sample, instrumentation and software for the size exclusion chromatography–SAXS experiment.
HDX–MS experiments
HDX–MS experiments were performed using a Synapt G2-SX HDMS system (Waters) and data were collected by the IntelliStart software integrated with MassLynx, similar to previously reported studies81. Deuteration reactions were incubated at 20 °C for 15 s, 150 s, 1,500 s and 15,000 s in triplicate. Then, 3 μl of BIg–RbCBM74–BIg, alone or in the presence of G10, was diluted with 57 μl of deuterated labeling buffer. Nondeuterated data were acquired by dilution with protonated buffer, and fully deuterated data were prepared by dilution in 99% D2O, 1% (v/v) formic acid for 48 h at room temperature. Samples were measured in triplicate using automated handling with a PAL liquid handling system (LEAP), using randomized sequential collection with Chronos.
Following incubation, deuteration was quenched by mixing 50 μl of the solution with 50 μl of 100 mM phosphate, pH 2.5 at 0.3 °C. Immediately after the samples were quenched, 95 μl of the sample was loaded onto an Acquity M-class UPLC (Waters) with sequential inline pepsin digestion (Waters Enzymate BEH Pepsin column, 2.1 mm × 30 mm) for 3 min at 15 °C followed by reverse phase purification (Acquity UPLC BEH C18 1.7 μm at 0.2 °C). The sample was loaded onto the column equilibrated with 95% water, 5% acetonitrile and 0.1% formic acid at a flow rate of 40 μl min–1. A 7 min linear gradient (5–35% acetonitrile) followed by a ramp and 2 min block (85% acetonitrile) was used for separation, and the eluate was infused directly and continuously onto a Synapt XS using Ion Mobility (Waters). [Glu1]-Fibrinopeptide B was used as a reference.
Data from nondeuterated samples were used for peptide identification with ProteinLynx Global Server 3.0 (Waters). Full coverage of the protein was obtained, with the exception of the region from residues 289–296, where peptides were not detected. The filtered peptide list and MS data were imported into HDExaminer (Sierra Analytics) for deuterium uptake calculation using both retention time and mobility matching. Representative peptides were used for a final cumulative sequence coverage of 91.4%. Normalized deuterium uptake was calculated for protein alone and with G10, and regions of differential protection, defined as those with an average of 5% difference in deuteration between states over the 150–15,000 s timepoints, were mapped onto the crystal structure using PyMOL (Schrödinger).
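The differential protection criterion can be sketched in Python as follows. This is an illustration only: the function names and the example uptake values are hypothetical, and treating the 5% figure as a minimum absolute difference (in percentage points of normalized uptake) is an assumption about how the threshold was applied.

```python
def mean_difference(apo, bound, timepoints=(150, 1500, 15000)):
    """Mean (apo - bound) normalized uptake, in percentage points."""
    diffs = [apo[t] - bound[t] for t in timepoints]
    return sum(diffs) / len(diffs)


def is_differentially_protected(apo, bound, threshold=5.0):
    """Assumed rule: flag a peptide when |mean difference| >= threshold."""
    return abs(mean_difference(apo, bound)) >= threshold


# Hypothetical normalized uptake (%) for one peptide, keyed by labeling time (s)
apo = {15: 10.0, 150: 30.0, 1500: 50.0, 15000: 70.0}
with_g10 = {15: 9.0, 150: 22.0, 1500: 44.0, 15000: 66.0}
protected = is_differentially_protected(apo, with_g10)  # mean difference is 6.0
```

The 15 s timepoint is excluded by default because the criterion in the text covers only the 150–15,000 s timepoints.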
Native MS
Stock solutions of BIg–RbCBM74–BIg and Sas6 were de-salted and solvent exchanged into 200 mM ammonium acetate (pH 6.8–7.0) using Amicon Ultra-0.5 ml centrifugal filters (MilliporeSigma) with a 10 kDa molecular weight cutoff. Ten consecutive washing steps were performed to achieve sufficient desalting. The final concentrations of each protein stock solution after desalting were estimated by UV absorbance at 280 nm. A stock solution of G10 was prepared by dissolving a known mass in 200 mM ammonium acetate to achieve a final concentration of 200 μM. For native MS titration experiments used to quantify Kd values, the concentration of protein was fixed at 5 μM, and enough G10 was added to achieve final concentrations of 0, 5, 25, 50, 100 and 150 μM. Protein–G10 mixtures were then incubated at 4 °C overnight to achieve equilibration before native MS analysis.
All native binding experiments were performed using a Q Exactive Orbitrap MS with Ultra High Mass Range (UHMR) platform (ThermoFisher Scientific)82. Each sample (~3 μM) was transferred to a gold-coated borosilicate capillary needle (prepared in-house), and ions were generated by direct infusion using an nESI source operated in positive mode. The capillary voltage was held at 1.2 kV, the inlet capillary was heated to 250 °C and the S-lens RF level was kept at 80. Low m/z detector optimization and high m/z transfer optics were used, and the trapping gas pressure was set to two. In-source trapping was enabled with the desolvation voltage fixed at −25 V for improved ion transmission and efficient salt adduct removal. Transient times were set at 128 ms (resolution of 25,000 at m/z 400), and five microscans were combined into a single scan. A total of ~50 scans were averaged to produce the presented mass spectra. All full scan data were acquired using a noise threshold of zero to avoid pre-processing of mass spectra. A total of three measurements for each ligand concentration were performed. Data were then processed and deconvoluted using UniDec software83.
Kd measurements by native MS
We performed titration experiments for both BIg–RbCBM74–BIg and Sas6T using G10 and acquired modeled titration curves. Each bound state differed by ~1,639 Da, which agrees with the theoretical mass of G10. To obtain the binding constants, we summed the peak intensities of all abundant charge states in our mass spectra. Kd values were calculated using the relative intensities of unbound protein and each ligand-bound species from the mass spectra as previously described84. In brief, the protein–ligand binding equilibrium of BIg–RbCBM74–BIg with G10 in solution can be described by equation (1):
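$$\mathrm{P} + \mathrm{L} \overset{K_d}{\rightleftharpoons} \mathrm{PL}, \qquad \mathrm{P} + \mathrm{L} \overset{K_n}{\rightleftharpoons} \mathrm{Pl}, \qquad \mathrm{PL} + \mathrm{L} \overset{K_n}{\rightleftharpoons} \mathrm{PLl} \tag{1}$$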
where L is the ligand and P and PL are the free protein and protein with one specifically bound ligand, respectively. As the concentration of ligand is increased, ligand molecules can bind nonspecifically during the nESI process, generating artifactual peaks in the mass spectra corresponding to a two-ligand-bound complex. Here, we presume that nonspecific binding arises equally for free protein and that which possesses one specifically bound ligand, represented by Pl and PLl in equation (1). Based on these assumptions, the equations of mass balance and binding states can be described by equation (2a–d):
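$$C_P = [\mathrm{P}] + [\mathrm{PL}] + [\mathrm{Pl}] + [\mathrm{PLl}] \tag{2a}$$
$$C_L = [\mathrm{L}] + [\mathrm{PL}] + [\mathrm{Pl}] + 2[\mathrm{PLl}] \tag{2b}$$
$$K_d = \frac{[\mathrm{P}][\mathrm{L}]}{[\mathrm{PL}]} \tag{2c}$$
$$K_n = \frac{[\mathrm{P}][\mathrm{L}]}{[\mathrm{Pl}]} = \frac{[\mathrm{PL}][\mathrm{L}]}{[\mathrm{PLl}]} \tag{2d}$$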
where CP and CL represent the total concentrations of protein and ligand, respectively, and concentrations in brackets represent those at equilibrium. Kd and Kn represent the dissociation constants for specific and nonspecific binding steps, respectively. If the peak intensities of free protein and ligand-bound complexes are proportional to the abundances of those in solution and the spray and detection efficiency of all species is the same, then the fractional intensities of each species can be determined by equation (3):
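$$F_X = \frac{\sum_n I(X^{n+})/n}{\sum_Y \sum_n I(Y^{n+})/n}, \qquad X, Y \in \{\mathrm{P},\ \mathrm{PL}+\mathrm{Pl},\ \mathrm{PLl}\} \tag{3}$$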
Here, the fractional intensities are calculated as the sum of the intensities of main peak ions at all charge states. A Fourier transform MS method is used with signal intensities proportional to both ion abundance and charge state so ion intensities are normalized for each charge state, n85,86. These fractional intensities can be calculated from the titration experiment at each ligand concentration and can then be related to the equilibrium constants by equation (4a–c):
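$$F_{\mathrm{P}} = \frac{[\mathrm{P}]}{C_P} = \frac{K_d K_n}{(K_d + [\mathrm{L}])(K_n + [\mathrm{L}])} \tag{4a}$$
$$F_{\mathrm{PL}+\mathrm{Pl}} = \frac{[\mathrm{PL}] + [\mathrm{Pl}]}{C_P} = \frac{(K_d + K_n)[\mathrm{L}]}{(K_d + [\mathrm{L}])(K_n + [\mathrm{L}])} \tag{4b}$$
$$F_{\mathrm{PLl}} = \frac{[\mathrm{PLl}]}{C_P} = \frac{[\mathrm{L}]^2}{(K_d + [\mathrm{L}])(K_n + [\mathrm{L}])} \tag{4c}$$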
[L] can also be determined from nESI–MS titration data:
$$[\mathrm{L}] = C_L - C_P\left(F_{\mathrm{PL}+\mathrm{Pl}} + 2F_{\mathrm{PLl}}\right) \tag{5}$$
[L] was then obtained at each ligand concentration and applied to equation (4a–c). Equations (4a,b) were then fitted to experimental fractional intensities by nonlinear least-squares curve fitting using the lsqnonlin function in MATLAB. A more detailed derivation of these equations is provided elsewhere84, along with the approach used for Sas6, which possesses two sites for specific binding (RbCBM74 and RbCBM26) and exhibits a third nonspecific bound state as shown in equation (6).
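$$\mathrm{P} + \mathrm{L} \overset{K_{d,1}}{\rightleftharpoons} \mathrm{PL}, \qquad \mathrm{PL} + \mathrm{L} \overset{K_{d,2}}{\rightleftharpoons} \mathrm{PL_2}, \qquad \mathrm{PL_2} + \mathrm{L} \overset{K_n}{\rightleftharpoons} \mathrm{PL_2l} \tag{6}$$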
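The single-site model with nonspecific binding described above can be sketched as a forward calculation in Python. This is a minimal illustration, not the MATLAB fitting code used in the study: it assumes that the specific (Kd) and nonspecific (Kn) steps are independent, the function names are hypothetical, and a bisection solver stands in for whatever routine one would use to simulate the free ligand concentration. The Kd and Kn inputs are the values reported for BIg–RbCBM74–BIg with G10.

```python
def fractions(free_l, kd, kn):
    """Fractions of protein with zero, one and two ligands bound."""
    denom = (kd + free_l) * (kn + free_l)
    f0 = kd * kn / denom                # free protein, P
    f1 = free_l * (kd + kn) / denom     # PL and Pl (same mass, so summed)
    f2 = free_l ** 2 / denom            # PLl
    return f0, f1, f2


def free_ligand_from_fractions(c_l, c_p, f1, f2):
    """Free ligand from mass balance: total ligand minus bound ligand."""
    return c_l - c_p * (f1 + 2.0 * f2)


def solve_free_ligand(c_l, c_p, kd, kn, iterations=200):
    """Bisection on the ligand mass balance (assumed solver, for simulation)."""
    lo, hi = 0.0, c_l
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        _, f1, f2 = fractions(mid, kd, kn)
        if mid + c_p * (f1 + 2.0 * f2) > c_l:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


# 5 uM protein titrated with 50 uM G10, using the reported Kd and Kn (uM)
kd, kn, c_p, c_l = 2.16, 922.7, 5.0, 50.0
free_l = solve_free_ligand(c_l, c_p, kd, kn)
f0, f1, f2 = fractions(free_l, kd, kn)
```

Fitting then amounts to adjusting Kd and Kn until the predicted fractions match the measured fractional intensities at every ligand concentration.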
High-resolution MS
Stocks of G10 or G14 were diluted to 5, 25, 50, 100, 150 and 300 μM with 200 mM ammonium acetate. Tuning parameters similar to those in native MS experiments were used with a few exceptions. All experiments were performed in negative mode, and low m/z detector optimization and low m/z transfer optics were used. The negative mode was selected, as positive mode spectra were heavily adducted with common cations present in solution. In-source trapping was 0 and −25 V for G10 and G14, respectively. G14 required more activation to assist in sufficient desolvation. We noticed that higher energies would generate excessive fragments of both G10 and G14. Transient times were set at 1,024 ms (resolution of 200,000 at m/z 400), and approximately six scans were averaged to produce the presented mass spectra.
An overlap of monoisotopic m/z peaks corresponding to single, double and triple helices was observed at high concentrations (Supplementary Fig. 5a,b). To approximate the relative abundance of each oligomeric state, we first simulated the isotopic distribution of each state using enviPat and then calculated the theoretical proportion of the monoisotopic species with respect to the following peak, which corresponds to a difference in carbon-13 composition87. We then manually used these proportion factors to approximate the intensities of each oligomeric state in our experimental data. Calculations were performed for six individual scans and averaged.
Sequence comparison and evolutionary analysis
Amino acid sequences of 29 CBM74 modules were collected according to information in the CAZy database (http://www.cazy.org; CAZy update, March 2022)88. This set was completed with sequences of hypothetical CBM74s based on protein BLAST searches (https://blast.ncbi.nlm.nih.gov/Blast.cgi) using the CBM74 sequences from Sas6 of Ruminococcus bromii (GenBank accession no. PKD32096.1) and the GH13_32 α-amylase from Microbacterium aurum (GenBank accession no. AKG25402.1) as queries19,89,90. Three searches with each query sequence were performed, limiting the searched databases to taxonomy kingdoms of Bacteria, Archaea and Eucarya (with no relevant results for the latter two). To capture a wide spectrum of organisms harboring a CBM74 module, one non-redundant amino acid sequence was selected to represent each species and/or bacterial strain. The BLAST searches yielded 93 additional CBM74 sequences of bacterial origin, with the last sequence from the CBM74 of a putative α-amylase from uncultured Eubacterium sp. (GenBank accession no. SCJ65691.1; E-value: 3e−39). This set of 122 sequences was reduced to 99 by eliminating 23 sequences owing to their redundancy and/or incompleteness of the CBM74 (Supplementary Table 1). All sequences were retrieved from the GenBank (https://www.ncbi.nlm.nih.gov/genbank) and/or UniProt (https://www.uniprot.org) databases91,92. Sequence alignment was performed using Clustal-Omega (https://www.ebi.ac.uk/Tools/msa/clustalo)93. Subtle manual tuning of the computer-produced alignment was necessary to maximize sequence similarities. The evolutionary tree of these 99 sequences was calculated by a maximum-likelihood method (on the final alignment including the gaps) using the WAG substitution model and the bootstrapping procedure with 500 bootstrap trials implemented in the MEGA-X package94–96. The calculated tree file was displayed with the program iTOL97 (https://itol.embl.de).
The structural comparison was created using the above-mentioned alignment in conjunction with the web-based ConSurf tool51–53.
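For readers unfamiliar with the bootstrapping procedure, the resampling step can be sketched in Python as follows. This only illustrates the general principle (alignment columns are resampled with replacement to build each pseudo-replicate); it is not the MEGA-X implementation, and the toy alignment and function name are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate of an alignment (dict of name -> aligned
    sequence) by resampling its columns with replacement."""
    n_cols = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


# Hypothetical toy alignment with gaps retained as '-', not the CBM74 data.
toy = {"seqA": "MK-TWY", "seqB": "MKATWF", "seqC": "MR-SWY"}

rng = random.Random(0)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_alignment(toy, rng) for _ in range(500)]
```

A tree would then be inferred from each replicate, and the support for a given branch is the percentage of replicate trees in which that branch appears.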
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Extended Data
Extended Data Fig. 1 |. Small Angle X-Ray Scattering indicates that Sas6 remains mostly compact in solution with minor extension beyond that of the crystal structure.

a. Total subtracted scattering intensity (left y axis) and Rg (right y axis) as a function of time for the SEC-SAXS elution. The elution resolved several peaks, including a single strong monodisperse peak as indicated by the constant radius of gyration (Rg). b. Guinier fit analysis with normalized residual shown in the bottom panel. Rg and I(0) values of 29.44 ± 0.04 Å and 0.04 ± 3.65 × 10⁻⁵, respectively, were obtained, and the fit and normalized fit residuals confirmed that this peak was monodisperse. The molecular weight of Sas6T from the SAXS data was calculated to be 61.0 kDa (theoretical 68.9 kDa), indicating that it is primarily monomeric in solution. c. P(r) versus r normalized by I(0). The Dmax from the P(r) function for Sas6T is 90 Å. The overall shape of the P(r) function for Sas6T, calculated by indirect Fourier transform (IFT) using GNOM, is relatively Gaussian, which is characteristic of a globular compact particle, with the main peak at r = ~30 Å. There is a small peak at r = 55 Å, which suggests that there are two structurally separate motifs, possibly RbCBM26 and RbCBM74. d. Dimensionless Kratky plot; y = 3/e and x = √3 are shown as dashed gray lines to indicate where a globular protein would peak. The small plateau in the mid to high q region, around qRg = 5 in the dimensionless Kratky plot, indicates some extension or disorder in the system. These results suggest the presence of two separate modules with flexibility between them, likely corresponding to the two CBMs. e. FoXS and f. MultiFoXS fits (black) to the Sas6T SAXS data (red) with normalized residual shown in the bottom panel. The FoXS fit had a χ² = 2.46 and showed systematic deviations in the normalized fit residual, suggesting significant differences between the lowest energy conformation of Sas6T in the crystal structure and the structure of Sas6T in solution. For MultiFoXS we assigned the linkers between the domains (residues 130–137 and 572–583) as flexible. MultiFoXS gave a best fit with a 1-state solution with χ² = 0.96 and a calculated Rg of 29.2 Å, which corroborates the Guinier Rg calculation. g. Topology map of BIgA and BIgB domains illustrating the Greek key motif in BIgA and showing the loops that hydrogen bond with one another. h. A surface area analysis of the BIg domains using PISA in CCP4 gives a buried surface area of 353.9 Å² (ref. 34). Residues providing hydrogen bonding are represented by stick side chains and the hydrogen bonds are shown by dashed yellow lines.
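As a reading aid for panels b and d, the following Python sketch shows why a compact particle peaks at qRg = √3 with a height of 3/e in the dimensionless Kratky representation. It assumes an idealized particle whose scattering follows the Guinier form over the whole q range, which is a simplification; the Rg and I(0) values are taken from this legend, while the q grid and function names are hypothetical.

```python
import math


def guinier_intensity(q, i0, rg):
    """Guinier approximation: I(q) = I(0) * exp(-(q * Rg)^2 / 3)."""
    return i0 * math.exp(-((q * rg) ** 2) / 3.0)


def dimensionless_kratky(q, i0, rg):
    """Return the dimensionless Kratky coordinates (q*Rg, (q*Rg)^2 * I(q)/I(0))."""
    x = q * rg
    return x, x * x * guinier_intensity(q, i0, rg) / i0


RG = 29.44  # Å, Guinier Rg reported in panel b
I0 = 0.04   # I(0) reported in panel b

# Hypothetical q grid in Å^-1; locate the maximum of the Kratky curve.
qs = [k * 1e-4 for k in range(1, 2001)]
peak_x, peak_y = max(
    (dimensionless_kratky(q, I0, RG) for q in qs), key=lambda xy: xy[1]
)
```

Experimental curves that stay above this reference at higher qRg, as described for Sas6T, are usually read as a sign of extension or flexibility.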
Extended Data Fig. 2 |. RbCBM74 is a singular globular domain, most similar to TmCBM9.

a. Structure of RbCBM74 (PDB 7uww) colored from N-terminus (blue) to C-terminus (red). b. Short β-strands leading into and out of the RbCBM74 domain are colored in red and blue. c. Overlay of TmCBM9 (gold) (PDB 1i82-A) and RbCBM74 (blue). The DALI server calculated an RMSD of 3.2 Å and sequence identity of 17%. d. Close-up view of the TmCBM9 binding site showing the two TmCBM9 Trp residues involved in binding cellobiose (gold) and W373 of RbCBM74 (blue), which lies in the same region but is occluded from the surface by a loop containing residues 374–384. e. Zoomed-in view of the calcium ions coordinated in the RbCBM74 domain, with side chains shown as sticks, main chain shown as lines and Ca2+ ions as yellow spheres. Atomic distances are shown in Å and residues are labeled. Residues are colored by element with oxygen shown in red.
Extended Data Fig. 3 |. RbCBM26 shares a conserved binding site with other CBM26.

The top structural homologs of RbCBM26 from DALI36,42 are the CBM25 from Bacillus halodurans C-125 (BhCBM25) from α-amylase G-6 (PDB ID: 2C3V-A, Z-score: 12.4, RMSD 1.9 Å, identity: 16%) and CBM26 (BhCBM26) from the same enzyme (PDB ID: 6B3P-B, Z-score: 12.1, RMSD 1.9 Å, identity: 20%)41. Another top DALI result is ErCBM26b of Amy13K from Eubacterium rectale (PDB ID: 2C3H-B, Z-score: 10.8, RMSD 1.7 Å, identity: 19%)43. a. Sequence alignment of RbCBM26 (RBL236_00020), ErCBM26 (ERE_20420), BhCBM26 (BH0413), and LaCBM26 (Q48502). Conserved binding site residues are indicated by a red arrow, while variable residues that provide hydrogen bonding are indicated by a blue arrow. b. Overlay of RbCBM26 (green) with BhCBM26 (PDB 2c3h, orange), and ErCBM26 (PDB 6b3p, purple). c. Overlay of unliganded RbCBM26 (blue) and ACX-bound RbCBM26 (green) showing that loop 1 does not move upon ligand binding. β-strands are numbered for reference.
Extended Data Fig. 4 |. Representative ITC graphs of Sas6 domains.

Sas6T, RbCBM26, and BIg-RbCBM74-BIg binding to a. potato amylopectin, b. maltodecaose (G10), and c. α-cyclodextrin (ACX). Note that exothermic heat release is denoted with an upward peak on this machine.
Extended Data Fig. 5 |. RbCBM74 selects a double helical ligand geometry.

a. Overlay of RbCBM74 from Sas6T structure (PDB 7uww) in blue with RbCBM74 from BIg-RbCBM74-BIg co-crystal structure (PDB 7uwv) in deep teal. b. Loop from G374-G382 demonstrating that the unliganded loop (blue) occludes W373 but moves to allow access to W373 in the ligand-bound structure (deep teal). c. An extended view of the geometry of the G10 ligand. Intramolecular hydrogen bonds (3.6Å cutoff for ideal geometry and 3.2Å with minimal acceptable geometry) within and between G10 chains are shown in slate. Φ (O5-C1-O4′-C4′) and ψ (C1-O4′-C4′-C5′) angles of the Glc linkages in the G10 double helix ligand are labeled with G10A in magenta and G10B in grey. d. The geometry of the G10 ligand more closely resembles that of double helical B starch (cyan)48 than single helical cycloamylose (yellow, 1c58)50. Models were manually aligned in PyMOL to compare the angles, pitch, and period of the helical turns.
Extended Data Fig. 6 |. HDX-MS analysis of RbCBM74.

a. Heatmap of exchange dynamics of BIg-RbCBM74-BIg. All values are the average of three replicates. b. Representative differential uptake for peptides that showed no significant difference (upper panels) and for those that showed significantly decreased deuteration (lower panels) in the G10-bound BIg-RbCBM74-BIg. Data points are represented by the mean ± standard deviation. c. Heatmap of the differential exchange dynamics of BIg-RbCBM74-BIg in the absence and presence of G10. Blue represents lower exchange (protection) in the G10-bound form and red higher exchange in the G10-bound form. All values are the average of three replicates.
Extended Data Fig. 7 |. Phylogenetic tree of the 99 CBM74 family members.

a. A maximum-likelihood tree covering 99 sequences with emphasis on the two experimentally characterized CBM74s, Sas6 from Ruminococcus bromii (No. 28, blue cluster) and the subfamily GH13_32 α-amylase from Microbacterium aurum (No. 52; cyan cluster)35. The bootstrap values higher than 70% are shown. For details concerning all 99 CBM74 sequences, see Supplementary Table 1.
Extended Data Fig. 8 |. Representative ITC graphs of RbCBM74 mutations.

BIg-RbCBM74-BIg, H289A, F236A, and W373A mutations binding to a. maltodecaose (G10), and b. potato amylopectin (PAP). Note that exothermic heat release is denoted with an upward peak on this machine.
Extended Data Fig. 9 |. Mass spectra of Sas6 constructs at different ligand concentrations (0–300μM) and a fixed protein concentration of 5μM.

Charge states for unbound protein are annotated with an orange dashed line. Peaks corresponding to different bound states are observed after each charge state of the unbound protein. Spectra of a. BIg-RbCBM74-BIg or b. Sas6T in equilibrium with G10. Spectra of c. BIg-RbCBM74-BIg or d. Sas6T in equilibrium with G14.
Extended Data Table 1 |.
Table of Φ (O5-C1-O4′-C4′) and ψ (C1-O4′-C4′-C5′) angles of G10 ligand bound by RbCBM74
| Linkage | RbCBM74 Φ (°) | RbCBM74 ψ (°) | B starch Φ (°) | B starch ψ (°) | V amylose Φ (°) | V amylose ψ (°) |
|---|---|---|---|---|---|---|
| GlcA1–2 | 76.3 | −148.7 | 83.9 | −144.2 | 102.2 | −143.4 |
| GlcA2–3 | 70.6 | −147.1 | 84.2 | −144.0 | 112.1 | −108.9 |
| GlcA3–4 | 57.1 | −146.8 | 84.0 | −144.3 | 98.3 | −143.9 |
| GlcA4–5 | 91.3 | −142.2 | 84.0 | −144.0 | 105.8 | −146.3 |
| GlcA5–6 | 97.9 | −111.3 | 83.9 | −144.2 | 103.1 | −124.7 |
| GlcA6–7 | 93.3 | −139.3 | | | 101.5 | −121.9 |
| GlcA7–8 | 60.8 | −160.0 | | | 104.0 | −123.8 |
| GlcA8–9 | 99.3 | −134.8 | | | 105.0 | −125.9 |
| GlcA9–10 | 102.4 | −136.1 | | | 105.7 | −144.5 |
| GlcA10–11 | 99.2 | −146.3 | | | 103.0 | −142.2 |
| GlcA11–12 | 84.5 | −142.1 | | | 109.0 | −116.6 |
| Average | 84.8 | −141.3 | 84.0 | −144.1 | 104.5 | −131.1 |
| GlcB1–2 | 97.1 | −141.7 | 83.9 | −144.2 | | |
| GlcB2–3 | 84.7 | −146.4 | 84.3 | −144.0 | | |
| GlcB3–4 | 83.1 | −150.7 | 83.9 | −144.2 | | |
| GlcB4–5 | 104.0 | −111.9 | 84.0 | −144.0 | | |
| GlcB5–6 | 76.4 | −147.9 | 83.9 | −144.2 | | |
| GlcB6–7 | 74.3 | −148.6 | | | | |
| GlcB7–8 | 100.2 | −145.7 | | | | |
| GlcB8–9 | 88.7 | −148.0 | | | | |
| Average | 88.9 | −142.7 | 84.0 | −144.1 | | |
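For reference, a torsion angle such as Φ (O5-C1-O4′-C4′) or ψ (C1-O4′-C4′-C5′) can be computed from four atomic positions as sketched below in Python. This is a generic geometric calculation offered for illustration, not the software used to generate the table; the function name and the test coordinates are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def torsion_deg(p0, p1, p2, p3):
    """Signed torsion angle (degrees, range -180 to 180) defined by four
    points, e.g. the coordinates of O5, C1, O4' and C4' for a phi angle."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)  # unit vector along the central bond
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


# Hypothetical coordinates: the last point is rotated a quarter turn
# about the central bond relative to the first.
example = torsion_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

Applying such a function to each consecutive pair of glucose residues in a deposited model yields one Φ and one ψ value per glycosidic linkage, which is the layout of the table above.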
Acknowledgements
This work is primarily supported by a Ruth L. Kirschstein National Research Service Award Individual Predoctoral Fellowship (F31-F31AT011282 to A.L.P.) from the National Center for Complementary and Integrative Health (NCCIH) and a Research Program Project grant (P01-HL149633 to N.M.K.) from the National Heart, Lung and Blood Institute (NHLBI) of the National Institutes of Health (NIH). Next-generation Native MS technologies were supported by the National Institute of General Medical Sciences (NIGMS) of the NIH (R01-GM095832 to B.T.R.). HDX–MS acquisition was supported by the National Science Foundation (NSF) (DBI 2018007 to C.W.V.K.). The structural biology approaches used resources of the Advanced Photon Source; a U.S. Department of Energy (DOE) Office of Science User Facility operated for the DOE Office of Science by Argonne National Laboratory under contract no. DE-AC02-06CH11357. The Biophysics Collaborative Access Team is supported by P30-GM138395 from NIGMS-NIH. Use of the Pilatus3 X 1M detector was provided by Grant 1S10OD018090-01 from NIGMS-NIH. Use of the LS-CAT Sector 21 was supported by the Michigan Economic Development Corporation and the Michigan Technology Tri-Corridor (grant 085P1000817). S.J. and F.M. thank the Slovak Grant Agency VEGA for the financial support by grant no. 2/0146/21. In collaboration with this research, we acknowledge support from the University of Michigan Biomedical Research Core Facilities Light Microscopy Core. For the native MS work, we would like to acknowledge the Biological Mass Spectrometry facility at the University of Michigan Department of Chemistry. The content is solely the responsibility of the authors and does not necessarily represent the official views of VEGA, the National Science Foundation or the NIH.
Footnotes
Code availability
SAXS data collection was performed using the Python-based BioCon software developed at and for the BioCAT beamline, available at https://github.com/biocatiit/beamline-control-user/tree/master/biocon. The UniDec software is available at https://github.com/michaelmarty/UniDec/. Source data are provided with this paper.
Competing interests
The authors declare no competing interests.
Extended data is available for this paper at https://doi.org/10.1038/s41594-023-01166-6.
Supplementary information. The online version contains supplementary material available at https://doi.org/10.1038/s41594-023-01166-6.
Data availability
The X-ray structures and diffraction data reported in this paper have been deposited in the Protein Data Bank under accession codes 7UWU, 7UWV and 7UWW. The SAXS data are deposited in the Small Angle Scattering Biological Data Bank (SASBDB) under accession code SASDPE2 (ref. 98). CBM74 sequences were pulled from the CAZy database (http://www.cazy.org; CAZy update, March 2022) and via BLAST against GenBank (https://www.ncbi.nlm.nih.gov/genbank) and/or UniProt (https://www.uniprot.org) databases in March 2022. Native MS data are publicly available in the Deep Blue Data Repository administered by the University of Michigan at https://doi.org/10.7302/5fmh-8f87. The HDX–MS data are publicly available in the Zenodo database under accession number 8371163 (https://zenodo.org/record/8371163). Source data and Supplementary Data files are provided with this paper. All other relevant data supporting the key findings of this study are available within the article, its Supplementary Information or from the corresponding authors upon reasonable request.
References
- 1. Salminen S, Isolauri E & Onnela T. Gut flora in normal and disordered states. Chemotherapy 41, 5–15 (1995).
- 2. Gibson GR & Roberfroid MB. Dietary modulation of the human colonic microbiota: introducing the concept of prebiotics. J. Nutr 125, 1401–1412 (1995).
- 3. Cummings JH & Macfarlane GT. The control and consequences of bacterial fermentation in the human colon. J. Appl. Bacteriol 70, 443–459 (1991).
- 4. Backhed F et al. Host–bacterial mutualism in the human intestine. Science 307, 1915–1920 (2005).
- 5. Wu X et al. Effects of the intestinal microbial metabolite butyrate on the development of colorectal cancer. J. Cancer 9, 2510–2517 (2018).
- 6. Zaman SA & Sarbini SR. The potential of resistant starch as a prebiotic. Crit. Rev. Biotechnol 36, 578–584 (2016).
- 7. Bertoft E. Understanding starch structure: recent progress. Agronomy 7, 56 (2017).
- 8. Pérez S & Bertoft E. The molecular structures of starch components and their contribution to the architecture of starch granules: a comprehensive review. Starch 62, 389–420 (2010).
- 9. Ze X et al. Ruminococcus bromii is a keystone species for the degradation of resistant starch in the human colon. ISME J 6, 1535–1543 (2012).
- 10. Jung DH et al. Bifidobacterium adolescentis P2P3, a human gut bacterium having strong non-gelatinized resistant starch-degrading activity. J. Microbiol. Biotechnol 29, 1904–1915 (2019).
- 11. Teichmann J & Cockburn DW. In vitro fermentation reveals changes in butyrate production dependent on resistant starch source and microbiome composition. Front. Microbiol 12, 640253 (2021).
- 12. Duranti S et al. Genomic characterization and transcriptional studies of the starch-utilizing strain Bifidobacterium adolescentis 22L. Appl. Environ. Microbiol 80, 6080–6090 (2014).
- 13. Belenguer A et al. Two routes of metabolic cross-feeding between Bifidobacterium adolescentis and butyrate-producing anaerobes from the human gut. Appl. Environ. Microbiol 72, 3593–3599 (2006).
- 14. Venkataraman A et al. Variable responses of human microbiomes to dietary supplementation with resistant starch. Microbiome 4, 33 (2016).
- 15. Baxter NT et al. Dynamics of human gut microbiota and short-chain fatty acids in response to dietary interventions with three fermentable fibers. mBio 10, e02566–18 (2019).
- 16. Ze X et al. Unique organization of extracellular amylases into amylosomes in the resistant starch-utilizing human colonic firmicutes bacterium Ruminococcus bromii. mBio 6, e01058–15 (2015).
- 17. Smith SP & Bayer EA. Insights into cellulosome assembly and dynamics: from dissection to reconstruction of the supramolecular enzyme complex. Curr. Opin. Struct. Biol 23, 686–694 (2013).
- 18. Bayer EA, Morag E & Lamed R. The cellulosome—a treasure-trove for biotechnology. Trends Biotechnol 12, 379–386 (1994).
- 19. Mukhopadhya I et al. Sporulation capability and amylosome conservation among diverse human colonic and rumen isolates of the keystone starch-degrader Ruminococcus bromii. Environ. Microbiol 20, 324–336 (2018).
- 20. Janecek S et al. Starch-binding domains as CBM families—history, occurrence, structure, function and evolution. Biotechnol. Adv 37, 107451 (2019).
- 21. Valk V et al. Carbohydrate-binding module 74 is a novel starch-binding domain associated with large and multidomain alpha-amylase enzymes. FEBS J 283, 2354–2368 (2016).
- 22. Dobranowski PA & Stintzi A. Resistant starch, microbiome, and precision modulation. Gut Microbes 13, 1926842 (2021).
- 23. Ravi A et al. Hybrid metagenome assemblies link carbohydrate structure with function in the human gut microbiome. Commun. Biol 5, 932 (2022).
- 24. Xu J et al. Metatranscriptomic analysis of colonic microbiota’s functional response to different dietary fibers in growing pigs. Anim. Microbiome 3, 45 (2021).
- 25. Zhang H et al. dbCAN2: a meta server for automated carbohydrate-active enzyme annotation. Nucleic Acids Res 46, W95–W101 (2018).
- 26. Lombard V et al. The carbohydrate-active enzymes database (CAZy) in 2013. Nucleic Acids Res 42, D490–D495 (2014).
- 27. Blum M et al. The InterPro protein families and domains database: 20 years on. Nucleic Acids Res 49, D344–D354 (2021).
- 28. Cerqueira FM et al. Sas20 is a highly flexible starch-binding protein in the Ruminococcus bromii cell-surface amylosome. J. Biol. Chem 298, 101896 (2022).
- 29. Fontes CM & Gilbert HJ. Cellulosomes: highly efficient nanomachines designed to deconstruct plant cell wall complex carbohydrates. Annu. Rev. Biochem 79, 655–681 (2010).
- 30. Matsui M, Kakuta M & Misaki A. Comparison of the unit-chain distributions of glycogens from different biological sources, revealed by anion exchange chromatography. Biosci. Biotechnol. Biochem 57, 623–627 (1993).
- 31. Brewer MK & Gentry MS. Brain glycogen structure and its associated proteins: past, present and future. Adv. Neurobiol 23, 17–81 (2019).
- 32. Singh RS, Saini GK & Kennedy JF. Pullulan: microbial sources, production and applications. Carbohydr. Polym 73, 515–531 (2008).
- 33. Khalikova E, Susi P & Korpela T. Microbial dextran-hydrolyzing enzymes: fundamentals and applications. Microbiol Mol. Biol. Rev 69, 306–325 (2005).
- 34. Krissinel E & Henrick K. Inference of macromolecular assemblies from crystalline state. J. Mol. Biol 372, 774–797 (2007).
- 35. Valk V, van der Kaaij RM & Dijkhuizen L. The evolutionary origin and possible functional roles of FNIII domains in two Microbacterium aurum B8.A granular starch degrading enzymes, and in other carbohydrate acting enzymes. Amylase 1, 1–11 (2017).
- 36. Holm L. Using Dali for protein structure comparison. Methods Mol. Biol 2112, 29–42 (2020).
- 37. Notenboom V et al. Crystal structures of the family 9 carbohydrate-binding module from Thermotoga maritima xylanase 10A in native and ligand-bound forms. Biochemistry 40, 6248–6256 (2001).
- 38. Milles LF et al. Calcium stabilizes the strongest protein fold. Nat. Commun 9, 4764 (2018).
- 39. Zheng H et al. CheckMyMetal: a macromolecular metal-binding validation tool. Acta Crystallogr. Struct. Biol 73, 223–233 (2017).
- 40. Strynadka NCJ & James MNG. Towards an understanding of the effects of calcium on protein structure and function. Curr. Opin. Struct. Biol 1, 905–914 (1991).
- 41. Boraston AB et al. A structural and functional analysis of α-glucan recognition by family 25 and 26 carbohydrate-binding modules reveals a conserved mode of starch recognition. J. Biol. Chem 281, 587–598 (2006).
- 42. Holm L. DALI and the persistence of protein shape. Protein Sci 29, 128–140 (2020).
- 43. Cockburn DW et al. Novel carbohydrate binding modules in the surface anchored α-amylase of Eubacterium rectale provide a molecular rationale for the range of starches used by this organism in the human gut. Mol. Microbiol 107, 249–264 (2018).
- 44. Rodriguez-Sanoja R et al. A single residue mutation abolishes attachment of the CBM26 starch-binding domain from Lactobacillus amylovorus α-amylase. J. Ind. Microbiol. Biotechnol 36, 341–346 (2009).
- 45. Guillen D et al. Alpha-amylase starch binding domains: cooperative effects of binding to starch granules of multiple tandemly arranged domains. Appl. Environ. Microbiol 73, 3833–3837 (2007).
- 46. Abbott DW & Boraston AB. Quantitative approaches to the analysis of carbohydrate-binding module function. Methods Enzymol 510, 211–231 (2012).
- 47. Imberty A et al. The double-helical nature of the crystalline part of A-starch. J. Mol. Biol 201, 365–378 (1988).
- 48. Imberty A & Perez S. A revisit to the three-dimensional structure of B-type starch. Biopolymers 27, 1205–1221 (1988).
- 49. O’Sullivan AC & Perez S. The relationship between internal chain length of amylopectin and crystallinity in starch. Biopolymers 50, 381–390 (1999).
- 50. Gessler K et al. V-Amylose at atomic resolution: X-ray structure of a cycloamylose with 26 glucose residues (cyclomaltohexaicosaose). Proc. Natl Acad. Sci. USA 96, 4246–4251 (1999).
- 51. Ashkenazy H et al. ConSurf 2016: an improved methodology to estimate and visualize evolutionary conservation in macromolecules. Nucleic Acids Res 44, W344–W350 (2016).
- 52. Ashkenazy H et al. ConSurf 2010: calculating evolutionary conservation in sequence and structure of proteins and nucleic acids. Nucleic Acids Res 38, W529–W533 (2010).
- 53. Celniker G et al. ConSurf: using evolutionary data to raise testable hypotheses about protein function. Isr. J. Chem 53, 199–206 (2013).
- 54. Soper MT et al. Amyloid-β/neuropeptide interactions assessed by ion mobility-mass spectrometry. Phys. Chem. Chem. Phys 15, 8952–8961 (2013).
- 55. Hizukuri S. Relationship between the distribution of the chain length of amylopectin and the crystalline structure of starch granules. Carbohydr. Res 141, 295–306 (1985).
- 56. Cerqueira FM et al. Starch digestion by gut bacteria: crowdsourcing for carbs. Trends Microbiol 28, 95–108 (2020).
- 57. Jung DH et al. The presence of resistant starch-degrading amylases in Bifidobacterium adolescentis of the human gut. Int. J. Biol. Macromol 161, 389–397 (2020).
- 58. Rees DA & Welsh EJ. Secondary and tertiary structure of polysaccharides in solutions and gels. Angew. Chem 16, 214–224 (1977).
- 59. Tauzin AS et al. Molecular dissection of xyloglucan recognition in a prominent human gut symbiont. mBio 7, e02134–15 (2016).
- 60. Hillmann G. Measurement by end-point determination on paper, in Methods of enzymatic analysis 2nd edn (ed. Bergmeyer HU) (Academic Press, 1974).
- 61. Schneider CA, Rasband WS & Eliceiri KW. NIH Image to ImageJ: 25 years of image analysis. Nat. Methods 9, 671–675 (2012).
- 62. Turnbull WB & Daranas AH. On the value of c: can low affinity systems be studied by isothermal titration calorimetry? J. Am. Chem. Soc 125, 14859–14866 (2003).
- 63. Vonrhein C et al. Data processing and analysis with the autoPROC toolbox. Acta Crystallogr 67, 293–302 (2011).
- 64. Kabsch W. Xds. Acta Crystallogr. Biol. Crystallogr 66, 125–132 (2010).
- 65. Evans PR & Murshudov GN. How good are my data and what is the resolution? Acta Crystallogr. Biol. Crystallogr 69, 1204–1214 (2013).
- 66. El Omari K et al. Pushing the limits of sulfur SAD phasing: de novo structure solution of the N-terminal domain of the ectodomain of HCV E1. Acta Crystallogr. Biol. Crystallogr 70, 2197–2203 (2014).
- 67. Terwilliger TC et al. Decision-making in structure solution using Bayesian estimates of map quality: the PHENIX AutoSol wizard. Acta Crystallogr 65, 582–601 (2009).
- 68. McCoy AJ et al. Phaser crystallographic software. J. Appl. Crystallogr 40, 658–674 (2007).
- 69. Emsley P & Cowtan K. Coot: model-building tools for molecular graphics. Acta Crystallogr. Biol. Crystallogr 60, 2126–2132 (2004).
- 70. Afonine PV et al. Towards automated crystallographic structure refinement with phenix.refine. Acta Crystallogr. Biol. Crystallogr 68, 352–367 (2012).
- 71. Zheng H et al. Validation of metal-binding sites in macromolecular structures with the CheckMyMetal web server. Nat. Protoc 9, 156–170 (2014).
- 72. Agirre J et al. Privateer: software for the conformational validation of carbohydrate structures. Nat. Struct. Mol. Biol 22, 833–834 (2015).
- 73. Kirby N et al. Improved radiation dose efficiency in solution SAXS using a sheath flow sample environment. Acta Crystallogr. Struct. Biol 72, 1254–1266 (2016).
- 74. Hopkins JB, Gillilan RE & Skou S. BioXTAS RAW: improvements to a free open-source program for small-angle X-ray scattering data reduction and analysis. J. Appl. Crystallogr 50, 1545–1553 (2017).
- 75. Rambo RP & Tainer JA. Accurate assessment of mass, models and resolution by small-angle scattering. Nature 496, 477–481 (2013).
- 76. Piiadov V et al. SAXSMoW 2.0: online calculator of the molecular weight of proteins in dilute solution from experimental SAXS data measured on a relative scale. Protein Sci 28, 454–463 (2019).
- 77. Franke D, Jeffries CM & Svergun DI. Machine learning methods for X-ray scattering data analysis from biomacromolecular solutions. Biophys. J 114, 2485–2492 (2018).
- 78. Svergun D. Determination of the regularization parameter in indirect-transform methods using perceptual criteria. J. Appl. Crystallogr 25, 495–503 (1992).
- 79. Manalastas-Cantos K et al. ATSAS 3.0: expanded functionality and new tools for small-angle scattering data analysis. J. Appl. Crystallogr 54, 343–355 (2021).
- 80. Schneidman-Duhovny D et al. FoXS, FoXSDock and MultiFoXS: single-state and multi-state structural modeling of proteins and their complexes based on SAXS profiles. Nucleic Acids Res 44, W424–W429 (2016).
- 81. Murphy RD et al. The Toxoplasma glucan phosphatase TgLaforin utilizes a distinct functional mechanism that can be exploited by therapeutic inhibitors. J. Biol. Chem 298, 102089 (2022).
- 82. van de Waterbeemd M et al. High-fidelity mass analysis unveils heterogeneity in intact ribosomal particles. Nat. Methods 14, 283–286 (2017).
- 83. Marty MT et al. Bayesian deconvolution of mass and ion mobility spectra: from binary interactions to polydisperse ensembles. Anal. Chem 87, 4370–4376 (2015).
- 84. Gulbakan B et al. Native electrospray ionization mass spectrometry reveals multiple facets of aptamer–ligand interactions: from mechanism to binding constants. J. Am. Chem. Soc 140, 7486–7497 (2018).
- 85. Wang W, Kitova EN & Klassen JS. Influence of solution and gas phase processes on protein–carbohydrate binding affinities determined by nanoelectrospray Fourier transform ion cyclotron resonance mass spectrometry. Anal. Chem 75, 4945–4955 (2003).
- 86. Báez Bolivar EG et al. Submicron emitters enable reliable quantification of weak protein–glycan interactions by ESI–MS. Anal. Chem 93, 4231–4239 (2021).
- 87. Loos M et al. Accelerated isotope fine structure calculation using pruned transition trees. Anal. Chem 87, 5738–5744 (2015).
- 88. Drula E et al. The carbohydrate-active enzyme database: functions and literature. Nucleic Acids Res 50, D571–D577 (2022).
- 89. Valk V et al. Degradation of granular starch by the bacterium Microbacterium aurum Strain B8.A involves a modular α-amylase enzyme system with FNIII and CBM25 domains. Appl. Environ. Microbiol 81, 6610–6620 (2015).
- 90. Altschul SF et al. Basic local alignment search tool. J. Mol. Biol 215, 403–410 (1990).
- 91. Sayers EW et al. GenBank. Nucleic Acids Res 49, D92–D96 (2021).
- 92. UniProt Consortium. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res 49, D480–D489 (2021).
- 93. Sievers F et al. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol. Syst. Biol 7, 539 (2011).
- 94. Whelan S & Goldman N. A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol. Biol. Evol 18, 691–699 (2001).
- 95. Felsenstein J. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39, 783–791 (1985).
- 96. Kumar S et al. MEGA X: molecular evolutionary genetics analysis across computing platforms. Mol. Biol. Evol 35, 1547–1549 (2018).
- 97. Letunic I & Bork P. Interactive Tree Of Life (iTOL): an online tool for phylogenetic tree display and annotation. Bioinformatics 23, 127–128 (2007).
- 98. Kikhney AG et al. SASBDB: towards an automatically curated and validated repository for biological scattering data. Protein Sci 29, 66–75 (2020).