Abstract
Carboxysomes are bacterial microcompartments that function as the centerpiece of the bacterial CO2-concentrating mechanism by facilitating high CO2 concentrations near the carboxylase Rubisco. The carboxysome self-assembles from thousands of individual proteins into icosahedral-like particles with a dense enzyme cargo encapsulated within a proteinaceous shell. In the case of the α-carboxysome, there is little molecular insight into protein-protein interactions that drive the assembly process. Here, studies on the α-carboxysome from Halothiobacillus neapolitanus demonstrate that Rubisco interacts with the N-terminus of CsoS2, a multivalent, intrinsically disordered protein. X-ray structural analysis of the CsoS2 interaction motif bound to Rubisco reveals a series of conserved electrostatic interactions that are only made with properly assembled hexadecameric Rubisco. Although biophysical measurements indicate this single interaction is weak, its implicit multivalency induces high-affinity binding through avidity. Taken together, our results indicate CsoS2 acts as an interaction hub to condense Rubisco and enable efficient α-carboxysome formation.
Introduction
Many carbon-assimilating bacteria possess CO2-concentrating mechanisms (CCMs) to facilitate carbon fixation by the enzyme Rubisco.1 The centerpiece of the CCM is the carboxysome, a large protein complex that encapsulates Rubisco and carbonic anhydrase and is thought to produce locally high concentrations of CO2.2,3 The carboxysome is a large (100–400 nm diameter) and composite (~10 different protomers) structure comprising both a virus-like protein shell and cargo enzymes.4–6 Moreover, carboxysome formation requires thousands of individual proteins to accurately self-assemble.7–9 How this mesoscopic complex, with linear dimensions roughly ten-fold larger than any of its individual components, assembles with high structural and compositional fidelity remains unknown.
Carboxysomes occur in two distinct evolutionary lineages, α and β, that are functionally and morphologically similar.4,10,11 Both enclose a dense enzymatic cargo of Rubisco (a complex of eight large and eight small subunits termed CbbLS or RbcLS in the α and β lineages, respectively) and carbonic anhydrase inside the icosahedral shell composed of hexameric and pentameric proteins. One or more scaffolding proteins serve as interaction hubs, mediating the associations among the various components.4
Although the α-carboxysome was the first to be identified and characterized,12 the β-carboxysome assembly process is better understood. Two proteins, CcmM and CcmN, act in tandem as the scaffold to mediate a hierarchical set of interactions bridging shell with cargo.4,13 An amphipathic encapsulation peptide on CcmN anchors to CcmK, a hexameric shell protein.14 CcmN also binds to CcmM, a scaffolding protein with a γ-carbonic anhydrase domain that also contains three to five tandem repeats of a Rubisco small subunit like (SSUL) module separated by disordered linkers. SSUL repeats then interact with Rubisco.15–18 Contrary to expectations based on sequence homology, SSULs do not displace the Rubisco small subunit but bind across the interface of two RbcL2 dimers and a small subunit.17
The assembly of α-carboxysomes—the predominant form among oceanic cyanobacteria and autotrophic proteobacteria—is, to date, more opaque. One unique component of the α-carboxysome is CsoS2, a large (~900 residues) intrinsically disordered protein (IDP), which, unlike CcmM or CcmN, contains no recognizable domains.19,20 CsoS2 is indispensable for carboxysome assembly and thus is hypothesized to be a potential scaffolding protein. Knock-outs in the α-carboxysome model organism Halothiobacillus neapolitanus produce high CO2-requiring phenotypes and result in no observable carboxysomes.19,21 Pulldown and native agarose gel-shift assays using purified protein have demonstrated that CsoS2 interacts with both Rubisco and CsoS1 hexameric shell proteins.19,22–24 The specific sites of interaction, however, have not been definitively determined nor is it clear how they collectively give rise to robust assembly.
Here, we show that a repeated peptide motif in the N-terminal domain of CsoS2 interacts with Rubisco to facilitate encapsulation into the carboxysome. Using a fusion of this peptide with Rubisco we obtained a structure of the binding site, which revealed a predominantly electrostatic interaction interface mediated by highly conserved residues. This binding site lies at a conjunction of Rubisco subunits uniquely present in the complete CbbL8S8 oligomer, thus ensuring the encapsulation of only the functional holoenzyme. Energetic characterization indicated that the individual peptide/Rubisco interaction is very weak and relies on the engagement of multiple binding sites to increase its interaction strength. Bioinformatic analysis and expression of CsoS2-truncated heterologous carboxysomes implicate the multivalency of this interaction as an essential feature of the assembly process. Our data suggest that CsoS2 acts as a protein interaction hub that gathers Rubisco to nascent carboxysome shell facets through branching low-affinity interactions that collectively give rise to efficient and robust cargo accumulation.
Results
CsoS2 interacts with Rubisco
We and others have demonstrated the essentiality of CsoS2 to α-carboxysome formation.19,21 This fact, in combination with CsoS2’s unique sequence characteristics,20 led us to consider whether it is the scaffolding protein driving assembly of the α-carboxysome. CsoS2 is a repetitive IDP.19,25 It can be divided into three major domains, the N-terminal domain (NTD), Middle region (MR), and C-terminal domain (CTD), based on sequence self-similarity of the repeated motifs contained therein.19 The full protein has a high disorder score prediction throughout26–28 and is only predicted to possess secondary structure within the repeats of the NTD (hereafter generically referred to as the ‘N-peptide’ or specifically by numbers, e.g. N1 through N4; Fig. 1a).29 Circular dichroism (CD) spectra indicated that only the NTD has α-helical content (Fig. 1b). However, the repeat sequences in the NTD do not necessarily coincide with regions of greater predicted order. It is thus possible that the N-peptides are in dynamic equilibrium between helical and unstructured conformations.
Figure 1.
a, Repeat structure of H. neapolitanus CsoS2 with secondary structure prediction by JPred29 and disorder scores from PONDR-FIT, DISOPRED3, and MFDp2.26–28 “Frameshift location” indicates the site of a programmed ribosomal frameshift, which results in expression of about 50% prematurely truncated protein (CsoS2A) and 50% full-length protein (CsoS2B).21 b, Circular dichroism spectra of each of the CsoS2 domains. c, Native agarose gel of Rubisco, CsoS2, and their mixture. Uncropped image is shown as Source Data. d, Negative stain TEM micrographs of purified Rubisco, CsoS2, or the aggregates observed when mixed. All three images are at the same scale; scale bar is 100 nm.
Rubisco and CsoS2 together constitute a significant fraction of the cargo mass in purified carboxysomes and have complementary isoelectric points (5.9 and 9.1, respectively) suggesting a possible electrostatic association.5 We therefore tested whether these two proteins physically interact via native agarose gel shift assays. As hypothesized, the combination of Rubisco and CsoS2 shows a distinct shift from either individual component (Fig. 1c). This result pointed toward a direct interaction between CsoS2 and Rubisco and corroborated prior evidence.19 Furthermore, we observed dense aggregates of CsoS2 and Rubisco by transmission electron microscopy (TEM) when the two proteins were co-incubated (Fig. 1d).
Repeated NTD motif binds Rubisco with low affinity
We next sought to identify the specific element of CsoS2 capable of interacting with Rubisco. This was carried out using bio-layer interferometry (BLI)—a label-free optical technique that monitors recruitment of a “prey” protein by a surface-immobilized “bait.”30 BLI analysis on CsoS2 and its various fragments revealed that binding activity resided in the NTD (Fig. 2a). IDPs often interact with their targets through short linear motifs31,32 and further experiments demonstrated that each of the N-peptides (N1-N4) individually showed Rubisco binding activity (Extended Data Fig. 1). For further analysis we designed a single peptide consensus sequence, which we term N* (with sequence GRDLARARREALSQQGKAAV), that was fused to a polyproline II helical sequence to limit surface effects. This construct bound Rubisco with high affinity. A randomized sequence of N* (GRRKGLRAAGRALQVEQADSRA) did not bind (Fig. 2b), nor did any of the other conserved peptides from the MR or CTD (Extended Data Fig. 2), suggesting that the interaction was indeed sequence specific and not, for example, due to generic charge-charge attraction.
Figure 2.
a, Bio-layer interferometry (BLI) Rubisco binding response normalized to the bait loading signal for full-length CsoS2 and each of the domains. b, Top, schematic of the BLI sensor surface with the N*-peptide displayed on an extended polyproline II helix (N*-polyPro) as the bait and Rubisco as the prey species. Bottom, BLI response shows active binding of Rubisco by N* but not by a scrambled version. c, Weblogo conservation of the N-peptide motif calculated by MEME46 from 231 CsoS2 sequences that contained 901 N-peptide occurrences. Protein sequences are available as source data. d, Microscale thermophoresis (MST) binding isotherm with the first two H. neapolitanus CsoS2 N-peptide repeats fused to GFP, [N1-N2]-GFP, as the target and Rubisco as the ligand. The abscissa represents the concentration of binding sites for [N1-N2]-GFP, i.e. four per Rubisco. Data points are means and s.d. of measurements performed in triplicate. 95% confidence interval (CI95) estimated by bootstrap analysis. e, Standard free energies of binding for the reaction in (d) calculated from binding isotherms at 20, 60, and 160 mM NaCl. Solid dark blue lines are measured for [N1-N2]-GFP with light blue spanning the 95% confidence interval. The pink regions bounded by red lines give the estimated energy range for [N1]-GFP binding to individual Rubisco sites. The lower limit is derived from the maximum concentration for which no binding was clearly detected while the upper limit was derived from the [N1-N2]-GFP binding free energy (details are provided in Supplementary Note 1 on approximating the monomeric binding energy). At 160 mM NaCl, no [N1-N2]-GFP binding could be detected and the dashed lines with arrows indicate lower limits of the KD. Data for panels d and e are available as Source Data.
The interaction appeared to be driven by a specific sequence of positively charged residues. We analyzed a set of 231 CsoS2 sequences from α-cyanobacteria and proteobacteria with α-carboxysomes to identify the pan-species consensus N-peptide motif (Fig. 2c), recapitulating previous results.19 Notably, among the most highly conserved positions in the N-peptide motif are basic residues at positions 3, 9, 10, and 18, implying that the interaction likely has significant ionic character. R to A mutations were made for positions 3 and 10 in all of the four repeats in the NTD and entirely eliminated the binding in BLI (Extended Data Fig. 3). Furthermore, a retrospective statistical examination of CsoS2 peptide array binding data from Cai et al.19 revealed a significant enrichment of Rubisco binding to peptides matching the N-peptide arginine motif (Extended Data Fig. 4).
In principle, the binding energy between Rubisco and the N-peptide should be calculable from fitting the association and dissociation kinetics. However, due to the inherently high valency of the CbbL8S8 Rubisco complex and the surface-induced avidity of neighboring bait proteins, it was difficult to obtain reliable fits to a simple binding model (see Supplementary Note 1 and Supplementary Fig. 1). For this reason, the solution-phase technique microscale thermophoresis (MST) was used to measure binding in an alternative fashion. Unexpectedly, while the implied dissociation constants (KD’s) from BLI were in the tens of nM regime, MST revealed no apparent binding under the same conditions (pH 7.5, 150 mM NaCl) (e.g. Extended Data Fig. 5a). Decreasing the salt to 20 mM NaCl, however, resulted in robust binding of a tandem N-peptide-GFP species, [N1-N2]-GFP, to Rubisco with a KD of 75 nM on a stoichiometric binding site basis (i.e. one [N1-N2]-GFP binding to two of eight sites per Rubisco) (Fig. 2d).
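The "stoichiometric binding site basis" used for the MST isotherm can be illustrated with a generic 1:1 binding model that accounts for depletion of free sites. The sketch below is not the fitting code used in this study; the function names and the labeled-target concentration in the usage note are assumptions for illustration only.

```python
import math


def tandem_sites_from_rubisco(rubisco_conc, sites_per_rubisco=8, peptides_per_target=2):
    """Convert a Rubisco (CbbL8S8) concentration into a concentration of
    binding sites for a tandem construct such as [N1-N2]-GFP.

    With eight N-peptide sites per Rubisco and two peptides per target,
    this gives four tandem sites per Rubisco.
    """
    return rubisco_conc * sites_per_rubisco / peptides_per_target


def fraction_bound(sites_total, target_total, kd):
    """Fraction of labeled target bound for simple 1:1 binding.

    All three arguments must share the same concentration unit.
    The complex concentration C is the smaller root of
    C^2 - (T + S + Kd) * C + T * S = 0.
    """
    if target_total <= 0:
        raise ValueError("target_total must be positive")
    b = target_total + sites_total + kd
    complex_conc = (b - math.sqrt(b * b - 4.0 * target_total * sites_total)) / 2.0
    return complex_conc / target_total
```

As a usage example, `fraction_bound(tandem_sites_from_rubisco(50.0), 20.0, 75.0)` gives the bound fraction for 50 nM Rubisco, a hypothetical 20 nM of labeled [N1-N2]-GFP, and the 75 nM KD reported at 20 mM NaCl. A model of this form treats each tandem site as independent, which is a simplification of the real multivalent system.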
MST indicated the N-peptide/Rubisco interaction is highly sensitive to salt concentration. Increasing NaCl from 20 mM to 60 mM raised the KD substantially, from 75 nM to 500 nM (Fig. 2e). Further increasing NaCl to 160 mM, near physiological ionic strength,33 weakened the binding beyond detection (Extended Data Fig. 5a).
The valency of the binding interaction dramatically influences its strength. MST measurements on a single N-peptide-GFP, [N1]-GFP, at 20 mM NaCl revealed no discernible binding up to CbbLS concentrations as high as 100 μM (Extended Data Fig. 5b). The limits of the monomeric binding free energy can, nonetheless, be roughly approximated. The lower bound follows from the fact that its KD should exceed the maximum CbbLS concentration for which [N1]-GFP binding was not observed. The upper bound is extrapolated from the measured binding free energy of [N1-N2]-GFP using several thermodynamic assumptions and approximations described in detail in Supplementary Note 1. Thus, we estimate the monomeric binding constant at 20 mM NaCl to lie within the range indicated in Fig. 2e, and to be weaker still at higher salt concentrations.
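The conversion between the dissociation constants quoted above and standard binding free energies can be sketched as follows, using ΔG° = RT ln(KD / 1 M). The temperature of 298.15 K is an assumption, since the measurement temperature is not stated in this section, and the function name is illustrative.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ/(mol K)


def standard_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy in kJ/mol from a dissociation constant.

    Uses dG = R * T * ln(Kd / 1 M); more negative means tighter binding.
    The default temperature is an assumed value, not taken from the study.
    """
    if kd_molar <= 0:
        raise ValueError("kd_molar must be positive")
    return R_KJ_PER_MOL_K * temperature_k * math.log(kd_molar)


# Tandem [N1-N2]-GFP construct at the two salt concentrations with measurable binding
dg_tandem_20mM = standard_free_energy(75e-9)    # KD = 75 nM at 20 mM NaCl
dg_tandem_60mM = standard_free_energy(500e-9)   # KD = 500 nM at 60 mM NaCl

# Single N-peptide: no binding seen up to 100 uM, so KD > 100 uM and the
# binding free energy is no more favorable than this value.
dg_single_limit = standard_free_energy(100e-6)

# Free energy lost on raising NaCl from 20 mM to 60 mM
ddg_salt = dg_tandem_60mM - dg_tandem_20mM
```

At the assumed temperature, the tandem construct comes out near -40.7 kJ/mol at 20 mM NaCl and near -36.0 kJ/mol at 60 mM NaCl, while the single-peptide limit sits near -22.8 kJ/mol. The upper bound described in the text depends on the approximations in Supplementary Note 1 and is not reproduced here.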
Taken together, these data present two puzzling observations. First, the individual N-peptide/Rubisco interaction alone appears too weak to drive carboxysome cargo encapsulation, particularly when approaching realistic intracellular ionic strength. Second, the relatively tight binding of Rubisco by a single N-peptide construct at 150 mM NaCl on BLI stands in apparent contradiction to the negative binding results obtained from MST under similar conditions. A mechanistic reconciliation of these issues is presented in the Discussion.
Structural determination of the N-peptide/Rubisco complex
We next sought to obtain a structure of the N-peptide/Rubisco complex in order to locate the binding sites and to establish the nature of the specific molecular contacts. The NTD is largely disordered and its four N-peptide repeats could, in principle, adopt heterogeneous arrangements among the eight Rubisco binding sites. Furthermore, the binding of a single N-peptide is weak and salt sensitive. Disorder, structural heterogeneity, and partial occupancy therefore all pose significant challenges for co-crystallization. To circumvent these problems, we fused the N* consensus peptide to the C-terminus of the Rubisco large subunit (CbbL) via a short linker, -SS- (Fig. 3a), to ensure high local concentrations and saturation of all putative binding sites. This fusion protein was readily expressed and purified, and size exclusion chromatography confirmed that it adopts the correct CbbL8S8 oligomerization state (Extended Data Fig. 6a). BLI measurements revealed no significant interaction of the Rubisco-N* fusion (prey) with surface N*-peptide (bait), suggesting that Rubisco-N* self-passivates its binding site (Extended Data Fig. 6b,c).
Figure 3.
a, Schematic of the Rubisco-N* fusion construct and side and top views of a surface representation of the CbbL8S8 biological assembly with bound N*-peptide. CbbL and CbbS are the large and small Rubisco subunits, respectively. The molecular symmetry axes are indicated by white arrows. The yellow and orange CbbLs are identical; the coloring is to highlight the CbbL2 dimer units. The red dot is at the last structured residue of CbbL, while the dashed white line indicates the probable linkage to N*. b, Zoomed view of binding site with 2FO-FC map at σ = 1.0 carved within 1.6 Å of N*. Specific regions of focus are shown as dashed outlines with the subpanel label. c-f, Molecular interactions of each of the five highly conserved residues of the N-peptide motif: R9, R10, G17/ K18, and R3. Salt bridges, cation-π interactions, and select hydrogen bonds are specifically highlighted. R3 (f) had significant conformational heterogeneity among the four N*-peptides in the asymmetric unit, with relatively weak density pointing to a triad of salt bridge partners. Displayed N* atoms are from chain E for all panels in magenta and the other chains in (f) are shown as dull pink. The specific interactions were characterized with the PDBePISA47 and CaPTURE35 web servers. g, Rubisco sequence comparison at the N*-peptide interaction site. The Weblogo conservation sequence is from 231 α-carboxysomal Form IA Rubiscos. Two specific representatives, H. neapolitanus (used in this study) and Prochlorococcus marinus MIT 9313, are shown. Below are various outgroup Form I Rubiscos and the H. neapolitanus Form II Rubisco. Participation in carboxysomes (α or β) is indicated along the right of the table. Note that the residues are non-sequential and are numbered according to the H. neapolitanus sequence. Protein sequences are available as Supplementary Data 1–3.
After screening and optimization of crystallization conditions, diffraction quality crystals were obtained (Table 1). X-ray diffraction data were collected and a 2.4 Å resolution structure was solved by molecular replacement using an existing model from Kerfeld and Yeates of H. neapolitanus Rubisco (PDB: 1SVD). The space group was C2 with four CbbL-N* and CbbS chains in the asymmetric unit. The Rubisco structure itself was essentially indistinguishable from wild-type with an average Cα RMSD of 0.27 Å. Clear unmodeled electron density was observed along the groove at the interface between two CbbL subunits (spanning separate CbbL2 dimers) and a CbbS subunit (Fig. 3a, Extended Data Fig. 7a) and was well-separated from non-physiological crystal contacts. The N*-peptide was found to adopt a helical conformation and an all-atom model was manually built into the experimental density, which was sufficiently clear for unambiguous assignment of both the peptide direction and sequence registration. Following several rounds of refinement, the real-space cross-correlation for the modeled portion of N* (res. 2–19, Fig. 2c) was 90% or greater for each of the four N*-peptides in the asymmetric unit (Fig. 3b, Extended Data Fig. 7b). All of the binding sites are occupied, indicating that the neighboring sites are not mutually occluding. Thus, the CbbL8S8 biological assembly likely possesses eight possible CsoS2 interaction sites.
Table 1.
Data collection and refinement statistics (molecular replacement)
| | H. neapolitanus CbbL-N*, CbbS (PDB 6UEW) |
|---|---|
| Data collectionᵃ | |
| Space group | C 2 |
| Cell dimensions | |
| a, b, c (Å) | 171.83, 153.95, 108.06 |
| α, β, γ (°) | 90, 124.70, 90 |
| Resolution (Å) | 104.1 – 2.4 (2.486 – 2.4)ᵇ |
| Rmerge | 0.1244 (0.5876) |
| I / σI | 12.84 (2.97) |
| Completeness (%) | 99.90 (99.92) |
| Redundancy | 6.9 (6.4) |
| Refinement | |
| Resolution (Å) | 104.1 – 2.4 (2.486 – 2.4) |
| No. reflections | 89949 |
| Rwork / Rfree | 0.188 / 0.259 |
| No. atoms (total) | 18304 |
| Protein | 17636 |
| Ligand/ion | 0 |
| Water | 668 |
| B-factors (overall) | 41.80 |
| Protein | 41.97 |
| Ligand/ion | n/a |
| Water | 37.21 |
| R.m.s. deviations | |
| Bond lengths (Å) | 0.008 |
| Bond angles (°) | 1.20 |

ᵃ Data set is for a single crystal.
ᵇ Values in parentheses are for the highest-resolution shell.
The structure of the bound N*-peptide is largely α-helical, consistent with the secondary structure predictions and CD data (Fig. 1a,b). The last clearly structured residue of CbbL is at position 455, which is typical of structures of non-activated Form I Rubisco.34 The remainder of the CbbL C-terminus and the -SS- linker preceding N* are not observed in the electron density. Although lack of density complicates the assignment of N*/CbbL pairings, the structured portion of N* begins near CbbL helix 6 and the fusion thus likely originates from the C-terminus of this same subunit. This also agrees with previous structural models of other Rubiscos, in which the C-terminus extends over the so-called loop 6 in the same direction as the N* binding site (Fig 3a, dashed white).34 From there, the N* helix makes contacts with CbbS, spans the boundary to the neighboring CbbL2 dimer, and finishes by breaking out of the helix at the N-terminal domain of the second CbbL. A noteworthy quality of the N*/Rubisco binding site is that, by contacting both CbbL and CbbS and bridging the CbbL2 dimer interface, it exists only on the CbbL8S8 Rubisco holoenzyme. This fact implies that only fully assembled Rubisco would be admitted into the carboxysome.
Each one of the highly conserved N* motif residues (Fig. 2c) is observed to make binding contacts along the Rubisco interface. R9 forms a salt-bridge with CbbL D360 and cation-π interaction with F346 (Fig. 3c). R10 has a salt-bridge to CbbL D69 and dual cation-π interactions with CbbL Y72 and CbbS Y96 (Fig. 3d). G17 appears to play a critical role in breaking the N* helix by facilitating backbone hydrogen bonds with CbbL and adopting glycine-specific ψ-φ angles. K18 makes a salt bridge with CbbL D26 (Fig. 3e). R3 adopts multiple sidechain conformations among the four N*-peptides in the asymmetric unit in which it variously forms salt bridges with CbbS 94, CbbL 344, or N* D4 (Fig. 3f, Extended Data Fig. 7c–f). All together the interactions are predominantly ionic and offer a structural explanation as to the energetic sensitivity to salt.
Amino acid residues involved in these electrostatic interactions are conserved for α-carboxysomal Form IA Rubisco. However, these residues were, in general, not conserved among an outgroup of various other Form I Rubiscos and the H. neapolitanus Form II Rubisco (Fig. 3g). To assay if these evolutionary observations are significant, two binding site mutants were made to test disruption of the binding interface. In one, each of the cation-π aromatics was mutated to alanine (CbbL Y72A, F346A; CbbS Y96A). In the other, a mutation was selected to resemble the β-carboxysomal Rubisco and to perturb the binding environment of N* R10 (CbbL Y72R). Neither mutant interacted with N* (Extended Data Fig. 3c,d).
Structural comparison to CcmM/Rubisco
The general binding site of N*/Rubisco significantly overlaps with that of the recently determined CcmM/Rubisco interaction from the β-carboxysome but the specific molecular details are distinct.17 While CcmM binds with multiple regions across the SSUL domain, N* has a smaller footprint as a single α-helix (Extended Data Fig. 8). In both cases, salt bridges in which the scaffolding protein contributes the positive charge are key parts of the interactions. A notable feature of the N*/Rubisco interaction that is absent in CcmM is the set of prominent cation-π interactions.35 The complete conservation of the aromatics in the Rubisco binding site and the lack of binding when they are mutated to alanines suggest that the cation-π interactions indeed contribute meaningfully to the binding energy and specificity. Interestingly, cation-π contacts are a particularly common interaction modality among IDPs involved in protein liquid-liquid phase separation.36 See Supplementary Note 2 for additional detail on the structural comparison between CsoS2/Rubisco and CcmM/Rubisco.
Hydrogen/deuterium exchange of carboxysomal versus purified Rubisco
To interrogate the CsoS2/Rubisco interaction in a native context, hydrogen/deuterium exchange (HDX) mass spectrometry experiments were performed in order to identify regions of Rubisco possessing differential protection when encapsulated within carboxysomes. HDX analysis of purified Rubisco versus carboxysomal Rubisco revealed a majority of peptides had nearly identical HDX rates (Extended Data Fig. 9). The lack of an obvious HDX protection footprint of the N-peptide binding site may point toward a dynamic and fluid nature of the carboxysome interior. Since the individual N-peptide binding is so weak, it is plausible that within the carboxysome the molecular interactions are transitory and rapidly exchanged such that they leave little imprint on the HDX rates. What differences do exist may reflect altered Rubisco structural dynamics or, potentially, additional longer-lived interaction partners.
Effect of N-peptide multivalency on carboxysome formation
We set out to determine the importance of the number of N-peptide repeats on carboxysome assembly. H. neapolitanus CsoS2 contains four copies of the repeat but there is likely significant natural diversity. To this end, the consensus motif was used to quantify occurrences throughout the set of 231 CsoS2 genes.37 Every sequence contained at least two copies of the motif (Fig. 4b) suggesting that a valency greater than one may be a general requirement for carboxysome assembly. Using a previously developed method whereby carboxysomes are produced heterologously in E. coli by expressing the known genes from a single plasmid (pHnCB10),38 we tested the effect of N-peptide repeat number on carboxysome formation. A series of pHnCB10 constructs were made possessing CsoS2 variants with a decreasing number of N-peptide repeats and tested for carboxysome expression. Only CsoS2 variants with two or more repeats were capable of forming carboxysomes (Fig. 4a and Supplementary Fig. 2) consistent with the bioinformatic result.
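Counting motif occurrences per sequence and tallying them into a histogram can be sketched as below. The regular expression is a hypothetical simplification that requires only the conserved positions named in this paper (basic residues at motif positions 3, 9, 10, and 18, and glycine at 17); it is not the motif-scanning method used for the 231-sequence analysis, and the linker in the toy sequences is invented for illustration.

```python
import re
from collections import Counter

# Hypothetical pattern, anchored at motif position 3:
# basic (pos 3), 5 any, basic-basic (pos 9-10), 6 any, Gly (pos 17), basic (pos 18).
# The lookahead lets overlapping candidates be counted independently.
N_PEPTIDE_PATTERN = re.compile(r"(?=[RK].{5}[RK][RK].{6}G[RK])")


def count_n_peptides(sequence):
    """Number of positions in a protein sequence that match the simplified motif."""
    return len(N_PEPTIDE_PATTERN.findall(sequence.upper()))


def repeat_histogram(sequences):
    """Map repeat count -> number of sequences, for a dict of name -> sequence."""
    return Counter(count_n_peptides(seq) for seq in sequences.values())


# Sequences quoted in the text, plus a toy tandem joined by an invented linker
N_STAR = "GRDLARARREALSQQGKAAV"
SCRAMBLED = "GRRKGLRAAGRALQVEQADSRA"
toy_sequences = {
    "single": N_STAR,
    "tandem": N_STAR + "SGSGSGSGSG" + N_STAR,
    "scrambled": SCRAMBLED,
}
histogram = repeat_histogram(toy_sequences)
```

In this toy set the consensus N* peptide matches once, the tandem construct twice, and the scrambled control not at all. A real survey would more plausibly use a position-specific scoring matrix with a significance threshold, since a fixed pattern like this one will miss divergent repeats.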
Figure 4.
a, Truncated CsoS2 proteins with variable numbers of N-peptide repeats and TEM images of the resulting carboxysomes if any were formed. Both images are at the same scale. b, Histogram of N-peptide repeat numbers across 231 CsoS2 sequences. c, Merged GFP fluorescence and phase contrast images of protein liquid-liquid droplets formed from a solution of Rubisco and NTD-GFP. d, Microscopic model of the phase separated state. The branching of interactions due to the multivalency of both components provides the liquid cohesion while the relative weakness and exchangeability of the individual interactions confers fluidity.
Phase separation of Rubisco and NTD
IDPs are highly represented in systems that undergo protein liquid-liquid phase separation. The propensity toward phase separation is promoted by weak individual interactions, often salt sensitive, and multivalent association either through well-defined binding sites or via less specific interactions related to the general amino acid composition.39,40 Phase separation has recently emerged as a common theme for the organization of Rubisco into CCM architectures. In the algal pyrenoid, Rubisco phase separates with EPYC1, a repetitive IDP.41–43 In β-carboxysomes, the short form of the scaffold protein CcmM, M35,44 was shown to demix with Rubisco into protein liquid droplets.17 We hypothesized that CsoS2 and, in particular, the NTD may similarly demix with Rubisco. Indeed, when Rubisco and NTD-GFP were combined at 1.0 μM each at low salt (20 mM NaCl), the solution became turbid. Imaging by phase contrast and epifluorescence microscopy revealed round green fluorescent droplets (Fig. 4c), ostensibly with the NTD spanning Rubiscos to mediate condensation (Fig. 4d). The droplets fully re-dissolved upon salt addition up to 150 mM NaCl, and none were observed with either individual component at the same concentrations (Extended Data Fig. 10).
Discussion
We have characterized in molecular detail the binding interface of Rubisco and CsoS2 that facilitates α-carboxysome cargo encapsulation. CsoS2, as a large IDP, posed a significant challenge for structural determination. Through biophysical binding assays we narrowed down the interaction to a repeated motif within the CsoS2 NTD, fused this fragment directly to Rubisco, and obtained an x-ray crystal structure of the protein-peptide complex. We suggest that this workflow might be a valuable general strategy for determining structures of IDPs interacting with structured proteins since these interactions are often individually weak and transient.
Despite no apparent sequence similarity, the CsoS2/Rubisco binding bears striking parallels to the recently characterized CcmM/Rubisco interaction at the heart of β-carboxysome assembly.17 In both cases the scaffold protein binding element has multiple repeats interspersed by flexible linkers. The binding locations on Rubisco are very similar; both straddle an L2 dimer interface while also making critical contacts with a small subunit. This site is only present in the fully assembled CbbL8S8 Rubisco holoenzyme so Rubisco assembly intermediates, namely CbbL2 and (CbbL2)4, would presumably not be encapsulated prematurely. Notwithstanding this global similarity, the specific structural details of the binding are distinct, making this an intriguing example of convergent evolution.
Another commonality between the α- and β-carboxysome scaffold/Rubisco systems is the propensity to undergo protein liquid-liquid phase separation. Phase separation is increasingly understood to play an organizational role in eukaryotes in the formation of membrane-less organelles.45 These structures and the droplets we observe (Fig. 4c), however, have at least a thousand-fold greater volume than carboxysomes. Furthermore, they are not enclosed within protein shells. Therefore, while suggestive of a dense liquid cargo phase, the role of demixing in the carboxysome assembly process remains unresolved.
The N-peptide/Rubisco interface consists chiefly of salt bridges and cation-π interactions. Consequently, the binding energy is highly sensitive to the solution ionic strength. Indeed, our solution phase binding measurements with MST indicate that the interaction dramatically weakens at near-physiological ionic strength, with single site KD’s upwards of 1 mM. Moreover, the phase separated droplets are fully dissolved under the same elevated salt concentrations. In apparent contradiction, however, the BLI measurements under the same conditions indicated strong binding (KD ~ 100 nM).
The essential difference is that BLI is a surface-based technique. Since the “prey” Rubisco has a site valency of eight, it could be simultaneously engaged by multiple “bait” N*-peptides in microscopically dense patches on the surface (see Supplementary Note 1, Comments on BLI). This surface avidity effect enabled tight Rubisco binding even when the individual interactions were very weak. We propose that this artificial surface avidity represents a useful analogy to the early stages of carboxysome assembly. Several experiments have implicated CsoS2 association with the CsoS1 shell hexamer including native gel shifts19 and pulldown assays.22 Furthermore, the CsoS2 C-terminus was found at the shell25 and truncation of the CTD precludes carboxysome formation.21 Through the shell interaction, multiple CsoS2 molecules could be recruited to achieve high local concentration and then bind to Rubisco in a multivalent fashion with high affinity.
Our data have led us to the following speculative model of α-carboxysome assembly. At physiological ionic strength and the likely free concentrations of Rubisco and CsoS2, the interaction is insufficiently strong to drive significant association or demixing (Fig. 5a, point 1). However, in the presence of shell proteins, CsoS2 is gathered to high local concentration via interaction with the nascent shell surface and facilitates phase separation with Rubisco in the immediate vicinity of the shell (Fig. 5a, point 2). Eventually more shells with cargo droplets coalesce until the structure is fully enclosed (Fig. 5b).
Figure 5.
a, Model phase diagram of the hypothesized Rubisco/CsoS2 phase separation driven by the multivalent NTD interaction with Rubisco. The blue region represents the joint concentrations at which demixing occurs. At point 1 the cytosolic concentrations lie within the soluble region and both are fully dissolved. Through interactions with a nascent carboxysome shell, multiple CsoS2s are brought together, thus greatly increasing the concentration locally while the Rubisco concentration remains the same (point 2). This process locally exceeds the phase transition threshold and leads to local phase separation in the immediate vicinity of the shell. b, Model of α-carboxysome assembly in which the specific accumulation of cargo on the shell proceeds via the mechanism described in (a).
A full accounting of the interaction partners and the site binding energetics is alone insufficient to understand the carboxysome assembly process. Multivalency, surface avidity, and protein liquid-liquid phase separation appear to play important roles but their relationships to the shell and the emergent size regularity remain unclear and warrant further investigation. Ultimately a detailed understanding of the principles of carboxysome assembly may be leveraged toward the design of synthetic microcompartments for biotechnological applications.
Methods
Protein expression and purification
All proteins used for biochemical assays contained a terminal affinity tag, either a hexahistidine tag or a Strep-tag II (see SI, Protein Sequences). Each construct was cloned via Golden Gate assembly48 into a pET-14 based destination vector with ColE1 origin, T7 promoter, and carbenicillin resistance (see Supplementary Table 1 for all protein sequences and Supplementary Table 2 for all plasmids used in this study). These were transformed into E. coli BL21-AI expression cells. All Rubisco constructs were also co-transformed with pGro7 for expressing GroEL-GroES to facilitate proper protein folding. Cells were grown at 37°C to OD600 of 0.3–0.5 in 1 L of LB media before lowering the temperature to 18°C, inducing with 0.1% (w/v) L-arabinose, and growing overnight.
Cultures were harvested by centrifugation at 4,000 g and the pellets were frozen and stored at −80°C. The pellets were thawed on ice and resuspended with ~25 mL of lysis buffer (50 mM Tris, 150 mM NaCl, pH 7.5) supplemented with 1 mM phenylmethanesulfonyl fluoride (PMSF), 0.1 mg/mL lysozyme, and 0.01 mg/mL DNaseI. The cells were lysed with three passes through an Avestin EmulsiFlex-C3 homogenizer and clarified by centrifugation at 12,000 g for 30 min. The clarified lysate was then incubated with the appropriate affinity resin for 30 min at 4°C with 2 mL of resin per 1 L of initial culture and transferred to a gravity column. His-tagged proteins were bound to HisPur Ni-NTA resin (Thermo), washed with lysis buffer with 30 mM imidazole, and eluted with lysis buffer with 300 mM imidazole. Strep-II-tagged proteins were bound to Strep-Tactin resin (EMD Millipore), washed with lysis buffer, and eluted with lysis buffer containing 2.5 mM desthiobiotin. All proteins were buffer exchanged to lysis buffer with 10DG Desalting Columns (Bio-Rad). For storage, proteins were made to 10% (w/v) glycerol, flash frozen in liquid nitrogen, and stored at −80°C.
Protein purity was assessed by SDS-PAGE gel analysis. In general, all protein was >90% the desired product. Size exclusion chromatography was performed analytically to confirm purity and aggregation state and, if needed, as a final preparative step.
Native protein agarose gel shift
Gels were prepared with 0.7% agarose (w/v) in native running buffer (25 mM Tris, 192 mM glycine, pH 8.3). Each protein was loaded to a final concentration of 5 mg/mL. The gels were run for 1 hour at 100 V in native running buffer and stained with GelCode Blue (ThermoFisher).
Bio-layer interferometry
Protein-protein interactions were measured using bio-layer interferometry (BLI) with an Octet RED384 (Forte Bio). The “bait” protein was immobilized on Ni-NTA Dip and Read Biosensors via a terminal His-tag. Typical “bait” concentrations for the sensor loading were 10 μg/mL. The soluble “prey” protein concentrations were varied in the nanomolar to micromolar range. The buffer used for all loading, association/dissociation, and wash steps was 50 mM Tris, 150 mM NaCl, 0.01% (w/v) Triton X-100, pH 7.5. Sensor regeneration of the Ni-NTA was done with 50 mM Tris, 150 mM NaCl, 0.05% (w/v) SDS, 300 mM imidazole, pH 7.5. The typical experimental binding sequence used was: load “bait”, buffer wash, “prey” association, “prey” dissociation in buffer, sensor regeneration, buffer wash. For the experiments testing the binding activity of specific peptides (Fig. 2b and Extended Data Fig. 2), “bait” proteins were designed with a 40 amino acid proline-rich region between the His-tag and the peptide. This insertion is expected to adopt an extended polyproline II helix conformation ~10 nm in length49 and was included to limit possible surface occlusion.
Microscale thermophoresis
Solution protein-protein binding was monitored by microscale thermophoresis (MST) with a Monolith NT.115 (Nanotemper). The target proteins were portions of the CsoS2 NTD fused to Superfolder GFP and used at a concentration of 50 nM. Unlabeled Rubisco was used as the ligand with concentrations varied in two-fold increments from 10 μM (as CbbL8S8) down to 0.3 nM. Experiments were carried out in buffer with 6.7 mM Tris, 0.01% Triton X-100, pH 7.5 and either 20, 60, or 160 mM NaCl. The samples were loaded into MST Premium Coated Capillaries (Nanotemper) and analyzed using 20% blue LED power for fluorescence excitation and Medium infrared laser power for the thermophoresis. Data fitting and bootstrap error estimation were performed using custom scripts in MATLAB (MathWorks).
While the binding free energy of [N1-N2]-GFP to Rubisco could be directly calculated, the monomeric binding reaction (i.e. one N-peptide to one Rubisco site) could not be experimentally determined. We therefore made a number of thermodynamic approximations—detailed in Supplementary Note 1—to provide an estimated range for this binding energy.
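The titration described above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' MATLAB code: it assumes a strict two-fold series of 16 points (the number of Rubisco concentrations given in Extended Data Fig. 5) starting at 10 μM CbbL8S8, and it applies the site-concentration factors from that legend (4 for [N1-N2]-GFP, 8 for [N1]-GFP). The function names `dilution_series` and `binding_site_conc` are hypothetical.

```python
# Sketch: reconstruct the MST titration series from the stated endpoints.
# All names here are illustrative; only the numbers come from the text.

TOP_CONC_UM = 10.0  # highest CbbL8S8 concentration, in micromolar
N_POINTS = 16       # number of Rubisco concentrations (Extended Data Fig. 5)

# Effective binding sites per CbbL8S8 for each target (Extended Data Fig. 5)
SITES_PER_RUBISCO = {"[N1-N2]-GFP": 4, "[N1]-GFP": 8}


def dilution_series(top_conc, n_points, fold=2.0):
    """Return n_points concentrations, each `fold`-fold lower than the previous."""
    return [top_conc / fold**i for i in range(n_points)]


def binding_site_conc(rubisco_conc, target):
    """Convert a CbbL8S8 concentration to an effective binding-site concentration."""
    return rubisco_conc * SITES_PER_RUBISCO[target]


series_um = dilution_series(TOP_CONC_UM, N_POINTS)
lowest_nm = series_um[-1] * 1000.0  # convert micromolar to nanomolar

print(f"lowest Rubisco concentration: {lowest_nm:.3f} nM")  # 0.305 nM
print(f"top site concentration for [N1-N2]-GFP: "
      f"{binding_site_conc(TOP_CONC_UM, '[N1-N2]-GFP'):.0f} uM")
```

Sixteen two-fold steps from 10 μM end at about 0.3 nM, consistent with the range stated in the text.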
Crystallization, x-ray diffraction, and refinement
Initial screening of crystallization conditions for CbbL-N*, CbbS was done using the Hampton Crystal Screen (HR2–110) with protein at 15 mg/mL combined 1:1 with the screen mother liquors. Due to the hypothesized ionic nature of the interaction, screen conditions having lower salt concentrations were prioritized in the follow-up optimization. Ultimately the best crystals were obtained from a mother liquor of 0.2 M MgCl2·6H2O, 0.1 M HEPES, 30% (v/v) PEG-400. Protein at 15 mg/mL diluted 1:2 with mother liquor was allowed to equilibrate for one day by hanging drop vapor diffusion whereupon it was microseeded with pulverized crystals from more concentrated conditions delivered with a cat whisker.
Crystals were looped and directly frozen on the beamline under a 100 K nitrogen jet without additional cryoprotectant. X-ray diffraction was collected with wavelength 1.11 Å on a Pilatus3 S 6M (Dectris) detector with a 50 μm beam pinhole at the Advanced Light Source, BL 8.3.1, Berkeley, CA.
The data were indexed and integrated with XDS50 and scaled and merged with AIMLESS.51,52 Molecular replacement was carried out in Phenix using the existing wild-type H. neapolitanus Rubisco structure (PDB ID: 1SVD) as the search model.53,54 Cycles of automatic refinement were performed with Phenix while Coot was used for manual model building.55 The final refined structure backbone conformations were 96.0% Ramachandran favored, 3.8% allowed, and 0.2% outliers.
Carboxysome construct generation and purification
Heterologous expression of carboxysomes in E. coli was performed following the methods of Bonacci et al. using the plasmid pHnCB10, which contains genes encoding all ten of the proteins known to participate in carboxysome formation.38 Golden Gate assembly was used to make the truncations of the CsoS2 NTD shown in Fig. 4a.
Carboxysomes were purified as previously described.21 Briefly, the cells were harvested, resuspended in 25 mL TEMB buffer (10 mM Tris, 10 mM MgCl2, 1 mM EDTA, and 20 mM NaHCO3, pH 8.4), lysed with a homogenizer, and the lysate clarified by centrifugation at 12,000 g for 30 min. The supernatant was further centrifuged at 40,000 g for 30 min to pellet the carboxysomes. The carboxysome pellet was resuspended in 1x Cellytic B (Sigma-Aldrich) in order to solubilize any residual membrane fragments. The solution was spun a second time at 40,000 g for 30 min to pellet the carboxysomes again. The pellet was resuspended with 3 mL of TEMB, clarified with a 5 min spin at 3,000 g, and loaded on top of a 25-mL sucrose step gradient (10, 20, 30, 40, and 50% w/v sucrose). This was ultracentrifuged at 105,000 g for 30 min. The solution was fractionated and analyzed by SDS-PAGE. Those fractions containing the expected set of carboxysomal proteins (and which also demonstrated visible Tyndall scattering) were pooled, pelleted by centrifugation for 90 min at 105,000 g, resuspended in 1 mL of TEMB, and stored at 4°C.
Negative stain TEM
Rubisco, CsoS2, and purified carboxysomes were visualized by negative stain transmission electron microscopy. Formvar/carbon coated copper grids were prepared by glow discharge prior to sample application. The grids were washed with deionized water several times before staining with 2% (w/v) uranyl acetate. Imaging was performed on a JEOL 1200 EX transmission electron microscope.
Hydrogen/deuterium exchange mass spectrometry
Peptide mass fingerprinting from purified Rubisco and carboxysomes was performed using on-column pepsin digestion, followed by reversed-phase HPLC, and tandem mass spectrometry on a Thermo Scientific LTQ Orbitrap Discovery.56,57 For hydrogen exchange, the samples were diluted 1:10 in D2O buffer (50 mM Tris, 150 mM NaCl, pD 7.5) and then aliquots removed and quenched in 500 mM glycine, 2 M guanidinium hydrochloride (GdnHCl), pH 2.0 buffer at log-spaced time intervals from 20 seconds to 48 hours. Samples were immediately frozen in liquid nitrogen upon addition of quenching solution. Deuterated control samples were prepared by 1:10 dilution in D2O, 50 mM Tris, 150 mM NaCl, 6 M GdnHCl, pD 7.5 and quenching with 500 mM glycine, pH 2.0. Samples were thawed, digested on-column as before, and analyzed by LCMS. Data analysis was performed with HDExaminer (Sierra Analytics).
CD spectroscopy
Purified protein was first exchanged into CD buffer (20 mM sodium phosphate and 20 mM sodium sulfate, pH 7.4) to minimize the background absorbance. From this solution, 300 μL was transferred to a 1-mm quartz cell. The sample containing only CD buffer was included as a negative control. Data were collected on a J-815 circular dichroism spectrometer (JASCO). Spectra were collected from 190 to 260 nm in 0.5 nm steps with the scanning speed of 20 nm/min and signal averaging for 1 s for each step. Each sample was measured 3 times and the spectra were averaged. Protein concentrations were determined using 280 nm absorbance and extinction coefficients calculated using ProtParam.
N-peptide motif matching
Cai et al.19 performed a peptide array experiment assaying Rubisco binding to every 8-mer peptide of CsoS2 tiled residue-by-residue with a fluorescence readout. The apparent binding activity was too dispersed throughout the sequence to allow prospective identification of the interaction sequence at that time. With our new biochemical evidence of the binding sequence, we revisited this peptide array dataset (publicly available from the Supplementary Material of Cai et al.) to see whether peptides matching portions of the putative binding motif had enriched Rubisco binding activity. Given the ionic character of the interaction we searched among the full set of 1070 peptides for those with two or more positive residues (n=319) and for the subset of these matching at least two of three arginines in the RxxxxxRR motif from Fig. 2c (n=91). To assess the statistical significance of the motif’s binding enrichment, bootstrap analysis was performed by randomly selecting 91 peptides with replacement from either the full set or the double positive subset, 10,000 times each, and calculating the mean fluorescence (see Extended Data Fig. 4).
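The selection and resampling steps above can be sketched as follows. This is an illustrative outline only, not the analysis code used in the study. Several details are assumptions: "positive residues" are taken to be R and K; a peptide is counted as motif-matching if, at any alignment offset (including partial overlap), at least two of its residues are arginines at the R positions of RxxxxxRR; and the summary statistic is passed in as a function so that either a mean or a median can be used. All function names are hypothetical and the fluorescence values are toy numbers, not the peptide array data.

```python
import random
from statistics import mean

# Arginine positions within the RxxxxxRR motif (0-based)
MOTIF_R_POSITIONS = (0, 6, 7)


def count_basic(peptide):
    """Count positive residues (assumed here to be R and K)."""
    return sum(peptide.count(aa) for aa in "RK")


def matches_motif(peptide, min_hits=2):
    """True if some alignment of RxxxxxRR puts >= min_hits arginines in register.

    Partial overlaps are allowed, so the motif start may fall outside the peptide.
    """
    n = len(peptide)
    for offset in range(-max(MOTIF_R_POSITIONS), n):
        hits = sum(
            1
            for pos in MOTIF_R_POSITIONS
            if 0 <= offset + pos < n and peptide[offset + pos] == "R"
        )
        if hits >= min_hits:
            return True
    return False


def bootstrap_statistics(values, sample_size, n_trials, rng, statistic=mean):
    """Resample with replacement and return the statistic for each trial."""
    return [
        statistic(rng.choices(values, k=sample_size)) for _ in range(n_trials)
    ]


def empirical_p(null_stats, observed):
    """Fraction of bootstrap trials that reach or exceed the observed value."""
    return sum(1 for s in null_stats if s >= observed) / len(null_stats)


# Toy demonstration with made-up fluorescence values
rng = random.Random(0)
toy_background = [0.0, 1.0, 2.0] * 50
null = bootstrap_statistics(toy_background, sample_size=91, n_trials=1000, rng=rng)
p_high = empirical_p(null, observed=3.0)
print(matches_motif("RAAAAARR"), matches_motif("RAARAAAA"), p_high)
```

When no bootstrap trial reaches the observed value, the empirical fraction is zero, and the result is reported as an upper bound of one over the number of trials (p < 10−4 for 10,000 trials).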
Bioinformatics
The CsoS2 secondary structure predictions were made using JPred.29 The disorder score was calculated with PONDR-FIT, DISOPRED3, and MFDp2.26–28
The candidate α-carboxysome-associated CsoS2 sequences were selected from the Integrated Microbial Genomes (IMG) database by searching for the CsoS2 PFAM (PF12288) within 100 kb of loci containing the Rubisco large and small subunits (PF00016 and PF00101), α-carboxysomal carbonic anhydrase (PF08936), and bacterial microcompartment shell proteins (PF00936). These sequences (n=231) were aligned with ClustalOmega,58 truncated to include only the NTD (i.e. all sequence before the first MR repeat), and analyzed with MEME46 to find repeated sequence motifs (Fig. 2c). The Motif Alignment and Search Tool (MAST)37 was used to locate and count all occurrences of the motif within the full CsoS2 sequences (Fig. 4b).
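As a rough illustration of counting motif occurrences in a sequence, the sketch below scans for the arginine pattern RxxxxxRR with a regular expression. This is a deliberately crude stand-in: MAST scores sequences against the position-specific motif model produced by MEME, whereas a regular expression only tests the three arginine positions. The function name `motif_positions` is hypothetical and the sequence is a made-up toy string, not a CsoS2 sequence.

```python
import re

# Lookahead so that overlapping occurrences are all reported
MOTIF_REGEX = re.compile(r"(?=R.{5}RR)")


def motif_positions(sequence):
    """Return 0-based start positions of every RxxxxxRR occurrence."""
    return [m.start() for m in MOTIF_REGEX.finditer(sequence)]


toy_sequence = "MRAAAAARRGGRTTTTTRRK"  # invented example, not real data
positions = motif_positions(toy_sequence)
print(positions, len(positions))  # [1, 11] 2
```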
Reporting Summary
Further information on experimental design is available in the Nature Research Reporting Summary linked to this article.
Data availability
Atomic coordinates and structure factors for the Rubisco / CsoS2 N-peptide fusion have been deposited in the wwPDB with accession code PDB 6UEW. Plasmids for all protein constructs used are available from Addgene. Raw data for all MST experiments in Fig. 2e,f and Extended Data Fig. 5 are available as Source Data. All protein sequences used for the binding motif analysis and for Fig. 2c and Fig. 3g are available as FASTA files in Supplementary Data 1–3.
Extended Data
Extended Data Fig. 1. Rubisco binding by N-peptides and design of consensus N*-peptide.
a, BLI binding activity toward Rubisco for each of the NTD N-peptides fused to GFP. b, N*-peptide and CsoS2 sequence colored and aligned by repeat peptides. The key conserved N-peptide residues are bolded.
Extended Data Fig. 2. BLI of select CsoS2 peptides with Rubisco.
a, Primary sequence of CsoS2 highlighting each of the repeated and/or conserved elements. b, Schematic representation of a set of BLI experiments testing the specificity of the Rubisco - CsoS2 interaction. Each of a series of CsoS2 elements and control sequences was fused to polyproline II helices that were surface immobilized to a Ni-NTA functionalized biosensor surface via an N-terminal hexahistidine tag. c, BLI traces of the constructs from (b) when incubated with 100 nM Rubisco. The trace colors match the dots in (b). Only the N*-peptide demonstrates any specific binding activity.
Extended Data Fig. 3. BLI of NTD and Rubisco mutants.
a, BLI response towards 100 nM Rubisco with bait of either the NTD or the NTD with R3A, R10A mutations made within all four of the N-peptide repeats. Removing these conserved arginines entirely eliminates the binding. b, Size exclusion chromatograms of wild-type H. neapolitanus Rubisco (wtRubisco), a mutant with all cation-π aromatics mutated to alanines (CbbL: Y72A, F346A; CbbS: Y96A), and a salt bridge disrupting mutation (CbbL: Y72A). All species eluted at a volume consistent with the CbbL8S8 structure. c, Each Rubisco species was tested for binding activity by BLI to the polyproline helix / N*-peptide fusion construct, N*-polyPro, (solid lines) and the randomized N*-polyPro negative control (dashed lines). Only the wild-type Rubisco had specific binding activity to N*-polyPro over the randomized N*-peptide control. The aromatic removal mutant (yellow) had some non-specific binding to both baits but showed no preference for the real N*-peptide sequence. d, Differential BLI binding signal of each Rubisco species to N*-polyPro relative to random N*-polyPro. Both Rubisco binding site mutants clearly possess no specific association.
Extended Data Fig. 4. Enrichment of binding motif from existing peptide array data.
Cai et al. performed a fluorescent peptide array experiment assaying the binding of Rubisco to every 8-mer of CsoS2 tiled residue-by-residue and found broadly scattered activity, precluding the specific identification of the interaction sequence. We reexamined this dataset (generously provided in their Supplementary Material) in light of our new biochemical evidence to look for a statistical enrichment of binding activity for those peptides containing two or more positive residues or, more specifically, containing at least two arginines matching the RxxxxxRR motif. a, Cumulative distributions of Rubisco binding fluorescence response for CsoS2 array peptides including the full dataset (n=1070), those with two or more basic residues (n=319), and those matching the N*-peptide arginine motif (n=91). b, Distributions of bootstrap results. 91 peptides were taken at random (with replacement) from either the full dataset or those with two or more basic residues and the median fluorescence response calculated. 10,000 trials were conducted with each set and none exceeded the motif matching median, implying a strong statistical enrichment (p < 10−4).
Extended Data Fig. 5. MST salt dependence and single N-peptide response.
a, MST responses for [N1-N2]-GFP association to Rubisco. The concentration of the target, [N1-N2]-GFP, was 50 nM. The abscissa represents the concentration of effective binding sites and is four times the Rubisco CbbL8S8 concentration since each target will engage two of the eight possible sites. Binding experiments were performed at 20, 60, and 160 mM NaCl. At 20 mM NaCl three replicates were performed across 16 Rubisco concentrations. Black lines indicate the means while the gray whiskers show +/− one standard deviation. At 60 mM NaCl the experiment was performed twice with slightly varying concentrations. At 160 mM NaCl data from one representative experiment is shown. The fits to the 20 mM and 60 mM NaCl data are according to Eq. S2 and represent the mean fit parameters from bootstrap sampling of the data. For 160 mM NaCl no binding could be determined over this concentration range and the dashed orange line is drawn at zero response as a visual guide. b, Comparison between a double N-peptide, [N1-N2]-GFP, and single N-peptide, [N1]-GFP, species by MST. Both had 50 nM target. The Rubisco binding site concentration is specific to the two different targets. For [N1-N2]-GFP it is the concentration of CbbL8S8 multiplied by 4 and for [N1]-GFP it is the concentration of CbbL8S8 multiplied by 8 since the former has four potential binding sites on the Rubisco holoenzyme while the latter has eight. The [N1-N2]-GFP data points are the mean values from (a). The [N1]-GFP data points are from one representative experiment and indicate no conclusive binding over the concentration range. The dashed red line is at zero response as a visual guide.
Extended Data Fig. 6. Rubisco / N*-peptide fusion characterization.
a, Size exclusion analysis of wild-type Rubisco and the N*-peptide fusion construct. Both elute at volumes commensurate with compact CbbL8S8 complexes. A run with the Bio-Rad Gel Filtration Standard is included for comparison. Standard masses are indicated. b, BLI responses of wtRubisco and the N*-peptide fusion Rubisco at 100 nM with N*-polyPro as the surface bait. The fusion showed no binding. c, Proposed cartoon model of differential BLI binding activities. N*-peptide fusion Rubisco is apparently self-passivated by saturating the binding sites from stable association of the fused N*-peptides.
Extended Data Fig. 7. N-peptide electron density and interchain heterogeneity.
a, Views of the electron density at each of the N*-peptide binding sites within the asymmetric unit. The Fo–Fc maps (3.0 σ) are displayed as green (positive) and red (negative) mesh, and the 2Fo–Fc maps (1.0 σ) are shown as semi-transparent blue surfaces. All maps were calculated with the omission of the modeled N*-peptide atoms; the displayed N*-peptide sticks are present simply as a visual guide. b, All of the N*-peptides within the asymmetric unit superposed using only the adjacent Rubisco subunits (i.e. not using the N*-peptide coordinates). 2Fo–Fc maps are shown as mesh contoured at 1.0 σ. c-f, Zoomed in views of each of the conserved interaction sites. While some of the sites, such as N* R9 (c), are highly uniform in conformation and occupancy, others demonstrate a range of possible conformations and/or poor occupancy. N* R3 (f) in particular has a triad of possible salt bridges with CbbL D26, CbbL E344, or N* D4, respectively.
Extended Data Fig. 8.
in semi-transparent green. b, Detailed comparative view of the scaffold/Rubisco interaction interface. The inset table pairs equivalent Rubisco positions from alignment and the dashed lines indicate select specific interactions to the corresponding scaffold element shown with salt bridges in black and cation-π interactions in green. “Hnea” is the α-carboxysomal Form IA Rubisco from Halothiobacillus neapolitanus with CbbL (in orange/yellow) and CbbS (in cyan). The N*-peptide-bound structure is from the current study with PDB ID: 6UEW. “Selon” is the β-carboxysomal Form IB Rubisco from Synechococcus elongatus PCC 7942 with large subunit, RbcL, and small subunit, RbcS, both in grey. The bound small subunit-like repeat, CcmM-SSUL1, is shown in green. The atomic model was determined from cryo-electron microscopy single particle analysis and has PDB ID: 6HBC.
Extended Data Fig. 9. Hydrogen / deuterium exchange of Rubisco inside and outside carboxysomes.
a, The structure displayed contains two CbbLs and two CbbS and shows the CbbL2 dimer interface across which the N*-peptide (in magenta) binds. The Rubisco cartoon is colored according to the differential protection to amide hydrogen exchange. Those residues in blue experience greater protection within purified carboxysomes and those in red experience greater protection as free Rubisco. The comparison between these states was carried out with HDExaminer (Sierra Analytics) using moderate smoothing. Four specific peptides outlined in black highlight some of the diversity of HDX behavior. Most peptides that were observed from both states had essentially identical exchange kinetics as exemplified by the top right subpanel for CbbS: 57–67. Less common were peptides with different exchange profiles between encapsulated and unencapsulated Rubisco. CbbL: 34–44 (lower left subpanel) had slightly more protection in free Rubisco. CbbL: 328–341 (upper left subpanel) and CbbL: 262–267 (lower right subpanel) both had greater protection inside carboxysomes. The lack of a clear HDX protection footprint from the N-peptide binding site may point toward a dynamic and liquid carboxysome interior in which the weak N-peptide / Rubisco interactions are rapidly exchanged.
Extended Data Fig. 10. Phase separation microscopy.
Phase contrast and fluorescence images of Rubisco, NTD-GFP, and the mixture (all at 1 μM) at 20 mM and 150 mM NaCl. At low salt Rubisco + NTD-GFP demixes into round liquid droplets. NTD-GFP alone at low salt shows a number of small fluorescent puncta (potentially small aggregates) but does not form large droplets. No droplets or aggregates are observed for any sample at high salt. All figures have the same scale with the common 10 μm scalebar in the bottom right image.
Acknowledgements
We thank Cecilia Blikstad, Chi-Yun Lin, and Michael Hagan for helpful comments on the manuscript. We also thank Peter Huang for his help with the BLI instrumentation and Cheryl Kerfeld for advice on Rubisco crystallization. Yinon Bar-On assisted us in gathering the CsoS2 sequences. We acknowledge the staff at the UC Berkeley Electron Microscope Laboratory for training and assistance with TEM. George Meigs and James Holton assisted with the x-ray diffraction and we gratefully acknowledge their input. We also thank Noam Prywes for help with enzyme assays. Whiskers for crystal microseeding were kindly gifted by S.T. Kuhl. Beamline 8.3.1 at the Advanced Light Source is operated by the University of California Office of the President, Multicampus Research Programs and Initiatives grant MR-15-328599, the National Institutes of Health (R01 GM124149 and P30 GM124169), Plexxikon Inc. and the Integrated Diffraction Analysis Technologies program of the US Department of Energy Office of Biological and Environmental Research. The work was supported by grants from the U.S. Department of Energy (DE-SC00016240) and the National Institute of General Medical Sciences (R01GM129241) to D.F.S. and a grant from the National Institute of General Medical Sciences (R01GM050945) to S.M.
Footnotes
Competing interests:
D.F.S. is a co-founder of Scribe Therapeutics and a scientific advisory board member of Scribe Therapeutics and Mammoth Biosciences. All other authors declare no competing interests.
References
- 1. Raven JA, Cockell CS & De La Rocha CL The evolution of inorganic carbon concentrating mechanisms in photosynthesis. Philos. Trans. R. Soc. Lond. B, Biol. Sci. 363, 2641–2650 (2008).
- 2. Mangan NM, Flamholz A, Hood RD, Milo R & Savage DF pH determines the energetic efficiency of the cyanobacterial CO2 concentrating mechanism. Proc. Natl. Acad. Sci. USA 113, E5354–62 (2016).
- 3. Espie GS & Kimber MS Carboxysomes: cyanobacterial RubisCO comes in small packages. Photosyn Res 109, 7–20 (2011).
- 4. Rae BD, Long BM, Badger MR & Price GD Functions, compositions, and evolution of the two types of carboxysomes: polyhedral microcompartments that facilitate CO2 fixation in cyanobacteria and some proteobacteria. Microbiol. Mol. Biol. Rev. 77, 357–379 (2013).
- 5. Heinhorst S, Cannon GC & Shively JM in Complex intracellular structures in prokaryotes (ed. Shively JM) 2, 141–165 (Springer; Berlin Heidelberg, 2006).
- 6. Kerfeld CA & Melnicki MR Assembly, function and evolution of cyanobacterial carboxysomes. Curr. Opin. Plant Biol. 31, 66–75 (2016).
- 7. Tanaka S et al. Atomic-level models of the bacterial carboxysome shell. Science 319, 1083–1086 (2008).
- 8. Schmid MF et al. Structure of Halothiobacillus neapolitanus carboxysomes by cryo-electron tomography. J. Mol. Biol. 364, 526–535 (2006).
- 9. Iancu CV et al. The structure of isolated Synechococcus strain WH8102 carboxysomes as revealed by electron cryotomography. J. Mol. Biol. 372, 764–773 (2007).
- 10. Shih PM et al. Biochemical characterization of predicted Precambrian RuBisCO. Nat. Commun. 7, 10382 (2016).
- 11. Whitehead L, Long BM, Price GD & Badger MR Comparing the in vivo function of α-carboxysomes and β-carboxysomes in two model cyanobacteria. Plant Physiol. 165, 398–411 (2014).
- 12. Shively JM, Ball F, Brown DH & Saunders RE Functional organelles in prokaryotes: polyhedral inclusions (carboxysomes) of Thiobacillus neapolitanus. Science 182, 584–586 (1973).
- 13. Cameron JC, Wilson SC, Bernstein SL & Kerfeld CA Biogenesis of a bacterial organelle: the carboxysome assembly pathway. Cell 155, 1131–1140 (2013).
- 14. Kinney JN, Salmeen A, Cai F & Kerfeld CA Elucidating essential role of conserved carboxysomal protein CcmN reveals common feature of bacterial microcompartment assembly. J. Biol. Chem. 287, 17729–17736 (2012).
- 15. Long BM, Badger MR, Whitney SM & Price GD Analysis of carboxysomes from Synechococcus PCC7942 reveals multiple Rubisco complexes with carboxysomal proteins CcmM and CcaA. J. Biol. Chem. 282, 29323–29335 (2007).
- 16. Ryan P et al. The small RbcS-like domains of the β-carboxysome structural protein CcmM bind RubisCO at a site distinct from that binding the RbcS subunit. J. Biol. Chem. 294, 2593–2603 (2019).
- 17. Wang H et al. Rubisco condensate formation by CcmM in β-carboxysome biogenesis. Nature 566, 131–135 (2019).
- 18. Long BM, Rae BD, Badger MR & Price GD Over-expression of the β-carboxysomal CcmM protein in Synechococcus PCC7942 reveals a tight co-regulation of carboxysomal carbonic anhydrase (CcaA) and M58 content. Photosyn Res 109, 33–45 (2011).
- 19. Cai F et al. Advances in understanding carboxysome assembly in prochlorococcus and synechococcus implicate csos2 as a critical component. Life (Basel) 5, 1141–1171 (2015).
- 20. Cannon GC et al. Organization of carboxysome genes in the thiobacilli. Curr Microbiol 46, 115–119 (2003).
- 21. Chaijarasphong T et al. Programmed Ribosomal Frameshifting Mediates Expression of the α-Carboxysome. J. Mol. Biol. 428, 153–164 (2016).
- 22. Williams EB Identification and Characterization of Protein Interactions in the Carboxysome of Halothiobacillus neapolitanus. (2006).
- 23. Liu Y et al. Deciphering molecular details in the assembly of alpha-type carboxysome. Sci. Rep. 8, 15062 (2018).
- 24. Gonzales AD et al. Proteomic analysis of the CO2-concentrating mechanism in the open-ocean cyanobacterium Synechococcus WH8102. Can. J. Bot. 83, 735–745 (2005).
- 25. Baker SH et al. The correlation of the gene csoS2 of the carboxysome operon with two polypeptides of the carboxysome in thiobacillus neapolitanus. Arch. Microbiol. 172, 233–239 (1999).
- 26. Xue B, Dunbrack RL, Williams RW, Dunker AK & Uversky VN PONDR-FIT: a meta-predictor of intrinsically disordered amino acids. Biochim. Biophys. Acta 1804, 996–1010 (2010).
- 27. Jones DT & Cozzetto D DISOPRED3: precise disordered region predictions with annotated protein-binding activity. Bioinformatics 31, 857–863 (2015).
- 28. Mizianty MJ, Peng Z & Kurgan L MFDp2: Accurate predictor of disorder in proteins by fusion of disorder probabilities, content and profiles. Intrinsically Disord Proteins 1, e24428 (2013).
- 29. Drozdetskiy A, Cole C, Procter J & Barton GJ JPred4: a protein secondary structure prediction server. Nucleic Acids Res. 43, W389–94 (2015).
- 30. Abdiche Y, Malashock D, Pinkerton A & Pons J Determining kinetics and affinities of protein interactions using a parallel real-time label-free biosensor, the Octet. Anal. Biochem. 377, 209–217 (2008).
- 31. van der Lee R et al. Classification of intrinsically disordered regions and proteins. Chem. Rev. 114, 6589–6631 (2014).
- 32. Davey NE et al. Attributes of short linear motifs. Mol. Biosyst. 8, 268–281 (2012).
- 33. Alberty RA Thermodynamics of biochemical reactions. (John Wiley & Sons, Inc., 2003). doi: 10.1002/0471332607
- 34. Schneider G, Lindqvist Y & Brändén CI RUBISCO: structure and mechanism. Annu. Rev. Biophys. Biomol. Struct. 21, 119–143 (1992).
- 35. Gallivan JP & Dougherty DA Cation-pi interactions in structural biology. Proc. Natl. Acad. Sci. USA 96, 9459–9464 (1999).
- 36. Wang J et al. A Molecular Grammar Governing the Driving Forces for Phase Separation of Prion-like RNA Binding Proteins. Cell 174, 688–699.e16 (2018).
- 37. Bailey TL & Gribskov M Combining evidence using p-values: application to sequence homology searches. Bioinformatics 14, 48–54 (1998).
- 38. Bonacci W et al. Modularity of a carbon-fixing protein organelle. Proc. Natl. Acad. Sci. USA 109, 478–483 (2012).
- 39. Li P et al. Phase transitions in the assembly of multivalent signalling proteins. Nature 483, 336–340 (2012).
- 40. Boeynaems S et al. Protein phase separation: A new phase in cell biology. Trends Cell Biol. 28, 420–435 (2018).
- 41. Mackinder LCM et al. A repeat protein links Rubisco to form the eukaryotic carbon-concentrating organelle. Proc. Natl. Acad. Sci. USA 113, 5958–5963 (2016).
- 42. Wunder T, Cheng SLH, Lai S-K, Li H-Y & Mueller-Cajar O The phase separation underlying the pyrenoid-based microalgal Rubisco supercharger. Nat. Commun. 9, 5076 (2018).
- 43. Freeman Rosenzweig ES et al. The Eukaryotic CO2-Concentrating Organelle Is Liquid-like and Exhibits Dynamic Reorganization. Cell 171, 148–162.e19 (2017).
- 44. Long BM, Tucker L, Badger MR & Price GD Functional cyanobacterial beta-carboxysomes have an absolute requirement for both long and short forms of the CcmM protein. Plant Physiol. 153, 285–293 (2010).
- 45. Hyman AA, Weber CA & Jülicher F Liquid-liquid phase separation in biology. Annu. Rev. Cell Dev. Biol. 30, 39–58 (2014).
Methods-only References
- 46. Bailey TL & Elkan C Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc. Int. Conf. Intell. Syst. Mol. Biol. 2, 28–36 (1994).
- 47. Krissinel E & Henrick K Inference of macromolecular assemblies from crystalline state. J. Mol. Biol. 372, 774–797 (2007).
- 48. Engler C, Kandzia R & Marillonnet S A one pot, one step, precision cloning method with high throughput capability. PLoS One 3, e3647 (2008).
- 49. Schuler B, Lipman EA, Steinbach PJ, Kumke M & Eaton WA Polyproline and the “spectroscopic ruler” revisited with single-molecule fluorescence. Proc. Natl. Acad. Sci. USA 102, 2754–2759 (2005).
- 50. Kabsch W Integration, scaling, space-group assignment and post-refinement. Acta Crystallogr. Sect. D, Biol. Crystallogr. 66, 133–144 (2010).
- 51. Collaborative Computational Project Number 4. The CCP4 suite: programs for protein crystallography. Acta Crystallogr. Sect. D, Biol. Crystallogr. 50, 760–763 (1994).
- 52. Evans PR & Murshudov GN How good are my data and what is the resolution? Acta Crystallogr. Sect. D, Biol. Crystallogr. 69, 1204–1214 (2013).
- 53. McCoy AJ et al. Phaser crystallographic software. J. Appl. Crystallogr. 40, 658–674 (2007).
- 54. Adams PD et al. PHENIX: a comprehensive Python-based system for macromolecular structure solution. Acta Crystallogr. Sect. D, Biol. Crystallogr. 66, 213–221 (2010).
- 55. Emsley P & Cowtan K Coot: model-building tools for molecular graphics. Acta Crystallogr. Sect. D, Biol. Crystallogr. 60, 2126–2132 (2004).
- 56. Lim SA, Bolin ER & Marqusee S Tracing a protein’s folding pathway over evolutionary time using ancestral sequence reconstruction and hydrogen exchange. Elife 7, (2018).
- 57. Samelson AJ et al. Kinetic and structural comparison of a protein’s cotranslational folding and refolding pathways. Sci. Adv. 4, eaas9098 (2018).
- 58. Sievers F & Higgins DG Clustal Omega for making accurate alignments of many protein sequences. Protein Sci. 27, 135–145 (2018).
Data Availability Statement
Atomic coordinates and structure factors for the Rubisco/CsoS2 N-peptide fusion have been deposited in the wwPDB under accession code 6UEW. Plasmids for all protein constructs used are available from Addgene. Raw data for all MST experiments in Fig. 2e,f and Extended Data Fig. 5 are available as Source Data. All protein sequences used for the binding motif analysis and for Fig. 2c and Fig. 3g are available as FASTA files in Supplementary Data 1–3.















