Significance
N-glycosylation is a common posttranslational modification of extracellular proteins. It plays a role in protein folding in the endoplasmic reticulum and later participates in important protein–carbohydrate interactions. Viruses, in particular, are often highly glycosylated. Knowledge of the determinants of modification at a particular site is important for better understanding these processes. One challenge in obtaining this information is separating the effect of the two glycosyltransferases, OST-A and OST-B. Our study focuses on OST-B and reveals how multiple, closely spaced glycosylation sites can influence one another in the context of OST-B glycosylation.
Keywords: glycobiology, mass spectrometry, oligosaccharyltransferase
Abstract
N-glycosylation is a common posttranslational modification of secreted proteins in eukaryotes. This modification targets asparagine residues within the consensus sequence, N–X–S/T. While this sequence is required for glycosylation, the initial transfer of a high-mannose glycan by oligosaccharyl transferases A or B (OST-A or OST-B) can lead to incomplete occupancy at a given site. Factors that determine the extent of transfer are not well understood, and understanding them may provide insight into the function of these important enzymes. Here, we use mass spectrometry (MS) to simultaneously measure relative occupancies for three N-glycosylation sites on the N-terminal IgV domain of the recombinant glycoprotein, hCEACAM1. We demonstrate that addition is primarily by the OST-B enzyme and propose a kinetic model of OST-B N-glycosylation. Fitting the kinetic model to the MS data yields distinct rates for glycan addition at most sites and suggests a largely stochastic initial order of glycan addition. The model also suggests that glycosylation at one site influences the efficiency of subsequent modifications at the other sites, and glycosylation at the central or N-terminal site leads to dead-end products that seldom lead to full glycosylation of all three sites. Only one path of progressive glycosylation, one initiated by glycosylation at the C-terminal site, can efficiently lead to full occupancy for all three sites. Thus, the hCEACAM1 domain provides an effective model system to study site-specific recognition of glycosylation sequons by OST-B and suggests that the order and efficiency of posttranslational glycosylation is influenced by steric cross-talk between adjoining acceptor sites.
Glycosylation is a common posttranslational modification (PTM) of eukaryotic proteins in which a carbohydrate is attached to the sidechains of specific residues. In mammals, most secreted and membrane proteins are O- or N-glycosylated with complex structures (1). In addition, single sugars attached to cytosolic proteins play important regulatory roles in the cytoplasm and nucleus (2). N-glycosylation is perhaps best understood; it occurs on asparagine residues found within a consensus motif of N–X–S/T and plays important biological roles in both protein folding and cell signaling processes (3). However, not all consensus sequons are glycosylated, occupancy at some sites can be incomplete, and omissions can have biological consequences (4). Here we present data that can uncover factors leading to these omissions.
Some specificity in N-glycosylation may arise in the initial transfer of a high-mannose glycan (Glc3Man9GlcNAc2) from a dolichol phosphate donor to the consensus asparagine. This step is catalyzed by the oligosaccharyltransferase (OST) complex which comes in two forms (OST-A and OST-B) (Fig. 1A). OST-A associates with the translocon and glycosylates nascent polypeptides in a cotranslational manner, while OST-B acts posttranslationally (5). Additionally, OST-B forms transient disulfide bonds with protein substrates via an oxidoreductase subunit (6). The differential activity of these two complexes has been investigated at the proteome level in combination with CRISPR-Cas9 mutants deficient for either catalytic subunit (STT3A or STT3B). This study found that many N-glycosylation sites modified by OST-A are fully occupied and proposed that OST-B glycosylates primarily a subset of sequons that were skipped by OST-A (7). Most of these skipped sites are within 65 residues of the C terminus of a protein, a stretch of residues that may be required to retain attachment to the ribosome while OST-A acts.
Fig. 1.
(A) Diagram of OST-A and OST-B function. (Left) A nascent polypeptide is shown emerging from the ribosome (light blue, PDB 6FTI), passing into the ER lumen via the sec61 channel (green) and being scanned by OST-A (red, PDB 6S7O). (Right) OST-B (blue, PDB 6S7T) is shown interacting with a partially folded peptide substrate. A dolichol pyrophosphate-linked oligosaccharide donor is shown in the membrane to the left of OST-A. (B) Model of glycosylation pathways for hCEACAM1-IgV via OST-B. After protein translation, glycans can be added in any order to each of the three N-glycan sequons (indicated by an “N” in the linear polypeptide representation), and each addition is given its own rate constant. Glycans are shown with the Symbol Nomenclature for Glycans symbol for GlcNAc, for simplicity.
Since OST-B acts posttranslationally, it is not constrained to an ordered addition to substrate glycosylation sites from N-terminal to C-terminal, as is the case for cotranslational addition by OST-A. In addition, steric constraints may contribute to substrate access, since folding of protein substrates may be initiated or completed prior to OST-B action (Fig. 1A). Here, we use mass spectrometry (MS) to examine the glycosylation of a small protein, the N-terminal IgV domain of human Carcinoembryonic Antigen Cell Adhesion Molecule 1, hCEACAM1. The hCEACAM1 is a highly glycosylated extracellular protein receptor, which has been implicated in gastrointestinal autoimmune disorders and host–pathogen interactions (8, 9). The N-terminal immunoglobulin-V-like domain (IgV) contains three N-glycosylation sites. While our initial study of hCEACAM1 was motivated by its biological function, the hCEACAM1-IgV construct proves a convenient model system for understanding OST-catalyzed glycosylation. First, the small size of the N-terminal domain (12 kDa) makes it an accessible target for top-down MS/MS analysis of the sites of glycosylation. Second, despite the small size of the protein, this construct contains three consensus sequons, all within 35 residues of the C terminus, sites N104, N111, N115 of the UniProt sequence P13688, suggesting that all glycosylation may be dependent on OST-B. Lastly, hCEACAM1-IgV does not possess any cysteine residues and, therefore, will not involve the oxidoreductase activity of OST-B; in their absence, glycosylation levels are lower, allowing the level of glycosylation to report more directly on accessibility of the OST-B catalytic site.
Top-down MS (TDMS) is an advanced MS method in which an intact protein is analyzed directly, as opposed to analysis of proteolytic fragments (10). The top-down approach is especially beneficial when a protein has multiple modifications on different regions of the protein. More widely used bottom-up approaches rely on proteolysis which can, in principle, separate PTMs onto different peptides. Tandem MS is more easily accomplished for peptides than intact proteins, but information can be lost by enzymatic digestion, specifically, the correlation between sites of modification for substoichiometric PTMs. TDMS allows for the localization of multiple glycosylation sites simultaneously and informs upon the relative abundance of distinct glycan occupancy states for the intact protein. It does, however, require extensive and controlled fragmentation that focuses on backbone cleavage and does not dissociate the PTMs of interest. Electron-based fragmentation techniques, such as electron capture dissociation (ECD) or electron transfer dissociation, are employed here to improve sequence coverage and to preserve labile PTMs (11–13). Using these MS approaches, relative occupancies for three N-glycosylation sites on the N-terminal IgV domain of the wild-type recombinant glycoprotein, hCEACAM1, were simultaneously measured, along with three mutant proteins in which single glycosylation sites were removed by mutating asparagine to glutamine.
The resulting information was used to develop a kinetic model of OST-B-catalyzed glycosylation (Fig. 1B). Initial glycosylation occurred readily at all three sites, but, surprisingly, glycosylation at the central site led to dead-end products that were seldom glycosylated at additional sites. Thus, the order of glycosylation by OST-B is largely stochastic, and subsequent glycosylations appear to be impacted by the position of the initial glycosylation event. Only a single path of ordered glycosylation of the three sites, starting with the C-terminal site, followed by the central site and finally the N-terminal site, efficiently produced a fully glycosylated recombinant product. This is in contrast to the efficient N- to C-terminal glycan modification that is characteristic of cotranslational OST-A-catalyzed glycosylation.
Results
Glycosylation Occupancy.
After the initial transfer of a high-mannose glycan to hCEACAM1-IgV sites, subsequent trimming in the endoplasmic reticulum (ER) and further glycan additions in the Golgi cisternae would normally lead to a complex mixture of glycoforms that could complicate MS analysis. To avoid this, recombinant hCEACAM1-IgV was expressed in an MGAT1 null HEK293 cell line (HEK293S GnT1−) that only produces Man5GlcNAc2 glycans (14). These glycans are susceptible to cleavage between the two N-acetylglucosamine (GlcNAc) residues by endoglycosidase-F1 (EndoF1), leaving a single GlcNAc “scar” at any glycosylated site (15). The simplified mass spectrum of intact hCEACAM1-WT, after EndoF1 treatment, showed a mixture of three species with deconvoluted monoisotopic masses of 12,417.22, 12,620.29, and 12,823.37 Da, which correspond to the protein with one, two, or three GlcNAc modifications. These species were observed with relative proportions of 29%, 46%, and 26%, respectively (Fig. 2A). The species lacking glycans was not observed, and, presumably, is present at a level below noise.
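The mass assignments above can be checked with simple arithmetic. The following minimal sketch assumes the standard monoisotopic mass of a GlcNAc residue (C8H13NO5, about 203.0794 Da), a value that is not stated in the text, and tests whether the spacing between the three reported masses matches it. The function name is illustrative only.

```python
# Monoisotopic residue mass of GlcNAc (C8H13NO5); standard value, not from the text.
GLCNAC_RESIDUE_DA = 203.0794

# Deconvoluted monoisotopic masses reported for EndoF1-treated hCEACAM1-WT (Da).
observed = [12417.22, 12620.29, 12823.37]

def mass_steps(masses):
    """Return the differences between consecutive masses, rounded to 0.01 Da."""
    return [round(b - a, 2) for a, b in zip(masses, masses[1:])]

steps = mass_steps(observed)
for step in steps:
    # Each step should be close to the mass of one GlcNAc residue.
    print(f"step = {step:.2f} Da, deviation = {step - GLCNAC_RESIDUE_DA:+.3f} Da")
```

Both steps fall within about 0.01 Da of one GlcNAc residue, consistent with the assignment of one, two, and three GlcNAc modifications.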
Fig. 2.
(A) Mass spectrum of hCEACAM1-IgV-WT showing a region containing the 12+ charge distribution. Three peaks are observed corresponding to the protein with one, two, or three GlcNAc residues (blue squares). In the case of one or two GlcNAc residues, the location of the modification is ambiguous and indicated with brackets. (B) Comparison of mass spectra for hCEACAM1 glycosylation site mutants. The number within the blue square indicates the number of GlcNAc residues. (C) Comparison of mass spectra for hCEACAM1-WT samples expressed in cells with and without the presence of OST-B inhibitor. (D) Example fragmentation map from top-down ECD MS/MS spectra of monoglycosylated hCEACAM1-IgV. Lines drawn between residues indicate an assigned fragment ion. The notch at the top of a line indicates a c-type fragment ion, while a notch at the bottom indicates an assigned z ion. Residue N111 is highlighted, indicating the presence of a GlcNAc PTM.
The glycosylation positions of species with one and two GlcNAcs are indeterminate in MS spectra of the intact protein. To determine these positions, top-down ECD MS/MS was applied. An example of the ECD MS/MS fragmentation coverage is shown in Fig. 2D for the monoglycosylated components. Fragments were observed across a large portion of the protein backbone. More importantly, fragmentation between the three glycosylation sites is observed, allowing determination of site-specific glycan occupancy.
By combining the relative abundances from the intact mass spectrum with the fragmentation patterns from the MS/MS spectra, the abundances of each potential “occupancy state” were determined for wild type, wild type in the presence of an OST-B inhibitor, and three glycosylation site mutants (Table 1). These states are denoted by a three-bit binary code in which the bits correspond to sites N104, N111, and N115, and “1” indicates glycosylation and “0” indicates the absence of glycosylation. We first focus on wild-type data under normal OST-A and OST-B activity. The results show that the locations of glycans are not random. Instead, there are clear patterns for modification preference. In the case of the species with one GlcNAc, the modification was predominantly located at the central N111 sequon, [010], while, in the case of the two-GlcNAc species, a mixture of [110] and [101] in similar amounts was observed, and [011] was observed at a lower level.
Table 1.
Glycoform distributions of hCEACAM1-IgV variants, given as the relative proportion (%) of each occupancy state
| Sample | [000] | [100] | [010] | [001] | [110] | [101] | [011] | [111] |
|---|---|---|---|---|---|---|---|---|
| WT* | 0‡ ± 0 | 2 ± 2 | 27 ± 12 | 0‡ ± 0 | 23 ± 5 | 19 ± 9 | 6 ± 6 | 25 ± 10 |
| Inhibitor-treated† | 1 | 6 | 70 | 3 | 8 | 7 | 3 | 1 |
| N104Q† | 0‡ | n/a§ | 19 | 1 | n/a§ | n/a§ | 81 | n/a§ |
| N111Q† | 0‡ | 2 | n/a§ | 0‡ | n/a§ | 98 | n/a§ | n/a§ |
| N115Q† | 0‡ | 0‡ | 39 | n/a§ | 61 | n/a§ | n/a§ | n/a§ |
*Wild-type (WT) sample is reported as the average ± SD of all biological replicates (n = 4).
†Inhibitor-treated and mutant samples were measured on single biological samples.
‡Not observed.
§Impossible due to N-to-Q mutation.
To test the reproducibility of these results, hCEACAM1-IgV-WT was expressed in duplicate by two different laboratories. The SDs in glycoform distributions are listed in Table 1. While there is some variation, there is consistency in that species [010], [110], [101], and [111] were highly populated, and [000], [100], [001], and [011] were absent or sparingly populated.
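The binary occupancy states can be collapsed into per-site occupancies by summing every state in which a given bit is set. The sketch below does this for the WT row of Table 1. It renormalizes first, because the rounded table entries need not sum to exactly 100; the dictionary layout and function name are illustrative choices, not part of the original analysis.

```python
# WT relative proportions (%) from Table 1, keyed by occupancy state.
# Bits are ordered N104, N111, N115.
wt = {"000": 0, "100": 2, "010": 27, "001": 0,
      "110": 23, "101": 19, "011": 6, "111": 25}

SITES = ("N104", "N111", "N115")

def site_occupancy(dist):
    """Fraction of molecules glycosylated at each site, after renormalizing
    the distribution so that rounding in the table does not bias the result."""
    total = sum(dist.values())
    return {
        site: sum(v for state, v in dist.items() if state[i] == "1") / total
        for i, site in enumerate(SITES)
    }

occupancy = site_occupancy(wt)
for site, frac in occupancy.items():
    print(f"{site}: {100 * frac:.0f}% occupied")
```

With these values, the central site N111 is the most occupied and N115 the least, although the per-site totals hide the correlations between sites that the state-resolved table preserves.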
Glycosite Mutants.
To further explore the patterns in N-glycosylation, we produced three mutant versions of hCEACAM1-IgV where each glycosylated asparagine residue was individually mutated to a glutamine, thus eliminating the possibility of glycosylation at the respective site. Top-down MS analysis was then repeated on each of these samples. The intact mass spectra of each mutant show significant changes compared to the wild-type construct (Fig. 2B). In each case, species were observed corresponding to the protein with one asparagine to glutamine mutation (+14 Da) and one or two GlcNAc modifications. The most dramatic change is observed when site 2 (N111) is mutated. Near-complete glycosylation was observed, with the fully occupied two-GlcNAc form comprising 98% of the protein signal. Fragmentation patterns of the one-GlcNAc form of each mutant give insight into pair-wise correlations between glycosylation at the remaining sites. For the N104Q mutant, the single GlcNAc was primarily observed at N111. A similar trend was seen for the N115Q mutant where the glycan modification was only observed at N111.
Support for OST-B Exclusive Glycosylation.
Using these data, we sought to develop a simple model describing the glycosylation events; however, the details of this model depend on whether hCEACAM1-IgV is glycosylated cotranslationally by OST-A or posttranslationally by OST-B. Based on prior literature, we expected OST-A to be unable to glycosylate this construct, due to the close proximity of the N-glycosylation sequons to the C terminus. However, to test this expectation, we prepared a two-domain hCEACAM1 construct (hCEACAM1-IgV-IgC) that appended an additional 95-aa IgC domain C-terminal to the IgV domain. The added domain carries an additional six N-glycosylation sites, making nine sites in total. Intact mass measurement of this sample showed a mixture of species consistent with five to nine GlcNAc residues (SI Appendix, Fig. S3A). ECD fragmentation spectra revealed that the first three glycosylation sites, corresponding to those on the IgV domain, were now highly occupied. No fragment ions were observed without GlcNAc modification at these sites (SI Appendix, Fig. S3B). This observation is consistent with the fact that these three sites are now more than 65 residues away from the C terminus and that glycosylation by both OST-A and OST-B is possible. Also, the fourth glycosylation site is sufficiently far from the C terminus for OST-A glycosylation and was also observed to be completely occupied by top-down ECD MS/MS. The remaining five glycosylation sites are within 65 residues of the C terminus and, as expected, were variably modified. One potential caveat regarding the evidence from the two-domain construct is that the second domain contains a disulfide bond. This raises the possibility that the increased glycosylation occupancy of sites 1, 2, and 3 may not be solely due to engaging OST-A but also may have enhanced OST-B activity, due to the oxidoreductase subunit of OST-B.
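The positional argument can be made concrete by counting residues between each IgV sequon and the C terminus of each construct. The sketch below uses the construct boundaries given in Materials and Methods (residues 34 to 141 and 34 to 236, UniProt P13688 numbering) and the 65-residue threshold cited above. It treats the last construct residue as the C terminus, which is a simplifying assumption, and the function names are hypothetical.

```python
OST_A_MIN_DISTANCE = 65  # residues; threshold cited in the text (ref. 7)

# Last residue of each construct in UniProt P13688 numbering.
CONSTRUCT_C_TERMINUS = {"IgV": 141, "IgV-IgC": 236}
IGV_SITES = (104, 111, 115)

def residues_to_c_terminus(site, c_terminus):
    """Number of residues between a sequon asparagine and the C terminus."""
    return c_terminus - site

def beyond_ost_a_threshold(site, c_terminus, threshold=OST_A_MIN_DISTANCE):
    """True if the site lies more than `threshold` residues from the C terminus."""
    return residues_to_c_terminus(site, c_terminus) > threshold

for name, c_term in CONSTRUCT_C_TERMINUS.items():
    for site in IGV_SITES:
        d = residues_to_c_terminus(site, c_term)
        print(f"{name}: N{site} is {d} residues from the C terminus; "
              f"beyond threshold = {beyond_ost_a_threshold(site, c_term)}")
```

Under this simple count, none of the three IgV sequons exceeds the threshold in the single-domain construct, whereas all three do once the IgC domain is appended.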
To further test the mechanism of N-glycosylation, we expressed the hCEACAM1-IgV construct while treating the HEK293 expression culture with an OST-B inhibitor that has been shown to selectively repress glycosylation by OST-B (16). Intact mass spectra show that the inhibitor-treated protein sample is significantly less glycosylated than untreated samples (Fig. 2C). While suppression of glycosylation is not complete, this may reflect some site specificity in suppression as opposed to residual activity of OST-A (16). The inhibitor is not believed to be an active site inhibitor, which would uniformly suppress glycosylation, but is believed to act allosterically, a manner that could be site specific. Moreover, it was previously observed that administration of the inhibitor to an STT3A knockout did not completely prevent glycosylation of the reporter protein (16).
Tandem MS was also performed on the inhibitor-treated samples, resulting in glycoform distributions shown in Table 1. The results show that the most abundant form in the inhibitor-treated sample contains a single glycan at the central glycosylation site (N111). Interestingly, small amounts of [100] and [001], previously unobserved, were also detected. None of the above observations alone provide a definitive argument for the exclusive action of OST-B, but, in total, the argument for dominance by OST-B is very strong.
Kinetic Modeling.
Direct interpretation of the relative amounts of the hCEACAM1-IgV glycoforms is difficult. Several minor species are observed in small amounts and, thus, may be transient intermediates that are quickly glycosylated. Another explanation is that less abundant species are produced slowly for reasons related to the mechanism or structure of the enzyme. To gain further insight into the mechanism of N-glycosylation of hCEACAM1-IgV, we developed a kinetic model of OST-B-catalyzed addition. The model extracts relative rate constants for glycosylation at the three N-linked sites from the steady-state concentrations of the various glycoforms, as explained below.
We started modeling with a hypothesis that posttranslational addition of N-glycans would be stochastic with 12 possible reactions, each with a unique rate constant. In addition, to mimic the cell expression process, the model employed a single rate describing the protein entering the ER via translation and a single common rate for the secretion of each species, giving a total of 14 rate constants. These become variables in eight differential equations describing the rate of change of the eight glycosylated species in the wild-type sample.
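The count of 12 glycosylation reactions follows directly from the three-bit state space: each state can gain a glycan at any empty site. A short enumeration, written here only as an illustration of the bookkeeping, reproduces the 8 states, 12 reactions, and 14 rate constants.

```python
from itertools import product

N_SITES = 3

# All occupancy states as bit strings ordered N104, N111, N115.
states = ["".join(bits) for bits in product("01", repeat=N_SITES)]

def glycosylation_steps(state):
    """States reachable from `state` by adding one glycan at an empty site."""
    return [state[:i] + "1" + state[i + 1:]
            for i, bit in enumerate(state) if bit == "0"]

# Every (reactant, product) pair is one reaction with its own rate constant.
reactions = [(s, t) for s in states for t in glycosylation_steps(s)]

# Add one ER-entry rate and one common secretion rate.
n_rate_constants = len(reactions) + 2

print(len(states), "states,", len(reactions), "glycosylation reactions,",
      n_rate_constants, "rate constants")
```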
We have increased the amount of modeling data by examining glycosylation of mutants that eliminate glycosylation at each of the three consensus sites (N to Q mutations). If we assume that mutation acts only by elimination of the respective glycosylation site, while kinetic constants for glycosylation of the remaining sites are the same as for wild type, we get 4 additional equations for each mutant, for a total of 20. Because MS analysis gives only proportions of the various glycosylated species relative to the sum total of all glycoforms in a given mass spectrum, only the relative magnitudes of the constants are meaningful. We can therefore set one variable (the rate for wild-type hCEACAM1 entering the ER) to an arbitrary number. The rates for mutants entering the ER are related to this by three scaling factors. Thus, we end up with 16 variables and 20 differential equations that relate these variables to experimental data. The set of variables best satisfying the equations is found by a process more fully described in Materials and Methods. The resulting values for each rate constant are shown in Fig. 3, in which the determined rates have replaced the kinetic constants in Fig. 1B, and our binary codes have replaced the nodes in Fig. 1B. Note that the rates of substrate entering and sum of products leaving are the same within experimental error. The derived rate constants can be used to predict the glycoform distribution for the wild-type and mutant expression systems, and these values are listed in SI Appendix, Table S28. These values compare quite well with the observed distributions listed in Table 1, particularly for the mutant systems. Any ER-associated degradation (ERAD) processes are not explicitly included in our model, and rates of leaving the ER may not be the same for all species. These may explain our inability to get an exact fit to data. 
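To illustrate how a set of rate constants maps onto a glycoform distribution, the steady state of such a scheme can be solved directly. The sketch below is not the fitting procedure used in this work, and its rate constants are arbitrary placeholders rather than the values in Fig. 3. It assumes first-order steps, a constant entry flux into [000], and a common secretion rate. Because glycans are only added, each state depends only on states with fewer glycans, so the balance equations can be solved in order of glycan count.

```python
from itertools import product

def steady_state(k_glyc, k_in=1.0, k_sec=1.0, blocked_site=None):
    """Steady-state proportions for a three-site addition scheme.

    k_glyc maps (state, site_index) -> rate constant for adding a glycan at
    that site; missing entries default to 0. `blocked_site` mimics an N-to-Q
    mutant by forbidding addition at that site. All values are placeholders.
    Returns (proportions, total steady-state amount before normalization).
    """
    states = sorted(("".join(b) for b in product("01", repeat=3)),
                    key=lambda s: s.count("1"))

    def rate(state, i):
        if state[i] == "1" or i == blocked_site:
            return 0.0
        return k_glyc.get((state, i), 0.0)

    conc = {}
    for s in states:
        # Influx: ER entry for [000], otherwise glycosylation of each precursor.
        influx = k_in if s == "000" else 0.0
        for i in range(3):
            if s[i] == "1":
                precursor = s[:i] + "0" + s[i + 1:]
                influx += rate(precursor, i) * conc[precursor]
        # Efflux: secretion plus every glycosylation step leaving this state.
        efflux = k_sec + sum(rate(s, i) for i in range(3))
        conc[s] = influx / efflux
    total = sum(conc.values())
    return {s: c / total for s, c in conc.items()}, total

# Placeholder constants: every step equally fast (purely illustrative).
uniform = {(s, i): 5.0
           for s in ("".join(b) for b in product("01", repeat=3))
           for i in range(3)}
wild_type, total_wt = steady_state(uniform)
n111q, _ = steady_state(uniform, blocked_site=1)
print(wild_type)
print(n111q)
```

At steady state the entry flux equals the summed secretion flux (k_in = k_sec × total), and the mutant reuses the wild-type constants with one site removed. A fit would then adjust the entries of `k_glyc` until the predicted proportions match the measured distributions for the wild type and the three mutants.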
Regardless of the limitations of our model, these results suggest distinct pathways for site-specific occupancy of hCEACAM1-IgV. Glycosylation of site 1 and site 2 proceeds efficiently, but further addition at site 3 is unfavorable. Initial glycosylation of site 2 is also favorable; however, this leads to a dead end where further addition at either of the remaining sites is very slow. In fact, the most efficient pathway to full glycosylation seems to proceed via site 3, then site 2, and finally site 1.
Fig. 3.
Flowchart describing the kinetic model of OST-B glycosylation. Glycosylation state is shown as a three-bit binary representation, for simplicity. The order of the three bits (e.g., [010]) corresponds to sites N104, N111, and N115, respectively, where a “0” represents an unglycosylated residue and a “1” represents the presence of a GlcNAc at the respective site. Each arrow indicates a separate reaction. In the case of the processes removing protein from the ER, only one arrow is drawn, for clarity, but the model includes a separate process for each species to exit the ER. The numbers above each arrow indicate the value of the corresponding optimized rate constant from the kinetic model, while the numbers below each arrow indicate the SD. The thickness of each line has been scaled in proportion to the value of the rate constant. The orientation of the graph mirrors the diagram in Fig. 1B.
These results can be tested by the glycoform distribution measured on the hCEACAM1 IgV sample expressed during OST-B inhibitor treatment (Table 1). If we assume that inhibition slows the rate of all glycosylation steps equally, then one would expect an accumulation of intermediate states. Our model predicts that [100] and [001] are two such intermediates. The data show that these species are present in larger amounts than without inhibitor treatment; however, they are still minor glycoforms compared to the [010] species. This discrepancy may reflect a more complex mechanism of inhibition that impacts some glycosylation sites more severely than others. Alternatively, some specificity in ERAD may play a significant role in the low abundance of [000], [100], and [001] glycoforms.
Discussion
A single dominant path to full glycosylation implies there are restraints on the process and mechanism. Analysis of the structure of the glycosyltransferase may provide insight. The structure of the human OST-B complex has been solved by cryoelectron microscopy (cryo-EM) with electron density consistent for an average peptide acceptor sequon bound in the catalytic site (17). To our knowledge, there is no evidence that OST-B acts in a processive fashion on an unfolded protein. Also, unglycosylated versions of hCEACAM1-IgV readily fold, and several crystal structures of the bacterially expressed protein have been determined (18, 19). Hence, it seems reasonable to examine the possibility that OST-B acts on a form of hCEACAM1-IgV that can theoretically range between a fully folded and an unfolded state. A folded model based on an X-ray crystal structure (Protein Data Bank [PDB] 4QXW) of hCEACAM1-IgV was aligned with the bound peptide at each of the three glycosylation sites. In each case, significant clashes between the folded hCEACAM1-IgV and the OST-B complex were observed (not shown), suggesting that the protein must be at least partially unfolded during glycosylation. It is intriguing that OST-A can more efficiently glycosylate the three sequons of hCEACAM1 IgV than OST-B if a C-terminal extension beyond the IgV domain is provided. The key difference is that OST-A mediated cotranslational glycosylation enforces rapid glycosylation as the peptide emerges from the SEC61 translocon prior to protein folding. Hence, partial folding may be responsible for the slower differential glycosylation at certain sites.
Next, the possible effect of prior glycosylation at adjacent sites was explored. An unfolded polypeptide containing the three N-glycosylation sites was built and aligned so that site 1 (N104) was inside the OST-B catalytic site, and then a Glc3Man9GlcNAc2 glycan was modeled on site 2 (N111) (Fig. 4). The 14-residue oligosaccharide is quite large, and it is clear that steric interactions could easily inhibit entry of additional hCEACAM1-IgV sequons. Similar models were examined with initial glycosylation at either N104 or N115, and the remaining NXS or NXT sites placed in the catalytic region of OST-B. These also suggest steric interactions that could inhibit further glycosylation. Unfortunately, without more structural data, it is not possible to build more specific models. In particular, we have ignored the possible motion of OST-B domains. Crystal structures of the bacterial OST PglB (homologous to STT3B) in complex with model substrates have shown that a large loop (EL5) undergoes significant rearrangement between disordered and ordered states as substrates bind and products are released (20–22). Additional indications of internal motion are present in the cryo-EM structure of OST-B (17). Several loops in STT3B did not produce strong electron density, and even the entire catalytic domain of the oxidoreductase MagT1 was not observed.
Fig. 4.
Model of unfolded hCEACAM1-IgV peptide (blue ribbon) docked into the catalytic site of STT3B (blue surface). Site 1 (N104) is docked in the catalytic site, and a Glc3Man9GlcNAc2 N-glycan (transparent surface) is attached at site 2 (N111). The structure of OST-B was taken from PDB 6S7T.
We should point out that other factors are known to affect probabilities for glycosylation (23). While most of these have been studied under conditions where OST-A rather than OST-B is active, the similarity of active sites in the two complexes suggests that they merit consideration here. One well-documented factor is the presence of a serine rather than threonine in the sequon. Sequons with threonine are much more likely to be glycosylated, and it is even known that closely spaced threonine-containing sequons can be glycosylated (24). Site N104 is an NXS site, and both other sites are NXT sites. Yet, their initial glycosylation rates are very similar. Likewise, our N115 site has an aspartic acid in the X site, something that is known to suppress glycosylation to some extent (25). Yet, we do not observe suppression of our initial addition rates. These observations suggest that either OST-B functions differently, or other factors such as steric factors may be rate limiting.
The assumptions of our kinetic model can clearly be debated. We assume that substrates, such as the glycosylated dolichol pyrophosphate and unglycosylated protein, reach steady-state levels early in the process and remain at those levels for the majority of expression time. It is unclear whether this assumption would be true for protein overexpression during our typical 6-d culture period. Also, glycosylation could alter the folding trajectory of hCEACAM1, and some forms may be preferentially secreted. Additionally, there are quality control processes in the ER that target poorly folded and possibly underglycosylated proteins through an ERAD pathway for destruction by cytosolic proteasomes (26). The loss of unfolded species through the ERAD pathway may be significant, as previous expression of unglycosylated hCEACAM1-IgV in Escherichia coli formed inclusion bodies (27). Some of these flaws could be corrected by the addition of more processes (and unknown parameters) to the kinetic model. However, the number of independent variables in even our simplest model (14) exceeds the amount of MS data on the initial hCEACAM1-IgV construct (measurements on eight species and eight equations relating these to variables). We were able to compensate by adding data from mutants missing glycosylation sites, but this required additional assumptions regarding a lack of perturbation to remaining sites. Nevertheless, some level of site-to-site interaction is clear, and this interaction likely has structural implications.
The interplay of glycans on different sites of glycoproteins could well have implications for other biological functions. Proteins other than CEACAMs are very heavily glycosylated, and glycans may well interact in the course of stabilizing certain conformations or protecting sites from interactions with other proteins, as in the case of glycan shields of certain virus surfaces from antigenic development (28). Site-specific information on occupancy and the glycan structures occupying each site could be very important to understanding glycan function in these instances. The top-down MS strategy used in our studies can supply these data.
Here we have measured the glycan occupancy of hCEACAM1-IgV using a combination of top-down and bottom-up MS. Based on the combination of increased occupancy in a two-domain construct, significantly reduced glycosylation after treatment with an OST-B-specific inhibitor, and literature precedent for inefficient OST-A activity toward sequons proximal to the C terminus, we conclude that N-glycosylation of this protein is catalyzed by OST-B. Using these data, we have proposed a kinetic model of OST-B-driven glycosylation that predicts the most efficient pathway to full glycosylation proceeds from C to N terminus. While some of these observations are probably specific to our model protein, this work highlights that, for OST-B-catalyzed glycosylation, the relative rates of glycosylation at various sites are determined by factors other than the NXS versus NXT motif, and that the order of glycosylation is not constrained to follow N-terminal to C-terminal processing. The N-glycosylation process is a complex interplay between the activity of the OST complexes, protein folding, and subsequent trafficking through the ER and/or Golgi apparatus. Our results suggest that folding trajectories are glycoform specific and would be ripe for future inquiry. Similar studies on other glycoproteins may shed further light on these complex processes.
Materials and Methods
Protein Expression and Purification.
The hCEACAM1 expression constructs were synthesized using human codon optimization (ThermoFisher) and were inserted into a pGEn2 expression vector (29). The resulting expression products contain an N-terminal signal sequence followed by a His-tag, green fluorescent protein (GFP) domain, tobacco etch virus (TEV) protease recognition site, and an SGG linker followed by the hCEACAM1-IgV domain (UniProt P13688, residues 34 to 141). Three glycosylation site mutants (N104Q, N111Q, and N115Q) were generated using a Q5 site-directed mutagenesis kit (New England Biolabs), and a two-domain hCEACAM1-IgV-IgC (residues 34 to 236) construct was also prepared in the same vector backbone. All constructs were expressed by transient transfection in HEK293S (GnT1−) cells (ATCC), leading to secretion of the recombinant product into conditioned medium (14, 29). The HEK293S (GnT1−) recombinant host is defective in the glycan processing enzyme, MGAT1, and leads to secretion of glycosylated products harboring Man5GlcNAc2 glycan structures that can be subsequently cleaved by endoglycosidase F1 (EndoF1) to result in a single GlcNAc-Asn linkage at the respective glycosylation site. The hCEACAM1 expression products were harvested 6 d posttransfection, and the conditioned medium was subjected to centrifugation prior to protein purification. For OST-B inhibition studies, inhibitor C19 (Enamine, cat# Z26531254) was added to the cell culture medium to a final concentration of 25 µM with 0.1% (vol/vol) of dimethyl sulfoxide. For protein purification, the crude medium was loaded onto a Ni-NTA Superflow (Qiagen) column, washed with 25 mM Hepes, 300 mM NaCl, 20 mM imidazole, pH 7.0, and eluted with 25 mM Hepes, 300 mM NaCl, 300 mM imidazole, pH 7.0. Purified protein was concentrated and treated with recombinant TEV protease and EndoF1 (both His-tagged) simultaneously for 24 h at 4 °C to remove the GFP fusion tag and N-glycans.
The digestion products were then passed through a Ni-NTA Superflow column to remove the cleaved GFP fusion tag, TEV protease, and EndoF1 to result in a purified untagged protein in the flow-through fraction. Protein was further purified over a Superdex-75 (GE Healthcare Life Sciences) column and concentrated to 1 mg/mL to 3 mg/mL. Each purification step was checked by sodium dodecyl sulfate polyacrylamide gel electrophoresis. Human CEACAM1 variants were expressed and purified in two different laboratories independently at the University of Georgia and University of Virginia, following the same protocol, to act as biological replicates.
MS.
Following protein expression and purification, protein samples were buffer exchanged into 10 mM ammonium acetate and diluted to a concentration of 1 µM with 50/49/1 (vol/vol/vol) methanol–water–formic acid solution for MS analysis. All mass spectra were acquired using a 12 T Bruker SolariX Fourier transform ion cyclotron resonance (FT-ICR)-MS. Sample solutions were directly infused via a syringe pump at a flow rate of 2.0 µL/min and ionized via electrospray with a capillary voltage of 4,500 V and end plate offset of −500 V. Ions were accumulated in the collision cell for 0.1 s for MS spectra and 1.0 s for MS/MS prior to transfer to the analyzer cell. A 1.0-ms time-of-flight delay was set before trapping ions. For MS spectra, 50 scans were coadded, while, for MS/MS, 300 scans were coadded. ECD MS/MS spectra were acquired with an electron bias of 1.0 V, lens voltage of 10.0 V, and electron pulse length of 15 ms to 20 ms. The FT-ICR time domain transients contained 2 million data points with a length of 0.8389 s, which corresponds to a resolving power of ∼77,000 at m/z 1,000.
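The quoted resolving power can be checked against the transient length. The sketch below assumes the common rule-of-thumb estimate R ≈ f_c·T/2, where f_c is the unperturbed cyclotron frequency; the text does not state which formula was used, so this choice is an assumption, and the function names are illustrative only. A singly charged ion of mass 1,000 u stands in for m/z 1,000.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
DALTON = 1.66053906660e-27   # unified atomic mass unit, kg

def cyclotron_frequency_hz(mz, b_tesla):
    """Unperturbed cyclotron frequency f_c = e*B / (2*pi*(m/z))."""
    return E_CHARGE * b_tesla / (2 * math.pi * mz * DALTON)

def resolving_power_estimate(mz, b_tesla, transient_s):
    """Rule-of-thumb estimate R ~ f_c * T / 2 (assumed here, not stated in the text)."""
    return cyclotron_frequency_hz(mz, b_tesla) * transient_s / 2

# Values from the text: 12 T magnet, 0.8389 s transient, m/z 1,000.
f_c = cyclotron_frequency_hz(1000.0, 12.0)
R = resolving_power_estimate(1000.0, 12.0, 0.8389)
print(f"f_c = {f_c:.0f} Hz, estimated resolving power = {R:.0f}")
```

Under this approximation the estimate comes out close to the stated ∼77,000, which suggests the quoted figure follows directly from the magnetic field strength and the transient length.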
For bottom-up confirmation of selected datasets, protein digestion was performed using sequencing grade trypsin (Promega). A 1 mg/mL hCEACAM1 solution was mixed with trypsin at a 50:1 protein-to-enzyme ratio (wt/wt) and incubated at 37 °C overnight. The resulting digest solution was diluted 100-fold with 50/50/0.1 (vol/vol/vol) methanol–water–formic acid solution and infused directly into the mass spectrometer. Identical instrument settings were used as above, with the exception that, for ECD MS/MS experiments, the glycopeptide precursor ions were irradiated with electrons for 100 ms, and 200 scans were averaged.
Tables of all intact mass distributions and ECD MS/MS fragment ion assignments are included as SI Appendix.
Assignment of Glycan Occupancy.
The proportion of monoglycosylation, diglycosylation, and triglycosylation was determined from the intensities of the corresponding peaks in the mass spectrum of the intact protein (e.g., SI Appendix, Fig. S1 and Table S1), or the glycopeptide peaks in the mass spectrum of the tryptic digest (SI Appendix, Fig. S2 and Table S16). Only a single glycoform is possible for the triglycosylated protein, [111], and its proportion is obtained as the ratio of its abundance in the mass spectrum over the sum of all glycoforms. Three glycoforms each are possible for the monoglycosylated ([100], [010], [001]) and diglycosylated proteoforms ([110], [101], [011]). The proportion of each glycoform for the monoglycosylated and diglycosylated protein was calculated from the c- and z-fragment ions in the ECD MS/MS spectra of the intact protein (top-down; e.g., SI Appendix, Table S2) or the glycopeptides (bottom-up; e.g., SI Appendix, Table S17). Only fragments in the glycosylated portion of the protein sequence which distinguish sites of modification were used for calculating the proportion of each glycoform, specifically c75 to c85, and z27 to z37 sequence ions for top-down data, and c6 to c16 and z12 to z22 for the bottom-up data. These diagnostic ions are shown in bold and italics in SI Appendix, Tables S2, S3, S5, S7, S9, S11, S12, S14, S15, S17, S18, S20, S21, S23, S24, S26, and S27. The proportion of [100] is calculated using the abundances of c ions that contain the N104 residue, but no other potential glycosite. The ratio of this subset of c ions that contain a glycan over all fragments in this range gives the proportion of [100]. The proportion of [001] is determined in similar fashion using z ions that contain N115, but no other potential glycosite. 
The proportion of [010] cannot be determined directly, but can be inferred from calculating the fraction of ions that have a glycan at either N104 or N111 (using c82 to c85), and subtracting the contribution of [100], as determined above. The diglycosylated proteoforms, [110], [101], and [011], are analyzed in a similar fashion by using intensities of the diagnostic peaks that contain an unglycosylated NXS or NXT site. The proteoform ratios are multiplied by the glycan proportions from the mass spectra of the intact protein to yield the overall proportions shown in Table 1.
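The arithmetic for the monoglycosylated glycoforms can be sketched as follows. This is a minimal illustration of the ratios described above, not the authors' processing script; the function names and all ion abundances are hypothetical.

```python
def glycosylated_fraction(glyco, unglyco):
    """Share of summed fragment-ion abundance that carries the glycan."""
    return sum(glyco) / (sum(glyco) + sum(unglyco))

def mono_glycoforms(c_n104_only, c_n104_n111, z_n115_only, mono_percent):
    """Split the monoglycosylated percentage among [100], [010], and [001].

    Each ion argument is a pair: (glycosylated abundances, unglycosylated abundances).
    """
    p100 = glycosylated_fraction(*c_n104_only)           # c ions covering N104 only
    p001 = glycosylated_fraction(*z_n115_only)           # z ions covering N115 only
    p010 = glycosylated_fraction(*c_n104_n111) - p100    # N104-or-N111 minus [100]
    return {"[100]": p100 * mono_percent,
            "[010]": p010 * mono_percent,
            "[001]": p001 * mono_percent}

# Hypothetical abundances, chosen only to make the arithmetic easy to follow.
result = mono_glycoforms(
    c_n104_only=([20.0, 20.0], [80.0, 80.0]),   # 20% of signal glycosylated
    c_n104_n111=([50.0, 50.0], [50.0, 50.0]),   # 50% glycosylated at N104 or N111
    z_n115_only=([50.0], [50.0]),               # 50% glycosylated at N115
    mono_percent=40.0,                          # monoglycosylated share of intact protein
)
print(result)
```

With these made-up inputs the monoglycosylated 40% splits into roughly 8%, 12%, and 20% for [100], [010], and [001]. Because [010] is obtained by subtraction, its uncertainty combines the uncertainties of both c-ion ratios.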
Kinetic Modeling.
The kinetic model presented in Fig. 1B is recapitulated in the differential equations below, with the exception that concentrations have been replaced by percentages (Rijk), as our MS data report only ratios of species. These equations can be solved numerically, using an evolutionary approach (genetic algorithm) to find optimal rate constants that satisfy the rate equations. This approach was computationally slow, but yielded an initial set of rate constants for which computer simulations of protein expression versus time show that the proteoform concentrations reach steady state, as shown in SI Appendix, Fig. S4. If we assume steady state to dominate production of the species measured, the differentials can be set to zero and the equations rearranged to a set of linear equations in which the 12 kinetic constants of Fig. 1B plus a common constant for export from the ER (kout) are variables. This approach for solving the 20 simultaneous equations proved to be much more rapid than the numerical approach, and was applied to the quadruplicate sets of data to yield the relative rate constants and SDs in Fig. 3. The value of kin is set to an arbitrary number (100), as only ratios of species are relevant. Percentages of unobserved, but allowed, species were set to the limit of detection (0.5) to avoid trivial solutions. The kin values in the mutant equations must be treated as additional variables, as the equations describe different processes, resulting in a total of 16 variables. The resulting set of equations is readily expressed in vector notation (b = k * A), where b contains kin rates, k all other rates, and A percentages of species. The vector equation is solved in a MATLAB script (SI Appendix) using the least-squares function (lsqr). The function returns the best-fit kinetic constants (k) and a residual that is minimized to determine a best set of kin for the mutants. 
Solutions were obtained for the four biological replicates, two of which included an average of two technical replicates. Averages and SDs were calculated for each of the variables, and the average residual was included in an estimated error for the total flow out of the system.
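The steady-state linearization can be illustrated with a small synthetic example. This is a sketch under stated assumptions, not the authors' MATLAB script: it uses NumPy's `numpy.linalg.lstsq` in place of `lsqr`, it assumes each mutant shares the wild-type rate constants for the steps that remain possible, and it fits the three mutant kin values inside the same linear solve rather than by a separate residual minimization. The rate constants are random made-up numbers, and all function names are hypothetical.

```python
import itertools
import numpy as np

WT = (0, 1, 2)                      # sites N104, N111, N115 all available
MUTANTS = [(1, 2), (0, 2), (0, 1)]  # sites left in N104Q, N111Q, N115Q

def transitions(allowed):
    """Species (occupancy tuples) and single-glycan additions on the allowed sites."""
    species = [s for s in itertools.product((0, 1), repeat=3)
               if all(s[i] == 0 for i in range(3) if i not in allowed)]
    steps = [(s, s[:i] + (1,) + s[i + 1:]) for s in species for i in allowed if s[i] == 0]
    return species, steps

def steady_state(k, kout, kin, allowed):
    """Forward model: steady-state amount of each species for given rate constants."""
    species, steps = transitions(allowed)
    R = {}
    for s in sorted(species, key=sum):   # precursors are filled in before products
        inflow = kin if sum(s) == 0 else sum(k[(p, d)] * R[p] for p, d in steps if d == s)
        R[s] = inflow / (kout + sum(k[(p, d)] for p, d in steps if p == s))
    return R

def build_system(percent, kin_wt=100.0):
    """Steady-state balances written as A @ x = b, linear in the unknown rates x."""
    _, wt_steps = transitions(WT)
    col = {step: j for j, step in enumerate(wt_steps)}   # 12 glycosylation rates
    col["kout"] = len(col)                               # common export rate
    for m in MUTANTS:
        col[m] = len(col)                                # one unknown kin per mutant
    A, b = [], []
    for allowed, R in percent.items():
        species, steps = transitions(allowed)
        for s in species:
            row = np.zeros(len(col))
            row[col["kout"]] = R[s]
            for p, d in steps:
                if p == s:
                    row[col[(p, d)]] += R[s]             # loss from s
                if d == s:
                    row[col[(p, d)]] -= R[p]             # gain into s
            rhs = 0.0
            if sum(s) == 0:
                if allowed == WT:
                    rhs = kin_wt                         # fixed, sets the overall scale
                else:
                    row[col[allowed]] = -1.0             # mutant kin is fitted
            A.append(row)
            b.append(rhs)
    return np.array(A), np.array(b), col

# Synthetic check with made-up rate constants (not values from the paper).
rng = np.random.default_rng(0)
_, wt_steps = transitions(WT)
k_true = {step: rng.uniform(0.2, 3.0) for step in wt_steps}
percent = {a: steady_state(k_true, kout=1.0, kin=100.0, allowed=a) for a in [WT] + MUTANTS}

A, b, col = build_system(percent)
x, *_ = np.linalg.lstsq(A, b, rcond=None)
k_fit = {step: x[col[step]] for step in wt_steps}
print(A.shape, max(abs(k_fit[s] - k_true[s]) for s in wt_steps))
```

The system has 20 rows (8 wild-type species plus 4 for each of the three mutants) and 16 columns, matching the counts given in the text. With noise-free synthetic percentages the fit recovers the input constants; with measured data the residual would instead reflect experimental error and any departure from the steady-state assumption.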
Molecular Modeling.
Using UCSF Chimera (30), a peptide containing the amino acid sequence SGRETIYPNASLLIQNVTQNDT from CEACAM1 was generated. The residues of the first N-glycosylation consensus sequence were superimposed onto chain K of the cryo-EM structure of OST-B (PDB 6S7T). Phi and psi torsion angles for the superimposed residues were adjusted to match those found in the reference structure. Additional residues were adjusted to remove clashes between the CEACAM1 peptide and OST-B. A Glc3Man9GlcNAc2 glycan model was generated using the Glycam webserver (31) and covalently bonded to the Asn residue of the second glycosylation site. The position of the glycan and the Asn chi1 torsion angle were adjusted to remove clashes between the glycan and OST-B. Additional models were generated using the same process but aligning the peptide to the catalytic site of OST-B via the third glycosylation site and attaching the glycan to the first and second glycosylation sites.
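The first, second, and third glycosylation sites in the modeled peptide can be located with a short sequon search. The sketch below applies the N–X–S/T consensus with the usual exclusion of proline at X. The starting residue number (96) is an assumption chosen so that the first sequon Asn lands on N104, and the function name is illustrative.

```python
import re

PEPTIDE = "SGRETIYPNASLLIQNVTQNDT"   # sequence given in the text
FIRST_RESIDUE = 96                    # assumed residue number of the leading Ser

# N-X-S/T with X != P; the lookahead keeps overlapping sequons from being skipped.
SEQUON = re.compile(r"N(?=[^P][ST])")

def sequon_sites(sequence, first_residue):
    """Return residue numbers of Asn residues that start an N-X-S/T sequon."""
    return [first_residue + m.start() for m in SEQUON.finditer(sequence)]

sites = sequon_sites(PEPTIDE, FIRST_RESIDUE)
print(sites)  # [104, 111, 115]
```

With that offset, the second and third matches fall on N111 and N115, consistent with the three mutants described under Protein Expression and Purification. The sites are separated by only seven and four residues.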
Equations.
hCEACAM1 WT.
where Ri represents the percentage of the ith species relative to the sum of all eight wild-type species.
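As a sketch of the form such rate equations take for the model described under Kinetic Modeling, the balance for any species s and three representative wild-type cases can be written as below. The subscripted labels k_{a→b} are notation assumed here for illustration and may differ from the labels used in Fig. 1B; p runs over precursors of s and d over its products.

```latex
\begin{aligned}
\frac{dR_{s}}{dt} &= k_{in}\,\delta_{s,000} + \sum_{p \to s} k_{p \to s} R_{p} - \Big(k_{out} + \sum_{s \to d} k_{s \to d}\Big) R_{s} \\
\frac{dR_{000}}{dt} &= k_{in} - \big(k_{000 \to 100} + k_{000 \to 010} + k_{000 \to 001} + k_{out}\big) R_{000} \\
\frac{dR_{100}}{dt} &= k_{000 \to 100} R_{000} - \big(k_{100 \to 110} + k_{100 \to 101} + k_{out}\big) R_{100} \\
\frac{dR_{111}}{dt} &= k_{110 \to 111} R_{110} + k_{101 \to 111} R_{101} + k_{011 \to 111} R_{011} - k_{out} R_{111}
\end{aligned}
```

Setting each derivative to zero gives balances that are linear in the rate constants once the R values are measured. Each mutant system follows the same pattern restricted to the four species that lack a glycan at the mutated site.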
hCEACAM1 N104Q.
where Ri represents the percentage of the ith species relative to the sum of all four N104Q species.
hCEACAM1 N111Q.
where Ri represents the percentage of the ith species relative to the sum of all four N111Q species.
hCEACAM1 N115Q.
where Ri represents the percentage of the ith species relative to the sum of all four N115Q species.
Acknowledgments
This work was supported by NIH Grants R01 GM033225 (J.H.P., K.W.M., and I.J.A.) and R35 GM131829 (L.C.). Protein and peptide tandem MS measurements on the Bruker SolariX were supported by NIH Grant S10 OD025118.
Footnotes
The authors declare no competing interest.
This article is a PNAS Direct Submission.
This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.2202992119/-/DCSupplemental.
Data, Materials, and Software Availability
The mzML data have been deposited in Open Science Framework (https://osf.io/8pwm6) (32). All other study data are included in the article and/or SI Appendix.
References
- 1. Apweiler R., Hermjakob H., Sharon N., On the frequency of protein glycosylation, as deduced from analysis of the SWISS-PROT database. Biochim. Biophys. Acta 1473, 4–8 (1999).
- 2. Hart G. W., Housley M. P., Slawson C., Cycling of O-linked β-N-acetylglucosamine on nucleocytoplasmic proteins. Nature 446, 1017–1022 (2007).
- 3. Stanley P., Taniguchi N., Aebi M., “N-Glycans” in Essentials of Glycobiology, Varki A., et al., Eds. (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 2017), chap. 9.
- 4. Contessa J. N., Bhojani M. S., Freeze H. H., Rehemtulla A., Lawrence T. S., Inhibition of N-linked glycosylation disrupts receptor tyrosine kinase signaling in tumor cells. Cancer Res. 68, 3803–3809 (2008).
- 5. Ruiz-Canada C., Kelleher D. J., Gilmore R., Cotranslational and posttranslational N-glycosylation of polypeptides by distinct mammalian OST isoforms. Cell 136, 272–283 (2009).
- 6. Mohorko E., et al., Structural basis of substrate specificity of human oligosaccharyl transferase subunit N33/Tusc3 and its role in regulating protein N-glycosylation. Structure 22, 590–601 (2014).
- 7. Cherepanova N. A., Venev S. V., Leszyk J. D., Shaffer S. A., Gilmore R., Quantitative glycoproteomics reveals new classes of STT3A- and STT3B-dependent N-glycosylation sites. J. Cell Biol. 218, 2782–2796 (2019).
- 8. Gray-Owen S. D., Blumberg R. S., CEACAM1: Contact-dependent control of immunity. Nat. Rev. Immunol. 6, 433–446 (2006).
- 9. Kuespert K., Pils S., Hauck C. R., CEACAMs: Their role in physiology and pathophysiology. Curr. Opin. Cell Biol. 18, 565–571 (2006).
- 10. Siuti N., Kelleher N. L., Decoding protein modifications using top-down mass spectrometry. Nat. Methods 4, 817–821 (2007).
- 11. Zubarev R. A., Electron-capture dissociation tandem mass spectrometry. Curr. Opin. Biotechnol. 15, 12–16 (2004).
- 12. Zubarev R. A., et al., Electron capture dissociation for structural characterization of multiply charged protein cations. Anal. Chem. 72, 563–573 (2000).
- 13. Zubarev R. A., Kelleher N. L., McLafferty F. W., Electron capture dissociation of multiply charged protein cations. A nonergodic process. J. Am. Chem. Soc. 120, 3265–3266 (1998).
- 14. Reeves P. J., Callewaert N., Contreras R., Khorana H. G., Structure and function in rhodopsin: High-level expression of rhodopsin with restricted and homogeneous N-glycosylation by a tetracycline-inducible N-acetylglucosaminyltransferase I-negative HEK293S stable mammalian cell line. Proc. Natl. Acad. Sci. U.S.A. 99, 13419–13424 (2002).
- 15. Meng L., et al., Enzymatic basis for N-glycan sialylation: Structure of rat α2,6-sialyltransferase (ST6GAL1) reveals conserved and unique features for glycan sialylation. J. Biol. Chem. 288, 34680–34698 (2013).
- 16. Rinis N., et al., Editing N-glycan site occupancy with small-molecule oligosaccharyltransferase inhibitors. Cell Chem. Biol. 25, 1231–1241.e4 (2018).
- 17. Ramírez A. S., Kowal J., Locher K. P., Cryo-electron microscopy structures of human oligosaccharyltransferase complexes OST-A and OST-B. Science 366, 1372–1375 (2019).
- 18. Huang Y.-H., et al., CEACAM1 regulates TIM-3-mediated tolerance and exhaustion. Nature 517, 386–390 (2015).
- 19. Fedarovich A., Tomberg J., Nicholas R. A., Davies C., Structure of the N-terminal domain of human CEACAM1: Binding target of the opacity proteins during invasion of Neisseria meningitidis and N. gonorrhoeae. Acta Crystallogr. D Biol. Crystallogr. 62, 971–979 (2006).
- 20. Lizak C., Gerber S., Numao S., Aebi M., Locher K. P., X-ray structure of a bacterial oligosaccharyltransferase. Nature 474, 350–355 (2011).
- 21. Napiórkowska M., Boilevin J., Darbre T., Reymond J.-L., Locher K. P., Structure of bacterial oligosaccharyltransferase PglB bound to a reactive LLO and an inhibitory peptide. Sci. Rep. 8, 16297 (2018).
- 22. Napiórkowska M., et al., Molecular basis of lipid-linked oligosaccharide recognition and processing by bacterial oligosaccharyltransferase. Nat. Struct. Mol. Biol. 24, 1100–1106 (2017).
- 23. Aebi M., N-linked protein glycosylation in the ER. Biochim. Biophys. Acta Mol. Cell Res. 1833, 2430–2437 (2013).
- 24. Shrimal S., Gilmore R., Glycosylation of closely spaced acceptor sites in human glycoproteins. J. Cell Sci. 126, 5513–5523 (2013).
- 25. Malaby H. L. H., Kobertz W. R., The middle X residue influences cotranslational N-glycosylation consensus site skipping. Biochemistry 53, 4884–4893 (2014).
- 26. Qi L., Tsai B., Arvan P., New insights into the physiological role of endoplasmic reticulum-associated degradation. Trends Cell Biol. 27, 430–440 (2017).
- 27. Zhuo Y., Yang J. Y., Moremen K. W., Prestegard J. H., Glycosylation alters dimerization properties of a cell-surface signaling protein, carcinoembryonic antigen-related cell adhesion molecule 1 (CEACAM1). J. Biol. Chem. 291, 20085–20095 (2016).
- 28. Zhao P., et al., Virus-receptor interactions of glycosylated SARS-CoV-2 spike and human ACE2 receptor. Cell Host Microbe 28, 586–601.e6 (2020).
- 29. Moremen K. W., et al., Expression system for structural and functional studies of human glycosylation enzymes. Nat. Chem. Biol. 14, 156–162 (2018).
- 30. Pettersen E. F., et al., UCSF Chimera—A visualization system for exploratory research and analysis. J. Comput. Chem. 25, 1605–1612 (2004).
- 31. Woods Group, (2005-2022), GLYCAM Web. https://www.glycam.org. Accessed 15 December 2021.
- 32. Amster I. J. (2022), CEACAM Top Down MS Data. Open Science Framework. osf.io/8pwm6. Deposited 5 October 2022.