Significance
In our study, we describe an initial in vivo map of the O-glycoproteome and a practical approach to quantitatively map O-glycosites in complex samples such as tissue extracts and biological fluids. This represents an essential initial step toward revealing structure–function relationships of O-glycosylation in both normal physiology and disease. We illustrate the value of this approach by mapping a set of proteins in mouse liver that contain GalNAc-T2-specific O-glycosites. The identification of these proteins provides a mechanistic framework to explain the clinical presentation of the congenital glycosylation disorder of GALNT2 (which encodes GalNAc-T2).
Keywords: O-GalNAc, site-specific, glycoproteome, mouse tissues, metabolic disorder
Abstract
The family of GalNAc-Ts (GalNAc polypeptide:N-Acetylgalactosaminyl transferases) catalyzes the first committed step in the synthesis of O-glycans, which is an abundant and biologically important protein modification. Abnormalities in the activity of individual GalNAc-Ts can result in congenital disorders of O-glycosylation (CDG) and influence a broad array of biological functions. How site-specific O-glycans regulate biology is unclear. Compiling in vivo O-glycosites would be an invaluable step in determining the function of site-specific O-glycans. We integrated chemical and enzymatic conditions that cleave O-glycosites, a higher-energy dissociation product ions-triggered electron-transfer/higher-energy collision dissociation mass spectrometry (MS) workflow, and software to study nine mouse tissues and whole blood. We identified 2,154 O-glycosites from 595 glycoproteins. The O-glycosites and glycoproteins displayed consensus motifs and shared functions as classified by Gene Ontology terms. Limited overlap of O-glycosites was observed with protein O-GlcNAcylation and phosphorylation sites. Quantitative glycoproteomics and proteomics revealed a tissue-specific regulation of O-glycosites, to which the differential expression of Galnt isoenzymes in tissues partly contributes. We examined the Galnt2-null mouse model, which phenocopies the congenital disorder of glycosylation involving GALNT2, and revealed a network of glycoproteins that lack GalNAc-T2-specific O-glycans. The known direct and indirect functions of these glycoproteins appear consistent with the complex metabolic phenotypes observed in the Galnt2-null animals. Through this study and interrogation of databases and the literature, we have compiled an atlas of experimentally identified mouse O-glycosites consisting of 2,925 O-glycosites from 758 glycoproteins.
O-glycosylation is an abundant protein posttranslational modification (PTM) that influences protein structure, stability, and function. A family of Golgi-resident uridine diphosphate N-acetylgalactosamine (UDP-GalNAc) polypeptide:N-Acetylgalactosaminyl transferases (GalNAc-Ts) catalyzes the first committed step in the biosynthesis of O-glycans, in which GalNAc is transferred to Ser/Thr (and to a lesser extent Tyr) residues of protein substrates (1). Despite technological advancements in studying O-glycosylated proteins, a number of features of O-glycosylation sites (O-glycosites) remain elusive. For example, O-glycosite motifs have been predicted through the use of algorithms (2, 3), but there is only limited corroboration by direct mapping of in vivo O-glycosites. Further, the functional characteristics [as cataloged by Gene Ontology (GO) terms] of O-glycosites have been experimentally determined in only a few instances (4, 5). To delineate the structure–function significance of O-GalNAc glycosylation broadly, it is essential but technically challenging to qualify and quantify the proteins carrying O-GalNAc glycosylation, assign the precise position of each of the O-glycosites, and determine the O-glycan structures found at each site. The difficulty lies in the exquisite substrate specificities of the individual GalNAc-Ts (6) and the myriad of O-glycan structures (7) that decorate proteins. The families of mouse Galnts and human GALNTs consist of 19 and 20 members, respectively, which have overlapping or unique substrate specificities (1, 6, 8). The repertoire of Galnt isoenzymes expressed in each cell/tissue and the physiological modulation of their activities regulate protein O-glycosylation.
Since not all threonines and serines are decorated with O-glycans, rules likely exist to specify which amino acid acquires a sugar. Unlike protein N-linked glycosylation with a consensus motif NXS/T, X ≠ proline (9), O-glycosylation lacks a simple motif. However, specific patterns such as the presence of proline residues at the −1, +1, or +3 position relative to serine/threonine residues begin to offer clues about the “rules” that inform the acquisition of O-glycosites (10, 11). Available databases such as UniProt, an O-glycoprotein database (http://www.oglyp.org/), and GlyGen (https://www.glygen.org/) report only 43 O-glycosites from 20 mouse glycoproteins and 15 publications (12–14). Further manual curation of O-glycosites identified in the literature (15–25) raised the number to 581 O-glycosites from 255 glycoproteins (Dataset S1). The incomplete knowledge about O-glycosites in vivo makes the study of the structure–function relation of O-glycoproteins in a complex biosystem a daunting task. Therefore, we sought to establish practical conditions to qualify (the identification of O-glycosites) and quantify (the determination of the relative abundance of the O-glycosites) the site-specific O-glycoproteome in complex tissue extracts or biological fluids. This represents an important step toward defining the functional significance of O-glycosylation in health and disease.
Robust identification of O-glycosites has been reported using genetically engineered cells termed SimpleCell (26). For adequate enrichment and subsequent mapping of O-glycopeptides, complex O-glycans were reduced to a single O-GalNAc-Ser/Thr (Tn antigen) by mutating COSMC (C1GALT1C1), the enzyme responsible for extending O-GalNAc to the T antigen, Gal-GalNAc-Ser/Thr (26). This approach is not practical for examining O-glycosites at the organismal level, as COSMC is an essential gene for mouse development (27). Recently, an elegant GalNAc-T isoenzyme-specific strategy termed GalNAc-T Bump-and-Hole engineering has been described (28–30). This approach exploits structurally modified GalNAc-T isoenzymes that are engineered to accommodate a specific UDP-GalNAc analog, allowing for the identification of isoform-specific glycoprotein substrates in living cells (29, 31). A complementary approach to map O-glycosites exploits a toolbox of O-glycoproteases and mucinases, including OgpA (OpeRATOR, Genovis) (32), StcE (33–36), IMPa (37), and SmE (32). The specificity of each of these proteases allows for precise localization of O-glycosites on glycoproteins and mucins. Using OgpA, we pioneered a practical methodology to pinpoint solitary O-glycosites in tissue extracts and biological fluids (5, 38). The methodology, termed site-specific extraction of O-linked glycopeptides (EXoO), uses an O-glycoprotease, OgpA, that cleaves at the N termini of O-glycosylated serines and threonines in an O-glycan-dependent way and presents the O-glycosites as the first amino acid in the identified peptide sequences. This enhances confidence in the assignment of O-glycosites (5) and was used to identify thousands of O-glycosites in human kidney tumors and serum (5).
In addition to effective methodology to isolate glycopeptides, MS instrumentation and software have proven to be very helpful in revealing O-glycosites in a sample. Following enrichment, isolated O-glycopeptides were analyzed using liquid chromatography-mass spectrometry (LC-MS) with different peptide fragmentation techniques. After optimization, higher-energy dissociation product ions-triggered electron-transfer/higher-energy collision dissociation (HCD-pd-EThcD) was employed to validate the O-glycosites generated by OgpA proteolysis (39). Once LC-MS data were collected, high-performance computational software was critical to deciphering the precise location of O-glycosites in peptide sequences. Recently published software, including MSFragger-Glyco (40), pGlyco3 (41), and O-Pair (42), employs next-generation index-search algorithms that significantly improve search speed, making it feasible to tackle the highly complex HCD-pd-EThcD datasets of O-glycopeptides. The series of stepwise innovations in O-glycoprotease-based methods, HCD-pd-EThcD optimization, and software engineering establishes a practical approach to mapping in vivo O-glycosites.
Here, we employ combinations of chemical and enzymatic tools to map 2,154 O-glycosites from 595 glycoproteins derived from the mouse brain, heart, lung, liver, spleen, kidney, colon, muscle, submandibular gland (SMG), and whole blood. Galnt2-null mice phenocopy many aspects of the human congenital disorder of glycosylation involving GALNT2 (CDG-GALNT2). Through comparison of integrated quantitative glycoproteomics and proteomics from liver samples isolated from wild-type and Galnt2-null mice, we have identified hubs of Galnt2-modified glycoproteins that may account for the lipid and metabolic dysregulation seen in CDG-GALNT2 patients. Finally, we examined databases, literature, and data from this study to compile an atlas of 2,925 experimentally identified mouse O-glycosites from 758 glycoproteins to permit future investigation of biological functions of protein O-glycosylation.
Results
An Integrated Workflow for the Qualitative and Quantitative Determination of O-Glycosites.
The determination of the “O-glycoproteome”, i.e., identifying the specific location of every O-glycan (O-glycosite) on the serine, threonine, and tyrosine residues of every protein in a cell, tissue, or organ, would provide a valuable baseline for defining the consequences of mutations and/or physiologic changes in the expression of the enzymes responsible for the acquisition of these carbohydrate side chains. To obtain an atlas of serine- and threonine-O-glycosites with high precision and confidence from mouse tissues, organs, and fluid, we integrated the EXoO method (5, 38), HCD-pd-EThcD mass spectrometry (39), and database software packages (40–42) to develop a qualitative and quantitative analysis workflow of O-glycosites (Fig. 1). The EXoO method starts with the preparation of a tryptic digest of the sample. Glycopeptides are enriched with HyperSep™ Retain AX cartridges, and the peptide N termini are subsequently conjugated to a solid phase. Finally, we exploited the specificity of OgpA (OpeRATOR, Genovis), which digests peptides carrying O-glycans, yielding glycopeptides with the O-glycosites at the first N-terminal amino acid position of glycopeptide sequences (38). The EXoO procedure also has the additional benefit of avoiding contamination of our samples with peptides derived from abundant blood proteins such as albumin, which lacks O-glycosylation. Next, glycopeptides were fragmented by an optimized HCD-pd-EThcD LC-MS method that produces diagnostic oxonium ions, which facilitate the identification of the glycopeptide and confirm the location of the O-glycan tags within the glycopeptide sequences. The resulting LC-MS data were analyzed using complementary software, including MSFragger-Glyco, pGlyco3, and O-Pair (40–42). This integrated workflow permits qualitative and quantitative analysis of thousands of O-glycosites.
Fig. 1.
Qualitative and quantitative mapping of O-glycosites in tissue extracts and blood by integrating EXoO, HCD-pd-EThcD, and software. The proteins extracted from samples were processed using the EXoO protocol to generate glycopeptides that were LC-MS fragmented with HCD-pd-EThcD and identified using MSFragger-Glyco, pGlyco3, and O-Pair.
During data analysis, we considered whether oxonium ions could be contributed by nearby N-glycans, thereby affecting the identification of O-glycosites. We found that 10 out of 33,347 (0.03%) assigned peptide-spectrum matches (PSMs) from Dataset S2 contain peptide sequences with an NXS/T motif where EThcD mapped N-glycans at the N-linked glycosylation sites (SI Appendix, Table S1). However, the data show that EThcD mapped the O-glycosites at the first Ser or Thr residues of these peptide sequences (SI Appendix, Table S1). Therefore, the N-glycans did not appear to affect the O-glycosite mapping in the study. In addition, we integrated quantitative proteomics into the glycoproteomics workflow to determine whether a change in the detected abundance of an O-glycosite was due to a change in protein abundance or to a change in O-glycosite abundance without alteration of protein abundance.
Precision Mapping of 2,154 O-glycosites from Mouse Tissue Extracts and Blood.
Mouse tissues and blood from wild-type mice were analyzed to catalog O-glycosites employing quantitative label-free and TMTpro 16plex-labeled glycoproteomics. We used label-free glycoproteomics to reach a better depth of O-glycosite identification. Brain, heart, lung, liver, spleen, kidney, colon, muscle, SMG, and blood from six mice (three males and three females, 1 y old) were pooled for each tissue. The O-glycopeptides were extracted, fractionated by off-line basic reversed-phase fractionation into 24 fractions, and analyzed by LC-MS with HCD-pd-EThcD. We used MSFragger-Glyco, pGlyco3, and O-Pair to search the 240 LC-MS data files against a UniProt mouse protein database. We analyzed the data using MSFragger-Glyco on a high-performance computing environment, BioWulf (HPC) at the NIH, and pGlyco3 on a PC. Because we were unable to complete a whole database search using O-Pair on the HPC in a reasonable length of time, we searched against a smaller protein database of glycoproteins identified by MSFragger-Glyco and pGlyco3. Comparison of the software search results revealed a similar number of glycopeptide identifications, with 4,139, 4,242, and 4,330 for MSFragger-Glyco, pGlyco3, and O-Pair, respectively (Fig. 2A). A significant number of glycopeptide sequences overlapped (Fig. 2A). At the same time, each software program also identified unique glycopeptide sequences (Fig. 2A). O-Pair identified more than twice as many glycan compositions as the other two software packages (Fig. 2D). We merged the database search results at the PSM level for those glycopeptide sequences that matched in all three software packages employed. A total of 38,314 PSMs were assigned to identical glycopeptide sequences by all three software programs. EThcD confirmed approximately 87.0% of them to have the O-glycosites at the first amino acid position of glycopeptide sequences (Fig. 2E).
The PSMs assigned to identical glycopeptide sequences by all three software programs, along with EThcD confirmation of the O-glycosites, were used as high-confidence data of O-glycosites for downstream analysis (Dataset S2). In total, we mapped 2,154 O-glycosites, 2,834 glycopeptide sequences, 38 different glycan compositions (Panel after Fig. 2), and 4,020 glycopeptides from 595 glycoproteins (Fig. 2). The difference between the numbers of O-glycosites and glycopeptide sequences might result from partial digestion or peptide truncation, which assigns multiple glycopeptide sequences to a single glycosite. For instance, TLGSLLPDTVLSSPLSHR, TLGSLLPDTVLS, TLGSLLPDTVL, and TLGSLLPDTVLSSPL mapped to a glycosite at Vwf T743. The EThcD results from O-Pair and pGlyco3 agreed closely (99.5% of PSMs) on the location and number of O-glycosites on glycopeptide sequences (Fig. 2E). Most of the glycopeptides (91.5% of PSMs) had one O-glycosite (Fig. 2E). H1N1 (H = Hexose and N = HexNAc), most likely Gal-GalNAc, was the primary structure, accounting for 78.4% of the PSMs (Fig. 2F). Five glycan compositions, H1N1, H2N2, N1, H3N3, and H1N2, accounted for more than 97% of the total number of PSMs (Fig. 2F). The glycan compositions are an aggregate glycan mass on the glycopeptides. For instance, the glycan composition H2N2 can be either two H1N1 glycans or a single H2N2 glycan attached to the glycopeptide. Because of the specificity of OgpA, which is reported to be unable to digest core-2 glycan structures (39, 43, 44), we assumed that H2N2 would likely represent two H1N1 on the same glycopeptide, with an H1N1 at the N-terminal Ser/Thr residue and the second H1N1 in the middle of the glycopeptide. To test this assumption, we manually annotated a glycopeptide, T(H2N2)IGVANVEAQPFEHS, with an EThcD glycosite localization score of 0.86 by pGlyco3 and 1 by O-Pair, suggesting high confidence of glycosite localization (SI Appendix, Fig. S1).
The ratio of oxonium ions 138 m/z and 144 m/z is 13.29, which is >2, suggesting the presence of GlcNAc in this glycan structure (SI Appendix, Fig. S1). For a second annotated glycopeptide, T(H2N2)HGPQFLLP, there could be only one O-glycosite, at the N-terminal Thr residue (SI Appendix, Fig. S1). We were unable to dissect the exact glycan structure of these glycopeptides with the current information. However, the observation of a single H2N2 on a glycopeptide suggests that the specificity of OgpA is not entirely clear, and new approaches will need to be developed for glycan structure analysis after digestion with O-glycoproteases and mucinases. The analysis established the integration of EXoO, HCD-pd-EThcD, and software packages for precise mapping of O-glycosites with multilevel analysis to ensure high confidence in O-glycosite localization.
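The oxonium-ion check described above can be written as a small rule. The following is only a minimal sketch: the function name `classify_hexnac` and the handling of missing ions are assumptions made for illustration, and the threshold of 2 is simply the ">2" reading stated in the text, not a validated cutoff.

```python
def classify_hexnac(intensity_138, intensity_144, threshold=2.0):
    """Label a HexNAc as GlcNAc-like or GalNAc-like from two oxonium ion intensities.

    threshold=2.0 follows the ">2 suggests GlcNAc" reading in the text; treating
    every other ratio as GalNAc-like is a simplifying assumption of this sketch.
    """
    if intensity_138 < 0 or intensity_144 < 0:
        raise ValueError("intensities must be non-negative")
    if intensity_138 == 0 and intensity_144 == 0:
        return "undetermined"  # neither diagnostic ion was observed
    if intensity_144 == 0:
        return "GlcNAc-like"  # ratio is unbounded, so it exceeds any threshold
    ratio = intensity_138 / intensity_144
    return "GlcNAc-like" if ratio > threshold else "GalNAc-like"


# Intensities chosen so the ratio equals the 13.29 value quoted in the text.
print(classify_hexnac(13.29, 1.0))
```

In practice the ratio would be taken from the search software's report rather than recomputed by hand.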
Fig. 2.
Precise mapping of O-glycosites from mouse tissue extracts and blood using MSFragger-Glyco, pGlyco3, and O-Pair. Euler diagrams of glycopeptide sequences without glycan information (A), glycopeptides with glycan information (B), glycoproteins (C), and glycan compositions (D) identified by MSFragger-Glyco, pGlyco3, and O-Pair software packages, respectively. (E) The identification and glycosite mapping after merging data from all software packages using MS/MS scan numbers. PSMs for which EThcD confirmed the O-glycosite at the first amino acid position of the glycopeptide sequence were used for downstream analysis. The software packages agreed on the location and number of O-glycosites on the glycopeptides. (F) Bar chart showing the primary glycan compositions of identified glycopeptides.
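The merging step summarized in panel E (keep a PSM only when all three programs assign the same glycopeptide sequence to the same MS/MS scan, then require EThcD localization at the first residue) can be sketched as follows. The data layout, function names, and scan identifiers are hypothetical; the actual output formats of MSFragger-Glyco, pGlyco3, and O-Pair differ and would need to be parsed first.

```python
def consensus_psms(results):
    """Return {scan_id: sequence} for scans on which every tool agrees.

    results: dict mapping a tool name to a dict of {scan_id: peptide_sequence}.
    """
    tools = list(results)
    shared_scans = set(results[tools[0]])
    for tool in tools[1:]:
        shared_scans &= set(results[tool])
    agreed = {}
    for scan in shared_scans:
        sequences = {results[tool][scan] for tool in tools}
        if len(sequences) == 1:  # all tools report the identical sequence
            agreed[scan] = sequences.pop()
    return agreed


def high_confidence(agreed, ethcd_site_index):
    """Keep consensus PSMs whose EThcD-localized site is the first residue (index 0)."""
    return {scan: seq for scan, seq in agreed.items()
            if ethcd_site_index.get(scan) == 0}


# Toy input; scan IDs are invented, sequences are the Vwf examples from the text.
toy = {
    "MSFragger-Glyco": {"s1": "TLGSLLPDTVLS", "s2": "TLGSLLPDTVL", "s3": "TLGSLLPDTVLSSPL"},
    "pGlyco3": {"s1": "TLGSLLPDTVLS", "s2": "TLGSLLPDTVLS", "s3": "TLGSLLPDTVLSSPL"},
    "O-Pair": {"s1": "TLGSLLPDTVLS", "s3": "TLGSLLPDTVLSSPL"},
}
agreed = consensus_psms(toy)
confident = high_confidence(agreed, {"s1": 0, "s3": 8})
print(sorted(agreed), sorted(confident))
```

Here scan `s2` is dropped because the tools disagree (and one did not report it), and `s3` is dropped at the second step because its localized site is not the N-terminal residue.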
Consensus Motif Analysis of Amino Acids Flanking O-Glycosites.
To reveal the consensus motif of flanking amino acids, 2,150 O-glycosites were analyzed. Four O-glycosites were removed because sites close to the protein N or C termini could not be included by the motif analysis software. Thr and Ser accounted for approximately 70.1% and 29.9% of O-glycosites, respectively (Fig. 3A). Pro residues were the most prevalent amino acid residues flanking a single O-glycosite, scanning positions +7 (C-terminal from the site of the O-glycan) through −7 (N-terminal from the site of the O-glycan) (Fig. 3A). Additionally, Pro residues at +3 and −1 were present in approximately 30.7% and 22.2% of O-glycosites, respectively (Fig. 3A). Pro residues at positions other than +3 and −1 were present in at least 11.0% of the O-glycosites (Fig. 3A). Other than Pro residues, Ser, Thr, Glu, and Ala residues were also found in abundance (Fig. 3A). The consensus motifs for O-glycosites at Ser and Thr were similar (Fig. 3A). We then compared the experimentally identified O-glycosites to O-glycosites predicted by NetOGlyc4.0 (4). We observed that 1,739 (approximately 80.7%) of the experimentally identified O-glycosites were predicted by NetOGlyc4.0 (Fig. 3B). The remaining 415 experimentally detected O-glycosites (19.3% of the total sites found) were not predicted (Fig. 3B). NetOGlyc4.0 predicted 22,608 O-glycosites from 68,802 Ser and Thr residues on the 595 glycoproteins identified experimentally.
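The flanking-residue tally behind this kind of motif analysis can be illustrated with a short sketch. It is not the motif software used in the study; the function names are hypothetical, and the rule of discarding sites whose ±7 window runs past a protein terminus is an assumption that mirrors the four sites excluded above.

```python
def flanking_window(sequence, position, flank=7):
    """Return the residues from -flank to +flank around a 1-based site position.

    Returns None when the window would extend past either protein terminus.
    """
    index = position - 1
    if index - flank < 0 or index + flank >= len(sequence):
        return None
    return sequence[index - flank:index + flank + 1]


def residue_fraction(windows, offset, residue, flank=7):
    """Fraction of windows that carry `residue` at `offset` relative to the site."""
    if not windows:
        return 0.0
    hits = sum(1 for window in windows if window[flank + offset] == residue)
    return hits / len(windows)


# Toy 15-residue sequence (invented): Thr at position 8 with Pro at -1 and +3.
toy_sequence = "GGGGGGPTGGPGGGG"
window = flanking_window(toy_sequence, 8)
print(window, residue_fraction([window], 3, "P"), residue_fraction([window], -1, "P"))
print(flanking_window(toy_sequence, 2))  # too close to the N terminus
```

Applied to all retained sites, `residue_fraction(windows, 3, "P")` would correspond to the share of O-glycosites with Pro at +3.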
Fig. 3.
Characterization of O-glycosites and glycoproteins. (A) Consensus motif analysis of O-glycosites. (B) Venn diagram showing the overlap between experimentally identified O-glycosites and those predicted by NetOGlyc4.0 on the 595 identified glycoproteins. (C) Presence of amino acid residues next to the O-glycosites. (D) GO terms analysis of glycoproteins.
Enzyme specificity limits our ability to find certain flanking sequences of O-glycosites. IMPa, an enzyme with similar O-glycoprotease activity to OgpA, cannot cleave O-glycosites with an Asp residue at the −1 position (37). In addition, crystal structure analysis of OgpA shows the involvement of amino acid residues next to the O-glycosites in the OgpA-ligand interaction (43). We analyzed the amino acid residues next to the O-glycosites to determine whether any amino acid might block OgpA cleavage. All amino acid residues were found next to the O-glycosites identified with OgpA cleavage (Fig. 3C). Besides Pro residues, hydrophobic Ala and polar Ser and Thr residues adjacent to the O-glycosites, and acidic Glu residues at the +1 position, were favorable (Fig. 3C). By contrast, Cys residues at the −1 position were the least frequent, possibly due to the function of Cys residues in disulfide bond formation (Fig. 3C). The consensus motifs for seven other O-glycoproteases lacked the prevalent Pro residues and were markedly different from the consensus motif of all O-glycosites we identified (36) (Fig. 3A). The unique specificity of different O-glycoproteases may introduce a bias toward a subgroup of O-glycosites.
GO Analysis of Glycoproteins.
We next conducted a GO analysis of the 595 glycoproteins (Fig. 3D). As expected, we found extracellular matrix constituents involved in the binding of collagen, glycosaminoglycan, growth factors, and cell adhesion molecules (Fig. 3D). Peptidase and receptor regulator activities (proteins that bind to and/or modulate the activity of a receptor) were also enriched within the glycoproteins identified (Fig. 3D). Biological process analysis reveals a broad spectrum of predicted functions that include extracellular structure organization, coagulation, regulation of synapse structure and organization, receptor-mediated endocytosis, cell-substrate adhesion, response to wounding, angiogenesis, and axon development (Fig. 3D).
Limited Overlap between O-GalNAc and O-GlcNAc O-Glycosites or Phosphorylation Sites.
It is thought that protein PTMs such as O-GlcNAcylation and phosphorylation may have limited competition with O-GalNAc glycosylation on serine and threonine residues because of the subcellular compartmentation of Galnts and OGT (O-GlcNAc transferase). However, there is limited experimental evidence at the level of modified sites to support this hypothesis. Therefore, the O-GalNAc glycosites (2,154 sites) were compared with an experimentally identified mouse O-GlcNAc database [3,793 sites, (45, 46)] to reveal potential competition between the PTMs. Only 12 O-glycosites from nine glycoproteins overlapped (SI Appendix, Fig. S2A). The 12 O-glycosites were reconfirmed to be O-GalNAc glycosylated in our data by checking the ratio of oxonium ions 138 m/z and 144 m/z reported by O-Pair, which can be used to distinguish HexNAc residues as GalNAc or GlcNAc (42). Tissue annotation showed that most overlapping O-glycosites were within the brain (SI Appendix, Fig. S2C).
We next investigated the potential competition between O-GalNAc (2,154 sites) and phosphorylation sites (100,795 sites) from PhosphoSitePlus (https://www.phosphosite.org/homeAction.action). Only 34 sites from 26 proteins overlapped between the PTMs (SI Appendix, Fig. S2B). Of note, O-GalNAc, O-GlcNAc, and phosphorylation were found on S408 at the C terminus of nucleobindin-2 (Nucb2). Nucb2 may localize to multiple subcellular locations, including the nuclear envelope, cytoplasm, Golgi apparatus, membrane, and ER, which may partly explain the potential modification of S408 by the three PTMs. Thus, O-GalNAc glycosylation has limited overlap with O-GlcNAcylation, which may be due to different subcellular localizations of GalNAc-T family members (47) and OGT (48). Kinases are found in both the cytoplasm and secretory apparatus, and this might partly explain their limited competition for the same sites (49).
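The site-level comparisons in this section reduce to intersecting sets of (protein, site) keys. A minimal sketch follows; the function name and all entries except Nucb2 S408, which the text reports as carrying all three PTMs, are invented placeholders, and real comparisons would also need consistent protein identifiers and isoform numbering across databases.

```python
def overlapping_sites(*site_sets):
    """Return the (protein, site) keys present in every supplied collection."""
    if not site_sets:
        return set()
    common = set(site_sets[0])
    for sites in site_sets[1:]:
        common &= set(sites)
    return common


# Toy site lists; "ProtA", "ProtB", and "ProtC" are hypothetical proteins.
o_galnac = {("Nucb2", "S408"), ("ProtA", "T10"), ("ProtB", "S55")}
o_glcnac = {("Nucb2", "S408"), ("ProtC", "T7")}
phospho = {("Nucb2", "S408"), ("ProtB", "S55")}

print(overlapping_sites(o_galnac, o_glcnac))
print(overlapping_sites(o_galnac, phospho))
print(overlapping_sites(o_galnac, o_glcnac, phospho))
```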
Tissue-Specific Regulation of O-Glycosylation at O-Glycosites.
A fundamental question centers around how glycoproteins and their O-glycosites are regulated in tissues. To begin to answer this question, we assessed the abundance of glycoproteins and their O-glycosites by integrating TMTpro 16plex-based quantitative proteomics and glycoproteomics to study eight mouse tissues, including the heart, brain, kidney, spleen, lung, muscle, colon, and liver (Fig. 4A). Proteomics quantified 4,863 proteins (Dataset S3). Unsupervised hierarchical clustering of proteins showed that each tissue had a distinct profile, with the brain being the most distinct (Fig. 4B). The heart and muscle clustered together; the spleen, lung, and colon clustered together; and the kidney and liver were closely clustered (Fig. 4B). Correlation analysis of tissues by examining the protein abundance showed an average correlation score of approximately 0.46 (Fig. 4C). We then analyzed glycoproteomics data that quantified 489 O-glycosites (Dataset S4). Among the 489 O-glycosites, 311 O-glycosites from 95 glycoproteins had their glycoprotein abundance also quantified in the proteomics analysis (Dataset S4). The 95 glycoproteins appeared to have a similar abundance across tissues, as their analysis generated an average correlation score of approximately 0.84, a relatively strong correlation among tissues (Fig. 4D). Despite the strong correlation score of the 95 glycoproteins across tissues, analysis of their O-glycosites resulted in a dramatically lower correlation score of about 0.29 across tissues, suggesting that O-glycosylation was regulated independently of glycoprotein abundance (Fig. 4E). To illustrate the relationship directly, we correlated the abundance of glycoproteins with that of their O-glycosites in each tissue, which yielded a weak correlation with an average score of approximately 0.18 (Fig. 4F). Dot plot and regression analysis confirmed the weak correlation (Fig. 4G).
The analysis of the abundance of glycoproteins and O-glycosites suggested that glycosylation is regulated at individual O-glycosites. Next, we examined the tissue-specific profile of O-glycosites using unsupervised hierarchical clustering (Fig. 4H), which had a similar clustering pattern to that seen for proteomics (Fig. 4B). However, the clustering might be driven by the differential abundance of O-glycosites rather than by glycoprotein abundance.
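The average tissue-to-tissue correlation scores quoted in this section can be reproduced in principle by averaging pairwise correlations of abundance profiles. The sketch below assumes Pearson correlation, which the text does not specify, and uses invented tissue names and values; the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (undefined if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / sqrt(var_x * var_y)


def mean_pairwise_correlation(abundance):
    """Average the correlation over every pair of tissues.

    abundance: dict mapping tissue -> list of abundances, with features
    (proteins or O-glycosites) in the same order for every tissue.
    """
    pairs = list(combinations(sorted(abundance), 2))
    return sum(pearson(abundance[a], abundance[b]) for a, b in pairs) / len(pairs)


# Toy profiles over three features; values are made up for illustration.
toy_abundance = {
    "tissue_a": [1.0, 2.0, 3.0],
    "tissue_b": [2.0, 4.0, 6.0],
    "tissue_c": [3.0, 2.0, 1.0],
}
print(mean_pairwise_correlation(toy_abundance))
```

Running the same function once on glycoprotein abundances and once on O-glycosite abundances for the same proteins gives the kind of side-by-side comparison described above.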
Fig. 4.
Tissue-specific regulation of O-glycosylation at O-glycosites revealed by integrating quantitative glycoproteomics and proteomics. (A) Integration of quantitative glycoproteomics and proteomics using TMTpro 16plex labeling to study mouse tissues. (B) Unsupervised hierarchical clustering analysis of proteins and tissues. Reporter ion S/N in TMT proteomics refers to the ratio of the intensity of the reporter ion peak (signal) to the background noise in the spectrum (noise). (C) Table showing the tissue correlation by analyzing the abundance of proteins. (D) Table showing the tissue correlation by analyzing the abundance of 95 glycoproteins. (E) Table showing the tissue correlation by analyzing the abundance of O-glycosites. (F) Table showing the correlation of abundance between glycoproteins and O-glycosites in tissues. (G) Dot plot and regression analysis of the abundance of glycoproteins and O-glycosites. (H) Unsupervised hierarchical clustering analysis of O-glycosites and tissues.
The Relative Protein Abundance of GalNAc-Ts in Tissues Might Partly Explain the Tissue-Specific Profile of O-Glycosites.
Given that all tissues contained approximately the same levels of these 95 glycoproteins, resulting in a strong correlation across tissues, we hypothesized that the differential expression of GalNAc-Ts might be a vital factor for the tissue-specific profile of O-glycosites. Indeed, the relative protein abundance of GalNAc-Ts obtained from https://www.proteomicsdb.org/ showed a differential expression in tissues (Fig. 5A). Correlation analysis of tissues using the relative protein abundance of GalNAc-Ts showed a negative or no correlation of the brain with the kidney, spleen, lung, colon, and liver, and a moderate correlation with the heart and muscle (Fig. 5B). Tissues with similar relative protein abundance of GalNAc-Ts have a higher correlation score. The brain appeared to have a distinct set of GalNAc-Ts distinguishing it from other tissues. Notably, GalNAc-T13 is exclusively expressed in the brain, neurons, oligodendrocytes, and spinal cord but not in other tissues (50). GalNAc-T13 might be a critical factor contributing to the distinctive brain profile of O-glycosites. Using our quantitative proteomics data, we found a similar abundance of C1galt1 across tissues, arguing against its contribution to the tissue-specific profile of O-glycosites (Fig. 5C). Our quantitative proteomics data also showed that GalNAc-T2 has a higher abundance in the spleen, colon, kidney, and lung (Fig. 5C). We did not detect other GalNAc-Ts in our quantitative proteomics.
Fig. 5.
GalNAc-Ts display different expression profiles across tissues. (A) Unsupervised hierarchical clustering of GalNAc-Ts in tissues. (B) Table showing the tissue correlation analysis of the relative abundance of GalNAc-Ts reported from https://www.proteomicsdb.org/. The correlation analysis considered the relative protein abundance of different GalNAc-Ts. A high correlation score suggests that the two tissues have similar relative protein abundance of GalNAc-Ts. (C) Bar chart showing protein abundance of C1galt1 and GalNAc-T2 in our quantitative proteomics.
Quantitative O-Glycoproteomics and Proteomics Analysis Defined Isoform-Specific O-Glycosites.
Isoform-specific O-glycosites can be identified by comparing profiles derived from wild-type and null mice. Patients with GALNT2-CDG have aberrant lipid, energy, and metabolic phenotypes that may be caused by O-glycosylation deficiency on a number of GalNAc-T2 substrates in the liver (16, 51, 52). To rapidly assess which substrates were not glycosylated, we took advantage of a Galnt2-null mouse that phenocopies the human condition and performed quantitative glycoproteomics (Dataset S5) and proteomics analysis (Dataset S6) of the livers from wild-type and Galnt2-null mice (Fig. 6) (51).
Fig. 6.
Quantitative analysis of glycoproteomics and proteomics identified O-glycosites and proteins with a significant change in abundance. (A) Schematic diagram identifying a significant change in the abundance of O-glycosites and proteins by integrating quantitative glycoproteomics and proteomics. (B) The volcano plot shows a significant change in the O-glycosite PSM number between Galnt2−/− and wild-type samples. (C) Volcano plot of proteins compared between Galnt2−/− and wild-type samples.
We identified 1,026 O-glycosites from 355 glycoproteins, where 4,927 and 4,958 PSMs were assigned in Galnt2−/− and wild-type samples, respectively (Dataset S5). The PSM numbers in Galnt2−/− and wild-type samples were close, suggesting that data normalization was unnecessary. A fold change of at least two and a PSM difference of at least five between samples were used to define a significant change in the abundance of O-glycosites (Fig. 6B).
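The two-part cutoff described above (a PSM difference of five and a fold change of at least two) can be expressed as a simple classifier. This is a sketch under stated assumptions: the function name is hypothetical, the difference is read as "at least five", and a site with zero PSMs in one sample is treated as passing the fold-change test, which the text does not spell out.

```python
def classify_site(psm_null, psm_wt, min_fold=2.0, min_diff=5):
    """Classify an O-glycosite by PSM counts in Galnt2-/- (null) versus wild type.

    Returns "decreased" or "increased" (relative to wild type) or "unchanged".
    """
    difference = psm_null - psm_wt
    if abs(difference) < min_diff:
        return "unchanged"
    higher, lower = max(psm_null, psm_wt), min(psm_null, psm_wt)
    # Assumption: when the lower count is zero the fold change is unbounded
    # and therefore counted as meeting the fold-change requirement.
    if lower > 0 and higher / lower < min_fold:
        return "unchanged"
    return "increased" if difference > 0 else "decreased"


print(classify_site(0, 6))    # absent in null, six PSMs in wild type
print(classify_site(10, 4))   # 2.5-fold higher in null, difference of six
print(classify_site(20, 12))  # difference of eight but below twofold
```

Sites classified as "decreased" correspond to candidate GalNAc-T2 substrates, subject to the protein-abundance check from the proteomics data.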
In the Galnt2−/− sample, 82 O-glycosites from 62 distinct glycoproteins showed a significant reduction, suggesting that they were potential GalNAc-T2 substrates (Fig. 6B). The decreased abundance of O-glycosites might result from the loss of GalNAc-T2 activity and/or a reduced abundance of glycoproteins in the null background. None of the proteins associated with these 82 glycosites was found to be reduced in abundance in the samples derived from the null animals. Among the 82 O-glycosites, 47 O-glycosites from 32 glycoproteins did not change in protein abundance between samples from the null and wild-type animals. We were unable to accurately determine the abundance of 28 glycoproteins containing 33 O-glycosites in the proteomics analysis. The remaining two O-glycosites were found in Ig gamma-2B chain C (Igh-3, site at T104) and proteoglycan 4 (Prg4, site at T472), each of which had an increased protein abundance in the sample from the null animals.
Among the 82 O-glycosites, four were also identified in a previous study (15): phospholipid transfer protein (Pltp) S483, alpha-2-HS-glycoprotein (Ahsg, also called Fetua) T267, gelsolin (Gsn) S48, and Igh-3 T104. This consistency with previous findings further supports the validity of our approach to mapping GalNAc-T2-specific O-glycosites.
We observed that 76 O-glycosites increased in abundance in samples derived from null animals. Of these 76 O-glycosites, 48 were found in 27 glycoproteins that showed no change in protein abundance. The increased abundance of only three sites from Prg4 (T256, T635, and one of the sites at T378, T457, or T593) could be explained by an increase in Prg4 glycoprotein abundance in the Galnt2−/− sample. We were unable to quantify the abundance of the 20 glycoproteins that accounted for the remaining 25 O-glycosites. We expected that the loss of GalNAc-T2 activity would lead to a glycosylation shift, i.e., reduced glycosylation at one site accompanied by increased glycosylation at a nearby site. This phenomenon has been observed in cell lines (44). In our data (Dataset S5), 13 glycoproteins appear to show a shift in the glycosylation site upon loss of GalNAc-T2 activity (SI Appendix, Table S2).
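One way to screen for a candidate glycosylation shift is to look for glycoproteins that carry both a decreased and an increased O-glycosite close to each other. The sketch below is illustrative only: the text does not define how near two sites must be, so the `max_distance` of 10 residues is an assumption, as are the function name and the placeholder protein names.

```python
from collections import defaultdict


def find_shift_candidates(sites, max_distance=10):
    """Return {protein: [(decreased_pos, increased_pos), ...]}.

    sites is an iterable of (protein, position, direction) tuples, where
    direction is 'decreased' or 'increased' in the null sample.
    """
    by_protein = defaultdict(lambda: {"decreased": [], "increased": []})
    for protein, position, direction in sites:
        if direction in ("decreased", "increased"):
            by_protein[protein][direction].append(position)

    candidates = {}
    for protein, groups in by_protein.items():
        pairs = [(d, i)
                 for d in groups["decreased"]
                 for i in groups["increased"]
                 if abs(d - i) <= max_distance]
        if pairs:
            candidates[protein] = sorted(pairs)
    return candidates


# Placeholder data, not taken from Dataset S5.
example_sites = [
    ("ProtA", 100, "decreased"), ("ProtA", 104, "increased"),
    ("ProtB", 50, "decreased"), ("ProtB", 300, "increased"),
    ("ProtC", 20, "increased"),
]
shift_candidates = find_shift_candidates(example_sites)
```

In this toy input only ProtA qualifies, because its decreased and increased sites lie four residues apart; ProtB has both kinds of site but they are too distant under the assumed window.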
Insights into GALNT2-CDG Phenotypes Using STRING Network Analysis.
We used STRING network analysis to investigate the networks associated with the proteins that displayed abundance changes in their O-glycosites or protein levels. STRING draws on published data on direct protein–protein interactions and/or indirect protein functional associations to predict dysregulated biological networks. Broadly stated, this analysis revealed that the glycoproteins with changes in O-glycosite abundance are involved in plasma lipoprotein particle remodeling, regulation of catalytic activity, and system development (Fig. 7A). Proteins that changed in abundance play a variety of roles, including lipid and fatty acid metabolism, oxidoreductase activity, and general metabolic processes (Fig. 7B). When glycoproteins with changes in O-glycosite abundance were analyzed in combination with proteins that changed in abundance, nine putative GalNAc-T2 substrates, namely Ahsg, Pltp, apolipoprotein B-100 (Apob), apolipoprotein E (Apoe), nidogen-1 (Nid1), hepatocyte growth factor activator (Hgfac), inter-alpha-trypsin inhibitor heavy chain H2 (Itih2), alpha-1-antitrypsin 1-2 (Serpina1b), and solute carrier organic anion transporter family member 1A1 (Slco1a1), appeared to be hubs, each with at least five protein–protein interactions/associations (Fig. 7C). O-glycans on the hub glycoproteins may have a greater potential to participate in the metabolic networks. Taken together, the Galnt2 null genotype would be predicted to yield a complex metabolic phenotype (Fig. 7C).
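The hub criterion (at least five interactions/associations) amounts to a degree threshold on the network. The sketch below assumes the STRING result has been exported as a list of protein pairs; the edge list shown is a made-up placeholder, not STRING output, and `find_hubs` is a hypothetical helper name.

```python
from collections import defaultdict


def find_hubs(edges, candidates, min_degree=5):
    """Return candidate proteins with at least min_degree distinct partners.

    edges is an iterable of (protein_a, protein_b) pairs; self-loops and
    duplicate pairs are ignored when counting partners.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    return sorted(p for p in candidates if len(neighbors[p]) >= min_degree)


# Placeholder edges: P1..P5 are hypothetical partner names.
example_edges = [("Ahsg", f"P{i}") for i in range(1, 6)]
example_edges += [("Pltp", "P1"), ("Pltp", "P2"), ("P1", "Ahsg")]
hubs = find_hubs(example_edges, candidates={"Ahsg", "Pltp"})
```

Restricting the output to `candidates` keeps the focus on putative GalNAc-T2 substrates, so that highly connected proteins without a changed O-glycosite are not reported as hubs.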
Fig. 7.
STRING analysis of changed O-glycosites and proteins. (A) STRING network of changed O-glycosites. (B) STRING network of changed proteins. (C) STRING network of changed O-glycosites and proteins.
Experimentally Identified GalNAc-T2 Substrates Were Accurately Predicted by ISOGlyP.
We sought to explore whether experimentally identified GalNAc-T2 substrates were consistent with the ISOGlyP prediction algorithm (2) (Fig. 8A). This algorithm has an advantage over others in that it allows the prediction of isoform-specific glycosylation. The 355 glycoproteins identified in mouse livers were submitted to ISOGlyP, which generated scores for each GalNAc-T. An ISOGlyP score of at least one suggests that an O-glycosite is a substrate of the isoenzyme (2). After retaining sites with an ISOGlyP score of at least one for GalNAc-T2, 4,753 GalNAc-T2 O-glycosites were predicted from the 355 glycoproteins. Consensus motif analysis of the 82 experimentally identified vs. the 4,753 ISOGlyP-predicted GalNAc-T2 O-glycosites revealed a similar pattern of favorable amino acids surrounding the O-glycosites (Fig. 8B). We found that 79% (65 of 82) of the experimentally identified GalNAc-T2 O-glycosites were scored as GalNAc-T2 substrates by ISOGlyP (Fig. 8C). However, experimentally identified sites showed preferences for charged amino acids vicinal to the glycosite (Fig. 8B). These preferences were not seen in the ISOGlyP predictions and may be a function of the in vitro screening platform used to generate the data that powers this in silico analysis. Thus, we were able to identify additional glycosites that would not have been predicted in silico.
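The comparison above reduces to a score threshold followed by a set overlap. The sketch below assumes ISOGlyP scores have been collected into a dictionary keyed by (protein, position); the scores and site names are placeholders, and only the counts 65 and 82 come from the text.

```python
def predicted_sites(scores, threshold=1.0):
    """Sites whose score for one isoenzyme meets the threshold."""
    return {site for site, score in scores.items() if score >= threshold}


def overlap_fraction(experimental, predicted):
    """Fraction of experimentally identified sites that were also predicted."""
    return len(experimental & predicted) / len(experimental)


# Placeholder GalNAc-T2 scores for three hypothetical sites.
t2_scores = {("ProtA", 10): 1.4, ("ProtA", 25): 0.6, ("ProtB", 7): 1.0}
experimental = {("ProtA", 10), ("ProtA", 25)}
toy_overlap = overlap_fraction(experimental, predicted_sites(t2_scores))

# Agreement reported in the text: 65 of 82 experimental sites.
reported_agreement = 65 / 82       # about 0.793
reported_unpredicted = 17 / 82     # about 0.207
```

The two reported fractions are complementary, which is why 79% agreement here corresponds to the 20.7% of experimentally identified sites described elsewhere in the paper as not predicted by ISOGlyP.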
Fig. 8.
Screening GalNAc-T2 O-glycosite substrates for future investigation. (A) Schematic workflow to analyze the 82 experimentally identified GalNAc-T2 O-glycosite substrates. (B) Consensus motif analysis of experimentally determined vs. ISOGlyP predicted GalNAc-T2 O-glycosite substrates. (C) Venn diagram of GalNAc-T2 O-glycosite substrates among experimentally determined, ISOGlyP predicted, and exclusive GalNAc-T2 O-glycosite substrates. (D) Table showing filter criteria of ISOGlyP scores to determine O-glycosite substrates exclusive to GalNAc-T2 in the liver. (E) Table listing O-glycosite substrates exclusive to GalNAc-T2. Yellow color highlights either ISOGlyP score >1 or the presence of the GalNAcTs in the liver. ND: not detected. (F) Bar chart showing Ahsg T267 PSM number and its ranking in different tissues. (G) Abundance correlation of glycoprotein Ahsg and its T267 in tissues. (H) Experimental workflow to validate Ahsg T267 as a GalNAc-T2-specific substrate. (I) MS2 spectrum of identification of Ahsg T267 glycosylated by GalNAc-T2 and GalNAc(13C6) followed by IMPa cleavage.
ISOGlyP Might Assist in Prioritizing O-Glycosites for a Functional Study.
Because ISOGlyP also scored for other GalNAc-Ts, it might also be possible to predict O-glycosites exclusively modified by GalNAc-T2 but not by other GalNAc-Ts in a given tissue. For example, the liver has been determined to express GalNAc-T1-7, 12, and 18 (Fig. 5A). Considering the isoenzymes expressed in the liver and their ISOGlyP scores, GalNAc-T1, 3-5, and 12 were filtered, and we identified 13 putative GalNAc-T2-specific substrates in the liver (Fig. 8 C–E). Ahsg T267 had a high ISOGlyP GalNAc-T2 score and a high PSM number in tissues, suggesting that Ahsg T267 is a GalNAc-T2-specific and highly abundant O-glycosite in tissues (Fig. 8 E and F). The protein expression level of Ahsg was not changed between null and wild-type livers, indicating that the change was at the O-glycosite level (Fig. 8E). Experimentally, Ahsg is a highly abundant protein in tissues and blood, and the abundance of O-glycosylated T267 is strongly correlated with its protein expression level (Fig. 8G). The significant reduction of Ahsg T267 in Galnt2 null animals might influence the function of Ahsg within the disease model. Although the biological role of Ahsg T267 is unclear, we speculate that T267 may be involved in the metabolic disorder caused by or associated with Ahsg (53, 54). Mathews et al. reported that Ahsg null mice are sensitive to insulin and resistant to weight gain (55), a phenotype similar to that observed in Galnt2 null mice (16). In a follow-up study, Ahsg null mice showed protection against obesity and insulin resistance associated with aging (56). In humans, AHSG is associated with insulin resistance and fat accumulation in the liver (57) and type 2 diabetes (58). Structural studies show an interaction between Ahsg and the insulin receptor (Insr) that may explain how Ahsg interrupts insulin signaling to influence energy homeostasis (59, 60). Future studies will be needed to elucidate how an O-glycan at Ahsg T267 may affect protein function.
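The idea of an O-glycosite exclusive to GalNAc-T2 in a given tissue can be sketched as a two-part filter: the GalNAc-T2 score meets the threshold, and no other isoenzyme expressed in that tissue does. The exact filter criteria are given in Fig. 8D and are not reproduced here, so the logic below is an assumption, and the sites and scores are placeholders.

```python
def exclusive_to(target, site_scores, expressed, threshold=1.0):
    """Sites predicted for `target` but for no other expressed isoenzyme.

    site_scores maps site -> {isoenzyme: score}. Isoenzymes that are not
    in `expressed` are ignored, since they cannot act in that tissue.
    """
    exclusive = []
    for site, scores in site_scores.items():
        if scores.get(target, 0.0) < threshold:
            continue
        competitors = [iso for iso in scores
                       if iso != target and iso in expressed]
        if all(scores[iso] < threshold for iso in competitors):
            exclusive.append(site)
    return sorted(exclusive)


# Isoenzymes reported in the text as expressed in liver.
liver = {"T1", "T2", "T3", "T4", "T5", "T6", "T7", "T12", "T18"}

# Placeholder scores for four hypothetical sites.
example_scores = {
    "siteA": {"T1": 0.4, "T2": 2.1, "T3": 0.7, "T12": 0.2},
    "siteB": {"T1": 1.6, "T2": 1.9, "T3": 0.3},
    "siteC": {"T2": 1.2, "T3": 0.5, "T11": 3.0},
    "siteD": {"T1": 0.1, "T2": 0.5},
}
liver_exclusive = exclusive_to("T2", example_scores, liver)
```

In this toy input, siteC is retained even though another isoenzyme scores highly for it, because that isoenzyme is not among those expressed in liver; siteB is dropped because GalNAc-T1 also meets the threshold.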
To validate Ahsg T267 as a GalNAc-T2 O-glycosite substrate, recombinant Ahsg was sequentially trypsin digested, de-O-glycosylated, glycosylated with GalNAc-T2 and UDP-GalNAc(13C6) and digested with IMPa (Fig. 8H). IMPa has good activity toward O-glycosites modified by a single GalNAc. After the final IMPa digestion, GalNAc(13C6) labeled glycopeptide with T267 at the first amino acid position was detected, suggesting that GalNAc-T2 glycosylates Ahsg T267 (Fig. 8I).
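The labeled sugar is useful because it shifts the glycan mass by a predictable amount, which separates newly added GalNAc from any residual unlabeled glycan. The short calculation below is a sketch under stated assumptions: it uses standard monoisotopic values (a HexNAc residue of 203.07937 Da and a 13C minus 12C difference of 1.0033548 Da), not numbers from this study, and it assumes that exactly six carbons of the GalNAc are 13C, as the reagent name GalNAc(13C6) implies.

```python
# Standard monoisotopic values (assumed, not taken from the study).
HEXNAC_RESIDUE_DA = 203.07937   # C8H13NO5, residue mass of GalNAc/GlcNAc
C13_MINUS_C12_DA = 1.0033548    # mass difference between 13C and 12C


def labeled_hexnac_mass(n_labeled_carbons: int = 6) -> float:
    """Residue mass of HexNAc with n carbons replaced by 13C."""
    return HEXNAC_RESIDUE_DA + n_labeled_carbons * C13_MINUS_C12_DA


mass_shift = labeled_hexnac_mass() - HEXNAC_RESIDUE_DA   # about 6.02 Da
labeled_mass = labeled_hexnac_mass()                      # about 209.10 Da
```

A glycopeptide carrying the labeled residue would therefore be expected roughly 6.02 Da heavier than the same peptide carrying unlabeled GalNAc.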
Discussion
This study establishes a practical approach to quantitatively examine solitary O-glycosites of glycoproteins in complex tissue and organ extracts and in biological fluids. We exploited the approach to compile an atlas of mouse O-glycosites that provides a deep view of site-specific O-glycosylation in a mammal. The catalog of O-glycosites can be used by the community as a reference to determine tissue- and site-specific changes of O-glycosylation and as a fundamental resource for defining the global effects of O-glycosylation within tissues. In a proof-of-concept study, we used this approach to examine the Galnt2 null mouse, a disease model that displays lipid dysregulation and altered energy homeostasis. Our aim was to identify GalNAc-T2-specific substrates in vivo and thereby better understand how site-specific O-glycosylation regulates the complexities of metabolic homeostasis. The general approach of comparing the catalogs of O-glycosites from wild-type and null mice identifies the isoform-specific substrates of the ablated GalNAc-T. In instances where an animal model phenocopies a human condition, inferences can be made about the O-glycosites and O-glycoproteins relevant to human diseases.
Our approach integrated multiple components, including the EXoO sample preparation, EThcD LC-MS instrumentation, database searching software packages, and label-free and TMTpro 16plex labeled quantitative glycoproteomics and proteomics strategies. There are opportunities for future technical improvements. Other O-glycoproteases may replace OgpA in the EXoO protocol to access different subsets of O-glycosites, including those found in clusters. O-glycoproteases with specificity toward different O-glycan structures may permit additional site-specific determination of O-glycan structures. It was encouraging to note that software packages developed by different research groups showed agreement on the identification of peptide sequences. However, each software package identified a unique subset of glycopeptides. Future software development will hopefully yield a single program that provides accurate identifications of both glycopeptide sequences and O-glycan structures. It may be possible to optimize the database search packages to further increase the number and confidence of O-glycosite identifications. The integration of quantitative glycoproteomics and proteomics is essential to distinguish a change in protein abundance from a change in O-glycosite abundance that occurs without a change in protein abundance. Indeed, the integration of glycoproteomics and proteomics revealed a tissue-specific regulation of O-glycosites that might be attributed to the different abundance of GalNAc-Ts.
We used two different O-glycosite prediction software platforms, NetOGlyc-4.0 and ISOGlyP. Our results corroborated some of the predictions of these software packages. However, both NetOGlyc-4.0 and ISOGlyP predicted a significantly larger number of O-glycosites than we identified experimentally. Thus, our approach may fail to identify all O-glycosites. However, a subset of our experimentally identified O-glycosites (19.3%, Fig. 3B, and 20.7%, Fig. 8C) were not predicted by NetOGlyc-4.0 or ISOGlyP, respectively. The discrepancy between experimentally identified versus software-predicted O-glycosites may reflect differences between cellular glycosylation (occurring on native proteins in the context of the complement of glycosyltransferases expressed within a particular cell) and glycosylation of synthetic (model) peptides by isolated glycosyltransferases.
Our approach primarily identified solitary O-glycosites with short glycans instead of clustered O-glycosites or complex glycans. This limitation may be due to the sample preparation procedure and the specificity of OgpA (32). The sample preparation procedure has C18 desalting steps that may exclude long glycopeptides with clustered O-glycosites; such glycopeptides arise from resistance to trypsin digestion and may be retained on the C18 column during peptide elution. Recent characterization of O-glycoproteases and mucinases, such as StcE (33–35), IMPa (37), and SmE (32), on purified mucins and mucin-domain-containing glycoproteins may facilitate the enrichment and identification of clustered O-glycosites and complex O-glycans from complex samples. We observed a GlcNAc-containing H2N2 glycan on the N-terminal Thr residue of glycopeptides, which does not seem to reconcile with previous reports finding an inability of OgpA to digest the core-2 glycan structure (39, 43). However, the previous studies that used a mixture of four glycoproteins (32) or a single glycopeptide (36) may not have assessed sufficient combinations of glycans and peptide sequences. Additional investigation will be needed to elucidate the glycan structure specificity of O-glycoproteases and mucinases using complex samples and glycan structure-oriented approaches.
Among the glycoproteins with putative GalNAc-T2 O-glycosites, nine are hubs in the protein–protein interaction/association networks, suggesting that their O-glycosites may have greater biological significance. We observed a potential glycosylation shift upon loss of GalNAc-T2 activity, which is likely due to the loss of GalNAc-T2-specific glycans affecting access to additional sites by other members of the GalNAc-T family. Future studies will focus on the analysis of GalNAc-T2-specific O-glycosites to understand whether the loss of these sites influences metabolism. We observed that experimentally identified GalNAc-T2 sites preferred charged amino acids vicinal to the glycosites. The cause of this observation is not clear: it may reflect a true preference for glycosites near charged residues, or it may arise because spectra are of better quality when charged amino acids are nearby. The practical approach for quantitative mapping of O-glycosites in complex samples is an essential gateway to deciphering all the GalNAc-T substrates and pathways that contribute to the complex organismal phenotypes observed in vivo.
Materials and Methods
Isolation of Mouse Tissues.
Galnt2-deficient mice were described previously (16). Three wild-type male and three wild-type female mice (1-y-old) were used to analyze pooled tissues and blood. Experimental procedures were reviewed and approved by the Animal Care and Use Committee of the NIH (ASP #17-833). Additional experiment information is described in SI Appendix.
Expression and Purification of Recombinant OgpA, Sialidase, IMPa, and GalNAc-T2.
Recombinant OgpA (B2UR60), sialidase (B2UPI5), and IMPa (Q9I5W4) DNA were cloned into pET28a and transformed into Escherichia coli BL21(DE3), followed by protein expression and purification according to methods detailed in SI Appendix.
GalNAc-T2 was kindly provided by Nadine Samara and used in a previous study (16).
Isolation of O-Glycopeptides and Peptides for TMTpro™ 16plex Labeling.
The method has been detailed previously (5, 38) with minor changes. Experimental protocols are included in SI Appendix.
Off-Line High-pH Reversed-Phase Fractionation of Glycopeptides and Peptides.
Glycopeptides and peptides (100 μg) were fractionated into 24 fractions. Fractionation of glycopeptides and peptides was performed as described in SI Appendix.
LC-MS/MS Analysis with HCD-pd-EThcD.
Glycopeptides and peptides were analyzed on an Orbitrap Fusion Lumos mass spectrometer (Thermo Fisher Scientific) with HCD-pd-EThcD according to methods detailed in SI Appendix.
Identification of O-Glycopeptides.
Software packages, including MSFragger-Glyco (version 17.0), pGlyco3 (release date 2021-06-15), and O-Pair (version 0.0.320), were used to identify glycopeptides. The software packages were chosen because of their fast search speed for complex samples (61). Experimental details of the use of the software packages are described in SI Appendix.
Label-Free Quantitation of O-Glycopeptides Using Spectral Counting.
Glycopeptide sequences were transformed into 15-amino-acid sequences with the O-glycosite in the middle and seven amino acid residues flanking it on each side. PSMs assigned to the same glycosite were counted in each sample, and the fold change of each glycosite between samples was calculated. Glycosites present in only one sample were assigned the maximal fold change observed in the data. At least a twofold difference and a difference of at least five PSM counts between samples were used to determine a significant change.
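The windowing and fold-change steps above can be sketched as follows. This is an illustration under stated assumptions, not the study's code: padding with "-" at protein termini is an assumption (the text does not say how sites near the ends were handled), as is applying the maximal observed fold change symmetrically to sites unique to either sample. The protein sequence is a made-up example.

```python
from collections import Counter


def site_window(protein_seq: str, site_pos: int, flank: int = 7,
                pad: str = "-") -> str:
    """Window of 2*flank+1 residues centered on a 1-based site position."""
    idx = site_pos - 1
    left = protein_seq[max(0, idx - flank):idx]
    right = protein_seq[idx + 1:idx + 1 + flank]
    return left.rjust(flank, pad) + protein_seq[idx] + right.ljust(flank, pad)


def fold_changes(counts_a: Counter, counts_b: Counter) -> dict:
    """Fold change (a over b) per site; unique sites get the max observed."""
    sites = set(counts_a) | set(counts_b)
    shared = {s: counts_a[s] / counts_b[s]
              for s in sites if counts_a[s] and counts_b[s]}
    # Largest fold change in either direction among shared sites.
    max_fold = max((max(r, 1 / r) for r in shared.values()), default=1.0)
    result = dict(shared)
    for s in sites - set(shared):
        result[s] = max_fold if counts_a[s] else 1 / max_fold
    return result


window = site_window("MKTSAPTPSGSTVLPEAR", 7)   # hypothetical sequence
counts_null = Counter({"w1": 8, "w2": 3})        # PSMs per window, sample a
counts_wt = Counter({"w1": 2, "w3": 4})          # PSMs per window, sample b
folds = fold_changes(counts_null, counts_wt)
```

Keying the counts on the 15-residue window rather than on the identified peptide lets PSMs from peptides of different lengths that cover the same glycosite be pooled into a single count.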
Quantitation of TMTpro 16plex Labeled O-Glycopeptides and Peptides.
Proteome Discoverer 2.5 was used to quantify TMTpro 16plex labeled glycopeptides and peptides according to methods described in SI Appendix.
In Vitro Deglycosylation and Glycosylation of Proteins.
Mouse Ahsg glycoprotein was purchased from R&D Systems. The protein was digested by trypsin and treated with GalNAc-T2 and UDP-GalNAc(13C6) (Chemily Glycoscience, GA), followed by IMPa. The experimental protocol is included in SI Appendix.
Bioinformatics and Data Visualization.
Software including NetOGlyc-4.0 (62), Isoform Specific O-Glycosylation Prediction (ISOGlyP, https://isoglyp.utep.edu/) (2), ImageGP (63), InteractiVenn (64), Euler diagrams (https://eulerr.co/), STRING (65), and WebLogo (66) were used in the study. Additional information regarding the software is described in SI Appendix. Mouse O-GlcNAc glycosylation sites were obtained from Junfeng Ma (https://oglcnac.org/atlas/) (45). Phosphorylation sites were obtained from PhosphoSitePlus v6.6.0.4 (https://www.phosphosite.org/homeAction.action).
Supplementary Material
Appendix 01 (PDF)
Dataset S01 (XLSX)
Dataset S02 (XLSX)
Dataset S03 (XLSX)
Dataset S04 (XLSX)
Dataset S05 (XLSX)
Dataset S06 (XLSX)
Acknowledgments
We thank our colleagues for many helpful discussions. This research was supported by the Intramural Research Program of the NIDCR, NIH (Z01-DE-000713 to K.G.T.H. and 1-ZIA-DE000739-05 to L.A.T.). We acknowledge The National Institute of Dental and Craniofacial Research (NIDCR) Mass Spectrometry Facility supported by the Division of Intramural Research, NIDCR/NIH (ZIA DE000751).
Author contributions
W.Y. and L.A.T. designed research; W.Y., A.C., P.M., K.D., and A.L. performed research; W.Y., E.T., and K.G.T.H. contributed new reagents/analytic tools; W.Y. analyzed data; and W.Y., A.C., P.M., K.G.T.H., and L.A.T. wrote the paper.
Competing interests
The authors declare no competing interest.
Footnotes
This article is a PNAS Direct Submission.
Data, Materials, and Software Availability
The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE (67) partner repository with the dataset identifier PXD040196. All other data are included in the manuscript and/or supporting information.
References
- 1. Bennett E. P., et al., Control of mucin-type O-glycosylation: A classification of the polypeptide GalNAc-transferase gene family. Glycobiology 22, 736–756 (2012).
- 2. Mohl J. E., Gerken T. A., Leung M. Y., ISOGlyP: De novo prediction of isoform-specific mucin-type O-glycosylation. Glycobiology 31, 168–172 (2021).
- 3. Hansen J. E., et al., NetOglyc: Prediction of mucin type O-glycosylation sites based on sequence context and surface accessibility. Glycoconj. J. 15, 115–130 (1998).
- 4. Steentoft C., et al., Precision mapping of the human O-GalNAc glycoproteome through SimpleCell technology. EMBO J. 32, 1478–1488 (2013).
- 5. Yang W., Ao M., Hu Y., Li Q. K., Zhang H., Mapping the O-glycoproteome using site-specific extraction of O-linked glycopeptides (EXoO). Mol. Syst. Biol. 14, e8486 (2018).
- 6. Hintze J., et al., Probing the contribution of individual polypeptide GalNAc-transferase isoforms to the O-glycoproteome by inducible expression in isogenic cell lines. J. Biol. Chem. 293, 19064–19077 (2018).
- 7. Kudelka M. R., et al., Cellular O-glycome reporter/amplification to explore O-glycans of living cells. Nat. Methods 13, 81–86 (2016).
- 8. Gerken T. A., et al., Emerging paradigms for the initiation of mucin-type protein O-glycosylation by the polypeptide GalNAc transferase family of glycosyltransferases. J. Biol. Chem. 286, 14493–14507 (2011).
- 9. Schwarz F., Aebi M., Mechanisms and principles of N-linked protein glycosylation. Curr. Opin. Struct. Biol. 21, 576–582 (2011).
- 10. O’Connell B., Tabak L. A., Ramasubbu N., The influence of flanking sequences on O-glycosylation. Biochem. Biophys. Res. Commun. 180, 1024–1030 (1991).
- 11. O’Connell B. C., Hagen F. K., Tabak L. A., The influence of flanking sequence on the O-glycosylation of threonine in vitro. J. Biol. Chem. 267, 25010–25018 (1992).
- 12. UniProt Consortium, UniProt: The universal protein knowledgebase. Nucleic Acids Res. 46, 2699 (2018).
- 13. Huang J., et al., OGP: A repository of experimentally characterized O-glycoproteins to facilitate studies on O-glycosylation. Genomics Proteomics Bioinf. 19, 611–618 (2021).
- 14. York W. S., et al., GlyGen: Computational and informatics resources for glycoscience. Glycobiology 30, 72–73 (2020).
- 15. Khetarpal S. A., et al., Loss of function of GALNT2 lowers high-density lipoproteins in humans, nonhuman primates, and rodents. Cell Metab. 24, 234–245 (2016).
- 16. Verzijl C. R. C., et al., A novel role for GalNAc-T2 dependent glycosylation in energy homeostasis. Mol. Metab. 60, 101472 (2022).
- 17. Miwa H. E., Gerken T. A., Jamison O., Tabak L. A., Isoform-specific O-glycosylation of osteopontin and bone sialoprotein by polypeptide N-acetylgalactosaminyltransferase-1. J. Biol. Chem. 285, 1208–1219 (2010).
- 18. Christensen B., et al., Cell type-specific post-translational modifications of mouse osteopontin are associated with different adhesive properties. J. Biol. Chem. 282, 19463–19472 (2007).
- 19. Harrison R., et al., Glycoproteomic characterization of recombinant mouse alpha-dystroglycan. Glycobiology 22, 662–675 (2012).
- 20. Gomez Toledo A., et al., O-Mannose and O-N-acetyl galactosamine glycosylation of mammalian alpha-dystroglycan is conserved in a region-specific manner. Glycobiology 22, 1413–1423 (2012).
- 21. Medzihradszky K. F., Kaasik K., Chalkley R. J., Tissue-specific glycosylation at the glycopeptide level. Mol. Cell Proteomics 14, 2103–2110 (2015).
- 22. Trinidad J. C., Schoepfer R., Burlingame A. L., Medzihradszky K. F., N- and O-glycosylation in the murine synaptosome. Mol. Cell Proteomics 12, 3474–3488 (2013).
- 23. Jing B., et al., Glycosylation of dentin matrix protein 1 is a novel key element for astrocyte maturation and BBB integrity. Protein Cell 9, 298–309 (2018).
- 24. Sun Y., et al., Glycosylation of dentin matrix protein 1 is critical for osteogenesis. Sci. Rep. 5, 17518 (2015).
- 25. Tian E., et al., Galnt11 regulates kidney function by glycosylating the endocytosis receptor megalin to modulate ligand binding. Proc. Natl. Acad. Sci. U.S.A. 116, 25196–25202 (2019).
- 26. Steentoft C., et al., Mining the O-glycoproteome using zinc-finger nuclease-glycoengineered SimpleCell lines. Nat. Methods 8, 977–982 (2011).
- 27. Wang Y., et al., Cosmc is an essential chaperone for correct protein O-glycosylation. Proc. Natl. Acad. Sci. U.S.A. 107, 9228–9233 (2010).
- 28. Calle B., et al., Bump-and-hole engineering of human polypeptide N-acetylgalactosamine transferases to dissect their protein substrates and glycosylation sites in cells. STAR Protoc. 4, 101974 (2023).
- 29. Schumann B., et al., Bump-and-hole engineering identifies specific substrates of glycosyltransferases in living cells. Mol. Cell 78, 824–834.e15 (2020).
- 30. Schjoldager K. T., Clausen H., Hurtado-Guerrero R., A bump-and-hole approach to dissect regulation of protein O-glycosylation. Mol. Cell 78, 803–805 (2020).
- 31. Choi J., et al., Engineering orthogonal polypeptide GalNAc-transferase and UDP-sugar pairs. J. Am. Chem. Soc. 141, 13442–13453 (2019).
- 32. Chongsaritsinsuk J., et al., Glycoproteomic landscape and structural dynamics of TIM family immune checkpoints enabled by mucinase SmE. bioRxiv [Preprint] (2023). 10.1101/2023.02.01.526488 (Accessed 6 April 2023).
- 33. Nason R., et al., Display of the human mucinome with defined O-glycans by gene engineered cells. Nat. Commun. 12, 4070 (2021).
- 34. Malaker S. A., et al., Revealing the human mucinome. Nat. Commun. 13, 3542 (2022).
- 35. Malaker S. A., et al., The mucin-selective protease StcE enables molecular and functional analysis of human cancer-associated mucins. Proc. Natl. Acad. Sci. U.S.A. 116, 7278–7287 (2019).
- 36. Shon D. J., et al., An enzymatic toolkit for selective proteolysis, detection, and visualization of mucin-domain glycoproteins. Proc. Natl. Acad. Sci. U.S.A. 117, 21299–21307 (2020).
- 37. Vainauskas S., et al., A broad-specificity O-glycoprotease that enables improved analysis of glycoproteins and glycopeptides containing intact complex O-glycans. Anal. Chem. 94, 1060–1069 (2022).
- 38. Yang W., Song A., Ao M., Xu Y., Zhang H., Large-scale site-specific mapping of the O-GalNAc glycoproteome. Nat. Protocols 15, 2589–2610 (2020).
- 39. Riley N. M., Malaker S. A., Bertozzi C. R., Electron-based dissociation is needed for O-glycopeptides derived from opeRATOR proteolysis. Anal. Chem. 92, 14878–14884 (2020).
- 40. Polasky D. A., Yu F., Teo G. C., Nesvizhskii A. I., Fast and comprehensive N- and O-glycoproteomics analysis with MSFragger-glyco. Nat. Methods 17, 1125–1132 (2020).
- 41. Zeng W. F., Cao W. Q., Liu M. Q., He S. M., Yang P. Y., Precise, fast and comprehensive analysis of intact glycopeptides and modified glycans with pGlyco3. Nat. Methods 18, 1515–1523 (2021).
- 42. Lu L., Riley N. M., Shortreed M. R., Bertozzi C. R., Smith L. M., O-Pair search with metamorpheus for O-glycopeptide characterization. Nat. Methods 17, 1133–1138 (2020).
- 43. Trastoy B., Naegeli A., Anso I., Sjogren J., Guerin M. E., Structural basis of mammalian mucin processing by the human gut O-glycopeptidase OgpA from Akkermansia muciniphila. Nat. Commun. 11, 4844 (2020).
- 44. Schjoldager K. T., et al., Deconstruction of O-glycosylation–GalNAc-T isoforms direct distinct subsets of the O-glycoproteome. EMBO Rep. 16, 1713–1722 (2015).
- 45. Ma J., Li Y., Hou C., Wu C., O-GlcNAcAtlas: A database of experimentally identified O-GlcNAc sites and proteins. Glycobiology 31, 719–723 (2021).
- 46. Wulff-Fuentes E., et al., The human O-GlcNAcome database and meta-analysis. Sci. Data 8, 25 (2021).
- 47. Becker J. L., Tran D. T., Tabak L. A., Members of the GalNAc-T family of enzymes utilize distinct Golgi localization mechanisms. Glycobiology 28, 841–848 (2018).
- 48. Seo H. G., et al., Identification of the nuclear localisation signal of O-GlcNAc transferase and its nuclear import regulation. Sci. Rep. 6, 34614 (2016).
- 49. Zhang H., et al., A subcellular map of the human kinome. Elife 10, e64943 (2021).
- 50. Giansanti P., et al., Mass spectrometry-based draft of the mouse proteome. Nat. Methods 19, 803–811 (2022).
- 51. Zilmer M., et al., Novel congenital disorder of O-linked glycosylation caused by GALNT2 loss of function. Brain 143, 1114–1126 (2020).
- 52. Teslovich T. M., et al., Biological, clinical and population relevance of 95 loci for blood lipids. Nature 466, 707–713 (2010).
- 53. Icer M. A., Yildiran H., Effects of fetuin-A with diverse functions and multiple mechanisms on human health. Clin. Biochem. 88, 1–10 (2021).
- 54. Bourebaba L., Marycz K., Pathophysiological implication of fetuin-A glycoprotein in the development of metabolic disorders: A concise review. J. Clin. Med. 8, 2033 (2019).
- 55. Mathews S. T., et al., Improved insulin sensitivity and resistance to weight gain in mice null for the Ahsg gene. Diabetes 51, 2450–2458 (2002).
- 56. Mathews S. T., et al., Fetuin-null mice are protected against obesity and insulin resistance associated with aging. Biochem. Biophys. Res. Commun. 350, 437–443 (2006).
- 57. Stefan N., et al., Alpha2-Heremans-Schmid glycoprotein/fetuin-A is associated with insulin resistance and fat accumulation in the liver in humans. Diabetes Care 29, 853–857 (2006).
- 58. Stefan N., et al., Plasma fetuin-A levels and the risk of type 2 diabetes. Diabetes 57, 2762–2767 (2008).
- 59. Goustin A. S., Derar N., Abou-Samra A. B., Ahsg-fetuin blocks the metabolic arm of insulin action through its interaction with the 95-kD beta-subunit of the insulin receptor. Cell. Signal. 25, 981–988 (2013).
- 60. Goustin A. S., Abou-Samra A. B., The "thrifty" gene encoding Ahsg/Fetuin-A meets the insulin receptor: Insights into the mechanism of insulin resistance. Cell. Signal. 23, 980–990 (2011).
- 61. Rangel-Angarita V., Mahoney K. E., Ince D., Malaker S. A., A systematic comparison of current bioinformatic tools for glycoproteomics data. bioRxiv [Preprint] (2022). 10.1101/2022.03.15.484528.
- 62. Julenius K., Molgaard A., Gupta R., Brunak S., Prediction, conservation analysis, and structural characterization of mammalian mucin-type O-glycosylation sites. Glycobiology 15, 153–164 (2005).
- 63. Chen T., Liu Y.-X., Huang L., ImageGP: An easy-to-use data visualization web server for scientific researchers. iMeta 1, e5 (2022).
- 64. Heberle H., Meirelles G. V., da Silva F. R., Telles G. P., Minghim R., InteractiVenn: A web-based tool for the analysis of sets through Venn diagrams. BMC Bioinformatics 16, 169 (2015).
- 65. Franceschini A., et al., STRING v9.1: Protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Res. 41, D808–D815 (2013).
- 66. Crooks G. E., Hon G., Chandonia J. M., Brenner S. E., WebLogo: A sequence logo generator. Genome Res. 14, 1188–1190 (2004).
- 67. Perez-Riverol Y., et al., The PRIDE database resources in 2022: A hub for mass spectrometry-based proteomics evidences. Nucleic Acids Res. 50, D543–D552 (2022).