Abstract
Mucin domains are densely O-glycosylated modular protein domains found in various extracellular and transmembrane proteins. Mucin-domain glycoproteins play important roles in many human diseases, such as cancer and cystic fibrosis, but the scope of the mucinome remains poorly defined. Recently, we characterized a bacterial O-glycoprotease, StcE, and demonstrated that an inactive point mutant retains binding selectivity for mucin-domain glycoproteins. In this work, we leverage inactive StcE to selectively enrich and identify mucin-domain glycoproteins from complex samples like cell lysate and crude ovarian cancer patient ascites fluid. Our enrichment strategy is further aided by an algorithm to assign confidence to mucin-domain glycoprotein identifications. This mucinomics platform facilitates detection of hundreds of glycopeptides from mucin domains and highly overlapping populations of mucin-domain glycoproteins from ovarian cancer patients. Ultimately, we demonstrate our mucinomics approach can reveal key molecular signatures of cancer from in vitro and ex vivo sources.
Subject terms: Glycobiology, Mass spectrometry, Ovarian cancer, Glycosylation
Mucin-domain glycoproteins are densely O-glycosylated proteins with unique secondary structure that imparts a large influence on cellular environments. Here, the authors develop a technique to selectively enrich and characterize mucin-domain glycoproteins from cell lysate and patient biofluids.
Introduction
Mucin domains are modular protein domains that adopt rigid and extended bottle-brush-like structures due to a high density of O-glycosylated serine and threonine residues1–3. Mucin-type O-glycans are characterized by an initiating α-N-acetylgalactosamine (α-GalNAc) monosaccharide that can be further elaborated into several core structures through complex regulation of glycosyltransferases4,5. As a result, mucin domains serve as highly heterogeneous swaths of glycosylation that exert both biophysical and biochemical influence. This influence includes, for instance, the ability to redistribute receptor molecules at the glycocalyx and to drive high avidity binding interactions6–8. In the canonical mucin (MUC) family, mucin domains often occur as tandem repeats, creating heavily glycosylated superstructures. Canonical mucins are central to many functions in health and disease, and have long been associated with human cancers, e.g., MUC1 and MUC16 (also known as CA-125)9–12. Dysregulation of mucin domain expression and aberrant mucin domain glycosylation patterns have been implicated in disease pathologies, especially in tumor progression, where mucins modulate immune responses and also promote proliferation through biomechanical mechanisms13–15.
Mucin domains also exist in proteins outside of the 21 canonical mucins (Fig. 1A). For example, CD43 on the surface of leukemia cells selectively interacts with the glyco-immune checkpoint receptor Siglec-7 through its N-terminal mucin domain16; mucin domain-containing splice variants of CD44 (CD44v) serve as cancer cell markers relative to the ubiquitously expressed standard isoform17; CD45 mucin domains act as suppressors of T-cell activation18; mucin domain O-glycosylation on PSGL-1 is required for leukocyte-endothelial interactions19; and aberrant regulation of mucin domains in podocalyxin and SynCAM1 is implicated in a variety of cancers20,21. In all of these cases, shared functional attributes of mucin domains impart structural and biophysical properties relevant to their biology. Thus, instead of the more traditional categorization of the glycoproteome into N- and O-glycoproteins (both of which are represented by mucin-domain glycoproteins), it is logical to parse the glycoproteome into the mucinome, a family of glycoproteins whose mucin domains make them functionally related. However, even as the tools to capture the broadly defined N- and O-glycoproteome continue to improve22–31, mucin domains remain enigmatic and difficult to characterize. As such, a comprehensive list of all proteins with a mucin domain does not exist. This lack of a well-defined mucinome leaves a critical blind spot in our ability to interrogate mucin domain functions across molecular biology.
Toward this goal, enzymes derived from microorganisms known to colonize mucosal environments have shown promise for developing tools specifically suited to characterize mucin-domain glycoproteins32–38. We recently characterized a panel of such enzymes, termed O-glycoproteases, and showed that each of them harbors a selectivity toward mucins as well as unique peptide- and glycan-based cleavage motifs39. Using catalytic point mutants, we also demonstrated that select O-glycoproteases can retain binding specificity for mucin domains; these were then used as mucin-selective staining reagents for Western blots, immunohistochemistry, and flow cytometry39. One particular enzyme of interest is secreted protease of C1 esterase inhibitor (StcE) from enterohemorrhagic Escherichia coli, which recognizes mucin domains decorated with a variety of O-glycan modifications40–43. This gives StcE both the selectivity needed to specifically bind mucin domains and the breadth to bind diverse mucin domain subtypes that vary in glycosylation patterns. Indeed, StcE has shown great utility for selective release of mucin fragments from biological samples and for improving mass spectrometry (MS)-based analysis of mucin domains40.
We reasoned that the catalytically inactive point mutant of StcE (StcEE447D) could function as a universal mucin enrichment tool for mucin domain discovery, similar to how inactive O-glycosidases and engineered sialidases can enrich broadly for O-glycosylated and sialylated glycoproteins, respectively44,45. Here we show that StcEE447D-conjugated beads selectively enrich mucin-domain glycoproteins from complex cancer cell lysates and from crude ovarian cancer patient ascites fluid. As part of this workflow, we developed a mucin-domain candidacy algorithm to assign confidence scores to proteins that have a high likelihood of containing a mucin domain. Additionally, we detected hundreds of glycopeptides derived from mucin domains in the StcEE447D-enriched samples. Ultimately, we demonstrate that this mucinomics platform can define key molecular signatures of cancer in both in vitro and ex vivo systems and is a valuable approach to unravel the role of mucin domains in health and disease.
Results
Mucin enrichment and definition strategy to describe the mucinome
Our previous work indicated that a catalytically inactive point mutant of StcE (StcEE447D) retains its binding specificity for mucin domains while leaving them intact for subsequent analysis39,40. Through a straightforward reductive amination approach, we conjugated StcEE447D to POROS-AL beads to generate a solid phase support material to use for enrichments46. To optimize our enrichment protocol, we added StcEE447D-conjugated beads to OVCAR3 supernatant followed by an anti-MUC16 Western blot for detection. We tuned several parameters of the enrichment, including binding time, bead-to-substrate ratio, wash buffers, and elution conditions (see Methods); a simplified protocol is detailed in Fig. 1B. With a suitable enrichment protocol defined, we scaled up the reaction for mass spectrometry by enriching 500 µg of HeLa cell lysate with 100 µL of pre-washed StcEE447D-conjugated beads. Bound proteins were eluted by boiling in protein loading buffer, elutions were separated by one-dimensional gel electrophoresis, and in-gel digestions were performed prior to label-free quantitative shotgun proteomics (i.e., GeLC-MS/MS; see Supplementary Fig. 1).
To calculate the degree of enrichment provided by StcEE447D-conjugated beads, 30 µg of unenriched cell lysate was simultaneously prepared and analyzed alongside each elution. Significantly enriched proteins were determined by comparing area-under-the-curve-based label free quantitation (LFQ) values for proteins in the elution relative to lysate, with processing and calculations performed using MaxQuant and Perseus47,48. The volcano plot in Fig. 2A shows several known and canonical mucins enriched in the elution (right; green), as opposed to untreated lysate (left). In particular, MUC1, MUC13, and MUC16 were significantly enriched, as well as known mucin-domain glycoproteins CD55 (decay-accelerating factor, DAF) and syndecan-1 (SDC1).
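The comparison behind each point on such a volcano plot can be sketched as follows. This is a simplified illustration, not the MaxQuant/Perseus procedure used in the study: it assumes replicate LFQ intensities are available for one protein in each condition, and it reports only a log2 fold change and a Welch t statistic (no p-value or FDR correction). The function names and the intensity values are hypothetical.

```python
import math
from statistics import mean, variance


def log2_values(intensities):
    """Log2-transform raw LFQ intensities (all values must be > 0)."""
    return [math.log2(v) for v in intensities]


def log2_fold_change(elution, lysate):
    """Mean log2 intensity in the elution minus mean log2 intensity in the lysate."""
    return mean(log2_values(elution)) - mean(log2_values(lysate))


def welch_t(elution, lysate):
    """Welch t statistic on log2 intensities (unequal variances allowed)."""
    a, b = log2_values(elution), log2_values(lysate)
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Hypothetical replicate intensities for a single protein.
elution = [2**24, 2**25, 2**26]
lysate = [2**20, 2**21, 2**22]

fold_change = log2_fold_change(elution, lysate)  # positive means enriched in the elution
t_statistic = welch_t(elution, lysate)
```

A positive fold change places the protein on the right-hand (elution) side of the plot; the vertical axis in a real analysis would come from the significance test applied by the processing software.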
While these initial results were exciting, it quickly became clear that hand-curating proteins with known mucin domains would be untenable for the mucinome discovery platform. Not only is hand-curation low throughput, but it inherently misses proteins without known mucin domains. Instead, we developed a mucin-domain candidacy algorithm to calculate which proteins have a high probability of bearing a mucin domain. Previous work has mined sequences looking for PTS domains in various non-human organisms49,50, but we wanted to extend our criteria to use protein-level data that includes predicted O-glycosites, subcellular localization information, and previously annotated PTM-sites to annotate putative mucin domains in the human proteome. As summarized graphically in Fig. 2B, our algorithm comprised several steps to assign a Mucin Score to every protein in the human proteome. Mucin-domain candidacy algorithm processing was preceded by O-GalNAc glycosite prediction using the NetOGlyc4.0 tool, a support vector machine-based predictor developed using a map of ~3,000 O-glycosites from 600 O-glycoproteins that was generated through SimpleCell technology51. Predictions from NetOGlyc4.0 were then screened for known phosphosites annotated in Uniprot52 and PhosphoSitePlus53, and any overlap in phosphosites with predicted O-GalNAc sites resulted in removal of the predicted O-GalNAc site from consideration. This was a necessary step because NetOGlyc4.0 often predicted O-GalNAc sites in known phosphodomains of intracellular proteins, resulting in a high number of false positive mucin candidates after downstream processing. Note that O-GalNAc and phosphorylation sites are not known to have a high degree of overlap, as the former is generally extracellular whereas the latter is often intracellular.
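The phosphosite filtering step described above amounts to a set difference per protein. The sketch below assumes that predicted O-GalNAc sites and annotated phosphosites are each available as lists of residue positions for a given protein; the function name and the positions are hypothetical, and the actual file formats of NetOGlyc4.0, Uniprot, and PhosphoSitePlus are not modeled.

```python
def filter_predicted_sites(predicted_ogalnac, annotated_phosphosites):
    """Drop any predicted O-GalNAc position that is also an annotated phosphosite."""
    return sorted(set(predicted_ogalnac) - set(annotated_phosphosites))


# Hypothetical residue positions for one protein.
predicted = [12, 15, 18, 40, 41, 77]
phosphosites = [15, 41, 200]

remaining = filter_predicted_sites(predicted, phosphosites)
```

Only the positions that survive this filter would be passed on to the downstream scoring questions.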
Following O-GalNAc site prediction and phosphosite filtering, the algorithm asked four questions of each protein: (1) Was the protein predicted to be extracellular, secreted, and/or transmembrane?; (2) Were there at least 9 predicted O-glycosylation sites within a stretch of 50 residues?; (3) Was the distance between any given pair of O-glycosites less than 12% of the entire mucin domain (i.e., are glycosites <6 residues away from each other in a 50 residue sequence)?; and (4) Was the ratio of threonine to serine residues skewed toward threonine? Each of these benchmarks was determined through expert curation of known mucin sequences, as further described in Methods. Using a point system based on the answers to these questions, the algorithm ultimately assigned a Mucin Score to each protein in the human proteome. By manually assessing outputs, we determined that a score of >2 was a high confidence mucin-domain glycoprotein, between 2 and 1.5 was a medium confidence mucin-domain glycoprotein, and between 1.5 and 1.2 was a low confidence mucin-domain glycoprotein. Proteins with a score lower than 1.2 were not considered mucin-domain glycoproteins. Levels of confidence also capture the idea that a mucin domain may not be a binary concept; there may be gradients of O-glycosylation density and patterns that contribute to mucin-like attributes. See Supplementary Data 1 for the mucin candidate algorithm output of the entire human proteome and Supplementary Data 2 for the location of where putative mucin domains and predicted O-glycosites occur; 357 proteins contain a putative mucin domain by our estimate (score > 1.2), encompassing 20 of the 21 canonical mucins (MUC15 was excluded), and comprising roughly 2% of the proteome (Fig. 2B). For comparison, proteases represent up to 2% of the human proteome; thus, mucin-domain glycoproteins could be much more common than previously thought54.
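Questions (2) to (4) can be illustrated with three small checks. This is a sketch under stated assumptions rather than the published algorithm: the point system that combines the answers into a Mucin Score is described in Methods and is not reproduced here; question (3) is interpreted as a limit on the gap between consecutive sites; and "skewed toward threonine" is interpreted simply as more threonine than serine residues. All function names and example inputs are hypothetical.

```python
def has_dense_window(sites, min_sites=9, window=50):
    """Question 2: do at least `min_sites` sites fall in any `window`-residue stretch?"""
    positions = sorted(set(sites))
    left = 0
    for right in range(len(positions)):
        # Shrink the window until it spans fewer than `window` residues.
        while positions[right] - positions[left] >= window:
            left += 1
        if right - left + 1 >= min_sites:
            return True
    return False


def gaps_within_limit(sites, domain_length, max_fraction=0.12):
    """Question 3 (assumed reading): every gap between consecutive sites
    is smaller than `max_fraction` of the domain length."""
    positions = sorted(set(sites))
    limit = max_fraction * domain_length
    return all(b - a < limit for a, b in zip(positions, positions[1:]))


def threonine_skewed(sequence):
    """Question 4 (assumed reading): more threonine than serine residues."""
    return sequence.count("T") > sequence.count("S")


# Hypothetical predicted sites and sequence fragment.
dense_sites = [10, 14, 18, 22, 27, 31, 36, 40, 45]
sparse_sites = [10, 60, 110]

dense_ok = has_dense_window(dense_sites)
spacing_ok = gaps_within_limit(dense_sites, domain_length=50)
skew_ok = threonine_skewed("TTSTPTTS")
```

Keeping each question as a separate boolean makes it straightforward to plug in whatever weighting scheme the scoring step uses.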
Using Mucin Scores to reannotate the dataset from Fig. 2A, we labeled high, medium, and low confidence mucin-domain glycoproteins as red, orange, and yellow, respectively (Fig. 2C). The canonical and known mucins from Fig. 2A are still labeled in green. A large number of high confidence mucin-domain glycoproteins are enriched in the StcEE447D elution, some of which are labeled in red with gene names associated with specific proteins. Interestingly, some high confidence and a handful of medium to low confidence mucin-domain glycoproteins are on the left side of the volcano plot, i.e., not enriched in the StcEE447D elution. This could indicate (1) that StcEE447D does not effectively enrich some mucin domains, (2) that mucin domains in these proteins are not heavily glycosylated in HeLa cells, or (3) that the mucin-domain candidacy algorithm has some degree of error. Inherently, the mucin-domain candidacy algorithm is an imperfect predictor of all mucin domains across the proteome. Indeed, no high efficacy mucin domain prediction algorithm exists, nor was that the focus of this work. Instead, our mucin-domain candidacy algorithm indicates degrees of confidence for assigning proteins with a putative mucin domain that can be used to assess mucinome enrichment with StcEE447D-conjugated beads.
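The mapping from Mucin Score to confidence level and plot color can be written compactly. The thresholds and colors come from the text; how scores that fall exactly on a boundary (2, 1.5, or 1.2) are handled is not stated, so the treatment below is an assumption, as are the function and variable names.

```python
# Plot colors used for each confidence level in the volcano plots.
CONFIDENCE_COLORS = {"high": "red", "medium": "orange", "low": "yellow"}


def mucin_confidence(score):
    """Return 'high', 'medium', 'low', or None for a given Mucin Score.

    Boundary handling is an assumption: exactly 2 counts as medium,
    exactly 1.5 as low, and exactly 1.2 as low.
    """
    if score > 2:
        return "high"
    if score > 1.5:
        return "medium"
    if score >= 1.2:
        return "low"
    return None


def plot_color(score):
    """Color for a protein's point, or None if it is not a mucin candidate."""
    level = mucin_confidence(score)
    return CONFIDENCE_COLORS.get(level)
```

For example, `plot_color(2.4)` gives `"red"`, while a protein scoring 0.8 receives no mucin annotation.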
Inactive O-glycoproteases enrich mucin-domain glycoproteins from various cancer cell lines
Given that the HeLa lysate enrichment was successful, we decided to expand the approach to other cancer-associated cell lines, including SKBR3 (breast), OVCAR3 (ovarian), K562 (leukemia), and Capan2 (pancreatic). The corresponding volcano plots are shown in Fig. 3A–D (see Supplementary Data 3 for Perseus processing files). As before, red dots signified a score of >2 (high confidence), orange dots 2–1.5 (medium confidence), and yellow dots 1.5–1.2 (low confidence). Strongly enriched mucin-domain glycoproteins were labeled with their gene names.
The Upset plot in Fig. 4A compares commonly observed mucin-domain glycoproteins across the cell lines. The total number of enriched mucin-domain glycoproteins from each cell line is shown on the bottom left (blue horizontal bars). If a group of mucin-domain glycoproteins was only seen in one cell line, only one gray dot is darkened; the number of proteins that are only seen in that cell line is shown in bar graph form above. For instance, 9 mucin-domain glycoproteins were only detected in the K562 cell line. Overlap between samples is shown by multiple darkened gray dots and a line connecting them; as an example, 2 mucin-domain glycoproteins were only detected in both the SKBR3 and OVCAR3 cell lines. A total of seven mucin-domain glycoproteins were seen in all five cell lines; these proteins are shown above the Upset plot. The putative mucin domain (orange, as calculated by the mucin-domain candidacy algorithm), transmembrane domains (purple), and annotated N-glycan sites (green) are noted on each of the proteins.
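The bars of an Upset plot count exclusive intersections: each protein is assigned to the exact combination of samples in which it was observed. A minimal sketch of that bookkeeping follows, using placeholder sample and protein identifiers rather than data from this study; the function name is hypothetical.

```python
from collections import Counter


def exclusive_intersections(sets_by_sample):
    """Count proteins by the exact set of samples in which each one appears."""
    membership = {}
    for sample, proteins in sets_by_sample.items():
        for protein in proteins:
            membership.setdefault(protein, set()).add(sample)
    return Counter(frozenset(samples) for samples in membership.values())


# Placeholder identifiers only; not results from the study.
toy_sets = {
    "line_A": {"P1", "P2", "P3"},
    "line_B": {"P1", "P3"},
    "line_C": {"P1", "P4"},
}

counts = exclusive_intersections(toy_sets)
seen_in_all = counts[frozenset(toy_sets)]     # proteins found in every sample
only_in_a = counts[frozenset({"line_A"})]     # proteins unique to line_A
```

Each key of `counts` corresponds to one column of darkened dots, and its value to the height of the bar above that column.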
To better understand how many of the proteins contained previously undescribed mucin domains, we compared our dataset to the SimpleCell dataset from Clausen and colleagues51, which is one of the most comprehensive studies on O-glycosites to date (albeit with truncated O-glycan species). To consider a mucin-domain glycoprotein in this comparison, more than 1 glycopeptide had to be detected from within the assigned mucin domain. Additionally, if the protein was a canonical (e.g. MUC15) or confirmed (e.g. Gp1bα) mucin-domain glycoprotein, it was considered a previously described/known protein. Several of the proteins (4/7) found in all five cell lines were previously known to have a mucin domain, including: Mucin-1 (MUC1), dystroglycan (DAG1), agrin (AGRN), and complement decay factor (CD55, DAF). However, we discovered that three of the overlapping proteins have previously undescribed mucin domains: low-density lipoprotein receptor 8 (LRP8), major facilitator superfamily domain 6 (MSFD6), and porimin (PORIM). MSFD6 is a multi-pass transmembrane protein that is implicated in antigen processing and presentation of exogenous peptide antigens via MHC class II, whereas porimin is involved in oncotic cell death characterized by vacuolization and increased membrane permeability.
Extending this analysis to all of the enriched mucin-domain glycoproteins, we found that approximately one-quarter (~26%, 15 of 58) were newly discovered (Fig. 4B, Supplementary Data 4). Of these proteins, perhaps the most surprising was adhesion G protein-coupled receptor L1 (ADGRL1), as GPCRs are generally not thought of as mucin-domain glycoproteins. This particular GPCR is implicated in both cell adhesion and signal transduction; future studies will be devoted to understanding the role of mucin domains in GPCR signaling. To broadly characterize features and functions of proteins present in our mucinome list, we performed GO term enrichment using DAVID55,56. Perhaps unsurprisingly, the most enriched cellular component (CC) GO terms were associated with membranes, cell surfaces, and extracellular space, among others (Fig. 4C).
In another extension of our mucinomics workflow, we performed an enrichment using a different O-glycoprotease. While StcE does not demonstrate drastic glycan specificity, we have characterized several other O-glycoproteases with varying glyco-proteolytic specificities39. BT4244 is an O-glycoprotease of particular interest from Bacteroides thetaiotaomicron that cleaves N-terminally to serine and threonine residues bearing truncated O-glycans, such as the cancer-associated T- and Tn-antigens (Gal-GalNAc and GalNAc, respectively). We reasoned that a point mutant of BT4244 (BT4244E575A) could also enrich mucin-domain glycoproteins bearing shortened O-glycan structures. Thus, we conjugated BT4244E575A to beads and performed an analogous enrichment using HeLa lysate with and without sialidase pretreatment, with results shown in Supplementary Fig. 2. Without sialidase treatment, only six mucin-domain glycoproteins were significantly enriched in the elution, suggesting that not many mucin-domain glycoproteins bear truncated O-glycans in HeLa cells. We then pre-treated HeLa lysate with 100 nM sialidase overnight and repeated this procedure, which resulted in the enrichment of 13 mucin-domain glycoproteins. Though not as robust as StcE enrichment, this proof-of-principle procedure demonstrates that other O-glycoproteases could be used to enrich and identify cancer-associated glycoforms of mucin-domain glycoproteins.
We next asked how selective our mucin-domain-centric platform is when compared to lectin (i.e., glycan-centric) enrichments commonly used for O-glycoproteomics. Jacalin has preference for mucin-type O-glycans including GalNAc and GalNAc-Gal; thus, we conjugated Jacalin to POROS-AL beads and performed enrichments on HeLa cell lysate with and without pretreatment with sialidase. The resulting volcano plots are shown in Supplementary Fig. 3. To be sure, Jacalin does enrich most of the mucin-domain glycoproteins, but as demonstrated by the large number of enriched non-mucin proteins, it is clear that Jacalin is less specific for mucin-domain glycoproteins. This point is further illustrated in Supplementary Fig. 4. The Jacalin (+/− sialidase) pulldown resulted in the enrichment of 205 and 273 proteins, respectively. The percentage of mucin-domain glycoproteins within this subset is only 16–17%, meaning that 171 and 230 non-mucin proteins were enriched in the two samples. Using the same HeLa lysate, StcEE447D-conjugated beads enriched a total of 75 proteins, 28% of which were mucin-domain glycoproteins. Thus, StcEE447D is approximately two-fold more selective for mucin-domain glycoproteins. Further, we detected only 54 non-mucin proteins in this enrichment, compared to the 230 in the Jacalin pulldown, representing a >4-fold reduction in non-mucin proteins. While Jacalin did enrich more mucin-domain glycoproteins, selectivity is especially important when considering potential goals of characterizing mucin-domain O-glycopeptides; non-mucin proteins, and their associated unmodified peptides, will outcompete the glycopeptides for ionization and detection.
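The selectivity figures quoted above follow directly from the protein counts. The short calculation below reproduces them from the numbers given in the text; the helper function and variable names are illustrative only, and the two Jacalin pulldowns are labeled simply in the order in which the text reports them.

```python
def mucin_fraction(total_enriched, non_mucin):
    """Fraction of enriched proteins that are mucin-domain glycoproteins."""
    return (total_enriched - non_mucin) / total_enriched


# Counts as reported in the text.
jacalin_first = mucin_fraction(205, 171)    # 34 of 205 proteins
jacalin_second = mucin_fraction(273, 230)   # 43 of 273 proteins
stce = mucin_fraction(75, 54)               # 21 of 75 proteins

# Relative selectivity of StcE(E447D) versus each Jacalin pulldown.
selectivity_vs_first = stce / jacalin_first
selectivity_vs_second = stce / jacalin_second

# Reduction in co-enriched non-mucin proteins (230 versus 54).
non_mucin_fold_reduction = 230 / 54
```

The two Jacalin fractions come out at roughly 16 to 17%, the StcEE447D fraction at 28%, and the relative selectivity at about 1.7 to 1.8, consistent with the "approximately two-fold" description.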
We then investigated the non-mucins that were enriched by the StcEE447D cell line enrichments to understand if there was an unexpected selectivity for features other than mucin domains or if it was likely due to non-specific binding. We calculated how many of the non-mucins were commonly found between cell lines, as demonstrated by the Upset Plot in Supplementary Fig. 5. Here, the majority of enriched proteins were found in only one cell line, suggesting that these proteins were primarily non-specifically binding to the beads. On the other hand, 5 proteins were found in all cell lines, and 7 were found in at least 4 cell lines (Supplementary Data 5; Master_NonMucin tab). Of these 12 proteins, 6 are potential mucin-domain glycoproteins with Mucin Scores that did not meet our initial thresholds but have several predicted O-glycosylation sites. The other proteins are likely to be (a) abundantly expressed and non-specifically binding (e.g. myosin) and/or (b) previously undescribed glycan or mucin-binding proteins. Taking this one step further, we performed cellular component GO term enrichments for all of the non-mucins. The highest protein counts were “extracellular exosome” (87) and “integral component of membrane” (80); “perinuclear region of cytoplasm” was far less abundant at a protein count of 15 (Supplementary Data 5).
Additionally, we explored which assigned mucin-domain glycoproteins were repeatedly not enriched by our technique. As with the enriched non-mucins, we generated an Upset Plot to determine which of our assigned mucin-domain glycoproteins were not enriched reproducibly (Supplementary Fig. 6). Here, five proteins were consistently not enriched across all five cell lines and five across at least four cell lines. The majority of these proteins were intracellular cytoplasmic proteins that were likely overscored as mucin-domain glycoproteins due to their presumed phosphorylation/O-GlcNAc sites that were predicted by NetOGlyc4.0 as O-GalNAc sites (Supplementary Data 6). We tried to account for these proteins by removing annotated phosphosites from the NetOGlyc4.0 glycosite assignments, though we note that phosphosite databases are likely incomplete. Taken together, we believe that these analyses demonstrate that our approach provides satisfactory selectivity for mucin domains.
Mucinomics platform allows for identification of ovarian cancer patient mucinome
Following the establishment of our mucin domain enrichment approach in cell lines, we next wanted to test the mucinomics platform on clinically relevant patient samples. Ovarian cancer ranks fifth in cancer deaths among women and is often diagnosed in stage III or IV, leading to a poor prognosis. This is due, in part, to the fact that the only clinically relevant biomarker is CA-125, a peptide epitope of MUC16, but the exact structural definition of this antigen continues to be elusive. Previously, we showed that StcE could digest MUC16 from crude ovarian cancer patient ascites fluid40, leading us to reason that our enrichment technique could be used to selectively isolate MUC16 and other mucin-domain glycoproteins from ascites fluid as a potential diagnostic strategy. As such, we performed mucinomics enrichment with StcEE447D-beads on five de-identified patient samples (OC235, OC234, OC114, OC109, and OC107). As seen in Fig. 5A–E, the vast majority of putative mucin-domain glycoproteins were significantly enriched in the elution (see Supplementary Data 7 for Perseus processing information); in all but one of the experiments (OC114), MUC16 (denoted in purple) was significantly enriched. The enrichment in these experiments was even more successful than in the cell lines; in four out of five patient samples (excluding OC235), zero mucin-domain glycoproteins were “enriched” in the crude ascites fluid. This is also demonstrated by the selectivity calculations depicted in Supplementary Fig. 7, as well as the non-mucin proteins investigated in Supplementary Fig. 8 and Data 8. For the full list of enriched mucin-domain glycoproteins, see Supplementary Data 8. The enrichment was likely more successful due to the presence of fewer interfering proteins found in biofluids.
Again, we compared our results to the SimpleCell dataset and found approximately half (~54%, 33 of 61) of the mucin-domain glycoprotein candidates have previously unannotated mucin domains; these are detailed in Supplementary Data 4.
Figure 5F compares overlap between the ascites samples with a Venn diagram of all enriched mucin proteins. Each sample is represented by a different color box, and the overlap between samples is given by a number within the boxes. Notably, 26 mucin-domain glycoproteins were enriched in all five samples, demonstrating substantial overlap between patients. The 26 overlapping proteins and their MucinScores are listed in Supplementary Table 1. Again, as expected, the most enriched cellular component GO terms for the mucin-domain glycoproteins were associated with membranes, lumen, extracellular matrix, and the basement membrane (Fig. 5G). As before, the mucinome list contains some known mucin-domain glycoproteins, such as CD44, podocalyxin (PODXL), and agrin (AGRN). In addition, the list contains previously undescribed mucin-domain glycoproteins, such as thymosin beta-4 and Trem-like transcript 2 protein. This further underscores the need for tools, like the strategy described here, to help define members of the mucinome. Additionally, we detected adhesion G protein-coupled receptor L1 (ADGRL1) as enriched in all five samples, further reinforcing our conviction that this protein contains a mucin domain. While our patient cohort is currently too small to make any clinical claims, we believe that these overlapping mucin-domain glycoproteins could represent a better diagnostic and/or prognostic indicator for ovarian cancer. Future efforts will be devoted to expanding the study to a larger number of patients and comparing the results to patient outcomes, with the goal of developing a rapid mucin-fingerprinting approach using this mucinomics platform.
StcEE447D-enrichment also captures O-glycopeptides from mucin domains
Characterization of intact O-glycopeptides was not an original goal when designing these experiments, but we reasoned that StcEE447D-enrichment should function as a de facto glycopeptide enrichment by selecting for highly O-glycosylated mucin-domain glycoproteins at the protein (i.e., pre-proteolysis) level. We observed a large number of spectra in our ascites enrichments bearing the “HexNAc fingerprint”, that is, oxonium ions specific to glycopeptides, which prompted us to search our data for intact glycopeptides. Generally, electron-driven dissociation is better suited for characterizing O-glycopeptides because it can provide O-glycosite localization57,58. This is especially true for O-glycopeptides derived from mucin-domain glycoproteins, which will likely have multiply glycosylated sequences59–61. Even so, collision-based fragmentation can still provide O-glycopeptide identifications that include peptide sequence and the total glycan mass modification, though details about number of glycans or glycosite positions (and by extension, fine details about glycan structure) are usually inaccessible. Previous glycomic work suggests that some of these structures may include large, highly fucosylated and sialylated complex and hybrid N-glycans in addition to highly sialylated core-1 and -2 O-glycans with a smaller amount of sulfated core-2 O-glycan structures62–64. We collected only higher-energy collision dissociation (HCD) spectra throughout this study, limiting our ability to thoroughly characterize O-glycopeptides. Additionally, given that we performed in-gel tryptic digestion, it is unlikely that we were able to extract the intact mucin domains from many of our samples, nor were we able to fully characterize mucin domains of interest. Attempts to use StcE for in-gel digests resulted in limited digestion efficiency, and alternative methods to couple StcE proteolysis to this enrichment strategy are currently under investigation.
Regardless, we searched our ascites data using O-Pair Search, a recently developed open-modification-centric glycoproteomic search algorithm that is particularly well suited to the complex O-glycopeptide searches that consider large protein databases65 (see Supplementary Data 9 for glycan databases used). Even though we could not capitalize on the site-localization capabilities of O-Pair Search, we identified several hundred glycopeptides in both the enriched and crude ascites samples; the total list of all glycopeptides identified is available in Supplementary Data 10 and 11.
Intriguingly, we discovered several O-glycopeptides on proteins that had previously uncharacterized mucin domains, as demonstrated in Fig. 6A. Here, the putative mucin domain is indicated by an orange box, annotated N-glycan sites are shown with green dots, and approximate location of the O-glycopeptides detected are shown using red dots. These proteins did not have any annotated O-glycosites in the SimpleCell dataset or in Uniprot, thus these O-glycopeptides represent novel modifications on the mucin-domain glycoproteins. The presence of several identified O-glycopeptides in the regions assigned to be putative mucin domains by our mucin-domain candidacy algorithm also strengthens our claim that the proteins do, in fact, have mucin domains. Additionally, we detected a large number of glycopeptides from MUC16, which is a key step toward better structural definition of this important cancer antigen. The total glycan compositions for these peptides included N1, H1N1, N2, H1N2, N3, H1N1A1, H2N2, H1N2A1, H1N1A2, H2N2A1, and H2N2A2, where H is hexose, N is HexNAc, and A is Neu5Ac. The ratio of 138/144 in all of these cases was ~1, suggesting that the glycans are primarily core 1 (i.e., do not contain GlcNAc). Together, this would suggest that the compositions N2, H1N2, N3, H2N2, H1N2A1, H2N2A1, and H2N2A2 were multiply glycosylated peptides.
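The inference about multiply glycosylated peptides can be made explicit. If every glycan is core 1 and therefore contains exactly one HexNAc (the initiating GalNAc), any total composition with two or more HexNAc residues must be spread over at least two glycans. The sketch below applies that rule to the composition labels listed above; the parsing helper and function names are hypothetical, and the single-HexNAc-per-glycan premise is the assumption stated in the text.

```python
import re

# Total glycan compositions reported for the MUC16 glycopeptides
# (H = hexose, N = HexNAc, A = Neu5Ac).
COMPOSITIONS = [
    "N1", "H1N1", "N2", "H1N2", "N3", "H1N1A1",
    "H2N2", "H1N2A1", "H1N1A2", "H2N2A1", "H2N2A2",
]


def parse_composition(label):
    """Turn a label such as 'H1N2A1' into {'H': 1, 'N': 2, 'A': 1}."""
    return {residue: int(count) for residue, count in re.findall(r"([HNA])(\d+)", label)}


def minimum_glycan_count(label):
    """Under the core 1 premise, each glycan carries one HexNAc,
    so the HexNAc count equals the number of glycans."""
    return parse_composition(label).get("N", 0)


multiply_glycosylated = [c for c in COMPOSITIONS if minimum_glycan_count(c) >= 2]
```

Filtering the list this way recovers the same seven compositions named in the text as multiply glycosylated.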
Next, we wanted to compare the glycoprotein sources of glycopeptides detected in the elution versus the crude cancer patient ascites fluid. As demonstrated in Fig. 6B, only 3% of glycopeptide spectral matches (glycoPSMs) originated from mucin-domain glycoprotein identifications in the unenriched ascites fluid, while 60% of glycoPSMs from the elution came from mucin-domain glycoproteins. Further, 82% of all glycoPSMs in the elution were O-glycopeptides (rather than N-glycopeptides), compared to only 17% in ascites fluid (Fig. 6C). Supplementary Fig. 9 (data available in Data 9 and 10) shows the number of N- and O-glycopeptides detected in n number of experiments (where a unique glycopeptide is defined as a unique combination of peptide sequence and total glycan mass), suggesting a significant biological variance in glycopeptide species between patients despite high protein-level overlap observed in Fig. 5F. We note that there is some level of ambiguity in glycopeptide identifications, given that 2 fucose residues may be assigned as a single sialic acid and vice versa. Regardless, to visualize the degree of uniqueness/overlap between glycopeptides identified in ascites and enriched samples, we constructed glycopeptide-glycan networks shown in Fig. 6D, E, which are modified versions of previous protein-glycan visualizations introduced in Riley et al.24 In these networks, unique glycopeptide identifications are arranged vertically as nodes in the middle of the network (black nodes in both panels). Unique glycan masses are then organized as nodes in the semi-circles on either side of the glycopeptide identifications, with each semi-circle representing the same glycan masses. In other words, gray nodes on the left of each network and color nodes on the right show the same glycan masses and are mirror images of each other.
If glycan masses map to the same glycopeptide identifications, that means identifications are shared between the ascites (left, gray) and enriched (right, color) conditions. Otherwise, glycopeptide-glycan connections that only appear on one side of the network are unique to that condition. In Fig. 6D, the majority of N-glycopeptides were identified in ascites rather than enriched samples, with relatively few N-glycopeptides mapping uniquely to the enriched samples. Conversely, Fig. 6E shows that the majority of O-glycopeptides were identified in the enriched samples, with the majority of those being unique to the enriched samples. Note that Fig. 6C reports glycoPSMs, whereas Fig. 6D and 6E report unique glycopeptide identifications. Slightly over 50% of all N-glycopeptide identifications in the enriched samples belonged to mucin-domain glycoproteins, while mucin-domain glycoproteins accounted for only ~15% of N-glycopeptide identifications from unenriched ascites fluid samples (Supplementary Fig. 9). Similarly, approximately two thirds (~66%) of all O-glycopeptide identifications in the enriched samples belonged to mucin-domain glycoproteins, with only ~10% of O-glycopeptide identifications from unenriched ascites fluid samples deriving from mucin-domain glycoproteins (Supplementary Fig. 9). Detailed data underlying these glycopeptide-glycan networks are available in Supplementary Data 12 and 13. Overall, these data provide further evidence that we can selectively enrich mucin-domain glycoproteins with a concomitant increase in O-glycopeptide identifications.
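The shared-versus-unique logic behind the glycopeptide-glycan networks can be sketched with simple set operations. This is a minimal illustration, not the R/igraph code used to draw Fig. 6D, E. Each connection is assumed to be a (glycopeptide identifier, glycan mass) pair, and the identifiers and masses in the example are made-up placeholders rather than data from this study.

```python
def split_edges(ascites_edges, enriched_edges):
    """Classify glycopeptide-glycan connections by the condition(s) they occur in.

    Each edge is a (glycopeptide_id, glycan_mass) tuple; an edge present in
    both inputs corresponds to an identification shared between conditions.
    """
    ascites, enriched = set(ascites_edges), set(enriched_edges)
    return {
        "shared": ascites & enriched,
        "ascites_only": ascites - enriched,
        "enriched_only": enriched - ascites,
    }


# Hypothetical placeholder edges (not real identifications).
ascites_edges = [("PEPTIDE_A", 365.13), ("PEPTIDE_B", 656.23)]
enriched_edges = [("PEPTIDE_A", 365.13), ("PEPTIDE_C", 947.32)]

result = split_edges(ascites_edges, enriched_edges)
for label, edges in result.items():
    print(label, sorted(edges))
```

In a drawn network, the "shared" edges would appear on both the gray (ascites) and colored (enriched) sides, while the other two groups would appear on one side only.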
Discussion
A rapidly developing breadth of tools continues to shed light on glycobiology, which is historically understudied relative to other biomolecules. Mucin-domain glycoproteins represent one particularly challenging subset of the glycoproteome that remains poorly defined. Though canonical mucins are recognized as important contributors to health and disease, a “parts list” for the mucinome, i.e., a complete list of mucin-domain glycoproteins, remains elusive, even though the mucinome is poised to address many open questions in glycobiology.
Here, we used a point mutant of our mucin-selective protease, StcEE447D, along with a mucin-domain candidacy algorithm to address this problem. We chose to build this candidacy algorithm on the hallmark mucin domain feature of serine and threonine O-glycosylation, as predicted by NetOGlyc4.0, while not focusing on other sequence characteristics such as proline frequency. While the enrichment feature of this mucinome workflow appears robust, we note that the mucin-domain candidacy algorithm is imperfect; yet, it serves a functional purpose for evaluating mucin-domain glycoprotein enrichments. Identification of mucin-domain glycoproteins more abundantly detected in cell lysates than in the elution could also indicate that certain mucin domains remain under-glycosylated depending on cellular state or cell type, meaning our mucinomics approach could be used to screen the mucin status of proteins under a variety of conditions. Additionally, our mucin-domain candidacy algorithm could benefit substantially from enhanced O-glycosite and mucin domain prediction tools. That said, prediction of mucin-type O-glycosites, much less mucin domains, remains challenging due to the complex regulation of O-glycosites by a poorly resolved family of glycosyltransferases. Future iterations could also explore other O-glycosite prediction algorithms beyond NetOGlyc4.0, such as ISOGlyP66.
Though we have identified a subset of putative mucin-domain glycoproteins determined by the candidacy algorithm, we did not detect nearly 300 of these proteins. This can likely be attributed to a number of reasons: first, we only explored 5 types of epithelial cancer cells; many other cancers and subsets of the same cancers are likely to express a different subset of mucin-domain glycoproteins. Also, we primarily used whole-cell lysates in this study, biasing toward membrane-tethered glycoproteins; given that mucin-domain glycoproteins can also exist as purely secreted biomolecules rather than membrane-tethered, it is possible that we missed a large number of mucin-domain glycoproteins only found in the secretome of cells. Further, it is entirely possible that the dense glycosylation in the mucin-domain glycoproteins renders them inaccessible to the in-gel digestion strategy used here. Current efforts are focused on optimizing the elution of the mucin-domain glycoproteins to enable in-solution digestion approaches. Finally, though previous experiments have suggested otherwise, it is possible that StcE enriches only a certain subset of mucin-domain glycoproteins from the samples. Interestingly, during the review process of this manuscript, Nason, Büll, et al. reported that the C-terminal domain of StcE can confer mucin-binding properties irrespective of the active site67, meaning that the selectivity of StcEE447D enrichments is not purely based on the O-glycosylated TxT motif that dictates its protease activity. This generates interesting new directions to explore complexities of mucin binding harbored by catalytically inactive O-glycoprotease mutants.
Regardless, with this mucinomics platform, we enriched mucin-domain glycoproteins from several cancer-associated cell lines and crude ovarian cancer patient ascites fluid. We demonstrated high mucin overlap between ovarian cancer patients, and the enrichment strategy allowed us to detect hundreds of glycopeptides from the mucin proteins, with a substantial increase in O- over N-glycopeptides. We also identified many proteins previously unknown to contain a mucin domain, thus demonstrating the utility of this technique in discovering new mucin-domain glycoproteins. Future efforts will be devoted to expanding our patient cohort in order to determine whether the ovarian cancer mucinome can be used as a diagnostic and/or prognostic indicator.
Though this work represents a significant step forward in understanding mucin domains, several open questions remain. To begin, mucin domains are known to regulate interactions at cellular peripheries via biophysical effects and cell-to-cell interactions. However, these roles are likely extremely dynamic, and may depend on various glycan structures (alone or in combination), expression of the mucin domain, and the overall cellular milieu. Further, the role of an individual mucin domain is unlikely to be identical across all of the mucin-domain glycoproteins. Thus, future studies should be devoted to understanding the role that discrete mucin domains are playing in cellular function. We predict that these mucin domains will fall into subgroups with categorical roles in health and disease.
Additionally, while we have identified a large number of mucin-domain glycoproteins from cell lines and ascites fluid, many other mucin-domain glycoproteins are likely present on different cell types and in other indications. In particular, the immune cell mucinome is of incredible interest and may represent a class of new ‘checkpoint inhibitors’ with both glycan and peptide components to investigate16. Further, while we chose to focus our efforts on the cancer mucinome, several other mucinomes have yet to be studied in diseases known to involve dysregulated mucins. These mucinopathies include, but are not limited to, inflammatory bowel disease, cystic fibrosis, chronic obstructive pulmonary disease (COPD), Sjögren’s syndrome, and dry mouth/eyes. Ultimately, we believe our mucinomics strategy will find utility in several settings and will prove to be an invaluable tool for glycobiologists and biochemists alike.
Methods
O-glycoprotease cloning, expression, and purification
StcE and BT4244 were expressed as previously described39,40. Briefly, Natalie Strynadka (University of British Columbia) kindly provided the plasmid pET28b-StcE-∆35-NHis43. Robert Hirt (Newcastle University) kindly provided the plasmid pRSETA-BT424433. pET28b-StcEE447D-∆35-NHis and pRSETA-BT4244E575A were generated using the Q5 Site-Directed Mutagenesis Kit (New England Biolabs) with the following primers: StcEE447D_for 5′-TCAGTCATGACGTTGGTCATAATTATG-3’, StcEE447D_rev 5′-ACTCATTCCCCAATGTGG-3′, BT4244E575A_for 5′-CCAGCTCATGCAATTGGCCATG-3′, and BT4244E575A_rev 5′-TCCCCACGCGTTATCTTC-3′.
StcEE447D was expressed and purified as previously described40. BT4244E575A was expressed in BL21(DE3) E. coli (New England Biolabs) grown in Luria broth (LB) with 100 μg/mL ampicillin at 37 °C, 225 rpm. The culture was induced at OD 0.6–0.8 with 1 mM IPTG and grown overnight at 20 °C. Cell pellets were lysed in xTractor buffer (Clontech) and lysates were applied to 1 mL HisTrap HP columns (Cytiva Life Sciences) using an ÄKTA Pure FPLC. Columns were washed with 50 column volumes of 20 mM Tris-HCl, 100 mM NaCl, 15 mM imidazole, pH 8, and elution was performed with a linear gradient to 150 mM imidazole. For BT4244, fractions containing pure protein were concentrated using Amicon Ultra 10 kDa MWCO filters (Millipore Sigma), dialyzed into PBS, pH 7.4, and stored at −80 °C. BT4244E575A was further purified by size exclusion chromatography using a Superdex 200 Increase 10/300 GL column (Cytiva Life Sciences) in PBS, pH 7.4, and fractions containing pure protein were stored at −80 °C.
Cell culture
Cells were maintained at 37 °C and 5% CO2. HeLa cells (ATCC CCL-2) were cultured in DMEM supplemented with 10% fetal bovine serum (FBS) and 1% penicillin/streptomycin (P/S). Capan-2 cells (ATCC HTB-80) were cultured in McCoy’s 5a supplemented with 10% FBS and 1% P/S. K562 and SKBR3 cells (ATCC CRL-3344 and HTB-30, respectively) were cultured in RPMI supplemented with 10% FBS and 1% P/S. OVCAR-3 cells (ATCC HTB-161) were cultured in RPMI supplemented with 20% FBS, 0.01 mg/mL bovine insulin, and 1% P/S. To prepare lysate for pulldowns, cells plated in T75 flasks (Thermo Fisher Scientific) were grown until ~70% confluency, washed three times with DPBS, then lysed in 500 µL of RIPA buffer (Thermo Fisher Scientific) supplemented with EDTA-free protease inhibitor cocktail (Roche) and 0.1% benzonase (Millipore Sigma). Lysates were stored at −80 °C prior to pulldown.
Bead derivatization
An aliquot containing approximately 2 mg of StcEE447D (1 mL of 1.93 mg/mL) was added to 7–8 mg of POROS-AL beads, along with 1 µL of 80 mg/mL NaCNBH3. The reaction proceeded overnight, with shaking, at 4 °C. After conjugation, the beads were washed three times with 500 µL of ultrapure water, spinning at 8500 rpm for 5 min each time. To cap all excess aldehyde sites on the beads, 200 µL of Tris-HCl with 1 µL of 80 mg/mL NaCNBH3 was added to the beads. The reaction shook at room temperature for 2 h. Excess beads were stored at 4 °C for up to one month and were washed before each enrichment. Jacalin (Vector Laboratories, L-1150-25) derivatization was performed identically to the StcEE447D conjugation. For BT4244E575A conjugation, the enzyme concentration was 1.423 mg/mL, so 5.5 mg of POROS-AL beads was used. Otherwise, the conjugation and enrichment steps were identical.
Enrichment of mucin-domain glycoproteins from cell lysates and ascites fluid
Cell lysates were clarified by centrifuging for 20 min at 18,000 x g, and concentrations were determined using standard BCA assays. As per optimization experiments, the ideal ratio of lysate to beads (w/v) was determined to be 500 µg/100 µL, where 100 µL of the conjugated beads corresponded to 700 µg of beads in solution. The beads were pelleted at 8500 rpm for 5 min and the supernatant was removed. Then, 5 µL of 0.5 M EDTA and 500 µg of cell lysate was added to the beads and incubated at 4 °C overnight, with shaking. The reaction was performed six times, in tandem. After binding, the beads were spun at 8500 rpm for 5 min, and the supernatant was saved (“FT” or flow-through). Then, the beads were washed three times with 250 µL of PBS buffer containing 5 µL of 0.5 M EDTA. After the last wash, 32 µL of 4X protein loading buffer was added to the beads. For unenriched (control) samples, 30 µg of lysate was added to 10 µL of 4x protein loading buffer. All samples were then boiled at 95 °C for 5 min, spun for 2 min at 13,000 × g, and frozen for at least 1 h. The samples were then thawed and loaded onto 4–12% Criterion XT Bis-Tris precast gels (Bio-Rad), and run in 1x MOPS (Bio-Rad) for 90 minutes at 180 V. The total number of lanes for each experiment was 12, which included 6 control and 6 enriched lanes. After running, the lanes were stained using Bulldog Bio SafeStain and destained in ultrapure water. Eight bands were cut from each lane, giving a total of 96 slices per enrichment. The slices were frozen overnight at −80 °C.
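The quantities in this protocol reduce to a few lines of arithmetic, sketched below. The constants come from the text; the example lysate concentration is a hypothetical BCA result used only for illustration.

```python
def lysate_volume_ul(protein_ug, concentration_mg_per_ml):
    """Volume (µL) of lysate that contains protein_ug micrograms of protein."""
    if concentration_mg_per_ml <= 0:
        raise ValueError("concentration must be positive")
    # mg/mL is numerically identical to µg/µL.
    return protein_ug / concentration_mg_per_ml


LYSATE_PER_ENRICHMENT_UG = 500   # lysate loaded per 100 µL of bead slurry
CONTROL_LYSATE_UG = 30           # unenriched control per lane
LANES = 12                       # 6 control + 6 enriched
BANDS_PER_LANE = 8

example_concentration = 2.5      # mg/mL, hypothetical BCA result

print(lysate_volume_ul(LYSATE_PER_ENRICHMENT_UG, example_concentration))  # 200.0
print(lysate_volume_ul(CONTROL_LYSATE_UG, example_concentration))         # 12.0
print(LANES * BANDS_PER_LANE)                                             # 96
```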
For optimization and proof-of-principle purposes, only one replicate was performed, and all steps were run on a gel (FT, 3x washes, elution). Afterward, an anti-MUC16 Western blot was performed using anti-MUC16 antibody [X75] (Abcam, ab1107) at a dilution of 1:1000 and IRDye® 800CW Goat anti-Mouse IgG (LI-COR Biosciences, 926-32210) at a dilution of 1:25,000 according to manufacturer recommendations. Images (total protein, Western blot) were generated using an Odyssey CLx Near-Infrared Fluorescence Imaging System (LI-COR Biosciences).
Ascites from patients with gynecologic malignancies was collected with patient consent under an approved IRB protocol from the Dept. of Obstetrics and Gynecology, Stanford Hospital. The study design and conduct complied with all relevant regulations regarding the use of human study participants and was in accordance with the criteria set by the Declaration of Helsinki. Ascites fluid was obtained from O.D. and V.K. and was de-identified prior to our handling. Samples were selected based on the amount of ascites available for the enrichment. BCA analysis revealed the average protein concentration to be 52 mg/mL, with a range of 33–64 mg/mL. In optimization experiments, the ideal ratio of lysate to beads (v/v) was determined to be 100 µL/100 µL, where 100 µL of the conjugated beads corresponded to 700 µg of beads in solution. Ascites was centrifuged at 4 °C at 18,000 × g for 20 min, and samples were removed from the supernatant. For control experiments, 6 µL of ascites was removed per lane for a total of 36 µL. Otherwise, the procedure was the same as above.
In-gel digest and C18 clean-up for mass spectrometry
All slices were thawed in 200 µL of ultrapure water (Pierce), followed by a rinse with 200 µL of acetonitrile (ACN, Fisher). Fresh 50 mM ammonium bicarbonate (“AmBic”) was made, and samples were rinsed in 200 µL of AmBic for 20 min at RT. Afterward, samples were reduced using 5 mM dithiothreitol (DTT, Sigma) in AmBic for 35 min at 65 °C, with shaking, followed by alkylation using 50 mM iodoacetamide (IAA, Sigma) in AmBic for 30 min at RT, in the dark. Then, slices were rinsed once in AmBic, followed by two washes in fresh 50:50 AmBic:ACN for 10 min each. Slices were then dried in a vacuum concentrator and rehydrated with 0.1 µg of trypsin in 200 µL of AmBic and reacted overnight at 37 °C. The following day, samples were acidified with 2 µL of formic acid (FA, Thermo) and held at 37 °C for 45 min. The supernatant was discarded and 100 µL of 0.1% FA in 70% ACN was added to the slices for 30 min at 37 °C. The elution was collected, and the step was repeated once. The elution of adjacent slices was combined, for a total of 48 samples per enrichment. The resultant elution (400 µL) was dried in a vacuum concentrator.
All samples were subjected to desalting with a 96-well HyperSep C18 plate (Thermo). For all steps, solvent was added to the plate and centrifuged at 2013 × g in a Sorvall Legend RT. To begin, wells were wet with 150 µL of ACN followed by equilibration with 150 µL of 0.1% FA in ultrapure water (“solvent A”). Samples were reconstituted in 150 µL of solvent A and added to the plate three times. The wells were then washed three times with 150 µL of solvent A, followed by elution three times using 100 µL of 0.1% FA in 80% ACN (“solvent B”). The combined elution for each sample (48), totaling 300 µL, was taken to dryness in a vacuum concentrator. All samples were reconstituted in 7 µL of solvent A.
Mass spectrometry
Samples were analyzed by online nanoflow LC-MS/MS using an Orbitrap Fusion Tribrid mass spectrometer (Thermo Fisher Scientific) coupled to a Dionex Ultimate 3000 HPLC (Thermo Fisher Scientific) controlled by Xcalibur4.1 software. A portion of the sample was loaded via autosampler isocratically onto a C18 nano pre-column using 0.1% formic acid in water (“Solvent A”). For all cell lysate samples and enriched ascites fluid, 6.5 µL of sample was injected onto the column; for unenriched ascites fluid, 0.5–6.5 µL of sample was loaded onto the column as determined by peptide BCA (approximately 1 µg per sample). For pre-concentration and desalting, the column was washed with 0.1% formic acid in ACN and 0.1% formic acid in water (“loading pump solvent”). Subsequently, the C18 nano pre-column was switched in line with the C18 nano separation column (75 µm x 250 mm EASYSpray containing 2 µm C18 beads) for gradient elution. The column was held at 45 °C using a column heater in the EASY-Spray ionization source (Thermo Fisher Scientific). The samples were eluted at a constant flow rate of 0.3 µL/min using a 90 min gradient. The gradient profile was as follows (min:% solvent B, 2% formic acid in acetonitrile) 0:3, 3:5, 93:25, 103:35, 104:90, 109:90, 110:3, 140:3. The instrument method used an MS1 resolution of 60,000 at FWHM 200 m/z, an AGC target of 3e5, and a mass range from 350 to 1,500 m/z. Dynamic exclusion was enabled with a repeat count of 3, repeat duration of 10 s, exclusion duration of 10 s. Only charge states 2–6 were selected for fragmentation. MS2s were generated at top speed for 3 s. HCD was performed on all selected precursor masses with the following parameters: isolation window of 2 m/z, 30% collision energy, orbitrap detection (resolution of 30,000), and an AGC target of 1e4 ions.
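The gradient profile above is a list of (time, % solvent B) breakpoints. The sketch below computes the solvent composition at any time point, assuming linear ramps between consecutive breakpoints. That assumption is the usual LC pump behavior but is not stated in the text, and the function name is illustrative.

```python
# (time in min, % solvent B) breakpoints as listed in the text.
GRADIENT = [(0, 3), (3, 5), (93, 25), (103, 35),
            (104, 90), (109, 90), (110, 3), (140, 3)]


def percent_b(t_min):
    """Return % solvent B at t_min, interpolating linearly between breakpoints."""
    if t_min < GRADIENT[0][0] or t_min > GRADIENT[-1][0]:
        raise ValueError("time is outside the programmed gradient")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


print(percent_b(48))    # 15.0, midpoint of the 90 min separation ramp (3 to 93 min)
print(percent_b(104))   # 90.0, start of the high-organic wash
print(percent_b(140))   # 3.0, end of re-equilibration
```

Read this way, the "90 min gradient" corresponds to the 5% to 25% B ramp between 3 and 93 min, followed by a steeper ramp, a wash at 90% B, and re-equilibration at 3% B.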
Mucin-domain candidacy algorithm
To build the mucin-domain candidacy algorithm, the entire human proteome was first downloaded from Uniprot (20,365 entries) and parsed into FASTA files containing 150 entries each (a total of 136 files). Each file was individually uploaded to the NetOGlyc4.0 Server (http://www.cbs.dtu.dk/services/NetOGlyc/) for O-glycosite prediction51. NetOGlyc4.0 results were saved as .csv files for further processing, with 20,121 entries returning usable output. Those without a NetOGlyc4.0 output received a Mucin Score of NaN in the supplemental datafiles, which differs from a score of 0 that can be calculated through the description below. Cellular component (CC) GO terms for the human proteome were also downloaded from Uniprot, and phosphosite annotations were downloaded from Uniprot and PhosphoSitePlus52,53. Predictions from NetOGlyc4.0 were then screened for known phosphosites, and any overlap in phosphosites with predicted O-GalNAc sites resulted in removal of the predicted O-GalNAc site from consideration. To annotate proteins as extracellular, secreted, and/or transmembrane, cellular component localization terms from Uniprot were checked for each protein entry. A protein was annotated as “extracellular” if its CC GO terms contained the phrases “Cell Membrane”, “Cell membrane”, “pass membrane protein”, “Secreted”, “extracellular”, or “Extracellular”. Proteins also received the “extracellular” distinction if they contained GO accessions of 0005887, 0016021, or 0005576. Because many proteins have multiple locations, “extracellular” proteins were further denoted as “exclusively extracellular” if their GO term lists did NOT include “Mitochondrion”, “Cyto”, “cyto”, “Nucl”, or “cytoplasmic side”. Next, predicted O-glycosites were iterated over to determine if a given protein would pass our “mucin test”, which consisted of two calculations. First, we required a protein to have at least nine predicted O-glycosites within a 50-residue region.
If a protein qualified for this benchmark, we applied our “12% rule” to determine the number of residues that separated any two given O-glycosites within this 50-residue region. The 12% rule applied to a 50-residue region meant that fewer than 6 residues could separate any given pair of O-glycosites. Both the “9 sites within 50 residues” metric and the “12% rule” were derived through hand annotation of known and thoroughly studied mucins mostly curated by the Mucin Biology Group (http://www.medkem.gu.se/mucinbiology/databases/db/Mucin-human-2015.htm)68,69. Although this could be considered either too stringent or too relaxed depending on perspective, empirical testing showed these rules (in conjunction with the other metrics discussed) to be reasonably reliable in properly annotating known mucin domains. Exploration of these “mucin test” metrics in particular is an interesting area for future studies looking to employ a mucin-domain candidacy algorithm. Finally, a threonine to serine ratio (T/S-ratio) was calculated for the predicted O-glycosites, mainly as a metric to discriminate O-GalNAc sites (slight threonine preference) from phosphosites (slight serine preference) due to the proclivity of NetOGlyc4.0 to predict dense regions of O-GalNAc sites in what are actually intracellular phosphorylation domains. Note that these preferences are based on empirical observations. If the number of serines and threonines were both greater than zero, the T/S-ratio was calculated by taking the number of threonines and dividing by the number of serines. If the number of threonines was > 0 but the number of serines was 0, the T/S-ratio was assigned a value of 2. Otherwise, the T/S-ratio was set at 0. With all of these determinations made, we then generated a Mucin Score. First, an integer score was calculated.
If a protein was annotated as “extracellular” and passed the “mucin test”, it received an integer score of 1, while proteins “exclusively extracellular” and passing the mucin test received an integer score of 2. Integer scores were then augmented by 1 point if the predicted number of O-glycosites was greater than the number of annotated phosphosites. Finally, the integer score was multiplied by the T/S-ratio to generate the Mucin Score. This process was completed for all proteins in the human proteome that had predictions returned from NetOGlyc4.0 (~20,191 entries). Mucin Scores were used to determine confidence that a protein contained a mucin domain, including valuations of high confidence (Mucin Score > 2), medium confidence (2 > Mucin Score > 1.5), low confidence (1.5 > Mucin Score > 1.2), and non-mucin (Mucin Score < 1.2). This annotation was determined manually by assessing all of the factors above. For regions of 50 amino acids identified as putative mucin domains through this analysis, approximately 90% of them mapped to a single exon, which was evaluated manually using the neXtProt knowledgebase70.
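The scoring steps described above can be sketched as follows. This is one reading of the prose and not the code distributed as Supplementary Software 1, so several details are assumptions:

- The gap rule is applied to adjacent predicted sites inside a 50-residue window.
- The one-point bonus is added whenever predicted O-glycosites outnumber annotated phosphosites.
- Proteins that fail the mucin test or are not extracellular start from an integer score of 0.
- Confidence boundaries use strict inequalities.

All function names are illustrative.

```python
EXTRA_TERMS = ("Cell Membrane", "Cell membrane", "pass membrane protein",
               "Secreted", "extracellular", "Extracellular",
               "0005887", "0016021", "0005576")
INTRA_TERMS = ("Mitochondrion", "Cyto", "cyto", "Nucl", "cytoplasmic side")


def localization(cc_text):
    """Return (extracellular, exclusively_extracellular) for one annotation string."""
    extracellular = any(term in cc_text for term in EXTRA_TERMS)
    exclusive = extracellular and not any(term in cc_text for term in INTRA_TERMS)
    return extracellular, exclusive


def passes_mucin_test(sites, window=50, min_sites=9, max_gap=6):
    """At least min_sites predicted sites in a window, adjacent sites
    separated by fewer than max_gap residues (assumed reading of the 12% rule)."""
    sites = sorted(sites)
    for i, start in enumerate(sites):
        in_window = [s for s in sites[i:] if s < start + window]
        if len(in_window) < min_sites:
            continue
        gaps = [b - a - 1 for a, b in zip(in_window, in_window[1:])]
        if all(gap < max_gap for gap in gaps):
            return True
    return False


def ts_ratio(n_thr, n_ser):
    """Threonine-to-serine ratio of predicted O-glycosites, as defined in the text."""
    if n_thr > 0 and n_ser > 0:
        return n_thr / n_ser
    if n_thr > 0:
        return 2.0
    return 0.0


def mucin_score(cc_text, sites, n_thr, n_ser, n_phosphosites):
    """Integer score (localization + mucin test + bonus) times the T/S-ratio."""
    extracellular, exclusive = localization(cc_text)
    integer = 0
    if passes_mucin_test(sites):
        integer = 2 if exclusive else (1 if extracellular else 0)
    if len(sites) > n_phosphosites:   # assumed to apply regardless of base score
        integer += 1
    return integer * ts_ratio(n_thr, n_ser)


def confidence(score):
    """Map a Mucin Score to the confidence tiers given in the text."""
    if score > 2:
        return "high"
    if score > 1.5:
        return "medium"
    if score > 1.2:
        return "low"
    return "non-mucin"


# Toy input: ten predicted sites spaced 5 residues apart on a secreted protein.
toy_sites = list(range(100, 150, 5))
toy_score = mucin_score("Secreted", toy_sites, n_thr=6, n_ser=3, n_phosphosites=0)
print(toy_score, confidence(toy_score))   # 6.0 high
```

In this sketch the toy protein passes the mucin test and is exclusively extracellular (integer score 2), gains the bonus point (3), and is multiplied by a T/S-ratio of 2.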
Unmodified peptide MS data analysis (MaxQuant)
Raw data were processed using MaxQuant version 1.6.3.4, and tandem mass spectra were searched with the Andromeda search algorithm. Oxidation of methionine and protein N-terminal acetylation were specified as variable modifications, while carbamidomethylation of cysteine was set as a fixed modification. A precursor ion search tolerance of 20 ppm and a product ion mass tolerance of 20 ppm were used for searches, and two missed cleavages were allowed for full trypsin specificity. Peptide spectral matches were made against a target-decoy human reference proteome database downloaded from Uniprot. FBS contamination was not examined for the lysate samples. Peptides were filtered to a 1% FDR and a 1% protein FDR was applied according to the target-decoy method. Proteins were identified and quantified using at least one peptide (razor + unique), where razor peptide is defined as a non-unique peptide assigned to the protein group with the most other peptides (Occam’s razor principle). Proteins were quantified and normalized using MaxLFQ71 with a label-free quantification (LFQ) minimum ratio count of 1. LFQ intensities were calculated using the match between runs feature, and MS/MS spectra were required for LFQ comparisons. For quantitative comparisons, protein intensity values were log2-transformed before further analysis, and missing values were imputed from a normal distribution with width 0.3 and downshift value of 1.8 (that is, default values) using the Perseus software suite48. A Boolean value “IsAMucin” was also appended to each protein, with the value set as true if the Mucin Score was greater than 1. Mucin Scores and IsAMucin were input manually into MQ ‘protein groups’ txt files for manipulation in Perseus.
Significance testing was performed in Perseus using a two-tailed t-test with 250 randomizations to correct for multiple comparisons, an FDR of 0.01, and an S0 value of 2 (all volcano plots), or in Microsoft Excel using a two-tailed t-test with heteroscedastic variance. We kept the standard Perseus column headers from these analyses, with “Significant” showing a “+” for proteins calculated as significant based on the t-test performed, -log(P-value) providing the y-axis value of the volcano plots that shows the log-transformed value of the t-test p-value, and “Difference” indicating the log2 fold change between the conditions (e.g., elute and lysate). Proteins were sorted by their Mucin Score and highlighted in red if the score was higher than 2 (“high probability mucin”), orange if between 2–1.5 (“medium probability mucin”), and yellow if between 1.5 and 1 (“low probability mucin”). Upset plots and the 5-sample Venn diagram (Figs. 4A and 5F, respectively) were generated using the Intervene Shiny app (https://intervene.shinyapps.io/intervene/)72. GO term enrichments were performed using DAVID55,56, with the human proteome as a background.
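The log2 transformation and missing-value imputation described above can be sketched as below. The analysis itself was performed in Perseus. This stand-in assumes the commonly described behavior of that imputation: replacement values are drawn per sample from a normal distribution whose mean is shifted down by `downshift` standard deviations and whose standard deviation is scaled by `width`. The function names and the fixed random seed are illustrative.

```python
import math
import random
import statistics


def log2_transform(intensities):
    """log2-transform intensities; zero or missing values become None."""
    return [math.log2(x) if x and x > 0 else None for x in intensities]


def impute_missing(values, width=0.3, downshift=1.8, seed=0):
    """Replace None with draws from N(mean - downshift*sd, (width*sd)^2).

    mean and sd are taken from the observed (non-missing) values of one
    sample; at least two observed values are required.
    """
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    sd = statistics.stdev(observed)
    rng = random.Random(seed)   # fixed seed only to make the sketch repeatable
    return [v if v is not None else rng.gauss(mean - downshift * sd, width * sd)
            for v in values]


# Hypothetical LFQ intensities for one sample (0 = not detected).
raw = [2 ** 25, 2 ** 26, 0, 2 ** 27, 0]
logged = log2_transform(raw)
filled = impute_missing(logged)
print(logged)   # [25.0, 26.0, None, 27.0, None]
print(filled)
```

Imputed values land in the low-intensity tail of the observed distribution, which mimics proteins that fall below the detection limit in a given sample.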
Glycopeptide MS data analysis (O-Pair Search)
For glycopeptide analysis, samples were loaded into MetaMorpheus in groups of 8, related to one individual replicate (e.g. “Lysate 1” slice 1–8)65,73. The human proteome was loaded into the database (downloaded from Uniprot June, 2016), and a “Glyco” search task was selected. For each group of 8 raw files, an N- and an O-glyco search was performed separately. Parameters for the O-Glycopeptide Search were as follows: O-glycan database “Oglycan.gdb” (the default 12-glycan database65,74), keep top 50 candidates, Dissociation type “HCD” and child scan “null”, 4 maximum Oglycan allowed, with OxoniumIonFit on. For the N-Glycopeptide Search, all parameters were the same except the “NGlycan182.gdb” database was used. These glycan databases are available in Supplementary Data 9. For general peptide parameters, the following features were used: tryptic cleavage, a maximum of 2 missed cleavages, a maximum of 2 modifications per peptide, and a peptide length of 5–60. Precursor mass tolerance was set to 10 ppm, product mass tolerance at 20, with a minimum score allowed of 3. Finally, carbamidomethyl of Cys was set as a fixed modification, whereas oxidation of Met was set as a variable modification. All glycopeptide hits were filtered to have a Q value of less than 0.01 and all decoy hits were removed. In O-glycopeptide searches, any peptides that had the “N-glyco sequon” as “TRUE” were also removed. Bar graphs in Fig. 6B, C were made using OriginPro 2022 and show the average of the five data points indicated, along with standard deviations. The glycopeptide-glycan networks in Fig. 6D, E were created in R 3.5.1 using the igraph library75.
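The post-search filtering described above amounts to three row-level checks, sketched here on plain dictionaries. The keys (`q_value`, `is_decoy`, `n_glyco_sequon`) are hypothetical stand-ins and are not the actual column headers of the MetaMorpheus output. The example rows are invented for illustration.

```python
def filter_glyco_hits(rows, o_glyco_search, q_cutoff=0.01):
    """Keep target hits with Q value below q_cutoff.

    For O-glycopeptide searches, hits whose peptide contains an
    N-glycosylation sequon are also discarded, as described in the text.
    """
    kept = []
    for row in rows:
        if row["is_decoy"]:
            continue
        if row["q_value"] >= q_cutoff:
            continue
        if o_glyco_search and row["n_glyco_sequon"]:
            continue
        kept.append(row)
    return kept


# Invented example rows (not real search results).
hits = [
    {"id": 1, "q_value": 0.001, "is_decoy": False, "n_glyco_sequon": False},
    {"id": 2, "q_value": 0.001, "is_decoy": True,  "n_glyco_sequon": False},
    {"id": 3, "q_value": 0.050, "is_decoy": False, "n_glyco_sequon": False},
    {"id": 4, "q_value": 0.002, "is_decoy": False, "n_glyco_sequon": True},
]

print([h["id"] for h in filter_glyco_hits(hits, o_glyco_search=True)])   # [1]
print([h["id"] for h in filter_glyco_hits(hits, o_glyco_search=False)])  # [1, 4]
```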
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Acknowledgements
We thank Natalie Strynadka (University of British Columbia) and Robert Hirt (Newcastle University) for their gifts of the StcE and BT4244 expression plasmids, respectively. We also thank Jessica Stark and Rishikesh Kulkarni for helpful discussions. This work was supported, in part, by National Cancer Institute Grant R01CA200423 (to C.R.B.) and the Stanford Women’s Cancer Center Innovation Award 123-0040-100-WCHGC (to C.R.B. and O.D.). S.A.M. was supported by a National Institute of General Medical Sciences F32 Postdoctoral Fellowship (F32-GM126663-01) and is currently supported by the Yale Science Development Fund. N.M.R. was funded through an NIH Predoctoral to Postdoctoral Transition Award (Grant K00 CA212454). D.J.S. was supported by a National Science Foundation Graduate Research Fellowship and Stanford Graduate Fellowship. K.P. was supported by a National Science Foundation Graduate Research Fellowship, a Stanford Graduate Fellowship, and the Stanford Chemistry, Engineering & Medicine for Human Health (ChEM-H) Chemistry/Biology Interface Predoctoral Training Program. Raw and processed data are available through the PRIDE database, accession PXD024995.
Author contributions
S.A.M. and C.R.B. designed research; S.A.M., N.M.R., D.J.S., and K.P. performed research; V.K. and O.D. contributed human clinical samples; S.A.M., N.M.R., D.J.S., and K.P. analyzed data. S.A.M., N.M.R., and C.R.B. wrote the paper with input from all authors.
Peer review
Peer review information
Nature Communications thanks Zsuzsanna Darula, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Data availability
The raw mass spectrometry data generated in this study have been deposited in the PRIDE database76 under accession code PXD024995. The SimpleCell dataset from Clausen and colleagues was obtained from Steentoft et al.51 (Supplemental Table 2 in that publication). The proteomics data generated from the mucinome enrichments of cell lysates and ascites fluid, the outputs from the mucin candidacy algorithm, the glycan databases used for glycopeptide searches, the glycoproteomics data generated from the mucinome enrichments of ascites fluid, data to make the N- and O-glycopeptide networks, and data to recreate figures are provided in the Supplementary Data files as indicated in the text. Source data are provided with this paper.
Code availability
Code for the mucinome candidacy algorithm is available as Supplementary Software 1.
Competing interests
S.A.M., D.J.S., K.P., and C.R.B. are coinventors on a Stanford nonprovisional utility patent application that has been filed and is pending in the US (number US20220003777) related to the use of inactive mucinases to enrich mucin-domain glycoproteins. C.R.B. is a co-founder and Scientific Advisory Board member of Lycia Therapeutics, Palleon Pharmaceuticals, Enable Bioscience, Redwood Biosciences (a subsidiary of Catalent), and InterVenn Biosciences, and a member of the Board of Directors of Eli Lilly & Company. O.D. has participated in advisory boards for Tesaro, Merck, and Geneos. O.D. is a speaker for Tesaro and AstraZeneca. The remaining authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Stacy A. Malaker, Nicholas M. Riley.
Contributor Information
Stacy A. Malaker, Email: stacy.malaker@yale.edu
Carolyn R. Bertozzi, Email: bertozzi@stanford.edu
Supplementary information
The online version contains supplementary material available at 10.1038/s41467-022-31062-4.
References
- 1. Shurer CR, et al. Physical Principles of Membrane Shape Regulation by the Glycocalyx. Cell. 2019;177:1757–1770.e21. doi: 10.1016/j.cell.2019.04.017.
- 2. Wagner CE, Wheeler KM, Ribbeck K. Mucins and Their Role in Shaping the Functions of Mucus Barriers. Annu. Rev. Cell Dev. Biol. 2018;34:189–215. doi: 10.1146/annurev-cellbio-100617-062818.
- 3. Hansson GC. Mucins and the Microbiome. Annu. Rev. Biochem. 2020;89:769–793. doi: 10.1146/annurev-biochem-011520-105053.
- 4. Bennett EP, et al. Control of mucin-type O-glycosylation: A classification of the polypeptide GalNAc-transferase gene family. Glycobiology. 2012;22:736–756. doi: 10.1093/glycob/cwr182.
- 5. Reily C, Stewart TJ, Renfrow MB, Novak J. Glycosylation in health and disease. Nat. Rev. Nephrol. 2019;15:346–366. doi: 10.1038/s41581-019-0129-4.
- 6. Möckl L. The Emerging Role of the Mammalian Glycocalyx in Functional Membrane Organization and Immune System Regulation. Front. Cell Dev. Biol. 2020;8:253. doi: 10.3389/fcell.2020.00253.
- 7. Kuo JCH, Gandhi JG, Zia RN, Paszek MJ. Physical biology of the cancer cell glycocalyx. Nat. Phys. 2018;14:658–669. doi: 10.1038/s41567-018-0186-9.
- 8. Singh PK, Hollingsworth MA. Cell surface-associated mucins in signal transduction. Trends Cell Biol. 2006;16:467–476. doi: 10.1016/j.tcb.2006.07.006.
- 9. Kufe DW. Mucins in cancer: Function, prognosis and therapy. Nat. Rev. Cancer. 2009;9:874–885. doi: 10.1038/nrc2761.
- 10. Jonckheere N, Van Seuningen I. The membrane-bound mucins: From cell signalling to transcriptional regulation and expression in epithelial cancers. Biochimie. 2010;92:1–11. doi: 10.1016/j.biochi.2009.09.018.
- 11. Bhatia R, et al. Cancer-associated mucins: role in immune modulation and metastasis. Cancer Metastasis Rev. 2019;38:223–236. doi: 10.1007/s10555-018-09775-0.
- 12. Hollingsworth MA, Swanson BJ. Mucins in cancer: Protection and control of the cell surface. Nat. Rev. Cancer. 2004;4:45–60. doi: 10.1038/nrc1251.
- 13. Paszek MJ, et al. The cancer glycocalyx mechanically primes integrin-mediated growth and survival. Nature. 2014;511:319–325. doi: 10.1038/nature13535.
- 14. Woods EC, et al. A bulky glycocalyx fosters metastasis formation by promoting G1 cell cycle progression. Elife. 2017;6:e25752.
- 15. Van Putten JPM, Strijbis K. Transmembrane Mucins: Signaling Receptors at the Intersection of Inflammation and Cancer. J. Innate Immun. 2017;9:281–299. doi: 10.1159/000453594.
- 16. Wisnovsky S, et al. Genome-wide CRISPR screens reveal a specific ligand for the glycan-binding immune checkpoint receptor Siglec-7. Proc. Natl Acad. Sci. USA. 2021;118:e2015024118. doi: 10.1073/pnas.2015024118.
- 17. Wang L, Zuo X, Xie K, Wei D. The role of CD44 and cancer stem cells. Methods in Molecular Biology. 2018;1692:31–42.
- 18. Xu Z, Weiss A. Negative regulation of CD45 by differential homodimerization of the alternatively spliced isoforms. Nat. Immunol. 2002;3:764–771. doi: 10.1038/ni822.
- 19. Carlow DA, et al. PSGL-1 function in immunity and steady state homeostasis. Immunol. Rev. 2009;230:75–96. doi: 10.1111/j.1600-065X.2009.00797.x.
- 20. Canals Hernaez D, et al. PODO447: A novel antibody to a tumor-restricted epitope on the cancer antigen podocalyxin. J. Immunother. Cancer. 2020;8:e001128.
- 21. Murakami Y. Involvement of a cell adhesion molecule, TSLC1/IGSF4, in human oncogenesis. Cancer Sci. 2005;96:543–552. doi: 10.1111/j.1349-7006.2005.00089.x.
- 22. Sun S, et al. Comprehensive analysis of protein glycosylation by solid-phase extraction of N-linked glycans and glycosite-containing peptides. Nat. Biotechnol. 2016;34:84–88. doi: 10.1038/nbt.3403.
- 23. Suttapitugsakul S, Sun F, Wu R. Recent Advances in Glycoproteomic Analysis by Mass Spectrometry. Anal. Chem. 2020;92:267–291. doi: 10.1021/acs.analchem.9b04651.
- 24. Riley NM, Hebert AS, Westphall MS, Coon JJ. Capturing site-specific heterogeneity with large-scale N-glycoproteome analysis. Nat. Commun. 2019;10:1311. doi: 10.1038/s41467-019-09222-w.
- 25. Khatri K, et al. Comparison of Collisional and Electron-Based Dissociation Modes for Middle-Down Analysis of Multiply Glycosylated Peptides. J. Am. Soc. Mass Spectrom. 2018;29:1075–1085. doi: 10.1007/s13361-018-1909-y.
- 26. Woo CM, et al. Development of IsoTaG, a Chemical Glycoproteomics Technique for Profiling Intact N- and O-Glycopeptides from Whole Cell Proteomes. J. Proteome Res. 2017;16:1706–1718. doi: 10.1021/acs.jproteome.6b01053.
- 27. Thomas DR, Scott NE. Glycoproteomics: growing up fast. Curr. Opin. Struct. Biol. 2021;68:18–25. doi: 10.1016/j.sbi.2020.10.028.
- 28. Chernykh A, Kawahara R, Thaysen-Andersen M. Towards structure-focused glycoproteomics. Biochem. Soc. Trans. 2021. doi: 10.1042/bst20200222.
- 29. Yang W, Ao M, Hu Y, Li QK, Zhang H. Mapping the O-glycoproteome using site-specific extraction of O-linked glycopeptides (EXoO). Mol. Syst. Biol. 2018;14:e8486.
- 30. Yang S, et al. Deciphering Protein O-Glycosylation: Solid-Phase Chemoenzymatic Cleavage and Enrichment. Anal. Chem. 2018;90:8261–8269. doi: 10.1021/acs.analchem.8b01834.
- 31. Levery SB, et al. Advances in mass spectrometry driven O-glycoproteomics. Biochim. Biophys. Acta - Gen. Subj. 2015;1850:33–42. doi: 10.1016/j.bbagen.2014.09.026.
- 32. Ayala-Lujan JL, et al. Broad Spectrum Activity of a Lectin-Like Bacterial Serine Protease Family on Human Leukocytes. PLoS One. 2014;9:e107920. doi: 10.1371/journal.pone.0107920.
- 33. Nakjang S, Ndeh DA, Wipat A, Bolam DN, Hirt RP. A Novel Extracellular Metallopeptidase Domain Shared by Animal Host-Associated Mutualistic and Pathogenic Microbes. PLoS One. 2012;7:e30287. doi: 10.1371/journal.pone.0030287.
- 34. Noach I, et al. Recognition of protein-linked glycans as a determinant of peptidase activity. Proc. Natl Acad. Sci. USA. 2017;114:E679–E688. doi: 10.1073/pnas.1615141114.
- 35. Henderson IR, Czeczulin J, Eslava C, Noriega F, Nataro JP. Characterization of Pic, a secreted protease of Shigella flexneri and enteroaggregative Escherichia coli. Infect. Immun. 1999;67:5587–5596. doi: 10.1128/IAI.67.11.5587-5596.1999.
- 36. Govindarajan B, et al. A metalloproteinase secreted by Streptococcus pneumoniae removes membrane mucin MUC16 from the epithelial glycocalyx barrier. PLoS One. 2012;7:e32418.
- 37. Derrien M, et al. Modulation of mucosal immune response, tolerance, and proliferation in mice colonized by the mucin-degrader Akkermansia muciniphila. Front. Microbiol. 2011;2. doi: 10.3389/fmicb.2011.00166.
- 38. Florencia Haurat M, et al. The glycoprotease CpaA secreted by medically relevant Acinetobacter species targets multiple O-linked host glycoproteins. MBio. 2020;11:1–19. doi: 10.3391/mbi.2020.11.1.01.
- 39. Shon DJ, et al. An enzymatic toolkit for selective proteolysis, detection, and visualization of mucin-domain glycoproteins. Proc. Natl Acad. Sci. USA. 2020;117:21299–21307. doi: 10.1073/pnas.2012196117.
- 40. Malaker SA, et al. The mucin-selective protease StcE enables molecular and functional analysis of human cancer-associated mucins. Proc. Natl Acad. Sci. USA. 2019;116:7278–7287. doi: 10.1073/pnas.1813020116.
- 41. Lathem WW, et al. StcE, a metalloprotease secreted by Escherichia coli O157:H7, specifically cleaves C1 esterase inhibitor. Mol. Microbiol. 2002;45:277–288. doi: 10.1046/j.1365-2958.2002.02997.x.
- 42. Grys TE, Walters LL, Welch RA. Characterization of the StcE protease activity of Escherichia coli O157:H7. J. Bacteriol. 2006;188:4646–4653. doi: 10.1128/JB.01806-05.
- 43. Yu ACY, Worrall LJ, Strynadka NCJ. Structural insight into the bacterial mucinase StcE essential to adhesion and immune evasion during enterohemorrhagic E. coli infection. Structure. 2012;20:707–717. doi: 10.1016/j.str.2012.02.015.
- 44. Woods RJ, et al. Engineered High-Specificity Affinity Reagents for the Detection of Glycan Sialylation. FASEB J. 2019;33:801.2–801.2.
- 45. Riley NM, Bertozzi CR, Pitteri SJ. A Pragmatic Guide to Enrichment Strategies for Mass Spectrometry-based Glycoproteomics. Mol. Cell. Proteom. 2020. doi: 10.1074/mcp.r120.002277.
- 46. Malaker SA, et al. Identification and Characterization of Complex Glycosylated Peptides Presented by the MHC Class II Processing Pathway in Melanoma. J. Proteome Res. 2017;16:228–237. doi: 10.1021/acs.jproteome.6b00496.
- 47. Tyanova S, Temu T, Cox J. The MaxQuant computational platform for mass spectrometry-based shotgun proteomics. Nat. Protoc. 2016;11:2301–2319. doi: 10.1038/nprot.2016.136.
- 48. Tyanova S, et al. The Perseus computational platform for comprehensive analysis of (prote)omics data. Nat. Methods. 2016. doi: 10.1038/nmeth.3901.
- 49. Lang T, Alexandersson M, Hansson GC, Samuelsson T. Bioinformatic identification of polymerizing and transmembrane mucins in the puffer fish Fugu rubripes. Glycobiology. 2004;14:521–527. doi: 10.1093/glycob/cwh066.
- 50. Lang T, Hansson GC, Samuelsson T. Gel-forming mucins appeared early in metazoan evolution. Proc. Natl Acad. Sci. USA. 2007;104:16209–16214. doi: 10.1073/pnas.0705984104.
- 51. Steentoft C, et al. Precision mapping of the human O-GalNAc glycoproteome through SimpleCell technology. EMBO J. 2013;32:1478–1488. doi: 10.1038/emboj.2013.79.
- 52. Bateman A. UniProt: A worldwide hub of protein knowledge. Nucleic Acids Res. 2019;47:D506–D515. doi: 10.1093/nar/gky1049.
- 53. Hornbeck PV, et al. PhosphoSitePlus, 2014: Mutations, PTMs and recalibrations. Nucleic Acids Res. 2015;43:D512–D520. doi: 10.1093/nar/gku1267.
- 54. Vergnolle N. Protease inhibition as new therapeutic strategy for GI diseases. Gut. 2016;65:1215–1224. doi: 10.1136/gutjnl-2015-309147.
- 55. Huang DW, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 2009;4:44–57. doi: 10.1038/nprot.2008.211.
- 56. Huang DW, Sherman BT, Lempicki RA. Bioinformatics enrichment tools: Paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 2009. doi: 10.1093/nar/gkn923.
- 57. Darula Z, Medzihradszky KF. Analysis of mammalian O-glycopeptides - We have made a good start, but there is a long way to go. Mol. Cell. Proteom. 2018;17:2–17. doi: 10.1074/mcp.MR117.000126.
- 58. Windwarder M, Altmann F. Site-specific analysis of the O-glycosylation of bovine fetuin by electron-transfer dissociation mass spectrometry. J. Proteom. 2014;108:258–268. doi: 10.1016/j.jprot.2014.05.022.
- 59. Pap A, Klement E, Hunyadi-Gulyas E, Darula Z, Medzihradszky KF. Status Report on the High-Throughput Characterization of Complex Intact O-Glycopeptide Mixtures. J. Am. Soc. Mass Spectrom. 2018;29:1210–1220. doi: 10.1007/s13361-018-1945-7.
- 60. Khoo KH. Advances toward mapping the full extent of protein site-specific O-GalNAc glycosylation that better reflects underlying glycomic complexity. Curr. Opin. Struct. Biol. 2019;56:146–154. doi: 10.1016/j.sbi.2019.02.007.
- 61. Riley NM, Malaker SA, Driessen M, Bertozzi CR. Optimal Dissociation Methods Differ for N- and O-glycopeptides. J. Proteome Res. 2020. doi: 10.1021/acs.jproteome.0c00218.
- 62. Miyamoto S, et al. Glycoproteomic Analysis of Malignant Ovarian Cancer Ascites Fluid Identifies Unusual Glycopeptides. J. Proteome Res. 2016;15:3358–3376. doi: 10.1021/acs.jproteome.6b00548.
- 63. Biskup K, Braicu EI, Sehouli J, Tauber R, Blanchard V. The ascites N-glycome of epithelial ovarian cancer patients. J. Proteom. 2017;157:33–39. doi: 10.1016/j.jprot.2017.02.001.
- 64. Karlsson NG, McGuckin MA. O-Linked glycome and proteome of high-molecular-mass proteins in human ovarian cancer ascites: Identification of sulfation, disialic acid and O-linked fucose. Glycobiology. 2012;22:918–929. doi: 10.1093/glycob/cws060.
- 65. Lu L, Riley NM, Shortreed MR, Bertozzi CR, Smith LM. O-Pair Search with MetaMorpheus for O-glycopeptide characterization. Nat. Methods. 2020;17:1133–1138. doi: 10.1038/s41592-020-00985-5.
- 66. Mohl JE, Gerken TA, Leung M-Y. ISOGlyP: de novo prediction of isoform-specific mucin-type O-glycosylation. Glycobiology. 2020. doi: 10.1093/glycob/cwaa067.
- 67. Nason R, et al. Display of the human mucinome with defined O-glycans by gene engineered cells. Nat. Commun. 2021;12:1–16. doi: 10.1038/s41467-021-24366-4.
- 68. Lang T, et al. Searching the Evolutionary Origin of Epithelial Mucus Protein Components - Mucins and FCGBP. Mol. Biol. Evol. 2016;33:1921–1936. doi: 10.1093/molbev/msw066.
- 69. Stavenhagen K, et al. N- and O-glycosylation Analysis of Human C1-inhibitor Reveals Extensive Mucin-type O-Glycosylation. Mol. Cell. Proteom. 2018;17:1225–1238. doi: 10.1074/mcp.RA117.000240.
- 70. Zahn-Zabal M, et al. The neXtProt knowledgebase in 2020: data, tools and usability improvements. Nucleic Acids Res. 2020;48:D328–D334. doi: 10.1093/nar/gkz995.
- 71. Cox J, et al. Accurate proteome-wide label-free quantification by delayed normalization and maximal peptide ratio extraction, termed MaxLFQ. Mol. Cell. Proteom. 2014;13:2513–2526. doi: 10.1074/mcp.M113.031591.
- 72. Khan A, Mathelier A. Intervene: A tool for intersection and visualization of multiple gene or genomic region sets. BMC Bioinforma. 2017;18:287. doi: 10.1186/s12859-017-1708-7.
- 73. Solntsev SK, Shortreed MR, Frey BL, Smith LM. Enhanced Global Post-translational Modification Discovery with MetaMorpheus. J. Proteome Res. 2018;17:1844–1851. doi: 10.1021/acs.jproteome.7b00873.
- 74. Mao J, et al. A new searching strategy for the identification of O-linked glycopeptides. Anal. Chem. 2019;91:3852–3859. doi: 10.1021/acs.analchem.8b04184.
- 75. Csardi G, Nepusz T. The igraph software package for complex network research. InterJournal, Complex Systems. 2006;1695. https://igraph.org.
- 76. Perez-Riverol Y, et al. The PRIDE database and related tools and resources in 2019: improving support for quantification data. Nucleic Acids Res. 2019;47:D442–D450. doi: 10.1093/nar/gky1106.