Glycobiology. 2021 Nov 22;32(3):201–207. doi: 10.1093/glycob/cwab116

A graphical representation of glycan heterogeneity

Xuyao Zeng, Milos V Novotny, David E Clemmer, Jonathan C Trinidad
PMCID: PMC8966470  PMID: 34939082

Abstract

A substantial shortcoming of large-scale datasets is often the inability to easily represent and visualize key features. This problem becomes acute when considering the increasing technical ability to profile large numbers of glycopeptides and glycans in recent studies. Here, we describe a simple, concise graphical representation intended to capture the microheterogeneity associated with glycan modification at specific sites. We illustrate this method by showing visual representations of the glycans and glycopeptides from a variety of species. The graphical representation presented allows one to easily discern the compositions of all glycans, similarities and differences of modifications found in different samples and, in the case of N-linked glycans, the initial steps in the biosynthetic pathway.

Keywords: glycomics, glycoproteomics, mass spectrometry, N-glycosylation, quantification

Introduction

Overview

Recent advancements in sample preparation (Liu et al. 2017), analytical instrumentation (Riley et al. 2019) and bioinformatics (Lee et al. 2016) have enabled the acquisition of large glycomics and glycoproteomic datasets (Zou et al. 2017; Glover et al. 2018; Park et al. 2018; Song et al. 2019). Glycans are produced by a complex synthetic pathway initiated in the endoplasmic reticulum and culminating in the Golgi (for a review, see Moremen et al. 2012). In mammals, nitrogen (N)-linked glycans are initially attached to asparagine residues as a specific 14 residue precursor that is subsequently trimmed and modified to produce a final glycoform. Oxygen (O)-linked glycans are synthesized directly onto serine or threonine residues in a stepwise, non-template directed process. The number of secreted and/or transmembrane proteins in the human proteome has been estimated at 7887, corresponding to 39% of all proteins (Uhlen et al. 2015), with the vast majority of these secreted/transmembrane proteins likely to be glycosylated.

N-linked glycosylation in mammalian cells starts with the attachment of a precursor glycan, Glc3Man9GlcNAc2, onto an Asn residue in the endoplasmic reticulum. This structure can then undergo stepwise enzymatic trimming of terminal glucose and mannose monosaccharides in the endoplasmic reticulum and as it is transferred to the cis-Golgi. In the Golgi, various monosaccharide residues can be added to the core (which consists of Man3GlcNAc2). Figure 1A illustrates the individual enzymatic steps for the synthesis of Gal2GlcNAc2Man3GlcNAc2. Broadly speaking, N-glycans can be divided into three classes: oligomannose, or high mannose, where one or more of the initial mannose residues remain on the two original arms of the core; complex, where GlcNAc residues have been added to the two arms extending from the core; and hybrid, where the alpha 1–6 arm of the core still contains the original mannose, whereas the alpha 1–3 arm is modified with GlcNAc (and possibly additional sugars). The latter two classes can be modified with sialic acid via the action of sialyltransferases, most commonly on terminal galactose residues (for a review of the sialyltransferase family, see Harduin-Lepers et al. 2005).

Fig. 1. (A) illustrates in cartoon form the endoplasmic reticulum (red) and cis-, medial- and trans-Golgi compartments (blue, yellow and green, respectively). Monosaccharide symbols follow the Symbol Nomenclature for Glycans system (Varki et al. 2015). Shown in the figure are the steps required for synthesis of Gal2GlcNAc2Man3GlcNAc2 after the initial transfer of the sugar precursor to the protein. (B) details the graphical location of glycan intermediates along this biosynthetic pathway for a specific Gal2GlcNAc2Man3GlcNAc2. The color scheme is the same as the ER/Golgi subdomains in (A). (C) is a zoomed-in image of (B), highlighting further processing that may occur within a given HexNAcHex composition. Core fucosylation is indicated by a rightward shift of 0.2 units. Each sequential addition of sialic acid is indicated by an upward shift of 0.2 units. The exact order of fucose and sialic acid addition is not necessarily fixed, leading to multiple pathways to go from Gal2GlcNAc2Man3GlcNAc2 to Gal2GlcNAc2Man3GlcNAc2Fuc1SA2. (D) shows the likely biosynthetic pathway for the synthesis of a specific HexNAc5Hex4. HexNAc5Hex4 is synthesized from HexNAc4Hex3 by the addition of a HexNAc and a galactose, and as addition of galactose is thought to inhibit N-acetyl-glucosamine transferases (GnTs), the addition of HexNAc occurs before the final galactosyltransferase step. (E) illustrates the potential biosynthetic pathways for a specific tetra-antennary HexNAc6Hex7SA2. Following the same logic as (D), addition of the GlcNAc arms proceeds prior to addition of any galactose. The two terminal sialic acids may have been added directly to HexNAc6Hex7 or to any intermediates in which galactose was already present. The regions of the graph corresponding to high-mannose, hybrid, complex and paucimannose glycans are indicated with arrows.

An important caveat to the use of this representation is that in many instances, the researcher will not know the exact branching pattern for a given glycan, and therefore, multiple distinct glycans may occur at single positions in this map. Distinct isomers will have alternative routes within this map. This will limit the ability to precisely calculate biosynthetic fluxes.

Composition of the human glycome

The majority of mammalian glycans consist of four basic sugar families: hexoses (mannose, galactose, glucose), N-acetylhexosamines (GlcNAc, GalNAc), deoxyhexose (fucose) and sialic acids (NeuAc, NeuGc). Within the hexose and HexNAc families, individual sugar residues all have the same mass. Mass spectrometry (MS) studies of glycans and glycopeptides often report the subunit composition of the glycan species, because this is significantly more straightforward to determine than the exact linkage type of a given composition. The Consortium for Functional Glycomics database (http://www.functionalglycomics.org) lists 244 unique human N-glycan compositions. It would be extremely beneficial to represent the overall glycosylation pattern of a tissue, protein or peptide in a graphical manner where the relative positions of glycans indicate structural and biosynthetic relationships. Such a representation would allow researchers to quickly grasp the key similarities and differences between compared sample groups. A major limitation to such an approach has been that it is difficult to display 4D glycan compositional data (i.e. hexose, HexNAc, deoxyhexose, sialic acid) on a 2D graph.

Results

Development of a standardized representation

Work by Losfeld et al. established a conceptual framework for representing glycans and their biosynthetic pathway as a relational 2D plot (Losfeld et al. 2017). However, this innovative approach was limited in several respects. Firstly, the X and Y coordinates of individual glycans on the map did not strictly relate to relative monosaccharide number. Secondly, while the authors focused on representing the core HexNAc and Hex compositions and effectively accounted for the addition of a single fucose by duplicating and reflecting the relative placements of the core composition, it would be extremely cumbersome to extend this duplication to account for permutations introduced by multiple fucosylation or sialylation events. Furthermore, glycan pairs with or without fucose were not colocalized in the same region of the graph, making it difficult to directly compare overall fucosylation patterns, which can be highly relevant in disease conditions (Li et al. 2018).

We propose a standardized representation of glycan heterogeneity that gives a unique position to each glycan composition on an XY coordinate plane. To encode four dimensions of data onto a 2D map, we assign integer values for the number of hexose and HexNAc residues on the X and Y axes, respectively. Then, within these values, we encode the number of fucose and sialic acid residues in 0.2-unit increments on the X and Y axes, respectively. For example, a glycan with six hexose residues would have a value of “6” on the X axis, whereas a glycan with six hexose residues and a single fucose residue would be given a value of “6.2.” Figure 1B demonstrates this orientation for the synthesis of Gal2GlcNAc2Man3GlcNAc2. The initial 14-residue glycan is positioned at (12,2) on the XY axis, corresponding to 12 hexose and 2 HexNAc residues. The hexose residues are sequentially removed, as represented by leftward arrows, until position (5,2) is reached. At this point, a HexNAc residue is added, as represented by an upward arrow to (5,3). Then two hexose residues are removed to arrive at (3,3). This is followed by a HexNAc residue addition to arrive at (3,4) and then two hexose additions to arrive at (5,4).
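The coordinate rule described above can be written down in a few lines. The following is a minimal sketch, not the authors' implementation (the figures were produced in OriginPro); the names `Composition`, `glycan_xy` and `FIG_1B_ROUTE` are hypothetical, and the one-residue intermediate steps at (4,3) and (4,4) are inferred from the text's "two hexose residues are removed" and "two hexose additions."

```python
from typing import List, NamedTuple, Tuple

STEP = 0.2  # fractional shift per fucose (X) or sialic acid (Y), as stated in the text


class Composition(NamedTuple):
    """Monosaccharide counts for one glycan composition (hypothetical helper type)."""
    hexose: int
    hexnac: int
    fucose: int = 0
    sialic_acid: int = 0


def glycan_xy(c: Composition) -> Tuple[float, float]:
    """Return the (X, Y) map position for a composition.

    Assumption: fewer than five fucose and five sialic acid residues, so that
    the fractional part never reaches the next integer position.
    """
    if c.fucose >= 5 or c.sialic_acid >= 5:
        raise ValueError("0.2-unit encoding is ambiguous for five or more residues")
    x = c.hexose + STEP * c.fucose
    y = c.hexnac + STEP * c.sialic_acid
    # Round to suppress binary floating-point noise such as 0.6000000000000001.
    return (round(x, 2), round(y, 2))


# Route of Figure 1B, one residue per step: trimming from (12,2) to (5,2),
# HexNAc addition to (5,3), trimming to (3,3), HexNAc addition to (3,4),
# then two hexose (galactose) additions to (5,4).
FIG_1B_ROUTE: List[Composition] = (
    [Composition(h, 2) for h in range(12, 4, -1)]
    + [Composition(5, 3), Composition(4, 3), Composition(3, 3)]
    + [Composition(3, 4), Composition(4, 4), Composition(5, 4)]
)

if __name__ == "__main__":
    for comp in FIG_1B_ROUTE:
        print(comp, glycan_xy(comp))
```

The guard against five or more fucose or sialic acid residues reflects a property of the 0.2-unit spacing itself: a fifth residue would shift a glycan by a full unit and place it on top of a different Hex or HexNAc count.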

Representative glycan example: Gal2GlcNAc2Man3GlcNAc2

The specific glycan Gal2GlcNAc2Man3GlcNAc2 shown in Figure 1B is also commonly found modified with one or two sialic acid residues as well as a single fucose residue attached to the core. The relative orientation of these structures is illustrated in Figure 1C, with fucose-containing glycans positioned at X = 5.2 and sialic acid-modified glycans positioned at Y = 4.2 or 4.4, depending on whether the glycan bore one or two sialic acid moieties, respectively. For glycans more complex than GlcNAc2Man3GlcNAc2, all additional GlcNAc residues are thought to be added prior to any galactose addition (Figure 1D). An example of an even larger glycan is shown in Figure 1E, illustrating the likely pathway to synthesize a specific SA2Gal4GlcNAc4Man3GlcNAc2. The potential complex, hybrid, paucimannose and high-mannose glycan regions are indicated by arrows.
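As an illustration of the positions in Figure 1C, the sketch below converts compositional labels of the kind used in this article (for example HexNAc4Hex5Fuc1SA2) into map coordinates. It is an assumption-laden example rather than part of the published workflow: the function names are hypothetical, a missing count (as in "HexNAc5Hex3Fuc") is read as one residue, and only the four labels HexNAc, Hex, Fuc and SA are recognized.

```python
import re
from typing import Dict, Tuple

# "HexNAc" is listed before "Hex" so that the longer label is tried first.
_UNIT = r"(HexNAc|Hex|Fuc|SA)(\d*)"


def parse_composition(text: str) -> Dict[str, int]:
    """Parse a label such as 'HexNAc4Hex5Fuc1SA2' into residue counts."""
    if not re.fullmatch(f"(?:{_UNIT})+", text):
        raise ValueError(f"unrecognized composition: {text!r}")
    counts = {"Hex": 0, "HexNAc": 0, "Fuc": 0, "SA": 0}
    for name, number in re.findall(_UNIT, text):
        counts[name] += int(number) if number else 1  # bare label means one residue
    return counts


def composition_xy(text: str, step: float = 0.2) -> Tuple[float, float]:
    """Map a composition label to (X, Y): Hex and Fuc on X, HexNAc and SA on Y."""
    c = parse_composition(text)
    x = c["Hex"] + step * c["Fuc"]
    y = c["HexNAc"] + step * c["SA"]
    return (round(x, 2), round(y, 2))


if __name__ == "__main__":
    # The six positions around Gal2GlcNAc2Man3GlcNAc2 (HexNAc4Hex5) in Figure 1C.
    for fuc in ("", "Fuc1"):
        for sa in ("", "SA1", "SA2"):
            label = f"HexNAc4Hex5{fuc}{sa}"
            print(label, composition_xy(label))
```

Under these assumptions, the fucosylated forms fall at X = 5.2 and the mono- and disialylated forms at Y = 4.2 and 4.4, matching the description above.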

Representation of measured glycopeptides

We next examined the quantitative glycomics dataset by Cho et al. of N-glycans from cerebrospinal fluid of healthy individuals and those suffering from Alzheimer’s disease (Cho et al. 2019). Figure 2A and B show the average relative abundance of glycans from female subjects. Visual assessment can quickly determine that the overall patterns appear similar, with high-mannose glycans present in all datasets along the line at y = 2. The most prevalent glycans are HexNAc4Hex5SA2 and HexNAc5Hex3Fuc in Alzheimer’s disease and healthy individuals, respectively. A group of high-mass glycans around HexNAc7Hex7 can be seen in both datasets, albeit with relatively low abundance. A second glycomics example is from a dataset by Zahradnikova et al. examining changes in glycan expression as a function of chemotherapy resistance in ovarian cancer (Zahradnikova et al. 2021). Figure 2C and D show the glycomics patterns from the pooled population of patients in the drug-resistant and drug-sensitive populations, respectively. Overall, the patterns look similar with a high density of glycans bearing five HexNAc and five hexose residues.

Fig. 2. (A, B) show the glycomics patterns for cerebrospinal fluid isolated from healthy controls or patients suffering from Alzheimer’s disease. Each graph is the average relative ratio of four individuals. The area of each circle is proportional to the relative abundance. (C, D) show the serum glycomics patterns as a function of resistance to chemotherapy in ovarian cancer (Zahradnikova et al. 2021).

In addition to its application towards glycomics datasets, the representation can also be applied to glycoproteomic data on multiple scales. Figure 3A–E shows the results from Liu et al., who examined glycosylation on mouse proteins as a function of tissue of origin (Liu et al. 2017). For these figures, only the sialic acid NeuAc was plotted. For this analysis, the extracted ion chromatographic peak area of each identified glycopeptide was determined, and the summed intensity of all peptides bearing a given glycan was calculated. Distinct differences in the overall glycosylation pattern can be clearly observed, with the liver bearing a high proportion of high-mannose glycans, whereas the brain and kidney show relatively higher levels of larger complex glycans, in particular those with more than seven hexose and six HexNAc residues.
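The aggregation step described above (summing the peak areas of all glycopeptides that carry the same glycan, then expressing each sum as a percentage of the tissue total) could be sketched as follows. This is an illustrative outline only: the record layout, the function name and the example values are hypothetical and are not taken from the Liu et al. dataset.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Each record: (peptide sequence, glycan composition label, MS1 peak area).
Record = Tuple[str, str, float]


def glycan_percent_abundance(records: Iterable[Record]) -> Dict[str, float]:
    """Sum peak areas per glycan and return each glycan's percent of the total."""
    totals: Dict[str, float] = defaultdict(float)
    for _peptide, glycan, area in records:
        totals[glycan] += area
    grand_total = sum(totals.values())
    if grand_total <= 0:
        raise ValueError("no glycopeptide signal to normalize")
    return {glycan: 100.0 * area / grand_total for glycan, area in totals.items()}


if __name__ == "__main__":
    # Hypothetical values for one tissue, for illustration only.
    example = [
        ("PEPTIDE_A", "HexNAc2Hex5", 30.0),
        ("PEPTIDE_B", "HexNAc2Hex5", 10.0),
        ("PEPTIDE_A", "HexNAc4Hex5", 60.0),
    ]
    print(glycan_percent_abundance(example))
```

Restricting `records` to a single protein or a single asparagine residue before calling the function would give the per-protein or per-site distributions rather than the tissue-wide aggregate.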

Fig. 3. (A–E) show the summed glycosylation patterns in five mouse organs: liver, kidney, lung, heart and brain. Individual glycopeptides were identified and the summed MS1 extracted ion intensity for all glycopeptides bearing individual glycans was calculated. The area of individual circles represents the percent abundance of that glycan in a given tissue. The y-axis represents HexNAc and NeuAc and, as such, NeuGc-containing glycans were not considered.

These aggregate glycopeptide maps are similar to the type of data that would have been obtained from a glycomics experiment. However, in the case of glycopeptides, the data can be further broken down to the level of individual proteins and sites of glycosylation. Figure 4 shows the glycan distributions across the five organ systems for individual glycosylation sites on three proteins: lysosome-associated membrane glycoprotein 1 (LAMP 1), sodium/potassium-transporting ATPase subunit beta-3 (ATPB-3) and aminopeptidase N. It is clear that the glycosylation patterns are highly variable for a given site in a given tissue, which likely reflects cell-specific glycosylation processing.

Fig. 4. Glycosylation patterns for LAMP 1, ATPB-3 and aminopeptidase N on the indicated asparagine residues. The observed glycosylation patterns in liver, kidney, lung, heart and brain are plotted. Areas of the circles are proportional to the fractional abundance of that glycan on a particular site in that tissue.

Insight into biosynthetic mechanisms

It is possible to measure the distributions of individual glycans at a given site on a protein and use these data to indirectly calculate the relative enzymatic activity underlying those processes without reference to the map itself. The results of such calculations can be plotted as arrows on the graphical representation, where the thickness of the arrows is proportional to the flux (Losfeld et al. 2017). For the glycans in the initial steps of the biosynthetic pathway, we can estimate the percentage of a given glycoform that is converted to the next step in the pathway. This will be the ratio of signal from all glycoforms that are “downstream” of that glycan divided by the sum of signal from all downstream glycans plus the glycan of interest. A caveat in such an analysis is that for datasets that do not determine the structure of the glycan, those glycans bearing four HexNAc residues may be either complex or hybrid. In that case, it may not be feasible to calculate conversion efficiencies since very different products may be formed.
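The conversion estimate described above is the ratio downstream / (downstream + glycan of interest). A minimal sketch of that arithmetic is given below; the function name, the labels and the signal values are hypothetical, and the caller is assumed to supply the set of downstream glycoforms, since that set depends on the biosynthetic route and cannot be derived from compositions alone.

```python
from typing import Dict, Iterable, Optional


def conversion_fraction(
    signal: Dict[str, float],
    glycan: str,
    downstream: Iterable[str],
) -> Optional[float]:
    """Estimate the fraction of `glycan` converted to later pathway steps.

    Computed as downstream / (downstream + own signal). Returns None when
    neither the glycan nor any downstream glycoform was observed.
    """
    own = signal.get(glycan, 0.0)
    down = sum(signal.get(name, 0.0) for name in downstream)
    total = own + down
    if total == 0:
        return None
    return down / total


if __name__ == "__main__":
    # Hypothetical site-level signals along the early, linear mannose-trimming steps.
    site_signal = {"Man9": 10.0, "Man8": 20.0, "Man7": 30.0, "Man6": 40.0}
    route = ["Man9", "Man8", "Man7", "Man6"]
    for i, name in enumerate(route[:-1]):
        print(name, conversion_fraction(site_signal, name, route[i + 1:]))
```

Multiplying the result by 100 gives the percentage referred to in the text. The example is restricted to a linear stretch of the pathway; where a composition could be either complex or hybrid, the downstream set is ambiguous and the estimate may not be meaningful.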

Discussion

Our approach can be used to represent the relationships of data in both glycomics and glycoproteomics studies. Although the examples selected for our graphical representation come from different laboratories using somewhat technically different measurements, their value in making comparative displays on different samples is not diminished insofar as the analytical reproducibility of glycomic profiling has been maintained. As more targeted glycan isolation and MS ionization approaches are developed further to include some minor structures as well, our understanding of biosynthetic pathways and microheterogeneities will likely improve. The representation presented here, as well as that developed by Losfeld et al. (Losfeld et al. 2017), can be used to visualize the relationship between glycans, either with or without arrows indicating biosynthetic steps (quantified or not). One important note with such applications is that the ability to map fluxes between positions on the graph requires some degree of information on the underlying structural connectivity of a given glycan composition. We have plotted the glycan intensity data shown in Figure 4 of Losfeld et al. as Supplementary Figure S1. This demonstrates how our current visualization allows a more direct comparison of glycoforms with or without fucose.

A significant number of glycan structural/linkage isomers may exist at a given composition and the current iteration of our visualization approach may have a hard time displaying them in a clearly distinguished manner. It is possible to extend our approach of representing multiple dimensions in a single number to encode information on more than two types of sugars. For example, rather than having the addition of any NeuAc result in a 0.2 shift on the Y axis, alpha 2–3 linked NeuAc could be represented by a 0.18-unit shift and alpha 2–6 linked NeuAc could be represented by a 0.22-unit shift. Admittedly such small shifts would not be clearly distinguishable at the scale of the entire glycan map, but it may be effective for a zoomed-in view of specific glycans.
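The linkage-specific extension suggested above can be illustrated with a small sketch. The shift values (0.18, 0.2 and 0.22) come from the text; the function name and the linkage keys ("a2-3", "a2-6", "unknown") are hypothetical choices made for this example.

```python
from typing import Dict, Sequence

# Y-axis shift per sialic acid, keyed by linkage type.
SIALIC_SHIFT: Dict[str, float] = {
    "a2-3": 0.18,     # alpha 2-3 linked NeuAc
    "a2-6": 0.22,     # alpha 2-6 linked NeuAc
    "unknown": 0.20,  # linkage not determined (the default representation)
}


def y_position(hexnac: int, sialic_linkages: Sequence[str]) -> float:
    """Return the Y coordinate for a HexNAc count and a list of sialic acid linkages."""
    shift = sum(SIALIC_SHIFT[linkage] for linkage in sialic_linkages)
    return round(hexnac + shift, 2)  # round to suppress floating-point noise


if __name__ == "__main__":
    print(y_position(4, ["a2-3", "a2-3"]))
    print(y_position(4, ["a2-6", "a2-6"]))
    print(y_position(4, ["a2-3", "a2-6"]))
```

With these particular values, a glycan carrying one alpha 2–3 and one alpha 2–6 sialic acid lands at the same height as one carrying two sialic acids of undetermined linkage (0.18 + 0.22 = 0.40), so slightly asymmetric shifts may be preferable if mixed-linkage species need to be distinguished.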

The manner in which a given glycomics or glycoproteomics dataset is represented will depend on the aspects of the data the researchers wish to highlight. Our compact representation is particularly useful for comparisons of fucosylation and sialylation status and can be interpreted by readers of a manuscript who may not be familiar with exact biosynthetic pathways. Data for glycan abundances on proteins or at single glycosylation sites are typically presented in tables, or in the case of specific glycans, bar graphs may more easily indicate statistically significant changes between experimental samples. Stacked bar graphs showing the relative amount of major glycan classes can be used. For example, Shade et al. displayed the 12 most abundant glycans found on human IgE as colored stacked bar graphs (Shade et al. 2015). Similar colors were used for glycans sharing structural elements (oligomannose and antennary extent). We have plotted data from their Figure 4B in Supplementary Figure S2. In addition to demonstrating the increased sialylation levels of the more amino terminal glycosylation sites, it is easily seen in our representation that certain sites share a higher degree of overlap in distributions such as N140 and N265 compared with N383. As can be seen in Supplementary Figure S2, plotting the microheterogeneity at individual sites on a protein allows the reader to easily assess similarities between groups of sites and provides a “fingerprint” for each distinct microheterogeneity pattern.

While a large portion of glycomics and glycoproteomics studies examine mammalian systems, our representational scheme can be used in other classes of organisms. For example, Supplementary Figure S3 illustrates data from Scheys et al. where they examined glycans from male and female Nilaparvata lugens as well as instar stages N2 through N5 (Scheys et al. 2019). One significant difference in this dataset is the presence of simpler paucimannose, truncated glycans. If a given species has more than four unique glycan families, it becomes difficult to visualize these distributions using our approach. One potential solution would be to represent the addition of any fifth glycan as a duplication of the graph reflected upwards from the initial distribution, as has been demonstrated by Losfeld and colleagues (Losfeld et al. 2017). For species with significantly different glycan compositions, it may be necessary to redefine the X and Y axis glycan numbering to reflect the most abundant subunit type and alter the spacing of the secondary glycan (in our current representation, 0.2 units) to prevent overlap of spots in the case of five or more secondary glycans.

Materials and methods

All previously published data were downloaded from the publishers’ websites or the PRIDE repository (https://www.ebi.ac.uk/pride/). Glycoproteomic raw data were processed using Proteome Discoverer 2.1.1.21 (Thermo Fisher Scientific, Waltham, MA). To specifically search glycopeptides, the Byonic search engine plug-in was used (Protein Metrics, Cupertino, CA). Searches were initially performed using PMI-Preview and the resulting suggested parameters were used for the PMI-Byonic node. Initial calculations were conducted in Microsoft Excel 2013 (Microsoft, Redmond, WA). Graphs were generated in OriginPro 2018 (OriginLab, Northampton, MA) as bubble-mapped scattergrams.

To plot out individual glycosylation locations, the abundance of each glycoform was normalized to the total glycoform signal in that analysis, and then the square root of each value was taken. Plotting on a square root scale was chosen to improve visualization of low-abundance glycoforms. A list of the default Byonic 309 mammalian N-glycans is given in Supplementary Table S1 along with their X and Y positions in our coordinate structure.
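The normalization and square-root scaling described above amount to a short calculation, sketched below. The calculations in this work were carried out in Excel and OriginPro; the function name and example values here are hypothetical and serve only to show the arithmetic.

```python
import math
from typing import Dict


def sqrt_scaled_fractions(abundances: Dict[str, float]) -> Dict[str, float]:
    """Normalize each glycoform to the total signal, then take the square root."""
    total = sum(abundances.values())
    if total <= 0:
        raise ValueError("total glycoform signal must be positive")
    return {name: math.sqrt(value / total) for name, value in abundances.items()}


if __name__ == "__main__":
    # Hypothetical raw signals for three glycoforms at one site.
    raw = {"HexNAc2Hex5": 81.0, "HexNAc4Hex5": 18.0, "HexNAc4Hex5Fuc1": 1.0}
    for name, value in sqrt_scaled_fractions(raw).items():
        print(f"{name}: {value:.3f}")
```

In this example a glycoform contributing 1% of the signal receives a plotted value of 0.1 rather than 0.01, which is the intended effect of the square-root scale on low-abundance species.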

Funding

National Institutes of Health, General Medical Sciences and Mental Health (NIH 5R01GM131100 to D.E.C., NIH R21GM118340 to M.V.N., NIH 1R01MH125235 to J.C.T.).

Abbreviations

SNFG, Symbol Nomenclature for Glycans; GnT, N-acetyl-glucosamine transferase

Supplementary Material

combined_supplemental_20211108_cwab116

Contributor Information

Xuyao Zeng, Department of Chemistry, Indiana University Bloomington, 800 Kirkwood Avenue, Bloomington, IN 47405, USA.

Milos V Novotny, Department of Chemistry, Indiana University Bloomington, 800 Kirkwood Avenue, Bloomington, IN 47405, USA.

David E Clemmer, Department of Chemistry, Indiana University Bloomington, 800 Kirkwood Avenue, Bloomington, IN 47405, USA.

Jonathan C Trinidad, Department of Chemistry, Indiana University Bloomington, 800 Kirkwood Avenue, Bloomington, IN 47405, USA.

References

1. Cho BG, Veillon L, Mechref Y. 2019. N-glycan profile of cerebrospinal fluids from Alzheimer's disease patients using liquid chromatography with mass spectrometry. J Proteome Res. 18:3770–3779.
2. Glover MS, Yu Q, Chen ZW, Shi XD, Kent KC, Li LJ. 2018. Characterization of intact sialylated glycopeptides and phosphorylated glycopeptides from IMAC enriched samples by EThcD fragmentation: Toward combining phosphoproteomics and glycoproteomics. Int J Mass Spectrom. 427:35–42.
3. Harduin-Lepers A, Mollicone R, Delannoy P, Oriol R. 2005. The animal sialyltransferases and sialyltransferase-related genes: A phylogenetic approach. Glycobiology. 15:805–817.
4. Lee LY, Moh ES, Parker BL, Bern M, Packer NH, Thaysen-Andersen M. 2016. Toward automated N-glycopeptide identification in glycoproteomics. J Proteome Res. 15:3904–3915.
5. Li J, Hsu HC, Mountz JD, Allen JG. 2018. Unmasking fucosylation: From cell adhesion to immune system regulation and diseases. Cell Chem Biol. 25:499–512.
6. Liu MQ, Zeng WF, Fang P, Cao WQ, Liu C, Yan GQ, Zhang Y, Peng C, Wu JQ, Zhang XJ et al. 2017. pGlyco 2.0 enables precision N-glycoproteomics with comprehensive quality control and one-step mass spectrometry for intact glycopeptide identification. Nat Commun. 8:438.
7. Losfeld ME, Scibona E, Lin CW, Villiger TK, Gauss R, Morbidelli M, Aebi M. 2017. Influence of protein/glycan interaction on site-specific glycan heterogeneity. FASEB J. 31:4623–4635.
8. Moremen KW, Tiemeyer M, Nairn AV. 2012. Vertebrate protein glycosylation: Diversity, synthesis and function. Nat Rev Mol Cell Biol. 13:448–462.
9. Park DD, Xu G, Wong M, Phoomak C, Liu M, Haigh NE, Wongkham S, Yang P, Maverakis E, Lebrilla CB. 2018. Membrane glycomics reveal heterogeneity and quantitative distribution of cell surface sialylation. Chem Sci. 9:6271–6285.
10. Riley NM, Hebert AS, Westphall MS, Coon JJ. 2019. Capturing site-specific heterogeneity with large-scale N-glycoproteome analysis. Nat Commun. 10:1311.
11. Scheys F, De Schutter K, Shen Y, Yu N, Smargiasso N, De Pauw E, Van Damme EJM, Smagghe G. 2019. The N-glycome of the hemipteran pest insect Nilaparvata lugens reveals unexpected sex differences. Insect Biochem Mol Biol. 107:39–45.
12. Shade KT, Platzer B, Washburn N, Mani V, Bartsch YC, Conroy M, Pagan JD, Bosques C, Mempel TR, Fiebiger E et al. 2015. A single glycan on IgE is indispensable for initiation of anaphylaxis. J Exp Med. 212:457–467.
13. Song M, Zeng J, Jia T, Gao H, Zhang R, Jiang J, Li G, Su T. 2019. Effects of sialylated lactulose on the mouse intestinal microbiome using Illumina high-throughput sequencing. Appl Microbiol Biotechnol. 103:9067–9076.
14. Uhlen M, Fagerberg L, Hallstrom BM, Lindskog C, Oksvold P, Mardinoglu A, Sivertsson A, Kampf C, Sjostedt E, Asplund A et al. 2015. Proteomics. Tissue-based map of the human proteome. Science. 347:1260419.
15. Varki A, Cummings RD, Aebi M, Packer NH, Seeberger PH, Esko JD, Stanley P, Hart G, Darvill A, Kinoshita T et al. 2015. Symbol nomenclature for graphical representations of glycans. Glycobiology. 25:1323–1324.
16. Zahradnikova M, Ihnatova I, Lattova E, Uhrik L, Stuchlikova E, Nenutil R, Valik D, Nalezinska M, Chovanec J, Zdrahal Z et al. 2021. N-Glycome changes reflecting resistance to platinum-based chemotherapy in ovarian cancer. J Proteomics. 230:103964.
17. Zou G, Benktander JD, Gizaw ST, Gaunitz S, Novotny MV. 2017. Comprehensive analytical approach toward glycomic characterization and profiling in urinary exosomes. Anal Chem. 89:5364–5372.

Articles from Glycobiology are provided here courtesy of Oxford University Press
