Author manuscript; available in PMC: 2020 May 1.
Published in final edited form as: Trends Analyt Chem. 2019 Feb 13;114:143–150. doi: 10.1016/j.trac.2019.02.009

Mapping human N-linked glycoproteins and glycosylation sites using mass spectrometry

Liuyi Dang 1, Li Jia 1, Yuan Zhi 1, Pengfei Li 1, Ting Zhao 1, Bojing Zhu 1, Rongxia Lan 1, Yingwei Hu 2, Hui Zhang 2, Shisheng Sun 1,*
PMCID: PMC6907083  NIHMSID: NIHMS1525154  PMID: 31831916

Abstract

N-linked glycoproteins are a highly interesting class of proteins for clinical and biological research. Over the last decade, large-scale profiling of N-linked glycoproteins and glycosylation sites from biological and clinical samples has been achieved through mass spectrometry-based glycoproteomic approaches. In this paper, we reviewed the human glycoproteomic profiles that have been reported in more than 80 individual studies, focusing mainly on the N-glycoproteins and glycosylation sites identified through the deglycosylated forms of their glycosite-containing peptides. According to our analyses, more than 30,000 glycosite-containing peptides and 7,000 human glycoproteins have been identified from five different body fluids, twelve human tissues (or related cell lines), and four special cell types. As glycoproteomic data are still missing for many organs and tissues, a systematic glycoproteomic analysis of various human tissues and body fluids using a uniform platform is still needed for an integrated map of human N-glycoproteomes.

Keywords: Glycosylation, glycoprotein, glycosylation site, glycoproteome, mass spectrometry, human body fluids, human tissues, human cell lines

Introduction

Glycosylation is one of the most prevalent post-translational modifications of proteins. It not only plays major roles in the folding, transport and localization of proteins [1], but also regulates various biological processes such as cell growth, viral replication and immune defense [2–4]. Aberrant glycosylation is usually associated with the pathological progression of many diseases [5]. N-linked glycoproteins are widely distributed, from the surface of various cell types to human body fluids such as serum, cerebrospinal fluid and urine [5, 6]. Transmembrane or cell surface glycoproteins are easily accessible to therapeutic drugs, antibodies, and ligands, while glycoproteins secreted into body fluids are thought to provide a detailed window into the state of health of an individual. These features make glycoproteins a highly interesting class of proteins for clinical and biological research.

In the last decade, mass spectrometry (MS) has been widely used for large-scale glycoproteomic analysis. Typically, glycoproteins or glycopeptides are first enriched from complex samples using different enrichment methods, such as lectin affinity [7], hydrazide chemistry [8] and hydrophilic interaction liquid chromatography (HILIC) [9]; glycans are then cleaved from the glycoproteins or glycopeptides by enzymatic or chemical methods prior to MS analysis of the deglycosylated peptides (Figure 1). For N-linked glycosylation, PNGase F treatment converts the formerly glycosylated asparagine to aspartic acid, and this deamidation serves as a mass tag (+0.98 Da, or +3 Da when performed in H₂¹⁸O) for N-glycosylation site identification by mass spectrometry [7, 8]. This simple but effective strategy has been widely applied to the large-scale analysis of N-linked glycoproteomes in various samples from humans as well as other species.
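The size of this mass tag follows directly from the atoms exchanged when the asparagine side-chain amide (NH2) is replaced by a hydroxyl (OH). The short sketch below reproduces the arithmetic; the monoisotopic atomic masses are standard reference values rather than numbers taken from this review, and the function name is only illustrative.

```python
# Monoisotopic atomic masses in Da (standard reference values, assumed here).
MASS_H = 1.00782503
MASS_N = 14.00307401
MASS_O16 = 15.99491462
MASS_O18 = 17.99915961


def deamidation_shift(heavy_water: bool = False) -> float:
    """Mass change for Asn -> Asp: the side chain loses NH2 and gains OH.

    The net change is +O -N -H. When the reaction runs in 18O-labeled
    water, the incorporated oxygen is assumed to be 18O.
    """
    oxygen = MASS_O18 if heavy_water else MASS_O16
    return oxygen - (MASS_N + MASS_H)


print(f"Shift in normal water: {deamidation_shift():.4f} Da")
print(f"Shift in 18O water:    {deamidation_shift(True):.4f} Da")
```

The two values round to the +0.98 Da and +3 Da tags quoted above.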

Figure 1.

The most commonly used workflow for N-linked glycoprotein and glycosylation site identification. Glycopeptides or glycoproteins are enriched from complex samples, and N-linked glycans are then removed from the peptide portion by PNGase F digestion. The deglycosylated peptides are analyzed by LC-MS/MS, and the peptide portions of the glycopeptides are identified by database searching. The deamidation (+0.98 Da) that occurs at the former N-glycosylation sites after PNGase F treatment can be used as a mass tag for N-glycosylation site identification by mass spectrometry.

Using this strategy, thousands of N-linked glycoproteins have been identified from various human-derived samples through the identification of their glycosite-containing peptides by mass spectrometry [12]. Recently, we collected more than 30,000 human glycosite-containing peptides, representing >10,000 N-glycosites from >7,000 N-glycoproteins, from over 100 studies on human glycosite-containing peptide analysis published since 2003. These human-derived samples include various body fluids, tissues and cell lines, which can be classified into more than 20 different sample type groups. The entire human glycoproteome database, as well as sub-databases associated with individual body fluids or tissues, can be found on the N-GlycositeAtlas website at nglycositeatlas.biomarkercenter.org.

In this paper, we summarized the progress of MS-based identification of human N-linked glycoproteins, focusing mainly on the identification of N-glycoproteins and glycosylation sites based on the deglycosylated forms of the glycosite-containing peptides. The glycoproteomic data were classified into three major groups based on their sample sources: human body fluids, tissues (and related cell lines), and some special cell types such as blood platelets, B cells, T cells, and spermatozoa. We also discussed the glycosylation at some atypical N-glycosylation sites identified previously by mass spectrometry. Although novel methods for intact glycopeptide analysis have been developed in recent years, the resulting data are beyond the scope of this review.

Glycoproteins in body fluids

Body fluids are one of the main resources for the identification of disease-related biomarkers. Since N-linked glycoproteins account for a large portion of the protein content of body fluids, identifying the glycoprotein components of these fluids is essential for their clinical utility. To date, MS-based N-glycoproteomic data have been reported for serum/plasma, urine, saliva, cerebrospinal fluid (CSF) and milk samples (Table 1).

Table 1.

The numbers of unique N-linked glycoproteins and glycosylation sites identified from various human-derived samples including body fluids, tissues and related cell lines, as well as some special cell types.

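| Category | Sample source | Glycosites | Glycoproteins |
|---|---|---|---|
| Body fluid | Serum or plasma | ~2,600 | ~1,500 |
| Body fluid | Urine | ~4,300 | ~2,600 |
| Body fluid | Saliva | ~300 | ~200 |
| Body fluid | Cerebrospinal fluid | ~1,700 | ~700 |
| Body fluid | Human milk | ~200 | ~100 |
| Respiratory system | Lung | ~900 | ~600 |
| Digestive system | Liver | ~5,800 | ~2,500 |
| Digestive system | Pancreas | ~1,900 | ~1,200 |
| Digestive system | Colon | ~2,100 | ~1,100 |
| Reproductive system | Prostate | ~6,200 | ~3,300 |
| Reproductive system | Ovary | ~5,700 | ~2,600 |
| Reproductive system | Cervix | ~1,300 | ~600 |
| Urinary system | Kidney | ~1,100 | ~600 |
| Urinary system | Bladder | ~300 | ~200 |
| Other tissues | Breast | ~3,500 | ~1,700 |
| Other tissues | Thyroid | ~500 | ~300 |
| Other tissues | Bone marrow | ~300 | ~200 |
| Special cell types | Blood platelet | ~700 | ~400 |
| Special cell types | T cells | ~900 | ~600 |
| Special cell types | B cells | ~2,200 | ~1,100 |
| Special cell types | Spermatozoa | ~600 | ~300 |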
* The data were obtained from the N-GlycositeAtlas website (glycositeatlas.biomarkercenter.org).

The numbers of glycosites and glycoproteins in tissues include both tissue samples and related cell lines.

A. Serum or plasma

Serum and plasma glycoproteomic data can be obtained from dozens of papers, mainly because serum is an easily accessible sample and has therefore been widely used for method development. For example, during the development of the hydrazide chemistry method for glycopeptide enrichment in 2003, Zhang et al. identified more than 300 deglycosylated peptides from 97 serum N-glycoproteins [8]. In 2004, plasma was again used as a test sample to evaluate the performance of the HILIC method for glycopeptide enrichment [9]. Using the HILIC method, the authors identified 62 N-glycosylation sites in 37 glycoproteins from plasma. In 2005, Liu et al. developed an approach mainly for plasma glycoprotein analysis that combined immunoaffinity subtraction, hydrazide chemistry, and mass spectrometry [13]. In this study, the high abundance proteins were first subtracted from human blood plasma via immunoaffinity, and the remaining glycoproteins were then enriched by the hydrazide chemistry method. After trypsin digestion and PNGase F treatment, the deglycosylated peptides were subjected to strong cation exchange (SCX) fractionation followed by LC-MS/MS analysis. Using this approach, they identified 639 N-glycosylation sites from 303 non-redundant glycoproteins.

In addition, N-glycoproteins from human plasma/serum have been used for the development of many other methods, such as glycoprotein/glycopeptide enrichment methods using acetone [14], hydrazide tips [15], multi-dimensional lectin affinity chromatography [16], and size exclusion chromatography [17]; methods for core-fucosylated glycoprotein/glycopeptide enrichment and identification [18–21]; methods for sialylated glycoprotein/glycopeptide enrichment using titanium dioxide (TiO2) chromatography [22] and strong cation exchange [23]; material synthesis and preparation, such as magnetic microspheres [24] and nanoparticles [25], for glycoprotein/glycopeptide enrichment; and methods for glycoprotein quantification, including 18O stable isotope labeling [26] and SWATH-MS [27], among others. The numbers of serum/plasma glycoproteins and glycosylation sites identified in these studies ranged from tens to hundreds. Human serum (or plasma) has also been widely used for biomarker discovery. To date, mass spectrometry-based N-glycoprotein and glycosite analysis has been performed on the serum of HIV elite suppressors [28] and of patients with Parkinson’s disease [29], lung cancer [30, 31], liver cancer [32], and esophagus disease [33].

Altogether, more than 3,000 glycosylation sites from ~1,800 glycoproteins have been identified from human serum and plasma. Some of these glycosite-containing peptides have been included in the UniPep [35] and PeptideAtlas [36] databases. Even so, more effort may still be needed to maximize the identification coverage of serum/plasma glycoproteins and glycosites by combining different strategies, such as different enzymatic digestions [37] and enrichment methods. This is particularly important for low abundance glycoproteins. In addition, serum contains many high abundance glycoproteins, which might hinder the identification of low abundance glycoproteins. Removing these high abundance proteins before mass spectrometry-based proteomic or sub-proteomic analyses would increase the coverage of the identified glycoproteins [38].

B. Urine

Urine is another potentially attractive fluid for disease biomarker discovery and diagnostic tests [39–43]. It is especially useful for the detection of lesions that are proximate to the urinary system, such as those of the kidney and prostate. Several publications in the last decade have focused on the glycosylation site analysis of urinary glycoproteins. In 2012, Yeh et al. identified 85 N-glycosites in 53 glycoproteins from urine samples using their novel magnetic bead-based zwitterionic HILIC (ZIC-HILIC) material [44]. One year later, Zhou et al. identified 865 N-glycosites from three urine samples using the GlycoFilter method [45]. In another paper, our group reported a total of 2,923 unique glycosite-containing peptides identified from the urine samples of prostate cancer patients with different Gleason scores using solid-phase extraction of N-linked glycosite-containing peptides (SPEG) and 2D-LC-MS/MS analysis [46]. Importantly, by comparing the glycoproteins identified in urine, tissue and serum samples from the same patients, we revealed that the majority of glycoproteins associated with aggressive prostate cancer were more readily detected in the patients’ urine than in their serum, suggesting the great potential of urinary glycoproteins for biomarker discovery.

C. Saliva

Whole saliva is a slightly cloudy colorless liquid which is mainly comprised of the secretions of the parotid, submandibular, sublingual and minor salivary glands, and it exhibits multiple host defense functions in the maintenance of oral health [47]. It has been reported that the protein content of saliva has a significant overlap with proteins from plasma, suggesting that saliva is also very promising in disease-related biomarker discovery as well as diagnostic tests [48, 49].

At least six mass spectrometry-based salivary glycoproteomic datasets have been published. In 2006, Ramachandran et al. enriched N-glycopeptides from whole saliva using the hydrazide chemistry method and analyzed the released formerly N-glycosylated peptides by LC-MS/MS [50]. With this approach, they first reported 84 deglycosylated peptides from 45 unique salivary N-glycoproteins. They subsequently extended the salivary glycoprotein catalogue using a modified hydrazide chemistry method and identified a total of 156 deglycosylated peptides, representing 77 unique N-glycoproteins in salivary fluid [51]. In 2007, by isolating the sialylated glycopeptides from whole saliva using titanium dioxide chromatography, Larsen et al. identified 97 glycosylation sites from salivary glycoproteins [22], most of which were presumed to come from sialic acid-containing glycopeptides (the sialiome) based on the principle of the enrichment method. Although titanium dioxide chromatography was initially developed to enrich phosphorylated peptides, it was used for sialylated glycopeptide enrichment in this study. In 2014, by using a novel hexapeptide library method, Bandhakavi et al. further increased the number of identified salivary glycoproteins to 192 [52]. In the same year, by combining two complementary glycopeptide isolation methods, hydrazide chemistry and HILIC, our group identified a total of 156 non-redundant deglycosylated peptides (representing 164 unique N-glycosylation sites) and 85 N-linked glycoproteins in human whole saliva from six different gender and age groups [53].

D. Cerebrospinal fluid (CSF) and human milk

Several papers have been published on the MS-based analysis of formerly N-glycosylated peptides and glycosites from human CSF and milk. From CSF, a total of >1,600 glycosites from ~700 glycoproteins were identified in healthy and disease-related samples using multiple strategies, including gel separation, reversed phase-anion exchange and hydrazide-based glycopeptide capture [54–57]. From human colostrum and mature milk, 240 N-glycosylation sites from 114 glycoproteins were identified using HILIC enrichment, deglycosylation via PNGase F treatment and MS analysis [58–60]. These data might provide insight into the potential applications of some N-glycoproteins in infant formulae at different stages of development.

Human body fluids are preferred specimens for biomarker discovery, but their analysis is technically very challenging because they contain a large number of proteins that can be modified in a variety of forms. Moreover, the detection of different diseases may require different body fluids, yet the glycoproteomes of many other body fluids remain unknown, including tear fluid, bile, bronchoalveolar lavage fluid, synovial fluid, nipple aspirate fluid, and amniotic fluid [61, 62]. Despite these challenges, glycoproteomics of body fluids is still one of the most important and promising approaches for the study of human diseases and disease-related biomarker discovery. For biological studies and potential clinical utility, more comprehensive and deeper glycoproteomic analysis of body fluids is still needed.

Glycoproteins in human tissues and related cell lines

Resolving the molecular details of (glyco)proteome variation in different tissues and organs of the human body is critical for the understanding of human biology and disease [63]. Glycoproteomic analysis of disease-related tissues and cells has provided valuable information to identify promising targets for their diagnosis, prognosis and therapy. In this part, we summarized recent glycoproteomic studies in 12 different tissues (and related cell lines), involving some closely related diseases (Table 1).

A. Respiratory system

Lung.

Various glycoproteomic technologies have been used for the biomarker discovery of lung cancers. By using lectin affinity chromatography, Hirao et al. identified 1092 AAL-bound glycoproteins and 948 HHL/ConA-bound glycoproteins in non-small cell lung carcinoma (NSCLC) in 2014. Unfortunately, the results lack glycosite information for the identified glycoproteins [64]. Recently, by coupling lectin to magnetic nanoprobes, Waniwan et al. performed site-specific glycosylation analysis in drug-sensitive and drug-resistant NSCLC cell lines, in which 2290 and 2767 non-redundant glycopeptides were confidently identified (Byonic score ≥100) in EGFR-TKI-sensitive PC9 and resistant PC9-IR cells, respectively [65]. In addition, glycoproteomic analyses have been performed on tissues from different NSCLC subtypes, fetal lung fibroblasts and NSCLC cell lines with KRAS and EGFR mutations [66–68]. In the course of this work, a few glycoproteins have been approved in Japan for clinical examinations of lung cancer, including SLX, CYFRA 21.1, SCC (squamous cell antigen), NCAM, pro-gastrin-releasing peptide (ProGRP), NSE and carcinoembryonic antigen (CEA) [69].

B. Digestive system

Liver.

Besides the well-known markers AFP and AFP-L3, many biomarkers for monitoring and diagnosing hepatocellular carcinoma (HCC) remain under investigation to improve their sensitivity and specificity, including several fucosylated proteins such as haptoglobin, kininogen and alpha-1-antitrypsin [70]. So far, many glycoproteins and glycopeptides have been found to be altered in samples from HCC patients. The glycoproteomic studies of HCC and the methods involved have been well summarized in a recent review by Zhu et al. (2018) [70].

Pancreas.

Currently, CA19–9 is the only clinical biomarker for the management of pancreatic cancer, but it still suffers from low sensitivity and specificity [71]. Several glycoproteomic analyses have been conducted on normal and disease-related pancreas samples. In 2011, Danzer et al. identified 956 unique N-linked glycoproteins, including 611 N-linked glycoproteins in mouse MIN6 β-cells and 545 N-linked glycoproteins in human pancreatic islets [72]. Thereafter, to seek sialoglycoproteins associated with pancreatic cancer, Tian et al. analyzed the sialylated glycoproteins in metabolically oligosaccharide engineered pancreatic cells and identified 75 sialylated glycopeptides from 55 glycoproteins by lectin affinity chromatography combined with mass spectrometry [73]. More information can be obtained from the review by Pan et al. [71].

Colon.

Colorectal cancer (CRC) is one of the most common malignant tumors of the gastrointestinal tract, and many attempts have been made to search for CRC biomarkers using glycoproteomic approaches [74, 75]. In 2009, Zhang et al. enriched the glycopeptides from human CRC tissues using boronic acid functionalized core-satellite composite nanoparticles and identified 194 unique glycosylation sites in 155 glycoproteins [76]. In 2011, Nagano et al. identified 219 glycosylated proteins in HCT-116 cells by monomeric avidin labeling and glycoprotein capturing, and 312 N-glycosylated proteins in xenograft samples [77]. In 2014, Nicastri et al. used solid-phase extraction and 18O stable isotope labeling to quantitatively identify 1459 glycopeptides and 770 glycoproteins, among which 54 glycoproteins were found to be up-regulated in colorectal cancer samples [78]. So far, there are still no effective early diagnostic markers for CRC, and profiling the glycoproteome of CRC tissues and cells would provide new insights for the discovery of new biomarkers.

C. Reproductive system

Prostate.

Prostate cancer is one of the most important death-related diseases in the male population, and a number of glycoproteomic studies on it have been reported. In 2012, work by Whitmore et al. using hydrazide-based chemistry revealed 200 glycopeptides from 104 glycoproteins in 22Rv1 prostate cancer cells cultured in vitro [79]. Later, Liu et al. comparatively analyzed the glycoproteomes of normal prostate, non-aggressive and metastatic prostate cancer tumor tissues and identified 5422 N-glycosites from 2460 glycoproteins using SWATH mass spectrometry, which covers ~50% of the annotated human N-glycoproteome [80]. In 2015, we performed a quantitative glycoproteomic analysis of LNCaP and PC3 prostate cancer cell lines using solid-phase extraction and iTRAQ-labeled peptides followed by LC-MS/MS analysis. A total of 1810 unique N-glycopeptides from 653 N-glycoproteins were identified, among which 176 glycoproteins were observed to be differentially expressed between the two cell lines [81]. Because the bladder and prostate are closely related, glycosylation in their proteomes is often analyzed together. In 2009, Goo et al. analyzed the secreted glycoproteomes of both human prostate and bladder stromal cells, in which 116 prostate cell secreted glycoproteins and 84 bladder cell secreted glycoproteins were identified with ProteinProphet probability scores ≥ 0.9 using a glycopeptide-capture method followed by mass spectrometry [82]. Biomarkers for prostate cancer have recently been reviewed by Intasqui et al., who also summarized a list of clinically useful biomarkers and glyco-related biomarkers [83].

Ovary.

To date, the glycoprotein Cancer Antigen 125 (CA125) is the most frequently used biomarker for ovarian cancer detection, and different glycoforms of CA125 have been investigated to improve its sensitivity and specificity [84, 85]. In 2011, our team analyzed three cases of normal ovary and ovarian cancer tissues and identified 368 N-glycosylation sites in 286 glycoproteins using a combined method of solid-phase extraction and iTRAQ labeling of glycoproteins [86]. In 2013, we developed an approach for quantitative N-glycoproteomic analysis based on genomic N-glycosite prediction, which increased the number of N-deglycopeptide assignments from ovarian cancer cell lines more than threefold compared to the traditional method [87]. Later, we also developed a chemoenzymatic method called solid-phase extraction of N-linked glycans and glycosite-containing peptides (NGAG) for the comprehensive characterization of glycoproteins and identified 2,044 unique glycosite-containing peptides with an N-X-S/T motif from OVCAR-3 ovarian cancer cells [88].

Cervix.

The cervix is the lower part of the uterus in the human female reproductive system [89]. Infection with the human papillomavirus (HPV) can cause changes in the epithelium lining the cervical canal, which can lead to cervical cancer [90]. In 2009, McDonald et al. identified 240 glycoproteins from HeLa cells, which are derived from cervical cancer cells, by combining periodate/hydrazide chemistry and M. amurensis lectin affinity chromatography [91]. In 2015, Weng et al. profiled the N-glycosylation sites of the HeLa cell line by combining a commonly applied protocol with a new method (N-terminal succinylation assisted enzymatic deglycosylation). In total, 1230 glycopeptides from 511 glycoproteins were mapped from HeLa cell lysate, which achieved deep coverage of glycosylation at the N-terminal asparagine of the glycopeptides [92].

D. Urinary system

Kidney.

The kidney is the main organ of the human body for urine generation. The human embryonic kidney 293T cell line (HEK 293T) has been used in the development of several glycoproteomic methods. In 2011, Chen et al. developed a novel two-step protease digestion (Lys-C and trypsin) and glycopeptide capture approach and applied it to the analysis of HEK 293T cells, where a total of 359 glycosites from 143 N-glycoproteins were identified [93]. In 2014, the HEK 293T cell line was again analyzed by an integrated hydrophilic interaction chromatography solid-phase extraction (HILIC-SPE) and mass spectrometric strategy, and 811 N-glycosites from 567 proteins were identified [94]. The next year, cell surface N-glycoproteins from HEK 293T cells were analyzed by integrating metabolic labeling, copper-free click chemistry, and MS-based proteomics methods, which led to the identification of 144 unique N-glycopeptides from 110 glycoproteins containing 152 N-glycosylation sites [95]. Unfortunately, no data are available yet for the glycoproteome of human renal tissues.

E. Other tissues

Breast.

Breast cancer has a high incidence rate in women, and a large number of studies have been conducted on its diagnosis, occurrence, and treatment. In 2012, Yen et al. identified 486 glycoproteins from 14 breast cancer cell lines using hydrazide magnetic beads, in which a panel of differentially expressed glycoproteins was determined to allow the classification of breast cancer [96]. One year later, Boersema et al. identified and quantified 1398 N-glycosylation sites from the supernatant of 11 cell lines that represent different stages of breast cancer development using the N-glyco-FASP technology coupled with super-SILAC accurate quantification [97]. In addition, glycoproteomic analysis was also performed on triple-negative and luminal breast tumors, in which a panel of glycoproteins was determined to be differentially expressed [98, 99].

Thyroid.

The glycoproteomic analysis of thyroid tissues and cell lines is still very limited. In 2009, glycoproteome profiles of cell surface and secreted proteins were reported for a series of thyroid cancer cell lines, including papillary thyroid cancer (TPC-1), follicular thyroid cancer (FTC-133), Hürthle cell carcinoma (XTC-1), and anaplastic thyroid cancer (ARO and DRO-1), which represented the range of thyroid cancers of follicular cell origin [100]. An average of 150 glycoproteins were identified in each cell line, with distinct glycoproteomic patterns, of which more than 57% were known cell surface or secreted glycoproteins. Based on these results, a set of glycoprotein biomarker candidates, such as CD44, galectin 3 and metalloproteinase inhibitor 1, has been proposed for thyroid cancer diagnosis.

Bone marrow.

To understand chemoresistance in the chemotherapy of leukemia, Zhang et al. investigated the differences between the glycoproteomes of the adriamycin-sensitive leukemia cell line K562S and the adriamycin-resistant leukemia cell line K562A using the cell-surface capturing method and quantitative proteomics [101]. A total of 134 glycopeptides from 94 glycoproteins in K562S cells and 244 glycopeptides from 180 glycoproteins in K562A cells were identified with FDR<1% at the peptide level. Using two quantitative methods (isotopic dimethyl labeling and SWATH), 15 glycoproteins were found to display a consistent and significant trend of change between these cell lines. These 15 proteins included classical multidrug resistance-related glycoproteins such as ABCB1 as well as three glycoproteins (CTSD, FKBP10, and SLC2A1) that were shown to be novel participants in the chemoresistance of leukemia cells.

With the rapid improvements of mass spectrometry and glycoproteomic technologies, more and more MS data related to the glycosites and glycoforms from different tissues will be generated. Therefore, how to take advantage of these data and solve problems in diseases and fundamental biology will be the next challenge.

Glycoproteins in special cell types

Some special cell types in humans are of particular interest to researchers. Here, glycoproteomic studies of these special cell types are summarized, including human blood platelets, T cells, B cells and spermatozoa (Table 1).

Blood platelets are a critical component of human blood, and more than 1000 N-glycosylation sites have been identified from them by MS. In 2006, Lewandrowski et al. reported the first deglycopeptide dataset from human blood platelets by using both lectin and hydrazide chemistry enrichment methods, PNGase F treatment and mass spectrometry [102]. Using this approach, they identified 70 different glycosylation sites from 41 different platelet glycoproteins. One year later, they published a second paper reporting 148 N-linked glycosylation sites on 79 glycoproteins from the platelet membrane [103]. In 2008, the same group reported 125 glycosylation sites on 66 proteins from platelets by glycopeptide enrichment using electrostatic repulsion hydrophilic interaction chromatography (ERLIC) [104]. In total, more than 250 glycosylation sites annotated for platelets were identified using three different enrichment strategies (four different enrichment methods) [104]. In 2017, our group published another MS dataset related to platelet glycoproteins. By using iTRAQ labeling and SPEG (hydrazide chemistry) enrichment, we identified 799 unique N-linked glycosylation sites in platelets [105], which further extended the depth of glycoprotein coverage for human platelets. Up to now, more than one thousand N-glycoproteins have been identified from human platelets by MS.

Besides blood platelets, glycoproteomic analyses of T cells, B cells and spermatozoa have been reported as well. From the T cell line ACH-2, Yang et al. identified 563 unique glycopeptides from 247 unique glycoproteins using the hydrazide chemistry enrichment method [106]. In 2014, Deeb et al. identified 2383 unique glycosylation sites on 1321 glycoproteins from the cell surface of diffuse large B-cell lymphoma subtypes by the glyco-FASP method and MS analysis [107]. Using the glyco-FASP method coupled with LC-MS/MS, Wang et al. identified 554 N-glycosylation sites and 297 N-glycoproteins from human spermatozoa [108].

Atypical N-glycosylation sites

In addition to the canonical N-linked glycosylation motif asparagine-X-serine/threonine (N-X-S/T, where X is any amino acid except proline) [5, 109], many sites with atypical sequons have been identified by MS as N-linked glycosylation sites. These reported atypical glycosites include N-X-C [110], N-X-V [111, 112] and N-X-G [112]. Except for the N-X-C motif, which has been confirmed in several known glycoproteins [110], all other atypical motifs were identified only on the basis of the deamidation of asparagine (N) residues in the peptides after PNGase F treatment (with or without 18O labeling) using mass spectrometry-based glycoproteomics [111, 112]. However, atypical sites identified on the basis of N deamidation are potentially false positives, as deamidation can occur naturally or be induced during sample preparation [10, 11].
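The motif definitions above translate directly into a sequence scan. The following is a minimal sketch that labels each asparagine in a protein sequence as sitting in a canonical (N-X-S/T) or atypical (N-X-C, N-X-V, N-X-G) sequon. The function name and the toy sequence are hypothetical, and excluding proline at the X position of the atypical motifs is an assumption carried over from the canonical rule. A motif match only marks a candidate site; it says nothing about whether the site is actually glycosylated.

```python
import re

# Lookaheads are used so that overlapping sequons are all reported.
CANONICAL = re.compile(r"N(?=[^P][ST])")
# Assumption: proline is also excluded at X for the atypical motifs.
ATYPICAL = re.compile(r"N(?=[^P][CVG])")


def find_sequons(sequence: str) -> list[tuple[int, str, str]]:
    """Return (1-based Asn position, tripeptide, class) for each candidate."""
    hits = []
    for label, pattern in (("canonical", CANONICAL), ("atypical", ATYPICAL)):
        for match in pattern.finditer(sequence):
            start = match.start()
            hits.append((start + 1, sequence[start:start + 3], label))
    return sorted(hits)


# Toy sequence invented for illustration; it is not a real protein.
for position, motif, label in find_sequons("MKNGSAANPTLNHVQNSC"):
    print(position, motif, label)
```

In the toy sequence, the asparagine followed by proline (NPT) is skipped, while NGS is reported as canonical and NHV and NSC as atypical.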

Recently, by using the NGAG method coupled with high resolution mass spectrometry, we identified two atypical N-glycosites with 146N#HV and 156N#SC motifs on Protein sel-1 homolog 1 (SEL1L) and Kunitz-type protease inhibitor 2 (SPINT2), respectively, from the OVCAR-3 cell line [113]. These two atypical glycosites were further confirmed via the direct identification of their intact N-glycopeptide forms (with the glycan still attached at the glycosylation site). The results showed that the glycosite 146N#HV was modified by the glycan Man9 (Figure 2A), while the glycosite 156N#SC was modified by four different oligo-mannose glycans (HexNAc2Hex7 to HexNAc2Hex10, Figure 2B). Using the same strategy, another two atypical N-glycosites with the N-X-V motif, 68N#EV on Albumin (ALB) and 62N#GV on Alpha-1B-glycoprotein (A1BG), were identified and verified from human serum [34]. Both glycosites were modified by the complex glycans N4H5S1 and N4H5S2 (Figure 2C and 2D).
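Composition labels such as N4H5S1 can be turned into the mass that the glycan adds to a peptide. The sketch below assumes the common shorthand in which N stands for HexNAc, H for hexose and S for sialic acid (taken here to be Neu5Ac); this reading is not spelled out in the text. The residue masses are standard monoisotopic values, not figures from this review.

```python
import re

# Monoisotopic residue masses in Da (standard values, assumed here).
# Assumed shorthand: N = HexNAc, H = Hex, S = sialic acid (Neu5Ac).
RESIDUE_MASS = {"N": 203.07937, "H": 162.05282, "S": 291.09542}


def glycan_residue_mass(composition: str) -> float:
    """Sum the residue masses for a composition string such as 'N4H5S1'."""
    total = 0.0
    for letter, count in re.findall(r"([NHS])(\d+)", composition):
        total += RESIDUE_MASS[letter] * int(count)
    return total


# Man9 corresponds to HexNAc2Hex9, written 'N2H9' in this shorthand.
for name in ("N2H9", "N4H5S1", "N4H5S2"):
    print(f"{name}: {glycan_residue_mass(name):.3f} Da")
```

The value returned is the mass added to the peptide by the attached glycan, which is the quantity an intact glycopeptide search has to account for.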

Figure 2.

Atypical N-glycosites identified from human glycoproteins. Four glycosites identified from (A) Protein sel-1 homolog 1 (SEL1L), (B) Kunitz-type protease inhibitor 2 (SPINT2), (C) Albumin (ALB), and (D) Alpha-1B-glycoprotein (A1BG).

Compared to previously reported atypical N-glycosites that were identified based on the deamidation of asparagine residues after PNGase F treatment [111, 112], these two studies further validated the identified atypical motif glycosites by directly identifying their intact glycopeptides. Since the deamidation of N can sometimes cause false-positives, direct identification of the intact glycopeptides with glycan attached is particularly important for confirmation of these atypical N-glycosylation sites detected by mass spectrometry-based glycoproteomics.

Future perspectives

Owing to major improvements in mass spectrometry technology, liquid chromatography separation, analysis software and glycoprotein/glycosite-containing peptide isolation methods in recent years, many mass spectrometry methods and strategies have been developed for large-scale profiling of N-linked glycoproteins and glycosylation sites. Prior to MS analysis, glycopeptide enrichment is usually performed to increase the concentration of the glycopeptides. Glycans are then normally removed from the glycosite-containing peptides to eliminate the micro-heterogeneity of the glycosylation. With this strategy, large-scale glycosylation analysis has been achieved using proteomic approaches, and the glycan-attached sites can be identified through the deamidation (a +0.98 Da mass shift) at the former glycosylation sites [7, 8]. The majority of human N-linked glycoproteins and glycosylation sites known today were identified through this strategy. This strategy, however, can cause false identifications when the mass accuracy of the detected peptide is not high enough. The high resolution and accuracy of mass spectrometers developed in recent years have greatly increased the identification confidence of the glycosites and glycoproteins, and increased the numbers of identified glycosite-containing peptides. Nevertheless, the false-positive rates of glycosylation sites identified using this strategy are still much higher than expected, because deamidation of asparagine can be induced during sample preparation and can even occur naturally in the human body, as reported in recent studies [10, 11]. Naturally occurring deamidation can be influenced by several factors, such as pH (in particular the pH conditions used for PNGase F-driven deglycosylation), temperature, ionic strength, and others [10]. Therefore, appropriate controls and extreme care are needed in both sample preparation and data interpretation to prevent false-positive identifications [11].
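The dependence on mass accuracy can be made concrete with a small calculation. The sketch below is an illustration rather than part of the reviewed workflows: it assumes the main interference is the first 13C isotope peak of the unmodified peptide, which lies about 1.003 Da above the monoisotopic mass and therefore only about 0.019 Da away from the deamidated form. The constants are standard monoisotopic values, and the function names are hypothetical.

```python
DEAMIDATION_SHIFT = 0.984016  # Da, Asn -> Asp (loss of NH, gain of O)
C13_SPACING = 1.003355        # Da, mass difference between 13C and 12C


def ppm_error(observed, theoretical):
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def ppm_gap(peptide_mass):
    """Separation, in ppm, between the deamidated peptide and the first
    13C isotope peak of the unmodified peptide of the same sequence."""
    deamidated = peptide_mass + DEAMIDATION_SHIFT
    return (C13_SPACING - DEAMIDATION_SHIFT) / deamidated * 1e6


def distinguishable(peptide_mass, tolerance_ppm):
    """True if the 13C peak falls outside a +/- tolerance window
    centred on the deamidated mass."""
    return ppm_gap(peptide_mass) > tolerance_ppm


if __name__ == "__main__":
    for mass in (1000.0, 2000.0, 3000.0):
        print(f"{mass:.0f} Da peptide: gap = {ppm_gap(mass):.2f} ppm")
```

Under these assumptions the gap shrinks as the peptide gets heavier, so a tolerance that separates the two species for a 1000 Da peptide may not do so for a 3000 Da one. Accurate mass alone also cannot tell enzymatic deglycosylation apart from spontaneous deamidation, since both produce the same shift.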

Based on the data summarized above, although several thousand glycoproteins can be identified from a single body fluid or tissue, there are still many tissues (such as heart, stomach, brain, and kidney) for which no mass spectrometry-based glycosite data have been generated. In addition, the numbers of identified glycoproteins and glycosylation sites are largely determined by the approaches used for sample preparation, mass spectrometry analysis and data analysis. Therefore, a systematic glycoproteomic analysis of various human body fluids and tissues using a uniform platform is still needed. This uniform platform may need to combine several different strategies, such as different enzymatic digestions [37] and enrichment methods, as well as intact glycopeptide analysis, to maximize the identification coverage of human glycoproteins and glycosites. Such work will help to create a high-quality human glycoproteome database to facilitate future studies of glycoprotein structures and functions.

Along with glycosylation sites, characterization of the attached glycans is also essential for understanding the biological roles of a glycoprotein. Although the overall glycoform diversity can be revealed by analyzing the released glycans or de-glycosylated peptides, the “true” glycan heterogeneity still needs to be uncovered at the level of intact glycopeptides or glycoproteins. In recent years, direct analysis of intact glycopeptides/glycoproteins has been enabled by the development of mass spectrometry technologies and software [88, 125–129]. Detailed information is available in reviews by Cao et al., 2016 and Yang et al., 2017 [130, 131]. Intact glycopeptide analysis can provide additional site-specific glycosylation information, showing which glycans are attached to which glycosylation sites [88]. These technologies will undoubtedly advance glycoproteomic research to a higher level.

HIGHLIGHTS.

  • Large-scale profiling of N-linked glycoproteins and glycosylation sites from various human-derived samples has been achieved by mass spectrometry.

  • Glycoproteomic profiles from five body fluids, twelve human tissues, and four special cell types were summarized.

  • Mass spectrometry has identified more than 30,000 glycosite-containing peptides from >7,000 human N-glycoproteins.

  • N-glycosylation at the atypical sites with N-X-C and N-X-V sequons has been confirmed in several human glycoproteins.

  • A systematic glycoproteomic analysis of all kinds of body fluids and human tissues using a uniform platform would be necessary for an integrated map of human N-glycoproteomes.

ACKNOWLEDGEMENTS

This work was supported by the National Natural Science Foundation of China (Grant Nos. 91853123, 21705127, 81773180 and 81800655) and the Natural Science Foundation of Shaanxi Province (Grant No. 2018JM7086074). Dr. Hui Zhang was supported by the National Institutes of Health, National Cancer Institute, the Early Detection Research Network (EDRN, U01CA152813), the Clinical Proteomic Tumor Analysis Consortium (CPTAC, U24CA210985), the National Heart, Lung, and Blood Institute, Programs of Excellence in Glycosciences (PEG, P01HL107153), and the National Institute of Allergy and Infectious Diseases (R21AI122382).

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

COMPETING INTERESTS

The authors declare that they have no competing interests.

REFERENCES
