Abstract
Glycosylation of proteins and lipids involves over 200 known glycosyltransferases (GTs), and deleterious defects in many of the genes encoding these enzymes cause disorders collectively classified as congenital disorders of glycosylation (CDGs). Most known CDGs are caused by defects in glycogenes that affect glycosylation globally. Many GTs are members of homologous isoenzyme families and deficiencies in individual isoenzymes may not affect glycosylation globally. In line with this, there appears to be an underrepresentation of disease-causing glycogenes among these larger isoenzyme homologous families. However, genome-wide association studies have identified such isoenzyme genes as candidates for different diseases, but validation is not straightforward without biomarkers. Large-scale whole-exome sequencing (WES) provides access to mutations in, for example, GT genes in populations, which can be used to predict and/or analyze functional deleterious mutations. Here, we constructed a draft of a functional mutational map of glycogenes, GlyMAP, from WES of a rather homogenous population of 2000 Danes. We cataloged all missense mutations and used prediction algorithms, manual inspection and, in the case of carbohydrate-active enzymes family GT27, experimental analysis of mutations to map deleterious mutations. GlyMAP (http://glymap.glycomics.ku.dk) provides a first global view of the genetic stability of the glycogenome and should serve as a tool for discovery of novel CDGs.
Keywords: damaging mutations, glycogenes, nonsynonymous mutations, nsSNV, MAF
Introduction
Glycosylation of proteins and lipids in humans involves complex nontemplate-driven processes orchestrated by hundreds of enzymes, transporters, chaperones and lectins (Stanley and Okajima 2010; Moremen et al. 2012). Glycosylation is by far the most diverse and complex class of posttranslational modification (PTM) and includes N-linked, multiple O-linked and C-type glycans attached to glycoproteins and proteoglycans. The diversity of glycan structures found on proteins and lipids is enormous (Haltiwanger and Lowe 2004; Stanley 2011; Moremen et al. 2012). Overwhelming evidence implicates significant functions for glycosylation in most biological processes that contribute to health and disease, and in support of this >100 congenital disorders of glycosylation (CDGs) caused by deficiencies in genes involved in glycosylation (glycogenes) have been identified (Haeuptle and Hennet 2009; Jaeken 2011; Freeze et al. 2014). CDGs are generally recessive, with two nonfunctional alleles or one nonfunctional and one hypomorphic allele, and only very few autosomal-dominant inherited glycosylation defects are known (Freeze et al. 2014).
Most of these deficiencies result in severe multisystemic disorders caused by global defects in N-glycosylation of proteins, identifiable by analysis of the abundant serum N-glycoprotein, transferrin (Freeze 2013). The second largest group of known disease-causing glycogenes affects the mannose O-glycosylation (O-Man) pathway that causes related phenotypes within the group of congenital muscular dystrophies (Godfrey et al. 2007; 2011). Other CDGs are caused by deficiencies in glycogenes controlling different O-linked glycosylation pathways including O-Fuc, O-Glc, O-GlcNAc, O-Xyl, O-GalNAc and HYL-Gal, as well as glycolipid and glycosylphosphatidylinositol (GPI) anchors (Freeze et al. 2014) (Figure 1). A common feature of most of the glycogenes identified as causing CDGs to date is that they represent genes without substantial genetic backup, that is, genes which do not have apparent potential paralogous isoenzymes that may be predicted to provide partial functional backup. Thus, there appears to be a striking underrepresentation of glycogenes that are part of larger homologous gene families among known disease-causing glycogenes. A fundamental question is therefore whether the many paralogous glycogenes in human are spared from deleterious mutations, or if we have overlooked their role because they induce subtle nonglobal changes in glycosylation and perhaps also more subtle disease phenotypes.
Fig. 1.

The human glycans synthesized by the 208 GT genes depicted for the major biosynthetic pathways in endoplasmic reticulum, Golgi and cytosol PTM. The CAZy GT families involved in each glycosylation step are denoted.
Glycosyltransferases (GTs) are classified in GT families in the carbohydrate-active enzymes (CAZy) database (Bourne and Henrissat 2001; Lombard et al. 2014) (www.cazy.org) based on sequence and structure analyses, and to date 44 human GT families exist. Large homologous GT families of isoenzymes with related properties cover many steps in glycosylation in humans. This is perhaps most pronounced for steps in elongation, branching and capping of N-acetyllactosamine-based structures with large β4Gal-Ts (CAZy family GT7) (Almeida et al. 1997; Amado et al. 1999), β3Gal-Ts (GT31) (Amado et al. 1998), β3GlcNAc-Ts (GT31) (Sasaki et al. 1997; Isshiki et al. 2003) and capping by α2/3/4FUTs (GT10 and 11) (Becker and Lowe 2003) or α3/6STGals (GT29) (Audry et al. 2011), but many steps have two or more isoenzymes with potential partially redundant functions (see Figure 1). Moreover, the initiation step of O-GalNAc glycosylation is covered by up to 20 polypeptide GalNAc-transferases (GalNAc-Ts) belonging to GT27, which provides for the highest degree of differential regulation of a single glycosidic linkage and the O-GalNAc glycoproteome (Bennett et al. 2012). So far only a few of the GT genes that are members of the large isoenzyme families have been shown to cause CDGs (GT7: B4GALT1, B4GALT7; GT27: GALNT3; GT29: ST3GAL3, ST3GAL5 and GT31: B3GALNT1, B3GALNT2, B3GALT6, B3GALTL and LFNG). Nevertheless, genome-wide association studies (GWAS) and other gene association strategies increasingly point to glycogenes belonging to homologous gene families as candidate genes for diseases. Examples include several members of the GalNAc-T gene family (GT27), β3GalT family (GT31) and the sialyltransferase family (GT29) (Bennett et al. 2012). A major obstacle to the validation and discovery of potential disease-causing glycogene candidates belonging to large gene families is the lack of simple phenotypic screening assays, because the changes in glycosylation are expected to be subtle and not global.
Knowledge of one or more validated deleterious alleles of a glycogene implicated as a disease-causing candidate gene by, for example, GWAS would enable rapid confirmation of a causal role of this gene in the particular disease if the frequency of the deleterious alleles were higher in a disease population compared with controls. Importantly, this discovery and validation strategy does not rely on a comprehensive set of deleterious alleles in the population because all other allelic variants that accumulate with one or more of the validated deleterious alleles would be candidates for full or partial inactive alleles. Next-generation sequencing (NGS) is now providing access to massive amounts of data based on whole-genome sequencing and whole-exome sequencing (WES) from various normal as well as disease populations (Li et al. 2010; Albrechtsen et al. 2013). These data can serve as a discovery platform for the identification and estimation of null allele frequencies in populations, provided these can be reliably predicted or experimentally determined.
We have recently reported WES data for a rather homogenous population of 2000 Danes (LuCamp Initiative) (Albrechtsen et al. 2013; Lohmueller et al. 2013), which provides a unique resource for discovery of deleterious mutant alleles with allele frequencies down to 0.01%. Here, we have examined 208 GT genes for functional deleterious mutant alleles in the Danish population using prediction algorithms, manual inspection, as well as experimental validation for one large group of isoenzymes in GT27 (GalNAc-Ts). Results from these analyses of unpublished WES data from 2000 individuals represent the data extracted from Danish exomes, and these data were mirrored into WES data from the NHLBI EVS server representing the same 208 GTs. The combined data provide the first global visualization of mutation rates and stability of the glycogenome, which forms the basis for a Functional Mutation Map of Glycogenes, here coined GlyMAP. In line with others, we demonstrate that known CDG disease-causing alleles, with a few exceptions, are rare and not represented in a small, largely healthy population like the Danish population studied here. Importantly, however, we demonstrate that the approach can identify null alleles with fairly low allele frequency in this population. Thus, we identified and experimentally validated two mutant alleles of GALNT5 and GALNT14 with allele frequencies of 0.2 and 0.1%, respectively. In summary, GlyMAP in its current form describes the frequency of functionally validated GT nonsynonymous single-nucleotide variations (nsSNVs) identified in a homogenous Danish cohort. Future efforts are directed at expanding the identification and validation of nsSNVs to include other genes that participate in shaping the human glycome, such as hydrolases, carbohydrate-binding proteins, nucleotide transporters and others.
Ultimately, GlyMAP has the potential to embrace ongoing WES efforts and, in a comprehensive manner, display the frequencies of functionally validated nsSNVs in both healthy and disease cohorts.
Results
Human GT genes analyzed
We collected 198 human GT genes from 44 of the 96 classified GT families in CAZy containing human genes (Table I; Supplementary data, Table SI). We further included 10 GT genes, which were not annotated in CAZy. These genes are predicted to encode enzymes involved in the N-glycosylation pathway (ALG1L2 in GT33 and ALG10B in GT59) and C-mannosylation (DPY19L1, DPY19L2, DPY19L3) (Carson et al. 2006; Buettner et al. 2013), as well as putative GT genes with roles in glycosylation of α-dystroglycan (FKRP, FKTN and TMEM5) (Kobayashi et al. 1998; Brockington et al. 2001; Vuillaumier-Barrot et al. 2012).
Table I.
Human glycosyltransferase genes
| CAZy family | Number of genes | Gene name (HGNC nomenclature) |
|---|---|---|
| GT1 | 23 | ALG13, ALG14, UGT1A1–10, UGT2A1, UGT2A2, UGT2A3, UGT2B10^a, UGT2B11, UGT2B15, UGT2B17, UGT2B28, UGT2B4, UGT2B7, UGT3A1, UGT3A2, UGT8 |
| GT2 | 6 | ALG5, B3GNTL1, DPM1, HAS1, HAS2, HAS3 |
| GT3 | 2 | GYS1, GYS2 |
| GT4 | 5 | ALG11, ALG2, GLT1D1, GTDC1, PIGA |
| GT6 | 3 | ABO, GBGT1, GLT6D1 |
| GT7/31 | 15 | B4GALT1, B4GALT2, B4GALT3, B4GALT4, B4GALT5, B4GALT6, B4GALT7, CHPF^b, CHPF2^b, CHSY1^b, CHSY3^b, CSGALNACT1, CSGALNACT2, B4GALNT3, B4GALNT4 |
| GT8/49 | 9 | GLT8D1, GLT8D2, GXYLT1, GXYLT2, GYG1, GYG2, GYLTL1B^b, LARGE^b, XXYLT1 |
| GT10 | 8 | FUT10, FUT11, FUT3, FUT4, FUT5, FUT6, FUT7, FUT9 |
| GT11 | 2 | FUT1, FUT2 |
| GT12 | 2 | B4GALNT1, B4GALNT2 |
| GT13 | 2 | MGAT1, POMGNT1 |
| GT14 | 8 | GCNT1, GCNT2, GCNT3, GCNT4, GCNT6^a, GCNT7, XYLT1, XYLT2 |
| GT16 | 1 | MGAT2 |
| GT17 | 1 | MGAT3 |
| GT18 | 2 | MGAT5, MGAT5B |
| GT21 | 1 | UGCG |
| GT22 | 4 | ALG12, ALG9, PIGB, PIGZ |
| GT23 | 1 | FUT8 |
| GT24 | 2 | UGGT1, UGGT2 |
| GT25 | 3 | CERCAM, GLT25D1, GLT25D2 |
| GT27 | 20 | GALNT1, GALNT2, GALNT3, GALNT4, GALNT5, GALNT6, GALNT7, GALNT8, GALNT9, GALNT10, GALNT11, GALNT12, GALNT13, GALNT14, GALNT15, GALNT16, GALNT17 (L6), GALNT18, GALNT19 (WBSCR17), GALNT20(L5) |
| GT29 | 20 | ST3GAL1, ST3GAL2, ST3GAL3, ST3GAL4, ST3GAL5, ST3GAL6, ST6GAL1, ST6GAL2, ST6GALNAC1, ST6GALNAC2, ST6GALNAC3, ST6GALNAC4, ST6GALNAC5, ST6GALNAC6, ST8SIA1, ST8SIA2, ST8SIA3, ST8SIA4, ST8SIA5, ST8SIA6 |
| GT31 | 21 | B3GALNT1, B3GALNT2, B3GALT1, B3GALT2, B3GALT4, B3GALT5, B3GALT6^a, B3GALTL, B3GNT2, B3GNT3, B3GNT4, B3GNT5, B3GNT6, B3GNT7, B3GNT8, B3GNT9^a, C1GALT1, C1GALT1C1, LFNG, MFNG, RFNG |
| GT32 | 2 | A4GALT, A4GNT |
| GT33 | 3 | ALG1, ALG1L, ALG1L2 |
| GT35 | 3 | PYGB, PYGL, PYGM |
| GT39 | 2 | POMT1, POMT2 |
| GT41 | 1 | OGT |
| GT43 | 3 | B3GAT1, B3GAT2, B3GAT3 |
| GT47/GT64 | 5 | EXT1^b, EXT2^b, EXTL1^b, EXTL2, EXTL3^b |
| GT49 | 3 | B3GNT1 |
| GT50 | 1 | PIGM |
| GT54 | 3 | MGAT4A, MGAT4B, MGAT4C |
| GT57 | 2 | ALG6, ALG8 |
| GT58 | 1 | ALG3 |
| GT59 | 1 | ALG10 |
| GT61 | 2 | EOGT, POMGNT2 (GTDC2) |
| GT65 | 1 | POFUT1 |
| GT66 | 2 | STT3A, STT3B |
| GT68 | 1 | POFUT2 |
| GT76 | 1 | PIGV |
| GT90 | 2 | KDELC1, POGLUT1 |
| GTnc | 9 | PLOD3, DPY19L1^c, DPY19L2^c, DPY19L3^c, DPY19L4^c, FKRP^c, FKTN^c, TMEM5^c |
^a WES data not available for B3GALT6, B3GNT9, GCNT6 and UGT2B10.
^b Genes with a tandem of two GT domains on a single polypeptide.
^c Polypeptides with inferred GT activities.
The predicted biosynthetic roles of the analyzed GTs and their CAZy families in generating the human glycome are outlined in Figure 1, which illustrates that potential functional redundancies exist for many biosynthetic steps. The human GT families with the largest number of isoenzymes with potential redundant functions include the GT27 (polypeptide N-acetylgalactosaminyltransferases) followed by GT29 (α2,3- and α2,6-sialyltransferases), GT31 (β3-galactosyltransferases) and GT14 (β3-galactosyltransferases) families. Detailed information on the 208 human GT genes is summarized in Supplementary data, Table SI.
The overall aim with building GlyMAP was to catalog nsSNVs and use prediction strategies as well as functional evaluation to identify GT genes that encode inactive enzymes in a well-defined population (Danish population). Such a map of inactive GT alleles in a defined population will be highly useful to validate and dissect GT genes identified as candidate genes for diseases by large-scale gene association studies. The strategy for building GlyMAP was divided into three phases: (i) assembly of exome information for human GTs, (ii) prediction of damaging nsSNVs that affect function of the encoded GTs and (iii) experimental validation of predictions for GT27 (Figure 2).
Fig. 2.
The selection process for finding putative damaging nsSNVs included three phases: phase 1, assembly of an SNV database for the 208 human glycosyltransferases from the LuCAMP and the EVS public databases (dbGlyMAP); phase 2, filtering dbGlyMAP for potentially damaging nsSNVs using knowledge-based predictions; and phase 3, experimental validation of selected damaging nsSNVs in GT27.
Phase 1: Assembly of exome information of GTs
WES data derived from 2000 Danish individuals sequenced in the LuCAMP initiative (Albrechtsen et al. 2013; Lohmueller et al. 2013) were used for analysis of the 208 GT genes. The WES study population consisted of 1000 healthy individuals and 1000 with diagnosed type 2 diabetes, overweight and hypertension but otherwise healthy; and complete WES data from 983 controls and 982 cases, in total 1965 individuals, were extractable. Clinical and biochemical characteristics of the 2000 individuals selected from three different Danish study populations were described previously (Albrechtsen et al. 2013; Lohmueller et al. 2013).
The SNVs mapped in the LuCAMP dataset for 208 GT genes were analyzed and all nsSNVs were collected and included for phase 2 analysis. Insertions and deletions (indels) were not included in the LuCAMP WES data and are therefore not represented here. Variations in introns including splice sites and flanking UTRs were also not included in the study because the functional consequences of these cannot be predicted. A summary of nucleotide and amino acid positions, minor allele frequencies (MAFs) and genotype frequencies for all nsSNVs analyzed is available in a database designated dbGlyMAP (www.CAZy.org).
We identified a total of 6588 SNVs in the LuCAMP GT WES data; 45% of these (2977) were located in exons and 28% (1875) represented nsSNVs (Table II and Figure 3A). The majority of these nsSNVs had MAF values <0.001 corresponding to less than five alleles in the study population (Figure 3B). A number of nsSNVs were only found once or twice in the study population, and were initially considered potentially uncertain and too infrequent to serve the purpose of GlyMAP. The effect of excluding these rare nsSNVs is illustrated in Figure 3C and the number of nsSNVs is reduced from 1736 to 553 when only analyzing alleles occurring three times or more in the LuCAMP dataset. The following analysis is therefore focused on these nsSNVs.
Table II.
SNVs in dbGlyMap
| Cohort | Number of samples | Total SNVs | Intron/intergenic SNVs^a | Exon SNVs^a | nsSNVs^b | Population-specific nsSNVs^c |
|---|---|---|---|---|---|---|
| LuCamp | 1965 | 6588 | 3611 (55%) | 2977 (45%) | 1875 (28%) | 881 (47%) |
| EVS EA | 4300 | 12,889 | 4997 (39%) | 7892 (61%) | 4586 (36%) | 3086 (67%) |
| EVS AA | 2203 | 11,569 | 4694 (41%) | 6875 (59%) | 3815 (33%) | 2598 (68%) |
^a The exonic regions represent both UTR and CDS regions, and the percentages represent the fraction of intron/intergenic or exonic variants out of total SNVs.
^b Percentage nsSNVs out of total SNVs.
^c Percentage population-specific nsSNVs out of total number of nsSNVs.
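The percentages in Table II follow directly from the listed counts. As a minimal sketch (the dictionary keys and function names are illustrative and not part of dbGlyMAP), they can be recomputed as follows:

```python
# Counts transcribed from Table II: total SNVs, exon SNVs, nsSNVs and
# population-specific nsSNVs. Key names are illustrative assumptions.
TABLE_II = {
    "LuCamp": {"total": 6588, "exon": 2977, "ns": 1875, "private_ns": 881},
    "EVS EA": {"total": 12889, "exon": 7892, "ns": 4586, "private_ns": 3086},
    "EVS AA": {"total": 11569, "exon": 6875, "ns": 3815, "private_ns": 2598},
}


def pct(part, whole):
    """Return part/whole as a whole-number percentage."""
    return round(100 * part / whole)


def summarize(counts):
    """Recompute the four percentages reported for one cohort."""
    total = counts["total"]
    return {
        # Intron/intergenic SNVs are the SNVs that are not exonic.
        "intron_intergenic_pct": pct(total - counts["exon"], total),
        "exon_pct": pct(counts["exon"], total),
        # nsSNVs are expressed relative to all SNVs (footnote b).
        "ns_pct": pct(counts["ns"], total),
        # Population-specific nsSNVs are relative to all nsSNVs (footnote c).
        "private_ns_pct": pct(counts["private_ns"], counts["ns"]),
    }


SUMMARY = {cohort: summarize(counts) for cohort, counts in TABLE_II.items()}

if __name__ == "__main__":
    for cohort, values in SUMMARY.items():
        print(cohort, values)
```

Note that the nsSNV percentage uses total SNVs as the denominator, whereas the population-specific percentage uses the nsSNV count.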
Fig. 3.
The distribution of SNVs in the LuCAMP, the EVS_EA and the EVS_AA populations. (A) The distribution of the SNVs in the intron/intergenic and the exon regions is shown for the three datasets. (B) The distribution of all SNVs, nsSNVs and nsSNVs predicted damaging by PolyPhen2 is shown for LuCAMP, EVS_EA and EVS_AA. The MAF values are represented by four intervals: [0.5;0.1], [0.1;0.01], [0.01;0.001] and [0.001;0.0001]. (C) The Venn diagrams show the distribution of shared nsSNVs, nsSNVs represented in two populations and nsSNVs represented in one population; (a) all nsSNVs in the three populations, (b) nsSNVs represented by three or more alleles and (c) by five or more alleles. (D) The nsSNV distribution in the three populations vs. cumulative number of alleles shown for shared nsSNVs, nsSNVs found in two populations and nsSNVs found only in one population. (E) Venn diagram for the phase 2-predicted damaging nsSNVs. (F) Distribution of CAZy GT families for the phase 2-predicted damaging nsSNVs in the LuCAMP and EVS populations; nsSNVs predicted damaging by all prediction tools including manual curation are marked dark gray.
Comparison of the LuCAMP data with similar data publicly available from the NHLBI Exome Sequencing Project (Exome Variant Server, EVS, ESP6500) demonstrated that the LuCAMP data are similar to the European American (EVS_EA) SNV data and, as expected, genetically distant from the African-American SNV data (EVS_AA). Differences in the total number of SNVs and the intron/intergenic/exon distribution (Table II and Figure 3A) reflect the different WES platforms and/or data-filtering protocols used (see Material and Methods). The distribution of SNVs, nsSNVs and predicted damaging nsSNVs is similar with respect to MAF values for all three populations, with the majority having MAF values <0.001 (Figure 3B). The population occurrence of all nsSNVs including singletons and duplets and nsSNVs represented by ≥3 and ≥5 alleles is illustrated in Figure 3C. For the LuCAMP and the EVS_EA data, the fraction of population-specific nsSNVs falls from 47 to 5% and from 67 to 12%, respectively, when the required allele occurrence is raised from ≥1 to ≥5 alleles (Figure 3C). The drastic decrease in nsSNVs private to the LuCAMP population levels off at an allele occurrence of ≥3, whereas the portion of nsSNVs found in all three populations remains almost constant and represents common nsSNVs (Figure 3D). The same distribution between population-specific and shared nsSNVs is found for the two other populations (data not shown).
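The effect of an allele-occurrence threshold on the shared vs. population-specific classification can be sketched as below. This is an illustration only: the variant labels and allele counts are invented placeholders, and applying the threshold separately within each population is an assumption, since the text does not specify how the cut-off was applied across datasets.

```python
from collections import Counter


def sharing_profile(allele_counts, min_alleles=1):
    """Count variants present in one, two or three populations.

    allele_counts maps a variant label to {population: minor-allele count}.
    Assumption: a variant is 'present' in a population only if it reaches
    min_alleles in that population.
    """
    profile = Counter()
    for per_population in allele_counts.values():
        present = [pop for pop, n in per_population.items() if n >= min_alleles]
        if present:  # variants below the threshold everywhere are dropped
            profile[len(present)] += 1
    return dict(profile)


# Hypothetical toy data (not taken from dbGlyMAP).
TOY = {
    "var_A": {"LuCAMP": 1, "EVS_EA": 0, "EVS_AA": 0},   # private singleton
    "var_B": {"LuCAMP": 4, "EVS_EA": 6, "EVS_AA": 0},   # two populations
    "var_C": {"LuCAMP": 40, "EVS_EA": 98, "EVS_AA": 7},  # common, shared
    "var_D": {"LuCAMP": 2, "EVS_EA": 1, "EVS_AA": 0},   # rare, two populations
}

if __name__ == "__main__":
    for threshold in (1, 3):
        print(threshold, sharing_profile(TOY, min_alleles=threshold))
```

In this toy set, raising the threshold removes the rare variants while the common shared variant is retained, which is the qualitative behaviour described for Figure 3C and D.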
Phase 2: Prediction of nsSNVs that affect function of the encoded GTs
nsSNVs with allele occurrence ≥3 in the LuCAMP dataset were analyzed with the bioinformatic prediction tools PolyPhen2, SIFT and Provean as well as by manual evaluation using multisequence alignment of paralogous and orthologous proteins as well as domain structure of GTs and structural information when available (Figure 2). Phase 2 resulted in a total of 52 nsSNVs represented by three or more alleles. These nsSNVs were found in 36 different GT genes representing 17 of the 44 human GT families, and all were predicted to affect the function of the encoded enzymes (Supplementary data, Table SII).
Comparison with the EVS data showed that six of the potentially damaging nsSNVs were only found in the Danish population and included genes in GT2 (DPM1: p.Met1Leu), GT27 (GALNT12: p.Ala188Val; GALNT14: p.Lys401Asn and p.Arg91Gln), GT31 (MFNG: p.Leu308Phe) and GT54 (MGAT4B: p.Arg373Trp). Another seven nsSNVs were found both in the LuCAMP and EVS_EA datasets but not in EVS_AA and these were in families GT4, GT27, GT31 and GT39; the remaining 44 nsSNVs were found in all three populations (Figure 3E) and represented 16 GT families (Supplementary data, Table SII). Collectively, the large GT27 family contained most of the possible damaging nsSNVs occurring three or more times (14/52; see Supplementary data, Table SII). The MAF values of the LuCAMP nsSNVs predicted to affect enzyme functions ranged from 0.0003 to 0.12 and included one nsSNV (GALNT20: p.Cys124Arg) that can be categorized as a common polymorphism (MAF > 0.01).
Surveying the rarer nsSNVs in the Danish population occurring one or two times (total number 1183), we found that 61 nsSNVs introduced gain or loss of a stop codon. These likely deleterious SNVs were most frequently found in the GT27 (6 nsSNVs) and GT14 (8 nsSNVs) families, and within family GT27 these included GALNT1 (p.Arg368*, one allele), GALNT4 (p.Glu87*, one allele), GALNT8 (p.Gln453*, two alleles, also in EVS_EA with three alleles and EVS_AA with one allele), GALNT14 (p.Arg315*, one allele), GALNT15 (p.Arg639*, two alleles) and GALNT18 (p.Gln556*, one allele). Although these are likely to affect the function of the encoded enzymes, the low frequency of occurrence suggests that validation of the sequencing is required. Furthermore, since they, except for the SNV identified in GALNT8, are only found in the Danish population and at such low frequency, they are less useful for validating potential disease candidates identified by GWAS.
Because the large GT27 family contained most of the possible damaging nsSNVs and because of our in-depth knowledge of the enzymes belonging to this family (Bennett et al. 2012), we chose this family for experimental validation of putative damaging nsSNVs. In total, the phase 2 selection processes identified 13 potentially inactivating GT27 nsSNVs in LuCAMP data and one in the EVS_EA data (Table III). Both GalNAc-T5 mutations reside in the Gal/GalNAc domain of the catalytic unit (Fritz et al. 2004). The p.Asp678Ala mutation affects a residue in a junction between a β-sheet and an α-helix conserved in 11 of the 20 GalNAc-Ts, and the p.Gly697Arg mutation affects an essential and highly conserved residue in the active site involved in UDP-GalNAc and acceptor peptide interactions (Bennett et al. 2012). The GalNAc-T7 p.Cys325Ser mutation affects a nonessential cysteine, and the p.Pro529Thr mutation affects a nonconserved residue in the linker region interspacing the catalytic and lectin domains (Fritz et al. 2006). The GalNAc-T12 mutation p.Ala188Val affects a nonconserved residue in an α-helix within the Rossmann fold of the enzyme, the p.Asp261Asn mutation affects an aspartate residue conserved in the two GalNAc subfamilies Ic (T3 and T6) and IIa (T4 and T12) (Bennett et al. 2012), and the p.Asp303Asn mutation affects a nonconserved residue close to the Gal/GalNAc domain. Notably, p.Asp261Asn and p.Asp303Asn have been previously identified and in the latter case correlated with colon cancer (Guda et al. 2009). The GalNAc-T13 p.Asp378Gly mutation affects a nonconserved residue in proximity of the Gal/GalNAc domain. A total of five nsSNVs in GalNAc-T14 were selected for experimental validation.
Three of the GalNAc-T14 mutations (p.Arg82Gln, p.Arg86Trp and p.Arg91Gln) are closely grouped in a region preceding the catalytic domain; the first two positions are nonconserved, whereas the last affected residue is highly conserved among all GT27 members, and a mutation in this conserved position in GalNAc-T3 has been reported to be pathogenic (Ichikawa et al. 2010). The p.Lys401Asn mutation affects a nonconserved lysine residue positioned in the linker region interspacing the catalytic and the lectin domains. The p.Asp519Asn mutation found only in the EVS_EA dataset was included in the experimental analysis. This mutation affects an essential residue in the lectin domain predicted to be involved in carbohydrate binding.
Table III.
Summary of GT27 SNV predictions and experimental validation
| Gene | Amino acid change | Experimental | Phase 2 prediction^a | PolyPhen prediction^a | Provean prediction^a | SIFT prediction^a | LuCAMP genotypes^b | EVS-EA genotypes^b | EVS-AA genotypes^b |
|---|---|---|---|---|---|---|---|---|---|
| GALNT2 | p.Gln216His | Active | N | D | N | N | 0;7;1958 | 0;9;4291 | 0;2;2201 |
| GALNT2 | p.Asp314Ala | Active | N | N | D | D | 0;3;1962 | 0;30;4270 | 0;4;2199 |
| GALNT2 | p.Val554Met | Active | N | D | N | N | 7;170;1788 | 9;420;3871 | 1;105;2097 |
| GALNT5 | p.Asp678Ala | Active | D | D | D | D | 0;33;1932 | 2;90;4208 | 0;10;2193 |
| GALNT5 | p.Gly697Arg | Inactive | D | D | D | D | 0;7;1958 | 0;5;4295 | 0;1;2202 |
| GALNT7 | p.Cys325Ser | Active | D | D | N | N | 0;17;1948 | 0;37;4263 | 0;4;2199 |
| GALNT7 | p.Pro529Thr | Active | D | D | D | D | 0;5;1960 | 0;22;4278 | 0;4;2199 |
| GALNT11 | p.Val376Ala | Active | D | D | D | D | 0;54;1911 | 1;68;4231 | 0;8;2195 |
| GALNT11 | p.Glu409Gly | Active | N | N | D | N | 0;43;1922 | 0;64;4236 | 0;9;2194 |
| GALNT11 | p.Val495Leu | Active | N | N | N | N | 0;4;1961 | 0;4;4296 | 0;1;2202 |
| GALNT11 | p.Val575Ala | Active | N | D | N | N | 0;14;1951 | 0;9;4291 | 0;0;2203 |
| GALNT12 | p.Ala188Val | Active | D | D | N | D | 0;8;1957 | 0;0;0 | 0;0;0 |
| GALNT12 | p.Asp261Asn | Active | D | D | D | N | 0;40;1925 | 0;98;4202 | 0;7;2196 |
| GALNT12 | p.Asp303Asn | Active | D | N | N | D | 0;3;1962 | 0;11;4289 | 0;1;2202 |
| GALNT13 | p.Asp378Gly | Active | D | N | D | N | 0;12;1953 | 0;5;4295 | 0;0;2203 |
| GALNT14 | p.Arg82Gln | Active | D | N | N | N | 0;5;1960 | 0;4;4296 | 0;3;2200 |
| GALNT14 | p.Arg86Trp | Active | D | D | N | N | 0;6;1959 | 0;3;4297 | 0;0;2203 |
| GALNT14 | p.Arg91Gln | Inactive | D | D | D | D | 0;5;1960 | 0;0;0 | 0;0;0 |
| GALNT14 | p.Lys401Asn | Active | D | D | D | D | 0;3;1962 | 0;0;0 | 0;0;0 |
| GALNT14 | p.Asp519Asn | Active | D | D | D | D | 0;0;0 | 0;1;4299 | 0;0;2203 |
| GALNT20 (L5) | p.Cys124Arg | not analyzed | D | D | D | D | 21;449;1495 | 90;1060;3150 | 139;832;1232 |
| GALNT20 (L5) | p.Gly206Ala | not analyzed | D | D | D | D | 1;3;1961 | 0;24;4276 | 4;181;2018 |
| Percentage neutral predictions | | | 30% (6/20) | 30% (6/20) | 45% (9/20) | 50% (10/20) | | | |
| Percentage experimentally confirmed damaging | | | 14% (2/14) | 14% (2/14) | 18% (2/11) | 20% (2/10) | | | |
| Percentage experimentally confirmed neutral | | | 100% (6/6) | 100% (6/6) | 100% (8/8) | 100% (10/10) | | | |
^a Predictions: D, damaging (gray); N, neutral; bold values highlight amino acid changes experimentally shown to be inactive.
^b Number of persons with the genotypes: minor allele/minor allele; minor allele/major allele; major allele/major allele.
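The summary rows of Table III can be re-derived from the per-variant calls. The sketch below transcribes the 20 experimentally analyzed variants (the two GALNT20 variants, which were not analyzed, are omitted) and counts, for each tool, the neutral calls, the damaging calls and the damaging calls confirmed as inactive. The data layout and function name are illustrative assumptions.

```python
# Column order of the prediction calls, as in Table III.
TOOLS = ("Phase 2", "PolyPhen", "Provean", "SIFT")

# (gene, amino acid change, inactive in the enzyme assay?, calls per tool)
# Calls are given in TOOLS order: D = damaging, N = neutral.
TABLE_III = [
    ("GALNT2", "p.Gln216His", False, "NDNN"),
    ("GALNT2", "p.Asp314Ala", False, "NNDD"),
    ("GALNT2", "p.Val554Met", False, "NDNN"),
    ("GALNT5", "p.Asp678Ala", False, "DDDD"),
    ("GALNT5", "p.Gly697Arg", True, "DDDD"),
    ("GALNT7", "p.Cys325Ser", False, "DDNN"),
    ("GALNT7", "p.Pro529Thr", False, "DDDD"),
    ("GALNT11", "p.Val376Ala", False, "DDDD"),
    ("GALNT11", "p.Glu409Gly", False, "NNDN"),
    ("GALNT11", "p.Val495Leu", False, "NNNN"),
    ("GALNT11", "p.Val575Ala", False, "NDNN"),
    ("GALNT12", "p.Ala188Val", False, "DDND"),
    ("GALNT12", "p.Asp261Asn", False, "DDDN"),
    ("GALNT12", "p.Asp303Asn", False, "DNND"),
    ("GALNT13", "p.Asp378Gly", False, "DNDN"),
    ("GALNT14", "p.Arg82Gln", False, "DNNN"),
    ("GALNT14", "p.Arg86Trp", False, "DDNN"),
    ("GALNT14", "p.Arg91Gln", True, "DDDD"),
    ("GALNT14", "p.Lys401Asn", False, "DDDD"),
    ("GALNT14", "p.Asp519Asn", False, "DDDD"),
]


def evaluate(rows):
    """Per tool: neutral calls, damaging calls, damaging calls found inactive."""
    result = {}
    for i, tool in enumerate(TOOLS):
        damaging = [inactive for _, _, inactive, calls in rows if calls[i] == "D"]
        result[tool] = {
            "neutral_calls": len(rows) - len(damaging),
            "damaging_calls": len(damaging),
            "confirmed_inactive": sum(damaging),
        }
    return result


RESULT = evaluate(TABLE_III)

if __name__ == "__main__":
    for tool, stats in RESULT.items():
        share = 100 * stats["confirmed_inactive"] / stats["damaging_calls"]
        print(f"{tool}: {stats} ({share:.0f}% of damaging calls confirmed)")
```

The fraction of damaging calls confirmed experimentally corresponds to a positive predictive value for each tool within this small GT27 sample.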
To validate the approach, we analyzed two frequent nsSNVs known to be associated with CDGs. One nsSNV in the PMM2 (phosphomannomutase-2) gene (p.Arg141His) has previously been reported to be associated with CDGIa (Matthijs et al. 1997, 1998), and this was identified in the LuCAMP dataset with a similar MAF value as previously reported for the Danish population (0.0158 for LuCAMP and 0.0167 by Kjaergaard et al. 2001). These findings supported the fidelity of the LuCAMP WES dataset. Moreover, a total of 34 previously published nsSNVs were also found in the LuCAMP data, and these were in GTs involved in the O-Man pathway (POMGNT1; GT13), the N-glycosylation pathway (ALG1; GT33, ALG6; GT57 and ALG12; GT22), and formation of blood group-related antigens (ABO; GT6, FUT1 and FUT2; GT11, FUT3 and FUT6; GT10, GCNT2; GT14 and B3GALNT1; GT31) (Supplementary data, Table SIII). These latter findings were all in agreement with previously reported allele frequencies for the commonly found nsSNVs in the respective glycogenes providing further support for the fidelity of the LuCAMP WES data.
The LuCAMP WES presented genotype data for the healthy control group as well as the type 2 diabetes case group. Analyzing these data for variation in control vs. case group alleles, we applied an allele-based χ² test for 52 phase 2 predicted damaging nsSNVs as well as for all 1875 nsSNVs in the 208 GT genes and we were not able to find any statistically significant difference in allele distribution for the two subpopulations (data not shown).
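An allele-based χ² comparison of cases and controls can be sketched as follows. This is a generic Pearson test on a 2 × 2 allele table with one degree of freedom and no continuity correction; whether the original analysis used exactly this form is an assumption. The minor-allele counts in the usage example are hypothetical, whereas the allele totals follow from the 982 cases and 983 controls with extractable data.

```python
import math


def allelic_chi2(case_minor, case_total, ctrl_minor, ctrl_total):
    """Pearson chi-square test (1 d.f.) on minor vs. major allele counts.

    *_total is the number of alleles, i.e. twice the number of individuals.
    Returns (chi2, p_value).
    """
    table = [
        [case_minor, case_total - case_minor],
        [ctrl_minor, ctrl_total - ctrl_minor],
    ]
    rows = [case_total, ctrl_total]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    n = case_total + ctrl_total
    if 0 in cols or 0 in rows:
        return 0.0, 1.0  # monomorphic site or empty group: nothing to test
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical split of seven minor alleles between cases and controls.
    chi2, p = allelic_chi2(case_minor=4, case_total=2 * 982,
                           ctrl_minor=3, ctrl_total=2 * 983)
    print(f"chi2 = {chi2:.3f}, p = {p:.3f}")
```

For variants with very few minor alleles the χ² approximation is coarse, and an exact test may be preferable; when many nsSNVs are tested, a multiple-testing correction would also be needed.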
Phase 3: Experimental validation of nsSNV predictions for family GT27
We chose CAZy family GT27 for experimental validation, in particular because of the relatively high number of predicted damaging nsSNVs (14 out of a total of 52; Table III) and because recombinant expression and enzyme assays are readily available for a large number of members (Schjoldager et al. 2011; 2012). Furthermore, many of the 20 GALNT genes in the GT27 family have been identified as candidate genes for diseases by GWAS and other association studies (Bennett et al. 2012). The 14 predicted possible damaging nsSNVs found in six GalNAc-T isoforms (GALNT5, GALNT7, GALNT11, GALNT12, GALNT13 and GALNT14) were analyzed experimentally.
Six additional GALNT neutral nsSNVs were included in the analysis. Three were in GALNT2; two have previously been suggested to be associated with triglyceride clearance and high HDL cholesterol (p.Gln216His and p.Asp314Ala) (Holleboom et al. 2011; Tietjen et al. 2012), one was included as a neutral control (p.Val554Met). Three remaining nsSNVs were in GALNT11, where two were in the conserved residues in the catalytic domain (p.Glu409Gly) and the lectin domain (p.Val495Leu) and one was included as a neutral control (p.Val575Ala).
Wild-type and variant recombinant enzymes were expressed as secreted soluble enzymes and purified to near homogeneity, and their function was assessed by time-course enzyme assays monitored by MALDI-TOF with appropriate peptide substrates (Supplementary data, Figure S1). All the variants could be expressed and purified, albeit with different yields. The analysis demonstrated that one nsSNV (p.Gly697Arg) in GALNT5 and one (p.Arg91Gln) in GALNT14 resulted in inactive enzymes, while all other tested variant enzymes appeared to exhibit normal activity (Table III). The p.Gly697Arg nsSNV in GALNT5 replaces a semiconserved small neutral amino acid with a charged bulky residue in the catalytic domain. This nsSNV was found in the heterozygous state in seven LuCAMP samples, yielding an MAF of 0.0018, and the same mutation was found with an MAF of 0.0006 for EVS_EA and 0.0002 for EVS_AA (Table III). The p.Arg91Gln nsSNV in GALNT14 affects a highly conserved basic residue in the catalytic domain, and a mutation of the same Arg residue (p.Arg162Gln) conserved in the GALNT3 paralog has been identified in a patient with familial tumoral calcinosis caused by deficiency in the enzyme function (Ichikawa et al. 2010). The p.Arg91Gln mutation was found in the heterozygous state in five LuCAMP samples, yielding an MAF of 0.0013, and interestingly this mutation was not found in the EVS data.
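The MAF values quoted for the two inactivating variants follow from the genotype counts in Table III. A minimal sketch (the function and dictionary names are illustrative), assuming diploid genotypes listed as minor/minor; minor/major; major/major:

```python
def maf(hom_minor, het, hom_major):
    """Minor allele frequency from diploid genotype counts."""
    individuals = hom_minor + het + hom_major
    if individuals == 0:
        raise ValueError("no genotyped individuals")
    # Each individual carries two alleles; homozygotes contribute two copies.
    return (2 * hom_minor + het) / (2 * individuals)


# Genotype counts transcribed from Table III.
GENOTYPES = {
    ("GALNT5 p.Gly697Arg", "LuCAMP"): (0, 7, 1958),
    ("GALNT5 p.Gly697Arg", "EVS_EA"): (0, 5, 4295),
    ("GALNT5 p.Gly697Arg", "EVS_AA"): (0, 1, 2202),
    ("GALNT14 p.Arg91Gln", "LuCAMP"): (0, 5, 1960),
}

MAFS = {key: round(maf(*counts), 4) for key, counts in GENOTYPES.items()}

if __name__ == "__main__":
    for (variant, cohort), value in MAFS.items():
        print(f"{variant} in {cohort}: MAF = {value}")
```

With 1965 individuals (3930 alleles), seven heterozygous carriers correspond to 7/3930 ≈ 0.0018 and five carriers to 5/3930 ≈ 0.0013, in agreement with the values given in the text.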
These results demonstrate that our GlyMAP strategy, based on WES data of large populations, can identify deleterious SNVs that inactivate GT function. We were surprised to find that the majority (12 of 14) of the predicted deleterious nsSNVs in the GT27 family did not affect enzyme function substantially, although the assays employed in this study cannot rule out that these nsSNVs result in minor changes in the kinetic properties of these mutant enzymes. However, such minor changes are unlikely to introduce disease. For the six predicted neutral GalNAc-T nsSNVs included in the experimental assays, the three GalNAc-T2 and the three GalNAc-T11 nsSNVs were all found active.
Stability of the human glycogenome
Based on GlyMAP, we wanted to determine whether the genetic variation observed within the CAZy GT families was comparable and whether the genetic variation in the glycogenes of one protein class was similar to the genetic variation found in other protein classes such as hydrolases, kinases and histones. For this, the number of nsSNVs per amino acid was calculated for all the genes in the GlyMAP and presented as a box plot showing the degree of statistical dispersion and mean values for each GT family as a measure of the genetic stability (Figure 4A). Not surprisingly, the GT6 family including the ABO blood group genes was found to have the highest mean value and therefore the lowest genetic stability. The GT66 (oligosaccharyltransferase, OST, dolichyl-diphosphooligosaccharide-protein subunits, STT3A and STT3B) and GT41 (OGT) families had the lowest median values and therefore the highest genetic stability. For the GT27 family, found to exhibit the highest number of nsSNVs, the calculated stability ranks this GT family in the midrange of median values (Figure 4A).
Fig. 4.

The genome stability for the human GT CAZy families is shown as a function of number of nsSNVs per amino acid. (A) A box plot for the 44 human CAZy GT families shows the highest median for the GT6 family (the ABO gene, the Forssman synthase GBGT1 and GLT6D1) and the lowest median values for the GT66 family (the oligosaccharyltransferase complex genes STT3A and STT3B) and the GT41 family (the cytoplasmic OGT gene). The three large GT families GT27, GT29 and GT31 are denoted by asterisks and represent average mean values. (B) In order to analyze the genome stability of the human GT genes, the nsSNV frequency for the GT genome is compared with the frequencies for five other protein classes representing the G-protein-coupled receptors, including the olfactory receptors, the protein kinases, the homeobox transcription factors, the histones and the PTM-involved proteins encompassing the oligosaccharyltransferases (OST), the conserved oligomeric Golgi complex (COG), the glycoside hydrolases (GH), the carbohydrate esterases (CE) and the carbohydrate-binding modules (CBM). The non-PTM protein classes were selected using the Panther Classification System (Panther version 8.1) and the ER/Golgi-located PTM proteins were manually selected using the CAZy classification. The nsSNV frequency was calculated using the LuCAMP dataset. The interquartile range (IQR) represents the first to third quartiles (25–75%) and the ±1.5IQR intervals are marked by dotted lines. Outliers are denoted by black boxes and the number of genes in each GT family is shown in parentheses.
Genetic stability was also calculated from the LuCAMP WES data for G-protein-coupled receptors (465 genes), protein kinases (467 genes), homeobox transcription factors (210 genes), histones (63 genes) and other PTM genes including glycoside hydrolases (113 genes) (Figure 4B). The box plot showed that the genetic stability of the human GTs is similar to the stability found for the other selected protein classes. This suggests that the GTs are as evolutionarily stable as other important protein classes such as protein kinases, receptors, transcription factors and histones.
Discussion
Here we constructed a draft catalog, designated GlyMAP, of nsSNVs in 208 human GTs from WES of a rather homogenous population of 2000 Danes, with the aim of identifying deleterious mutations, especially in GT genes with a high degree of potential genetic backup, that is, in large homologous GT gene families. Our hypothesis was that deleterious mutations in such genes would produce rather subtle phenotypes only in the homozygous and compound heterozygous states, and hence that deleterious alleles could exist at relatively high frequency in the general population. The inspiration for using this strategy was that most of the known CDGs are caused by deleterious mutations in GT genes without predicted genetic and functional backup, while genome-wide association studies (GWAS) have pointed to more common disease-causing roles of many of the genes in large homologous gene families with a seemingly large degree of potential functional redundancy (Bennett et al. 2012). The study first demonstrated that prediction of the functional consequences of mutations, despite considerable insight into the structure and mechanisms of the GTs, was quite poor. This calls for experimental validation of all mutations at this time. The second important conclusion was that the strategy is viable and that two rather frequent null alleles of GALNT5 and GALNT14 were identified and experimentally validated. In contrast and perhaps as expected, with few exceptions we failed to find the deleterious mutations described for the rare CDGs known to date. GlyMAP provides a first global view of the genetic stability of the glycogenome and should serve as a tool for discovery of novel CDGs.
The LuCAMP dataset represents a unique source of DNA variants identified by NGS of the exomes from 2000 individuals of the general Danish population. Almost half of the typed SNVs in the GT genes were located in exons and approximately one-quarter of the typed SNVs led to amino acid substitutions or nonsense mutations. The majority (69% or in total 1,291) of the nsSNVs were represented by one or two alleles and the remaining 584 nsSNVs, constituting less than one-third, were represented by ≥3 alleles corresponding to an MAF value of ≥0.0008. This high proportion of private low-frequency alleles is in line with what has been reported from other population exome studies (Li et al. 2010; Fu et al. 2013).
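The relation between allele counts and MAF values used above can be sketched as follows. This is a minimal illustration only, assuming a diploid autosomal locus and that all 2000 individuals were genotyped at the site (4000 alleles in total); the function name is hypothetical and not part of the GlyMAP resource.

```python
def minor_allele_frequency(minor_allele_count: int, n_individuals: int) -> float:
    """Return the MAF for a diploid autosomal locus.

    Assumption: every individual contributes two called alleles,
    so the denominator is 2 * n_individuals.
    """
    if n_individuals <= 0:
        raise ValueError("n_individuals must be positive")
    total_alleles = 2 * n_individuals
    if not 0 <= minor_allele_count <= total_alleles:
        raise ValueError("minor_allele_count must lie between 0 and 2 * n_individuals")
    return minor_allele_count / total_alleles


N_INDIVIDUALS = 2000  # size of the LuCAMP cohort as stated in the text

# Allele counts of 1 and 2 fall below the three-allele cutoff used in the text.
for count in (1, 2, 3):
    print(count, minor_allele_frequency(count, N_INDIVIDUALS))
```

Under these assumptions three alleles correspond to 3/4000 = 0.00075, which the text reports rounded as 0.0008. Sites with missing genotypes would have a smaller denominator and therefore a slightly higher MAF.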
Using the GlyMAP approach, a total of 134 potentially damaging nsSNVs were identified, of which 52 were found in the LuCAMP dataset and 82 only in the EVS dataset (Figure 3E; Supplementary data, Table SIII). A large proportion of the nsSNVs were found within the large homologous gene families such as GT14, GT27 and GT31, accounting for 53 of 134 (Figure 3F). There was 25% concordance (33 of 134) between the three algorithm-based prediction tools PolyPhen2, SIFT and Provean and the manual prediction of impaired protein function used in the phase 2 selection of nsSNVs (Supplementary data, Table SII), which strongly highlights the limited ability of the current tools to predict the impact of amino acid substitutions on structure and function. In the validation study of GT27 members, we only identified mutations potentially affecting the catalytic and lectin domains, and we therefore used recombinant expression and enzyme activity assays to probe functionality. For genes where nonconserved mutations are found in the transmembrane and immediate juxtamembrane region, it may be necessary to also address membrane retention of the encoded enzymes in cells.
Focusing on the GT27 GALNT gene family, a total of 20 nsSNVs were selected for experimental tests (Table III). Fourteen of these were predicted damaging by our manual predictions based on the extensive structural knowledge of the GalNAc-Ts with their well-defined catalytic and lectin domains (Fritz et al. 2004, 2006; Kubota et al. 2006). The remaining six nsSNVs were included as controls or because they have been reported to have functional consequences. Of the 14 predicted damaging nsSNVs, 7 were damaging according to PolyPhen2, SIFT and Provean, but only 2 (GalNAc-T5 p.Gly697Arg and GalNAc-T14 p.Arg91Gln) were confirmed to substantially affect the enzyme function by functional validation (Table III). This result was unexpected, and the poor ability to predict deleterious effects of nsSNVs will need to be improved to harness the full power of our GlyMAP approach.
Our knowledge of the functions of GALNT5 and GALNT14, harboring the two inactivating nsSNVs, is currently very limited and knockout animals display no overt phenotype (Ten Hagen et al. 2003). GalNAc-T5 has only been characterized in rodents with a few peptide substrates (Ten Hagen et al. 1998), and recently we have assessed the human GalNAc-T5 with a large panel of peptide substrates and showed that this isoform has very few peptide substrates compared with other isoforms such as GalNAc-T1 and -T2 (Kong et al. 2014). GalNAc-T14 is predominantly expressed in the kidney and is a close paralog of GalNAc-T2 with more restricted peptide substrate specificities (unpublished data). GalNAc-T14 has been implicated in Apo2L/TRAIL death-receptor-mediated apoptosis (Wagner et al. 2007) and may play a role in drug sensitivity to TRAIL therapy (Stern et al. 2010). Interestingly, the GalNAc-T14 p.Arg91Gln nsSNV was found only in the Danish population and not in the EVS data. The three GALNT12 nsSNVs predicted to be damaging (p.Asp261Asn, p.Asp303Asn and p.Ala188Val) were shown to be active in our experimental assay. Guda et al. (2009) have previously reported the GALNT12 p.Asp303Asn mutation to be associated with colon cancer, and demonstrated that the mutant protein had 37% of the wild-type enzyme activity and that the p.Asp261Asn mutant had 84% of wild-type enzyme activity. In our experimental analysis, none of the three GALNT12 nsSNVs, including the control p.Ala188Val, affected the enzyme function. The remaining predicted-damaging nsSNVs were all found to be active experimentally (Table III).
The six nsSNVs previously reported in the literature (GALNT2: p.Gln216His and p.Asp314Ala) or selected as control nsSNVs (GALNT2: p.Val554Met and GALNT11: p.Glu409Gly, p.Val495Leu and p.Val575Ala) with questionable damaging prediction were all found to be active experimentally. The two GALNT2 nsSNVs were selected based on recent genome-wide association of the locus with dysfunctional lipid metabolism (Kathiresan et al. 2008, 2009; Teslovich et al. 2010), and recent studies claim that the nsSNVs affect the function of GalNAc-T2 (Holleboom et al. 2011; Tietjen et al. 2012). We did confirm that both (p.Gln216His and p.Asp314Ala) were present at low allele frequencies in the Danish population (Table III); however, we were unable to demonstrate reduced enzyme activity using an IgA hinge acceptor substrate. Holleboom et al. (2011) reported heterozygous carriers of GALNT2 p.Asp314Ala to have reduced glycosylation of ApoC III. ApoC III has one very effective and specific GalNAc-T2 O-glycosylation site (Schjoldager et al. 2012; Schjoldager and Clausen 2012), and it is therefore unlikely that a slightly reduced GalNAc-T2 activity in the heterozygous state can affect O-glycosylation. We have additionally demonstrated that a complete knockout of the GALNT2 gene in a liver cell line is required to affect glycosylation (Schjoldager et al. 2012). The complete lack of damaging nsSNVs in the coding region of GALNT2 and the association of a region in intron 1 with HDL and triglyceride metabolism (Kathiresan et al. 2008, 2009; Teslovich et al. 2010) suggest a dysregulatory rather than dysfunctional cause of the HDL/lipid metabolism phenotypes.
CDG mutations are predominantly rare, and we therefore expected that only a few pathogenic glycogene mutations would be found in the 2000 LuCAMP exomes. Analyzing the LuCAMP data for pathogenic CDG mutations reported in the literature identified nsSNVs in four GT genes and PMM2 (Supplementary data, Table SIII). The frequent PMM2 Northern European founder mutation, p.Arg141His, was found with an MAF value of 0.01578 corresponding to the published carrier prevalence of 1:60 in Denmark (Schollen et al. 2000). The mutation is thought to be lethal when homozygous and has never been observed in that state; it occurs mainly as a compound heterozygote with the p.Phe119Leu mutation (Kjaergaard et al. 1998; Schollen et al. 2000), which was found in LuCAMP with an MAF value of 0.00102. The POMGNT1 p.Asp556Asn mutation, reported to cause a mild limb-girdle muscular dystrophy phenotype (Clement et al. 2008), was found in 56 carriers corresponding to an MAF value of 0.01374 in LuCAMP and with similar MAF values in the EVS dataset (data not shown). A POMGNT1 splice site mutation in intron 17 (c.1539+1G>A), reported as a Finnish founder mutation found in 18 of 19 muscular dystrophy patients (Diesen et al. 2004), was represented by nine carriers in LuCAMP and was found with lower MAF values in the EVS data (data not shown). Five nsSNVs affecting N-glycosylation were found in the genes ALG1, ALG6 and ALG12. Three of these have been reported as pathogenic and two as polymorphisms (Supplementary data, Table SIII). In addition, several common nsSNVs were found in the blood group genes ABO, FUT1/2/3/6, B3GALNT1 and GCNT1 in the LuCAMP data (Supplementary data, Table SIII), but a more detailed discussion of their allele frequencies in the Danish population is beyond the scope of this study, and we refer to recent reviews covering this area (Storry and Olsson 2004, 2009). In general, only a few published mutations were detected in the LuCAMP or EVS data. Since CDGs are rare syndromes with a published estimated prevalence of 1:20,000 for the most common subtype, PMM2-CDG, and since the total number of identified CDG cases caused by pathogenic glycogene mutations is very limited, these results were expected.
The completed cataloging of glycogene nsSNVs prompted us to question whether the number of variations identified in large redundant GT families resembled the number of variations observed in nonredundant GT families (Figure 4A). As expected, the GT6 family containing the highly polymorphic ABO gene with its 161 reported allelic variants (Patnaik and Blumenfeld 2011) had the highest nsSNV frequency. At the other extreme, the GT66 and GT41 families containing the oligosaccharyltransferase complex subunits STT3A/B and OGT, respectively, possessed the lowest nsSNV frequencies. Somewhat surprisingly, the GT1 family with its genetically polymorphic UGT genes (Stingl et al. 2014) and the GT10/11 families containing the polymorphic FUT1/2 genes (Patnaik and Blumenfeld 2011) possessed intermediate nsSNV frequencies in the same range as the large homologous GT families GT27 (GalNAc-Ts), GT29 (Sialyl-Ts) and GT31 (B3Gal-Ts). These observations suggest that nsSNVs have not accumulated to a greater extent in large homologous GT families compared with nonhomologous GT families, and at a more global level, it does not seem that nsSNV frequencies in the CAZy GT class as a whole differ from the nsSNV frequencies observed in other protein classes (Figure 4B).
In conclusion, GlyMAP provides the first global view of the nsSNV landscape for the human GTs mapped in a homogenous population of 2000 Danes. The WES-based strategy proved effective in identifying a number of glycogene nsSNVs in a defined population. On the other hand, we demonstrate that truly damaging nsSNVs are difficult to predict using the currently available tools and cannot be determined without extensive experimental validation. Clearly, a deeper insight into the structures and catalytic mechanisms of the GTs is needed before the functional consequences of amino acid substitutions can be predicted reliably. Our approach demonstrated a substantial need for better functional knowledge of the GTs and more refined prediction tools. Thus, assessment of the impact of each individual nsSNV on enzyme function for GTs belonging to families other than GT27 will require greater insight into and knowledge of the enzymes in question, and a comprehensive functional analysis will require considerable effort.
Taken together, GlyMAP may serve as a useful tool for disease discovery in specific disease cohorts where as yet unknown glycogene dysfunction can be associated with common disease traits. A GlyMAP database has been established hosting information for the human glycogenes (GlyMAP: http://glymap.glycomics.ku.dk).
Materials and Methods
The human GT-encoded glycogenes
A total of 198 different human GT protein annotations were extracted from the CAZy classification system database (www.CAZy.org) and an additional 10 GTs were included based on sequence homologies or reported putative GT activities (Table I; Supplementary data, Table SI).
The study populations and exome sequencing
The GT exome data were extracted from the LuCAMP exome sequencing project (Lundbeck Foundation Centre for Applied Medical Genomics in Personalised Disease Prediction, Prevention and Care, www.lucamp.org), and from the NHLBI Exome Sequencing Project (evs.gs.washington.edu/EVS/). The LuCAMP project included 2000 Danish individuals, of which half had type 2 diabetes, moderate adiposity (body mass index, BMI > 27.5 kg/m2) and hypertension (systolic/diastolic blood pressure, BP > 140/90 mmHg or use of antihypertensive medication). The others were healthy individuals who all had fasting plasma glucose <5.6 mmol/L, 2-h OGTT-based plasma glucose <7.8 mmol/L, BMI < 27.5 kg/m2 and BP < 140/90 mmHg and no antihypertensive treatment. The public NHLBI Exome Sequencing Project (http://evs.gs.washington.edu/EVS/) included data from 6500 unrelated individuals phenotyped in 15 different projects with the goal of discovering novel genes and mechanisms contributing to heart, lung and blood disorders (NCBI Bioproject ID:165957 and dbGaP (https://esp.gs.washington.edu/drupal/dbGaP_Releases)).
The LuCAMP exomes were captured using the Agilent SureSelect All Exon Kit v.2 (46 Mb target region) and sequenced using an Illumina HiSeq 2000 machine with a mean sequencing depth of 56.3× and an exome coverage ranging from 94.11 to 98.76% (average of 97.27% per sample). The WES data were aligned to the GRCh37/hg19 human reference genome and annotation of sequence variants was performed using the SeattleSeq Annotation 137 server (Lohmueller et al. 2013). The NHLBI Exome Sequencing Project samples were exome captured using Roche/NimbleGen capture or Agilent reagents; all SNP data were called simultaneously using the UMAKE pipeline (for details see the EVS homepage, http://evs.gs.washington.edu/EVS/). The study was approved by The Danish National Ethical Committee on Health Research and is in accordance with the ethical scientific principles of the Helsinki Declaration II.
The GlyMAP prediction strategy
The GlyMAP strategy was divided into three phases (Figure 2).
Phase 1: Assembly of exome information
The SNVs for the 208 human GTs (Table I) were extracted from the LuCAMP WES dataset and the EVS server and merged into a database, dbGlyMAP. The LuCAMP data included only SNVs, and all INDEL data were therefore excluded from the EVS dataset. In addition to chromosomal location and nucleotide variation, dbGlyMAP includes MAF values, genotype frequencies and rs annotations if known. For genes represented by more than one coding transcript, the longest transcript with respect to the coding region was selected for the SNV data.
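The transcript selection rule described above (one transcript per gene, chosen by coding-region length) can be sketched as follows. This is an illustrative sketch rather than the pipeline used for dbGlyMAP; the `Transcript` record, its field names and the tie-breaking rule (the first transcript encountered wins) are assumptions introduced here.

```python
from typing import Dict, Iterable, NamedTuple


class Transcript(NamedTuple):
    # Hypothetical record layout; dbGlyMAP's actual schema is not described in the text.
    transcript_id: str
    gene: str
    cds_length: int  # length of the coding region in nucleotides


def longest_coding_transcripts(transcripts: Iterable[Transcript]) -> Dict[str, Transcript]:
    """Keep, for each gene, the transcript with the longest coding region.

    Assumption: on a tie, the transcript seen first is retained.
    """
    best: Dict[str, Transcript] = {}
    for t in transcripts:
        current = best.get(t.gene)
        if current is None or t.cds_length > current.cds_length:
            best[t.gene] = t
    return best


# Placeholder identifiers and lengths, for illustration only.
example = [
    Transcript("tx_a", "GENE1", 1200),
    Transcript("tx_b", "GENE1", 1500),
    Transcript("tx_c", "GENE2", 900),
]
selected = longest_coding_transcripts(example)
print({gene: t.transcript_id for gene, t in selected.items()})
```

Choosing a single reference transcript per gene keeps amino acid numbering unambiguous when nsSNVs from different sources are merged.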
Phase 2: Prediction of functionally inactive nsSNVs
The nsSNV data for the LuCAMP and the EVS datasets were analyzed for inactivating nsSNVs by web-based algorithms and manually curated for possible damaging variants. Three prediction tools were used: PolyPhen2 (Adzhubei et al. 2010), SIFT (Kumar et al. 2009) and Provean (Choi et al. 2012). Manual prediction was based on (i) occurrence of three or more alleles in the LuCAMP dataset, (ii) alignment of paralogous proteins based on CAZy GT family classification, (iii) alignment of orthologous proteins, (iv) protein domain information and/or crystallographic data if available and (v) web-based prediction of the possible impact of an amino acid substitution on structure and function. Additional inclusion criteria were gain or loss of stop codons and nsSNVs affecting start methionines. Multisequence alignments were done using MAFFT or Clustal Omega, protein domain structures were adapted from UniProt, CDD and SMART, and crystallographic information was retrieved from PDB (Figure 2).
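One way to combine the calls of the three prediction tools is sketched below. The text does not state which score cutoffs were applied, so the thresholds used here (SIFT score ≤ 0.05, Provean score ≤ -2.5, PolyPhen2 classes "possibly damaging" or "probably damaging") are the commonly used defaults and should be read as assumptions. The record layout and the example scores are hypothetical and are not data from this study.

```python
from dataclasses import dataclass
from typing import Dict

# Assumed default cutoffs; the study's own settings are not given in the text.
SIFT_CUTOFF = 0.05      # SIFT: score <= 0.05 is treated as damaging
PROVEAN_CUTOFF = -2.5   # Provean: score <= -2.5 is treated as deleterious
POLYPHEN_DAMAGING = {"possibly damaging", "probably damaging"}


@dataclass
class Prediction:
    variant: str
    polyphen2_class: str
    sift_score: float
    provean_score: float


def damaging_calls(p: Prediction) -> Dict[str, bool]:
    """Return a per-tool damaging/not-damaging call for one nsSNV."""
    return {
        "PolyPhen2": p.polyphen2_class.lower() in POLYPHEN_DAMAGING,
        "SIFT": p.sift_score <= SIFT_CUTOFF,
        "Provean": p.provean_score <= PROVEAN_CUTOFF,
    }


def all_tools_agree_damaging(p: Prediction) -> bool:
    """True only when all three tools call the variant damaging."""
    return all(damaging_calls(p).values())


# Made-up example records, for illustration only.
examples = [
    Prediction("variant_A", "probably damaging", 0.01, -4.0),
    Prediction("variant_B", "benign", 0.01, -4.0),
]
for p in examples:
    print(p.variant, damaging_calls(p), all_tools_agree_damaging(p))
```

Requiring agreement of all tools is deliberately strict; a majority vote would be an equally simple alternative, and neither replaces the manual curation and experimental validation described in the text.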
Phase 3: Experimental validation of predictions
nsSNVs selected in phase 2 and found in the GALNT genes (GT27) were chosen for experimental validation, and expression constructs encoding wild-type and nsSNV-harboring proteins were generated (Table III). Recombinant expression of secreted wild-type and mutant nsSNV constructs was done essentially as described previously (Bennett et al. 1996). Products were analyzed by matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF) mass spectrometry and compared with the glycosylation capacity of the wild-type enzyme. In brief, GT-encoding sequences lacking the membrane anchoring domain were fused N-terminally with an in-frame 6× His tag allowing for downstream NiNTA enzyme purification. Constructs were inserted into the pAcGP67 Baculovirus insect cell expression vector system and expressed in Hi5 insect cells. nsSNVs were either introduced into existing expression constructs using QuikChange Site-Directed Mutagenesis (Agilent Technologies, Waldbronn, Germany) or the constructs were synthesized (GeneWiz, London, UK). Wild-type and mutant enzymes were purified by NiNTA purification schemes essentially as described previously (Pedersen et al. 2011). All enzymes were purified to homogeneity (Supplementary data, Figure S1) and tested in in vitro product development glycosylation assays using appropriate acceptor substrates for the respective GTs (Table III, Figure 1).
Statistics and comparative analysis
The nsSNV frequency for each protein was calculated as the number of nsSNVs divided by the protein length and depicted as box plots. The nsSNV data were extracted from the LuCAMP dataset and the protein classes were obtained from the Panther Classification System or from CAZy. The protein groups were G-protein-coupled receptors (465 proteins, Panther class PC00021), protein kinases (467 proteins, Panther PC00193), homeobox transcription factors (210 proteins, Panther PC00119), histones (63 proteins, Panther PC00118), GTs (201 proteins) and one group consisting of glycoside hydrolases, carbohydrate esterases, carbohydrate-binding modules, OST and conserved oligomeric Golgi complex, in total 113 proteins.
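The per-protein nsSNV frequency and the box plot statistics can be sketched as follows. This is a minimal illustration using the Python standard library, not the software used for Figure 4; the quartile interpolation method (`method="inclusive"`) is an assumption, since plotting packages differ in how they interpolate quartiles, and the example values are arbitrary numbers rather than data from this study.

```python
import statistics
from typing import Dict, List, Sequence


def nssnv_frequency(n_nssnvs: int, protein_length: int) -> float:
    """Number of nsSNVs per amino acid for one protein."""
    if protein_length <= 0:
        raise ValueError("protein_length must be positive")
    return n_nssnvs / protein_length


def box_plot_summary(values: Sequence[float]) -> Dict[str, object]:
    """Quartiles, IQR, 1.5*IQR fences and outliers for a list of values.

    Requires at least two values (a constraint of statistics.quantiles).
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers: List[float] = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "outliers": outliers,
    }


# Arbitrary (nsSNV count, protein length) pairs, for illustration only.
family = [(4, 500), (6, 600), (9, 600), (3, 300), (40, 400)]
frequencies = [nssnv_frequency(n, length) for n, length in family]
print(box_plot_summary(frequencies))
```

Dividing by protein length makes families with long and short proteins comparable, which is why the frequency rather than the raw nsSNV count is used as the stability measure.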
Web sources
CAZy: www.CAZy.org
GlyMAP: http://glymap.glycomics.ku.dk
CDD, Conserved Domain Database: http://www.ncbi.nlm.nih.gov/cdd/
Clustal Omega: http://www.ebi.ac.uk/Tools/msa/clustalo/
HGMD, The Human Gene Mutation Database: http://www.hgmd.cf.ac.uk/
NHLBI GO Exome Sequencing Project (ESP), Seattle, WA: http://evs.gs.washington.edu/EVS/, Sep 2013
MAFFT: http://www.ebi.ac.uk/Tools/msa/mafft/
Panther version 8.1: http://www.pantherdb.org/
PDB, Protein Data Bank: http://www.rcsb.org/pdb/home/home.do
PolyPhen2 (Polymorphism Phenotyping v2): http://genetics.bwh.harvard.edu/pph2/
Provean (Protein Variation Effect Analyzer): http://provean.jcvi.org/index.php
SIFT: http://sift.jcvi.org/
SMART: http://smart.embl-heidelberg.de/
UniProt: http://www.uniprot.org/
Supplementary Material
Supplementary Material is available at http://glycob.oxfordjournals.org/ online.
Abbreviations
BMI, body mass index; BP, blood pressure; CAZy, carbohydrate-active enzymes; CDG, congenital disorders of glycosylation; EVS, Exome Variant Server; GalNAc-Ts, GalNAc-transferases; GPI, glycosylphosphatidylinositol; GT, glycosyltransferase; GWAS, genome-wide association study; MAF, minor allele frequency; NGS, next-generation sequencing; nsSNV, nonsynonymous SNV; OST, oligosaccharyltransferases; PTM, posttranslational modification; SNV, single-nucleotide variation; WES, whole-exome sequencing
Funding
This work was supported by Kirsten og Freddy Johansen Fonden, A.P. Møller og Hustru Chastine Mc-Kinney Møllers Fond til Almene Formaal, The Carlsberg Foundation, The Novo Nordisk Foundation, The Danish Research Councils, a program of excellence from the University of Copenhagen, The Danish National Research Foundation (DNRF107), and The Rocket Fund and R01DK99551. The Danish whole-exome study was supported by the Lundbeck Foundation (The Lundbeck Foundation Centre for Applied Medical Genomics in Personalised Disease Prediction, Prevention and Care [LuCamp], www.lucamp.org) and The Danish Council for Independent Research. The Novo Nordisk Foundation Center for Basic Metabolic Research is an independent Research Center at the University of Copenhagen partially funded by an unrestricted donation from the Novo Nordisk Foundation (www.metabol.ku.dk).
References
- Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, Kondrashov AS, Sunyaev SR. A method and server for predicting damaging missense mutations. Nat Methods. 2010;7:248–249. doi: 10.1038/nmeth0410-248.
- Albrechtsen A, Grarup N, Li Y, Sparsø T, Tian G, Cao H, Jiang T, Kim SY, Korneliussen T, Li Q, et al. Exome sequencing-driven discovery of coding polymorphisms associated with common metabolic phenotypes. Diabetologia. 2013;56:298–310. doi: 10.1007/s00125-012-2756-1.
- Almeida R, Amado M, David L, Levery SB, Holmes EH, Merkx G, van Kessel AG, Rygaard E, Hassan H, Bennett E, et al. A family of human beta4-galactosyltransferases. Cloning and expression of two novel UDP-galactose:beta-n-acetylglucosamine beta1, 4-galactosyltransferases, beta4Gal-T2 and beta4Gal-T3. J Biol Chem. 1997;272:31979–31991. doi: 10.1074/jbc.272.51.31979.
- Amado M, Almeida R, Carneiro F, Levery SB, Holmes EH, Nomoto M, Hollingsworth MA, Hassan H, Schwientek T, Nielsen PA, et al. A family of human beta3-galactosyltransferases. Characterization of four members of a UDP-galactose:beta-N-acetyl-glucosamine/beta-n-acetyl-galactosamine beta-1,3-galactosyltransferase family. J Biol Chem. 1998;273:12770–12778. doi: 10.1074/jbc.273.21.12770.
- Amado M, Almeida R, Schwientek T, Clausen H. Identification and characterization of large galactosyltransferase gene families: galactosyltransferases for all functions. Biochim Biophys Acta. 1999;1473:35–53. doi: 10.1016/s0304-4165(99)00168-3.
- Audry M, Jeanneau C, Imberty A, Harduin-Lepers A, Delannoy P, Breton C. Current trends in the structure-activity relationships of sialyltransferases. Glycobiology. 2011;21:716–726. doi: 10.1093/glycob/cwq189.
- Becker DJ, Lowe JB. Fucose: biosynthesis and biological function in mammals. Glycobiology. 2003;13:41–53. doi: 10.1093/glycob/cwg054.
- Bennett EP, Hassan H, Clausen H. cDNA cloning and expression of a novel human UDP-N-acetyl-alpha-D-galactosamine. Polypeptide N-acetylgalactosaminyltransferase, GalNAc-t3. J Biol Chem. 1996;271:17006–17012. doi: 10.1074/jbc.271.29.17006.
- Bennett EP, Mandel U, Clausen H, Gerken TA, Fritz TA, Tabak LA. Control of mucin-type O-glycosylation: a classification of the polypeptide GalNAc-transferase gene family. Glycobiology. 2012;22:736–756. doi: 10.1093/glycob/cwr182.
- Bourne Y, Henrissat B. Glycoside hydrolases and glycosyltransferases: families and functional modules. Curr Opin Struct Biol. 2001;11:593–600. doi: 10.1016/s0959-440x(00)00253-0.
- Brockington M, Blake DJ, Prandini P, Brown SC, Torelli S, Benson MA, Ponting CP, Estournet B, Romero NB, Mercuri E, et al. Mutations in the fukutin-related protein gene (FKRP) cause a form of congenital muscular dystrophy with secondary laminin alpha-2 deficiency and abnormal glycosylation of alpha-dystroglycan. Am J Hum Genet. 2001;69:1198–1209. doi: 10.1086/324412.
- Buettner FF, Ashikov A, Tiemann B, Lehle L, Bakker H. C. elegans DPY-19 is a C-mannosyltransferase glycosylating thrombospondin repeats. Mol Cell. 2013;50:295–302. doi: 10.1016/j.molcel.2013.03.003.
- Carson AR, Cheung J, Scherer SW. Duplication and relocation of the functional DPY19L2 gene within low copy repeats. BMC Genomics. 2006;7:45. doi: 10.1186/1471-2164-7-45.
- Choi Y, Sims GE, Murphy S, Miller JR, Chan AP. Predicting the Functional Effect of Amino Acid Substitutions and Indels. PLoS ONE. 2012;7:e46688. doi: 10.1371/journal.pone.0046688.
- Clement EM, Godfrey C, Tan J, Brockington M, Torelli S, Feng L, Brown SC, Jimenez-Mallebrera C, Sewry CA, Longman C, et al. Mild POMGnT1 mutations underlie a novel limb-girdle muscular dystrophy variant. Arch Neurol. 2008;65:137–141. doi: 10.1001/archneurol.2007.2.
- Diesen C, Saarinen A, Pihko H, Rosenlew C, Cormand B, Dobyns WB, Dieguez J, Valanne L, Joensuu T, Lehesjoki AE. POMGnT1 mutation and phenotypic spectrum in muscle-eye-brain disease. J Med Genet. 2004;41:e115. doi: 10.1136/jmg.2004.020701.
- Freeze HH. Understanding human glycosylation disorders: biochemistry leads the charge. J Biol Chem. 2013;288:6936–6945. doi: 10.1074/jbc.R112.429274.
- Freeze HH, Chong JX, Bamshad MJ, Ng BG. Solving Glycosylation Disorders: Fundamental Approaches Reveal Complicated Pathways. Am J Hum Genet. 2014;94:161–175. doi: 10.1016/j.ajhg.2013.10.024.
- Fritz TA, Hurley JH, Trinh LB, Shiloach J, Tabak LA. The beginnings of mucin biosynthesis: The crystal structure of UDP-GalNAc:polypeptide alpha-N-acetylgalactosaminyltransferase-T1. Proc Natl Acad Sci USA. 2004;101:15307–15312. doi: 10.1073/pnas.0405657101.
- Fritz TA, Raman J, Tabak LA. Dynamic association between the catalytic and lectin domains of human UDP-GalNAc:polypeptide alpha-N-acetylgalactosaminyltransferase-2. J Biol Chem. 2006;281:8613–8619. doi: 10.1074/jbc.M513590200.
- Fu W, O'Connor TD, Jun G, Kang HM, Abecasis G, Leal SM, Gabriel S, Rieder MJ, Altshuler D, Shendure J, et al. Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants. Nature. 2013;493:216–220. doi: 10.1038/nature11690.
- Godfrey C, Clement E, Mein R, Brockington M, Smith J, Talim B, Straub V, Robb S, Quinlivan R, Feng L, et al. Refining genotype phenotype correlations in muscular dystrophies with defective glycosylation of dystroglycan. Brain. 2007;130:2725–2735. doi: 10.1093/brain/awm212.
- Godfrey C, Foley AR, Clement E, Muntoni F. Dystroglycanopathies: coming into focus. Curr Opin Genet Dev. 2011;21:278–285. doi: 10.1016/j.gde.2011.02.001.
- Guda K, Moinova H, He J, Jamison O, Ravi L, Natale L, Lutterbaugh J, Lawrence E, Lewis S, Willson JK, et al. Inactivating germ-line and somatic mutations in polypeptide N-acetylgalactosaminyltransferase 12 in human colon cancers. Proc Natl Acad Sci USA. 2009;106:12921–12925. doi: 10.1073/pnas.0901454106.
- Haeuptle MA, Hennet T. Congenital disorders of glycosylation: an update on defects affecting the biosynthesis of dolichol-linked oligosaccharides. Hum Mutat. 2009;30:1628–1641. doi: 10.1002/humu.21126.
- Haltiwanger RS, Lowe JB. Role of glycosylation in development. Annu Rev Biochem. 2004;73:491–537. doi: 10.1146/annurev.biochem.73.011303.074043.
- Holleboom AG, Karlsson H, Lin RS, Beres TM, Sierts JA, Herman DS, Stroes ES, Aerts JM, Kastelein JJ, Motazacker MM, et al. Heterozygosity for a loss-of-function mutation in GALNT2 improves plasma triglyceride clearance in man. Cell Metab. 2011;14:811–818. doi: 10.1016/j.cmet.2011.11.005.
- Ichikawa S, Baujat G, Seyahi A, Garoufali AG, Imel EA, Padgett LR, Austin AM, Sorenson AH, Pejin Z, Topouchian V, et al. Clinical variability of familial tumoral calcinosis caused by novel GALNT3 mutations. Am J Med Genet A. 2010;152A:896–903. doi: 10.1002/ajmg.a.33337.
- Isshiki S, Kudo T, Nishihara S, Ikehara Y, Togayachi A, Furuya A, Shitara K, Kubota T, Watanabe M, Kitajima M, et al. Lewis type 1 antigen synthase (beta3Gal-T5) is transcriptionally regulated by homeoproteins. J Biol Chem. 2003;278:36611–36620. doi: 10.1074/jbc.M302681200.
- Jaeken J. Congenital disorders of glycosylation (CDG): it's (nearly) all in it! J Inherit Metab Dis. 2011;34:853–858. doi: 10.1007/s10545-011-9299-3.
- Kathiresan S, Melander O, Guiducci C, Surti A, Burtt NP, Rieder MJ, Cooper GM, Roos C, Voight BF, Havulinna AS, et al. Six new loci associated with blood low-density lipoprotein cholesterol, high-density lipoprotein cholesterol or triglycerides in humans. Nat Genet. 2008;40:189–197. doi: 10.1038/ng.75.
- Kathiresan S, Willer CJ, Peloso GM, Demissie S, Musunuru K, Schadt EE, Kaplan L, Bennett D, Li Y, Tanaka T, et al. Common variants at 30 loci contribute to polygenic dyslipidemia. Nat Genet. 2009;41:56–65. doi: 10.1038/ng.291.
- Kjaergaard S, Schwartz M, Skovby F. Congenital disorder of glycosylation type Ia (CDG-Ia): phenotypic spectrum of the R141H/F119L genotype. Arch Dis Child. 2001;85:236–239. doi: 10.1136/adc.85.3.236.
- Kjaergaard S, Skovby F, Schwartz M. Absence of homozygosity for predominant mutations in PMM2 in Danish patients with carbohydrate-deficient glycoprotein syndrome type 1. Eur J Hum Genet. 1998;6:331–336. doi: 10.1038/sj.ejhg.5200194.
- Kobayashi K, Nakahori Y, Miyake M, Matsumura K, Kondo-Iida E, Nomura Y, Segawa M, Yoshioka M, Saito K, Osawa M, et al. An ancient retrotransposal insertion causes Fukuyama-type congenital muscular dystrophy. Nature. 1998;394:388–392. doi: 10.1038/28653.
- Kong Y, Joshi HJ, Schjoldager KT, Madsen TD, Gerken TA, Vester-Christensen MB, Wandall HH, Bennett EP, Levery SB, Vakhrushev SY, et al. Probing polypeptide GalNAc-transferase isoform substrate specificities by in vitro analysis. Glycobiology. 2014;25:55–65. doi: 10.1093/glycob/cwu089.
- Kubota T, Shiba T, Sugioka S, Furukawa S, Sawaki H, Kato R, Wakatsuki S, Narimatsu H. Structural basis of carbohydrate transfer activity by human UDP-GalNAc: polypeptide alpha-N-acetylgalactosaminyltransferase (pp-GalNAc-T10). J Mol Biol. 2006;359:708–727. doi: 10.1016/j.jmb.2006.03.061.
- Kumar P, Henikoff S, Ng PC. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat Protoc. 2009;4:1073–1081. doi: 10.1038/nprot.2009.86.
- Li Y, Vinckenbosch N, Tian G, Huerta-Sanchez E, Jiang T, Jiang H, Albrechtsen A, Andersen G, Cao H, Korneliussen T, et al. Resequencing of 200 human exomes identifies an excess of low-frequency non-synonymous coding variants. Nat Genet. 2010;42:969–972. doi: 10.1038/ng.680.
- Lohmueller KE, Sparsø T, Li Q, Andersson E, Korneliussen T, Albrechtsen A, Banasik K, Grarup N, Hallgrimsdottir I, Kiil K, et al. Whole-exome sequencing of 2,000 Danish individuals and the role of rare coding variants in type 2 diabetes. Am J Hum Genet. 2013;93:1072–1086. doi: 10.1016/j.ajhg.2013.11.005.
- Lombard V, Golaconda Ramulu H, Drula E, Coutinho PM, Henrissat B. The carbohydrate-active enzymes database (CAZy) in 2013. Nucleic Acids Res. 2014;42:D490–D495. doi: 10.1093/nar/gkt1178.
- Matthijs G, Schollen E, Pardon E, Veiga-Da-Cunha M, Jaeken J, Cassiman JJ, Van Schaftingen E. Mutations in PMM2, a phosphomannomutase gene on chromosome 16p13, in carbohydrate-deficient glycoprotein type I syndrome (Jaeken syndrome). Nat Genet. 1997;16:88–92. doi: 10.1038/ng0597-88.
- Matthijs G, Schollen E, Van Schaftingen E, Cassiman JJ, Jaeken J. Lack of homozygotes for the most frequent disease allele in carbohydrate-deficient glycoprotein syndrome type 1A. Am J Hum Genet. 1998;62:542–550. doi: 10.1086/301763.
- Moremen KW, Tiemeyer M, Nairn AV. Vertebrate protein glycosylation: diversity, synthesis and function. Nat Rev Mol Cell Biol. 2012;13:448–462. doi: 10.1038/nrm3383.
- Patnaik SK, Blumenfeld OO. Patterns of human genetic variation inferred from comparative analysis of allelic mutations in blood group antigen genes. Hum Mutat. 2011;32:263–271. doi: 10.1002/humu.21430.
- Pedersen JW, Bennett EP, Schjoldager KT, Meldal M, Holmér AP, Blixt O, Cló E, Levery SB, Clausen H, Wandall HH. Lectin domains of polypeptide GalNAc transferases exhibit glycopeptide binding specificity. J Biol Chem. 2011;286:32684–32696. doi: 10.1074/jbc.M111.273722.
- Sasaki K, Kurata-Miura K, Ujita M, Angata K, Nakagawa S, Sekine S, Nishi T, Fukuda M. Expression cloning of cDNA encoding a human beta-1,3-N-acetylglucosaminyltransferase that is essential for poly-N-acetyllactosamine synthesis. Proc Natl Acad Sci USA. 1997;94:14294–14299. doi: 10.1073/pnas.94.26.14294.
- Schjoldager KT, Clausen H. Site-specific protein O-glycosylation modulates proprotein processing – deciphering specific functions of the large polypeptide GalNAc-transferase gene family. Biochim Biophys Acta. 2012;1820:2079–2094. doi: 10.1016/j.bbagen.2012.09.014.
- Schjoldager KT, Vakhrushev SY, Kong Y, Steentoft C, Nudelman AS, Pedersen NB, Wandall HH, Mandel U, Bennett EP, Levery SB, et al. Probing isoform-specific functions of polypeptide GalNAc-transferases using zinc finger nuclease glycoengineered SimpleCells. Proc Natl Acad Sci USA. 2012;109:9893–9898. doi: 10.1073/pnas.1203563109.
- Schjoldager KT, Vester-Christensen MB, Goth CK, Petersen TN, Brunak S, Bennett EP, Levery SB, Clausen H. A systematic study of site-specific GalNAc-type O-glycosylation modulating proprotein convertase processing. J Biol Chem. 2011;286:40122–40132. doi: 10.1074/jbc.M111.287912.
- Schollen E, Kjaergaard S, Legius E, Schwartz M, Matthijs G. Lack of Hardy-Weinberg equilibrium for the most prevalent PMM2 mutation in CDG-Ia (congenital disorders of glycosylation type Ia). Eur J Hum Genet. 2000;8:367–371. doi: 10.1038/sj.ejhg.5200470.
- Stanley P. Golgi glycosylation. Cold Spring Harb Perspect Biol. 2011;3:a005199. doi: 10.1101/cshperspect.a005199.
- Stanley P, Okajima T. Roles of glycosylation in Notch signaling. Curr Top Dev Biol. 2010;92:131–164. doi: 10.1016/S0070-2153(10)92004-8.
- Stern HM, Padilla M, Wagner K, Amler L, Ashkenazi A. Development of immunohistochemistry assays to assess GALNT14 and FUT3/6 in clinical trials of dulanermin and drozitumab. Clin Cancer Res. 2010;16:1587–1596. doi: 10.1158/1078-0432.CCR-09-3108.
- Stingl JC, Bartels H, Viviani R, Lehmann ML, Brockmöller J. Relevance of UDP-glucuronosyltransferase polymorphisms for drug dosing: A quantitative systematic review. Pharmacol Ther. 2014;141:92–116. doi: 10.1016/j.pharmthera.2013.09.002.
- Storry JR, Olsson ML. Genetic basis of blood group diversity. Br J Haematol. 2004;126:759–771. doi: 10.1111/j.1365-2141.2004.05065.x.
- Storry JR, Olsson ML. The ABO blood group system revisited: A review and update. Immunohematology. 2009;25:48–59.
- Ten Hagen KG, Fritz TA, Tabak LA. All in the family: the UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferases. Glycobiology. 2003;13:1–16. doi: 10.1093/glycob/cwg007.
- Ten Hagen KG, Hagen FK, Balys MM, Beres TM, Van Wuyckhuyse B, Tabak LA. Cloning and expression of a novel, tissue specifically expressed member of the UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferase family. J Biol Chem. 1998;273:27749–27754. doi: 10.1074/jbc.273.42.27749.
- Teslovich TM, Musunuru K, Smith AV, Edmondson AC, Stylianou IM, Koseki M, Pirruccello JP, Ripatti S, Chasman DI, Willer CJ, et al. Biological, clinical and population relevance of 95 loci for blood lipids. Nature. 2010;466:707–713. doi: 10.1038/nature09270.
- Tietjen I, Hovingh GK, Singaraja RR, Radomski C, Barhdadi A, McEwen J, Chan E, Mattice M, Legendre A, Franchini PL, et al. Segregation of LIPG, CETP, and GALNT2 mutations in Caucasian families with extremely high HDL cholesterol. PLoS ONE. 2012;7:e37437. doi: 10.1371/journal.pone.0037437.
- Vuillaumier-Barrot S, Bouchet-Séraphin C, Chelbi M, Devisme L, Quentin S, Gazal S, Laquerrière A, Fallet-Bianco C, Loget P, Odent S, et al. Identification of mutations in TMEM5 and ISPD as a cause of severe cobblestone lissencephaly. Am J Hum Genet. 2012;91:1135–1143. doi: 10.1016/j.ajhg.2012.10.009.
- Wagner KW, Punnoose EA, Januario T, Lawrence DA, Pitti RM, Lancaster K, Lee D, von Goetz M, Yee SF, Totpal K, et al. Death-receptor O-glycosylation controls tumor-cell sensitivity to the proapoptotic ligand Apo2L/TRAIL. Nat Med. 2007;13:1070–1077. doi: 10.1038/nm1627.