Abstract
Heparan sulfate is a highly modified O-linked glycan that performs diverse physiological roles in animal tissues. Though quickly modified, it is initially synthesised by exostosins as a polysaccharide of alternating β-d-glucuronosyl and N-acetyl-α-d-glucosaminyl residues. These enzymes generally possess two glycosyltransferase domains (GT47 and GT64), each thought to add one type of monosaccharide unit to the backbone. Although previous structures of murine exostosin-like 2 (EXTL2) provide insight into the GT64 domain, the rest of the bi-domain architecture is yet to be characterised; hence, how the two domains co-operate is unknown. Here, we report the structure of human exostosin-like 3 (EXTL3) in apo and UDP-bound forms. We explain why the GT47 domain of EXTL3 is unable to transfer β-d-glucuronosyl units, and we observe that, in general, the bi-domain architecture would preclude a processive mechanism of backbone extension. We therefore propose that heparan sulfate backbone polymerisation occurs by a simple dissociative mechanism.
Subject terms: Glycobiology, Cryoelectron microscopy, Enzymes, Enzyme mechanisms
Heparan sulfate (HS), a common cell surface decoration, is a carbohydrate of alternating sugars assembled by bi-domain enzymes such as EXTL3. Here, the authors present the structure of EXTL3, explain EXTL3’s loss of activity, and propose that HS extension is distributive.
Introduction
Heparan sulfate proteoglycans (HSPGs) are proteins decorated with heparan sulfate (HS) carbohydrate moieties1. HSPGs play a variety of important roles in animal tissues, including acting as cell surface receptors, modulating enzyme activities, and fulfilling structural roles in the extracellular matrix1–3. These functions are primarily achieved through the ability of HS to bind avidly to a very wide range of interactors, including growth factors, cytokines, and the spike protein of SARS-CoV-21,3,4. The fundamental importance of HS is reflected in its invariant presence in animal extracellular spaces3, as well as its conservation in metazoans from Cnidaria to vertebrates5,6.
Along with chondroitin sulfate/dermatan sulfate (CS/DS), hyaluronic acid, and other similar polymers, HS is a glycosaminoglycan (GAG): a polysaccharide of alternating hexosamine and hexose/hexuronic acid residues7. Most GAGs are covalently linked to a core protein; such conjugates are known as proteoglycans2. In particular, HS and CS/DS chains are O-linked to serine residues in the core protein via a tetrasaccharide linker with structure -4-GlcA-β1,3-Gal-β1,3-Gal-β1,4-Xyl-β1-1. HS is initially synthesised as a polysaccharide of alternating α1,4-linked N-acetylglucosamine (α-GlcNAc) and β1,4-linked glucuronic acid (β-GlcA) sugars. Concomitantly with chain polymerisation, the HS chain undergoes modifications at various positions including N-deacetylation/N-sulfation, epimerization, 2-O-sulfation, 6-O-sulfation, and 3-O-sulfation, resulting in the enormous structural diversity of these macromolecules7,8.
The HS backbone is synthesised in the Golgi apparatus by enzymes known as exostosins8–10. The human genome encodes five exostosins: exostosin 1 (EXT1), exostosin 2 (EXT2) and exostosin-like 1–3 (EXTL1–3). Most exostosins contain two glycosyltransferase domains; it has been proposed that each domain is responsible for one of the two types of linkage in the nascent HS polysaccharide9,11,12. The N-terminal domain, which belongs to CAZy13–15 family GT47, is thought to be capable of adding β-GlcA16,17, while the C-terminal domain belongs to the GT64 family and is capable of adding α-GlcNAc18. The HS chain can be initiated from the tetrasaccharide linker by EXTL3 or (in some cases) by EXTL2, which add the first α-GlcNAc residue (GlcNAcT-I activity)8,9. Importantly, this step commits the tetrasaccharide to decoration with HS rather than CS/DS19. From here, the addition of β-GlcA and further α-GlcNAc residues (GlcAT-II and GlcNAcT-II activity, respectively) is achieved mainly by EXT1 and EXT2, which are thought to operate as a hetero-oligomer9,20. EXTL3 and EXTL1 are also able to transfer α-GlcNAc residues during elongation (GlcNAcT-II); however, both have been reported to lack GlcAT-II activity21,22. Recent evidence now indicates that EXTL2, which lacks a GT47 domain13,14, is involved in blocking the initiation of HS and CS/DS chains by adding α-GlcNAc to the linkage tetrasaccharide before an inhibitory phosphate can be removed from the reducing-end xylose23.
It is frequently reported that pathway-related Golgi enzymes are physically co-localised24–26. Furthermore, a number of tandem (or ‘bi-domain’) glycosyltransferase fusions have been identified in the Golgi, including exostosins (GT47–GT64), chondroitin synthases (GT31–GT7), and LARGE enzymes (GT8–GT49), all of which are thought to synthesise alternating polysaccharides9,13,14,27,28. However, because no such enzyme complex or fusion has yet been structurally characterised, the function of this physical clustering is unclear. Therefore, the bi-domain members of the exostosin family represent an interesting target for structural investigation. So far, the only exostosin to have been crystallised is the smallest, EXTL2, which consists of a single GT64 domain29. The crystal structure revealed that EXTL2 exists as a symmetric homodimer, and that the GT64 domain adopts a GT-A fold, binding UDP-GlcNAc or UDP-GalNAc with the aid of a manganese cofactor. While the GT47 domain of the larger exostosins has been tentatively predicted to adopt a GT-B fold30, the overall bi-domain structure remains to be determined, and the GT47 fold is yet to be described. This missing information could provide insight into the mechanism of HS chain extension and the diverse activities of GT47 family members in plants17,31.
Although humans produce five exostosins, only EXT1, EXT2, and EXTL3 are conserved across the animal kingdom5,9, suggesting that these enzymes encompass the core elements required for HS backbone synthesis. Unlike EXT1 and EXT2, whose activities are reported to vary according to hetero-oligomeric state20,32, EXTL3 is likely homomeric and not known to require any additional factor for complete activity, simplifying structure-function analyses. We recently described a low-resolution SAXS structure of EXTL3, the largest exostosin30. Here, we present high-resolution cryo-EM structures of the 170 kDa globular portion of an EXTL3 homodimer in the apo-form (2.4 Å resolution) and bound with UDP and Mn2+ (2.9 Å). We locate the active sites, describe the GT47 fold, explain the loss of GlcAT-II activity in EXTL3 compared with other exostosins, and speculate as to how EXTL3 achieves specificity for HS addition sites. Together, these results help to explain the molecular basis of some genetic diseases caused by mutations in exostosin genes (such as hereditary multiple exostoses and spondylo-epi-metaphyseal dysplasia) and provide insight into the organisation of glycosylation reactions within the Golgi.
Results
A sensitive assay can detect GlcNAcT-II and weak GlcAT-II activity in EXTL3ΔN preparations
We previously created a soluble form of human EXTL3 lacking the first 51 amino acids from the N terminus (EXTL3ΔN)30. This protein contains a predicted coiled coil domain33, a GT64 glycosyltransferase domain, and a GT47 glycosyltransferase domain, but lacks a transmembrane helix (Fig. 1a). To confirm that this form of EXTL3 is catalytically active, we expressed EXTL3ΔN in human embryonic kidney (EBNA 293) cells and purified the protein from the cell culture medium by Ni-NTA and size exclusion chromatography. Label-free quantification (LFQ) mass spectrometry experiments indicated that the abundances of endogenous EXT1 and EXT2 were 10,000–100,000-fold less than that of EXTL3ΔN; furthermore, EXTL1 and EXTL2 were not detected (Supplementary Tables 1 and 2). Hence, we proceeded under the assumption that the activity in these preparations originated from EXTL3ΔN.
Next, we measured the N-acetylglucosaminyltransferase activity of EXTL3ΔN. Since we were unable to obtain an appropriate acceptor for measuring GlcNAcT-I initiation activity (the primary activity of EXTL3), we instead observed the GlcNAcT-II activity of EXTL3ΔN through its ability to extend K5 heparosan oligosaccharides, which have an identical structure to the nascent HS backbone and are a well-established substrate in exostosin activity assays11,34. Accordingly, we incubated purified EXTL3ΔN with UDP-GlcNAc and a DP8 (degree of polymerisation = 8) K5 acceptor ([GlcA-GlcNAc]4) terminating in β-GlcA. The products were then characterised using polysaccharide analysis by carbohydrate electrophoresis (PACE). Consistent with the reported activity of EXTL3, the [GlcA-GlcNAc]4 acceptor was completely converted to the DP9 oligosaccharide GlcNAc-[GlcA-GlcNAc]4 following overnight incubation (Fig. 1b). No GlcNAcT-II activity was observed when EXTL3ΔN and/or UDP-GlcNAc were omitted, or when MnCl2 and MgCl2 were replaced with EDTA. Taken together, these results indicate that our preparations of EXTL3ΔN exhibit a metal-dependent GlcNAcT-II activity.
Although our mass spectrometry experiments indicated that the levels of EXT1/2 were very low, we also assayed our EXTL3ΔN preparations for potential GlcAT-II activity. Accordingly, we incubated the preparation of EXTL3ΔN with UDP-GlcA and a DP9 K5 acceptor (GlcNAc-[GlcA-GlcNAc]4) terminating in α-GlcNAc. Surprisingly, we detected appreciable, albeit incomplete, conversion of the DP9 acceptor to a putative DP10 [GlcA-GlcNAc]5 oligosaccharide following overnight incubation (Fig. 1c). No conversion of DP9 acceptor to DP10 was observed when EXTL3ΔN and/or UDP-GlcA were omitted. However, unlike the GlcNAcT-II activity, the apparent GlcAT-II activity was only partially sensitive to EDTA treatment. To verify the structure of the product, we analysed the reactants and products from a completed reaction by matrix-assisted laser desorption ionisation–time of flight (MALDI-TOF) mass spectrometry. The mass spectrum indicated a mass increase of 176 Da between acceptor and product, consistent with the addition of a β-GlcA residue (Supplementary Fig. 1a, b). Furthermore, PACE analysis revealed that the product was sensitive to β-glucuronidase digestion (Supplementary Fig. 1d). We also incubated the enzyme and DP9 acceptor in the presence of UDP-GlcNAc instead of UDP-GlcA. In this case, no activity was seen by PACE analysis (Supplementary Fig. 1d). By contrast, when enzyme and DP9 acceptor were incubated simultaneously with UDP-GlcA and UDP-GlcNAc, a ladder of larger products was observed (Supplementary Fig. 1d). The masses of these products were consistent with a series of oligosaccharides with general formula GlcNAcₙ₊₁GlcAₙ, as well as a much smaller population with general formula GlcAₙGlcNAcₙ (Supplementary Fig. 1c). These results indicate that, in addition to GlcNAcT-II activity, our preparations of EXTL3ΔN also exhibited a weaker level of GlcAT-II activity.
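The 176 Da increment, and the expected masses of the GlcNAcₙ₊₁GlcAₙ ladder, can be checked from elemental compositions. The sketch below is illustrative only: it assumes neutral, underivatised oligosaccharides with free reducing ends and uses monoisotopic masses. Any label, adduct or charge state in the actual MALDI-TOF ions would shift the absolute masses but not the per-residue increments. The function names are hypothetical.

```python
# Monoisotopic element masses (Da).
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}

# Anhydro-residue compositions (free monosaccharide minus one H2O).
RESIDUES = {
    "GlcA": {"C": 6, "H": 8, "O": 6},             # glucuronic acid residue
    "GlcNAc": {"C": 8, "H": 13, "N": 1, "O": 5},  # N-acetylglucosamine residue
}
WATER = {"H": 2, "O": 1}


def composition_mass(comp):
    """Monoisotopic mass of an elemental composition such as {"C": 6, ...}."""
    return sum(MONO[element] * count for element, count in comp.items())


def oligo_mass(n_glcnac, n_glca):
    """Neutral monoisotopic mass of an underivatised linear oligosaccharide:
    the sum of its anhydro residues plus one water for the two chain ends."""
    return (n_glcnac * composition_mass(RESIDUES["GlcNAc"])
            + n_glca * composition_mass(RESIDUES["GlcA"])
            + composition_mass(WATER))


def ladder(n_glcnac=5, n_glca=4, steps=4):
    """Compositions and masses for a chain extended alternately by GlcA and
    GlcNAc, starting from a GlcNAc-terminated acceptor (default: DP9)."""
    products = []
    for i in range(steps):
        if i % 2 == 0:
            n_glca += 1    # GlcAT-II step
        else:
            n_glcnac += 1  # GlcNAcT-II step
        products.append((n_glcnac, n_glca, oligo_mass(n_glcnac, n_glca)))
    return products


dp9 = oligo_mass(5, 4)   # GlcNAc-[GlcA-GlcNAc]4
dp10 = oligo_mass(5, 5)  # [GlcA-GlcNAc]5
print(f"GlcA increment: {dp10 - dp9:.3f} Da")  # about 176.032
for g, a, m in ladder():
    print(f"GlcNAc{g} GlcA{a}: {m:.3f} Da")
```

Each GlcNAc addition adds about 203.08 Da, so the two residue types give distinguishable steps in the ladder.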
The GlcAT-II activity of EXTL3ΔN preparations is reduced when EXT1 is targeted by CRISPR-Cas9
The EXT1/2 complex has been shown to exhibit very strong GlcAT-II activity, with comparatively little GlcNAcT-II activity20,32. As EXTL3 has been previously shown to lack GlcAT-II activity21,22, we considered the possibility that the GlcAT-II activity in our preparations might in fact originate from trace levels of EXT1/2. To investigate this, we mutated endogenous EXT1 in EXTL3ΔN-expressing EBNA 293 cells using CRISPR-Cas9. We grew three separate cell cultures: regular EXTL3ΔN-expressing cells (EXTL3ΔN), EXTL3ΔN-expressing cells transfected with a non-specific CRISPR-Cas9 construct (not targeting any known gene; EXTL3ΔN CRISPR-control), and EXTL3ΔN-expressing cells transfected with a CRISPR-Cas9 construct targeting EXT1 (EXTL3ΔN CRISPR-EXT1). Immunofluorescence microscopy and slot blot assays indicated a substantial decrease (by 81 ± 10% and 67 ± 25%, respectively) in immuno-reactive EXT1 in the EXTL3ΔN CRISPR-EXT1 culture compared with the EXTL3ΔN CRISPR-control, though the signal was not totally abolished (Supplementary Figs. 2 and 3).
Subsequently, we collected and purified EXTL3ΔN from the three different cell cultures. After equalising the total protein concentration between the three EXTL3ΔN preparations, we subjected them to LFQ mass spectrometry analysis, as described above. As expected, all three samples exhibited very similar levels of EXTL3ΔN (Fig. 2a and Supplementary Table 3). Consistent with our previous results, EXTL3ΔN preparations from EXTL3ΔN and EXTL3ΔN CRISPR-control cells exhibited levels of EXT1 and EXT2 that were 10,000–100,000-fold lower than those of EXTL3ΔN. By contrast, the abundance of EXT1 in the EXTL3ΔN CRISPR-EXT1 sample was below the threshold for detection. Interestingly, the level of EXT2 was also partially reduced in this sample. Hence, the abundance of contaminating EXT1/2 heteromer was almost certainly diminished in the EXTL3ΔN CRISPR-EXT1 sample.
We then proceeded to quantify the GlcNAcT-II and GlcAT-II activities of the three different preparations. To do so, we quantified the percentage substrate conversion after 15 and 60 min using PACE. Firstly, the results indicated that the GlcNAcT-II activity was generally at least fourfold greater than the GlcAT-II activity (Fig. 2b–e). Furthermore, the GlcNAcT-II activity was essentially invariant between the three preparations (Fig. 2b, d; no significant difference from one-way ANOVA; p > 0.05). In contrast, the GlcAT-II activity differed significantly among the three preparations (Fig. 2c, e; p < 0.001) and appeared to correlate with the abundance of EXT1 (as determined by LFQ). Most importantly, the GlcAT-II activity was significantly reduced in the EXTL3ΔN CRISPR-EXT1 preparation compared to both the EXTL3ΔN preparation (Tukey’s HSD; in relative terms, a reduction of 37 ± 31% after 15 min, p < 0.01; and 47 ± 28%, p < 0.01, after 60 min) and the EXTL3ΔN CRISPR-control preparation (55 ± 23%, p < 0.001 after 15 min; 63 ± 20%, p < 0.001 after 60 min). These results strongly suggest that the observed GlcAT-II activity, but not the GlcNAcT-II activity, was dependent on the presence of EXT1. Although some GlcAT-II activity was still present in the EXTL3ΔN CRISPR-EXT1 preparation, this might be attributable to EXT2, or to trace amounts of residual EXT1. Hence, given that EXTL3 has previously been found to lack GlcAT-II activity21,22, and that the EXT1/2 complex is known to exhibit very high GlcAT-II activity20,32, it seems likely that the majority of the observed GlcAT-II activity did not originate from EXTL3ΔN.
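As an illustration of the quantities involved, the sketch below computes percentage conversion from band signals and a relative reduction between two preparations. It assumes that each band's signal is proportional to its molar amount, and it propagates the standard deviations with a first-order ratio formula; the text does not state how the ± values above were derived, so this is one common choice rather than the authors' procedure. The band intensities are hypothetical placeholders, not data from this study.

```python
from statistics import mean, stdev


def percent_conversion(product_signal, substrate_signal):
    """Percentage of acceptor converted, assuming each band's signal is
    proportional to its molar amount."""
    total = product_signal + substrate_signal
    if total <= 0:
        raise ValueError("no signal in either band")
    return 100.0 * product_signal / total


def relative_reduction(treated, reference):
    """Relative reduction (%) of mean conversion in `treated` versus
    `reference`, with first-order propagation of the SDs of the ratio.
    Each list needs at least two replicates for stdev()."""
    a, b = mean(treated), mean(reference)
    ratio = a / b
    sd_ratio = ratio * ((stdev(treated) / a) ** 2 + (stdev(reference) / b) ** 2) ** 0.5
    return 100.0 * (1.0 - ratio), 100.0 * sd_ratio


# Hypothetical band intensities (product, substrate) for triplicate reactions.
control = [percent_conversion(p, s) for p, s in [(40, 60), (44, 56), (36, 64)]]
targeted = [percent_conversion(p, s) for p, s in [(18, 82), (22, 78), (20, 80)]]
reduction, sd = relative_reduction(targeted, control)
print(f"relative reduction: {reduction:.0f} ± {sd:.0f}%")
```

Significance testing (one-way ANOVA followed by Tukey's HSD, as reported above) would then be run on the per-replicate conversion values rather than on the summary ratio.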
Quaternary architecture of EXTL3
Our previous SAXS structure of EXTL3 suggests that it forms a homodimer30, which would include four glycosyltransferase domains. To gain further insight into the arrangement of these domains, we solved the structure of EXTL3ΔN by single-particle cryo-EM, producing a high-quality Coulomb potential map at 2.4 Å resolution (Fig. 3 and Supplementary Figs. 4–6). We then built and refined an ab initio model of the 170 kDa globular domain of an EXTL3 homodimer (Fig. 3). In each chain, we were able to model two partial N-glycans at Asn592 and Asn790, respectively (Supplementary Fig. 7). The coiled coil domain, however, was not visible in the map, likely due to its flexibility with respect to the catalytic domain.
Detailed analysis of the globular structure revealed that it is indeed composed of four glycosyltransferase-type domains, with one GT47 and GT64 domain provided by each chain. Within each chain, the GT47 and GT64 domains are connected by a linker region of 124 amino acids. Curiously, the N-terminal half of the linker region constitutes an extended loop that traverses 40 Å over the surface of the GT64 domain, forming a ‘cradle’ (Supplementary Fig. 8). The GT64 domain and linker region appear to contribute the most to the homodimeric interaction (see Supplementary Table 4 for PISA35-calculated energy contributions), which appears to be stabilised by a pair of intermolecular disulfide bridges between Cys793 and Cys915 in the GT64 domain (Fig. 3).
GT64 domain structure
The reported crystal structure of mouse EXTL2 constitutes a GT64-domain homodimer, with no GT47 domain or linker region29. We aligned the GT64 portion of EXTL3 with the apo-EXTL2 structure (PDB: 1OMX), finding a good agreement, with an RMSD of 0.94 Å between the 161 aligned Cα atoms of the monomeric EXTL2 and EXTL3 GT64 domains (Fig. 4a). Furthermore, the homodimerisation of the EXTL3 GT64 domain, which involves the formation of an intermolecular β-sheet, highly resembles that of EXTL2, suggesting that this mode of interaction may also be conserved in other exostosins (Fig. 4b).
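An RMSD such as the 0.94 Å figure above is computed over paired atoms after optimal rigid-body superposition. The sketch below uses the Kabsch algorithm on a fixed set of already-matched Cα pairs. The alignment program used for the comparison is not specified here, and some programs iteratively prune poorly fitting pairs before reporting an RMSD, which this sketch does not do. The coordinates in the usage example are random placeholders.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD (same units as input) between two N x 3 coordinate sets after
    optimal rigid-body superposition of P onto Q (Kabsch algorithm).
    Rows must already be paired, e.g. matched C-alpha atoms."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both be N x 3 arrays")
    # Remove translation by centring both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # SVD of the 3 x 3 covariance matrix gives the optimal rotation.
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    # Correct for a possible reflection so that R is a proper rotation.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))


# Placeholder example: 161 random "C-alpha" positions and a rotated,
# translated copy; a pure rigid-body motion should give an RMSD of ~0.
rng = np.random.default_rng(0)
ca_a = rng.normal(scale=10.0, size=(161, 3))
theta = np.deg2rad(30.0)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
ca_b = ca_a @ Rz.T + np.array([5.0, -3.0, 2.0])
print(kabsch_rmsd(ca_a, ca_b))
```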
In contrast, the EXTL3 GT64 domain differs substantially from EXTL2 in the structure of the C-terminal loop (Fig. 4a). A further structural alignment with acceptor-analogue-bound EXTL2 (PDB: 1ON8; Fig. 4c) revealed that the loop lies close to the GT64 acceptor binding site. Whereas in EXTL2 this loop is relatively short and terminates in the solvent, in EXTL3 it is considerably longer, with the C-terminus embedded in the protein. Furthermore, the EXTL3 loop is rich in cationic side chains and contains a solvent-exposed phenylalanine side chain (Phe918; Fig. 4c). These observations are interesting in light of the previous proposition that EXTL3 might recognise the anionic and hydrophobic residues that commonly surround HS sites in core proteins36.
GT47 domain structure
Inspection of the structure revealed that the EXTL3 GT47 domain adopts a GT-B fold, with two distinct Rossmann-fold subdomains (Fig. 5a). However, the EXTL3 GT47 domain appears to constitute a minimal GT-B fold, with only five β-strands in each subdomain, and only eight substantial α-helices in the entire fold (Fig. 5b, c). Using the DALI server37 to detect structural homologues, we identified glycosyltransferases from GT-B families GT90, GT63, GT113, GT4, and GT72 as the five most similar matches (Supplementary Table 5). Notably, the structure of GtfC from Streptococcus agalactiae (PDB: 4W6Q; GT113) was also identified as a close match when we searched using only the N-terminal or C-terminal subdomains, suggesting that the bacterial GT113 family might be closely related to the eukaryotic GT47 family.
Unlike GT-A-fold glycosyltransferases, which commonly possess a DxD metal-binding motif, GT-B-fold enzymes are not thought to contain any widespread amino acid motifs38,39. Nevertheless, at least some GT-B enzymes exhibit a conserved aspartate/glutamate residue in the fourth α-helix of the C-terminal subdomain (Cα4)40; the side chain of this residue typically forms a hydrogen bond with the ribose of the nucleotide sugar38,41. To investigate whether such a pattern might be present in EXTL3, we examined the GT47 structural alignments produced by DALI. Indeed, in the EXTL3 GT47 domain, Glu453 aligns well with similarly placed acidic residues in other GT-B structures (Fig. 5d). To examine the conservation of this residue amongst the wider GT47 family, we aligned the EXTL3 GT47 domain sequence with all other characterised GT47 glycosyltransferases from Homo sapiens and Arabidopsis thaliana. Indeed, Asp/Glu residues were strongly conserved at this position, supporting the idea that these residues are involved in binding the nucleotide sugar in those enzymes (Fig. 5e).
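Assessing conservation at a single position requires mapping a residue number in the reference sequence onto a column of the multiple sequence alignment. The sketch below assumes the alignment is already loaded as a dictionary of gapped sequences using "-" for gaps; the toy sequences are hypothetical. For a sequence truncated to its GT47 domain, one would pass residue_number=453 together with the number of the first residue retained in the truncated EXTL3 sequence as first_residue.

```python
from collections import Counter


def residue_to_column(aligned_seq, residue_number, first_residue=1):
    """0-based alignment column of a residue, given its number in the
    ungapped sequence; `first_residue` is the number of the first residue
    present (useful when the sequence was truncated to one domain)."""
    count = first_residue - 1
    for column, char in enumerate(aligned_seq):
        if char != "-":
            count += 1
            if count == residue_number:
                return column
    raise ValueError(f"residue {residue_number} not found in aligned sequence")


def column_profile(alignment, column):
    """Count the residues (gaps excluded) in one column of an alignment
    given as {name: aligned_sequence}."""
    return Counter(seq[column] for seq in alignment.values() if seq[column] != "-")


def fraction_acidic(profile):
    """Fraction of non-gap residues in a column that are Asp or Glu."""
    total = sum(profile.values())
    return (profile["D"] + profile["E"]) / total if total else 0.0


# Hypothetical toy alignment; "query" stands in for the reference sequence.
toy = {"query": "MK-LE", "homolog1": "MKALD", "homolog2": "M--IE", "homolog3": "MKAL-"}
col = residue_to_column(toy["query"], 4)
profile = column_profile(toy, col)
print(col, dict(profile), fraction_acidic(profile))
```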
In spite of this sequence conservation, the GT47 domain of EXTL3 exhibits several unique features that point to a loss of activity in this particular enzyme. For instance, the side chains of Glu453 and Arg421 are engaged in a salt-bridge, which would direct Glu453 away from making interactions with the substrate (Fig. 5d). Moreover, our structural alignment revealed that, in contrast to other GT-B structures, the EXTL3 Cα4 helix exhibits an extra turn at its N-terminus that occludes a putative phosphate/donor sugar binding pocket (Fig. 5d). In the other substrate-bound GT-B structures that we examined, the phosphates and ribose moiety of the donor sugar were consistently bound in this pocket, as has been previously documented for GT-B-fold and other Rossmann-fold enzymes38,42. Consultation of the GT47 domain alignment revealed that both of these unusual features are absent from other characterised GT47 enzymes: no analogue to Arg421 was present in other GT47s (except in the case of EXTL1, which is not known to possess GlcAT-II activity), and the conspicuous pre-helical insertion was not observed in any of the other exostosin or plant GT47 sequences (Fig. 5e).
During manuscript preparation, high-accuracy structural predictions were made available for the human exostosins at the AlphaFold Protein Structure Database43,44 (https://alphafold.ebi.ac.uk/). Comparison of the aligned structures indicated that, whereas both the experimental and predicted EXTL3 structures exhibit a four-turn Cα4 helix and Glu453–Arg421 salt bridge, both EXT1 and EXT2 models exhibit a more canonical three-turn Cα4 helix and no salt bridge, providing space for nucleotide sugar binding (Supplementary Fig. 9). Supporting the hypothesis that UDP-GlcA binds to this site in EXT1, numerous point mutations that inactivate GlcAT-II activity in Chinese hamster EXT116 align to positions around the N-terminus of its predicted Cα4 helix (Supplementary Fig. 10). These include a mutation to Glu349 (the equivalent of Glu453 in EXTL3). Furthermore, nearby Arg340 (which lacks an equivalent in EXTL3) is the most commonly mutated EXT1 residue in hereditary multiple exostoses45–47. Hence, the unique features of EXTL3’s active site likely disrupt UDP-GlcA binding and probably arose relatively recently in the evolution of the GT47 family.
UDP binds in the GT64 domain but not in the GT47 domain
EXTL3 is thought to possess both GlcNAcT-I and GlcNAcT-II activities9. The location(s) of the corresponding UDP-GlcNAc-binding active site(s) within EXTL3 are unknown—although analogy with EXTL2 suggests that the GlcNAcT-I activity, at least, is catalysed by the GT64 domain. Furthermore, the presence of GlcAT-II activity in our EXTL3ΔN preparations presented the possibility that EXTL3 possesses an additional UDP-GlcA-binding active site. In structural studies, the active sites of nucleotide-sugar-utilising glycosyltransferases are routinely located through co-crystallisation with the relevant donor nucleotide48, which binds stably to the active site. Hence, to locate the positions of all potential active site(s) simultaneously, we used single-particle cryo-EM to solve the structure of EXTL3 in the presence of UDP and MnCl2, achieving a resolution of 2.9 Å (Supplementary Figs. 11–13).
In the GT64 domain of the substrate-bound map, new density corresponding to UDP and a Mn2+ ion was clearly visible (Supplementary Fig. 14). The interactions between EXTL3 and UDP are highly similar to those between mouse EXTL2 and UDP-GlcNAc (PDB: 1ON6; Fig. 6a). The uracil base appears to be held in place by hydrogen bonding to the side chains of Asn697 and Asn723 and parallel-displaced π-stacking with Tyr670, while the ribose likely makes hydrogen bonds with the main-chain carbonyl group of Leu668 and the side chain of Asp745. The side chain of Arg672 interacts with the α-phosphate. The Mn2+ ion is co-ordinated by the side chain of Asp746 (one of the DxD aspartates) and both phosphate moieties.
In contrast, despite the relatively high concentration of UDP present during grid freezing (10 mM), thorough examination of the GT47 domain in the UDP-bound map did not reveal any differences from the apo-structure map (Supplementary Fig. 15). This suggests that, as predicted, the EXTL3 GT47 domain does not bind UDP and is therefore unlikely to possess glycosyltransferase activity.
Nevertheless, we attempted to use the EXTL3 structure to provide insight into the workings of its homologue EXT1, which also exhibits a bi-domain configuration but has retained its GT47 GlcAT-II activity. Since EXT1 possesses both GlcAT-II and GlcNAcT-II activities in the same chain, it is reasonable to propose that these alternating activities might be spatially and/or mechanistically linked for increased efficiency. To evaluate this hypothesis, we estimated the distance between the GT47 and GT64 active sites in the EXTL3 structure, which likely shares its overall architecture with EXT1. We measured the distance between the β-phosphate of UDP bound in the EXTL3 GT64 domain and the side chain of Glu453 in the GT47 domain to be over 45 Å (Fig. 6b). Furthermore, a high-confidence model of the EXT1/2 heterodimer generated using the AlphaFold-Multimer49 Colab server (https://colab.research.google.com/github/deepmind/alphafold/blob/main/notebooks/AlphaFold.ipynb) exhibited a very similar architecture to the EXTL3 homodimer, including the GT64 dimer intermolecular β-sheet seen in EXTL2 and EXTL3 (Supplementary Fig. 16). Importantly, the GT47 and GT64 active sites were separated by almost 50 Å in EXT1, and by over 60 Å in EXT2. Hence, our results indicate that exostosins are highly unlikely to exhibit a processive mechanism of chain extension and are therefore likely to use a simple dissociative mechanism to alternate between the two types of activity.
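Inter-site distances like those above are Euclidean distances between atomic coordinates in a model. The sketch below reads ATOM/HETATM records from a fixed-column PDB file and measures the distance between two named atoms. The file name, chain identifiers and UDP residue number are hypothetical placeholders, and the choice of the Glu453 CD atom to represent "the side chain" is our assumption. Large models distributed only in mmCIF format would need a different parser.

```python
import math


def parse_pdb_atoms(lines):
    """Map (chain, residue number, residue name, atom name) -> (x, y, z)
    from ATOM/HETATM records in fixed-column PDB format. Alternate
    locations and insertion codes are ignored (the last record wins)."""
    atoms = {}
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")):
            key = (line[21], int(line[22:26]), line[17:20].strip(), line[12:16].strip())
            atoms[key] = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return atoms


def atom_distance(atoms, first, second):
    """Euclidean distance (in the file's units, i.e. Å) between two atoms."""
    return math.dist(atoms[first], atoms[second])


# Hypothetical usage; "model.pdb", chain IDs and residue 1001 are placeholders:
# with open("model.pdb") as handle:
#     atoms = parse_pdb_atoms(handle)
# print(atom_distance(atoms, ("A", 1001, "UDP", "PB"), ("A", 453, "GLU", "CD")))
```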
The EXTL3 structure rationalises exostosin dysfunction
Deleterious missense mutations in the EXTL3 gene can cause a range of developmental and neurological disorders in humans. In particular, six amino acid changes have been implicated in disease: P318L, R339W, P461L, R513C, N657S, and Y670D50–54. We mapped these mutations onto the EXTL3 structure and deduced that the first five of these mutations likely cause protein destabilisation, as they are distant from the GT64 active site and possess side chains buried in the core of the protein, in several cases connecting potentially flexible loops to other structural elements (Supplementary Table 6 and Supplementary Fig. 17). This is consistent with the fact that the EXTL3[R513C] protein is undetectable in the Golgi bodies of fibroblasts from affected individuals, and is therefore suggested to be mislocalised or degraded51. In contrast, our structural data indicated that Tyr670, which relates to the sixth mutation, appears to help bind the uracil moiety of UDP. Its mutation to aspartate therefore likely impairs or abolishes substrate binding in the active site. This is consistent with the fact that, although EXTL3[Y670D] appears to localise normally in fibroblasts, HS levels are nevertheless reduced in affected individuals51.
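Before clinical variants are mapped onto a structural model, it is worth confirming that their numbering agrees with the sequence used to build the model. The sketch below parses one-letter substitution codes and reports any whose wild-type residue does not match the given sequence. The variable extl3_sequence is a hypothetical placeholder for the full-length sequence, and the check only verifies numbering; it does not assess burial or destabilisation.

```python
import re

# One-letter substitution code, e.g. "R513C": wild type, position, variant.
MUTATION = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$")


def parse_mutation(code):
    """Split a code such as "R513C" into ("R", 513, "C")."""
    match = MUTATION.match(code)
    if match is None:
        raise ValueError(f"unrecognised substitution: {code!r}")
    wild_type, position, variant = match.groups()
    return wild_type, int(position), variant


def numbering_mismatches(codes, sequence):
    """Return the substitutions whose wild-type residue does not match
    `sequence` at the stated 1-based position."""
    mismatches = []
    for code in codes:
        wild_type, position, _ = parse_mutation(code)
        if position < 1 or position > len(sequence) or sequence[position - 1] != wild_type:
            mismatches.append(code)
    return mismatches


DISEASE_VARIANTS = ["P318L", "R339W", "P461L", "R513C", "N657S", "Y670D"]
# Hypothetical usage with the full-length sequence as a string:
# print(numbering_mismatches(DISEASE_VARIANTS, extl3_sequence))  # [] if numbering agrees
```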
GT47 evolution and the origin of bi-domain exostosins
In contrast to animals, which encode only a few GT47 glycosyltransferase enzymes, plants are known to express dozens of GT47-family enzymes, which have been grouped into six clades: GT47-A–F17,55. These enzymes have diverse activities and are highly important in cell wall synthesis. However, the evolutionary relationship between plant and animal GT47 enzymes has not yet been investigated. We used a Hidden Markov Model (HMM; http://bcb.unl.edu/dbCAN2/blast.php) to extract GT47-family protein sequences from diverse opisthokonts and plants. After aligning the sequences and truncating them to their GT47 domain, a maximum-likelihood phylogeny was constructed (Fig. 7). Although Arabidopsis sequences could be grouped into six distinct clades, as reported previously, we noted the existence of a seventh, well-supported clade that grouped the opisthokont sequences with sequences from the plants Ginkgo biloba and Physcomitrium patens. Although these plant enzymes do not possess a C-terminal GT64 domain, their high level of similarity with animal exostosins suggests that they could possess a related activity. We propose that this clade of sequences be named GT47-G.
Of the metazoans known to produce HS, the most distantly related to humans are the Cnidaria. However, our analysis indicated that the choanoflagellate Monosiga brevicollis (a sister to metazoans) and the poriferan sponge Amphimedon queenslandica (one of the earliest diverging metazoans) also exhibit GT47-G sequences. Interestingly, whereas the M. brevicollis sequence (Monbr1|21955) lacks a GT64 domain, A. queenslandica possesses three bi-domain exostosins (Aqu2.1.28641_001, Aqu2.1.41542_001, and Aqu2.1.32581_001) that appear orthologous to EXT1, EXT2, and EXTL3 (Fig. 7 and Supplementary Fig. 18). This suggests that the bi-domain architecture (and perhaps HS itself) arose at the outset of metazoan evolution.
Discussion
In this work, we used single-particle cryo-EM to produce a high-resolution map of the largest bi-domain exostosin, EXTL3. The structure reveals that EXTL3’s globular domain forms a symmetrical homodimer in a very similar fashion to murine EXTL2, consistent with the prediction that its stem domain forms a homodimeric coiled coil. This apparent belt-and-braces approach suggests that dimerisation could be important for EXTL3’s function. More broadly, the increasing number of homodimeric Golgi GT structures deposited in the Protein Data Bank56 hints that GT homodimerisation may play a more fundamental role in Golgi biology.
By conducting a phylogenetic analysis of animal and plant sequences, we established that some plants possess enzymes closely related to exostosins. It appears that these GT47-G sequences have been omitted from several previous phylogenies of P. patens GT47 sequences17,57—perhaps due to the lack of any orthologous sequences in Arabidopsis. As this clade is likely to have emerged before the divergence of animals and plants, its existence calls into question the hypothesis that the ExAD-related GT47 clade (GT47-E), which contains all known GT47 sequences from chlorophyte algae, represents the origin of all plant GT47 enzymes17,58.
Using the EXTL3 structure, we were able to determine that GT47-family glycosyltransferase domains exhibit a GT-B-type fold with a conserved Asp/Glu located on the Cα4 helix, which corresponds to a residue essential for the GlcAT-II activity of Chinese hamster EXT116. Analogies with similar GT-B structures implicated this residue in donor substrate binding. Interestingly, the importance of the equivalent residue (Glu293) in GT47 member IRX10—a xylan backbone β1,4-xylosyltransferase from Arabidopsis59—has recently been demonstrated by showing that the ectopic expression of an IRX10 E293Q mutant abrogates xylan synthesis in a dominant-negative fashion60.
In EXTL3, however, the Cα4 helix is extended, and its conserved glutamate is engaged in a salt bridge that appears to orient the side chain so that it cannot play a catalytic role. These active-site-occluding features, as well as its apparent lack of UDP binding, indicate that the domain has likely lost activity through evolution. This is consistent with previous biochemical results21,22. Despite this, we were able to detect GlcAT-II activity in our preparations of EXTL3ΔN. This inverting activity, which was not fully inhibited by EDTA treatment, is unlikely to be a side-activity of the EXTL3 GT64 domain, as the GT64 catalytic mechanism is both retaining and Mn2+/Mg2+-dependent. Since the activity was substantially reduced after CRISPR-Cas9 targeting of endogenous EXT1 in the expression host, we attribute it primarily to background activity. The substantial level of background might be explained by the high sensitivity of our assay (necessary to detect the minor GlcNAcT-II activity of EXTL3) and the disproportionately high specific activity of the EXT1/2 heterodimer. It is also possible that EXTL3 further stimulates the activity of EXT1 and/or EXT2, perhaps through additional protein interactions. Nevertheless, a possibility still remains that EXTL3 possesses some intrinsic GlcAT-II activity. If so, significant conformational changes would likely be required for UDP-GlcA binding, perhaps triggered by the binding of an allosteric regulator or even the catalytic cycle of the GT64 domain.
If, as we propose, the GT47 domain has been inactivated, it raises the question as to why it has not been lost in its entirety (as it has in EXTL2). It is possible that the GT47 domain has some role in stabilising the complex—or is capable of mediating protein-protein interactions. Another explanation lies in the fact that EXTL3’s stem domain (which connects the globular part to the N-terminal transmembrane helix) has been extended and rigidified by the presence of the long putative coiled coil. This suggests that physical separation of the C-terminal GT64 domain from the membrane might be important to EXTL3 function. If so, then the GT47 domain (which sits between the coiled coil and the GT64 domain) may simply contribute to the gap between membrane and GlcNAcT-I/II active site. However, some EXTL3 substrates appear to be closely associated with the membrane; for instance, glypican-1 possesses an HS addition site a mere 40 residues from its GPI anchor61. It is hard to explain how EXTL3’s rigid stem would help it to access such a substrate. It is possible that the combined length of the stem and GT47 domain allows the GT64 domain to reach across the Golgi lumen to substrates at the opposing membrane.
Previous results indicate that bi-domain EXTL3 possesses two distinct activities: GlcNAcT-I and GlcNAcT-II activity53. In contrast, single-domain EXTL2 possesses GlcNAcT-I-type activity, but not GlcNAcT-II activity62,63. Our results support the conclusion that, in EXTL3, both activities are catalysed by the GT64 domain. Hence, the GT64 active site of EXTL2 must differ from that of EXTL3 in ways that exclude HS backbone acceptors in favour of the (phosphorylated) linkage tetrasaccharide. In the absence of an acceptor-bound EXTL3 structure, we cannot draw firm conclusions about the basis of this difference in substrate specificity. Nevertheless, we note several gross structural differences between the two acceptor binding sites, including the divergent C-termini and the presence of a nearby N-glycan (Asn790) in EXTL3. Notably, the polybasic C-terminus appears to offer the best explanation for EXTL3's preference for polyacidic HS addition sites in core proteins.
In any case, we found that the active site of the EXTL3 GT64 domain is separated from the potential active site of the GT47 domain by a substantial distance, as are the active sites of the two GT2 domains of E. coli K4 chondroitin polymerase64. Given that the same domain organisation was predicted for the EXT1/2 heterodimer, this observation strongly suggests that the GlcAT-II and GlcNAcT-II reactions in bi-domain exostosins are not concerted. Therefore, combining both domains in one protein may simply help to constrain diffusion of the acceptor, thereby increasing its local concentration. This may not only increase catalytic efficiency but also protect the acceptor from interference by promiscuous glycosyltransferases. This idea is consistent with the fact that Golgi GTs involved in the same pathway often form hetero-oligomeric complexes, a phenomenon that has been proposed to facilitate substrate channelling25,65. Hence, our structural data shed light on the fine-scale organisation of the Golgi glycosylation machinery that is required to meet the demands of glycan biogenesis.
Methods
Expression and purification of EXTL3ΔN
EXTL3ΔN was expressed in human embryonic kidney cells (expressing the Epstein-Barr virus nuclear antigen-1; EBNA 293) as described previously30. Briefly, a cDNA encoding amino acids 52–919 of human EXTL3 was cloned into expression vector pCEP4-BM40-HisTEV. EBNA 293 cells transfected with this construct were grown to confluence before undergoing a 3–4-day expression period in EX-CELL® 325 PF CHO Serum-Free Medium (Merck, Darmstadt, Germany).
Secreted EXTL3ΔN was then purified from filtered culture medium by nickel affinity chromatography (using a 1 ml HisTrap column; Cytiva, Marlborough, MA, USA) and size exclusion chromatography (using a Superdex® 200 Increase 10/300 GL column or a HiLoad 16/60 Superdex® 200 pg column; Cytiva) in a buffer containing 50 mM Tris-HCl, pH 6.8, 100 mM NaCl, and 50 mM KCl.
CRISPR/Cas9 targeting of EXT1 in EXTL3ΔN-expressing cells
EBNA 293 cells expressing EXTL3ΔN were transfected either with a trio of human EXT1-targeted CRISPR/Cas9 knockout plasmids (EXT1 CRISPR/Cas9 KO Plasmid (h), sc-404635; Santa Cruz Biotechnology, Dallas, TX, USA) or with a non-specific CRISPR/Cas9 control plasmid (not targeting any known gene; Control CRISPR/Cas9 Plasmid, sc-418922; Santa Cruz Biotechnology) according to the manufacturer's instructions. All plasmids encoded a GFP marker to indicate transfection. The resulting cells were denoted EXTL3∆N CRISPREXT1 cells and EXTL3∆N CRISPRcontrol cells, respectively. Successful transfection was confirmed visually by detection of GFP using deconvolution fluorescence microscopy according to the manufacturer's instructions. The EXTL3∆N CRISPREXT1 cells were then further transfected with a set of EXT1 homology-directed repair (HDR) template plasmids (EXT1 HDR Plasmid (h), sc-404635-HDR; Santa Cruz Biotechnology) for permanent expression according to the manufacturer's instructions. Cells that had undergone HDR were then selected by virtue of their puromycin resistance (conferred by the integrated HDR template).
Deconvolution immunofluorescence microscopy
Prior to EXT1 HDR plasmid transfection, expression of EXT1 in EXTL3∆N CRISPREXT1 cells and EXTL3∆N CRISPRcontrol cells was examined by immunofluorescence microscopy66. In detail, cells were washed with PBS (137 mM NaCl, 2.7 mM KCl, 8 mM Na2HPO4, and 2 mM KH2PO4, pH 7.4) and fixed in acetone to preserve cellular and subcellular structures. The fixed cells were first pre-coated with 10% anti-mouse total Ig and then incubated overnight with primary anti-EXT1 antibody (A-7, sc-515144; Santa Cruz Biotechnology; dilution 1:100). After extensive washing with PBS, the cells were treated with Alexa Fluor 594-tagged goat anti-mouse IgG (A-11005; ThermoFisher, Waltham, MA, USA; dilution 1:500) for 4 h. To visualise nuclei, DNA was stained with 4′,6-diamidino-2-phenylindole (DAPI; ThermoFisher; diluted to 300 μM); DAPI and antibody staining were performed as recommended by the manufacturers. In the controls, the primary antibody was omitted. Fluorescent images were analysed using a Carl Zeiss AxioObserver inverted fluorescence microscope with deconvolution, equipped with an EC Plan-Neofluar 63×/1.25 Oil M27 objective and an AxioCam MRm Rev camera. Identical exposure settings and times were used for all images. During microscopy, the entire slides were scanned, and immunofluorescence images were captured at 20× and 100× magnification. The low-magnification images were used to identify representative locations for high-magnification images. EXT1 expression levels were quantified by analysing entire 20× images using Zeiss AxioVision Release 4.8 software.
Slot blot
Cells (10⁴) were extracted at 4 °C with radio-immunoprecipitation assay (RIPA) buffer (0.1% w/v SDS, 0.5% v/v Triton X-100, and 0.5% w/v sodium deoxycholate in PBS, supplemented with protease inhibitors (cOmplete mini) and 0.5 mM phenylmethylsulfonyl fluoride). RIPA extracts were slot-blotted onto PVDF membranes, which were incubated with anti-EXT1 antibody (A-7, sc-515144; 1:200 dilution) or anti-β-actin antibody (AC-15 anti-β-actin-peroxidase, A3854; Sigma-Aldrich, St Louis, MO, USA; 1:25,000 dilution), followed by detection with horseradish peroxidase-conjugated anti-mouse IgG (Bio-Rad, Hercules, CA, USA; 172-1011; dilution 1:500). The membranes were developed by chemiluminescence (GE Healthcare, Sweden) and imaged using a Fujifilm detector. In the controls, the primary antibody was omitted. β-Actin was used as a loading control. Staining intensities were quantified by densitometry using Gel-Pro Analyser software, version 3.0.00.00. Images of uncropped blots (as well as all other uncropped gels) are provided in Supplementary Fig. 19.
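As a minimal sketch of the normalisation step (not the Gel-Pro Analyser procedure itself), the densitometry readings can be reduced to an EXT1/β-actin ratio expressed relative to the control cells. The helper name relative_ext1 and the intensity values are hypothetical, and the intensities are assumed to be background-subtracted already.

```python
def relative_ext1(ext1_sample, actin_sample, ext1_control, actin_control):
    """EXT1 signal of a sample relative to a control, normalised to beta-actin.

    Hypothetical helper for illustration. Inputs are densitometry intensities
    (arbitrary units), assumed here to be background-subtracted already.
    """
    if actin_sample <= 0 or actin_control <= 0:
        raise ValueError("loading-control intensities must be positive")
    sample_ratio = ext1_sample / actin_sample      # corrects for unequal loading
    control_ratio = ext1_control / actin_control
    if control_ratio == 0:
        raise ValueError("control EXT1 signal must be non-zero")
    return sample_ratio / control_ratio            # 1.0 means "same as control"


# Hypothetical intensities for a CRISPR-EXT1 slot and a CRISPR-control slot
print(round(relative_ext1(1200.0, 4000.0, 3000.0, 4200.0), 3))  # prints 0.42
```

Dividing by β-actin first removes differences in loading between slots; dividing by the control ratio then expresses the knockout signal as a fraction of the control.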
Peptide mass spectrometry
Protein solutions were reduced with DTT, alkylated with iodoacetamide, and digested overnight at 37 °C with sequencing-grade trypsin (Promega, Madison, WI, USA). After digestion, the supernatant was transferred to a sample vial and loaded onto an autosampler for automated LC-MS/MS analysis. All LC-MS/MS experiments were performed using a Dionex Ultimate 3000 RSLC nanoUPLC system (Thermo Fisher Scientific Inc, Waltham, MA, USA) and a Q Exactive Orbitrap mass spectrometer (Thermo Fisher Scientific Inc, Waltham, MA, USA). Peptides were separated by reverse-phase chromatography at a flow rate of 300 nl min−1 on a Thermo Scientific reverse-phase nano Easy-Spray column (Thermo Scientific PepMap C18, 2 µm particle size, 100 Å pore size, 75 µm i.d. × 50 cm length). Peptides were loaded from the Ultimate 3000 autosampler onto a pre-column (Thermo Scientific PepMap 100 C18, 5 µm particle size, 100 Å pore size, 300 µm i.d. × 5 mm length) in 0.1% formic acid for 3 min at a flow rate of 10 µl min−1. The column valve was then switched to elute peptides from the pre-column onto the analytical column. Solvent A was 0.1% formic acid and solvent B was 80% acetonitrile with 0.1% formic acid. A linear gradient of 2–40% B over 30 min was used; further wash and equilibration steps gave a total run time of 60 min.
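For orientation, the mobile-phase composition during the analytical gradient follows directly from the solvent definitions above. The sketch below assumes that time is counted from the start of the 30 min gradient (excluding the 3 min loading step) and that solvent A contains no acetonitrile; the helper names percent_b and percent_acetonitrile are illustrative only.

```python
def percent_b(t_min, start=2.0, end=40.0, duration=30.0):
    """Percentage of solvent B at t_min minutes into a linear gradient.

    Time is counted from the start of the gradient (an assumption: the 3 min
    loading step is excluded); times outside the gradient are clamped.
    """
    if t_min <= 0:
        return start
    if t_min >= duration:
        return end
    return start + (end - start) * t_min / duration


def percent_acetonitrile(pct_b, acn_in_b=80.0):
    """Acetonitrile (v/v %) in the mobile phase, assuming solvent A has none."""
    return pct_b * acn_in_b / 100.0


for t in (0, 15, 30):
    b = percent_b(t)
    print(f"t = {t:2d} min: {b:4.1f}% B, {percent_acetonitrile(b):4.1f}% acetonitrile")
```

Because solvent B is itself only 80% acetonitrile, the end of the gradient (40% B) corresponds to 32% acetonitrile in the mobile phase.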
The LC eluate was sprayed into the mass spectrometer by means of an Easy-Spray source (Thermo Fisher Scientific Inc.). The m/z values of all eluting ions were measured in the Orbitrap mass analyser at a resolution of 35,000 over a scan range of m/z 380–1500. Data-dependent scans (Top 20) were used to automatically isolate and fragment precursor ions by higher-energy collisional dissociation (HCD; NCE 25%) in the HCD collision cell, and the resulting fragment ions were measured in the Orbitrap analyser at a resolution of 17,500. Singly charged ions and ions with unassigned charge states were excluded from MS/MS, and a dynamic exclusion window of 20 s was used.
After acquisition, the data were processed using Proteome Discoverer (version 2.3, ThermoFisher). Briefly, all MS/MS data were submitted to the Mascot search algorithm (Matrix Science, London, UK; version 2.6.0) and searched against a common contaminants database (cRAP 20190401; 125 sequences; 41,129 residues) and the UniProt human database (CCP_UniProt_homo_sapiens_proteome_20180409; 93,609 sequences; 37,041,084 residues). Oxidation (M) and deamidation (N, Q) were set as variable modifications, and carbamidomethylation (C) as a fixed modification. The peptide and fragment mass tolerances were set to 25 ppm and 0.1 Da, respectively. Peak areas for each identified peptide were generated and combined to give protein abundance.
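The precursor tolerance is relative (ppm), so its absolute width grows with m/z, whereas the 0.1 Da fragment tolerance is absolute. The short sketch below illustrates only that arithmetic; it is not how Mascot or Proteome Discoverer are implemented, and the function names are hypothetical.

```python
def ppm_window(mz, tol_ppm=25.0):
    """Return the (low, high) m/z bounds for a relative tolerance given in ppm."""
    delta = mz * tol_ppm * 1e-6
    return mz - delta, mz + delta


def within_ppm(observed, theoretical, tol_ppm=25.0):
    """True if the observed precursor m/z deviates by no more than tol_ppm."""
    error_ppm = (observed - theoretical) / theoretical * 1e6
    return abs(error_ppm) <= tol_ppm


def within_da(observed, theoretical, tol_da=0.1):
    """True if an observed fragment m/z lies within an absolute tolerance (Da)."""
    return abs(observed - theoretical) <= tol_da


# Window widths across the scan range used above (m/z 380-1500)
for mz in (380.0, 1000.0, 1500.0):
    low, high = ppm_window(mz)
    print(f"m/z {mz:6.1f}: {low:.4f} to {high:.4f}")
```

At m/z 1000, for example, a 25 ppm tolerance corresponds to ±0.025 m/z units; at m/z 380, the lower end of the scan range, it is ±0.0095.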
Scaffold (version Scaffold_4.10.0, Proteome Software Inc., Portland, OR) was used to validate MS/MS based peptide and protein identifications. Peptide identifications were accepted if they could be established at greater than 90.0% probability by the Scaffold Local FDR algorithm. Protein identifications were accepted if they could be established at greater than 99.0% probability and contained at least two identified peptides. Protein probabilities were assigned by the Protein Prophet algorithm67. Proteins that contained similar peptides and could not be differentiated based on MS/MS analysis alone were grouped to satisfy the principles of parsimony. Proteins sharing significant peptide evidence were grouped into clusters.
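To make the acceptance rules explicit, the sketch below applies the stated thresholds (peptide probability above 90%, protein probability above 99%, at least two identified peptides) to hypothetical records. It does not reproduce the Scaffold Local FDR or Protein Prophet calculations that generate these probabilities, and the class, function, and protein names are illustrative placeholders. Counting only peptides that individually pass the 90% threshold towards the two-peptide rule is our assumption about how the rules combine.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProteinHit:
    """Hypothetical record: one protein group and its peptide probabilities (0-1)."""
    name: str
    probability: float
    peptide_probabilities: List[float] = field(default_factory=list)


def accepted_proteins(hits, min_protein=0.99, min_peptide=0.90, min_peptides=2):
    """Names of proteins passing the thresholds stated in the text.

    Peptides count only if their probability is above min_peptide (assumption);
    proteins need a probability above min_protein and at least min_peptides
    such peptides.
    """
    accepted = []
    for hit in hits:
        n_good = sum(1 for p in hit.peptide_probabilities if p > min_peptide)
        if hit.probability > min_protein and n_good >= min_peptides:
            accepted.append(hit.name)
    return accepted


example = [
    ProteinHit("protein_A", 0.999, [0.99, 0.95, 0.97]),  # passes both rules
    ProteinHit("protein_B", 0.995, [0.98, 0.85]),        # only one peptide > 90%
    ProteinHit("protein_C", 0.950, [0.99, 0.99]),        # protein probability too low
]
print(accepted_proteins(example))  # prints ['protein_A']
```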
Preparation of K5 heparosan oligosaccharide acceptors
DP10 lyase product from E. coli K5 capsular polysaccharide was purchased from Elicityl (Crolles, France). To remove the non-reducing-terminal Δ-4,5-unsaturated uronic acid, the DP10 oligosaccharide was treated with BT4658GH88 from Bacteroides thetaiotaomicron VPI-5482 (ref. 68) in an overnight reaction at 30 °C containing 6 mM DP10 oligosaccharide, 50 mM ammonium acetate, pH 5.5, and 0.3 mg ml−1 BT4658GH88. BT4658GH88 was a gift of Clelton Santos (LNBr/CNPEM, Brazil). The glycosidase was removed by passing the sample through a 30 kDa NanoSep centrifugal filter (Pall, New York, USA). The resulting DP9 GlcNAc-[GlcA-GlcNAc]4 oligosaccharide was in turn converted to DP8 [GlcA-GlcNAc]4 by treatment with PaGH89 α-N-acetylglucosaminidase (Novozymes, Bagsværd, Denmark) in an overnight reaction at 30 °C containing 6 mM DP9 oligosaccharide, 25 mM ammonium acetate, pH 5.5, and 0.4 mg ml−1 PaGH89. As before, the glycosidase was removed by passing the sample through a 30 kDa NanoSep centrifugal filter.
K5 heparosan extension assay
Reactions were carried out in a thermocycler at 37 °C for 15 min, 1 h, or 16 h before termination at 99 °C for 10 min. Each 10 µl reaction contained 25 mM ammonium acetate, pH 6.5, 10 µM DP8/DP9 heparosan acceptor, 25 µg ml−1 purified EXTL3ΔN, 1.5 mM UDP-GlcNAc/UDP-GlcA, 3 mM MnCl2, and 3 mM MgCl2. For quantitative assays, the protein concentration was determined by BCA assay and adjusted for every technical replicate. For cofactor depletion experiments, MnCl2 and MgCl2 were replaced with 10 mM EDTA. Completed reactions were dried using a centrifugal evaporator. For β-glucuronidase treatment, products were resuspended in 10 µl 50 mM ammonium acetate, pH 5.5, supplemented with either 0.1 mg ml−1 bovine liver β-glucuronidase, type B-1 (BtGUSB; Merck) or 50 µg ml−1 TharGH79a from Trichoderma harzianum (THAR02_03122; GenBank: KKP04785.1; Novozymes), and incubated at 37 °C overnight before re-drying. Reducing ends were then derivatised with 8-aminonaphthalene-1,3,6-trisulfonic acid (ANTS) by resuspending each sample in 15 µl of labelling reagent containing 33 mM ANTS, 0.45 M 2-picoline-borane, 67% DMSO, and 5% acetic acid and incubating at 37 °C overnight. Derivatised products were then analysed by Polysaccharide Analysis using Carbohydrate Electrophoresis (PACE) as described previously69. Briefly, samples were dried using a GeneVac miVac DNA centrifugal concentrator (Genevac Ltd, Ipswich, UK) at 60 °C before being resuspended in 5 µl 6 M urea. From each sample, 2.5 µl was loaded into a 240 × 180 × 0.75 mm polyacrylamide gel comprising a stacking gel with 10% polyacrylamide and a resolving gel with 20% polyacrylamide, both containing 0.1 M Tris-borate, pH 8.2. Gels were run in 0.1 M Tris-borate buffer in a Hoefer SE660 electrophoresis tank (Hoefer Inc, Holliston, MA, USA) at 200 V, 5 mA for 30 min and then at 1000 V, 30 mA for 135 min before imaging using a G-box (Syngene, Cambridge, UK) fitted with 365 nm UV tubes and a 500–600 nm short pass detection filter.
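As a worked example of assembling one 10 µl reaction, the volume of each component follows from C1V1 = C2V2. Only the final concentrations below come from the text; the stock concentrations are hypothetical 10× stocks chosen for round numbers, and the helper name pipetting_volumes is ours.

```python
# Final concentrations in each 10 ul reaction, as given in the text.
FINAL = {
    "ammonium acetate pH 6.5 (mM)": 25.0,
    "DP8/DP9 acceptor (uM)": 10.0,
    "EXTL3dN (ug/ml)": 25.0,
    "UDP-GlcNAc or UDP-GlcA (mM)": 1.5,
    "MnCl2 (mM)": 3.0,
    "MgCl2 (mM)": 3.0,
}

# HYPOTHETICAL 10x stock concentrations (same units as FINAL); replace with real stocks.
STOCK = {
    "ammonium acetate pH 6.5 (mM)": 250.0,
    "DP8/DP9 acceptor (uM)": 100.0,
    "EXTL3dN (ug/ml)": 250.0,
    "UDP-GlcNAc or UDP-GlcA (mM)": 15.0,
    "MnCl2 (mM)": 30.0,
    "MgCl2 (mM)": 30.0,
}


def pipetting_volumes(final, stock, total_ul=10.0):
    """Volume (ul) of each stock needed, using C1*V1 = C2*V2, plus water to volume."""
    volumes = {}
    for name, c_final in final.items():
        c_stock = stock[name]
        if c_stock < c_final:
            raise ValueError(f"stock of {name} is more dilute than the final concentration")
        volumes[name] = c_final * total_ul / c_stock
    water = total_ul - sum(volumes.values())
    if water < 0:
        raise ValueError("stocks are too dilute to fit in the reaction volume")
    volumes["water"] = water
    return volumes


for component, vol in pipetting_volumes(FINAL, STOCK).items():
    print(f"{component:32s} {vol:4.1f} ul")
```

With these assumed stocks, each component contributes 1 µl and 4 µl of water makes up the volume. The calculation also makes explicit that the sugar donor (1.5 mM) is present in 150-fold molar excess over the acceptor (10 µM).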
Images were acquired with GeneSnap software (Syngene, version 7.12).
Oligosaccharide mass spectrometry
Reactions were carried out as above, except that volumes were scaled up tenfold. Completed reactions were passed through a 10 kDa NanoSep centrifugal filter (Pall) and dried using a centrifugal concentrator before desalting on a Dowex (50WX8, Na+ form, 100–200 mesh; Merck) cation exchange column as described previously70. Briefly, Dowex beads were washed three times in 4 M HCl, then extensively in MilliQ water, followed by three washes with 5% acetic acid. A glass Pasteur pipette was plugged with glass wool and fitted with a valve (a piece of hose closed with a screw compressor clamp) before being packed with 500 µl Dowex beads. The beads were then washed with 1 ml 5% acetic acid, allowing flow until the top of the liquid phase drew level with the top of the resin bed. Dried sample was resuspended in 50 µl 5% acetic acid and loaded onto the column; just enough liquid was allowed to drain for the meniscus to return to its previous level. After the hose was placed in a collection vial, the sample was eluted with further 5% acetic acid. The eluate was dried in a centrifugal evaporator before derivatisation with 2-aminobenzamide (2-AB) and GlycoClean S (Agilent, Santa Clara, CA, USA) clean-up, as described previously70. Briefly, dried sample was resuspended in 10 µl labelling reagent (60 mg ml−1 2-AB and 60 mg ml−1 sodium cyanoborohydride in 3:7 acetic acid:DMSO), incubated at 65 °C for 3 h, and then cooled to room temperature. Each GlycoClean S cartridge was washed with 1 ml MilliQ water, then 5 ml 30% acetic acid, then 4 ml 100% acetonitrile. The labelled sample was spotted onto the wetted cartridge membrane and incubated for 15 min. The cartridge was washed with 1 ml 100% acetonitrile followed by five sequential 1 ml washes with 96% acetonitrile. The labelled oligosaccharides were then eluted with 1.5 ml MilliQ water.
After drying, samples were resuspended in water and mixed 1:1 (v/v) with 2,5-dihydroxybenzoic acid (DHB) matrix (20 mg ml−1 in 50% methanol with 0.4 mg ml−1 ammonium sulfate) before 2 µl was spotted onto a ground-steel target plate for mass spectrometry. Data were collected on an ultrafleXtreme MALDI-TOF/TOF instrument (Bruker) using a 2 kHz smartbeam-II laser and acquired in negative-ion reflector mode (mass range 1000–4500 Da). On average, 10,000 shots were accumulated per spectrum to obtain sufficiently high resolution. Bruker flexControl and flexAnalysis software (both version 3.4) were used for acquisition and analysis, respectively.
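When assigning peaks in these spectra, it can help to predict the m/z of the labelled acceptors and their single-residue extension products. The sketch below is a minimal calculator under stated assumptions: monoisotopic masses, singly deprotonated [M-H]⁻ ions, unsulfated heparosan with free carboxyl groups (no sodium adducts), and a net shift of +120.069 Da for 2-AB reductive amination (2-AB and H2 added, H2O lost). The function and constant names are ours, and we do not know which ion species were actually annotated in the original spectra.

```python
# Monoisotopic masses of the elements (Da) and of a proton
ELEMENT = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
PROTON = 1.00727646


def mass(formula):
    """Monoisotopic mass of a composition such as {"C": 6, "H": 8, "O": 6}."""
    return sum(ELEMENT[el] * n for el, n in formula.items())


GLCA = mass({"C": 6, "H": 8, "O": 6})             # GlcA residue (sugar minus H2O)
GLCNAC = mass({"C": 8, "H": 13, "N": 1, "O": 5})  # GlcNAc residue (sugar minus H2O)
WATER = mass({"H": 2, "O": 1})                    # added once per linear chain
# Assumed reductive amination with 2-AB (C7H8N2O): + 2-AB - H2O + H2
TWO_AB_SHIFT = mass({"C": 7, "H": 8, "N": 2, "O": 1}) - WATER + mass({"H": 2})


def labelled_mz(n_glca, n_glcnac):
    """Predicted m/z of the [M-H]- ion of a 2-AB-labelled GlcA/GlcNAc
    oligosaccharide (unsulfated, free acid, no adducts assumed)."""
    neutral = n_glca * GLCA + n_glcnac * GLCNAC + WATER + TWO_AB_SHIFT
    return neutral - PROTON


species = {
    "DP8 acceptor, [GlcA-GlcNAc]4": (4, 4),
    "DP8 + 1 GlcNAc": (4, 5),
    "DP9 acceptor + 1 GlcA": (5, 5),
}
for label, (n_glca, n_glcnac) in species.items():
    print(f"{label}: m/z {labelled_mz(n_glca, n_glcnac):.3f}")
# prints 1653.518, 1856.597 and 2032.629 for the three species
```

Under these assumptions, the labelled DP8 acceptor is expected near m/z 1653.52; addition of one GlcNAc shifts it by 203.08, and addition of one GlcA to a DP9 acceptor adds 176.03. Sodium adducts or in-source losses would appear at other values.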
Cryo-EM sample preparation
Purified EXTL3ΔN was concentrated to 0.1–0.4 mg ml−1. QUANTIFOIL® holey carbon grids (R 1.2/1.3 for the apo structure; R 2/2 for the UDP-bound structure) were glow-discharged for 60 s at 25 mA using a PELCO easiGlow system (Ted Pella) before 3 µl of protein sample was applied and vitrified using an FEI Vitrobot Mark IV system (Thermo Fisher Scientific) at 4 °C and 95% humidity, with a blot time of 3 s. For the UDP-bound structure, UDP and MnCl2 were added to the protein preparation 150 min before grid freezing, at final concentrations of 10 mM and 2.5 mM, respectively.
Cryo-EM data collection
Grids were screened using a 200 kV Talos Arctica microscope (Thermo Fisher Scientific) and movies were collected using a 300 kV Titan Krios microscope (Thermo Fisher Scientific) at the BiocEM facility, Department of Biochemistry, University of Cambridge. All data collection parameters are listed in Supplementary Table 7.
Cryo-EM data processing
Refinement parameters are listed in Supplementary Table 5. Apo-EXTL3 data were processed in RELION 3.0.1 (ref. 71). Initially, 1440 movies were corrected for beam-induced motion using MotionCor2 (ref. 72). CTF estimation was performed using Gctf v1.06 (ref. 73). A total of 656,292 particles were extracted after auto-picking (binned 3×). After several rounds of 2D classification, an initial model with C1 symmetry was constructed using the SGD algorithm implemented in RELION. Several rounds of 3D classification reduced the total number of particles to 171,285. These particles were re-extracted with a 420 px box (binned 1×); subsequently, a new initial model with C2 symmetry was constructed and 3D refinement was performed. After movie refinement and particle polishing, 3D refinement and post-processing at 0.67 Å/px produced the final map at 2.4 Å resolution (FSC = 0.143 criterion). Ab initio model building was carried out using Buccaneer74 and Coot75 (version 0.8.9.2) via the CCP-EM interface76 (version 1.4.1), followed by refinement in ISOLDE77 (on the UCSF ChimeraX78 platform, version 1.1) and Phenix real-space refine79 (Phenix version 1.18.2-3874). Refinement parameters are listed in Supplementary Table 7.
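The FSC = 0.143 criterion reports the spatial frequency at which the Fourier shell correlation between independently refined half-maps first falls below 0.143. The sketch below locates that crossing by linear interpolation and computes the Nyquist limit (twice the pixel size). It is a simplified illustration, not RELION's post-processing code (which additionally corrects the FSC for masking effects by phase randomisation); the function names are ours and the FSC values in the example are synthetic.

```python
def fsc_resolution(freqs, fsc, threshold=0.143):
    """Resolution (Angstrom) where the FSC first drops below `threshold`.

    freqs: increasing spatial frequencies (1/Angstrom); fsc: matching FSC values.
    The crossing is found by linear interpolation between the last point at or
    above the threshold and the first point below it.
    """
    if len(freqs) != len(fsc) or len(freqs) < 2:
        raise ValueError("need two or more matching frequency/FSC points")
    for i in range(1, len(fsc)):
        if fsc[i] < threshold <= fsc[i - 1]:
            f0, f1 = freqs[i - 1], freqs[i]
            c0, c1 = fsc[i - 1], fsc[i]
            f_cross = f0 + (c0 - threshold) * (f1 - f0) / (c0 - c1)
            return 1.0 / f_cross
    raise ValueError("FSC does not cross the threshold in the sampled range")


def nyquist_limit(pixel_size):
    """Finest resolution (Angstrom) representable at a pixel size in Angstrom/px."""
    return 2.0 * pixel_size


# Synthetic FSC curve, invented only to exercise the function
freqs = [0.1, 0.2, 0.3, 0.4, 0.5]
fsc = [1.0, 0.9, 0.5, 0.1, 0.0]
print(f"{fsc_resolution(freqs, fsc):.2f} A")  # prints 2.57 A
for px in (0.67, 1.06):
    print(f"Nyquist at {px} A/px: {nyquist_limit(px):.2f} A")
```

For the pixel sizes used here, the Nyquist limits are 1.34 Å (0.67 Å/px) and 2.12 Å (1.06 Å/px), so neither reported resolution (2.4 Å and 2.9 Å) is limited by sampling.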
The UDP-bound structure was processed using RELION 3.0.1 (ref. 71). Initially, 2573 movies were corrected for beam-induced motion using MotionCor2. The CTF was estimated using Gctf v1.06. After visual inspection, 38 movies were removed before auto-picking and extraction of 1,133,069 particles with a box size of 240 Å (binned 3×). These particles underwent several rounds of 2D classification before construction of an initial 3D model using the SGD algorithm implemented in RELION. A single round of 3D classification produced a 3D reference, which was used to auto-pick and extract 1,696,344 particles afresh (binned 3×). The same reference was used for further 3D classification to select 238,379 particles, which were subsequently re-extracted with a box size of 360 px (binned 1×). After finding no discernible differences between the two monomers, we constructed a new initial model with C2 symmetry to conduct a further 3D refinement. Two rounds of CTF refinement and Bayesian polishing were carried out before post-processing at 1.06 Å/px to create the final map at 2.9 Å resolution (FSC = 0.143 criterion). The apo structure was aligned to the map and used as a starting model for further refinement in Coot, ISOLDE, and Phenix real-space refine. Refinement parameters are listed in Supplementary Table 7.
Sequence alignments, logos, and phylogenetics
For the sequence alignment of characterised GT47s, sequences were downloaded manually from UniProtKB (https://www.uniprot.org/) or from TAIR (https://www.arabidopsis.org/). The C-terminal halves of the human protein sequences were removed before alignment of all sequences with MUSCLE80,81 v3.8.31.
To extract GT47 domain-containing sequences from further animal and plant proteomes, proteome models were downloaded for Homo sapiens (from NCBI Genome; https://www.ncbi.nlm.nih.gov/genome/), Drosophila melanogaster (NCBI), Caenorhabditis elegans (NCBI), Amphimedon queenslandica (EnsemblMetazoa; https://metazoa.ensembl.org/Amphimedon_queenslandica/Info/Index), Monosiga brevicollis MX1 (ref. 82; JGI MycoCosm; https://mycocosm.jgi.doe.gov/Monbr1/Monbr1.home.html), Arabidopsis thaliana (JGI Phytozome v12; https://phytozome.jgi.doe.gov/pz/portal.html), Ginkgo biloba83,84 (GigaDB; http://gigadb.org/dataset/100613), and Physcomitrium patens (JGI Phytozome v12). A GT47 Hidden Markov Model (HMM) was downloaded from the dbCAN2 server85,86 (http://bcb.unl.edu/dbCAN2/) and used to search the proteomes for GT47-family sequences using hmmsearch from the HMMER package87 (version 3.1b2) with an E-value cut-off of 10⁻¹⁰. For the animal sequences, the GT64 domain portion of each sequence was removed manually following a preliminary alignment with MUSCLE. Similarly, the sulfatase domain of Monbr1|21955 was removed. All sequences were then aligned with MAFFT v7.310, and each aligned sequence was truncated to the region corresponding to residues 196–538 of human EXTL3 using a custom Python script (alitrunc.py; Python version 2.7.17). ProtTest 3 (refs. 88,89; version 3.4.2) was subsequently used to determine the most appropriate model of protein evolution, before a phylogeny was constructed using RAxML90,91 (version 8.2.11) with 100 rapid bootstraps. The resulting tree was visualised using FigTree v1.4.4 and edited for publication using Inkscape (version 1.0.1).
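The truncation step can be illustrated with a short sketch that maps residue numbers of the reference sequence onto alignment columns and then slices every sequence to the same columns. This is a hypothetical re-implementation written for illustration, not the authors' alitrunc.py (available from Zenodo), and the function names are ours. It assumes the alignment is already loaded as a dictionary of equal-length gapped strings, that "-" and "." are the only gap characters, and that the reference numbering starts at first_residue.

```python
def ref_columns(aligned_ref, start, end, first_residue=1, gaps="-."):
    """Return the first and last alignment columns (0-based) spanning reference
    residues start..end (inclusive), numbered from first_residue."""
    if start > end:
        raise ValueError("start must not exceed end")
    first = last = None
    residue = first_residue - 1
    for col, char in enumerate(aligned_ref):
        if char in gaps:
            continue
        residue += 1
        if residue == start:
            first = col
        if residue == end:
            last = col
            break
    if first is None or last is None:
        raise ValueError("requested residues lie outside the reference sequence")
    return first, last


def truncate_alignment(alignment, ref_id, start, end, first_residue=1):
    """Slice every aligned sequence to the columns covering reference residues start..end."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    first, last = ref_columns(alignment[ref_id], start, end, first_residue)
    return {name: seq[first:last + 1] for name, seq in alignment.items()}


# Toy alignment; for the analysis described above, ref_id would be the
# human EXTL3 entry, with start=196 and end=538.
toy = {"ref": "MA-KLV-E", "seqB": "MAPKL--E"}
print(truncate_alignment(toy, "ref", 2, 4))  # prints {'ref': 'A-KL', 'seqB': 'APKL'}
```

Working in alignment columns rather than raw residue positions keeps every sequence in register, so gap columns that fall inside the reference range are retained for all sequences.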
Figure preparation
All figures were prepared using Inkscape. Structures were rendered using ChimeraX and PyMOL (version 2.4.0a0 Open-Source).
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Acknowledgements
This work was funded by grants from the University of Cambridge, OpenPlant (BB/L014130/1, P.D.), the Swedish Research Council (2014-03402, K.M.; 2016-04855, D.T.L.), Cancerfonden (21 1426 Pj 01 H, K.M.), and the Wellcome Trust (200873/Z/16/Z, B.F.L.). L.F.L.W. was supported by the University of Cambridge. T.D. was supported by an AstraZeneca studentship. We thank Professor emeritus Ingemar Carlstedt for financial support to the Glycobiology group. We thank the Lund Protein Production Platform, Lund University, Sweden (http://www.lu.se/lp3) for providing support for protein purification and the Cambridge Centre for Proteomics, Cambridge University, UK, for carrying out proteomic analyses. We thank Lee Cooper, University of Cambridge, for help with grid preparation and Clelton Santos, State University of Campinas, for glycosidase preparation.
Author contributions
L.F.L.W., P.D., K.M., and D.T.L. designed experiments. K.M. performed protein expressions and CRISPR-Cas9 experiments. S.W.H., L.F.L.W., and T.D. performed protein purification and cryo-EM sample preparation. L.F.L.W. and A.E.-P. carried out activity assays. T.T. and L.F.L.W. carried out carbohydrate mass spectrometry. K.B.R.M.K. identified a relevant glycosyl hydrolase. D.Y.C. performed cryo-EM data collection. T.D. and L.F.L.W. processed the cryo-EM data. L.F.L.W. wrote the manuscript. P.D., D.T.L., K.M., and B.F.L. supervised the project and contributed to the manuscript.
Peer review
Peer review information
Nature Communications thanks Hiroshi Kitagawa and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Data availability
The data that support this study are available from the corresponding authors upon reasonable request. Numerical data for Fig. 2d,e, Supplementary Fig. 2e, and Supplementary Fig. 3b can be found in the Source Data file. Cryo-EM maps for apo- and UDP-bound EXTL3 were deposited in the Electron Microscopy Data Bank under accession codes EMD-11923 (apo structure) and EMD-11926 (UDP-bound structure). Atomic co-ordinates for apo- and UDP-bound EXTL3 were submitted to the Protein Data Bank under accession codes 7AU2 (apo structure) and 7AUA (UDP-bound structure). The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifiers PXD032145 (regular EXTL3ΔN purification) and PXD032144 (CRISPR experiments). Proteomic search databases are available at UniProt (human proteome reference [https://www.uniprot.org/proteomes/UP000005640]) and The Global Proteome Machine (cRAP common contaminants database [https://www.thegpm.org/crap/]). Individual protein sequences were downloaded from the UniProtKB (HsEXT1: Q16394; HsEXT2: Q93063; HsEXTL1: Q92935; HsEXTL3: O43909) or TAIR (AtMUR3: AT2G20370; AtXLT2: AT5G62220; AtXUT1: AT1G63450; AtARAD1: AT2G35100; AtIRX10: AT1G27440; AtIRX10L: AT5G61840; AtIRX7: AT2G28110; AtIRX7L: AT5G22940; AtExAD: AT3G57630; AtXGD1: AT5G33290) databases. AlphaFold pre-computed structural predictions are available from the AlphaFold Protein Structure Database (HsEXT1: Q16394; HsEXT2: Q93063; HsEXTL3: O43909). 
Proteome models are available at NCBI Genome (Homo sapiens: RefSeq GCF_000001405.40; Drosophila melanogaster: RefSeq GCF_000001215.4; Caenorhabditis elegans: RefSeq GCF_000002985.6), JGI Phytozome (Arabidopsis thaliana: 167 [https://phytozome-next.jgi.doe.gov/info/Athaliana_TAIR10]; Physcomitrium patens: 318 [https://phytozome-next.jgi.doe.gov/info/Ppatens_v3_3]), JGI MycoCosm (Monosiga brevicollis: Monosiga brevicollis MX1 [https://mycocosm.jgi.doe.gov/Monbr1/Monbr1.home.html]), EnsemblMetazoa (Amphimedon queenslandica: Aqu1 [https://metazoa.ensembl.org/Amphimedon_queenslandica/Info/Index]), and GigaDB (Ginkgo biloba: 100613). The GT47 Hidden Markov Model used in this work is available from the dbCAN2 server (dbCAN-HMMdb-V9). Source data are provided with this paper.
Code availability
The short Python script used to truncate alignments in this work (alitrunc.py) is available at Zenodo (10.5281/zenodo.6562402). Graphical views of phylogenetic trees were made using FigTree v1.4.4 [http://tree.bio.ed.ac.uk/software/figtree/].
Competing interests
The authors declare the following competing interest: KBK is an employee of Novozymes, which is a major enzyme-producing company. The remaining authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
K. Mani, Email: katrin.mani@med.lu.se
P. Dupree, Email: pd101@cam.ac.uk
Supplementary information
The online version contains supplementary material available at 10.1038/s41467-022-31048-2.
References
- 1.Sarrazin S, Lamanna WC, Esko JD. Heparan sulfate proteoglycans. Cold Spring Harb. Perspect. Biol. 2011;3:a004952. doi: 10.1101/cshperspect.a004952. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Couchman JR, Pataki CA. An introduction to proteoglycans and their localization. J. Histochem. Cytochem. 2012;60:885–897. doi: 10.1369/0022155412464638. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Bernfield M, et al. Functions of cell surface heparan sulfate proteoglycans. Annu. Rev. Biochem. 1999;68:729–777. doi: 10.1146/annurev.biochem.68.1.729. [DOI] [PubMed] [Google Scholar]
- 4.Clausen TM, et al. SARS-CoV-2 infection depends on cellular heparan sulfate and ACE2. Cell. 2020;183:1043–1057.e15. doi: 10.1016/j.cell.2020.09.033. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Feta A, Do AT, Rentzsch F, Technau U, Kusche-Gullberg M. Molecular analysis of heparan sulfate biosynthetic enzyme machinery and characterization of heparan sulfate structure in Nematostella vectensis. Biochem. J. 2009;419:585–593. doi: 10.1042/BJ20082081. [DOI] [PubMed] [Google Scholar]
- 6.Yamada S, Sugahara K, Özbek S. Evolution of glycosaminoglycans: Comparative biochemical study. Commun. Integr. Biol. 2011;4:150–158. doi: 10.4161/cib.4.2.14547. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Esko, J. D., Kimata, K. & Lindahl, U. Proteoglycans and sulfated glycosaminoglycans. in Essentials of Glycobiology (eds. Varki, A., Cummings, R. & Esko, J.) (Cold Spring Harbor Laboratory Press, 2009). [PubMed]
- 8.Kreuger J, Kjellén L. Heparan sulfate biosynthesis: regulation and variability. J. Histochem. Cytochem. 2012;60:898–907. doi: 10.1369/0022155412464972. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Busse-Wicher M, Wicher KB, Kusche-Gullberg M. The exostosin family: Proteins with many functions. Matrix Biol. 2014;35:25–33. doi: 10.1016/j.matbio.2013.10.001. [DOI] [PubMed] [Google Scholar]
- 10.Xu D, Esko JD. A Golgi-on-a-chip for glycan synthesis. Nat. Chem. Biol. 2009;5:612–613. doi: 10.1038/nchembio0909-612. [DOI] [PubMed] [Google Scholar]
- 11.Lind T, Lindahl U, Lidholt K. Biosynthesis of heparin/heparan sulfate. Identification of a 70-kDa protein catalyzing both the D-glucuronosyl- and the N-acetyl-D-glucosaminyltransferase reactions. J. Biol. Chem. 1993;268:20705–20708. doi: 10.1016/S0021-9258(19)36835-8. [DOI] [PubMed] [Google Scholar]
- 12.Fransson LÅ, et al. Biosynthesis of decorin and glypican. Matrix Biol. 2000;19:367–376. doi: 10.1016/S0945-053X(00)00083-4. [DOI] [PubMed] [Google Scholar]
- 13.Campbell JA, Davies GJ, Bulone V, Henrissat B. A classification of nucleotide-diphospho-sugar glycosyltransferases based on amino acid sequence similarities. Biochem. J. 1997;326:929–939. doi: 10.1042/bj3260929u. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Coutinho PM, Deleury E, Davies GJ, Henrissat B. An evolving hierarchical family classification for glycosyltransferases. J. Mol. Biol. 2003;328:307–317. doi: 10.1016/S0022-2836(03)00307-3. [DOI] [PubMed] [Google Scholar]
- 15.Drula E, et al. The carbohydrate-active enzyme database: functions and literature. Nucleic Acids Res. 2022;50:D571–D577. doi: 10.1093/nar/gkab1045. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Wei G, et al. Location of the glucuronosyltransferase domain in the heparan sulfate copolymerase EXT1 by analysis of Chinese hamster ovary cell mutants. J. Biol. Chem. 2000;275:27733–27740. doi: 10.1074/jbc.M002990200. [DOI] [PubMed] [Google Scholar]
- 17.Geshi, N., Harholt, J., Sakuragi, Y., Krüger Jensen, J. & Scheller, H. V. Glycosyltransferases of the GT47 Family. in Annual Plant Reviews (ed. Ulvskov, P.) vol. 41 265–283 (Wiley-Blackwell, 2011).
- 18.Edvardsson, E. et al. The Plant Glycosyltransferase Family GT64: In Search of a Function. in Annual Plant Reviews (ed. Ulvskov, P.) vol. 41 285–303 (Wiley-Blackwell, 2011).
- 19.Prydz K. Determinants of glycosaminoglycan (GAG) structure. Biomolecules. 2015;5:2003–2022. doi: 10.3390/biom5032003. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.McCormick C, Duncan G, Goutsos KT, Tufaro F. The putative tumor suppressors EXT1 and EXT2 form a stable complex that accumulates in the Golgi apparatus and catalyzes the synthesis of heparan sulfate. Proc. Natl Acad. Sci. USA. 2000;97:668–673. doi: 10.1073/pnas.97.2.668. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Busse M, et al. Contribution of EXT1, EXT2, and EXTL3 to heparan sulfate chain elongation. J. Biol. Chem. 2007;282:32802–32810. doi: 10.1074/jbc.M703560200. [DOI] [PubMed] [Google Scholar]
- 22.Kim B-T, et al. Human tumor suppressor EXT gene family members EXTL1 and EXTL3 encode α1,4-N-acetylglucosaminyltransferases that likely are involved in heparan sulfate/heparin biosynthesis. Proc. Natl Acad. Sci. 2001;98:7176–7181. doi: 10.1073/pnas.131188498. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Kitagawa H. Unexpected roles of Exostosin-like 2, EXTL2, in glycosaminoglycan biosynthesis and function. Trends Glycosci. Glycotechnol. 2019;31:SE15–SE17. doi: 10.4052/tigg.1907.2SE.
- 24.de Graffenried CL, Bertozzi CR. The roles of enzyme localisation and complex formation in glycan assembly within the Golgi apparatus. Curr. Opin. Cell Biol. 2004;16:356–363. doi: 10.1016/j.ceb.2004.06.007.
- 25.Kellokumpu S, Hassinen A, Glumoff T. Glycosyltransferase complexes in eukaryotes: Long-known, prevalent but still unrecognized. Cell. Mol. Life Sci. 2016;73:305–325. doi: 10.1007/s00018-015-2066-0.
- 26.Oikawa A, Lund CH, Sakuragi Y, Scheller HV. Golgi-localized enzyme complexes for plant cell wall biosynthesis. Trends Plant Sci. 2013;18:49–58. doi: 10.1016/j.tplants.2012.07.002.
- 27.Kitagawa H, Uyama T, Sugahara K. Molecular cloning and expression of a human chondroitin synthase. J. Biol. Chem. 2001;276:38721–38726. doi: 10.1074/jbc.M106871200.
- 28.Inamori K, et al. Dystroglycan function requires xylosyl- and glucuronyltransferase activities of LARGE. Science. 2012;335:93–96. doi: 10.1126/science.1214115.
- 29.Pedersen LC, et al. Crystal structure of an α1,4-N-acetylhexosaminyltransferase (EXTL2), a member of the exostosin gene family involved in heparan sulfate biosynthesis. J. Biol. Chem. 2003;278:14420–14428. doi: 10.1074/jbc.M210532200.
- 30.Awad W, Kjellström S, Svensson Birkedal G, Mani K, Logan DT. Structural and biophysical characterization of human EXTL3: domain organization, glycosylation, and solution structure. Biochemistry. 2018;57:1166–1177. doi: 10.1021/acs.biochem.7b00557.
- 31.Zhong R, Ye Z-H. Unraveling the functions of glycosyltransferase family 47 in plants. Trends Plant Sci. 2003;8:565–568. doi: 10.1016/j.tplants.2003.10.003.
- 32.Busse M, Kusche-Gullberg M. In vitro polymerization of heparan sulfate backbone by the EXT proteins. J. Biol. Chem. 2003;278:41333–41337. doi: 10.1074/jbc.M308314200.
- 33.Zak BM, Crawford BE, Esko JD. Hereditary multiple exostoses and heparan sulfate polymerization. Biochim. Biophys. Acta - Gen. Subj. 2002;1573:346–355. doi: 10.1016/S0304-4165(02)00402-6.
- 34.Lidholt K, Lindahl U. Biosynthesis of heparin. The D-glucuronosyl- and N-acetyl-D-glucosaminyltransferase reactions and their relation to polymer modification. Biochem. J. 1992;287:21–29. doi: 10.1042/bj2870021.
- 35.Krissinel E, Henrick K. Inference of macromolecular assemblies from crystalline state. J. Mol. Biol. 2007;372:774–797. doi: 10.1016/j.jmb.2007.05.022.
- 36.Esko JD, Zhang L. Influence of core protein sequence on glycosaminoglycan assembly. Curr. Opin. Struct. Biol. 1996;6:663–670. doi: 10.1016/S0959-440X(96)80034-0.
- 37.Holm L. DALI and the persistence of protein shape. Protein Sci. 2020;29:128–140. doi: 10.1002/pro.3749.
- 38.Hu Y, Walker S. Remarkable structural similarities between diverse glycosyltransferases. Chem. Biol. 2002;9:1287–1296. doi: 10.1016/S1074-5521(02)00295-8.
- 39.Breton C, Šnajdrová L, Jeanneau C, Koča J, Imberty A. Structures and mechanisms of glycosyltransferases. Glycobiology. 2006;16:29–37. doi: 10.1093/glycob/cwj016.
- 40.Wrabl JO, Grishin NV. Homology between O-linked GlcNAc transferases and proteins of the glycogen phosphorylase superfamily. J. Mol. Biol. 2001;314:365–374. doi: 10.1006/jmbi.2001.5151.
- 41.Martinez-Fleites C, et al. Insights into the synthesis of lipopolysaccharide and antibiotics through the structures of two retaining glycosyltransferases from Family GT4. Chem. Biol. 2006;13:1143–1152. doi: 10.1016/j.chembiol.2006.09.005.
- 42.Hol WGJ, van Duijnen PT, Berendsen HJC. The alpha-helix dipole and the properties of proteins. Nature. 1978;273:443–446. doi: 10.1038/273443a0.
- 43.Jumper J, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596:583–589. doi: 10.1038/s41586-021-03819-2.
- 44.Varadi M, et al. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res. 2022;50:D439–D444. doi: 10.1093/nar/gkab1061.
- 45.Wuyts W, et al. Mutations in the EXT1 and EXT2 genes in hereditary multiple exostoses. Am. J. Hum. Genet. 1998;62:346–354. doi: 10.1086/301726.
- 46.Ishimaru D, et al. Large-scale mutational analysis in the EXT1 and EXT2 genes for Japanese patients with multiple osteochondromas. BMC Genet. 2016;17:52. doi: 10.1186/s12863-016-0359-4.
- 47.Fusco C, et al. Mutational spectrum and clinical signatures in 114 families with hereditary multiple osteochondromas: Insights into molecular properties of selected exostosin variants. Hum. Mol. Genet. 2019;28:2133–2142. doi: 10.1093/hmg/ddz046.
- 48.Moremen KW, Haltiwanger RS. Emerging structural insights into glycosyltransferase-mediated synthesis of glycans. Nat. Chem. Biol. 2019;15:853–864. doi: 10.1038/s41589-019-0350-2.
- 49.Evans R, et al. Protein complex prediction with AlphaFold-Multimer. bioRxiv. 2021. doi: 10.1101/2021.10.04.463034.
- 50.Guo L, et al. Identification of biallelic EXTL3 mutations in a novel type of spondylo-epi-metaphyseal dysplasia. J. Hum. Genet. 2017;62:797–801. doi: 10.1038/jhg.2017.38.
- 51.Oud MM, et al. Mutations in EXTL3 cause neuro-immuno-skeletal dysplasia syndrome. Am. J. Hum. Genet. 2017;100:281–296. doi: 10.1016/j.ajhg.2017.01.013.
- 52.Volpi S, et al. EXTL3 mutations cause skeletal dysplasia, immune deficiency, and developmental delay. J. Exp. Med. 2017;214:623–637. doi: 10.1084/jem.20161525.
- 53.Yamada S. Specific functions of Exostosin-like 3 (EXTL3) gene products. Cell. Mol. Biol. Lett. 2020;25:39. doi: 10.1186/s11658-020-00231-y.
- 54.Bajaj S, et al. An ultra-rare case of immunoskeletal dysplasia with neurodevelopmental abnormalities in an Indian patient with homozygous c.953C > T variant in EXTL3 gene: a case report. BMC Pediatr. 2022;22:78. doi: 10.1186/s12887-022-03143-2.
- 55.Li X, Cordero I, Caplan J, Mølhøj M, Reiter WD. Molecular analysis of 10 coding regions from Arabidopsis that are homologous to the MUR3 xyloglucan galactosyltransferase. Plant Physiol. 2004;134:940–950. doi: 10.1104/pp.103.036285.
- 56.Harrus D, Kellokumpu S, Glumoff T. Crystal structures of eukaryote glycosyltransferases reveal biologically relevant enzyme homooligomers. Cell. Mol. Life Sci. 2018;75:833–848. doi: 10.1007/s00018-017-2659-x.
- 57.Harholt J, et al. The glycosyltransferase repertoire of the spikemoss Selaginella moellendorffii and a comparative study of its cell wall. PLoS One. 2012;7:e35846. doi: 10.1371/journal.pone.0035846.
- 58.Ulvskov P, Paiva DS, Domozych D, Harholt J. Classification, naming and evolutionary history of glycosyltransferases from sequenced green and red algal genomes. PLoS One. 2013;8:e76511. doi: 10.1371/journal.pone.0076511.
- 59.Jensen JK, Johnson NR, Wilkerson CG. Arabidopsis thaliana IRX10 and two related proteins from psyllium and Physcomitrella patens are xylan xylosyltransferases. Plant J. 2014;80:207–215. doi: 10.1111/tpj.12641.
- 60.Brandon AG, Birdseye DS, Scheller HV. A dominant negative approach to reduce xylan in plants. Plant Biotechnol. J. 2020;18:5–7. doi: 10.1111/pbi.13198.
- 61.Pan J, Ho M. Role of glypican-1 in regulating multiple cellular signaling pathways. Am. J. Physiol. - Cell Physiol. 2021;321:C846–C858. doi: 10.1152/ajpcell.00290.2021.
- 62.Kitagawa H, Shimakawa H, Sugahara K. The tumor suppressor EXT-like Gene EXTL2 Encodes an α1,4-N-acetylhexosaminyltransferase that transfers N-acetylgalactosamine and N-acetylglucosamine to the common glycosaminoglycan-protein linkage region. J. Biol. Chem. 1999;274:13933–13937. doi: 10.1074/jbc.274.20.13933.
- 63.Kitagawa H, et al. rib-2, a Caenorhabditis elegans homolog of the human tumor suppressor EXT genes encodes a novel α1,4-N-Acetylglucosaminyltransferase involved in the biosynthetic initiation and elongation of heparan sulfate. J. Biol. Chem. 2001;276:4834–4838. doi: 10.1074/jbc.C000835200.
- 64.Osawa T, et al. Crystal structure of chondroitin polymerase from Escherichia coli K4. Biochem. Biophys. Res. Commun. 2009;378:10–14. doi: 10.1016/j.bbrc.2008.08.121.
- 65.Young WW. Organization of golgi glycosyltransferases in membranes: Complexity via complexes. J. Membr. Biol. 2004;198:1–13. doi: 10.1007/s00232-004-0656-5.
- 66.Cheng F, et al. Amyloid precursor protein (APP)/APP-like protein 2 (APLP2) expression is required to initiate endosome-nucleus-autophagosome trafficking of glypican-1-derived heparan sulfate. J. Biol. Chem. 2014;289:20871–20878. doi: 10.1074/jbc.M114.552810.
- 67.Nesvizhskii AI, Keller A, Kolker E, Aebersold R. A statistical model for identifying proteins by tandem mass spectrometry. Anal. Chem. 2003;75:4646–4658. doi: 10.1021/ac0341261.
- 68.Cartmell A, et al. How members of the human gut microbiota overcome the sulfation problem posed by glycosaminoglycans. Proc. Natl Acad. Sci. USA. 2017;114:7037–7042. doi: 10.1073/pnas.1704367114.
- 69.Goubet F, Jackson P, Deery MJ, Dupree P. Polysaccharide analysis using carbohydrate gel electrophoresis. A method to study plant cell wall polysaccharides and polysaccharide hydrolases. Anal. Biochem. 2002;300:53–68. doi: 10.1006/abio.2001.5444.
- 70.Tryfona T, Stephens E. Analysis of carbohydrates on proteins by offline normal-phase liquid chromatography MALDI-TOF/TOF-MS/MS. In: Cutillas PR, Timms JF, editors. Methods in Molecular Biology. Vol. 658. Humana Press; 2010. p. 137–151.
- 71.Zivanov J, et al. New tools for automated high-resolution cryo-EM structure determination in RELION-3. eLife. 2018;7:e42166. doi: 10.7554/eLife.42166.
- 72.Zheng SQ, et al. MotionCor2: anisotropic correction of beam-induced motion for improved cryo-electron microscopy. Nat. Methods. 2017;14:331–332. doi: 10.1038/nmeth.4193.
- 73.Zhang K. Gctf: Real-time CTF determination and correction. J. Struct. Biol. 2016;193:1–12. doi: 10.1016/j.jsb.2015.11.003.
- 74.Hoh SW, Burnley T, Cowtan K. Current approaches for automated model building into cryo-EM maps using Buccaneer with CCP-EM. Acta Crystallogr. Sect. D. Struct. Biol. 2020;76:531–541. doi: 10.1107/S2059798320005513.
- 75.Emsley P, Lohkamp B, Scott WG, Cowtan K. Features and development of Coot. Acta Crystallogr. Sect. D. Biol. Crystallogr. 2010;66:486–501. doi: 10.1107/S0907444910007493.
- 76.Burnley T, Palmer CM, Winn M. Recent developments in the CCP-EM software suite. Acta Crystallogr. Sect. D. Struct. Biol. 2017;73:469–477. doi: 10.1107/S2059798317007859.
- 77.Croll TI. ISOLDE: A physically realistic environment for model building into low-resolution electron-density maps. Acta Crystallogr. Sect. D. Struct. Biol. 2018;74:519–530. doi: 10.1107/S2059798318002425.
- 78.Pettersen EF, et al. UCSF ChimeraX: Structure visualization for researchers, educators, and developers. Protein Sci. 2021;30:70–82. doi: 10.1002/pro.3943.
- 79.Afonine PV, Headd JJ, Terwilliger TC, Adams PD. New tool: phenix.real_space_refine. Comput. Crystallogr. Newsl. 2013;4:43–44.
- 80.Edgar RC. MUSCLE: A multiple sequence alignment method with reduced time and space complexity. BMC Bioinforma. 2004;5:113. doi: 10.1186/1471-2105-5-113.
- 81.Edgar RC. MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32:1792–1797. doi: 10.1093/nar/gkh340.
- 82.King N, et al. The genome of the choanoflagellate Monosiga brevicollis and the origin of metazoans. Nature. 2008;451:783–788. doi: 10.1038/nature06617.
- 83.Guan R, et al. Draft genome of the living fossil Ginkgo biloba. Gigascience. 2016;5:49. doi: 10.1186/s13742-016-0154-1.
- 84.Guan R, et al. Updated genome assembly of Ginkgo biloba. GigaScience Database. 2019. doi: 10.5524/100613.
- 85.Yin Y, et al. dbCAN: a web resource for automated carbohydrate-active enzyme annotation. Nucleic Acids Res. 2012;40:445–451. doi: 10.1093/nar/gks479.
- 86.Zhang H, et al. dbCAN2: a meta server for automated carbohydrate-active enzyme annotation. Nucleic Acids Res. 2018;46:W95–W101. doi: 10.1093/nar/gky418.
- 87.Eddy SR. Accelerated profile HMM searches. PLoS Comput. Biol. 2011;7:e1002195. doi: 10.1371/journal.pcbi.1002195.
- 88.Abascal F, Zardoya R, Posada D. ProtTest: Selection of best-fit models of protein evolution. Bioinformatics. 2005;21:2104–2105. doi: 10.1093/bioinformatics/bti263.
- 89.Darriba D, Taboada GL, Doallo R, Posada D. ProtTest 3: Fast selection of best-fit models of protein evolution. Bioinformatics. 2011;27:1164–1165. doi: 10.1093/bioinformatics/btr088.
- 90.Stamatakis A. RAxML-VI-HPC: Maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics. 2006;22:2688–2690. doi: 10.1093/bioinformatics/btl446.
- 91.Stamatakis A. RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014;30:1312–1313. doi: 10.1093/bioinformatics/btu033.
Data Availability Statement
The data that support this study are available from the corresponding authors upon reasonable request. Numerical data for Fig. 2d,e, Supplementary Fig. 2e, and Supplementary Fig. 3b can be found in the Source Data file. Cryo-EM maps for apo- and UDP-bound EXTL3 were deposited in the Electron Microscopy Data Bank under accession codes EMD-11923 (apo structure) and EMD-11926 (UDP-bound structure). Atomic co-ordinates for apo- and UDP-bound EXTL3 were submitted to the Protein Data Bank under accession codes 7AU2 (apo structure) and 7AUA (UDP-bound structure). The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifiers PXD032145 (regular EXTL3ΔN purification) and PXD032144 (CRISPR experiments). Proteomic search databases are available at UniProt (human proteome reference [https://www.uniprot.org/proteomes/UP000005640]) and The Global Proteome Machine (cRAP common contaminants database [https://www.thegpm.org/crap/]). Individual protein sequences were downloaded from the UniProtKB (HsEXT1: Q16394; HsEXT2: Q93063; HsEXTL1: Q92935; HsEXTL3: O43909) or TAIR (AtMUR3: AT2G20370; AtXLT2: AT5G62220; AtXUT1: AT1G63450; AtARAD1: AT2G35100; AtIRX10: AT1G27440; AtIRX10L: AT5G61840; AtIRX7: AT2G28110; AtIRX7L: AT5G22940; AtExAD: AT3G57630; AtXGD1: AT5G33290) databases. AlphaFold pre-computed structural predictions are available from the AlphaFold Protein Structure Database (HsEXT1: Q16394; HsEXT2: Q93063; HsEXTL3: O43909). 
Proteome models are available at NCBI Genome (Homo sapiens: RefSeq GCF_000001405.40; Drosophila melanogaster: RefSeq GCF_000001215.4; Caenorhabditis elegans: RefSeq GCF_000002985.6), JGI Phytozome (Arabidopsis thaliana: 167 [https://phytozome-next.jgi.doe.gov/info/Athaliana_TAIR10]; Physcomitrium patens: 318 [https://phytozome-next.jgi.doe.gov/info/Ppatens_v3_3]), JGI MycoCosm (Monosiga brevicollis: Monosiga brevicollis MX1 [https://mycocosm.jgi.doe.gov/Monbr1/Monbr1.home.html]), EnsemblMetazoa (Amphimedon queenslandica: Aqu1 [https://metazoa.ensembl.org/Amphimedon_queenslandica/Info/Index]), and GigaDB (Ginkgo biloba: 100613). The GT47 Hidden Markov Model used in this work is available from the dbCAN2 server (dbCAN-HMMdb-V9). Source data are provided with this paper.
The short Python script used to truncate alignments in this work (alitrunc.py) is available at Zenodo (doi: 10.5281/zenodo.6562402). Graphical views of phylogenetic trees were made using FigTree v1.4.4 [http://tree.bio.ed.ac.uk/software/figtree/].
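The accession codes listed above can be turned into direct download links for scripted retrieval. The following is a minimal sketch, assuming the public URL layouts of the RCSB PDB file server, the EMDB FTP archive and the UniProt REST API; these templates and the helper names (`pdb_mmcif_url`, `emdb_map_url`, `uniprot_fasta_url`, `fetch`) are illustrative assumptions, not part of the authors' deposited code or of any archive's official client.

```python
"""Build download URLs for the deposited EXTL3 data (illustrative sketch).

The URL templates below are assumptions based on each archive's public
layout; they may change and should be checked against the archive docs.
"""
from urllib.request import urlretrieve

# Accession codes as listed in the Data Availability Statement.
PDB_IDS = {"apo": "7AU2", "UDP": "7AUA"}
EMDB_IDS = {"apo": "EMD-11923", "UDP": "EMD-11926"}
UNIPROT_IDS = {
    "HsEXT1": "Q16394",
    "HsEXT2": "Q93063",
    "HsEXTL1": "Q92935",
    "HsEXTL3": "O43909",
}


def pdb_mmcif_url(pdb_id: str) -> str:
    """Hypothetical helper: RCSB download URL for an entry in mmCIF format."""
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.cif"


def emdb_map_url(emd_id: str) -> str:
    """Hypothetical helper: EMDB FTP URL for the primary map of an EMD entry."""
    number = emd_id.upper().replace("EMD-", "")
    return (
        "https://ftp.ebi.ac.uk/pub/databases/emdb/structures/"
        f"EMD-{number}/map/emd_{number}.map.gz"
    )


def uniprot_fasta_url(accession: str) -> str:
    """Hypothetical helper: UniProt REST URL for one sequence in FASTA format."""
    return f"https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def fetch(url: str, dest: str) -> str:
    """Download url to the local path dest (network I/O; never called here)."""
    path, _ = urlretrieve(url, dest)
    return path


if __name__ == "__main__":
    # Print the links only; call fetch() explicitly to download files.
    for state, pdb_id in PDB_IDS.items():
        print(state, pdb_mmcif_url(pdb_id), emdb_map_url(EMDB_IDS[state]))
    for name, accession in UNIPROT_IDS.items():
        print(name, uniprot_fasta_url(accession))
```

Keeping URL construction separate from downloading means the link templates can be checked offline, and only `fetch` touches the network.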