Abstract
Soluble HMW1C-like N-glycosyltransferases (NGTs) catalyze the glycosylation of Asn residues in proteins, a process fundamental for bacterial autoaggregation, adhesion and pathogenicity. However, our understanding of their molecular mechanisms is hindered by the lack of structures of enzymatic complexes. Here, we report structures of binary and ternary NGT complexes of Aggregatibacter aphrophilus NGT (AaNGT), revealing an essential dyad of basic/acidic residues located in the N-terminal all α-domain (AAD) that intimately recognizes the Thr residue within the conserved motif Asn0-X+1-Ser/Thr+2. Poor substrates and inhibitors such as UDP-galactose and UDP-glucose mimetics adopt non-productive conformations, decreasing or impeding catalysis. QM/MM simulations rationalize these results, showing that AaNGT follows a SN2 reaction mechanism in which the acceptor asparagine uses its imidic form for catalysis and the UDP-glucose phosphate group acts as a general base. These findings provide key insights into the mechanism of NGTs and will facilitate the design of structure-based inhibitors to treat diseases caused by non-typeable H. influenzae or other Gram-negative bacteria.
Subject terms: X-ray crystallography, Computational biology and bioinformatics, Enzymes
NGTs glycosylate asparagine residues in proteins, crucial for bacterial adhesion and pathogenicity. Here, the authors provide insights via crystallography and simulations, showing acceptor asparagine uses imidic form for catalysis and UDP-glucose phosphate group acts as general base.
Introduction
The attachment of sugar molecules to asparagine residues in proteins, also known as N-glycosylation, is a widely occurring post-translational modification found in most eukaryotes1 and some prokaryotes2,3. It has been reported that more than 7000 human proteins are N-glycosylated4. These N-glycans play essential roles in cellular function, mostly in eukaryotes, and are involved in processes such as protein folding and stability, protein trafficking, and signal transduction, playing a major role in health and disease5,6. Together with protein O-glycosylation, N-glycosylation is present in most approved or preclinical protein therapeutics7–10. It affects immunogenicity11 and potency12, motivating the close study of glycosylation pathways and glycosylation mechanisms5,13.
N-glycosylation is initiated by the membrane-bound oligosacharyltransferase enzyme (OST), which attaches a preassembled oligosaccharide to Asn residues in nascent glycoproteins. OST is a complex oligomeric enzyme in proteins of animals, plants, and fungi14, whereas it is a monomeric enzyme in bacteria, archaea, and protozoa15,16. It was discovered in 200317 that bacteria are able to perform a simpler version of N-glycosylation in which a single monosaccharide or a disaccharide, rather than a complex oligosaccharide, is attached to specific asparagine residues. This process is catalyzed by soluble N-glycosyltransferase enzymes (NGTs). Bacterial N-glycosylation, which is either accomplished by OSTs or NGTs, is important for bacterial survival, adhesion, autoaggregation, and pathogenicity6,16–18.
The first known example of an NGT was discovered in non-typeable Haemophilus influenzae17. This enzyme, termed HMW1C, was demonstrated to be capable of adding mono- and di-hexose units onto Asn residues of proteins, such as HMW1 adhesin18. Similar enzymes were later discovered in many other Gram-negative bacteria, including Actinobacillus pleuropneumoniae19, Haemophilus ducreyi and Mannheimia haemolytica20, Yersinia pestis and Escherichia coli10, Aggregatibacter aphrophilus11, Kingella kingae, and Bibersteinia trehalosi21.
Although OSTs and NGTs differ in the type of donor substrate (lipid-bound sugars for OSTs versus nucleotide-bound sugars for NGTs), they are both inverting GTs, i.e. they catalyze the formation of a glycosidic bond with inversion of configuration of the donor anomeric carbon22,23. They also share the Asn0-X+1(X ≠ P)-Ser/Thr+2 conserved sequence motif in their protein acceptor substrates24 (Fig. 1a). The numbering for the sequon follows a specific convention. The acceptor Asn residue is assigned the number 0, and the subsequent amino acids are then numbered in a positive manner. However, the architecture of OSTs and NGTs differ significantly. Structural studies have shown that bacterial and eukaryotic OST catalytic domains are multi-spanning membrane glycosyltransferases (GTs) that adopt a GT-C fold15,25,26. In contrast, NGTs are two-domain enzymes that consists of an N-terminal all-α domain (AAD) and a C-terminal catalytic domain that adopts a GT-B fold, as shown for Actinobacillus pleuropneumoniae NGT (ApNGT)23. In addition, while the OSTs require a metal, preferably Mn+2, to catalyze the reaction, no metal is necessary for the NGTs13,15,16,18.
Despite previous structural studies on the free form of ApNGT and in complex with UDP, the peptide substrate recognition and the catalytic mechanism of either OSTs or NGTs enzymes remain unknown27. Most inverting GTs follow a mechanism in which a catalytic base deprotonates the incoming nucleophile of the acceptor (here Asn; Fig. 1a). However, previous attempts to gain a better comprehension of NGT catalytic mechanism via site-directed mutagenesis of active site residues have been unsuccessful, raising questions regarding the identity of the catalytic base that deprotonates the Asn acceptor, which remains unidentified28.
Here, we have employed a multidisciplinary approach involving structural biology, synthetic chemical biology, kinetic experiments, and computational techniques to uncover the recognition of substrates and the catalytic mechanism of Aggregatibacter aphrophilus NGT (AaNGT), an inverting GT that is an orthologue of ApNGT. Our results indicate that Thr residues at positions +2 and +3 are the main determinants of peptide-NGT interactions, while poor donor substrates such as UDP-galactose (UDP-Gal), and inhibitors such as UDP-Glc mimetics, adopt unproductive conformations that compete with the peptide substrate, resulting in reduced or impeded catalysis, respectively. QM/MM metadynamics simulations show that AaNGT follows a concerted single-displacement SN2 mechanism in which the acceptor Asn attacks the sugar donor anomeric carbon via the imidic form and the α-phosphate of UDP-Glc acts as the catalytic base.
Results
Kinetic and binding of AaNGT against a model peptide and sugar nucleotides
To perform biophysical experiments using AaNGT, we designed a full-length construct that was expressed in E. coli (residues M1-I621; Supplementary Fig. 1 and “Methods” section). To evaluate its activity, we synthesized a peptide containing the GNWT motif that was found to be robustly glucosylated by other NGTs in the context of a longer peptide20. Additionally, our peptide contained a Thr residue at +3, which was demonstrated to improve glucosylation, and a Phe residue at −2, which was found to be tolerated in terms of glycosylation20. AaNGT showed a hyperbolic profile in the presence of variable concentrations of UDP-Glc or FGNWTT (Fig. 1b). The Kms for UDP-Glc and FGNWTT were determined to be 90 ± 30 and 79 ± 11 μM, respectively, and the kcat values were 20 ± 2 and 16 ± 1 min−1, respectively (Fig. 1b and Supplementary Table 1). These values are in accordance with previously reported kcat values for other NGTs, which range from 18 to 300 min−1 depending on the peptide sequences20,28. To investigate other sugar nucleotides, we compared the initial velocities of AaNGT using UDP-Glc, UDP-Gal, and UDP-Glc mimetics (UDP-2F-Glc and UDP-5S-Glc), being the latest prepared according to our previous methodologies29 (see Supplementary Figs. 3 and 4 and “Methods” section). We used 500 µM concentrations of the sugar nucleotides and the peptide (saturated conditions found for UDP-Glc and the peptide), and found that the initial velocity with UDP-Gal was ~40-fold lower than that of UDP-Glc and completely inactive with either UDP-2F-Glc or UDP-5S-Glc (Fig. 1c), implying that we could potentially trap co-crystals of AaNGT complexed to UDP-Glc mimetics and the peptide. Note that similar results with the sugar nucleotides were previously found for other NGTs and AaNGT20,28,30. We then performed isothermal titration calorimetry (ITC) experiments with the sugar nucleotides, UDP and the peptide. While reasonable titration curves were obtained with the sugar nucleotides and UDP (Fig. 1d and Supplementary Fig. 2), the peptide unfortunately precipitated during the ITC experiment, which impeded obtaining useful data. The results of our ITC experiments demonstrated that the product of the reaction, UDP, binds to AaNGT ~ 3–5.8-fold and 18-fold better than UDP-Glc mimetics and UDP-Gal, respectively (Fig. 1e and Supplementary Table 2). Analysis of the thermodynamic parameters indicated that the interaction of the enzyme with UDP and UDP-Gal is enthalpically favored, while that with UDP-2F-Glc and UDP-5S-Glc is entropically favored. These findings suggest that the binding modes and/or interactions of the ligand-protein complexes vary depending on the nucleotide (Supplementary Table 2).
Overall, our results of kinetic and binding experiments indicate that UDP-Glc is a preferred substrate of AaNGT compared to UDP-Gal, whereas UDP-Glc mimetics act as inhibitors of the enzyme. In addition, the data suggest that the enzyme binds worse to the sugar nucleotides than UDP.
Architecture of binary and ternary complexes of AaNGT
Having characterized AaNGT with different sugar nucleotides, UDP, and the peptide FGNWTT, we obtained P212121 crystals of AaNGT in complex with UDP-Gal, UDP-2F-Glc, and UDP plus the peptide (Table 1). Diffraction was poor for crystals obtained in the presence of UDP-5S-Glc, thus precluding obtaining useful data with this sugar nucleotide. Attempts to obtain a ternary complex that resembles a Michaelis complex by combining the UDP-2F-Glc or UDP-Gal with the peptide yielded a complex without peptide (see below for an explanation). These crystal structures have resolutions between 1.76 and 2.80 Å, and contain two molecules per asymmetric unit (Table 1). Root mean square deviation (RMSD) values of ~0.32 Å and 0.54 Å were computed between chains A and B of the asymmetric unit for the enzyme complex with UDP-sugar (UDP-Gal/UDP-2F-Glc) and UDP-peptide, respectively. This reveals that the two AaNGT molecules are slightly more different in the complex with UDP and the peptide compared to the complexes with UDP-sugar. Analysis of the AaNGT structure with the DALI server31 revealed structural homology to three top hits: ApNGT (PDB entries 3Q3E, 3Q3I, and 3Q3H), the protein O-fucosyltransferase SPINDLY from Arabidopsis thaliana (PDB entry 7Y4I) and the Xanthomonas campestris putative O-GlcNAc transferase, OGT (PDB entries 2VSY, 2JLB and 2VSN). As expected, ApNGT has the lowest RMSD of 1.4 Å on 618 aligned residues, due to the high structural homology between both NGTs (Fig. 2a and Supplementary Fig. 1). On the contrary, the second and third top hits superimposed less well with AaNGT (RMSDs of ~3.4 and ~4.8 Å between SPINDLY and AaNGT, and XcOGT and AaNGT crystal structures, respectively; the superimposed residues ranged from 453 to 467 residues). This is likely due to their different GT-B fold catalytic domains, as these GTs bind to different sugar nucleotides and glycosylate different protein substrates32–34. Furthermore, NGTs possess a distinct N-terminal AAD domain, while XcOGT and SPINDLY contain N-terminal TPRs, which differ in fold from the AAD domain.
Table 1.
AaNGT-UDP-Gal | AaNGT-UDP-2F-Glc | AaNGT-UDP-peptide | |
---|---|---|---|
Data collection | |||
Space group | P212121 | P212121 | P212121 |
Wavelength (Å) | 0.9792 | 0.9792 | 0.9792 |
Cell dimensions | |||
a, b, c (Å) | 47.34, 113.33, 260.11 | 47.07, 114.00, 258.01 | 47.03, 111.48, 256.37 |
α, β, γ (°) | 90, 90, 90 | 90, 90, 90 | 90, 90, 90 |
Number of protein molecules per asymmetric unit | 2 | 2 | 2 |
Resolution (Å) | 20–1.76 (1.86–1.76)a | 258.01–2.73 (2.88–2.73)a | 20–2.80 (2.95–2.80)a |
Rmerge | 0.065 (1.796) | 0.104 (2.173) | 0.177 (2.049) |
Rpim | 0.025 (0.691) | 0.044 (0.914) | 0.063 (0.740) |
Mn(I) half-set correlation CC(1/2) | 0.999 (0.430) | 0.998 (0.458) | 0.996 (0.352) |
I / σI | 14.2 (1.2) | 8.7 (0.9) | 8.2 (2.1) |
Completeness (%) | 99.9 (100) | 100 (100) | 99.7 (100) |
Redundancy | 7.7 (7.7) | 6.6 (6.6) | 8.4 (8.1) |
Total number of reflections | 1074677 | 250180 | 286978 |
Total number unique reflections | 139852 | 38131 | 34243 |
Refinement | |||
Resolution (Å) | 1.76 | 2.73 | 2.80 |
Rwork/Rfree | 0.173 (0.2096) | 0.197/0.245 | 0.178/0.237 |
No. atoms | |||
Protein | 9958 | 9966 | 9919 |
Peptide | – | – | 79 |
UDP | – | 25 | 50 |
Ethylenglycol | 102 | – | – |
UDP-Gal | 36 | – | – |
UDP-2F-Glc | – | 36 | – |
Waters | 787 | 8 | 5 |
B-factors (Å2) | |||
Protein | 40.66 | 111.37 | 88.56 |
Peptide | – | – | 109.35 |
UDP | – | 144.216 | 86.30 |
Ethylenglycol | 57.15 | – | – |
UDP-Gal | 47.30 | – | – |
UDP-2F-Glc | – | 124.72 | – |
Waters | 45.87 | 74.16 | 60.51 |
R.m.s. deviations | |||
Bond lengths (Å) | 0.0138 | 0.0061 | 0.0095 |
Bond angles (°) | 1.8771 | 1.396 | 1.6565 |
One crystal was used to determine the crystal structure.
aValues in parentheses are for highest-resolution shell.
The heart-shaped structure of AaNGT (Fig. 2a), either complexed to UDP-Gal/UDP-2F-Glc or UDP-peptide, shows the N-terminal AAD domain and C-terminal GT-B fold catalytic domain, which is composed of an N- and C-Rossman fold subdomains. All ligands were well resolved, except for the Phe1-2 side chain of the peptide, which did not display any density (Fig. 2b), likely explaining why Phe at site -2 is only tolerated by NGTs without any preference20. A surface representation of the ternary complex reveals that UDP is partly buried and mainly recognized by the C-Rossmann fold subdomain while the peptide is more solvent exposed and tethered mostly by the N-Rossmann fold subdomain and the AAD domain (Fig. 2c).
The sugar nucleotide binding site of AaNGT
The AaNGT substrate binding site is formed by the sugar nucleotide and the peptide binding sites (Fig. 3a). The uracil moiety of UDP/UDP-Gal/UDP-2F-Glc is stabilized by CH–π interactions with His494 and Tyr500, alongside hydrogen bonds to the side chain and backbone of Ser495. The ribose moiety is recognized in all three complexes by hydrogen bonds to Asp524. Additionally, interactions with Arg280 are observed in the ternary complex. The UDP pyrophosphate of the ternary complex (AaNGT + UDP + peptide) exhibits more interactions than in the binary complexes, with Thr519, Asn520, and Gly521 backbones, and Ser277, Thr437, Lys440, and Asn520 side chains contributing to this. In contrast, the UDP-Gal and UDP-2F-Glc pyrophosphates are only recognized by the side chains of Lys440 and Asn520. The sugar molecules interact with the side chains of His276 and Ser277. Furthermore, the Gal moiety specifically interacts with the Gly369 backbone, while the 2F-Glc moiety interacts with the side chain of His370 (Fig. 3a). Therefore, these more extensive interactions of UDP with the enzyme could explain the higher affinity of UDP compared to the sugar nucleotides (Figs. 1e and 3a). Furthermore, superposition of the nucleotide structures reveals significant differences in the uracil moiety for UDP versus UDP-Gal/UDP-2F-Glc (specifically, the β-phosphate orientation differs significantly), as well as in the ribose and pyrophosphate between all structures and the sugar moieties (Fig. 3b). This suggests that the binding of the nucleotides to the enzyme is flexible and dynamic, which could contribute to the catalytic pathway of NGTs.
The peptide binding site of AaNGT
Multiple hydrogen bond interactions are observed at the peptide binding site in the ternary complex (AaNGT-peptide-UDP). These interactions include the backbone atoms of Phe1-2 and Ser277, the backbone of Asn30 and the UDP α-phosphate, the side chain of Asn30 and Gly369, the backbone of Thr5+2 and the Arg177 side chain, the Thr5+2 and Asp215 side chains, the Thr6+3 backbone and the Arg177 side chain, and the Thr6+3 and His214 side chains (Fig. 3a). The structural data demonstrate that the key residues involved in recognition of the peptide are Arg177, His214, and Asp215 from the AAD domain. Notably, Thr5+2, which is part of the Asn0-X+1-Ser/Thr+2 sequence motif, is differently recognized by OSTs, which instead use the WWD sequence to recognize it. This indicates that the mechanisms employed to recognize peptide substrates by NGTs and OSTs differ. Furthermore, the interactions between Thr6+3 and Arg177/His214 provide an explanation for the previously reported kinetic data on other NGTs, which showed that Thr at position +3 improves the kinetics against peptides with this particular residue20.
CH-π interactions between the peptide and the protein were also observed, in particular those involving Trp4+1 and the methylene group of Gln468 (AaNGT). The latter residue is located at the peptide binding site and its importance for enzyme activity was previously demonstrated. In fact, the equivalent residue in ApNGT (Gln469) was found to be deleterious for NGT activity, whereas its mutation to Ala significantly improved NGT activity35. The superposition of all ligands for all three structures (AaNGT/UDP, AaNGT/UDP-Gal, and AaNGT/UDP-2F-Glc) clearly shows that the sugar moieties of UDP-Gal and UDP-2F-Glc collide with Gly2-1 and Asn30 (Fig. 3b). This suggests that the sugar nucleotides adopt unproductive conformations in their respective enzyme complexes, which could explain why we were unable to get a ternary complex resembling a Michaelis complex at a structural level. Furthermore, this could also explain why UDP-Gal is a poor substrate, and UDP-Glc mimetics are inhibitors of AaNGT.
To gain insight into the role of the AAD residues of AaNGT involved in peptide recognition, we tested Ala mutations of Arg177, His214, and Asp215 to Ala residues. The resulting mutants were characterized at the in vitro level under the same conditions used for the wild-type enzyme. The results showed that R177A and D215A are inactive while H214A exhibit an 11-fold and 27-fold decrease in activity and catalytic efficiency compared to those of the WT. Additionally, slight variations were observed regarding the Kms of the peptide (Fig. 3c and Supplementary Table 1). A triple mutant to Ala residues, as expected, was also inactive (Fig. 3c).
In summary, the structural analysis shows that NGT shares two critical and conserved residues among NGTs (Supplementary Fig. 1), Arg177 and Asp215, in the AAD domain that recognize Thr and likely Ser at the +2 position of the Asn0-X+1-Ser/Thr+2 sequence motif. The Asn at the acceptor position 0 in the sequence motif (Asn30) is located in front of the UDP pyrophosphate, although its side chain does not participate in interactions with any residue of AaNGT that could potentially serve as a catalytic base. Consequently, our ternary complex does not supply any evidence on how the Asn is glycosylated and thus no insights into the NGT catalytic mechanism could be inferred.
Modeling the Michaelis complex
To get insight into the catalytic mechanism of AaNGT, we turned to computer simulation using the structures determined in the present work. As pointed out above, the Michaelis complex, i.e. the complex of AaNGT with UDP-Glc and the peptide acceptor, cannot be reconstructed by structural superposition of the ternary and binary complexes (AaNGT-UDP-Gal and AaNGT-peptide, respectively) due to strong steric clash between the sugar and the peptide (Fig. 3b), which precludes using structural superposition to start Molecular dynamics (MD) simulations. Therefore, we used molecular docking to insert UDP-Glc in the binding site of the AaNGT-peptide binary complex. This resulted in a structure with no steric clash that provided a very good starting point for MD simulation.
Interestingly, the binding pose of UDP-Glc in the modeled Michaelis complex was found to be remarkably similar to that observed for the UDP-GlcNAc donor in O-GlcNAc transferase (OGT). O-GlcNAc transferase is an inverting glycosyltransferase with a GT-B fold highly similar to that of AaNGT. It is noteworthy that despite OGT catalyzing a distinct reaction (O-glycosylation), both enzymes are classified in the same CAZy family, GT41, owing to their similarities at the catalytic domain level36. MD simulations for up to 400 ns (Supplementary Fig. 5) were performed to further accommodate the UDP-Glc and the acceptor peptide in the active site. As shown in Fig. 4a, the hydroxyl substituent groups of the donor glucose are engaged in hydrogen bond interactions with the side chain of Asn520 and the backbone of Leu368. The interaction of the 2-OH and the β-phosphate group could explain why the enzyme cannot recognize UDP−2F-Glc in a productive manner. Likewise, the interactions between OH3 and OH4 of Glc with Leu368 could explain the poor interaction with UDP-Gal.
The simulations showed that the peptide Asn residue (Asn30) is located on the β face of the donor sugar, opposite to the sugar-phosphate bond, with the amino group close to the Glc anomeric carbon (N-C1 ≈ 3.7 Å, see Fig. 4b). It was previously suggested that the Asn uses a twisted form of the amide group as a prerequisite to enhance the nucleophilicity of the amine group in order to facilitate its attack on the sugar anomeric carbon in OST15,25,26. Such twist of the amide was not observed in AaNGT, which shows the amide of the Asn in its most common planar conformation (Fig. 4). The different architectures of the two active sites, in particular the lack of metal-coordinating residues in NGTs, is likely the cause of this difference.
As inferred from the crystal structure complexes, no residue serving as a general catalytic base in the anticipated SN2 reaction could be identified. This was attributed to the lack of hydrogen bonding between the amino group of Asn30 and a nearby amino acid that could potentially act as a catalytic base. However, the amino group forms a persistent hydrogen bond interaction with one of the negatively charged oxygen atoms of the α-phosphate (H··Oα ≈ 1.8 Å, Fig. 4c). This suggests that the α-phosphate could deprotonate Asn30 during the SN2 reaction. In fact, the α-phosphate has been proposed to be the general base in other glycosylation mechanisms, such as the O-glycosylation mechanism of the closely related O-GlcNAc transferase (OGT) and the recently investigated plant protein O-fucosyltransferase SPINDLY (in both cases, the α-phosphate is presumed to deprotonate the hydroxyl group of Ser/Thr)34,37. The possible catalytic base character of the α-phosphate is further supported by NMR and computational experiments that demonstrate that the pKa of the α-phosphate in UDP, UDP-GlcNAc, and UDP-S-GlcNAc is ∼6.538,39.
Modeling the N-glycosylation reaction
To model the reaction mechanism of AaNGT, we selected one representative snap-shot of the classical MD simulation and performed quantum mechanics/molecular mechanics (QM/MM) MD simulations (for 5 ps), using a QM region that included most of the donor molecule (Glc and the two phosphate groups), and the side chain of Asn30 (44 QM atoms, 106371 MM atoms). The active site remained in a similar configuration as in the previous MD simulations, in which the Asn30 amino group is engaged in a hydrogen bond with the α-phosphate. Afterwards, we tried to model the N-glycosylation reaction using QM/MM metadynamics, as well as other enhanced-sampling methods (see Methods) previously used to study glycosylation reactions40–43. Unfortunately, all our attempts to obtain a feasible reaction pathway failed. Even though the Asn30 residue became glycosylated during the simulation (Supplementary Fig. 6), the free energy barrier of the reaction was found to be huge (>50 kcal/mol), indicating that N-glycosylation via the amide side chain of Asn30 is not feasible. We reasoned that this is due to the particular orientation of Asn3p with respect to the sugar donor, which results in an unfavorable stereochemistry for the approach of the nucleophile (<N-C1-OP ≈ 115°, very far from the optimum value of 180°).
Surprisingly, one of the attempts to model the N-glycosylation reaction resulted in tautomerization of the amide group of Asn30 via one oxygen atom of the α-phosphate. In other words, one of the amide protons transferred to Oα and, subsequently, the amide carbonyl abstracted the proton from Oα, resulting in the imidic form of Asn30. It was interesting to observe that the N atom of the imidic Asn is better poised for nucleophillic attack than the Michaelis complex with Asn30 in the amide form. In particular, the NAsn-C1-O1 angle in the imidic Asn increases by ≈ 15° (from 115° to 130°) with respect to the angle in the amide complex (Supplementary Fig. 8). At the same time, the hydrogen atom of the Asn30 hydroxyl group forms a hydrogen bond with the α-phosphate, favoring proton transfer. These results made us think that the imidic form of Asn30, rather than the most common amidic form, could be operative in the N-glycosylation reaction.
To evaluate the possibility of N-glycosylation via the imidic form of Asn30, we performed QM/MM metadynamics44 simulations of the chemical reaction, starting from the imide form Asn30, which turned out to be stable in the active site (both in MD and QM/MM MD simulations). We used two collective variables corresponding to the main covalent bonds that need to be formed or broken during the reaction: the nucleophilic attack distance (N-C1) and the leaving group distance (C1-OP) (Fig. 5a). During the simulation, the system successfully evolved from the Michaelis complex (AaNGT + UDP-Glc + peptide) to the reaction products (AaNGT + UDP + Glc-peptide), in which Asn30 is glycosylated (Fig. 5b, c). The reaction free energy landscape (Fig. 5b) shows a unique transition state (TS), thus it is consistent with a concerted one-step SN2 reaction. The computed free energy barrier (24.9 kcal/mol) is still somewhat higher than the one estimated from the experimental rate constant (18.4 kcal/mol, assuming Transition State Theory)45, probably due to an imperfect position of the phosphate groups in the initial structures. However, it is similar to the one previously computed for OGT (23.5 kcal/mol)46. Most importantly, the free energy barrier is much reduced compared to the one obtained for the reaction via the amide form of Asn30 (> 50 kcal/mol), indicating that the reaction occurs preferably via the imidic form of Asn30.
Representative structures of the MC, transition state (TS), and reaction products (P) are shown in in Fig. 5d. Interestingly, a hydrogen bond between the α-phosphate and Trp4+1 is present at the MC, which probably contributes to position the peptide in the active site. This might explain recent experimental results that show that particular peptide sequences with a Trp at that position improves the glucosylation efficiency20. The computed reaction pathway also shows a TS in which the sugar-phosphate bond is being broken (C1-OP = 2.2 Å) and the sugar-Asn30 bond is being formed (C1-N = 2.1 Å) (Fig. 5c and Supplementary Table 3). Simultaneously, the proton of the imide hydroxyl group is being transferred, leading to the products of the reaction, which is lower in energy with respect to the reactants. Note that the structure of the reaction products does not depend on the tautomerization state of Asn30 at the initial state (MC).
Finally, we sought to elucidate the most likely mechanism of Asn tautomerization in the AaNGT active site. To this end, we performed QM/MM metadynamics simulations of the tautomerization process considering two possible scenarios: tautomerization mediated by the α-phosphate or tautomerization via active site water molecules (Supplementary Fig. 7). In both cases, two collective variables were used to drive the system from the amidic to the imidic form of the Asn30 side chain. Whereas tautomerization via the α-phosphate involves an energy barrier of 29.5 kcal/mol, the energy barrier reduces to 17.5 kcal/mol when Asn30 undergoes tautomerization via water molecules. This indicates that asparagine tautomerization in the active site is feasible and it is mediated by active site water molecules that are properly positioned for proton shuttle.
In summary, our simulations suggest that the N-glycosylation reaction in AaNGT, and probably other NGTs, takes place via a SN2 reaction in which the Asn30 in its imidic form attacks the sugar donor anomeric carbon. The nucleophilic attack is assisted by proton transfer to the α-phosphate, which can be considered as the general base of the N-glycosylation reaction in AaNGT (Fig. 5e).
Discussion
N-glycosylation is the most common post-translational modification (PTM) of proteins47. However, it is intriguing to understand why Nature has chosen the amide group of Asn residues, which is one of the least reactive nucleophiles in proteins, for glycosylation. Although the exact reason for selecting Asn residues for glycosylation remains unknown, we can investigate the mechanistic strategies utilized by various types of GTs to accomplish this PTM, which have arisen throughout evolution and are widespread in eukaryotes and certain bacteria. These GTs include single or complex membrane-bound oligosaccharyltransferases (OSTs) and soluble N-glycosyltransferases (NGTs).
In this study, we adopted a multidisciplinary approach to shed light on the mechanism of N-glycosylation catalyzed by NGTs. We focused on AaNGT, an inverting GT that catalyzes the transfer of Glc from UDP-Glc to Asn residues in the consensus sequence Asn-X-Ser/Thr. We provide high-resolution crystal structures for the enzyme in complex with a poor donor substrate (UDP-Gal) and a donor mimic (UDP-2F-Glc), as well as the enzyme in complex with a peptide acceptor (FGNWTT) and UDP, together with kinetic/ITC experiments and QM/MM metadynamics simulations of the reaction mechanism. Our results indicate that Thr residues at positions +2 and +3 are the main determinants of peptide-NGT interactions and the AAD domain recognizes these residues. Donor substrates less favorable than UDP-Glc, like UDP-Gal and inhibitors such as UDP-Glc mimetics, adopt unproductive conformations that compete with the peptide substrate, ultimately leading to decreased or hindered catalysis. These results are confirmed by MD simulations of the reconstructed Michalis complex (Fig. 4a), which show a very different conformation of the Glc in UDP-Glc compared with the ones observed for UDP-Gal and UDP-2F-Glc (Supplementary Fig. 9). The modeled MC shows that, as expected from the ternary complex structure, the side chain of the acceptor asparagine is not engaged in any interaction with a basic protein residue that could act as a general base in the reaction. However, the amino group of the amide side chain interacts with the α-phosphate of UDP, suggesting a pathway for acceptor deprotonation that is reminiscent of the one proposed for the closely related O-GlcNAc transferase (OGT), an inverting GT from the same GT family that shares the GT-B architecture37,46,48.
QM/MM metadynamics simulations of the reaction mechanism showed that the reaction starting from the most common amide of Asn0 involves a high energy barrier, which we attribute to a highly restrained nucleophilic attack stereochemistry. However, Asn0 can tautomerize in the active site and adopt the imidic form, which exhibits a more favorable configuration for the nucleophilic attack, resulting in a significant decrease of the reaction free energy barrier of the reaction. Our findings indicate that tautomerization of Asn residues is crucial for glycosylation in NGTs. Recently, asparagine tautomerization has also been invoked to mediate or initiate glycosylation reactions in other carbohydrate-active enzymes, such as protein O-fucosyltransferase 1 (POFUT1)49, cellulase Cel45A50 and GH85 endo-β-glucosaminidases51.
Another significant discovery from our study is the identification of the α-phosphate as the base responsible for deprotonating the acceptor Asn. This differs from the proposed mechanism for OSTs, which involves a twisted amide promoted by the binding of the amide group to metal-coordinated acidic residues15, but is similar to the one proposed37 (and confirmed by QM/MM simulations46) for O-glycosylation catalyzed by OGT. Additionally, we found that NGTs recognize Thr+2 differently from OSTs15,26, suggesting that the two enzymes might employ distinct mechanisms to achieve N-glycosylation.
NGTs are not only interesting for their intriguing structural and mechanistic aspects, but also for their biotechnological properties as possible therapeutic targets for treating infectious diseases. In terms of biotechnological applications, NGTs have been targeted to synthesize more homogeneous N-glycans in combination with endo-β-N-acetylglucosaminidases (ENGases) and synthetic oligosaccharides. This approach has proven useful in enhancing the stability of therapeutic peptides such as glucagon-like peptide 152. Moreover, NGTs have been employed alongside subsequent GTs to produce custom-made glycoproteins and glycoprotein-based nanomaterials with potential biomedical applications53. However, it is worth noting that NGTs primarily add glucose onto Asn residues, while GlcNAc residues are more desirable in certain cases for N-glycans in proteins from eukaryotes. Our structural and mechanistic insights might provide a potential solution to facilitate the engineering of NGTs to achieve N-GlcNAcylation. Furthermore, our work also opens up possibilities for synthetizing structure-based inhibitors to treat diseases caused by non-typeable H. influenzae or other Gram-negative bacteria, considering the high similarity between the active sites of NGTs.
In summary, our experimental and computational work reveals the molecular basis of UDP-Glc and peptide recognition by AaNGT and propose that the enzyme follows a concerted single-displacement mechanism using one UDP phosphate group as general base. Additionally, we highlight the importance of the tautomeric form of Asn acceptor residues as a necessary step for glycosylation. This study exemplifies how GTs employ different strategies to activate less reactive nucleophilic groups such as Asn residues to achieve glycosylation.
Methods
Cloning and purification of AaNGT
The DNA sequence encoding amino acid residues of the AaNGT protein (aa 1-621) was codon optimized and synthesized by GenScript (USA) for expression in E. coli cells. This construct was a gift from Dr. Min Chen at Shandong University30. The construct was subcloned in pMALC2x, rendering the vector pMALC2x-10Hist-PP-AaNGT. The plasmid contained a sequence encoding a 10xHis tag and a PreScission protease (PP) cleavage site between the maltose binding protein (MBP) and the protein of interest. AaNGT mutants (R177A, H214A, D215A, and R177A-H214A-D215A) were generated by GenScript via site-directed mutagenesis using the vector pMALC2x-10HistTAG-PP-AaNGT.
The plasmids were transformed into BL21 (DE3) Gold cells and colonies were selected on LB/Agar plates containing 100 μg/ml of ampicillin. The cultures were grown at 37 °C in 2XTY medium (16 g/l tryptone, 10 g/l yeast extract powder, 5 g/l NaCl, pH 7.5) containing 100 μg/ml of ampicillin. When the optical density of the cultures reached 0.6, they were induced with 1 mM IPTG (isopropyl β-d-thiogalactoside) and incubated at 18 °C for 18 h. The cells were harvested by centrifugation at 17,700 × g at 4 °C for 10 min and resuspended in buffer A (25 mM TRIS pH 7.5, 300 mM NaCl, 10 mM Imidazole). The protein was then loaded onto a His-Trap column (GE Healthcare) and eluted with an imidazole gradient from 10 mM to 500 mM (buffer B: 25 mM TRIS pH 7.5, 300 mM NaCl, 500 mM imidazole). Buffer exchange to buffer C (25 mM TRIS pH 7.5, 150 mM NaCl) was carried out using a HiPrep 26/10 Desalting Column (GE Healthcare). To remove the MBP, the protein PP was added to the fusion protein and the mix was incubated at 4 °C for 18 h. The cleavage of the fused protein was confirmed with a SDS-page gel and the MBP was removed with a His-Trap Column. The fractions containing the AaNGT protein were then concentrated to ~2.5 mL using centrifugal filter units of 30,000 MWCO cutoff (Millipore). Subsequently, gel filtration chromatography was carried out using Superdex 75 XK26/60 column (Cytiva) in buffer C to further remove impurities. The protein was then concentrated once again and protein concentration was measured by absorbance at 280 nm and by using its theoretical extinction coefficient (ε280nmAaNGT = 67270 M−1cm−1). Note that a similar extinction coefficient was used to determine the concentration for the mutants. If the protein was used to produce crystals, it was buffer exchanged to buffer D (25 mM TRIS pH 7.5) prior to get concentrated. Additionally, the purity of the proteins was confirmed by SDS-PAGE.
Isothermal titration microcalorimetry
ITC was used to characterize the interaction of AaNGT with different ligands. All experiments were carried out in an Auto-iTC200 (Microcal, GE Healthcare) at 25 °C with AaNGT at 100 μM and UDP at 1.2 mM, UDP-2F-Glc/UDP-5S-Glc at 4 mM and UDP-Gal at 3.5 mM in 25 mM TRIS pH 7.5 150 mM NaCl. All the experiments were repeated at least two times independently with similar results, and one representative plot with the derived dissociation constant KD and standard error of fitting for each experiment is shown (Fig. 1d, e, and Supplementary Fig. 2 and Table 2). Data integration, correction, and analysis were carried out in Origin 7 (Microcal) and the data were fitted to a one-site equilibrium-binding model. Stoichiometry (n) of binding in all cases was ~1:1.
Kinetic analysis
Enzyme kinetics for the wild type and mutants were determined using the UDP-Glo luminescence assays (Promega). Initially, the wild type and mutants were tested in reactions containing 500 nM of enzyme in 25 mM Tris pH 7.5, 150 mM NaCl, 500 μM of peptide FGNWTT and 500 μM of UDP-Glc. To determine the Km of UDP-Glc, the reaction contained 500 μM of the peptide and UDP-Glc concentrations ranging from 5 μM to 1 mM. To measure the Km of the peptide, the reaction contained 500 μM UDP-Glc and the peptide concentrations ranging from 5 μM to 1 mM. To assess the different nucleotides (UDP-Glc, UDP-Gal, UDP-2F-Glc, and UDP-5S-Glc), reactions contained 500 nM of the wild-type enzyme in 25 mM Tris pH 7.5, 150 mM NaCl, and 500 μM of the sugar nucleotides and the peptide FGNWTT. The reactions were incubated for 30 min at 37 °C and stopped using 5 μl of UDP-detection reagent at a 1:1 ratio in a white, opaque 384-well plate and incubated in the dark for 1 h at room temperature before measuring with a Synergy HT (Biotek). To estimate the amount of UDP produced in the glycosyltransferase reaction, we created a UDP standard curve. The values were corrected against the UDP-Glc (or other sugar nucleotides when applicable) hydrolysis and were fit to a non-linear Michaelis–Menten program in GraphPad Prism 8 software from which the Km, kcat, and Vmax along with their standard errors were obtained. All experiments were performed in duplicate.
Solid-phase peptide synthesis
The peptide was synthesized by stepwise microwave-assisted solid-phase synthesis on a Liberty Blue synthesizer using the Fmoc strategy on Rink Amide MBHA resin (0.1 mmol). All other Fmoc amino acids (5.0 equiv.) were automatically coupled using oxyma pure/DIC (N,N′-diisopropylcarbodiimide). The peptide was then released from the resin, and all acid-sensitive sidechain protecting groups were simultaneously removed using TFA 95%, TIS (triisopropylsilane) 2.5% and H2O 2.5%, followed by precipitation with cold diethyl ether. The crude products were purified by HPLC on a Phenomenex Luna C18(2) column (10 μm, 250 mm × 21.2 mm) and a dual absorbance detector, with a flow rate of 10 mL/min.
Peptide preparation
The peptide used in this work was dissolved at 100 mM in buffer 25 mM Tris-HCl pH 7.5. The pH of each solution was measured with pH strips and when needed adjusted to pH 7–8 through the addition of 0.1–5 μL of 2 M NaOH.
Crystallization
Crystals of the AaNGT complexes were grown by sitting drop experiments at 18 °C by mixing 0.4 μl of protein solution (17 mg/mL AaNGT and 5 mM ligands in buffer D “25 mM TRIS pH 7.5”) with an equal volume of a reservoir solution. The crystals for the different complexes were obtained in different conditions. The crystals for the AaNGT-UDP-peptide complex were obtained in a 22% polyacrylic acid 5100 sodium salt, 100 mM HEPES sodium salt pH 7. We further soaked these crystals with the same condition saturated with peptide (a tiny little amount of solid was dissolved in the drop with the crystals) and 25 mM UDP for 30 min before flash freezing in a cryoprotectant solution containing 20% ethylene glycol. The crystals for the AaNGT-UDP-Gal complex were obtained in 0.1 M magnesium chloride, 0.1 M Na HEPES pH 7.5, 10% (w/v) PEG 4000 solution. These crystals were further soaked 5 min with 37.5 mM UDP-Gal prepared in buffer D and flash frozen in the cryoprotectant solution. The crystals for the AaNGT-UDP-2F-Glc complex were obtained in a 0.1 M calcium acetate, 0.1 M sodium acetate pH 4.5, 10% (w/v) PEG 4000 solution. We further soaked these crystals with 30 mM UDP-2F-Glc for 5 min before flash freezing in the cryoprotectant solution.
Structure determination and refinement
Diffraction data for the three crystals of AaNGT were collected on synchrotron beamlines I03 of the Diamond Light Source (Harwell Science and Innovation Campus, Oxfordshire, UK) and XALOC beamline at the ALBA synchrotron (Barcelona, Spain) at a wavelength of 0.97 Å and a temperature of 100 K. XDS54 and CCP4 software packages55 were used for data processing and scaling. Relevant statistics are presented in Table 1. Molecular replacement with Phaser55 and PDB entry 3Q3E as a template was used to solve the crystal structures. Initial phases were further improved by cycles of manual model building in Coot56 and restrained refinement with REFMAC555. Further rounds of model building in Coot with TLS refinement in REFMAC5 were performed for all complexes. The crystal structures were validated with PROCHECK and model statistics are presented in Table 1. The Ramachandran plot for the AaNGT-UDP-Gal complex shows that 95.1%, 4.6%, 0.1%, and 0.2% of the amino acids are in most favored, allowed, generously allowed and disallowed regions, respectively. The Ramachandran plot for the AaNGT-UDP-2F-Glc complex shows that 90.6%, 8.8%, 0.4%, and 0.2% of the amino acids are in most favored, allowed, generously allowed and disallowed regions, respectively. The Ramachandran plot for the AaNGT-UDP-peptide complex shows that 90.1%, 8.6%, 1.0%, and 0.3% of the amino acids are in most favored, allowed, generously allowed and disallowed regions, respectively. The asymmetric unit of the P212121 crystals contained two molecules of AaNGT.
NMR
1H and 19F NMR spectra were recorded at 400 and 376 MHz using a Bruker AVANCE 400 Plus Nanobay in chloroform-d or deuterium oxide. 13C and 31P NMR spectra were recorded at 101 and 162 MHz with the same instruments in chloroform-d or deuterium oxide. Chemical shifts are given in ppm (δ) and referenced to tetramethylsilane or to the internal solvent signal used as an internal standard. Assignments in the NMR spectra were made by first-order analysis of the spectra, and were supported by 1H−1H COSY, 1H−13C HMQC correlation results. High-resolution mass spectrometry was performed on a Waters Synapt G2-S HDMS spectrometer. Unless otherwise stated, all the commercially available solvents and reagents were purchased from FUJIFILM Wako Pure Chemical Corporation and Merck KGaA without further purification. During purification by silica gel column chromatography, the absorbances of all fractions were measured at 262 nm using a JASCO UV-2075 Plus detector.
Synthesis of UDP-2F-Glc and UDP-5S-Glc
Synthesis of UDP-2F-Glc (7) and UDP-5S-Glc (14) and their precursor species (1-6 and 8-13; see Supplementary Figs. 3 and 4) was performed as described below.
1,3,4,6-Tetra-O-acetyl-β-D-mannopyranose 1 (200 mg, 574 μmol) was dissolved in 1,4-dioxane (5.00 mL) and cooled at 0 °C. The solution was added (diethylamino)sulfur Trifluoride (220 μL, 1.68 mmol) and the mixture was then heated at 100 °C by irradiating microwave for 5 min. The mixture was diluted with dichloromethane and washed with ice water, aqueous sodium hydrogen carbonate and brine, dried over anhydrous sodium sulfate, filtered, and evaporated. The residue was purified by silica gel chromatography with 5:1 to 3:1 (v/v) hexane:ethyl acetate to give compound 2 (1,3,4,6-Tetra-O-acetyl-2-deoxy-2-fluoro-β-D-glucopyranose; 182 mg, 91%); 1H NMR (400 MHz, CDCl3) δ 5.77 (dd, 1H, J1,2 = 8.1 Hz, J1,F = 3.1 Hz, H-1), 5.37 (dt, 1H, J3,F = 14.4 Hz, J2,3 = 9.1 Hz, J3,4 = 9.2 Hz, H-3), 5.06 (t, 1H, H-4), 4.45 (ddd, 1H, J2,3 = 9.1 Hz, J2,F = 50.8 Hz, H-2), 4.29 (dd, 1H, H-6a, J6a,6b = 12.6 Hz), 4.14 (dd, 1H, H-6b), 3.86 (ddd, 1H, H-5), 2.18 (s, 3H, Ac), 2.09 (s, 3H, Ac), 2.08 (s, 3H, Ac), 2.04 (s, 3H, Ac); 19F NMR (376 MHz, CDCl3) δ –200.9 (ddd, JF,2 = 53.7 Hz, JF,1 = 2.9 Hz, JF,3 = 15.2 Hz).
Compound 2 (101 mg, 289 μmol) was dissolved in DMF (900 μL) added ammonium carbonate (233 mg, 2.42 mmol). The mixture was then stirred at 20 °C for 8 h. The mixture was diluted with ethyl acetate and washed with water and brine, dried over anhydrous sodium sulfate, filtered, and evaporated. The residue was purified by silica gel chromatography with 6:1 to 3:2 (v/v) hexane:ethyl acetate to give compound 3 (3,4,6-Tri-O-acetyl-2-deoxy-2-fluoro-α/β-d-glucopyranose; 74.1 mg, 83%); 1H NMR (400 MHz, CDCl3) δ 5.59 (dd, 1H, J3,F = 12.1 Hz, J3,4 = 9.5 Hz, H-3α), 5.48 (d, 1H, J1,2 = 3.6 Hz, H-1α), 5.32 (dt, 1H, J3,F = 14.1 Hz, J2, 3 = 5.4 Hz, J3,4 = 5.4 Hz, H-3β), 5.04 (m, 2H, H-4α, H-4β), 4.92 (dd, 1H, J1,2 = 7.6 Hz, J1,F = 2.7 Hz, H-1β), 4.52 (ddd, 1H, J2,F = 49.4 Hz, J2, 3 = 9.5 Hz, H-2α), 4.30–4.09 (m, 6H, H-2β, H-5α, H-6αa, H-6αb, H-6βa, H-6βb), 3.94 (s, 1H, OH-1β), 3.77 (dq, 1H, H-5β), 3.55 (s, 1H, OH-1α), 2.09 (s, 3H, Ac), 2.09 (s, 3H, Ac), 2.08 (s, 6H, Ac), 2.04 (s, 3H, Ac), 2.04 (s, 3H, Ac).
Compound 3 (42.7 mg, 139 μmol) and DMAP (102 mg, 834 μmol) were co-evaporated with dry toluene. The mixture was dissolved in dry dichloromethane (1.30 mL) and cooled at –10 °C. The solution was then added diphenyl chlorophosphate (86.0 μL, 416 μmol), and the mixture was stirred at 20 °C for 30 min. The mixture was diluted with dichloromethane and washed with ice water, aqueous 1 M HCl, aqueous sodium hydrogen carbonate and brine, dried over anhydrous sodium sulfate, filtered, and evaporated. The residue was purified by silica gel chromatography with 5:1 to 2:1 (v/v) hexane:ethyl acetate to give compound 4 (3,4,6-Tri-O-acetyl-2-deoxy-2-fluoro-α-d-glucopyranosyl diphenylphosphate; 29.0 mg, 39%,); 1H NMR (400 MHz, CDCl3) δ 7.40-7.22 (m, 10H, Ph), 6.14 (dd, 1H, J1,2 = 3.6 Hz, J1,F = 6.6 Hz, H-1), 5.56 (dt, 1H, J3,F = 11.8 Hz, J3,4 = 9.6 Hz, H-3), 5.07 (t, 1H, J4,5 = 10.0 Hz, H-4), 4.62 (dq, 1H, J2,F = 48.4 Hz, J2,3 = 9.6 Hz, H-2), 4.15 (dd, 1H, J6a,6b = 12.7 Hz, J5,6b = 3.9 Hz, H-6b), 4.00 (m, 1H, H-5), 3.79 (dd, 1H, J5,6a = 2.1 Hz, H-6a), 2.01 (s, 3H, Ac), 2.05 (s, 3H, Ac), 2.03 (s, 3H, Ac); 13C[1H] NMR (101 MHz, CDCl3) δ 170.5, 170.00, 169.5 (COCH3×3), 130.1, 130.0, 125.9, 125.9, 120.5, 120.4, 120.2, 120.2 (Ph), 94.7 (dd, C-1, J1,P = 5.4 Hz, J1,F = 22.6 Hz), 86.6 (dd, C-2, J2,P = 8.4 Hz, J2,F = 198 Hz), 70.0 (d, C-3, J3,F = 19.5 Hz), 69.6 (C-5), 67.0 (d, C-4, J4,F = 7.4 Hz), 60.9 (C-6), 20.8, 20.7, 20.6 (COCH3).
Compound 4 (28.4 mg, 52.6 μmol) was dissolved in ethyl acetate/methanol (1:1, 1.00 mL) and the flask was filled with argon. The solution was added Platinum oxide (IV) (15.2 mg) and the mixture was then stirred at 20 °C for 4 h in hydrogen atmosphere. The mixture was filtered through celite, and the filtrate was added triethylamine (16.0 μL, 115 μmol) and stirred for 30 min. The solution was evaporated to give a syrup including compound 5 (Bis (triethylammonium) 3,4,6-tri-O-acetyl-2-deoxy-2-fluoro-α-d-glucopyranosyl phosphate).
The mixture containing compound 5 (31.0 mg, 52.6 μmol) was co-evaporated with pyridine and dissolved in pyridine (500 μL). The solution was added to UMP-morpholidate (54.4 mg, 79.2 μmol), which evaporated with pyiridine, in the reaction flask. Moreover, 1H-tetrazole (13.4 mg, 191 μmol) was co-evaporated with pyridine and dissolved in pyridine (500 μL). The solution was transferred to the reaction flask by a syringe and the mixture was then stirred at room temperature for 39 h. The mixture was evaporated and the residue was purified by silica gel chromatography with 9:1 to 6:1 (v/v/v) acetonitrile:water. The absorbance of each fraction was measured at 262 nm and the combined fractions were evaporated. The residue was purified by silica gel chromatography with 15:2:1 to 6:2:1 (v/v/v) ethyl acetate:methanol:water to give compound 6 (Uridine 5′-diphospho-3,4,6-tri-O-acetyl-2-deoxy-2-fluoro-α-d-glucopyranose; 25.7 mg, 70%); 1H NMR (400 MHz, D2O) δ 7.97 (d, 1H, J5′′,6′′ = 8.1 Hz, H-6′′), 5.95–5.91 (m, 2H, H-1′, H-5′′), 5.81 (dd, 1H, J1, 2 = 3.6 Hz, J1, F = 7.4 Hz, H-1), 5.43 (dt, 1H, J3, F = 12.1 Hz, J2, 3 = 9.4 Hz, J3, 4 = 9.4 Hz, H-3), 5.03 (t, 1H, H-4), 4.79–4.65 (m, 1H, H-2), 4.38–4.37(m, 2H, H-2′, H-3′), 4.34 (s, 1H, H-5), 4.25–4.22 (m, 2H, H-4′, H-5′), 4.17–4.09 (m, 2H, H-6a, H-6b); 13C[1H] NMR (101 MHz, D2O) δ 171.4 (COCH3), 170.9 (COCH3), 170.6 (COCH3), 164.0 (C-4′′), 149.5 (C-2′′), 139.5 (C-6′′), 100.4 (C-1′), 89.9 (dd, J1,P = 5.2 Hz, J1,F = 22 Hz, C-1), 86.4 (C-5′′), 84.5 (dd, J2,P = 8.6 Hz, J2,F = 194 Hz, C-2), 80.8 (d, J4′,P = 9.3 Hz, C-4′), 71.8 (C-2′), 68.8 (d, J3,F = 19.4 Hz, C-3), 67.2 (C-3′), 65.8 (C-5), 65.4 (d, J4,F = 7.5 Hz, C-4), 62.7 (d, J5′,P = 5.3 Hz, C-5′), 59.3 (C-6), 18.0 (COCH3), 17.9 (COCH3), 17.8 (COCH3); 19F NMR (376 MHz, D2O) δ –200.9 (ddd, JF,2 = 53.7 Hz, JF,1 = 2.9 Hz, JF,3 = 15.2 Hz); 31P NMR (162 MHz, D2O) δ –11.59 (d, JP,P = 17.5 Hz), –13.89 (d, JP,P = 15.7 Hz).
Compound 6 (18.9 mg, 27.2 μmol) was dissolved in triethylamine/methanol/water (1:2:2, 5.0 mL) and stirred at −20 °C for 12 h. The solution was then evaporated and lyophilized to give compound 7 (Uridine 5′-diphospho-2-deoxy-2-fluoro-α-d-glucopyranose triethylammonium salt; 16.9 mg, 93%); 1H NMR (400 MHz, D2O) δ 7.88 (d, 1H, J5′′,6′′ = 8.0 Hz, H-6′′), 5.94–5.90 (m, 2H, H-1′, H-5′′), 5.74 (dd, 1H, J1,2 = 3.6 Hz, J1,F = 7.4 Hz, H-1), 4.46–4.29 (dd, 1H, H-2), 4.31 (m, 2H, H-2′, H-3′), 4.23–4.11 (m, 2H, H-4′, H-5′), 3.97 (dt, 1H, J2,3 = 3.6 Hz, J3,4 = 3.6 Hz, J3,F = 12.8 Hz, H-3), 3.86 (dq, 1H, H-5), 3.80 (dd, 1H, J6a,6b = 12.7 Hz, J5,6b = 3.9 Hz, H-6b), 3.72 (dd, 1H, J5,6a = 2.1 Hz, H-6a), 3.47 (t, 1H, H-4), 3.14 (q, 1.8H, (CH3CH2)3N), 1.21 (t, 3H, (CH3CH2)3N); 13C[1H] NMR (101 MHz, D2O) δ 167.8 (C-4′′), 153.0 (C-2′′), 141.5 (C-6′′), 102.7 (C-5′′), 92.6 (dd, C-1, J1,P = 5.5 Hz, J1,F = 22.6 Hz), 90.4–88.5 (dd, C-2, J2,P = 8.3 Hz, J2,F = 188 Hz), 88.5 (C-1′), 83.1 (d, C-4′, J4′ P = 9.0 Hz), 73.7 (C-2′), 72.7 (C-5), 71.0 (d, C-3, J3,F = 17.3 Hz), 69.7 (C-3′), 68.6 (d, C-4, J4,F = 7.9 Hz), 65.0 (d, C-5′, J5′,P = 5.4 Hz), 60.1 (C-6); 19F NMR (376 MHz, D2O) δ –200.38 (dd, JF,2 = 49.0 Hz, JF,1 = 12.9 Hz); 31P NMR (162 MHz, D2O) δ –11.71 (d, JP,P = 18.5 Hz), –13.45 (dd, JP,P = 18.7 Hz); HRMS (ESI/Q-TOF) m/z: [M − H]− Calcd for C15H22F1N2O16P2−: 567.0434; Found: 567.0447.
5-Thio-d-glucopyranose 8 (29.7 mg, 151 μmol) was dissolved in pyridine (1.80 mL) and added acetic anhydride (900 μL, 9.52 mmol). The mixture was stirred at 20 °C for 4 h. The mixture was then evaporated to give compound 9 (1,2,3,4,6-Penta-O-acetyl-5-thio-α/β-d-glucopyranose; 61.6 mg, quant.); 1H NMR (400 MHz, CDCl3) δ 6.15 (d, 1H, J1α,2 = 3.2 Hz, H-1α), 5.89 (d, 1H, J1β,2 = 8.6 Hz, H-1β), 5.44 (t, 1H, J3α,4 = 10.2 Hz, H-3α), 5.39–5.27 (m, 3H, H-4α, H-2β, H-4β), 5.24 (dd, 1H, J2α,3 = 10.2 Hz, H-2α), 5.11 (t, 1H, J3β,2 = 9.0 Hz, J3β,4 = 9.0 Hz, H-3β), 4.38 (dd, 1H, J6b,6a = 11.9 Hz, J6b,5 = 5.6 Hz, H-6αb) 4.31 (dd, 1H, H-6βb), 4.15 (dd, 1H, J6a,5 = 3.0 Hz, H-6βa), 4.06 (dd, 1H, H-6αb), 3.59 (dq, 1H, H-5α), 3.31 (dq, 1H, H-5β), 2.08 (s, 3H, Ac), 2.07 (s, 3H, Ac), 2.04 (s, 3H, Ac), 2.04 (s, 3H, Ac), 2.02 (s, 3H, Ac), 2.02 (s, 3H, Ac), 2.01 (s, 3H, Ac), 1.99 (s, 3H, Ac).
Compound 9 (48.8 mg, 120 μmol) was dissolved in DMF (500 μL) added hydrazine monohydrate (8.77 μL, 180 μmol) and acetic acid (10.2 μL, 179 μmol). The mixture was then stirred at 20 °C for 30 min. The mixture was diluted with ethyl acetate and washed with water, aqueous sodium hydrogen carbonate and brine, dried over anhydrous sodium sulfate, filtered, and evaporated. The residue was purified by silica gel chromatography with 3:1 to 1:1 (v/v) hexane:ethyl acetate to give compound 10 (2,3,4,6-Tetra-O-acetyl-5-thio-α/β-d-glucopyranose; 30.6 mg, 70%); 1H NMR δ (400 MHz, CDCl3) 5.53 (dt, 1H, J2α,3 = 9.5 Hz, J3α,4 = 9.5 Hz, H-3α), 5.29 (dd, 1H, J4α,5 = 10.8 Hz, H-4α), 5.25–5.20 (m, 2H, H-2β, H-4β), 5.16–5.13 (m, 2H, H-1α, H-2α), 5.14 (t, 1H, H-3β), 4.86 (d, 1H, J1β,2 = 9.0 Hz, H-1β), 4.37 (dd, 1H, J6b,6a = 12.0 Hz, J6b,5 = 4.9 Hz, H-6αb), 4.28 (dd, 1H, H-6βb), 4.13–4.09 (m, 1H, H-6βa), 4.07 (dd,1H, J6a,5 = 3.2 Hz, H-6αa), 3.68 (dq, 1H, H-5α), 3.23 (dq, 1H, H-5β), 2.07-2.00 (s, 24H, Ac).
Compound 10 (30.6 mg, 84.0 μmol) and DMAP (60.0 mg, 491 μmol) were co-evaporated with dry toluene. The mixture was dissolved in dry dichloromethane (1.30 mL) and cooled at –10 °C. The solution was then added diphenyl chlorophosphate (51.0 μL, 247 μmol), and the mixture was stirred at 20 °C for 30 min. The mixture was diluted with dichloromethane and washed with ice water, aqueous 1 M HCl, aqueous sodium hydrogen carbonate and brine, dried over anhydrous sodium sulfate, filtered, and evaporated. The residue was purified by silica gel chromatography with 4:1 to 2:1 (v/v) hexane:ethyl acetate to give compound 11 (2,3,4,6-Tetra-O-acetyl-5-thio-α-d-glucopyranosyl diphenylphosphate; 27.4 mg, 55%); 1H NMR (400 MHz, CDCl3) δ7.39–7.21 (m, 10H, Ph), 5.87 (dd, J1,2 = 2.9 Hz, 1H, H-1), 5.49 (t, 1H, H-3, J3,4 = 9.9 Hz), 5.30 (dd, 1H, H-4, J4,5 = 10.9 Hz), 5.15 (dt, 1H, H-2, J2,3 = 9.9 Hz), 4.32 (dd, 1H, H-6b, J6b,6a = 12.2 Hz, J6b,5 = 4.7 Hz), 3.89 (dd, 1H, H-6a, J6a,5 = 2.9 Hz), 3.41 (dq, 1H, H-5); 13C[1H] NMR (101 MHz, CDCl3) δ 170.5, 169.8, 169.7, 169.5 (COCH3×4), 130.0, 125.9, 120.7, 120.7, 120.4, 120.4 (Ph), 78.0 (d, J1,P = 7.2 Hz, C-1), 73.8 (d, J2,P = 6.1 Hz, C-2), 71.4 (C-4), 70.20 (C-3), 60.69 (C-6), 39.73 (C-5), 20.7, 20.6, 20.6, 20.3 (COCH3×4).
Compound 11 (27.4 mg, 45.9 μmol) was dissolved in ethyl acetate/methanol (1:1, 1.00 mL) and the flask was filled with argon. The solution was added Platinum oxide (IV) (17.1 mg) and the mixture was then stirred at 20 °C for 4 h in hydrogen atmosphere. The mixture was filtered through celite, and the filtrate was added triethylamine (16.0 μL, 115 μmol) and stirred for 20 min. The solution was evaporated to give a syrup including compound 12 (Bis (triethylammonium) 2,3,4,6-tetra-O-acetyl-5-thio-α-d-glucopyranosyl phosphate).
The mixture containing compound 12 (29.7 mg, 45.9 μmol) was co-evaporated with pyridine and dissolved in pyridine (500 μ). The solution was added to UMP-morpholidate (47.2 mg, 68.7 μmol), which evaporated with pyiridine, in the reaction flask. Moreover, 1H-tetrazole (11.3 mg, 161 μmol) was co-evaporated with pyridine and dissolved in pyridine (500 μL). The solution was transferred to the reaction flask by a syringe and the mixture was then stirred at room temperature for 42 h. The mixture was evaporated and the residue was purified by silica gel chromatography with 9:1 to 6:1 (v/v/v) acetonitrile:water. The absorbance of each fraction was measured at 262 nm and the combined fractions were evaporated. The residue was purified by silica gel chromatography with 9:2:1 to 7:2:1 (v/v/v) ethyl acetate:methanol:water to give compound 13 (Uridine 5′-diphospho-2,3,4,6-tetra-O-acetyl-5-thio-α-d-glucopyranose triethylammonium salt; 18.3 mg, 53%); 1H NMR δ (D2O, 400 MHz) 7.97 (d, 1H, H-6′′), 5.96–5.93 (m, 2H, H-1′, H-5′′), 5.55 (dd,1H, J1,2 = 2.9 Hz, J1,P = 7.0 Hz, H-1), 5.40 (t, 1H, J3,4 = 9.7 Hz, H-3), 5.26 (dd, 1H, J4,5 = 10.8 Hz, H-4), 5.16 (dt, 1H, H-2), 4.46 (dd, 1H, J6b,6a = 12.4 Hz, J6b,5 = 3.7 Hz, H-6b), 4.33–4.28 (m, 2H, H-2′, H-3′), 4.26–4.17 (m, 2H, H-4′, H-5′), 4.04 (dd, 1H, H-6a, J6a,5 = 2.6 Hz), 3.72 (dt, 1H, H-5); 13C[1H] NMR δ (101 MHz, D2O) 173.7 (COCH3), 172.9 (COCH3), 172.8 (COCH3), 166.2 (C-4′′), 151.8 (C-2′′), 141.9 (C-6′′), 102.6 (C-5′′), 88.6 (C-1′), 83.0 (d, J4′ P = 9.2 Hz, C-4′), 74.6 (d, J1,P = 9.2 Hz, C-1), 74.3 (d, J2,P = 7.1 Hz, C-2), 73.9 (C-2′), 72.0 (C-4), 71.4 (C-3), 69.5 (C-3′), 65.0 (d, J5′,P = 5.3 Hz, C-5′), 61.3 (C-6), 38.3 (C-5), 20.3 (COCH3), 20.0 (COCH3), 20.0 (COCH3); 31P NMR (162 MHz, D2O) δ –11.59 (d, JP,P = 16.4 Hz), –13.40 (d, JP,P = 10.3 Hz).
Compound 13 (29.6 mg, 39.4 μmol) was dissolved in triethylamine/methanol/water (1:2:2, 8.0 mL) and stirred at −20 °C for 12 h. The solution was then evaporated and lyophilized to give compound 14 (Uridine 5′-diphospho-5-thio-α-d-glucopyranose; 22.9 mg, 85%); 1H NMR (400 MHz, D2O) δ 8.04 (d, 1H, J5′′,6′′ = 8.1 Hz, H-6′′), 5.94–5.91 (m, 2H, H-1′, H-5′′), 5.54 (d, 1H, J1,2 = 5.0 Hz, H-1), 4.41 (dt, 1H, H-2), 4.35 (dd, 1H, H-2′), 4.30 (t, 1H, H-3′, J2′,3′ = 4.3 Hz, J2′,1′ = 4.3 Hz), 4.21 (m, 1H, H-4′), 4.00 (dt, 1H, H-5′b), (m, 1H, H-5′a), 3.88–3.86 (m, 1H, H-6ab), 3.77 (t, 1H, J2,3 = 9.6 Hz, J3,4 = 9.6 Hz, H-3), 3.60 (t, 1H, J4,5 = 10.0 Hz, H-4), 3.19–3.17 (m, 1H, H-5); 13C[1H] NMR (100 MHz, D2O) δ 166.3 (C-4′′), 151.9 (C-2′′), 102.6 (C-5′′), 88.4 (C-1′), 84.0 (d, C-4′, J4′,P = 8.7 Hz), 82.4 (C-2), 78.3 (d, C-1, J1,P = 3.5 Hz), 74.4 (C-3), 73.9 (C-2′), 71.4 (C-4), 70.0 (C-3′), 63.3 (C-5′, J5′,P = 4.4 Hz), 59.5 (C-6), 43.9 (C-5); 31P NMR (162 MHz, D2O) δ13.21 (d, JP,P = 19.7 Hz), 2.89 (s); HRMS (ESI/Q-TOF) m/z: [M − H]− Calcd for C15H23N3O16P2S1−: 581.0249; Found: 581.0237.
ESI-high-resolution mass spectrometry spectra are shown for UDP-2F-Glu (compound 7) and UDP-5S-Glc (compound 14) (see Supplementary Fig. 10).
Molecular dynamics simulations in explicit water
The X-ray structure obtained in this work (PDB 8P0Q), containing the enzyme and the acceptor FGNWTT peptide, was used in all simulations. In order to obtain a productive Michaelis complex, involving the donor UDP-glucose and the acceptor peptide, a docking procedure was needed, as the straightforward alignment of the crystal structures of the enzyme in complex with acceptor and donor (AaNGT + peptide and AaNGT + UDP-Gal or AaNGT + UDP-Glc-2F, respectively) was not possible (the sugar moiety completely clashed with residues from the peptide). The structure was prepared for docking by removing subunit B, as well as the UDP molecules. The protonation of titratable residues (Asp, Glc, His) at pH 7 was decided by visual inspection as well as the online software MolProbity.
Docking of UDP-Glc into the structure of AaNGT+peptide was performed with AutoDock Vina57 and AutoDockTools 1.5.6. A group of selected residues near the active site were made flexible to better accommodate the substrate. Rotation of all the chemical bonds of UDP-Glc was allowed except for the dihedrals of the phosphates, to better mimic the phosphate configuration of UDP-GlcNAc of O-GlcNAc transferase (OGT)58. The values of the box size were taken as 32 x 30 x 26 Å, with the center of the box at 47.959, 62.228, 56.623 Å. Only one hit was obtained, with an affinity of –13.8 kcal/mol, which was used to perform molecular dynamics (MD) simulations.
The ternary complex obtained from the docking procedure was then prepared for classical MD simulations. The necessary files were obtained with the AmberTools suite59. UDP charges were parametrized with Gaussian0960 and the antechamber program. The forcefields used for the protein, glucose, and UDP were F14SB61, GLYCAM0662, and GAFF63, respectively. TIP3P was used for water molecules64. The enzyme was solvated in a box of dimensions 97.868 × 112.646 × 114.217 Å, resulting in a total of 32129 water molecules. The system was neutralized by adding 18 sodium ions.
Amber20 was used to perform the MD simulations59. The simulation procedure was as follows. First, the solvent was minimized without the protein and substrate, using positional restraints. Then, the whole system was minimized. Afterwards, the system was heated up to 300 K in a step-wise manner. The density of the system was taken to approximately 1 g/cm3. Subsequently, MD in the NPT ensemble was performed, restraining the distance between the donor and the acceptor for 60 ns before the restraint was released. The simulation was enlarged up to 460 ns, from which the last 400 ns were taken as production run. The evolution of the protein and acceptor peptide RMSD is provided in Supplementary Fig. 5.
Additional MD simulations were performed considering Asn30 in its tautomeric (imidic acid) form, following the same protocol as the one described above. Smooth restraints were applied in the distance between the donor and acceptor as well as the dihedral angles of the donor sugar were restrained for the first 50 ns of the MD simulation, followed by 150 ns of unrestrained MD, which was taken as production run. The system was found to be stable, with the Asn30 side chain well oriented for nucleophilic attack for most of the run. The evolution of the protein and acceptor peptide RMSD is provided in Supplementary Fig. 11.
QM/MM MD simulations
A representative snap-shot from the classical MD simulations in which the amide side chain interacts with the α-phosphate was selected to start the QM/MM MD simulations. All QM/MM simulations were performed using CP2K 9.165 interfaced with PLUMED 2.866. The QM/MM simulations combined Born-Oppenheimer molecular dynamics at DFT level for the QM atoms, handled with the QM program QUICKSTEP65,67, with classical MD for the rest of the system (simulated with the CP2K MM program FIST65). The QM region consisted of 44 atoms (counting capping atoms), including the donor glucose, two pyrophosphates, and the acceptor Asn (divided at the Cβ). The MM region included 106371 atoms. The boundary between the QM and MM regions was handled with the use of link H atoms via the Integrated Molecular Orbital Molecular Mechanics (IMOMM) method68. The QM region was enclosed inside a cell of 11.989 x 12.882 x 18.059 Å, and it was treated with the PBE functional69, along with DFTD3 corrections70, employing the GPW (Gaussian and plane-waves) scheme. GTH pseudopotentials71 were used, and the Gaussian triple-ζ valence polarized (TZV2P) basis set. A plane-wave cut-off of 400 Ry was employed, together with a MD time step of 0.5 fs. The simulations were performed in the NVT ensemble, using a Nosé-Hoover thermostat to keep the temperature around 300 K. Non-bonded cut-off was 12 Å.
The initial snapshot from the classical MD was optimized using the conjugate gradient algorithm, then equilbrated for 5 ps before starting the metadynamics44,72 simulations of the reaction the mechanism. The collective variable (CV) chosen was the difference between the Asn(N)-C1 and UDP-C1 distances. An upper wall was placed on distances related to the nucleophilic attack and deprotonation in order to limit the phase space accessible for the simulation (parabolic-type, with a force constant of 150 in internal PLUMED units). The Gaussian height was 1.0 kcal/mol, the width 0.1 Å and the deposition pace was set at 80 MD steps. The system reached the reaction products after having deposited 600 Gaussian functions (24 ps in terms of simulation time). However, the energy barrier was found to be very high (>50 kcal/mol), indicating that the reaction is not feasible. Thus, we stopped the simulation at this point. Alternative trials using different collective variables gave a similar negative result. Representative states of the reaction can be found in Supplementary Fig. 6.
An alternative enhanced sampling method (OPES) was also used to model the glycosylation reaction, within the QM/MM formalism. OPES73 is a recently developed technique, related to metadynamics, that in its “explore” formulation is very useful for a fast inspection of possible reaction mechanisms. An QM/MM OPES simulation of the reaction was performed starting from the same structure as the previous QM/MM metadynamics simulation. The same collective variable including the distance difference between the main bonds to be broken/formed (C1-OP and N-C1) was used. Additionally, since we knew from the previous simulation that the α-phosphate abstracts the amide proton, we used another collective variable that includes the two distances involved in the proton transfer (N-H and H–Oα). The previously described upper wall on distances was used. The OPES parameters used were 25 kcal/mol for the energy barrier, and 80 steps for the MD pace (40 fs). Surprisingly, the simulations show that Asn30 rapidly undergoes tautomerization before performing the nucleophilic attack. Even though the simulation was only exploratory, it gave us the idea that Asn could react in its imidic form.
An unbiased QM/MM MD (7.2 ps) of the system with imidic Asn was performed to check its stability, before another QM/MM metadynamics simulation of the chemical reaction was launched. Two distances involving in the nucleophilic attack, N-C1 and leaving group departure (C1-OP) were used as collective variables. The same upper wall on distances was used. The metadynamics simulations were performed using a Gaussian height of 1.2 kcal/mol and a width of 0.1 (CV units) for both CVs, together with a pace of 80 MD steps (40 fs). A total of 2900 Gaussian functions were deposited, which in terms of real time amount to 116 ps. The simulation was stopped once the simulation crossed twice over the TS, as recommended for chemical reactions74. The location of the TS on the computed free energy landscape was further refined by committor analysis, using 30 independent replicas (18 reactants, 12 products).
To estimate the free energy required for the Asn to undergo tautomerization, we performed further QM/MM metadynamics simulations considering tautomerization via the α-phosphate (as observed serendipitously in a previous simulation). We used two collective variables; the difference in coordination numbers between H–Op and N-H (CV1) and the difference in coordination numbers between OAsn-H and H–Op (CV2). The simulations used a Gaussian height of 0.8 kcal/mol, a width of 0.04 Å for both CVs, and a deposition pace of 100 MD steps (50 fs). The simulation run for a total of 166 ps, until recrossing over the transition state took place. The computed free energy barrier (29.5 kcal/mol) indicated that tautomerization via the α-phosphate is not particularly favored (it is more difficult than tautomerization of amides in water solution75). Therefore, we launched a second simulation to assess whether tautomerization with the participation of water molecules was more likely. To this end, a larger QM region, including two active site water molecules (Wat1 and Wat2) and the Lys440 side chain (coordinating the pyrophosphate groups) was used. The QM/MM metadynamics simulation was performed using two CVs: CV1 = CN(N-H) + CN(HWat-OWat1) + CN(HWat2-OWat2) and CV2 = CN(H–OWat1) + CN(HWat1-OWat2) + CN(HWat2-O) (CN = coordination number). An upper wall was placed on distances related to the nucleophilic attack (not deprotonation) (parabolic-type, with a force constant of 150 in internal PLUMED units). The first CV is maximum when Asn is in its amide form, whereas the second one accounts for the imidic acid form. The simulation was performed using a Gaussian height of 1 kcal/mol and a width of 0.08 Å for both CVs, together with a deposition pace of 80 MD steps (40 fs). The simulation run for a total of 9.2 ps. In all cases, we used the following formula for coordination number (Eq. 1, see below):
1 |
Where CNij is the coordination number between atoms i and j, r the distance between them, n are set at 6 and 12 respectively, and r0 is 2.1 for bonds not involving H atoms and 1.2 for bonds that do involve them.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Supplementary information
Source data
Acknowledgements
We thank the Diamond Light Source (Oxford, UK) synchrotron beamline I03 (experiment number MX20229-7) and the ALBA (Barcelona, Spain) synchrotron beamline XALOC. We thank the Aragón Foundation for Research & Development (ARAID), the Agency for Management of University and Research Grants of Catalonia (AGAUR, 2021-SGR-00680 to C.R.), the Agencia Estatal de Investigación (AEI; BFU2016-75633-P, PID2019-105451GB-I00 and PID2022−136362NB-I00 to R.H.-G., PID2021−127622OB-I00 to F.C., PID2020-118893GB-I00 and CEX2021-001202-M to C.R.), the European Research Council (ERC-2020-SyG-951231”Carbocentre” to C.R.) and Gobierno de Aragón (E34_R17, E35_17R, and LMP58_18) with FEDER (2014-2020) funds for “Building Europe from Aragón” for financial support. B.P. acknowledges AGAUR for a PhD scholarship (2020 FI_B 00423), and M.G. acknowledges Marie Skłodowska-Curie fellowship (grant agreement No 101034288) for financial support. The authors would like to thank the technical support provided by the Barcelona Supercomputing Center (BSC) and Red Nacional de Supercomputación (RES; application BCV-2023-2-0016) for computer resources at MareNostrum IV and CTE-Power supercomputers.
Author contributions
R.H.-G. designed the crystallization construct and solved the crystal structures. J.M.-L and B.V. performed the expression and purification of all proteins, the ITC experiments, and crystallized the complexes. A.G.-G. and V.T. performed the enzyme kinetics experiments. M.G. and I.C. synthetized the peptide. B.P. performed the computational experiments. A.M. and S.M. performed the synthesis of the UDP-Glc mimetics. R.H.-G. and C.R. wrote the article with mainly the contribution of F.C. and A.M. All authors read and approved the final manuscript.
Peer review
Peer review information
Nature Communications thanks Shutong Xu, Toru Saito, and Timothy Keys for their contribution to the peer review of this work. A peer review file is available.
Data availability
The crystal structures of the AaNGT-UDP-Gal, AaNGT-UDP-2F-Glc, and AaNGT-UDP-peptide complexes were deposited at the RCSB PDB with accession code, 8P0O, 8P0P, and 8P0Q, respectively. Previously published PDB structures used in this study are available under the accession codes: 3Q3E, 3Q3I, 3Q3H, and 7Y4I. Other data are available from the corresponding author upon request. The kinetics and ITC data generated in this study are provided in the source data file. Source data are provided as a Source Data file. Source data are provided with this paper.
Code availability
Data files of the classical MD simulation and QM/MM metadynamics simulations have been deposited in Zenodo (10.5281/zenodo.8081487).
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Change history
2/13/2024
A Correction to this paper has been published: 10.1038/s41467-024-45708-y
Contributor Information
Carme Rovira, Email: c.rovira@ub.edu.
Ramon Hurtado-Guerrero, Email: rhurtado@bifi.es.
Supplementary information
The online version contains supplementary material available at 10.1038/s41467-023-41238-1.
References
- 1.Aebi M. N-linked protein glycosylation in the ER. Biochim. Biophys. Acta. 2013;1833:2430–2437. doi: 10.1016/j.bbamcr.2013.04.001. [DOI] [PubMed] [Google Scholar]
- 2.Wacker M, et al. N-linked glycosylation in Campylobacter jejuni and its functional transfer into E. coli. Science. 2002;298:1790–1793. doi: 10.1126/science.298.5599.1790. [DOI] [PubMed] [Google Scholar]
- 3.Nothaft H, Szymanski CM. New discoveries in bacterial N-glycosylation to expand the synthetic biology toolbox. Curr. Opin. Chem. Biol. 2019;53:16–24. doi: 10.1016/j.cbpa.2019.05.032. [DOI] [PubMed] [Google Scholar]
- 4.Sun S, et al. N-GlycositeAtlas: a database resource for mass spectrometry-based human N-linked glycoprotein and glycosylation site mapping. Clin. Proteom. 2019;16:35. doi: 10.1186/s12014-019-9254-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Reily C, Stewart TJ, Renfrow MB, Novak J. Glycosylation in health and disease. Nat. Rev. Nephrol. 2019;15:346–366. doi: 10.1038/s41581-019-0129-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Lauc, G., Trbojevic-Akmacic, I. The Role Of Glycosylation In Health And Disease. (Springer Cham, 2021).
- 7.Pinho SS, Reis CA. Glycosylation in cancer: mechanisms and clinical implications. Nat. Rev. Cancer. 2015;15:540–555. doi: 10.1038/nrc3982. [DOI] [PubMed] [Google Scholar]
- 8.Pinho SS, et al. Gastric cancer: adding glycosylation to the equation. Trends Mol. Med. 2013;19:664–676. doi: 10.1016/j.molmed.2013.07.003. [DOI] [PubMed] [Google Scholar]
- 9.Rodrigues JG, et al. Glycosylation in cancer: selected roles in tumour progression, immune modulation and metastasis. Cell. Immunol. 2018;333:46–57. doi: 10.1016/j.cellimm.2018.03.007. [DOI] [PubMed] [Google Scholar]
- 10.Carvalho S, et al. O-mannosylation and N-glycosylation: two coordinated mechanisms regulating the tumour suppressor functions of E-cadherin in cancer. Oncotarget. 2016;7:65231–65246. doi: 10.18632/oncotarget.11245. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Chung CH, et al. Cetuximab-induced anaphylaxis and IgE specific for galactose-alpha-1,3-galactose. N. Engl. J. Med. 2008;358:1109–1117. doi: 10.1056/NEJMoa074943. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Lin CW, et al. A common glycan structure on immunoglobulin G for enhancement of effector functions. Proc. Natl. Acad. Sci. USA. 2015;112:10611–10616. doi: 10.1073/pnas.1513456112. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Schjoldager KT, Narimatsu Y, Joshi HJ, Clausen H. Global view of human protein glycosylation pathways and functions. Nat. Rev. Mol. Cell. Biol. 2020;21:729–749. doi: 10.1038/s41580-020-00294-x. [DOI] [PubMed] [Google Scholar]
- 14.Bai L, Li H. Cryo-EM is uncovering the mechanism of eukaryotic protein N-glycosylation. FEBS J. 2019;286:1638–1644. doi: 10.1111/febs.14705. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Lizak C, Gerber S, Numao S, Aebi M, Locher KP. X-ray structure of a bacterial oligosaccharyltransferase. Nature. 2011;474:350–355. doi: 10.1038/nature10151. [DOI] [PubMed] [Google Scholar]
- 16.Mohanty S, Chaudhary BP, Zoetewey D. Structural Insight into the Mechanism of N-Linked Glycosylation by Oligosaccharyltransferase. Biomolecules. 2020;10:624. doi: 10.3390/biom10040624. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Grass S, et al. The Haemophilus influenzae HMW1 adhesin is glycosylated in a process that requires HMW1C and phosphoglucomutase, an enzyme involved in lipooligosaccharide biosynthesis. Mol. Microbiol. 2003;48:737–751. doi: 10.1046/j.1365-2958.2003.03450.x. [DOI] [PubMed] [Google Scholar]
- 18.Grass S, Lichti CF, Townsend RR, Gross J, St Geme JW., 3rd The Haemophilus influenzae HMW1C protein is a glycosyltransferase that transfers hexose residues to asparagine sites in the HMW1 adhesin. PLoS Pathog. 2010;6:e1000919. doi: 10.1371/journal.ppat.1000919. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Choi KJ, Grass S, Paek S, St Geme JW, 3rd, Yeo HJ. The Actinobacillus pleuropneumoniae HMW1C-like glycosyltransferase mediates N-linked glycosylation of the Haemophilus influenzae HMW1 adhesin. PLoS ONE. 2010;5:e15888. doi: 10.1371/journal.pone.0015888. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Kightlinger W, et al. Design of glycosylation sites by rapid synthesis and analysis of glycosyltransferases. Nat. Chem. Biol. 2018;14:627–635. doi: 10.1038/s41589-018-0051-2. [DOI] [PubMed] [Google Scholar]
- 21.Meng Q, et al. Probing peptide substrate specificities of N-glycosyltranferase isoforms from different bacterial species. Carbohydr. Res. 2019;473:82–87. doi: 10.1016/j.carres.2018.12.016. [DOI] [PubMed] [Google Scholar]
- 22.Lairson LL, Henrissat B, Davies GJ, Withers SG. Glycosyltransferases: structures, functions, and mechanisms. Annu. Rev. Biochem. 2008;77:521–555. doi: 10.1146/annurev.biochem.76.061005.092322. [DOI] [PubMed] [Google Scholar]
- 23.Schwarz F, Fan YY, Schubert M, Aebi M. Cytoplasmic N-glycosyltransferase of Actinobacillus pleuropneumoniae is an inverting enzyme and recognizes the NX(S/T) consensus sequence. J. Biol. Chem. 2011;286:35267–35274. doi: 10.1074/jbc.M111.277160. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Kowarik M, et al. Definition of the bacterial N-glycosylation site consensus sequence. EMBO J. 2006;25:1957–1966. doi: 10.1038/sj.emboj.7601087. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Wild R, et al. Structure of the yeast oligosaccharyltransferase complex gives insight into eukaryotic N-glycosylation. Science. 2018;359:545–550. doi: 10.1126/science.aar5140. [DOI] [PubMed] [Google Scholar]
- 26.Ramirez AS, Kowal J, Locher KP. Cryo-electron microscopy structures of human oligosaccharyltransferase complexes OST-A and OST-B. Science. 2019;366:1372–1375. doi: 10.1126/science.aaz3505. [DOI] [PubMed] [Google Scholar]
- 27.Kawai F, et al. Structural insights into the glycosyltransferase activity of the Actinobacillus pleuropneumoniae HMW1C-like protein. J. Biol. Chem. 2011;286:38546–38557. doi: 10.1074/jbc.M111.237602. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Naegeli A, et al. Substrate specificity of cytoplasmic N-glycosyltransferase. J. Biol. Chem. 2014;289:24521–24532. doi: 10.1074/jbc.M114.579326. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Miyagawa A, et al. One-step synthesis of sugar nucleotides. J. Org. Chem. 2020;85:15645–15651. doi: 10.1021/acs.joc.0c01943. [DOI] [PubMed] [Google Scholar]
- 30.Kong Y, et al. N-Glycosyltransferase from Aggregatibacter aphrophilus synthesizes glycopeptides with relaxed nucleotide-activated sugar donor selectivity. Carbohydr. Res. 2018;462:7–12. doi: 10.1016/j.carres.2018.03.008. [DOI] [PubMed] [Google Scholar]
- 31.Holm L, Laakso LM. Dali server update. Nucleic Acids Res. 2016;44:W351–W355. doi: 10.1093/nar/gkw357. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Clarke AJ, et al. Structural insights into mechanism and specificity of O-GlcNAc transferase. EMBO J. 2008;27:2780–2788. doi: 10.1038/emboj.2008.186. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Martinez-Fleites C, et al. Structure of an O-GlcNAc transferase homolog provides insight into intracellular glycosylation. Nat. Struct. Mol. Biol. 2008;15:764–765. doi: 10.1038/nsmb.1443. [DOI] [PubMed] [Google Scholar]
- 34.Zhu L, et al. Structural insights into mechanism and specificity of the plant protein O-fucosyltransferase SPINDLY. Nat. Commun. 2022;13:7424. doi: 10.1038/s41467-022-35234-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Song Q, et al. Production of homogeneous glycoprotein with multisite modifications by an engineered N-glycosyltransferase mutant. J. Biol. Chem. 2017;292:8856–8863. doi: 10.1074/jbc.M117.777383. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Drula E, et al. The carbohydrate-active enzyme database: functions and literature. Nucleic Acids Res. 2021;50:D571–D577. doi: 10.1093/nar/gkab1045. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Schimpl M, et al. O-GlcNAc transferase invokes nucleotide sugar pyrophosphate participation in catalysis. Nat. Chem. Biol. 2012;8:969–974. doi: 10.1038/nchembio.1108. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Vipperla B, Griffiths TM, Wang X, Yu H. Theoretical pKa prediction of the α-phosphate moiety of uridine 5′-diphosphate-GlcNAc. Chem. Phys. Lett. 2017;667:220–225. doi: 10.1016/j.cplett.2016.12.001. [DOI] [Google Scholar]
- 39.Jancan I, Macnaughtan MA. Acid dissociation constants of uridine-5′-diphosphate compounds determined by 31phosphorus nuclear magnetic resonance spectroscopy and internal pH referencing. Anal. Chim. Acta. 2012;749:63–69. doi: 10.1016/j.aca.2012.08.052. [DOI] [PubMed] [Google Scholar]
- 40.Ardevol A, Iglesias-Fernandez J, Rojas-Cervellera V, Rovira C. The reaction mechanism of retaining glycosyltransferases. Biochem. Soc. Trans. 2016;44:51–60. doi: 10.1042/BST20150177. [DOI] [PubMed] [Google Scholar]
- 41.Darby JF, et al. Substrate engagement and catalytic mechanisms of N-acetylglucosaminyltransferase V. ACS Catal. 2020;10:8590–8596. doi: 10.1021/acscatal.0c02222. [DOI] [Google Scholar]
- 42.Bilyard MK, et al. Palladium-mediated enzyme activation suggests multiphase initiation of glycogenesis. Nature. 2018;563:235–240. doi: 10.1038/s41586-018-0644-7. [DOI] [PubMed] [Google Scholar]
- 43.Ardèvol A, Rovira C. Reaction mechanisms in carbohydrate-active enzymes: glycoside hydrolases and glycosyltransferases. Insights from ab initio quantum mechanics/molecular mechanics dynamic simulations. J. Am. Chem. Soc. 2015;137:7528–7547. doi: 10.1021/jacs.5b01156. [DOI] [PubMed] [Google Scholar]
- 44.Barducci A, Bonomi M, Parrinello M. Metadynamics. WIREs Comput. Mol. Sci. 2011;1:826–843. doi: 10.1002/wcms.31. [DOI] [Google Scholar]
- 45.Garcia-Viloca M, Gao J, Karplus M, Truhlar DG. How enzymes work: Analysis by modern rate theory and computer simulations. Science. 2004;303:186–195. doi: 10.1126/science.1088172. [DOI] [PubMed] [Google Scholar]
- 46.Kumari M, et al. Exploring reaction pathways for O-GlcNAc transferase catalysis. a string method study. J. Phys. Chem. B. 2015;119:4371–4381. doi: 10.1021/jp511235f. [DOI] [PubMed] [Google Scholar]
- 47.Khoury GA, Baliban RC, Floudas CA. Proteome-wide post-translational modification statistics: frequency analysis and curation of the swiss-prot database. Sci. Rep. 2011;1:90. doi: 10.1038/srep00090. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48.Withers SG, Davies GJ. The case of the missing base. Nat. Chem. Biol. 2012;8:952–953. doi: 10.1038/nchembio.1117. [DOI] [PubMed] [Google Scholar]
- 49.Piniello B, et al. Asparagine tautomerization in glycosyltransferase catalysis. the molecular mechanism of protein O-fucosyltransferase 1. ACS Catal. 2021;11:9926–9932. doi: 10.1021/acscatal.1c01785. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 50.Nakamura A, et al. “Newton’s cradle” proton relay with amide-imidic acid tautomerization in inverting cellulase visualized by neutron crystallography. Sci. Adv. 2015;1:e1500263. doi: 10.1126/sciadv.1500263. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 51.Abbott DW, Macauley MS, Vocadlo DJ, Boraston AB. Streptococcus pneumoniae endohexosaminidase D, structural and mechanistic insight into substrate-assisted catalysis in family 85 glycoside hydrolases. J. Biol. Chem. 2009;284:11676–11689. doi: 10.1074/jbc.M809663200. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 52.Liu H, et al. Identification of the effect of N-glycan modification and its sialylation on proteolytic stability and glucose-stabilizing activity of glucagon-like peptide 1 by site-directed enzymatic glycosylation. RSC Adv. 2022;12:31892–31899. doi: 10.1039/D2RA05872C. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 53.Tytgat HLP, et al. Cytoplasmic glycoengineering enables biosynthesis of nanoscale glycoprotein assemblies. Nat. Commun. 2019;10:5403. doi: 10.1038/s41467-019-13283-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 54.Kabsch W. Xds. Acta Crystallogr. D Biol. Crystallogr. 2010;66:125–132. doi: 10.1107/S0907444909047337. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 55.Collaborative Computational Project, Number 4. The CCP4 suite: programs for protein crystallography. Acta Crystallogr. D Biol. Crystallogr. 1994;50:760–763. doi: 10.1107/S0907444994003112. [DOI] [PubMed] [Google Scholar]
- 56.Emsley P, Cowtan K. Coot: model-building tools for molecular graphics. Acta Crystallogr. D Biol. Crystallogr. 2004;60:2126–2132. doi: 10.1107/S0907444904019158. [DOI] [PubMed] [Google Scholar]
- 57.Trott O, Olson AJ. AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J. Comput. Chem. 2010;31:455–461. doi: 10.1002/jcc.21334. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 58.Lazarus MB, et al. Structural snapshots of the reaction coordinate for O-GlcNAc transferase. Nat. Chem. Biol. 2012;8:966–968. doi: 10.1038/nchembio.1109. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 59.Case, D. A. et al. Amber 2020 (University of California, 2021).
- 60.Frisch, M. J. et al. Gaussian 09, Revision B.01 (Wallingford CT, 2009).
- 61.Maier JA, et al. ff14SB: improving the accuracy of protein side chain and backbone parameters from ff99SB. J. Chem. Theor. Comput. 2015;11:3696–3713. doi: 10.1021/acs.jctc.5b00255. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 62.Kirschner KN, et al. GLYCAM06: a generalizable biomolecular force field. Carbohydrates. J. Comput. Chem. 2008;29:622–655. doi: 10.1002/jcc.20820. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 63.Wang J, Wolf RM, Caldwell JW, Kollman PA, Case DA. Development and testing of a general amber force field. J. Comput. Chem. 2004;25:1157–1174. doi: 10.1002/jcc.20035. [DOI] [PubMed] [Google Scholar]
- 64.Jorgensen WL, Chandrasekhar J, Madura JD, Impey RW, Klein ML. Comparison of simple potential functions for simulating liquid water. J. Chem. Phys. 1983;79:926–935. doi: 10.1063/1.445869. [DOI] [Google Scholar]
- 65.Kuhne TD, et al. CP2K: An electronic structure and molecular dynamics software package - quickstep: efficient and accurate electronic structure calculations. J. Chem. Phys. 2020;152:194103. doi: 10.1063/5.0007045. [DOI] [PubMed] [Google Scholar]
- 66.Tribello GA, Bonomi M, Branduardi D, Camilloni C, Bussi G. PLUMED 2: new feathers for an old bird. Comp. Phys. Commun. 2014;185:604–613. doi: 10.1016/j.cpc.2013.09.018. [DOI] [Google Scholar]
- 67.VandeVondele J, et al. Quickstep: fast and accurate density functional calculations using a mixed Gaussian and plane waves approach. Comput. Phys. Commun. 2005;167:103–128. doi: 10.1016/j.cpc.2004.12.014. [DOI] [Google Scholar]
- 68.Maseras F, Morokuma K. IMOMM - a new integrated ab-initio plus molecular mechanics geometry optimization scheme of equilibrium structures and transition-states. J. Comput. Chem. 1995;16:1170–1179. doi: 10.1002/jcc.540160911. [DOI] [Google Scholar]
- 69.Perdew JP, Burke K, Ernzerhof M. Generalized gradient approximation made simple. Phys. Rev. Lett. 1996;77:3865–3868. doi: 10.1103/PhysRevLett.77.3865. [DOI] [PubMed] [Google Scholar]
- 70.Grimme S, Antony J, Ehrlich S, Krieg H. A consistent and accurate ab initio parametrization of density functional dispersion correction (DFT-D) for the 94 elements H-Pu. J. Chem. Phys. 2010;132:154104. doi: 10.1063/1.3382344. [DOI] [PubMed] [Google Scholar]
- 71.Hartwigsen C, Goedecker S, Hutter J. Relativistic separable dual-space Gaussian pseudopotentials from H to Rn. Phys. Rev. B. 1998;58:3641–3662. doi: 10.1103/PhysRevB.58.3641. [DOI] [PubMed] [Google Scholar]
- 72.Laio A, Parrinello M. Escaping free-energy minima. Proc. Natl. Acad. Sci. USA. 2002;99:12562–12566. doi: 10.1073/pnas.202427399. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 73.Invernizzi M, Parrinello M. Rethinking metadynamics: from bias potentials to probability distributions. J. Phys. Chem. Lett. 2020;11:2731–2736. doi: 10.1021/acs.jpclett.0c00497. [DOI] [PubMed] [Google Scholar]
- 74.Ensing B, Laio A, Parrinello M, Klein ML. A recipe for the computation of the free energy barrier and the lowest free energy path of concerted reactions. J. Phys. Chem. B. 2005;109:6676–6687. doi: 10.1021/jp045571i. [DOI] [PubMed] [Google Scholar]
- 75.Constantino E, Solans-Monfort X, Sodupe M, Bertran J. Basic and acidic bifunctional catalysis: application to the tautomeric equilibrium of formamide. Chem. Phys. 2003;295:151–158. doi: 10.1016/j.chemphys.2003.08.003. [DOI] [Google Scholar]
- 76.Marcos-Alcalde I, López-Viñas E, Gómez-Puertas P. MEPSAnd: minimum energy path surface analysis over n-dimensional surfaces. Bioinformatics. 2019;36:956–958. doi: 10.1093/bioinformatics/btz649. [DOI] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
Data Availability Statement
The crystal structures of the AaNGT-UDP-Gal, AaNGT-UDP-2F-Glc, and AaNGT-UDP-peptide complexes were deposited at the RCSB PDB with accession code, 8P0O, 8P0P, and 8P0Q, respectively. Previously published PDB structures used in this study are available under the accession codes: 3Q3E, 3Q3I, 3Q3H, and 7Y4I. Other data are available from the corresponding author upon request. The kinetics and ITC data generated in this study are provided in the source data file. Source data are provided as a Source Data file. Source data are provided with this paper.
Data files of the classical MD simulation and QM/MM metadynamics simulations have been deposited in Zenodo (10.5281/zenodo.8081487).