Protein Science. 2020 Jan 20;29(4):930–940. doi: 10.1002/pro.3807

Misannotations of the genes encoding sugar N‐formyltransferases

Nicholas M Girardi, James B Thoden, Hazel M Holden
PMCID: PMC7096703  PMID: 31867814

Abstract

Tens of thousands of bacterial genome sequences are now known due to the development of rapid and inexpensive sequencing technologies. An important key in utilizing these vast amounts of data in a biologically meaningful way is to infer the function of the proteins encoded in the genomes via bioinformatics techniques. Whereas these approaches are absolutely critical to the annotation of gene function, there are still issues of misidentifications, which must be experimentally corrected. For example, many of the bacterial DNA sequences encoding sugar N‐formyltransferases have been annotated as l‐methionyl‐tRNA transferases in the databases. These mistakes may be due in part to the fact that until recently the structures and functions of these enzymes were not well known. Herein we describe the misannotation of two genes, WP_088211966.1 and WP_096244125.1, from Shewanella spp. and Pseudomonas congelans, respectively. Although the proteins encoded by these genes were originally suggested to function as l‐methionyl‐tRNA transferases, we demonstrate that they actually catalyze the conversion of dTDP‐4‐amino‐4,6‐dideoxy‐d‐glucose to dTDP‐4‐formamido‐4,6‐dideoxy‐d‐glucose utilizing N10‐formyltetrahydrofolate as the carbon source. For this analysis, the genes encoding these enzymes were cloned and the corresponding proteins purified. X‐ray structures of the two proteins were determined to high resolution and kinetic analyses were conducted. Both enzymes display classical Michaelis–Menten kinetics and adopt the characteristic three‐dimensional structural fold previously observed for other sugar N‐formyltransferases. The results presented herein will aid in the future annotation of these fascinating enzymes.

Keywords: 4‐formamido‐4,6‐dideoxy‐d‐glucose; lipopolysaccharide; misannotation of bacterial genomes; N‐formyltransferase; O‐antigen; Pseudomonas congelans; Shewanella spp


Abbreviations

dTDP: thymidine diphosphate
Fuc3NFo: 3‐formamido‐3,6‐dideoxy‐d‐galactose
HEPPS: N‐2‐hydroxyethylpiperazine‐N′‐3‐propanesulfonic acid
HPLC: high‐performance liquid chromatography
MOPS: 3‐(N‐morpholino)propanesulfonic acid
N10‐formyl‐THF: N10‐formyltetrahydrofolate
N‐formylperosamine: 4‐formamido‐4,6‐dideoxy‐d‐mannose
Tris: tris‐(hydroxymethyl)aminomethane
Qui4N: 4‐amino‐4,6‐dideoxy‐d‐glucose
Qui3NFo: 3‐formamido‐3,6‐dideoxy‐d‐glucose
Qui4NFo: 4‐formamido‐4,6‐dideoxy‐d‐glucose

1. INTRODUCTION

As of November 2019, there were over 220,000 whole/draft prokaryotic genomes listed by the National Center for Biotechnology Information. Many of the deposited genes coding for putative proteins have been annotated for function based upon their predicted amino acid sequences. Questions arise, however, as to the extent of possible misannotations, how these affect database quality and utility, and how they are propagated to additional database entries.1, 2 Indeed, this concern was raised as early as 1999 by Kyrpides and Ouzounis, who cautioned against “overambitious annotation projects” and advised continuous curation by experts to correct mistakes.3 The generation of an enormous amount of data does not always translate into biochemical insight.

Here, we describe and correct the misannotation of two genes, WP_088211966.1 and WP_096244125.1, from Shewanella spp. and Pseudomonas congelans, respectively. These genes were originally suggested to encode l‐methionyl‐tRNA transferases. Our biochemical and X‐ray crystallographic data demonstrate that they, in fact, encode sugar N‐formyltransferases that are involved in the production of the unusual sugar 4‐formamido‐4,6‐dideoxy‐d‐glucose, hereafter referred to as Qui4NFo (Scheme 1).

Scheme 1. Predicted pathway for the production of dTDP‐Qui4NFo

Whereas N‐formylated sugars were first observed in 1985, only recently have reports begun to appear in the literature regarding the structures and functions of the N‐formyltransferases that are required for the biosynthesis of Qui4NFo, as well as for 3‐formamido‐3,6‐dideoxy‐d‐glucose (Qui3NFo), 3‐formamido‐3,6‐dideoxy‐d‐galactose (Fuc3NFo), and 4‐formamido‐4,6‐dideoxy‐d‐mannose (N‐formylperosamine).4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 Typically, these sugars, such as Qui3NFo and Qui4NFo, are found on the lipopolysaccharides or lipooligosaccharides of pathogenic Gram‐negative bacteria such as Campylobacter jejuni, the organism responsible for gastroenteritis worldwide, and Francisella tularensis, the causative agent of rabbit fever.6, 7 N‐formylperosamine, on the other hand, has been identified thus far only on the surface homopolysaccharides of Brucella abortus, an intracellular pathogen responsible for chronic bovine disease.16 The role of N‐formylated sugars in disease remains unresolved, but loss of activity of an N‐formyltransferase in Brucella melitensis results in a bacterial strain with attenuated pathogenicity.17

Whether part of the lipopolysaccharide or a component of surface homopolysaccharides, the biosynthetic pathways for these sugars initiate with the attachment of a nucleoside monophosphate to either glucose‐1‐phosphate, galactose‐1‐phosphate, or mannose‐1‐phosphate. As indicated in Scheme 1, the biosynthesis of Qui4NFo, which is the focus of this investigation, requires three additional steps, namely a dehydration, an amination, and finally an N‐formylation. It is the last step that we demonstrate is catalyzed by WP_088211966.1 and WP_096244125.1, from Shewanella spp. and P. congelans, respectively. Both enzymes use N10‐formyltetrahydrofolate as the carbon source. In terms of biological relevance, P. congelans, first isolated in Germany and possibly phytopathogenic, is a Gram‐negative fluorescent bacterium associated with the phyllosphere of grasses.18 It has also been shown recently to be a prominent species in spoiled chicken meat.19 Shewanella species are Gram‐negative bacteria that inhabit marine environments and that may play a role in human disease.

The analysis reported herein provides new insight into these fascinating enzymes, and it will help to further improve database quality. Indeed, the sugar N‐formyltransferases may be more common than originally envisioned.

2. RESULTS

2.1. The structure and function of WP_088211966.1 from Shewanella spp.

Our previous investigations on the sugar N‐formyltransferases revealed three residues that are strictly conserved amongst those enzymes that utilize dTDP‐Qui4N as a substrate.7, 8, 14, 15 In the Shewanella spp. protein, these correspond to His 77, Lys 79, and Asn 224. The conserved histidine/lysine couple is intimately involved in positioning the pyrophosphoryl moiety of the dTDP‐sugar substrate into the active site whereas the conserved asparagine participates in a hydrogen bonding interaction with the thymine ring of the nucleotide. Given this observed conservation, we tested the Shewanella protein for activity against dTDP‐Qui4N via a discontinuous assay. The carbon source was N10‐formyltetrahydrofolate. Shown in Figure 1a is a plot of the initial velocity kinetic data. The Shewanella protein demonstrates a kcat of 1.01 ± 0.11 s⁻¹, a KM of 0.21 ± 0.03 mM (for dTDP‐Qui4N), and an overall catalytic efficiency of 4.8 × 10³ (±600) M⁻¹ s⁻¹. This catalytic efficiency is similar to that observed for the enzymes from F. tularensis, Providencia alcalifaciens O30, Mycobacterium tuberculosis, and Pantoea ananatis.5, 7, 14, 15

Figure 1. Initial velocity kinetic data. Shown in (a) is a plot of initial velocity versus substrate concentration for the Shewanella enzyme. The concentration of N10‐formyltetrahydrofolate was held constant at 5 mM, whereas the concentration of dTDP‐Qui4N ranged from 0.1 to 2.5 mM. A plot of initial velocity versus substrate concentration for the Pseudomonas enzyme is presented in (b). The concentration of N10‐formyltetrahydrofolate was held constant at 5 mM, whereas the concentration of dTDP‐Qui4N ranged from 0.1 to 5.0 mM. In presenting the data as we do, we are adhering to standard conventions in enzymology. Measuring velocities over a wide range of substrate concentrations allows us to obtain data that define both kcat and kcat/KM well, which is not accomplished by measuring replicates at fewer different concentrations. The graphs shown allow for a qualitative appreciation of the quality of the data; the quantitative goodness‐of‐fit to the Michaelis–Menten equation is given by the standard errors as described in Section 2

After demonstrating N‐formyltransferase activity, the enzyme was subsequently crystallized in the presence of N5‐formyltetrahydrofolate (a non‐catalytically competent analog of N10‐formyltetrahydrofolate) and dTDP‐Qui4N. The crystals belonged to the space group C222₁ with one dimer in the asymmetric unit. The model was refined at 1.9 Å resolution to an overall R‐factor of 18.7%. A ribbon representation of the dimer is presented in Figure 2a. The α‐carbons for the two subunits superimpose with a root‐mean‐square deviation of 0.5 Å. The basic core of each subunit, defined by the N‐terminus to His 203, contains a seven‐stranded mixed β‐sheet flanked on either side by seven α‐helices. The remaining residues, from Ile 204 to Lys 245, fold into a β‐strand (Ile 204 to Thr 206), an α‐helix (Leu 207 to Ala 216), and two additional β‐strands (Ala 225 to Leu 228 and Lys 234 to Lys 245). These last β‐strands run anti‐parallel to one another and, as shown in Figure 2a, are situated such that they extend the two‐stranded β‐sheet into a four‐stranded anti‐parallel β‐sheet across the subunit:subunit interface. The smaller β‐strand, defined by Ile 204 to Thr 206, also runs across the subunit:subunit interface to form a two‐stranded anti‐parallel β‐sheet as can be seen in Figure 2b. The subunit:subunit interface is decidedly hydrophobic in nature (Figure 2b).

Figure 2. Quaternary structure of the Shewanella enzyme. A ribbon representation of the dimer is displayed in (a). A close‐up stereo view of the subunit:subunit interface is presented in (b). This figure and Figures 3, 4, 5, and 7 were prepared with the software package PyMOL.30

When the structure of the Shewanella protein was first solved, the electron density map demonstrated that the occupancy of the N5‐formyltetrahydrofolate ligand was low. In an attempt to increase the occupancy, the crystals were subsequently soaked in N10‐formyltetrahydrofolate and dTDP‐Qui4N. Shown in Figure 3a is the electron density corresponding to the two ligands. The electron density for the dTDP‐Qui4N ligand was well defined throughout the nucleotide portion, but was weaker for the pyranosyl moiety. As can be seen, only density for a folate fragment was observed.

Figure 3. Active site of the Shewanella enzyme. The electron densities corresponding to the folate fragment and the dTDP sugar in Subunit 1 are shown in stereo in (a). These densities were calculated with coefficients of the form Fo − Fc, where Fo and Fc are the native and calculated structure factor amplitudes, respectively. The map was contoured at 3σ. A close‐up stereo view of the active site is displayed in (b). The ligands are colored in orange bonds whereas the important side chains lining the active site are displayed in wheat. Those residues highlighted in green bonds correspond to the three conserved residues that are observed in those N‐formyltransferases that use N10‐formyltetrahydrofolate as the coenzyme. In the Shewanella enzyme, these correspond to Asn 94, His 96, and Asp 131. It is thought that His 96 serves as the catalytic base to remove the proton from the C‐4′ sugar amino group that is necessary for N‐formylation. Dashed lines indicate possible hydrogen bonding interactions within 3.2 Å. Ordered water molecules are displayed as red spheres

A close‐up view of the active site for Subunit 1 is presented in Figure 3b. The thymine ring is positioned into the active site by the side chain of Asn 224, and it is also surrounded by the aromatic rings of Phe 107, Phe 193, and Phe 222. The dTDP ribose adopts the C‐3′ endo pucker with its hydroxyl group lying within 3.2 Å of Nε2 of Gln 109 and a water molecule. The side chains of His 77 and Tyr 153 play key roles in anchoring the β‐phosphoryl oxygens into the active site cleft. Interestingly, there are no side chains directly involved in hydrogen bonding interactions with the hexose moiety of the substrate. Rather the backbone carbonyl of Gly 105 and the backbone amide of Phe 107 sit at 3.2 Å from the C‐3′ and the C‐2′ hydroxyls of the sugar, respectively. Additionally, the indole ring of Trp 106 provides CH/π interactions, which have been suggested to play key roles in carbohydrate/protein recognition for well over 40 years.20 The sugar C‐4′ amino group is oriented toward Asn 94, His 96, and Asp 131. These residues are strictly conserved in all N‐formyltransferases and have been shown to play critical roles in catalysis.11 His 96, located at 4.5 Å from the sugar C‐4′ nitrogen, is thought to serve as the active base required for the abstraction of the proton from the amino group.

Whereas the α‐carbons for the two subunits of the dimer superimpose well, the dTDP‐Qui4N ligand in Subunit 2 adopts an alternative conformation, specifically with respect to the β‐phosphoryl and pyranosyl groups as can be seen in Figure 4. The β‐phosphoryl oxygens of the dTDP‐Qui4N ligand again interact with the side chains of His 77 and Tyr 153 but there is an additional interaction with the side chain of Asn 10. The polypeptide chain backbone interactions between Gly 105 and Phe 107 and the sugar C‐3′ and the C‐2′ hydroxyls are disrupted in this alternate binding mode. The position of the Trp 106 side chain also changes. Taken together, these interactions result in the sugar C‐4′ amino nitrogen lying at 6.9 Å from Nδ1 of His 96. The other residues involved in substrate binding adopt virtually identical positions in both subunits, however. This alternative binding mode will be discussed in more detail below.

Figure 4. Superpositions of the two active sites in the Shewanella enzyme. Those residues and ligands in Subunit 1 are colored in wheat whereas those in Subunit 2 are displayed in blue

2.2. The structure and function of WP_096244125.1 from P. congelans

The enzyme from P. congelans was also tested for activity against dTDP‐Qui4N. As can be seen in Figure 1b, the enzyme displays classical Michaelis–Menten kinetics with dTDP‐Qui4N as its substrate. The protein from P. congelans is not as efficient as the Shewanella N‐formyltransferase, however. Indeed, whereas its KM of 0.14 ± 0.01 mM is similar to that determined for the Shewanella enzyme, its kcat is much lower at 0.054 ± 0.005 s⁻¹. The net result is a catalytic efficiency of 3.9 × 10² (±50) M⁻¹ s⁻¹, about an order of magnitude lower than that of the Shewanella enzyme.
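The catalytic efficiencies quoted for the two enzymes follow directly from the reported kcat and KM values once KM is converted from mM to M. The short sketch below reproduces that arithmetic. The function names are illustrative, and the error propagation shown is a naive first‐order estimate that assumes independent errors in kcat and KM; it need not reproduce the uncertainties reported in the text, which presumably derive from the fitting procedure.

```python
import math

def catalytic_efficiency(kcat_per_s, km_mM):
    """Return kcat/KM in M^-1 s^-1; KM is supplied in mM and converted to M."""
    return kcat_per_s / (km_mM * 1e-3)

def naive_relative_error(kcat, kcat_se, km, km_se):
    """First-order propagation, ASSUMING independent errors in kcat and KM."""
    return math.sqrt((kcat_se / kcat) ** 2 + (km_se / km) ** 2)

# Values taken from the text (kcat in s^-1, KM in mM).
shewanella = catalytic_efficiency(1.01, 0.21)
pseudomonas = catalytic_efficiency(0.054, 0.14)

# Fold difference between the two enzymes.
ratio = shewanella / pseudomonas

print(f"Shewanella:   {shewanella:.0f} M^-1 s^-1")
print(f"P. congelans: {pseudomonas:.0f} M^-1 s^-1")
print(f"Fold difference: {ratio:.1f}")
```

The fold difference of roughly 12 is what the text summarizes as about an order of magnitude.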

Crystals of the P. congelans enzyme were grown in the presence of N5‐formyltetrahydrofolate and dTDP‐Qui4N and belonged to the space group P3₂21 with a dimer in the asymmetric unit. As in the case of the Shewanella protein, however, the electron density map revealed no binding of the coenzyme. In a subsequent experiment, the crystals were soaked in a solution containing N10‐formyltetrahydrofolate and dTDP‐Qui4N before X‐ray data collection, which ultimately allowed for the positioning of these ligands into the active site. The model was refined at 2.03 Å resolution to an overall R‐factor of 20.2%. The α‐carbons for the two subunits of the dimer superimpose with a root‐mean‐square deviation of 0.5 Å. As expected with an amino acid sequence identity and similarity of 49 and 66%, respectively, the α‐carbons for the Shewanella and P. congelans enzymes superimpose with a root‐mean‐square deviation of 1.5 Å.

A close‐up view of the active site in Subunit 1 is depicted in Figure 5. The side chain of Asn 234 lies within 3.2 Å of the thymine ring of dTDP‐Qui4N. Additionally, the side chains of Tyr 117 and Phe 232 form parallel stacking interactions with the thymine ring. Tyr 117 also participates in a T‐shaped stacking interaction with Tyr 203. In the Shewanella protein, these residues correspond to Phe 107 and Phe 193. The ribose of the dTDP‐Qui4N adopts the C‐3′ endo pucker, and its hydroxyl group is hydrogen bonded to Gln 119 and a water molecule. The α‐phosphoryl oxygens lie within 3.2 Å of two solvent molecules, whereas the β‐phosphoryl oxygens are hydrogen bonded to a water molecule and the side chains of Asn 21, His 87, and Tyr 163. Lys 89 and a water molecule complete the hydrogen‐bonding sphere by interacting with the pyranosyl group (Figure 5). As opposed to that observed for the Shewanella enzyme, the dTDP‐Qui4N ligands in both subunits adopt similar conformations, but these are most likely enzymatically unproductive given that the C‐4′ amino groups of the pyranosyl moieties are located at ~7 Å from the active site histidines (His 106 in the P. congelans protein).

Figure 5. The active site of Subunit 1 of the P. congelans enzyme. The ligands are colored in orange, and the protein side chains are highlighted in light blue. As in Figure 3, those residues in the conserved catalytic triad are displayed in green. Water molecules are represented by the red spheres. Interactions between the ligands and the protein are indicated by the dashed lines (3.2 Å or less)

3. DISCUSSION

In 2014, we reported the first structural analysis of a sugar N‐formyltransferase from F. tularensis that functions on dTDP‐Qui4N.7 The crystals utilized in that investigation contained eight subunits in the asymmetric unit. In five of the subunits, the dTDP‐sugar ligand was observed whereas in the other subunits only dTDP was found. Unfortunately, it was never possible to crystallize a ternary complex of the enzyme with a bound dTDP‐sugar and a folate analog. In each subunit that contained a dTDP‐sugar ligand, the C‐4′ amino nitrogen was located over 7 Å from the active site histidine (His 92). It was reasoned that the dTDP‐sugar bound in a non‐productive conformation due to the lack of a bound folate derivative.

We next turned our attention to the N‐formyltransferase from P. alcalifaciens, which was known to catalyze the formylation of dTDP‐Qui4N.5, 8 The crystals used in that study contained a dimer in the asymmetric unit, and both subunits had bound tetrahydrofolate and dTDP‐Qui4N. Again, however, the C‐4′ amino nitrogen of the substrate was located at ~7 Å from the active site base, His 94. Apparently, the unusual conformation of the dTDP‐ligand observed in the enzyme from F. tularensis was not due to a lack of a bound coenzyme. Indeed, the orientations of the dTDP‐Qui4N ligands in both the active sites of the F. tularensis and P. alcalifaciens enzymes were nearly identical.

A subsequent investigation of an N‐formyltransferase from M. tuberculosis finally revealed a binding conformation that made chemical sense with respect to catalysis, namely, the pyranosyl moiety of the substrate was positioned into the active site such that its C‐4′ amino nitrogen was within 3.5 Å of His 82. The same binding orientation of the sugar in the M. tuberculosis protein was recently observed in the enzyme from the plant pathogen, P. ananatis, as well.15

Shown in Figure 6 is an amino acid sequence alignment for the six N‐formyltransferases with known three‐dimensional structures that function on dTDP‐Qui4N. There is a striking stretch of sequence conservation defined by Cys 92 to Ile 113 (Shewanella enzyme numbering). Indeed, this characteristic signature sequence, C‐X‐N‐X‐H‐P‐G‐X‐N‐P‐X‐N‐R‐G‐W‐(F/Y)‐P‐Q‐X‐F‐S‐I (where X is a hydrophobic residue), will significantly aid in the future annotation of proteins that function on dTDP‐Qui4N because it is not conserved in those enzymes that act upon different nucleotide‐linked sugars. This signature sequence includes the fifth β‐strand (Cys 92–His 96) and the fifth α‐helix (Pro 108–Ile 113) of the protein, which are connected by a series of turns including a Type II turn that contains a cis‐proline at Position 3 (Leu 99–His 102). One of the residues in this connecting region is Trp 106, which, as noted above, is involved in CH/π interactions with the pyranosyl ring of the dTDP sugar. Additionally, it participates in T‐shaped stacking interactions with the aminobenzoyl group of the folate‐based coenzyme. Following this tryptophan is an aromatic residue, either a tyrosine or a phenylalanine. The conservation of this aromatic residue occurs because it participates in a parallel stacking interaction with the thymine ring of the dTDP‐sugar substrate.
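As an illustration of how the signature sequence might be used to flag candidate entries, the sketch below converts it into a regular expression and scans a protein sequence. The helper names and the test sequence are hypothetical (the sequence is synthetic, not a database entry). By default each X position accepts any residue; a stricter hydrophobic class can be supplied, although which residues count as hydrophobic is an assumption here. Note that, by the Shewanella numbering given above, His 102 falls at an X position, so a strictly hydrophobic class may be too restrictive.

```python
import re

# Signature from the text, written with ASCII hyphens; "X" marks variable positions.
SIGNATURE = "C-X-N-X-H-P-G-X-N-P-X-N-R-G-W-(F/Y)-P-Q-X-F-S-I"

def signature_to_regex(signature, x_class="[A-Z]"):
    """Translate the dash-separated signature into a compiled regex.

    x_class is the character class used for each X position (an assumption;
    the default accepts any residue letter).
    """
    parts = []
    for token in signature.split("-"):
        if token == "X":
            parts.append(x_class)
        elif token.startswith("("):
            # "(F/Y)" becomes the character class "[FY]"
            parts.append("[" + token.strip("()").replace("/", "") + "]")
        else:
            parts.append(token)
    return re.compile("".join(parts))

def find_signature(sequence, pattern):
    """Return (1-based start position, matched segment), or None if absent."""
    match = pattern.search(sequence.upper())
    return None if match is None else (match.start() + 1, match.group())

# Synthetic sequence for illustration only.
toy_sequence = "MKTCVNLHPGLNPHNRGWFPQAFSIGEK"

permissive = signature_to_regex(SIGNATURE)
strict = signature_to_regex(SIGNATURE, x_class="[AVLIMFWC]")  # assumed hydrophobic set

print(find_signature(toy_sequence, permissive))
print(find_signature(toy_sequence, strict))
```

A hit from such a scan would only suggest a candidate; as the article argues, the assignment still needs to be weighed against genomic context and, ideally, biochemical data.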

Figure 6. Amino acid sequence alignment for those N‐formyltransferases that function on dTDP‐Qui4N and whose three‐dimensional structures are known. WbtJ is from F. tularensis, VioF is from P. alcalifaciens O30, and Rv3404c is from M. tuberculosis. The positions of the β‐strands and α‐helices in these enzymes are indicated by the blue arrows and purple rectangles, respectively

Provided in Figure 7a,b are superpositions of the active sites for the six N‐formyltransferases whose amino acid sequences are aligned in Figure 6. Two distinct binding positions for the pyranosyl moieties have now been observed. That shown in Figure 7a is most likely the catalytically competent conformation in that it places the C‐4′ amino group near the conserved histidine. On the basis of these superpositions, it can be postulated that the region defined by Gly 105 to Phe 107 (Shewanella numbering) functions as the “gate‐keeper” required for proper pyranosyl binding. Most likely, the dTDP‐sugar substrate first binds in the conformation shown in Figure 7b. Subsequently, it is guided into the correct position for catalysis by the formation of hydrogen bonds between polypeptide chain backbone atoms (Gly 105 and Phe 107) and the C‐2′ and the C‐3′ hydroxyls of the dTDP sugar. The substrate is further locked into place via CH/π interactions provided by a conserved indole side chain (Trp 106).

Figure 7. Comparison of the binding modes for the dTDP‐Qui4N substrates. In all the structures determined to date, one of two distinct binding modes has been observed. One conformation places the C‐4′ amino group within ~3.5 Å of the conserved histidine that is thought to function as the active site base. This conformation is shown in stereo in (a), where the enzymes from Shewanella (Subunit 1), P. ananatis, and M. tuberculosis are depicted in violet, wheat, and light blue, respectively. The second conformation observed for the dTDP‐Qui4N substrates is displayed in stereo in (b), where the enzymes from Shewanella (Subunit 2), P. congelans, P. alcalifaciens O30, and F. tularensis are displayed in violet, wheat, light blue, and forest, respectively. In this conformation, the C‐4′ amino group of the sugar is positioned ~7 Å from the catalytic base

In summary, due to the biochemical and structural efforts of this laboratory, the annotation of potential sugar N‐formyltransferases is improving in the databanks. Yet there are still entries that are not correct. For example, the GenBank entries HBQ68356.1 (Leclercia adecarboxylata), HBH06217.1 (Flavobacteriales bacterium), RLC23809.1 (Enterobacter sp. GER_MD16_1505_Eko_090), and RWT08860.1 (Aeromonas caviae) are all annotated as methionyl‐tRNA formyltransferases. On the basis of their amino acid sequences and in light of their genomic context, we contend that the enzymes encoded by these genes are, indeed, sugar N‐formyltransferases that utilize dTDP‐Qui4N as a substrate. There is no question that the field of bacteriology has radically changed since the genome sequences for Haemophilus influenzae and Mycoplasma genitalium were first reported in 1995.21, 22 With tens of thousands of bacterial genome sequences now known, it is imperative that researchers continue to address the biochemical function of putative enzymes because important information may be inadvertently overlooked. Our data suggest that N‐formylated sugars in bacteria may be more common than originally anticipated in 1985, when 2‐formamido‐2‐deoxy‐d‐galacturonic acid was first observed in the O‐antigens of Pseudomonas aeruginosa serotypes O4a,b, O4a,c, and O4a,d.23

4. METHODS

4.1. Protein expression and purification of the N‐formyltransferases

The genes encoding the N‐formyltransferases from Shewanella sp. FDAARGOS_354 (CEQ32_00475, GenBank ASF13675) and P. congelans strain ME812.2b (CCL08_14445, GenBank PBQ17256) were synthesized by Integrated DNA Technologies and placed into pET28T3g36 and pET31b(+) vectors for protein expression, respectively. These plasmids were utilized to transform Rosetta2(DE3) Escherichia coli cells (Novagen).

The cultures were grown in lysogeny broth supplemented with kanamycin and chloramphenicol (both at 50 mg/L concentration) for the pET28T3g construct, or ampicillin and chloramphenicol (at 100 and 50 mg/L concentration, respectively) for the pET31b(+) construct at 37°C with shaking until an optical density of 0.8 was reached at 600 nm. The flasks were cooled in an ice bath, and the cells were induced with 1 mM isopropyl β‐d‐1‐thiogalactopyranoside and allowed to express protein at 21°C for 24 h.

The cells were harvested by centrifugation and frozen as pellets in liquid nitrogen. The pellets were subsequently disrupted by sonication on ice in a lysis buffer composed of 50 mM sodium phosphate, 20 mM imidazole, 10% glycerol, and 300 mM sodium chloride (pH 8.0). The lysate was cleared by centrifugation, and the enzymes were purified at 4°C utilizing nickel nitrilotriacetic acid resin (Prometheus Protein Biology Products) according to the manufacturer's instructions. All buffers were adjusted to pH 8.0 and contained 50 mM sodium phosphate, 300 mM sodium chloride, and imidazole concentrations of 20 mM for the wash buffer and 300 mM for the elution buffer. The proteins were dialyzed against 10 mM Tris–HCl (pH 8.0) and 200 mM NaCl. The Shewanella protein was concentrated to 12 mg/ml based on an extinction coefficient of 1.08 (mg/ml)⁻¹ cm⁻¹, whereas the Pseudomonas enzyme was concentrated to 27 mg/ml based on an extinction coefficient of 1.39 (mg/ml)⁻¹ cm⁻¹.

4.2. Crystallization

The crystals of the Shewanella N‐formyltransferase were grown at room temperature from 100 mM MOPS (pH 7.0), 9–14% O‐methyl ether poly(ethylene glycol) 5,000, 200 mM tetraethylammonium chloride, 5 mM dTDP‐Qui4N, and 5 mM N5‐formyltetrahydrofolate. These crystals were then soaked in 100 mM MOPS (pH 7.0), 16% O‐methyl ether poly(ethylene glycol) 5,000, 200 mM tetraethylammonium chloride, 200 mM NaCl, 5 mM dTDP‐Qui4N, and 5 mM N10‐formyltetrahydrofolate. They were prepared for X‐ray data collection by transfer for about 10 min to a cryoprotectant solution composed of 100 mM MOPS (pH 7.0), 22% O‐methyl ether poly(ethylene glycol) 5,000, 250 mM tetraethylammonium chloride, 250 mM NaCl, 5 mM dTDP‐Qui4N, 5 mM N10‐formyltetrahydrofolate, and 15% ethylene glycol. The required N10‐formyltetrahydrofolate was synthesized in the laboratory as previously described.6

Crystals of the P. congelans protein were grown at room temperature from 100 mM HEPES (pH 7.5), 16–19% O‐methyl ether poly(ethylene glycol) 5,000, 200 mM LiCl, 5 mM dTDP‐Qui4N, and 5 mM N5‐formyltetrahydrofolate. Larger crystals were obtained by macroseeding into 100 mM HEPES (pH 7.5), 12–13% O‐methyl ether poly(ethylene glycol) 5,000, 300 mM LiCl, 5 mM dTDP‐Qui4N, and 5 mM N5‐formyltetrahydrofolate. They were subsequently soaked in 100 mM HEPES (pH 7.5), 15% O‐methyl ether poly(ethylene glycol) 5,000, 150 mM LiCl, 100 mM NaCl, 5 mM dTDP‐Qui4N, and 5 mM N10‐formyltetrahydrofolate. Finally, the crystals were prepared for X‐ray data collection by transfer for about 10 min to a cryoprotectant solution composed of 100 mM HEPES (pH 7.5), 20% O‐methyl ether poly(ethylene glycol) 5,000, 250 mM LiCl, 200 mM NaCl, 5 mM dTDP‐Qui4N, 5 mM N10‐formyltetrahydrofolate, and 15% ethylene glycol.

4.3. X‐ray data collection and processing

X‐ray data sets were collected at the Advanced Photon Source, Structural Biology Center (Beamline 19‐BM) and were processed utilizing HKL3000.24 Relevant X‐ray data collection statistics are listed in Table 1.

Table 1. X‐ray data collection statistics and model refinement statistics

                                          Shewanella protein        P. congelans protein
Resolution limits (Å)                     50.0–1.9 (1.97–1.9)a      50.0–2.03 (2.13–2.03)a
Number of independent reflections         45,110 (4,036)            36,719 (5,131)
Completeness (%)                          94.8 (86.1)               95.1 (93.7)
Redundancy                                5.4 (3.2)                 6.2 (3.4)
avg I/avg σ(I)                            56.8 (5.6)                51.6 (3.6)
Rsym (%)b                                 3.1 (15.5)                5.7 (25.2)
R‐factor (overall)%/no. reflectionsc      18.7/45,110               20.2/36,719
R‐factor (working)%/no. reflections       18.4/42,853               20.1/34,850
R‐factor (free)%/no. reflections          24.5/2,257                22.1/1,869
Number of protein atoms                   4,021                     3,897
Number of heteroatoms                     421                       372
Average B values
  Protein atoms (Å²)                      40.6                      41.3
  Ligand (Å²)                             41.5                      44.5
  Solvent (Å²)                            44.7                      40.1
Weighted RMS deviations from ideality
  Bond lengths (Å)                        0.009                     0.010
  Bond angles (°)                         1.60                      1.75
  Planar groups (Å)                       0.008                     0.008
Ramachandran regions (%)d
  Most favored                            97.4                      96.3
  Additionally allowed                    2.6                       3.5
  Generously allowed                      0.0                       0.2

a Statistics for the highest resolution bin.
b Rsym = (∑|I − Ī|/∑I) × 100.
c R‐factor = (∑|Fo − Fc|/∑|Fo|) × 100 where Fo is the observed structure‐factor amplitude and Fc is the calculated structure‐factor amplitude.
d Distribution of Ramachandran angles according to PROCHECK.29
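The R‐factor definition in the table footnote can be expressed in a few lines. The following is a minimal sketch that assumes the observed and calculated amplitudes are already on a common scale (refinement programs handle scaling themselves); the amplitude lists are invented toy numbers, not data from this study. The reflection counts in the final check are taken from Table 1.

```python
def r_factor(f_obs, f_calc):
    """R = (sum|Fo - Fc| / sum|Fo|) * 100 for paired structure-factor amplitudes."""
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    numerator = sum(abs(fo - fc) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return 100.0 * numerator / denominator

# Toy amplitudes (hypothetical, for illustration only).
toy_f_obs = [100.0, 50.0, 25.0, 25.0]
toy_f_calc = [90.0, 55.0, 30.0, 20.0]
toy_r = r_factor(toy_f_obs, toy_f_calc)
print(f"Toy R-factor: {toy_r:.1f}%")

# Consistency check on the reflection counts in Table 1:
# working + free reflections should equal the overall count.
counts = {
    "Shewanella": (42853, 2257, 45110),
    "P. congelans": (34850, 1869, 36719),
}
for name, (working, free, overall) in counts.items():
    free_fraction = free / overall
    print(name, working + free == overall, f"free set = {free_fraction:.1%}")
```

In both data sets the free set amounts to roughly 5% of the reflections.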

4.4. Structure solution and model refinement

The structures were each solved via molecular replacement with PHASER25 using PDB entry 4YFY as a search model.8 Iterative cycles of model building with COOT26, 27 and refinement with REFMAC28 led to final X‐ray models with overall R‐factors of 18.7% for the Shewanella enzyme and 20.2% for the Pseudomonas protein. Refinement statistics are provided in Table 1.

4.5. Determination of kinetic constants

The kinetic parameters were determined as previously described.15 Briefly, the kinetic constants were obtained via a discontinuous assay using an ÄKTA Purifier HPLC system. Reaction rates were determined by measuring the amount of N‐formylated product formed on the basis of peak area as monitored at 267 nm. The concentration was determined from the peak area via a calibration curve with standard samples that had been treated in the same manner as the reaction time points. The 2.0 ml reaction mixtures contained 5 mM N10‐formyltetrahydrofolate, 50 mM HEPPS (pH 8.5), and 0.01 mg/ml of the Shewanella enzyme with dTDP‐Qui4N concentrations ranging from 0.1 to 2.5 mM, or 0.1 mg/ml of the Pseudomonas enzyme with dTDP‐Qui4N concentrations ranging from 0.1 to 5.0 mM. Seven 250 μl aliquots were taken over 2.5 min, and the reaction aliquots were quenched by the addition of 6 μl of 6 M HCl. Following addition of 200 μl of carbon tetrachloride and vigorous mixing, the samples were spun at 14,000g for 2 min, and 200 μl of the aqueous phase was removed for HPLC analysis. The samples were diluted with 2 ml water and loaded onto a 1 ml Resource‐Q column. Products were quantified after elution with an 8‐column volume gradient from 0 to 400 mM LiCl (pH 4.0, HCl). A plot of initial velocity versus concentration was analyzed using PRISM (GraphPad Software, Inc.) and fitted to the equation v0 = (Vmax[S])/(KM + [S]). The kcat values were calculated according to the equation: kcat = Vmax/[ET].
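For readers who wish to reproduce this type of analysis without PRISM, the sketch below fits the Michaelis–Menten equation by least squares using only the Python standard library. It is not the authors' procedure: it exploits the fact that, for a fixed KM, the best Vmax has a closed form, and then searches over KM with a coarse logarithmic grid followed by golden‐section refinement. The substrate concentrations and velocities are synthetic, noise‐free values generated from a KM of 0.21 mM and an arbitrary Vmax of 1.0, and the search bounds are assumptions.

```python
import math

def mm_rate(s, vmax, km):
    """Michaelis-Menten initial velocity: v0 = Vmax[S] / (KM + [S])."""
    return vmax * s / (km + s)

def best_vmax(substrate, velocity, km):
    """Closed-form least-squares Vmax for a fixed KM."""
    x = [s / (km + s) for s in substrate]
    return sum(v * xi for v, xi in zip(velocity, x)) / sum(xi * xi for xi in x)

def sse(substrate, velocity, km):
    """Sum of squared residuals at a given KM, using the best Vmax for that KM."""
    vmax = best_vmax(substrate, velocity, km)
    return sum((v - mm_rate(s, vmax, km)) ** 2 for s, v in zip(substrate, velocity))

def fit_michaelis_menten(substrate, velocity, km_lo=1e-3, km_hi=1e2, steps=200):
    """Return (Vmax, KM); KM is searched between km_lo and km_hi (assumed bounds)."""
    # Coarse scan on a logarithmic grid to bracket the minimum.
    grid = [km_lo * (km_hi / km_lo) ** (i / steps) for i in range(steps + 1)]
    i_best = min(range(len(grid)), key=lambda i: sse(substrate, velocity, grid[i]))
    a, b = grid[max(i_best - 1, 0)], grid[min(i_best + 1, steps)]
    # Golden-section refinement inside the bracket.
    phi = (math.sqrt(5) - 1) / 2
    c, d = b - phi * (b - a), a + phi * (b - a)
    for _ in range(100):
        if sse(substrate, velocity, c) < sse(substrate, velocity, d):
            b, d = d, c
            c = b - phi * (b - a)
        else:
            a, c = c, d
            d = a + phi * (b - a)
    km = (a + b) / 2
    return best_vmax(substrate, velocity, km), km

def kcat(vmax_molar_per_s, enzyme_molar):
    """kcat = Vmax / [ET]; both quantities must be in molar units."""
    return vmax_molar_per_s / enzyme_molar

# Synthetic, noise-free data (concentrations in mM, velocities in arbitrary units).
conc = [0.1, 0.25, 0.5, 1.0, 1.5, 2.0, 2.5]
rates = [mm_rate(s, 1.0, 0.21) for s in conc]
vmax_fit, km_fit = fit_michaelis_menten(conc, rates)
print(f"Vmax = {vmax_fit:.3f}, KM = {km_fit:.3f} mM")
```

Converting a fitted Vmax to kcat requires the molar enzyme concentration, which in turn requires the molecular weight of the protein; that value is not stated in this section, so it is left as an input to `kcat`. Standard errors on the parameters, as reported in the text, would need a full nonlinear regression package.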

CONFLICT OF INTEREST

The authors declare no potential conflict of interests.

ACKNOWLEDGMENTS

This research was supported in part by NIH grant GM115921 (to H.M.H.). X‐ray coordinates have been deposited in the Research Collaboratory for Structural Bioinformatics, Rutgers University, New Brunswick, NJ (accession nos. 6V2T and 6V33).

Girardi NM, Thoden JB, Holden HM. Misannotations of the genes encoding sugar N‐formyltransferases. Protein Science. 2020;29:930–940. doi: 10.1002/pro.3807

Funding information: Center for Scientific Review, Grant/Award Number: GM115921

REFERENCES

  • 1. Schnoes AM, Brown SD, Dodevski I, Babbitt PC. Annotation error in public databases: Misannotation of molecular function in enzyme superfamilies. PLoS Comput Biol. 2009;5:e1000605.
  • 2. Promponas VJ, Iliopoulos I, Ouzounis CA. Annotation inconsistencies beyond sequence similarity‐based function prediction – Phylogeny and genome structure. Stand Genomic Sci. 2015;10:108.
  • 3. Kyrpides NC, Ouzounis CA. Whole‐genome sequence annotation: Going wrong with confidence. Mol Microbiol. 1999;32:886–887.
  • 4. Breazeale SD, Ribeiro AA, McClerren AL, Raetz CR. A formyltransferase required for polymyxin resistance in Escherichia coli and the modification of lipid A with 4‐amino‐4‐deoxy‐l‐arabinose. Identification and function of UDP‐4‐deoxy‐4‐formamido‐l‐arabinose. J Biol Chem. 2005;280:14154–14167.
  • 5. Liu B, Chen M, Perepelov AV, et al. Genetic analysis of the O‐antigen of Providencia alcalifaciens O30 and biochemical characterization of a formyltransferase involved in the synthesis of a Qui4N derivative. Glycobiology. 2012;22:1236–1244.
  • 6. Thoden JB, Goneau MF, Gilbert M, Holden HM. Structure of a sugar N‐formyltransferase from Campylobacter jejuni. Biochemistry. 2013;52:6114–6126.
  • 7. Zimmer AL, Thoden JB, Holden HM. Three‐dimensional structure of a sugar N‐formyltransferase from Francisella tularensis. Protein Sci. 2014;23:273–283.
  • 8. Genthe NA, Thoden JB, Benning MM, Holden HM. Molecular structure of an N‐formyltransferase from Providencia alcalifaciens O30. Protein Sci. 2015;24:976–986.
  • 9. Woodford CR, Thoden JB, Holden HM. New role for the ankyrin repeat revealed by a study of the N‐formyltransferase from Providencia alcalifaciens. Biochemistry. 2015;54:631–638.
  • 10. Genthe NA, Thoden JB, Holden HM. Structure of the Escherichia coli ArnA N‐formyltransferase domain in complex with N(5)‐formyltetrahydrofolate and UDP‐Ara4N. Protein Sci. 2016;25:1555–1562.
  • 11. Holden HM, Thoden JB, Gilbert M. Enzymes required for the biosynthesis of N‐formylated sugars. Curr Opin Struct Biol. 2016;41:1–9.
  • 12. Woodford CR, Thoden JB, Holden HM. Molecular architecture of an N‐formyltransferase from Salmonella enterica O60. J Struct Biol. 2017;200:267–278.
  • 13. Riegert AS, Chantigian DP, Thoden JB, Tipton PA, Holden HM. Biochemical characterization of WbkC, an N‐formyltransferase from Brucella melitensis. Biochemistry. 2017;56:3657–3668.
  • 14. Dunsirn MM, Thoden JB, Gilbert M, Holden HM. Biochemical investigation of Rv3404c from Mycobacterium tuberculosis. Biochemistry. 2017;56:3818–3825.
  • 15. Hofmeister DL, Thoden JB, Holden HM. Investigation of a sugar N‐formyltransferase from the plant pathogen Pantoea ananatis. Protein Sci. 2019;28:707–716.
  • 16. Young EJ. An overview of human brucellosis. Clin Infect Dis. 1995;21:283–289.
  • 17. Lacerda TL, Cardoso PG, Augusto de Almeida L, et al. Inactivation of formyltransferase (wbkC) gene generates a Brucella abortus rough strain that is attenuated in macrophages and in mice. Vaccine. 2010;28:5627–5634.
  • 18. Behrendt U, Ulrich A, Schumann P. Fluorescent pseudomonads associated with the phyllosphere of grasses; Pseudomonas trivialis sp. nov., Pseudomonas poae sp. nov. and Pseudomonas congelans sp. nov. Intl J Systemat Evol Microbiol. 2003;53:1461–1469.
  • 19. Lee HS, Kwon M, Heo S, Kim MG, Kim GB. Characterization of the biodiversity of the spoilage microbiota in chicken meat using next generation sequencing and culture dependent approach. Korean J Food Sci Anim Resour. 2017;37:535–541.
  • 20. Spiwok V. CH/pi interactions in carbohydrate recognition. Molecules. 2017;22:9–11.
  • 21. Fleischmann RD, Adams MD, White O, et al. Whole‐genome random sequencing and assembly of Haemophilus influenzae Rd. Science. 1995;269:496–512.
  • 22. Fraser CM, Gocayne JD, White O, et al. The minimal gene complement of Mycoplasma genitalium. Science. 1995;270:397–403.
  • 23. Knirel YA, Vinogradov EV, Shashkov AS, et al. Somatic antigens of Pseudomonas aeruginosa. The structure of the O‐specific polysaccharide chains of lipopolysaccharides of P. aeruginosa serogroup O4 (Lanyi) and related serotype O6 (Habs) and immunotype 1 (Fisher). Eur J Biochem. 1985;150:541–550.
  • 24. Minor W, Cymborowski M, Otwinowski Z, Chruszcz M. HKL‐3000: The integration of data reduction and structure solution‐from diffraction images to an initial model in minutes. Acta Crystallogr. 2006;D62:859–866.
  • 25. McCoy AJ, Grosse‐Kunstleve RW, Adams PD, Winn MD, Storoni LC, Read RJ. Phaser crystallographic software. J Appl Cryst. 2007;40:658–674.
  • 26. Emsley P, Cowtan K. Coot: Model‐building tools for molecular graphics. Acta Crystallogr. 2004;D60:2126–2132.
  • 27. Emsley P, Lohkamp B, Scott WG, Cowtan K. Features and development of Coot. Acta Crystallogr. 2010;D66:486–501.
  • 28. Murshudov GN, Vagin AA, Dodson EJ. Refinement of macromolecular structures by the maximum‐likelihood method. Acta Crystallogr. 1997;D53:240–255.
  • 29. Laskowski RA, Moss DS, Thornton JM. Main‐chain bond lengths and bond angles in protein structures. J Mol Biol. 1993;231:1049–1067.
  • 30. DeLano WL. The PyMOL molecular graphics system. San Carlos, CA: DeLano Scientific; 2002.

