Abstract
YdhR is a 101-residue conserved protein from Escherichia coli. Sequence searches reveal that the protein has >50% identity to proteins found in a variety of other bacterial genomes. Using size exclusion chromatography and fluorescence spectroscopy, we determined that ydhR exists in a dimeric state with a dissociation constant of ~40 nM. The three-dimensional structure of dimeric ydhR was determined using NMR spectroscopy. A total of 3400 unambiguous NOEs, both manually and automatically assigned, were used for the structure calculation, which was refined using an explicit hydration shell. A family of 20 structures was obtained with a backbone RMSD of 0.48 Å for elements of secondary structure. The structure reveals a dimeric α,β fold characteristic of the alpha+beta barrel superfamily of proteins. Bioinformatic approaches were used to show that ydhR likely belongs to a recently identified group of mono-oxygenase proteins that includes ActVA-Orf6 and YgiN and is involved in the oxygenation of polyaromatic ring compounds.
Keywords: alpha + beta barrel, automated refinement, NMR, structural genomics, water refinement
YdhR is a 101-residue protein from Escherichia coli for which no function has yet been described or proposed. Analysis of the E. coli genome shows that it does not lie within any identified operon, and motif-based searches fail to identify any known active site or conserved domain. Sequence homology searches find that similar proteins are expressed in a wide variety of species; however, none of these have known structures or functions. Several bacterial species, including Salmonella typhimurium, Vibrio parahaemolyticus, Yersinia pestis, and Photobacterium profundum, produce proteins with between 54% and 65% identity. The genomes of the yeast Debaryomyces hansenii and the mosquito Anopheles gambiae code for proteins with 52% and 54% identity, respectively. The high level of sequence identity and the high number of strictly conserved residues across this variety of species suggest an important, evolutionarily conserved biological role.
In this work we report the NMR solution structure for E. coli ydhR. The protein forms a homodimer in which each subunit is composed of a central four-strand β-sheet and four surrounding α-helices. The structure identifies this protein as a member of the dimeric alpha+beta barrel superfamily of proteins, which has a wide variety of possible functions. Structural homology to the ActVA-Orf6 and YgiN proteins indicates that ydhR likely serves as a mono-oxygenase in bacteria whose substrates include polyaromatic ring compounds.
Results and Discussion
The 1H-15N HSQC spectrum of ydhR (data not shown) revealed ~130 peaks, the number that would be expected from the backbone and side chain amino groups for a stable, monodisperse species in solution. The signals were well dispersed throughout the spectral region. All of the backbone atoms from the ydhR sequence except those from Met 1 were assigned, as was a total of 93% of the protons in the sequence. The protein contained a 20-residue N-terminal linker which exhibited several strong peaks in the 1H-15N HSQC spectrum and very few NOEs in either the 15N or 13C-NOESY spectra. The linker resonances were also not assigned. It was noted that Glu 90 showed two resolved amide resonances in the 15N-edited spectra, suggesting that the following residue, Pro 91, was undergoing cis/trans isomerization that is slow on the chemical shift time-scale.
In the initial rounds of structure calculation we attempted to use the program CYANA (Guntert et al. 1997) to assign all of the NOEs ab initio with only the dihedral restraints predefined for a monomeric ydhR protein. A series of seven cycles of iterative torsion angle dynamics and automated NOE assignments were used for a complete structure refinement. This method did not produce structures that fulfilled the CYANA criteria for quality and reliability, although a consistent fold could be discerned. This observation was consistent with the ydhR protein being in a different oligomeric state than the monomer that was assumed in the input data. To resolve this, analytical size exclusion chromatography was used to determine the oligomeric state of ydhR. Using conditions similar to the NMR experiments, ydhR eluted as a single peak corresponding to a mass of 27.3 kDa. Since this was approximately twice the mass expected for a monomeric form (13.5 kDa), it indicated that ydhR was a dimer in solution.
The homodimer structure of ydhR was determined using CYANA by initially calculating the monomeric fold and then applying intermonomer NOE distances to form the dimeric structure. The structures were then improved using the automated NOE assignment protocol in CYANA. These structures were refined by using a final round of restrained molecular dynamics and energy minimization with an explicit shell of water to surround the protein molecule. A superposition of the 20 final structures is shown in Figure 1A. The topology of the ydhR dimer consists of two monomers each having a four-strand anti-parallel β-sheet that includes residues Thr 3–Ala 10 (β1), Phe 36–Ser 44 (β2), Glu 49–Phe 56 (β3), and Val 82–Lys 84 (β4). Four α-helices are formed between residues Asp 17–Ala 21 (helix α1), Leu 23–Ile 30 (α2), Ala 62–Leu 76 (α3), and Val 90–Phe 95 (α4). Helix α1 is framed by two prolines (Pro 14, Pro 25). Chemical shift information indicated that Pro 14 was in a cis conformation, although the final structures indicated this residue was in a distorted trans conformation. Pro 25 acts as a helix breaker between helices α1 and α2, leading to a kink between these two helices in the structure. The overall topology of the structure is β1α1α2β2β3α3β4α4, as shown in Figure 1B. This set of structures was very well defined, having a backbone RMSD of 0.54 Å for residues 3–95 and 0.48 Å for the regions of secondary structure (Table 1). More than 98% of the residues were found in the most favored or additionally allowed regions of the Ramachandran plot. Only small differences were noted in these structures compared to the CYANA-derived structures prior to water refinement. For example, the general topology of the structure remained the same but the RMSD between the mean structure from CYANA and the mean, final refined structure was ~0.9 Å for residues in defined secondary structure regions.
More importantly, water refinement reduced the number of van der Waals bumps from an average of 102.1 per structure in the CYANA structures to 1.6 per water-refined structure. It was also noted that the α-helical content increased in the water-refined structures by ~18 percentage points (Table 1), largely due to improved definition of a region previously classified as a 3-10 helix. Both the decrease in steric clashes and the improvement in secondary structure have been observed previously for many proteins refined in this manner (Nederveen et al. 2005). Complete structural statistics are shown in Table 1.
Table 1.
Restraints | CYANA | CYANA-Water Refinement |
NOE distance restraints (dimer) | 3461 | |
Intraresidue | 1685 | |
Medium range (|i−j| ≤ 5) | 648 | |
Long range (|i−j|>5) | 650 | |
Intermolecular | 476 | |
Dihedral Angle Restraints (TALOS) | 200 | |
PROCHECK-NMR (Laskowski et al. 1996) | ||
Most favored (%) | 81.0 | 81.3 |
Additionally allowed (%) | 18.8 | 16.8 |
Generously allowed (%) | 0.2 | 0.8 |
Disallowed (%) | 0.0 | 1.1 |
Close contacts (20 structures) | 2042 | 32 |
Secondary Structure Content | ||
Helix (%) | 18.0 | 35.6 |
Sheet (%) | 20.3 | 17.8 |
Other (3-10 helix, bend, turn) (%) | 34.7 | 21.8 |
Total | 73.0 | 75.2 |
Coordinate Precision | ||
RMSD from average structure (Å) | ||
Well-defined region | ||
Residue 3–95 backbone atoms | 0.61 | 0.54 |
Residue 3–95 heavy atoms | 0.93 | 0.86 |
Secondary Structure | ||
Residues 3–10,17–20,23–30,49–56, 62–76,82–84,90–95 | ||
Backbone | 0.54 | 0.48 |
Heavy | 0.85 | 0.77 |
NOE violations >0.2 Å | 0 | |
Dihedral violations >2.0° | 0 | |
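The per-structure close-contact averages quoted in the text can be reproduced from the ensemble totals in Table 1. The short sketch below is only an arithmetic check; the variable and function names are illustrative and are not part of the original analysis.

```python
# Totals taken from Table 1; each ensemble contains 20 structures.
N_STRUCTURES = 20
close_contacts = {"CYANA": 2042, "CYANA-water": 32}


def per_structure(total, n=N_STRUCTURES):
    """Average count per structure for an ensemble total."""
    return total / n


averages = {name: per_structure(total) for name, total in close_contacts.items()}

# Ramachandran percentages for the water-refined ensemble, from Table 1.
most_favored = 81.3
additionally_allowed = 16.8
favored_or_allowed = most_favored + additionally_allowed

if __name__ == "__main__":
    for name, avg in averages.items():
        print(f"{name}: {avg:.1f} close contacts per structure")
    print(f"most favored + additionally allowed: {favored_or_allowed:.1f}%")
```

The two averages correspond to the 102.1 and 1.6 bumps per structure cited in the text, and the Ramachandran sum corresponds to the "more than 98%" statement.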
Two key contacting regions form the dimer interface in ydhR. The C terminus, including helix α4, forms an arm-like structure that extends to partially encircle the other subunit. There are extensive hydrophobic contacts between helix α4 and portions of the other monomer including residues in helix α1 and strand β2. Intersubunit close contacts were observed in this region, including interactions between the side chains of Met 19, Leu 23, Val 40, and Trp 41 with Leu 92′, Ala 2, Leu 37, and Leu 100 with Lys 99′, and Trp 38 and Lys 39 with Ile 95′. The second major dimer surface is formed by the β-sheets of each monomer such that β1 of one subunit is rotated by 99° relative to the other. The intersubunit contacts in the sheet region include the side chains of His 8 and Glu 49 to Gln 6′ and Lys 84′, Asn 47 to Lys 84′ and Lys 85′, Phe 86 with Gly 51′ and Gly 52′, and Ile 53 to Ile 53′. In general the contacts between the β-sheets are largely polar while interactions involving the C terminus are largely hydrophobic. The total buried surface at the dimer interface is 3680 Å².
The dimeric solution structure of ydhR showed that the side chains of both of the tryptophans in the sequence, Trp 38 and Trp 41, are partially buried due to hydrophobic interactions with the C-terminal tail of the other monomer and would likely be more solvent-exposed in the monomer form. Using fluorescence spectroscopy, excitation at 295 nm of the dimeric form of ydhR at 55 μM resulted in a single broad tryptophan emission peak with a maximum near 347 nm. A series of dilutions of ydhR over the concentration range from 55.0 μM to 27.5 nM resulted in a red shift of the fluorescence maximum from 347 nm to 355 nm, as expected for a shift toward the monomeric form of the protein at lower concentrations. Analysis of the change in fluorescence versus concentration indicated a Kd for dissociation of the dimer into monomers of ~40 nM. This tight association is consistent with the large intermolecular surface for the ydhR dimer.
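To illustrate why dilution into the nanomolar range shifts the population toward monomer, the following minimal sketch evaluates a simple two-state monomer-dimer equilibrium. It is not the fitting procedure used for the data; it assumes that Kd = [M]²/[D], that the quoted protein concentrations refer to total chains (Pt = [M] + 2[D]), and the function name is hypothetical.

```python
import math


def monomer_fraction(total_chain_conc, kd):
    """Fraction of protein chains present as free monomer.

    Assumed model: D <-> 2 M with Kd = [M]^2 / [D] and total chain
    concentration Pt = [M] + 2[D]. Substitution gives
    2[M]^2 / Kd + [M] - Pt = 0; the positive root is returned as [M] / Pt.
    Both arguments must be in the same units (e.g. molar).
    """
    if total_chain_conc <= 0 or kd <= 0:
        raise ValueError("concentrations must be positive")
    free_monomer = (kd / 4.0) * (math.sqrt(1.0 + 8.0 * total_chain_conc / kd) - 1.0)
    return free_monomer / total_chain_conc


if __name__ == "__main__":
    KD = 40e-9  # ~40 nM, from the fluorescence analysis in the text
    for conc in (55e-6, 1e-6, 27.5e-9):
        frac = monomer_fraction(conc, KD)
        print(f"{conc:.3e} M total: {100 * frac:.1f}% monomer")
```

Under these assumptions only about 2% of the chains are monomeric at 55 μM, whereas slightly more than half are monomeric at 27.5 nM, which is qualitatively in line with the red shift seen on dilution.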
Recently a 2.90 Å resolution crystal structure of the same molecule (1WD6) was deposited in the Protein Data Bank. The solution structure of ydhR is in a dimeric form similar to that observed in the crystalline state. The monomers of the NMR- and X-ray-determined structures have essentially the same arrangement of secondary structure, and an overall backbone RMSD of 1.92 Å exists for β1-β4 and α3. The most notable difference is that the strand β4 of the X-ray structure is longer, extending for six residues rather than three in the solution structure. The extension of β4 allows it to interact with the end of strand β2 of the other subunit to form an imperfect central beta-barrel structure that cannot be clearly discerned in the solution structure. The monomers in the dimer interact slightly differently in the NMR and X-ray structures, with the angles between the central β1 strands of the sheet at 99° and 109°, respectively. Interestingly, the X-ray structure of ydhR also exhibits a similar significant variation from planarity for Pro 14 (ω = −148.7°) as observed in the solution structure.
Discerning the function of a protein based on its structure is an area of intensive study due to the proliferation of structural genomics (Watson et al. 2005). In this case, sequence homology searches with approaches such as BLAST identified several highly homologous proteins across a wide range of species but did not reveal any protein of significant sequence homology to ydhR for which the structure or function was known. Structure-based analysis was therefore essential in identifying the possible functions of ydhR. A Hidden Markov Model (HMM) algorithm in the PROFUNC suite of programs (Laskowski et al. 2005) assigned the ydhR structure to the SCOP class of alpha and beta proteins with ferredoxin-like folds and more particularly to the recently identified dimeric alpha+beta barrel superfamily. The members of this family, in addition to the central beta-barrel structure that forms the dimer interface, have a largely hydrophobic active site cleft located between the first two α-helical regions (α1 and α2; residues Asp 17–Ile 30 in ydhR) and the third helix (α3; residues Ala 62–Leu 76) where polyaromatic ring substrates bind and are oxygenated (Fig. 2). The prototypical member of this superfamily is actinorhodin biosynthesis mono-oxygenase (ActVA-Orf6) from Streptomyces coelicolor (Sciara et al. 2003). The dimeric structure of ActVA-Orf6 is very similar to ydhR, with a backbone RMSD of 1.87 Å for residues β1-β4 and α1. While the sequence similarity between these two molecules is very low (16.5%), the key region Pro 34–Gly 35–Phe 36–Leu 37 of ydhR matches the strictly conserved sequence of the residues from 44 to 47 in ActVA-Orf6, and a similar sequence is present in other bacterial homologs of ydhR (Fig. 2D). This sequence forms a β-turn at one end of the substrate-binding cleft in both molecules (Fig. 2).
Crystal structures of ActVA-Orf6 in both the apo and substrate-bound forms show that the conserved active site residues Tyr 51, Asn 62, Trp 66, and Arg 86 are essential for activity. In ydhR these residues correspond to Trp 41, Tyr 54, His 69, and Arg 72, all of which are conserved in the BLAST sequence alignments of several bacterial proteins with similarity to ydhR. In addition, these residues in ydhR are positioned to substitute for the active site residues in ActVA-Orf6 (Fig. 2) surrounding a hydrophobic cleft in the protein. This cleft in the ydhR structure is partially obscured by helix α2 due to a bend between helices α1 and α2 induced by Pro 25, which is not present in the ActVA-Orf6 structure. Many of the conserved residues in the similar proteins identified by the BLAST queries are located in this cleft. Recently Adams and Jia (2005) also identified the E. coli protein YgiN as a mono-oxygenase, based on its structural homology to ActVA-Orf6 and more conclusively on its ability to oxidize quinol substrates. Interestingly, the YgiN gene is located immediately downstream of the “modulator of drug activity B” (mdaB) gene that minimizes the toxicity of quinone compounds through reductive reactions. These reduced quinols are capable of producing reactive oxygen species, and the activity of YgiN is seen to be coupled to that of mdaB to eliminate these potentially dangerous reaction products. Similarly, the ydhR gene is located immediately upstream of the gene for a putative oxidoreductase ydhS protein, suggesting the possibility of a similar pair of related redox activity proteins. YgiN has very low sequence homology to either ActVA-Orf6 or ydhR, suggesting that this conformation is the result of convergent evolution.
Materials and methods
Protein expression and purification
The ydhR gene was subcloned from E. coli and then ligated into a pET15b (Novagen) vector (Yee et al. 2000). The final construct consisted of the wild-type, 101 amino acid protein sequence with a 20-residue N-terminal leader sequence containing a 6xHis tag and a thrombin cleavage sequence. The plasmid was transformed into the BL21(DE3) strain for expression. The BL21 cultures were grown in standard M9 media until the OD600 reached 0.6, at which point the cells were induced with 1.0 mM isopropylthiogalactoside and allowed to grow a further 5 h to maximize expression levels. Uniformly 13C-labeled glucose (2 g/L) and 15N ammonium chloride (1 g/L) were used in the M9 media during expression of the NMR sample. Following growth, the cells were lysed and the ydhR protein was purified using Ni2+-NTA metal affinity chromatography. After purification the sample was dialyzed into 25 mM MES (pH 6.5) and 450 mM KCl at a concentration of ~1.0 mM for the collection of NMR data. The 20-residue nonnative leader sequence was retained in the NMR sample.
Dimerization assays
The oligomeric state of ydhR in solution was determined by size exclusion chromatography at 20°C using a Superdex 75 (16/60) column on an ÄKTA FPLC. A standard curve of log Mr versus Ve/Vo (Ve = elution volume, Vo = void volume) was generated for bovine serum albumin (Mr = 68 kDa), ovalbumin (45 kDa), and myoglobin (17 kDa) using buffers and pH similar to those used in the NMR experiments. Under these conditions ydhR eluted as a single peak at a volume that corresponded to a Mr of 27.3 kDa. Fluorescence experiments were completed using a Spex3 instrument for a range of ydhR concentrations between 55 μM and 27.5 nM. Dilutions of the protein were done using buffer containing 25 mM MES and 450 mM KCl at pH 6.5. Samples were excited at 295 nm, and scans were collected between 300 and 400 nm. A plot of ydhR concentration versus emission maximum was generated and fit for a monomer-dimer equilibrium using Prism v4.0 (GraphPad Software).
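The standard-curve approach described above can be sketched as a least-squares line through log Mr versus Ve/Vo, followed by interpolation for the unknown. The masses of the standards are those given in the text, but the Ve/Vo values below are hypothetical placeholders (the measured elution volumes are not reported), and the function names are illustrative only.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def calibrate(standards):
    """Fit log10(Mr) against Ve/Vo for a list of (Mr_kDa, Ve_over_Vo) pairs."""
    ratios = [ratio for _, ratio in standards]
    log_mr = [math.log10(mr) for mr, _ in standards]
    return fit_line(ratios, log_mr)


def estimate_mr(ve_over_vo, slope, intercept):
    """Apparent Mr (kDa) for a measured Ve/Vo using the fitted line."""
    return 10 ** (slope * ve_over_vo + intercept)


# Masses (kDa) are from the text; the Ve/Vo ratios are HYPOTHETICAL.
STANDARDS = [(68.0, 1.30), (45.0, 1.42), (17.0, 1.75)]

if __name__ == "__main__":
    slope, intercept = calibrate(STANDARDS)
    sample_ratio = 1.58  # hypothetical Ve/Vo for the unknown protein
    print(f"apparent Mr: {estimate_mr(sample_ratio, slope, intercept):.1f} kDa")
```

With real elution volumes substituted for the placeholders, the apparent Mr can be compared with the 13.5 kDa expected for the monomeric construct to infer the oligomeric state.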
NMR spectroscopy
A single NMR sample was used for all experiments. All NMR data were collected at 30°C on a 600 MHz Varian Inova spectrometer equipped with an xyz gradient, triple resonance probe. HNCO, HNCA, HNCACB (Grzesiek and Bax 1992), CBCA(CO)NH (Grzesiek et al. 1993), HCC(CO)NH, CC(CO)NH, HCCH-TOCSY (Kay et al. 1993), 15N NOESY-HSQC, and 13C NOESY-HSQC (in both the aliphatic and aromatic regions) spectra were collected using standard pulse sequences from Varian Biopack. Aromatic assignments were derived from HBCBCGCDCEHE, HBCBCGCDHD (Yamazaki et al. 1993), and aromatic region HCCH-COSY experiments. The 15N NOESY-HSQC was collected using a mixing time of 100 msec. The 13C NOESY-HSQC experiments were collected using a mixing time of 150 msec after the sample was buffer exchanged into 100% D2O. Typical experimental parameters included the collection of 16 or 24 transients per increment and 32 complex planes in F2. The total data collection time was ~1 mo. All data were processed with NMRPIPE software (Delaglio et al. 1995) using standard in-house scripts. A π/3 shifted sine bell function was applied as apodization to the directly detected dimensions while line broadening of 10 or 20 Hz was applied to the indirect dimensions. Three-dimensional data sets were linearly predicted and zero filled in the F1 and F2 dimensions to 256 and 64 points, respectively, except for the HNCO and HNCA experiments that were extended to only 128 points in F1. NMRVIEW 5.2.2 (Johnson and Blevins 1994) was used for manual spectral analysis and assignment. 1H chemical shifts were referenced directly to internal DSS at 0 ppm, and the 13C and 15N chemical shifts were indirectly referenced to DSS using the method of Wishart et al. (1995).
Structure calculations
All structure calculations were done using CYANA v2.0 (Guntert et al. 1997) on an Apple Dual G5 2.0 GHz computer. NOEs were assigned from 15N NOESY-HSQC and 13C NOESY-HSQC experiments using NMRVIEW, and distances were calibrated using the intensity protocol within CYANA. Dihedral restraints were determined using Hα, Cα, Cβ, and CO chemical shift assignments as input for TALOS (Cornilescu et al. 1999). The TALOS predicted angles were used only in cases where all 10 database matches gave consistent values and which were located in regions predicted by the Chemical Shift Index (CSI) (Wishart et al. 1992) to form regular secondary structure. The error ranges were set to either two times the TALOS-derived error value or ±20°, whichever was greater. A total of 50 residues met those criteria and had their φ/ψ angles specified by the TALOS restraints. Other residues had their dihedral angles loosely restrained to the favored areas of the Ramachandran plot. Proline cis/trans geometry was determined using the difference between Cγ and Cβ chemical shifts. This method indicated that Pro 14 likely adopted a cis conformation, and this constraint was used in initial calculations. The homodimer structure of ydhR was calculated using two identical, unfolded sequences for the protein connected using a 20-residue pseudo–amino acid linker with atoms having a van der Waals radius of 0 Å. Two initial cycles of refinement were done to define the monomer folds utilizing ~800 NOEs assigned to intramonomer interactions only. The 100 structures produced by this refinement consisted of two separate, folded, monomeric domains arranged in random, relative orientation connected by the flexible linker. In the next cycle ~100 manually assigned intermonomer NOEs were added to dock the two monomers. Following this, four cycles of iterative automated NOE assignment and structure refinement were carried out to produce a final set of 20 structures.
A total of 3400 unambiguous NOEs, both manually and automatically assigned, were used in the final cycle of refinement and resulted in a well converged set of structures with an average target function value of 11.4. The 20 structures were refined using a final round of moderate temperature restrained molecular dynamics and energy minimization in an explicit shell of water (Nederveen et al. 2005). This specialized water refinement was implemented using CNS and a set of scripts available at http://www.ebi.ac.uk/msd-srv/docs/NMR/recoord/scripts.html. Structures were deposited in the RCSB Protein Data Bank under accession number 2ASY.
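The rule used to set the width of the TALOS-derived dihedral restraints (the larger of twice the TALOS error or ±20°) can be written as a small helper. This is a sketch of the stated rule only, not the script used for the calculations; the function name and example values are hypothetical, and wrapping of angles at ±180° is deliberately not handled.

```python
def dihedral_bounds(predicted_angle, talos_error, scale=2.0, min_half_width=20.0):
    """Return (lower, upper) restraint limits in degrees.

    The half-width is the larger of scale * talos_error and min_half_width,
    following the rule stated in the text. Angle wrapping at +/-180 degrees
    is not applied in this sketch.
    """
    if talos_error < 0:
        raise ValueError("talos_error must be non-negative")
    half_width = max(scale * talos_error, min_half_width)
    return predicted_angle - half_width, predicted_angle + half_width


if __name__ == "__main__":
    # Hypothetical predictions: (angle name, predicted value, TALOS error)
    examples = [("phi", -60.0, 8.0), ("psi", -120.0, 15.0)]
    for name, angle, error in examples:
        low, high = dihedral_bounds(angle, error)
        print(f"{name}: {low:.1f} to {high:.1f} degrees")
```

In the first example the ±20° floor applies (2 × 8° is smaller), while in the second the doubled TALOS error of ±30° is used. If each of the 50 restrained residues contributes one φ and one ψ restraint in both subunits, the total is 50 × 2 × 2 = 200, which agrees with the number of dihedral restraints listed in Table 1; this breakdown is an inference rather than something stated explicitly.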
Acknowledgments
We gratefully acknowledge the Ontario Research Development Challenge Fund, a CIHR Multiuser Maintenance Grant, and the Canada Research Chairs Program for support. We thank Dr. Steven Smith (Queen's University) for collecting the size exclusion chromatography data, and Drs. Logan Donaldson (York University) and Pascal Mercier (University of Alberta) for helpful discussions on the implementation and use of NMRVIEW, CYANA, and CNS.
Article published online ahead of print. Article and publication date are at http://www.proteinscience.org/cgi/doi/10.1110/ps.051809305.
References
- Adams, M.A. and Jia, Z. 2005. Structural and biochemical evidence for an enzymatic quinone redox cycle in Escherichia coli: Identification of a novel quinol monooxygenase. J. Biol. Chem. 280: 8358–8363.
- Cornilescu, G., Delaglio, F., and Bax, A. 1999. Protein backbone angle restraints from searching a database for chemical shift and sequence homology. J. Biomol. NMR 13: 289–302.
- Delaglio, F., Grzesiek, S., Vuister, G.W., Zhu, G., Pfeifer, J., and Bax, A. 1995. NMRPipe: A multidimensional spectral processing system based on UNIX pipes. J. Biomol. NMR 6: 277–293.
- Grzesiek, S. and Bax, A. 1992. An efficient experiment for sequential backbone assignment of medium-sized isotopically enriched proteins. J. Magn. Reson. 99: 201–207.
- Grzesiek, S., Anglister, J., and Bax, A. 1993. Correlation of backbone amide and aliphatic side-chain resonances in 13C/15N-enriched proteins by isotropic mixing of 13C magnetization. J. Magn. Reson. Ser. B. 101: 114–119.
- Guntert, P., Mumenthaler, C., and Wuthrich, K. 1997. Torsion angle dynamics for NMR structure calculation with the new program DYANA. J. Mol. Biol. 273: 283–298.
- Johnson, B.A. and Blevins, R.A. 1994. NMR View: A computer program for the visualization and analysis of NMR data. J. Biomol. NMR 4: 603–614.
- Kay, L.E., Xu, G., Singer, A.U., Muhandiram, D.R., and Forman-Kay, J.D. 1993. A gradient-enhanced HCCH-TOCSY experiment for recording sidechain 1H and 13C correlations in H2O samples of proteins. J. Magn. Reson. 101: 333–337.
- Laskowski, R.A., Watson, J.D., and Thornton, J.M. 2005. ProFunc: A server for predicting protein function from 3D structure. Nucleic Acids Res. 33: W89–W93.
- Nederveen, A.J., Doreleijers, J.F., Vranken, W., Miller, Z., Spronk, C.A., Nabuurs, S.B., Guntert, P., Livny, M., Markley, J.L., Nilges, M., et al. 2005. RECOORD: A recalculated coordinate database of 500+ proteins from the PDB using restraints from the BioMagResBank. Proteins 59: 662–672.
- Sciara, G., Kendrew, S.G., Miele, A.E., Marsh, N.G., Federici, L., Malatesta, F., Schimperna, G., Savino, C., and Vallone, B. 2003. The structure of ActVA-Orf6, a novel type of monooxygenase involved in actinorhodin biosynthesis. EMBO J. 22: 205–215.
- Watson, J.D., Laskowski, R.A., and Thornton, J.M. 2005. Predicting protein function from sequence and structural data. Curr. Opin. Struct. Biol. 15: 275–284.
- Wishart, D.S., Sykes, B.D., and Richards, F.M. 1992. The chemical shift index: A fast and simple method for the assignment of protein secondary structure through NMR spectroscopy. Biochemistry 31: 1647–1651.
- Wishart, D.S., Bigam, C.G., Yao, J., Abildgaard, F., Dyson, H.J., Oldfield, E., Markley, J.L., and Sykes, B.D. 1995. 1H, 13C and 15N chemical shift referencing in biomolecular NMR. J. Biomol. NMR 6: 135–140.
- Yamazaki, T., Forman-Kay, J.D., and Kay, L.E. 1993. Two-dimensional NMR experiments for correlating 13Cβ and 1Hδ/ε chemical shifts of aromatic residues in 13C-labeled proteins via scalar couplings. J. Am. Chem. Soc. 115: 11054–11055.
- Yee, A., Booth, V., Dharamsi, A., Engel, A., Edwards, A.M., and Arrowsmith, C.H. 2000. Solution structure of the RNA polymerase subunit RPB5 from Methanobacterium thermoautotrophicum. Proc. Natl. Acad. Sci. 97: 6311–6315.