Abstract
HI1506 is a 128-residue hypothetical protein of unknown function from Haemophilus influenzae. It was originally annotated as a shorter 85-residue protein, but a more detailed sequence analysis conducted in our laboratory revealed that the full-length protein has an additional 43 residues on the C terminus, corresponding with a region initially ascribed to HI1507. As part of a larger effort to understand the functions of hypothetical proteins from Gram-negative bacteria, and H. influenzae in particular, we report here the three-dimensional solution NMR structure for the corrected full-length HI1506 protein. The structure consists of two well-defined domains, an α/β 50-residue N-domain and a 3-α 32-residue C-domain, separated by an unstructured 30-residue linker. Both domains have positively charged surface patches and weak structural homology with folds that are associated with RNA binding, suggesting a possible functional role in binding distal nucleic acid sites.
Keywords: structural genomics, NMR, Haemophilus influenzae, hypothetical protein
HI1506 is a 128-residue protein of unknown function in Haemophilus influenzae. It was originally annotated as a shorter 85-residue polypeptide in The Institute for Genomic Research (TIGR) database (Fleischmann et al. 1995; http://www.tigr.org). However, the corrected DNA sequence determined in our laboratory contains a single-base insertion that causes a frameshift near the end of the TIGR sequence. This leads to a longer protein than stated in the original TIGR annotation. The extra 43 residues on the C terminus correspond with a region previously annotated as the small protein HI1507. Thus, our results indicate that HI1507 is in fact part of the corrected full-length HI1506 protein.
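The effect of a single-base insertion on the length of a reading frame can be illustrated with a minimal sketch. The sequence below is a hypothetical toy example, not the HI1506 gene, and the helper name `orf_codons` is introduced here for illustration only.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def orf_codons(dna):
    """Count codons read from position 0 up to (not including) the first in-frame stop."""
    for i in range(0, len(dna) - 2, 3):
        if dna[i:i + 3] in STOP_CODONS:
            return i // 3
    return None  # no in-frame stop codon found

# Hypothetical toy sequence (assumption for illustration), not the HI1506 gene.
annotated = "ATGGCTGAATAACGGTCCTGTAA"
# Insert one base just upstream of the original stop codon.
corrected = annotated[:8] + "C" + annotated[8:]

print(orf_codons(annotated), orf_codons(corrected))
```

In this toy case the insertion moves the original stop codon out of frame, so the reading frame grows from 3 to 7 codons before the next in-frame stop is reached.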
The 128-residue HI1506 has sequence homologs in Haemophilus ducreyi (HD0133, 42% identity over 127 residues), Shewanella oneidensis (SO676, 31% identity over 118 residues), Shewanella baltica (Rho termination factor-like, 26% identity over 123 residues), Escherichia coli B171 (EcolB_01004138, 26% identity over 119 residues), Escherichia coli 53638 (Ecol5_01002824, 25% identity over 108 residues), and Enterobacteria phage Mu (Mup35, 25% identity over 108 residues) (Fig. 1). All of these proteins are of unknown function. As part of an effort to use structure-based approaches to aid in functional annotation of conserved hypotheticals, the corrected sequence was cloned and the protein was expressed and purified for structural studies. We describe here the solution structure of full-length HI1506 using multidimensional NMR spectroscopy and discuss the possible functional implications of the structure.
Figure 1.
Sequence alignment of HI1506 and homologs using CLUSTALW (Thompson et al. 1994). Invariant residues are shown in dark columns and conserved residues are shown in boxes. The figure was displayed using ESPript (http://espript.ibcp.fr/).
Results and Discussion
Chemical-shift and NOE assignments
Proton, carbon, and nitrogen chemical-shift assignments were completed for >95% of the backbone atoms and ∼90% of the side-chain resonances using standard triple-resonance experiments (Sari et al. 2005) (BMRB-6780). Main-chain amide proton assignments were not made for residues S2, H3, M4, S111, and K128. Other unassigned signals included Hγ in L45. Side-chain amide protons were identified for all of the Asn residues and six of 10 Gln residues. Assignments were also obtained for guanidino proton resonances of R22 and R104.
NOEs were assigned using NOEID, an in-house NOE assignment program, and manual methods. A total of 1539 NOESY assignments were made from a list of 2261 NOESY picked peaks. Most of these NOE assignments (1147) were used in structure calculations. Those that were not used were from peaks with significant overlap and weak peaks without reciprocal assignments. The distribution of NOE types is shown in Table 1. Approximately 10 inter-residue NOE restraints were obtained on average for each amino acid in structured regions of the polypeptide chain, while loop regions had fewer inter-residue NOEs.
Table 1.
Experimental restraints and structural statistics for the ensemble of 20 structures of HI1506
Structure description
HI1506 is a monomer in solution based on size-exclusion chromatography and observed NMR linewidths. The solution structure of HI1506 consists of two structured domains connected by an unstructured 30 amino acid loop (Fig. 2A). Coordinates have been deposited in the PDB (accession code 2out). The N-terminal domain is from residues 7–56 and comprises a three-stranded antiparallel β-sheet packed against an α-helix. The β1-strand (residues 7–14) is central in the β-sheet and is connected to the β2-strand (residues 31–36) by a 15-residue loop (residues 15–30). This is followed by a short two-residue loop (residues 37–38), the α1-helix (residues 39–46), another short loop (residues 47–50), and a β3-strand (residues 51–56). The long axis of the α1-helix is essentially parallel to the β-strands. The C-terminal domain is from residues 89–120 and consists of three α-helices. The α2-helix (residues 89–92) contains a single turn, while α3 (residues 94–104) and α4 (residues 114–120) are packed orthogonally with respect to each other and linked by an eight-residue loop (residues 105–113). Residues at the N terminus (residues 1–6) and at the C terminus (residues 121–128) are unstructured, as evidenced by a low number of inter-residue NOEs.
Figure 2.
(A) Ribbon representation of the HI1506 solution structure with labeled secondary structure elements in the N- and C-domains. (B) Superposition of the 20 best structures showing alignment of the N-domain. (C) Superposition of the C-domain ensemble. (D) Surface potential representation of the N-domain. (Left) The same view as shown in B. (Right) A 180° rotation about the y-axis. Positively charged residues are displayed in blue and negatively charged residues are in red. (E) Surface potential representation of the C-domain. (Left) The same orientation as in C, while at right is a 180° rotation about the y-axis.
The structure statistics are summarized in Table 1. Figure 2, B and C, shows overlays of the 20 best structures for the N- and C-domains, respectively, and Supplemental Figure 1A shows the average backbone RMSD per residue. Supplemental Figure 1B illustrates the steady-state {1H}-15N NOEs for nonproline residues with the exception of G1-M4, G24, L45, E70, Q77, S111, and K128. The individual domains are well ordered within the ensemble of 20 best structures, but have large RMSDs when one domain is aligned with respect to the other. This is because there are no discernible NOEs between the two domains. Consistent with these observations, the dynamics results indicate that the 30-residue region linking the N- and C-domains is highly flexible with no inter-residue NOEs detected. Within these domains, the secondary structure elements are well defined, while loop regions have fewer inter-residue NOE restraints. In particular, the loop between β1 and β2 has an average of seven inter-residue restraints that are generally between residues within the loop, with few restraints to residues outside the loop. As a result, the average backbone RMSD within the loop is 1.79 ± 0.78 Å. Slowed exchange is observed for amide protons in most secondary structure regions except α2 (Supplemental Fig. 1A). Some loop residues (Y20, D47, and A48) also exhibit slow amide exchange, and these appear to have suitable H-bond acceptors based on the NOE-derived structure.
Domain stability
In preliminary experiments determining optimal conditions for NMR studies, it was noted that a large number of signals could not be detected at pH 6. After NMR assignment of most HI1506 backbone amides at pH 7, it was apparent that all of the peaks absent from the spectrum at pH 6 corresponded to the N-domain, while the C-domain resonances were still clearly visible. The stability of these domains was further investigated using differential scanning calorimetry (DSC) of the full-length protein. The results of a typical DSC scan are shown in Supplemental Figure 2 at pH 7.5 and pH 6.5. At pH 7.5, the structures of the two domains are intact, while at pH 6.5 the N-domain loses its structural integrity. This is evident from the DSC results, where two separate transitions were observed at pH 7.5 and only one of the transitions was observed at pH 6.5. At pH 7.5, the lower temperature transition is in the range of 40°C–50°C, while the higher temperature transition is from 64°C to 68°C, and for both transitions the van't Hoff enthalpies are ∼200 kJ mol−1. The range of temperatures arises from uncertainty in analyzing each of the two transitions separately, since they overlap substantially in the scan (Supplemental Fig. 2). The transition parameters appear to be independent of concentration from 0.036 to 0.173 mM. The higher temperature transition properties were unaffected by change in pH and correspond to the C-domain, while the lower temperature transition corresponds to the N-domain. Further, the loss of structure at the lower pH appears to correlate with protonation of histidine residues. The only two His residues in HI1506, H3 and H28, are in the N-domain. Both residues are solvent exposed, with H3 in the unstructured N-terminal tail and H28 in the β1–β2 loop. It is not clear whether this difference in stabilities between the N- and C-domains has any functional relevance.
Consistent with the NMR structure, the DSC results suggest that the two domains do not interact with each other, as indicated by the lack of any significant shift of the high-temperature transition at pH 6.5.
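To illustrate how two independent transitions with van't Hoff enthalpies near 200 kJ mol−1 can overlap in a single scan, the following sketch evaluates a simple two-state unfolding model. It is not the analysis used for the DSC data: the midpoints of 45°C and 66°C are assumed here as the centers of the reported ranges, any heat-capacity change on unfolding is ignored, and the function name is hypothetical.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1

def fraction_unfolded(temp_c, tm_c, dh_vh_kj=200.0):
    """Two-state fraction unfolded at temp_c, ignoring any heat-capacity change."""
    t = temp_c + 273.15
    tm = tm_c + 273.15
    # van't Hoff relation for the unfolding equilibrium constant
    k_unfold = math.exp(-(dh_vh_kj * 1000.0 / R) * (1.0 / t - 1.0 / tm))
    return k_unfold / (1.0 + k_unfold)

# Assumed midpoints (centers of the reported ranges), for illustration only.
TM_N_DOMAIN_C = 45.0
TM_C_DOMAIN_C = 66.0

for temp in (35.0, 45.0, 55.0, 66.0, 75.0):
    f_n = fraction_unfolded(temp, TM_N_DOMAIN_C)
    f_c = fraction_unfolded(temp, TM_C_DOMAIN_C)
    print(f"{temp:5.1f} C  N-domain {f_n:.2f}  C-domain {f_c:.2f}")
```

Under these assumptions the lower-temperature transition is not yet complete when the higher-temperature transition begins, which is in line with the overlap noted for the scan at pH 7.5.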
Sequence conservation and charge distribution
Figure 1 shows that there are 11 invariant residues among the HI1506 family of sequences (Y20, R21, R22, A23, G24, G30, N32, Q41, A47, D48, and L51) and that these are all in the N-domain. Residues 20–24 are located in the β1–β2 loop and contribute to a positively charged surface in this region that also includes the less-conserved amino acids, R15, K17, and R50 (Fig. 2D). Strictly conserved surface residues G30 and N32 in the β2-strand and A47 and D48 in the α1–β3 loop are located at opposite ends of this positively charged surface. Invariant residue L51 is buried in the hydrophobic core near these other conserved residues and likely plays a structural role in their correct orientation. The negatively charged surface regions apparent in Figure 2D are mostly due to amino acids that are not conserved.
There are no strictly conserved residues in the C-domain or in the large linker region between the two domains. Like the N-domain, the C-domain also contains a positively charged surface. This is largely due to residues K98 from the α3-helix, K109 from the α3–α4 loop, and K114 from the N terminus of the α4-helix (Fig. 2E). Two of three of these residues are also present in the sequence homologs HD0133 and the Rho termination factor-like sequence (Fig. 1). In addition, the N terminus of the α2-helix and the R104 side chain from α3 appear to form favorable stabilizing electrostatic interactions with the C terminus of α4 (Fig. 2E), perhaps partially accounting for the relatively high stability of this small 32-residue domain.
Structural homology and functional implications
Dali (Holm and Sander 1995) searches were performed for the N- and C-domains of HI1506 as well as the full-length protein to identify structural homologs. For the N-domain, the structures with highest homology were a conserved hypothetical (2f06, Z 3.4), a ClpS adaptor protein (1lzw, Z 3.2) (Zeth et al. 2002), a cell division protein (1uta, Z 3.1) (Yang et al. 2004), a Leu-tRNA synthetase fragment (1wkb, Z 3.0) (Fukunaga and Yokoyama 2005), and a polypyrimidine tract-binding protein (1qm9, Z 2.9) (Conte et al. 2000). The low Z-scores indicated weak structural homology, but a common feature in most of these homologs is the presence of an RNP motif, which is often associated with an RNA-binding function. The difference between the N-domain of HI1506 and RNP motifs is that the N-domain has a 15-residue loop connecting the β1- and β2-strands, whereas the RNP motif has an additional helix followed by a β-strand in place of this loop, forming a four-stranded antiparallel β-sheet with strand order 2314. The surface-charge distribution of the nearest structural homologs does not match that of the N-domain.
For the C-domain, the most homologous structures were a Lys-tRNA synthetase fragment (1lyl, Z 2.9) (Onesti et al. 1995), a LEM domain fragment (1h9e, Z 2.7) (Laguri et al. 2001), and a transcription-termination rho fragment (1a62, Z 2.5) (Allison et al. 1998). As with the N-domain homologs, the low Z-scores indicate only weak similarity. A common feature in these structural homologs is that the two longer helices corresponding to α3 and α4 are aligned in a parallel fashion. This contrasts with the orientation seen in the C-domain, where α3 and α4 are essentially orthogonal. It is interesting to note that two of these homologs, 1lyl and 1a62, are linked with RNA-binding functions.
In conclusion, the structure of HI1506 reveals a novel two-domain conformation, in which each domain contains a significant contiguous area of positively charged surface. For both the N- and C-domains, some of the more closely related structures are associated with an RNA-binding function. In combination, the structural homology and charge distribution results suggest that the two domains of HI1506 may be involved in interactions with nucleic acid sites, perhaps separated by significant distances. The three-dimensional structure provides new insights into HI1506 and its sequence homologs, but further experiments will be needed to elucidate the precise biochemical function.
Materials and Methods
Sample preparation and NMR spectroscopy
Details of the sample preparation are available in the Supplemental Data section of Sari et al. (2005). NMR samples of 13C/15N-labeled HI1506 were prepared at concentrations of ∼1.0 mM in NMR buffer (50 mM potassium phosphate at pH 7.0, 100 mM NaCl, 1 mM DTT) containing 90% H2O/10% D2O. The following experiments were acquired: two-dimensional 15N-HSQC, 13C-HSQC, CBHD, and CBHE (Yamazaki et al. 1993), and three-dimensional HNCACB (Wittekind and Mueller 1993), HNCO (Grzesiek and Bax 1992b), CBCA(CO)NH, HBHA(CO)NH (Grzesiek and Bax 1992a), H(CCO)NH-TOCSY, (H)C(CO)NH-TOCSY (Montelione et al. 1992; Grzesiek et al. 1993), 15N-edited NOESY (τm 120 ms), and 13C-edited NOESY (τm 100 ms). Aromatic and aliphatic 13C-edited NOESY spectra were also recorded in D2O. The spectra were collected at 298 K on a Bruker AVANCE-600 fitted with a z-gradient triple-resonance cryoprobe. A 15N-HSQC spectrum was also acquired at 274 K 10 min after the addition of D2O to lyophilized HI1506. Steady-state {1H}-15N NOEs were measured using 15N-HSQC spectra with a gradient-selected, sensitivity-enhanced pulse sequence (Farrow et al. 1994). The NOE experiment was recorded with a relaxation delay of 2 sec followed by 3 sec of presaturation. A reference spectrum was recorded with presaturation applied 4 MHz off-resonance and a 5-sec relaxation delay. NOEs were calculated from the ratios of peak heights with proton saturation to those recorded without saturation. The standard deviation of the NOE value was estimated on the basis of measured background noise levels. All spectra were processed with NMRPipe (Delaglio et al. 1995) and analyzed with Sparky (Goddard and Kneller, UCSF).
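The NOE ratio and its uncertainty can be sketched as follows. The text states only that the standard deviation was estimated from background noise levels; the first-order error propagation used below is a common choice and is an assumption here, as are the function name and the example peak heights.

```python
import math

def steady_state_noe(i_sat, i_ref, noise_sat, noise_ref):
    """Return (NOE, sigma) from peak heights with and without proton saturation."""
    noe = i_sat / i_ref
    # Assumed first-order error propagation for a ratio, treating the baseline
    # noise of each spectrum as the uncertainty of the corresponding peak height.
    sigma = abs(noe) * math.sqrt((noise_sat / i_sat) ** 2 + (noise_ref / i_ref) ** 2)
    return noe, sigma

# Hypothetical peak heights and noise levels (arbitrary units), not data from this study.
noe, sigma = steady_state_noe(i_sat=8.0e5, i_ref=1.0e6, noise_sat=2.0e4, noise_ref=2.0e4)
print(f"NOE = {noe:.2f} +/- {sigma:.2f}")
```

The sketch assumes a nonzero saturated peak height; peaks near zero intensity would need separate handling.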
Structure calculations
Structures were calculated using standard simulated annealing and torsion-angle dynamics protocols in CNS version 1.1 (Brunger et al. 1998) with distance and dihedral restraints described in Table 1. An extended polypeptide chain was used as an initial structure in the calculations. Nonbonded contacts were represented by a quartic van der Waals repulsion term. Final values for force constants were as follows: 1000 kcal mol−1 Å−2 for bond lengths, 400 kcal mol−1 rad−2 for angles and improper torsions, 30 kcal mol−1 Å−2 for experimental distance restraints, 200 kcal mol−1 rad−2 for dihedral angle restraints, and 4.0 kcal mol−1 Å−4 for the van der Waals repulsion term. Distance restraints were derived from the intensities of NOE peaks in the 13C- and 15N-edited NOE spectra. Depending on peak intensities, distance restraints were assigned as: strong (1.8–2.7 Å), medium-strong (1.8–3.1 Å), medium (1.8–3.5 Å), medium-weak (2.3–4.2 Å), weak (2.8–5.0 Å), and very weak (2.8–6.0 Å). The backbone dihedral restraints were obtained on the basis of chemical-shift values using TALOS (Cornilescu et al. 1999). The dihedral restraints were: −60 ± 20° for α-helix and −120 ± 20° for β-strand residues. Hydrogen-bond restraints were included in the final stages of structure refinement after the tertiary structure of HI1506 was well defined by the experimental NOE restraints. Hydrogen-bond restraints were 1.5–2.5 Å for rHN-O and 2.3–3.2 Å for rN-O. The 20 best structures were chosen based on low total energy, no NOE distance violations greater than 0.5 Å, no dihedral angle violations greater than 5°, and standard indicators of structure quality as shown in Table 1. PROCHECK was used to evaluate the quality of structures (Laskowski et al. 1996), and structures were displayed and analyzed using MOLMOL (Koradi et al. 1996) and PyMOL (DeLano Scientific).
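The distance classes and the 0.5 Å acceptance criterion described above can be expressed as a short sketch. This is not the CNS protocol itself; it only encodes the bounds listed in the text and checks hypothetical model distances against them. The function names and example values are assumptions.

```python
# Distance bounds (in Å) for each NOE intensity class, as listed in the text.
NOE_BOUNDS = {
    "strong": (1.8, 2.7),
    "medium-strong": (1.8, 3.1),
    "medium": (1.8, 3.5),
    "medium-weak": (2.3, 4.2),
    "weak": (2.8, 5.0),
    "very weak": (2.8, 6.0),
}

def restraint_violation(noe_class, distance):
    """Return how far (Å) a model distance falls outside its allowed range."""
    lower, upper = NOE_BOUNDS[noe_class]
    if distance < lower:
        return lower - distance
    if distance > upper:
        return distance - upper
    return 0.0

def passes_noe_filter(restraints, max_violation=0.5):
    """True if no restraint is violated by more than max_violation Å."""
    return all(restraint_violation(c, d) <= max_violation for c, d in restraints)

# Hypothetical (class, model distance) pairs for illustration only.
example = [("strong", 2.5), ("medium", 3.8), ("weak", 5.2)]
print([round(restraint_violation(c, d), 2) for c, d in example])
print(passes_noe_filter(example))
```

A structure would be retained under this criterion only if every restraint passes; the dihedral criterion (no violations above 5°) could be checked in the same way.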
Differential scanning calorimetry
DSC measurements were performed on 0.1–1.0 mg/mL solutions of HI1506 using a VP-DSC Microcalorimeter from Microcal, Inc. In a series of DSC scans, the solution and reference vessels (0.511 mL each) were first loaded with buffer, scanned from 15°C to 95°C at 60°C h−1, cooled to 15°C, and rescanned. The solution vessel was then emptied and loaded with the protein solution by means of a syringe, and the protein solution scans were repeated several times to determine whether the transition was reversible. After completion of a set of scans, a second buffer versus buffer scan was taken as the baseline scan and subtracted from each of the protein solution versus buffer scans. The resulting net solution versus buffer scans were converted to heat capacity versus temperature scans for analysis.
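The baseline subtraction and conversion to heat capacity can be sketched as below. The cell volume and scan rate are taken from the text; treating the raw signal as differential power in watts, and the function name and example values, are assumptions made for illustration rather than a description of the instrument software.

```python
CELL_VOLUME_L = 0.511e-3            # cell volume from the text (0.511 mL)
SCAN_RATE_K_PER_S = 60.0 / 3600.0   # 60 °C per hour from the text

def excess_molar_heat_capacity(protein_power_w, buffer_power_w, conc_molar):
    """Convert differential power (W) to molar heat capacity (J mol^-1 K^-1)."""
    if len(protein_power_w) != len(buffer_power_w):
        raise ValueError("protein and buffer scans must have the same length")
    moles = conc_molar * CELL_VOLUME_L
    cp = []
    for p_protein, p_buffer in zip(protein_power_w, buffer_power_w):
        net_power = p_protein - p_buffer  # baseline (buffer vs. buffer) subtraction
        # Power divided by scan rate gives J/K; dividing by moles gives J/(mol K).
        cp.append(net_power / SCAN_RATE_K_PER_S / moles)
    return cp

# Hypothetical power readings (W) at three temperatures and a 0.1 mM sample.
protein_scan = [2.0e-6, 3.0e-6, 2.2e-6]
buffer_scan = [2.0e-6, 2.0e-6, 2.0e-6]
print(excess_molar_heat_capacity(protein_scan, buffer_scan, conc_molar=1.0e-4))
```

Both scans are assumed to be sampled at the same temperatures, so that the subtraction can be done point by point.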
Electronic supplemental material
Supplementary material includes a plot of average backbone RMSDs per residue, steady-state 1H-15N heteronuclear NOE data, and calorimetric data for HI1506.
Acknowledgments
This work was supported by grants from the NIH (GM57890 and 1S10RR15744) and the W. M. Keck Foundation.
Footnotes
Supplemental material: see www.proteinscience.org
Reprint requests to: John Orban, Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute, 9600 Gudelsky Dr., Rockville, MD 20850, USA; e-mail: orban@umbi.umd.edu; fax: (240) 314-6255.
Article published online ahead of print. Article and publication date are at http://www.proteinscience.org/cgi/doi/10.1110/ps.072820907.
References
- Allison, T.J., Wood, T.C., Briercheck, D.M., Rastinejad, F., Richardson, J.P., and Rule, G.S. 1998. Crystal structure of the RNA-binding domain from transcription termination factor rho. Nat. Struct. Biol. 5: 352–356.
- Brunger, A.T., Adams, P.D., Clore, G.M., DeLano, W.L., Gros, P., Grosse-Kunstleve, R.W., Jiang, J.S., Kuszewski, J., Nilges, M., Pannu, N.S., et al. 1998. Crystallography & NMR system: A new software suite for macromolecular structure determination. Acta Crystallogr. D Biol. Crystallogr. 54: 905–921.
- Conte, M.R., Grune, T., Ghuman, J., Kelly, G., Ladas, A., Matthews, S., and Curry, S. 2000. Structure of tandem RNA recognition motifs from polypyrimidine tract binding protein reveals novel features of the RRM fold. EMBO J. 19: 3132–3141.
- Cornilescu, G., Delaglio, F., and Bax, A. 1999. Protein backbone angle restraints from searching a database for chemical shift and sequence homology. J. Biomol. NMR 13: 289–302.
- Delaglio, F., Grzesiek, S., Vuister, G.W., Zhu, G., Pfeifer, J., and Bax, A. 1995. NMRPipe: A multidimensional spectral processing system based on UNIX pipes. J. Biomol. NMR 6: 277–293.
- Farrow, N.A., Muhandiram, R., Singer, A.U., Pascal, S.M., Kay, C.M., Gish, G., Shoelson, S.E., Pawson, T., Forman-Kay, J.D., and Kay, L.E. 1994. Backbone dynamics of a free and phosphopeptide-complexed Src homology 2 domain studied by 15N NMR relaxation. Biochemistry 33: 5984–6003.
- Fleischmann, R.D., Adams, M.D., White, O., Clayton, R.A., Kirkness, E.F., Kerlavage, A.R., Bult, C.J., Tomb, J.-F., Dougherty, B.A., Merrick, J.M., et al. 1995. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269: 496–512.
- Fukunaga, R. and Yokoyama, S. 2005. Crystal structure of leucyl-tRNA synthetase from the archaeon Pyrococcus horikoshii reveals a novel editing domain orientation. J. Mol. Biol. 346: 57–71.
- Grzesiek, S. and Bax, A. 1992a. Correlating backbone amide and sidechain resonances in larger proteins by multiple relayed triple resonance NMR. J. Am. Chem. Soc. 114: 6291–6293.
- Grzesiek, S. and Bax, A. 1992b. Improved 3D triple-resonance NMR techniques applied to a 31 kDa protein. J. Magn. Reson. 96: 432–440.
- Grzesiek, S., Anglister, J., and Bax, A. 1993. Correlation of backbone amide and aliphatic side-chain resonances in C-13/N-15-enriched proteins by isotropic mixing of C-13 magnetization. J. Magn. Reson. B. 101: 114–119.
- Holm, L. and Sander, C. 1995. Dali: A network tool for protein structure comparison. Trends Biochem. Sci. 20: 478–480.
- Koradi, R., Billeter, M., and Wuthrich, K. 1996. MOLMOL: A program for display and analysis of macromolecular structures. J. Mol. Graph. 14: 51–55.
- Laguri, C., Gilquin, B., Wolff, N., Romi-Lebrun, R., Courchay, K., Callebaut, I., Worman, H.J., and Zinn-Justin, S. 2001. Structural characterization of the LEM motif common to three human inner nuclear membrane proteins. Structure 9: 503–511.
- Laskowski, R.A., Rullmann, J.A., MacArthur, M.W., Kaptein, R., and Thornton, J.M. 1996. AQUA and PROCHECK-NMR: Programs for checking the quality of protein structures solved by NMR. J. Biomol. NMR 8: 477–486.
- Montelione, G.T., Lyons, B.A., Emerson, S.D., and Tashiro, M. 1992. An efficient triple resonance experiment using carbon-13 isotropic mixing for determining sequence-specific resonance assignments of isotopically-enriched proteins. J. Am. Chem. Soc. 114: 10974–10975.
- Onesti, S., Miller, A.D., and Brick, P. 1995. The crystal structure of the lysyl-tRNA synthetase (LysU) from Escherichia coli. Structure 3: 163–176.
- Sari, N., Yeh, D.C., Doseeva, V., Surabian, K., Herzberg, O., and Orban, J. 2005. NMR assignment of HI1506, a novel two-domain protein from Haemophilus influenzae. J. Biomol. NMR 33: 281.
- Thompson, J.D., Higgins, D.G., and Gibson, T.J. 1994. CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22: 4673–4680.
- Wittekind, M. and Mueller, L. 1993. HNCACB, a high sensitivity 3D NMR experiment to correlate amide-proton and nitrogen resonances with the α and β carbon resonances in proteins. J. Magn. Reson. B. 101: 201–205.
- Yamazaki, T., Forman-Kay, J.D., and Kay, L.E. 1993. Two-dimensional NMR experiments for correlating 13Cβ and 1Hδ/ɛ chemical shifts of aromatic residues in 13C-labeled proteins via scalar couplings. J. Am. Chem. Soc. 115: 11054–11055.
- Yang, J.C., Van Den Ent, F., Neuhaus, D., Brevier, J., and Lowe, J. 2004. Solution structure and domain architecture of the divisome protein FtsN. Mol. Microbiol. 52: 651–660.
- Zeth, K., Ravelli, R.B., Paal, K., Cusack, S., Bukau, B., and Dougan, D.A. 2002. Structural analysis of the adaptor protein ClpS in complex with the N-terminal domain of ClpA. Nat. Struct. Biol. 9: 906–911.