Protein Science: A Publication of the Protein Society. 2017 Dec 5;27(2):568–572. doi: 10.1002/pro.3343

Mycobacterium tuberculosis Rv3651 is a triple sensor‐domain protein

Jan Abendroth 1,2, Andrew Frando 3,4, Isabelle Q Phan 2,3, Bart L Staker 2,3, Peter J Myler 2,3,4,5, Thomas E Edwards 1,2, Christoph Grundner 3,4
PMCID: PMC5775179  PMID: 29119630

Abstract

The genome of the human pathogen Mycobacterium tuberculosis (Mtb) encodes ∼4,400 proteins, but one third of them have unknown functions. We solved the crystal structure of Rv3651, a hypothetical protein with no discernible similarity to proteins of known function. Rv3651 has a three‐domain architecture that combines one cGMP‐specific phosphodiesterases, adenylyl cyclases and FhlA (GAF) domain and two Per‐ARNT‐Sim (PAS) domains. GAF and PAS domains are sensor domains that are typically linked to signaling effector molecules. Unlike these sensor‐effector proteins, Rv3651 is an unusual sensor domain‐only protein with a highly divergent sequence. The structure suggests that Rv3651 integrates multiple different signals and serves as a scaffold to facilitate signal transfer.

Keywords: proteins of unknown function, PAS domain, GAF domain, Mycobacterium tuberculosis

Introduction

Proteins of unknown function make up large parts of most genomes, and even advanced sequence analysis often fails to predict their function.1 Protein structure is generally a better indicator of function than sequence, because the evolutionary constraints on structure are more stringent.2 A case in point is the Per‐ARNT‐Sim (PAS) domain, a ubiquitous protein domain involved in many signaling pathways.3 The PAS domain is defined by a highly conserved core structure, yet the sequence similarity between related domains is typically no more than 20%.4, 5 The PAS and related GAF domains are highly versatile sensor domains that associate with a range of signaling effectors, in particular with the histidine kinases of bacterial two‐component systems.6 PAS domains bind a wide variety of ligands, from diatomic gases to proteins, and ligand binding activates their associated signaling domains. Typically, one or several PAS domains are fused to the N‐terminus of one or several effector domains, creating remarkable combinatorial diversity among PAS domain‐containing proteins.

Over one third of all Mtb gene products are currently hypothetical proteins of unknown function.7 This incomplete annotation is a major hurdle for tuberculosis research, because hypothetical proteins are likely enriched in proteins important for the specialized biology and pathogenesis of Mtb. Rv3651 is one such conserved hypothetical protein: it has no informative sequence similarity to proteins of known function and is predicted to be essential for Mtb pathogenesis.8 The crystal structure of Rv3651 reveals an unusual sensor‐only protein with one GAF and two PAS domains. This represents a new architecture among PAS‐ and GAF‐domain‐containing proteins and is indicative of a scaffold function that coordinates the binding of three different ligands.

Results and Discussion

A search of the Pfam database did not detect any similarity of Rv3651 to known domains. A BLAST search identified many orthologs in slow‐growing mycobacteria with >77% sequence identity, but only five homologs with >30% sequence identity outside the genus Mycobacterium, all of them within the Actinobacteria. All orthologs are likewise hypothetical proteins or proteins of unknown function. Thus, sequence comparison provided no clue to the structure or function of Rv3651.
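To make the identity thresholds above concrete, the following is a minimal Python sketch of how percent identity can be computed from a pairwise alignment and used to sort hits into in‐genus orthologs and out‐of‐genus homologs. It is not the BLAST implementation: it assumes pre‐aligned, equal‐length strings with '-' marking gaps and counts gap columns in the denominator, which is only one of several conventions in use. The `Hit` record and the function names are hypothetical; only the >77% and >30% cutoffs come from the text.

```python
from typing import List, NamedTuple, Tuple


class Hit(NamedTuple):
    """A hypothetical BLAST hit record: name, genus of the source organism, % identity."""
    name: str
    genus: str
    identity: float


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over all alignment columns; '-' marks a gap.

    Gap columns count toward the denominator but never as matches.
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("expected two non-empty aligned sequences of equal length")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


def split_hits(hits: List[Hit], genus: str = "Mycobacterium",
               ortholog_cutoff: float = 77.0,
               homolog_cutoff: float = 30.0) -> Tuple[List[Hit], List[Hit]]:
    """Return (in-genus hits above ortholog_cutoff, out-of-genus hits above homolog_cutoff)."""
    in_genus = [h for h in hits if h.genus == genus and h.identity > ortholog_cutoff]
    out_genus = [h for h in hits if h.genus != genus and h.identity > homolog_cutoff]
    return in_genus, out_genus


if __name__ == "__main__":
    # Toy alignment: 4 identical columns out of 6
    print(round(percent_identity("MKT-LV", "MKTALI"), 1))
```

Because BLAST and other tools differ in how they treat gaps and alignment length, identity values from different programs are best compared only when the same convention is used.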

We next solved the crystal structure of Rv3651 at 1.95 Å resolution and refined it to an R‐factor of 0.182 and an R‐free of 0.218 (Table 1). Rv3651 crystallized as a dimer. The model comprises residues 1–339 of protomer one and residues 1–338 of protomer two of the 345‐residue protein. The overall fold is an extended arrangement of three β‐sheets, each interspersed with α‐helices, with two extended linkers connecting the sheets, suggesting a modular organization of three distinct domains [Fig. 1(A–C)]. Each domain is built around a central antiparallel β‐sheet, with six strands in the N‐terminal domain and five strands in the middle and C‐terminal domains. Analysis of the secondary structure showed a similar topology for domains 2 and 3, whereas domain 1 has an additional strand and α‐helices in a different register. The strand order of the domain 1 β‐sheet is 3‐2‐1‐6‐5‐4, and that of domains 2 and 3 is 2‐1‐5‐4‐3 [Fig. 1(C)]. Domains 1 and 2 are connected through an extended 12‐residue loop that runs along the back of the β‐sheet to connect to strand 1 of domain 2. Domains 2 and 3 are connected through a 5‐residue loop. The linkers form a spine along the back of the β‐sheets, and both domain interfaces are densely packed, producing a rigid molecule with little flexibility between the individual domains.
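As a quick consistency check on a dimer in the asymmetric unit, the Matthews coefficient V_M = V_cell / (Z · n · M) relates the unit‐cell volume to the protein mass it contains. The Python sketch below is an illustration under stated assumptions, not part of the original analysis: it takes the cell dimensions and space group from Table 1 (P21212, four asymmetric units per cell), assumes both protomers occupy one asymmetric unit, and approximates the monomer mass as 345 residues × 110 Da, ignoring the His6 tag and selenomethionine substitution.

```python
def orthorhombic_cell_volume(a: float, b: float, c: float) -> float:
    """Unit-cell volume in Å^3; valid only when alpha = beta = gamma = 90 degrees."""
    return a * b * c


def matthews_coefficient(v_cell: float, n_asu: int, n_mol: int, mw_da: float) -> float:
    """V_M in Å^3/Da: cell volume divided by the total protein mass in the cell."""
    return v_cell / (n_asu * n_mol * mw_da)


def solvent_fraction(vm: float) -> float:
    """Approximate solvent fraction, 1 - 1.23/V_M (assumes a protein partial
    specific volume of about 0.74 cm^3/g)."""
    return 1.0 - 1.23 / vm


# Cell dimensions from Table 1; P21212 has 4 asymmetric units per cell.
v_cell = orthorhombic_cell_volume(131.62, 73.02, 82.47)
# Assumption: ~110 Da per residue, 345 residues, 2 protomers per asymmetric unit.
approx_mw = 345 * 110.0
vm = matthews_coefficient(v_cell, n_asu=4, n_mol=2, mw_da=approx_mw)
solvent = solvent_fraction(vm)
print(f"V_M ~ {vm:.2f} Å^3/Da, solvent ~ {solvent:.0%}")
```

With these assumptions the sketch gives V_M ≈ 2.6 Å³/Da and roughly 53% solvent, within the range commonly observed for protein crystals and thus compatible with two protomers per asymmetric unit.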

Table 1.

Crystallographic Data and Refinement Statistics

Data collection
Data set: SeMet
Wavelength (Å): 0.97872
Space group: P21212
Resolution range (Å): 50–1.95
Unit‐cell dimensions a, b, c (Å): 131.62, 73.02, 82.47
Unit‐cell angles α, β, γ (°): 90, 90, 90
Mean I/σ(I): 16.65 (3.36)
R merge: 0.065 (0.487)
Completeness: 98.1% (100%)
Multiplicity: 6.3 (6.2)
No. unique reflections: 57,563

Refinement statistics
R work: 0.182
R free: 0.218
RMSD bond lengths (Å): 0.007
RMSD angles (°): 1.05
Protein atoms: 4860
Solvent atoms: 443
Ramachandran favoured (per MolProbity): 98.4%
Ramachandran disallowed (per MolProbity): 0.16%
MolProbity score: 1.15
PDB code: 4Q6U

R work and R free are both defined as $R = \sum_h \big|\,|F_{\mathrm{obs}}| - |F_{\mathrm{calc}}|\,\big| \big/ \sum_h |F_{\mathrm{obs}}|$; R work is summed over the reflections used in refinement, and R free over a test set of 5% of the reflections omitted from refinement. Values in parentheses are for the highest‐resolution shell. Validation parameters were generated with MolProbity.21 The MolProbity score is a measure of overall structure quality, expressed as the resolution at which structures of comparable quality would be expected.
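The R‐factor definition in the table footnote can be illustrated with a short Python sketch. It assumes |F_obs| and |F_calc| are already on a common scale (refinement programs apply scaling and bulk‐solvent corrections before this step) and draws a random 5% test set with a fixed seed; in real refinement the free set is chosen once, before refinement, and never used to fit the model. The function names are illustrative, not a library API.

```python
import random
from typing import Sequence, Set, Tuple


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """R = sum_h | |F_obs| - |F_calc| | / sum_h |F_obs| over the given reflections."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need equal-length, non-empty amplitude lists")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


def choose_free_set(n_reflections: int, fraction: float = 0.05, seed: int = 0) -> Set[int]:
    """Randomly flag a fraction of reflection indices as the test (free) set."""
    rng = random.Random(seed)
    n_free = max(1, round(fraction * n_reflections))
    return set(rng.sample(range(n_reflections), n_free))


def r_work_and_r_free(f_obs: Sequence[float], f_calc: Sequence[float],
                      free: Set[int]) -> Tuple[float, float]:
    """R_work over the working reflections, R_free over the flagged test reflections."""
    work_idx = [i for i in range(len(f_obs)) if i not in free]
    free_idx = sorted(free)
    r_work = r_factor([f_obs[i] for i in work_idx], [f_calc[i] for i in work_idx])
    r_free = r_factor([f_obs[i] for i in free_idx], [f_calc[i] for i in free_idx])
    return r_work, r_free
```

For the 57,563 unique reflections in Table 1, a 5% test set corresponds to about 2,878 reflections.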

Figure 1.

Overall Rv3651 crystal structure. (A) Rv3651 is a rigid, extended, modular protein with three distinct domains. Each domain is centered around a 5‐ or 6‐stranded β‐sheet at the core. (B) A space‐filling model shows the tight packing of the three domains against each other. (C) A topology diagram of Rv3651 generated with PDBsum9 shows the β‐sheet register characteristic of GAF (domain 1) and PAS domains (domains 2 and 3). (D) 2Fo–Fc electron density map contoured at 1σ.

The closest structural homologs of the three domains, identified with PDBeFold and ranked by Cα root mean square deviation (rmsd), were all PAS domains: the sensory box of a Vibrio cholerae histidine kinase (PDB code 3MXQ; Cα rmsd 1.73 Å), a sensory box histidine kinase from Burkholderia (PDB code 3MR0; Cα rmsd 2.3 Å), and the circadian clock protein BMAL2 (PDB code 2KDK; Cα rmsd 2.43 Å) for domains 1, 2, and 3, respectively. Closer inspection showed that domains 2 and 3 are indeed PAS domains with the signature β‐sheet topology, consistent with their closest structural matches. Because density is missing for part of domain 3, only two of the three helices that typically connect strands 2 and 3 are visible, but secondary structure prediction (JPred4)10 suggests that the ten missing residues likely form the canonical third helix. Domain 1, however, diverges from the canonical PAS fold: an additional strand and a different number and arrangement of helices make it more similar to the PAS‐related GAF domain.11 Pairwise comparison of the domains with one another using the DALI server12 (excluding N‐ and C‐terminal helices) gave rmsd values of 3 Å, 3.6 Å, and 3.2 Å for domains 1 and 2, 1 and 3, and 2 and 3, respectively. Thus, each domain is more similar to PAS domains from other proteins than to the other Rv3651 domains, suggesting that the three domains bind different ligands. Together, the crystal structure identifies Rv3651 as an unusual sensor domain‐only protein with one GAF and two PAS domains and no additional enzymatic domain.
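The Cα rmsd values above are computed after optimally superposing equivalent residues. PDBeFold and DALI use their own alignment and scoring procedures, so the Python/NumPy sketch below is not their implementation: it shows only the Kabsch superposition and rmsd step, assuming the residue correspondence between two domains is already known and supplied as paired N × 3 coordinate arrays.

```python
import numpy as np


def kabsch_rmsd(p, q) -> float:
    """RMSD between paired coordinates p and q (N x 3) after optimal rigid superposition.

    Rows of p and q must correspond (e.g. equivalent C-alpha atoms from an alignment).
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("p and q must be N x 3 arrays with matching rows")
    # Remove translation by centering both sets on their centroids.
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    # Covariance matrix and its singular value decomposition.
    h = p_c.T @ q_c
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation (no reflection).
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    # Rotate every centered point of p and compare it with its partner in q.
    diff = p_c @ rotation.T - q_c
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))
```

The result depends strongly on which residues are paired, which is why the choice of aligned region (for example, excluding terminal helices) matters when comparing rmsd values between domains.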

PAS domains frequently mediate dimerization.5 Rv3651 likewise forms a parallel dimer through domain 3 [Fig. 2(A)]. The dimer is held together by a typical interaction involving the first helix of domain 3, with the other two domains extending outward and away from each other, giving the dimer an overall lung‐shaped structure. At the interface, helix 1 of each protomer crosses over and makes extensive, largely symmetric interactions with 24 residues of the other protomer, including six hydrogen bonds and a hydrophobic patch along helix 1, for a combined surface area of 1134 Å² as determined using PISA.13 On size exclusion chromatography, recombinant Rv3651 eluted at a position corresponding to a molecular weight of 61 kDa, suggesting that the Rv3651 dimer also exists in solution.
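Converting an elution position into an apparent molecular weight typically relies on a calibration curve in which log10(MW) of globular standards is fitted linearly against the partition coefficient K_av = (V_e − V_0)/(V_t − V_0). The Python sketch below illustrates that arithmetic; the column volumes and standards are placeholders to be replaced with measured values, and the source does not state which calibration was used. Because elution also depends on molecular shape, the result is an approximate size in solution rather than an exact mass.

```python
import math
from typing import List, Tuple


def k_av(v_e: float, v_0: float, v_t: float) -> float:
    """Partition coefficient from elution (V_e), void (V_0) and total (V_t) volumes."""
    return (v_e - v_0) / (v_t - v_0)


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def apparent_mw(v_e: float, standards: List[Tuple[float, float]],
                v_0: float, v_t: float) -> float:
    """Apparent MW of a sample from (elution volume, MW in Da) pairs of standards."""
    xs = [k_av(v, v_0, v_t) for v, _ in standards]
    ys = [math.log10(mw) for _, mw in standards]
    slope, intercept = fit_line(xs, ys)
    return 10 ** (slope * k_av(v_e, v_0, v_t) + intercept)
```

Dividing the apparent MW by the monomer mass calculated from the construct sequence then gives a rough estimate of the oligomeric state.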

Figure 2.

The Rv3651 dimer and ligand‐binding site. (A) Overall view of the Rv3651 dimer showing the lung shape of the dimer, the three domains, and the helix crossover. Right: view of the helix 1 crossover, rotated 90° and looking down the symmetry axis. (B) 2Fo–Fc electron density map contoured at 1σ showing the PEG molecule. (C) Domain 2 cartoon with PEG in space‐filling representation. (D) A Bacillus PAS domain with bound FMN, shown in the same orientation.

Although ligands have been identified for only a few GAF and PAS domains to date, most of these domains are thought to bind ligands.3 Domain 2 showed unexplained electron density that fitted a PEG molecule [Fig. 2(B)]. PEG is likely a fortuitous ligand, given that only one protomer shows strong density, its interactions with the protein are minimal, and the crystallization solution contained 20% PEG. The bound PEG fits the concave, fairly hydrophobic inner face of the β‐sheet like a ball in a glove [Fig. 2(C)], similar to ligands in ligand‐bound PAS domains.3 An overlay with the FMN‐bound PAS domain of the Bacillus subtilis photosensor YtvA (PDB code 2PR5)14 shows that both ligands bind to the same region on the concave side of the β‐sheet [Fig. 2(D)], indicating that Rv3651 domain 2 has a canonical ligand‐binding site. A comparison of the surfaces and electrostatic potentials of the corresponding ligand‐binding sites in the three Rv3651 domains revealed no obvious charge or shape similarities, suggesting that they bind different ligands.

Rv3651 is unique in that it consists of only three sensor domains: one GAF and two PAS domains. Many structures of individual PAS domains have been reported, but structures of proteins containing multiple PAS domains have been elusive. The structure of the Rhodobacter transcription factor PpsR contains three PAS domains in series in addition to a DNA‐binding region.15 Interestingly, the PAS domains in the PpsR structure are separated by longer linker segments that form a long helix involved in oligomerization. This arrangement is quite different from the compact, rigid packing of the GAF and PAS domains in Rv3651. Although the dimerization of Rv3651 is reminiscent of the signaling function of typical PAS‐effector proteins, there is no structural indication that the linkers between the tightly packed domains are flexible and could transmit a signal from one domain to another, as has been proposed for other PAS‐containing proteins.5, 15

Rv3651 highlights the challenge of predicting function from sequence, even for well‐known protein domains such as the PAS domain. Here, sequence similarity to the closest PAS domain of known structure was only 16%, so the crystal structure was essential for identifying the GAF and PAS folds. The sensor domain‐only architecture of Rv3651 is highly unusual and raises questions about its function: in the absence of an effector domain, how is ligand binding transmitted to produce a cellular outcome? Several possibilities could account for the lack of a typical PAS‐associated effector domain. Rv3651 might associate with and activate other proteins in a ligand‐dependent manner, effectively functioning as a scaffold. Alternatively, the PAS/GAF domains might themselves have acquired enzymatic activity; a fold with such range and plasticity in ligand binding appears well suited for evolving new functions. Indeed, such a transition from sensor to enzyme was recently described for an N‐demethylase that repurposed a heme‐sensing PAS domain for oxidative demethylation.16 In summary, the Rv3651 crystal structure reveals a unique PAS/GAF sensor domain‐only protein. The next question is how this unusual stand‐alone triple‐sensor protein senses, integrates, and transmits different signals, and how it ultimately affects Mtb physiology and pathogenesis.

Methods

Protein expression and purification

Full‐length Rv3651 was cloned in frame with an N‐terminal His6 tag, expressed as selenomethionine‐labeled protein in E. coli, and purified by metal affinity and size exclusion chromatography as described previously in detail.17 Protein was eluted in 20 mM HEPES pH 7, 0.3 M NaCl, 5% glycerol, and 1 mM TCEP.
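The elution buffer composition translates into weighed amounts via mass = concentration × molar mass × volume. The Python sketch below works this out for an arbitrary volume; the reagent forms (HEPES free acid, anhydrous NaCl, TCEP hydrochloride) and the reading of 5% glycerol as volume/volume are assumptions, since the source does not specify them.

```python
from typing import Dict, Tuple

# Molar masses in g/mol for the assumed reagent forms.
MOLAR_MASS = {"HEPES": 238.30, "NaCl": 58.44, "TCEP-HCl": 286.65}

# Final concentrations in mol/L, from the elution buffer described above.
CONCENTRATION = {"HEPES": 0.020, "NaCl": 0.300, "TCEP-HCl": 0.001}


def elution_buffer(volume_l: float) -> Tuple[Dict[str, float], float]:
    """Return grams of each solute and mL of glycerol (assumed 5% v/v) for volume_l litres."""
    grams = {name: CONCENTRATION[name] * MOLAR_MASS[name] * volume_l
             for name in CONCENTRATION}
    glycerol_ml = 0.05 * volume_l * 1000.0
    return grams, glycerol_ml


if __name__ == "__main__":
    solutes, glycerol = elution_buffer(1.0)
    for name, g in solutes.items():
        print(f"{name}: {g:.3f} g")
    print(f"glycerol: {glycerol:.0f} mL")
```

For 1 L this gives about 4.77 g HEPES, 17.53 g NaCl, 0.29 g TCEP‐HCl, and 50 mL glycerol; the pH is then adjusted to 7 (for example with NaOH) before the solution is brought to its final volume.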

Crystallization, data collection, and structure determination

The structure was solved by single‐wavelength anomalous dispersion (SAD) using selenomethionine‐labeled protein. Crystals were grown at 290 K by sitting‐drop vapor diffusion using a 20 mg/ml protein solution and a well solution of 200 mM NaBr, 100 mM BisTris propane pH 6.5, and 20% PEG 3350. Crystals were cryoprotected with 15% ethylene glycol and vitrified, and data were collected at 100 K at APS beamline 21‐ID‐F. Data were integrated with XDS and scaled with XSCALE,18 phases were calculated with PHASER,19 and the model was built and refined in PHENIX.20 The structure was validated with MolProbity.21

Acknowledgments

This work was supported by NIH/NIAID contract no. HHSN272201200025C, by R01 AI117023 to C.G., and by NIH training grant AI055396 to A.F.

Accession Number: Coordinates and structure factors of Rv3651 have been deposited in the Protein Data Bank with accession number 4Q6U.

References

  • 1. Galperin MY, Koonin EV (2004) 'Conserved hypothetical' proteins: prioritization of targets for experimental study. Nucleic Acids Res 32:5452–5463.
  • 2. Baker EN, Arcus VL, Lott JS (2003) Protein structure prediction and analysis as a tool for functional genomics. Appl Bioinform 2:S3–10.
  • 3. Henry JT, Crosson S (2011) Ligand‐binding PAS domains in a genomic, cellular, and structural context. Annu Rev Microbiol 65:261–286.
  • 4. Finn RD, Mistry J, Tate J, Coggill P, Heger A, Pollington JE, Gavin OL, Gunasekaran P, Ceric G, Forslund K, Holm L, Sonnhammer EL, Eddy SR, Bateman A (2010) The Pfam protein families database. Nucleic Acids Res 38:D211–D222.
  • 5. Moglich A, Ayers RA, Moffat K (2009) Structure and signaling mechanism of Per‐ARNT‐Sim domains. Structure 17:1282–1294.
  • 6. Anantharaman V, Koonin EV, Aravind L (2001) Regulatory potential, phyletic distribution and evolution of ancient, intracellular small‐molecule‐binding domains. J Mol Biol 307:1271–1292.
  • 7. Camus JC, Pryor MJ, Medigue C, Cole ST (2002) Re‐annotation of the genome sequence of Mycobacterium tuberculosis H37Rv. Microbiology 148:2967–2973.
  • 8. Griffin JE, Gawronski JD, Dejesus MA, Ioerger TR, Akerley BJ, Sassetti CM (2011) High‐resolution phenotypic profiling defines genes essential for mycobacterial growth and cholesterol catabolism. PLoS Pathog 7:e1002251.
  • 9. Laskowski RA, Jablonska J, Pravda L, Varekova RS, Thornton JM (2017) PDBsum: Structural summaries of PDB entries. Protein Sci (in press).
  • 10. Drozdetskiy A, Cole C, Procter J, Barton GJ (2015) JPred4: a protein secondary structure prediction server. Nucleic Acids Res 43:W389–W394.
  • 11. Hurley JH (2003) GAF domains: cyclic nucleotides come full circle. Sci STKE 2003:PE1.
  • 12. Holm L, Rosenstrom P (2010) Dali server: conservation mapping in 3D. Nucleic Acids Res 38:W545–W549.
  • 13. Krissinel E, Henrick K (2007) Inference of macromolecular assemblies from crystalline state. J Mol Biol 372:774–797.
  • 14. Moglich A, Moffat K (2007) Structural basis for light‐dependent signaling in the dimeric LOV domain of the photosensor YtvA. J Mol Biol 373:112–126.
  • 15. Heintz U, Meinhart A, Winkler A (2014) Multi‐PAS domain‐mediated protein oligomerization of PpsR from Rhodobacter sphaeroides. Acta Cryst D 70:863–876.
  • 16. Ortmayer M, Lafite P, Menon BR, Tralau T, Fisher K, Denkhaus L, Scrutton NS, Rigby SE, Munro AW, Hay S, Leys D (2016) An oxidative N‐demethylase reveals PAS transition from ubiquitous sensor to enzyme. Nature 539:593–597.
  • 17. Raymond A, Haffner T, Ng N, Lorimer D, Staker B, Stewart L (2011) Gene design, cloning and protein‐expression methods for high‐value targets at the Seattle Structural Genomics Center for Infectious Disease. Acta Cryst F 67:992–997.
  • 18. Kabsch W (2010) XDS. Acta Cryst D 66:125–132.
  • 19. McCoy AJ, Grosse‐Kunstleve RW, Adams PD, Winn MD, Storoni LC, Read RJ (2007) Phaser crystallographic software. J Appl Cryst 40:658–674.
  • 20. Adams PD, Afonine PV, Bunkoczi G, Chen VB, Davis IW, Echols N, Headd JJ, Hung LW, Kapral GJ, Grosse‐Kunstleve RW, McCoy AJ, Moriarty NW, Oeffner R, Read RJ, Richardson DC, Richardson JS, Terwilliger TC, Zwart PH (2010) PHENIX: a comprehensive Python‐based system for macromolecular structure solution. Acta Cryst D 66:213–221.
  • 21. Chen VB, Arendall WB 3rd, Headd JJ, Keedy DA, Immormino RM, Kapral GJ, Murray LW, Richardson JS, Richardson DC (2010) MolProbity: all‐atom structure validation for macromolecular crystallography. Acta Cryst D 66:12–21.

