Proceedings of the National Academy of Sciences of the United States of America. 2012 Jun 5;109(26):E1733-E1742. doi: 10.1073/pnas.1201301109

Structural basis of histidine kinase autophosphorylation deduced by integrating genomics, molecular dynamics, and mutagenesis

Angel E Dago a,1, Alexander Schug b,c,1, Andrea Procaccini d,e, James A Hoch a, Martin Weigt d,f,2, Hendrik Szurmant a,2
PMCID: PMC3387055  PMID: 22670053

Abstract

Signal transduction proteins such as bacterial sensor histidine kinases, designed to transition between multiple conformations, are often ruled by unstable transient interactions, making structural characterization of all functional states difficult. This study explored the inactive and signal-activated conformational states of the two catalytic domains of sensor histidine kinases, HisKA and HATPase. Direct coupling analysis, a global statistical inference approach, was applied to >13,000 such domains from protein databases to identify residue contacts between the two domains. These contacts guided structural assembly of the domains using MAGMA, an advanced molecular dynamics docking method. The active conformation structure generated by MAGMA simultaneously accommodated the sequence-derived residue contacts and the ATP-catalytic histidine contact. The validity of this structure was confirmed biologically by mutation of contact positions in the Bacillus subtilis sensor histidine kinase KinA and by restoration of activity in an inactive KinA(HisKA):KinD(HATPase) hybrid protein. These data indicate that signals binding to sensor domains activate sensor histidine kinases by causing localized strain and unwinding at the end of the C-terminal helix of the HisKA domain. This destabilizes the contact positions of the inactive conformation of the two domains, identified by previous crystal structure analyses and by the sequence analysis described here, inducing the formation of the active conformation. This study reveals that structures of unstable transient complexes of interacting proteins and of protein domains are accessible by applying this combination of cross-validating technologies.

Keywords: coevolution, signal transduction, two component system, protein structure prediction, biological physics


Protein–protein interaction is the essence of signal transduction and essential for protein function at all levels of life. Proteins are frequently required to undergo conformational transitions between multiple states characterized by unique transient residue interactions for each state. Attempts to describe features of these states depend upon techniques that investigate single molecules at sub-ms timescales (1), mechanically pull them (2), or quantify their dynamics (3). On the theoretical side, energy landscape theory provides a basis to understand protein folding and function (4), and atomically resolved simulations can describe transitions between multiple (native) conformations covering ms timescales on specialized supercomputers (5).

Yet, despite all these advances, a full structural characterization of multiple conformations for a specific system remains a daunting task. Some states, active states in particular, are intrinsically short-lived and therefore difficult to stabilize for direct measurements or capture in a crystal lattice. By utilizing a strategy of integrating information from cross-disciplinary approaches, we show here that the shortcomings of individual techniques may be circumvented. By doing so, we introduce the active conformation of the catalytic domains of the sensor histidine kinase and provide the basis for understanding how signals induce the conformational changes required to form the active state.

Sensor histidine kinases are multidomain proteins composed of a variable number and type of signal input domains that, upon binding specific ligands, become induced to autophosphorylate on a conserved histidine residue (6). The catalytic region of these proteins consists of a conserved cytoplasmic two-domain catalytic core, which features a homodimeric four-helix bundle harboring the site of phosphorylation, the HisKA domain, and an ATP-binding domain recognized by the HATPase_c Hidden Markov model (HMM) shared by various classes of ATP-binding proteins (7). In recent years a few crystal structures of the entire catalytic core of histidine kinases have been determined, but the two domains are clearly in an inactive conformation, in which ATP and histidine are too distal for catalysis (8–10). The structure of the active conformation of the two domains has not been captured in a crystal and remains unknown. This is an example of the difficulty in obtaining sufficiently stable conformations of transiently interacting proteins and protein domains for structure determination.

It is precisely for this purpose that we developed a computational-biology approach termed direct coupling analysis (DCA) (11, 12) with the ability to extract residue–residue contacts for individual proteins or protein interaction partners from the sequence variability in extensive protein family databases. DCA relies on genetic drift and identifies residue positions that coevolve. Previous efforts using local covariance measures (13–18) are of limited value in contact prediction because coevolutionary interaction networks between residue pairs inflate local covariances far beyond their direct coupling. DCA applies a global inference step to covariance analysis in order to disentangle direct and indirect contributions. In a study on two-component signaling systems, DCA was capable of extracting contact residues between interacting proteins with very high accuracy, thus enabling identification of protein interaction surfaces (12). Applied to individual proteins, DCA identifies contact positions that define the individual folds as well as homo-oligomerization contacts (12, 19). In a subsequent study we developed a protein docking approach termed MAGMA (molecular dynamics and genomics for macromolecular assembly) and showed that the contact residue information available from DCA is sufficient to assemble accurate structures of protein complexes in silico when individual structures of the proteins are available (20, 21).
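The difference between a local covariance and a direct coupling can be illustrated with a toy calculation. The sketch below is a Gaussian analogy only, not the inference procedure applied to the sequence alignments in this work: three variables form a chain in which only neighbours are directly coupled, yet the two ends still show a sizeable covariance. Inverting the full covariance matrix, which is a global step, recovers the absence of a direct coupling between the ends. All matrix values are arbitrary assumptions chosen for illustration.

```python
import numpy as np

# Toy chain 0 - 1 - 2: only neighbouring variables are directly coupled.
# In a Gaussian model the precision (inverse covariance) matrix holds the
# direct couplings; the zero at [0, 2] means "no direct interaction".
precision = np.array([
    [1.0, -0.6, 0.0],
    [-0.6, 1.5, -0.6],
    [0.0, -0.6, 1.0],
])

# Covariance that would be measured from samples of this model.
covariance = np.linalg.inv(precision)

# Global inference step: invert the whole covariance matrix at once.
recovered = np.linalg.inv(covariance)

indirect_covariance = covariance[0, 2]  # nonzero, mediated by variable 1
direct_coupling = recovered[0, 2]       # numerically zero

print(f"covariance(0, 2)      = {indirect_covariance:.3f}")
print(f"direct coupling(0, 2) = {direct_coupling:.3e}")
```

A pairwise measure computed on columns 0 and 2 alone would report a correlation, whereas the global inversion attributes it entirely to the path through variable 1.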

Here, we applied DCA to the catalytic core of the histidine kinase to identify contacts that define a representative structure for the active conformation of the sensor histidine kinase formed by the two catalytic domains. It was not clear a priori that sufficient sequence specificity would exist between two domains on the same protein to allow extraction of significant correlation patterns. Similarly, it was not clear whether a significant majority of the kinases in the databases would conserve a single active conformational state, which would be a requirement in order to deduce the active conformation contact residues.

However, DCA indeed produced putative domain–domain contact information for the two catalytic domains. These contacts were used in a docking approach to generate an approximate structural model of the active state. This was refined using unrestrained molecular dynamics simulations, which confirmed that an active state in which these contacts are made exists. In addition, the structural model drove site-directed mutagenesis on an exemplary histidine kinase system, the results of which demonstrated the biological importance of the DCA contacts and established the validity of the structural model. These results illustrate the predictive power of our integrated cross-disciplinary approach, including statistical-physics inspired genomic analysis, molecular dynamics simulations, and mutagenesis experiments for visualizing transient protein interactions.

Results

DCA Identifies Potential Residue Contacts Between HisKA and HATPase_c Domains.

The primary goal of this research is to define the structure of the histidine kinase autophosphorylation complex formed by the HisKA and HATPase_c domains. To this end, DCA was applied to 13,647 histidine kinase proteins containing both HisKA and HATPase_c domains to reveal residue contacts both internal to each domain and in the interface between the two domains on the same protein. DCA ranks direct statistical coupling between residue-position pairs, which are quantified by their direct information (DI) values (12). The residue position pairs identified by the highest DI values were mapped onto the exemplary structure of Thermotoga maritima histidine kinase HK853 (Fig. 1A) (9). Within the 36 highest-ranking pairs there are 31 intradomain pairings (Table 1). Of these, 25 are within 5 Å (i.e., in contact distance) and three more are within 7 Å, where contact distance could be achieved through side-chain reorientation (Table 1, green pairings). Similar results are obtained when mapping the residue positions onto Geobacillus stearothermophilus KinB (10) (Table 1). These contacts are expected to exist in a large fraction of the analyzed proteins. We observed > 90% true-positive prediction rate for intradomain contacts within this range, demonstrating the predictive power of DCA.
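The contact classification described above can be sketched in a few lines. The 5 Å and 7 Å cutoffs are taken from the text; the function names and the example distance list are hypothetical, and the distances are placeholders that only reproduce the reported counts (25, 3, and 3 out of 31), not the values in Table 1.

```python
import numpy as np

CONTACT_CUTOFF = 5.0  # Å, "in contact distance" in the text
NEAR_CUTOFF = 7.0     # Å, reachable through side-chain reorientation


def min_heavy_atom_distance(coords_a, coords_b):
    """Shortest distance between any atom of residue A and any atom of residue B."""
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    diffs = a[:, None, :] - b[None, :, :]
    return float(np.sqrt((diffs ** 2).sum(axis=-1)).min())


def classify(distance):
    """Label a residue pair by its minimal atom distance."""
    if distance <= CONTACT_CUTOFF:
        return "contact"
    if distance <= NEAR_CUTOFF:
        return "near"
    return "distal"


def true_positive_rate(distances):
    """Fraction of predicted pairs that are within the 7 Å cutoff."""
    labels = [classify(d) for d in distances]
    return sum(label != "distal" for label in labels) / len(labels)


# Placeholder distances reproducing the reported counts only.
example = [4.0] * 25 + [6.0] * 3 + [10.0] * 3
rate = true_positive_rate(example)
print(f"{rate:.1%}")
```

With 28 of 31 intradomain pairs inside the 7 Å cutoff, the rate is 28/31, or about 90%, in line with the figure quoted in the text.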

Fig. 1.

Direct coupling analysis identifies interdomain contact pairings for sensor histidine kinases. (A) The dimeric structure of the catalytic domains of T. maritima sensor kinase HK853 (PDBID: 2C2A). The catalytic portion of the protein is comprised of a central four-helix bundle HisKA domain, with the histidine site of phosphorylation in red and the globular catalytic HATPase_c domain connected via short linker regions. The bound ADP molecules are in red. Because the ADP and histidine are distal in the crystal structure, the structure represents an inactive conformation of the kinase. (B) Direct coupling analysis identified two interdomain contact pairings (L315-E451 and D311-Q372, shortest heavy-atom distance displayed) that are realized in the HK853 structure, suggesting that this inactive state is generic and of physiological relevance for a majority of the 13,000+ sequence examples that were used to extract this data. (C) Direct coupling analysis identified three putative interdomain contact pairings (yellow) that are not realized in the HK853 crystal structure, but that would be consistent with an active structure in which ATP and histidine (in red) are in reaction proximity.

Table 1.

The 36 highest-scoring DCA pairings

True-positive intradomain contacts are in green, false-positive intradomain contacts are in grey, and predicted interdomain contacts are in red.

*Numbering according to HK853 protein.

†Minimal atom distances between the indicated residues in PDB: 2C2A.

‡Minimal atom distances between the indicated residues in PDB: 3D36 (NR, not resolved).

§Indicates the domains the residues are found in: 1 = HisKA and 2 = HATPase_c.

Among the 36 highest-ranking pairings we also found five interdomain pairings, which we considered as possible contact candidates (Table 1, red pairings). When mapped onto the HK853 structure (Fig. 1), we found two of the top five highest-ranking interdomain pairings (D311-Q372 and L315-E451) are in contact in the inactive form of HK853, visualized in the crystal structure (9) (Fig. 1B). This suggests that the crystallized version of the protein is generic for the majority of the 13,000+ kinases in our dataset and that the inactive structure formed between the two domains is of physiological relevance, consistent with previous results (9). Consistent with this notion, these contacts are also made in the G. stearothermophilus KinB structure (Table 1).

The other three top-scoring interdomain pairings (N257-Q427, E261-Q372, and E308-R369) were found to be distal in the HK853 crystal structure, suggesting that they are not contacts that define the inactive state of the kinase. Instead, these three residue pairings could be envisioned to simultaneously form contacts in an active conformation of the kinase in which the γ-phosphoryl group of ATP bound in the HATPase_c domain and the phosphorylatable histidine residue in the HisKA domain are within an appropriate distance for phosphotransfer (Fig. 1C).

These analyses reliably identified intradomain contact residue pairings from sequence alone with a true-positive rate of over 90% for the first 36 residue pairings. Within this range, they also identified two interdomain contact pairings that define the inactive form of the kinase observed in the HK853 and KinB crystal structures, and three interdomain residue pairings that are candidate contact pairings suspected to stabilize the unknown activated structure of the kinase. Biological and molecular dynamics experiments were undertaken to assess the validity of this prediction.

Site-Directed Mutagenesis on the KinA Sporulation Sensor Kinase Supports the Importance of DCA Contacts.

To gather experimental support that DCA had identified interdomain contacts crucial for generating the active site complex of the domains, we needed a biological system in which the importance of the three interdomain contacts could be tested for both in vitro enzymatic activity and in vivo physiological activity.

The best structure (in terms of resolution and completeness) that includes both HisKA and HATPase_c catalytic domains is the T. maritima sensor kinase HK853, which has been determined both individually (PDBID 2C2A) and in complex with its response regulator (PDBID 3DGE) (8, 9). However, T. maritima is not a genetically tractable organism and the physiological function of histidine kinase HK853 remains unknown. As a more suitable experimental system for examination of the interdomain contact positions we chose the Bacillus subtilis sporulation kinase KinA. Deletion of KinA has an easily identifiable sporulation phenotype in B. subtilis (22). Furthermore, the full-length KinA kinase is cytoplasmic and is more amenable to in vitro studies than transmembrane histidine kinases.

The importance of the residue pairings identified by the DCA study was tested by introducing individual point mutations in the three HisKA and the three HATPase_c domain positions predicted to define active state contacts. Mutations were chosen to change the observed HisKA–HATPase_c pairings at these positions in KinA to uncommon pairings that are potentially incompatible in size or charge, with the prediction that these mutations might impair function. Specifically, we introduced KinA mutations G402Y, E406Q, and E448K (the corresponding positions to 257, 261, and 308 in HK853) in the HisKA domain and KinA mutations Q505V, Q508E, and P560G (the corresponding positions to 369, 372, and 427 in HK853) in the HATPase_c domain (sequence alignment and positions of mutated residues are shown in Fig. S1). Genes coding for wild-type KinA protein as well as the six point mutants were transformed into a B. subtilis strain JH642(∆ABCD) that harbored markerless, in-frame deletions of all sporulation kinases and was unable to sporulate. The respective strains were designed to express the wild-type kinA gene or its mutants ectopically from the dispensable amyE locus under native kinA promoter control. All six single point mutations had severe effects on the ability of KinA to induce sporulation, as observed both on a solid plate (Fig. 2A) and when quantified in liquid medium (Fig. 2B). Strains with mutations G402Y, E406Q, E448K, and Q508E in KinA failed to sporulate at levels above that of the isogenic strain without the kinA gene. When compared to an isogenic strain harboring a wild-type kinA, mutations P560G and Q505V caused a 100- and 100,000-fold reduction in the rate of sporulation, respectively. The reduced ability to induce sporulation for the KinA mutant proteins was not due to failed expression or to degradation; all KinA mutant proteins were observed at comparable levels to the wild-type KinA protein when visualized by immunoblotting with anti-KinA antibody (Fig. 2C).

Fig. 2.

Undesired (uncommon) residue pairings at DCA contact positions in sensor kinase KinA impair in vivo and in vitro activity. Strain JH31000 deleted for all sporulation kinases (Δ), isogenic strains designed to express only wild-type kinA (WT), or the indicated kinA point mutants in DCA contact residues were assayed for (A) a sporulation plate phenotype on SM media, where sporulating strains appear white and nonsporulating strains appear clear. (B) Live cell and spore counts were quantified for the same strains after growth in liquid SM media for 24 h. (C) Cellular KinA protein levels were assessed in the same strains by immunostaining with monoclonal anti-KinA antibody to exclude the possibility that the mutations alter expression or protein stability. All lanes contained protein extract derived from 0.1 absorbance units of cells except for the left lane, which contained 0.5 ng of purified native KinA as a control. (D) In vitro autophosphorylation activities of the respective KinA proteins were assessed by autoradiography with γ-32P-ATP (Upper Gel). Reactions were stopped at the indicated time points. To assure similar protein levels, proteins were also visualized by Coomassie staining (Lower Gel).

To assess whether the observed in vivo phenotypes were reflected in the in vitro activities of the proteins, wild-type KinA and the individual KinA mutants were expressed in Escherichia coli, purified, and tested for autophosphorylation utilizing γ-32P-ATP. Phosphorylation rates were monitored over a time course of 30 min. From these assays it was apparent the mutant proteins that failed to induce B. subtilis sporulation retained virtually no activity in vitro. The two mutants that supported reduced sporulation rates also retained activity in vitro albeit at reduced rates compared to wild-type (Fig. 2D).

Thus, the data supported the importance of the residue positions identified by DCA analysis for kinase activity, and were consistent with the notion that these residue pairings might be contacts at the domain–domain interface. Furthermore, the data validated KinA as a good experimental test system to evaluate further the relevance of the DCA contacts.

MAGMA Generates Structural Models of the HisKA-HATPase_c Autophosphorylation Complex.

Our previous theoretical studies predicted histidine kinase-response regulator complexes based on structural data of the individual proteins and DCA-derived contact data in an approach we termed MAGMA (20, 21). Structural models were in excellent agreement both with an ancestral histidine kinase Spo0B-response regulator Spo0F structure (approximately 2.5 Å rmsd) (23) and with concomitantly published independent experimental work (approximately 3.5 Å rmsd) that determined the HK853-RR468 complex structure (8, 21). We now used MAGMA to predict the autophosphorylation conformation of the sensor kinase.

The three putative active state contacts extracted by DCA (see Fig. 1C) and an additional contact between the γ-phosphoryl group of ATP and H260 necessary for autophosphorylation were used as soft constraints in structure-based docking simulations to generate a possible autophosphorylation conformation. Simulations allowed for these four contacts to be formed either in cis (i.e., within the same monomer) or in trans (i.e., between the two peptides in the homodimer).

As a result of the docking simulations a large ensemble of possible cis conformations but no trans conformations was observed (see Fig. 3A). To investigate whether it is in principle possible to also generate a docked trans conformation, we performed another set of simulations in which the contacts were only allowed to form in trans and not in cis. While this enforced docked trans conformations, it was impossible to physically form/satisfy the four constraining trans-contacts simultaneously; the resulting conformations are very diverse, have a limited interdomain interface, and high strain at the end of helix α2. For further discussion we therefore disregard a possible trans conformation in agreement with experimental data (8).

Fig. 3.

An active kinase conformation generated by molecular simulation. (A) DCA identified three contact residue pairings (yellow) between the HATPase_c and the HisKA domain. In addition, ATP needed to be spatially close to H260 in the active conformation (red). These four contacts were used in subsequent docking studies, which generated a diverse ensemble of possible conformations that accommodate these four contacts. Ten different conformations are displayed. (B) One conformation from the ensemble was selected and simulated in three 170 ns unrestrained MD simulations. In one simulation a stable interdomain interface was formed. Based on a cluster analysis on this simulation, a dimer conformation was constructed and its stability probed in eight additional 150–200 ns MD simulations. In all simulations the interdomain interface remained stable, as demonstrated by the averaged contact matrix of residues (x-axis: HisKA domain; y-axis: HATPase_c domain; positions of α-helices are indicated). In particular, stable clusters of contacts, including the three DCA contacts (yellow), were formed between helices α1 and α2 with helices α3, α4, and α5. Two further DCA contacts (green) were found when extending the HATPase_c HMM to include helix α3. (C) Our predicted active conformation was selected from these simulations. Significant contacts were formed between α1 and α2 (HisKA) and α3 (HATPase_c). The three DCA contacts are shown in yellow and the additional two contacts identified when extending the HATPase_c HMM are in green. (D, E) When comparing the active conformation (blue) with the inactive crystal structure (magenta, PDBID: 2C2A) (D, Front; E, Top), one notices significant domain movement of the HATPase_c domain.

Within the cis ensemble additional interface contacts between helix α3 and (HisKA) helices α1/α2 were consistently seen. These contacts could not have been identified by DCA, because the HATPase_c HMM does not include helix α3 and therefore this region was not part of the sequence analysis (Fig. S1).

To refine the predicted model we selected a conformation out of the docked ensemble, which best fulfilled the selected constraints (7 Å between the Cα-atoms of an amino acid pair; 5 Å between ATP-γP and H260-Nϵ2), and used it as a starting structure in unconstrained molecular dynamics simulations with explicit water. Due to the transient nature of the active conformation and the scarcity of the initial contacts used for docking, we performed three 170 ns simulations with the expectation that not all simulations would find an active conformation with a stable interface. Indeed we found that in two simulations, the interdomain interface was partially lost for both monomers (Fig. S2). In one of these simulations, however, a stable HisKA-HATPase_c domain interface was formed within a monomer, again including significant contacts between helices HisKA α1/α2 and helix α3 (Fig. 4B). By performing a cluster analysis for this monomer, we gained a representative conformation with backbone rmsd of approximately 3 Å to the starting docked conformation. All DCA pairs that drove the initial docking simulations were in contact following the unrestrained molecular dynamics simulations. The Cα–Cα distances differed from the 7 Å used in the initial structure-based simulations. In particular, the pair E308-R369 interacted via fully extended complementary charged side chains and a Cα–Cα distance of 12 Å.
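Choosing a starting structure from the docked ensemble can be pictured as a ranking by restraint violation. The sketch below uses the two target distances quoted in the text (7 Å for Cα pairs, 5 Å for ATP-γP to H260-Nϵ2). The scoring function, a sum of distance excesses over the targets, is an assumption for illustration and is not taken from the MAGMA protocol. The conformation names and distances are invented.

```python
CA_TARGET = 7.0       # Å between the Cα atoms of each DCA pair (from the text)
ATP_HIS_TARGET = 5.0  # Å between ATP-γP and H260-Nϵ2 (from the text)


def excess(distance, target):
    """Amount by which a distance exceeds its target; zero if satisfied."""
    return max(0.0, distance - target)


def restraint_score(ca_distances, atp_his_distance):
    """Assumed score: total excess over all four restraints (lower is better)."""
    ca_term = sum(excess(d, CA_TARGET) for d in ca_distances)
    return ca_term + excess(atp_his_distance, ATP_HIS_TARGET)


def pick_best(ensemble):
    """Return the (name, ca_distances, atp_his_distance) entry with the lowest score."""
    return min(ensemble, key=lambda conf: restraint_score(conf[1], conf[2]))


# Invented example ensemble: three Cα–Cα distances plus one ATP–His distance each.
ensemble = [
    ("conf_a", [6.5, 9.0, 12.0], 8.0),
    ("conf_b", [6.8, 7.4, 8.1], 5.5),
    ("conf_c", [10.0, 7.0, 7.0], 4.0),
]

best = pick_best(ensemble)
print(best[0], round(restraint_score(best[1], best[2]), 2))
```

A score of this kind only ranks docked candidates. As the text notes, the refined Cα–Cα distances after unrestrained simulation can differ from the 7 Å target, for example 12 Å for E308-R369.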

Fig. 4.

A detailed look at the interface of the active histidine kinase conformation. Out of an ensemble of structures derived from the final MD simulations, one structure that best fulfilled selection parameters (ATP/H260 contacts made, DCA contacts made, minimal rmsd to crystal structure for HATPase_c domain) was chosen as a representative for the autophosphorylation conformation. (A) View of the active conformation showing the ATP/H260 contact in the act of phosphoryl group transfer (red pseudobond), the three DCA contacts that were used to assemble the structure (yellow pseudobonds), and two additional contacts that were identified by extending the HATPase_c sequence HMM (green pseudobonds). Close-up views are shown of: (B) the DCA contact N257-Q427 featuring two hydrogen bonds; (C) the DCA contacts E261-Q372 and E308-R369 featuring a hydrogen bond and a salt bridge, respectively; (D) the ATP/H260 contact; and (E) the later identified contacts A268-A339 and Q298-A339.

We constructed a homodimer with two identical chains based on this representative active monomer conformation by aligning the monomer’s HisKA helices α1 and α2 to the HisKA helices of first chain A and then B of the docked starting structure. We then added the full cytoplasmic N-terminal helical tails from 2C2A, and, after energy minimization to remove atomic clashes, performed several additional unconstrained molecular dynamics simulations in multiple force fields both with and without ATP to further probe the stability of the homodimer and generate a representative active state dimer structure.

All simulations [NPT-ensemble with 2-fs time step; AMBER99SB-ildn/TIP3P (24) (no ATP): 150 ns and two at 180 ns; Gromos43a1/SPC (with ATP): two at 200 ns; Gromos3a6/SPC (25) (with ATP): three at 200 ns] showed stable interfaces over their entire duration (see Fig. 3B and Fig. S2), demonstrating the robustness of this conformation. For further analysis we discarded the first 30 ns to allow local relaxation and removal of artifacts from our construction of a starting conformation. To quantify conformation diversity in the simulations we clustered each simulation into one representative conformation. The clustered conformations have a backbone rmsd of around 4 Å to each other (3.5 Å when excluding the highly mobile ATP lid and the N termini). We did not observe a significant difference in the presence or absence of ATP, suggesting a mechanism of conformational selection and not of induced fit for the adoption of this state (26). We also saw water molecules mediating contacts between the domains. Further, the end of HisKA helix α2 bent or unfolded compared to the inactive conformation, suggesting localized strain and “cracking” (27, 28).
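Backbone rmsd values such as those quoted above are conventionally computed after optimal rigid-body superposition. The following is a minimal sketch using the Kabsch algorithm with NumPy. It is not the analysis code used for the simulations, and the coordinates are random placeholders rather than protein backbone atoms.

```python
import numpy as np


def kabsch_rmsd(mobile, reference):
    """RMSD between two (N, 3) coordinate sets after optimal superposition."""
    p = np.asarray(mobile, dtype=float)
    q = np.asarray(reference, dtype=float)
    # Remove translation by centring both sets on their centroids.
    p_centred = p - p.mean(axis=0)
    q_centred = q - q.mean(axis=0)
    # Cross-covariance matrix and its singular value decomposition.
    h = p_centred.T @ q_centred
    u, _, vt = np.linalg.svd(h)
    # Correct for a possible reflection so the result is a proper rotation.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    p_fitted = p_centred @ rotation.T
    return float(np.sqrt(((p_fitted - q_centred) ** 2).sum() / len(p)))


# Placeholder coordinates: a rotated and translated copy of the same points.
rng = np.random.default_rng(0)
reference = rng.normal(size=(20, 3))
theta = np.deg2rad(90.0)
rot_z = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta), np.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
])
moved = reference @ rot_z.T + np.array([5.0, -2.0, 1.0])

rmsd_after_fit = kabsch_rmsd(moved, reference)
print(f"{rmsd_after_fit:.2e}")
```

Because the superposition removes rigid-body motion, a large rotation of one domain relative to another shows up in the rmsd only when both domains are included in the fitted atom set.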

To gain a representative active conformation, we selected a monomer conformation out of the five trajectories with ATP that best fulfilled a number of criteria, including correct ATP positioning in the pocket and appropriate DCA contacts and His-ATP contact distances (SI Text, Dataset S1, and Table S1). This representative active conformation differed significantly from the inactive crystal structure (see Fig. 3 C–E). In particular, the ATP-binding domain has rotated almost 90° and lies atop H260, enabling autophosphorylation. In the active conformation, all three DCA contacts are simultaneously formed as characteristic side-chain interactions (Fig. 4A). Specifically, the DCA contact N257-Q427 is stabilized by two hydrogen bonds (Fig. 4B), the DCA contact E261-Q372 is in hydrogen-bond distance, and the DCA contact E308-R369 is stabilized by a salt bridge (Fig. 4C). The catalytic histidine is in phosphorylation contact (3.2 Å) with the γ-phosphoryl group of ATP (Fig. 4D).

A Structure-Guided Mutagenesis Approach Succeeds in Repairing a KinA–KinD Hybrid Kinase.

The structural model was used to guide additional mutation studies to test its validity. According to our model of the active kinase state, three helices on one face of the HATPase_c domain (Fig. S1, helices α3–α5) arrange perpendicular to the HisKA four-helix bundle domain. The DCA analysis identified highly correlated positions in or close to helices α4 and α5 and, as described above, mutations in these positions impaired KinA function.

The observation that correlated residue positions can be identified at the HisKA-HATPase_c interface suggested that specificity exists between the two catalytic domains. In other words, HisKA and HATPase_c domains from distinct kinases are unlikely to be paired randomly to form a functional protein. To further test this hypothesis three hybrid kinases were generated in which the HATPase_c catalytic domain of KinA was replaced with increasingly sequence-divergent domains of other B. subtilis sensor histidine kinases. The chosen HATPase_c domains were from the histidine kinases KinD, YkoH, and YbdK, which featured identical residues at roughly 40%, 30%, and 20% residue positions, respectively (Fig. S3A). All three hybrid kinases completely failed to induce sporulation in a strain otherwise deleted for all sporulation sensor kinases (Fig. S3B). Now the question arose whether these hybrid kinases could be repaired by mutations directed at the established contact positions.

Because there was little disparity between KinA and KinD in the amino acid choice at the three active state DCA pairings (i.e., the residue on either the HisKA or HATPase_c domain or both was conserved between KinA and KinD) (Fig. S1), we focused efforts on helix α3, which is not part of the HATPase_c sequence HMM despite clearly being a part of this globular domain. Because of this, helix α3 was excluded from the DCA analysis and contacts in this region could therefore not have been forthcoming. Guided by our structural model, we questioned whether mutations could be identified here that improved signaling of the KinA–KinD hybrid. Predicted contact residue positions in the helix α3 that differed between KinA and KinD were subjected to mutagenesis. Specifically, mutations P475A/I476L/S479T were introduced in the KinD HATPase_c domain residues to reflect the residue choice observed at these positions in the KinA HATPase_c domain, with the expectation that such mutations might repair the KinA-KinD hybrid. Similarly, we exchanged contact residues in the KinA HisKA four-helix bundle (A413/S438) with the residues observed in KinD at these positions, again with the expectation that these mutations might repair the KinA–KinD hybrid protein.

The KinA–KinD(P475A/I476L/S479T) triple mutant and the KinA(A413G/S438Q)–KinD double mutant constructs were introduced into the kinase-deficient strain JH642(∆ABCD) and expressed ectopically under wild-type promoter control. Both mutants showed unchanged protein levels when assessed by Western blotting. As expected, both strains supported sporulation levels that suggested significantly repaired kinase activity. In particular, the latter construct supported an almost wild-type level of sporulation (Fig. 5A).

Fig. 5.

Structure-guided mutagenesis of the KinA–KinD hybrid kinase identifies mutations that repair in vivo and in vitro activity. (A) Cell and spore counts for strains deleted for the sporulation kinases (∆) or expressing wild-type KinA, the KinA–KinD hybrid (KinAD), or hybrid proteins with point mutations that incorporate into (i) the KinD HATPase_c domain residues found in KinA (P475A, I476L, S479T), or (ii) the KinA HisKA domain residues found in KinD (A413G, S438Q) at these positions. (B) In vitro autophosphorylation of wild-type KinA, the KinA–KinD hybrid, or the KinA–KinD A413G mutant were performed as described for Fig. 2.

To assess whether any individual mutation in these proteins was responsible for the enhanced activity, we introduced the five single mutations in the KinA–KinD protein and again assessed their ability to complement the sporulation phenotype (Fig. 5A). For the mutations in the KinD HATPase_c domain it was evident that the mutation I476L had a significant effect alone in restoring sporulation, and this effect was enhanced in the presence of the other two mutations. For the mutations in the KinA HisKA domain it was evident that the A413G mutation alone was responsible for an almost wild-type sporulation rate.

One possibility for the observed results was that the A413G mutation was generally sufficient to increase kinase activity independent of the structural context. To eliminate this possibility, the same mutation was introduced in KinA, KinA-YbdK, and KinA-YkoH. Mutation A413G in KinA resulted in a slightly reduced sporulation rate in the corresponding strain, whereas it did not improve the ability of the other two hybrid proteins to induce sporulation, demonstrating that this mutation only has a positive effect on KinA–KinD mediated signaling. This mutant was thus selected for further in vitro studies. The purified protein was assessed for its autophosphorylation rate and compared to wild-type KinA and the unmutated KinA–KinD hybrid. Consistent with the in vivo phenotypes, we observed that the KinA(A413G)–KinD construct exhibited kinase activity similar to that of wild-type KinA (Fig. 5B).

These results showed we were able to assemble a structural model using genomically derived contacts, and this structural model proved sufficient in guiding mutagenesis efforts to repair a severely defective hybrid protein to full in vivo and in vitro function.

Extended Genomic Analysis Improves Interdomain Contact Prediction.

The results of molecular dynamics simulations and mutagenesis establish the structural and functional importance of contacts between the helix α3 and the HisKA domain. Because this helix is not part of the definition of the HATPase_c domain used in the Pfam database (29), it was not part of our original sequence analysis, and these contacts could not be identified by DCA.

To overcome this discrepancy, we extended the domain definition. More precisely, we used the standard Pfam HMMs for the HisKA and the HATPase_c domains to extract 359 kinases from the manually curated UniProtKB/Swiss-Prot database. We extracted sequences starting at the first position of the HisKA domain and ending at the last position of the HATPase_c domain, thus including the connector, which contains helix α3 (Fig. S1). After aligning them using MUSCLE (30), these sequences were used to build a single HMM with HMMER (31). The resulting 224 aligned positions included 46 positions describing the connector, and in particular the helix α3.
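The extraction step described above can be illustrated with a minimal sketch. It assumes that the HMM scan has already yielded 1-based, inclusive coordinates for the first HisKA position and the last HATPase_c position; the function name and the toy sequence are hypothetical and serve only to show that the connector is retained.

```python
def extract_extended_region(sequence, hiska_start, hatpase_end):
    """Return the stretch from the first HisKA position to the last
    HATPase_c position (1-based, inclusive), connector included."""
    if not 1 <= hiska_start <= hatpase_end <= len(sequence):
        raise ValueError("domain coordinates fall outside the sequence")
    return sequence[hiska_start - 1:hatpase_end]


# Toy sequence (hypothetical): n/t = flanking residues, H = HisKA,
# c = connector, A = HATPase_c.
toy = "nnn" + "HHHH" + "cc" + "AAAAA" + "ttt"
region = extract_extended_region(toy, hiska_start=4, hatpase_end=14)
print(region)
```

In this toy case the returned region spans both domains and the two connector residues between them, while the flanking residues are dropped.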

Scanning the National Center for Biotechnology Information (NCBI) RefSeq database (32), we extracted 13,535 sequences containing not only the old HisKA and HATPase_c domains, but also the connector domain, which was interpreted as being part of an extended HATPase_c domain. DCA led to results that were highly coherent with the prior DI ranking (Table S2). The most interesting outcome is that, among the top nine interdomain residue pairings, we again found the five contacts identified before and used in the docking simulations. In addition, this list also contains two pairs having one position on helix α3, namely HK853 positions Q298:A339 and A268:A339 (KinA positions E438:L476 and A413:L476). Both pairs are in contact in the structural model and, remarkably, the latter two positions were the ones found to have the strongest effects in repairing the nonfunctional KinA–KinD hybrid. Note also that E438 is invariant between KinA and KinD, and therefore was not part of the set of potential repair residues for the KinA–KinD hybrid.

A remaining question is whether these additional contacts would improve our prediction of the active conformation. When rerunning the initial structure-based docking simulations, now including six (5 DCA + 1 ATP-H260) contacts instead of the initial four, one obtains conformations closer to our final prediction, with a less diverse ensemble of conformations than in Fig. 3A. While our simulation protocol using later all-atom AMBER/GROMOS simulations formed a stable conformation including these two additional active contacts (Fig. 3C), their earlier inclusion would likely have simplified identification of an active conformation.

In summary, the disconnect between simulations/experimental results and the boundaries in the definition of the HATPase_c domain could be resolved by adding the connector sequences (Fig. S1) to the domain definition, illustrating once more the predictive power of our cross-disciplinary approach consisting of genomic sequence analysis, guided molecular dynamics docking simulations, and mutagenesis experiments.

Discussion

Our analysis of over 13,000 histidine kinases identified evolutionary patterns consistent with a two-state model of kinase activity, featuring an inactive and an active conformation. In the inactive conformation, the two catalytic HisKA and HATPase_c domains are oriented such that the phosphorylatable histidine and ATP are distant. In the active conformation the two domains rearrange so that the phosphorylatable histidine and ATP are in perfect orientation for phosphoryl group transfer. By applying DCA, interdomain contact pairings defining both inactive and active states can be extracted from protein sequence databases alone.

Two highly coevolving interdomain residue pairings identified by DCA studies are in contact in the known structures of T. maritima HK853 (8, 9), consistent with the notion that these structures capture an inactive but physiologically relevant state of the kinase. Previous mutational analysis of this interface resulted in increased or decreased kinase activity, suggesting that the observed structure represents a physiologically relevant conformation of HK853 (9). The fact that our sequence analysis on 13,000 kinases identified HisKA–HATPase_c surface contacts that are made in this structure suggests that this inactive conformation is generic for the large majority of the kinases in the dataset.

Three highly coevolving interdomain residue pairings that could be envisioned as possible active conformation contacts were extracted by DCA. Mutational analysis of these residue positions in sporulation kinase KinA revealed the importance of residue choice at these positions; several single point mutations completely destroyed kinase activity. The residue positions can be utilized to build a stable structural model of this conformation by applying a molecular dynamics approach we recently developed to integrate sequence-derived contact information (20, 21). In Fig. 4A we capture the active state and demonstrate that all three contacts can be simultaneously made, allowing the catalytic histidine to come in phosphorylation contact with the γ-phosphoryl group of ATP (Fig. 4D). Impressively, the three DCA contacts form some characteristic interactions: Contact N257-Q427 is stabilized by two hydrogen bonds (Fig. 4B); contact E261-N372 forms a hydrogen bond; and contact E308-R369 features a salt bridge (Fig. 4C). We also note that the latter two contacts are connected via a hydrogen bond between residues 369 and 372, suggesting that they form an interaction network. Because these residue positions are among the most strongly and directly correlated pairings (Table 1, position 12), the interaction network is likely representative for most histidine kinases. This means that simple swapping of amino acids at interacting residue positions may or may not yield a functional protein.

In addition to surfaces surrounding the DCA contacts, we find a significant area of interaction between the HisKA helices α1 and α2 and HATPase_c domain helix α3 where no contacts would have been identified by our sequence analysis because the involved sequence stretch is not part of the sequence definition of the HATPase_c domain and was not part of our original analysis. This surface area was further investigated experimentally. Mutations in residue positions in this area were identified that repaired a severely defective chimeric kinase. Extension of the sequence analysis including this region by building a new histidine kinase-specific HMM identifies two additional contact pairings (Fig. 4E). These pairings are coincident with those that, when mutated, repair the chimeric kinase, giving independent support that the structure of the active conformation is correct and generic for most HisKA sensor kinases.

A paradigm for sensor kinase autophosphorylation has long been that it occurs in trans (i.e., one subunit phosphorylates the other within the stable homodimer). Experimental evidence for the trans-phosphorylation mechanism in the NrII and EnvZ kinases is strong (33, 34). Only recently has this trans-paradigm been challenged by the observation that some kinases phosphorylate in cis, including HK853 (8, 35). The notion that different phosphorylation mechanisms exist among the set of HisKA-type sensor kinases is seemingly at odds with the notion that interaction residue contacts defining inactive and active states of the kinase can be extracted from HisKA-HATPase_c sequence databases. DCA should only identify residue–residue contacts if they are made in all, or at least a large majority, of the proteins in the alignment.

However, this apparent discrepancy can be resolved. Our structural model of the active conformation displays a cis-autophosphorylation conformation because it utilized the cis-phosphorylating HK853 protein as a structural model. A trans-model that accommodates the identical DCA contacts would require a significant change of the HisKA four-helix bundle arrangement. Such a change is observed when comparing the known HisKA structures of the cis-phosphorylating HK853 kinase with that of the trans-phosphorylating EnvZ kinase (9, 36). The four-helix bundle HisKA domain of EnvZ shows an opposing rotation when compared to HK853 (and all other known structures of HisKA domains). In essence, this leads to a replacement of HisKA helices α1 and α1′ in the dimeric four-helix bundle. As a consequence, an identical active state accommodating all DCA contacts can be envisioned for both HK853 and EnvZ, except that the contacts involving helix α1 residues are made in cis in one and in trans in the other arrangement. In conclusion, our structural model could be representative for all HisKA sensor kinases, and the cis- and trans-phosphorylating mechanisms would be indicative of the rotation of the HisKA four-helix bundle.

Some support for our structural model as a possible representative for all HisKA-type sensor kinases comes from a disulfide cross-linking approach that attempted to define the active conformation of the EnvZ kinase (37). This study can now be interpreted in light of our active structure and accumulating evidence that sensor kinases exist in distinct (active and inactive) conformations (38). Eight double cysteine-substitution mutants with one cysteine each in the HisKA or the HATPase_c domain were tested for disulfide cross-linking with the notion that cross-linking residues are in close proximity. Of the eight double mutants, four failed to cross-link, and mutated residues are also distal in either our active state HK853 model or the inactive state crystal structure. A single transmolecular cross-link was observed between EnvZ residues 236 and 411. Corresponding residues in our active state model are in contact (albeit they are formed in cis based on the different helix orientation). Three intramolecular contacts were identified in the disulfide cross-linking study. One of these corresponds to an inactive state contact between HK853 residues 311 and 369, suggesting that EnvZ forms a similar inactive conformation as HK853. The other two contacts are not accommodated in either conformation but could become contacts during the transition from one to the other. Though cysteine cross-linking is an empirical approach with many artifacts (and a much larger number of disulfide mutants would be required to prove our current structure), the available data are, by and large, consistent with our active state model.

A major mystery surrounding histidine kinases is the molecular basis for the signal transduction mechanism in which ligand binding to the sensor domain induces the autophosphorylation reaction on the histidine of the HisKA domain. The data presented here define the two-state model for histidine kinases where specific contacts between the HisKA and the HATPase_c catalytic domains exist for both the inactive and active conformations (Fig. 6). Comparing the spatial positions of the active state DCA contact residues on the HisKA domain in the active and inactive conformations revealed that these positions were unchanged between the two states. However, the spatial positions of the DCA contact residues in the HisKA domain that define the inactive state of the enzyme are severely distorted in the active conformation (Fig. 6A). This is caused by localized strain at the end of the C-terminal helix of the four-helix bundle, which leads to a bend or partial unwinding of the helix by typically 1.5 turns. This local unfolding has been referred to as cracking. While such cracking would be disastrous for macroscopic machines, cracking has been implicated on the molecular level in the initiation of molecular movement in other proteins as well and leads to lowered transition barriers of the functional energy landscape (27, 28). We suggest that a signal-induced conformational change in the HisKA domain mediated by the immediately N-terminal coiled-coil region leads to strain at the end of helix α2 in the dimeric four-helix bundle HisKA domain (Fig. 6B). This in turn destabilizes the inactive DCA contacts and favors the formation of the active conformation of the sensor kinase. Significant further analyses will be required to test this hypothesis.

Fig. 6.

A helix cracking model for histidine kinase activation. (A) An overlay of the HK853 HisKA domain in the inactive crystal structure conformation and our active conformation. H260 is in red, the five active conformation DCA contacts are in yellow, and the two inactive conformation contacts are in green. Note that the position of the yellow residues does not significantly change between the two conformations, whereas the position of the inactive contact residues is severely changed due to cracking at the end of the C-terminal helix. This cracking is consistently observed across all MD simulations. (B) A schematic of the model illustrates the two different conformations. In the inactive conformation, the inactive DCA contacts are made (green lines) whereas the active contacts and the ATP-H260 contacts are not realized (yellow and red dots, respectively). Upon ligand (orange triangle) binding, a conformational change (red arrow) propagates through the dimer. This leads to local strain at the end of helix α2, which is released by helix cracking (highlighted in the green circle). Inactive contacts (green dots) are no longer made, favoring the formation of the active contacts (yellow lines) and the ATP-H260 contact (red line).

In conclusion, while many successful structural characterizations of active conformation of proteins have been described in the literature, some active conformations of proteins can be difficult to attain by purely experimental means. We demonstrate that residue contacts can be extracted from protein sequence databases. This is true even for multidomain proteins with domain-domain interactions that exist in multiple conformations, adding to previous successful results we obtained when exploring hetero- and homo-interactions between proteins. The residue contact information available through DCA sequence analysis is sufficient to build structural models of protein complexes or, as demonstrated here, to build alternative conformations of multidomain proteins. We are confident that similar, detailed cross-disciplinary studies will allow the identification of active states of other proteins. The only apparent limitation is in the requirement of a large number of protein sequences with significant sequence variability. With ever-growing protein sequence databases, universal applicability is expected in years to come.

Methods

Direct Coupling Analysis.

Genomes downloaded from the NCBI RefSeq database (39) were scanned with the HMM (40) for the Pfam protein domain families HisKA (PF00512), HATPase_c (PF02518), and Response_reg (PF00072) (29). We found 13,647 proteins containing one HisKA and one HATPase_c domain, but no Response_reg domain (i.e., no hybrid kinases were included in this analysis). Sequences were aligned to the two HMMs and assembled in one multiple sequence alignment (MSA).
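The domain-composition filter described above can be sketched as follows. This is a minimal illustration, not the pipeline used in this study: it assumes the HMM scan has been summarized as a mapping from protein identifier to the list of Pfam domain names found, and the identifiers and the `select_kinases` helper are hypothetical.

```python
def select_kinases(domain_hits):
    """Keep proteins with exactly one HisKA hit, exactly one HATPase_c hit,
    and no Response_reg hit (i.e., hybrid kinases are dropped)."""
    selected = []
    for protein_id, domains in domain_hits.items():
        if (domains.count("HisKA") == 1
                and domains.count("HATPase_c") == 1
                and "Response_reg" not in domains):
            selected.append(protein_id)
    return selected


# Hypothetical scan summary: protein identifier -> Pfam domains found.
hits = {
    "protA": ["HisKA", "HATPase_c"],
    "protB": ["HisKA", "HATPase_c", "Response_reg"],  # hybrid kinase
    "protC": ["HATPase_c"],                           # no HisKA domain
    "protD": ["PAS", "HisKA", "HATPase_c"],           # extra sensor domain
}
print(select_kinases(hits))
```

Additional domains such as sensor domains do not exclude a protein in this sketch; only the presence of a Response_reg domain or a missing or duplicated catalytic domain does.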

This MSA was analyzed using DCA as described (11, 12), extracting first 60 residues involved in the highest interdomain correlation and running the message passing-based model inference on the reduced alignment. The main output, DI values for all column pairs, is a measure of the direct coevolutionary coupling between residue positions. High DI was previously shown to be an accurate predictor for residue–residue contacts.
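For orientation, the DI value of a column pair (i, j) can be written as DI_ij = Σ_{A,B} P_dir(A,B) ln[P_dir(A,B) / (f_i(A) f_j(B))], following the definition given in the DCA references (11, 12) as we understand it. The sketch below evaluates only this final sum. It assumes that the direct pair distribution P_dir and the single-column frequencies f_i and f_j have already been obtained from the model inference, which is not shown; the function name and the two-letter toy alphabet are hypothetical.

```python
import math


def direct_information(p_dir, f_i, f_j):
    """Sum p * ln(p / (f_i * f_j)) over all residue pairs (A, B).

    p_dir[a][b] is the direct pair probability for residue a in column i
    and residue b in column j; f_i and f_j are single-column frequencies.
    """
    di = 0.0
    for a, row in enumerate(p_dir):
        for b, p in enumerate(row):
            if p > 0.0:  # terms with p == 0 contribute nothing
                di += p * math.log(p / (f_i[a] * f_j[b]))
    return di


# Toy two-letter alphabet with uniform single-column frequencies.
freqs = [0.5, 0.5]
independent = [[0.25, 0.25], [0.25, 0.25]]  # no direct coupling
coupled = [[0.5, 0.0], [0.0, 0.5]]          # perfectly coupled columns
print(direct_information(independent, freqs, freqs))
print(direct_information(coupled, freqs, freqs))
```

In this toy case the uncoupled pair gives a DI of zero and the perfectly coupled pair gives ln 2, the maximum for a two-letter alphabet with uniform frequencies.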

Protein Docking and Molecular Dynamics Simulations.

Detailed docking and simulation protocols are provided in the SI Text. Briefly, to generate a conformation of the autophosphorylation state, we introduced the DCA-derived interdomain contacts and an interaction between the ATP-Pγ and the conserved His-260 residue as additional Go-type contacts (41, 42) into low-temperature structure-based simulations (20, 21) based on the crystal structure 3DGE (8). Simulations did not bias whether contacts were made in cis or in trans (i.e., within the same chain or between chains A and B of the homodimer). Existing contacts at the interface between HATPase_c and HisKA were weakened to facilitate transition to the new active conformation. In the explicit-water simulations we used Amber99sb-ildn/TIP3P (24), Gromos43a1/SPC, and Gromos53a6/SPC (25) with a 2-fs time step. For all simulations we used the GROMACS simulation suite (43).

Growth Conditions.

All E. coli strains were grown in lysogeny broth (LB) and all B. subtilis strains were grown in Schaeffer’s sporulation medium (SM), in the presence of appropriate antibiotics whenever necessary. Antibiotic concentrations were 100 μg·ml⁻¹ ampicillin or 25 μg·ml⁻¹ kanamycin for E. coli and 2.5 μg·ml⁻¹ kanamycin or 0.5 μg·ml⁻¹ erythromycin and 12.5 μg·ml⁻¹ lincomycin for B. subtilis.

Plasmid and Strain Constructions.

All plasmids, strains, and oligonucleotides used in this study are listed in Tables S3–S5. The plasmids and strains were generated utilizing standard molecular biology protocols. Details are available in the SI Text.

Protein Expression and Purification.

C-terminal hexa-histidine tagged proteins were expressed and purified using standard protocols. Details are available in the SI Text.

Autophosphorylation Assay.

Autophosphorylation rates for KinA kinase and its derivatives were followed as previously described (44). Briefly, reactions were initiated by addition of 1 mM ATP containing 100 μCi [γ-32P]-ATP to 2 μM protein in a total volume of 100 μL. Time-point samples of 15 μL were taken and reactions were terminated by addition of 5 μL 4× Laemmli buffer. Later, the reactions were developed by SDS-PAGE and visualized by autoradiography using a Storm 840 PhosphorImager (Molecular Dynamics). ImageQuant software was used for data processing.

Sporulation Assays.

To assess sporulation efficiency supported by the various KinA mutant proteins, the relevant B. subtilis strains were grown in 5 ml SM at 37 °C for 28 h. Serial dilutions of cultures before and after a 15-s treatment with 1.1 volumes of chloroform were plated on SM media to measure viable cell counts and chloroform-resistant spore counts, respectively.

Immunoblotting.

KinA protein levels were assessed by immunoblotting of cell extracts of all relevant strains utilizing primary monoclonal anti-KinA antibody (1∶200) and secondary horseradish peroxidase-conjugated goat anti-mouse antibody (1∶4,000) and the ECL Plus Western blotting detection system (GE Healthcare) following standard protocols (SI Text).

ACKNOWLEDGMENTS.

Helpful discussions with T. Hwa are gratefully acknowledged. This work was supported by Grant AI055860 (to J.A.H.) from the National Institute of Allergy and Infectious Diseases and GM019416 (to J.A.H.) from the National Institute of General Medical Sciences, National Institutes of Health, and US Public Health Service. Oligonucleotide synthesis and DNA sequencing costs were underwritten in part by the Stein Beneficial Trust. A.S. recognizes support by the Impuls- und Vernetzungsfonds of the Helmholtz Association and the SimLab NanoMicro (SCC, KIT). The MD simulations were performed using the resources of High Performance Computing Center North (HPC2N Grant SNIC001-10-193) and of bwGRiD.

Footnotes

The authors declare no conflict of interest.

*This Direct Submission article had a prearranged editor.

See Author Summary on page 10148 (volume 109, number 26).

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1201301109/-/DCSupplemental.

References

  • 1. Gambin Y, et al. Visualizing a one-way protein encounter complex by ultrafast single-molecule mixing. Nat Meth. 2011;8:239–241. doi: 10.1038/nmeth.1568.
  • 2. Gebhardt JC, Bornschlogl T, Rief M. Full distance-resolved folding energy landscape of one single protein molecule. Proc Natl Acad Sci USA. 2010;107:2013–2018. doi: 10.1073/pnas.0909854107.
  • 3. Olsson U, Wolf-Watz M. Overlap between folding and functional energy landscapes for adenylate kinase conformational change. Nat Commun. 2010;1:111. doi: 10.1038/ncomms1106.
  • 4. Onuchic JN, Wolynes PG. Theory of protein folding. Curr Opin Struct Biol. 2004;14:70–75. doi: 10.1016/j.sbi.2004.01.009.
  • 5. Shaw DE, et al. Atomic-level characterization of the structural dynamics of proteins. Science. 2010;330:341–346. doi: 10.1126/science.1187409.
  • 6. Szurmant H, White RA, Hoch JA. Sensor complexes regulating two-component signal transduction. Curr Opin Struct Biol. 2007;17:706–715. doi: 10.1016/j.sbi.2007.08.019.
  • 7. Dutta R, Inouye M. GHKL, an emergent ATPase/kinase superfamily. Trends Biochem Sci. 2000;25:24–28. doi: 10.1016/s0968-0004(99)01503-0.
  • 8. Casino P, Rubio V, Marina A. Structural insight into partner specificity and phosphoryl transfer in two-component signal transduction. Cell. 2009;139:325–336. doi: 10.1016/j.cell.2009.08.032.
  • 9. Marina A, Waldburger CD, Hendrickson WA. Structure of the entire cytoplasmic portion of a sensor histidine-kinase protein. EMBO J. 2005;24:4247–4259. doi: 10.1038/sj.emboj.7600886.
  • 10. Bick MJ, et al. How to switch off a histidine kinase: Crystal structure of Geobacillus stearothermophilus KinB with the inhibitor Sda. J Mol Biol. 2009;386:163–177. doi: 10.1016/j.jmb.2008.12.006.
  • 11. Lunt B, et al. Inference of direct residue contacts in two-component signaling. Methods Enzymol. 2010;471:17–41. doi: 10.1016/S0076-6879(10)71002-8.
  • 12. Weigt M, White RA, Szurmant H, Hoch JA, Hwa T. Identification of direct residue contacts in protein–protein interaction by message passing. Proc Natl Acad Sci USA. 2009;106:67–72. doi: 10.1073/pnas.0805923106.
  • 13. Altschuh D, Lesk AM, Bloomer AC, Klug A. Correlation of co-ordinated amino acid substitutions with function in viruses related to tobacco mosaic virus. J Mol Biol. 1987;193:693–707. doi: 10.1016/0022-2836(87)90352-4.
  • 14. Göbel U, Sander C, Schneider R, Valencia A. Correlated mutations and residue contacts in proteins. Proteins. 1994;18:309–317. doi: 10.1002/prot.340180402.
  • 15. Neher E. How frequent are correlated changes in families of protein sequences? Proc Natl Acad Sci USA. 1994;91:98–102. doi: 10.1073/pnas.91.1.98.
  • 16. Shindyalov IN, Kolchanov NA, Sander C. Can three-dimensional contacts in protein structures be predicted by analysis of correlated mutations? Protein Eng. 1994;7:349–358. doi: 10.1093/protein/7.3.349.
  • 17. Fodor AA, Aldrich RW. Influence of conservation on calculations of amino acid covariance in multiple sequence alignments. Proteins. 2004;56:211–221. doi: 10.1002/prot.20098.
  • 18. Lockless SW, Ranganathan R. Evolutionarily conserved pathways of energetic connectivity in protein families. Science. 1999;286:295–299. doi: 10.1126/science.286.5438.295.
  • 19. Morcos F, et al. Direct-coupling analysis of residue coevolution captures native contacts across many protein families. Proc Natl Acad Sci USA. 2011;108:E1293–E1301. doi: 10.1073/pnas.1111471108.
  • 20. Schug A, et al. Computational modeling of phosphotransfer complexes in two-component signaling. Methods Enzymol. 2010;471:43–58. doi: 10.1016/S0076-6879(10)71003-X.
  • 21. Schug A, Weigt M, Onuchic JN, Hwa T, Szurmant H. High-resolution protein complexes from integrating genomic information with molecular simulation. Proc Natl Acad Sci USA. 2009;106:22124–22129. doi: 10.1073/pnas.0912100106.
  • 22. Burbulys D, Trach KA, Hoch JA. Initiation of sporulation in B subtilis is controlled by a multicomponent phosphorelay. Cell. 1991;64:545–552. doi: 10.1016/0092-8674(91)90238-t.
  • 23. Zapf J, Sen U, Madhusudan, Hoch JA, Varughese KI. A transient interaction between two phosphorelay proteins trapped in a crystal lattice reveals the mechanism of molecular recognition and phosphotransfer in signal transduction. Structure. 2000;8:851–862. doi: 10.1016/s0969-2126(00)00174-x.
  • 24. Lindorff-Larsen K, et al. Improved side-chain torsion potentials for the Amber ff99SB protein force field. Proteins. 2010;78:1950–1958. doi: 10.1002/prot.22711.
  • 25. Oostenbrink C, Villa A, Mark AE, van Gunsteren WF. A biomolecular force field based on the free enthalpy of hydration and solvation: The GROMOS force-field parameter sets 53A5 and 53A6. J Comput Chem. 2004;25:1656–1676. doi: 10.1002/jcc.20090.
  • 26. Lange OF, et al. Recognition dynamics up to microseconds revealed from an RDC-derived ubiquitin ensemble in solution. Science. 2008;320:1471–1475. doi: 10.1126/science.1157092.
  • 27. Whitford PC, Miyashita O, Levy Y, Onuchic JN. Conformational transitions of adenylate kinase: Switching by cracking. J Mol Biol. 2007;366:1661–1671. doi: 10.1016/j.jmb.2006.11.085.
  • 28. Miyashita O, Onuchic JN, Wolynes PG. Nonlinear elasticity, proteinquakes, and the energy landscapes of functional transitions in proteins. Proc Natl Acad Sci USA. 2003;100:12570–12575. doi: 10.1073/pnas.2135471100.
  • 29. Finn RD, et al. The Pfam protein families database. Nucleic Acids Res. 2010;38:D211–D222. doi: 10.1093/nar/gkp985.
  • 30. Edgar R. MUSCLE: A multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics. 2004;5:113. doi: 10.1186/1471-2105-5-113.
  • 31. Durbin R, Eddy SR, Krogh A, Mitchison G. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge: Cambridge Univ Press; 1998.
  • 32. Pruitt KD, Tatusova T, Klimke W, Maglott DR. NCBI reference sequences: Current status, policy and new initiatives. Nucleic Acids Res. 2009;37:D32–D36. doi: 10.1093/nar/gkn721.
  • 33. Ninfa EG, Atkinson MR, Kamberov ES, Ninfa AJ. Mechanism of autophosphorylation of Escherichia coli nitrogen regulator II (NRII or NtrB): Trans-phosphorylation between subunits. J Bacteriol. 1993;175:7024–7032. doi: 10.1128/jb.175.21.7024-7032.1993.
  • 34. Cai SJ, Inouye M. Spontaneous subunit exchange and biochemical evidence for trans-autophosphorylation in a dimer of Escherichia coli histidine kinase (EnvZ). J Mol Biol. 2003;329:495–503. doi: 10.1016/s0022-2836(03)00446-7.
  • 35. Pena-Sandoval GR, Georgellis D. The ArcB sensor kinase of Escherichia coli autophosphorylates by an intramolecular reaction. J Bacteriol. 2010;192:1735–1739. doi: 10.1128/JB.01401-09.
  • 36. Tomomori C, et al. Solution structure of the homodimeric core domain of Escherichia coli histidine kinase EnvZ. Nat Struct Biol. 1999;6:729–734. doi: 10.1038/11495.
  • 37. Cai SJ, Khorchid A, Ikura M, Inouye M. Probing catalytically essential domain orientation in histidine kinase EnvZ by targeted disulfide crosslinking. J Mol Biol. 2003;328:409–418. doi: 10.1016/s0022-2836(03)00275-4.
  • 38. Stewart RC. Protein histidine kinases: Assembly of active sites and their regulation in signaling pathways. Curr Opin Microbiol. 2010;13:133–141. doi: 10.1016/j.mib.2009.12.013.
  • 39. Sayers EW, et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2009;37:D5–D15. doi: 10.1093/nar/gkn741.
  • 40. Eddy SR. Profile hidden Markov models. Bioinformatics. 1998;14:755–763. doi: 10.1093/bioinformatics/14.9.755.
  • 41. Schug A, Onuchic JN. From protein folding to protein function and biomolecular binding by energy landscape theory. Curr Opin Pharmacol. 2010;10:709–714. doi: 10.1016/j.coph.2010.09.012.
  • 42. Whitford PC, et al. An all-atom structure-based potential for proteins: Bridging minimal models with all-atom empirical forcefields. Proteins. 2009;75:430–441. doi: 10.1002/prot.22253.
  • 43. Hess B, Kutzner C, van der Spoel D, Lindahl E. GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable molecular simulation. J Chem Theory Comput. 2008;4:435–447. doi: 10.1021/ct700301q.
  • 44. Grimshaw CE, et al. Synergistic kinetic interactions between components of the phosphorelay controlling sporulation in Bacillus subtilis. Biochemistry. 1998;37:1365–1375. doi: 10.1021/bi971917m.
Proc Natl Acad Sci U S A. 2012 Jun 26;109(26):10148-10149.

Author Summary

The ongoing progress in sequencing and other high-throughput technologies, combined with complementary bioinformatics tools, has transformed the world of biology in the past decade and set the stage for the genomic revolution. However, the impact of such approaches on our understanding of cellular protein function at the molecular level has been limited. Many proteins are designed to transition between multiple conformations or interact with other biomolecules to exert their function. These conformations and interactions are often ruled by transient interactions, making direct structural characterization of all functional states of a protein a daunting task. This is particularly true for activated protein conformations that are typically short-lived and unstable. The present study demonstrates that by integrating data from three different techniques, the activated conformation can be deduced for an important class of proteins that has eluded experimental structural characterization despite decades of detailed molecular attention.

Signal transduction proteins that detect external stimuli and control their inherent activities in response to a signal are excellent examples of proteins that have to transition between inactive and active states. In bacteria, the sensor histidine kinase is a well-studied, ubiquitous, and genetically amplified signal transduction protein (1). The histidine kinases are specifically mated to transcription factors that generate the response to the signal detected by the kinase by modulating gene expression. The message between these proteins is mediated by phosphoryl group transfer. A typical bacterial genome codes for tens to even hundreds of such proteins. Histidine kinases and their transcription factors are employed for a wide range of adaptation responses to environmental conditions, for control of cell-cycle progression, and for important developmental programs, such as dormant spore formation or fruit ripening in plants.

Signal ligands initiate the signal transduction event by modulating the intrinsic autophosphorylation activity of the sensor histidine kinase. In its active conformation, two domains of the kinase are juxtaposed in order to catalyze the phosphorylation of a conserved histidine residue on one domain by the ATP cofactor bound to the other domain. The structure of this active conformation has not been captured in a crystal and, until now, was unknown.

Our ability to identify interacting protein surfaces is a direct consequence of the rapid advances in DNA sequencing that have resulted in the accumulation of over 2,000 bacterial genome sequences (2), providing a largely untapped resource that can aid in the structural characterization of unknown protein conformations. We have shown that by studying large protein alignments of thousands of homologous sequences, one can detect the spatial proximity of an individual residue-position pairing via the statistical correlation of amino acid choices between these two positions (3). Local covariance approaches that measure correlations between two individual residue positions are of limited value in this regard: If position i is in contact with position j, and position j with position k, then i and k will also show correlation even if not directly coupled. To solve this problem we recently introduced a technology, termed direct coupling analysis (DCA), that amends local covariance by applying global statistical inference methods. It describes the joint statistics of amino acid occurrences over the full alignment width, and thereby disentangles direct from indirect correlation effects. In the above example, the observed indirect correlation between positions i and k will be explained via the direct coupling of both positions with position j. Concentrating on such direct couplings alone has proven successful in determining residue contacts with high accuracy, both for individual proteins and protein interaction partners (3).
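The i, j, k example above can be illustrated with a small numerical sketch. It is a Gaussian toy analog, not the message-passing DCA algorithm of ref. 3: the covariance values are hypothetical, and the `invert` helper is introduced here only for illustration. Positions i and k show a nonzero pairwise covariance, yet the inverse of the full covariance matrix, which plays the role of the direct couplings in a Gaussian model, has a zero i-k entry.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [list(map(float, row)) + [1.0 if r == c else 0.0 for c in range(n)]
           for r, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Hypothetical covariance between three positions i, j, k (indices 0, 1, 2).
# i-j and j-k are directly coupled; i-k is correlated only through j.
covariance = [[0.75, 0.50, 0.25],
              [0.50, 1.00, 0.50],
              [0.25, 0.50, 0.75]]

# Global view: the inverse covariance separates direct from indirect effects.
couplings = invert(covariance)

print("pairwise covariance i-k:", covariance[0][2])          # nonzero (indirect)
print("direct coupling i-j:", round(couplings[0][1], 6))     # nonzero
print("direct coupling i-k:", round(couplings[0][2], 6))     # vanishes
```

A purely local measure would flag the i-k pair as a contact because its covariance is nonzero; the global inversion assigns that correlation to the two couplings through j.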

Residue contacts identified by DCA for interacting proteins or protein domains may be utilized to guide structural assembly of unknown protein complexes or protein conformations when structures of the individual proteins or protein domains are known. Our approach to structure assembly, termed molecular dynamics and genomics for macromolecular assembly (MAGMA), utilizes the sequence-derived contact information and the known structures of individual protein domains. In an initial computationally efficient step, the domains are moved against each other in structure-based models to realize the sequence-derived contact information, a process also known as protein docking. This approximate domain-domain arrangement is then tested and refined in more-detailed computer models to generate a final structure. Recent results obtained using this approach have demonstrated that large protein complexes can be determined, approaching crystal resolution accuracy, a gold standard for structure determination (4).
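The initial docking step can be pictured with a deliberately simplified sketch. This is not the MAGMA implementation: it assumes rigid domains, allows translation only (no rotation or internal flexibility), and uses hypothetical coordinates, a hypothetical 8 Å target distance, and made-up function names. It shows only how harmonic restraints on predicted contact pairs can pull one domain toward another.

```python
import math


def distance(p, q):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((p[k] - q[k]) ** 2 for k in range(3)))


def restraint_energy(pairs, shift, target):
    """Sum of harmonic penalties (distance - target)^2 over all contact pairs."""
    total = 0.0
    for a, b in pairs:
        moved = [b[k] + shift[k] for k in range(3)]
        total += (distance(a, moved) - target) ** 2
    return total


def dock_by_translation(pairs, target=8.0, rate=0.05, steps=500):
    """Gradient descent on a rigid translation of domain B toward domain A."""
    shift = [0.0, 0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for a, b in pairs:
            diff = [b[k] + shift[k] - a[k] for k in range(3)]
            dist = math.sqrt(sum(d * d for d in diff))
            for k in range(3):
                # Derivative of (dist - target)^2 with respect to shift[k].
                grad[k] += 2.0 * (dist - target) * diff[k] / dist
        shift = [shift[k] - rate * grad[k] for k in range(3)]
    return shift


# Hypothetical contact pairs: (residue in domain A, residue in domain B).
pairs = [((0.0, 0.0, 0.0), (20.0, 3.0, 10.0)),
         ((5.0, 0.0, 0.0), (26.0, 2.0, 11.0)),
         ((0.0, 5.0, 0.0), (19.0, 9.0, 9.0))]

initial_energy = restraint_energy(pairs, [0.0, 0.0, 0.0], 8.0)
shift = dock_by_translation(pairs)
final_energy = restraint_energy(pairs, shift, 8.0)
print("restraint energy before:", round(initial_energy, 2))
print("restraint energy after: ", round(final_energy, 2))
```

In a realistic setting the coarse arrangement found this way would only be a starting point for the more detailed simulations described above.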

The current study applied the technology described above to an alignment of more than 10,000 sensor histidine kinases to deduce the domain arrangement of the activated conformation of the kinase. Using DCA, five highly correlated interdomain residue pairings were identified. Two amino acid pairings in the sensor histidine kinase were found to be in contact in known crystal structures of nonactivated sensor histidine kinases, where the catalytic ATP and histidine residues are too distal for autophosphorylation to occur. This suggests that this inactive conformation is of physiological relevance and is typical of most kinases in the database. The other three contacts were not realized in the nonactivated structure and were not compatible with its conformation. These three contacts potentially define the activated conformation of the kinase. To test this notion, the three residue contacts along with an existing histidine kinase structure were utilized in MAGMA to generate a docked active conformation of the protein. The docked structure was tested and improved in extensive molecular-level computer simulations. We observed a stable conformation that simultaneously accommodates all three sequence-derived contacts along with the histidine-ATP catalytic contact.

Two experimental approaches were employed to validate our structural model. First, we mutagenized the three putative active state contacts to introduce uncommon residue combinations (i.e., residues that naturally occur at the individual sites, but not usually in combination) in a well-studied kinase and observed severe defects of kinase activity in vivo and in vitro. This is consistent with the notion that these residue positions have to be paired specifically to stabilize the active conformation of the kinase. Second, we engineered a nonfunctional hybrid kinase in which the phosphorylatable domain from one kinase was paired with an ATP-binding domain from another histidine kinase. The hybrid kinase was analyzed in vivo and in vitro and shown to retain a low level of in vitro autophosphorylation activity, insufficient to support in vivo function. This kinase was then utilized to test the structural model. Residue positions at the predicted active state interface, which differed between wild-type and hybrid kinases, were mutated, and these mutations completely repaired the in vivo and in vitro kinase activity. Thus, the enzymatic experiments were entirely consistent with the contact residue positions identified by DCA and MAGMA (Fig. P1).
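The idea of an "uncommon residue combination" can be made concrete with a small sketch. The alignment counts, thresholds, and function name below are hypothetical and do not reproduce the selection procedure used in the study. The sketch looks for residue pairs in which each residue is frequent at its own position, but the two rarely occur together.

```python
from collections import Counter


def uncommon_combinations(pairs, min_single=0.2, max_ratio=0.5):
    """Return residue pairs that are common individually but rare in combination.

    pairs: list of (residue at position i, residue at position j), one per sequence.
    min_single: minimum single-site frequency for a residue to count as natural.
    max_ratio: upper bound on observed / expected pair frequency.
    """
    total = len(pairs)
    first = Counter(a for a, _ in pairs)
    second = Counter(b for _, b in pairs)
    joint = Counter(pairs)
    hits = []
    for a in first:
        for b in second:
            fa, fb = first[a] / total, second[b] / total
            if fa < min_single or fb < min_single:
                continue
            # Expected frequency if the two positions were independent: fa * fb.
            ratio = (joint[(a, b)] / total) / (fa * fb)
            if ratio < max_ratio:
                hits.append((a, b))
    return sorted(hits)


# Hypothetical two-column alignment: E pairs with K, D pairs with R.
alignment = ([("E", "K")] * 45 + [("D", "R")] * 45 +
             [("E", "R")] * 5 + [("D", "K")] * 5)

candidates = uncommon_combinations(alignment)
print(candidates)
```

In this toy alignment every residue appears in half of the sequences at its site, so the mismatched pairs stand out only when the two positions are considered jointly.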

Fig. P1.

(A) A statistical analysis of over 10,000 aligned histidine kinase sequences reveals highly correlated amino acid pairs between the two catalytic histidine kinase A (HisKA) and histidine kinase-, DNA gyrase B-, and HSP90-like ATPase (HATPase_c) domains. (B) The sequence-derived contact pairs (yellow) are found to be functionally relevant in mutagenic experiments and are used to drive docking simulations along with the catalytic ATP-His contact (red) to deduce the active conformation of the histidine kinase. (C) Molecular dynamics simulations generate an active conformation (blue) with a stable interdomain interface that requires a large domain movement when compared to a known inactive conformation of the kinase (red). (D) The contact interface from the model demonstrates that all sequence-derived contacts are simultaneously accommodated (yellow dots). Additional contacts (green dots) were utilized to repair an inactive chimeric histidine kinase protein in support of this active conformation.

In conclusion, we established a structural model of the activated conformation of sensor histidine kinases. This model provides a platform from which testable hypotheses related to the ligand-induced transition from inactive to active states can be investigated. Such a hypothesis is described in the main text. More generally, we show that genomic databases and resources can be utilized to identify residue-position pairings between protein domains of individual proteins with multiple states as well as interacting proteins, as previously demonstrated (4). This technology will be generally useful for deducing additional structures of unresolved transient physical states of proteins in a computationally efficient manner. These multiple physical states of proteins are often critically associated with protein function, yet inaccessible to discovery by traditional methods.

Footnotes

The authors declare no conflict of interest.

This Direct Submission article had a prearranged editor.

See full research article on page E1733 of www.pnas.org.

Cite this Author Summary as: PNAS 10.1073/pnas.1201301109.

References

  • 1. Bourret RB, Silversmith RE. Two-component signal transduction. Curr Opin Microbiol. 2010;13:113–115. doi: 10.1016/j.mib.2010.02.003.
  • 2. Pagani I, et al. The Genomes OnLine Database (GOLD) v.4: Status of genomic and metagenomic projects and their associated metadata. Nucleic Acids Res. 2012;40:D571–D579. doi: 10.1093/nar/gkr1100.
  • 3. Weigt M, White RA, Szurmant H, Hoch JA, Hwa T. Identification of direct residue contacts in protein–protein interaction by message passing. Proc Natl Acad Sci USA. 2009;106:67–72. doi: 10.1073/pnas.0805923106.
  • 4. Schug A, Weigt M, Onuchic JN, Hwa T, Szurmant H. High-resolution protein complexes from integrating genomic information with molecular simulation. Proc Natl Acad Sci USA. 2009;106:22124–22129. doi: 10.1073/pnas.0912100106.

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences
