Combining multiple structure and sequence alignments to improve sequence detection and alignment: Application to the SH2 domains of Janus kinases

Bissan Al-Lazikani; Felix B Sheinerman; Barry Honig

doi:10.1073/pnas.011577898

Proc Natl Acad Sci USA. 2001 Dec 18;98(26):14796–14801.

PMCID: PMC64938 PMID: 11752426

Abstract

In this paper, an approach is described that combines multiple structure alignments and multiple sequence alignments to generate sequence profiles for protein families. First, multiple sequence alignments are generated from sequences that are closely related to each sequence of known three-dimensional structure. These alignments then are merged through a multiple structure alignment of family members of known structure. The merged alignment is used to generate a Hidden Markov Model for the family in question. The Hidden Markov Model can be used to search for new family members or to improve alignments for distantly related family members that already have been identified. Application of a profile generated for SH2 domains indicates that the Janus family of nonreceptor protein tyrosine kinases contains SH2 domains. This conclusion is strongly supported by the results of secondary structure-prediction programs, threading calculations, and the analysis of comparative models generated for these domains. One of the Janus kinases, human TYK2, has an SH2 domain that contains a histidine instead of the conserved arginine at the key phosphotyrosine-binding position, βB5. Calculations of the pK_a values of the βB5 arginines in a number of SH2 domains and of the βB5 histidine in a homology model of TYK2 suggest that this histidine is likely to be neutral around pH 7, thus indicating that it may have lost the ability to bind phosphotyrosine. If this indeed is the case, TYK2 may contain a domain with an SH2 fold that has a modified binding specificity.

The development of profile-based sequence alignment methods has led to major improvements in the detection of remote sequence relationships (see, e.g., refs. 1–4). However, for extremely low levels of sequence similarity, profile methods often fail. Because there are many examples of strong structural similarity in the absence of a detectable sequence relationship, a number of techniques have been developed that exploit structural information in the detection of remote homologs. Threading methods (see, e.g., ref. 5), which are purely structure-based, were developed with this goal in mind, and multiple sequence profiles have been generated directly from multiple structure alignments (see, e.g., ref. 6). Increasingly, novel ways have been found to combine structural and sequence information. These include the incorporation of structural information directly into sequence profiles (7–9) and the integration of sequence information into threading algorithms (4, 10, 11). This paper describes an approach that uses multiple structure alignments to merge profiles that have been generated from high-quality multiple sequence alignments. A Hidden Markov Model (HMM) then is generated from the merged alignment. The approach is related to previous work that has combined multiple structure and sequence information (e.g., ref. 9); however, there are a number of essential differences. These are due, in part, to the fact that our primary goal is not the detection of remote homologs but rather the generation of an optimal alignment for a homolog that already may have been detected.

The utility of the approach is demonstrated through its application to the SH2 domain family (reviewed in ref. 12). SH2 domains are composed of about 100 amino acids and consist of two α-helices packed against a central, antiparallel β-sheet. SH2 domains are present in a large number of signal-transduction proteins. They mediate intramolecular regulation and intermolecular protein–protein association by binding to specific motifs containing a tyrosine residue. In almost all known cases, phosphorylation of the tyrosine located within a specific peptide sequence on the target protein is a prerequisite for SH2 binding. The SH2 domain family includes members with as little as ≈15% pairwise sequence identity, a level at which purely sequence-based methods often generate erroneous alignments. One such example is provided by the SH2 domain of the Cbl adapter protein, which was identified as an SH2 domain only after its structure was solved (13). Cbl-SH2 thus provides an ideal test of both sequence-detection and sequence-alignment methods.

Janus kinases (JAKs) are a family of nonreceptor protein tyrosine kinases involved in signaling cascades initiated by various cytokines, interferons, and growth factors (14). There are four human JAK proteins: JAK1–3 and TYK2. JAKs share seven main regions of homology, termed JAK-homology domains 1–7 (JH1–7), numbered from the C to the N terminus (15). JH1 is the C-terminal protein kinase domain, and JH2 is a kinase-like domain whose precise function remains unclear (16). JH3–7 play a role in receptor interactions. There has been considerable uncertainty as to whether JAKs contain SH2 domains. An early description of the sequence of murine JAK2 mentions an “intriguing, albeit tenuous similarity” between a portion of the JH4 domain of JAKs and the sequences of some SH2 domains (15). More recent sequence-search methods predict the presence of SH2 domains in JAKs (17), and a multiple sequence alignment obtained from clustal w (18) was reported. These results notwithstanding, the presence of SH2 domains in JAKs is not universally accepted (see, e.g., refs. 19–21). Moreover, the Pfam database (22) includes only JAK2 in the SH2 family alignment, whereas JAK1, JAK3, and TYK2 are not classified as SH2-containing. The SMART database (23, 24) includes regions (in JH3–4) of JAK1–3 and TYK2 as SH2 domains, but the alignment does not include the BG loop, which forms part of the peptide-binding surface, or the last β-strand (βG) that follows it. A recent study (25) reported that secondary-structure predictions for human JAK1, JAK2, and JAK3 are consistent with the presence of SH2 domains in these proteins and that fold-recognition servers suggest the presence of an SH2 domain in human JAK2 and JAK3.

Despite the rather strong evidence that JAKs contain SH2 domains, a number of questions remain. In addition to the absence of an alignment in the C-terminal region, the alignment for TYK2 suggests that, if it contains an SH2 domain, it is a highly unusual one. In all reported multiple alignments (see, e.g., refs. 18, 24, and 25), there is a histidine at βB5, a position that contains a conserved arginine in every known SH2 domain as well as in the predicted SH2 domains of JAKs 1–3. This arginine is in direct contact with the phosphotyrosine and makes an important contribution to the binding affinity of phosphotyrosine-containing peptides (12, 26). Indeed, substitution of this arginine with lysine in the SH2 domain of Abl kinase completely abolishes the binding function of this domain (27). It is possible, of course, that substitution of the conserved arginine with another basic residue has less dramatic effects in other SH2 domains and that a charged histidine in TYK2 plays the same role as the arginine. This, in fact, has been implicitly assumed in reported alignments. Another possibility is that the putative SH2 domain of TYK2 does not bind phosphotyrosine-containing peptides or can do so only under specific conditions. Specifically, given that the pK_a values of histidines in proteins can vary over many pH units, it is not at all clear whether the His in TYK2 is charged and whether TYK2-SH2 has the capacity to bind phosphotyrosine.

We have used the sequence alignment obtained from our profile method as a basis for addressing these issues. The profile clearly predicts the existence of an SH2 domain in all JAKs, and the resulting alignment includes the entire structural domain. The alignment is used to generate homology models for all JAK-SH2s, and a number of tests suggest that these are viable models. In agreement with the conclusions of previous studies, a His is located at the βB5 position in TYK2. However, in contrast to previous assumptions, evidence is presented that this His is unlikely to be protonated at pH 7. Thus, TYK2 appears to contain a highly unusual SH2 domain. Possible implications of this finding are discussed.

Methods

Selection of Representative SH2 Structures.

The Protein Data Bank (PDB) (28) contains more than 100 structures of SH2 domains. From these, we selected 19 representative structures such that no two domains in the set share more than 95% sequence identity. Among structures with similar sequences (>95% identity), the one solved at the highest resolution was selected, and, where possible, structures determined by x-ray crystallography were chosen over those determined by NMR spectroscopy. The selected proteins and the associated PDB identifiers are listed in Fig. 1.
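The selection rule above can be sketched as a greedy redundancy filter. This is a minimal sketch, not the authors' code: it assumes the candidate sequences are already rows of a common alignment, and the `Entry` record and helper names are hypothetical. Candidates are ranked x-ray first, then by resolution, and each one is kept only if it is at most 95% identical to every structure already chosen.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    pdb_id: str
    aligned_seq: str   # row of a common alignment ('-' marks gaps)
    method: str        # "X-RAY" or "NMR"
    resolution: float  # in angstroms; float("inf") for NMR entries


def identity(a: str, b: str) -> float:
    """Percent identity over columns where neither aligned row has a gap."""
    if len(a) != len(b):
        raise ValueError("rows must come from the same alignment")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def select_representatives(entries, max_identity=95.0):
    """Greedy filter: the best-ranked entry wins within each redundant group."""
    # X-ray before NMR, then lowest (best) resolution first.
    ranked = sorted(entries, key=lambda e: (e.method != "X-RAY", e.resolution))
    chosen = []
    for entry in ranked:
        if all(identity(entry.aligned_seq, c.aligned_seq) <= max_identity
               for c in chosen):
            chosen.append(entry)
    return chosen
```

Greedy filtering depends on the ranking order, which is why the ranking encodes the stated preferences (experimental method, then resolution) before any identity comparison is made.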

Fig. 1. Alignment of JAKs and 19 SH2 domains of known structure. The multiple alignment of the SH2 domains is structure-based (6), whereas the four JAKs are aligned with an HMM of the SH2 domain family (see text for details). Minor manual modifications were made to close gaps in predicted secondary structures, e.g., the αB helix of TYK2. The conventional secondary-structure assignments and numbering for SH2 domains (12) are illustrated in cartoon form above the alignment. Residues are colored according to secondary structure: gold, β-strand; magenta, α-helix. For the JAKs, the secondary structure displayed is that predicted by jpred2 (39). For the 19 SH2 domains, the DSSP (49) secondary-structure assignments from the PDB coordinates are shown.

Structure-Based Multiple Alignment Procedure.

A structure-based sequence alignment of the 19 nonredundant SH2 domains was produced by using the multiple structure superposition routine implemented in prism (6, 29), followed by some manual adjustments to the loop regions (see Fig. 1). A Position-Specific Scoring Matrix was generated with psi-blast (2) from the structure-based alignment and used to perform psi-blast searches of the SwissProt database (30) for other SH2 domains. The searches were performed with the BLOSUM62 substitution matrix (31) and gap initiation and extension penalties of 11 and 1, respectively. Nineteen searches were carried out, each using one of the SH2 domains shown in Fig. 1 as the probe sequence and each biased by the Position-Specific Scoring Matrix. All 202 sequences matched with expectation values of 0.01 or better (excluding the JAKs) are annotated as SH2 domains in SwissProt. These sequences were used to generate a sequence profile of the SH2 domain family.
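Pooling the hits of the 19 searches amounts to taking the union of all subjects that pass the E-value cutoff. The sketch below assumes each search has already been parsed into (subject, E value) pairs; the input format and the function name are hypothetical, and running psi-blast itself is not shown.

```python
def collect_hits(searches, e_cutoff=0.01, exclude=()):
    """Pool hits from several searches, keeping each subject's best E value.

    searches: {probe_id: [(subject_id, e_value), ...]}
    exclude:  subject ids to leave out (e.g. the JAK sequences).
    """
    best = {}
    for hits in searches.values():
        for subject, evalue in hits:
            if subject in exclude or evalue > e_cutoff:
                continue  # "0.01 or better" is read here as E <= 0.01
            if subject not in best or evalue < best[subject]:
                best[subject] = evalue
    return best
```

Keeping the best E value per subject means a sequence found by several probes is counted once, which is what a count such as "202 sequences" requires.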

Each of the 202 sequences was aligned to each of the sequences of the 19 SH2 domains of known structure. For each PDB sequence, all sequences more than 50% identical to it (over a region of at least 70 residues) were aligned with clustal w (18). This resulted in 19 separate multiple sequence alignments. The 19 multiple sequence alignments then were combined based on the structure-based alignment shown in Fig. 1, yielding a single multiple sequence alignment containing a total of 141 sequences (including the 19 sequences shown in Fig. 1). This alignment was used to generate an HMM, which was used to align the SH2 domains that were not included in the earlier step. The HMM was built by using the hmmer 2.1.1 package (S. Eddy, http://hmmer.wustl.edu), and the alignment was carried out with the HMMALIGN utility in hmmer 2.1.1. In this process, the alignment of the 141 sequences in the combined multiple sequence alignment was kept fixed. A flowchart of the procedure is given in Fig. 2. The resulting sequence profile of the SH2 family then was used to construct an HMM and to align the sequences of putative SH2 domains in JAKs.
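The merging step can be illustrated as follows: each structure's sequence appears both in its own clustal w sub-alignment and in the structure-based alignment, so its residues give a column-by-column map from the sub-alignment into the common frame. This is a minimal sketch under stated assumptions (hypothetical names; each sub-alignment contains its template row under the same identifier). Columns in which the template is gapped, i.e., insertions in its close homologs, are simply dropped here; the published procedure may have treated them differently, for example as HMM insert states.

```python
def residue_columns(row):
    """Indices of the columns in which an aligned row has a residue."""
    return [i for i, ch in enumerate(row) if ch != "-"]


def merge_subalignments(structure_aln, sub_alns):
    """Project per-template sub-alignments onto a structure-based alignment.

    structure_aln: {template_id: gapped row}, all rows of equal length.
    sub_alns: {template_id: {seq_id: gapped row}}, each including the template.
    """
    length = len(next(iter(structure_aln.values())))
    merged = {}
    for template, rows in sub_alns.items():
        master_cols = residue_columns(structure_aln[template])
        sub_cols = residue_columns(rows[template])
        if len(master_cols) != len(sub_cols):
            raise ValueError(f"residue count mismatch for template {template}")
        # Sub-alignment column -> master column, via the template's k-th residue.
        col_map = dict(zip(sub_cols, master_cols))
        for seq_id, row in rows.items():
            out = ["-"] * length
            for sub_c, master_c in col_map.items():
                out[master_c] = row[sub_c]
            merged[seq_id] = "".join(out)
    return merged
```

Because only the template residues are used as anchors, the relative alignment of close homologs (>50% identity) is inherited from clustal w, while the placement of each group relative to the others is inherited from the structural superposition.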

Fig. 2. Flowchart of the procedure used to align all SH2 domains.

The rationale for this procedure stems from the observation that, for closely related sequences (e.g., >50% identical), global sequence alignment performed with the Needleman–Wunsch method (32) as implemented in the clustal w package produces high-quality alignments similar to those obtained from structural superposition, whereas the explicit use of structural information becomes crucial for the alignment of proteins with low sequence similarity (33, 34). This is illustrated in Fig. 3a, where the structure-based alignment of a subset of SH2 domains is compared with the sequence-only alignment of these proteins reported in the Pfam database (22). As can be seen, for two of the four proteins shown, a significant fraction of the domain, including the C-terminal α-helix, is missing from the Pfam alignment.
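For reference, the core of global (Needleman–Wunsch) alignment is a dynamic-programming recursion over prefixes of the two sequences. The sketch below uses toy match/mismatch scores and a linear gap penalty; clustal w itself uses substitution matrices, affine gap penalties, and progressive multiple alignment, so this illustrates only the recursion, not that program's scoring.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b with a linear gap penalty.

    Returns (aligned_a, aligned_b, score). Toy scoring, for illustration only.
    """
    def sub(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # f[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    f = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        f[i][0] = i * gap
    for j in range(1, m + 1):
        f[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            f[i][j] = max(f[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                          f[i - 1][j] + gap,   # a[i-1] against a gap
                          f[i][j - 1] + gap)   # b[j-1] against a gap
    # Trace back one optimal path from the bottom-right cell.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and f[i][j] == f[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and f[i][j] == f[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), f[n][m]
```

Because the recursion must account for every residue of both sequences, a global method forces an end-to-end alignment even where two remote homologs share little signal, which is consistent with the failure mode described for low-identity pairs.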

Fig. 3. Sequence- and structure-based alignments of SH2 domains. (a) Alignments of several SH2 domains of known structure (colors indicate secondary structure elements, as in Fig. 1). (b) Alignments of Cbl-SH2 with Src-SH2. Residues within Cbl-SH2 are shown in bold.

It is of interest to compare the alignment obtained from the procedure in Fig. 2 with a more conventional HMM-based alignment. To this end, we built an HMM based on the alignment of the 19 SH2 domains of known structure shown in Fig. 1 and used it to align all 202 SH2 domain sequences found in SwissProt. The alignment produced in this way contained significantly more gaps, many of which fell within secondary structure elements of SH2 domains; e.g., 27 gaps were introduced within the secondary structure of v-src SH2 (shown in red and yellow in Fig. 1), compared with a total of 7 gaps introduced by the alignment procedure described in Fig. 2. Visual inspection also revealed that similar sequences were occasionally misaligned in the alignment of the 202 SH2 sequences based on the HMM built on the 19 superimposed structures.
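Counting gaps inside secondary-structure elements requires a definition of a gap. The sketch below assumes that one gap is one run of consecutive gap characters in an aligned row, counted if any column of the run lies in a helix ('H') or strand ('E') column of a reference secondary-structure string. The paper does not state its exact counting rule, so the function and its conventions are hypothetical.

```python
def gaps_in_sse(row, ss, sse_states="HE"):
    """Count gap runs in an aligned row that touch helix/strand columns.

    row: gapped sequence; ss: per-column states of a reference structure,
    aligned to the same length (e.g. v-src SH2 mapped onto the alignment).
    """
    if len(row) != len(ss):
        raise ValueError("row and secondary-structure string must have equal length")
    count = 0
    run_open = False    # currently inside a run of '-' characters
    run_in_sse = False  # current run touches a helix/strand column
    for ch, state in zip(row, ss):
        if ch == "-":
            if not run_open:
                run_open, run_in_sse = True, False
            if state in sse_states:
                run_in_sse = True
        else:
            if run_open and run_in_sse:
                count += 1
            run_open = False
    if run_open and run_in_sse:  # a run that reaches the end of the row
        count += 1
    return count
```

Summing this count over all rows of an alignment gives one way to compare how strongly two alignment procedures disrupt conserved secondary-structure elements.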

Construction and Evaluation of Homology Models.

The multiple alignment shown in Fig. 1 was used to build a homology model of TYK2-SH2 as well as of putative SH2 domains in other human JAKs and of other SH2 domains. The program modeller 4.0 (35) was used, and all 19 structures were used simultaneously as templates. The Verify3D server (36, 37) was used to assess the quality of each model. The server analyzes a protein in terms of the suitability of each residue to be found in its specific local environment. The properties evaluated include the buried surface area of the residue, the fraction of the side chain surface area in contact with polar atoms, and the local secondary structure. A database of the precomputed preferences for each residue type to be found in specific environments is used to calculate a score for each residue in the model, with a higher score corresponding to a more favorable environment (37).

Results

Test of the SH2 Profile-Based Alignment: Alignment of Cbl-SH2.

As a test of the ability of the sequence profile generated for the SH2 family to align remote homologs, we aligned the sequence of the SH2 domain of the Cbl adapter protein (13) to the profile. The sequence identity between Cbl-SH2 and other SH2 domains used in the structure-based alignment shown in Fig. 1 ranges from 7% to 20%. The presence of an SH2 domain in Cbl was not predicted based on sequence comparisons (13, 38), and, in contrast to putative SH2 domains in JAKs, Cbl-SH2 is not detected with an expectation value below 0.01 in our psi-blast searches (the best E value is 2.4). The structure of Cbl-SH2 was excluded from the structure-based profile used in this test. Given the low level of sequence identity, an accurate alignment of Cbl-SH2 with other SH2 domains represents a good test of the sensitivity of sequence-alignment procedures.

Fig. 3b reports alignments of Cbl-SH2 with Src-SH2 obtained from a number of methods. The structure-based alignment generated with prism is used as the standard. An alignment of Cbl-SH2 with Src-SH2 based on an HMM built on a sequence profile of the SH2 family, generated as described in Methods, is quite similar to the one obtained from structural superposition. In contrast, the Needleman–Wunsch method, which attempts to align the entire sequence, fails completely, whereas psi-blast produces a reasonable alignment but only for part of the sequence.
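Agreement with a structure-based standard is often quantified as the fraction of residue pairs in the reference alignment that a test alignment reproduces. The comparison in Fig. 3b is made by inspection, so this metric and the function names are assumptions, shown only to make such a comparison concrete.

```python
def aligned_pairs(row_a, row_b):
    """Set of (i, j) residue-index pairs placed in the same alignment column."""
    pairs, i, j = set(), 0, 0
    for x, y in zip(row_a, row_b):
        if x != "-" and y != "-":
            pairs.add((i, j))
        if x != "-":
            i += 1
        if y != "-":
            j += 1
    return pairs


def alignment_accuracy(test, reference):
    """Fraction of reference residue pairs that the test alignment reproduces.

    test, reference: (row_a, row_b) tuples of gapped strings for the same
    two sequences (e.g. Cbl-SH2 and Src-SH2).
    """
    ref = aligned_pairs(*reference)
    if not ref:
        return 0.0
    return len(aligned_pairs(*test) & ref) / len(ref)
```

A method that aligns only part of a sequence correctly, as described for psi-blast, scores below 1 because reference pairs outside its aligned region are not reproduced.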

Sequence-Based Identification of SH2 Domains in JAKs.

The psi-blast searches carried out as described in Methods identified SH2 domains in TYK2 and JAK1–3 with highly significant scores; the best E values obtained upon convergence range from 10⁻¹¹ (for TYK2) to 10⁻²⁰ (for JAK1 and JAK2). The JAK-SH2 sequences also were used as probe sequences to search the nonredundant database of the National Center for Biotechnology Information, nrDB (release of July 2000), by using psi-blast (2) with default parameters. In all four searches, a large number of SH2 domains were matched with E values ranging between 2 × 10⁻¹² and 2 × 10⁻²⁷.

Alignment of the full JAK1–3 and TYK2 sequences onto the SH2 sequence profile, performed as described in Methods, aligned the regions spanning residues 426–533, 399–500, 375–476, and 450–552 from human JAK1, JAK2, JAK3, and TYK2, respectively. A multiple alignment of the JAK SH2 domains with 19 sequences for which structures are available is shown in Fig. 1. It is important to note that the entire domain, including the βG strand and the preceding loop, is included in the alignment. As noted above, these regions were absent in the previously reported alignments.

Structural Analysis of Alignment Quality.

The secondary structure for each of the JAKs was predicted by using the JPRED2 server (39). JPRED2 produces a consensus secondary structure as predicted by many programs such as phd (40) and dsc (41). Fig. 1 shows the alignment of all four JAKs, with the 19 SH2 domains colored by secondary structure. The predicted secondary structures are in agreement with the secondary structures of typical SH2 domains. As can be seen in Fig. 1, the hydrophobicity patterns of each of the JAK-SH2s are also in very good agreement with the SH2 structures. Moreover, genthreader (4) matched all putative JAK-SH2 domains with the structures of known SH2 domains with a confidence level of “certain.”
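Agreement between a predicted secondary structure and the DSSP assignment of an aligned template can be quantified column by column. This sketch assumes one common reduction of DSSP's eight states to three (H, G, I to helix; E, B to strand; everything else to coil) and skips alignment columns marked with a '.' gap symbol. Both conventions are assumptions, and the function is illustrative rather than part of the published analysis.

```python
# One common 8-to-3 state reduction (an assumption; other mappings exist).
_TO_THREE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def to_three_state(code):
    """Map a DSSP or predictor code to H (helix), E (strand) or C (coil)."""
    return _TO_THREE.get(code, "C")


def ss_agreement(predicted, observed, gap="."):
    """Fraction of aligned columns whose three-state assignments agree."""
    if len(predicted) != len(observed):
        raise ValueError("strings must be aligned to equal length")
    pairs = [(to_three_state(p), to_three_state(o))
             for p, o in zip(predicted, observed)
             if p != gap and o != gap]
    if not pairs:
        return 0.0
    return sum(p == o for p, o in pairs) / len(pairs)
```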

A more detailed structural analysis of the sequence alignment reveals other features of SH2 domains that are present in JAKs. Residues βB2–4 are buried deeply in the SH2 structures listed in Fig. 1 and, therefore, are likely to be important for SH2 stability. These are aromatic–aliphatic–aliphatic residues in all 19 structures and are highly conserved among all SH2 sequences. They are aligned with Tyr-Val-Leu in JAKs 1–3 and Tyr-Leu-Ile in TYK2. The highly conserved Gly-AB7 adopts an unusual +φ, −ψ conformation and thus is likely to be important for the proper folding of an SH2 domain. The mutation of this residue to glutamate in Bruton's tyrosine kinase (Btk), found in patients with X-linked agammaglobulinemia (XLA), introduces severe structural alterations in the SH2 domain. All four JAKs contain a glycine at this position. Three other mutations found in patients also cause considerable perturbation of the structure of the Btk-SH2 domain (42): Tyr-βD5 to Ser, Leu-αB5 to Phe, and His-αB9 to Gln. Human JAK1–3 and TYK2 contain either Phe or Cys at position βD5, both of which are seen in SH2 domains of known structure (see the alignment in Fig. 1). Position αB5 is occupied by a Leu in JAK1–3 and TYK2, which is the amino acid found in most other known SH2 domains (Fig. 1). Position αB9 is occupied by aromatic and hydrophobic residues in JAK1–3 and TYK2 (Leu, Tyr, Cys, and Leu, respectively; see Fig. 1). Aromatic residues (Tyr, Phe) are also seen at this position in other SH2 domains.

The alignment we obtain for the BG loop is of particular interest because this region has not been included in previous JAK alignments. Examination of known SH2 domain structures reveals that, although the length of the BG loop is quite variable, there is a conserved BG loop “anchor” at the BG13 position. Most SH2 domains contain a leucine at this position that packs against the C terminus of the αB helix. A leucine is also present in our alignment of the JAK SH2 domains. Similarly, an aromatic residue is present at the βF3 position that, in SH2 domains of known structure, often packs against the N terminus of the αB helix. It seems clear from this analysis and from previous work that JAK family proteins have SH2 domains that fold into a structure that is quite similar to that of other SH2 domains.

Is the Binding Site Conserved?

The most dramatic difference between TYK2-SH2 and other SH2 domains is the identity of the residue at the βB5 position. TYK2 contains a histidine whereas all other SH2 domains, including the other JAK proteins, contain an arginine at the βB5 position. Although it is possible that the histidine plays the same role as the conserved arginine in all other SH2 domains, it is not clear that the histidine is charged in TYK2 under normal conditions. In the following section, we report the construction of a homology model of the putative SH2 domain of TYK2. Our goal is both to determine whether such a domain is likely to be stable and to consider its binding properties in greater detail.

Homology Model of TYK2.

A homology model of the SH2 domain in TYK2 was built as described in Methods. All 19 structures listed in Fig. 1 were used as templates. The Verify3D (36, 37) server was used to assess the quality of the model. The three-dimensional (3D) profiles generated for the model and for all 19 structures are shown in Fig. 4a. The profile of TYK2-SH2 falls within the range seen for the 19 experimental structures and does not score below 0.0 at any point, which is characteristic of a good model. The cumulative 3D scores, calculated by summing the 3D scores at each position in the profile (36, 37), are shown in Fig. 4b. The cumulative score of the TYK2-SH2 model is 34, within the range obtained for the 19 SH2 structures (29.4–52.4; see Fig. 4b). Luthy et al. (36) presented a plot describing the relationship between the 3D score and the length of the protein in experimentally determined structures. The score obtained by the TYK2-SH2 model, which is 103 residues long, falls within the range of scores obtained for medium-resolution x-ray structures and NMR structures.
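The two quantities plotted in Fig. 4 can be sketched from per-residue 3D-1D scores: a smoothed profile, checked for positions below 0.0, and a cumulative sum. The 21-residue window and the truncation of the window at the chain ends are assumptions of this sketch (a window of that size is commonly used in Verify3D plots), and the per-residue scores themselves would come from the server's environment-preference table, which is not reproduced here.

```python
def window_average(scores, window=21):
    """Sliding-window mean of per-residue scores (window truncated at the ends).

    An odd window is assumed; an even value behaves like window + 1.
    """
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        lo, hi = max(0, i - half), min(len(scores), i + half + 1)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    return smoothed


def cumulative_score(scores):
    """Sum of the per-position scores."""
    return sum(scores)


def positions_below(profile, threshold=0.0):
    """Indices where the profile falls below the threshold (0.0 in the text)."""
    return [i for i, s in enumerate(profile) if s < threshold]
```

An empty result from `positions_below` on the smoothed profile corresponds to the observation that the TYK2-SH2 model "does not score below 0.0 at any point".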

Fig. 4. 3D profiles and cumulative scores for JAK-SH2 models and 19 SH2 structures. Profiles were obtained from the Verify3D server (37). (a) 3D profiles of the 19 selected SH2 domain structures and of the TYK2-SH2 model. (b) Cumulative 3D scores for the 19 selected SH2 domain structures and for the four JAK models.

Although the model clearly is not accurate in all its details, the fact that it scores in the range seen for SH2 domains of known structure suggests that it is a reasonable approximation to the actual structure. Homology models for the JAK1–3-SH2 regions were built in the same manner as for TYK2. As seen from the data presented in Fig. 4b, all human JAK sequences fit the architecture of an SH2 fold quite well, as judged by Verify3D scores.

Binding of Phosphopeptides: Structure of a Putative Binding Site.

It appears quite likely that the SH2 domains of JAKs 1–3 bind phosphotyrosine-containing peptides, as do all SH2 domains of known structure. This is not the case for TYK2. The factors that determine the pK_a values of groups in proteins have been discussed extensively (43–46). Briefly, a basic group that is partially removed from solvent will tend to have its pK_a lowered relative to the isolated amino acid because the charged form of the amino acid will be less well solvated in the protein than in water. The loss of aqueous solvation, in principle, can be compensated by stabilizing hydrogen-bonding interactions with polar groups in the protein or with negatively charged amino acids. Thus, the observed pK_a will depend on the local environment of the group in the protein. We have used the method of Alexov and Gunner (47) to calculate the pK_a values of arginines at the βB5 position in five different SH2 domains of known structure (v-src, SHPTP2-N, Cbl, PLCγ-C, and STAT1). All calculated pK_a values are quite high (≥12) despite the fact that the partial burial of the Arg in the deep phosphotyrosine-binding pocket would, on its own, shift the pK_a to lower values. However, compensating interactions with nearby charged and polar groups raise the calculated pK_a values to those normally associated with Arg residues in proteins. The groups that make the largest stabilizing contributions are His or Gln residues at the βD4 position and/or spatially adjacent glutamic acids (at positions αA6 and BC1). His βD4, in particular, is highly conserved, perhaps because of its role in stabilizing the charge and position of the Arg.
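The link between an electrostatic free-energy change and a pK_a shift is the standard thermodynamic relation below. It is the bookkeeping behind the argument above, not the method of ref. 47 itself, which computes the free-energy terms from a structural model rather than assuming them.

```latex
\[
\mathrm{p}K_a^{\mathrm{prot}} = \mathrm{p}K_a^{\mathrm{model}}
  + \frac{\Delta\Delta G_{\mathrm{deprot}}}{2.303\,RT},
\qquad
\Delta\Delta G_{\mathrm{deprot}}
  = \Delta G_{\mathrm{deprot}}^{\mathrm{prot}} - \Delta G_{\mathrm{deprot}}^{\mathrm{model}}
\]
```

With 2.303RT ≈ 1.36 kcal/mol at 298 K, a hypothetical desolvation penalty that destabilizes the charged (protonated) form by 4 kcal/mol relative to water makes deprotonation 4 kcal/mol easier (ΔΔG_deprot = −4 kcal/mol) and lowers the pK_a by about 4/1.36 ≈ 2.9 units; a stabilizing contact with a nearby acidic group acts in the opposite direction.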

The pK_a values of Arg βB5 in JAK1–3 are also predicted to be high. A histidine is present at the βD4 position in JAK2, and other groups interact favorably with Arg βB5 in JAKs 1 and 3. In contrast, His βB5 in our model of TYK2-SH2 is predicted to have an extremely low pK_a (near zero). This large shift results in part from desolvation effects and in part from the presence of a lysine at position βD4, where a stabilizing histidine is normally found. The only other SH2 domain with a highly basic residue at βD4 is the C-terminal SH2 domain in GAP, which contains an Arg at this position. We built a homology model for this domain and found that the calculated pK_a of Arg-βB5 in the model of GAP-SH2 is above 13. This is due, in part, to strong stabilizing interactions with Asp-BC1. In contrast, TYK2-SH2 contains Thr at this position. To test the sensitivity of the calculated pK_a of His-βB5 in TYK2-SH2 to the structural model, pK_a calculations on the top 10 models generated by modeller were carried out. The average pK_a of His-βB5 in these models is 0.6, and the highest calculated pK_a is 2.3.

Although pK_a calculations on inaccurate models are unlikely to produce accurate values, the major point of the calculations reported in this section is the demonstration that His βB5 in TYK2 is unlikely to be protonated. As is the case for the arginines at this position in other SH2 domains, its presence at the bottom of a deep pocket inevitably will shift its pK_a to lower values. However, in contrast to other SH2 domains, the strongest interaction in this region (with Lys-βD4) only lowers its pK_a further. The available evidence thus suggests that TYK2-SH2 differs from all known SH2 domains in that, at around pH 7, it does not present a positive charge at the βB5 position.
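To see what the calculated values imply at pH 7, the single-site Henderson–Hasselbalch relation gives the fraction of a base in its protonated (charged) form. This ignores coupling between titratable sites, so it is only a back-of-the-envelope check on the pK_a values quoted above, not a replacement for the calculations.

```python
def fraction_protonated(ph: float, pka: float) -> float:
    """Fraction of a single titratable base carrying its proton at a given pH."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


if __name__ == "__main__":
    # pK_a values from the text: His-βB5 in TYK2-SH2 (mean and highest over
    # 10 models) and the lower bound reported for Arg-βB5 in other SH2 domains.
    cases = [("His-βB5, mean", 0.6),
             ("His-βB5, highest", 2.3),
             ("Arg-βB5, lower bound", 12.0)]
    for label, pka in cases:
        print(f"{label}: {fraction_protonated(7.0, pka):.1e}")
```

With these inputs, the protonated fraction of His-βB5 at pH 7 is on the order of 10⁻⁷ to 10⁻⁵, whereas an Arg with a pK_a of 12 or more is essentially fully charged.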

Discussion

The alignment of protein sequences based on the 3D superposition of their structures is known to provide a means for protein-sequence comparison in cases in which similarities are weak and cannot be detected reliably by sequence-only methods. In this paper, we have used structural superposition to improve a multiple alignment in a case in which a weak sequence relationship already has been detected. The combined sequence/structure alignment approach should be applicable generally in cases in which a number of structures are available for members of a particular protein family. However, the appropriate strategy may well vary from family to family and depend on the number of available structures and their range of structural distances. A general-purpose algorithm that exploits the overall approach described in this work has been benchmarked extensively and yields significant improvement over existing Position-Specific Scoring Matrix methods (L. Xie and B.H., unpublished results).

As summarized above, a number of novel methods have been reported recently that combine sequence and structural information to improve the detection of remote homologs (4, 7–11). In contrast, the primary goal of this work has been to improve sequence alignments for family members that already have been identified, and this has dictated a somewhat different strategy. For example, we have used psi-blast to detect these putative family members but have not relied on psi-blast alignments. Rather, we have used pure sequence-based methods only for cases with high levels of sequence identity (50% in this paper), whereas more distant sequence neighbors are aligned only to the merged structure-based HMM. This procedure reduces the probability of errors in the multiple sequence alignment that result from alignment problems for distantly related sequences.

Our results provide strong support for previous studies (17, 25) that have concluded that JAK family proteins contain SH2 domains. The most surprising prediction of our study is that human TYK2 kinase contains an SH2 domain that cannot bind phosphotyrosine. The presence of a histidine instead of an arginine in the crucial βB5 position indicates that this is a unique SH2 domain. In principle, it is possible, of course, that the histidine simply replaces the arginine as a determinant of binding affinity. However, pK_a shifts from desolvation effects and the apparent absence of interactions that stabilize the charged form of the His argue that it is unlikely to attract a negatively charged substrate such as a phosphotyrosine-containing peptide. A number of conjectures suggest themselves. It is possible that another residue in the binding site, specifically Lys-βD4, coordinates a phosphotyrosine or that other still-unknown factors (for example, another protein domain or low pH) serve to enhance binding. Alternative possibilities are that TYK2-SH2 associates with a completely different class of targets and that its activity is not controlled by phosphorylation, or possibly that it binds nonphosphorylated tyrosines. Resolution of these questions can come only from experimental studies.

Acknowledgments

This work was supported, in part, by National Institutes of Health Grant GM-30518 (to B.H.) and a Sloan/Department of Energy Postdoctoral Fellowship in Computational Molecular Biology (to F.B.S.).

Abbreviations

JAKs: Janus kinases
PDB: Protein Data Bank
3D: three-dimensional
HMM: Hidden Markov Model

References

1. Krogh A, Brown M, Mian S I, Sjolander K, Haussler D. J Mol Biol. 1994;235:1501–1531. doi: 10.1006/jmbi.1994.1104.
2. Altschul S F, Madden T L, Schaffer A A, Zhang J, Zhang Z, Miller W, Lipman D J. Nucleic Acids Res. 1997;25:3389–3402. doi: 10.1093/nar/25.17.3389.
3. Eddy S R. Curr Opin Struct Biol. 1996;6:361–365. doi: 10.1016/s0959-440x(96)80056-x.
4. Jones D T. J Mol Biol. 1999;287:797–815. doi: 10.1006/jmbi.1999.2583.
5. Fischer D, Eisenberg D. Protein Sci. 1996;5:947–955. doi: 10.1002/pro.5560050516.
6. Yang A-S, Honig B. J Mol Biol. 2000;301:691–711. doi: 10.1006/jmbi.2000.3975.
7. Rice D W, Eisenberg D. J Mol Biol. 1997;267:1026–1038. doi: 10.1006/jmbi.1997.0924.
8. Hargbo J, Elofsson A. Proteins. 1999;36:68–76.
9. Kelley L A, MacCallum R M, Sternberg M J E. J Mol Biol. 2000;299:499–520. doi: 10.1006/jmbi.2000.3741.
10. Panchenko A, Marchler-Bauer A, Bryant S. J Mol Biol. 2000;296:1319–1331. doi: 10.1006/jmbi.2000.3541.
11. Kolinski A, Betancourt M R, Kihara D, Rotkiewicz P, Skolnick J. Proteins. 2001;44:133–149. doi: 10.1002/prot.1080.
12. Kuriyan J, Cowburn D. Annu Rev Biophys Biomol Struct. 1997;26:259–288. doi: 10.1146/annurev.biophys.26.1.259.
13. Meng W, Sawasdikosol S, Burakoff S J, Eck M J. Nature (London) 1999;398:84–90. doi: 10.1038/18050.
14. Schindler C, Darnell J. Annu Rev Biochem. 1995;64:621–651. doi: 10.1146/annurev.bi.64.070195.003201.
15. Harpur A G, Andres A C, Ziemiecki A, Aston R R, Wilks A F. Oncogene. 1992;7:1347–1353.
16. Yeh T C, Dondi E, Uze G, Pellegrini S. Proc Natl Acad Sci USA. 2000;97:8991–8996. doi: 10.1073/pnas.160130297.
17. Bork P, Gibson T J. Methods Enzymol. 1996;266:162–184. doi: 10.1016/s0076-6879(96)66013-3.
18. Higgins D G, Thompson J D, Gibson T J. Methods Enzymol. 1996;266:383–402. doi: 10.1016/s0076-6879(96)66024-8.
19. Yan H, Krishnan K, Lim J T E, Contillo L G, Krolewski J J. Mol Cell Biol. 1996;16:2074–2082. doi: 10.1128/mcb.16.5.2074.
20. Gauzzi M C, Barbieri G, Richter M F, Uze G, Ling L, Fellous M, Pellegrini S. Proc Natl Acad Sci USA. 1997;94:11839–11844. doi: 10.1073/pnas.94.22.11839.
21. Ali M S, Sayeski P P, Bernstein K E. J Biol Chem. 2000;275:15586–15593. doi: 10.1074/jbc.M908931199.
22. Bateman A, Birney E, Durbin R, Eddy S R, Howe K L, Sonnhammer E L. Nucleic Acids Res. 2000;28:263–266. doi: 10.1093/nar/28.1.263.
23. Schultz J, Milpetz F, Bork P, Ponting C P. Proc Natl Acad Sci USA. 1998;95:5857–5864. doi: 10.1073/pnas.95.11.5857.
24. Ponting C P, Schultz J, Milpetz F, Bork P. Nucleic Acids Res. 1999;27:229–232. doi: 10.1093/nar/27.1.229.
25. Kampa D, Burnside J. Biochem Biophys Res Commun. 2000;278:175–182. doi: 10.1006/bbrc.2000.3757.
26. Bradshaw J M, Waksman G. Biochemistry. 1999;38:5147–5154. doi: 10.1021/bi982974y.
27. Mayer B J, Jackson P K, van Etten R A, Baltimore D. Mol Cell Biol. 1992;12:609–618. doi: 10.1128/mcb.12.2.609.
28. Bernstein F C, Koetzle T F, Williams G L B, Meyer E F, Jr, Brice M D, Rodgers J R, Kennard O, Shimanouchi T, Tasumi M. J Mol Biol. 1977;112:535–542. doi: 10.1016/s0022-2836(77)80200-3.
29. Yang A S, Honig B. Proteins. 1999;37, Suppl. 3:66–72. doi: 10.1002/(sici)1097-0134(1999)37:3+<66::aid-prot10>3.3.co;2-b.
30. Bairoch A, Apweiler R. Nucleic Acids Res. 1999;27:49–54. doi: 10.1093/nar/27.1.49.
31. Henikoff S, Henikoff J G. Proc Natl Acad Sci USA. 1992;89:10915–10919. doi: 10.1073/pnas.89.22.10915.
32. Needleman S B, Wunsch C D. J Mol Biol. 1970;48:443–453. doi: 10.1016/0022-2836(70)90057-4.
33. Abagyan R A, Batalov S. J Mol Biol. 1997;273:355–368. doi: 10.1006/jmbi.1997.1287.
34. Sauder J M, Arthur J W, Dunbrack R L, Jr. Proteins. 2000;40:6–22. doi: 10.1002/(sici)1097-0134(20000701)40:1<6::aid-prot30>3.0.co;2-7.
35. Sali A, Blundell T L. J Mol Biol. 1993;234:779–815. doi: 10.1006/jmbi.1993.1626.
36. Luthy R, Bowie J U, Eisenberg D. Nature (London) 1992;356:83–85. doi: 10.1038/356083a0.
37. Eisenberg D, Luthy R, Bowie J U. Methods Enzymol. 1997;277:396–404. doi: 10.1016/s0076-6879(97)77022-8.
38. Kuriyan J, Darnell J E, Jr. Nature (London) 1999;398:22–23. doi: 10.1038/17916.
39. Cuff J A, Clamp M E, Siddiqui A S, Finlay M, Barton G J. Bioinformatics. 1998;14:892–893. doi: 10.1093/bioinformatics/14.10.892.
40. Rost B, Sander C, Schneider R. Comput Appl Biosci. 1994;10:53–60. doi: 10.1093/bioinformatics/10.1.53.
41. King R D, Saqi M, Sayle R, Sternberg M J. Comput Appl Biosci. 1997;13:473–474. doi: 10.1093/bioinformatics/13.4.473.
42. Mattsson P T, Lappalainen I, Backesjo C M, Brockmann E, Lauren S. J Immunol. 2000;164:4170–4177. doi: 10.4049/jimmunol.164.8.4170.
43. Bashford D, Karplus M. Biochemistry. 1990;29:10219–10225. doi: 10.1021/bi00496a010.
44. Yang A S, Gunner M R, Sampogna R, Sharp K, Honig B. Proteins. 1993;15:252–265. doi: 10.1002/prot.340150304.
45. Antosiewicz J, McCammon J A, Gilson M K. Biochemistry. 1996;35:7819–7833. doi: 10.1021/bi9601565.
46. Sham Y Y, Chu Z T, Warshel A. J Phys Chem. 1997;101:4458–4472.
47. Alexov E, Gunner M. Biophys J. 1997;72:2075–2093. doi: 10.1016/S0006-3495(97)78851-9.
48. Waksman G, Kominos D, Robertson S C, Pant N, Baltimore D, Birge R B, Cowburn D, Hanafusa H, Mayer B J, Overduin M, et al. Nature (London) 1992;358:646–653. doi: 10.1038/358646a0.
49. Kabsch W, Sander C. Biopolymers. 1983;22:2577–2637. doi: 10.1002/bip.360221211.