Abstract
Tyrosine recombinases participate in diverse biological processes by catalyzing recombination between specific DNA sites. Although a conserved protein fold has been described for the catalytic (CAT) domains of five recombinases, structural relationships between their core-binding (CB) domains remain unclear. Despite differences in the specificity and affinity of core-type DNA recognition, a conserved binding mechanism is suggested by the shared two-domain motif in crystal structure models of the recombinases Cre, XerD and Flp. We have found additional evidence for conservation of the CB domain fold. Comparison of XerD and Cre crystal structures showed that their CB domains are closely related; the three central α-helices of these domains are superposable to within 1.44 Å. A structure-based multiple sequence alignment containing 25 diverse CB domain sequences provided evidence for widespread conservation of both structural and functional elements in this fold. Based upon the Cre and XerD crystal structures, we employed homology modeling to construct a three-dimensional structure for the λ integrase CB domain. The model provides a conceptual framework within which many previously identified, functionally important amino acid residues were investigated. In addition, the model predicts new residues that may participate in core-type DNA binding or dimerization, thereby providing hypotheses for future genetic and biochemical experiments.
INTRODUCTION
The tyrosine family of site-specific recombinases contains hundreds of members (1,2, this study). These enzymes catalyze functionally diverse processes, including integration and excision of phages from their host chromosomes, conjugative transposition, partitioning of phage, bacterial and plasmid genomes during cell division, antigenic phase variation, dissemination of antibiotic- and antiseptic-resistance gene cassettes, and relaxation of DNA supercoils (reviewed in 1–6). Members of this family are defined by four strongly conserved active-site residues in the catalytic (CAT) domain; these residues consist of an arginine–histidine–arginine triad, and a tyrosine nucleophile that covalently bonds the DNA upon cleavage of the phosphodiester backbone (7–10).
Three-dimensional structures of varying complexity have been published for five tyrosine recombinases: Cre (11–15), XerD (16), HP1 integrase (HP1-Int) (17), Flp (18) and λ integrase (λ-Int) (9,19). Despite apparent mechanistic and regulatory differences between the respective recombination systems in which they function, these structures have revealed a conserved CAT domain fold (20). This similarity provided a common framework facilitating comparison of experimental data pertaining to each recombination system. Such comparisons led to proposals for a conserved mechanism in the recombination reaction describing interactions between the recombinase and DNA, the role of active-site residues in the nucleic acid chemistry of recombination, the role of inter-protein contacts in asymmetric DNA cleavage, and the geometry and dynamics of conformational changes undergone by the Holliday junction intermediate (13,20–25).
These mechanistic similarities may be reflected by structural conservation extending beyond the CAT domain fold. Genetic and biochemical experiments with λ-Int and HK022 integrase (HK-Int) suggest an important role for both the core-binding (CB) and CAT domains in core-type DNA recognition (26–32). Regarding these experiments, Dorgai et al. (30) have noted that the mechanism of sequence recognition appears highly conserved among tyrosine recombinases, despite a diversity of primary structure and biological function. Tirumalai et al. (27) observed that the members of this family share a striking resemblance in their use of a two-domain structure for core-type DNA binding. Similarities between the recent Cre and Flp co-crystal structures with DNA suggest that quaternary organization of the CB and CAT domains into a crescent-shaped clamp may be conserved as well (11–15,18).
Despite these similarities, structural relationships between the crystallized CB domain structures from XerD, Cre and Flp are unclear. Although a general likeness between the XerD and Cre CB domains has been noted (6), the CB domains of Cre and Flp appear unrelated (18). Because CB domain structures are not available for other well characterized recombinases such as HP1-Int or λ-Int, it is not possible to compare their CB domains directly with those of Cre, XerD or Flp. Furthermore, evolutionary divergence between these enzymes complicates direct comparison of CB domain residues by alignment of amino acid sequences (see below). This paucity of structural information precludes both investigation of general relationships between recombinase CB domains and identification of conserved structural and functional elements therein.
In this study, we examined CB domain conservation through comparison of the XerD and Cre crystal structures, profile-based identification of related CB domain sequences, assembly of multiple-sequence alignments, and comparative modeling of the λ-Int CB domain. Our results suggest that the CB domain fold is widely conserved among viral, eubacterial and archaeal recombinases. The λ-Int model presented here provides a hitherto missing structural relationship with other tyrosine recombinases; core-type DNA-binding studies on λ-Int and HK-Int can now be directly related to the structures solved for Cre and XerD. In addition, the λ-Int model allowed us to directly rationalize the function of many previously identified amino acid residues, and predict new important residues in λ-Int whose functions can be tested in future genetic and biochemical experiments.
MATERIALS AND METHODS
Determination of root mean square deviation
The root mean square (RMS) deviation of backbone atoms in the Cre and XerD CB domains was calculated with Swiss-PdbViewer (33,34). Corresponding pairs of amino acid residues in helices 2, 3 and 4 (see Fig. 2) were determined by inspection of the 4crx (Cre) (13) and 1a0p (XerD) (16) atomic coordinates. The structures were then superposed based upon three-dimensional alignment of these residue pairs. The RMS deviation of backbone atoms comprising the structures used for superposition was computed as follows: the distance between each atom pair was squared, the squared values were then averaged for the regions in question, and the square root of the average was reported as the RMS deviation. This calculation was performed for groups of backbone atoms located in helices h2 (Cre residues 20–40, XerD residues 23–43), h3 (Cre 49–57, XerD 52–60), h4 (Cre 68–84, XerD 71–87), and the turn between h2 and h3 (Cre 41–48, XerD 44–51).
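The RMS calculation described above can be sketched in a few lines. This is a minimal illustration, not the Swiss-PdbViewer implementation: the function name `rms_deviation` and the toy coordinates are assumptions, and the two coordinate lists are taken to be already superposed and paired atom by atom.

```python
import math


def rms_deviation(coords_a, coords_b):
    """RMS deviation between two paired lists of (x, y, z) coordinates.

    Assumes the structures are already superposed and that the i-th atom
    in coords_a corresponds to the i-th atom in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    # Square the distance between each atom pair.
    squared = [
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    ]
    # Average the squared distances, then take the square root.
    return math.sqrt(sum(squared) / len(squared))


# Toy example (hypothetical coordinates): every atom is displaced by 1 Å.
helix_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
helix_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rms_deviation(helix_a, helix_b))  # 1.0
```

In practice the coordinate lists would hold the backbone atoms of the residue ranges listed above for h2, h3 and h4.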
Figure 2.
Comparison of the XerD and Cre CB domains. The XerD and Cre CB domains were isolated from the remainder of their respective structures, 1a0p (16) and 4crx (13), respectively. The α-helices in each domain are colored according to their N- to C-terminal order of succession in the polypeptide chain, as in Figure 1. The orientation of the structures was chosen to illustrate both the structural similarity between the XerD and Cre CB domains and the relationship between the Cre CB domain and loxP core-type DNA. In the Cre structures (left-hand side), a fragment of the loxP DNA backbone is shown as a magenta cartoon. (A and B) The Cre CB domain is shown such that the DNA molecule is viewed along the helical axis, with the XerD CB domain shown in a similar orientation. (C and D) The structures have been rotated ∼80° about the x-axis. [Figure prepared with the program PyMOL (55).]
Psi-BLAST searches
We performed Psi-BLAST searches (35) to identify protein sequences that may contain structural homologs to the CB domain. When queried with the λ-Int CB domain sequence (residues 65–170), four iterations of Psi-BLAST searches identified 546 putative CB domain sequence homologs without reaching convergence. After each iterative search, several sequence classes were discarded before submitting the remaining results for profile refinement. For example, entries discarded after the third iterative search included five sequences containing frequent large gaps in the alignment (>10 residues), six enzymes that are not recombinases, three sequences originating from fusion proteins or IS insertions, four potentially non-functional sequences either from cryptic prophages or annotated as defective proteins, three sequences judged too short to encode active recombinases (<156 total residues), and 11 judged too long (>550 total residues). In addition, it was assumed that 32 alignments to the λ-Int CB domain shorter than 75 amino acids were either from unrelated or non-functional proteins, and these sequences were removed. After these adjustments, 345 sequences (85%) remained for refinement of the fourth iterative search, which yielded 546 total hits with ‘expect’ values ranging from 2 × 10⁻²¹ to 8.8.
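The numerical exclusion criteria quoted above can be expressed as a simple filter. The sketch below is illustrative only: the `Hit` record, its field names and the example entries are hypothetical, and the 'frequent large gaps' criterion is simplified to a single largest-gap value. Exclusions that required manual annotation (fusion proteins, IS insertions, cryptic prophages, defective proteins) are reduced to one boolean flag.

```python
from dataclasses import dataclass

# Thresholds taken from the text; the names are illustrative.
MIN_TOTAL_RESIDUES = 156   # shorter sequences judged too short
MAX_TOTAL_RESIDUES = 550   # longer sequences judged too long
MIN_ALIGNED_RESIDUES = 75  # shorter alignments to the CB domain removed
MAX_GAP_RESIDUES = 10      # larger alignment gaps treated as suspect


@dataclass
class Hit:
    """Hypothetical summary of one Psi-BLAST hit."""
    name: str
    total_length: int          # residues in the full-length protein
    aligned_length: int        # residues aligned to the query CB domain
    largest_gap: int           # largest gap in the alignment (simplification)
    passes_annotation: bool = True  # manual checks: recombinase, not defective, etc.


def keep_for_profile(hit):
    """Return True if the hit would be retained for profile refinement."""
    return (
        hit.passes_annotation
        and MIN_TOTAL_RESIDUES <= hit.total_length <= MAX_TOTAL_RESIDUES
        and hit.aligned_length >= MIN_ALIGNED_RESIDUES
        and hit.largest_gap <= MAX_GAP_RESIDUES
    )


hits = [
    Hit("example_ok", total_length=350, aligned_length=100, largest_gap=3),
    Hit("example_short", total_length=120, aligned_length=90, largest_gap=0),
    Hit("example_gappy", total_length=400, aligned_length=95, largest_gap=14),
]
retained = [h.name for h in hits if keep_for_profile(h)]
print(retained)  # ['example_ok']
```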
Threading-based multiple-sequence alignment
The multiple-sequence alignment in Figure 3 was constructed using a combination of methods. Twenty-five divergent sequences were first selected from the Psi-BLAST results, including several sequences that encode well studied recombinases. For each sequence, structural homology to all proteins in the PDB database (36) was evaluated with the mGenTHREADER fold-recognition algorithm (37,38). Most searches generated alignments to both the 1a0p (XerD) and 4crx (Cre) structures. Since all of these pairwise alignments were found to be consistent with each other, the entire collection was assembled manually into a composite multiple-sequence alignment. Protein sequence conservation in the full alignment was evaluated with CLUSTALX (39–41). Low-scoring regions were calculated to identify possible misaligned regions, and the mGenTHREADER results corresponding to each of these regions were re-examined carefully to determine whether the alignment in these regions could be improved. The presence of conserved residues in other sequences and the relative location of the putative CB domain within the full-length protein were used as guides in these judgments. In general, consistent results were obtained with all sequences; only the N-terminal portion of the P22 integrase sequence was realigned to account for a probable 30-residue insertion in the middle of its CB domain. Figure 3 was created from this alignment with CLUSTALX, using default coloring parameters for residue conservation.
Figure 3.
Alignment of 25 CB domain sequences. The multiple-sequence alignment was assembled from individual pairwise alignments to Cre or XerD constructed by mGenTHREADER. Conserved residues present in the majority of sequences are colored according to their chemical properties, using the default coloring parameters in CLUSTALX. Briefly, the residues are colored as the following groups if the group members comprise >50% of the column: D, E, purple; R, K, red; A, V, I, L, F, M, C, W, blue; H, Y, dark blue; S, T, N, Q, green; G, orange; P, yellow (see Materials and Methods). Proline and glycine are colored regardless of their frequency or the dominant character of the column. The leftmost column shows the species name from which each recombinase was identified, and the GI number for each sequence. Above the alignment is a cartoon representation of the secondary structure found in the Cre sequence. Each α-helix is shown as a cylinder, colored in accordance with the previous CB domain representations in Figures 1 and 2. The locations of four large sequence insertions are marked by black triangles; the length of each insertion in amino acids is indicated above each triangle, and the particular sequence containing the insertion is indicated by the character ‘X’ in the body of the alignment. [Figure prepared with the program CLUSTALX (39–41).]
Homology modeling of the λ-Int core-binding domain
In an attempt to create the best possible model of the λ-Int CB domain, residues 66–169 were submitted to nine different structure-prediction servers available through the Structure Prediction Meta Server (42). These servers included the Fold & Function Assignment System server (FFAS) (43), the Sequence Alignment and Modeling System server (Sam-T99) (44,45), the PSIpred protein structure prediction center (mGenTHREADER) (35,37,46), the 3D-PSSM server (47–49), the Bioinbgu server (INBGU) (50), the Profile Library Search Against HOMSTRAD server (FUGUE) (51), the Superfamily HMM and genome assignments server (52), the Pcons2 consensus prediction server (53), and the 123D+ server (54). Each of these servers generated highly consistent sequence-structure alignments between λ-Int and Cre or XerD. To create a comparative three-dimensional model, the λ-Int CB sequence was manually threaded onto the crystal structures of XerD or Cre to match the sequence-structure alignments from three of these fold-recognition algorithms: FFAS, mGenTHREADER and 3D-PSSM. Threading was performed with Deep View Swiss-PdbViewer software (33,34), and the aligned structures were submitted to the Swiss-Model server (34) for model construction, including loop building and energy minimization. Because there was no apparent difference between the modeling results based upon each of the three alignments, the structure derived from the mGenTHREADER alignment to 4crx (Cre) was arbitrarily chosen for further analysis. The molecular coordinates of this model have been deposited with the PDB database under accession code 1m97.
However, a potential issue exists regarding the 4crx Cre-based model: although the effect is more pronounced for h1 than h5, both of these α-helices are extended away from the core of the Cre CB domain by dimerization interactions. This is in contrast to the XerD crystal structure, which does not demonstrate related dimerization contacts. In order to evaluate properly the hydrophobic core of the model, we wished to model the h1 helix packed against the core of the CB domain as seen in the XerD structure. Since XerD and Cre were found to be equally suitable as modeling templates, we created a hybrid structure with the XerD h1 atomic coordinates substituted for the h1 α-helix of Cre. This allowed us to combine the template features we desired from both structures: the Cre portion of the template structure simplified subsequent analysis of putative DNA-binding contacts, and the XerD portion contributed a more compact hydrophobic core. The hybrid template was constructed by superposing the h2–h4 helices of Cre and XerD (see RMS deviation above), and merging the atomic coordinates of the XerD h1 helix with the atomic coordinates of Cre (minus the h1 helix) into a single structure. The model created from this hybrid XerD–Cre template was used to prepare all figures in this study. Structural representations of the λ-Int model were created with the free molecular-graphics program PyMOL (55).
Superposition of the λ-Int model on the Cre–loxP co-crystal structure
Because energy minimization by the Swiss-Model server (34) primarily modifies the orientation of residue side chains, the backbone path of the λ-Int model remained highly similar to the Cre template structure. Thus, the λ-Int model could be precisely superposed with the Cre CB domain in the 4crx structure, analogous to superposition of Cre and XerD for determination of RMS deviation. The superposed structure allows direct evaluation of putative protein–DNA contacts between the λ-Int model and a core-type DNA site, as approximated by Cre binding to loxP. Since the DNA molecule in the λ-Int model is a fragment of loxP, this approach ignores potential sequence-dependent changes in the structure of the core-type DNA site, and assumes that a DNA molecule bound by the λ-Int CB domain will be in the same conformation as DNA bound by the asymmetric Cre dimer. Despite these caveats, the superposed structures provide a convenient method to identify residues that may participate in DNA-binding interactions without requiring complex molecular-docking simulations.
Prediction of DNA-binding contacts in the model
Amino acid residues located proximal to the DNA in the model were identified by inspection. Each amino acid side chain was examined to determine whether interactions with the DNA molecule were possible, and various side chain conformations of proximal residues were analyzed to determine whether these interactions could occur with the sugar–phosphate backbone or bases. For residues that may contain hydrogen bond donors or acceptors, both distance and geometric constraints were considered to determine whether a bond with the DNA was possible: separation of the donor and acceptor atoms was restricted to 2.195–3.300 Å, with the acceptor atom offset at a maximum angle of 90° from a line drawn between the donor and acceptor atoms. For non-polar side chains, residue conformations were examined to determine whether dipole–dipole interactions were possible through contacts at the van der Waals radii of the atoms.
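The distance part of the hydrogen-bond criterion can be illustrated as follows. This is a sketch of the 2.195–3.300 Å window only; the function name and coordinates are hypothetical, and the 90° angular constraint is deliberately not implemented because it depends on which neighboring atoms define the angle, which the text does not specify.

```python
import math

# Donor-acceptor separation limits quoted in the text, in angstroms.
MIN_SEPARATION = 2.195
MAX_SEPARATION = 3.300


def separation_allows_hbond(donor_xyz, acceptor_xyz):
    """Return True if the donor-acceptor distance falls within the window.

    Only the distance criterion is checked; the angular criterion is
    omitted from this sketch.
    """
    distance = math.dist(donor_xyz, acceptor_xyz)  # Euclidean distance
    return MIN_SEPARATION <= distance <= MAX_SEPARATION


# Hypothetical coordinates for a side-chain donor and a phosphate oxygen.
print(separation_allows_hbond((0.0, 0.0, 0.0), (0.0, 0.0, 2.9)))  # True
print(separation_allows_hbond((0.0, 0.0, 0.0), (0.0, 0.0, 4.0)))  # False
```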
RESULTS AND DISCUSSION
Structural similarity in the Cre and XerD CB domains
Various crystal structure models of a Cre dimer bound to loxP DNA (11–15) show the CB domain of Cre positioned on the opposite side of the DNA helix from where the larger CAT domain resides, in an open conformation resembling a crescent-shaped clamp (Fig. 1A). Within the Cre CB domain, an orthogonally crossed pair of α-helices (h2 and h4) likely contains sequence-specific loxP-binding determinants (12). Residues located on the surface of these two α-helices penetrate the DNA major groove, and are available for interactions with both the DNA sugar–phosphate backbone and bases.
Figure 1.
Comparison of XerD and Cre quaternary structures. (A) A portion of the Cre (4crx) crystal structure is shown (13); one of the Cre recombinases and most of the DNA molecule were omitted for clarity, except for 10 base pairs of loxP DNA bound by the CB and CAT domains. (B) The XerD (1a0p) structure is shown (16). In both panels, the five α-helices of the CB domains have been colored according to their N- to C-terminal order in the polypeptide chain, which is the same in both Cre and XerD: h1, dark blue; h2, light blue; h3, green; h4, yellow; h5, red. Note the orthogonally crossed helices of Cre, h2 (light blue) and h4 (yellow), in the major groove of the DNA. The two disordered segments of the XerD peptide backbone are represented by a dotted line: the turn between helices h3 and h4 is shown in gray, and the region analogous to h5 is shown in red. A gray cartoon arrow indicates a possible quaternary change (relative to Cre) undergone by XerD in the absence of DNA. For both structures, the CAT domains were positioned in similar orientations, with the active site Arg–His–Arg and Tyr residues shown as red and yellow sticks, respectively (note that the His residue of XerD is obscured by the yellow h4 helix). Since the C-terminal tails of these proteins are in different conformations, these regions are colored blue to distinguish them from the conserved portions of the CAT structure shown in gray. Also, note that the extreme end of the Cre tail has been omitted for space considerations. [Figure prepared with the program PyMOL (55).]
The structure of the XerD protein was solved in the absence of core-type DNA (Fig. 1B) (16). A comparison of only the crystallized CB domains of XerD and Cre is shown in Figure 2. Although these two protein domains share a related general structure, the central three helices, h2, h3 and h4, display the highest degree of spatial conservation. To quantify this similarity, the α-helical regions of these helices were superposed and the RMS deviation of peptide-backbone atoms was determined. Analysis of 188 backbone atoms within the α-helical regions of h2, h3 and h4 yielded a RMS deviation of 1.44 Å, showing that the core fold of the Cre and XerD CB domains is closely related. Since α-helices h2 and h4 of Cre comprise the orthogonally crossed structure noted by Guo et al. (12), this similarity suggests the analogous α-helices in XerD may participate in sequence-specific DNA binding.
When the entire Cre and XerD protein structures were compared, their CB and CAT domains displayed different quaternary relationships (Fig. 1A and B). In contrast to the Cre structure, the XerD CB domain existed in a ‘folded-over’ conformation relative to its CAT domain. In this compact, or ‘closed’ conformation, the orthogonally crossed α-helices of the CB domain are oriented away from the DNA-binding surface of the CAT domain. Although this relationship may be related to crystallization of the protein, it may also reflect an alternate structure formed in the absence of core-type DNA binding in vivo. Interestingly, the ‘closed’ conformation occludes the DNA-binding surface of the CAT domain, but leaves the orthogonally crossed helices of the CB domain solvent exposed.
The Cre and XerD CB domain structures differ in two additional locations: the relative positions of the h1 and h5 α-helices do not superpose. In the Cre structure, the C-terminal end of helix h5 is connected to a presumably flexible region that joins the CB and CAT domains. This region is disordered in XerD, where the ‘closed’ conformation appears to have formed at the expense of h5 α-helical structure (Fig. 1B). The second minor difference involves α-helix h1. Dimerization contacts between the two Cre monomers bound to loxP occur between the h1 α-helix of one Cre monomer and the h5 helix of the second, extending h1 away from the CB domain (Figs 1 and 2). Since XerD protein was crystallized as a monomer, a related dimerization contact was not observed.
Despite these minor structural differences, the Cre and XerD CB domains exhibit a closely conserved protein fold while sharing only 13.5% identical amino acids in the superposed regions. Although the DNA-binding and dimerization contacts described for Cre may be used to predict the amino acid determinants of related functions for the XerD CB domain, the differences between these structures should not be overlooked. Relative to Cre, the structure of unliganded XerD provides a glimpse of the quaternary flexibility available to the CB and CAT domains of these enzymes.
Psi-BLAST identification of CB domain sequences
The structural similarity between the Cre and XerD CB domains suggests that additional recombinases may be related. During the last year, the Pfam ‘Phage_integr_N’ family was defined (accession no. PF02899) (56), containing 210 sequences from the SWISS-PROT and TrEMBL databases (57). These sequences were identified and aligned using a profile hidden Markov model (HMM) encoded by the program HMMER (58). A profile HMM is a statistical description of a multiple-sequence alignment, in which a position-specific scoring matrix or ‘profile’ (59) is calculated for amino acid conservation and the presence of insertions or deletions (60,61).
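To make the idea of a position-specific profile concrete, the sketch below tabulates per-column residue frequencies from a toy alignment. It is a deliberate simplification and does not reproduce HMMER or Psi-BLAST: real profiles use log-odds scores, sequence weighting, pseudocounts and explicit insertion and deletion states, none of which are modeled here. The function name and the toy sequences are assumptions.

```python
from collections import Counter


def column_frequencies(alignment, gap="-"):
    """Per-column residue frequencies for equal-length aligned sequences.

    Gap characters are ignored when computing the frequencies of a column.
    """
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    width = len(alignment[0])
    if any(len(seq) != width for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for column in zip(*alignment):
        residues = [r for r in column if r != gap]
        counts = Counter(residues)
        total = len(residues)
        # An all-gap column yields an empty frequency table.
        profile.append({r: n / total for r, n in counts.items()} if total else {})
    return profile


# Toy alignment (hypothetical sequences, not taken from Figure 3).
toy = ["ACD", "ACE", "A-E"]
for position, freqs in enumerate(column_frequencies(toy), start=1):
    print(position, freqs)
```

A fully conserved column gives a single residue with frequency 1.0, whereas a variable column spreads the frequency across several residues.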
Database search algorithms that employ multiple-sequence profiles improve detection of distantly related sequences over methods that use a single query sequence (45,62–64). However, the advantage of profile-based methods may be greatest when an iterative search is conducted (45,64). Designed to facilitate iterative searching, the Psi-BLAST algorithm (35) detects related sequences at high specificity at least as reliably as methods based on HMMs (64). Similar to HMMER, Psi-BLAST uses a position-specific scoring matrix calculated from significant results in a standard BLAST search. Substitution of the profile for the query sequence in a second BLAST search enables detection of additional sequences related to the query. An advantage of Psi-BLAST is that subsequent iterations of BLAST can easily be performed with user-directed refinement of the profile after each round. This allows the sequences used for profile refinement to be carefully chosen, thereby potentially increasing the sensitivity of the search (65). Ultimately, the data set is said to reach convergence when no additional refinement of the profile is possible.
Since the ‘Phage_integr_N’ family was created from a single profile HMM search, we used Psi-BLAST to identify additional sequences in a user-directed iterative approach. Four search iterations identified 546 putative CB domain sequence homologs related to λ-Int without reaching convergence (the raw results are available as Supplementary Material). In an attempt to enhance specificity, sequences from possibly unrelated alignments were removed from each collection submitted for profile refinement (see Materials and Methods). Since almost every sequence identified was annotated as an ‘integrase’ or ‘recombinase’, we included sequences in profile refinement from the entire range of Psi-BLAST ‘expect’ values. Based on the frequency of unrelated alignments in the third iterative search, we estimate that 464 (85%) or more of the 546 sequences may encode functional CB domains from active recombinases.
Comparison of the Psi-BLAST and HMMER data sets shows that they are complementary, but unique; 95 (45%) of the HMMER identified sequences were not present among the entries from the fourth Psi-BLAST search. The combined identification of 559 potentially functional sequences from both methods established a more comprehensive sequence collection than either method alone. As of July 2002, the InterPro database (66) listed 301 sequences containing ‘Phage_integr_N’ CB domains (InterPro: IPR004107), and 747 containing ‘Phage_Integrase’ CAT domains (InterPro: IPR002104). The results of this study suggest that the number of CB domains associated with CAT entries is under-represented. This is probably due to increased difficulty in their detection arising from evolutionary divergence beyond that observed for the CAT domains.
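The counts quoted in this and the preceding paragraph appear to fit together as shown below. The reading that 559 is the sum of the estimated functional Psi-BLAST hits and the HMMER sequences absent from the Psi-BLAST results is an inference from the figures, not an explicit statement in the text.

```latex
\begin{align*}
0.85 \times 546 &\approx 464 && \text{(Psi-BLAST hits estimated to be functional)}\\
95 / 210 &\approx 0.45 && \text{(HMMER sequences absent from the Psi-BLAST results)}\\
464 + 95 &= 559 && \text{(combined, potentially functional sequences)}
\end{align*}
```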
mGenTHREADER sequence alignment
A potential limitation of the profile-based Psi-BLAST and HMMER search methods is that they do not account for higher order relationships between positions in the scoring matrix. In contrast, structure-based fold recognition or ‘threading’ methods address these relationships by calculating the solvation potential for each sequence position, and the pairwise contact potential for distal positions in the linear sequence that become proximal in a folded structure (reviewed in 67,68). Such methods have proven successful at detecting and aligning related protein folds in distantly related sequences (64,68–72). Because alignment of divergent sequences (<30% identical) may benefit when predictions are compared from multiple sequence- and structure-based methods (71,73,74), we compared the Pfam HMMER alignment with one generated using a structure-based method, mGenTHREADER (37,46,72). A small collection of 25 highly divergent CB domain sequences was examined, including sequences encoding several experimentally studied recombinases, and a select group of sequences identified by Psi-BLAST with ‘expect’ values ranging from 6 × 10⁻²⁰ to 0.81. A multiple-sequence alignment was manually assembled from a collection of 25 pairwise alignments to Cre or XerD constructed by mGenTHREADER (Fig. 3; also see Materials and Methods). Based upon these results, the λ-Int CB domain shares only 13.2% identical residues with XerD and 10.6% with Cre. The other sequences ranged from 7.3 to 16.4% identical to Cre.
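Percent identity values such as those quoted above can be computed from a pairwise alignment as sketched here. The choice of denominator is an assumption: this version counts only columns in which both sequences have a residue, and the text does not state which convention was used. The function name and the toy sequences are illustrative.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns in which both sequences contribute a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


# Toy alignment (hypothetical): 4 identical residues in 5 comparable columns.
print(percent_identity("ACDEFG", "ACDQ-G"))  # 80.0
```

Other conventions, such as dividing by the length of the shorter sequence or by the full alignment length, would give different values for the same alignment.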
Several conclusions regarding CB domain structure and function can be inferred from comparison of the HMMER and mGenTHREADER alignments. First, the sequence- and structure-based methods generally agree: differences were detected only in turn regions between α-helices. Secondly, the identification of greater than 550 sequences from viruses, archaea and eubacteria demonstrated widespread conservation of this domain, although no complete eukaryotic sequences were identified by either the HMMER or Psi-BLAST searches. Thirdly, a consistent set of conserved positions was identified by both alignment methods, though additional positions are highlighted in Figure 3 due to the smaller number of aligned sequences. Fourthly, both alignments were consistent with the secondary structure observed for Cre and XerD in their respective crystal structures. The results were also consistent with a PSIPRED secondary-structure prediction for each of the 25 proteins in Figure 3 (data not shown). Fifthly, the alignment showed that conserved positions are composed primarily of hydrophobic residues that correspond to buried positions in the Cre crystal structure. However, two highly conserved polar residues (λ-Int T96 and S139) were discovered. Lastly, proline and glycine residues primarily clustered within the weakly conserved turn regions.
In agreement with the Cre and XerD structures, the variable nature of the turn regions suggests they may lie at the protein surface. These divergent regions may contribute to specialized functions for each recombinase, such as DNA-binding specificity, dimerization, or interactions with accessory protein factors. For example, the Cre–loxP co-crystal structure indicates that residues located at the beginning of h2 and h4 may be involved in core-type DNA binding. This region of h2 contains a position in λ-Int and HK-Int (N99 and D99, respectively) that was predicted to participate in core-type DNA-binding specificity (26,29–31, see below). Supporting these predictions, the HMMER and mGenTHREADER alignments showed that this non-conserved position exists within a conserved structural element located at the Cre protein–DNA interface.
Comparative modeling of the λ-Int CB domain
Comparative modeling is useful for relating patterns observed among a small number of structures to the generalized functions of a large protein family (for recent reviews, see 71,73–77). Recent improvements in comparative modeling make it practical for homology-based assignment of biological function to unknown proteins, and structure-based design of novel pharmaceuticals (71). To create a comparative model, sequence- or structure-based methods are first used to identify a suitable template structure for an unknown target sequence. After a sequence-structure alignment is created, a three-dimensional structure can be constructed by modeling residues of the target sequence onto the template structure, while adjusting the protein backbone to accommodate insertions or deletions in the target sequence.
Because of the large volume of experimental data related to the λ-Int protein, we wished to model the three-dimensional structure of the λ-Int CB domain. The probable evolutionary relationship between CB domains evident in the mGenTHREADER alignment (Fig. 3) suggested that Cre and XerD were suitable template structures. However, the limiting factor in modeling divergent proteins is correct alignment of the target sequence to the template structure (73,76–78). The two most common causes of misalignment are insertions and deletions in the protein backbone, and differing structural contexts surrounding the region of interest (73). The mGenTHREADER alignment shows that major insertions or deletions are not found between the Cre, XerD and λ-Int CB sequences. Based upon our structural comparison of Cre and XerD, their CB domains also share closely related quaternary contexts in each recombinase.
Despite these positive signals, we used multiple fold-recognition algorithms to empirically determine which proteins in the PDB database could be used as modeling templates for the λ-Int CB domain (see Materials and Methods). All searches consistently identified either XerD (PDB no. 1a0p) or Cre (PDB nos 1–5crx, 1drg, 1kbu) as the most suitable template structure (data not shown). Since different fold-recognition methods may succeed where others fail (79), we examined the sequence-structure alignments generated by each algorithm for consistency. Matches to proteins other than Cre or XerD were typically characterized by low confidence scores, highly gapped alignments, and inconsistency, suggesting they did not reflect structural homology. In contrast, each algorithm generated alignments between λ-Int and Cre or XerD that were highly consistent. This result encouraged us to construct and examine a full atom model of the λ-Int CB domain (Fig. 4, PDB no. 1m97; also see Materials and Methods).
Figure 4.
The λ-Int CB domain model. The structure of the λ-Int CB domain model is shown bound to a fragment of loxP DNA derived from the Cre template structure (PDB no. 4crx). The protein backbone is shown in a cartoon representation, with each of the α-helices colored according to their order of succession in the linear polypeptide chain as in Figures 1–3: h1, dark blue; h2, light blue; h3, green; h4, yellow; h5, red. Each panel shows one of three orthogonal views of the model: (A) front; (B) side; (C) top. The molecular coordinates of the model, including the CB domain and bound loxP DNA, have been deposited with the PDB database under accession code 1m97. [Figure prepared with the program PyMOL (55).]
Currently, comparative models often approximate the true fold of the target sequence, but are rarely closer to that structure than the template (77). Thus, they are not replacements for experimentally determined structures, and must be evaluated with care if they are to be used for interpretation of structure–function relationships or to assist experimental design. Because of these limitations, we wish to clearly point out regions where the fine structure of the model may be less accurate. First, modeling of loop regions greater than five residues in length is not generally considered reliable (74). Although the regions between h1–h2 and h3–h4 are five residues long, the loops between h2–h3 and h4–h5 are nine residues long. Secondly, the mGenTHREADER alignment (Fig. 3) showed that the h1 and h5 α-helices contain some of the least conserved elements of the CB domain. Thus, the fine structure of helices h1 and h5 and the loops between h2–h3 and h4–h5 should be interpreted with caution. However, it is still probable that the coarse shape of the model is correct in these regions.
We attempted to validate the model by evaluating its intrinsic structural properties, and correlating the structure with experimental results (76). First, stereochemical analysis of amino acid residue geometries was performed with PROCHECK (80) and deviation from standard atomic volumes was examined with PROVE (81), both of which yielded acceptable scores (data not shown). Secondly, to determine whether the model is consistent with the HMMER and mGenTHREADER alignments (Fig. 3), we examined the three-dimensional location of conserved hydrophobic residues that were identified in these alignments. As expected, almost all of the 26 conserved hydrophobic side chains are oriented toward the oily protein interior, where they may participate in hydrophobic core-packing interactions (Fig. 5). Thirdly, we examined the putative DNA-binding surface defined by the orthogonally crossed helices h2 and h4. When the electrostatic potential of the λ-Int CB model was calculated, a positively charged surface was observed that could interact with core-type DNA in a manner similar to Cre (provided as Supplementary Material). Fourthly, to correlate the model with experimental evidence, we analyzed the three-dimensional position of important amino acid residues identified in λ-Int and HK-Int during studies of function. As discussed in detail below, we found that the model is consistent with almost all available genetic and biochemical evidence related to this domain. Collectively, these results suggest that the model presents a valid facsimile of λ-Int CB domain structure.
Figure 5.
The location of conserved amino acid residues in the λ-Int CB model. The location of conserved amino acid residues identified in the mGenTHREADER alignment (Fig. 3) is illustrated in the λ-Int CB domain model. Residues are shown in a space-filling representation, colored according to the conserved chemical property of that position in the alignment: hydrophobic residues are blue, polar residues are green and charged residues are red. The molecular surface of the model was computed from the van der Waals radii of surface exposed atoms, and is shown in translucent gray. The backbone of 10 core-type DNA base pairs bound by the model is shown in magenta. In an attempt to present the entire surface of the protein in profile, each panel shows one of three orthogonal views of the model: (A) front; (B) side; (C) top. These views are almost identical to those presented in Figure 4. [Figure prepared with the program PyMOL (55).]
Prediction of DNA-binding contacts
To identify potential DNA-binding contacts within the λ-Int CB domain, the model was superposed with the Cre CB domain from the 4crx crystal structure (see Materials and Methods). Amino acid residues in the CB model whose side chains are in proximity to the DNA molecule (<5 Å) are listed in Table 1. Since the orthogonally crossed helices h2 and h4 penetrate the major groove, many contacts are possible along their length. In Table 1, potential bonding interactions are indicated for each residue; few side chains appear likely to interact directly with the DNA bases, although many contacts with the DNA backbone appear possible. Other work has suggested that DNA contacts may change during different λ-Int recombination pathways (82–85). Although the CB domain model presents one possible set of static interactions with the DNA, the precise nature of these interactions may change within each unique recombination pathway. Nevertheless, a lack of direct contacts with the DNA bases does not diminish the role of the CB domain in specific DNA binding; both water-mediated hydrogen-bonding networks and indirect contacts are believed to play a large role in the specificity of protein–DNA binding interfaces (86–88). Furthermore, the Cre protein is believed to bind loxP DNA in a similar fashion, with the majority of protein contacts made to the DNA backbone (12). Although many of the residues in Table 1 may participate in core-type DNA binding, some of them are probably dispensable despite their proximity to the DNA. For example, substitution of a Cre side chain (Q90) that makes a hydrogen bond to a DNA base in the co-crystal structure did not significantly alter DNA binding by Cre in vivo (89).
Table 1. Amino acid residues in proximity to the DNA.
| Residue | Locationa | DNA interactionb: bases | DNA interactionb: backbone | Distancec (Å) |
|---|---|---|---|---|
| R90 | t1–2 | – | h | 3–4 |
| I92 | t1–2 | – | V | 3–4 |
| K93 | t1–2 | H | H | 1–2 |
| K95 | h2 | H* | H | 2–3 |
| T96 | h2 | – | H | 2–3 |
| I98 | h2 | V | – | 1–2 |
| N99 | h2 | H* | – | 3–4 |
| S102 | h2 | h | H | 3–4 |
| K103 | h2 | h | H | 3–4 |
| K105 | h2 | – | H | 3–4 |
| R109 | h2 | – | H | 2–3 |
| Y131 | h3 | – | H | 2–3 |
| K136 | t3–4 | – | H | 2–3 |
| A138 | h4 | – | v | 3–4 |
| S139 | h4 | – | H | 2–3 |
| K141 | h4 | H | – | 2–3 |
| L142 | h4 | V | – | 2–3 |
| R144 | h4 | h | h | 4–5 |
| S145 | h4 | h | h | 4–5 |
| S148 | h4 | – | h | 4–5 |
| D149 | h4 | – | H | 3–4 |
| R152 | h4 | – | H | 2–3 |
| E153 | h4 | – | h | 4–5 |
| R169 | h5 | – | h | 3–4 |
aThe secondary structure element containing each residue is indicated: h2, helix 2; h3, helix 3; h4, helix 4; h5, helix 5; t1–2, turn between h1 and h2; t3–4, turn between h3 and h4.
bHydrogen bonds or van der Waals contacts with the DNA bases or backbone that were observed in the model are indicated by an upper-case ‘H’ or ‘V’, respectively. Similar interactions that require minor changes in the spatial relationship between the model and DNA are indicated by lower-case ‘h’ and ‘v’. Combinations unlikely to result in a bond are shown with a ‘–’. Observed bidentate or complex hydrogen bonding interactions are indicated by ‘H*’. Bidentate or complex interactions consist of two simultaneous hydrogen bonds from one residue side chain to a base or base pair, or to two bases in adjacent base steps, respectively (see Fig. 7). The assignment of these values is based upon the proximity of each residue to the structural features of the DNA molecule, and the distance and geometric constraints required for bond formation (see Materials and Methods).
cThe approximate distance interval that atoms in the amino acid side chain are separated from atoms in the DNA molecule at their closest point.
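The proximity criterion behind Table 1 (side-chain atoms within 5 Å of the DNA, reported as a 1 Å distance interval) can be illustrated with a short Python sketch. The function names and the toy coordinates are hypothetical; the actual analysis was performed on the superposed model and 4crx coordinates, and the bond assignments in the table additionally depend on geometric constraints that are not reproduced here.

```python
import math
from typing import Dict, Iterable, List, Tuple

Atom = Tuple[float, float, float]  # Cartesian coordinates in Å


def min_distance(side_chain: Iterable[Atom], dna: Iterable[Atom]) -> float:
    """Closest approach between any side-chain atom and any DNA atom."""
    dna_atoms = list(dna)
    return min(math.dist(a, b) for a in side_chain for b in dna_atoms)


def distance_interval(d: float) -> str:
    """Report a distance as a 1 Å interval, e.g. 2.5 -> '2–3'."""
    lower = math.floor(d)
    return f"{lower}\u2013{lower + 1}"


def residues_near_dna(side_chains: Dict[str, List[Atom]],
                      dna: List[Atom],
                      cutoff: float = 5.0) -> Dict[str, str]:
    """Return residues whose side chains approach the DNA within the cutoff."""
    hits: Dict[str, str] = {}
    for name, atoms in side_chains.items():
        d = min_distance(atoms, dna)
        if d < cutoff:
            hits[name] = distance_interval(d)
    return hits


# Toy coordinates (hypothetical; not taken from the model).
dna = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
side_chains = {
    "RES_A": [(0.0, 2.5, 0.0), (0.0, 6.0, 0.0)],  # closest atom is 2.5 Å away
    "RES_B": [(0.0, 9.0, 0.0)],                   # beyond the 5 Å cutoff
}
hits = residues_near_dna(side_chains, dna)
```

Only the closest atom pair determines whether a residue is listed, so a long side chain can qualify even when its backbone atoms are far from the DNA.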
Analysis of λ-Int and HK-Int residues identified in genetic assays
Mutations seldom adversely affect a single function of the λ-Int protein (32), probably because amino acid residues frequently play multiple roles. Thus, it is often beneficial to consult a structural model when interpreting complex phenotypes. Although not previously possible for mutations in the CB domain of λ-Int, the model allowed us to revisit and examine residues previously implicated in recombinase function during three independent genetic studies (summarized below and in Table 2).
Table 2. Amino acid substitution mutations in the λ-Int and HK-Int CB domains.
| Group: specific function or defecta | Substitution or positionb | Reference |
|---|---|---|
| Genetic evidence: λ | ||
| Group 1: pleiotropic attL defect | L80F, I104N, P112F, D149V, E153K, A154T, A154V, G158S, H159D, T168N | 32 |
| Group 2: CB-specific attL defect | K93E, T96I, T120I, S139L, R144K | 32 |
| Group 3: recombination defect | R108S | 90, this work |
| Group 4: recombination specificity | N99D | 29–31 |
| Genetic evidence: HK022 | ||
| Group 5: CB specificity | D99N, D99A, T146K, D149R | 26 |
| Biochemical evidence: λ | ||
| Group 6: chemical protection | K103 | 27 |
| Group 7: crosslinking | A125, A126 | 27 |
| Group 7: crosslinking | K141 | 28 |
aResidues identified in genetic and biochemical assays for their role in λ-Int and HK-Int protein structure and function. Group numbers were assigned to distinguish between related origins and phenotypes.
bAmino acid positions also identified in Table 1 for their putative role in DNA binding are underlined.
Our laboratory isolated and characterized a diverse collection of mutant λ-Int proteins defective for excision (32). To identify amino acid substitutions that alter core-type DNA binding and catalytic function, the mutants were screened to remove those that severely affect arm-type DNA binding. Here we discuss 15 mutants from that study containing substitutions in the CB domain. The mutant proteins were previously assayed for recombination, Holliday junction resolution, topoisomerase activity, arm-type DNA binding and attL intasome formation (32).
Enquist and Weisberg (90) isolated recombination-defective λ-Int mutants using the red-plaque test. We have recently determined that one of the integration-defective mutants they isolated contains an R108S substitution in the CB domain. Challenge-phage assays similar to those conducted by Han et al. (32) demonstrated that the R108S mutant is defective for attL intasome formation in vivo, although it still interacts properly with arm-type DNA sites (B.Swalla, unpublished data).
Several substitution mutations have been isolated in the λ-Int and HK-Int CB domains that affect the specificity of core-type DNA binding (26) and recombination (29,31). Three of these substitutions were found to relax core-type DNA-binding specificity in vivo. Although the mechanism by which these substitutions act is not known, these residues were postulated to interact directly with the core-type DNA site.
To correlate these experimental results with the λ-Int CB model, we asked whether the phenotype of each substitution could be satisfactorily explained by the model. First, the local structure surrounding each substituted amino acid residue was examined. We then proposed a role for the function of the wild-type residue in the λ-Int and HK-Int proteins, and tried to explain the mechanism responsible for the phenotype of each mutant protein. These predictions are discussed in detail below, and summarized in Table 3.
Table 3. λ-Int and HK-Int substitution mutation phenotypes.
| Residue | Groupa | Exb | HJb | Topob | P′1b | P′123b | attLb | Locationc | Wild-type functiond | Reference |
|---|---|---|---|---|---|---|---|---|---|---|
| WT | n/a | +++ | + | + | ++ | ++ | ++ | |||
| L80F | 1 | + | + | + | ++ | + | – | h1 | α-Helix packing, dimerization | 32 |
| K93E | 2a | + | + | + | ++ | ++ | – | t1-2 | H-bond DNA | 32 |
| T96I | 2b | – | +/– | – | +++ | +++ | – | h2 | H-bond DNA | 32 |
| N99D | 4 | | | | | | | h2 | H-bond DNA | 29,31 |
| D99A | 5 | + | ++ | ++ | ++ | h2 | H-bond DNA | 26 | ||
| D99N | 5 | +++ | +++ | ++ | ++ | h2 | H-bond DNA | 26 | ||
| K103 | 6 | | | | | | | h2 | H-bond DNA | 27 |
| I104N | 1 | + | + | + | ++ | + | – | h2 | α-Helix packing | 32 |
| R108S | 3 | – | ++ | ++ | – | h2 | Interaction surface | 90, this work | ||
| P112F | 1 | + | + | + | ++ | + | – | t2-3 | Turn | 32 |
| T120I | 2c | – | + | + | ++ | ++ | – | t2-3 | Interaction surface | 32 |
| A125 | 7 | | | | | | | h3 | Interaction surface? | 27 |
| A126 | 7 | | | | | | | h3 | Interaction surface? | 27 |
| S139L | 2b | + | + | – | + | +++ | +/– | t3-4 | H-bond DNA | 32 |
| K141 | 7 | | | | | | | h4 | H-bond DNA | 28 |
| R144K | 2a | + | + | – | ++ | ++ | – | h4 | H-bond DNA | 32 |
| T146K | 5 | ++ | + | ++ | ++ | h4 | α-Helix packing | 26 | ||
| D149R | 5 | + | + | ++ | ++ | h4 | H-bond DNA | 26 | ||
| D149V | 1 | + | + | + | ++ | + | – | h4 | H-bond DNA | 32 |
| E153K | 1 | + | +* | + | ++ | + | +/– | h4 | α-Helix packing | 32 |
| A154T | 1 | – | +/– | – | ++ | + | – | h4 | α-Helix packing | 32 |
| A154V | 1 | – | +/– | – | ++ | + | – | h4 | α-Helix packing | 32 |
| G158S | 1 | – | + | + | ++ | + | – | t4-5 | Turn | 32 |
| H159D | 1 | – | + | – | ++ | + | – | t4-5 | α-Helix packing | 32 |
| T168N | 1 | + | +* | + | ++ | + | – | h5 | α-Helix packing, dimerization | 32 |
aThe numerical group designations are defined in Table 2. Group 2 mutants were divided into three subgroups based upon their arm-type DNA-binding activity and location within the CB domain model: (2a), no effect on arm-type binding and located proximal to the core-type DNA-binding surface; (2b), enhanced arm-type binding and located proximal to the core-type DNA-binding surface; (2c), no effect on arm-type binding and located distal to the core-type DNA-binding surface.
bThe results of various assays are listed in columns 3–8. Results with the wild-type protein in each assay are shown in the first row. The number of ‘+’ symbols is used to indicate the level of activity measured relative to the wild-type protein in each assay. The assays and results were previously described by Han et al. (32); Ex, excisive recombination assays in vivo; HJ, Holliday junction resolution assays in vitro (the ‘+*’ symbol indicates an altered resolution bias was measured); Topo, topoisomerase activity in vitro; P′1, challenge-phage assays measuring the affinity of binding to a single P′1 arm-type site in vivo; P′123, challenge-phage assays measuring the affinity of cooperative binding to the three contiguous arm-type sites, P′1, P′2 and P′3, in vivo; attL, challenge-phage assays measuring formation of an attL intasome in vivo (see text).
cLocation: the secondary structure element containing each residue is indicated: h1, helix 1; h2, helix 2; h3, helix 3; h4, helix 4; h5, helix 5; t1–2, turn between h1 and h2; t2–3, turn between h2 and h3; t3–4, turn between h3 and h4; t4–5, turn between h4 and h5.
dWild-type function, the predicted function of the indicated residue in the wild-type protein.
Three Int-mediated functions are required for stabilization of the λ attL intasome in vivo: arm-type DNA binding, core-type DNA binding, and cooperative interactions between Int proteins (82,91–94). Recently, work by Sarkar et al. (95) identified an important relationship between the arm binding (AB) and CB domains that suggests a fourth function: allosteric modulation of core-type binding by the AB domain. All 15 of the mutant proteins isolated by Han et al. (32) were defective at forming the attL intasome in vivo (Table 3). Based upon their reported arm-type DNA-binding phenotypes, these 15 mutants can be divided into two classes: 10 pleiotropic mutants that show defects in functions mediated by both the AB and CB domains (Table 2, group 1), and five specific mutants that show defects only in CB domain functions (Table 2, group 2).
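The division into pleiotropic and CB-specific classes can be reproduced from the P′123 column of Table 3. The sketch below assumes a simple rule, namely that a mutant is pleiotropic when its cooperative arm-type binding falls below the wild-type level; this rule is our reading of the tabulated data rather than a stated criterion, and the function names are hypothetical.

```python
# P′123 challenge-phage results for the 15 Han et al. mutants (Table 3).
WT_P123 = "++"
p123 = {
    "L80F": "+", "K93E": "++", "T96I": "+++", "I104N": "+", "P112F": "+",
    "T120I": "++", "S139L": "+++", "R144K": "++", "D149V": "+", "E153K": "+",
    "A154T": "+", "A154V": "+", "G158S": "+", "H159D": "+", "T168N": "+",
}


def activity(symbols: str) -> int:
    """Convert a '+' score into a number by counting the '+' symbols."""
    return symbols.count("+")


def classify(symbols: str, wild_type: str = WT_P123) -> str:
    """Assumed rule: reduced cooperative arm-type binding means pleiotropic."""
    if activity(symbols) < activity(wild_type):
        return "pleiotropic"
    return "CB-specific"


classes = {mutant: classify(score) for mutant, score in p123.items()}
pleiotropic = sorted(m for m, c in classes.items() if c == "pleiotropic")
cb_specific = sorted(m for m, c in classes.items() if c == "CB-specific")
```

Under this rule the 10 group 1 and five group 2 mutants listed in Table 2 are recovered; the further split of group 2 into subgroups 2a, 2b and 2c also requires each residue's location in the model and is not attempted here.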
Pleiotropic mutants
The pleiotropic nature of the group 1 substitutions suggests that some global property of the protein has been altered. In challenge-phage DNA-binding assays (Table 3), they bound the P′1 site at wild-type levels, but were defective for cooperative binding at the adjacent P′123 sites, indicating that higher order interactions may have been perturbed. When examined in the mGenTHREADER alignment (Fig. 3), the group 1 residues corresponded to conserved positions that are presumably important for core domain structure. In the model, 8 of these 10 mutations probably affect packing interactions between α-helices (Fig. 6A; L80F, I104N, D149V, E153K, A154T, A154V, H159D, T168N), and two appear to affect stabilization of inter-helix turns (Fig. 6A; P112F, G158S). Comparison of the CB domain model with Cre suggests that two of these residues (L80F, T168N) may also participate directly in dimerization contacts between CB domains. One of these substitutions (Fig. 6A; D149V) also removes a hydrogen bond with the DNA backbone in the model, while a second, E153K, may introduce a novel hydrogen bond.
Figure 6.
Location in the model of residues important for λ-Int structure and function. Residues important for λ-Int structure and function are shown in a space-filling representation colored according to their mutant phenotype class, or the experiments in which they were identified (Tables 2 and 3; also see text). In each panel, the model is shown in the same orientation as seen in Figure 4A. The backbone of the λ-Int model is shown in a gradient from blue to red, illustrating the N- to C-terminal progression of residues within the polypeptide chain. The color of the protein backbone approximates the colors of the five CB domain α-helices as shown in Figure 4. The backbone of the core-type DNA molecule is shown in magenta. In (A) the residues are shown as follows: purple, group 1 residues that may affect a global property of the CB domain, such as folding; red, the unique group 1 residue that participates in DNA binding (also note that the E153K substitution may create a novel interaction with the DNA backbone); green, group 1 residues that may be important for dimerization during core-type DNA binding. In (B) the residues are shown as follows: red, group 2a residues that appear to have specific effects on DNA binding; cyan, group 2b residues that appear to have specific effects on DNA binding and enhance cooperative arm-type binding; green, group 2c and group 3 residues that may be involved in inter- or intra-protein interactions; yellow, group 4 and group 5 residues from λ-Int or HK-Int that are important for core-type DNA binding and recombination specificity; blue, group 6 and group 7 residues identified in photo-crosslinking and chemical modification experiments. [Figure prepared with the program PyMOL (55).]
CB domain-specific mutants
The five group 2 mutant proteins isolated by Han et al. (32) interact properly with arm-type DNA sites; they appear to affect a function of the CB domain required for attL intasome formation. Based upon their arm-type DNA-binding phenotype and location in the model, these proteins can be divided into three sub-groups (Table 3). Two of the mutations affect residues that are located proximal to the core-type DNA-binding surface, and have no detectable effect upon arm-type binding (group 2a; K93E, R144K). Two other substitutions are also located proximal to the core-type DNA, and display enhanced cooperative binding at arm-type DNA sites (group 2b; T96I, S139L). The remaining mutant also has no effect on arm-type DNA binding, but is located distal to the core-type DNA (group 2c; T120I).
In the model, three of the four residues from groups 2a and 2b form a hydrogen bond with the DNA sugar–phosphate backbone or bases (Fig. 6B; K93, T96, S139). Although the remaining residue, R144, is beyond the requisite distance for hydrogen bond formation, minor changes in the spatial relationship between the model and the DNA may permit R144 to interact with the DNA directly. Alternatively, R144 may play an indirect role in DNA binding by ‘buttressing’ a neighboring amino acid side chain (such as K141, see below) that contacts the DNA. Similar interactions have been proposed during IHF binding at the H′ site (96). In any case, it appears that the four group 2a and 2b substitutions have affected core-type DNA binding directly, thereby decreasing the efficiency of attL intasome formation.
This study yielded several interesting correlations regarding the two group 2b residues, T96 and S139 (Fig. 6B). In the mGenTHREADER alignment, they corresponded respectively to the best and third-best conserved positions in the entire alignment. In the model, each of these two residues is located at the end of the prominent DNA binding α-helices, h2 and h4, where they both form hydrogen bonds to the DNA backbone. Located on opposite sides of the DNA molecule, these interactions comprise a DNA-binding motif with 2-fold radial symmetry. Interestingly, substitution of these residues with hydrophobic side chains (T96I and S139L) enhanced cooperative binding to the P′123 arm-type DNA sites in both cases. However, these substitutions are distinguished by their effect upon binding to a single P′1 site; T96I enhanced binding while S139L reduced activity.
The group 2c mutant, T120I, shares the same phenotype as the group 3 mutant, R108S, originally isolated by Enquist and Weisberg (90); both proteins are defective for attL intasome formation in vivo and have no effect on arm-type DNA binding. Based upon this phenotype, we predicted these residues would lie at the protein–DNA interface similar to the group 2a residues. Instead, they were found distal to the DNA in the model (Fig. 6B). Although their location suggests they do not interact with the DNA directly, they may still affect attL intasome formation by altering intra-protein or inter-protein contacts. For example, the region containing R108 and T120 may constitute a docking surface between the AB and CB domains, or between Int monomers within an intasome. In the model, both residues are surface exposed, suggesting that they may participate directly in this interaction.
This prediction becomes more interesting when an additional, unusual phenotype of the R108S mutation is considered: the mutant protein is more defective for integration than excision (90). Since integration requires higher levels of Int activity than excision (97), this recombination phenotype was previously suggested to result from a general decrease in protein stability, rather than a functional defect specific to integration (97). However, among the few described mutant proteins with this phenotype (a generous gift from Dr Robert Weisberg), we found that only the mutant protein containing R108S interacts properly with the P′123 arm-type DNA sites and appears as stable as the wild-type protein in vivo (data not shown). The location of this residue in the model suggests that it may participate in a surface function that is more important for integration than excision.
λ-Int and HK-Int core-type binding specificity
Although substitutions isolated in HK-Int at positions 99, 146 and 149 were suggested to alter residues in direct contact with core-type DNA, the strongest evidence pointed to position 99 (Table 2, groups 4 and 5) (26,29–31). A non-conserved position between λ-Int and HK-Int, residue 99 was first implicated in λ and HK022 recombination specificity by Dorgai et al. (29) and Yagil et al. (31). In these studies, a substitution of the λ-Int residue with the HK-Int residue at this position, N99D, increased recombination of HK022 core-type DNA sites by the mutant λ-Int. Later work suggested that position 99 contributes directly to the specificity of HK-Int core-type DNA binding (26). In the mGenTHREADER alignment, position 99 corresponds to a Cre residue at the core-type DNA-binding interface, M25. In the λ-Int CB model, N99 projects deeply into the major groove of the DNA, such that it may form two simultaneous hydrogen bonds with the DNA bases (Figs 6B and 7). Since bidentate or complex bonding interactions such as these are believed to be particularly important determinants of DNA-binding specificity (86), the location of position 99 within the model strongly supports its direct role in the specificity of core-type DNA binding and recombination.
Figure 7.
Potential hydrogen bonds between N99, K103, K141 and core-type DNA. The five α-helices of the λ-Int CB domain model are presented as cartoon ribbons colored as in Figures 1–4. The backbone of a DNA molecule containing the loxP site is shown as a magenta cartoon. The h2 (light blue) and h4 (yellow) α-helices of the CB model are prominently visible in the major groove of the DNA. Dotted yellow lines represent actual hydrogen bonds between the DNA base pairs shown, while dotted green lines represent putative hydrogen bonds between the residues of h2 or h4 and the DNA. For clarity, the loxP DNA has been resected to reveal the G base corresponding to the λ core-type DNA position that was crosslinked to K141 at high efficiency when substituted with 4-thio-deoxythymidine (28). A potential hydrogen bond is shown between the side chain of K141 (red) and the N7 atom of this loxP guanine base (blue). Possible hydrogen bonds are also shown between the side chains of N99 (purple) or K103 (orange) and the DNA bases or backbone, respectively. The N99 residue may form simultaneous hydrogen bonds to the O4 atom of thymine and the N6 atom of adenine. The K103 residue may hydrogen bond to a backbone oxygen atom (not illustrated). [Figure prepared with the program PyMOL (55).]
Previously, one could not determine whether the remaining two substitutions in the CB domain of HK022 Int, T146K and D149R (Table 2, group 5) affected residues that are involved directly in core-type DNA binding (26). Unlike position 99, these residues are identical in both λ-Int and HK-Int, suggesting that they are not important specificity determinants. However, both substitutions were found to increase the affinity of the mutant proteins for core-type DNA. We proposed previously that substitution of positively charged side chains at these positions could create new bonds with the sugar–phosphate backbone. Consistent with this prediction, both positions are in proximity to the DNA in the model, such that the T146K and D149R substitution mutations may interact with the DNA backbone (Fig. 6B).
Analysis of λ-Int residues identified in biochemical assays
Pyridoxal-5′-phosphate (PLP) modification studies performed by Tirumalai et al. (27) indicated that K103 of λ-Int is at the CB DNA interface. In agreement with this result, the model predicted that K103 lies in the major groove, on the solvent-exposed surface of the orthogonally crossed helix, h2 (Figs 6B and 7). Upon binding to core-type DNA, the model showed K103 deeply buried in the protein–DNA interface, where it appears solvent inaccessible. Examination of other lysine residues suggests that they remain solvent exposed in the presence of bound DNA, supporting the unique identification of K103 for its resistance to PLP modification.
Recently, photo-crosslinking of λ-Int to a 4-thio-deoxythymidine substituted core-type DNA site identified K141 at the protein–DNA interface (28). In agreement with this result, the model showed that K141 lies in the major groove of the DNA, where it may form hydrogen bonds with the DNA backbone or bases (Fig. 6B). The model also showed that K141 could form a hydrogen bond to the loxP DNA base at the same position as the substituted nucleotide that crosslinked most efficiently with K141 (Fig. 7). The proximity of K141 to this DNA base argues strongly in favor of the model.
Zero-length photo-crosslinking was used to predict that A125 and A126 were at the DNA-binding surface of the CB domain (27). Contrary to this result, these residues are distal to the DNA in the model, and located instead near the putative interaction surface defined by R108S and T120I. The crosslinking of A125 and A126 is the only experimental result that cannot be correlated satisfactorily with the model, unless an alternative interpretation of the crosslinking result is suggested. For example, perhaps the CB domain contains a secondary, non-specific DNA-binding surface proximal to A125 and A126. The electrostatic potential of the λ-Int model shows that positively charged surface patches exist near these alanine residues (Supplementary Material). A DNA molecule could be presented to this surface while bound by a second CB domain in a multimeric complex.
Are prokaryotic and eukaryotic CB domains related?
Within the tyrosine family, enzymes may be divided into prokaryotic and eukaryotic subfamilies based upon structural and functional characteristics. First, the two groups are distinguished by their preference for active site assembly in cis or in trans (18,98–108). Secondly, the eukaryotic Flp CAT domain is the most divergent from the four other prokaryotic CAT structures that have been reported (18). Thirdly, the structures of the prokaryotic Cre and eukaryotic Flp recombinases illustrate major structural differences in their CB domain folds (18). In this study, neither the Psi-BLAST nor fold-recognition algorithms we employed suggested an evolutionary relationship between the CB domains of Flp and Cre or XerD.
To ascertain directly whether the Flp CB domain shares homology with any known prokaryotic or archaeal sequences, Psi-BLAST and mGenTHREADER searches were performed with the N-terminal 150 amino acids of Flp protein. These searches identified nine distantly related protein sequences from Saccharomycetaceae that were annotated as Flp recombinases (data not shown). Due to the number of prokaryotic sequences available and the low degree of sequence identity required for their detection by these methods, this negative result suggests that the Flp CB domain does not have common structural homologs of prokaryotic origin. The absence of an evolutionary relationship between the prokaryotic/archaeal and eukaryotic recombinase families further distinguishes between them, and suggests that the Flp protein reflects a distantly related subgroup of the tyrosine recombinase family.
CB domain evolution
Our results suggest a conserved structural and functional role for the CB domain in site-specific recombination. Evolutionary pressure may exist for the CB and CAT domains to evolve as a discrete functional unit, in which the structure of both domains is closely linked to their coordinated function during recombination. Conservation of key structural elements, such as the conserved residues in Figures 3 and 5, is probably required for maintenance of the core protein scaffold that enables the shared functions of recombination, including DNA binding and DNA strand exchange. In contrast, evolution probably tolerates changes in the structures that mediate DNA-binding specificity and affinity, inter-protein and inter-domain contacts, and interactions with external accessory factors. Such tolerance appears to have allowed greater divergence among CB domains than among CAT domains. Nevertheless, two extraordinarily conserved polar residues were identified in proximity to the DNA: T96 and S139 of λ-Int. It is tempting to speculate that the 2-fold radially symmetric motif defined by these residues is required not simply for DNA binding, but may also play an important mechanistic role during DNA strand exchange and recombination.
SUPPLEMENTARY MATERIAL
Supplementary Material is available at NAR Online.
ACKNOWLEDGEMENTS
The authors would like to thank Dr Gary Olsen, Robert Kazmierczak and Brian Matlock for helpful discussions and comments on the manuscript. This work was supported by NIH grant GM28717.
PDB no. 1m97
REFERENCES
- 1. Nunes-Düby S.E., Kwon,H.J., Tirumalai,R.S., Ellenberger,T. and Landy,A. (1998) Similarities and differences among 105 members of the Int family of site-specific recombinases. Nucleic Acids Res., 26, 391–406.
- 2. Esposito D. and Scocca,J.J. (1997) The integrase family of tyrosine recombinases: evolution of a conserved active site domain. Nucleic Acids Res., 25, 3605–3614.
- 3. Landy A. (1989) Dynamic, structural and regulatory aspects of lambda site-specific recombination. Annu. Rev. Biochem., 58, 913–949.
- 4. Azaro M.A. and Landy,A. (2002) Lambda integrase and the lambda int family. In Craig,N.L., Craigie,R., Gellert,M. and Lambowitz,A.M. (eds), Mobile DNA II. ASM Press, Washington, DC, pp. 118–148.
- 5. Nash H.A. (1996) Site-specific recombination: integration, excision, resolution, and inversion of defined DNA segments. In Neidhardt,F.C., Curtiss,R.,III, Ingraham,J.L., Lin,E.C.C., Low,K.B., Magasanik,B., Reznikoff,W.S., Riley,M., Schaechter,M. and Umbarger,H.E. (eds), Escherichia coli and Salmonella: Cellular and Molecular Biology, 2nd Edn. ASM Press, Washington, DC, pp. 2363–2376.
- 6. Van Duyne G. (2002) A structural view of tyrosine recombinase site-specific recombination. In Craig,N.L., Craigie,R., Gellert,M. and Lambowitz,A.M. (eds), Mobile DNA II. ASM Press, Washington, DC, pp. 93–117.
- 7. Argos P., Landy,A., Abremski,K., Egan,J.B., Haggard-Ljungquist,E., Hoess,R.H., Kahn,M.L., Kalionis,B., Narayana,S.V., Pierson,L.S.,III et al. (1986) The integrase family of site-specific recombinases: regional similarities and global diversity. EMBO J., 5, 433–440.
- 8. Abremski K.E. and Hoess,R.H. (1992) Evidence for a second conserved arginine residue in the integrase family of recombination proteins. Protein Eng., 5, 87–91.
- 9. Kwon H.J., Tirumalai,R., Landy,A. and Ellenberger,T. (1997) Flexibility in DNA recombination: structure of the lambda integrase catalytic core. Science, 276, 126–131.
- 10. Pargellis C.A., Nunes-Düby,S.E., de Vargas,L.M. and Landy,A. (1988) Suicide recombination substrates yield covalent lambda integrase–DNA complexes and lead to identification of the active site tyrosine. J. Biol. Chem., 263, 7678–7685.
- 11. Gopaul D.N., Guo,F. and Van Duyne,G.D. (1998) Structure of the Holliday junction intermediate in Cre-loxP site-specific recombination. EMBO J., 17, 4175–4187.
- 12. Guo F., Gopaul,D.N. and Van Duyne,G.D. (1997) Structure of Cre recombinase complexed with DNA in a site-specific recombination synapse. Nature, 389, 40–46.
- 13. Guo F., Gopaul,D.N. and Van Duyne,G.D. (1999) Asymmetric DNA bending in the Cre-loxP site-specific recombination synapse. Proc. Natl Acad. Sci. USA, 96, 7143–7148.
- 14. Woods K.C., Martin,S.S., Chu,V.C. and Baldwin,E.P. (2001) Quasi-equivalence in site-specific recombinase structure and function: crystal structure and activity of trimeric Cre recombinase bound to a three-way Lox DNA junction. J. Mol. Biol., 313, 49–69.
- 15. Martin S.S., Pulido,E., Chu,V.C., Lechner,T.S. and Baldwin,E.P. (2002) The order of strand exchanges in Cre-LoxP recombination and its basis suggested by the crystal structure of a Cre-LoxP Holliday junction complex. J. Mol. Biol., 319, 107–127.
- 16. Subramanya H.S., Arciszewska,L.K., Baker,R.A., Bird,L.E., Sherratt,D.J. and Wigley,D.B. (1997) Crystal structure of the site-specific recombinase, XerD. EMBO J., 16, 5178–5187.
- 17. Hickman A.B., Waninger,S., Scocca,J.J. and Dyda,F. (1997) Molecular organization in site-specific recombination: the catalytic domain of bacteriophage HP1 integrase at 2.7 Å resolution. Cell, 89, 227–237.
- 18. Chen Y., Narendra,U., Iype,L.E., Cox,M.M. and Rice,P.A. (2000) Crystal structure of a Flp recombinase–Holliday junction complex: assembly of an active oligomer by helix swapping. Mol. Cell, 6, 885–897.
- 19. Wojciak J.M., Sarkar,D., Landy,A. and Clubb,R.T. (2002) Arm-site binding by lambda-integrase: solution structure and functional characterization of its amino-terminal domain. Proc. Natl Acad. Sci. USA, 99, 3434–3439.
- 20. Yang W. and Mizuuchi,K. (1997) Site-specific recombination in plane view. Structure, 5, 1401–1406.
- 21. Grainge I. and Jayaram,M. (1999) The integrase family of recombinase: organization and function of the active site. Mol. Microbiol., 33, 449–456.
- 22. Grindley N.D. (1997) Site-specific recombination: synapsis and strand exchange revealed. Curr. Biol., 7, R608–R612.
- 23. Lilley D.M. (1997) Site-specific recombination caught in the act. Chem. Biol., 4, 717–720.
- 24. Voziyanov Y., Pathania,S. and Jayaram,M. (1999) A general model for site-specific recombination by the integrase family recombinases. Nucleic Acids Res., 27, 930–941.
- 25. Weisberg R.A., Gottesman,M.E., Hendrix,R.W. and Little,J.W. (1999) Family values in the age of genomics: comparative analyses of temperate bacteriophage HK022. Annu. Rev. Genet., 33, 565–602.
- 26. Cheng Q., Swalla,B.M., Beck,M., Alcaraz,R.,Jr, Gumport,R.I. and Gardner,J.F. (2000) Specificity determinants for bacteriophage Hong Kong 022 integrase: analysis of mutants with relaxed core-binding specificities. Mol. Microbiol., 36, 424–436.
- 27. Tirumalai R.S., Kwon,H.J., Cardente,E.H., Ellenberger,T. and Landy,A. (1998) Recognition of core-type DNA sites by lambda integrase. J. Mol. Biol., 279, 513–527.
- 28. Kovach M.J., Tirumalai,R. and Landy,A. (2002) Site-specific photo-cross-linking between lambda integrase and its DNA recombination target. J. Biol. Chem., 277, 14530–14538.
- 29. Dorgai L., Yagil,E. and Weisberg,R.A. (1995) Identifying determinants of recombination specificity: construction and characterization of mutant bacteriophage integrases. J. Mol. Biol., 252, 178–188.
- 30. Dorgai L., Sloan,S. and Weisberg,R.A. (1998) Recognition of core binding sites by bacteriophage integrases. J. Mol. Biol., 277, 1059–1070.
- 31. Yagil E., Dorgai,L. and Weisberg,R.A. (1995) Identifying determinants of recombination specificity: construction and characterization of chimeric bacteriophage integrases. J. Mol. Biol., 252, 163–177.
- 32. Han Y.W., Gumport,R.I. and Gardner,J.F. (1994) Mapping the functional domains of bacteriophage lambda integrase protein. J. Mol. Biol., 235, 908–925.
- 33. Guex N., Diemand,A. and Peitsch,M.C. (1999) Protein modeling for all. Trends Biochem. Sci., 24, 364–367.
- 34. Guex N. and Peitsch,M.C. (1997) SWISS-MODEL and the Swiss-PdbViewer: an environment for comparative protein modeling. Electrophoresis, 18, 2714–2723.
- 35. Altschul S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.
- 36. Sussman J.L., Lin,D., Jiang,J., Manning,N.O., Prilusky,J., Ritter,O. and Abola,E.E. (1998) Protein Data Bank (PDB): database of three-dimensional structural information of biological macromolecules. Acta Crystallogr. D Biol. Crystallogr., 54, 1078–1084.
- 37. Jones D.T. (1999) GenTHREADER: an efficient and reliable protein fold recognition method for genomic sequences. J. Mol. Biol., 287, 797–815.
- 38. McGuffin L.J., Bryson,K. and Jones,D.T. (2000) The PSIPRED protein structure prediction server. Bioinformatics, 16, 404–405.
- 39. Jeanmougin F., Thompson,J.D., Gouy,M., Higgins,D.G. and Gibson,T.J. (1998) Multiple sequence alignment with Clustal X. Trends Biochem. Sci., 23, 403–405.
- 40. Higgins D.G. and Sharp,P.M. (1988) CLUSTAL: a package for performing multiple sequence alignment on a microcomputer. Gene, 73, 237–244.
- 41. Thompson J.D., Gibson,T.J., Plewniak,F., Jeanmougin,F. and Higgins,D.G. (1997) The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res., 25, 4876–4882.
- 42. Bujnicki J.M., Elofsson,A., Fischer,D. and Rychlewski,L. (2001) Structure prediction meta server. Bioinformatics, 17, 750–751.
- 43. Rychlewski L., Jaroszewski,L., Li,W. and Godzik,A. (2000) Comparison of sequence profiles: strategies for structural predictions using sequence information. Protein Sci., 9, 232–241.
- 44. Karplus K., Barrett,C. and Hughey,R. (1998) Hidden Markov models for detecting remote protein homologies. Bioinformatics, 14, 846–856.
- 45. Park J., Karplus,K., Barrett,C., Hughey,R., Haussler,D., Hubbard,T. and Chothia,C. (1998) Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods. J. Mol. Biol., 284, 1201–1210.
- 46. Jones D.T. (1999) Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol., 292, 195–202.
- 47. Fischer D., Barret,C., Bryson,K., Elofsson,A., Godzik,A., Jones,D., Karplus,K.J., Kelley,L.A., MacCallum,R.M., Pawowski,K., Rost,B., Rychlewski,L. and Sternberg,M. (1999) CAFASP-1: critical assessment of fully automated structure prediction methods. Proteins, (Suppl. 3), 209–217.
- 48. Kelley L.A., MacCallum,R.M. and Sternberg,M.J. (2000) Enhanced genome annotation using structural profiles in the program 3D-PSSM. J. Mol. Biol., 299, 499–520.
- 49. Kelley L.A., MacCallum,R.M. and Sternberg,M.J. (1999) Recognition of remote protein homologies using three-dimensional information to generate a position-specific scoring matrix in the program 3DPSSM. In Istrail,S., Pevzner,P. and Waterman,M. (eds), Proceedings of the Third Annual Conference on Computational Molecular Biology. ACM Press, New York, NY, USA, pp. 218–225.
- 50. Fischer D. (2000) Hybrid fold recognition: combining sequence derived properties with evolutionary information. Pac. Symp. Biocomput., 119–130.
- 51. Shi J., Blundell,T.L. and Mizuguchi,K. (2001) FUGUE: sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties. J. Mol. Biol., 310, 243–257.
- 52. Gough J., Karplus,K., Hughey,R. and Chothia,C. (2001) Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure. J. Mol. Biol., 313, 903–919.
- 53. Lundstrom J., Rychlewski,L., Bujnicki,J. and Elofsson,A. (2001) Pcons: a neural-network-based consensus predictor that improves fold recognition. Protein Sci., 10, 2354–2362.
- 54. Alexandrov N.N., Nussinov,R. and Zimmer,R.M. (1996) Fast protein fold recognition via sequence to structure alignment and contact capacity potentials. Pac. Symp. Biocomput., 53–72.
- 55. DeLano W.L. (2001) The PyMOL Molecular Graphics System. DeLano Scientific, San Carlos CA, USA.
- 56. Bateman A., Birney,E., Cerruti,L., Durbin,R., Etwiller,L., Eddy,S.R., Griffiths-Jones,S., Howe,K.L., Marshall,M. and Sonnhammer,E.L. (2002) The Pfam protein families database. Nucleic Acids Res., 30, 276–280.
- 57. Bairoch A. and Apweiler,R. (2000) The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res., 28, 45–48.
- 58. Eddy S.R. (1998) Profile hidden Markov models. Bioinformatics, 14, 755–763.
- 59. Gribskov M., Luthy,R. and Eisenberg,D. (1990) Profile analysis. Methods Enzymol., 183, 146–159.
- 60. Krogh A., Mian,I.S. and Haussler,D. (1994) A hidden Markov model that finds genes in E. coli DNA. Nucleic Acids Res., 22, 4768–4778.
- 61. Durbin R., Eddy,S., Krogh,A. and Mitchison,G. (1998) Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, Cambridge, UK.
- 62. Park J., Teichmann,S.A., Hubbard,T. and Chothia,C. (1997) Intermediate sequences increase the detection of homology between sequences. J. Mol. Biol., 273, 349–354.
- 63. Henikoff S. and Henikoff,J.G. (1997) Embedding strategies for effective use of information from multiple sequence alignments. Protein Sci., 6, 698–705.
- 64. Lindahl E. and Elofsson,A. (2000) Identification of related proteins on family, superfamily and fold level. J. Mol. Biol., 295, 613–625.
- 65. Aravind L. and Koonin,E.V. (1999) Gleaning non-trivial structural, functional and evolutionary information about proteins by iterative database searches. J. Mol. Biol., 287, 1023–1040.
- 66. Apweiler R., Attwood,T.K., Bairoch,A., Bateman,A., Birney,E., Biswas,M., Bucher,P., Cerutti,L., Corpet,F., Croning,M.D., Durbin,R., Falquet,L., Fleischmann,W., Gouzy,J., Hermjakob,H., Hulo,N., Jonassen,I., Kahn,D., Kanapin,A., Karavidopoulou,Y., Lopez,R., Marx,B., Mulder,N.J., Oinn,T.M., Pagni,M. and Servant,F. (2001) The InterPro database, an integrated documentation resource for protein families, domains and functional sites. Nucleic Acids Res., 29, 37–40.
- 67. Smith T.F., Lo Conte,L., Bienkowska,J., Gaitatzes,C., Rogers,R.G.,Jr and Lathrop,R. (1997) Current limitations to protein threading approaches. J. Comput. Biol., 4, 217–225.
- 68. Jones D.T. (1997) Progress in protein structure prediction. Curr. Opin. Struct. Biol., 7, 377–387.
- 69. Sippl M.J., Lackner,P., Domingues,F.S., Prlic,A., Malik,R., Andreeva,A. and Wiederstein,M. (2001) Assessment of the CASP4 fold recognition category. Proteins, 45 (Suppl. 5), 55–67.
- 70. Levitt M. (1997) Competitive assessment of protein fold recognition and alignment accuracy. Proteins, (Suppl. 1), 92–104.
- 71. Al-Lazikani B., Jung,J., Xiang,Z. and Honig,B. (2001) Protein structure prediction. Curr. Opin. Chem. Biol., 5, 51–56.
- 72. Jones D.T., Tress,M., Bryson,K. and Hadley,C. (1999) Successful recognition of protein folds using threading methods biased by sequence similarity and predicted secondary structure. Proteins, 37, 104–111.
- 73. Venclovas C. (2001) Comparative modeling of CASP4 target proteins: combining results of sequence search with three-dimensional structure assessment. Proteins, 45 (Suppl. 5), 47–54.
- 74. Marti-Renom M.A., Stuart,A.C., Fiser,A., Sanchez,R., Melo,F. and Sali,A. (2000) Comparative protein structure modeling of genes and genomes. Annu. Rev. Biophys. Biomol. Struct., 29, 291–325.
- 75. Baker D. and Sali,A. (2001) Protein structure prediction and structural genomics. Science, 294, 93–96.
- 76. Moult J., Fidelis,K., Zemla,A. and Hubbard,T. (2001) Critical assessment of methods of protein structure prediction (CASP): round IV. Proteins, 45 (Suppl. 5), 2–7.
- 77. Tramontano A., Leplae,R. and Morea,V. (2001) Analysis and assessment of comparative modeling predictions in CASP4. Proteins, 45 (Suppl. 5), 22–38.
- 78. Venclovas C., Zemla,A., Fidelis,K. and Moult,J. (2001) Comparison of performance in successive CASP experiments. Proteins, 45 (Suppl. 5), 163–170.
- 79. Jaroszewski L., Rychlewski,L., Zhang,B. and Godzik,A. (1998) Fold prediction by a hierarchy of sequence, threading and modeling methods. Protein Sci., 7, 1431–1440.
- 80. Laskowski R.A., MacArthur,M.W., Moss,D.S. and Thornton,J.M. (1993) PROCHECK: a program to check the stereochemical quality of protein structures. J. Appl. Crystallogr., 26, 283–291.
- 81. Pontius J., Richelle,J. and Wodak,S.J. (1996) Deviations from standard atomic volumes as a quality measure for protein crystal structures. J. Mol. Biol., 264, 121–136.
- 82. Kim S., Moitoso de Vargas,L., Nunes-Düby,S.E. and Landy,A. (1990) Mapping of a higher order protein–DNA complex: two kinds of long-range interactions in lambda attL. Cell, 63, 773–781.
- 83. Segall A.M. and Nash,H.A. (1993) Synaptic intermediates in bacteriophage lambda site-specific recombination: integrase can align pairs of attachment sites. EMBO J., 12, 4567–4576.
- 84. Segall A.M. (1998) Analysis of higher order intermediates and synapsis in the bent-L pathway of bacteriophage lambda site-specific recombination. J. Biol. Chem., 273, 24258–24265.
- 85. Kim S. and Landy,A. (1992) Lambda Int protein bridges between higher order complexes at two distant chromosomal loci attL and attR. Science, 256, 198–203.
- 86. Luscombe N.M., Laskowski,R.A. and Thornton,J.M. (2001) Amino acid–base interactions: a three-dimensional analysis of protein–DNA interactions at an atomic level. Nucleic Acids Res., 29, 2860–2874.
- 87. Ladbury J.E. (1996) Just add water! The effect of water on the specificity of protein–ligand binding sites and its potential application to drug design. Chem. Biol., 3, 973–980.
- 88. Otwinowski Z., Schevitz,R.W., Zhang,R.G., Lawson,C.L., Joachimiak,A., Marmorstein,R.Q., Luisi,B.F. and Sigler,P.B. (1988) Crystal structure of trp repressor/operator complex at atomic resolution. Nature, 335, 321–329.
- 89. Kim S.T., Kim,G.W., Lee,Y.S. and Park,J.S. (2001) Characterization of Cre-loxP interaction in the major groove: hint for structural distortion of mutant Cre and possible strategy for HIV-1 therapy. J. Cell. Biochem., 80, 321–327.
- 90. Enquist L.W. and Weisberg,R.A. (1977) A genetic analysis of the att-int-xis region of coliphage lambda. J. Mol. Biol., 111, 97–120.
- 91. MacWilliams M.P., Gumport,R.I. and Gardner,J.F. (1996) Genetic analysis of the bacteriophage lambda attL nucleoprotein complex. Genetics, 143, 1069–1079.
- 92. MacWilliams M.P., Gumport,R.I. and Gardner,J.F. (1997) Mutational analysis of protein binding sites involved in formation of the bacteriophage lambda attL complex. J. Bacteriol., 179, 1059–1067.
- 93. Goodman S.D., Nicholson,S.C. and Nash,H.A. (1992) Deformation of DNA during site-specific recombination of bacteriophage lambda: replacement of IHF protein by HU protein or sequence-directed bends. Proc. Natl Acad. Sci. USA, 89, 11910–11914.
- 94. Segall A.M., Goodman,S.D. and Nash,H.A. (1994) Architectural elements in nucleoprotein complexes: interchangeability of specific and non-specific DNA binding proteins. EMBO J., 13, 4536–4548.
- 95. Sarkar D., Radman-Livaja,M. and Landy,A. (2001) The small DNA binding domain of lambda integrase is a context-sensitive modulator of recombinase functions. EMBO J., 20, 1203–1212.
- 96. Read E.K., Gumport,R.I. and Gardner,J.F. (2000) Specific recognition of DNA by integration host factor. Glutamic acid 44 of the beta-subunit specifies the discrimination of a T:A from an A:T base pair without directly contacting the DNA. J. Biol. Chem., 275, 33759–33764.
- 97. Enquist L.W., Kikuchi,A. and Weisberg,R.A. (1979) The role of lambda integrase in integration and excision. Cold Spring Harb. Symp. Quant. Biol., 2, 1115–1120.
- 98. Blakely G.W. and Sherratt,D.J. (1996) Cis and trans in site-specific recombination. Mol. Microbiol., 20, 234–237.
- 99. Chen J.W., Lee,J. and Jayaram,M. (1992) DNA cleavage in trans by the active site tyrosine during Flp recombination: switching protein partners before exchanging strands. Cell, 69, 647–658.
- 100. Dixon J.E., Shaikh,A.C. and Sadowski,P.D. (1995) The Flp recombinase cleaves Holliday junctions in trans. Mol. Microbiol., 18, 449–458.
- 101. Han Y.W., Gumport,R.I. and Gardner,J.F. (1993) Complementation of bacteriophage lambda integrase mutants: evidence for an intersubunit active site. EMBO J., 12, 4577–4584.
- 102. Jayaram M. (1997) The cis–trans paradox of integrase. Science, 276, 49–51.
- 103. Lee J., Jayaram,M. and Grainge,I. (1999) Wild-type Flp recombinase cleaves DNA in trans. EMBO J., 18, 784–791.
- 104. Lee J., Whang,I. and Jayaram,M. (1994) Directed protein replacement in recombination full sites reveals trans-horizontal DNA cleavage by Flp recombinase. EMBO J., 13, 5346–5354.
- 105. Mondragon A. (1997) Solving the cis/trans paradox in the Int family of recombinases. Nature Struct. Biol., 4, 427–429.
- 106. Nunes-Düby S.E., Tirumalai,R.S., Dorgai,L., Yagil,E., Weisberg,R.A. and Landy,A. (1994) Lambda integrase cleaves DNA in cis. EMBO J., 13, 4421–4430.
- 107. Shaikh A.C. and Sadowski,P.D. (1997) The Cre recombinase cleaves the lox site in trans. J. Biol. Chem., 272, 5695–5702.
- 108. Nunes-Düby S.E., Radman-Livaja,M., Kuimelis,R.G., Pearline,R.V., McLaughlin,L.W. and Landy,A. (2002) Lambda integrase complementation at the level of DNA binding and complex formation. J. Bacteriol., 184, 1385–1394.