Proceedings of the National Academy of Sciences of the United States of America. 2011 Nov 21;109(7):E398–E405. doi: 10.1073/pnas.1113277108

Displacement of the canonical single-stranded DNA-binding protein in the Thermoproteales

Sonia Paytubi a,1,2, Stephen A McMahon a,2, Shirley Graham a, Huanting Liu a, Catherine H Botting a, Kira S Makarova b, Eugene V Koonin b, James H Naismith a,3, Malcolm F White a,3
PMCID: PMC3289382  PMID: 22106294

Abstract

ssDNA-binding proteins (SSBs) based on the oligonucleotide-binding fold are considered ubiquitous in nature and play a central role in many DNA transactions including replication, recombination, and repair. We demonstrate that the Thermoproteales, a clade of hyperthermophilic Crenarchaea, lack a canonical SSB. Instead, they encode a distinct ssDNA-binding protein that we term “ThermoDBP,” exemplified by the protein Ttx1576 from Thermoproteus tenax. ThermoDBP binds specifically to ssDNA with low sequence specificity. The crystal structure of Ttx1576 reveals a unique fold and a mechanism for ssDNA binding, consisting of an extended cleft lined with hydrophobic phenylalanine residues and flanked by basic amino acids. Two ssDNA-binding domains are linked by a coiled-coil leucine zipper. ThermoDBP appears to have displaced the canonical SSB during the diversification of the Thermoproteales, a highly unusual example of the loss of a “ubiquitous” protein during evolution.

Keywords: Archaea, molecular evolution, replication protein A


The ssDNA-binding (SSB) proteins are essential for the genome maintenance of all known cellular organisms (1) and are present in many viruses (2, 3). These proteins play vital roles in DNA metabolism, sequestering and protecting transiently formed ssDNA during DNA replication and recombination (4, 5), melting dsDNA, detecting DNA damage, and recruiting repair proteins (6). The SSBs from the three domains of life share limited sequence similarity and display diverse subunit organization. The low sequence conservation notwithstanding, all SSB family proteins contain one or more conserved oligonucleotide-binding (OB) fold domains (a five-stranded β-sheet coiled to form a closed β-barrel) that mediate ssDNA binding with high affinity (7, 8). The organization of OB folds in SSBs varies considerably. For example, Escherichia coli SSB is a homotetramer, with each subunit consisting of a single OB domain, in conjunction with a flexible C-terminal extension involved in protein–protein interactions (9, 10). The Deinococcus/Thermus SSBs, although still using the tetrameric functional-binding mode, arrive at this arrangement by combining two SSB homodimers, each SSB monomer containing two OB folds linked by a conserved spacer sequence (11, 12). Moreover, the DdrB (DR0070) protein that is essential for radiation resistance in Deinococcus radiodurans is a highly divergent SSB homolog (13, 14).

Eukaryotes use a heterotrimeric SSB known as “replication protein A” (RPA) with six OB folds, two that mediate subunit interactions and four that are involved in ssDNA binding (15, 16). In addition, Metazoa encode one or more additional SSB proteins with a single OB fold, exemplified by hSSB1 in Homo sapiens, which is implicated in the DNA damage response (17). The arrangement of euryarchaeal SSBs is similar to eukaryotic RPA: a polypeptide or polypeptides with multiple OB folds, including a characteristic OB fold interrupted by a zinc-binding domain (18–21). It appears that some euryarchaeal SSBs form heterotrimers and others form heterodimers or monomers (19, 21, 22). In contrast, in most Crenarchaea SSB has a bacterial-like domain structure, with a single OB fold followed by a flexible C-terminal tail that is not involved in DNA binding (23). The crystal structure of the OB fold of the Sulfolobus solfataricus SSB demonstrated its close structural relationship with the ssDNA-binding domains of human RPA70 (24).

Structural and bioinformatic studies have identified a characteristic sequence signature for the OB fold that allows its detection even in genomes that encode highly diverged versions of the SSB. OB fold-containing SSB proteins have been detected in all three domains of life, but, as we reported previously, one group of Crenarchaea, the Thermoproteales, appears to lack an identifiable SSB-encoding gene (25). Ten fully sequenced genomes in this group (Thermoproteus tenax, Thermoproteus uzoniensis, Thermoproteus neutrophilus, Caldivirga maquilingensis, Pyrobaculum aerophilum, Pyrobaculum arsenaticum, Pyrobaculum islandicum, Pyrobaculum calidifontis, Vulcanisaeta moutnovskia, and Vulcanisaeta distributa) lack any identifiable ssb gene. By contrast, the only other sequenced genome in this clade, that of Thermofilum pendens, encodes two SSB proteins. We reasoned that the 10 species of Thermoproteales that apparently lack a canonical SSB must use an alternative ssDNA-binding protein. By biochemically screening T. tenax cell extracts for ssDNA-binding proteins, we identified a single candidate, Ttx1576, that is unique to the species lacking a canonical SSB. The gene encoding Ttx1576 was cloned, and the protein was shown to have properties consistent with a role as an SSB. Structural characterization of Ttx1576 revealed an ssDNA-binding domain with a distinct fold, attached to a C-terminal leucine zipper dimerization domain.

Results

Identification of T. tenax Proteins Binding to ssDNA.

SSB and RPA proteins have previously been identified in crude cell extracts using gel-shift experiments with labeled ssDNA (23, 26). To identify ssDNA-binding proteins from T. tenax, we used a related affinity purification approach. A biotinylated 45-nt oligonucleotide was bound to magnetic streptavidin beads and incubated with T. tenax cell lysate for 90 min at 50 °C to allow binding to approach equilibrium. The beads were harvested and washed in buffer with progressively higher NaCl concentrations. After each set of washes, the supernatants were collected, and proteins were precipitated using trichloroacetic acid (TCA)/acetone before separation by SDS/PAGE. This approach yielded a number of distinct protein bands (Fig. 1A), which were excised and identified by MS. The affinity purification experiment was repeated three times, with highly reproducible results. As expected, we observed high-abundance proteins known to bind ssDNA or RNA. Prominent bands included several subunits of RNA polymerase, a DNA helicase (Ttx0530), a RadA paralog (Ttx1408), RNase E (Ttx0105), and the transcription termination factor NusA (Ttx1674). In addition, we identified two proteins of unknown function, Ttx2090 and Ttx1576, which were characterized in more detail.

Fig. 1.

Identification and purification of ssDNA-binding proteins from T. tenax. (A) SDS/PAGE analysis of T. tenax proteins binding to a biotinylated 45-nt DNA oligonucleotide. Proteins identified by MS from excised gel bands are labeled. For each protein, the Ttx gene number is indicated along with a description. (B) SDS/PAGE analysis of the purified, heterologously expressed Ttx1576 protein. (C) Estimation of the cellular levels of T. tenax Ttx1576 and S. solfataricus SSB. The concentrations of Ttx1576 and SsoSSB in soluble T. tenax and S. solfataricus cell lysates, respectively, were analyzed by SDS/PAGE and Western blot. Purified recombinant proteins were used to calibrate the result. MW, molecular weight marker lane.

The Ttx2090 protein has homologs in archaeal species, such as Aeropyrum pernix and Ignicoccus hospitalis, that encode a canonical SSB. The ttx2090 gene was cloned into an E. coli expression vector, but we did not succeed in obtaining soluble protein. Bioinformatic analysis suggested that Ttx2090 may be a member of the prefoldin family of protein chaperones [>95% probability by HHpred (27)] and therefore may have been purified in association with partially unstructured proteins. In contrast, the Ttx1576 protein belongs to archaeal cluster of orthologous groups (arCOG) arCOG05578 (28), which is represented in all available genomes of Thermoproteales, with the sole exception of T. pendens, and in no other sequenced genomes. Its phyletic pattern is therefore perfectly complementary to that of the RPA/SSB proteins (arCOG01510) (Dataset S1). We therefore focused on Ttx1576 as the candidate for a unique ssDNA-binding protein in Thermoproteales.
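The complementarity argument can be phrased as a simple presence/absence check: every genome should carry exactly one of the two families. The sketch below encodes only the pattern described in the text (ThermoDBP in the 10 SSB-lacking Thermoproteales; a canonical SSB in T. pendens and in the other archaea named here). The lists and the helper `violations` are illustrative assumptions, not the arCOG pipeline behind Dataset S1.

```python
# Presence/absence pattern as described in the text (illustrative subset, not Dataset S1).
SSB_LACKING_THERMOPROTEALES = [
    "Thermoproteus tenax", "Thermoproteus uzoniensis", "Thermoproteus neutrophilus",
    "Caldivirga maquilingensis", "Pyrobaculum aerophilum", "Pyrobaculum arsenaticum",
    "Pyrobaculum islandicum", "Pyrobaculum calidifontis",
    "Vulcanisaeta moutnovskia", "Vulcanisaeta distributa",
]
SSB_ENCODING = ["Thermofilum pendens", "Sulfolobus solfataricus", "Aeropyrum pernix"]

genomes = SSB_LACKING_THERMOPROTEALES + SSB_ENCODING
has_ssb = set(SSB_ENCODING)                       # arCOG01510 (RPA/SSB)
has_thermodbp = set(SSB_LACKING_THERMOPROTEALES)  # arCOG05578 (ThermoDBP)


def violations(genome_list, family_a, family_b):
    """Return genomes that carry both families or neither, i.e. break complementarity."""
    return [g for g in genome_list if (g in family_a) == (g in family_b)]


print(violations(genomes, has_ssb, has_thermodbp))  # [] means perfectly complementary
```

An empty list indicates that, for these genomes, the two families never co-occur and never are both absent.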

Cloning and Expression of Ttx1576.

To test whether Ttx1576 has properties consistent with an ssDNA-binding protein in vitro, we amplified the gene from T. tenax chromosomal DNA by PCR and cloned it into an E. coli expression vector (pET151/d-TOPO), allowing expression in E. coli with a cleavable N-terminal polyhistidine tag. The protein was purified by immobilized metal affinity chromatography, and the His-tag was removed with tobacco etch virus (TEV) protease. Ttx1576 then was passed again over the immobilized metal affinity column, and the flowthrough was collected, yielding essentially homogeneous protein (Fig. 1B). This preparation was used for the ensuing biochemical and structural studies. Polyclonal antibodies against Ttx1576 were raised in sheep and used to estimate the amount of the protein present in T. tenax cells (Fig. 1C). Western blotting was used to quantify the levels of Ttx1576 in a defined quantity of cell extract, using the recombinant protein for calibration. By this method, Ttx1576 was estimated to constitute 0.07–0.13% of total soluble protein. By comparison, S. solfataricus SSB was estimated at 0.08–0.16% of total soluble protein. Thus, the cellular abundance of Ttx1576 is comparable to that of a canonical SSB and consistent with a putative role as an ssDNA-binding protein.
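The calibration step can be sketched as a linear standard curve: band signals from known amounts of recombinant protein are fitted by least squares, the lysate band is interpolated, and the result is divided by the total soluble protein loaded. All numbers below are hypothetical, and the sketch assumes the signal is linear over the standard range; it is not the densitometry procedure used for Fig. 1C.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def percent_of_total(band_signal, standards_ng, standards_signal, total_protein_ng):
    """Interpolate the target protein amount from the standard curve and
    express it as a percentage of the total soluble protein loaded."""
    slope, intercept = linear_fit(standards_ng, standards_signal)
    target_ng = (band_signal - intercept) / slope
    return 100.0 * target_ng / total_protein_ng


# Hypothetical values: recombinant standards (ng) and their blot signals (arbitrary units).
pct = percent_of_total(
    band_signal=2400.0,
    standards_ng=[5.0, 10.0, 20.0, 40.0],
    standards_signal=[1000.0, 2000.0, 4000.0, 8000.0],
    total_protein_ng=10_000.0,  # hypothetical: 10 ug of soluble lysate loaded
)
print(f"{pct:.2f}% of total soluble protein")
```

With these made-up inputs the lysate band corresponds to 12 ng, or 0.12% of the protein loaded.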

Ttx1576 Binds ssDNA Specifically.

To determine the nucleic acid-binding properties of Ttx1576, we carried out gel electrophoretic mobility shift experiments using a 24mer oligonucleotide of mixed sequence (5′-CTTTCAATTCTATAGTAGATTAGC) with a fluorescent label at the 5′ end. Protein and DNA were mixed and incubated at either 20 °C or 80 °C for 10 min before loading on a polyacrylamide gel for electrophoresis and subsequent imaging (Fig. 2). At both temperatures, a clear retarded species was observed in the gel corresponding to an apparent Kd of ∼0.6 μM. Incubation at the higher temperature did not appear to influence the binding affinity significantly, although slightly more unbound DNA was observed at higher protein concentrations under these conditions. When the experiment was repeated with an RNA oligonucleotide of the same sequence, the binding affinity was significantly weaker, and no specific retarded species was observed (Fig. 2). Binding to a DNA duplex of the same sequence was very weak, with no evidence of retarded products observed at the highest protein concentration. Together, these data suggest that Ttx1576 binds specifically to ssDNA and that this interaction is not especially sensitive to incubation temperature. Previous studies of S. solfataricus SSB have demonstrated that ssDNA–SSB interactions are exothermic, with binding affinity higher at lower temperatures, although Kds also are influenced strongly by the ions present in the buffer (29).

Fig. 2.

Interaction of Ttx1576 with different nucleic acids. EMSAs showing binding of Ttx1576 to a 24-nt DNA oligonucleotide (Top), RNA oligonucleotide of the same sequence (Middle), and DNA duplex (Bottom). Oligonucleotides (0.2 μM) were incubated at 20 or 80 °C for 10 min with varying concentrations of Ttx1576 before electrophoresis at room temperature. Ttx1576 concentrations in each experiment were (left to right): 0, 0.05, 0.1, 0.2, 0.4, 0.8, 1.6, 3.2, and 10 μM.
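Because the DNA concentration in these gel shifts (0.2 μM) is not negligible relative to the apparent Kd (∼0.6 μM), the fraction of DNA bound is better described by the exact quadratic solution than by a simple hyperbola. The sketch below assumes a single 1:1 binding event (it ignores the dimeric protein and any stoichiometry greater than one) and evaluates it over the titration series listed in the Fig. 2 legend; it is illustrative only, not the authors' analysis.

```python
import math


def fraction_dna_bound(protein_total, dna_total, kd):
    """Fraction of DNA in complex for 1:1 binding, from the exact quadratic
    (free protein is not assumed to equal total protein). Units must match."""
    if protein_total == 0:
        return 0.0
    s = protein_total + dna_total + kd
    complex_conc = (s - math.sqrt(s * s - 4.0 * protein_total * dna_total)) / 2.0
    return complex_conc / dna_total


# Titration from the Fig. 2 legend (uM), 0.2 uM DNA, apparent Kd ~0.6 uM.
for p in [0, 0.05, 0.1, 0.2, 0.4, 0.8, 1.6, 3.2, 10]:
    print(f"{p:>5} uM protein -> {fraction_dna_bound(p, 0.2, 0.6):.2f} bound")
```

Under this model, half of the DNA is bound when total protein equals Kd + [DNA]/2 (0.7 μM here), which is why the midpoint of a gel-shift titration slightly overestimates Kd when the DNA concentration is not small.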

To obtain more quantitative values for DNA-binding affinity, Ttx1576 binding to ssDNA and dsDNA was analyzed by isothermal titration calorimetry (ITC). Oligonucleotide 21T (composed of 21 deoxythymidine nucleotides) was titrated into a 10-μM solution of Ttx1576. The binding isotherm (Fig. 3A) revealed exothermic binding characteristic of SSBs (24), with a calculated Kd of 160 nM and a stoichiometry of two protein monomers (or one dimer) bound per oligonucleotide. By contrast, a 21-bp dsDNA of mixed sequence (ds21mix) was not bound by Ttx1576 under these conditions (Fig. 3B), confirming the specificity for ssDNA. To examine the sequence specificity of Ttx1576, we used oligonucleotides of different sequences labeled with a 5′-fluorescein dye and measured changes in fluorescence anisotropy upon protein binding (Fig. 3C). Binding isotherms for three oligonucleotides, whose sequences are listed in Materials and Methods, were determined in triplicate: 21T, 21G-rich, and 21C. The 21T oligonucleotide was bound with a Kd of 100 ± 8 nM, in good agreement with the Kd calculated by ITC. The G-rich oligonucleotide was bound with a similar Kd, 116 ± 11 nM, whereas the 21C oligonucleotide was bound more tightly, with a Kd of 22 ± 5 nM. Neither the ITC nor the anisotropy experiments revealed any evidence of cooperative binding, a known property of other SSBs. This result most likely reflects the short DNA lengths used, which appear to accommodate only one dimer of the protein. Finally, binding to RNA was investigated using a fluorescently labeled RNA oligonucleotide, 21U, which gave a substantially higher Kd of 2.1 ± 0.2 μM, in good agreement with the gel retardation data.
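To put the reported dissociation constants on an energetic scale, each Kd can be converted to a standard binding free energy, ΔG° = RT ln Kd (1 M standard state). The assay temperature is not given in this section, so the sketch assumes 25 °C (298.15 K); the Kd values are those quoted above.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1


def delta_g(kd_molar, temp_k=298.15):
    """Standard binding free energy (1 M standard state): dG = RT ln(Kd).
    temp_k defaults to 25 C, an assumption for these anisotropy data."""
    return R_KCAL * temp_k * math.log(kd_molar)


kds = {"21C": 22e-9, "21T": 100e-9, "21G-rich": 116e-9, "21U (RNA)": 2.1e-6}
for name, kd in kds.items():
    print(f"{name:>10}: dG = {delta_g(kd):6.2f} kcal/mol")

# Free-energy difference between 21T and 21C binding.
print(f"ddG(21T - 21C) = {delta_g(100e-9) - delta_g(22e-9):.2f} kcal/mol")
```

On this scale the ∼4.5-fold tighter binding of 21C corresponds to only about 0.9 kcal/mol, which fits the description of limited sequence specificity.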

Fig. 3.

Quantitative measurements of Ttx1576 binding to DNA and RNA. (A) Quantification of Ttx1576 binding to a 21T oligonucleotide by ITC. Oligonucleotide 21T (75 μM in the syringe) was injected into a 10-μM solution of Ttx1576. The data were fitted with a simple one-site-binding model, yielding a Kd of 160 nM and a binding stoichiometry of 2:1 (Ttx1576 monomers:oligonucleotide). (B) Quantification of Ttx1576 binding to a 21-bp DNA duplex of mixed sequence (21Mix) by ITC. 21Mix dsDNA (75 μM in the syringe) was injected into a 10-μM solution of Ttx1576 protein. No binding was observed. (C) Changes in fluorescence anisotropy on binding of Ttx1576 to 21-nt oligonucleotides carrying a 5′-fluorescein reporter were used to determine dissociation constants (21C, Kd 22 nM; 21T, Kd 100 nM; 21G-rich, Kd 116 nM; and the RNA oligonucleotide 21U, Kd 2.1 μM). Experiments were carried out in triplicate; results shown are means ± SE. Data were fitted as described in Materials and Methods.
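The fitting procedure referred to in the legend is described in Materials and Methods, which is not part of this section. As an illustration only, the sketch below fits anisotropy data with a hyperbolic single-site model, r = r_free + (r_bound − r_free)·θ with θ = [P]/([P] + Kd). It assumes labeled DNA is far below Kd (no ligand depletion) and equal fluorescence of free and bound probe; for a tight binder such as 21C, the quadratic form would be more appropriate. For each trial Kd the model is linear in r_free and the amplitude, so those are solved exactly, and Kd is chosen by a log-spaced grid search. The data are synthetic.

```python
def fraction_bound(p_nM, kd_nM):
    """Hyperbolic single-site binding; assumes labeled DNA << Kd."""
    return p_nM / (p_nM + kd_nM)


def fit_anisotropy(conc_nM, r_obs, kd_grid):
    """Grid search over Kd. For each trial Kd, r = r_free + amplitude * theta
    is linear in r_free and amplitude, which are solved by least squares.
    Returns (kd, r_free, r_bound, sse) with the smallest squared error."""
    n = len(conc_nM)
    mean_r = sum(r_obs) / n
    best = None
    for kd in kd_grid:
        theta = [fraction_bound(p, kd) for p in conc_nM]
        mean_t = sum(theta) / n
        sxx = sum((t - mean_t) ** 2 for t in theta)
        sxy = sum((t - mean_t) * (r - mean_r) for t, r in zip(theta, r_obs))
        amplitude = sxy / sxx
        r_free = mean_r - amplitude * mean_t
        sse = sum((r_free + amplitude * t - r) ** 2 for t, r in zip(theta, r_obs))
        if best is None or sse < best[3]:
            best = (kd, r_free, r_free + amplitude, sse)
    return best


# Synthetic, noise-free data generated with Kd = 100 nM (not the paper's data).
conc = [0, 5, 10, 20, 50, 100, 200, 500, 1000, 2000]
r = [0.05 + 0.15 * fraction_bound(p, 100.0) for p in conc]
grid = [10 ** (k / 100) for k in range(0, 401)]  # 1 nM to 10 uM, log-spaced
kd, r_free, r_bound, sse = fit_anisotropy(conc, r, grid)
print(f"Kd = {kd:.1f} nM, r_free = {r_free:.3f}, r_bound = {r_bound:.3f}")
```

Separating the linear parameters from the single nonlinear one keeps the search one-dimensional; with real, noisy triplicate data a finer grid or a local optimizer around the best grid point would be the natural next step.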

Taken together, the data indicate that Ttx1576 binds preferentially to ssDNA and shows limited sequence specificity, properties consistent with a role as an SSB. We therefore propose the name “ThermoDBP” for this Thermoproteales-specific protein family, to distinguish it from canonical SSB proteins.

Structure of ThermoDBP Reveals a Distinct Fold and DNA-Binding Surface.

Because the full-length protein failed to give crystals with useful diffraction, we incubated Ttx1576 with chymotrypsin (30) during crystallization. This method gave crystals (denoted “cTtx1576”) that diffracted well. The cTtx1576 structure was solved by selenomethionine incorporation and anomalous diffraction (Table 1). The final model of cTtx1576 contained one monomer in the asymmetric unit, consisting of residues 24–139. Because the crystals were obtained in the presence of protease, we could not determine whether disorder or cleavage accounted for the missing residues (1–23 and 140–196). Based on sequence analysis described in detail later, we constructed, purified, and crystallized a Ttx1576 truncation mutant corresponding to residues 1–148. Crystals diffracted to 2.0 Å, and the structure was solved by molecular replacement, revealing that residues 10–148 were well ordered (Fig. 4A). The protein structure consists of a single compact domain, comprising four α-helices (α1–4) and a four-stranded antiparallel β-sheet (β1–4), and measures ∼50 Å × 20 Å × 20 Å. The N-terminal part of the structure comprises β1 and β2, followed by the four α-helices, which form three sides of a distorted quadrilateral. The fourth side is completed by the loop between α4 and β3. The β-sheet packs against one face of the quadrilateral, leaving an extended cleft open to solvent on the other face. Strikingly, nine phenylalanine residues are distributed along the flanking helices and form a continuous hydrophobic patch that runs along the length of the cleft (Fig. 4B). The outer edge of the binding cleft has a strongly positive electrostatic surface potential because of the conserved basic residues R49, K54, R65, R80, R86, R90, K97, and R112 (Fig. 4 B and C).
Thus, the putative ssDNA-binding cleft of ThermoDBP has a hydrophobic, aromatic core suitable for interaction with the nucleobases of ssDNA and a positively charged periphery for electrostatic interactions with the phosphodiester backbone.

Table 1.

Crystallographic data for Ttx1576

| Data collection | Mutant 1–148 | cTtx1576 | cTtx1576 Se |
|---|---|---|---|
| Resolution (Å) | 50–2.0 (2.03–2.0) | 50–2.9 (3–2.9) | 50–3.56 (3.62–3.56) |
| Space group | P2₁ | I2₁3 | I2₁3 |
| Cell dimensions: a, b, c (Å); β (°) | a = 39.8, b = 103.7, c = 39.8; β = 118 | a = b = c = 105.7 | a = b = c = 106.6 |
| Vm (Å³/Da) | 2.67 | 3.65 | 3.73 |
| Solvent (%) | 53.9 | 66.3 | 67 |
| Total reflections | 1365045 | 319809 | 198356 |
| Unique reflections | 19149 | 4519 | 2560 |
| I/σI | 24.4 (1.6) | 71.5 (8.9) | 18.1 (3) |
| Completeness (%) | 91.8 (54.4) | 99.7 (97.3) | 100 (100) |
| Redundancy | 5.3 (3.1) | 20.6 (14.6) | 19.7 (18.1) |
| Rmerge (%) | 9.0 (31.8) | 7.2 (45) | 22.4 (76.5) |
| Refinement | | | |
| Rwork/Rfree | 20.4/25.6 | | |
| No. atoms: protein | 2301 | | |
| No. atoms: water | 93 | | |
| RMS deviations: bond lengths (Å) | 0.012 | | |
| RMS deviations: bond angles (°) | 1.326 | | |
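The solvent contents in Table 1 follow from the Matthews coefficients through the standard relation V_solv = 1 − 1.23/V_M (Matthews, 1968), where the constant 1.23 corresponds to a protein partial specific volume of about 0.74 cm³/g. The short check below reproduces the tabulated values; it is a consistency check, not part of the authors' processing.

```python
def solvent_percent(vm):
    """Solvent content (%) from the Matthews coefficient V_M (A^3/Da),
    assuming a protein partial specific volume of ~0.74 cm^3/g."""
    return 100.0 * (1.0 - 1.23 / vm)


for label, vm in [("Mutant 1-148", 2.67), ("cTtx1576", 3.65), ("cTtx1576 Se", 3.73)]:
    print(f"{label:>12}: Vm = {vm:.2f} -> solvent {solvent_percent(vm):.1f}%")
```

The printed values (53.9%, 66.3%, and 67.0%) match the Solvent row of the table.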

Fig. 4.

The crystal structure of Ttx1576 reveals a distinct fold with ssDNA-binding features. (A) Structure of the DNA-binding domain of Ttx1576 (residues 10–148) showing the conserved aromatic and basic residues that line the proposed ssDNA-binding cleft. The N and C termini are indicated by blue and red spheres, respectively. (B) A surface representation shown in the same orientation as in A, colored to show conserved phenylalanine residues in yellow and basic residues in blue. The hydrophobic binding cleft is lined with basic amino acids that are well positioned to interact with the phosphodiester backbone of bound ssDNA. (C) A surface representation shown in the same orientation as in A and B, indicating the electrostatic potential of the putative binding interface. (D) Sequence alignment of the C-terminal region of Ttx1576 (residues 154–192) with a hidden Markov model corresponding to the bZIP family, with conserved residues indicated. (E) A model of the full-length structure of Ttx1576, including the C-terminal leucine zipper motif. The model for the C terminus was constructed using the zipper motif from the Jun bZIP homodimer (PDB ID code 2H7H). The binding cleft in the DNA-binding domain is denoted by phenylalanine side chains in yellow, and the termini defined in the crystal structure are indicated by spheres as before. The model is not intended to represent accurately the relative positions of the two ssDNA-binding domains.

A search for structurally similar proteins using either PDBFold or Dali suggested that ThermoDBP has a unique fold. By relaxing the similarity thresholds beyond the defaults, it was possible to detect weak structural similarity with two RNA-binding proteins: HutP [Protein Data Bank (PDB) ID code 1wrq, an RNA-binding antitermination protein (31)] and the L31e protein from the large ribosomal subunit of Haloarcula marismortui (1yj9) (32). An optimal alignment of HutP and Ttx1576 matches 75 Cα atoms with an rmsd of 2.6 Å. Visual inspection reveals that the proteins have almost identical topology but share essentially no sequence similarity. In HutP, the secondary structural elements are displaced relative to ThermoDBP, and the cleft is absent. The RNA-binding site of HutP lies on the side of the structure, parallel to α1 but remote from the presumed ThermoDBP ssDNA-binding cleft. The L31e protein has the same topology as ThermoDBP (superimposing 57 Cα atoms with an rmsd of 3.1 Å) but also lacks the cleft. L31e forms part of the polypeptide exit tunnel of the ribosome, in close association with rRNA (32).

Domain Organization of ThermoDBP.

The C terminus of ThermoDBP, which was removed by proteolysis or mutagenesis before crystallization, is strongly predicted to adopt an amphipathic, coiled-coil leucine zipper structure. Circular dichroism of the full-length protein indicates an overall helical content of 47%, markedly higher than the 27% helical content of the crystallized DNA-binding domain, consistent with a strongly helical C-terminal domain. A Conserved Domain Database (33) search with the ThermoDBP sequence revealed a statistically significant (E-value = 2.79 × 10⁻³) match to profile cl02576, corresponding to the basic leucine zipper (bZIP) domain (Fig. 4D). Multiple alignments of ThermoDBP family proteins revealed conservation of several leucines in the last α-helical region (Fig. S1). Because leucine zippers are dimerization domains, ThermoDBP may have a dimeric structure with an N-terminal ssDNA-binding domain and a C-terminal dimerization domain (modeled in Fig. 4E).

To investigate the roles of the different domains in ThermoDBP function, we made two C-terminally truncated mutants and compared their DNA-binding affinities by electrophoretic mobility shift assay with a fluorescent oligonucleotide. In the first mutant (1–148), the predicted leucine zipper domain was removed by introducing a stop codon at position 149. This mutant still bound ssDNA, with an affinity only slightly lower than that of the full-length wild-type protein (Fig. 5A). The retarded DNA migrated faster than with the full-length protein, suggesting a smaller nucleoprotein complex. Deleting a further nine amino acids, including the conserved “LIYWIRSDR” sequence (mutant 1–139), substantially weakened DNA binding, suggesting that this conserved motif participates in ssDNA binding. The truncation mutant 1–139 elutes from a calibrated size-exclusion column with a retention time corresponding to a molecular mass of 17 kDa, consistent with a monomeric domain of 16 kDa (Fig. 5B). In contrast, the full-length protein has a retention time corresponding to a molecular mass of 57 kDa. This is somewhat higher than the 46 kDa expected for a dimer, probably because of the elongated shape of the predicted dimer, a factor known to influence retention in size-exclusion chromatography.
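Apparent masses from a calibrated gel filtration column are conventionally read from a standard curve in which log(mass) is approximately linear in elution volume. The sketch below fits such a curve by least squares and then compares the apparent masses reported above with the expected monomer masses. The standards are hypothetical, generated to lie exactly on a line; the actual Superose 12 calibration is not reported in this section.

```python
import math


def calibrate(elution_ml, mass_kda):
    """Fit log10(mass) = a + b * Ve by least squares; return a function Ve -> mass (kDa)."""
    ys = [math.log10(m) for m in mass_kda]
    n = len(elution_ml)
    mx, my = sum(elution_ml) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(elution_ml, ys)) / sum(
        (x - mx) ** 2 for x in elution_ml
    )
    a = my - b * mx
    return lambda ve: 10 ** (a + b * ve)


# Hypothetical standards (elution volume in mL, mass in kDa), not real column values.
standards_ve = [10.0, 12.0, 14.0, 16.0]
standards_kda = [10 ** (4 - 0.2 * ve) for ve in standards_ve]
apparent_mass = calibrate(standards_ve, standards_kda)
print(f"Ve = 11.0 mL -> {apparent_mass(11.0):.1f} kDa")

# Apparent / expected monomer mass, using the values reported in the text.
print(f"full-length: {57 / (46 / 2):.2f}")  # ~2.5: read as a dimer with an elongated shape
print(f"mutant 1-139: {17 / 16:.2f}")       # ~1.1: read as a monomer
```

Because the curve is fitted to globular standards, an elongated dimer elutes earlier than a globular protein of the same mass, which inflates its apparent mass; this is consistent with the ratio of about 2.5 rather than exactly 2 for the full-length protein.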

Fig. 5.

Truncation of the C terminus of Ttx1576 alters quaternary structure and DNA-binding affinity. (A) Gel-shift analysis of Ttx1576 binding to a fluorescent 45mer oligonucleotide. The wild-type protein bound efficiently to the DNA, but deletion of the C-terminal sequence after residue 139, which removes the leucine zipper region and the conserved sequence between residues 140 and 148, reduced binding affinity significantly. The deletion construct 1–148, which also lacks the leucine zipper but includes the conserved sequence, retained much of the binding affinity of the wild-type protein. Protein concentrations (left to right) were 0, 1, 5, 10, 20, 50, and 100 μM. (B) The molecular weights of the full-length (WT) and C-terminally truncated 1–139 mutant Ttx1576 proteins were estimated by gel filtration using a calibrated Superose 12 column. The elution volumes were consistent with a dimeric structure for the WT protein and a monomeric structure for the mutant. (C) Schematic showing the domain organization of Ttx1576. The crystallized portion (XTAL) is shaded.

Distant Archaeal Homologs of ThermoDBP.

An HHpred search starting from the Ttx1576 sequence (amino acids 1–146) revealed statistically significant similarity (probability = 94%) with the pfam10015 protein family (DUF2258, also known as COG4345). The reverse search, using PSI-BLAST with SSO1098 (a member of DUF2258 from Sulfolobus solfataricus) as the query, detected statistically significant similarity (E-value = 0.001) with a ThermoDBP family representative, Pcal_0963 from Pyrobaculum calidifontis. The DUF2258 family contains only archaeal proteins, and the phyletic profile of the corresponding arCOG03772 includes proteins from Thermoproteales, Desulfurococcales, Thermococci, and a few Archaeoglobi (Figs. S1 and S2). A multiple alignment of this family revealed conservation of all eight secondary structure elements of the ThermoDBP fold (Fig. S2). In addition, all these proteins contain a predicted long α-helix after the core domain, suggesting that, like ThermoDBP, they may dimerize, although no conservation of the leucine zipper signature was observed. A group of proteins within arCOG03772 contains an additional C-terminal domain with predicted mixed α/β secondary structure, for which no similarity to any known protein could be detected (Fig. S1). We found no conservation of gene neighborhood for this family, so there are no specific clues to the function(s) of these distant homologs of ThermoDBP. Nevertheless, the sequence conservation and broad distribution of this family in Archaea suggest an important biological role. Because the member of arCOG03772 from T. tenax (Ttx1840) was not detected in our ssDNA-binding assay, it is possible that these ThermoDBP homologs are not ssDNA-binding proteins or are not highly expressed.

Discussion

Strikingly, ThermoDBP is the only protein family that is present in all 10 Thermoproteales species lacking a canonical SSB and absent from all species encoding an SSB. The complementary phyletic patterns of SSB and ThermoDBP suggest that, in the Thermoproteales lacking canonical SSB proteins, ThermoDBP supplies the essential ssDNA-binding activity, in a dramatic case of nonorthologous gene displacement (34). An alternative possibility was that ssDNA binding in the Thermoproteales is mediated by an extremely divergent, OB fold-containing SSB variant that is undetectable by sequence analysis. This possibility appears remote, because in the course of arCOG construction, RPA orthologs were detected with high confidence using sensitive sequence profiles in all available archaeal genomes except for Thermoproteales, including deep and possibly fast-evolving lineages such as Korarchaeota and Nanoarchaeota (28). More importantly, by affinity purifying ssDNA-binding proteins from T. tenax, we confirmed that, even if such a protein were present in these organisms, the principal ssDNA-binding activity in Thermoproteales cells resides in a distinct protein, ThermoDBP (Ttx1576), which possesses functional properties characteristic of an SSB.

Although ThermoDBP shares topology with two known RNA-binding proteins, it possesses unique structural features and is unrelated to the canonical, OB fold-containing SSB family. The structure of ThermoDBP suggests that the large, central, phenylalanine-lined cleft is the DNA-binding site. The phenylalanine residues are spaced along the cleft with a separation of 3.5–5.5 Å, potentially allowing nucleotide bases to insert into the cleft. Stacking of aromatic and hydrophobic side chains against nucleotide bases is a well-known theme in nucleic acid-binding proteins, in particular in the OB folds of SSB proteins (24). The strong positive electrostatic potential surrounding the cleft appears well suited to binding the negatively charged phosphate backbone. The C-terminal domain has the characteristic signature of an amphipathic helical leucine zipper (Lx6L) and is implicated in the dimerization of ThermoDBP. The presence of a leucine zipper in a family of archaeal proteins is unusual. A common origin for the leucine zippers of ThermoDBP and eukaryotic bZIP transcription factors is a provocative possibility, but, given the generic coiled-coil structure of this domain, convergent evolution cannot be ruled out.
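The Lx6L signature means that leucines recur every seven residues, at the same heptad position of an amphipathic helix. A crude scan for this pattern can be written as below. It is a simplified illustration, not the profile-based CDD search or alignment analysis used here: real zippers tolerate substitutions at some heptad positions, which this exact-match scan would miss. The toy sequence is hypothetical and is not the Ttx1576 sequence.

```python
def heptad_leucine_runs(seq, min_repeats=3):
    """Find runs of leucines spaced exactly seven residues apart (the Lx6L
    pattern). Returns (start_index, n_leucines) tuples, 0-based."""
    runs = []
    seen = set()
    for i, aa in enumerate(seq):
        if aa != "L" or i in seen:
            continue  # not a leucine, or already counted in an earlier run
        n, j = 0, i
        while j < len(seq) and seq[j] == "L":
            seen.add(j)
            n += 1
            j += 7
        if n >= min_repeats:
            runs.append((i, n))
    return runs


# Hypothetical toy sequence built from four heptads; not the Ttx1576 sequence.
toy = "LAEIEKK" * 4
print(heptad_leucine_runs(toy))  # [(0, 4)]
```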

Our search of protein sequence databases for distant homologs of ThermoDBP identified a distinct family of archaeal proteins, arCOG03772. These Archaea-specific proteins have a broader phyletic distribution than ThermoDBP and seem to contain counterparts to all structural elements of the ThermoDBP fold as well as a putative coiled-coil dimerization domain that is distinct from the ThermoDBP leucine zipper. Functional characterization of arCOG03772 proteins is an interesting goal for further experiments.

Nonorthologous gene displacement is a common phenomenon in genome evolution that extends even to central cellular functions, in particular key components of the DNA replication machinery (35). However, SSB appeared to be one of the few “truly universal” proteins. The present work shows that, even for this fundamental function, unrelated solutions have evolved.

Materials and Methods

Growth of T. tenax and Cell Lysis.

One gram of T. tenax Kra1 cells, generously provided by Bettina Siebers (University of Duisburg-Essen, Essen, Germany), was resuspended in binding buffer (BB) [20 mM Mes (pH 6.5), 50 mM NaCl, 5 mM EDTA, and 1 mM DTT] and lysed by sonication. The lysate was centrifuged at 16,000 × g for 30 min at 4 °C, and the resulting supernatant was filtered through a 0.45-μm filter.

Detection of Proteins Interacting with ssDNA.

To detect proteins interacting with ssDNA, 0.6 mg of magnetic streptavidin beads (Promega) was washed three times with 0.5× SSC buffer [75 mM NaCl, 7.5 mM sodium citrate (pH 7.0)]; then 1,500 pmol (22 μg) of a single-stranded biotinylated oligonucleotide (Biot-45ssDNA) was bound to the beads by incubation at room temperature for 10 min. The beads were washed again three times with 0.1× SSC buffer, and 20 mg of cell lysate was added. The mixture was incubated for 90 min at 50 °C. After this incubation, the beads were washed five times with buffer BB containing 150 mM NaCl. Bound proteins were eluted stepwise with buffer BB supplemented with 250, 500, and 1,000 mM NaCl. All fractions were precipitated with TCA/acetone, resuspended in SDS/PAGE loading buffer, boiled, and run on a 4–12% Bis-Tris gel (Invitrogen). Gels were stained with SYPRO Ruby and visualized under UV light.
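The stated mass of oligonucleotide follows from the molar amount with a common approximation of about 330 g/mol per nucleotide for ssDNA. The sketch below, a rough check under that assumption, ignores the extra mass of the biotin modification and the exact base composition.

```python
AVG_NT_MASS = 330.0  # g/mol per nucleotide, a common approximation for ssDNA


def oligo_mass_ug(pmol, length_nt, nt_mass=AVG_NT_MASS):
    """Approximate mass (ug) of an oligonucleotide from its molar amount (pmol)."""
    grams = pmol * 1e-12 * length_nt * nt_mass
    return grams * 1e6


print(f"{oligo_mass_ug(1500, 45):.1f} ug")  # ~22 ug for 1,500 pmol of a 45-mer
```

The result, about 22.3 μg, agrees with the 22 μg quoted for 1,500 pmol of Biot-45ssDNA.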

MS: Protein Identification.

Bands from the SDS/PAGE gel were excised and cut into ∼1-mm cubes, which were subjected to in-gel digestion using a ProGest Investigator in-gel digestion robot. Briefly, the gel cubes were destained by washing with acetonitrile and subjected to reduction and alkylation before digestion with trypsin at 37 °C. The peptides were extracted with 10% (vol/vol) formic acid and concentrated to 20 μL (SpeedVac; ThermoSavant). They then were separated using an UltiMate nanoLC (LC Packings) equipped with a PepMap C18 trap and column, using a 60- or 90-min elution profile depending on the complexity (molecular mass) of the sample, with a gradient of increasing acetonitrile containing 0.1% (vol/vol) formic acid to elute the peptides [5–35% (vol/vol) acetonitrile over 18 or 40 min, respectively; then 35–50% (vol/vol) acetonitrile for a further 7 or 20 min, followed by 95% (vol/vol) acetonitrile to clean the column before re-equilibration to 5% (vol/vol) acetonitrile]. The eluent was sprayed into a Q-Star XL tandem mass spectrometer (Applied Biosystems) and analyzed in information-dependent acquisition mode, performing 1 s of MS followed by 3-s tandem MS (MS/MS) analyses of the two most intense peaks seen by MS. These masses were then excluded from analysis for the next 60 s. MS/MS data for doubly and triply charged precursor ions were converted to centroid data, without smoothing, using the Analyst QS1.1 mascot.dll data import filter with default settings. The resulting MS/MS data file was analyzed with the Mascot 2.1 search engine (Matrix Science) against a database containing the protein translations of the T. tenax ORFs. The T. tenax genome sequence was provided by Bettina Siebers before publication (36).
The data were searched with tolerances of 0.2 Da for the precursor and fragment ions, trypsin as the cleavage enzyme, one missed cleavage, carbamidomethyl modification of cysteines as a fixed modification, and methionine oxidation selected as a variable modification. The Mascot search result was accepted if the protein match had a score that was significantly above the score for other matches and included at least one peptide with a score above the homology threshold.
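The search settings above (trypsin as the enzyme, one missed cleavage) determine which candidate peptides a search engine considers. The following is a minimal illustrative sketch of an in silico tryptic digest, assuming the widely used rule "cleave C-terminal to K or R unless the next residue is P". It is not Mascot's implementation, and the example sequence is hypothetical.

```python
import re


def tryptic_peptides(seq: str, max_missed: int = 1) -> list[str]:
    """In silico trypsin digest of a protein sequence.

    Cleaves after K or R unless followed by P, then returns the fully cleaved
    peptides together with peptides spanning up to `max_missed` missed sites.
    """
    # Cleavage positions: just after a K/R that is not followed by P
    sites = [m.end() for m in re.finditer(r"[KR](?!P)", seq)]
    # A K/R at the very C terminus does not create an extra (empty) fragment
    bounds = [0] + [s for s in sites if s < len(seq)] + [len(seq)]
    fragments = [seq[a:b] for a, b in zip(bounds, bounds[1:])]

    peptides = []
    for i in range(len(fragments)):
        # Join fragment i with up to `max_missed` following fragments
        for j in range(i, min(i + max_missed + 1, len(fragments))):
            peptides.append("".join(fragments[i:j + 1]))
    return peptides


# Hypothetical sequence: the R followed by P is not cleaved
print(tryptic_peptides("PEPKAAARPGGKLL"))
# ['PEPK', 'PEPKAAARPGGK', 'AAARPGGK', 'AAARPGGKLL', 'LL']
```

A search engine would then compare the masses of such peptides (and their fragments) with the observed spectra within the stated 0.2-Da tolerances.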

Cloning, Mutagenesis, and Expression of Ttx1576.

The Ttx1576 gene was amplified by PCR using the primers Ttx1576_forward (5′-CACCGGAGAGGAGCTAAGAGAGGAG) and Ttx1576_reverse (5′-TTATTTCAATAAACTTGTTATC) and was cloned into the pET151/d-TOPO vector (Invitrogen) following the manufacturer's instructions. pET151/d-TOPO-Ttx1576 was transformed into E. coli BL21-Star (DE3) cells. Cells were grown in 2 liters of LB medium at 37 °C to an OD600 of 0.6, and expression of His6-tagged Ttx1576 was then induced with 1 mM isopropyl β-D-1-thiogalactopyranoside (IPTG) for 3 h at 37 °C. Truncated mutant versions of Ttx1576 lacking the C-terminal domain were created by introducing a stop codon at position 140 or 149, as described (37). These proteins were purified as for the wild-type protein but eluted later on gel filtration, consistent with disruption of the C-terminal dimerization domain (Fig. 5).

Ttx1576 Protein Purification.

Cells were harvested and resuspended in lysis buffer [20 mM Tris⋅HCl (pH 8.0), 500 mM NaCl, 0.1% Triton X-100, 1 mM MgCl2, and complete EDTA-free protease inhibitors (Roche)], lysed by sonication, and clarified by centrifugation. The supernatant was heated to 70 °C for 10 min and recentrifuged. The resulting supernatant was diluted twofold in buffer A [20 mM Tris⋅HCl (pH 8.0), 500 mM NaCl, 30 mM NaH2PO4] with 30 mM imidazole and filtered through a 0.45-μm filter. The sample was then applied to a column containing Ni-NTA-Agarose (HiTrap 5-mL Chelating HP; GE Healthcare) pre-equilibrated with buffer A plus 30 mM imidazole. The protein was eluted with a linear gradient of 0–500 mM imidazole in buffer A. Fractions containing the His-Ttx1576 protein were identified by SDS/PAGE and pooled. His-Ttx1576 was buffer exchanged against TEV cleavage buffer [20 mM Tris⋅HCl (pH 7.0), 500 mM NaCl, 1 mM DTT, 30 mM NaH2PO4, and 10% glycerol]. The protein was cleaved overnight at room temperature with TEV protease added to a final concentration of 200 ng/μL. Cleaved Ttx1576 was repurified by loading onto the same column pre-equilibrated with buffer A plus 30 mM imidazole and collecting the flowthrough. Positive fractions were pooled and buffer exchanged extensively against storage buffer [50 mM Tris⋅HCl (pH 7.5), 200 mM KCl, 1 mM DTT, 1 mM EDTA, 0.01% Triton X-100, and 50% glycerol]. MS confirmed the expected mass for the recombinant protein following tag removal. Protein to be used for crystallization was prepared as described in ref. 38.

Quantitative Western Blotting.

For the production of Ttx1576 antibodies, 1 mg of purified Ttx1576 protein was used to raise polyclonal antibodies in sheep (Scottish National Blood Transfusion Service). Western blots were performed to detect Ttx1576 and SSB proteins. Defined amounts of Ttx1576 and SSB were run on NuPage 4–12% Bis-Tris SDS gels (Invitrogen) along with 25 and 75 μg of protein prepared from T. tenax and S. solfataricus lysates. Western blots were carried out following standard procedures. Protein content was measured by Bradford assay.

The concentration of endogenous Ttx1576 and SSB in the extracts was determined using a standard curve generated with the recombinant proteins.
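Quantification against a standard curve amounts to fitting band signal versus known amounts of recombinant protein and inverting that fit for the lysate bands. The sketch below assumes a linear signal response (real Western blot signals are often linear only over a limited range) and uses hypothetical band intensities purely for illustration; the authors' fitting procedure is not specified.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def quantify(signal, slope, intercept):
    """Invert the standard curve: protein amount that gives this band signal."""
    return (signal - intercept) / slope


# Hypothetical standards: ng of recombinant protein vs. band signal (arbitrary units)
standard_ng = [5, 10, 20, 40]
standard_signal = [1000, 2000, 4000, 8000]

slope, intercept = fit_line(standard_ng, standard_signal)
lysate_band_ng = quantify(3000, slope, intercept)  # hypothetical lysate band
print(lysate_band_ng)  # 15.0
```

Dividing the interpolated amount by the total protein loaded (25 or 75 μg, measured by Bradford assay) would give the abundance per microgram of extract.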

Oligonucleotides.

Biot-45ssDNA:

Biotin 5′-GTTTGAAACTACTTTTAACTATAAGTTAAAATGACTCTTAAATAG

Fluorescent 24mer DNA: 5′-FAM-CTTTCAATTCTATAGTAGATTAGC

Fluorescent 24mer RNA: 5′-FAM-CUUUCAAUUCUAUAGUAGAUUAGC

Fluorescent 21T: 5′-FAM-TTTTTTTTTTTTTTTTTTTTT

Fluorescent 21C: 5′-FAM-CCCCCCCCCCCCCCCCCCCCC

Fluorescent 21G-rich: 5′-FAM-TTCTGGGGCTGGGGCTGGGGT

Fluorescent 21U: 5′-FAM-UUUUUUUUUUUUUUUUUUUU

Fluorescent 45mer:

5′-FAM-GCTTGCTAGGACGGATCCCTCGAGGTTTTTTTTTTTTTTTTTTTT

21T for ITC: 5′-TTTTTTTTTTTTTTTTTTTTT

The DNA duplex ds21Mix used for ITC experiments was assembled from the following oligonucleotides by mixing and slow cooling from 80 °C to 30 °C over 3 h:

21Mix forward: 5′-ATTCAGTTCAACTGTTAGACT

21Mix reverse: 5′-AGTCTAACAGTTGAACTTGAAT
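Because ds21Mix is formed by annealing two strands, it can be useful to check the strands against each other before reuse. The sketch below computes the exact reverse complement of the 21Mix forward strand and compares it with the reverse strand as listed; the function name is illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


forward = "ATTCAGTTCAACTGTTAGACT"          # 21Mix forward, as listed
reverse_listed = "AGTCTAACAGTTGAACTTGAAT"  # 21Mix reverse, as listed

expected = reverse_complement(forward)
print(len(forward), len(reverse_listed))  # 21 22
print(expected)                           # AGTCTAACAGTTGAACTGAAT
print(expected == reverse_listed)         # False
```

As transcribed here, the listed reverse strand is 22 nt and carries one extra T relative to the exact reverse complement (…GAACTTGAAT versus …GAACTGAAT). If the listed sequence is exact, the annealed duplex would contain a single-nucleotide bulge, so the sequence is worth checking before reuse.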

Gel Electrophoretic Mobility-Shift Assays.

Binding of wild-type and mutant forms of Ttx1576 to fluorescent DNA and RNA oligonucleotides (200 nM) was performed in binding buffer [50 mM Tris⋅HCl (pH 7.5), 50 mM NaCl, 0.1 mg/mL BSA]. Reactions were incubated for 10 min at 20 °C or 80 °C, mixed with Ficoll loading buffer, and run on an 8% acrylamide gel in 1× TBE at 15 mA constant current for 60–90 min; gels were then scanned using a Fuji FLA5000 fluorescent imager.

ITC.

Binding of Ttx1576 to ssDNA and dsDNA was assessed by ITC using a VP-ITC unit (Microcal; GE Healthcare). Ttx1576 protein samples were dialyzed extensively against ITC buffer [50 mM Tris⋅HCl (pH 7.5), 50 mM NaCl] and degassed under vacuum. Oligonucleotides were also dissolved in ITC buffer. Binding experiments were performed at 25 °C. The 21T or ds21Mix oligonucleotide (75 μM) was titrated from a 285-μL syringe, stirred at 300 rpm, into a cell containing 1.4 mL of Ttx1576 protein (10 μM). Each titration comprised 50 injections of ssDNA or dsDNA (one 2-μL injection followed by forty-nine 5-μL injections). The first data point was routinely discarded to allow for diffusion of ligand/receptor across the needle tip during the equilibration period. Binding isotherms were analyzed with a simple single-binding-site model using the ITC data analysis software (ORIGIN) provided by the manufacturer.
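The injection schedule fixes how far the titration proceeds in terms of ligand:protein molar ratio. The sketch below computes a naive cumulative molar ratio (total oligonucleotide injected divided by total protein in the cell). It deliberately ignores the volume displaced from the overfilled cell and the resulting dilution, which the manufacturer's software corrects for, so it slightly misstates the true ratio.

```python
from itertools import accumulate


def cumulative_molar_ratio(injections_ul, syringe_um, cell_ml, cell_um):
    """Naive ligand:protein molar ratio after each injection.

    Simplification (assumption): no correction for displaced cell volume
    or for dilution of the protein during the titration.
    """
    protein_pmol = cell_ml * 1000.0 * cell_um  # uL x uM = pmol
    ligand_pmol = accumulate(v * syringe_um for v in injections_ul)
    return [lig / protein_pmol for lig in ligand_pmol]


# One 2-uL injection followed by forty-nine 5-uL injections
injections = [2.0] + [5.0] * 49
assert sum(injections) <= 285.0, "schedule exceeds the 285-uL syringe"

# 75 uM oligonucleotide titrated into 1.4 mL of 10 uM Ttx1576
ratios = cumulative_molar_ratio(injections, 75.0, 1.4, 10.0)
print(len(ratios), sum(injections), round(ratios[-1], 2))  # 50 247.0 1.32
```

On this naive estimate, the titration ends at roughly 1.3 oligonucleotides per Ttx1576 subunit, with 247 μL of the 285-μL syringe used.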

Fluorescence Anisotropy.

The binding affinities of Ttx1576 for several oligonucleotides (21T, 21C, 21G-rich, and 21U) were determined at 20 °C using a Cary Eclipse fluorimeter (Varian) with an automatic polarizer (excitation, 490 nm; emission, 535 nm). Both the excitation and emission slit widths were set to 5 nm. For direct titrations, 25 nM 5′-fluorescein–labeled 21-nt oligonucleotide was equilibrated in 500 μL of fluorescence buffer [50 mM Tris⋅HCl (pH 7.5), 50 mM NaCl, 1 mM DTT, 1 mM EDTA]. Total fluorescence intensity was measured in parallel after each protein addition, and the data were corrected for dilution. Fluorescence quenching greater than 20% was observed only for the 21U oligonucleotide. To minimize rotational effects on the measured fluorescence intensity, "magic angle" conditions were used. Each protein titration was performed in triplicate. Data were fitted with KaleidaGraph to the following equation:

$$A = A_{\min} + (A_{\max} - A_{\min})\,\frac{(D + E + K_d) - \sqrt{(D + E + K_d)^2 - 4DE}}{2D}$$

where A is the measured anisotropy; E, the variable protein concentration; D, the total DNA concentration; Amin, the anisotropy of free DNA; Amax, the anisotropy of the DNA–protein complex; and Kd, the dissociation constant (39). Protein concentrations are expressed as concentrations of Ttx1576 subunits.
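The quadratic binding equation can be fitted without specialized software. The sketch below swaps KaleidaGraph's nonlinear regression for a simpler scheme: a grid search over Kd in which, for each trial Kd, the anisotropy is linear in Amin and (Amax − Amin), so those two parameters are obtained by ordinary least squares. The titration data are synthetic (generated from the model with Kd = 100 nM), not measurements from this study.

```python
import math


def fraction_bound(E, D, Kd):
    """Fraction of DNA bound, for total protein E and total DNA D (same units as Kd)."""
    b = D + E + Kd
    return (b - math.sqrt(b * b - 4.0 * D * E)) / (2.0 * D)


def anisotropy(E, D, Kd, A_min, A_max):
    """Quadratic (ligand-depletion) binding model for anisotropy."""
    return A_min + (A_max - A_min) * fraction_bound(E, D, Kd)


def fit_kd(E_values, A_values, D, kd_grid):
    """Grid search over Kd; A_min and the amplitude are solved by linear least squares.

    Returns (sum of squared residuals, Kd, A_min, A_max) for the best Kd.
    """
    best = None
    for Kd in kd_grid:
        f = [fraction_bound(E, D, Kd) for E in E_values]
        n = len(f)
        mf, ma = sum(f) / n, sum(A_values) / n
        sff = sum((fi - mf) ** 2 for fi in f)
        sfa = sum((fi - mf) * (ai - ma) for fi, ai in zip(f, A_values))
        span = sfa / sff            # A_max - A_min
        a_min = ma - span * mf
        ssr = sum((ai - (a_min + span * fi)) ** 2 for fi, ai in zip(f, A_values))
        if best is None or ssr < best[0]:
            best = (ssr, Kd, a_min, a_min + span)
    return best


# Synthetic titration: 25 nM labeled DNA, protein in nM (subunits)
D_total = 25.0
E_values = [0, 10, 25, 50, 100, 200, 400, 800, 1600]
A_obs = [anisotropy(E, D_total, 100.0, 0.05, 0.20) for E in E_values]

best = fit_kd(E_values, A_obs, D_total, range(1, 1001))  # Kd grid: 1-1000 nM
print(best[1])  # 100
```

The quadratic form matters here because the labeled DNA (25 nM) is not negligible relative to Kd, so free protein cannot be approximated by total protein E.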

Structural Biology.

Protein for crystallization was concentrated to 15–30 mg/mL, and crystallization trials were set up as part of a previous structural proteomics effort (38). This approach failed to yield any useful crystals. We therefore repurified the protein and incubated it with chymotrypsin (1:300 vol/vol) before crystallization, an in situ proteolysis approach used to rescue intractable proteins (30). The optimal native crystallization conditions were refined to 15 mg/mL of chymotrypsin-treated protein equilibrated against 0.9 M sodium tartrate, 0.1 M bicine (pH 8.5), and 0.05 M sodium potassium phosphate. These crystals grew in 2 wk and were reproducible. For selenomethionine-labeled protein, the best crystals grew in 2 wk from 28 mg/mL of chymotrypsin-treated protein against a solution of 3.05 M sodium chloride, 0.1 M bicine (pH 9.5), and 0.26 M lithium chloride. Phases were determined by a selenomethionine single-wavelength anomalous diffraction experiment performed on a single selenomethionine-labeled Ttx1576 crystal. These data were collected at 100% transmission on beamline I03 at the Diamond synchrotron light source, Oxfordshire, England; the crystal was exposed to the synchrotron beam without any attenuation to obtain data to 3.5 Å. The data have an unusually high Rmerge (22.4%) as a result of both radiation damage to the crystal and the presence of ice. A 2.9-Å native dataset was collected in house using a Rigaku Micromax-007HF Cu anode with VariMax optics and a Rigaku Saturn 944+ CCD detector; these higher-resolution, much higher-quality native data were then used to refine the structure. All data were indexed and scaled with HKL2000 (40). Using both the native and derivative data, the single selenium site was located with SHELXD, using the SHELXC/D/E suite of programs (41), including the test version of SHELXE. Parrot and Buccaneer, part of CCP4 (42), were used for further automated model building. 
In contrast, the mutant Ttx1576 1–148 crystallized readily overnight when equilibrated against 11.8% isopropanol, 0.1 M sodium citrate (pH 5), and 0.2 M lithium sulfate at a protein concentration of 15 mg/mL. High-resolution data were collected on beamline I04-1 at Diamond and were indexed and scaled with HKL2000 (40). Using the 2.9-Å resolution structure as a search model, Ttx1576 1–148 was solved by molecular replacement using Phaser in CCP4 (43). Both models were refined with REFMAC5 (44, 45) and Coot (46), and model quality was assessed with MOLPROBITY (47). The final coordinates are available in the Protein Data Bank (ID code 3TEK).
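The 22.4% Rmerge quoted for the selenomethionine dataset measures the spread of repeated measurements of the same reflection about their mean, summed over all reflections and normalized by total intensity. The minimal sketch below implements that standard definition. It assumes that Miller indices have already been mapped to a common asymmetric unit and intensities already scaled (which HKL2000 does internally), and it includes single-observation reflections in the denominator, a convention that differs between programs. The observations are hypothetical.

```python
from collections import defaultdict


def r_merge(observations):
    """Rmerge = sum_hkl sum_i |I_i(hkl) - <I(hkl)>| / sum_hkl sum_i I_i(hkl).

    `observations` is an iterable of (hkl, intensity) pairs; repeated hkl keys
    are treated as equivalent measurements of one unique reflection.
    """
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)

    numerator = denominator = 0.0
    for intensities in groups.values():
        mean = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean) for i in intensities)
        denominator += sum(intensities)  # singletons contribute here only
    return numerator / denominator


# Hypothetical, already-scaled observations of two unique reflections
obs = [((1, 0, 0), 100.0), ((1, 0, 0), 120.0), ((0, 1, 0), 50.0), ((0, 1, 0), 50.0)]
print(r_merge(obs))  # 0.0625, i.e. 6.25%
```

Radiation damage and ice rings both inflate the deviations between equivalent measurements, which is why they raise Rmerge as described above.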

Circular Dichroism Spectroscopy.

Ttx1576 was analyzed by circular dichroism to assess whether the regions missing from the crystal structure were, as predicted, largely α-helical. Spectra were recorded on a JASCO J-180 spectrometer with protein at 0.15 mg/mL in PBS. The recorded spectrum was analyzed using the Dichroweb server (48).

Size-Exclusion Chromatography.

A Superose 12 column (GE Healthcare) was calibrated using molecular mass standards (blue dextran, thyroglobulin, bovine gamma globulin, chicken ovalbumin, equine myoglobin, and vitamin B12) in GF buffer [20 mM Tris⋅HCl (pH 7.5), 150 mM NaCl] at a flow rate of 0.8 mL/min. Full-length Ttx1576 and the 1–139 truncation mutant were analyzed under the same conditions. The standards yielded a linear relationship between Kav (the fraction of the stationary gel volume accessible to a given solute) and the logarithm of molecular weight, which was used to calculate the native molecular weights of the Ttx1576 proteins (49).
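The calibration described above can be sketched as follows, using the usual definition Kav = (Ve − V0)/(Vt − V0), where Ve is the elution volume, V0 the void volume (from blue dextran), and Vt the total bed volume. The molecular masses are the nominal values commonly quoted for these standards; the elution volumes, V0, and Vt are hypothetical numbers for illustration only, not data from this study.

```python
import math


def k_av(ve, v0, vt):
    """Partition coefficient Kav = (Ve - V0) / (Vt - V0)."""
    return (ve - v0) / (vt - v0)


def calibrate(standards, v0, vt):
    """Least-squares line Kav = slope * log10(MW) + intercept from (MW, Ve) pairs."""
    xs = [math.log10(mw) for mw, _ in standards]
    ys = [k_av(ve, v0, vt) for _, ve in standards]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


def estimate_mw(ve, v0, vt, slope, intercept):
    """Invert the calibration line to estimate the apparent native MW."""
    return 10 ** ((k_av(ve, v0, vt) - intercept) / slope)


# Nominal standard masses (Da) with HYPOTHETICAL elution volumes (mL)
standards = [
    (670000, 10.4),  # thyroglobulin
    (158000, 13.4),  # bovine gamma globulin
    (44000, 16.1),   # chicken ovalbumin
    (17000, 18.1),   # equine myoglobin
    (1350, 23.4),    # vitamin B12
]
V0, VT = 8.0, 24.0  # hypothetical void (blue dextran) and total bed volumes

slope, intercept = calibrate(standards, V0, VT)
print(round(estimate_mw(15.0, V0, VT, slope, intercept)))  # apparent MW of a hypothetical peak
```

Comparing the apparent mass with the mass calculated from the sequence indicates the oligomeric state, which is how a later-eluting truncation mutant points to loss of dimerization.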

Bioinformatic Analysis.

Protein sequence database searches were performed using PSI-BLAST (50) with an inclusion threshold E-value of 0.01 and no composition-based statistical correction against the nonredundant (NR) database at the National Center for Biotechnology Information. In addition, distant similarity detection approaches were applied, namely the conserved domain database search (51) and the HHpred search (27). Multiple alignments of protein sequences were constructed using the MUSCLE program (52), followed, when necessary, by a minimal manual correction on the basis of local alignments obtained using the PSI-BLAST and HHpred programs. Protein secondary structure was predicted using the Jpred program (53).
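For readers repeating the PSI-BLAST step with the command-line BLAST+ suite, the stated settings map onto standard psiblast options: -inclusion_ethresh for the inclusion E-value and -comp_based_stats 0 to disable composition-based statistics. The sketch below only assembles the command; the query file name, output file name, and number of iterations are hypothetical placeholders not given in the text, and the original searches may well have been run through the NCBI web interface instead.

```python
import shlex


def psiblast_command(query_fasta, db="nr", inclusion_evalue=0.01,
                     iterations=3, out="psiblast.out"):
    """Assemble a BLAST+ psiblast command line mirroring the settings above.

    `iterations`, `query_fasta`, and `out` are hypothetical placeholders.
    Searching nr locally requires a downloaded copy of the database.
    """
    return [
        "psiblast",
        "-query", query_fasta,
        "-db", db,
        "-inclusion_ethresh", str(inclusion_evalue),  # inclusion threshold E-value
        "-comp_based_stats", "0",                     # 0 = no composition-based statistics
        "-num_iterations", str(iterations),
        "-out", out,
    ]


print(shlex.join(psiblast_command("ttx1576.fasta")))
# psiblast -query ttx1576.fasta -db nr -inclusion_ethresh 0.01 -comp_based_stats 0 -num_iterations 3 -out psiblast.out
```

The list form can be passed directly to subprocess.run once BLAST+ and the database are installed.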

Supplementary Material

Supporting Information

Acknowledgments

We thank Richard Hutton for helpful discussions, Bettina Siebers for the T. tenax biomass and genome sequence, and the University of St Andrews Mass Spectrometry and Proteomics Facility, which is funded by grants from the Wellcome Trust. This work was funded by Biotechnology and Biological Sciences Research Council Grant BB/S/B14450. K.S.M. and E.V.K. are supported by intramural funds of the US Department of Health and Human Services (to the National Library of Medicine, National Institutes of Health).

Footnotes

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Data deposition: The atomic coordinates and structure factors have been deposited in the Protein Data Bank database, www.pdb.org (PDB ID code 3TEK).

See Author Summary on page 2198.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1113277108/-/DCSupplemental.

References

  • 1. Mushegian AR, Koonin EV. A minimal gene set for cellular life derived by comparison of complete bacterial genomes. Proc Natl Acad Sci USA. 1996;93:10268–10273. doi: 10.1073/pnas.93.19.10268.
  • 2. Sun S, Shamoo Y. Biochemical characterization of interactions between DNA polymerase and single-stranded DNA-binding protein in bacteriophage RB69. J Biol Chem. 2003;278:3876–3881. doi: 10.1074/jbc.M210497200.
  • 3. Kowalczykowski SC, Lonberg N, Newport JW, von Hippel PH. Interactions of bacteriophage T4-coded gene 32 protein with nucleic acids. I. Characterization of the binding interactions. J Mol Biol. 1981;145:75–104. doi: 10.1016/0022-2836(81)90335-1.
  • 4. Wold MS. Replication protein A: A heterotrimeric, single-stranded DNA-binding protein required for eukaryotic DNA metabolism. Annu Rev Biochem. 1997;66:61–92. doi: 10.1146/annurev.biochem.66.1.61.
  • 5. Meyer RR, Laine PS. The single-stranded DNA-binding protein of Escherichia coli. Microbiol Rev. 1990;54:342–380. doi: 10.1128/mr.54.4.342-380.1990.
  • 6. Iftode C, Daniely Y, Borowiec JA. Replication protein A (RPA): The eukaryotic SSB. Crit Rev Biochem Mol Biol. 1999;34:141–180. doi: 10.1080/10409239991209255.
  • 7. Murzin AG. OB(oligonucleotide/oligosaccharide binding)-fold: Common structural and functional solution for non-homologous sequences. EMBO J. 1993;12:861–867. doi: 10.1002/j.1460-2075.1993.tb05726.x.
  • 8. Suck D. Common fold, common function, common origin? Nat Struct Biol. 1997;4:161–165. doi: 10.1038/nsb0397-161.
  • 9. Lohman TM, Ferrari ME. Escherichia coli single-stranded DNA-binding protein: Multiple DNA-binding modes and cooperativities. Annu Rev Biochem. 1994;63:527–570. doi: 10.1146/annurev.bi.63.070194.002523.
  • 10. Raghunathan S, Kozlov AG, Lohman TM, Waksman G. Structure of the DNA binding domain of E. coli SSB bound to ssDNA. Nat Struct Biol. 2000;7:648–652. doi: 10.1038/77943.
  • 11. Bernstein DA, et al. Crystal structure of the Deinococcus radiodurans single-stranded DNA-binding protein suggests a mechanism for coping with DNA damage. Proc Natl Acad Sci USA. 2004;101:8575–8580. doi: 10.1073/pnas.0401331101.
  • 12. Dabrowski S, et al. Identification and characterization of single-stranded-DNA-binding proteins from Thermus thermophilus and Thermus aquaticus - new arrangement of binding domains. Microbiology. 2002;148:3307–3315. doi: 10.1099/00221287-148-10-3307.
  • 13. Sugiman-Marangos S, Junop MS. The structure of DdrB from Deinococcus: A new fold for single-stranded DNA binding proteins. Nucleic Acids Res. 2010;38:3432–3440. doi: 10.1093/nar/gkq036.
  • 14. Norais CA, Chitteni-Pattu S, Wood EA, Inman RB, Cox MM. DdrB protein, an alternative Deinococcus radiodurans SSB induced by ionizing radiation. J Biol Chem. 2009;284:21402–21411. doi: 10.1074/jbc.M109.010454.
  • 15. Bochkarev A, Bochkareva E, Frappier L, Edwards AM. The crystal structure of the complex of replication protein A subunits RPA32 and RPA14 reveals a mechanism for single-stranded DNA binding. EMBO J. 1999;18:4498–4504. doi: 10.1093/emboj/18.16.4498.
  • 16. Bochkarev A, Pfuetzner RA, Edwards AM, Frappier L. Structure of the single-stranded-DNA-binding domain of replication protein A bound to DNA. Nature. 1997;385:176–181. doi: 10.1038/385176a0.
  • 17. Richard DJ, et al. Single-stranded DNA-binding protein hSSB1 is critical for genomic stability. Nature. 2008;453:677–681. doi: 10.1038/nature06883.
  • 18. White MF. Archaeal DNA repair: Paradigms and puzzles. Biochem Soc Trans. 2003;31:690–693. doi: 10.1042/bst0310690.
  • 19. Kelly TJ, Simancek P, Brush GS. Identification and characterization of a single-stranded DNA-binding protein from the archaeon Methanococcus jannaschii. Proc Natl Acad Sci USA. 1998;95:14634–14639. doi: 10.1073/pnas.95.25.14634.
  • 20. Chédin F, Seitz EM, Kowalczykowski SC. Novel homologs of replication protein A in archaea: Implications for the evolution of ssDNA-binding proteins. Trends Biochem Sci. 1998;23:273–277. doi: 10.1016/s0968-0004(98)01243-2.
  • 21. Komori K, Ishino Y. Replication protein A in Pyrococcus furiosus is involved in homologous DNA recombination. J Biol Chem. 2001;276:25654–25660. doi: 10.1074/jbc.M102423200.
  • 22. Kelman Z, Pietrokovski S, Hurwitz J. Isolation and characterization of a split B-type DNA polymerase from the archaeon Methanobacterium thermoautotrophicum deltaH. J Biol Chem. 1999;274:28751–28761. doi: 10.1074/jbc.274.40.28751.
  • 23. Wadsworth RI, White MF. Identification and properties of the crenarchaeal single-stranded DNA binding protein from Sulfolobus solfataricus. Nucleic Acids Res. 2001;29:914–920. doi: 10.1093/nar/29.4.914.
  • 24. Kerr ID, et al. Insights into ssDNA recognition by the OB fold from a structural and thermodynamic study of Sulfolobus SSB protein. EMBO J. 2003;22:2561–2570. doi: 10.1093/emboj/cdg272.
  • 25. Luo X, et al. CC1, a novel crenarchaeal DNA binding protein. J Bacteriol. 2007;189:403–409. doi: 10.1128/JB.01246-06.
  • 26. Seroussi E, Lavi S. Replication protein A is the major single-stranded DNA binding protein detected in mammalian cell extracts by gel retardation assays and UV cross-linking of long and short single-stranded DNA molecules. J Biol Chem. 1993;268:7147–7154.
  • 27. Soding J, Biegert A, Lupas AN. The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Res. 2005;33(Web Server issue):W244–248. doi: 10.1093/nar/gki408.
  • 28. Makarova KS, Sorokin AV, Novichkov PS, Wolf YI, Koonin EV. Clusters of orthologous genes for 41 archaeal genomes and implications for evolutionary genomics of archaea. Biol Direct. 2007;2:33. doi: 10.1186/1745-6150-2-33.
  • 29. Kernchen U, Lipps G. Thermodynamic analysis of the single-stranded DNA binding activity of the archaeal replication protein A (RPA) from Sulfolobus solfataricus. Biochemistry. 2006;45:594–603. doi: 10.1021/bi051414d.
  • 30. Wernimont A, Edwards A. In situ proteolysis to generate crystals for structure determination: An update. PLoS ONE. 2009;4:e5094. doi: 10.1371/journal.pone.0005094.
  • 31. Kumarevel T, et al. Crystal structure of activated HutP; an RNA binding protein that regulates transcription of the hut operon in Bacillus subtilis. Structure. 2004;12:1269–1280. doi: 10.1016/j.str.2004.05.005.
  • 32. Nissen P, Hansen J, Ban N, Moore PB, Steitz TA. The structural basis of ribosome activity in peptide bond synthesis. Science. 2000;289:920–930. doi: 10.1126/science.289.5481.920.
  • 33. Marchler-Bauer A, et al. CDD: A Conserved Domain Database for the functional annotation of proteins. Nucleic Acids Res. 2011;39(Database issue):D225–D229. doi: 10.1093/nar/gkq1189.
  • 34. Koonin EV, Mushegian AR, Bork P. Non-orthologous gene displacement. Trends Genet. 1996;12:334–336.
  • 35. Leipe DD, Aravind L, Koonin EV. Did DNA replication evolve twice independently? Nucleic Acids Res. 1999;27:3389–3401. doi: 10.1093/nar/27.17.3389.
  • 36. Siebers B, et al. The complete genome sequence of Thermoproteus tenax: A physiologically versatile member of the Crenarchaeota. PLoS ONE. 2011;6:e24222. doi: 10.1371/journal.pone.0024222.
  • 37. Liu H, Naismith JH. An efficient one-step site-directed deletion, insertion, single and multiple-site plasmid mutagenesis protocol. BMC Biotechnol. 2008;8:91. doi: 10.1186/1472-6750-8-91.
  • 38. Oke M, et al. The Scottish Structural Proteomics Facility: Targets, methods and outputs. J Struct Funct Genomics. 2010;11:167–180. doi: 10.1007/s10969-010-9090-y.
  • 39. Reid SL, Parry D, Liu HH, Connolly BA. Binding and recognition of GATATC target sequences by the EcoRV restriction endonuclease: A study using fluorescent oligonucleotides and fluorescence polarization. Biochemistry. 2001;40:2484–2494. doi: 10.1021/bi001956p.
  • 40. Otwinowski Z, Minor W. Processing of X-ray Diffraction Data Collected in Oscillation Mode. In: Carter CWJ, Sweet RM, editors. Macromolecular Crystallography, Part A, Methods in Enzymology. Vol. 276. New York: Academic; 1997. pp. 307–326.
  • 41. Sheldrick GM. A short history of SHELX. Acta Crystallogr A. 2008;64:112–122. doi: 10.1107/S0108767307043930.
  • 42. Bailey S, Collaborative Computational Project, Number 4. The CCP4 suite: Programs for protein crystallography. Acta Crystallogr D Biol Crystallogr. 1994;50:760–763. doi: 10.1107/S0907444994003112.
  • 43. McCoy AJ, et al. Phaser crystallographic software. J Appl Cryst. 2007;40:658–674. doi: 10.1107/S0021889807021206.
  • 44. Murshudov GN, Vagin AA, Dodson EJ. Refinement of macromolecular structures by the maximum-likelihood method. Acta Crystallogr D Biol Crystallogr. 1997;53:240–255. doi: 10.1107/S0907444996012255.
  • 45. Winn MD, Isupov MN, Murshudov GN. Use of TLS parameters to model anisotropic displacements in macromolecular refinement. Acta Crystallogr D Biol Crystallogr. 2001;57:122–133. doi: 10.1107/s0907444900014736.
  • 46. Emsley P, Cowtan K. Coot: Model-building tools for molecular graphics. Acta Crystallogr D Biol Crystallogr. 2004;60:2126–2132. doi: 10.1107/S0907444904019158.
  • 47. Davis IW, et al. MolProbity: All-atom contacts and structure validation for proteins and nucleic acids. Nucleic Acids Res. 2007;35(Web Server issue):W375–383. doi: 10.1093/nar/gkm216.
  • 48. Whitmore L, Wallace BA. DICHROWEB, an online server for protein secondary structure analyses from circular dichroism spectroscopic data. Nucleic Acids Res. 2004;32(Web Server issue):W668–673. doi: 10.1093/nar/gkh371.
  • 49. Andrews P. The gel-filtration behaviour of proteins related to their molecular weights over a wide range. Biochem J. 1965;96:595–606. doi: 10.1042/bj0960595.
  • 50. Altschul SF, et al. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402. doi: 10.1093/nar/25.17.3389.
  • 51. Marchler-Bauer A, et al. CDD: Specific functional annotation with the Conserved Domain Database. Nucleic Acids Res. 2009;37(Database issue):D205–D210. doi: 10.1093/nar/gkn845.
  • 52. Edgar RC. MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32:1792–1797. doi: 10.1093/nar/gkh340.
  • 53. Cuff JA, Clamp ME, Siddiqui AS, Finlay M, Barton GJ. JPred: A consensus secondary structure prediction server. Bioinformatics. 1998;14:892–893. doi: 10.1093/bioinformatics/14.10.892.
Proc Natl Acad Sci U S A. 2012 Feb 14;109(7):2198–2199.

Author Summary

Proteins are the major structural and operational components of cells. Even the simplest organisms possess hundreds of different proteins, and more complex organisms typically have many thousands. Because all living beings, from microbes to humans, are related by evolution, they share a core set of proteins. These proteins perform fundamental roles in key metabolic processes and in the processing of information from DNA via RNA to proteins. A notable example is the ssDNA-binding protein, SSB, which is essential for DNA replication and repair and is widely considered to be one of the few core universal proteins shared by all life forms. Here we demonstrate that one branch of the tree of life has lost this "ubiquitous" protein and replaced it with another, unrelated one. This finding has important implications for our understanding of the plasticity of evolution.

Rapid advances in genome-sequencing technology over the past 15 y have yielded a wealth of new information on many divergent parts of the tree of life that can be mined to illuminate all aspects of biology. The tree of life consists of three fundamental divisions known as "domains": Eukarya (organisms with a nucleus where DNA is stored, including plants, fungi, and animals) and the prokaryotic Bacteria and Archaea, which lack a nucleus (1). One of the most highly conserved proteins found in all three domains is the SSB protein, which binds and protects ssDNA during replication and repair of damage to the genome. It is an essential protein that is thought to have been present in the last common ancestor of all extant life (2). The defining feature of all known SSBs is the oligonucleotide-binding (OB) fold shown in Fig. P1. Recently we noted that one group of archaeal species, the Thermoproteales, lacks a detectable gene for the SSB protein (3).

Fig. P1.

A schematic representation of the tree of life with the domains Eukarya, Bacteria, and Archaea indicated. The archaeal domain is expanded to show its different subdivisions and to highlight the Thermoproteales, which are boxed. All forms of life use the canonical ssDNA-binding protein based on the OB fold, which is represented in cyan, with the exception of the Thermoproteales, which use the completely unrelated protein ThermoDBP for ssDNA binding.

Because a functional SSB is likely to be essential for any organism, we reasoned that the Thermoproteales might have lost the canonical SSB gene and replaced it with another, unrelated gene. To test this hypothesis, we undertook a two-pronged approach comprising a combination of bioinformatics and biochemistry. The bioinformatic analysis involved a search for any genes that were common to the 10 Thermoproteales species lacking the canonical SSB and that were not found in any other genome. Remarkably, only a single gene fits these criteria, ttx1576. This observation is clearly compatible with the possibility that the Ttx1576 protein compensates functionally for the missing SSB protein in Thermoproteales. The biochemical route involved direct purification and identification of proteins that could bind to ssDNA in one of the Thermoproteales, Thermoproteus tenax. This approach resulted in the identification of the product of the gene ttx1576 as a candidate for the missing SSB.

We proceeded to characterize the properties of the Ttx1576 protein (which we renamed "ThermoDBP," for Thermoproteales DNA-binding protein), showing that it has all the biochemical properties consistent with a role as a functional SSB, including a clear preference for ssDNA binding and low sequence specificity. Using crystallographic analysis, we solved the structure of the DNA-binding domain of ThermoDBP, revealing a protein fold with a prominent cleft punctuated with aromatic amino acid residues and lined by positively charged residues. The structure of ThermoDBP immediately suggested a mechanism for the binding of ssDNA along the cleft that is reminiscent of the binding clefts of canonical SSB proteins but is unrelated to them in sequence and detailed structure. The two ssDNA-binding domains are linked by a C-terminal helical coiled-coil domain that allows ThermoDBP to dimerize (Fig. P1).

In conclusion, we used biochemistry and bioinformatics to demonstrate the displacement of an essential, “universal” protein by a completely unrelated one in one branch of the tree of life. The structure of ThermoDBP reveals a unique solution to the problem of ssDNA binding. This result suggests that even the most fundamental, ubiquitous proteins can be replaced during evolution.

Footnotes

See full research article on page E398 of www.pnas.org.

Cite this Author Summary as: PNAS 10.1073/pnas.1113277108.

References

  • 1. Woese CR, Kandler O, Wheelis ML. Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya. Proc Natl Acad Sci USA. 1990;87:4576–4579. doi: 10.1073/pnas.87.12.4576.
  • 2. Mushegian AR, Koonin EV. A minimal gene set for cellular life derived by comparison of complete bacterial genomes. Proc Natl Acad Sci USA. 1996;93:10268–10273. doi: 10.1073/pnas.93.19.10268.
  • 3. Luo X, et al. CC1, a novel crenarchaeal DNA binding protein. J Bacteriol. 2007;189:403–409. doi: 10.1128/JB.01246-06.
