Abstract
H-NS and Lsr2 are nucleoid-associated proteins from Gram-negative bacteria and Mycobacteria, respectively, that play an important role in the silencing of horizontally acquired foreign DNA that is more AT-rich than the resident genome. Despite the fact that Lsr2 and H-NS proteins are dissimilar in sequence and structure, they serve apparently similar functions and can functionally complement one another. The mechanism by which these xenogeneic silencers selectively target AT-rich DNA has been enigmatic. We performed high-resolution protein binding microarray analysis to simultaneously assess the binding preference of H-NS and Lsr2 for all possible 8-base sequences. Concurrently, we performed a detailed structure-function relationship analysis of their C-terminal DNA binding domains by NMR. Unexpectedly, we found that H-NS and Lsr2 use a common DNA binding mechanism where a short loop containing a “Q/RGR” motif selectively interacts with the DNA minor groove, where the highest affinity is for AT-rich sequences that lack A-tracts. Mutations of the Q/RGR motif abolished DNA binding activity. Netropsin, a DNA minor groove-binding molecule, effectively outcompeted H-NS and Lsr2 for binding to AT-rich sequences. These results provide a unified molecular mechanism to explain findings related to xenogeneic silencing proteins, including their lack of apparent sequence specificity but preference for AT-rich sequences. Our findings also suggest that structural information contained within the DNA minor groove is deciphered by xenogeneic silencing proteins to distinguish genetic material that is self from nonself.
The H-NS protein is one of the most intensively studied members of the bacterial nucleoid-associated proteins (1). Initially discovered under conditions for isolating eukaryotic histones (2), H-NS is thought to play a role in nucleoid structure (3). The recent identification of in vivo H-NS binding sites by genome-wide studies revealed that H-NS also plays a major role in selectively silencing expression from sequences with significantly higher AT-content than the core genome (4–8). AT-rich genomic islands are usually obtained via horizontal gene transfer and are associated with adaptive stress responses and virulence (9). The silencing of such sequences by H-NS, termed xenogeneic silencing, is thought to allow bacteria to safely acquire new genetic material without compromising their genomic and regulatory integrity (10). As a result, H-NS serves as a central regulator of virulence gene expression in the enteric bacteria, including Yersinia, Escherichia coli, and Salmonella (4, 5, 11–13).
Clearly identifiable H-NS homologs are found only in a subset of proteobacteria (14). However, proteins lacking sequence similarity to H-NS may function as xenogeneic silencers in other bacteria. The Pseudomonas MvaT/MvaU can complement phenotypes of E. coli hns mutants and selectively bind AT-rich sequences in a manner similar to H-NS (15, 16). The Rok protein of Bacillus subtilis was also found to be a functional analog of H-NS (17). Lsr2 of Mycobacterium tuberculosis (Mtb) can complement hns phenotypes in E. coli (18) and specifically binds and silences AT-rich regions of the mycobacterial genome (19–21). The functional equivalence of H-NS and Lsr2 appears to be a result of convergent evolution, as these proteins do not share sequence homology and their DNA binding domains exhibit different tertiary structure (21, 22).
A feature critical for H-NS function is its ability to multimerize to form higher-order nucleoprotein complexes (23, 24), a property that is also important for the function of the MvaT/MvaU proteins (25). The 80 N-terminal amino acids of H-NS contain two distinct dimerization domains that form extended higher-order structures via head-to-head/tail-to-tail interactions (1, 26), and are joined through a flexible linker to the C-terminal DNA-binding domain (residues 91–137) (26, 27).
The mechanism by which H-NS and analogous proteins selectively bind AT-rich DNA remains unclear. Although early biochemical studies suggested that a major binding determinant is intrinsically curved DNA caused by repeated A-tracts (28, 29), recent genome-wide binding studies revealed that H-NS binding correlates much more strongly with the degree of AT-content (4). Footprinting and ChIP studies indicate that H-NS binds DNA with little sequence specificity. However, a high-affinity binding sequence present in some promoters was recently identified (30), which may serve as a nucleation site for subsequent polymerization on lower affinity AT-rich sites. The critical feature for the specific H-NS high-affinity sequence examined is the presence of a T-A step that may cause a structural anomaly (31), but this mode of binding does not explain how H-NS binds the majority of sites mapped in vivo and by footprinting of a number of promoters in vitro.
In this study we sought to explore the mechanism by which the H-NS and Lsr2 proteins selectively target AT-rich sequences.
Results
H-NS and Lsr2 Exhibit Similar DNA Binding Specificity.
To characterize the DNA binding specificity of H-NS and Lsr2, we used protein binding microarray (PBM), whereby each protein was individually applied to microarray slides containing double-stranded oligonucleotide target sequences to simultaneously assess their affinity for various sequences in an unbiased manner. H-NS and Lsr2 were produced as N-terminal GST-tagged proteins by T7-driven in vitro transcription and translation and applied to arrays containing 41,944 features that were designed such that all nonpalindromic 8-mers are represented 32 times (16 times for palindromic 8-mers) on each array, so that the assay provides a robust estimate of relative preference to each 8-mer (32). The combined data (average) from two independent array experiments using arrays of different design were used to analyze the relative binding preferences.
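The array design treats an 8-mer and its reverse complement as the same double-stranded site, and palindromic 8-mers (those equal to their own reverse complement) form a separate class. The following minimal sketch enumerates both classes to show how many distinct double-stranded 8-mers such an array must cover. The function names are illustrative and are not taken from the PBM software.

```python
from itertools import product

# Translation table for complementing a DNA strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def count_kmer_classes(k: int = 8) -> tuple[int, int]:
    """Count distinct double-stranded k-mers as (palindromic, nonpalindromic)."""
    palindromic = 0
    nonpalindromic = 0
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        rc = reverse_complement(kmer)
        if kmer == rc:
            palindromic += 1          # identical on both strands
        elif kmer < rc:
            nonpalindromic += 1       # count each strand pair only once
    return palindromic, nonpalindromic


pal, nonpal = count_kmer_classes(8)
print(pal, nonpal, pal + nonpal)  # 256 32640 32896
```

Because a palindromic 8-mer reads the same on both strands, each probe containing it contributes one orientation instead of two, which is consistent with palindromic 8-mers being represented half as often as nonpalindromic ones.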
For each 8-mer, we quantified the relative binding preference by two measures, the Z-score and E-score, which capture essentially the same information (relative binding preference of a protein for a given 8-base sequence) (32, 33). The Z-score is calculated from the average signal intensity across the 32 spots containing each 8-mer on each microarray and scales almost linearly with binding affinity (32, 33). The E-score is a nonparametric statistical measure that essentially reflects the relative ranking of the signal intensities of the 32 probes that contain each 8-mer, relative to the remaining ∼41,000 probes (32). The E-score ranges from +0.5 (most favored) to −0.5 (most disfavored), and on the basis of random permutations of the array data, there should be no random 8-mer sequence that achieves an E-score above 0.45 (34).
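To make the idea of a rank-based score bounded by −0.5 and +0.5 concrete, the sketch below computes a plain area-under-the-curve statistic minus 0.5 for the probes that contain a given 8-mer against all remaining probes. This is an assumption made for illustration: the published E-score (32, 33) is a related rank statistic, and this simplified function is not the exact published computation.

```python
def rank_enrichment(foreground: list[float], background: list[float]) -> float:
    """Simplified rank score: AUC - 0.5 (illustrative, not the published E-score).

    foreground: signal intensities of probes containing the 8-mer
    background: signal intensities of all remaining probes
    """
    wins = 0.0
    for f in foreground:
        for b in background:
            if f > b:
                wins += 1.0      # foreground probe outranks background probe
            elif f == b:
                wins += 0.5      # ties count as half
    auc = wins / (len(foreground) * len(background))
    return auc - 0.5             # +0.5 most favored, -0.5 most disfavored


# Hypothetical intensities, for illustration only.
print(rank_enrichment([5.0, 6.0], [1.0, 2.0, 3.0]))  # 0.5
print(rank_enrichment([1.0], [2.0, 3.0]))            # -0.5
```

Because a score of this kind depends only on ranks, it is insensitive to the absolute intensity scale of an individual array, whereas the Z-score retains an approximately linear relationship to affinity.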
PBM experiments with both H-NS and Lsr2 were successful, in that we obtained multiple 8-mers with E-scores above 0.45 for each protein with the highest score around 0.49, indicating clear binding preferences (Dataset S1). Moreover, the most highly preferred sequences had an obvious relationship to each other (see below). There is a striking correlation between the PBM data for H-NS and Lsr2, not only for the most preferred sequences, but also for moderately and less-preferred sequences. This finding is evident from the scatter plot using the Z-score (R² = 0.859) (Fig. 1A) or E-score (R² = 0.809) (Fig. S1A) of all 8-mers. Therefore, H-NS and Lsr2 bind DNA with essentially the same sequence preference.
H-NS and Lsr2 Bind Contiguous AT Sequences.
Previous ChIP-chip analysis suggested that H-NS and Lsr2 exhibit preferential binding for AT-rich sequences in the genome (4, 5, 21). This finding is confirmed by our PBM data. A positive correlation between the Z-score (Fig. 1B) or E-score (Fig. S1B) and the AT-content of 8-mers was observed. The data for H-NS and Lsr2 are nearly identical and essentially overlap.
For each group of 8-mers with identical AT-content, the position of G or C within the sequence affects binding. For 8-mers that contain a single G or C residue (87.5% AT), sequences containing G or C at the central position show markedly lower preference by H-NS and Lsr2 than those where the G or C is located at the periphery positions (Fig. 1C). Extending the analysis to 8-mers that are 75% AT (two Gs or Cs) revealed that there is a positive correlation between the binding preference and the maximum length of contiguous AT sequence (no G or C) (Fig. 1D), indicating that H-NS and Lsr2 bind contiguous AT sequences.
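The two sequence features used in this comparison, AT-content and the maximum length of contiguous AT sequence, can be computed for any 8-mer as in the minimal sketch below. The function names are hypothetical and chosen for illustration.

```python
import re


def at_content(seq: str) -> float:
    """Fraction of positions that are A or T."""
    return sum(base in "AT" for base in seq) / len(seq)


def longest_at_run(seq: str) -> int:
    """Length of the longest stretch containing no G or C."""
    return max((len(run) for run in re.findall(r"[AT]+", seq)), default=0)


# One G at a central position versus one G at the periphery (both 87.5% AT).
for kmer in ("ATATGATA", "GATATATT"):
    print(kmer, at_content(kmer), longest_at_run(kmer))
```

The two example 8-mers have the same AT-content but contiguous AT runs of 4 and 7 bases, respectively, which is the distinction that the positional analysis draws.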
DNA Minor Groove Width Determines the Binding Preference of H-NS and Lsr2.
To identify DNA sequence features other than AT-content that influence H-NS and Lsr2 binding, we next focused on the 136 8-mer sequences that are 100% AT. H-NS and Lsr2 exhibit a clear preference for these sequences but they are not equally favored; the Z-score and E-score for the 100% AT 8-mers range from 2.4 to 10.6 and 0.28 to 0.49, respectively, for both H-NS and Lsr2. Strikingly, many of the less-preferred 100%-AT 8-mers contain A-tracts, which are defined as stretches of four or more As or Ts (Aₙ, Tₙ, n ≥ 4, or AₘTₙ, m + n ≥ 4, on the same strand) that do not contain T-A steps (reviewed in ref. 35) (Dataset S1). The average Z-score for the 8-mers that do not contain A-tracts is significantly higher than that of 8-mers containing A-tracts (Fig. 2A), suggesting that A-tracts have a negative impact on H-NS and Lsr2 binding for these DNA oligomers. This finding is consistent with the previous observation that a T-A step is present in some H-NS high-affinity sequences (31), because T-A steps and A-tracts are mutually exclusive. Notably, the effect of A-tracts on H-NS and Lsr2 binding is dependent on the overall AT-content of the sequence. For 8-mers containing 50% to 75% AT, the presence of A-tracts positively influences H-NS and Lsr2 binding, but for 8-mers that contain 87.5% AT, A-tracts do not have a significant impact on binding (Fig. 2A).
The simplest explanation for the PBM data is that H-NS and Lsr2 exhibit optimal binding to DNA with appropriate minor groove width. A-tracts have narrower minor grooves than other DNA sequences including AT-rich sequences containing T-A steps (35), and AT-rich sequences have narrower minor grooves than GC-rich sequences (36, 37). For relatively GC-rich sequences (e.g., 50–75% AT), the occurrence of A-tracts may result in increased binding for H-NS and Lsr2 because it serves to narrow minor groove width. The positive impact of A-tracts on relatively GC-rich sequences (e.g., 50% AT) is not simply a result of the contiguousness of the AT sequence in A-tracts, because such sequences containing A-tracts are still more favorable binding sites than similar sequences containing contiguous AT-rich sequences that lack A-tracts and contain T-A steps (Fig. 2B). DNA sequences with higher AT-content have narrower minor grooves (36, 37) and the narrowing effect of A-tracts becomes insignificant upon reaching 87.5% AT. For sequences that are 100% AT, the minor groove may reach optimal geometry for binding, and the presence of A-tracts, which further narrows the minor groove, becomes unfavorable.
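The count of 136 follows from treating each 8-mer and its reverse complement as one double-stranded site, and the A-tract definition translates directly into a pattern of As followed by Ts with no T-A step. The sketch below checks both; it is a minimal illustration with hypothetical function names, not the analysis code used for Fig. 2A.

```python
import re
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def has_a_tract(seq: str, min_len: int = 4) -> bool:
    """True if seq contains a run of the form A...AT...T of at least min_len bases.

    The pattern A*T* cannot span a T-A step, so every match is free of T-A steps.
    The reverse complement of such a run has the same form, so one strand suffices.
    """
    return any(len(m.group()) >= min_len for m in re.finditer(r"A*T*", seq))


# Enumerate all-AT 8-mers, keeping one representative per strand pair.
all_at = {
    min(kmer, reverse_complement(kmer))
    for kmer in ("".join(p) for p in product("AT", repeat=8))
}
print(len(all_at))  # 136

print(has_a_tract("ATATATAT"))  # False: every T is followed by an A
print(has_a_tract("AAAATTTT"))  # True
```

Of the 256 all-AT 8-mers, 16 are their own reverse complement, so the number of distinct double-stranded sites is (256 − 16)/2 + 16 = 136.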
Solution Structure of the DNA-Binding Domain of Salmonella H-NS and Burkholderia Bv3F.
Lsr2 was modeled to bind the minor groove of AT-rich DNA through an AT-hook-like motif with an “Arg-Gly-Arg” sequence (21). Although H-NS does not exhibit sequence homology to Lsr2, the C-terminal domain of H-NS family proteins contains an “XGR” motif embedded in the conserved TWTXGRXP sequence that has been implicated in DNA binding (1, 14) (Fig. 3E). In H-NS, the “X” residue of the XGR motif is Gln, whereas in some other H-NS homologs, such as the Bv3F protein (Bcep1808_6219) of Burkholderia vietnamiensis, the “X” residue is Arg. We postulated that the H-NS family proteins may use the “QGR” or “RGR” motif to bind DNA.
Although the solution structure of the DNA-binding domain of H-NS from E. coli was solved in 1995 (22), it is of relatively low quality because only 257 NOE restraints were used in defining the conformation, and the overall atomic rmsd between the individual structure and the mean structure is 1.52 ± 0.29 Å for the backbone heavy atoms of the secondary structure regions. In that structure, the loop containing the conserved sequence is highly disordered and poorly defined (22).
We redetermined the solution structure of the C-terminal DNA binding domain (H-NSCtd, residues 91–137) of the Salmonella H-NS. The H-NS structure was calculated with 1,713 NOE restraints, and rmsd of the backbone heavy atoms for the secondary structure regions is 0.36 Å (Fig. S2 and Table S1). The DNA binding domain of the Salmonella H-NS consists of a two-stranded antiparallel β-sheet (β1, residues 97–100; β2, 105–109), one α-helix (residues 117–126), and one 3₁₀ helix (residues 130–133) (Fig. 3 A and B). The overall architecture of the DNA binding domain of Salmonella H-NS is similar to that of E. coli H-NS. Notably, the conserved motif (residues 110–117) forms a well-defined loop in our structure.
We also determined the solution structure of the C-terminal domain of Bv3F (Bv3FCtd, residues 71–112) (Fig. 3 C and D). The domain consists of one small antiparallel β-sheet (β1, residues 77–78; β2, 85–86) and two short 3₁₀ helices (residues 94–96; 101–103), with flexible N- and C-terminal tails (residues 71–74; 106–112).
The loops containing the “Q/RGR” motif in H-NSCtd and Bv3FCtd adopt almost identical conformation, which is also quite similar to the RGR motif in Lsr2Ctd (21) (Fig. 3F), even though the overall tertiary structure of Lsr2Ctd is quite different from the other two. This result suggests that these three proteins likely bind DNA by a similar mechanism and that the loop structure is critical for the observed similarities in target specificity.
H-NS Binds DNA Minor Groove Through an AT-Hook–Like Loop.
To test the above hypothesis, we examined the interaction of H-NSCtd and Bv3FCtd with DNA by NMR. We chose a DNA duplex with the sequence of CGCATATATGCG for this experiment because it contains the “ATATAT” sequence that is present in several of the highest Z-score 8-mers, including the top-scoring 8-mer identified by the PBM experiments (Fig. 1A). In addition, its X-ray structure has been determined (38), which facilitated our modeling of the H-NS/DNA complex (see below). Comparison of 2D 1H-15N HSQC spectra of H-NSCtd, free or in complex with DNA, revealed that the most pronounced change for NH signals occurred at Gly113-Ala117 (Fig. S3 A and C). Their signals gradually disappeared as DNA concentration increased, presumably because of intermediate exchange on the NMR time scale (Pro116 has no amide proton and thus no NH signal). Significant NH chemical-shift changes (Δδcomb > 0.15 ppm) also occurred at Gln112, as well as Arg93, Ala95, Lys96, Glu102, and Lys121. The strong chemical-shift changes within residues of loop 2 suggest that the “QGRTPA” sequence, which connects the β2 strand and the α-helix (Fig. 3B), constitutes the major DNA-binding motif.
Similar DNA titration experiments were performed with Bv3FCtd. As in H-NSCtd, residues most affected by DNA binding are primarily localized in the “RGRQPAW” sequence of loop 2 connecting the β2 strand and the first 3₁₀ helix. The NH signals of Arg89, Arg91, and Gln92 disappeared with increasing concentrations of DNA, and Gly90, Ala94, and Trp95 exhibited significant NH chemical shift changes (Δδcomb > 0.2 ppm) upon DNA binding (Fig. S4 A and C).
We also mapped the residues of DNA that interact with H-NSCtd and Bv3FCtd by comparing 2D 1H NOESY spectra of the DNA, free and in complex with the protein. For DNA in complex with H-NSCtd, significant intraresidue H1′-H6/H8 NOE peak shifts (H1′ or H6/H8 Δδ > 0.025 ppm) occurred at the central A4T5A6T7A8T9 residues (Fig. S3 B and D). Similarly, Bv3FCtd binds to A6T7A8T9G10 of the sequence CGCATATATGCG (Fig. S4 B and D).
Based on the mapped binding interfaces and the structures of protein (H-NSCtd or Bv3FCtd) and DNA (38), docking models for H-NSCtd/DNA and Bv3FCtd/DNA complexes were calculated using HADDOCK 2.0 (39). For H-NSCtd, the loop region consisting of Gln112-Gly113-Arg114 is inserted into the minor groove of the DNA duplex. The side chains of Gln112 and Arg114 are oriented parallel to the minor groove pointing away from each other and occupying a region covering about 5 base pairs (Fig. 4A), which resembles the central Arg-Gly-Arg core conformation of the DNA binding AT-hook motif (i.e., Pro-Arg-Gly-Arg-Pro, flanked by positively charged residues) of mammalian nonhistone chromatin protein HMG-I (40). Bv3FCtd binds the DNA minor groove in a similar fashion. The Arg89-Gly90-Arg91 motif of the loop 2 inserts into the minor groove of DNA, and side chains of the two Arg residues are in a conformation that resembles that of the AT-hook motif (Fig. 4B) (40). Taken together, our results indicate that H-NS and Bv3F use a DNA-binding mechanism similar to that of Lsr2 (i.e., they bind the DNA minor groove through an AT-hook–like motif).
Q/RGR Motif Is Essential for H-NS Binding to DNA.
The docking models predict that the conserved Q/RGR motif is the primary site interacting with DNA. Specifically, side chains of the first and last residues (Q/R and R) directly contact the DNA minor groove. To confirm this, we performed site-directed mutagenesis to replace the Gln/Arg and Arg residues with Ala and examined the interaction of mutant proteins with the CGCATATATGCG duplex by NMR. For all three proteins, the mutations did not change their gross structure, evidenced by the minimal perturbation of 2D 1H-15N HSQC spectra. For H-NSCtd, the Q112A/R114A double mutations essentially abolished DNA binding; the NH signals of most residues of the Q112A/R114A mutant were not affected by the addition of DNA, with the exception of minor changes affecting a few residues of the N and C termini (Fig. S5A), which may be caused by weak and nonspecific binding. The single Q112A or R114A mutations each reduced DNA binding, but the R114A mutation caused a much more pronounced effect than Q112A (Fig. S6A and Table S2), suggesting a more important role for R114 in DNA binding. Residue E102 is not directly involved in DNA binding because the spectra of E102A mutant and WT proteins in the presence of DNA were nearly identical, except for the few residues whose NH signals were altered by the E102A mutation (Fig. S6 C and D and Table S2). The signal change associated with this residue (Fig. S3A) may be caused by secondary effects following DNA binding.
The 2D 1H-15N HSQC spectra of the R89A/R91A mutant of Bv3FCtd in the presence and absence of DNA substrate were nearly identical (Fig. S5B), demonstrating that the double mutations completely abolish the DNA binding activity of Bv3FCtd. Similarly, for the R97A/R99A mutant of Lsr2Ctd, addition of DNA did not change the NH signal of most residues (Fig. S5C). The two Arg residues in Lsr2Ctd appear to play equivalent roles; single R97A or R99A mutations each reduced DNA binding activity to a similar extent (Fig. S6B and Table S2).
These results demonstrate that the Q/RGR motif in all three proteins studied is critical for DNA binding.
DNA Minor Groove Binding Reagents Compete for the Binding of H-NS to DNA.
To confirm that H-NS binds the minor groove of AT-rich DNA, we performed competition experiments using netropsin, a naturally occurring polypyrrolecarboxamide that binds to the minor groove of AT-rich DNA (41). Addition of netropsin to the preformed H-NSCtd/DNA complex released H-NSCtd from the DNA; the NH signals of H-NSCtd residues in complex with DNA shifted back toward that of free H-NSCtd, and this effect is concentration-dependent (Fig. S5D). At a netropsin/DNA ratio of 2.5:1, the 2D 1H-15N HSQC spectrum is essentially identical to that of free H-NSCtd, indicating that netropsin completely displaces H-NSCtd from DNA (Fig. S5D). Similar results were found for Bv3FCtd and Lsr2Ctd (Fig. S5 E and F). These data provide strong support for our conclusion that H-NS, Bv3F, and Lsr2 bind to the minor groove of DNA.
Confirmation of the DNA-Binding Preference of H-NS by NMR.
Our PBM analysis indicates that H-NS and Lsr2 exhibit optimal binding activity for contiguous AT sequences that do not contain A-tracts. To substantiate this finding, we analyzed the binding of H-NSCtd, Lsr2Ctd, and Bv3FCtd to several DNA substrates by NMR. Four DNA substrates were tested, including the CGCATATATGCG duplex, which was used throughout the experiments described above, a CGCATGCATGCG duplex in which a “GC” was introduced at the center of the sequence, an A-tract containing sequence CGCAAAAAAGCG/CGCTTTTTTGCG, and a CGCGCGCGCGCG duplex that is 100% GC. These DNA substrates caused similar NH chemical-shift change patterns in each protein and the extent of changes reflects the binding affinity. Consistent with our finding from the PBM experiments, NH signals of H-NSCtd residues were most affected by the binding of the CGCATATATGCG DNA, followed by CGCAAAAAAGCG/CGCTTTTTTGCG, CGCATGCATGCG, and CGCGCGCGCGCG (Fig. S7A). The binding affinities (Kd) for each DNA substrate estimated by using chemical-shift changes of different residues are in general agreement (Table S3), and the Kd for AT-rich sequences (∼9 μM) is similar to previously reported data (27). For Lsr2Ctd and Bv3FCtd, similar results were obtained except that the A-tract containing DNA appears to be equivalent to the AT-rich sequence in binding affinity (Fig. S7 and Table S3). Lsr2Ctd and Bv3FCtd appear to have higher binding affinity than H-NSCtd (Table S3).
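As an illustration of how a Kd can be related to chemical-shift changes in a titration, the sketch below uses the standard 1:1 binding model under fast exchange, in which the observed shift change is the maximum shift change scaled by the fraction of protein bound. The model choice, the concentrations, and the maximum shift value are assumptions made for illustration; the actual fitting procedure is described in SI Experimental Procedures and may differ.

```python
import math


def complex_concentration(protein_total: float, dna_total: float, kd: float) -> float:
    """Concentration of the 1:1 protein/DNA complex (same units as the inputs).

    Solves Kd = [P][D]/[PD] with mass conservation, taking the physical root.
    """
    total = protein_total + dna_total + kd
    return (total - math.sqrt(total * total - 4.0 * protein_total * dna_total)) / 2.0


def predicted_shift(protein_total: float, dna_total: float, kd: float,
                    max_shift: float) -> float:
    """Predicted chemical-shift change (ppm) assuming fast exchange."""
    bound_fraction = complex_concentration(protein_total, dna_total, kd) / protein_total
    return max_shift * bound_fraction


# Hypothetical titration: 100 uM protein, Kd = 9 uM, maximum shift 0.3 ppm.
for dna in (0.0, 25.0, 50.0, 100.0, 200.0):
    print(dna, round(predicted_shift(100.0, dna, 9.0, 0.3), 3))
```

In practice, Kd and the maximum shift would be fitted to the measured shift changes for each residue, and agreement among residues, as noted for Table S3, supports a single binding event.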
Correlation with in Vivo Binding Sites.
Finally, we asked whether preferred sequences identified by PBMs in vitro reflect genome binding sites of H-NS and Lsr2 in vivo. The high-affinity binding sequence (TCGATATATT) of H-NS identified in the E. coli proU operon (30) contains an 8-mer (GATATATT). Its complement strand sequence (AATATATC) is the 19th ranked 8-mer of the H-NS binding identified by PBM, with an E-score of 0.467 (Dataset S1). We scored the occurrences of each of the preferred 8-mers in all bound fragments identified by ChIP-seq or ChIP-chip analysis (8, 21), and compared with that in randomly selected fragments. High-scoring 8-mers with E-score ≥ 0.45 were used for H-NS analysis. Because of the high GC-content of Mtb genome, we used 8-mers with E-score ≥ 0.40 for Lsr2 analysis. In both cases, we observed a significant enrichment for the preferred sequences in the neighborhood of bound fragments, with a peak close to the center (Fig. S8 A and B). Therefore, a large proportion of in vivo binding events apparently involves sequence preferences that can be derived from in vitro experiments.
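The strand relationship quoted above, and the counting of preferred 8-mers within a bound fragment, can be sketched as follows. This is a minimal illustration with hypothetical function names; it counts matches on either strand and does not reproduce the positional enrichment analysis of Fig. S8.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def count_preferred(fragment: str, preferred: list[str]) -> int:
    """Count windows of the fragment that match a preferred k-mer on either strand.

    Assumes all preferred k-mers share the same length.
    """
    k = len(preferred[0])
    targets = set(preferred) | {reverse_complement(kmer) for kmer in preferred}
    return sum(fragment[i:i + k] in targets for i in range(len(fragment) - k + 1))


# The 8-mer in the proU high-affinity site and its complement-strand sequence.
print(reverse_complement("GATATATT"))                 # AATATATC
print(count_preferred("TCGATATATT", ["AATATATC"]))    # 1
```

Comparing such counts between bound fragments and randomly selected fragments gives the kind of enrichment estimate described in the text.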
Salmonella has a genome-wide AT-content of 47.8%, whereas the mean AT-content of Mtb is 34.4%. We reanalyzed the ChIP-chip data by plotting the fraction of bound sequence (i.e., sequence with binding ratio ≥2) at each AT-content. The fraction of the bound sequence begins to increase when the AT-content reaches ∼38% for Lsr2 and ∼50% for H-NS (Fig. S8C), which correspond to the mean AT-content of the corresponding genome, respectively. Normalization of the data against the mean AT-content of the respective genome shows that H-NS and Lsr2 exhibit nearly identical binding when the AT-content of the bound sequence is compared relative to the AT-content of the whole genome (Fig. S8D). It remains a question how each silencer targets sequences that are comparatively AT-rich with respect to their corresponding genomes. For example, H-NS avoids sequences with an AT-content of 50%, whereas Lsr2 targets such sequences as foreign (Fig. S8C). We speculate that differences between the RGR and QGR motifs in binding affinity for less AT-rich sequences may provide an explanation. Thus, the RGR motif, used primarily in silencers (Lsr2 and Bv3F) from bacteria with low AT-content (∼34%) genomes (Mycobacterium and Burkholderia), may enable tighter binding to sequences of lower AT-content, whereas the QGR motif, found in silencers (H-NS) from bacteria with higher AT-content (∼50%) genomes (Salmonella and Escherichia), may lower the affinity of these proteins for DNA that is only mildly AT-rich. Alternatively, however, the relative abundance of AT-rich sequences in these genomes and the competitive advantage of preferred sequences for recognition by silencers may suffice as explanation.
Discussion
We find that structural information contained within the DNA minor groove is deciphered by nonspecific DNA binding proteins to distinguish genetic material that is self from nonself. In this study, we have performed a comprehensive analysis of the DNA binding specificity and binding mechanism of both the Lsr2 and H-NS proteins. We present multiple lines of evidence that these proteins, unrelated in sequence and structure, use a common mechanism to interact directly with the minor groove of DNA, and propose that the specific geometry of the DNA minor groove largely dictates the degree of binding. Our conclusions provide a unified molecular mechanism to explain various findings related to xenogeneic silencing proteins, including their lack of apparent sequence specificity and their ability to bind even nonoptimal (i.e., GC-rich) sequences with affinities only an order of magnitude less than some preferred sites (30). It remains to be determined whether other H-NS analogs, such as the MvaT/MvaU of Pseudomonas (15, 16) and Rok of Bacillus (17), exploit a similar binding mechanism.
The PBM analysis provided a comprehensive and unbiased approach for analyzing sequence specificity at high resolution. Insights gained from the PBMs allowed us to generate predictive models regarding the specific structural features within AT-rich DNA that are preferentially targeted by H-NS and Lsr2. The PBM data suggest that the shape of DNA, specifically that of the DNA minor groove, is a primary determinant of H-NS and Lsr2 binding specificity. The presence of GC base pairs in the center of AT-rich 8-mer sequences is unfavorable for H-NS and Lsr2 binding, which is likely because of the less optimal electrostatic potential for binding to arginine residues (37). The presence of a 2-NH2 group on G that protrudes into the minor groove may also disturb the binding of H-NS and Lsr2 (40). The effect of A-tracts on the binding preference of H-NS and Lsr2 depends on the AT-content of the flanking sequences. A-tracts have a unique structure, which is distinct from that of B-DNA and is cooperatively formed whenever there are four or more adjacent As or Ts (35). In contrast to typical B-DNA, where bases are perpendicular to the helical axis and thus have a wider minor groove, the bases within the A-tracts are negatively inclined relative to the global helical axis, and are highly propeller twisted. This propeller-twisted conformation results in a highly narrow minor groove. Thus, H-NS and Lsr2 exhibit low binding affinity for DNA sequences with a minor groove width that is too wide (i.e., GC-rich sequences) or too narrow (i.e., A-tract sequences within the context of an AT-rich region), and exhibit the highest affinity for DNA sequences with an ideal minor groove width (i.e., mixed AT-rich sequences or a short A-tract within a GC-rich sequence). The binding of H-NS to the DNA minor groove is further supported by the demonstration that netropsin, a DNA minor groove-binding molecule (41), effectively outcompeted H-NS, Lsr2 and Bv3F for binding to an AT-rich sequence. 
An earlier study using distamycin, another minor groove-binding compound, showed similar results but was interpreted to be the result of disruption in DNA curvature (42). The DNA substrate whose binding to H-NS, Lsr2, and Bv3F was successfully competed with netropsin in our study, however, does not contain significant intrinsic curvature (38). We suspect that DNA curvature is not used for recognition by this class of proteins, even though recognition is for an AT-rich minor groove. This also appears to be the case for the C-terminal domain of the α-subunit of RNA polymerase, which also recognizes AT-rich DNA in the minor groove (43–45).
H-NS binds DNA in a cooperative manner, generating extended nucleoprotein filaments (26, 46, 47), but the relatively short PBM probes presumably detect binding of individual monomers, which, for H-NS, is relatively weak compared with the affinities of most sequence-specific DNA-binding proteins for most sites (30). However, for both H-NS and Lsr2, the preferred monomer-binding 8-mer sequences identified by PBM experiments in vitro are enriched at the center of the genomic fragments bound by the same protein in vivo. Therefore, the preferred monomer binding sequences are likely to be a component of the targeting mechanism, and may serve as initiation sites for nucleation of H-NS to form higher-order nucleoprotein structures.
Our structural analysis of H-NS, Lsr2, and Bv3F reveals that all three proteins bind the minor groove using a loop containing a Q/RGR motif. Although the interaction of H-NS with DNA has previously been studied by NMR analysis, these studies were limited in part by the use of a low-affinity DNA sequence as a substrate for binding (48) or the use of a high-affinity site of unknown structure (27). Here we circumvented these concerns by using a high-affinity DNA sequence with a previously determined structure (38). The high-quality structure of the H-NS DNA binding domain enabled the construction of a docking model that implicates the Q/RGR motif as the primary DNA interacting site. Accordingly, engineering mutations where the Q/RGR motif is replaced with “AGA” abolished the DNA binding activity of all three proteins. Other residues that may interact with DNA include Lys96 and Thr115 in H-NS, and Lys76, Gln92-Trp95 in Bv3F (Fig. 4), which also exhibited chemical shift changes upon DNA binding. Future structure-function study is required to confirm their role.
Compared with Gln, the positively charged Arg is more likely to bind negatively charged DNA. This finding may explain why Gln is less critical for DNA binding than Arg in the QGR motif of H-NS, and why Lsr2 and Bv3F have apparently higher affinity for less AT-rich sequences than H-NS. The second Arg residue of the Q/RGR motif is conserved in all three proteins, and site-directed mutagenesis analysis shows that it is critical for DNA binding. Our data agree with recent findings that several classes of transcription factors bind to the minor groove of AT-rich sequences by inserting Arg residues into the minor groove, presumably because of their ability to form favorable electrostatic interactions with the relatively narrow groove formed by these sequences (37, 49). These studies reveal that common mechanistic themes underlie DNA recognition in the absence of “typical” highly sequence-specific interactions that occur via contacts with bases in the major groove.
Experimental Procedures
A complete description of the materials and experimental procedures is included in SI Experimental Procedures. All 10-mer universal PBMs were designed, and the experiments were conducted and analyzed, as previously described (32, 33). For NMR analysis, the C-terminal domains of H-NS, Bv3F, and Lsr2 were each prepared by partial trypsin digestion of the respective full-length proteins on a Ni-NTA column, followed by gel-filtration purification.
Supplementary Material
Acknowledgments
We thank Ferric Fang for critical reading of, and helpful comments on, the manuscript. This work was supported by Canadian Institutes of Health Research Grants MOP-15107 (to J.L.), MOP-77721 (to T.R.H.), and MOP-86683 (to W.W.N.), and by Grant 2009CB521703 from the 973 Program of China (to B.X.).
Footnotes
The authors declare no conflict of interest.
Data deposition: NMR, atomic coordinates, chemical shifts, and restraints have been deposited in the Protein Data Bank, www.pdb.org (PDB code 2l93 and PDB ID code 2l92).
This article is a PNAS Direct Submission.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1102544108/-/DCSupplemental.
References
- 1. Dorman CJ. H-NS: A universal regulator for a dynamic genome. Nat Rev Microbiol. 2004;2:391–400. doi: 10.1038/nrmicro883.
- 2. Varshavsky AJ, Nedospasov SA, Bakayev VV, Bakayeva TG, Georgiev GP. Histone-like proteins in the purified Escherichia coli deoxyribonucleoprotein. Nucleic Acids Res. 1977;4:2725–2745. doi: 10.1093/nar/4.8.2725.
- 3. Falconi M, Gualtieri MT, La Teana A, Losso MA, Pon CL. Proteins from the prokaryotic nucleoid: primary and quaternary structure of the 15-kD Escherichia coli DNA binding protein H-NS. Mol Microbiol. 1988;2:323–329. doi: 10.1111/j.1365-2958.1988.tb00035.x.
- 4. Lucchini S, et al. H-NS mediates the silencing of laterally acquired genes in bacteria. PLoS Pathog. 2006;2:e81. doi: 10.1371/journal.ppat.0020081.
- 5. Navarre WW, et al. Selective silencing of foreign DNA with low GC content by the H-NS protein in Salmonella. Science. 2006;313:236–238. doi: 10.1126/science.1128794.
- 6. Grainger DC, Hurd D, Goldberg MD, Busby SJ. Association of nucleoid proteins with coding and non-coding segments of the Escherichia coli genome. Nucleic Acids Res. 2006;34:4642–4652. doi: 10.1093/nar/gkl542.
- 7. Oshima T, Ishikawa S, Kurokawa K, Aiba H, Ogasawara N. Escherichia coli histone-like protein H-NS preferentially binds to horizontally acquired DNA in association with RNA polymerase. DNA Res. 2006;13:141–153. doi: 10.1093/dnares/dsl009.
- 8. Kahramanoglou C, et al. Direct and indirect effects of H-NS and Fis on global gene expression control in Escherichia coli. Nucleic Acids Res. 2011;39:2073–2091. doi: 10.1093/nar/gkq934.
- 9. Ochman H, Lawrence JG, Groisman EA. Lateral gene transfer and the nature of bacterial innovation. Nature. 2000;405:299–304. doi: 10.1038/35012500.
- 10. Navarre WW, McClelland M, Libby SJ, Fang FC. Silencing of xenogeneic DNA by H-NS-facilitation of lateral gene transfer in bacteria by a defense system that recognizes foreign DNA. Genes Dev. 2007;21:1456–1471. doi: 10.1101/gad.1543107.
- 11. Cathelyn JS, Ellison DW, Hinchliffe SJ, Wren BW, Miller VL. The RovA regulons of Yersinia enterocolitica and Yersinia pestis are distinct: Evidence that many RovA-regulated genes were acquired more recently than the core genome. Mol Microbiol. 2007;66:189–205. doi: 10.1111/j.1365-2958.2007.05907.x.
- 12. Müller CM, et al. Differential effects and interactions of endogenous and horizontally acquired H-NS-like proteins in pathogenic Escherichia coli. Mol Microbiol. 2010;75:280–293. doi: 10.1111/j.1365-2958.2009.06995.x.
- 13. Baños RC, Pons JI, Madrid C, Juárez A. A global modulatory role for the Yersinia enterocolitica H-NS protein. Microbiology. 2008;154:1281–1289. doi: 10.1099/mic.0.2007/015610-0.
- 14. Tendeng C, Bertin PN. H-NS in Gram-negative bacteria: A family of multifaceted proteins. Trends Microbiol. 2003;11:511–518. doi: 10.1016/j.tim.2003.09.005.
- 15. Castang S, McManus HR, Turner KH, Dove SL. H-NS family members function coordinately in an opportunistic pathogen. Proc Natl Acad Sci USA. 2008;105:18947–18952. doi: 10.1073/pnas.0808215105.
- 16. Tendeng C, Soutourina OA, Danchin A, Bertin PN. MvaT proteins in Pseudomonas spp.: A novel class of H-NS-like proteins. Microbiology. 2003;149:3047–3050. doi: 10.1099/mic.0.C0125-0.
- 17. Smits WK, Grossman AD. The transcriptional regulator Rok binds A+T-rich DNA and is involved in repression of a mobile genetic element in Bacillus subtilis. PLoS Genet. 2010;6:e1001207. doi: 10.1371/journal.pgen.1001207.
- 18. Gordon BR, Imperial R, Wang L, Navarre WW, Liu J. Lsr2 of Mycobacterium represents a novel class of H-NS-like proteins. J Bacteriol. 2008;190:7052–7059. doi: 10.1128/JB.00733-08.
- 19. Chen JM, et al. Lsr2 of Mycobacterium tuberculosis is a DNA-bridging protein. Nucleic Acids Res. 2008;36:2123–2135. doi: 10.1093/nar/gkm1162.
- 20. Kocíncová D, et al. Spontaneous transposition of IS1096 or ISMsm3 leads to glycopeptidolipid overproduction and affects surface properties in Mycobacterium smegmatis. Tuberculosis (Edinb) 2008;88:390–398. doi: 10.1016/j.tube.2008.02.005.
- 21. Gordon BR, et al. Lsr2 is a nucleoid-associated protein that targets AT-rich sequences and virulence genes in Mycobacterium tuberculosis. Proc Natl Acad Sci USA. 2010;107:5154–5159. doi: 10.1073/pnas.0913551107.
- 22. Shindo H, et al. Solution structure of the DNA binding domain of a nucleoid-associated protein, H-NS, from Escherichia coli. FEBS Lett. 1995;360:125–131. doi: 10.1016/0014-5793(95)00079-o.
- 23. Badaut C, et al. The degree of oligomerization of the H-NS nucleoid structuring protein is related to specific binding to DNA. J Biol Chem. 2002;277:41657–41666. doi: 10.1074/jbc.M206037200.
- 24. Stella S, Spurio R, Falconi M, Pon CL, Gualerzi CO. Nature and mechanism of the in vivo oligomerization of nucleoid protein H-NS. EMBO J. 2005;24:2896–2905. doi: 10.1038/sj.emboj.7600754.
- 25. Castang S, Dove SL. High-order oligomerization is required for the function of the H-NS family member MvaT in Pseudomonas aeruginosa. Mol Microbiol. 2010;78:916–931. doi: 10.1111/j.1365-2958.2010.07378.x.
- 26. Arold ST, Leonard PG, Parkinson GN, Ladbury JE. H-NS forms a superhelical protein scaffold for DNA condensation. Proc Natl Acad Sci USA. 2010;107:15728–15732. doi: 10.1073/pnas.1006966107.
- 27. Sette M, et al. Sequence-specific recognition of DNA by the C-terminal domain of nucleoid-associated protein H-NS. J Biol Chem. 2009;284:30453–30462. doi: 10.1074/jbc.M109.044313.
- 28. Owen-Hughes TA, et al. The chromatin-associated protein H-NS interacts with curved DNA to influence DNA topology and gene expression. Cell. 1992;71:255–265. doi: 10.1016/0092-8674(92)90354-f.
- 29. Tupper AE, et al. The chromatin-associated protein H-NS alters DNA topology in vitro. EMBO J. 1994;13:258–268. doi: 10.1002/j.1460-2075.1994.tb06256.x.
- 30. Bouffartigues E, Buckle M, Badaut C, Travers A, Rimsky S. H-NS cooperative binding to high-affinity sites in a regulatory element results in transcriptional silencing. Nat Struct Mol Biol. 2007;14:441–448. doi: 10.1038/nsmb1233.
- 31. Lang B, et al. High-affinity DNA binding sites for H-NS provide a molecular basis for selective silencing within proteobacterial genomes. Nucleic Acids Res. 2007;35:6330–6337. doi: 10.1093/nar/gkm712.
- 32. Berger MF, et al. Compact, universal DNA microarrays to comprehensively determine transcription-factor binding site specificities. Nat Biotechnol. 2006;24:1429–1435. doi: 10.1038/nbt1246.
- 33. Badis G, et al. Diversity and complexity in DNA recognition by transcription factors. Science. 2009;324:1720–1723. doi: 10.1126/science.1162327.
- 34. Berger MF, et al. Variation in homeodomain DNA binding revealed by high-resolution analysis of sequence preferences. Cell. 2008;133:1266–1276. doi: 10.1016/j.cell.2008.05.024.
- 35. Haran TE, Mohanty U. The unique structure of A-tracts and intrinsic DNA bending. Q Rev Biophys. 2009;42:41–81. doi: 10.1017/S0033583509004752.
- 36. Stella S, Cascio D, Johnson RC. The shape of the DNA minor groove directs binding by the DNA-bending protein Fis. Genes Dev. 2010;24:814–826. doi: 10.1101/gad.1900610.
- 37. Rohs R, et al. The role of DNA shape in protein-DNA recognition. Nature. 2009;461:1248–1253. doi: 10.1038/nature08473.
- 38. Yoon C, Privé GG, Goodsell DS, Dickerson RE. Structure of an alternating-B DNA helix and its relationship to A-tract DNA. Proc Natl Acad Sci USA. 1988;85:6332–6336. doi: 10.1073/pnas.85.17.6332.
- 39. Dominguez C, Boelens R, Bonvin AM. HADDOCK: A protein-protein docking approach based on biochemical or biophysical information. J Am Chem Soc. 2003;125:1731–1737. doi: 10.1021/ja026939x.
- 40. Huth JR, et al. The solution structure of an HMG-I(Y)-DNA complex defines a new architectural minor groove binding motif. Nat Struct Biol. 1997;4:657–665. doi: 10.1038/nsb0897-657.
- 41. Tabernero L, et al. Molecular structure of the A-tract DNA dodecamer d(CGCAAATTTGCG) complexed with the minor groove binding drug netropsin. Biochemistry. 1993;32:8403–8410. doi: 10.1021/bi00084a004.
- 42. Yamada H, Muramatsu S, Mizuno T. An Escherichia coli protein that preferentially binds to sharply curved DNA. J Biochem. 1990;108:420–425. doi: 10.1093/oxfordjournals.jbchem.a123216.
- 43. Aiyar SE, Gourse RL, Ross W. Upstream A-tracts increase bacterial promoter activity through interactions with the RNA polymerase alpha subunit. Proc Natl Acad Sci USA. 1998;95:14652–14657. doi: 10.1073/pnas.95.25.14652.
- 44. Ross W, Ernst A, Gourse RL. Fine structure of E. coli RNA polymerase-promoter interactions: Alpha subunit binding to the UP element minor groove. Genes Dev. 2001;15:491–506. doi: 10.1101/gad.870001.
- 45. Benoff B, et al. Structural basis of transcription activation: The CAP-alpha CTD-DNA complex. Science. 2002;297:1562–1566. doi: 10.1126/science.1076376.
- 46. Donato GM, Lelivelt MJ, Kawula TH. Promoter-specific repression of fimB expression by the Escherichia coli nucleoid-associated protein H-NS. J Bacteriol. 1997;179:6618–6625. doi: 10.1128/jb.179.21.6618-6625.1997.
- 47. Falconi M, Colonna B, Prosseda G, Micheli G, Gualerzi CO. Thermoregulation of Shigella and Escherichia coli EIEC pathogenicity. A temperature-dependent structural transition of DNA modulates accessibility of virF promoter to transcriptional repressor H-NS. EMBO J. 1998;17:7033–7043. doi: 10.1093/emboj/17.23.7033.
- 48. Shindo H, et al. Identification of the DNA binding surface of H-NS protein from Escherichia coli by heteronuclear NMR spectroscopy. FEBS Lett. 1999;455:63–69. doi: 10.1016/s0014-5793(99)00862-5.
- 49. Rohs R, et al. Origins of specificity in protein-DNA recognition. Annu Rev Biochem. 2010;79:233–269. doi: 10.1146/annurev-biochem-060408-091030.