PLOS ONE. 2007 Aug 15;2(8):e741. doi: 10.1371/journal.pone.0000741

Genome Dynamics of Short Oligonucleotides: The Example of Bacterial DNA Uptake Enhancing Sequences

Mohammed Bakkali 1,*
Editor: Cecile Fairhead 2
PMCID: PMC1939737  PMID: 17710141

Abstract

Among the many bacteria naturally competent for transformation by DNA uptake (a phenomenon with significant clinical and financial implications), Pasteurellaceae and Neisseriaceae species preferentially take up DNA containing specific short sequences. The genomic overrepresentation of these DNA uptake enhancing sequences (DUES) causes preferential uptake of conspecific DNA, but the function(s) behind this overrepresentation and its evolution are still a matter for discovery. Here I analyze DUES genome dynamics and evolution and test whether the results hold for other selectively constrained oligonucleotides. I use statistical methods and computer simulations to examine DUES accumulation in Haemophilus influenzae and Neisseria gonorrhoeae genomes. I analyze DUES sequence and nucleotide frequencies, as well as those of all their mismatched forms, and prove the dependence of DUES genomic overrepresentation on preferential uptake by quantifying and correlating both characteristics. I then argue that mutation, uptake bias, and weak selection against DUESs in less constrained parts of the genome are together sufficient to cause DUES accumulation in susceptible parts of the genome, with no need for any other DUES function. The distribution of overrepresentation values across sequences with different mismatch loads compared to the DUES suggests a gradual yet non-linear molecular drive of DNA sequences depending on their similarity to the DUES. Other genomically overrepresented sequences, both pro- and eukaryotic, show a similar distribution of frequencies, suggesting that the molecular drive reported above applies to other frequent oligonucleotides. Rare oligonucleotides, however, seem to be gradually drawn to genomic underrepresentation, thus suggesting a molecular drag.
To my knowledge this work provides the first clear evidence of the gradual evolution of selectively constrained oligonucleotides, including repeated, palindromic and protein/transcription factor-binding DNAs.

Introduction

Many bacteria are naturally competent for transformation by spontaneous uptake of DNA from their surrounding environments [1], [2]. Among these, Pasteurellaceae and Neisseriaceae species preferentially take up DNA containing specific short sequences, called uptake signal sequences (USS) and DNA uptake sequences (DUS), respectively [3]–[9]. Both USSs and DUSs are overrepresented in their respective genomes and, while only one DUS (5′-GCCGTCTGAA) has been described [10]–[13], the USS has been reported in two slightly different versions: (i) 5′-AAGTGCGGT in Haemophilus influenzae [13]–[15], Actinobacillus actinomycetemcomitans [6], [9], [15], Haemophilus somnus and Pasteurella multocida [15], Mannheimia succiniciproducens [16] and, probably, Haemophilus parasuis [16], [17], and (ii) 5′-ACAAGCGGTC in Mannheimia haemolytica [16], [18] and Actinobacillus pleuropneumoniae [16]. Bacterial competence for natural transformation is best characterized in Bacillus subtilis (Bacillaceae), H. influenzae (Pasteurellaceae), Neisseria gonorrhoeae (Neisseriaceae) and Streptococcus pneumoniae (Streptococcaceae) (see [1], [2], [19]–[21]). This work therefore focuses on the H. influenzae and N. gonorrhoeae USS/DUS, for which I will henceforth use the unifying name DNA Uptake Enhancing Sequence (DUES).

Fewer than 10 DUESs are expected in random sequences similar in length and base composition to the H. influenzae Rd Kw20 and N. gonorrhoeae FA1090 genomes (see Table S1). However, Smith et al. [13], [14] found 1471 DUESs in the first genome, and Davidsen et al. [11] counted 1965 in the latter. The search for a function to account for the evolution of these sequences is still inconclusive. Goodman and Scocca [22] suggested a DUES role in transcription termination, and Karlin et al. [23] viewed the significantly even spacing of these sequences around the genome as a sign of their role in DNA replication, repair, or compaction. However, any transcription termination activity is difficult to envisage for at least three reasons: (i) 65% of H. influenzae and 35% of N. gonorrhoeae DUESs are located within open reading frames [11], [13], [14], (ii) DUESs are not palindromes and only ∼10% occur as inverted repeats (M. Bakkali, unpublished), and (iii) there are more DUESs in non-coding regions than expected for an unbiased distribution [15]. Furthermore, no intracellular DUES-binding protein has been identified, and DUESs show no orientation bias around the chromosome [13]–[15], although both features would be expected from a sequence that interacts with the replication machinery.
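The order of magnitude of this expectation can be sketched with a simple calculation. The Python below assumes independent bases, counts both strands, and uses an approximate genome length (1.83 Mb) and G+C content (38%) for H. influenzae; the function name and these inputs are illustrative assumptions, not the values or the method behind Table S1.

```python
def expected_occurrences(motif: str, genome_length: int, gc_content: float,
                         both_strands: bool = True) -> float:
    """Expected count of `motif` in a random sequence of given G+C content.

    Assumes independent bases, with G and C (and A and T) equally frequent.
    """
    p_gc = gc_content / 2.0          # probability of G, and of C
    p_at = (1.0 - gc_content) / 2.0  # probability of A, and of T
    prob = 1.0
    for base in motif.upper():
        prob *= p_gc if base in "GC" else p_at
    windows = genome_length - len(motif) + 1
    strands = 2 if both_strands else 1
    return strands * windows * prob


# Approximate, assumed inputs (not taken from Table S1).
HI_LENGTH = 1_830_000
HI_GC = 0.38
expected = expected_occurrences("AAGTGCGGT", HI_LENGTH, HI_GC)
print(f"expected 9 bp DUES count: {expected:.1f}")
```

Under these assumptions the expectation stays below 10, against the 1471 occurrences observed.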

DUESs as bacterial mate recognition systems [24] is the only uptake-related function proposed for these sequences. In this case, DUESs could be tags for ‘safe sex’ among bacterial cells seeking recombination through competence for natural transformation. Given the striking DUES genomic overrepresentation, any preferential uptake of DUES-containing DNA will inevitably result in preferential uptake of conspecific DNA, and of DNA from species sharing the same DUES (i.e., closely related species). Hence, a DUES-biased system of DNA uptake could evolve given a higher selective advantage of recombination with DNA from conspecifics compared to DNA from unrelated species, and computer simulations seem to confirm this possibility [25]. However, bacteria were suggested to take up DNA mainly as a nutrient rather than for recombination [26], [27]. Increasingly supported by recent findings [28]–[31], this hypothesis raises further questions about the evolution of a DUES-biased DNA uptake system that significantly limits the quantity of DNA available to the competent cells.

H. influenzae has ‘only’ 764 singly-mismatched DUESs [13]–[15], even though these can arise from the species' 1471 nine-base-pair (bp) DUESs by any of the 27 possible single-base substitutions (3 alternative bases at each of 9 positions). Testing this ostensible sequence homogeneity may therefore help understand DUES evolution. High sequence homogeneity is expected in repeats arising by copying, such as transposable elements and telomeres, and Smith et al. [13] suggested that the unexpectedly low frequency of singly-mismatched DUESs relative to the non-mismatched one could be due to a balance between mutation away from the latter and its restoration by preferential uptake. However, DUESs are not known to be transposable, and we have no reason to rule out uptake bias in favor of singly-mismatched ones. Both H. influenzae and Neisseria meningitidis even take up DUES-lacking DNAs, though less efficiently [22], [32]–[34].
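The count of 27 singly-mismatched forms follows from 9 positions times 3 alternative bases. A minimal sketch that enumerates the mismatched forms of the 9 bp core at any mismatch load is given below; the function name is illustrative and not part of the original analysis.

```python
from itertools import combinations, product

BASES = "ACGT"


def mismatched_forms(motif: str, load: int):
    """Yield every sequence differing from `motif` at exactly `load` positions."""
    for positions in combinations(range(len(motif)), load):
        # For each chosen position, the three bases other than the original.
        alternatives = [[b for b in BASES if b != motif[p]] for p in positions]
        for substitution in product(*alternatives):
            seq = list(motif)
            for p, b in zip(positions, substitution):
                seq[p] = b
            yield "".join(seq)


core = "AAGTGCGGT"
singles = list(mismatched_forms(core, 1))
print(len(singles))  # 9 positions x 3 alternative bases
```

More generally, a motif of length n has C(n, k) · 3^k forms at mismatch load k, which is why the number of candidate sequences grows quickly with the load.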

Based on sequence comparison of homologous pasteurellacean genes, DUESs were suggested to evolve by gradual accumulation of point mutations in preexisting sequences rather than by insertion/deletion of entire sequences [15]. As incoming DNA can replace homologous chromosomal regions by recombination, and provided there is enough bias towards some mismatched DUESs as well, DNA uptake could generate a drive that gradually imposes the DUES and some of its mismatched forms in the susceptible parts of the genome (i.e., non-coding and unconstrained coding regions). The resulting molecular drive, tested in [35], offers a simpler explanation of DUES accumulation where the perceived sequence homogeneity could be a consequence of stronger uptake bias towards the DUES than its mismatched forms.

Evaluation of these interpretations requires careful analysis of the genomic representation of all the DUES-like sequences as well as the actual bias of DNA uptake and its contribution to DUES evolution. In this work I examine and computer simulate the accumulation of DUESs in bacterial genomes by estimating and analyzing the frequencies of H. influenzae and N. gonorrhoeae DUESs and all their mutated forms. I subsequently monitor the overrepresentation of DUES nucleotides in sequences with different mismatch loads (i.e., number of mismatched positions compared to the DUES) to detect the genomic footprints of the DNA uptake bias and, thus, infer DUES evolutionary history. I then test whether uptake bias in itself can explain DUES accumulation by correlating the strength of DUES-like sequences, as estimated from the genomic frequencies of their nucleotides, to the uptake bias in their favor. Finally, I analyze the distribution of the genomic frequencies of several pro- and eukaryotic sequences to test whether the DUES mode of evolution could be extrapolated to other selectively constrained oligonucleotides. DUES offers a precious system for the study of such sequences—especially protein/transcription factor-binding—as it is selectively bound by the extracellular receptor for DNA uptake, and the uptake bias could be analogous to any other function/selective force.

Results

Sequence frequencies and overrepresentation

This analysis aims at answering two questions: (i) Do DUESs evolve by gradual accumulation of mutations? If so, then (ii) what is the minimum number of matches a sequence needs to share with the DUES for its uptake to be significantly preferential?

For a gradual evolution of DUES to take place, a degree of DNA uptake bias towards some of its mutated forms is also required. In such conditions, and assuming that there is no interference from evolutionary forces other than mutation and DNA uptake bias, over evolutionary time the latter should leave a trace in the genome in the form of significant overrepresentation of the DUES and of its mutated yet preferentially taken up forms. Furthermore, the magnitude of overrepresentation should correlate inversely with the mismatch load of the sequence compared to the DUES. However, no significant overrepresentation is expected for sequences not preferentially taken up and, if there is any, it does not have to correlate with the mismatch load.

Only DUESs in H. influenzae and N. gonorrhoeae genomes show significant overrepresentation that consistently decreases with the increase in mismatch load until underrepresentation at mismatch load 6 (Figure 1, Table 1, and Table S1). Significant overrepresentation then reappears at the last two sequence categories where it increases with the mismatch load. As expected from non-accumulating DNAs, the control tests show neither overrepresentation of the sequences analyzed, nor any consistent trend in sequence frequencies with respect to the mismatch load. Thus, if sequence overrepresentation is indicative of DNA uptake, these results suggest that DUESs gradually evolve in the genome by accumulation of point mutations.
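The counting behind this analysis can be sketched as follows. The code tallies every window of the motif length, on both strands, by its mismatch load (Hamming distance) to the motif, and computes a representation value as (observed − expected) / expected. That formula is an assumption inferred from the values in Table 1 (for example, −1.000 for absent sequences); equation 2 itself is not reproduced in this section, and the function names are illustrative.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def mismatch_load_counts(genome: str, motif: str) -> Counter:
    """Count windows, on both strands, by mismatch load to `motif`."""
    k = len(motif)
    rc_motif = reverse_complement(motif)
    counts = Counter()
    for i in range(len(genome) - k + 1):
        window = genome[i:i + k]
        counts[hamming(window, motif)] += 1     # forward strand
        counts[hamming(window, rc_motif)] += 1  # reverse strand
    return counts


def representation(observed: float, expected: float) -> float:
    """Assumed form of equation 2: relative excess over expectation."""
    return (observed - expected) / expected


# Expected counts of 8 and 5 are illustrative values chosen because they
# reproduce the load-0 entries of Table 1; Table S1 is not shown here.
print(representation(1471, 8))  # H. influenzae, 1471 observed DUESs
print(representation(1965, 5))  # N. gonorrhoeae, 1965 observed DUESs
```

Positive values indicate overrepresentation, negative values underrepresentation, and −1 a sequence that is absent from the genome.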

Figure 1. Ratio of the observed to the expected number of sequences across mismatch loads.

X-axis: Mismatch load. A: DUESs. B: Control sequences. HDUESH and HDUESS refer to H. influenzae DUES in H. influenzae and S. pyogenes genomes, respectively. NDUESN and NDUESC refer to N. gonorrhoeae DUES in N. gonorrhoeae and C. tepidum genomes. CSH1, 2, and 3 and CSN1, 2, and 3 refer to the three control sequences in H. influenzae and N. gonorrhoeae (see Material and methods).

Table 1. Sequence representation (equation 2) of DUESs, control sequences, and all their mismatched forms.

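Genome | H. influenzae | H. influenzae | H. influenzae | H. influenzae | S. pyogenes | N. gonorrhoeae | N. gonorrhoeae | N. gonorrhoeae | N. gonorrhoeae | C. tepidum
Mismatch load | HDUESH | CSH1 | CSH2 | CSH3 | HDUESS | NDUESN | CSN1 | CSN2 | CSN3 | NDUESC
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
0 | 182.875 | −0.750 | −0.375 | −0.750 | −0.556 | 392 | −0.800 | −1.000 | −1.000 | −0.200
1 | 2.008 | −0.621 | 0.099 | −0.553 | −0.360 | 4.776 | 0.828 | −0.545 | −0.940 | −0.048
2 | 0.220 | −0.479 | 0.029 | −0.418 | −0.195 | 0.652 | 1.667 | −0.410 | −0.746 | 0.010
3 | 0.069 | −0.281 | 0.030 | −0.267 | −0.095 | 0.344 | 0.503 | −0.214 | −0.548 | 0.018
4 | 0.024 | −0.133 | 0.018 | −0.123 | −0.044 | 0.190 | 0.312 | −0.080 | −0.311 | 0.012
5 | 0.001 | −0.022 | 0.001 | −0.023 | −0.003 | 0.085 | 0.098 | −0.002 | −0.096 | 0.016
6 | −0.012 | 0.034 | −0.003 | 0.040 | 0.012 | −0.011 | −0.014 | 0.022 | 0.023 | 0.010
7 | −0.010 | 0.044 | −0.004 | 0.035 | 0.010 | −0.051 | −0.065 | 0.014 | 0.059 | −0.003
8 | 0.004 | −0.008 | 0.001 | −0.015 | −0.002 | −0.037 | −0.051 | −0.012 | 0.046 | −0.017
9 | 0.032 | −0.124 | 0.010 | −0.096 | −0.035 | 0.027 | 0.038 | −0.010 | −0.043 | −0.004
10 | | | | | | 0.176 | 0.223 | 0.014 | −0.183 | 0.065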

Sequence names as in Figure 1. Significant values, as by χ2 test, are underlined (see Table S1).

The distribution of the overrepresentation values is strikingly similar between H. influenzae and N. gonorrhoeae, suggesting similar DUES genome dynamics and evolution. Nonetheless, N. gonorrhoeae values are noticeably higher overall than those of H. influenzae, possibly due to more efficient or more frequent DNA uptake. Two major drops in overrepresentation values can easily be identified in both species: the first between mismatch loads 0 and 1 (92-fold in H. influenzae and 82-fold in N. gonorrhoeae), and the second between loads 1 and 2 (9-fold in H. influenzae and 7-fold in N. gonorrhoeae). This reflects a non-linear relation between the mismatch load and DNA uptake, possibly due to high specificity of the receptor binding the DUES. Coinciding with the length difference between the N. gonorrhoeae and H. influenzae DUESs, significant overrepresentation values reach one mismatch further in the first species than in the latter. This highlights that what matters for DNA uptake is the number of matches to the DUES, not the number of mismatches.

At four matches (i.e., mismatch load 5 in H. influenzae, and 6 in N. gonorrhoeae), the results are ambiguous, with non-significant overrepresentation in H. influenzae but significant underrepresentation in N. gonorrhoeae. This may be due to: (i) non-preferential uptake, (ii) insufficient uptake for balancing sequence loss by mutational decay, (iii) insufficient drive from higher mismatch loads to balance the drive towards lower ones, or (iv) functional constraint selecting against some sequences at this mismatch load. Sequences with 3 and 2 matches to the DUES are significantly underrepresented and thus probably not preferentially taken up. They could comprise the pool from which the drive imposed by the DNA uptake bias receives new sequences after mutation. Overrepresentation of sequences with 0 and 1 match to the DUES shows no negative correlation with the mismatch load and is clearly due to uptake-independent factor(s) (e.g., functional and mutational constraints or codon bias).

Computer simulations

Overrepresentation of mismatched sequences suggests that they are preferentially taken up. Still, it could also be due to mutational decay of sequences with fewer mismatches that are preferentially taken up. DUESs are so overrepresented that singly-mismatched sequences could reach overrepresentation by DUES mutational decay alone, as suggested in [13]. It is also possible that the two major drops in sequence overrepresentation reported above reflect uptake bias only towards sequences with fewer than 2 mismatches to the DUES. I therefore simulated the evolution of sequence frequencies in a population of 10¹⁸ genomes with the same distribution of sequence frequencies as those expected for randomized H. influenzae and N. gonorrhoeae genomes (see Table S1), evolving at 10⁻² mutations per base per generation. Mutations alone can drive neither the H. influenzae nor the N. gonorrhoeae DUES to its observed overrepresentation level (data not shown), and an uptake bias of 104.4 and 477.5, respectively, is needed (Figure 2A, D). However, the results do not support the possibility of mutational decay. By themselves, these uptake biases can neither drive the singly-mismatched sequences to their observed overrepresentation levels, nor give distributions of sequence frequencies similar to the observed ones (Figure 2A, D). An additional uptake bias of 1.02 and 5.284 in favor of singly-mismatched H. influenzae and N. gonorrhoeae DUESs was needed to attain their observed frequencies, yet it did not give an overall sequence frequency distribution similar to the observed one (Figure 2B, E). The same was true for additional uptake biases towards other mismatch loads (data not shown). An approximation to the observed distributions of sequence frequencies with an error margin of less than 0.01% was obtained only with the distributions of uptake biases shown in Figure 2C, F.
This distribution agrees with my interpretation of the sequence overrepresentation results, as it supports uptake bias towards mismatched sequences and a negative correlation between the mismatch load and the uptake bias. The exception seems to be H. influenzae sequences at mismatch loads 4 to 9, where the uptake bias value increases with the mismatch load, which may reflect the involvement of selective forces other than uptake. The values of uptake bias used in the simulations certainly include the other selective forces, such as functional constraints, that led to the observed distributions of sequence frequencies in the real genomes.
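The interplay of mutation and uptake bias described above can be illustrated with a toy model. The sketch below is not the simulation used in this work, whose details are given in the Materials and methods. It tracks only the frequency of each mismatch-load class, allows at most one substitution per sequence per cycle, and applies uptake as a multiplicative weight of 1 + rate × bias followed by renormalization. The `rate` parameter, the weighting scheme, and all function names are assumptions made for illustration, so the output is not comparable to Figure 2.

```python
from math import comb


def random_expectation(n: int) -> list:
    """Class frequencies for random sequences with equal base frequencies."""
    return [comb(n, k) * 3**k / 4**n for k in range(n + 1)]


def mutate(freqs: list, mu: float) -> list:
    """One cycle of point mutation between mismatch-load classes (first order)."""
    n = len(freqs) - 1
    out = [0.0] * (n + 1)
    for k, f in enumerate(freqs):
        to_lower = k * mu / 3.0    # a mismatched base mutates to the DUES base
        to_higher = (n - k) * mu   # a matching base mutates away
        out[k] += f * (1.0 - to_lower - to_higher)
        if k > 0:
            out[k - 1] += f * to_lower
        if k < n:
            out[k + 1] += f * to_higher
    return out


def uptake(freqs: list, bias: list, rate: float) -> list:
    """Toy uptake step: weight each class by its bias, then renormalize."""
    if len(bias) != len(freqs):
        raise ValueError("one bias value is needed per mismatch load")
    weighted = [f * (1.0 + rate * b) for f, b in zip(freqs, bias)]
    total = sum(weighted)
    return [w / total for w in weighted]


def simulate(n: int, bias: list, mu: float = 1e-2, rate: float = 1e-3,
             cycles: int = 1000) -> list:
    freqs = random_expectation(n)
    for _ in range(cycles):
        freqs = uptake(mutate(freqs, mu), bias, rate)
    return freqs


# Bias values for mismatch loads 0 to 9 as listed in the Figure 2C caption;
# how they enter this toy model is an assumption.
HI_BIAS = [156.4, 2.016, 0.352, 0.295, 0.340, 0.395, 0.449, 0.505, 0.563, 0.623]
final = simulate(9, HI_BIAS)
start = random_expectation(9)
print([round(f / s, 3) for f, s in zip(final, start)])  # enrichment per load
```

With all biases set to zero, the random expectation is stationary under this mutation step, which matches the observation that mutation alone does not drive the DUES to overrepresentation.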

Figure 2. Computer simulation of the accumulation of H. influenzae and N. gonorrhoeae DUESs.

X-axis: Cycle (i.e., generation). Y-axis: Ratio of the number of sequences after each simulation cycle to the observed (i.e., real) number. A and D: Genomes of H. influenzae (A) and N. gonorrhoeae (D) evolving with 104.4 and 477.5 uptake bias towards their respective DUES. B and E: The same genomes evolving with an additional 1.02 and 5.284 uptake bias towards the singly-mismatched DUESs. C and F: The same genomes evolving with the uptake bias combinations (156.4; 2.016; 0.352; 0.295; 0.340; 0.395; 0.449; 0.505; 0.563; 0.623) and (499.2; 5.567; 0.857; 0.493; 0.306; 0.175; 0.056; 0; 0; 0.051; 0.184) towards sequences with 0 to 9 and 0 to 10 mismatches to the respective DUES—semicolons separate uptake biases for different mismatch loads.

To reach the currently observed genomic distributions of sequence frequencies, simulations needed higher uptake bias values and more cycles (i.e., generations) for N. gonorrhoeae than for H. influenzae (Figure 2C;F). This may be due to: (i) Differences in the ancestral organization of the real genomes which, obviously, was not random as assumed in the simulations, (ii) more frequent uptake of DNA by the almost permanently competent N. gonorrhoeae [36], [37] than by the occasionally competent H. influenzae [4], [38], [39], or (iii) earlier evolution of the DUES-biased DNA uptake in N. gonorrhoeae than in H. influenzae.

Sequence logos and evolution of DUES nucleotides

The results reported above clearly indicate gradual evolution of DUESs by accumulation of point mutations. However, this is only certain if we see a gradual increase in the overrepresentation of the actual DUES nucleotides, as nucleotide overrepresentation at any particular position of the DUES may not show any trend with regard to the mismatch load.

So far I have focused solely on the 9 and 10 bp H. influenzae and N. gonorrhoeae DUESs. However, for H. influenzae, this sequence is only the core of a larger (29 bp) DUES consensus containing two additional less conserved regions [14], [16]. Conversely, there is no report of less conserved DUES regions in N. gonorrhoeae [7]. DNA sequence logos were therefore generated for 100 bp sequences containing the DUES cores or one of their mismatched forms, in order to include the less conserved regions and look for possible additional ones. The H. influenzae DUES consensus is 5′-aAAGTGCGGTnrwttttnnnnnnrwtttw, where r = A or G and w = A or T (Figure 3A), which is similar to the consensus reported in [14]. The N. gonorrhoeae DUES also seems to show less conserved nucleotides (Figure 3B), especially an adenine and a thymine at the 5′ side. Its consensus sequence is thus 5′-mdatGCCGTCTGAAvv, where d = A, G or T, m = A or C and v = A, G or C.
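For readers who want to scan sequences for these consensuses, the ambiguity codes can be translated into a regular expression. The sketch below treats every position, including the lowercase (less conserved) ones, as required, which is a strict reading adopted only for illustration; the function name is hypothetical.

```python
import re

# Ambiguity codes used in the two consensuses above.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "W": "[AT]", "M": "[AC]",
    "D": "[AGT]", "V": "[ACG]", "N": "[ACGT]",
}


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Compile a consensus into a regex; case (conservation) is ignored."""
    return re.compile("".join(IUPAC[c] for c in consensus.upper()))


HI_CONSENSUS = "aAAGTGCGGTnrwttttnnnnnnrwtttw"  # 29 bp, H. influenzae
NG_CONSENSUS = "mdatGCCGTCTGAAvv"               # 16 bp, N. gonorrhoeae

hi_pattern = consensus_to_regex(HI_CONSENSUS)
ng_pattern = consensus_to_regex(NG_CONSENSUS)

# Hypothetical test sequences, not taken from either genome.
print(bool(hi_pattern.fullmatch("AAAGTGCGGTCATTTTTACGTACGATTTT")))
print(bool(ng_pattern.fullmatch("CGATGCCGTCTGAACC")))
```

A scan of real sequence would use `pattern.finditer(genome)` on both strands; a strict match of this kind is far more restrictive than the graded uptake preference discussed in the text.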

Figure 3. DNA sequence logos showing the accumulation of H. influenzae and N. gonorrhoeae DUES nucleotides.

The heights of the characters reflect the conservation of the nucleotides (equation 3 in [66]) as well as the genomic representation of sequences at the mismatch load (equation 2). The number to the side of each logo is the mismatch load analyzed. The height of the logo should be multiplied by the number at the top, if applicable.

The logos support previous interpretations, as they show clear gradual accumulation of DUES nucleotides starting from the third match, with some nucleotides gaining overrepresentation earlier/faster than others, probably due to their importance for uptake. This, together with the results of the simulations, solves the ambiguity regarding sequences with 4 matches and suggests a certain bias in their uptake, though not sufficient to drive them to significant overrepresentation. Coinciding with the biases in genomic base composition, DUES Ws emerge first in H. influenzae whilst, in N. gonorrhoeae, the whole DUES core emerges at the same time with a slightly higher overrepresentation of Ss than Ws. Unlike the core, DUES nucleotides of the less conserved regions emerge only after mismatch load 2 and, in H. influenzae positions 14 and 15, there is a switch in the nature of the most overrepresented nucleotide; probably because of similarity in their overrepresentation. In neither of the species does overrepresentation seem to depend on uptake at less than 3 matches, since the logos do not show resemblance to the respective DUES cores.
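The conservation component of a sequence logo can be sketched with the commonly used information-content measure, 2 bits minus the Shannon entropy at each position, with each letter's height being its frequency times that value. Whether this is exactly equation 3 of [66] is an assumption, no small-sample correction is applied, and the additional scaling by genomic representation (equation 2) used in Figure 3 is not reproduced here.

```python
from collections import Counter
from math import log2


def information_content(aligned: list) -> list:
    """Per-position information content (bits) of equal-length DNA sequences."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must all have the same length")
    result = []
    for i in range(length):
        counts = Counter(s[i] for s in aligned)
        total = sum(counts.values())
        entropy = -sum((c / total) * log2(c / total) for c in counts.values())
        result.append(2.0 - entropy)  # 2 bits is the maximum for 4 bases
    return result


def letter_heights(aligned: list) -> list:
    """Per-position dict of base -> height (frequency x information content)."""
    info = information_content(aligned)
    heights = []
    for i, bits in enumerate(info):
        counts = Counter(s[i] for s in aligned)
        total = sum(counts.values())
        heights.append({b: (c / total) * bits for b, c in counts.items()})
    return heights


# Hypothetical toy alignment: position 1 fully conserved, position 2 random.
toy = ["AA", "AC", "AG", "AT"]
print(information_content(toy))
```

A fully conserved position carries 2 bits and an evenly mixed one carries 0, which is why DUES nucleotides emerge in the logos only once they become overrepresented.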

Dependence of the uptake bias on the strength of the sequence

Dependence of sequence overrepresentation on DNA uptake is the key to interpreting the results described above. I therefore used the 28 DNA fragments tested for uptake efficiency by competent H. influenzae in [40] (the best experimental data set available on DNA uptake) to test the dependence of their uptake on the strength (equation 19) of their 29 bp DUES-like sequences. The small differences in the naming and sizes of some of the DNA fragments between what is reported in [40] and the sequences available in the database are insignificant and will not affect the intended analyses. Fifteen of these fragments were classified by Goodgal and Mitchell [40] as uptake fragments and the remaining 13 as non-uptake. As Table 2 shows, the uptake fragments are, overall, larger than the non-uptake ones, and the average strength of their strongest 29 bp sequences is higher, whilst the average mismatch load of these sequences is lower. The mean mismatch load of all the 29 bp sequences of every uptake and non-uptake fragment is similar, whereas their mean strength is lower in the uptake than in the non-uptake fragments. In principle, this may be sufficient to infer dependence of the uptake efficiency on the strength of the strongest sequence of a DNA fragment. However, DNAs were named by Goodgal and Mitchell [40] prior to quantification of their uptake, and their classification into uptake and non-uptake fragments does not reflect the clear division between fragments taken up at more than 60 molecules per cell and those taken up at 30 or fewer. Three fragments originally classified as non-uptake are better seen as uptake fragments (14nu, 27nu and 58nu), and six originally classified as uptake are better seen as non-uptake (159u, 201u, 205u, 30u, 62u and 71u), giving 12 uptake and 16 non-uptake fragments. After this reclassification, the uptake fragments taken as a whole (i.e., all their 29 bp sequences) show, on average, fewer mismatches but less strength than the non-uptake ones, whereas their strongest 29 bp sequences show fewer mismatches and more strength.

Table 2. Characteristics of the DNA fragments tested for uptake in [40].

Fragment | Characteristic | Number of fragments | Mean | Minimum | Maximum | Standard deviation
--- | --- | --- | --- | --- | --- | ---
Uptake | Mismatch load of the strongest sequence | 15 (12) | 1.467 (0.167) | 0 (0) | 5 (1) | 1.807 (0.389)
 | Mean mismatch load of all the 29 bp sequences | | 6.817 (6.820) | 6.763 (6.774) | 6.89 (6.89) | 0.029 (0.033)
 | Strength of the strongest sequence ×10⁶ | | 136014.044 (157773.523) | 93438.933 (145161.802) | 164318.553 (165301.003) | 27130.732 (5570.456)
 | Mean strength of all the 29 bp sequences ×10⁶ | | 31734.307 (31549.857) | 26186.814 (26186.814) | 35824.358 (34947.958) | 2187.048 (2486.952)
 | Length of the DNA fragment | | 177.133 (170.833) | 94 (85) | 295 (295) | 55.631 (67.261)
 | Uptake of the DNA fragment | | 51.227 (72.833) | 6.4 (67) | 80 (80) | 28.125 (4.282)
Non-uptake | Mismatch load of the strongest sequence | 13 (16) | 2.385 (3.188) | 0 (2) | 4 (5) | 1.325 (0.75)
 | Mean mismatch load of all the 29 bp sequences | | 6.806 (6.806) | 6.72 (6.72) | 6.88 (6.88) | 0.039 (0.034)
 | Strength of the strongest sequence ×10⁶ | | 116059.201 (103481.125) | 75958.096 (75958.096) | 165301.003 (116996.024) | 27537.782 (10406.055)
 | Mean strength of all the 29 bp sequences ×10⁶ | | 32571.148 (32552.578) | 26924.022 (26924.022) | 39076.779 (39076.779) | 2941.195 (2591.2731)
 | Length of the DNA fragment | | 132.154 (145.313) | 50 (50) | 311 (311) | 73.043 (66.938)
 | Uptake of the DNA fragment | | 24.946 (13.669) | 6 (6) | 79 (30) | 28.146 (8.892)

Between parentheses are the results after reclassification of the uptake and non-uptake fragments (see Results).

Regression analysis to detect dependence of the number of DNA molecules taken up by competent H. influenzae on the strength and mismatch load of their sequences shows highly significant values for the strongest 29 bp sequence in each of the 28 DNA fragments (Table 3). This suggests that the efficiency of DNA uptake depends on the strength of the best region of the DNA fragment. However, the results are also significant for the second-strongest 29 bp sequences, implying that preferential uptake might not always target the strongest sequence in a DNA fragment. Nevertheless, weaker sequences show no significant results, and the significance of the regression is much higher for the strongest sequence. The significant results for the second-strongest sequences could thus be due to their similarity to the strongest ones in some DNA fragments. In fact, 11 DNA fragments show less than 5% difference in strength between their strongest and second-strongest sequences (fragments 8u2, 30u, 37nu2, 39nu, 45nu, 48u, 51nu, 71u, 159u, 201u, 205u). At a 10% cut-off, the number of these fragments already reaches half the sample size (due to the addition of fragments 43u, 48nu, 62nu).

Table 3. Dependence of the uptake efficiency of the DNA fragments in [40] on the strength and mismatch load of their 29 bp sequences.

29 bp sequence | Characteristic | BETA | t(26) | p-level
--- | --- | --- | --- | ---
Strongest | Strength | 0.894 | 10.171 | <10⁻⁹
Strongest | Mismatch load | −0.867 | −8.901 | <10⁻⁸
Second | Strength | 0.520 | 3.103 | 0.005
Second | Mismatch load | −0.418 | −2.349 | 0.027
Third | Strength | 0.295 | 1.574 | 0.128
Third | Mismatch load | −0.239 | −1.253 | 0.221
Fourth | Strength | 0.288 | 1.536 | 0.137
Fourth | Mismatch load | 0.055 | 0.283 | 0.779
Fifth | Strength | 0.260 | 1.374 | 0.181
Fifth | Mismatch load | −0.311 | −1.668 | 0.107

Only the data from the top five strongest 29 bp sequences are shown here.
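The BETA and t columns of Table 3 can be cross-checked against each other. The sketch below assumes that each row is a single-predictor regression over the 28 fragments, so that the standardized coefficient equals the Pearson correlation and t = β·√(n − 2) / √(1 − β²) with n − 2 = 26 degrees of freedom. This reading is inferred from the reported degrees of freedom and is not stated in the text; the function name is illustrative.

```python
from math import sqrt


def t_from_beta(beta: float, n: int) -> float:
    """t statistic (n - 2 df) for a single-predictor standardized slope."""
    return beta * sqrt(n - 2) / sqrt(1.0 - beta**2)


# (label, BETA, reported t) for the strength rows of Table 3.
STRENGTH_ROWS = [
    ("Strongest", 0.894, 10.171),
    ("Second", 0.520, 3.103),
    ("Third", 0.295, 1.574),
    ("Fourth", 0.288, 1.536),
    ("Fifth", 0.260, 1.374),
]

for label, beta, reported in STRENGTH_ROWS:
    print(f"{label}: computed t = {t_from_beta(beta, 28):.3f}, reported = {reported}")
```

Small differences between computed and reported values are expected, since BETA is given to three decimals only.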

Even though the uptake DNA fragments are, on average, larger than the non-uptake ones (Table 2), DNA uptake efficiency does not depend on the length of the DNA fragment taken up (Table 4). However, had I tested more DNA fragments, such an effect could have been detected since, in theory, the larger a DNA fragment is, the more likely it is to have a strong DUES. Furthermore, neither the mean mismatch load of all the 29 bp sequences of a DNA fragment, their mean strength, the position of the strongest one in the fragment, nor its orientation has any significant effect on DNA uptake (Table 4). Randomizing the 29 positions of the strongest sequence in each DNA fragment abolishes the significant dependence of the original uptake values on the strength of these sequences (e.g., Beta = 0.271, t(26) = 1.433 and p-level = 0.164).

Table 4. Lack of dependence of uptake efficiency on the length of the DNA fragment, its average mismatch load, strength, and the position and orientation of its strongest DUES.

Characteristic of the DNA fragment | BETA | t(26) | p-level
--- | --- | --- | ---
Length | 0.159 | 0.823 | 0.418
Mean mismatch load | 0.258 | 1.361 | 0.185
Mean strength | −0.251 | −1.320 | 0.198
Position of the strongest 29 bp sequence | 0.013 | 0.066 | 0.948
Orientation of the strongest 29 bp sequence | 0.112 | 0.577 | 0.569

Ultimately, the results suggest that the efficiency of DNA uptake significantly depends on the strength of one sequence only—the strongest—which is the most likely to be bound and taken up independently of its position or orientation in the DNA fragment. Since the strength of sequences was estimated based on the genomic overrepresentation (conservation) of their nucleotides (see equation 19), these results suggest that DNA uptake bias alone can explain the genomic overrepresentation of DUESs and some of their mismatched forms.

Experimental verification of the theoretical results

For experimental confirmation of the abovementioned results, uptake was measured for 222 bp DNA fragments generated in vitro to contain the ‘best’ DUES consensus, consensuses mutated to the least frequent nucleotide at positions 2–11, 13 and 14, 20, or 25 and 26, or the worst 29 bp sequence. Uptake experiments were replicated several times and for different amounts of DNA (10, 20, 30 and 40 ng). The data show considerable variance between experiments, and a nested ANCOVA was performed on the uptake values and their standard deviations to detect possible dependence of uptake on the availability of DNA and the strength of the DUES, as well as to estimate the magnitude of the noise (i.e., error) emanating from the variation between experiments. The results (Table 5) suggest that the amount of DNA taken up strongly depends on the nature of the DUES (i.e., stronger DUESs are taken up more efficiently), and on the amount of DNA given to the cells (i.e., at non-saturating concentrations, as in these experiments, more DNA in the medium increases the possibility for cells to ‘find’ it, bind it, and take it up). Nonetheless, the percentage of DNA taken up depends only on the nature of the DUES. The variability between experiments, however, could be due to differences in the uptake/transformation efficiency (i.e., competence state) of the cells used in each experiment and/or the use of different DNA concentrations and/or DUESs in different experiments. This may be the reason why the variation in the amount of DNA taken up is only explained by variation in the strength of the DUES and the amount of DNA given to the cells, whereas the variation in the percentage of DNA taken up solely depends on the nature of the DUES tested (i.e., position mutated).

Table 5. Effect of the DNA concentration in the medium, the strength of the DUES, and the experiment on DNA uptake efficiency by competent H. influenzae.

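Dependent variable | Statistic | Effect (df) | MS Effect | MS Error | F | p
--- | --- | --- | --- | --- | --- | ---
Nanograms of DNA taken up | Mean | DUES (14)* | 22.474 | 1.043 | 21.554 | 0
Nanograms of DNA taken up | Mean | DNA (1)** | 30.908 | 2.254 | 13.712 | 0.003
Nanograms of DNA taken up | Mean | Experiment (14)*** | 2.128 | 0.902 | 2.359 | 0.011
Nanograms of DNA taken up | Standard deviation | DUES (14)* | 0.183 | 0.046 | 4.004 | <0.0001
Nanograms of DNA taken up | Standard deviation | DNA (1)** | 0.340 | 0.059 | 5.730 | 0.034
Nanograms of DNA taken up | Standard deviation | Experiment (14)*** | 0.058 | 0.044 | 1.316 | 0.224
Percentage of DNA taken up | Mean | DUES (14)* | 347.15 | 12.754 | 27.219 | 0
Percentage of DNA taken up | Mean | DNA (1)** | 119.129 | 29.2 | 4.08 | 0.065
Percentage of DNA taken up | Mean | Experiment (14)*** | 27.492 | 10.844 | 2.535 | 0.006
Percentage of DNA taken up | Standard deviation | DUES (14)* | 2.735 | 0.829 | 3.298 | 0.0004
Percentage of DNA taken up | Standard deviation | DNA (1)** | 1.074 | 1.178 | 0.912 | 0.358
Percentage of DNA taken up | Standard deviation | Experiment (14)*** | 1.141 | 0.789 | 1.447 | 0.159

The test used is a nested ANCOVA.

* Fixed effect.
** Covariate effect.
*** Random effect.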
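Each F value in Table 5 is the ratio of the effect mean square to its error mean square. A short sketch that recomputes the ratios from the tabulated mean squares is given below as a consistency check; the row labels and function name are illustrative, and p-values are not recomputed because the error degrees of freedom are not listed.

```python
def f_ratio(ms_effect: float, ms_error: float) -> float:
    """F statistic as the ratio of effect to error mean squares."""
    return ms_effect / ms_error


# (label, MS effect, MS error, reported F) for the "Mean" rows of Table 5.
MEAN_ROWS = [
    ("ng taken up, DUES", 22.474, 1.043, 21.554),
    ("ng taken up, DNA", 30.908, 2.254, 13.712),
    ("ng taken up, Experiment", 2.128, 0.902, 2.359),
    ("% taken up, DUES", 347.15, 12.754, 27.219),
    ("% taken up, DNA", 119.129, 29.2, 4.08),
    ("% taken up, Experiment", 27.492, 10.844, 2.535),
]

for label, ms_effect, ms_error, reported in MEAN_ROWS:
    print(f"{label}: computed F = {f_ratio(ms_effect, ms_error):.3f}, reported = {reported}")
```

The recomputed ratios agree with the reported F values to within the rounding of the tabulated mean squares.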

Overall, the experimental findings agree with those of the theoretical analysis reported above, and mutations of theoretically important DUES positions also result in less DNA uptake (Figure 4). The small effect on uptake of mutating the second, third and fourth positions of the DUES consensus, however, does not agree with their conservation results. Still, this result could reflect the evolutionary variability of nucleotides at these positions, as they differ between the two pasteurellacean DUESs (Figure 4). The experiment confirms that, even within the DUES core, the importance of the different positions for DNA uptake varies, with the evolutionarily conserved ones being the most important. Indeed, Karlin et al. [23] reported a biased distribution of singly mismatched DUESs in the H. influenzae genome.

Figure 4. Relative uptake efficiency of H. influenzae DUES and some of its mismatched forms.

X-axis: Sequence tested. Best DUES: H. influenzae best DUES consensus. Worst Sequence: The 29 bp sequence of the least frequent nucleotides. PxN: Best DUES mutated at position x (Px) towards the least frequent nucleotide (N). Y-axis: Percentage of DNA taken up. SD: Standard deviation of the mean, SE: Standard error. Note that the critical part of the DUES largely coincides with the evolutionarily conserved positions of the two pasteurellacean DUESs currently known.

The findings on DUES could be generalized to other oligonucleotides

In this work I took advantage of the significant amount of experimental data on DUES-mediated DNA uptake and the convenience of working with bacterial genomes, and used H. influenzae and N. gonorrhoeae DUESs as examples of selectively driven, protein-binding oligonucleotides. As Figures 5A and 5B and Tables 6 and 7 show, the gradual mode of evolution suggested by the results on these DUESs seems to be a common feature of other selectively constrained oligonucleotides, including the other pasteurellacean DUES as well as several other pro- and eukaryotic short repeated, palindromic and protein-binding DNAs. Sequences showing genomic overrepresentation seem to gradually evolve in the same way as the DUES (i.e., by molecular drive), whilst those underrepresented seem to evolve in an inverse fashion (i.e., by molecular drag). The differences in over- or underrepresentation levels between sequences and mismatch loads reflect the differences in strength and specificity of the selective force(s) directing the evolution of these sequences.

Figure 5. Ratio of the observed to the expected number of several pro- and eukaryotic oligonucleotides in their respective genome/chromosome.

X-axis: Mismatch load. A: Sequences genomically overrepresented. B: Sequences genomically underrepresented. The sequences are the same as in Tables 6 and 7.

Table 6. Genomic frequencies of various prokaryotic oligonucleotides and their mismatched forms showing signs of gradual drive or drag towards genomic over- or underrepresentation.

Actinobacillus pleuropneumoniae str. L20, complete genome (∼2.28 Mb, 41.3% G+C), Accession no. NC_009053. Sequence: 5′-ACAAGCGGTC (DUES [16])

Mismatch load | Seqexp | Seqobs | Repseq | χ2
0 | 3 | 438 | 145.000 | 63075.042
1 | 86 | 947 | 10.012 | 8620.175
2 | 1252 | 2635 | 1.105 | 1528.127
3 | 10796 | 14278 | 0.323 | 1125.710
4 | 60767 | 66466 | 0.094 | 541.714
5 | 233206 | 233417 | 0.001 | 0.201
6 | 617875 | 602744 | −0.024 | −312.299
7 | 1115838 | 1098244 | −0.016 | −187.249
8 | 1314386 | 1323863 | 0.007 | 96.098
9 | 911850 | 923384 | 0.013 | 182.470
10 | 282906 | 282548 | −0.001 | −0.423

Escherichia coli str. K12, complete genome (∼4.64 Mb, 50.79% G+C), Accession no. NC_000913. Sequence: 5′-GCTGGTGG (Chi sequence; a recombination hotspot [69])

Mismatch load | Seqexp | Seqobs | Repseq | χ2
0 | 151 | 1008 | 5.675 | 4863.980
1 | 3579 | 8855 | 1.474 | 7780.642
2 | 37198 | 60106 | 0.616 | 14164.433
3 | 220892 | 259638 | 0.175 | 6962.048
4 | 819776 | 822254 | 0.003 | 8.216
5 | 1947014 | 1849597 | −0.050 | −3579.891
6 | 2890031 | 2823959 | −0.023 | −827.290
7 | 2451193 | 2490658 | 0.016 | 863.497
8 | 909516 | 963275 | 0.059 | 3522.838

Synechocystis sp. PCC 6803, complete genome (∼3.57 Mb, 47.72% G+C), Accession no. NC_000911. Sequence: 5′-GGCGATCGCC (palindrome frequent in cyanobacteria [70]; contains the recognition site of a DNA methyltransferase [71] and of the endonuclease SgfI)

Mismatch load | Seqexp | Seqobs | Repseq | χ2
0 | 5 | 5648 | 1128.600 | 6368694.256
1 | 160 | 1830 | 10.438 | 17431.015
2 | 2244 | 5212 | 1.323 | 3926.823
3 | 18646 | 22668 | 0.216 | 869.827
4 | 101666 | 112708 | 0.109 | 1216.584
5 | 380012 | 391692 | 0.031 | 379.155
6 | 986147 | 966932 | −0.019 | −314.473
7 | 1754338 | 1692644 | −0.035 | −1463.755
8 | 2047556 | 2036808 | −0.005 | −33.765
9 | 1415774 | 1451248 | 0.025 | 1108.418
10 | 440394 | 459550 | 0.043 | 887.952

Seqexp: Expected number of sequences, Seqobs: Observed number of sequences, Repseq: Sequence representation (equation 2), and χ2: Goodness of fit Chi-squared. Significant values are underlined. Positive values mean overrepresentation, whereas negative values mean underrepresentation (equation 3).

Table 7. Genomic frequencies of various eukaryotic oligonucleotides and their mismatched forms showing signs of gradual drive or drag towards genomic over- or underrepresentation.

Caenorhabditis elegans chromosome V (∼20.9 Mb, 35.43% G+C), Accession no. NC_003283.6. Sequence: 5′-GCAGCGCCC (unknown function)

Mismatch load | Seqexp | Seqobs | Repseq | χ2
0 | 13 | 1022 | 77.615 | 78313.947
1 | 514 | 1161 | 1.259 | 814.424
2 | 8936 | 14271 | 0.597 | 3185.800
3 | 90125 | 119514 | 0.326 | 9604.193
4 | 581112 | 667354 | 0.148 | 12979.326
5 | 2481719 | 2590153 | 0.044 | 5036.568
6 | 7011409 | 6891951 | −0.017 | −1625.544
7 | 12616714 | 12241133 | −0.030 | −6353.284
8 | 13093968 | 13109859 | 0.001 | 28.071
9 | 5954282 | 6202374 | 0.042 | 12052.253

Drosophila melanogaster chromosome 3R (27.9 Mb, 42.91% G+C), Accession no. NT_033777.2. Sequence: 5′-AAATATTT (palindrome of unknown function)

Mismatch load | Seqexp | Seqobs | Repseq | χ2
0 | 2460 | 12562 | 4.107 | 41485.732
1 | 49266 | 107792 | 1.188 | 69587.929
2 | 431635 | 658690 | 0.526 | 120369.746
3 | 2160971 | 2537278 | 0.174 | 68168.806
4 | 6761788 | 6886084 | 0.018 | 2599.804
5 | 13541106 | 12684254 | −0.063 | −36850.113
6 | 16948322 | 15939888 | −0.060 | −33834.194
7 | 12121636 | 12466476 | 0.028 | 12531.974
8 | 3792921 | 4517082 | 0.191 | 148341.405

Homo sapiens partial chromosome 21 (∼28.6 Mb, 39.12% G+C), Build 36.2, contig Accession no. NT_011512.10. Sequence: 5′-CCAGCCTGG (unknown function [62]; highly conserved across mammalian and other vertebrates' Alu-like repeats [72])

Mismatch load | Seqexp | Seqobs | Repseq | χ2
0 | 58 | 8537 | 146.190 | 1239543.342
1 | 1938 | 21892 | 10.296 | 205456.965
2 | 28579 | 75764 | 1.651 | 77943.122
3 | 244451 | 399058 | 0.632 | 98203.132
4 | 1335657 | 1619802 | 0.213 | 61892.796
5 | 4831548 | 4842194 | 0.002 | 25.621
6 | 11562803 | 10915084 | −0.056 | −27097.660
7 | 17640150 | 16924950 | −0.041 | −16078.306
8 | 15554578 | 15967840 | 0.027 | 15077.269
9 | 6035096 | 6459737 | 0.070 | 33400.451

Abbreviations as in Table 6. Significant values are underlined. Positive values mean overrepresentation, whereas negatives mean underrepresentation (equation 3).

Discussion

The current names of pasteurellacean uptake signal sequences (USS) and neisseriacean DNA uptake sequences (DUS) could be misleading, as they do not accurately describe the effect of these sequences on DNA uptake and imply that they represent different genetic elements. Both sequences show similar genomic distributions in the corresponding genomes [9], [10], [12]–[14], and have the same effect on DNA uptake by the corresponding competent bacteria [5], [8], [9], [22], [32], [41]. Therefore, USS and DUS refer to sequence variants of the same genetic element, which makes the case for a common nomenclature. This should not be USS, given that neither USSs nor DUSs actually signal DNA uptake, as bacteria need to be competent beforehand (i.e., non-competent cells will not take up DNA, no matter how many USSs or DUSs one gives them). DUS is not a satisfactory name either, as both competent Pasteurellaceae and Neisseriaceae species also take up DNA that lacks USS/DUS, though less efficiently [22], [32]–[34]. Here I suggest the unifying and more accurately descriptive name of DNA Uptake Enhancing Sequences (DUES) to highlight that both sequences are variants of the same genetic element, which enhance the uptake of DNA rather than cause it.

The genomic overrepresentation of DUESs raises questions about their function and how the normally small bacterial genomes tolerate them in such large quantities. Among the several intracellular functions suggested [22], [23], none seems to be supported by the genomic distributions of these sequences (see Introduction), as: (i) DUESs are not palindromes, and most are neither inverted repeats nor biased towards locations downstream of coding sequences, as would be expected of sequences terminating transcription; (ii) they show no orientation bias around the chromosome, a feature expected of sequences interacting with the replication machinery, and we do not know of any intracellular DUES-binding protein; and (iii) they are not as evenly spaced as expected of sequences that help compact the chromosome. A bacterial mate recognition system, discussed in [24], is the only non-intracellular function proposed for these sequences, and the only one to account for their involvement in DNA uptake. The differences between the DUESs of the two groups of Pasteurellaceae species [16], and between these and the Neisseriaceae DUES ([5], [13]; Figures 3A and 3B), clearly support this last hypothesis. Nevertheless, competence has been suggested to serve nutrition more than sex and recombination [26]–[31] and, even if this were not the case, some significantly distant bacterial species share the same DUES. These include: (i) H. influenzae, P. multocida, H. somnus, A. actinomycetemcomitans and M. succiniciproducens, (ii) A. pleuropneumoniae and M. haemolytica [6], [9], [13]–[16], [18] and (iii) N. gonorrhoeae and N. meningitidis [7], [10]–[13]. On the other hand, the concept of bacterial species is still a matter for debate [42]–[45] and, in some cases, including Neisseria [46], differentiation between species seems to be rather fuzzy.
Furthermore, even if competence were for nutritional purposes, some of the DNA taken up still survives digestion and recombines with the chromosome. Occasional recombination with potentially harmful DNA from distantly related species may, thus, be a sufficient driving force for DUESs to evolve as a mechanism for minimizing such unwanted recombination. DUESs might also not be that advantageous, thus representing a sort of ‘lucky DNA’ probably driven to overrepresentation in genomic locations where they cause insignificant loss of fitness due to a possibly coincidental specificity of the protein that binds DNA (i.e., receptor) at the cell surface during competence. Every DNA-binding protein has some sort of inherent specificity towards a combination of nucleotides due to its amino acid sequence and spatial configuration [47].

The possible implication of DNA uptake bias towards DUESs in the accumulation of these sequences in the genome was oddly overlooked. Earlier, we hypothesized [15] that the biased DNA receptor at the cell surface of a competent species may gradually enrich its genome with DNA fragments containing the preferred DNA sequence (i.e., DUES) and some of its mutated forms. Mutations towards the preferred DUES in the DNA of the ‘donor’ cell may preferentially spread to other cells, by uptake and recombination, when the ‘donor’ cell dies. On the other hand, mutations away from the DUES in the recipient chromosomes may be restored by uptake and recombination with incoming DNA from ‘donors’ with less-mutated sequences. This way, the combined action of mutation, uptake bias and reduced constraints in some genomic regions could be sufficient for DUESs to gradually accumulate in susceptible parts of the genome, with no additional function needed. This would explain the preferential location of DUESs in non-coding regions [15], and in parts of the coding regions where they have little influence on protein configuration/function [23].

If such an interpretation is correct, DNA uptake should leave a footprint on the genome, in the form of higher overrepresentation of sequences depending on their similarity to the DUES, which is exactly what this work shows. Sequence overrepresentation is observed even for some mutated DUESs, where it negatively correlates with the mismatch load, and DUES nucleotides seem to emerge and start accumulating as soon as the third match. Simulations of DUES evolution in hypothetical genomes subjected to mutation and to uptake bias towards the DUES only failed to reproduce the observed distribution of sequence overrepresentation across mismatch loads, whereas those that also included uptake bias towards mismatched sequences reproduced it. This, together with the clear correlation between genomic overrepresentation and uptake efficiency, suggests that mutations and DNA uptake bias result in a force sufficient to allow gradual DUES evolution by accumulation of point mutations in susceptible parts of the genome (i.e., by molecular drive). This is supported by the experimentally detectable uptake of DNAs from unrelated species (i.e., with mismatched or no DUESs) by competent Pasteurellaceae and Neisseriaceae species [22], [32]–[34].

The higher overrepresentation of the N. gonorrhoeae DUES compared to that of H. influenzae could be due to the almost permanent character of competence in the former species [36], [37], as opposed to its occasional nature in the latter [4], [38], [39]. Even so, the increase in sequence overrepresentation accompanying the decrease in mismatch load is not linear in either species, as we see sharp drops between the DUES and its singly-mismatched forms and, to a lesser degree, between the latter and sequences with two mismatches. This distribution could reflect a strong specificity of the DNA-binding receptor, as well as the length of the evolutionary history of the DNA uptake bias and/or its efficiency. In principle, the more specific the receptor is, the more pronounced the overrepresentation of the DUES compared to its mismatched forms will be. Similarly, the longer the uptake bias has been operating, or the more efficient it is, the closer genomes will be to an equilibrium, where the relative overrepresentations of the DUES and its mismatched forms better reflect the differences in the relative efficiency of their uptake.

In spite of all the similarities in genomic frequencies and evolutionary dynamics between H. influenzae and N. gonorrhoeae DUESs, their actual sequence consensuses are different both in shape and composition. In this work I identify previously unreported, less conserved nucleotides at each side of the N. gonorrhoeae DUES core, bringing the consensus up to 16 bp, with four additional nucleotides at the 5′ side of the core (two of which are strongly conserved) and two at the 3′ side. The resulting consensus is identical to that of the N. meningitidis DUES reported in [13], suggesting common ancestry of the DUESs of these two sister species. Therefore, just like the Pasteurellaceae [15], Neisseria species seem to have ancestral DUES-mediated preferential uptake of conspecific DNA. The obvious reason for the differences between H. influenzae and N. gonorrhoeae DUESs is a difference in the structure and binding specificity of the DNA receptors at the cell surfaces of the two species. PilC is the extracellular 110 kDa adhesin [48] suggested as binding DNA at the tips of N. gonorrhoeae's type IV pili [49], [50]. Its equivalents at the tips of H. influenzae type IV pili are a 216-amino-acid protein (called HifD) and two units of a 435-amino-acid protein (called HifE) [51]–[55]. Given this difference, it is hard to refrain from speculating that whilst PilC could be the receptor driving the entire N. gonorrhoeae DUES, HifD might be binding to and driving the core sequence of the H. influenzae DUES, whereas the two HifE proteins could have specificity towards the two almost identical less conserved regions of this DUES. However, this can only be confirmed after proper experimental testing (e.g., gene knockouts, DNA-protein cross-linking, Southern blotting, band-shifts, DNaseI foot-printing and antibody-based assays).

The results of this work also show that, at least for H. influenzae, only one DUES seems to be bound by the receptor at the cell surface at any given time, independently of its orientation or position in the DNA fragment. A similar result was experimentally obtained in A. actinomycetemcomitans [9], suggesting that this is a general characteristic of DUES-mediated DNA uptake. In addition, the sequence most similar to the DUES in a DNA fragment seems to be the one most likely to be bound. Both characteristics are of importance to the molecular drive model of DUES evolution postulated in [15], since the simultaneous binding of more than one DUES would not explain the genomic distribution of these sequences, as it would favor significant clustering in some chromosomal regions. In addition, bias towards DUESs in a particular orientation would have caused bias in their orientation around the chromosome. Indiscriminate binding of the receptor to sequences independently of their degree of similarity to the DUES, however, would loosen or even abolish the drive responsible for the emergence and gradual accumulation of DUESs in the genome.

Several genomically overrepresented oligonucleotides, both pro- and eukaryotic, seem to be selectively driven in a similar mode to the DUES, whilst those underrepresented seem to be gradually eliminated (dragged) in an inverse fashion. This suggests that the gradual mode of sequence evolution discussed above might be a general feature of short DNAs selectively driven or dragged to genomic over- or underrepresentation, including protein/transcription factor-binding, palindromic, repeated and other oligonucleotides. Statistical deviation from the expected frequencies [56]–[59] is used as an indicator of DNA functionality, including transcription-factor binding [60]–[62]. DUESs themselves resemble sequence families that interact with sequence-specific DNA-binding proteins (e.g., CRP, LexA and other pro- and eukaryotic transcription factor-binding DNAs), which typically consist of a short conserved core extended by less conserved regions clustering around a consensus sequence without exactly reproducing it (see [63]–[65]).

In conclusion, the results of this work suggest a gradual accumulation of genomic oligonucleotides by molecular drive towards overrepresentation due to the combined effects of mutation, the function of the oligonucleotide and some of its mutated forms, and weak selection against changes in the functionally unconstrained regions of the genome. Underrepresented oligonucleotides, however, seem to be gradually eliminated by a molecular drag emanating from the combined effects of mutation, the negative effect of the oligonucleotide and some of its mutated forms on the fitness of the carrier, and selection against changes in the functionally constrained regions of the genome. Several oligonucleotides seem to evolve in this way, including transcription factor/protein-binding and functionally constrained palindromic and repeated DNAs. DUESs themselves might be a sort of ‘lucky DNA’ driven to genomic overrepresentation, in the susceptible parts of the genome, not due to any other function but the possibly coincidental specificity of the receptor that binds DNA for uptake during natural competence for transformation.

Materials and Methods

Theoretical analysis

Sequence frequencies and overrepresentation

The programs Sequence_Extractor.pl (a modified version of the Perl based program search_USSmissmatches_sequence.pl, see Acknowledgments) and BioEdit version 7.0.1 were used to search both strands of the genomes of H. influenzae str. Rd KW20 (Pasteurellaceae, 1.83 Mb genome size, 38.1% G+C, Accession no. NC_000907) and N. gonorrhoeae str. FA1090 (Neisseriaceae, 2.154 Mb genome size, 51.5% G+C, Accession no. NC_002946) and to count their respective DUES and all its mutated forms. The same was done for three control sequences (CS) with the same base composition but no match to the DUES. These were CSH1 = 5′-GGTCAGTAG, CSH2 = 5′-TGCGAGATG and CSH3 = 5′-CGAGTGTAG for H. influenzae, and CSN1 = 5′-ATTCAGCCGG, CSN2 = 5′-TAATGGCCCG and CSN3 = 5′-AGGAGTCTCC for N. gonorrhoeae. In addition, a similar analysis was carried out for H. influenzae and N. gonorrhoeae DUESs in the genomes of Streptococcus pyogenes str. M1 GAS (Streptococcaceae, 1.85 Mb genome size, 38.5% G+C, Accession no. NC_002737) and Chlorobium tepidum str. TLS (Chlorobiaceae, 2.155 Mb genome size, 56.5% G+C, Accession no. NC_002932), which have no DUES-biased DNA uptake and whose genomes are similar in size and G+C content to those of H. influenzae and N. gonorrhoeae.
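The kind of search described above can be illustrated with a minimal Python sketch. It is not Sequence_Extractor.pl, whose internals are not given here; the function names are hypothetical, and handling the second strand by comparing each window with the reverse complement of the query is an assumption about how both strands are covered.

```python
from collections import Counter

# Complement table for unambiguous DNA bases only (assumption: no IUPAC codes).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def count_by_mismatch_load(genome, motif):
    """Count every window on both strands by its mismatches to the motif.

    A window read on the reverse strand matches the motif exactly when the
    forward-strand window matches the motif's reverse complement, so both
    strands are scored in a single pass over the forward strand.
    """
    genome = genome.upper()
    motif = motif.upper()
    rc_motif = reverse_complement(motif)
    n = len(motif)
    counts = Counter()
    for i in range(len(genome) - n + 1):
        window = genome[i:i + n]
        counts[hamming(window, motif)] += 1      # forward strand
        counts[hamming(window, rc_motif)] += 1   # reverse strand
    return counts
```

Calling `count_by_mismatch_load(genome, "GCTGGTGG")` returns a counter keyed by mismatch load. Every window contributes once per strand, so the counts sum to twice the number of windows, and a palindromic query is counted on both strands at each site.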

The expected number of sequences at each mismatch load was calculated as:

graphic file with name pone.0000741.e001.jpg

(equation 1), where m is the mismatch load analyzed, τ is the number of adenines and thymines (Ws) in the DUES, σ is the number of cytosines and guanines (Ss) in the same DUES, x is the maximum number of mismatches that could affect DUES W positions at the mismatch load m, θ is the genomic frequency of Ws, ς is the genomic frequency of Ss and L is the genome's length in base pairs. Sequence representation was calculated at each mismatch load as RepSeq = (Seqobs − Seqexp)/Seqexp (equation 2), where Seqobs is the observed number of sequences. Significant deviation of the observed numbers of sequences from the expected ones was tested using the goodness of fit Chi-squared (χ2), whose values were multiplied by |Seqobs − Seqexp|/(Seqobs − Seqexp) (equation 3) to differentiate overrepresentation from underrepresentation.
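As an illustration, the quantities defined above can be sketched in Python. The expected-count function is one binomial model consistent with the symbol definitions (W and S positions treated separately, each W base at half the genomic W frequency, each S base at half the S frequency, and 2L windows for both strands); it is an assumption, not a transcription of equation 1, which is available only as an image. The χ2 shown is the single-term (O − E)²/E contribution with the sign rule of equation 3; how many categories enter the tabulated statistic is not spelled out, so the values need not coincide with those in the tables. All function names are hypothetical.

```python
from math import comb


def expected_counts(motif, gc_content, genome_length):
    """Expected number of windows at each mismatch load (assumed model)."""
    motif = motif.upper()
    tau = sum(base in "AT" for base in motif)   # number of W positions
    sigma = len(motif) - tau                    # number of S positions
    p_w = (1.0 - gc_content) / 2.0  # chance a random base matches a given W
    p_s = gc_content / 2.0          # chance a random base matches a given S
    windows = 2 * genome_length     # both strands; edge effects ignored
    expected = []
    for m in range(len(motif) + 1):
        prob = 0.0
        # i mismatches fall on W positions, the remaining m - i on S positions
        for i in range(max(0, m - sigma), min(m, tau) + 1):
            j = m - i
            prob += (comb(tau, i) * p_w ** (tau - i) * (1.0 - p_w) ** i
                     * comb(sigma, j) * p_s ** (sigma - j) * (1.0 - p_s) ** j)
        expected.append(windows * prob)
    return expected


def representation(seq_obs, seq_exp):
    """Equation 2: relative excess of observed over expected counts."""
    return (seq_obs - seq_exp) / seq_exp


def signed_chi2(seq_obs, seq_exp):
    """Single-term chi-squared carrying the sign of (observed - expected)."""
    diff = seq_obs - seq_exp
    if diff == 0:
        return 0.0
    return (diff ** 2 / seq_exp) * (abs(diff) / diff)  # equation 3 sign rule
```

With the E. coli entry of Table 6 (the 8 bp Chi sequence, 50.79% G+C, roughly 4.64 Mb), `expected_counts("GCTGGTGG", 0.5079, 4_640_000)` gives values close to the tabulated Seqexp at mismatch loads 0 and 1, and `representation(438, 3)` returns the 145.0 listed for the A. pleuropneumoniae DUES.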

Computer simulations

To test the contribution of DNA uptake bias to the observed distribution of sequence frequencies, I wrote the Perl based program Genome_Dynamics.pl (program available on request). It simulates the evolution of the genomic frequencies of a given sequence and all its mismatched forms, starting from a given distribution and running until equilibrium or any other target is reached. Each generation, a mismatch load loses the sequences mutated to the immediately higher and lower mismatch loads while gaining some of the sequences mutating from the same. Assuming there is no other evolutionary force than single (i.e., point) mutation, sequences at a mismatch load m should decay (i.e., mutate to mismatch load m+1) at the proportion μ(Seq)(1−(m/n)) (equation 4) per mutation per generation, and improve (i.e., mutate to mismatch load m−1) at μ(Seq)(m/(3n)) (equation 5), where μ is the mutation rate per base per generation, Seq is the number of sequences at the mismatch load and generation analysed, and n is the size of the sequence in bp. Uptake biases U0 to Un in favour of sequences at the mismatch loads 0 to n should result in additional generational increase of the frequency of sequences at any mismatch load m by the proportion Um of sequences mutated from the mismatch loads m−1 and m+1. In addition, there will be a generational decrease by the proportion Um−1 of sequences improving from the mismatch load m to m−1, and the proportion Um+1 of sequences decaying from the load m to m+1 (note that Um could also include other selective forces than DNA uptake). Repetition of these calculations at each mismatch load for many generations allows simulating the evolution of a given initial distribution of sequence frequencies in a population under a given mutation rate and distribution of uptake bias values towards sequences at different mismatch loads. Genome_Dynamics.pl assumes panmixia and I used a population of 10^18 individuals (i.e., cells) evolving at 10^−2 mutations per base per generation.
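The mutation-only part of this scheme (equations 4 and 5) can be sketched as a single-generation update in Python. This is an illustration, not Genome_Dynamics.pl: the function name is hypothetical, counts are treated as real numbers, and the uptake-bias terms U0 to Un are deliberately left out because their exact bookkeeping is described only in prose.

```python
def mutation_step(counts, mu):
    """Advance the mismatch-load distribution by one generation.

    counts[m] is the number of sequences at mismatch load m, for m = 0..n,
    so the sequence length n is len(counts) - 1. mu is the mutation rate
    used in equations 4 and 5. Uptake bias is NOT modelled here.
    """
    n = len(counts) - 1
    new_counts = list(counts)
    for m, seq in enumerate(counts):
        decay = mu * seq * (1 - m / n)        # equation 4: load m -> m + 1
        improve = mu * seq * (m / (3 * n))    # equation 5: load m -> m - 1
        new_counts[m] -= decay + improve
        if m < n:
            new_counts[m + 1] += decay
        if m > 0:
            new_counts[m - 1] += improve
    return new_counts
```

Repeated calls conserve the total number of sequences. From any starting distribution, the mutation-only dynamics approach a binomial distribution with a 3/4 chance of mismatch at each position, which is the baseline against which any added uptake bias would have to be compared.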

DNA Sequence logos and DUES nucleotides evolution

DNA sequence logos were generated in order to detect possible nucleotide conservation outside the 9 or 10 bp DUES, and to graphically monitor the emergence and evolution of DUES nucleotides in the respective genomes. I used the program Sequences_Extractor.pl to search both strands of the H. influenzae and N. gonorrhoeae genomes and extract all the 100 bp sequences containing the respective DUES or any of its mismatched forms at position 38 from the 5′ end of the sequence. I then used WebLogo version 2.8.2 [66], [67] to generate logos for the sequences extracted at each mismatch load. Deviation of the genomic G+C content from the 50% assumed by WebLogo was accounted for by amending its source code and adding a correction factor F to the value of each base frequency used for generating the logo. F was calculated as |V|(0.5−θ)/θ (equation 6) for Ws, and |V|(0.5−ζ)/ζ (equation 7) for Ss, where V is the value being corrected. This adjustment permits more accurate assessment of the selective pressure on DUES Ss and Ws by minimising underestimation of the importance of the less frequent ones.
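The correction in equations 6 and 7 can be written as a short Python sketch. It assumes that the corrected value is V + F and that θ and ζ are the combined genomic frequencies of A+T and of G+C; the function name is hypothetical and this is not the amended WebLogo source.

```python
def corrected_value(value, class_frequency):
    """Return V + F, with F = |V| * (0.5 - f) / f (equations 6 and 7).

    class_frequency is the genomic frequency f of the nucleotide class the
    value belongs to: theta for Ws (A or T), zeta for Ss (C or G).
    """
    f = class_frequency
    return value + abs(value) * (0.5 - f) / f
```

For a genome with 38.1% G+C, such as the H. influenzae genome analysed here, values for Ss would be corrected with f = 0.381 and those for Ws with f = 0.619, which raises the former and lowers the latter. At f = 0.5 the value is left unchanged.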

Nucleotide genomic overrepresentation and sequence quantification

The genomic representation of each nucleotide at each position of the sequences at each mismatch load was calculated as RepNuc = (Nucobs − Nucexp)/Nucexp (equation 8), where Nucobs and Nucexp are the observed and the expected numbers of the nucleotide at the position and sequences analysed.

In contrast to the unbiased distribution of nucleotides in sequence parts not targeted by the extraction program (positions outside the DUES core), the distribution of nucleotides within DUES core positions was biased due to targeting of specific combinations of nucleotides by the program Sequences_Extractor.pl. Thus, for nucleotides in the first type of positions, Nucexp was calculated as θSeqobs/2 (equation 9) for Ws, and ζSeqobs/2 (equation 10) for Ss. For matches at DUES core positions, Nucexp was calculated using equation 11 for Ws, and equation 12 for Ss. For mismatches at DUES core positions, however, calculation of Nucexp depends on the nature of both the mismatched nucleotide in question and the DUES match at the same position. Thus, for a W as mismatch at a DUES core position, Nucexp was calculated using equation 13 if the match at the same position is also a W, and equation 14 if it is an S. For an S as mismatch at a DUES core position, Nucexp was calculated using equation 15 if the match at the same position is a W, and equation 16 if it is an S as well.

Equation 11:

graphic file with name pone.0000741.e002.jpg

Equation 12:

graphic file with name pone.0000741.e003.jpg

Equation 13:

graphic file with name pone.0000741.e004.jpg

Equation 14:

graphic file with name pone.0000741.e005.jpg

Equation 15:

graphic file with name pone.0000741.e006.jpg

Equation 16:

graphic file with name pone.0000741.e007.jpg

A modified version of equation 3 in [66] was used to calculate a nucleotide conservation index (ConNuc) for each nucleotide at each position of sequences at each mismatch load as Inline graphic (equation 17), where Max_RepNuc is the highest representation possible for a nucleotide at the mismatch load analyzed. Max_RepNuc = (TotSeq − Nucexp)/Nucexp (equation 18), where TotSeq is the total number of sequences at the mismatch load. This way the strength of any given sequence, of the same length (C) as the DUES consensus, can be quantified from the overall conservation of its nucleotides across mismatch loads as Inline graphic (equation 19), where p is the position.

Dependence of the uptake bias on the strength of the sequence

I searched the 28 DNA fragments tested for DNA uptake in [40] (Accession no. M33432 to M33459), extracted all their 29 bp sequences and calculated the strength of each using the program Sequences_Extractor.pl and equation 19. To test whether the genomic overrepresentation of DUES nucleotides reflects the efficiency of their uptake, I tested the dependence of the uptake of these 28 DNA fragments on the strength of their 29 bp sequences by regression analyses using Statistica version 5.1. As a negative control, the positions of these 29 bp sequences were randomized, using a script written in Perl, and a similar regression analysis was performed on the resulting sequence strengths and the uptake of the DNA fragment from which the sequences were originally extracted.
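The logic of the regression and its negative control can be sketched in Python. This is a generic illustration: the regressions reported here were run in Statistica and the randomization in a Perl script, neither of which is reproduced. The function names are hypothetical, shuffling the nucleotides within each sequence is one reading of "randomizing the positions", and the strength values would have to come from equation 19, which is not implemented below.

```python
import random


def shuffle_positions(seq, rng):
    """Return the sequence with its positions in random order.

    Base composition is preserved, but any position-specific signal is
    destroyed, which is what the negative control requires.
    """
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def linear_fit(x, y):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (intercept, slope, r_squared). Assumes x is not constant.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return intercept, slope, r_squared
```

In use, one would fit uptake against the strengths of the original sequences, then against the strengths recalculated after `shuffle_positions`, and compare the two fits; a dependence that survives shuffling would point to base composition rather than position-specific conservation.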

Experimental analysis

To further test the theoretical results, synthetic DNA sequences containing the most conserved H. influenzae DUES consensus or one of its mutated forms were tested for uptake efficiency by competent H. influenzae. Both a 29 bp sequence of the most frequent nucleotide at each H. influenzae DUES consensus position (best DUES) and another of the least frequent ones (worst sequence) were synthesized and cloned by blunt-end ligation into the SmaI site of the plasmid pGEM-7Zf(-) (Accession no. X65311). Control 222 bp DNA template fragments containing the best DUES or the worst sequence were then PCR amplified from the constructs. Sequences carrying point mutations towards the least frequent nucleotide were generated from the best DUES construct using a three-step cohesive-end PCR process. For each desired mutant, two half-fragments (113 bp and 135 bp, with 26 bp overlap) were first produced using overlapping internal primers mutated at the chosen DUES position. After gel electrophoresis and purification, both PCR products were combined as template for a third cohesive-end PCR reaction. The final 222 bp PCR products were then gel purified and sequenced before use as templates for Klenow radio-labeling reactions. These were carried out for three hours at room temperature and, for the initial hour, contained limiting α-33P-dATP (6.0 µM). Subsequent addition of unlabeled (cold) dATP ensured complete replication of each molecule. DNA was then gel purified, and incorporation of α-33P-dATP checked by autoradiography after electrophoresis of an aliquot in acrylamide gel. Radio-labeled DNAs were then mixed with unlabeled DNAs of the same sequence to a specific radioactivity of 1,000 counts per minute (cpm) per ng of DNA. 10, 20, 30 or 40 ng of each DNA sequence was then separately added to 0.5 ml of competent H. influenzae cells for transformation as described in [68].
After 15 min incubation at 37 °C in rotating tubes, 25 µl of ice cold DNaseI (1 mg/ml) was added to each tube before gentle vortexing and incubation for 5 min on ice. 50 µl of ice cold 5 M NaCl was then added and the tubes gently vortexed then centrifuged at 16,000 g for 1 min at 4 °C. After re-suspension in ice-cold MIV medium [39] containing 1 M NaCl, gentle vortexing and centrifugation, the final pellets were re-suspended in scintillation vials containing 200 µl MIV medium and 1 ml aqueous scintillation fluid at room temperature. Scintillation counting was carried out in a Beckman Scintillation Counter. For standardization and accuracy, vials were counted simultaneously, the 33P decay factor was taken into account, and the background 33P cpm was subtracted. The nanograms of DNA taken up were estimated using its specific radioactivity. Experiments were carried out in triplicate.
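The conversion from scintillation counts to DNA taken up can be sketched in Python. The specific radioactivity of 1,000 cpm per ng and the DNA amounts come from the protocol above; the function names are hypothetical, the 33P half-life of about 25.3 days is a standard approximate value not stated in the text, and the way the decay factor is applied (scaling counts back to the day the specific radioactivity was set) is an assumption, since the exact procedure is not described.

```python
P33_HALF_LIFE_DAYS = 25.3  # approximate half-life of 33P (not from the text)


def decay_corrected_cpm(cpm, days_elapsed, half_life_days=P33_HALF_LIFE_DAYS):
    """Scale counts back to day 0 to compensate for radioactive decay."""
    return cpm * 2 ** (days_elapsed / half_life_days)


def ng_taken_up(sample_cpm, background_cpm,
                specific_activity_cpm_per_ng=1000.0, days_elapsed=0.0):
    """Estimate nanograms of DNA taken up from a scintillation count."""
    net_cpm = max(sample_cpm - background_cpm, 0.0)  # subtract background
    net_cpm = decay_corrected_cpm(net_cpm, days_elapsed)
    return net_cpm / specific_activity_cpm_per_ng


def percent_taken_up(ng_up, ng_given):
    """Percentage of the DNA given to the cells that was taken up."""
    return 100.0 * ng_up / ng_given
```

For instance, a vial reading 5,050 cpm against a 50 cpm background corresponds to 5 ng of DNA at 1,000 cpm per ng, which is 25% of a 20 ng input. These two quantities are the dependent variables analysed in Table 5.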

Supporting Information

Table S1

Goodness of fit Chi-squared test (χ2) on the observed and expected numbers of DUESs, control sequences, as well as all their mismatched forms.

(0.12 MB DOC)

Acknowledgments

I am very thankful to my wife Mrs. Pernille Lavgesen for tolerating my working hours and for her continuous support. I also wish to thank Dr. Rosemary J. Redfield for allowing me to use the experimental data obtained in her laboratory thanks to the appreciable help of Mrs. Becky Dinesen M.Sc., Dr. Ta-Yuan Chen for providing me with the program search_USSmissmatches_sequence.pl and allowing me to modify it, and Dr. Chris M. Wade for allowing me access to his powerful computer. To the daughter or son my wife and I are expecting.

Footnotes

Competing Interests: The author has declared that no competing interests exist.

Funding: The author has no support or funding to report.

Supplementary Materials

Table S1

Goodness-of-fit chi-squared test (χ²) on the observed and expected numbers of DUESs, control sequences, and all their mismatched forms.

(0.12 MB DOC)

