Skip to main content
Journal of Zhejiang University. Science. B logoLink to Journal of Zhejiang University. Science. B
. 2006 Nov 17;7(12):969–973. doi: 10.1631/jzus.2006.B0969

Regularities in the E. coli promoters composition in connection with the DNA strands interaction and promoter activity

Andrey Yu Berezhnoy 1, Yuriy G Shckorbatov 2,, Kiryu Hisanori 3
PMCID: PMC1661676  PMID: 17111465

Abstract

The energy of interaction between DNA strands in promoters is of great functional importance. Visualization of the energy of DNA strands distribution in promoter sequences was achieved. The separation of promoters in groups by their energetic properties enables evaluation of the dependence of promoter strength on the energetic properties. The analysis of groups (clusters) of promoters distributed by the energy of DNA strands interaction in −55, −35, −10 and +6 sequences indicates their connection with the transcriptional activity.

Keywords: DNA sequence, Promoter strength, DNA chains interaction energy, DNA sequences classification

INTRODUCTION

It is well known that the physical properties, namely DNA strands interaction in promoter are connected with their activity. As it was established the two promoter regions −35 and −10 are the most important for the promoter functioning, all bases in the −35 (TTGACA) and −10 (TATAAT) hexamers were highly conserved (Harley and Reynolds, 1987; Hawley and McClure, 1983). The functional importance of these sequences was proved in many investigations (Record et al., 1996). The open complex of RNAP and DNA includes the melted DNA in the region from −10 to +3 regardless of presence or absence of −35 element (Dombroski, 1997). RNA polymerase holoenzyme has been observed to bind single-stranded oligodeoxynucleotides bearing the −10 sequence of the nontemplate strand (Roberts and Roberts, 1996). These sites interact with different RNAP sigma-factor regions (Huang et al., 1997; Roberts and Roberts, 1996). If temperature affects the ability of the E. coli RNAP to transcribe the −10 sequence of the nifA1mut3 promoter, then the melting of TATGGT may be an important element. Studies with Escherichia coli (Eco) RNAP indicated that upon RNAP binding to promoter DNA, an unstable closed complex (RPc) is formed, followed by at least two additional intermediates (I1 and I2) before the initiation-competent, stable open complex (RPo) is formed (Craig et al., 1998; Saecker et al., 2002), in which a 14-bp region of the promoter DNA, including the start site of transcription, has been melted.

R+P↔RPc↔I1↔I2↔RPo

In addition to the promoter DNA, the RNAP is thought to undergo conformational changes as well (Craig et al., 1998; Saecker et al., 2002).

In our previous work the graphic method for presentation of the energetic properties of promoter sequences was elaborated. Four promoter groups with their energetic properties were revealed. The present work is dedicated to more detailed analysis of different sequences in promoter using the promoter sites distribution by their similarity.

MATERIALS AND METHODS

DNA sequences

We obtained 106 Escherichia coli promoter sequences using σ70 subunit from the Regulon database (©2004, CIFN/UNAM all Rights Reserved. RegulonDB DataBase V 4.0, 02-FEB-05) thanks to courtesy of Regulon database administration (Table 1). All promoter sequences were transcribed with the aid of σ70. Promoter strength (the promoter ability to initiate transcription) was measured with the help of fluorescent labelling method in microarray experiments on the total transcripts of E. coli. Promoter strength was determined in arbitrary units reflecting the fluorescence intensity (Kanehisa et al., 2004; Mori et al., 2000). Data for promoter strength was obtained from KEGG EXPRESSION database (http://www.genome.jp/kegg/expression/) containing microarray data obtained by the Japanese research community. Orientation of promoter sequence in genome was determined as forward or reverse depending on the gene position in the genome. As far as we know forward and reverse orientation is not connected with gene functioning. The number of sequences analyzed was 106.

Table 1.

Promoters list

Promoter number Promoter name
1 accA
2 accB
3 accD
4 adk
5 alaS
6 ampC
7 arsR
8 asnBp
9 aspC
10 astCp1
11 atpI
12 bcp
13 btuB
14 cdh
15 cedAp
16 cfap1
17 clpAp1
18 cmk
19 corA
20 creA
21 cusCp
22 cusRp
23 cutA
24 cysE
25 dapA
26 dapB
27 dapD
28 dcuDp
29 dinGp
30 Div
31 dppA
32 drpA
33 Efpp
34 fimBp1
35 folA
36 Frrp
37 ftsJp1
38 fxsAp
39 galRp
40 gcvR
41 glnS
42 Gnd
43 hemHp
44 hemN
45 hepAp
46 hisA
47 hisB
48 Hiss
49 hscB
50 katE
51 lacI
52 Lep
53 Lpp
54 lysP
55 manA
56 menAp
57 metB
58 metYp2
59 mhpR
60 mraZp
61 nanAp
62 narUp1
63 nohAp
64 otsB
65 paaXp
66 pcnBp
67 Pdx
68 pfkB
69 Pgi
70 Phe
71 pheS
72 pntA
73 ppc
74 pthp
75 ptr
76 purA
77 putA
78 pyrEp2
79 pyrF
80 relA
81 rep
82 rho
83 rimL
84 rnh
85 rplJ
86 rplK
87 rplT
88 rpoB
89 rpoN
90 rpsJ
91 rsdp2
92 sbcB
93 serB
94 smp
95 spc
96 str
97 sufAp
98 tesA
99 thrA
100 tufB
101 ung
102 upp
103 xseBp
104 ybjCp
105 yeiKp
106 yhcA

Computer analysis

We suggest the notion of DNA-sequence energy that is determined as a sum of energy of interaction of each nucleotide pair in promoter divided by nucleotide number. The energy of interaction of complementary nucleotides pair was determined (Kudritskaya and Danilov, 1976). The energy of AT-pair was determined as −29.33 kJ/mol and the energy of GC-pair was determined as −70.35 kJ/mol. We determined the mean energy of interaction in complementary nucleotides by means of “sliding-window” method. The mean energy of interaction was attributed to the nucleotide pair in the center of the 10 nucleotides window which shifted from left to right with the Step 1 nucleotide.

The process of division on clusters consists of the following stages:

1. We obtained the mean sequence of all promoter sequences sets. The procedure includes determination of the most frequently occurring nucleotide on each position of the sequence by simply comparing nucleotides in all of the analyzed sequences. Such a resulting sequence we call “mean sequence”.

2. We determined the quantity of nucleotides differences of each analyzed sequence with the “mean sequence”.

3. We associated the quantity of differences of each sequence and put all the analyzed sequences in order according to the quantity of their differences from the “mean sequence”.

4. We found the difference between the maximal and minimal number of differences from the mean sequence.

5. We divided the number obtained by 9, which is the clusters size.

6. We consequently assigned to each cluster all the sequences whose differences are within the limits of cluster. If the quantity of elements in every cluster was less than 3 we repeated the division on clusters once more with the number of clusters decreased by one. The resulting number of clusters for different sequences sets was different and equaled 5 to 4.

Computer programs for obtaining energy distribution according to nucleotide position and for arrangement of sequences into groups were elaborated on by Berezhnoy and Shckorbatov (2005).

RESULTS

In our previous work (Berezhnoy and Shckorbatov, 2005), the nucleotide interaction’s energy distribution as analyzed by our promoters looks as shown in Fig.1 indicating that the energy minima are connected with the four areas (−55, −35, −10 and +6). The sequences located at −35, −10 and +6 positions are often analyzed (Lewin, 2004). We propose additionally to analyze −55 sequence (10 nucleotides). We analyzed in the current article the nucleotide interaction energy distribution in all of these areas and in the whole promoter. We determined the mean (consensus) distribution of the nucleotide interaction energy via promoter. With cluster analysis using method for determining the Euclidian distances, we sorted all 106 promoters on four clusters. The results obtained are presented in Table 2. All the clusters significantly differ in the energy distribution through the promoter length. We determined the energy distribution throughout the promoters, −35 sequences, −10 sequences and +6 sequences. We determined five groups in these sequences based on the similarity of energy distribution in the above sequences. The dependences of mean strength of promoters in each cluster on the mean energy of the same cluster are presented in the Fig.2. The points in these figures represent the clusters of corresponding promoter sequences. Our data indicate the absence of linear dependence between these parameters. The functions describing these dependences are presented in Table 3. As one can see from the Fig.2, the dependences of mean strength in clusters on the energy of DNA chains interaction in the whole promoter, −35 sequence and +6 sequence are very similar and have parabolic shape. It means that the strongest promoters have mean energy values and that the most energetic promoters have minimal strength. Dependence of the mean clusters energy on the energy of DNA chain interaction in −55 sequence has relatively more complex character revealing two maxima of promoter strength. The promoters arranged by the energy of −10 sequence have exponential dependence of their strength on the energies of −10 sequence.

Fig. 1.

Fig. 1

Dependence of mean energy of interaction between complementary DNA nucleotides on the mean window center position in promoter sequences

Table 2.

Cluster’s composition in the whole promoter and in its parts (−55, −35, −10 and +6 sequences)

Promoter site Cluster’s number Numbers of promoters in cluster
Whole promoter 1 2, 10, 16, 17, 20, 28, 40, 41, 42, 53, 59, 61, 72, 76, 83, 99, 104
2 7, 8, 9, 11, 12, 21, 26, 29, 32, 34, 36, 37, 38, 43, 46, 50, 55, 56, 57, 62, 66, 68, 69, 71, 75, 77, 80, 84, 86, 87, 91, 93, 100, 101, 102, 105
3 3, 4, 5, 13, 14, 22, 24, 25, 27, 30, 31, 33, 35, 39, 44, 47, 52, 54, 58, 67, 70, 73, 74, 85, 88, 89, 90, 92, 94, 95, 96, 97, 103
4 1, 6, 15, 18, 19, 45, 48, 51, 60, 63, 64, 65, 78, 79, 81, 82, 106
5 23, 49, 98
−55 sequence 1 8, 11, 16, 22, 25, 26, 28, 34, 41, 43, 46, 51, 59, 61, 62, 69, 72, 91, 93, 100, 101, 102, 104, 105, 106
2 12, 13, 17, 18, 21, 29, 30, 33, 36, 38, 42, 50, 57, 58, 67, 68, 71, 73, 74, 75, 77
3 2, 4, 6, 7, 15, 19, 20, 27, 31, 35, 40, 44, 45, 49, 52, 53, 54, 55, 56, 64, 76, 78, 80, 83, 84, 86, 94, 95, 96, 97, 98, 103
4 3, 5, 9, 24, 32, 37, 39, 48, 60, 63, 66, 70, 81, 82, 85, 88, 89, 90, 92
5 1, 10, 14, 23, 47, 65, 79, 87, 99
−35 sequence 1 2, 3, 8, 10, 11, 14, 16, 28, 32, 36, 37, 38, 40, 42, 44, 47, 59, 61, 62, 78, 79, 80, 88, 92, 101, 105
2 1, 5, 6, 7, 9, 12, 18, 20, 21, 23, 25, 27, 33, 39, 41, 51, 53, 55, 56, 58, 60, 65, 66, 69, 70, 71, 72, 74, 76, 77, 84, 85, 87, 94, 96, 97, 99, 100, 103, 104
3 4, 13, 15, 17, 19, 22, 24, 29, 30, 31, 34, 35, 43, 45, 46, 48, 49, 50, 52, 54, 57, 63, 64, 67, 68, 73, 75, 81, 82, 83, 86, 89, 90, 91, 93, 95, 98, 102, 106
4 26
−10 sequence 1 10, 11, 18, 33, 34, 36, 37, 55, 56, 58, 59, 61, 85, 90, 95, 96, 98, 102
2 1, 3, 5, 6, 15, 16, 17, 19, 20, 22, 23, 27, 28, 32, 35, 38, 42, 43, 44, 46, 47, 48, 49, 50, 52, 53, 54, 60, 63, 64, 65, 66, 67, 68, 69, 71, 72, 73, 74, 75, 76, 78, 79, 80, 81, 82, 83, 86, 87, 88, 92, 94, 97, 100
3 2, 4, 12, 13, 21, 25, 26, 29, 30, 31, 39, 40, 41, 45, 51, 57, 70, 77, 84, 91, 99, 101, 103, 104, 105, 106
4 7, 8, 9, 14, 24, 62, 89, 93
+6 sequence 1 1, 4, 11, 12, 16, 21, 30, 32, 34, 35, 50, 54, 55, 56, 61, 66, 70, 86, 91, 96, 102, 103
2 2, 5, 8, 10, 18, 19, 26, 27, 33, 37, 41, 42, 43, 45, 46, 52, 57, 58, 59, 60, 63, 65, 67, 68, 69, 73, 74, 77, 82, 83, 84, 85, 89, 95, 104
3 3, 6, 9, 14, 20, 22, 25, 28, 38, 39, 47, 51, 53, 64, 71, 72, 76, 79, 88, 90, 92, 93, 94, 97, 106
4 7, 13, 17, 23, 24, 36, 48, 75, 78, 80, 81, 87, 98, 99, 100, 101
5 15, 29, 31, 40, 44, 49, 62, 105

Fig. 2.

Fig. 2

Fig. 2

Fig. 2

Fig. 2

Fig. 2

Dependence of mean promoter strength in clusters on the energy of DNA chains interaction in (a) the whole promoter; (b) −55 sequence; (c) −35 sequence; (d) −10 sequence; (e) +6 sequence

Table 3.

Characteristics of promoter strength regression on the energy of DNA chains interaction in different promoter sites

Promoter site Promoter strength regression dependence on the promoter site energy Regression quotient
Whole promoter y=−152.5+0.343x−1.89×10−4x2 0.876
−55 sequence y=−128−5.82x+8.76×10−2x2−4.27×10−4x3 0.991
−35 sequence y=−13.4+0.525x−4.11×10−3x2 0.942
−10 sequence y=31−0.866x+6.26×10−3x2 1.000
+6 sequence y=−2.53+0.198x−1.68×10−3x2 0.813

CONCLUSIONS

1. We improved the original methods of computer analysis (Berezhnoy and Shckorbatov, 2005) enabling visualization of the energy distribution in promoter sequences and separation of the promoters in groups by their energetic properties, thus enabling us to judge the dependence of promoter strength on their energetic properties.

2. We revealed that −55 sequence composition is connected with promoter transcriptional activity.

3. The analysis of groups (clusters) of promoters distributed by the energy of DNA strands interaction in −55, −35, −10 and +6 sequences indicates their connection with the transcriptional activity.

4. The dependence of mean promoter strength in clusters has at least one maximum in the zone of mean energies of clusters in the cases of −55, −35 and +6 sequences but in the case of −10 sequence the dependence is monotonous.

References

  • 1.Berezhnoy AY, Shckorbatov YG. Dependence of E. coli promoter strength and physical parameters upon the nucleotide sequence. J Zhejiang University Sci B. 2005;6(11):1063–1068. doi: 10.1631/jzus.2005.B1063. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Craig ML, Tsodikov OV, McQuade KL, Schlax PEJr, Capp MW, Saecker RM, Record MTJr. DNA footprints of the two kinetically significant intermediates in formation of an RNA polymerase-promoter open complex: evidence that interactions with start site and downstream DNA induce sequential conformational changes in polymerase and DNA. J Mol Biol. 1998;283(4):741–756. doi: 10.1006/jmbi.1998.2129. [DOI] [PubMed] [Google Scholar]
  • 3.Dombroski AJ. Recognition of the −10 promoter sequence by a partial polypeptide of σ70 in vitro. J Biol Chem. 1997;272:3487–3494. [PubMed] [Google Scholar]
  • 4.Harley CB, Reynolds RP. Analysis of E. coli promoter sequences. Nucleic Acids Res. 1987;15:2343–2361. doi: 10.1093/nar/15.5.2343. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Hawley DK, McClure WR. Compilation and analysis of Escherichia coli promoter sequences. Nucleic Acids Res. 1983;11:2237–2255. doi: 10.1093/nar/11.8.2237. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Huang X, Lopez de Saro FJ, Helmann JD. Sigma factor mutations affecting the sequence-selective interaction of RNA polymerase with −10 region single-stranded DNA. Nucleic Acids Res. 1997;25(13):2603–2609. doi: 10.1093/nar/25.13.2603. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Kanehisa M, Goto S, Kawashima S, Kuno Y, Hattori M. The KEGG resources for deciphering the genome. Nucleic Acids Res. 2004;32(90001):D277–D280. doi: 10.1093/nar/gkh063. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Kudritskaya ZG, Danilov VI. Quantum mechanical study of bases interactions in various associates in atomic dipole approximation. J Theor Biol. 1976;59(2):303–318. doi: 10.1016/0022-5193(76)90172-7. [DOI] [PubMed] [Google Scholar]
  • 9.Lewin B. Genes-VIII. New York: Pearson Prenice Hall; 2004. [Google Scholar]
  • 10.Mori H, Isono K, Horiuchi T, Miki T. Functional genomics of Escherichia coli in Japan. Res Microbiol. 2000;151(2):121–128. doi: 10.1016/S0923-2508(00)00119-4. [DOI] [PubMed] [Google Scholar]
  • 11.Record MTJr, Reznikoff WS, Craig ML, et al. Escherichia coli RNA Polymerase (Eσ70), Promoters, and the Kinetics of the Steps of Transcription Initiation. In: Neidhardt FC, Curtiss III R, Ingraham JL, et al., editors. Escherichia coli and Salmonella: Cellular and Molecular Biology. 2nd Ed. Washington DC: ASM Press; 1996. pp. 792–820. [Google Scholar]
  • 12.Roberts CW, Roberts JW. Base-specific recognition of the nontemplate strand of promoter DNA by E. coli RNA polymerase. Cell. 1996;86(3):495–501. doi: 10.1016/S0092-8674(00)80122-1. [DOI] [PubMed] [Google Scholar]
  • 13.Saecker RM, Tsodikov OV, McQuade KL, Schlax PEJr, Capp MW, Record MTJr. Kinetic studies and structural models of the association of E. coli sigma (70) RNA polymerase with the lambdaP (R) promoter: large scale conformational changes in forming the kinetically significant intermediates. J Mol Biol. 2002;319(3):649–671. doi: 10.1016/S0022-2836(02)00293-0. [DOI] [PubMed] [Google Scholar]

Articles from Journal of Zhejiang University. Science. B are provided here courtesy of Zhejiang University Press

RESOURCES