Table 2.
| Sequence | Mononucleotide consensus (positions 1–28) | Rseq, 5′ side (bits) | Rseq, TATA box (bits) | Rseq, 3′ side (bits) | Rseq, total (bits) | Total number of sequences<sup>b</sup> | Sequences used in analysis<sup>c</sup> |
|---|---|---|---|---|---|---|---|
| *Consensus of selected sequences based on mononucleotide frequencies*<sup>a</sup> | | | | | | | |
| MLP therm. | nknngnnnknTATAAAAGbnnnnnnndn | 0.1 (1) | 15.6 (1) | 0.2 (1) | 15.9 (3) | 47 | 42 |
| MLP kinetic | nnnnktnnsnTATAAAAGktnrgkgnkn | 0.5 (1) | 15.6 (1) | 0.5 (1) | 16.6 (2) | 45 | 42 |
| E4 therm. | nnnnnnsnGnTATATATAngnTGnCnbn | 0.3 (1) | 15.6 (1) | 1.4 (1) | 17.3 (2) | 51 | 47 |
| E4 kinetic | nnnnnggnGCTATATATAcgsgGGnggn | 0.6 (1) | 15.6 (1) | 1.9 (1) | 18.1 (3) | 42 | 40 |
| *Consensus of selected sequences based on the mononucleotide frequencies observed in higher-order motifs*<sup>d</sup> | | | | | | | |
| MLP therm. | nnnnnnnnkkTATAAAAGgttnnnnnnn | | | | | | |
| MLP kinetic | nnswTTgGsrTATAAAAGktbrskgdbn | | | | | | |
| E4 therm. | nnnnntCCGyTATATATAssytsvsvcn | | | | | | |
| E4 kinetic | nnnnnsscgcTATATATAcgsgggsGGG | | | | | | |
| *Consensus of natural promoters*<sup>a,e</sup> | | | | | | | |
| MLP eukaryote | vnnnggwggSTATAAAAGcvGvngbrcg | 0.85 (3) | 15.90 (3) | 0.75 (3) | 17.50 (5) | 185 | 176 |
| MLP human | rnnngGnGnSTATAAAAGcvGnngGnsg | 1.4 (2) | 15.5 (1) | 1.1 (2) | 18.0 (3) | 42 | 38 |
| E4 eukaryote | wynwnawcncTATATATASngngnnnnn | 0.3 (1) | 15.7 (1) | 0.4 (1) | 16.3 (2) | 70 | 59 |
<sup>a</sup>TATA boxes are underlined. Boldface letters in the flanking sequences indicate that the reduction in uncertainty at that position (Rseq) exceeds the value expected for a sample of that size by more than 1 SD. Uppercase letters indicate that the frequency of that nucleotide is >50%. An ambiguity code is used whenever several nucleotides are within 1 SD of the most frequent one; it is written as an uppercase letter when at least one nucleotide frequency is >50%. K = G or T; S = C or G; W = A or T; R = A or G; Y = C or T; B = C, G or T; D = A, G or T; V = A, C or G; N = any nucleotide.
<sup>b</sup>Total number of sequences in the non-redundant data set.
<sup>c</sup>Unequivocal TATA-box sequences only. Sequences were removed if their flanking sequences contained additional and alternative TATA boxes.
<sup>d</sup>Based only on higher-order motifs (2, 3 or 4 bp long) that are statistically significant in the selected sequences (see Table 3 for details). Ambiguity codes are assigned as described in footnote a. Uppercase letters indicate that the nucleotide is the only one observed at that position at all three higher-order levels. Italicized letters indicate that the frequency of that base is >50% at all three levels.
<sup>e</sup>Sequences were retrieved from the Eukaryotic Promoter Database [release 82 (35)].
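The consensus rule in footnote a can be sketched roughly as follows. This is a minimal Python illustration, not the authors' code, and it rests on two labeled assumptions: the "1 SD" is taken to be the binomial standard deviation sqrt(p(1 − p)/N) of the most frequent nucleotide's frequency p, and Rseq per position is computed as 2 − H (in bits) without the small-sample correction implied by "expected for a sample of that size". The function names are hypothetical, and the input is assumed to be aligned, equal-length, uppercase A/C/G/T strings.

```python
import math
from collections import Counter

# Standard IUPAC codes keyed by the set of nucleotides they represent.
IUPAC = {
    frozenset("A"): "a", frozenset("C"): "c", frozenset("G"): "g", frozenset("T"): "t",
    frozenset("AG"): "r", frozenset("CT"): "y", frozenset("CG"): "s", frozenset("AT"): "w",
    frozenset("GT"): "k", frozenset("AC"): "m",
    frozenset("CGT"): "b", frozenset("AGT"): "d", frozenset("ACT"): "h", frozenset("ACG"): "v",
    frozenset("ACGT"): "n",
}


def column_frequencies(seqs, pos):
    """Fraction of aligned sequences carrying each nucleotide at position pos."""
    counts = Counter(s[pos] for s in seqs)
    n = len(seqs)
    return {base: counts.get(base, 0) / n for base in "ACGT"}


def rseq_bits(freqs):
    """Uncorrected information content at one position: 2 - H, H in bits (assumption)."""
    h = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
    return 2.0 - h


def consensus_symbol(freqs, n):
    """Ambiguity code for all bases within 1 SD of the most frequent one.

    Assumes SD = sqrt(p * (1 - p) / n) for the top frequency p (hypothetical choice).
    The symbol is uppercased when the top frequency exceeds 50%.
    """
    top = max(freqs.values())
    sd = math.sqrt(top * (1 - top) / n)
    close = frozenset(base for base, p in freqs.items() if top - p <= sd)
    symbol = IUPAC[close]
    return symbol.upper() if top > 0.5 else symbol


def consensus_and_rseq(seqs):
    """Return the consensus string and the per-position Rseq values (bits)."""
    n = len(seqs)
    symbols, bits = [], []
    for pos in range(len(seqs[0])):
        freqs = column_frequencies(seqs, pos)
        symbols.append(consensus_symbol(freqs, n))
        bits.append(rseq_bits(freqs))
    return "".join(symbols), bits
```

For example, `consensus_and_rseq(["A", "C", "A", "C"])` gives the lowercase code `"m"` (A and C tie at 50%, so neither exceeds the uppercase threshold) with 1 bit of Rseq. Summing the per-position values over the 5′ flank, the TATA box and the 3′ flank gives region totals of the kind reported in the table, although the published values presumably include a sample-size correction that this sketch omits.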