Nucleic Acids Res. 2006 Jan 10;34(1):104–119. doi: 10.1093/nar/gkj414

Table 2.

Consensus sequences and information content of sequences studied here

| Sequence | Mononucleotide consensus (positions 1–28; TATA box at 11–18) | Total Rseq, 5′ side (bits) | Total Rseq, TATA box (bits) | Total Rseq, 3′ side (bits) | Total Rseq, all positions (bits) | Total number of sequences^b | Sequences used in analysis^c |
|---|---|---|---|---|---|---|---|
| Consensus of selected sequences based on mononucleotide frequencies^a | | | | | | | |
| MLP therm. | nknngnnnknTATAAAAGbnnnnnnndn | 0.1 (1) | 15.6 (1) | 0.2 (1) | 15.9 (3) | 47 | 42 |
| MLP kinetic | nnnnktnnsnTATAAAAGktnrgkgnkn | 0.5 (1) | 15.6 (1) | 0.5 (1) | 16.6 (2) | 45 | 42 |
| E4 therm. | nnnnnnsnGnTATATATAngnTGnCnbn | 0.3 (1) | 15.6 (1) | 1.4 (1) | 17.3 (2) | 51 | 47 |
| E4 kinetic | nnnnnggnGCTATATATAcgsgGGnggn | 0.6 (1) | 15.6 (1) | 1.9 (1) | 18.1 (3) | 42 | 40 |
| Consensus of selected sequences based on the mononucleotide frequencies observed in higher-order motifs^d | | | | | | | |
| MLP therm. | nnnnnnnnkkTATAAAAGgttnnnnnnn | | | | | | |
| MLP kinetic | nnswTTgGsrTATAAAAGktbrskgdbn | | | | | | |
| E4 therm. | nnnnntCCGyTATATATAssytsvsvcn | | | | | | |
| E4 kinetic | nnnnnsscgcTATATATAcgsgggsGGG | | | | | | |
| Consensus of natural promoters^a,e | | | | | | | |
| MLP eukaryote | vnnnggwggSTATAAAAGcvGvngbrcg | 0.85 (3) | 15.90 (3) | 0.75 (3) | 17.50 (5) | 185 | 176 |
| MLP human | rnnngGnGnSTATAAAAGcvGnngGnsg | 1.4 (2) | 15.5 (1) | 1.1 (2) | 18.0 (3) | 42 | 38 |
| E4 eukaryote | wynwnawcncTATATATASngngnnnnn | 0.3 (1) | 15.7 (1) | 0.4 (1) | 16.3 (2) | 70 | 59 |

^a TATA boxes are underlined. Boldface letters in the flanking sequences indicate that the reduction in uncertainty at that position (Rseq) exceeds the value expected for a sample of that size by more than 1 SD. Uppercase letters indicate that the frequency of that nucleotide is >50%. An ambiguity code is used whenever several nucleotides lie within 1 SD of the most frequent one. It is written as an uppercase letter when at least one nucleotide frequency is >50%. K = G or T; S = C or G; W = A or T; R = A or G; Y = C or T; B = C, G or T; D = A, G or T; V = A, C or G; N = any nucleotide.

^b Total number of sequences in the non-redundant data.

^c Unequivocal TATA-box sequences only. Sequences were excluded if their flanking sequences contained additional, alternative TATA boxes.

^d Based only on higher-order motifs (2, 3 or 4 bp long) that are statistically significant in the selected sequences (see Table 3 for details). Ambiguity codes are assigned as described in footnote a. Uppercase letters indicate that the nucleotide is the only one observed at that position at all three higher-order levels. Italicized letters indicate that the frequency of that base is >50% at all three levels.

^e Sequences were retrieved from the Eukaryotic Promoter Database [release 82 (35)].
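Footnote a relies on two per-position calculations. The first is the information content Rseq, the reduction in uncertainty at each position. The table sums Rseq over the 5′ flank, the TATA box and the 3′ flank. The second is the rule for picking a consensus letter. The Python sketch below illustrates both on equal-length aligned sequences. It is a simplified illustration under stated assumptions, not the authors' procedure:

- Rseq is computed as 2 − H bits, with no small-sample correction.
- The "1 SD" threshold is assumed to be the binomial standard deviation of the most frequent base's frequency, sqrt(p(1 − p)/n). The footnote does not say how the SD was obtained.

All function names are hypothetical.

```python
import math
from collections import Counter

BASES = "ACGT"

# IUPAC ambiguity code for each non-empty set of bases (lowercase by default).
IUPAC = {
    frozenset("A"): "a", frozenset("C"): "c", frozenset("G"): "g", frozenset("T"): "t",
    frozenset("AG"): "r", frozenset("CT"): "y", frozenset("GT"): "k",
    frozenset("AC"): "m", frozenset("CG"): "s", frozenset("AT"): "w",
    frozenset("CGT"): "b", frozenset("AGT"): "d", frozenset("ACT"): "h",
    frozenset("ACG"): "v", frozenset("ACGT"): "n",
}


def column_freqs(seqs, pos):
    """Frequency of each base at one position of equal-length aligned sequences."""
    counts = Counter(s[pos] for s in seqs)
    n = len(seqs)
    return {b: counts.get(b, 0) / n for b in BASES}


def rseq_position(freqs):
    """Reduction in uncertainty at one position: 2 - H bits (no small-sample correction)."""
    h = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
    return 2.0 - h


def region_rseq(seqs, start, end):
    """Sum of per-position Rseq over positions start..end-1 (0-based, end-exclusive)."""
    return sum(rseq_position(column_freqs(seqs, i)) for i in range(start, end))


def consensus_letter(freqs, n):
    """Consensus symbol for one position, loosely following the rule in footnote a."""
    top = max(freqs, key=freqs.get)
    p = freqs[top]
    # Assumption: "1 SD" is the binomial SD of the most frequent base's frequency.
    sd = math.sqrt(p * (1 - p) / n)
    # Every base whose frequency is within 1 SD of the top base joins the code.
    members = frozenset(b for b in BASES if p - freqs[b] <= sd)
    letter = IUPAC[members]
    # Uppercase when the most frequent base exceeds 50%.
    return letter.upper() if p > 0.5 else letter


def consensus(seqs):
    """Consensus string for a set of equal-length aligned sequences."""
    n = len(seqs)
    return "".join(consensus_letter(column_freqs(seqs, i), n) for i in range(len(seqs[0])))


seqs = ["TATAAAAG", "TATAAAAG", "TATATAAG", "TATATAAG"]
print(consensus(seqs))          # TATAwAAG
print(region_rseq(seqs, 0, 8))  # 15.0
```

For 28-bp alignments laid out like the consensus strings above, the three region totals would come from these calls, using 0-based, end-exclusive indices:

- `region_rseq(seqs, 0, 10)` for the 5′ side
- `region_rseq(seqs, 10, 18)` for the TATA box
- `region_rseq(seqs, 18, 28)` for the 3′ side

Their sum gives the whole-site total. Because the small-sample correction is omitted, values from this sketch would not be expected to match the table exactly.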