2006 Jan 10;34(1):104–119. doi: 10.1093/nar/gkj414

Table 3.

Statistically significant higher-order motifs in the selected DNA pools

Position (a); E4 kinetic selection (40 sequences); E4 thermodynamic selection (47 sequences); MLP kinetic selection (42 sequences)
Columns within each pool, in order: Con (b); statistically significant dinucleotide (c), Z (d); statistically significant trinucleotide (c), Z (d); statistically significant tetranucleotide (c), Z (d). For the E4 thermodynamic selection, the tetranucleotide column is headed "Most frequent tetranucleotide" (c), Z (d).
1 n n n
2 n n n
3 n n n
CA 6 2.2
GT 7 2.8
4 n n n CAT 5 5.4
GTT 5 5.4
AT 9 4.1 CATT 3 7.0
GTTT 3 7.0
5 n AGG 4 4.3 n k ATT 6 6.6
GG 7 2.9 TT 12 6.0 TTTG 3 7.0
6 g n t TTG 7 7.9
TC 9 3.7 GTCC 3 6.6 TG 11 5.3 TTGG 4 9.5
TTCG 3 7.0
7 g s n TGG 7 7.9
CCCG 3 7.2 CC 8 3.1 TCCG 3 6.6 GG 9 4.1 TGGG 4 9.5
TGGC 3 7.0
8 n CCG 4 4.3 n CCG 7 7.4 n GGG 4 4.2
GCG 4 4.3 GGC 4 4.2
CG 11 5.6 GGTG 3 7.2 CG 10 4.3 CCGC 3 6.6
CCGT 3 6.6
9 G CGC 5 5.6 G CGT 5 5.0 s
CGC 4 3.8
GC 13 6.9 CGCT 5 5.6 GT 9 3.7 CGCT 4 3.8 GG 7 2.8
CGTT 5 5.0
10 C GCT 13 6.9 n GTT 9 3.7 n GGT 7 2.8
CT 20 3.7 GCTA 13 6.9 TT 16 1.4 GT 15 1.6
11 T T T
18 A A G
AC 17 2.6 AC 19 2.4 GT 17 2.3
19 c ACA 8 3.6 c AGG 7 2.4 k GTT 7 2.8
ACG 7 2.9 GGT 6 2.2
CA 8 3.6 ACGC 7 8.1 GG 7 2.4 TT 7 2.8
CG 7 3.0 ACAC 4 4.3
AGTG 4 4.3
20 g CGC 7 8.1 g n
CAC 4 4.3
GTG 4 4.3
GC 10 4.9 CGCG 3 7.2 GT 9 3.7 CCCT 3 6.6
TG 8 3.8 CGCC 3 7.2
21 s GCG 5 5.6 n GTT 6 6.2 n
GCT 4 3.8
CCT 4 3.8
CG 8 3.6 GCGG 3 7.2 CT 11 4.9 GTTG 4 8.9
GG 7 3.0 TT 9 3.7
GC 7 3.0 GG 7 2.4
22 g GCG 4 4.3 T TTG 6 6.2 r TAG 5 5.4
GGG 4 4.3 GGC 5 5.0
CGG 4 4.3 CTG 5 5.0
GG 10 4.9 CGGC 3 7.2 TG 15 7.3 TTGG 4 8.9 GG 7 2.8
GGGG 3 7.2 GGCA 3 6.6
23 G GGG 6 6.9 s TGG 6 6.2 s
GCG 4 4.3 TGA 5 5.0
GGC 4 4.3 GGA 4 3.8
GG 13 6.9 GG 8 3.1 TCCC 4 8.9
GC 7 3.0 GA 8 3.1 TACC 3 6.6
TGGC 3 6.6
24 G GGC 6 6.9 n CCC 4 4.3 k CTG 4 4.2
GGC 4 4.3
GC 8 3.6 CC 9 3.6 CGGC 3 6.6 CTGA 3 7.0
AC 7 2.4
25 n GGG 4 4.3 c n TGA 4 4.2
TGG 4 4.3 GCT 4 4.2
GG 9 4.2 TGGG 3 7.2 CG 7 2.4 GGCG 3 6.6 GG 7 2.8 GCTC 3 7.0
26 g GGG 6 6.9 n CCC 5 5.0 n
CGG 4 4.3 CAC 5 5.0
CGC 4 3.8
GG 12 6.2 GGGG 4 9.9 CC 8 3.1 CACG 3 6.6 GT 7 2.8
27 g GGG 7 8.3 b ACC 4 3.8 k
GG 7 3.0 CG 8 3.1 GC 7 2.8
28 n n n

(a) Position in the sequence.

(b) Mononucleotide-based consensus sequence. For details on the lettering, see Table 2.

(c) DNA tracts (2, 3 or 4 bp long) that are statistically significant at each position. A dinucleotide is positioned between the two bases that constitute it, a trinucleotide on its central base, and a tetranucleotide between its second and third bases. Here, the lettering does not indicate frequency of occurrence. The subscript numbers (shown here as the first number after each motif) give the number of occurrences of that motif.

(d) Z-statistic: the deviation of the observed frequency of a DNA tract from that expected based on its mononucleotide composition. It is calculated by subtracting the expected number of occurrences, based on the mononucleotide frequencies of the respective base pairs, from the observed number of occurrences of the most frequent motif, and then dividing the difference by the expected standard deviation (25). Statistically significant motifs are those that appear more frequently than in a completely random sequence set of similar size in which each nucleotide is equally represented at each position.
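
The positioning convention in footnote (c) and the Z-statistic in footnote (d) can be illustrated with a minimal sketch. It is not the authors' implementation; the procedure actually used is the one described in reference (25). The sketch assumes a binomial model for the expected standard deviation, and the usage lines assume equal representation of each nucleotide (frequency 0.25). The function names `count_kmers`, `table_anchor` and `z_statistic`, and the toy pool, are hypothetical.

```python
import math
from collections import Counter


def count_kmers(seqs, start, k):
    """Count the k-mers that begin at 1-based position `start` in aligned sequences."""
    return Counter(s[start - 1:start - 1 + k] for s in seqs)


def table_anchor(start, k):
    """Table position at which a k-mer starting at `start` is listed (footnote c)."""
    if k == 2:  # between the two bases that constitute it
        return f"{start}/{start + 1}"
    if k == 3:  # on the central base
        return str(start + 1)
    if k == 4:  # between the second and third bases
        return f"{start + 1}/{start + 2}"
    raise ValueError("only 2-, 3- and 4-bp tracts are tabulated")


def z_statistic(observed, n_sequences, base_freqs):
    """(observed - expected) / SD for one motif at one position.

    base_freqs holds the mononucleotide frequency of each base of the motif
    at its position. The binomial SD is an assumption of this sketch.
    """
    p = math.prod(base_freqs)                    # probability of the motif in one sequence
    expected = n_sequences * p                   # expected number of occurrences
    sd = math.sqrt(n_sequences * p * (1.0 - p))  # assumed binomial standard deviation
    return (observed - expected) / sd


# Hypothetical toy pool of aligned sequences (not data from the paper).
pool = ["AGCTA", "TGCTA", "AGCTT", "CGCAA"]
counts = count_kmers(pool, start=2, k=3)
motif, observed = counts.most_common(1)[0]       # most frequent trinucleotide
z = z_statistic(observed, len(pool), [0.25] * 3)  # uniform frequencies assumed
print(motif, observed, table_anchor(2, 3), round(z, 1))
```

Under the uniform assumption, `z_statistic(13, 40, [0.25, 0.25])` evaluates to about 6.9, in line with the GC entry (13 occurrences, Z = 6.9) of the E4 kinetic pool. Entries next to strongly conserved positions presumably require the observed per-position frequencies to be passed as `base_freqs` instead.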