Table 3.
Statistically significant higher-order motifs in the selected DNA pools
Position (a) | E4 kinetic selection (40 sequences) | E4 thermodynamic selection (47 sequences) | MLP kinetic selection (42 sequences) | ||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Con (b) | Statistically significant dinucleotide (c) | Z (d) | Statistically significant trinucleotide (c) | Z (d) | Statistically significant tetranucleotide (c) | Z (d) | Con (b) | Statistically significant dinucleotide (c) | Z (d) | Statistically significant trinucleotide (c) | Z (d) | Most frequent tetranucleotide (c) | Z (d) | Con (b) | Statistically significant dinucleotide (c) | Z (d) | Statistically significant trinucleotide (c) | Z (d) | Statistically significant tetranucleotide (c) | Z (d) | |
1 | n | n | n | ||||||||||||||||||
2 | n | n | n | ||||||||||||||||||
3 | n | n | n | ||||||||||||||||||
CA 6 | 2.2 | ||||||||||||||||||||
GT 7 | 2.8 | ||||||||||||||||||||
4 | n | n | n | CAT 5 | 5.4 | ||||||||||||||||
GTT 5 | 5.4 | ||||||||||||||||||||
AT 9 | 4.1 | CATT 3 | 7.0 | ||||||||||||||||||
GTTT 3 | 7.0 | ||||||||||||||||||||
5 | n | AGG 4 | 4.3 | n | k | ATT 6 | 6.6 | ||||||||||||||
GG 7 | 2.9 | TT 12 | 6.0 | TTTG 3 | 7.0 | ||||||||||||||||
6 | g | n | t | TTG 7 | 7.9 | ||||||||||||||||
TC 9 | 3.7 | GTCC 3 | 6.6 | TG 11 | 5.3 | TTGG 4 | 9.5 | ||||||||||||||
TTCG 3 | 7.0 | ||||||||||||||||||||
7 | g | s | n | TGG 7 | 7.9 | ||||||||||||||||
CCCG 3 | 7.2 | CC 8 | 3.1 | TCCG 3 | 6.6 | GG 9 | 4.1 | TGGG 4 | 9.5 | ||||||||||||
TGGC 3 | 7.0 | ||||||||||||||||||||
8 | n | CCG 4 | 4.3 | n | CCG 7 | 7.4 | n | GGG 4 | 4.2 | ||||||||||||
GCG 4 | 4.3 | GGC 4 | 4.2 | ||||||||||||||||||
CG 11 | 5.6 | GGTG 3 | 7.2 | CG 10 | 4.3 | CCGC 3 | 6.6 | ||||||||||||||
CCGT 3 | 6.6 | ||||||||||||||||||||
9 | G | CGC 5 | 5.6 | G | CGT 5 | 5.0 | s | ||||||||||||||
CGC 4 | 3.8 | ||||||||||||||||||||
GC 13 | 6.9 | CGCT 5 | 5.6 | GT 9 | 3.7 | CGCT 4 | 3.8 | GG 7 | 2.8 | ||||||||||||
CGTT 5 | 5.0 | ||||||||||||||||||||
10 | C | GCT 13 | 6.9 | n | GTT 9 | 3.7 | n | GGT 7 | 2.8 | ||||||||||||
CT 20 | 3.7 | GCTA 13 | 6.9 | TT 16 | 1.4 | GT 15 | 1.6 | ||||||||||||||
11 | T | T | T | ||||||||||||||||||
18 | A | A | G | ||||||||||||||||||
AC 17 | 2.6 | AC 19 | 2.4 | GT 17 | 2.3 | ||||||||||||||||
19 | c | ACA 8 | 3.6 | c | AGG 7 | 2.4 | k | GTT 7 | 2.8 | ||||||||||||
ACG 7 | 2.9 | GGT 6 | 2.2 | ||||||||||||||||||
CA 8 | 3.6 | ACGC 7 | 8.1 | GG 7 | 2.4 | TT 7 | 2.8 | ||||||||||||||
CG 7 | 3.0 | ACAC 4 | 4.3 | ||||||||||||||||||
AGTG 4 | 4.3 | ||||||||||||||||||||
20 | g | CGC 7 | 8.1 | g | n | ||||||||||||||||
CAC 4 | 4.3 | ||||||||||||||||||||
GTG 4 | 4.3 | ||||||||||||||||||||
GC 10 | 4.9 | CGCG 3 | 7.2 | GT 9 | 3.7 | CCCT 3 | 6.6 | ||||||||||||||
TG 8 | 3.8 | CGCC 3 | 7.2 | ||||||||||||||||||
21 | s | GCG 5 | 5.6 | n | GTT 6 | 6.2 | n | ||||||||||||||
GCT 4 | 3.8 | ||||||||||||||||||||
CCT 4 | 3.8 | ||||||||||||||||||||
CG 8 | 3.6 | GCGG 3 | 7.2 | CT 11 | 4.9 | GTTG 4 | 8.9 | ||||||||||||||
GG 7 | 3.0 | TT 9 | 3.7 | ||||||||||||||||||
GC 7 | 3.0 | GG 7 | 2.4 | ||||||||||||||||||
22 | g | GCG 4 | 4.3 | T | TTG 6 | 6.2 | r | TAG 5 | 5.4 | ||||||||||||
GGG 4 | 4.3 | GGC 5 | 5.0 | ||||||||||||||||||
CGG 4 | 4.3 | CTG 5 | 5.0 | ||||||||||||||||||
GG 10 | 4.9 | CGGC 3 | 7.2 | TG 15 | 7.3 | TTGG 4 | 8.9 | GG 7 | 2.8 | ||||||||||||
GGGG 3 | 7.2 | GGCA 3 | 6.6 | ||||||||||||||||||
23 | G | GGG 6 | 6.9 | s | TGG 6 | 6.2 | s | ||||||||||||||
GCG 4 | 4.3 | TGA 5 | 5.0 | ||||||||||||||||||
GGC 4 | 4.3 | GGA 4 | 3.8 | ||||||||||||||||||
GG 13 | 6.9 | GG 8 | 3.1 | TCCC 4 | 8.9 | ||||||||||||||||
GC 7 | 3.0 | GA 8 | 3.1 | TACC 3 | 6.6 | ||||||||||||||||
TGGC 3 | 6.6 | ||||||||||||||||||||
24 | G | GGC 6 | 6.9 | n | CCC 4 | 4.3 | k | CTG 4 | 4.2 | ||||||||||||
GGC 4 | 4.3 | ||||||||||||||||||||
GC 8 | 3.6 | CC 9 | 3.6 | CGGC 3 | 6.6 | CTGA 3 | 7.0 | ||||||||||||||
AC 7 | 2.4 | ||||||||||||||||||||
25 | n | GGG 4 | 4.3 | c | n | TGA 4 | 4.2 | ||||||||||||||
TGG 4 | 4.3 | GCT 4 | 4.2 | ||||||||||||||||||
GG 9 | 4.2 | TGGG 3 | 7.2 | CG 7 | 2.4 | GGCG 3 | 6.6 | GG 7 | 2.8 | GCTC 3 | 7.0 | ||||||||||
26 | g | GGG 6 | 6.9 | n | CCC 5 | 5.0 | n | ||||||||||||||
CGG 4 | 4.3 | CAC 5 | 5.0 | ||||||||||||||||||
CGC 4 | 3.8 | ||||||||||||||||||||
GG 12 | 6.2 | GGGG 4 | 9.9 | CC 8 | 3.1 | CACG 3 | 6.6 | GT 7 | 2.8 | ||||||||||||
27 | g | GGG 7 | 8.3 | b | ACC 4 | 3.8 | k | ||||||||||||||
GG 7 | 3.0 | CG 8 | 3.1 | GC 7 | 2.8 | ||||||||||||||||
28 | n | n | n |
(a) Position in the sequence.
(b) Mononucleotide-based consensus sequence. For details on the lettering see Table 2.
(c) DNA tracts (2, 3 or 4 bp long) that are statistically significant at each position. Dinucleotides are positioned between the two bases that constitute them, trinucleotides on the central base, and tetranucleotides between the second and third bases. Here the lettering is not an indication of the frequency of occurrence. The number following each motif is its number of occurrences.
(d) Z-statistic: the deviation of the observed frequency of a DNA tract from that expected on the basis of its mononucleotide composition. It is calculated by subtracting the expected number of occurrences, based on the mononucleotide frequencies of the respective base pairs, from the observed number of occurrences of the most frequent motif, and then dividing the difference by the expected standard deviation (25). Statistically significant motifs are those that appear with a frequency higher than that observed in a completely random sequence set of similar size, in which each nucleotide is equally represented at each position.
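The Z-statistic described in the last footnote can be illustrated with a minimal sketch. The function name `z_statistic` and its parameters are hypothetical, and the binomial form of the expected standard deviation is an assumption made here for illustration; the table itself defers to reference 25 for the exact definition.

```python
import math


def z_statistic(observed, n_sequences, base_freqs):
    """Return (observed - expected) / sd for a motif at a fixed position.

    observed    -- number of sequences carrying the motif at that position
    n_sequences -- size of the selected pool
    base_freqs  -- mononucleotide frequency of each base of the motif at
                   its respective position (one value per base)

    ASSUMPTION: occurrences are treated as binomial, so the expected
    standard deviation is sqrt(N * p * (1 - p)). The source only says
    "expected standard deviation" and cites reference 25.
    """
    # Probability of the motif if the positions were independent.
    p = math.prod(base_freqs)
    expected = n_sequences * p
    sd = math.sqrt(n_sequences * p * (1.0 - p))
    return (observed - expected) / sd


if __name__ == "__main__":
    # A dinucleotide seen 13 times in a pool of 40 sequences, assuming
    # equal representation (0.25) of each base at both positions.
    print(round(z_statistic(13, 40, [0.25, 0.25]), 1))  # prints 6.9
```

Under these assumptions, a dinucleotide observed 13 times in 40 sequences with uniform base frequencies gives a Z of about 6.9. At positions with a strong consensus the mononucleotide frequencies are far from 0.25, so the same count yields a much lower Z.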