2006 Jan 10;34(1):104–119. doi: 10.1093/nar/gkj414

Table 3.

Statistically significant higher-order motifs in the selected DNA pools

Position^a	E4 kinetic selection (40 sequences)							E4 thermodynamic selection (47 sequences)							MLP kinetic selection (42 sequences)
	Con^b	Statistically significant dinucleotide^c	Z^d	Statistically significant trinucleotide^c	Z^d	Statistically significant tetranucleotide^c	Z^d	Con^b	Statistically significant dinucleotide^c	Z^d	Statistically significant trinucleotide^c	Z^d	Most frequent tetranucleotide^c	Z^d	Con^b	Statistically significant dinucleotide^c	Z^d	Statistically significant trinucleotide^c	Z^d	Statistically significant tetranucleotide^c	Z^d
1	n							n							n
2	n							n							n
3	n							n							n
																CA ₆	2.2
																GT ₇	2.8
4	n							n							n			CAT ₅	5.4
																		GTT ₅	5.4
																AT ₉	4.1			CATT ₃	7.0
																				GTTT ₃	7.0
5	n			AGG ₄	4.3			n							k			ATT ₆	6.6
		GG ₇	2.9													TT ₁₂	6.0			TTTG ₃	7.0
6	g							n							t			TTG ₇	7.9
									TC ₉	3.7			GTCC ₃	6.6		TG ₁₁	5.3			TTGG ₄	9.5
																				TTCG ₃	7.0
7	g							s							n			TGG ₇	7.9
						CCCG ₃	7.2		CC ₈	3.1			TCCG ₃	6.6		GG ₉	4.1			TGGG ₄	9.5
																				TGGC ₃	7.0
8	n			CCG ₄	4.3			n			CCG ₇	7.4			n			GGG ₄	4.2
				GCG ₄	4.3													GGC ₄	4.2
		CG ₁₁	5.6			GGTG ₃	7.2		CG ₁₀	4.3			CCGC ₃	6.6
													CCGT ₃	6.6
9	G			CGC ₅	5.6			G			CGT ₅	5.0			s
											CGC ₄	3.8
		GC ₁₃	6.9			CGCT ₅	5.6		GT ₉	3.7			CGCT ₄	3.8		GG ₇	2.8
													CGTT ₅	5.0
10	C			GCT ₁₃	6.9			n			GTT ₉	3.7			n			GGT ₇	2.8
		CT ₂₀	3.7			GCTA ₁₃	6.9		TT ₁₆	1.4						GT ₁₅	1.6
11	T							T							T
18	A							A							G
		AC ₁₇	2.6						AC ₁₉	2.4						GT ₁₇	2.3
19	c			ACA ₈	3.6			c			AGG ₇	2.4			k			GTT ₇	2.8
				ACG ₇	2.9													GGT ₆	2.2
		CA ₈	3.6			ACGC ₇	8.1		GG ₇	2.4						TT ₇	2.8
		CG ₇	3.0			ACAC ₄	4.3
						AGTG ₄	4.3
20	g			CGC ₇	8.1			g							n
				CAC ₄	4.3
				GTG ₄	4.3
		GC ₁₀	4.9			CGCG ₃	7.2		GT ₉	3.7			CCCT ₃	6.6
		TG ₈	3.8			CGCC ₃	7.2
21	s			GCG ₅	5.6			n			GTT ₆	6.2			n
											GCT ₄	3.8
											CCT ₄	3.8
		CG ₈	3.6			GCGG ₃	7.2		CT ₁₁	4.9			GTTG ₄	8.9
		GG ₇	3.0						TT ₉	3.7
		GC ₇	3.0						GG ₇	2.4
22	g			GCG ₄	4.3			T			TTG ₆	6.2			r			TAG ₅	5.4
				GGG ₄	4.3						GGC ₅	5.0
				CGG ₄	4.3						CTG ₅	5.0
		GG ₁₀	4.9			CGGC ₃	7.2		TG ₁₅	7.3			TTGG ₄	8.9		GG ₇	2.8
						GGGG ₃	7.2						GGCA ₃	6.6
23	G			GGG ₆	6.9			s			TGG ₆	6.2			s
				GCG ₄	4.3						TGA ₅	5.0
				GGC ₄	4.3						GGA ₄	3.8
		GG ₁₃	6.9						GG ₈	3.1			TCCC ₄	8.9
		GC ₇	3.0						GA ₈	3.1			TACC ₃	6.6
													TGGC ₃	6.6
24	G			GGC ₆	6.9			n			CCC ₄	4.3			k			CTG ₄	4.2
											GGC ₄	4.3
		GC ₈	3.6						CC ₉	3.6			CGGC ₃	6.6						CTGA ₃	7.0
									AC ₇	2.4
25	n			GGG ₄	4.3			c							n			TGA ₄	4.2
				TGG ₄	4.3													GCT ₄	4.2
		GG ₉	4.2			TGGG ₃	7.2		CG ₇	2.4			GGCG ₃	6.6		GG ₇	2.8			GCTC ₃	7.0
26	g			GGG ₆	6.9			n			CCC ₅	5.0			n
				CGG ₄	4.3						CAC ₅	5.0
											CGC ₄	3.8
		GG ₁₂	6.2			GGGG ₄	9.9		CC ₈	3.1			CACG ₃	6.6		GT ₇	2.8
27	g			GGG ₇	8.3			b			ACC ₄	3.8			k
		GG ₇	3.0						CG ₈	3.1						GC ₇	2.8
28	n							n							n

^aPosition in the sequence.

^bMononucleotide-based consensus sequence. For details on the lettering see Table 2.

^cDNA tracts (2, 3 or 4 bp long) that are statistically significant at each position. Dinucleotides are positioned between the two bases that constitute them, trinucleotides on the central base, and tetranucleotides between the second and third bases. Here the lettering is not an indication of the frequency of occurrence. The subscript numbers give the number of occurrences of each motif.

^dZ-statistic: the deviation of the observed frequency of a DNA tract from that expected on the basis of its mononucleotide composition. It is calculated by subtracting the expected number of occurrences, based on the mononucleotide frequencies of the respective base pairs, from the observed number of occurrences of the most frequent motif, and then dividing the difference by the expected standard deviation (25). Statistically significant motifs are those that appear more frequently than in a completely random sequence set of similar size, in which each nucleotide is equally represented at each position.
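
The calculation described in footnote d can be sketched in Python as follows. This is a minimal illustration rather than the authors' procedure: the function names `count_motifs` and `z_score` are hypothetical, the expected count is assumed to be the number of sequences multiplied by the product of the per-position mononucleotide frequencies, and the standard deviation is assumed to be binomial. The exact form used in the paper is the one given in reference (25), which may differ.

```python
from collections import Counter
from math import prod, sqrt


def count_motifs(sequences, start, k):
    """Count the k-mers that begin at 0-based index `start` in each sequence."""
    return Counter(
        seq[start:start + k] for seq in sequences if len(seq) >= start + k
    )


def z_score(observed, n_sequences, base_freqs):
    """Z-statistic for one motif at one position.

    observed    -- number of sequences carrying the motif at this position
    n_sequences -- size of the selected pool
    base_freqs  -- frequency of each motif base at its own position,
                   e.g. [f(G at i), f(C at i+1)] for the dinucleotide GC
    The binomial standard deviation is an assumption of this sketch.
    """
    p = prod(base_freqs)                  # probability under independence
    if not 0.0 < p < 1.0:
        raise ValueError("motif probability must lie strictly between 0 and 1")
    expected = n_sequences * p
    sd = sqrt(n_sequences * p * (1.0 - p))
    return (observed - expected) / sd


if __name__ == "__main__":
    # Toy pool (invented for illustration, not data from the paper).
    pool = ["AGCT", "TGCA", "CGCG", "AATT"]
    print(count_motifs(pool, 1, 2))       # dinucleotides starting at index 1

    # 13 occurrences in 40 sequences, every base equally likely (0.25 each).
    print(round(z_score(13, 40, [0.25, 0.25]), 1))
```

Under these assumptions, 13 occurrences of a dinucleotide in a pool of 40 sequences with equal base frequencies give a Z of about 6.9, which agrees with the value tabulated for GC ₁₃ in the E4 kinetic selection. Positions with a strongly skewed base composition would need the observed per-position frequencies passed in `base_freqs` instead of 0.25.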