Table 4. Over- and under-represented sequences in the IL sample.
| Motif type | Motif (k) | Motif (k) | Motif (k) | Motif (k) | Motif (k) |
|---|---|---|---|---|---|
| IL under-represented oligonucleotides | | | | | |
| 4mer | cagt (0.80) | ctga (0.81) | gatg (0.81) | tcag (0.81) | tgga (0.83) |
| 5mer | acctt (0.61) | agcat (0.70) | cacag (0.72) | cagga (0.78) | cagtg (0.74) |
| | ctcag (0.67) | ctgac (0.61) | ctgga (0.80) | gaggt (0.69) | gattg (0.58) |
| | gctga (0.78) | ggagt (0.64) | ggcac (0.66) | gtgga (0.76) | tcagg (0.71) |
| | tgcct (0.65) | tggac (0.74) | ttgcc (0.58) | | |
| IL over-represented oligonucleotides | | | | | |
| 4mer | cgaa (1.36) | cgac (1.47) | cgag (1.27) | cgca (1.58) | cgcg (2.49) |
| | cggc (1.51) | cggt (1.46) | ctcg (1.66) | gccg (1.64) | gcga (1.80) |
| | gcgc (1.94) | ggcg (1.79) | tcgc (1.48) | | |
| 5mer | aaaaa (1.58) | aaaag (1.38) | aagcg (1.49) | accgc (1.67) | acgac (1.60) |
| | acgcg (2.22) | actcg (1.77) | agaaa (1.46) | ccgaa (1.63) | ccggt (2.17) |
| | cgacg (2.33) | cgcag (1.65) | cgccg (2.53) | cgcga (2.95) | cgcgc (3.17) |
| | cgcgg (2.20) | cgctc (1.59) | cggcg (2.36) | cggtg (1.76) | ctagc (1.84) |
| | ctcgc (2.31) | ctcgt (1.68) | gaaag (1.44) | gccga (1.70) | gccgg (1.54) |
| | gcgaa (2.13) | gcgag (1.67) | gcgat (1.99) | gcgca (1.95) | gcgcg (3.22) |
| | gcggc (1.85) | gctcg (1.92) | ggccg (1.71) | ggcga (1.69) | ggcgc (2.20) |
| | ggcgg (1.63) | gggcg (1.85) | gtgcg (1.83) | taggg (1.94) | tcgcg (3.86) |
| | tctcg (1.58) | tgcga (1.90) | | | |
| IL under-represented codons with context | | | | | |
| codon_N1 | aca g (0.74) | gat g (0.76) | tca g (0.66) | tcc a (0.80) | |
| codon_N1N2 | aca gg (0.49) | acc tt (0.65) | cag ga (0.66) | cca ga (0.58) | cct ga (0.62) |
| | cgg at (0.26) | ctg ac (0.62) | ctg ca (0.63) | gct ga (0.68) | gct ta (0.27) |
| | ggc ac (0.64) | gtg at (0.56) | gtg ga (0.72) | tac at (0.61) | tcc at (0.54) |
| IL over-represented codons with context | | | | | |
| codon_N1 | aga a (1.62) | aga g (1.51) | agg g (1.56) | atg t (1.32) | ccc g (1.59) |
| | ccg a (1.95) | cgc a (1.62) | cgc g (2.18) | ctc g (1.57) | gcc g (1.82) |
| | gcg c (2.10) | gcg g (1.91) | ggc g (2.22) | tcg g (1.66) | |
| codon_N1N2 | acc gc (2.06) | aga aa (1.85) | aga ga (1.83) | cac gc (2.00) | ccg aa (2.46) |
| | ccg ag (2.29) | ccg gt (2.90) | cga cg (3.66) | cgc ac (2.12) | cgc ag (1.96) |
| | cgc cg (2.08) | cgc ga (3.19) | cgc gg (2.63) | ctc gt (1.75) | gcc ga (1.73) |
| | gcg aa (2.65) | gcg cc (2.25) | gcg cg (3.51) | gcg gc (2.44) | gcg gt (2.00) |
| | ggc ga (1.78) | ggc gc (3.55) | tca cg (2.21) | tcg cg (4.01) | tcg gg (2.53) |
| | tgc ga (2.45) | tta gg (2.48) | | | |
List of all 4- and 5-mer oligonucleotides, and of all codons with N1 and N1N2 context, that are over- or under-represented in the IL sample compared with all 200 random IC subsets.
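The legend does not spell out how "over- or under-represented compared to all the 200 random IC subsets" is decided. One plausible reading, sketched below purely as an assumption, is that a motif is called over-represented when its frequency in the IL sample is higher than in every random IC subset, and under-represented when it is lower than in all of them. The sketch also assumes that a "codon with N1 context" is an in-frame codon followed by the next nucleotide (keyed like the table, e.g. `aca g`) and N1N2 the next two. The function names, the equal-size subset sampling and the frequency normalisation are hypothetical illustrations, not the authors' published procedure.

```python
import random
from collections import Counter


def codon_context_counts(cds_list, context_len):
    """Count in-frame codons together with the next `context_len` nucleotides.

    Assumption: each sequence is read in frame from position 0, and a codon is
    counted only if its full downstream context lies inside the sequence.
    Motifs are keyed like the table, e.g. "aca g" (N1) or "aca gg" (N1N2).
    """
    counts = Counter()
    for seq in cds_list:
        seq = seq.lower()
        for i in range(0, len(seq) - 3 - context_len + 1, 3):
            codon = seq[i:i + 3]
            context = seq[i + 3:i + 3 + context_len]
            counts[f"{codon} {context}"] += 1
    return counts


def to_frequencies(counts):
    """Normalise raw counts to frequencies so samples of different size compare."""
    total = sum(counts.values())
    return {motif: n / total for motif, n in counts.items()} if total else {}


def classify_motifs(il_seqs, ic_seqs, context_len, n_subsets=200, seed=0):
    """Label a motif 'over' or 'under' if its IL frequency lies strictly above
    or below its frequency in every random IC subset (hypothetical criterion).

    Each subset is assumed to hold as many sequences as the IL sample, so
    ic_seqs must contain at least len(il_seqs) sequences.
    """
    rng = random.Random(seed)
    il_freq = to_frequencies(codon_context_counts(il_seqs, context_len))
    subset_freqs = [
        to_frequencies(codon_context_counts(rng.sample(ic_seqs, len(il_seqs)), context_len))
        for _ in range(n_subsets)
    ]
    # Include motifs seen only in IC subsets, so motifs absent from IL can be "under".
    motifs = set(il_freq).union(*subset_freqs)
    labels = {}
    for motif in sorted(motifs):
        f_il = il_freq.get(motif, 0.0)
        f_ic = [sf.get(motif, 0.0) for sf in subset_freqs]
        if f_il > max(f_ic):
            labels[motif] = "over"
        elif f_il < min(f_ic):
            labels[motif] = "under"
    return labels
```

Requiring the IL value to fall outside the whole range of the 200 subsets is a strict, non-parametric criterion; a real analysis might instead use a percentile cut-off, which would change which motifs are reported.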
The number in parentheses (k) beside each motif is its relative abundance in the IL sample compared with the whole IC sample, calculated as $k = \frac{N_{IL}}{N_{IC}} \times \frac{L_{IC}}{L_{IL}}$, where $N_{IL}$ and $N_{IC}$ are the numbers of occurrences of the examined sequence in the IL and IC samples, respectively, and $L_{IL}$ and $L_{IC}$ are the sizes of the two samples. Thus k < 1 marks a motif that is relatively depleted in the IL sample and k > 1 one that is relatively enriched.
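The formula for k can be illustrated with a short sketch. It assumes, since the legend does not say, that the "size" of a sample is its total length in nucleotides and that motif occurrences are counted with overlaps allowed; the helper names are hypothetical.

```python
def count_overlapping(seqs, motif):
    """Total occurrences of `motif` across `seqs`, counting overlaps
    (e.g. "cgcg" occurs twice in "cgcgcg")."""
    motif = motif.lower()
    total = 0
    for seq in seqs:
        seq = seq.lower()
        total += sum(1 for i in range(len(seq) - len(motif) + 1)
                     if seq.startswith(motif, i))
    return total


def relative_abundance(n_il, n_ic, l_il, l_ic):
    """k = (N_IL / N_IC) * (L_IC / L_IL)."""
    if n_ic == 0 or l_il == 0:
        raise ValueError("N_IC and L_IL must be non-zero")
    return (n_il / n_ic) * (l_ic / l_il)


def motif_k(il_seqs, ic_seqs, motif):
    """k for one motif; sample size is taken as total nucleotide length (assumption)."""
    n_il = count_overlapping(il_seqs, motif)
    n_ic = count_overlapping(ic_seqs, motif)
    l_il = sum(len(s) for s in il_seqs)
    l_ic = sum(len(s) for s in ic_seqs)
    return relative_abundance(n_il, n_ic, l_il, l_ic)
```

For a toy IL sample `["cgcgcg"]` (2 occurrences of `cgcg` in 6 nt) and IC sample `["cgcgaaaaaa", "aaaaaaaaaa"]` (1 occurrence in 20 nt), `motif_k` gives k = 2/1 × 20/6 ≈ 6.67, i.e. the motif is enriched in the IL sample.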
Note that putative exonic splicing enhancers are among the IL under-represented sequences, whereas putative exonic splicing silencers are among the IL over-represented ones.