2016 Sep 17;14(5):265–279. doi: 10.1016/j.gpb.2016.05.004

Table 5.

Under-represented and over-represented 5-mers of the E. coli data

Group             Template reads                    Complement reads                  2D reads
                  Kmer   Occurrence/100 bp Diff     Kmer   Occurrence/100 bp Diff     Kmer   Occurrence/100 bp Diff
                         in Ref  in Templ                  in Ref  in Compl                  in Ref  in 2D
Under-represented AAAAA  0.247   0.086     −0.161   CGCCA  0.288   0.092     −0.196   TTTTT  0.251   0.047     −0.204
                  TTTTT  0.251   0.093     −0.158   AAAAA  0.247   0.055     −0.192   AAAAA  0.247   0.058     −0.189
                  CGCTG  0.258   0.104     −0.155   TTTTT  0.251   0.065     −0.186   CAAAA  0.170   0.111     −0.058
                  GCTGG  0.279   0.148     −0.132   CACCA  0.184   0.054     −0.130   AAAAT  0.195   0.138     −0.057
                  CGCCA  0.288   0.168     −0.120   CCAGC  0.288   0.162     −0.126   AAAAG  0.132   0.081     −0.051
                  CCAGC  0.288   0.180     −0.108   CGCTG  0.258   0.135     −0.123   CGCCA  0.288   0.239     −0.049
                  GCCAG  0.280   0.173     −0.107   GCCAG  0.280   0.157     −0.122   TAAAA  0.145   0.097     −0.048
                  CTGGC  0.278   0.178     −0.100   CAGCA  0.262   0.140     −0.122   TGGTG  0.185   0.138     −0.048
                  CAGCA  0.262   0.168     −0.095   CTGGC  0.278   0.159     −0.119   CGCTG  0.258   0.213     −0.046
                  CGGCA  0.222   0.129     −0.093   TGGCG  0.275   0.163     −0.112   GCCAG  0.280   0.238     −0.042

Over-represented  ACCCC  0.040   0.136      0.096   ACCCC  0.040   0.143      0.103   CAAAT  0.105   0.164      0.059
                  CCCCG  0.055   0.149      0.093   CCCCG  0.055   0.134      0.079   GGGGT  0.039   0.074      0.035
                  CCCCC  0.033   0.122      0.089   CCCCA  0.064   0.128      0.065   CCCAA  0.047   0.080      0.033
                  CCCCA  0.064   0.138      0.075   CCTAG  0.003   0.066      0.063   TGAAT  0.121   0.154      0.033
                  CCTAG  0.003   0.075      0.072   CTGAG  0.050   0.112      0.063   GAAGG  0.094   0.127      0.033
                  GCCCC  0.062   0.131      0.069   TACCC  0.073   0.136      0.062   CGGGG  0.054   0.087      0.032
                  CTCCC  0.039   0.107      0.067   CCTAA  0.026   0.087      0.061   ACCGT  0.123   0.155      0.032
                  TCTAC  0.048   0.113      0.065   GACCC  0.040   0.100      0.060   CGTGA  0.102   0.134      0.032
                  TCCCC  0.056   0.121      0.065   TCCCC  0.056   0.115      0.059   GAAGC  0.124   0.156      0.032
                  TACCC  0.073   0.138      0.064   TCCTA  0.013   0.071      0.058   AGGCA  0.093   0.124      0.031

Note: Poretools was used to extract FASTA sequences for template, complement, and 2D reads from the FAST5 files. The 5-mer counts of the reads of the E. coli data [13] and of the reference assembly were calculated separately using the oligonucleotide frequency function (oligonucleotideFrequency) of the R package Biostrings. The frequency of each 5-mer was expressed as occurrences per 100 bp, and Diff gives the frequency in reads minus the frequency in Ref. Ref, reference assembly; Templ, template read; Compl, complement read.
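
The calculation described in the note can be illustrated with a minimal sketch. It is written in Python rather than with the R package the authors used, so it is not their implementation. The function names `kmer_rate_per_100bp` and `representation_diff` are hypothetical, and the sketch assumes that "occurrence per 100 bp" means the overlapping 5-mer count divided by the total number of bases and multiplied by 100, which the note does not spell out.

```python
from collections import Counter


def kmer_rate_per_100bp(seqs, k=5):
    """Count overlapping k-mers and scale to occurrences per 100 bp.

    Assumption: the denominator is the total number of bases across all
    sequences; the source does not state the exact normalisation.
    """
    counts = Counter()
    total_bp = 0
    for seq in seqs:
        seq = seq.upper()
        total_bp += len(seq)
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip windows containing N or other ambiguity codes.
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    if total_bp == 0:
        return {}
    return {kmer: 100.0 * n / total_bp for kmer, n in counts.items()}


def representation_diff(ref_seqs, read_seqs, k=5):
    """Return (kmer, diff) pairs sorted from most under- to most over-represented.

    diff = rate in reads minus rate in the reference, matching the Diff column.
    """
    ref = kmer_rate_per_100bp(ref_seqs, k)
    reads = kmer_rate_per_100bp(read_seqs, k)
    kmers = set(ref) | set(reads)
    diff = {km: reads.get(km, 0.0) - ref.get(km, 0.0) for km in kmers}
    return sorted(diff.items(), key=lambda kv: kv[1])


if __name__ == "__main__":
    # Toy sequences for illustration only; not data from the study.
    ranked = representation_diff(["AAAAAAAAAA"], ["AAAAACCCCC"])
    print("most under-represented:", ranked[0])
    print("most over-represented:", ranked[-1])
```

With real data, the first ten entries of the returned list would correspond to the under-represented rows of a table like this one and the last ten to the over-represented rows, with sequences read from the FASTA files by whatever parser is at hand.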