Table 5.
| Group | Template reads: Kmer | Occurrence/100 bp in Ref | Occurrence/100 bp in Templ | Diff | Complement reads: Kmer | Occurrence/100 bp in Ref | Occurrence/100 bp in Compl | Diff | 2D reads: Kmer | Occurrence/100 bp in Ref | Occurrence/100 bp in 2D | Diff |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Under-represented | AAAAA | 0.247 | 0.086 | −0.161 | CGCCA | 0.288 | 0.092 | −0.196 | TTTTT | 0.251 | 0.047 | −0.204 |
| | TTTTT | 0.251 | 0.093 | −0.158 | AAAAA | 0.247 | 0.055 | −0.192 | AAAAA | 0.247 | 0.058 | −0.189 |
| | CGCTG | 0.258 | 0.104 | −0.155 | TTTTT | 0.251 | 0.065 | −0.186 | CAAAA | 0.170 | 0.111 | −0.058 |
| | GCTGG | 0.279 | 0.148 | −0.132 | CACCA | 0.184 | 0.054 | −0.130 | AAAAT | 0.195 | 0.138 | −0.057 |
| | CGCCA | 0.288 | 0.168 | −0.120 | CCAGC | 0.288 | 0.162 | −0.126 | AAAAG | 0.132 | 0.081 | −0.051 |
| | CCAGC | 0.288 | 0.180 | −0.108 | CGCTG | 0.258 | 0.135 | −0.123 | CGCCA | 0.288 | 0.239 | −0.049 |
| | GCCAG | 0.280 | 0.173 | −0.107 | GCCAG | 0.280 | 0.157 | −0.122 | TAAAA | 0.145 | 0.097 | −0.048 |
| | CTGGC | 0.278 | 0.178 | −0.100 | CAGCA | 0.262 | 0.140 | −0.122 | TGGTG | 0.185 | 0.138 | −0.048 |
| | CAGCA | 0.262 | 0.168 | −0.095 | CTGGC | 0.278 | 0.159 | −0.119 | CGCTG | 0.258 | 0.213 | −0.046 |
| | CGGCA | 0.222 | 0.129 | −0.093 | TGGCG | 0.275 | 0.163 | −0.112 | GCCAG | 0.280 | 0.238 | −0.042 |
| Over-represented | ACCCC | 0.040 | 0.136 | 0.096 | ACCCC | 0.040 | 0.143 | 0.103 | CAAAT | 0.105 | 0.164 | 0.059 |
| | CCCCG | 0.055 | 0.149 | 0.093 | CCCCG | 0.055 | 0.134 | 0.079 | GGGGT | 0.039 | 0.074 | 0.035 |
| | CCCCC | 0.033 | 0.122 | 0.089 | CCCCA | 0.064 | 0.128 | 0.065 | CCCAA | 0.047 | 0.080 | 0.033 |
| | CCCCA | 0.064 | 0.138 | 0.075 | CCTAG | 0.003 | 0.066 | 0.063 | TGAAT | 0.121 | 0.154 | 0.033 |
| | CCTAG | 0.003 | 0.075 | 0.072 | CTGAG | 0.050 | 0.112 | 0.063 | GAAGG | 0.094 | 0.127 | 0.033 |
| | GCCCC | 0.062 | 0.131 | 0.069 | TACCC | 0.073 | 0.136 | 0.062 | CGGGG | 0.054 | 0.087 | 0.032 |
| | CTCCC | 0.039 | 0.107 | 0.067 | CCTAA | 0.026 | 0.087 | 0.061 | ACCGT | 0.123 | 0.155 | 0.032 |
| | TCTAC | 0.048 | 0.113 | 0.065 | GACCC | 0.040 | 0.100 | 0.060 | CGTGA | 0.102 | 0.134 | 0.032 |
| | TCCCC | 0.056 | 0.121 | 0.065 | TCCCC | 0.056 | 0.115 | 0.059 | GAAGC | 0.124 | 0.156 | 0.032 |
| | TACCC | 0.073 | 0.138 | 0.064 | TCCTA | 0.013 | 0.071 | 0.058 | AGGCA | 0.093 | 0.124 | 0.031 |
Note: Poretools was used to extract FASTA sequences for template, complement, and 2D reads from the FAST5 files. The 5-mer counts in the reads of the E. coli data [13] and in the reference assembly were calculated separately using the oligonucleotide frequency function of the R package Biostrings. The occurrence of each 5-mer per 100 bp was then calculated; Diff is the occurrence in reads minus the occurrence in Ref. Ref, reference assembly; Templ, template reads; Compl, complement reads; Diff, difference.
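
The calculation described in the note can be illustrated with a minimal Python sketch. It is not the pipeline used for this table (which relied on Poretools and the R package Biostrings), and it rests on two assumptions: 5-mers are counted in overlapping windows, and "occurrence per 100 bp" means the 5-mer count divided by the total number of bases, multiplied by 100. The function names `kmer_counts`, `per_100bp`, and `kmer_diff` are hypothetical and chosen for illustration only.

```python
from collections import Counter


def kmer_counts(seqs, k=5):
    """Count overlapping k-mers over a list of sequences.

    Returns (counts, total_bp). Windows containing characters other
    than A, C, G, T (for example N) are skipped.
    """
    counts = Counter()
    total_bp = 0
    for seq in seqs:
        seq = seq.upper()
        total_bp += len(seq)
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts, total_bp


def per_100bp(counts, total_bp):
    """Convert raw counts to occurrences per 100 bp (assumed normalization)."""
    if total_bp == 0:
        return {}
    return {kmer: 100.0 * n / total_bp for kmer, n in counts.items()}


def kmer_diff(read_seqs, ref_seqs, k=5):
    """Occurrence per 100 bp in reads minus occurrence per 100 bp in the reference."""
    read_freq = per_100bp(*kmer_counts(read_seqs, k))
    ref_freq = per_100bp(*kmer_counts(ref_seqs, k))
    kmers = set(read_freq) | set(ref_freq)
    return {km: read_freq.get(km, 0.0) - ref_freq.get(km, 0.0) for km in kmers}


if __name__ == "__main__":
    # Toy sequences, not data from the study.
    ref = ["AAAAAAAAAA"]
    reads = ["AAAAACCCCC"]
    diff = kmer_diff(reads, ref)
    # Most negative values are under-represented in reads, most positive over-represented.
    for kmer, d in sorted(diff.items(), key=lambda item: item[1]):
        print(kmer, round(d, 3))
```

Sorting the resulting differences in ascending order and taking the first and last ten entries would give under- and over-represented lists in the style of the table, under the stated assumptions.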