Table 3.
Characterizing the 20 most anomalous 1 kb segments of the E. coli K12 genome based on dinucleotide signature dissimilarity as measured by (a) DRA-divergence, (b) delta-distance, (c) Euclidean distance and (d) the quadratic discriminant
# | (a) chi-square loc. | (a) HT | (b) delta-distance loc. | (b) HT | (c) Euclidean loc. | (c) HT | (d) quadratic loc. | (d) HT | (e) G + C loc. | (e) HT |
---|---|---|---|---|---|---|---|---|---|---|
1 | 200 | yaeT | 151 | −1 | 151 | −1 | 284 | M | 583 | M |
2 | 227 | 0 | 227 | 0 | 274 | M | 525 | M | 584 | M |
3 | 393 | M | 394 | −1 | 575 | M | 526 | M | 1212 | −1, −1 |
4 | 394 | −1 | 526 | M | 777 | −1 | 575 | M | 1636 | M |
5 | 526 | M | 777 | −1 | 978 | mukB | 777 | −1 | 2102 | −1 |
6 | 777 | −1 | 1287 | −1 | 1142 | rne | 1142 | rne | 2105 | +1 |
7 | 978 | mukB | 1427 | M | 1395 | M | 1427 | M | 2468 | M |
8 | 1142 | rne | 1465 | 0 | 1427 | M | 1465 | 0 | 2773 | M |
9 | 1465 | 0 | 1527 | 0 | 1465 | 0 | 1527 | 0 | 2783 | 0 |
10 | 1527 | 0 | 1707 | −1 | 1527 | 0 | 1707 | −1 | 2785 | 0 |
11 | 1707 | −1 | 2071 | M | 2101 | M | 2104 | +1 | 2989 | +1 |
12 | 2071 | M | 2072 | M | 2104 | +1 | 2105 | +1 | 2990 | −1 |
13 | 2072 | M | 2104 | +1 | 2105 | +1 | 2989 | +1 | 2994 | 0 |
14 | 2104 | +1 | 2994 | 0 | 2994 | 0 | 2990 | −1 | 3267 | +1, +1 |
15 | 3312 | −1, infB | 3312 | −1, infB | 3314 | −1 | 2992 | −1 | 3581 | +1 |
16 | 3450 | rplWD | 3602 | ftsY | 3450 | rplWD | 2994 | 0 | 3797 | +1, +1 |
17 | 3602 | ftsY | 3915 | −1 | 3602 | ftsY | 3602 | ftsY | 3798 | −1 |
18 | 3915 | −1 | 4058 | −1 | 4121 | −1 | 3620 | M | 3803 | −1, +1 |
19 | 4058 | −1 | 4187 | rpoC | 4503 | M | 4503 | M | 4267 | −1 |
20 | 4181 | rpoB | 4474 | −1, +1 | 4504 | M | 4504 | M, M | 4475 | +1 |
Genes | 19 | | 18 | | 18 | | 18 | | 21 | |
Errors | 14(8) | | 12(3) | | 9(5) | | 6(2) | | 7(0) | |
Each segment, indexed by its location (loc.) in kb from the published origin, overlaps zero, one or two genes in the protein table. Intergenic segments are classified as ‘0’. HT is indicated by ‘+1’ or by ‘M’ (if the gene is a mobile element). False positives are indicated by gene locus (if the gene is essential) or by ‘−1’ (if it is not). The Errors row (bottom) gives the total number of false positives, with the number of essential genes among them in parentheses. The analysis is repeated for the 20 most anomalous segments with respect to (e) GC-divergence.
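The classification rules in this note can be checked against the summary rows with a short tally. The following is a minimal sketch for column (a) only, with the HT labels transcribed by hand from the table. The names `HT_A`, `FUSED` and `tally` are illustrative and do not come from the original analysis. It also assumes that the fused label `rplWD` stands for two adjacent genes (`rplW` and `rplD`), which the table does not state explicitly.

```python
# HT labels for method (a), rows 1-20 of Table 3, transcribed by hand.
# ASCII "-1" is used here; Unicode minus signs are normalized in tally().
HT_A = [
    "yaeT", "0", "M", "-1", "M", "-1", "mukB", "rne", "0", "0",
    "-1", "M", "M", "+1", "-1, infB", "rplWD", "ftsY", "-1", "-1", "rpoB",
]

# Assumption (not stated in the table): a fused label denotes two genes.
FUSED = {"rplWD": ["rplW", "rplD"]}


def tally(ht_column):
    """Return (genes, errors, essential) for one HT column.

    '0'      -> intergenic segment, no gene counted
    '+1'/'M' -> horizontally transferred gene (not an error)
    '-1'     -> false positive, non-essential gene
    other    -> false positive, essential gene named by locus
    """
    genes = errors = essential = 0
    for cell in ht_column:
        cell = cell.replace("\u2212", "-").strip()
        if cell == "0":
            continue  # intergenic: overlaps no gene
        for label in (part.strip() for part in cell.split(",")):
            for gene in FUSED.get(label, [label]):
                genes += 1
                if gene in ("+1", "M"):
                    continue  # counted as HT, not an error
                errors += 1
                if gene != "-1":
                    essential += 1  # a named locus marks an essential gene
    return genes, errors, essential


print(tally(HT_A))  # (19, 14, 8)
```

Under these assumptions the tally reproduces the Genes entry (19) and the Errors entry 14(8) reported for column (a). The other columns can be checked the same way by transcribing their HT labels.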