Table 3. General features of the Escherichia coli and E. fergusonii genomes sequenced in this work with E. coli K-12 MG1655 as reference (plasmid features).
Plasmid features | E. coli 55989 | E. coli ED1a | E. coli S88 | E. coli UMN026 (first plasmid) | E. coli UMN026 (second plasmid) | E. fergusonii ATCC |
Genome Size (bp) | 72 482 | 119 594 | 133 853 | 122 301 | 33 809 | 55 150 |
G+C content (%) | 46.1 | 49.2 | 49.3 | 50.5 | 42 | 48.5 |
Total protein-coding genes [a] | 100 | 150 | 144 | 149 | 49 | 54 |
Pseudogenes [b] (no.) | 7 | 11 | 9 | 8 | 0 | 5 |
Protein coding density [c] (%) | 75.6 | 86.2 | 87 | 79.4 | 87.5 | 88.7 |
Assigned function [d] (%) | 74 | 53 | 65 | 65.7 | 35.4 | 46.6 |
Orphans (%) | 17 | 31.5 | 25.8 | 27.8 | 12.5 | 20.7 |
Hypothetical (%) | 9 | 15.5 | 9.2 | 6.5 | 52.2 | 32.7 |
IS-like genes (no.) | 18 | 14 | 14 | 15 | 0 | 4 |
[a] The number of protein-coding genes excludes coding sequences annotated as artifactual genes (Supplementary Table 2A).
[b] The number of pseudogenes reported for each genome is the number of genes that are pseudogenes, not the number of CDSs. A pseudogene may consist of a single CDS (the gene is then partial compared with the wild-type form in other E. coli strains) or of several CDSs (generally two or three, corresponding to the different fragments of the wild-type form in other E. coli strains). The lists of pseudogenes are available in Supplementary Table 1.
[c] The protein coding density is computed from the total length of protein-coding genes, excluding overlaps between genes, artifacts, and RNA genes.
[d] Protein genes with assigned function include all definitive and putative functional assignments.
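The coding-density definition in the footnotes can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes that "excluding overlaps" means bases shared by two genes are counted once, that strand is ignored, and that artifactual and RNA genes have already been filtered out by the caller. The names `merge_intervals` and `coding_density` and the toy coordinates are hypothetical.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation: 1-based, inclusive (start, end) coordinates
# of protein-coding genes on one replicon, strand ignored (assumption).
Interval = Tuple[int, int]


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping intervals so that shared bases are counted once."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it if needed.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def coding_density(cds: Iterable[Interval], replicon_length: int) -> float:
    """Percentage of the replicon covered by protein-coding genes.

    Assumes artifacts and RNA genes were removed from `cds` beforehand.
    """
    covered = sum(end - start + 1 for start, end in merge_intervals(cds))
    return 100.0 * covered / replicon_length


# Toy example (invented coordinates, not data from the table):
# genes 1-300 and 251-500 overlap by 50 bp, so 700 of 1000 bp are coding.
toy_cds = [(1, 300), (251, 500), (701, 900)]
print(coding_density(toy_cds, 1000))
```

Under these assumptions, a replicon whose gene coordinates cover 75.6% of its length once overlaps are merged would be reported with a coding density of 75.6, as in the first column of the table.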