Table 4.
The dictionary and parse sizes for prefixes of a database of Salmonella genomes, with three settings of the parameters w and p.
| Number of genomes | Size | Dict. | Parse | % | Dict. | Parse | % | Dict. | Parse | % |
|---|---|---|---|---|---|---|---|---|---|---|
| 50 | 249 | 68 | 43 | 44 | *77* | *20* | 39 | 91 | 10 | 40 |
| 100 | 485 | 83 | 85 | 35 | *99* | *39* | 28 | 122 | 19 | 29 |
| 500 | 2436 | 273 | 424 | 29 | 314 | 194 | 21 | *377* | *96* | 19 |
| 1000 | 4861 | 475 | 847 | 27 | 541 | 388 | 19 | *643* | *192* | 17 |
| 5000 | 24936 | 2663 | 4334 | 28 | 2915 | 1987 | 20 | *3196* | *985* | 17 |
| 10,000 | 49420 | 4190 | 8611 | 26 | 4652 | 3939 | 17 | *5176* | *1955* | 14 |
Again, all sizes are reported in megabytes; each percentage is the sum of the dictionary and parse sizes, divided by the size of the uncompressed file.
For each prefix, the sizes are in italics for the setting with the best overall compression.
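
As a rough check of how the percentage columns are derived, the following Python sketch recomputes them for the 50-genome row and picks the setting with the smallest percentage. The function name `compression_percent` and the variable names are illustrative and not from the original. Because the sizes in the table are already rounded to whole megabytes, the recomputed values only approximate the published percentages.

```python
def compression_percent(dict_mb, parse_mb, uncompressed_mb):
    """Return (dictionary + parse) size as a percentage of the uncompressed size."""
    return 100.0 * (dict_mb + parse_mb) / uncompressed_mb


# Values copied from the 50-genome row of the table (all in megabytes).
uncompressed_mb = 249
# One (dictionary, parse) pair per setting of w and p, in column order.
settings = [(68, 43), (77, 20), (91, 10)]

percents = [compression_percent(d, p, uncompressed_mb) for d, p in settings]

# The best overall compression is the setting with the smallest percentage.
best_setting = min(range(len(percents)), key=lambda i: percents[i])

for i, pct in enumerate(percents):
    marker = " (best)" if i == best_setting else ""
    print(f"setting {i + 1}: {pct:.1f}%{marker}")
```

For this row the second setting gives the smallest ratio, which agrees with the percentages listed in the table.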