2023 Nov 21;24:437. doi: 10.1186/s12859-023-05570-z

Table 1.

Compression and runtime comparisons of gzip and PIC

| Protein ID | Atom count | Original file size (KB) | Rounded coordinates text size (KB) | Rounded coordinates binary size (KB) | gzip size (KB) | gzip CR | PIC size (KB) | PIC CR | RMSD | Compression time (min:sec) | Number of images used | Image space used (%) | Decompression time (min:sec) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2ja9 | 1458 | 163.3 | 24.1 | 6.6 | 6.3 | 3.834 | 10.0 | 2.412 | 0.031 | 0:0.1 | 1 | [0.9] | 0:0.4 |
| 2jan | 12591 | 1101.2 | 206.1 | 56.7 | 54.1 | 3.813 | 61.2 | 3.368 | 0.047 | 0:1.3 | 1 | [2.7] | 0:2.1 |
| 2jbp | 27367 | 2397.4 | 447.8 | 133.4 | 130.2 | 3.439 | 108.8 | 4.117 | 0.043 | 0:4.1 | 1 | [11.1] | 0:5.0 |
| 2ja8 | 32000 | 2831.2 | 507.6 | 144.0 | 139.6 | 3.637 | 138.0 | 3.678 | 0.043 | 0:5.6 | 1 | [6.5] | 0:7.0 |
| 2ign | 41758 | 3579.2 | 666.7 | 187.9 | 180.8 | 3.688 | 147.3 | 4.526 | 0.069 | 0:9.0 | 1 | [9.5] | 0:11.4 |
| 2jd8 | 50351 | 4457.6 | 828.1 | 226.6 | 219.7 | 3.769 | 196.8 | 4.207 | 0.056 | 0:12.8 | 1 | [7.7] | 0:15.9 |
| 2ja7 | 63924 | 5605.5 | 1077.0 | 287.7 | 278.6 | 3.866 | 258.8 | 4.161 | 0.055 | 0:19.6 | 1 | [10.2] | 0:24.7 |
| 2fug | 73916 | 6386.9 | 1180.7 | 360.3 | 347.5 | 3.398 | 283.3 | 4.168 | 0.060 | 0:26.2 | 1 | [10.7] | 0:33.3 |
| 2b9v | 80710 | 6818.4 | 1279.8 | 393.5 | 379.4 | 3.373 | 289.0 | 4.428 | 0.073 | 0:32.2 | 1 | [10.3] | 0:39.5 |
| 2j28 | 95358 | 8152.3 | 1526.2 | 429.1 | 412.2 | 3.702 | 346.6 | 4.403 | 0.055 | 0:47.0 | 1 | [13.7] | 1:0.2 |
| 6hif | 118753 | 12726.2 | 2105.2 | 534.4 | 516.2 | 4.078 | 372.2 | 5.656 | 0.062 | 1:30.5 | 2 | [34.0, 0.1] | 1:48.6 |
| 3j7q | 140540 | 16027.2 | 2529.7 | 737.8 | 707.6 | 3.575 | 475.6 | 5.318 | 0.058 | 2:28.2 | 1 | [20.3] | 2:44.4 |
| 3j9m | 158384 | 17995.2 | 2845.4 | 772.1 | 765.8 | 3.716 | 525.7 | 5.413 | 0.069 | 3:28.8 | 1 | [21.7] | 3:55.9 |
| 6gaw | 178372 | 20825.4 | 3179.9 | 869.6 | 862.1 | 3.688 | 587.6 | 5.411 | 0.071 | 4:58.1 | 1 | [23.5] | 5:39.1 |
| 5t2a | 200172 | 22787.6 | 3253.9 | 900.8 | 872.4 | 3.73 | 651.7 | 4.993 | 0.068 | 7:8.2 | 2 | [31.1, 1.7] | 8:59.1 |
| 4ug0 | 218776 | 24906.9 | 3841.4 | 1066.5 | 1056.7 | 3.635 | 707.3 | 5.431 | 0.069 | 8:34.2 | 2 | [33.8, 1.7] | 9:25.5 |
| 4v60 | 241956 | 24377.8 | 4207.8 | 1179.5 | 1167.2 | 3.605 | 730.2 | 5.762 | 0.120 | 9:50.8 | 2 | [45.6, 2.1] | 13:48.9 |
| 4wro | 260090 | 35661.1 | 4363.1 | 1267.9 | 1246.2 | 3.501 | 848.8 | 5.14 | 0.086 | 13:54.0 | 1 | [29.6] | 16:6.9 |
| 6fxc | 281510 | 31329.0 | 5067.1 | 1477.9 | 1424.2 | 3.558 | 917.7 | 5.522 | 0.100 | 15:52.9 | 2 | [34.6, 1.0] | 17:11.6 |
| 4wq1 | 299951 | 40130.9 | 5042.1 | 1462.3 | 1438.0 | 3.506 | 968.8 | 5.204 | 0.087 | 19:59.6 | 2 | [34.7, 0.2] | 22:39.0 |

Results for the PIC compression algorithm with ε = 2.5. Rounded Coordinates Text Size and Binary Size are the sizes (in kilobytes, i.e. units of 1000 bytes, rather than kibibytes) of the text and binary files, respectively, that contain only the Cartesian coordinates found in the original file, rounded to one decimal place. The binary file (which uses a variable-length encoding) is then gzipped. The gzip and PIC compression ratios (CR) are the ratios of the Rounded Coordinates Text Size to the size of the gzip file and of the PNG image output(s) from the PIC compressor, respectively. Bolded values are the best of gzip and PIC. Compression and decompression times are for the PIC algorithm; our code is unoptimized because the focus is on compression ratios, and the times are included for completeness. By comparison, gzip (de)compression takes negligible time for files of this size. RMSD values measure the lossiness of PIC compression. Image Space Used gives the proportion of the image space used to encode the protein coordinate data, or part thereof, in each image constructed by the PIC compressor (for large proteins, more than one image is needed to represent all the atoms).
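
As a rough illustration of how the CR and RMSD columns relate to the other quantities, the following Python sketch recomputes the compression ratios for the 2ja9 row and defines an RMSD helper. The function names `compression_ratio` and `rmsd` are hypothetical and not taken from the PIC code, and the RMSD shown assumes the usual per-atom definition (square root of the mean squared distance between original and decompressed atom positions), which the caption does not spell out. Because the table reports sizes rounded to 0.1 KB, ratios recomputed this way only approximate the tabulated values.

```python
import math


def compression_ratio(text_size_kb, compressed_size_kb):
    """Rounded-coordinates text size divided by the compressed output size."""
    return text_size_kb / compressed_size_kb


def rmsd(original, restored):
    """RMSD between two equal-length lists of (x, y, z) coordinates.

    Assumes the standard per-atom definition; this is an assumption,
    not a definition quoted from the table caption.
    """
    if len(original) != len(restored) or not original:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (a - b) ** 2
        for p, q in zip(original, restored)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(original))


# Values from the 2ja9 row: text size 24.1 KB, gzip 6.3 KB, PIC 10.0 KB.
gzip_cr = compression_ratio(24.1, 6.3)
pic_cr = compression_ratio(24.1, 10.0)
print(f"gzip CR ~ {gzip_cr:.2f}, PIC CR ~ {pic_cr:.2f}")
```

For this small protein the recomputed gzip ratio is larger than the PIC ratio, matching the table, where PIC only overtakes gzip from 2jbp onward.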