2021 Jun 21;16:10. doi: 10.1186/s13015-021-00192-7

Table 2.

The weights and sizes of various string set representations

Dataset	UST: # strings	UST: # char/$k$-mer	ESS-Tip-Compress: # strings	ESS-Tip-Compress: # char/$k$-mer	ESS-Compress: # strings	ESS-Compress: # char/$k$-mer	Eq. (3.1) lower bound: # char/$k$-mer
R. sphaeroides	240,562	2.22	61,909	1.38	36,456	1.29	1.28
Human RNA-seq	4,098,389	2.22	1,834,945	1.60	1,098,938	1.42	1.39
Gingiva metagenome	3,095,476	1.91	1,499,270	1.48	917,388	1.33	1.32
Soybean RNA-seq	1,806,078	1.49	1,137,350	1.32	515,244	1.17	1.17
Tongue metagenome	6,030,814	2.10	2,664,422	1.53	1,327,701	1.33	1.32
Whole human	22,072,219	1.32	21,320,263	1.28	10,321,275	1.15	1.14

The rightmost column shows the lower bound computed by Eq. (3.1) in Sect. "The weight of the ESS-Compress representation". The measured weight of ESS-Compress was verified to match the weight predicted by Theorem 3.2.
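
To make the comparison in the table concrete, the following minimal sketch computes, for each dataset, how far the ESS-Compress weight lies above the Eq. (3.1) lower bound and how much it saves relative to UST. The values are copied from the characters-per-$k$-mer columns above. The names `ROWS`, `gap_to_bound`, `saving_vs_ust`, and `summarize` are illustrative and not part of the paper or its software. Because the table reports only two decimal places, the resulting percentages are approximate.

```python
# Characters per k-mer, copied from Table 2:
# (dataset, UST, ESS-Tip-Compress, ESS-Compress, Eq. (3.1) lower bound)
ROWS = [
    ("R. sphaeroides",     2.22, 1.38, 1.29, 1.28),
    ("Human RNA-seq",      2.22, 1.60, 1.42, 1.39),
    ("Gingiva metagenome", 1.91, 1.48, 1.33, 1.32),
    ("Soybean RNA-seq",    1.49, 1.32, 1.17, 1.17),
    ("Tongue metagenome",  2.10, 1.53, 1.33, 1.32),
    ("Whole human",        1.32, 1.28, 1.15, 1.14),
]


def gap_to_bound(ess: float, bound: float) -> float:
    """Percentage by which the ESS-Compress weight exceeds the lower bound."""
    return (ess - bound) / bound * 100.0


def saving_vs_ust(ust: float, ess: float) -> float:
    """Percentage reduction in characters per k-mer relative to UST."""
    return (ust - ess) / ust * 100.0


def summarize(rows):
    """Map each dataset name to (gap to bound in %, saving vs UST in %)."""
    return {
        name: (gap_to_bound(ess, bound), saving_vs_ust(ust, ess))
        for name, ust, _tip, ess, bound in rows
    }


if __name__ == "__main__":
    for name, (gap, saving) in summarize(ROWS).items():
        print(f"{name:20s} gap to bound: {gap:4.1f}%   saving vs UST: {saving:4.1f}%")
```

On these rounded figures, ESS-Compress stays within about 2.2% of the lower bound on every dataset, with the largest gap on Human RNA-seq and no visible gap on Soybean RNA-seq.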