PLoS Comput Biol. 2020 Oct 1;16(10):e1008174. doi: 10.1371/journal.pcbi.1008174

Table 1. Experimental dataset properties.

For a dataset S, the notations |S|, L(S), LCard(S), LDen(S), DL(S), and PDL(S) denote the number of instances, the number of pathway labels, the pathway label cardinality, the pathway label density, the number of distinct pathway labels, and the proportion of distinct pathway labels, respectively. The notations R(S), RCard(S), RDen(S), DR(S), and PDR(S) have the analogous meanings for the enzymatic reactions E in S. PLR(S) is the ratio of L(S) to R(S). The last column gives the domain of S.

| Dataset | \|S\| | L(S) | LCard(S) | LDen(S) | DL(S) | PDL(S) | R(S) | RCard(S) | RDen(S) | DR(S) | PDR(S) | PLR(S) | Domain |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| EcoCyc | 1 | 307 | 307 | 1 | 307 | 307 | 1134 | 1134 | 1 | 719 | 719 | 0.2707 | Escherichia coli K-12 substr. MG1655 |
| HumanCyc | 1 | 279 | 279 | 1 | 279 | 279 | 1177 | 1177 | 1 | 693 | 693 | 0.2370 | Homo sapiens |
| AraCyc | 1 | 510 | 510 | 1 | 510 | 510 | 2182 | 2182 | 1 | 1034 | 1034 | 0.2337 | Arabidopsis thaliana |
| YeastCyc | 1 | 229 | 229 | 1 | 229 | 229 | 966 | 966 | 1 | 544 | 544 | 0.2371 | Saccharomyces cerevisiae |
| LeishCyc | 1 | 87 | 87 | 1 | 87 | 87 | 363 | 363 | 1 | 292 | 292 | 0.2397 | Leishmania major Friedlin |
| TrypanoCyc | 1 | 175 | 175 | 1 | 175 | 175 | 743 | 743 | 1 | 512 | 512 | 0.2355 | Trypanosoma brucei |
| SixDB | 63 | 37295 | 591.9841 | 0.0159 | 944 | 14.9841 | 210080 | 3334.6032 | 0.0159 | 1709 | 27.1270 | 0.1775 | Composed from six databases |
| Symbiont | 3 | 119 | 39.6667 | 0.3333 | 59 | 19.6667 | 304 | 101.3333 | 0.3333 | 130 | 43.3333 | 0.3914 | Composed of Moranella and Tremblaya |
| CAMI | 40 | 6261 | 156.5250 | 0.0250 | 674 | 16.8500 | 14269 | 356.7250 | 0.0250 | 1083 | 27.0750 | 0.4388 | Simulated microbiomes of low complexity |
| HOT | 4 | 2178 | 311.1429 | 0.1429 | 781 | 111.5714 | 182675 | 26096.4286 | 0.1429 | 1442 | 206.0000 | 0.0119 | Metagenomic Hawaii Ocean Time-series (10 m, 75 m, 110 m, and 500 m) |
| Synset-1 | 15000 | 6801364 | 453.4243 | 0.00007 | 2526 | 0.1684 | 30901554 | 2060.1036 | 0.00007 | 3650 | 0.2433 | 0.2201 | Synthetically generated (uncorrupted) |
| Synset-2 | 15000 | 6806262 | 453.7508 | 0.00007 | 2526 | 0.1684 | 34006386 | 2267.0924 | 0.00007 | 3650 | 0.2433 | 0.2001 | Synthetically generated (corrupted) |
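
The caption names the statistics in Table 1 but does not spell out their formulas. The following is a minimal sketch, assuming the definitions that reproduce the SixDB row: cardinality = total labels / |S|, density = cardinality / total labels, proportion of distinct labels = distinct labels / |S|, and PLR(S) = L(S) / R(S). These formulas are inferred from the tabulated values rather than taken from the caption, and the names `LabelStats`, `stats_from_counts`, `label_stats`, and `plr`, as well as the toy identifiers, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabelStats:
    total: int            # L(S) or R(S): label occurrences summed over instances
    cardinality: float    # LCard(S) or RCard(S): total / |S| (assumed formula)
    density: float        # LDen(S) or RDen(S): cardinality / total (assumed formula)
    distinct: int         # DL(S) or DR(S): number of distinct labels
    prop_distinct: float  # PDL(S) or PDR(S): distinct / |S| (assumed formula)


def stats_from_counts(n_instances, total, distinct):
    """Derive the per-dataset statistics from raw counts."""
    if n_instances <= 0:
        raise ValueError("dataset must contain at least one instance")
    cardinality = total / n_instances
    # Guard against a dataset in which no instance carries any label.
    density = cardinality / total if total else 0.0
    return LabelStats(total, cardinality, density, distinct, distinct / n_instances)


def label_stats(instances):
    """Summarise a list of per-instance label sets."""
    total = sum(len(labels) for labels in instances)
    distinct = len(set().union(*instances))
    return stats_from_counts(len(instances), total, distinct)


def plr(pathway_stats, reaction_stats):
    """PLR(S): ratio of total pathway labels to total reaction labels."""
    return pathway_stats.total / reaction_stats.total


# Toy dataset with three instances; identifiers are made up for illustration.
toy_pathways = [{"PWY-A", "PWY-B"}, {"PWY-B"}, {"PWY-B", "PWY-C", "PWY-D"}]
toy_reactions = [
    {"R1", "R2", "R3", "R4"},
    {"R2", "R3"},
    {"R2", "R5", "R6", "R7", "R8", "R9"},
]
toy_p = label_stats(toy_pathways)
toy_r = label_stats(toy_reactions)
toy_plr = plr(toy_p, toy_r)

# Recompute the SixDB row from its raw counts in Table 1.
sixdb_p = stats_from_counts(63, 37295, 944)
sixdb_r = stats_from_counts(63, 210080, 1709)
sixdb_plr = plr(sixdb_p, sixdb_r)
```

Rounded to four decimal places, the values recomputed for SixDB can be compared directly against the corresponding row of the table, which is a quick way to check the assumed formulas against any other row as well.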