2022 Dec 12;23(Suppl 2):154. doi: 10.1186/s12859-022-04582-5

Table 2.

Top table, unbalanced setup (first to fifth columns): for each dataset and each task (first to fifth rows, top to bottom) executed on data from genome version hg19, we report the class unbalancing ratio, computed as the ratio between the cardinality of the most-represented class and the cardinality of the least-represented class. Bottom table: the unbalancing ratios for data from genome version hg38.

Unbalancing ratios for different setups

Genome version hg19
                      Unbalanced setup                               Full-balanced setup [34]
Task                  HepG2    HelaS3   K562     GM12878  Average    All cell lines
IE versus IP          2.78     2.46     2.41     2.62     2.57       1
AP versus IP          8.39     7.34     8.22     6.83     7.70       2
AE versus IE          23.59    17.42    38.47    9.78     22.32      2
AE versus AP          7.83     5.83     11.27    3.76     7.17       1
AE + AP versus else   18.49    17.76    20.47    15.29    18.00      8
Avg per cell line     12.22    10.16    16.17    7.66     11.55      2.8

Genome version hg38
                      Unbalanced setup
Task                  HepG2    K562     GM12878  Average
IE versus IP          1.53     1.51     1.66     1.57
AP versus IP          6.09     6.98     6.12     6.39
AE versus IE          7.82     10.46    4.46     7.58
AE versus AP          1.96     2.27     1.21     1.81
Avg per cell line     4.35     5.30     3.36     4.34

The fifth column (Average) shows the average unbalancing ratio over the four cell lines when the unbalanced setup is used. Tasks AE versus IE and AE + AP versus else are, on average, the most unbalanced. Full-balanced setup (sixth column): the unbalancing ratio in each task is the same for all cell lines. Comparing the averages of the unbalancing ratios over the cell lines (fifth and sixth columns; 11.55 versus 2.8 for hg19) shows the striking difference between the two setups.
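
As an illustration of how the quantities in this table relate to each other, the following minimal Python sketch computes an unbalancing ratio from a list of class labels and recomputes the per-task and per-cell-line averages from the hg19 unbalanced-setup values reported above. The function names (`unbalancing_ratio`, `task_averages`, `cell_line_averages`) and the dictionary layout are assumptions made for this example, not part of the original work. The recomputed averages may differ from the published ones in the last digit, since the published values were presumably obtained from unrounded ratios.

```python
from collections import Counter

def unbalancing_ratio(labels):
    """Ratio between the size of the most-represented class and
    the size of the least-represented class."""
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("at least two classes are required")
    return max(counts.values()) / min(counts.values())

# Unbalanced-setup ratios for hg19, copied from the table above.
HG19_UNBALANCED = {
    "IE versus IP":        {"HepG2": 2.78,  "HelaS3": 2.46,  "K562": 2.41,  "GM12878": 2.62},
    "AP versus IP":        {"HepG2": 8.39,  "HelaS3": 7.34,  "K562": 8.22,  "GM12878": 6.83},
    "AE versus IE":        {"HepG2": 23.59, "HelaS3": 17.42, "K562": 38.47, "GM12878": 9.78},
    "AE versus AP":        {"HepG2": 7.83,  "HelaS3": 5.83,  "K562": 11.27, "GM12878": 3.76},
    "AE + AP versus else": {"HepG2": 18.49, "HelaS3": 17.76, "K562": 20.47, "GM12878": 15.29},
}

def task_averages(table):
    """Average ratio of each task over the cell lines (the 'Average' column)."""
    return {task: sum(row.values()) / len(row) for task, row in table.items()}

def cell_line_averages(table):
    """Average ratio of each cell line over the tasks (the 'Avg per cell line' row)."""
    cell_lines = next(iter(table.values())).keys()
    return {
        cell: sum(row[cell] for row in table.values()) / len(table)
        for cell in cell_lines
    }

if __name__ == "__main__":
    # Hypothetical label vector: 6 samples of one class, 2 of the other.
    print(unbalancing_ratio(["IE"] * 6 + ["AE"] * 2))
    for task, avg in task_averages(HG19_UNBALANCED).items():
        print(f"{task}: {avg:.2f}")
    for cell, avg in cell_line_averages(HG19_UNBALANCED).items():
        print(f"{cell}: {avg:.2f}")
```

Sorting the output of `task_averages` by value is a quick way to check which tasks are the most unbalanced on average.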