2017 May 19;45(Web Server issue):W162–W170. doi: 10.1093/nar/gkx449

Table 1. Evaluation results for the four criteria on the benchmark dataset MTBLS79 (the measure selected under each criterion is shown in brackets).

Method            Criterion (a)   Criterion (b)               Criterion (c)   Criterion (d)
                  (PMAD)          (distribution of P-value)   (consistency)   (AUC)
Auto scaling      0.8360          Good                        14.6500         0.8344
Contrast          0.7797          Fair                        9.7500          0.6250
Cubic splines     0.1393          Excellent                   13.7500         0.8322
Cyclic loess      0.3188          Good                        15.6500         0.8356
EigenMS           0.1799          Good                        16.4000         0.8010
Level scaling     0.2890          Good                        15.1000         0.8345
Linear baseline   0.6035          Fair                        6.3000          0.7072
Log-transform     0.1349          Good                        14.7500         0.8168
Mean              0.3100          Good                        14.7500         0.8213
Median            0.3100          Good                        14.5500         0.8177
MSTUS             0.0064          Good                        14.3500         0.8405
Pareto scaling    0.5320          Good                        14.9500         0.8344
Power scaling     0.1660          Good                        14.9500         0.8314
PQN               0.3260          Good                        13.7000         0.8309
Quantile          0.2989          Excellent                   13.8000         0.8119
Range scaling     0.1573          Good                        15.3500         0.8344
Total sum         2.4336          Fair                        14.7000         0.7538
Vast scaling      2.7200          Good                        15.0000         0.8344
VSN               0.5626          Excellent                   13.7500         0.8373

The calculation of these measures is described in the ‘Materials and Methods’ and ‘Supplementary Methods’ sections. Besides the quantitative measures, qualitative ones such as the distribution of P-values were also evaluated, and one of three performance levels (Excellent, Good or Fair) was assigned. Qualitative measures were evaluated by visual inspection; examples illustrating how the three performance levels were assigned are shown in Supplementary Figure S1.
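
As an illustration of how the mixed quantitative and qualitative columns of Table 1 might be handled together, the sketch below encodes the three performance levels as an ordinal scale and filters a few rows copied from the table. The names `Level`, `Row`, `at_least` and `rank_by_auc` are hypothetical helpers introduced here for illustration, not part of the published tool, and ranking by AUC alone is only one possible way to read the table.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Ordinal encoding of the qualitative levels (assumed order: Fair < Good < Excellent)."""
    FAIR = 1
    GOOD = 2
    EXCELLENT = 3


@dataclass(frozen=True)
class Row:
    method: str
    pmad: float           # Criterion (a)
    p_value_level: Level  # Criterion (b), assigned by visual inspection
    consistency: float    # Criterion (c)
    auc: float            # Criterion (d)


# A subset of rows copied from Table 1 (MTBLS79).
ROWS = [
    Row("Cubic splines", 0.1393, Level.EXCELLENT, 13.7500, 0.8322),
    Row("MSTUS", 0.0064, Level.GOOD, 14.3500, 0.8405),
    Row("Quantile", 0.2989, Level.EXCELLENT, 13.8000, 0.8119),
    Row("Total sum", 2.4336, Level.FAIR, 14.7000, 0.7538),
    Row("VSN", 0.5626, Level.EXCELLENT, 13.7500, 0.8373),
]


def at_least(rows, level):
    """Keep rows whose criterion (b) level is at or above the given level."""
    return [r for r in rows if r.p_value_level >= level]


def rank_by_auc(rows):
    """Sort rows by criterion (d), highest AUC first."""
    return sorted(rows, key=lambda r: r.auc, reverse=True)


if __name__ == "__main__":
    for r in rank_by_auc(at_least(ROWS, Level.EXCELLENT)):
        print(f"{r.method}: AUC={r.auc:.4f}, PMAD={r.pmad:.4f}")
```

Using an ordered enumeration keeps the qualitative column comparable with `>=` without pretending that the gaps between Fair, Good and Excellent are numerically meaningful.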