Table 1. Ranking of gene set analysis methods.
Method | Sampling type | Category | Sensitivity, raw (med. p) | Prioritization, raw (med. rank, %) | Specificity, raw (FP at α = 1%) | Sensitivity, Z-score | Prioritization, Z-score | Specificity, Z-score | Sum of Z-scores | Rank in category |
PLAGE | subject | I | 0.0022 | 25.0 | 1.1% | −1.5 | −0.4 | Not used | −1.86 | 1 |
GLOBALTEST | subject | I | 0.0001 | 27.9 | 2.0% | −1.5 | −0.2 | Not used | −1.69 | 2 |
PADOG | subject | I | 0.0960 | 9.7 | 2.5% | 0.0 | −1.5 | Not used | −1.45 | 3 |
ORA | gene | I | 0.0732 | 18.3 | 2.5% | −0.4 | −0.9 | Not used | −1.21 | 4 |
SAFE | subject | I | 0.1065 | 18.8 | 1.3% | 0.2 | −0.8 | Not used | −0.64 | 5 |
SIGPATH.Q2 | subject | I | 0.0565 | 38.0 | 0.9% | −0.6 | 0.5 | Not used | −0.09 | 6 |
GSA | subject | I | 0.1420 | 21.0 | 1.3% | 0.7 | −0.7 | Not used | 0.07 | 7 |
SSGSEA | subject | I | 0.0808 | 40.3 | 1.0% | −0.2 | 0.7 | Not used | 0.45 | 8 |
ZSCORE | subject | I | 0.0950 | 39.8 | 1.0% | 0.0 | 0.7 | Not used | 0.65 | 9 |
GSEA | subject | I | 0.1801 | 33.1 | 2.3% | 1.3 | 0.2 | Not used | 1.52 | 10 |
GSVA | subject | I | 0.1986 | 51.5 | 1.1% | 1.6 | 1.5 | Not used | 3.10 | 11 |
CAMERA | subject | I | 0.3126 | 43.0 | 0.5% | 3.4 | 0.9 | Not used | 4.30 | 12 |
MRGSE | gene | II | 0.0100 | 18.8 | 4.9% | −0.59 | −1.68 | −1.27 | −3.54 | 1 |
GSEAP | gene | II | 0.0644 | 36.2 | 15.8% | 0.59 | 0.02 | −0.08 | 0.53 | 2 |
GAGE | gene | II | 0.0024 | 35.9 | 37.9% | −0.76 | −0.02 | 2.33 | 1.56 | 3 |
SIGPATH.Q1 | gene | II | 0.1165 | 49.7 | 17.2% | 1.72 | 1.33 | 0.08 | 3.14 | 4 |
Surrogate sensitivity, prioritization ability and specificity are transformed into Z-scores and then summed. Methods are ranked separately within category I and category II. Methods in category II produce substantially more false positives than methods in category I under phenotype permutation.
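The table note does not say which Z-score transformation is used. As a minimal sketch, the category II values in Table 1 can be approximately reproduced with a robust Z-score, (x − median) / (1.4826 × MAD), computed within the category. This is an inference from the tabulated numbers rather than something stated here, and the function name `robust_z` is hypothetical. Lower sums rank better, since lower median p-values, lower median ranks and fewer false positives are all desirable.

```python
from statistics import median

# Raw category II values copied from Table 1:
# (median p-value, median rank in %, false positive rate in % at alpha = 1%)
RAW = {
    "MRGSE": (0.0100, 18.8, 4.9),
    "GSEAP": (0.0644, 36.2, 15.8),
    "GAGE": (0.0024, 35.9, 37.9),
    "SIGPATH.Q1": (0.1165, 49.7, 17.2),
}


def robust_z(values):
    """Robust Z-scores: (x - median) / (1.4826 * MAD).

    Assumption: this is the transformation behind the table; it is
    inferred from the published numbers, not stated in the text.
    """
    values = list(values)
    center = median(values)
    mad = median([abs(v - center) for v in values])
    scale = 1.4826 * mad  # makes the MAD comparable to a standard deviation
    return [(v - center) / scale for v in values]


methods = list(RAW)
# Transpose so that each column holds one metric across all methods.
columns = list(zip(*RAW.values()))
z_columns = [robust_z(col) for col in columns]

# Sum the three Z-scores per method; a lower sum means a better rank.
sum_z = {m: sum(z[i] for z in z_columns) for i, m in enumerate(methods)}
ranking = sorted(methods, key=sum_z.get)

for rank, name in enumerate(ranking, start=1):
    print(rank, name, round(sum_z[name], 2))
```

Because the inputs are the rounded values from the table, the recomputed sums agree with the tabulated ones only to about two decimal places. For category I, the specificity column would be left out of the sum, as the table marks it "Not used".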
We evaluated the stability of the method ranking as a function of several factors that could affect gene set analysis in different ways: the sample size of the microarray datasets, the gene set size, the type of experimental design, and the effect size of the condition under study. The rankings obtained in the 8 scenarios shown in Table 2 were correlated with the original ranking of the methods (based on all datasets), with Spearman correlations ranging from 0.78 for the large gene sets scenario to 0.98 for the unpaired design scenario (all p<0.0001). The exception was the paired design scenario, for which a correlation coefficient of 0.34 with the original ranking was observed. Among the factors considered in Table 2, sample size had the least effect on the method ranking: the correlations between the original ranking (based on 42 datasets) and the rankings based on the 21 smallest and 21 largest datasets were 0.97 and 0.92, respectively (p<0.0001).
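To illustrate how two method rankings can be compared with a Spearman correlation, the sketch below uses the tie-free formula ρ = 1 − 6 Σd² / (n(n² − 1)). The scenario ranking in it is hypothetical and chosen only for illustration; it is not taken from Table 2, and `spearman_rho` is an illustrative helper rather than the code used in the study.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman correlation for two tie-free rankings of the same items."""
    if len(rank_a) != len(rank_b):
        raise ValueError("rankings must have the same length")
    n = len(rank_a)
    if n < 2:
        raise ValueError("need at least two ranked items")
    # Sum of squared rank differences between the two rankings.
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Ranks 1..12 for twelve methods in their original order.
original = list(range(1, 13))

# Hypothetical scenario ranking: the top two and the bottom two
# methods trade places, everything else stays where it was.
scenario = [2, 1, 3, 4, 5, 6, 7, 8, 9, 10, 12, 11]

rho = spearman_rho(original, scenario)
print(round(rho, 3))
```

A few swaps between neighbouring positions leave the coefficient close to 1, whereas a value such as the 0.34 reported for the paired design scenario indicates that many methods moved far from their original positions.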