2013 Nov 15;8(11):e79217. doi: 10.1371/journal.pone.0079217

Table 1. Ranking of gene set analysis methods.

| Method | Sampling type | Category | Raw sensitivity (median p) | Raw prioritization (median rank, %) | Raw specificity (FP at α = 1%) | Z-score: sensitivity | Z-score: prioritization | Z-score: specificity | Sum of Z-scores | Method rank in category |
|---|---|---|---|---|---|---|---|---|---|---|
| PLAGE | subject | I | 0.0022 | 25.0 | 1.1% | −1.5 | −0.4 | Not used | −1.86 | 1 |
| GLOBALTEST | subject | I | 0.0001 | 27.9 | 2.0% | −1.5 | −0.2 | Not used | −1.69 | 2 |
| PADOG | subject | I | 0.0960 | 9.7 | 2.5% | 0.0 | −1.5 | Not used | −1.45 | 3 |
| ORA | gene | I | 0.0732 | 18.3 | 2.5% | −0.4 | −0.9 | Not used | −1.21 | 4 |
| SAFE | subject | I | 0.1065 | 18.8 | 1.3% | 0.2 | −0.8 | Not used | −0.64 | 5 |
| SIGPATH.Q2 | subject | I | 0.0565 | 38.0 | 0.9% | −0.6 | −0.5 | Not used | −0.09 | 6 |
| GSA | subject | I | 0.1420 | 21.0 | 1.3% | 0.7 | −0.7 | Not used | 0.07 | 7 |
| SSGSEA | subject | I | 0.0808 | 40.3 | 1.0% | −0.2 | 0.7 | Not used | 0.45 | 8 |
| ZSCORE | subject | I | 0.0950 | 39.8 | 1.0% | 0.0 | 0.7 | Not used | 0.65 | 9 |
| GSEA | subject | I | 0.1801 | 33.1 | 2.3% | 1.3 | 0.2 | Not used | 1.52 | 10 |
| GSVA | subject | I | 0.1986 | 51.5 | 1.1% | 1.6 | 1.5 | Not used | 3.10 | 11 |
| CAMERA | subject | I | 0.3126 | 43.0 | 0.5% | 3.4 | 0.9 | Not used | 4.30 | 12 |
| MRGSE | gene | II | 0.0100 | 18.8 | 4.9% | −0.59 | −1.68 | −1.27 | −3.54 | 1 |
| GSEAP | gene | II | 0.0644 | 36.2 | 15.8% | 0.59 | 0.02 | −0.08 | 0.53 | 2 |
| GAGE | gene | II | 0.0024 | 35.9 | 37.9% | −0.76 | −0.02 | 2.33 | 1.56 | 3 |
| SIGPATH.Q1 | gene | II | 0.1165 | 49.7 | 17.2% | 1.72 | 1.33 | 0.08 | 3.14 | 4 |

Surrogate sensitivity, prioritization ability and specificity are combined after transformation into Z-scores. A ranking is produced separately for methods in category I and methods in category II. Methods in category II produce substantially more false positives than methods in category I under phenotype permutation.
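The combination of Z-scores into a within-category ranking can be sketched as follows. The text does not say how the Z-scores were computed. The median/MAD-based ("robust") Z-score used here is an assumption, chosen because it appears to reproduce the category II values in Table 1 to within rounding. The helper name `robust_z` and the variable names are illustrative only.

```python
from statistics import median

# Scale factor that makes the MAD comparable to a standard deviation
# for normally distributed data.
MAD_SCALE = 1.4826


def robust_z(values):
    """Median/MAD-based Z-scores (assumed definition, not stated in the text)."""
    center = median(values)
    spread = MAD_SCALE * median(abs(v - center) for v in values)
    return [(v - center) / spread for v in values]


# Raw category II values copied from Table 1.
methods = ["MRGSE", "GSEAP", "GAGE", "SIGPATH.Q1"]
median_p = [0.0100, 0.0644, 0.0024, 0.1165]      # surrogate sensitivity
median_rank_pct = [18.8, 36.2, 35.9, 49.7]       # prioritization
fp_pct = [4.9, 15.8, 37.9, 17.2]                 # specificity (FP at alpha = 1%)

# One Z-score column per criterion; lower is better for all three.
z_columns = [robust_z(col) for col in (median_p, median_rank_pct, fp_pct)]

# Sum the Z-scores per method, then rank by ascending sum (rank 1 = best).
sum_z = [sum(zs) for zs in zip(*z_columns)]
order = sorted(range(len(methods)), key=lambda i: sum_z[i])
ranks = {methods[i]: r for r, i in enumerate(order, start=1)}

for name, total in zip(methods, sum_z):
    print(f"{name}: sum of Z-scores = {total:.2f}, rank = {ranks[name]}")
```

For category I the same steps would apply to the sensitivity and prioritization columns only, since Table 1 marks the specificity Z-score as not used there.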

We evaluated the stability of the method ranking as a function of several factors that could affect gene set analysis in different ways: the sample size of the microarray datasets, the gene set size, the type of experimental design, and the effect size of the condition under study. The rankings obtained in the 8 scenarios shown in Table 2 were correlated with the original ranking of the methods (based on all datasets), with Spearman correlations ranging from 0.78 for the large gene sets scenario to 0.98 for the unpaired design scenario (all p<0.0001). The exception was the paired design scenario, for which the correlation with the original ranking was only 0.34. Among the factors considered in Table 2, sample size had the least effect on the method ranking: the correlations between the original ranking (based on 42 datasets) and the rankings based on the smallest and largest 21 datasets were 0.97 and 0.92, respectively (p<0.0001).
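To make the stability comparison concrete, here is a minimal sketch of the Spearman correlation between two rankings of the same methods. It uses the closed-form expression for rankings without ties. The two rankings are hypothetical and are not taken from Table 1 or Table 2; the function name `spearman_untied` is illustrative.

```python
def spearman_untied(rank_a, rank_b):
    """Spearman correlation for two rankings of the same items, assuming no ties."""
    if len(rank_a) != len(rank_b):
        raise ValueError("rankings must have the same length")
    n = len(rank_a)
    if n < 2:
        raise ValueError("need at least two items")
    # Sum of squared rank differences across items.
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical example: five methods ranked on all datasets and in one scenario,
# where only the top two methods swap places.
original_ranking = [1, 2, 3, 4, 5]
scenario_ranking = [2, 1, 3, 4, 5]

rho = spearman_untied(original_ranking, scenario_ranking)
print(f"Spearman correlation: {rho:.2f}")
```

A value close to 1 means a scenario leaves the ranking largely unchanged, which is how the correlations reported above are read. With tied ranks the closed form no longer holds, and a general implementation such as `scipy.stats.spearmanr` would be the safer choice.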