Subramanian et al. 10.1073/pnas.0506580102.
Fig. 4. Asymmetry of GSEA results due to unbalanced global phenotype expression and gene set collection bias. (A) The GSEA observed and null distributions when a collection of random gene sets, with the same number and size distribution as the functional C2 collection, is run against the diabetes data set from Mootha et al. (1). Random sets have small biases (Left), so the normalization procedure makes only a modest correction (Right). The middles of the observed and null distributions coincide, as they should. (B) Here, the actual C2 collection is run against the diabetes data set, and the observed distribution clearly shows the bias caused by unequal representation of gene sets in the two phenotypes. Normalizing the positive and negative sides of the distribution independently helps to ameliorate this bias. (C) The leukemia data set (2) illustrates bias in the gene expression correlation profiles between the two phenotypes. On the acute lymphoid leukemia (ALL) side, there are more markers, and they are more highly correlated (Lower Left). Again, independent normalization of positive- and negative-scoring sets decreases this bias (Lower Right).
1. Mootha, V. K., Lindgren, C. M., Eriksson, K. F., Subramanian, A., Sihag, S., Lehar, J., Puigserver, P., Carlsson, E., Ridderstråle, M., Laurila, E., et al. (2003) Nat. Genet. 34, 267–273.
2. Armstrong, S. A., Staunton, J. E., Silverman, L. B., Pieters, R., den Boer, M. L., Minden, M. D., Sallan, S. E., Lander, E. S., Golub, T. R. & Korsmeyer, S. J. (2002) Nat. Genet. 30, 41–47.
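The independent normalization of positive and negative scores described for Fig. 4 can be sketched as follows. This is a minimal illustration, not the GSEA implementation: the function name `normalize_es` and the input layout (one observed enrichment score per gene set plus an array of null scores from permutations) are assumptions. A positive observed score is divided by the mean of that set's positive null scores, and a negative one by the mean absolute value of its negative null scores.

```python
import numpy as np


def normalize_es(observed, null):
    """Sign-specific normalization of enrichment scores (hypothetical helper).

    observed: shape (n_sets,), observed enrichment score (ES) per gene set.
    null:     shape (n_sets, n_perm), ES per gene set from permuted data.
    Returns the normalized scores; NaN where the null has no scores of the
    same sign as the observed score.
    """
    observed = np.asarray(observed, dtype=float)
    null = np.asarray(null, dtype=float)
    nes = np.full(observed.shape, np.nan)
    for i, es in enumerate(observed):
        row = null[i]
        # Use only null scores on the same side (sign) as the observed score,
        # so the positive and negative tails are scaled independently.
        side = row[row >= 0] if es >= 0 else row[row < 0]
        if side.size:
            nes[i] = es / np.abs(side).mean()
    return nes
```

Because each side is scaled by its own null mean, an excess of high-scoring sets on one phenotype's side does not distort the scale of scores on the other side, which is the kind of asymmetry panels B and C illustrate.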
Fig. 5. Single gene overlaps in lung cancer studies. This Venn diagram shows the pairwise and three-way overlap between the top 100 genes correlated with poor outcome in the Michigan, Boston, and Stanford data sets. Pairwise overlap is determined by using genes that appear on the technology platforms of both studies. Three-way overlap is the overlap of the pairwise overlaps. Restricting to genes on all three platforms would reduce the gene space by 50% in the Michigan study and by 70% in the Boston and Stanford studies.
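One reading of the Fig. 5 overlap procedure is sketched below, assuming each study supplies its genes ranked by correlation with poor outcome (best first) and the set of genes on its platform. The function names, the toy study labels, and the choice to restrict each ranking to shared genes before taking the top n are illustrative assumptions, not the authors' code.

```python
from itertools import combinations


def top_n_on(ranked, allowed, n):
    """First n genes of a ranked list that are also in `allowed`."""
    return [g for g in ranked if g in allowed][:n]


def pairwise_overlaps(ranked, platform, n=100):
    """For each pair of studies, the overlap of their top-n genes after both
    rankings are restricted to genes present on both platforms."""
    result = {}
    for a, b in combinations(sorted(ranked), 2):
        shared = platform[a] & platform[b]
        top_a = set(top_n_on(ranked[a], shared, n))
        top_b = set(top_n_on(ranked[b], shared, n))
        result[(a, b)] = top_a & top_b
    return result


def three_way_overlap(pairs):
    """Three-way overlap taken as the intersection of all pairwise overlaps."""
    return set.intersection(*pairs.values())
```

Defining the three-way overlap through the pairwise overlaps, rather than restricting every study to genes on all three platforms, avoids shrinking the gene space as much as the caption notes a full restriction would.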
Fig. 6. Enrichment plots for poor-outcome signatures across lung cancer studies. The S_Boston and S_Michigan signatures of poor outcome are scored against the Michigan and Boston data sets, respectively. Each signature consists of those genes among a study's top 100 outcome markers that are also represented in the other study. The signatures are scored against data from genes represented in both studies.
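The signature construction described for Fig. 6 amounts to two set operations, sketched here under stated assumptions: rankings are lists of gene identifiers ordered by correlation with poor outcome, platforms are sets of identifiers, and the helper names `build_signature` and `restrict_ranking` are hypothetical.

```python
def build_signature(ranked_source, target_platform, n=100):
    """Genes among the source study's top-n outcome markers that the target
    study also measures (e.g. S_Boston scored on Michigan data)."""
    return set(ranked_source[:n]) & set(target_platform)


def restrict_ranking(ranked_target, source_platform):
    """The target study's ranked gene list limited to genes that are also on
    the source platform, preserving the target's ranking order."""
    return [g for g in ranked_target if g in source_platform]
```

Restricting the target ranking as well keeps genes the source study could never have selected from counting against the signature in the enrichment statistic.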
Fig. 7. Enrichment plots for the original and current Gene Set Enrichment Analysis (GSEA) methods for the set of genes up-regulated by p53 in the p53 wild-type phenotype.
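The difference between the two plots in Fig. 7 can be illustrated with the running-sum statistic from the main GSEA paper, where hits are weighted by |r_j|^p: p = 0 gives equal weights (the Kolmogorov-Smirnov-style statistic of the original method) and p = 1 weights each hit by its correlation. The sketch below assumes the correlations are already sorted in decreasing order; the function name and arguments are illustrative.

```python
import numpy as np


def running_enrichment(correlations, in_set, p=1.0):
    """Running-sum curve and enrichment score (ES) for one gene set.

    correlations: gene-phenotype correlations r_j, sorted in decreasing order.
    in_set:       boolean mask, True where gene j belongs to the set S.
    p:            weight exponent; p=0 weights all hits equally,
                  p=1 weights each hit by |r_j|.
    """
    r = np.asarray(correlations, dtype=float)
    hit = np.asarray(in_set, dtype=bool)
    n_hit, n = int(hit.sum()), hit.size
    if n_hit == 0 or n_hit == n:
        raise ValueError("gene set must be a non-empty proper subset of the list")
    w = np.where(hit, np.abs(r) ** p, 0.0)
    p_hit = np.cumsum(w) / w.sum()          # weighted fraction of hits seen so far
    p_miss = np.cumsum(~hit) / (n - n_hit)  # fraction of misses seen so far
    running = p_hit - p_miss
    es = running[np.argmax(np.abs(running))]  # maximum deviation from zero
    return running, es
```

With p = 1, members near the top of the ranked list (large |r_j|) move the running sum more, so the curve peaks earlier and higher than with p = 0 when the set's strongest genes are concentrated at the top.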