2007 Nov 21;2(11):e1195. doi: 10.1371/journal.pone.0001195

Figure 1. Subclass mapping (SubMap) methodology. Two independent data sets, A and B, are clustered separately, compared and integrated.

(a) Candidate subclasses are defined by clustering A and B separately (predetermined phenotypes can also be used). Marker genes of each candidate subclass in A ($A_i$) are selected and mapped onto a gene list ranked by differential expression with respect to a subclass of B ($B_j$). Over-representation of the marker genes at the top of the ranking is evaluated using the enrichment score ($ES_{A_iB_j}$), and its significance is assessed as a nominal p-value, $p_{A_iB_j}$, by randomly permuting the sample class labels in B. This process is repeated with the roles of A and B interchanged to compute $ES_{B_jA_i}$ and $p_{B_jA_i}$.

(b) The mutual enrichment information, $p_{A_iB_j}$ and $p_{B_jA_i}$, is combined using the Fisher inverse chi-square statistic, $F_{ij}$. Its significance is estimated from a null distribution for $F_{ij}$, generated by randomly drawing nominal p-values from the corresponding null distributions for $ES_{A_iB_j}$ and $ES_{B_jA_i}$. After multiple hypothesis testing (MHT) correction, the p-values for $F_{ij}$ are summarized in the subclass association (SA) matrix. Clustering of the SA matrix reveals subclasses common to A and B.
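
The combination step in panel (b) can be illustrated with a minimal Python sketch. It is not the SubMap implementation: the function names, the `+1` smoothing of the empirical p-values, and the number of random draws are all assumptions made for illustration. It assumes the observed enrichment scores and their permutation null distributions have already been computed for one pair of subclasses, and it stops before the MHT correction.

```python
import math
import random


def nominal_p(observed, null_values):
    """One-sided empirical p-value: share of null values >= observed.

    The +1 terms are an assumption of this sketch; they keep p above zero
    so that log(p) is always defined.
    """
    hits = sum(1 for value in null_values if value >= observed)
    return (hits + 1) / (len(null_values) + 1)


def fisher_statistic(p_ab, p_ba):
    """Fisher inverse chi-square statistic for two p-values."""
    return -2.0 * (math.log(p_ab) + math.log(p_ba))


def fisher_null(null_es_ab, null_es_ba, n_draws, rng):
    """Null distribution of the Fisher statistic.

    Each null enrichment score is first converted to a nominal p-value
    within its own null distribution; pairs of these p-values are then
    drawn at random and combined.
    """
    p_null_ab = [nominal_p(es, null_es_ab) for es in null_es_ab]
    p_null_ba = [nominal_p(es, null_es_ba) for es in null_es_ba]
    return [
        fisher_statistic(rng.choice(p_null_ab), rng.choice(p_null_ba))
        for _ in range(n_draws)
    ]


def combined_p(es_ab, es_ba, null_es_ab, null_es_ba, n_draws=1000, seed=0):
    """Return (Fisher statistic, empirical p-value) for one subclass pair."""
    rng = random.Random(seed)
    p_ab = nominal_p(es_ab, null_es_ab)
    p_ba = nominal_p(es_ba, null_es_ba)
    f_obs = fisher_statistic(p_ab, p_ba)
    f_null = fisher_null(null_es_ab, null_es_ba, n_draws, rng)
    return f_obs, nominal_p(f_obs, f_null)
```

Calling `combined_p` once for every pair of candidate subclasses would give the uncorrected entries of a matrix like the SA matrix. The choice of MHT correction is left open here because the caption does not specify it.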