Supporting information for Klein et al. (2003) Proc. Natl. Acad. Sci. USA, 10.1073/pnas.0437996100
Fig. 5.
Unsupervised discovery of genes expressed by human B cell subpopulations. The 20 data sets corresponding to five samples each of the purified CB, CC, naïve, and memory B cells were examined by unsupervised methods using (i) the pattern discovery-based Genes@Work algorithm (1, 2) (a), or (ii) a hierarchical clustering algorithm based on the average-linkage method (3, 4) (b). (a) Matrices showing the hierarchical clustering of the 20 data sets using the pattern discovery-based Genes@Work algorithm (for details, see ref. 2) in an unsupervised mode. In this approach, the pattern discovery algorithm was applied to all samples without any prior knowledge of the properties of any given sample, and each pattern identified consists of a subset of samples and a subset of genes that separate this subset of samples from the rest. The strongest pattern (Left), the significance of which depends on the sample size as well as the number of genes in the pattern, corresponded to a set of genes that separated the CBs and CCs from the naïve and memory B cells. To see whether the 10 naïve and memory B cell samples on the one hand, and the 10 CB and CC samples on the other, would separate further according to their cellular derivation, unsupervised pattern discovery was applied selectively to the 10 naïve and memory B cell samples (Right) and to the 10 CB and CC samples (Center). A strong pattern of genes was thus identified that separated the five memory B cell samples from the five naïve B cell samples. Altogether, three patterns with a similar number of genes (which was smaller than the number of genes in the memory/naïve B cell pattern) were identified by using the 10 CB and CC samples for unsupervised analysis. One of those patterns correctly divided the CB and CC samples according to their immunophenotype, i.e., CD77+ or CD77−. Thus, the signature discriminating CB and CC is statistically less significant than the signature discriminating naïve and memory B cells.
(Recall that the statistical significance of a pattern increases with the number of genes.) Columns represent individual samples; rows correspond to genes. Color changes within a row indicate expression levels relative to the average of the sample population. Values are quantified by the scale bar that visualizes the difference in the zge score (expression difference/standard deviation) relative to the mean (0). Genes are ranked according to their zg score (mean expression difference of the respective gene between phenotype and control group/standard deviation). Genes known to be differentially expressed among the B cell subpopulations are indicated: CD38, A-myb, CD10, Ki67, and Bcl6 in GC cells (5–8), and CD32, CD39, CD44, CD62 (6, 9), and Bcl2 in naïve and memory B cells; specific CD23 expression in naïve B cells (6, 10) and up-regulation of the CD27, CD80, CD86, and CD95 genes in memory B cells; and expression of RAG-1, TdT, and 14.1 surrogate light chain (11) in the CC vs. the CB fraction, although the latter result may be caused by a small fraction of copurifying immature B cells (see Discussion). (b) Dendrogram and matrix showing the clustering of the 20 data sets by using the average linkage-based hierarchical clustering method. To construct the dendrogram, a subset of the 12,000 gene segments present on the U95A microarray was used, namely those whose expression levels vary the most among the 20 samples and which are thus most informative. For the dendrogram, only genes whose average change in expression level from the mean across the whole panel was at least 1.3-fold were chosen (624 genes selected). Independent analyses were performed by using all genes, and by using only gene segments whose average change in expression level was at least 1.5-fold. The expression values of each selected gene are normalized to have zero mean and unit standard deviation.
The distance between two individual samples is calculated as the Euclidean distance between their normalized expression values. The results show that naïve B cells and memory B cells form separate clusters, whereas the CBs and CCs appear in a single cluster without a strong separation between CBs and CCs. Altogether, the results of the average-linkage analysis are consistent with the results obtained by unsupervised pattern discovery analysis (see above).
1. Califano, A., Stolovitzky, G. & Tu, Y. (2000) Proc. Int. Conf. Intell. Syst. Mol. Biol. 8, 75–85.
2. Klein, U., Tu, Y., Stolovitzky, G. A., Mattioli, M., Cattoretti, G., Husson, H., Freedman, A., Inghirami, G., Cro, L., Baldini, L., et al. (2001) J. Exp. Med. 194, 1625–1638.
3. Hartigan, J. A. (1975) Clustering Algorithms (Wiley, New York).
4. Eisen, M. B., Spellman, P. T., Brown, P. O. & Botstein, D. (1998) Proc. Natl. Acad. Sci. USA 95, 14863–14868.
5. Cattoretti, G., Chang, C. C., Cechova, K., Zhang, J., Ye, B. H., Falini, B., Louie, D. C., Offit, K., Chaganti, R. S. & Dalla-Favera, R. (1995) Blood 86, 45–53.
6. Liu, Y.-J., Barthelemy, C., de Bouteiller, O., Arpin, C., Durand, I. & Banchereau, J. (1995) Immunity 2, 239–248.
7. Onizuka, T., Moriyama, M., Yamochi, T., Kuroda, T., Kazama, A., Kanazawa, N., Sato, K., Kato, T., Ota, H. & Mori, S. (1995) Blood 86, 28–37.
8. Golay, J., Broccoli, V., Lamorte, G., Bifulco, C., Parravicini, C., Pizzey, A., Thomas, N. S., Delia, D., Ferrauti, P., Vitolo, D. & Introna, M. (1998) J. Immunol. 160, 2786–2793.
9. van Der Vuurst De Vries, A. R. & Logtenberg, T. (1999) Eur. J. Immunol. 29, 3898–3907.
10. Klein, U., Rajewsky, K. & Küppers, R. (1998) J. Exp. Med. 188, 1679–1689.
11. Meffre, E., Papavasiliou, F., Cohen, P., de Bouteiller, O., Bell, D., Karasuyama, H., Schiff, C., Banchereau, J., Liu, Y. J. & Nussenzweig, M. C. (1998) J. Exp. Med. 188, 765–772.
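
Illustrative sketch. The steps described for panel (b) (gene selection by fold change, per-gene normalization to zero mean and unit standard deviation, Euclidean distance between samples, average-linkage clustering) can be sketched in Python as shown here. This is not the authors' implementation: the function names and the toy expression values are hypothetical, and the reading of the 1.3-fold criterion (the mean, over samples, of each sample's fold deviation from the gene's mean) is an assumption, since the legend does not define it precisely.

```python
import math
from statistics import mean, pstdev


def select_variable_genes(matrix, min_fold=1.3):
    """Keep genes (rows) whose average fold deviation from their own mean
    is at least min_fold. ASSUMPTION: fold deviation of a value x from the
    gene mean m is max(x / m, m / x); all values must be positive."""
    kept = []
    for row in matrix:
        m = mean(row)
        avg_fold = mean(max(x / m, m / x) for x in row)
        if avg_fold >= min_fold:
            kept.append(row)
    return kept


def zscore_rows(matrix):
    """Normalize each gene (row) to zero mean and unit standard deviation."""
    out = []
    for row in matrix:
        m = mean(row)
        s = pstdev(row) or 1.0  # guard against a constant row
        out.append([(x - m) / s for x in row])
    return out


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering of sample profiles with average linkage:
    the distance between two clusters is the mean Euclidean distance over
    all pairs of their members. Merging stops at n_clusters clusters."""
    dist = [[math.dist(p, q) for q in profiles] for p in profiles]
    clusters = [[i] for i in range(len(profiles))]

    def linkage(a, b):
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        _, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical toy data: 3 genes (rows) x 4 samples (columns).
toy = [
    [100, 110, 400, 420],  # higher in samples 2 and 3
    [500, 480, 120, 100],  # higher in samples 0 and 1
    [200, 205, 198, 202],  # nearly flat, removed by the fold filter
]

selected = select_variable_genes(toy, min_fold=1.3)
normalized = zscore_rows(selected)
# Each sample is one column of the gene-by-sample matrix.
sample_profiles = [list(col) for col in zip(*normalized)]
groups = average_linkage(sample_profiles, n_clusters=2)
print(groups)
```

On this toy input the nearly flat gene is discarded and the four samples fall into two groups. Recording each merge and its linkage distance, rather than stopping at a fixed number of clusters, would give the information needed to draw a dendrogram.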