Author manuscript; available in PMC: 2009 Mar 18.
Published in final edited form as: Nature. 2008 Aug 24;455(7211):401–405. doi: 10.1038/nature07213

Regulatory networks define phenotypic classes of human stem cell lines

Franz-Josef Müller 1,2, Louise C Laurent 1,3, Dennis Kostka 4, Igor Ulitsky 5, Roy Williams 6, Christina Lu 1, In-Hyun Park 7, Mahendra S Rao 8,9, Ron Shamir 5, Philip H Schwartz 10,11, Nils O Schmidt 12, Jeanne F Loring 1,6
PMCID: PMC2637443  NIHMSID: NIHMS81942  PMID: 18724358

Abstract

Stem cells are defined as self-renewing cell populations that can differentiate into multiple distinct cell types. However, hundreds of different human cell lines from embryonic, fetal, and adult sources have been called stem cells, even though they range from pluripotent cells, typified by embryonic stem cells, which are capable of virtually unlimited proliferation and differentiation, to adult stem cell lines, which can generate a far more limited repertory of differentiated cell types. The rapid increase in reports of new sources of stem cells and their anticipated value to regenerative medicine1, 2 have highlighted the need for a general, reproducible method for classification of these cells3. We report here the creation and analysis of a database of global gene expression profiles (“Stem Cell Matrix”) that enables the classification of cultured human stem cells in the context of a wide variety of pluripotent, multipotent, and differentiated cell types. Using an unsupervised clustering method4, 5 to categorize a collection of ~150 cell samples, we discovered that pluripotent stem cell lines group together, while other cell types, including brain-derived neural stem cell lines, are very diverse. Using further bioinformatic analysis6 we uncovered a protein-protein network (“PluriNet”) that is shared by the pluripotent cells (embryonic stem cells, embryonal carcinomas, and induced pluripotent cells). Analysis of published data showed that the PluriNet appears to be a common characteristic of pluripotent cells, including mouse ES and iPS cells and human oocytes. Our results offer a new strategy for classifying stem cells and support the idea that pluripotence and self-renewal are under tight control by specific molecular networks.


Cultured cell populations are traditionally classified as having the qualities of stem cells by their expression of immunocytochemical or PCR markers.7 This approach can often be misleading if these markers are used to categorize novel stem cell preparations or predict inherent multi- or pluripotent features.8 To develop a more robust classification system, we created a framework for identifying putative novel stem cell preparations by their whole genome mRNA expression phenotypes (Figure 1). The core reference dataset, which we call the Stem Cell Matrix, includes cultures of human cells that have been reported to have either stem cell or progenitor qualities, including human embryonic stem cells, mesenchymal stem cells, and neural stem cells. To provide the context in which to place the stem cells, we included non-stem cell samples such as fibroblasts and differentiated embryonic stem cell derivatives. To avoid biasing the classification methods, it was critical that we designate the input cell types with terminology that carried as little preconception about their identity as possible. Our nomenclature (“Source Code”) has two components: the first is the tissue or cultured cell line of origin. The second term captures a description of the culture itself. Supplementary Tables 1 – 8 summarize the descriptions of the core samples and their assigned Source Codes.

Figure 1. Sample collection and analysis for the Stem Cell Matrix.


Cell preparations for the Stem Cell Matrix are cultured in the authors’ laboratory or collected from other sources worldwide. Samples are assigned source codes that capture their biological origin and a relatively unbiased description of the cell type (such as BNLin for brain-derived neural lineage). Samples are collected and processed at a central lab for microarray analysis on a single Illumina BeadStation instrument.

The genomics data are processed by unsupervised algorithms that are capable of grouping the samples based on non-obvious expression patterns encoded in transcriptional phenotypes. For pathway discovery, existing high-content databases with experimental data (e.g. protein-protein-interaction data or gene sets) are combined with our transcriptional database, a priori assumed identity of cell types and bootstrapped sparse non-negative matrix factorization (sample clustering) to produce metadata that can be mined with Gene Set Analysis software and topology-based gene set discovery methods (systems wide network analysis). Web-based, computer-aided visualization methodologies can be used by researchers to formulate testable hypotheses and generate results and insights in stem cell biology.

Two exemplary results we report in this paper are the classification of novel stem cell types in the context of other better understood stem cell preparations, and a molecular map of interacting proteins which appear to function in concert in pluripotent stem cells.

To sort the cell types we used an unsupervised machine learning approach to cluster transcriptional profiles of the cell preparations into stable distinct groups. Sparse nonnegative matrix factorization (sNMF) was adjusted for this task by implementing a bootstrapping algorithm to find the most stable groupings (see also Supplementary Discussion 1).4, 5 The stability of the clustering9 indicated that the dataset most likely contained about twelve different types of samples (Figure 2; Supplementary Method 2). The composition of the stable clusters revealed both predictable and unpredicted groupings of a priori designations (Figure 2 and Supplementary Figure 1). The twenty samples identified as undifferentiated human pluripotent stem cell (PSC) preparations were grouped together in one dominant cluster (Figure 2, Cluster 1) and one secondary cluster (Figure 2, Cluster 5). Sixty-two of the samples were brain-derived cells that were described as neural stem or progenitor cells based on their source, culture methods and classical markers. Most of the designated neural stem cells were distributed among multiple clusters, indicating a great deal of diversity in neural stem cell preparations. But one group of the brain-derived lines, those derived from surgical specimens from living patients (HANSE cells, see below), remained together throughout the iterative clusterings (Figure 2, Cluster 6; Supplementary Figure 3; Supplementary Method 1). The HANSE cell group consisted of transcriptional profiles that were derived from neurosurgical specimens following published protocols for multipotent neural progenitor derivation and propagation.10, 11 These cells expressed markers that are commonly used to identify neural stem cells12 (see Supplementary Figure 4), but the clustering clearly separated them from the other samples that had been derived from postmortem brains of prematurely born infants (see Figure 2).10,11
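The bootstrapped clustering described above can be illustrated with a minimal sketch. It assumes plain multiplicative-update NMF as a stand-in for the sparse variant (sNMF) used in the study, and it assumes that the sub-sampling is done over samples rather than genes; the default settings (30 sub-sampling runs, 80% of the data, ten random restarts) follow the Methods Summary. The function names `nmf`, `assign_clusters` and `consensus_matrix` are hypothetical and are not taken from the authors' R script.

```python
import random


def nmf(V, k, rng, iters=100, eps=1e-9):
    """Plain multiplicative-update NMF: V (genes x samples) ~ W (genes x k) @ H (k x samples)."""
    n, m = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), computed from the previous H
        WtV = [[sum(W[i][a] * V[i][j] for i in range(n)) for j in range(m)] for a in range(k)]
        WtW = [[sum(W[i][a] * W[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
        H = [[H[a][j] * WtV[a][j] / (sum(WtW[a][b] * H[b][j] for b in range(k)) + eps)
              for j in range(m)] for a in range(k)]
        # W <- W * (V H^T) / (W H H^T), computed from the previous W
        VHt = [[sum(V[i][j] * H[a][j] for j in range(m)) for a in range(k)] for i in range(n)]
        HHt = [[sum(H[a][j] * H[b][j] for j in range(m)) for b in range(k)] for a in range(k)]
        W = [[W[i][a] * VHt[i][a] / (sum(W[i][b] * HHt[b][a] for b in range(k)) + eps)
              for a in range(k)] for i in range(n)]
    err = sum((V[i][j] - sum(W[i][a] * H[a][j] for a in range(k))) ** 2
              for i in range(n) for j in range(m))
    return W, H, err


def assign_clusters(W, H):
    """Assign each sample to its dominant metagene, rescaling H by the column sums of W."""
    k, m = len(H), len(H[0])
    scale = [sum(row[a] for row in W) for a in range(k)]
    return [max(range(k), key=lambda a: H[a][j] * scale[a]) for j in range(m)]


def consensus_matrix(V, k, runs=30, frac=0.8, restarts=10, seed=0):
    """Fraction of sub-sampling runs in which two samples land in the same cluster."""
    rng = random.Random(seed)
    m = len(V[0])
    together = [[0] * m for _ in range(m)]
    sampled = [[0] * m for _ in range(m)]
    size = max(k, int(round(frac * m)))
    for _ in range(runs):
        idx = sorted(rng.sample(range(m), size))          # sub-sample the samples
        sub = [[row[j] for j in idx] for row in V]
        # keep the restart with the lowest reconstruction error
        W, H, _err = min((nmf(sub, k, rng) for _ in range(restarts)), key=lambda r: r[2])
        labels = assign_clusters(W, H)
        for p, i in enumerate(idx):
            for q, j in enumerate(idx):
                sampled[i][j] += 1
                together[i][j] += int(labels[p] == labels[q])
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(m)] for i in range(m)]
```

Entries close to 1 mark pairs of samples that stay together across perturbations, and entries close to 0 mark pairs that are consistently separated. Repeating the computation for several values of k and comparing how cleanly each consensus matrix separates is one way to arrive at an estimate such as the twelve sample types reported here.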

Figure 2. Clusters of samples based on machine learning algorithm.


Samples were distributed on the basis of their transcriptional profiles into consensus clusters using sNMF.

A. Consensus matrix from consensus clustering results (center matrix plot). The consensus matrix is a visual representation of the clustering results and the separation of the sample clusters from each other. Blue indicates no consensus, and red very high consensus. The numbers (1-12) on the diagonal row of clusters indicate the number assigned to the cluster by sNMF. These numbers (“Cluster 1” …“Cluster 12”) are used throughout the text to indicate the group of samples in that cluster. The bar graph above the consensus matrix plot shows the summary statistics assessing the overall quality of each cluster. The cluster consensus value (0-1) is plotted above the corresponding cluster in the matrix plot. Note that most clusters (Clusters 10, 12, 6, 4, 9, 1, 8, 11, 7, 2) have a high quality measurement. To the left of the consensus matrix is another view of the consensus data, visualized as a dendrogram. This is a representation of the hierarchical clustering tree of the consensus matrix.

B. The content of the sample clusters resulting from the same sNMF run is displayed. Numbers are the same cluster numbers assigned by the consensus clustering algorithm that are used throughout the text and figures. For more information on samples, Source Codes and references see Supplementary Tables 1 – 10.

# Number of samples.

¶ Samples were derived from adult brain specimens
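The cluster consensus value plotted in Figure 2A can be sketched as follows, assuming it follows the definition used in the cited consensus clustering framework: the mean of the pairwise consensus values among all samples assigned to one cluster. The function name `cluster_consensus` and the handling of single-sample clusters are assumptions made for this illustration.

```python
from itertools import combinations


def cluster_consensus(consensus, members):
    """Mean pairwise consensus among the samples assigned to one cluster."""
    pairs = list(combinations(members, 2))
    if not pairs:
        # Convention assumed here: a one-sample cluster is trivially consistent.
        return 1.0
    return sum(consensus[i][j] for i, j in pairs) / len(pairs)
```

A value near 1 means that the members of a cluster almost always co-cluster across the perturbed runs, whereas a low value flags a cluster whose membership is unstable.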

We tested the ability of our dataset to categorize additional preparations by adding 66 samples comprising new cultures derived from PSC lines that were already in the matrix, preparations that were not yet included (but their presumptive cell type was already represented), or new cell types. We chose two new types of cells: a differentiated cell type (umbilical vein endothelial cells [HUVEC]) and a recently developed new source of pluripotent cells, induced pluripotent stem cells13-16 (iPSC, Supplementary Table 9). iPSCs have been generated from somatic cells, including adult fibroblasts, by genetic manipulation of certain transcription factors.13, 15-17 We re-computed clustering results including the test dataset (Supplementary Table 10). All of the HUVEC samples clustered together and formed a distinct group. Most of the additional PSC lines (human ES cells [embryonic PSC; ePSC] and iPSCs) from several different labs were placed into a context that contained solely PSC lines. The three additional germ cell tumor lines clustered together with the tumor-derived pluripotent stem cell (tPSC) line 2102Ep and samples of three human ES cell lines: BG01v (ref. 18), Hues7 (ref. 19), and Hues13 (ref. 19). BG01v is an established aneuploid variant line and the two Hues lines were aneuploid variants of the originally euploid lines (not shown).

We used a combination of analysis tools to explore the basis of the unsupervised classification of the samples in the core dataset. Gene Set Analysis3 (GSA) is a means to identify the underlying themes in transcriptional data in terms of their biological relevance.

GSA uses lists of genes5 that are related in some way; the common criterion is that the relationships among the genes in the lists are supported by empirical evidence.20 GSA highlighted numerous significant differences among the computationally defined categories. (See Supplementary Figure 2, Supplementary Table 11 and Supplementary Online Materials).

While GSA is valuable for discovering specific differences among sample groups, it is limited to curated gene lists and cannot be used to discover new regulatory networks. The MATISSE algorithm6 (http://acgt.cs.tau.ac.il/matisse) takes predefined protein-protein interactions (e.g. from yeast-two-hybrid screens) and seeks connected subnetworks that manifest high similarity in sample subsets. The modified version used in this analysis is capable of extracting sub-networks that are co-expressed in many samples but also significantly up- or down-regulated in a specific sample cluster. Since the PSC preparations were consistently clustered together we used MATISSE to look for distinctive molecular networks that might be associated with the unique PSC qualities of pluripotence and self-renewal. A Nanog-associated regulatory network has been outlined in mouse embryonic PSC,21 and we looked for the elements of this network in human PSCs using our unbiased algorithm. We found that the algorithm predicts that human PSC possess a similar NANOG-linked network (Figure 3a; elements labelled in red). However, we also discovered that the human NANOG network appears to be integrated as a small component of a much larger protein-protein interaction network that is up-regulated in human PSCs (Figure 3). Remarkably, this PSC-specific network (termed Pluripotency associated Network, PluriNet) contains key regulators that are involved in the control of cell cycle, DNA replication, DNA repair, DNA methylation, SUMOylation, RNA processing, histone modification and nucleosome positioning (see also Supplementary Discussion 2 and www.openstemcellwiki.org). Many of the genes in the PluriNet have been linked to embryogenesis, tumorigenesis, and aging (Figure 3c and Supplementary Figure 6). 
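The general idea of searching a protein-protein interaction graph for connected groups of co-regulated genes can be illustrated with a deliberately simplified sketch. This is not the MATISSE algorithm, which scores expression similarity probabilistically and can include connecting nodes that lack expression evidence; the sketch only keeps genes whose up-regulation score passes a threshold and reports the connected components that remain. The function name, the score and the threshold are hypothetical.

```python
from collections import deque


def upregulated_subnetworks(edges, score, threshold, min_size=2):
    """Connected components of the interaction graph restricted to up-regulated genes."""
    keep = {g for g, s in score.items() if s >= threshold}
    adj = {g: set() for g in keep}
    for a, b in edges:
        if a in keep and b in keep:
            adj[a].add(b)
            adj[b].add(a)
    seen, modules = set(), []
    for start in sorted(keep):
        if start in seen:
            continue
        seen.add(start)
        comp, queue = [], deque([start])
        while queue:                       # breadth-first search from `start`
            g = queue.popleft()
            comp.append(g)
            for nb in sorted(adj[g] - seen):
                seen.add(nb)
                queue.append(nb)
        if len(comp) >= min_size:
            modules.append(sorted(comp))
    return sorted(modules, key=len, reverse=True)


# Toy input with placeholder gene labels (not real interaction data).
toy_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")]
toy_score = {"A": 2.0, "B": 2.5, "C": 1.8, "D": 0.1, "E": 2.0, "F": 0.2}
print(upregulated_subnetworks(toy_edges, toy_score, threshold=1.0))
```

In the toy input, genes A, B and C form one connected up-regulated module, while E passes the threshold but has no up-regulated interaction partner and is therefore dropped.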
We further explored the hypothesis that pluripotency is closely linked to PluriNet expression by analyzing published gene expression datasets from human oocytes, various types of PSCs, and murine embryos (see Table 1 for a summary of our findings in various model systems). Analysis of a microarray dataset22 that spans development from murine oocytes to the late blastocyst stage revealed that the PluriNet expression is dynamic and up-regulated during early mammalian embryogenesis (Table 1; Supplementary Figures 7 - 9).23 Also, our preliminary analyses indicate that the PluriNet is strongly up-regulated in mouse PSCs, mouse iPSCs, and mouse epiblast-derived stem cells24 when compared to somatic cells. Therefore the PluriNet may be useful as a biologically inspired gauge for classifying both murine and human PSC phenotypes (Table 1; Supplementary Figures 10 – 13).

Figure 3. Pluripotent Stem Cell-specific protein-protein interaction network detected by MATISSE.


Clusters from the sNMF k=12 analysis were used in combination with the transcriptional database to identify protein-protein interaction networks enhanced in PSC.

A. A large differentially expressed connected subnetwork (“PluriNet”) shows the dominance of cell cycle regulatory networks in PSC (see legend). All of the dark blue symbols are genes that are highly expressed in most PSCs compared to the other cell samples in the dataset. Front nodes as represented by Stem Cell Matrix expression data and back nodes as inferred by MATISSE are displayed with different colour shades.6 Highlighted in red are the interactions of a group of proteins associated with pluripotency in murine ePSC.21 Interestingly, this subnetwork shows a significant enrichment in genes that are targeted in the genome by the transcription factors NANOG (p = 5.88 × 10^-4), SOX2 (p = 0.058) and E2F (p = 1.29 × 10^-16; all p-values are Bonferroni corrected). For an interactive visualization of PluriNet, see www.stemcellmatrix.org.

B. Heat map-like visualization of PluriNet genes for samples from the test dataset: HUVEC (UC-EC, a-c, derived from three independent individuals), germ cell tumor derived pluripotent stem cells (tPSC-UN, d-f, lines GCT-C4, GCT-72, GCT-27X, derived from three independent individuals), induced pluripotent stem cells (iPSC-UN, g-i, BJ1-iPS12, MSC-iPS1, hFib2-iPS5, three independently derived lines from different somatic sources) and embryonic stem cells (ePSC-UN, j-l, lines Hues22, HSF6, ES2, derived from three independent blastocysts in three independent labs). Most PluriNet genes are markedly up-regulated in iPSC-UN and ePSC-UN. tPSC-UN show a less consistent expression pattern. UC-EC show lower expression levels of most PluriNet genes. Please refer to Supplementary Figure 5 for a larger version of the same Net-Heatmaps.

C. Analysis of genes from PluriNet in the context of phenotypes, which have been reported to result from specific genetic manipulations (e.g. gene knock-out) in mice in the MGI 3.6 phenotype ontology database (http://www.informatics.jax.org/). We find significant overrepresentation of phenotypes “lethality (perinatal/embryonic)”, “tumorigenesis”, “cellular”, “embryogenesis”, “reproductive system” and “life span and aging” among the genes in PluriNet. Although these broad categories might be rather unspecific surrogate markers for PSC function in mammals, this analysis might point towards PluriNet’s role in vivo. For more details, see also Supplementary Figure 6 and Supplementary Table 12.
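The enrichment and overrepresentation statistics quoted in this legend can be illustrated with a short sketch. The legend states only that the p-values are Bonferroni corrected; the one-sided hypergeometric test used below is an assumption chosen because it is a common way to test overrepresentation of an annotation within a gene set, and the function names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, drawn, hits):
    """P(X >= hits) when `drawn` genes are sampled without replacement from
    `population` genes, of which `annotated` carry the annotation."""
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    favourable = sum(comb(annotated, x) * comb(population - annotated, drawn - x)
                     for x in range(hits, upper + 1))
    return favourable / total


def bonferroni(p_values):
    """Multiply each p-value by the number of tests and cap the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

For example, if 5 of 10 genes carry an annotation and all 5 genes in a network carry it, `hypergeom_enrichment_p(10, 5, 5, 5)` evaluates to 1/252. Testing several annotations (for instance, several transcription factor target sets) and passing the raw p-values through `bonferroni` yields corrected values of the kind reported above.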

Table 1. PluriNet expression patterns in various model systems for pluripotency

A: Expression of PluriNet genes in murine model systems

Sample                          PluriNet expression
MII oocytes                     up-regulated (a)
Zygote                          up-regulated (a)
Embryo (2 cell–blastocyst)      up-regulated (a)
ePSC                            up-regulated (b)
EpiSC                           up-regulated (b)
iPSC                            up-regulated (b)
Fibroblasts (normal)            down-regulated (b)
Fibroblasts (transformed)       down-regulated (b)

B: Successful PluriNet-based, post-hoc classification in murine model systems

Sample                          Pluripotency    Germ-line transmission
ePSC                            yes (c)         yes (c)
EpiSC                           yes (c)         yes (c)
iPSC                            yes (c)         yes (c)
Fibroblasts (normal)            yes (c)         yes (c)
Fibroblasts (transformed)       yes (c)         yes (c)

C: Expression of PluriNet genes in human model systems

Sample                          PluriNet expression
MII oocytes                     up-regulated (d)
tPSC                            up-regulated (e)
ePSC                            up-regulated (e/f)
iPSC                            up-regulated (e/f)
ePSC-derived cell types         down-regulated (f)
Somatic cell types              down-regulated (e/f)
Somatic cancer line (HeLa)      down-regulated (g)

D: Successful PluriNet-based, post-hoc classification in human model systems

Sample                          Pluripotency
tPSC                            yes (h)
ePSC                            yes (h)
iPSC                            yes (h)
ePSC-derived cell types         yes (h)
Somatic cell types              yes (h)


(a) For more details see Supplementary Figure 8.

(b) For more details see Supplementary Figures 9 and 10.

(c) For more details see Supplementary Figure 10.

(d) For more details see Supplementary Figure 7.

(e) For more details see Figure 3B and Supplementary Figures 5 and 12.

(f) For more details see Supplementary Figure 11.

(g) For more details see Supplementary Discussion 2 (PluriNet and Cell Cycle).

(h) For more details see Supplementary Figure 12.

PAM: Prediction Analysis of Microarray classifier with leave-one-out cross validation.27

“yes” in Table 1B and 1D stands for: correct classification of pluripotent state (pluripotent or not pluripotent) in > 90% of samples.

This table summarizes the expression patterns of PluriNet in various model systems of pluripotence and differentiation. More details on the specific tests and explanations of the data sources for the results can be found in the respective Supplementary Figures and Materials listed above.
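The leave-one-out evaluation behind the “yes” entries in Table 1 can be illustrated with a minimal sketch. It uses a plain nearest-centroid classifier as a simplified stand-in for PAM; PAM additionally shrinks the class centroids toward the overall centroid, which is omitted here. The function names and the toy expression values in the usage note are hypothetical.

```python
def nearest_centroid_predict(train_X, train_y, x):
    """Predict the label whose class centroid is closest (squared Euclidean) to x."""
    sums, counts = {}, {}
    for row, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for g, v in enumerate(row):
            acc[g] += v
        counts[label] = counts.get(label, 0) + 1

    def dist(label):
        return sum((x[g] - sums[label][g] / counts[label]) ** 2 for g in range(len(x)))

    return min(sorted(sums), key=dist)


def loocv_accuracy(X, y):
    """Hold out each sample once, train on the rest, and report the fraction correct."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        correct += int(nearest_centroid_predict(train_X, train_y, X[i]) == y[i])
    return correct / len(X)
```

With rows holding PluriNet gene expression values per sample and labels such as "pluripotent" and "not pluripotent", a `loocv_accuracy` above 0.9 would correspond to the criterion stated in the table notes.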

In summary, our data indicate that an unbiased global molecular profiling approach combined with a transcriptional phenotype collection using suitable machine learning algorithms can be used to understand and codify the phenotypes of stem cells.4, 5, 25 Although it is more extensive than any stem cell dataset reported to date, we consider our database and the PluriNet to be a work in progress. As more direct evidence for protein-protein interactions in human cells becomes available, it will be possible to refine the networks we’ve defined and make them more useful for testing hypotheses about the nature of stem cell pluri- and multipotence. Also, our sample collection is limited to pluri- and multipotent stem cell types that grow well in culture, and does not include some of the most well-studied lineages, such as hematopoietic stem cells. Resolution and reliability of a context-based unsupervised classification can be expected to grow with the breadth and depth of the database content.26 Even with these limitations, we have shown that the dataset and PluriNet have already proved useful for categorizing cell types using unbiased criteria. As more stem cell populations become available, cultured by new methods, isolated from new sources, or induced by new methods, we will use the PluriNet and the Stem Cell Matrix as a reference system for phenotyping the cells and comparing them with existing cell lines.

Methods Summary

For an overview of the general workflow, please also refer to Figure 1. A detailed list of the samples, culture methods and reference publications is provided in the Supplementary materials.11 Generally, RNA from each sample was prepared from approximately 1 × 10^6 cultured cells. Sample amplification, labeling and hybridization on Illumina WG8 and WG6 Sentrix BeadChips were performed for all arrays in this study according to the manufacturer’s instructions (http://www.illumina.com) at a single Illumina BeadStation facility. We used the Consensus Clustering framework9 to cluster transcription profiles and to assess stability of the results. As the algorithm, we used sparse non-negative matrix factorization.5 For data perturbation, 30 sub-sampling runs were performed for each considered number of clusters (k). In each run, 80% of the data was subjected to ten random restarts. The R-script can be downloaded at the accompanying website www.stemcellmachinelearning.org. Details on the application of GSA,20 PAM,27 MATISSE6 as well as publicly available datasets used in this study can be found in the Methods section. We modified the MATISSE6 computational framework to fit the goals of this study. For the present analysis we used the human physical interaction network that we had previously assembled6 and augmented it with additional interactions from recent publications.21, 28, 29 The 64 interactions in Wang et al. 200621 were mapped to the corresponding human orthologs using the NCBI Homologene database. The microarray data have been deposited at NCBI GEO (GEO series accession number: GSE11508). They can also be accessed, processed and downloaded at www.stemcellmesa.org.

Supplementary Material


Supplementary Information is linked to the online version of the paper on www.nature.com/nature.

Acknowledgments

We thank Chris Stubban, Helga Dittmer, Svenja Zapf and Hildegard Meissner for their work with various cell cultures. We are grateful to Dustin Wakeman, Rodolfo Gonzalez, Scott McKercher, Jean Pyo Lee, Hyun-Sook Park, and Shin Yong Moon for sharing their cell preparations for the type collection. We are especially grateful to Robin Wesselschmidt and Martin Pera for their unique GCT lines and George Daley for providing human iPSCs. Arif Murat Kocabas and Jose Cibelli shared their human oocyte expression data with us. Aaron Barsky let us use the CEREBRAL 2.0 plug-in before its publication. Maggie Rosentraeger helped to compile the cell culture meta-data. We thank Josef Aldenhoff, Dunja Hinze-Selch, Manfred Westphal, Katrin Lamszus, Uwe Kehler, David Barker, and Anja Fritz for their support and discussions of this project.

Financial support: This study has been supported by the following grants and awards: Christian-Albrechts University Young Investigator Award (FJM), SFB-654/C5 Sleep and Plasticity (FJM and Dunja Hinze-Selch), Hamburger Krebsgesellschaft Grant (NOS), Edmond J. Safra Bioinformatics program fellowship at Tel-Aviv University (UI), Converging Technologies Program of The Israel Science Foundation Grant No 1767.07 (RS), Raymond and Beverly Sackler Chair in Bioinformatics (RS), Reproductive Scientist Development Program Scholar Award K12 5K12HD000849-20 (LL), California Institute for Regenerative Medicine Clinical Scholar Award (LL), NIH P20 GM075059-01 (JFL), the Alzheimer’s Association (JFL), and anonymous donations in support of stem cell research.

References

1. Müller FJ, Snyder EY, Loring JF. Gene therapy: can neural stem cells deliver? Nat Rev Neurosci. 2006;7:75–84. doi: 10.1038/nrn1829.
2. Murry CE, Keller G. Differentiation of embryonic stem cells to clinically relevant populations: lessons from embryonic development. Cell. 2008;132:661–80. doi: 10.1016/j.cell.2008.02.008.
3. Adewumi O, et al. Characterization of human embryonic stem cell lines by the International Stem Cell Initiative. Nat Biotechnol. 2007;25:803–16. doi: 10.1038/nbt1318.
4. Brunet JP, Tamayo P, Golub TR, Mesirov JP. Metagenes and molecular pattern discovery using matrix factorization. Proc Natl Acad Sci U S A. 2004;101:4164–9. doi: 10.1073/pnas.0308531101.
5. Gao Y, Church G. Improving molecular cancer class discovery through sparse non-negative matrix factorization. Bioinformatics. 2005;21:3970–5. doi: 10.1093/bioinformatics/bti653.
6. Ulitsky I, Shamir R. Identification of functional modules using network topology and high-throughput data. BMC Syst Biol. 2007;1:8. doi: 10.1186/1752-0509-1-8.
7. Carpenter MK, Rosler E, Rao MS. Characterization and differentiation of human embryonic stem cells. Cloning Stem Cells. 2003;5:79–88. doi: 10.1089/153623003321512193.
8. Goldman B. Magic Marker Myths. Nature Reports Stem Cells. 2008.
9. Monti S, Tamayo P, Mesirov J, Golub T. Consensus Clustering: A Resampling-Based Method for Class Discovery and Visualization of Gene Expression Microarray Data. Machine Learning. 2003;52:91–118.
10. Palmer TD, et al. Cell culture. Progenitor cells from human brain after death. Nature. 2001;411:42–3. doi: 10.1038/35075141.
11. Schwartz PH, et al. Isolation and characterization of neural progenitor cells from post-mortem human cortex. J Neurosci Res. 2003;74:838–51. doi: 10.1002/jnr.10854.
12. Kornblum HI, Geschwind DH. Molecular markers in CNS stem cell research: hitting a moving target. Nat Rev Neurosci. 2001;2:843–6. doi: 10.1038/35097597.
13. Takahashi K, Yamanaka S. Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell. 2006;126:663–76. doi: 10.1016/j.cell.2006.07.024.
14. Takahashi K, et al. Induction of pluripotent stem cells from adult human fibroblasts by defined factors. Cell. 2007;131:861–72. doi: 10.1016/j.cell.2007.11.019.
15. Yu J, et al. Induced Pluripotent Stem Cell Lines Derived from Human Somatic Cells. Science. 2007. doi: 10.1126/science.1151526.
16. Park IH, et al. Reprogramming of human somatic cells to pluripotency with defined factors. Nature. 2008;451:141–6. doi: 10.1038/nature06534.
17. Okita K, Ichisaka T, Yamanaka S. Generation of germline-competent induced pluripotent stem cells. Nature. 2007. doi: 10.1038/nature05934.
18. Zeng X, et al. BG01V: a variant human embryonic stem cell line which exhibits rapid growth after passaging and reliable dopaminergic differentiation. Restor Neurol Neurosci. 2004;22:421–8.
19. Cowan CA, et al. Derivation of embryonic stem-cell lines from human blastocysts. N Engl J Med. 2004;350:1353–6. doi: 10.1056/NEJMsr040330.
20. Efron B, Tibshirani R. On testing the significance of sets of genes. The Annals of Applied Statistics. 2007;1:107–129.
21. Wang J, et al. A protein interaction network for pluripotency of embryonic stem cells. Nature. 2006;444:364–8. doi: 10.1038/nature05284.
22. Wang QT, et al. A genome-wide study of gene activity reveals developmental signaling pathways in the preimplantation mouse embryo. Dev Cell. 2004;6:133–44. doi: 10.1016/s1534-5807(03)00404-0.
23. Chambers I, et al. Nanog safeguards pluripotency and mediates germline development. Nature. 2007;450:1230–4. doi: 10.1038/nature06403.
24. Tesar PJ, et al. New cell lines from mouse epiblast share defining features with human embryonic stem cells. Nature. 2007;448:196–9. doi: 10.1038/nature05972.
25. Golub TR, et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science. 1999;286:531–7. doi: 10.1126/science.286.5439.531.
26. Donoho D, Stodden V. When Does Non-Negative Matrix Factorization Give Correct Decomposition into Parts? Advances in Neural Information Processing Systems NIPS*2003 Online Papers. 2003.
27. Lacayo NJ, et al. Gene expression profiles at diagnosis in de novo childhood AML patients identify FLT3 mutations with good clinical outcomes. Blood. 2004;104:2646–54. doi: 10.1182/blood-2003-12-4449.
28. Ewing RM, et al. Large-scale mapping of human protein-protein interactions by mass spectrometry. Mol Syst Biol. 2007;3:89. doi: 10.1038/msb4100134.
29. Mishra GR, et al. Human protein reference database--2006 update. Nucleic Acids Res. 2006;34:D411–4. doi: 10.1093/nar/gkj141.
