Abstract
Much of the excitement generated by induced pluripotent stem cell technology concerns the possibility of disease modeling and the potential for personalized cell therapy. To pursue these goals, however, it is important to understand the ‘normal’ pluripotent state, including its inherent variability. We have performed various molecular profiling assays on 21 hESC lines and 8 hiPSC lines to generate a comprehensive snapshot of the undifferentiated state of pluripotent stem cells. In accordance with previous reports, analysis of the gene expression data revealed no iPSC-specific gene expression pattern. We further compared the undifferentiated cells with cells differentiated as embryoid bodies in 2 media proposed to initiate differentiation towards separate cell fates, and with 20 adult tissues. From this analysis we have generated a gene list which defines pluripotency and establishes a baseline for the pluripotent state. Finally, we provide lists of genes enriched under each differentiation condition, which show the proposed bias toward independent cell fates.
Introduction
As alternatives to human embryonic stem cells (hESCs), such as human induced pluripotent stem cells (hiPSCs) (Park et al., 2008; Takahashi et al., 2007; Takahashi and Yamanaka, 2006; Yu et al., 2007), are explored, an accurate definition of what constitutes pluripotency becomes important. Continued progress toward realizing the potential of human pluripotent stem cells will be facilitated by robust datasets and complementary resources that are easily accessed and interrogated by the stem cell community. Many genome-wide microarray expression studies have been performed on hESCs using a variety of different technologies (Bock et al., 2011; Chin et al., 2009; Liu et al., 2006; Muller et al., 2011; Rao et al., 2004; Skottman et al., 2005; Sperger et al., 2003; reviewed in Bhattacharya et al., 2009). To complement the existing data, we report here the establishment of the Human Pluripotent Stem Cell Database at the National Institutes of Health (NIH), StemCellDB, where we provide an in-house dataset of human pluripotent stem cells. StemCellDB provides data on all twenty-one hESC lines available on the pre-2008 NIH Human Pluripotent Stem Cell Registry and eight hiPSC lines derived in-house by retroviral transduction of human fibroblasts. To facilitate comparisons of gene expression data between human pluripotent stem cells, in both the undifferentiated and differentiated states, we have created a user-friendly search engine aimed at the casual user. This may be accessed directly at http://stemcelldb.nih.gov or through the ‘Searchable Databases’ link on the NIH Stem Cell Unit homepage, http://stemcells.nih.gov/research/nihresearch/scunit/. Here, a single gene portal allows users to examine individual genes for expression under all culture conditions.
To demonstrate the value of the database, we have compared the microarray gene expression profiles from undifferentiated and differentiated hESCs, as well as from 20 adult tissues, and provide a list of 169 gene probes that can be used to define pluripotency at the gene expression level. Although overall gene expression is similar in the hESC lines, reproducible differences in the expression of certain genes are observed. In addition to gene expression microarray data, StemCellDB provides access to data for single nucleotide polymorphism (SNP) genotyping, array-based comparative genomic hybridization (aCGH), miRNA array and DNA methylation analysis from matched samples (http://stemcelldb.nih.gov). The data may also be accessed through the NCBI GEO public database (Superseries number GSE34200). This facilitates interrogation and comparison of transcriptional regulation to advance our understanding of the pluripotent state. Taken together, the data deposited in StemCellDB constitute a benchmark reference data set which should be of great interest to the scientific community.
Materials and methods
Human ES cell culture
All culture reagents were acquired from Invitrogen unless stated otherwise. Standard culture conditions of 37 °C, 5% CO2 and 95% humidity were maintained for all cells. Cell lines used and their suppliers are listed in Table 1.
Table 1.
Cell line | Supplier name | Supplier | Passage # | Karyotype | Chr 12 & 17 FISH |
---|---|---|---|---|---|
BG01 | hESBGN-01 | BresaGen, Inc | 79 | Normal | Normal |
BG02 | hESBGN-02 | BresaGen, Inc | 54 | Normal | Normal |
BG03 | hESBGN-03 | BresaGen, Inc | ND | ND | ND |
ES01 | HES-1 | ES Cell International | 72 | Normal | ND |
ES02 | HES-2 | ES Cell International | 49 | Normal | Normal |
ES03 | HES-3 | ES Cell International | 88 | Normal | Normal |
ES04 | HES-4 | ES Cell International | 76 | Normal | Normal |
ES05 | HES-5 | ES Cell International | 59 | Normal | Normal |
ES06 | HES-6 | ES Cell International | 62 | Normal | Normal |
SA01 | Sahlgrenska-1 | Cellartis AB | 32 | Normal | Normal |
SA02 | Sahlgrenska-2 | Cellartis AB | 39 | Abnormal a | Normal |
TE03 | I3 | Technion – Israel Institute of Technology | 70 | Normal | Normal |
TE04 | I4 | Technion – Israel Institute of Technology | ND | ND | ND |
TE06 | I6 | Technion – Israel Institute of Technology | 64 | Abnormal b | Normal |
UC01 | HSF-1 | University of California, San Francisco | 64 | Normal | 1/200 trisomy 12 |
UC06 | HSF-6 | University of California, San Francisco | 59 | Normal | Normal |
UC06 | HSF-6 | University of California, San Francisco | 114 | Normal | Normal |
WA01 | H1 | WiCell Research Institute | 57 | Normal | Normal |
WA07 | H7 | WiCell Research Institute | 54 | Normal | 2/200 trisomy 17 |
WA09 | H9 | WiCell Research Institute | 45 | Normal | Normal |
WA13 | H13 | WiCell Research Institute | ND | ND | ND |
WA14 | H14 | WiCell Research Institute | 40 | Normal | ND |
NIH-i1 | Neonatal HFF | NIH/Vogel Lab | 16 | Normal | Normal |
NIH-i2 | AG20443 | Coriell | 24 | Normal | Normal |
NIH-i4 | AG20443 | Coriell | 21 | Abnormal c | Normal |
NIH-i5 | AG20443 | Coriell | 21 | Normal | Normal |
NIH-i7 | AG08395 | Coriell | 21 | Normal | Normal |
NIH-i11 | AG20443 | Coriell | 25 | Abnormal c | Normal |
NIH-i12 | AG08396 | Coriell | 21 | Normal | Normal |
NIH-i13 | AG08396 | Coriell | 18 | Normal | Normal |
a Trisomy 13 characteristic of this line.
b Nonclonal aberrations in 2/20.
c Balanced translocation present in the parent fibroblasts.
Human ES cells (hESCs) were cultured on a feeder layer of irradiated CF1 mouse embryonic fibroblasts (MEFs) in DMEM:F12 (Cat# 11330–032) containing 20% Knockout Serum Replacement (KSR) (Cat# 10828–028), 1 mM glutamine (Cat# 25030–081), 0.1 mM β-mercaptoethanol (β-ME; Sigma), 1× non-essential amino acids (NEAA; Cat# 11140–050) and 4 ng/ml bFGF (R&D Systems; Cat# 233-FB). Fibroblasts were cultured in DMEM (Cat# 11965–092) containing 10% fetal bovine serum (FBS) (Gemini Bio-products), 2 mM glutamine and 1× NEAA. Fibroblasts were irradiated with ∼6500 rads using a Faxitron RX650 X-irradiator. They were subsequently plated on Falcon 6-well tissue culture dishes, coated with 0.1% gelatin, at a density of 0.1875×10⁶/well. hESCs were plated in small clumps the following day, medium was exchanged every day and colonies were passaged by collagenase treatment every 3–4 days. Briefly, cultures were treated with 1.5 mg/ml collagenase IV for 20–40 min and either tapped sharply or scraped to dislodge colonies. Colonies were allowed to sediment for 5 min, the supernatant was removed and fresh medium was added. This process was repeated for a total of 3 sedimentations. At this point cells were triturated to generate colonies of approximately 10–100 cells for passaging or 50–250 cells for embryoid body (EB) formation. Embryoid bodies were cultured in fibroblast medium (FBS; EB_mesend) or in hESC medium without bFGF (EB_ecto) in 60 mm Corning Low Attachment dishes for a total of 8 days. Media were changed by sedimentation every 2 days. Importantly, the same lot of FBS was used for all studies.
Nucleic acid extractions
For Comparative Genomic Hybridization (CGH), genomic DNA was extracted using the Wizard Genomic DNA Purification Kit (Promega) according to the manufacturer's instructions.
For gene expression microarray analysis, RNA was extracted using a modification of the basic Trizol (Invitrogen) protocol. Briefly, 1 ml of Trizol was added to sedimented colonies or EBs and triturated to dissociate the cells. At this point the lysates were stored at −80 °C until all samples for that cell line were collected. Upon thawing, lysates were incubated at room temperature for 10 min, mixed with 200 μl chloroform and centrifuged in a Phase-Lock Gel (heavy) Eppendorf tube (Qiagen). RNA was precipitated from the aqueous phase by the addition of 250 μl of isopropanol and 250 μl of a high salt buffer (0.8 M sodium citrate and 1.2 M NaCl) followed by centrifugation. The RNA pellet was washed twice with 75% ethanol, dried and resuspended in nuclease-free water. RNA was treated with DNase for 20 min and the DNase removed using Ambion's DNA-Free kit. RNA concentration was determined using a NanoDrop ND-1000 UV–VIS spectrophotometer.
Array technologies
Global gene expression analysis was performed using Agilent human One Color Gene Expression Oligo arrays, reagents and software as previously described (Tesar et al., 2007). Comparative genomic hybridization and analysis was performed using Agilent software, reagents and arrays according to the manufacturer's instructions using 3 μg genomic DNA. Control male and female DNA was obtained from Promega. SNP analysis and methylation profiling were conducted by AGTC, Fairfax, VA using the Illumina Human1M-Duov3 and Human Methylation 27 k platforms respectively. MicroRNA arrays were performed using Agilent Human miRNA microarray kits, reagents and software.
Microarray data statistical analysis
The statistical programming language R (http://cran.r-project.org/) was used. Details are also shown in Supplemental Fig. 1. Raw expression measurements for all gene probes for all samples were log (base=2) transformed then quantile normalized. Quality of data was assured via sample-level inspection by Tukey box plot, covariance-based PCA scatter plot and correlation-based Heat Map. Raw expression measurements for samples deemed outliers were discarded and quantile normalization repeated. Gene probes not having at least one expression measurement greater than system noise post normalization were deemed “noise-biased” and discarded. System noise was defined as the lowest observed expression measurement at which the LOWESS (locally weighted scatterplot smoothing) fit of the CV (coefficient of variation) by mean for each gene probe for each class of samples (i.e., “ES undiff”, “ES EB_ecto”, “ES EB_mesend”) grossly deviates from linearity. For gene probes not discarded, expression measurements were floored to equal system noise if less than system noise and were then subject to the one-factor ANOVA (analysis of variance) under BH (Benjamini and Hochberg) FDR (false discovery rate) MCC (multiple comparison correction) condition. Gene probes with a corrected P-value <0.05 were deemed “potentially informative” and subject to the TukeyHSD (honestly significant difference) post-hoc test. Gene probes having a post-hoc P-value <0.05 and a difference of class means ≥1.50 for a specific comparison of classes were deemed to have expression “significantly different” between the two classes. For these gene probes, measurements were subsequently interrogated for association with processing time and/or differences in gender using PolySerial correlation and ANOVA respectively under BH FDR MCC condition (alpha<0.05). 
Those gene probes having measurements significantly associated with processing time were deemed “processing-biased” while gene probes having measurements significantly associated with differences in gender were deemed “gender-biased”. Annotations and associated functions for each gene probe were obtained using IPA (Ingenuity, Inc.).
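Two steps of the workflow described above, quantile normalization of log2-transformed measurements and Benjamini and Hochberg correction of P-values, can be illustrated with a minimal sketch. The analysis itself was performed in R; the Python code below is only an illustration under stated assumptions. The function names and the toy values are hypothetical, ties are broken by position rather than averaged, and the noise floor, ANOVA and TukeyHSD steps are not shown.

```python
import math


def quantile_normalize(columns):
    """Quantile-normalize samples (one list of probe values per sample).

    Each value is replaced by the mean, across samples, of the values that
    share its rank. Ties are broken by position (a simplification).
    """
    n_probes = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Mean of the k-th smallest value across all samples.
    rank_means = [
        sum(col[k] for col in sorted_cols) / len(columns)
        for k in range(n_probes)
    ]
    normalized = []
    for col in columns:
        order = sorted(range(n_probes), key=lambda i: col[i])
        out = [0.0] * n_probes
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


def benjamini_hochberg(pvalues):
    """Return BH-adjusted P-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy data (hypothetical): two samples, three probes each.
raw = [[4.0, 16.0, 64.0], [8.0, 32.0, 256.0]]
log2_data = [[math.log2(v) for v in sample] for sample in raw]
normalized = quantile_normalize(log2_data)

# Toy P-values (hypothetical), one per probe.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

After normalization both toy samples share the same distribution of values, which is the property that makes expression measurements comparable across arrays before the ANOVA is applied.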
Results and discussion
Comparison of hESCs and hiPSCs gene expression profiles
All twenty-one hESC lines available on the pre-2008 NIH Human Pluripotent Stem Cell Registry and eight human iPSC lines, derived in-house by retroviral transduction of human fibroblasts, were adapted to one standard culture protocol. The cells were expanded to assess their identity and genomic integrity. Short Tandem Repeat (STR) and single nucleotide polymorphism (SNP) genotyping confirmed that each line was genetically unique. Cytogenetic and array comparative genomic hybridization (aCGH) analysis showed that most cell lines have a normal chromosome complement (Table 1). In addition, flow cytometry demonstrated that nearly all cells expressed the pluripotency markers POU5F1 (Oct-4) and Tra-1-81. Quality control reports are available on our website, http://stemcelldb.nih.gov.
Covariance principal component analysis (PCA) and Pearson correlation of the gene expression microarray data indicated that hESCs and hiPSCs are grossly similar (class means > 0.865) in the undifferentiated and differentiated states (Figs. 1A and B). In no class was any gene found to be exclusively expressed by one population of pluripotent cells versus the other. Thus, in agreement with published reports (Guenther et al., 2010), we conclude by this measure that there is no absolutely unique gene expression profile that can be assigned to hESCs or hiPSCs.
Pluripotency-associated genes
We assessed the expression and regulation of pluripotency markers in hESCs only and generated a list of 489 gene probes which are down-regulated in both differentiation conditions (Supplemental Table 1). Of this list, 169 gene probes were found to be expressed in somatic tissues at a level less than the 5th percentile observed in hESCs and are designated markers of pluripotency (Table 2). Included in this “pluripotency” list are genes involved in maintenance of the pluripotent state such as POU5F1 and NANOG (Figs. 2A and B) as well as many components/targets of the TGFβ-superfamily signaling network including NODAL and TDGF1 (Figs. 2C and D). This is consistent with a requirement for Activin/Nodal signaling in the maintenance of hESCs as described previously (James et al., 2005; Vallier et al., 2004). Also in the “pluripotency” list are gene probes that have not been annotated at this time, raising the possibility of novel pluripotency-associated genes. A focused array of these 169 probes could possibly serve as a fingerprint for pluripotent stem cells.
Table 2.
GeneProbe | Gene | Gene_description |
---|---|---|
A_32_P74847 | LARP7 | La ribonucleoprotein domain family, member 7 |
A_24_P668974 | LARP7 | La ribonucleoprotein domain family, member 7 |
A_24_P383640 | POU5F1P3 | POU class 5 homeobox 1 pseudogene 3 |
A_32_P211752 | LOC100506507 | Hypothetical LOC100506507 |
A_32_P132563 | POU5F1 | POU class 5 homeobox 1 |
A_24_P144601 | POU5F1 | POU class 5 homeobox 1 |
A_24_P214841 | POU5F1 | POU class 5 homeobox 1 |
A_23_P327910 | ZIC3 | Zic family member 3 |
A_23_P140362 | VRTN | Vertebrae development homolog (pig) |
A_23_P204640 | NANOG | Nanog homeobox |
A_23_P25587 | LECT1 | Leukocyte cell derived chemotaxin 1 |
A_23_P329798 | CER1 | Cerberus 1, cysteine knot superfamily, homolog (Xenopus laevis) |
A_23_P59138 | POU5F1 | POU class 5 homeobox 1 |
A_23_P72817 | GDF3 | Growth differentiation factor 3 |
A_23_P380526 | DPPA4 | Developmental pluripotency associated 4 |
A_32_P135985 | TDGF1 | Teratocarcinoma-derived growth factor 1 |
A_23_P127322 | NODAL | Nodal homolog (mouse) |
A_23_P137484 | L1TD1 | LINE-1 type transposase domain containing 1 |
A_23_P374844 | GAL | Galanin prepropeptide |
A_23_P366376 | TDGF1 | Teratocarcinoma-derived growth factor 1 |
A_23_P216149 | TERF1 | Telomeric repeat binding factor (NIMA-interacting) 1 |
A_24_P357266 | GRPR | Gastrin-releasing peptide receptor |
A_32_P220696 | TERF1 | Telomeric repeat binding factor (NIMA-interacting) 1 |
A_23_P137573 | LEFTY2 | Left–right determination factor 2 |
A_24_P90022 | SEPHS1 | Selenophosphate synthetase 1 |
A_24_P192434 | TERF1 | Telomeric repeat binding factor (NIMA-interacting) 1 |
A_23_P207999 | PMAIP1 | Phorbol-12-myristate-13-acetate-induced protein 1 |
A_23_P102471 | MSH2 | MutS homolog 2, colon cancer, nonpolyposis type 1 (E. coli) |
A_24_P392475 | BPTF | Bromodomain PHD finger transcription factor |
A_23_P28153 | SCLY | Selenocysteine lyase |
A_23_P209337 | METTL21A | Methyltransferase like 21A |
A_24_P50458 | TERF1 | Telomeric repeat binding factor (NIMA-interacting) 1 |
A_23_P204246 | PHC1 | Polyhomeotic homolog 1 (Drosophila) |
A_23_P156310 | SKP2 (includes EG:27401) | S-phase kinase-associated protein 2 (p45) |
A_32_P137926 | MMS22L | MMS22-like, DNA repair protein |
A_23_P14821 | GABRB3 | Gamma-aminobutyric acid (GABA) A receptor, beta 3 |
A_32_P87531 | DNAH14 | Dynein, axonemal, heavy chain 14 |
A_23_P256142 | AKIRIN1 | Akirin 1 |
A_24_P162929 | METTL21A | Methyltransferase like 21A |
A_32_P741851 | GLB1L3 | Galactosidase, beta 1-like 3 |
A_24_P118452 | SEPHS1 | Selenophosphate synthetase 1 |
A_23_P47058 | CUZD1 | CUB and zona pellucida-like domains 1 |
A_24_P655268 | LOC729082 | Hypothetical LOC729082 |
A_24_P916586 | BICD1 | Bicaudal D homolog 1 (Drosophila) |
A_23_P156842 | EEF1E1 | Eukaryotic translation elongation factor 1 epsilon 1 |
A_23_P259127 | ESRP1 | Epithelial splicing regulatory protein 1 |
A_32_P76091 | HSPD1 | Heat shock 60 kDa protein 1 (chaperonin) |
A_24_P134727 | TFAM | Transcription factor A, mitochondrial |
A_23_P160336 | LEFTY1 | Left–right determination factor 1 |
A_24_P244699 | NUDT15 | Nudix (nucleoside diphosphate linked moiety X)-type motif 15 |
A_24_P52921 | BCAT1 | Branched chain amino-acid transaminase 1, cytosolic |
A_23_P214907 | MTHFD1L | Methylenetetrahydrofolate dehydrogenase (NADP+dependent) 1-like |
A_32_P213091 | SHISA9 | Shisa homolog 9 (Xenopus laevis) |
A_23_P323094 | PHC1 | Polyhomeotic homolog 1 (Drosophila) |
A_23_P82823 | PINX1 | PIN2/TERF1 interacting, telomerase inhibitor 1 |
A_23_P162256 | DENR | Density-regulated protein |
A_23_P365060 | MDN1 | MDN1, midasin homolog (yeast) |
A_23_P18818 | CNOT6 | CCR4-NOT transcription complex, subunit 6 |
A_23_P148484 | RLIM | Ring finger protein, LIM domain interacting |
A_23_P111373 | MRS2 (includes EG:380836) | MRS2 magnesium homeostasis factor homolog (S. cerevisiae) |
A_23_P203201 | DDX6 | DEAD (Asp-Glu-Ala-Asp) box polypeptide 6 |
A_23_P92410 | CASP3 | Caspase 3, apoptosis-related cysteine peptidase |
A_23_P216118 | UNC5D | Unc-5 homolog D (C. elegans) |
A_23_P214111 | KIF13A | Kinesin family member 13A |
A_23_P138465 | NOLC1 | Nucleolar and coiled-body phosphoprotein 1 |
A_23_P121423 | CDC25A | Cell division cycle 25 homolog A (S. pombe) |
A_23_P136504 | SLC25A21 | Solute carrier family 25 (mitochondrial oxodicarboxylate carrier), member 21 |
A_23_P73220 | FGD6 | FYVE, RhoGEF and PH domain containing 6 |
A_23_P421436 | ADD2 | Adducin 2 (beta) |
A_23_P23356 | RRP15 (includes EG:327053) | Ribosomal RNA processing 15 homolog (S. cerevisiae) |
A_32_P34826 | C21orf88 | Chromosome 21 open reading frame 88 |
A_24_P128977 | G3BP2 | GTPase activating protein (SH3 domain) binding protein 2 |
A_23_P405761 | RRAS2 | Related RAS viral (r-ras) oncogene homolog 2 |
A_23_P70168 | TARS | Threonyl-tRNA synthetase |
A_24_P415260 | DDX21 | DEAD (Asp-Glu-Ala-Asp) box polypeptide 21 |
A_24_P253215 | EMG1 | EMG1 nucleolar protein homolog (S. cerevisiae) |
A_23_P54834 | NIP7 | Nuclear import 7 homolog (S. cerevisiae) |
A_23_P155407 | RTP1 | Receptor (chemosensory) transporter protein 1 |
A_24_P297888 | MTAP | Methylthioadenosine phosphorylase |
A_23_P351215 | SKIL | SKI-like oncogene |
A_32_P1614 | LOC100506054 | Hypothetical LOC100506054 |
A_24_P213794 | CCRN4L | CCR4 carbon catabolite repression 4-like (S. cerevisiae) |
A_23_P10966 | GABRB3 | Gamma-aminobutyric acid (GABA) A receptor, beta 3 |
A_23_P160881 | SMPDL3B | Sphingomyelin phosphodiesterase, acid-like 3B |
A_23_P373119 | HMG4L | High mobility group box 3 pseudogene 1 |
A_23_P27167 | RNASEH1 | Ribonuclease H1 |
A_24_P49747 | HMGB3P24 | High mobility group box 3 pseudogene 24 |
A_23_P213908 | PHAX | Phosphorylated adaptor for RNA export |
A_23_P358417 | TIMM8A | Translocase of inner mitochondrial membrane 8 homolog A (yeast) |
A_24_P902052 | SNHG13 | Small nucleolar RNA host gene 13 (non-protein coding) |
A_24_P24685 | HMGB3P22 | High mobility group box 3 pseudogene 22 |
A_24_P13533 | LRR1 | Leucine rich repeat protein 1 |
A_23_P215484 | CCL26 | Chemokine (C–C motif) ligand 26 |
A_23_P252362 | MRPS30 | Mitochondrial ribosomal protein S30 |
A_24_P943922 | CACHD1 | Cache domain containing 1 |
A_32_P194264 | CHAC2 | ChaC, cation transport regulator homolog 2 (E. coli) |
A_24_P922606 | NUP160 | Nucleoporin 160 kDa |
A_23_P133216 | NLN | Neurolysin (metallopeptidase M3 family) |
A_23_P128991 | SLIRP | SRA stem-loop interacting RNA binding protein |
A_23_P56553 | METTL8 | Methyltransferase like 8 |
A_23_P355075 | CENPN | Centromere protein N |
A_23_P134008 | USP45 | Ubiquitin specific peptidase 45 |
A_23_P41255 | G3BP2 | GTPase activating protein (SH3 domain) binding protein 2 |
A_23_P145724 | C7orf16 | Chromosome 7 open reading frame 16 |
A_23_P87759 | EMG1 | EMG1 nucleolar protein homolog (S. cerevisiae) |
A_23_P56865 | DDX18 | DEAD (Asp-Glu-Ala-Asp) box polypeptide 18 |
A_24_P134626 | TXLNG | Taxilin gamma |
A_24_P234196 | RRM2 | Ribonucleotide reductase M2 |
A_23_P214354 | EXOC2 | Exocyst complex component 2 |
A_23_P5370 | RPRM | Reprimo, TP53 dependent G2 arrest mediator candidate |
A_24_P12573 | CCL26 | Chemokine (C–C motif) ligand 26 |
A_23_P72770 | USP44 | Ubiquitin specific peptidase 44 |
A_24_P272389 | LOC285216 | Methylenetetrahydrofolate dehydrogenase (NADP+dependent) 2, methenyltetrahydrofolate cyclohydrolase pseudogene |
A_23_P54540 | EIF2AK4 | Eukaryotic translation initiation factor 2 alpha kinase 4 |
A_24_P347624 | SNURF | SNRPN upstream reading frame |
A_24_P128085 | RC3H2 | Ring finger and CCCH-type domains 2 |
A_23_P102183 | PNO1 | Partner of NOB1 homolog (S. cerevisiae) |
A_32_P71788 | FKBP4 | FK506 binding protein 4, 59 kDa |
A_23_P204170 | TMPO | Thymopoietin |
A_32_P44775 | C9orf85 | Chromosome 9 open reading frame 85 |
A_23_P143958 | RPL22L1 | Ribosomal protein L22-like 1 |
A_24_P914479 | SNX5 | Sorting nexin 5 |
A_23_P427217 | JMJD1C | Jumonji domain containing 1C |
A_23_P204380 | GNPTAB | N-acetylglucosamine-1-phosphate transferase, alpha and beta subunits |
A_24_P344307 | PSME3 | Proteasome (prosome, macropain) activator subunit 3 (PA28 gamma; Ki) |
A_24_P100664 | MKKS | McKusick-Kaufman syndrome |
A_23_P218918 | FGF2 | Fibroblast growth factor 2 (basic) |
A_24_P314477 | TUBB2B | Tubulin, beta 2B |
A_24_P15754 | TOMM40 | Translocase of outer mitochondrial membrane 40 homolog (yeast) |
A_23_P37497 | MYO1E | Myosin IE |
A_24_P143843 | LOC729566 | Zinc finger and BTB domain containing 8 opposite strand pseudogene 1 |
A_24_P152404 | C10orf76 | Chromosome 10 open reading frame 76 |
A_23_P125001 | RAC3 | Ras-related C3 botulinum toxin substrate 3 (rho family, small GTP binding protein Rac3) |
A_23_P135063 | ||
A_24_P161773 | ||
A_24_P178523 | ||
A_24_P179646 | ||
A_24_P195286 | ||
A_24_P221285 | ||
A_24_P340659 | ||
A_24_P341106 | ||
A_24_P341731 | ||
A_24_P358302 | ||
A_24_P367326 | ||
A_24_P392505 | ||
A_24_P410000 | ||
A_24_P41189 | ||
A_24_P455060 | ||
A_24_P560332 | ||
A_24_P58597 | ||
A_24_P67063 | ||
A_24_P67681 | ||
A_24_P695223 | ||
A_24_P707102 | ||
A_24_P711050 | ||
A_24_P752362 | ||
A_24_P76142 | ||
A_24_P901084 | ||
A_24_P928765 | ||
A_32_P104334 | ||
A_32_P146320 | ||
A_32_P152696 | ||
A_32_P157504 | ||
A_32_P207147 | ||
A_32_P24068 | ||
A_32_P63086 | ||
A_32_P65691 | ||
A_32_P885123 | ||
A_32_P89049 | ||
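The two-stage selection that leads from the 489 down-regulated gene probes to the 169 pluripotency probes of Table 2 can be sketched as a simple filter. This is an illustrative sketch, not the analysis code used for the study. It assumes that a probe passes when its highest expression across somatic tissues lies below the 5th percentile of its expression in hESCs; the exact form of the comparison, the probe names and the values below are all hypothetical.

```python
def percentile(values, pct):
    """Linear-interpolation percentile (pct in 0..100) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * pct / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def pluripotency_markers(hesc, somatic, downregulated):
    """Keep down-regulated probes whose somatic expression stays below
    the 5th percentile of expression observed in hESCs (assumed rule)."""
    markers = []
    for probe in sorted(downregulated):
        threshold = percentile(hesc[probe], 5)
        if max(somatic[probe]) < threshold:
            markers.append(probe)
    return markers


# Hypothetical log2 expression values per probe.
hesc = {
    "probe_A": [10.0, 11.0, 12.0],
    "probe_B": [10.0, 11.0, 12.0],
    "probe_C": [9.0, 10.0, 11.0],
}
somatic = {
    "probe_A": [3.0, 4.0],    # low in all somatic tissues
    "probe_B": [11.0, 5.0],   # high in one somatic tissue
    "probe_C": [2.0, 3.0],    # low, but not down-regulated on differentiation
}
downregulated = {"probe_A", "probe_B"}

markers = pluripotency_markers(hesc, somatic, downregulated)
```

In this toy example only `probe_A` is retained: `probe_B` is expressed in a somatic tissue above the hESC threshold, and `probe_C` never entered the down-regulated list.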
Differentiation pathways in two embryoid body culture media
The differentiation conditions selected for embryoid body (EB) differentiation were designated EB_ecto, for ectodermal lineage, and EB_mesend, for mesendodermal lineage. We examined which genes changed under each condition to see if the differentiation media truly affected fate bias. We found 595 gene probes up-regulated in both conditions, 243 gene probes enriched in the EB_ecto condition, and 1086 gene probes enriched in the EB_mesend condition (Supplemental Table 2). Many genes encoding neurectodermal markers, such as PAX6, RAX (Figs. 2E and F), LHX2, and LMO1, are detected in the EB_ecto-enriched group, consistent with gene ontology analysis indicating roles for this group in nervous system development and function. The 1086 gene probes enriched in the EB_mesend group include many genes encoding mesendodermal markers, such as MYH6 and TNNT2 (Figs. 2G and H), as well as many HOX and hemoglobin genes. Gene ontology analysis of this group demonstrates roles in cardiogenesis, vasculogenesis and muscular development. Taken together, the gene probes found to be up-regulated in EB_ecto or EB_mesend can discriminate lineage differentiation.
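The grouping of up-regulated gene probes into those shared by both conditions and those specific to one condition can be pictured with set operations. The sketch below is only an approximation of the idea: it assumes that "enriched" probes are those up-regulated in one condition but not the other, whereas the groups reported here may instead derive from a direct statistical comparison of the two EB conditions. The function name and probe identifiers are hypothetical.

```python
def partition_upregulated(up_ecto, up_mesend):
    """Split up-regulated probe sets into shared and condition-specific groups."""
    shared = up_ecto & up_mesend
    return {
        "both": shared,
        "EB_ecto_only": up_ecto - shared,
        "EB_mesend_only": up_mesend - shared,
    }


# Hypothetical probe identifiers, for illustration only.
up_ecto = {"probe_1", "probe_2", "probe_3"}
up_mesend = {"probe_2", "probe_3", "probe_4", "probe_5"}

groups = partition_upregulated(up_ecto, up_mesend)
counts = {name: len(members) for name, members in groups.items()}
```

The three groups are disjoint by construction, so every up-regulated probe is counted exactly once.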
Using the StemCellDB gene expression search engine
The StemCellDB website is designed primarily to facilitate interrogation of the gene expression information by a casual user. Upon accessing the site, either directly at http://stemcelldb.nih.gov or through the ‘Searchable Databases’ link on the NIH Stem Cell Unit homepage http://stemcells.nih.gov/research/nihresearch/scunit/, an option to search the Agilent or Affymetrix datasets is presented. The Affymetrix dataset is a minor subset of the Agilent data which we have provided for comparison to other commonly available datasets. Selecting either dataset not only presents the casual user with various options to search for a gene of interest but also gives the advanced user access to all datasets through the GEO submissions link on the sidebar (Fig. 3A). Upon searching the preferred cell type or tissue data, multiple probes may be returned as hits, not all of which may give useful information depending on probe location and other factors. With the Agilent dataset, a pop-up menu gives a snapshot of the data spread for each probe to allow the user to select the most informative probes for further evaluation (Fig. 3B). Once a probe is selected, the data are available for download or for charting. The Agilent dataset provides median normalized data as well as quantile raw-based, quantile median normalized-based and quantile log2 raw-based data, which are used for our analysis (Fig. 3C). Using the drop-down menus, the dataset may be downloaded in PDF, MS-Excel or text format and charts plotted according to the desired data type. A more detailed tutorial for the use of the gene expression search engine may be found on the navigational sidebar (Fig. 3C).
We have also provided the quantile log2 gene expression data for the cell lines as a Microsoft Excel spreadsheet in Supplemental data (Supplemental Table 3).
Conclusion
Here we report the launch of StemCellDB, a database of molecular profiles which together provide a comprehensive snapshot of human embryonic stem cells in their undifferentiated state including general differentiation potential. As described in other studies, we find no iPSC-specific gene expression pattern under any of the three culture conditions. We have analyzed the data to provide a list of 169 gene probes, which may be used as a fingerprint of pluripotency, and show that 2 independent differentiation conditions can upregulate genes associated with different lineages. We have designed a user-friendly search engine to facilitate casual interrogation of the gene expression data. Together, these provide a useful resource for the stem cell community.
Acknowledgments
We would like to thank Dr. Pamela Gehron Robey for helpful discussions and Dr. Jeanette Beers for help with the fibroblast culture. This research was supported by the Intramural Research Program of the NIH.
Footnotes
Supplementary data to this article can be found online at http://dx.doi.org/10.1016/j.scr.2012.09.002.
References
- Bhattacharya B, Puri S, Puri RK. A review of gene expression profiling of human embryonic stem cell lines and their differentiated progeny. Curr Stem Cell Res Ther. 2009;4:98–106. doi: 10.2174/157488809788167409.
- Bock C, Kiskinis E, Verstappen G, Gu H, Boulting G, Smith ZD, Ziller M, Croft GF, Amoroso MW, Oakley DH, et al. Reference maps of human ES and iPS cell variation enable high-throughput characterization of pluripotent cell lines. Cell. 2011;144:439–452. doi: 10.1016/j.cell.2010.12.032.
- Chin MH, Mason MJ, Xie W, Volinia S, Singer M, Peterson C, Ambartsumyan G, Aimiuwu O, Richter L, Zhang J, et al. Induced pluripotent stem cells and embryonic stem cells are distinguished by gene expression signatures. Cell Stem Cell. 2009;5:111–123. doi: 10.1016/j.stem.2009.06.008.
- Guenther MG, Frampton GM, Soldner F, Hockemeyer D, Mitalipova M, Jaenisch R, Young RA. Chromatin structure and gene expression programs of human embryonic and induced pluripotent stem cells. Cell Stem Cell. 2010;7:249–257. doi: 10.1016/j.stem.2010.06.015.
- James D, Levine AJ, Besser D, Hemmati-Brivanlou A. TGFbeta/activin/nodal signaling is necessary for the maintenance of pluripotency in human embryonic stem cells. Development. 2005;132:1273–1282. doi: 10.1242/dev.01706.
- Liu Y, Shin S, Zeng X, Zhan M, Gonzalez R, Mueller FJ, Schwartz CM, Xue H, Li H, Baker SC, et al. Genome wide profiling of human embryonic stem cells (hESCs), their derivatives and embryonal carcinoma cells to develop base profiles of U.S. Federal government approved hESC lines. BMC Dev Biol. 2006;6:20. doi: 10.1186/1471-213X-6-20.
- Muller FJ, Schuldt BM, Williams R, Mason D, Altun G, Papapetrou EP, Danner S, Goldmann JE, Herbst A, Schmidt NO, et al. A bioinformatic assay for pluripotency in human cells. Nat Methods. 2011;8:315–317. doi: 10.1038/nmeth.1580.
- Park IH, Zhao R, West JA, Yabuuchi A, Huo H, Ince TA, Lerou PH, Lensch MW, Daley GQ. Reprogramming of human somatic cells to pluripotency with defined factors. Nature. 2008;451:141–146. doi: 10.1038/nature06534.
- Rao RR, Calhoun JD, Qin X, Rekaya R, Clark JK, Stice SL. Comparative transcriptional profiling of two human embryonic stem cell lines. Biotechnol Bioeng. 2004;88:273–286. doi: 10.1002/bit.20245.
- Skottman H, Mikkola M, Lundin K, Olsson C, Stromberg AM, Tuuri T, Otonkoski T, Hovatta O, Lahesmaa R. Gene expression signatures of seven individual human embryonic stem cell lines. Stem Cells. 2005;23:1343–1356. doi: 10.1634/stemcells.2004-0341.
- Sperger JM, Chen X, Draper JS, Antosiewicz JE, Chon CH, Jones SB, Brooks JD, Andrews PW, Brown PO, Thomson JA. Gene expression patterns in human embryonic stem cells and human pluripotent germ cell tumors. Proc Natl Acad Sci U S A. 2003;100:13350–13355. doi: 10.1073/pnas.2235735100.
- Takahashi K, Tanabe K, Ohnuki M, Narita M, Ichisaka T, Tomoda K, Yamanaka S. Induction of pluripotent stem cells from adult human fibroblasts by defined factors. Cell. 2007;131:861–872. doi: 10.1016/j.cell.2007.11.019.
- Takahashi K, Yamanaka S. Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell. 2006;126:663–676. doi: 10.1016/j.cell.2006.07.024.
- Tesar PJ, Chenoweth JG, Brook FA, Davies TJ, Evans EP, Mack DL, Gardner RL, McKay RD. New cell lines from mouse epiblast share defining features with human embryonic stem cells. Nature. 2007;448:196–199. doi: 10.1038/nature05972.
- Vallier L, Reynolds D, Pedersen RA. Nodal inhibits differentiation of human embryonic stem cells along the neuroectodermal default pathway. Dev Biol. 2004;275:403–421. doi: 10.1016/j.ydbio.2004.08.031.
- Yu J, Vodyanik MA, Smuga-Otto K, Antosiewicz-Bourget J, Frane JL, Tian S, Nie J, Jonsdottir GA, Ruotti V, Stewart R, et al. Induced pluripotent stem cell lines derived from human somatic cells. Science. 2007;318:1917–1920. doi: 10.1126/science.1151526.