Physiological Genomics. 2011 Aug 9;43(20):1117–1134. doi: 10.1152/physiolgenomics.00099.2011

Transcriptome profiling and sequencing of differentiated human hematopoietic stem cells reveal lineage-specific expression and alternative splicing of genes

Poching Liu 1, Jennifer Barb 2, Kimberly Woodhouse 1, James G Taylor VI 3, Peter J Munson 2, Nalini Raghavachari 1
PMCID: PMC3217327  PMID: 21828245

Abstract

Hematopoietic differentiation is strictly regulated by a complex network of transcription factors that are controlled by ligands binding to cell surface receptors. Disruption of the intricate sequence of transcriptional activation and suppression of multiple genes causes hematological diseases such as leukemias, myelodysplastic syndromes, and myeloproliferative syndromes. From a clinical standpoint, deciphering the pattern of gene expression during hematopoiesis may help unravel disease-specific mechanisms in hematopoietic malignancies. Herein, we describe a human in vitro hematopoietic model system in which lineage-specific differentiation of CD34+ cells was achieved with specific cytokines. Microarray- and RNAseq-based whole transcriptome and exome analyses were performed on the differentiated erythropoietic, granulopoietic, and megakaryopoietic cells to delineate changes in the expression of whole transcripts and exons. Analysis on Human Exon 1.0 ST arrays indicated differential expression of 172 genes (P < 0.0000001) and significant alternative splicing of 86 genes during differentiation. Pathway analysis identified these genes as involved in Rac/RhoA signaling, Wnt/β-catenin signaling, and alanine/aspartate metabolism. Comparison of the microarray data with next-generation RNAseq analysis during erythroid differentiation demonstrated a high degree of correlation in gene (R = 0.72) and exon (R = 0.62) expression. Our data provide a molecular portrait of the events that regulate the differentiation of hematopoietic cells. Knowledge of the molecular processes by which cells acquire their specific fate would be beneficial in developing cell-based therapies for human diseases.

Keywords: hematopoiesis, differentiation, microarrays, gene expression, splicing, qPCR, RNAseq


Recent developments in stem cell biology have generated much excitement about the potential of regenerative medicine and cell-based therapies in a variety of clinical applications, such as treating Parkinson's disease, leukemia, and spinal cord injuries (23). Crucial to the success of these applications is a detailed understanding of how cells remain stem cells and of the cues they require to differentiate and commit to specific cell fates. Because hematopoietic stem cells are a particularly interesting class of stem cells and a well-characterized differentiation system, a number of studies have recently sought to decipher their genetic program both in culture and in vivo (22).

Hematopoiesis is the process by which all the cell lineages that form the blood and immune system are generated from a common pluripotent stem cell (28, 35, 36). A complex interplay between the intrinsic genetic processes of hematopoietic cells and their environment, including the effects of specific cytokines such as interleukins and granulocyte/monocyte stimulating factors, determines whether stem cells, lineage-specified progenitors, and mature blood cells self-renew, remain quiescent, proliferate, differentiate, or undergo apoptosis (1, 6–8, 11, 16–18, 24, 34, 37). Aberrant hematopoiesis has catastrophic consequences, as described in diseases such as leukemia and lymphoma (10, 12, 19, 21, 25, 26, 29). Hence, understanding the nature of hematopoietic stem cells, and the molecular processes by which these cells acquire their specific fates, is crucial both for understanding disease pathogenesis and for the success of cell-based therapies.

Because the phenotype of any given cell is ultimately determined by the genes it expresses, it is critical to identify gene expression patterns during lineage-specific differentiation. Splicing of multiexon genes is also believed to be an important source of transcriptome diversity in differentiated cells. Alternative splicing is thought to regulate differentiation by coordinating gene networks, each of which controls a different cell function (4). Current studies of hematopoiesis have mostly examined gene-level changes in expression and have not been extended to alternative splicing events.

Advances in genomic technologies, in the form of GeneChip Human Exon 1.0 ST arrays and massively parallel sequencing, now provide platforms for studying the exome profile of cells. The current study exploits these microarray and sequencing technologies to elucidate global changes in whole transcriptome and exome expression during ex vivo lineage-specific hematopoietic cell differentiation.

MATERIALS AND METHODS

Human Granulocyte Colony-stimulating Factor-mobilized CD34+ Peripheral Blood Cells

Human granulocyte colony-stimulating factor (G-CSF)-mobilized CD34+ peripheral blood cells (CD34+ PBCs) were collected by apheresis from healthy volunteers who were given 5 days of G-CSF (10 μg/kg per day). After CD34+ antigen-mediated selection with immunomagnetic beads (ISOLEX300i system; Baxter Healthcare, Deerfield, IL), purified CD34+ PBCs were collected and preserved in liquid nitrogen until use.

Suspension Cultures and Growth Factors

CD34+ PBCs were cultured in X-VIVO10 (BioWhittaker, Walkersville, MD) supplemented with 1% human serum albumin. At least 1 × 10^6 CD34+ cells were cultured in six-well plates and incubated at 37°C in a fully humidified atmosphere of 5% CO2 in air. Lineage-specific differentiation of CD34+ cells was induced as described by Komor et al. (15), by adding the following growth factors (R&D Systems, Wiesbaden-Nordenstadt, Germany): stem cell factor (SCF, 50 ng/ml), Flt3-ligand (50 ng/ml), IL-3 (10 ng/ml), and erythropoietin (10 U/ml) for erythropoietic differentiation; SCF (50 ng/ml), Flt3-ligand (50 ng/ml), IL-3 (10 ng/ml), G-CSF, and granulocyte/macrophage colony-stimulating factor (each 10 ng/ml) for granulopoietic differentiation; and SCF (50 ng/ml), Flt3-ligand (50 ng/ml), and thrombopoietin (20 ng/ml) for megakaryopoietic differentiation. Differentiated cells were harvested on day 11 and purified with the MACS immunomagnetic bead system, using CD71 microbeads for the erythropoietic (E) group, CD15 microbeads for granulopoietic (G) cells, and CD61 microbeads for megakaryopoietic (M) cells (15).

Flow Cytometry

Uninduced CD34+ cells and cultured cells were characterized by dual-color immunofluorescence on a BD FACSCanto flow cytometer. E cells were identified by staining with an anti-CD71 FITC antibody, megakaryocytic cells with an anti-CD61 FITC antibody, and G cells with an anti-CD15 FITC antibody. Isotype-matched nonspecific antibodies served as controls. Analysis gates were set to exclude dead cells and debris, and 10,000 viable cells were analyzed per sample. The morphology of the flow-sorted differentiated cells was examined by Diff-Quik staining (Dade Behring, Newark, DE) following the manufacturer's protocol. Micrographs were taken with a Leica microscope using a ×10 objective lens.

RNA Isolation

Total RNA was extracted using RNeasy mini kit (Qiagen, Valencia, CA) following the manufacturer's directions. Genomic DNA was removed by using the gDNA eliminator spin columns. The concentration of the isolated RNA was determined using the Nanodrop ND-100 spectrophotometer (Nanodrop Technologies, Wilmington, DE). Quality and integrity of the total RNA isolated were assessed on the Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, CA).

Target Preparation and Hybridization to Human Exon 1.0 ST Arrays

Samples were labeled for hybridization to the Exon arrays according to the Affymetrix GeneChip Whole Transcript Sense Target Labeling Assay. One microgram of total RNA was subjected to ribosomal RNA reduction following Invitrogen's RiboMinus procedure. Double-stranded cDNA was synthesized using random hexamers incorporating a T7 promoter sequence, and cRNA was generated from it by in vitro transcription. Labeled single-stranded sense-strand cDNA was then generated from the cRNA and used for array hybridization. Hybridization was performed overnight at 45°C, followed by washing and staining on an FS450 Fluidics Station. Arrays were scanned on the 7G GCS3000 scanner.

Microarray Data Collection and Annotation

Exon-level core robust multiarray average (RMA)-sketch intensity values for each of the 12 chips were collected using Affymetrix Expression Console (EC) software (Affymetrix, Santa Clara, CA). The 287,329 core probe-sets from EC were mapped to genomic coordinates on human genome build hg18 using the Affymetrix probe-set selection regions (downloaded from EC, March 2008). These probe-sets were then mapped to genes and exons using a reduced RefSeq gene model (hg19, downloaded from http://genome.ucsc.edu). A gene model consists of all possible exons across the set of RefSeq transcripts for a single gene; it is needed to test for alternative splicing with the statistical approach described below. Specifically, the reduced gene model consists of a set of nonoverlapping intervals representing the possible exons within a gene. When more than one core probe-set maps to the same RefSeq exon interval, their RMA intensities are averaged. This process yields 174,792 distinct exons grouped into 17,457 genes.
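The interval reduction and probe-set averaging described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes that overlapping exons from all RefSeq transcripts of a gene are merged into their union, that a probe-set is assigned to the interval containing the midpoint of its selection region, and that coordinates are half-open integers on one strand. The function names and coordinates are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def reduce_gene_model(exons):
    """Merge exons from all RefSeq transcripts of one gene into
    non-overlapping intervals (assumption: the union of overlapping exons).
    `exons` holds half-open (start, end) coordinates on a single strand."""
    merged = []
    for start, end in sorted(exons):
        if merged and start < merged[-1][1]:   # overlaps the previous interval
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(interval) for interval in merged]


def exon_intensities(intervals, probesets):
    """Average the RMA values of all probe-sets whose midpoint falls in an
    interval; intervals without probe-sets are omitted.
    `probesets` holds (start, end, rma_value) tuples."""
    hits = defaultdict(list)
    for ps_start, ps_end, value in probesets:
        midpoint = (ps_start + ps_end) / 2
        for idx, (start, end) in enumerate(intervals):
            if start <= midpoint < end:
                hits[idx].append(value)
                break
    return {intervals[idx]: mean(values) for idx, values in hits.items()}


# Hypothetical gene with two transcripts whose exons partly overlap.
exons = [(100, 200), (300, 400), (150, 250), (300, 380), (500, 600)]
intervals = reduce_gene_model(exons)
values = exon_intensities(
    intervals, [(110, 140, 8.0), (200, 240, 9.0), (320, 360, 6.5), (700, 720, 5.0)]
)
print(intervals)  # [(100, 250), (300, 400), (500, 600)]
print(values)     # {(100, 250): 8.5, (300, 400): 6.5}
```

Merging by union is only one way to build nonoverlapping intervals; splitting exons at every internal boundary would give finer intervals and is an equally plausible reading of the text.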

The data have been deposited in the Gene Expression Omnibus (accession no. GSE29989).

Principal Component Analysis

Data from the 12 samples, after reduction to the RefSeq-modeled genes and exons, were subjected to principal component analysis (PCA) to detect outliers. The four conditions (CD34+, E cells, granulocytes, megakaryocytes) were studied in triplicate. The bi-plot of PC1 vs. PC2, accounting for 70% of the variability, revealed four distinct clusters corresponding to the four conditions.
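For readers unfamiliar with the computation, a PCA of this kind can be obtained from the singular value decomposition of the column-centred samples × features matrix. The sketch below is a generic illustration on simulated data (4 conditions × 3 replicates, 200 hypothetical features), not the JMP analysis used in the study; an outlying array would appear away from its replicates in the PC1 vs. PC2 plot.

```python
import numpy as np


def pca(data):
    """PCA of a samples x features matrix via SVD of the column-centred data.
    Returns sample scores (one column per PC) and the fraction of total
    variance explained by each PC."""
    x = np.asarray(data, dtype=float)
    x = x - x.mean(axis=0)                  # centre every exon/gene
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u * s                          # sample coordinates on each PC
    explained = s**2 / np.sum(s**2)
    return scores, explained


# Simulated stand-in for 12 arrays: 4 conditions x 3 replicates, 200 features.
rng = np.random.default_rng(0)
condition_means = rng.normal(0.0, 3.0, size=(4, 200))
data = np.repeat(condition_means, 3, axis=0) + rng.normal(0.0, 0.5, size=(12, 200))
scores, explained = pca(data)
print(np.round(explained[:2].sum(), 2))    # share of variance in PC1 + PC2
```

Plotting `scores[:, 0]` against `scores[:, 1]` gives the bi-plot; with four well-separated conditions, nearly all variance falls in the first three PCs.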

Analysis of Exon Arrays by ExonSVD

Unlike older 3′ IVT arrays, Affymetrix Exon arrays can detect alternative splicing because each exon of a gene is probed separately. The ExonANOVA, a three-factor nested mixed-effect ANOVA, has been widely used to analyze exon arrays (http://www.partek.com, http://www.partek.com/html/products/pdf/Brochures/). The ExonANOVA model fits the following formula

$$y_{ijk} = \mu + A_i + (AC)_{ik} + C_k + \beta_{j(i)} + \varepsilon_{ijk}$$

to the data.

The two fixed factors are the treatment effect (A) and the exon effect (C). The random factor (β) is the sample-within-treatment effect. The fixed interaction between treatment and exon (AC) determines whether an alternative splicing event has occurred. This model makes a strong assumption of additivity, which may fail if the probe-set "sensitivity" is low and the data fall in the background range, or if the values become "saturated" at the high end of the data range. These two situations describe a dead and an unresponsive probe-set, respectively, but there may be other causes of nonadditivity. Probe-sets exhibiting this behavior can strongly affect the alternative splicing signal and thus falsely inflate the number of detected alternative splicing events. The ANOVA model additionally assumes independence of cases and homogeneity of variance; these assumptions are generally addressed by proper experimental design and data transformation.
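The additivity problem can be made concrete with a small numerical sketch. For a balanced design, the interaction estimates (AC)_ik are the doubly centred treatment × exon cell means (averaged over replicates). In the hypothetical gene below there is no splicing at all, but one probe-set is dead (zero sensitivity), and the additive model still assigns it a sizeable interaction. All values are invented for illustration, and this is not the Partek implementation.

```python
import numpy as np


def anova_interaction(cell_means):
    """Treatment x exon interaction estimates (AC)_ik for a balanced design:
    double-centre the matrix of cell means (treatments in rows, exons in columns)."""
    m = np.asarray(cell_means, dtype=float)
    return (m - m.mean(axis=1, keepdims=True)   # remove treatment (row) effects
              - m.mean(axis=0, keepdims=True)   # remove exon (column) effects
              + m.mean())                       # add back the grand mean


# Hypothetical gene: no splicing, but exon 4's probe-set is dead (stuck in background).
treatment = np.array([0.0, 2.0, -1.0, 1.0])        # log2 effects for CD34+, E, G, M
sensitivity = np.array([1.0, 1.0, 1.0, 0.0, 1.0])  # 0 = unresponsive probe-set
baseline = np.array([8.0, 7.5, 9.0, 3.0, 8.2])     # exon effects C_k
means = baseline + np.outer(treatment, sensitivity)

interaction = anova_interaction(means)
print(np.round(np.abs(interaction).max(), 2))  # 1.2
```

The spurious interaction here equals (treatment effect minus its mean) × (sensitivity minus its mean), so it grows with both the size of the expression change and the deviation of the dead probe-set from typical sensitivity.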

We introduce a new analytical method, termed the ExonSVD model (30), which aims to overcome the limitations inherent in the ExonANOVA. The ExonSVD model

$$y_{ijk} = \mu + A'_i D_k + E_{ik} + C_k + \beta_{j(i)} + \varepsilon_{ijk}$$

has three new parameters compared with the ExonANOVA: the A and AC terms of the ExonANOVA model are combined and then split into A′, the treatment effect; D, the probe-set sensitivity; and E, the residual, or deviation from the simple model. The E term is tested for significance to determine whether alternative splicing has occurred. Because a dead or unresponsive probe-set is captured by its sensitivity D rather than by the residual E, this model removes the need to detect and discard such probe-sets. P values for the E term (P-E) were generated by numerical simulation: rational polynomials were fitted to the simulated sum-of-squares curves to estimate effective degrees of freedom, yielding a statistic with an approximate F distribution and thus approximate P values.
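One way to read the A′_i D_k + E_ik decomposition is as a rank-1 singular value decomposition of the treatment × exon matrix of cell means after the exon effects have been removed. The sketch below assumes exactly that; the published ExonSVD fitting procedure (30) may differ in detail, for example in how replicates and the random β term are handled. Only the product A′_i D_k is identified, so the sign and scale split between A′ and D is arbitrary here. In the hypothetical gene, a dead probe-set leaves essentially no residual, whereas an exon skipped in one lineage does.

```python
import numpy as np


def exon_svd(cell_means):
    """Split treatment x exon cell means into A'_i * D_k + E_ik after removing
    mu + C_k by column centring. Returns (A', D, E)."""
    m = np.asarray(cell_means, dtype=float)
    r = m - m.mean(axis=0, keepdims=True)       # removes mu + C_k
    u, s, vt = np.linalg.svd(r, full_matrices=False)
    a_prime = u[:, 0] * s[0]                    # treatment effect (scale shared with D)
    d = vt[0, :]                                # per-exon probe-set sensitivity
    residual = r - np.outer(a_prime, d)         # E_ik: what the rank-1 model misses
    return a_prime, d, residual


# Hypothetical gene: rows CD34+, E, G, M; exon 4's probe-set is dead.
treatment = np.array([0.0, 2.0, -1.0, 1.0])
sensitivity = np.array([1.0, 1.0, 1.0, 0.0, 1.0])
baseline = np.array([8.0, 7.5, 9.0, 3.0, 8.2])
dead_probe = baseline + np.outer(treatment, sensitivity)

spliced = dead_probe.copy()
spliced[1, 2] -= 2.0                            # exon 3 skipped in E cells only

_, _, e_dead = exon_svd(dead_probe)
_, _, e_spliced = exon_svd(spliced)
print(np.linalg.norm(e_dead))     # ~0: the dead probe-set is absorbed by D_k
print(np.linalg.norm(e_spliced))  # clearly non-zero: candidate splicing event
```

In the study, significance of E is judged with the simulated P-E values rather than by a raw residual norm; the norm is used here only to show where the signal ends up.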

Selection of Lineage-specific and Differentially Expressed Genes

Using the ExonSVD model, differentially expressed genes were identified from the P value for the A′ parameter (differential expression) combined with a fold-change cutoff: P ≤ 10^−7 and a fold change greater than twofold in at least one of the three comparisons (E vs. CD34+, M vs. CD34+, G vs. CD34+). Additionally, a gene was defined as "specifically upregulated" in a particular cell type (E, M, or G) relative to CD34+ if it was significant at P < 10^−4, its largest observed change occurred in that cell type and exceeded twofold, and neither of the other two cell types showed an upward change >1.4-fold.
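The two selection rules above can be written as simple predicates. This is a hedged sketch: it assumes fold changes are expressed as log2 ratios vs. CD34+ (so twofold corresponds to 1.0 and 1.4-fold to log2 1.4), that one P value for the A′ term is available per gene, and that "largest observed change" means the largest upward change. The function names and example values are hypothetical.

```python
import math

LINEAGES = ("E", "M", "G")
LOG2_TWOFOLD = 1.0
LOG2_1_4_FOLD = math.log2(1.4)


def is_differentially_expressed(p_a, log2_fc):
    """P-A <= 1e-7 and a more than twofold change in at least one comparison vs. CD34+."""
    return p_a <= 1e-7 and any(abs(log2_fc[k]) > LOG2_TWOFOLD for k in LINEAGES)


def specifically_upregulated(p_a, log2_fc):
    """Return the lineage in which a gene is specifically upregulated, or None."""
    if p_a >= 1e-4:
        return None
    top = max(LINEAGES, key=lambda k: log2_fc[k])  # largest upward change
    if log2_fc[top] <= LOG2_TWOFOLD:
        return None
    others = [log2_fc[k] for k in LINEAGES if k != top]
    return top if all(fc <= LOG2_1_4_FOLD for fc in others) else None


print(is_differentially_expressed(1e-8, {"E": -1.5, "M": 0.2, "G": 0.1}))  # True
print(specifically_upregulated(1e-6, {"E": 2.5, "M": -0.2, "G": 0.3}))     # E
print(specifically_upregulated(1e-6, {"E": 2.5, "M": -0.2, "G": 0.8}))     # None
```

The third call returns None because G rises by 2^0.8, about 1.74-fold, which exceeds the 1.4-fold ceiling for the non-target lineages.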

Identification of Alternatively Spliced Genes

Alternatively spliced genes were defined as those with a P-E (P value for alternative splicing) of ≤10^−7 and a largest deviation of twofold or more, where the largest deviation is $\max_{i,k} \lvert E_{i,k} - E_{\mathrm{CD34^+},k} \rvert$, with i ranging over E, G, and M and k ranging over the exons in the gene.
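A minimal sketch of this criterion, assuming the residuals E are on a log2 scale so that a twofold deviation corresponds to 1.0, and that CD34+ occupies one row of a lineage × exon matrix. The residual values and function names are hypothetical.

```python
import numpy as np


def largest_deviation(e, reference_row=0):
    """max over lineages i and exons k of |E[i, k] - E[CD34+, k]|, where `e` is a
    lineage x exon matrix of ExonSVD residuals with CD34+ in `reference_row`."""
    e = np.asarray(e, dtype=float)
    others = np.delete(e, reference_row, axis=0)
    return float(np.max(np.abs(others - e[reference_row])))


def is_alternatively_spliced(p_e, e, reference_row=0):
    # Assumes log2-scale residuals, so "twofold" corresponds to a deviation of 1.0.
    return p_e <= 1e-7 and largest_deviation(e, reference_row) >= 1.0


# Hypothetical residuals; rows CD34+, E, G, M; columns are exons.
e = [[0.1, -0.1, 0.0],
     [0.2, -1.3, 0.1],
     [0.0, 0.1, -0.1],
     [-0.1, 0.2, 0.0]]
print(round(largest_deviation(e), 2))       # 1.2
print(is_alternatively_spliced(1e-9, e))    # True
```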

Next Generation Sequencing RNA Transcript Analysis on SOLiD Sequencer

Library preparation.

Total RNA (2 μg) from CD34+ and E cells was depleted of rRNA and enzymatically fragmented with 1 unit of RNase III (Ambion) by incubation at 37°C for 10 min. The fragmented RNA was size-selected with the flashPAGE fractionator (Ambion) to collect fragments of ∼50 to 150 nucleotides. The RNA fragments were then ligated to adaptors, converted into cDNA, and amplified by 15 cycles of PCR using the SOLiD RNA Expression Kit (Ambion). The PCR reactions were purified with the Qiagen MinElute PCR purification kit and separated on a native Novex 6% TBE polyacrylamide gel (Invitrogen). PCR products of ∼150 to 200 bp (corresponding to RNA insert sizes of ∼60–110 nucleotides) were excised from the gel, eluted overnight, and precipitated. The gel-purified material was quantitated by Nanodrop and prepared for emulsion PCR and sequencing on an Applied Biosystems SOLiD sequencer (version 3.0; Applied Biosystems, Carlsbad, CA).

Sequence read processing and alignment.

mRNA-Seq reads were analyzed with Applied Biosystems' whole transcriptome software tools (http://www.solidsoftwaretools.com/). The 50-base reads from each sample were first aligned to the human genome (US National Center for Biotechnology Information Build 36.3) using Applied Biosystems' SOLiD System Analysis Pipeline Tool (Bioscope). The aligned reads were mapped to RefSeq exons downloaded from the UCSC Genome Browser (human genome build hg18), and reads per kilobase per million mapped reads (RPKM) values were obtained for each RefSeq exon. RPKM values were computed at the exon, gene, and transcript levels and can be regarded as a normalized expression level (E) based on the read count across the region of interest.

RPKM is calculated as

$$\mathrm{RPKM} = \frac{10^9 \, C}{N L}$$

where C is the number of reads falling in the region (exon, gene, or transcript), N is the total number of mapped reads, and L is the length of the region in bases.
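As a worked example of the formula, with hypothetical numbers: 500 reads on a 1,000-bp exon in a library of 20 million mapped reads gives 500 reads per kilobase divided by 20 million-read units, i.e., an RPKM of 25.

```python
def rpkm(count, total_mapped_reads, length_bp):
    """Reads per kilobase per million mapped reads: 1e9 * C / (N * L)."""
    return 1e9 * count / (total_mapped_reads * length_bp)


# Hypothetical exon: 500 reads, 1,000 bp long, 20 million mapped reads in the library.
print(rpkm(500, 20_000_000, 1_000))  # 25.0
```

The factor 10^9 is simply 10^3 (per kilobase) × 10^6 (per million reads).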

RPKM values for each of two CD34+ (CD34+_1, CD34+_2) and two erythropoietic (E_1, E_2) samples were obtained for further analysis with the microarray data.

RNAseq and Microarray Data Correlation Analysis

To compare the RNAseq data with the microarray data at the exon level, each RefSeq exon RPKM value was matched with the corresponding reduced-model exon RMA value from the microarray (see Microarray Data Collection and Annotation), yielding a total of 149,765 RNAseq-microarray value pairs. Correlations between the two methods were computed on exon-level fold changes. Gene-level fold changes were determined by averaging the log fold-change values across the exons of each gene.
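The pairing, correlation, and gene-level averaging steps can be sketched as follows. This sketch makes several assumptions not stated in the text: both platforms are compared as E vs. CD34+ log2 fold changes, RMA values are taken to be already on a log2 scale, RPKM values are log-transformed after adding a pseudocount of 1 to handle zero counts, and Pearson correlation is used. The gene labels and values are hypothetical; in the study each of the 149,765 pairs would be one exon.

```python
from collections import defaultdict

import numpy as np


def exon_fold_changes(pairs, pseudocount=1.0):
    """pairs: (gene, rpkm_cd34, rpkm_e, rma_cd34, rma_e) per matched exon.
    Returns gene labels and E vs. CD34+ log2 fold changes for both platforms."""
    genes = [p[0] for p in pairs]
    rpkm = np.array([p[1:3] for p in pairs], dtype=float)
    rma = np.array([p[3:5] for p in pairs], dtype=float)
    seq_fc = np.log2(rpkm[:, 1] + pseudocount) - np.log2(rpkm[:, 0] + pseudocount)
    array_fc = rma[:, 1] - rma[:, 0]               # RMA is already log2
    return genes, seq_fc, array_fc


def gene_level(genes, exon_fc):
    """Gene-level fold change = mean of the exon log fold changes of each gene."""
    by_gene = defaultdict(list)
    for gene, fc in zip(genes, exon_fc):
        by_gene[gene].append(fc)
    return {gene: float(np.mean(fcs)) for gene, fcs in by_gene.items()}


pairs = [
    ("GENE1", 3.0, 63.0, 6.0, 10.5),
    ("GENE1", 1.0, 31.0, 5.5, 9.0),
    ("GENE2", 0.0, 15.0, 4.0, 7.5),
    ("GENE3", 40.0, 4.0, 10.0, 7.0),
    ("GENE3", 30.0, 2.0, 9.5, 6.0),
]
genes, seq_fc, array_fc = exon_fold_changes(pairs)
exon_r = np.corrcoef(seq_fc, array_fc)[0, 1]
seq_genes, array_genes = gene_level(genes, seq_fc), gene_level(genes, array_fc)
gene_r = np.corrcoef(list(seq_genes.values()), list(array_genes.values()))[0, 1]
print(round(exon_r, 2), round(gene_r, 2))
```

Because both dictionaries are built from the same gene order, their values line up gene by gene when passed to `np.corrcoef`.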

QPCR Analysis to Determine Gene Expression Values

First-strand cDNA was synthesized from 500 ng of RNA with random primers in a 20-μl reverse transcriptase reaction, using Invitrogen's SuperScript cDNA synthesis kit (Invitrogen, Carlsbad, CA) according to the manufacturer's directions. Quantitative real-time PCR assays were carried out with gene-specific double fluorescently labeled probes in a 7900 Sequence Detector (PE Applied Biosystems, Norwalk, CT). In brief, PCR amplification was performed in a 384-well plate with a 20-μl reaction mixture containing 300 nM of each primer, 200 nM probe, and 200 nM dNTP in 1× real-time PCR buffer with passive reference (ROX) fluorochrome. Thermal cycling conditions were 2 min at 50°C and 10 min at 95°C, followed by 40 cycles of 15 s denaturation at 95°C and 1 min annealing and extension at 60°C. Samples were analyzed in duplicate, and the CT values obtained were normalized to the housekeeping gene β-actin. The comparative CT (ΔΔCT) method (33), which compares differences in CT values between groups, was used to calculate the relative fold change in gene expression among the four groups in the study.
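The comparative CT calculation reduces to a few subtractions. In the sketch below, ΔCT is the target-gene CT minus the β-actin CT, ΔΔCT is the difference between a differentiated sample and the CD34+ calibrator, and the fold change is 2^−ΔΔCT; this form assumes roughly 100% amplification efficiency (one doubling per cycle). The CT values are hypothetical and are assumed to be duplicate means already.

```python
def ddct_fold_change(ct_target, ct_reference, ct_target_calibrator, ct_reference_calibrator):
    """Comparative CT method: fold change = 2 ** -(dCT_sample - dCT_calibrator),
    where dCT = CT(target gene) - CT(beta-actin). Assumes ~100% PCR efficiency."""
    d_ct_sample = ct_target - ct_reference
    d_ct_calibrator = ct_target_calibrator - ct_reference_calibrator
    return 2.0 ** -(d_ct_sample - d_ct_calibrator)


# Hypothetical CTs: target 25.0 vs. beta-actin 18.0 in a differentiated sample;
# target 30.0 vs. beta-actin 18.0 in CD34+ cells (calibrator).
print(ddct_fold_change(25.0, 18.0, 30.0, 18.0))  # 32.0
```

Here ΔΔCT = 7 − 12 = −5, so the target is 2^5 = 32 times more abundant in the differentiated cells.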

RESULTS

In Vitro Differentiation of CD34+ Cells

Lineage-specific cells were characterized by immunophenotyping (Fig. 1A). The dot plots show the expression of the lineage-specific cell surface markers that define undifferentiated CD34+ cells and differentiated E, G, and M cells. Figure 1A, top left, shows unstained CD34+ cells on day 11, which serve as a control for the erythroid (top right), granulocytic (bottom left), and megakaryocytic (bottom right) panels. After 11 days in culture, 98% of stained cells were positive for CD71 (erythropoietic differentiation), 98% for CD15 (granulopoiesis), and 78% for CD61 (megakaryopoietic differentiation). Flow-sorted differentiated cells were examined for morphology using the Diff-Quik stain set. The micrographs in Fig. 1B, taken with a Leica microscope using a ×10 objective lens, confirm the purity of the cells selected for transcriptome analysis.

Fig. 1.


A: flow cytometry characterization of CD34+ and differentiated erythropoietic (E), granulopoietic (G), and megakaryopoietic (M) cells by dual-color immunofluorescence using a BD FACSCanto flow cytometer. Unstained CD34+ cells (top left); E cells stained with anti-CD71 FITC antibody (top right); G cells stained with anti-CD15 FITC antibody (bottom left); M cells stained with anti-CD61 FITC antibody (bottom right). B: morphology of CD34+ cells differentiated into E cells (a), G cells (b), and M cells (c).

QPCR Analysis to Determine the Expression Levels of Selected Lineage-specific Genes

QPCR was carried out on undifferentiated CD34+ cells and purified differentiated cells (E, M, and G) to measure the expression of genes known to be specific for each cell type. Table 1 shows the fold changes in expression of selected genes: ANK1 and GYPA (specific for E cells), FCGR2A and MIP2 (specific for G cells), and GPIBA and PF4 (specific for M cells). Expression of each lineage-specific gene increased significantly in the corresponding cell type, confirming the cytokine-induced differentiation of CD34+ cells into their specific lineages: E cells showed significantly increased ANK1 and GYPA expression relative to CD34+ cells, G cells showed increased FCGR2A and MIP2, and M cells showed increased GPIBA and PF4.

Table 1.

Gene expression and QPCR expression fold changes for some lineage-specific genes in hematopoietic differentiation

Symbol Gene Title FC ± SD
E-specific genes E vs. CD34
    ANK1 ankyrin 1 8,192.44 ± 204.81
    GYPA glycophorin A 78,612.44 ± 2,358.88
G-specific genes G vs. CD34
    FCGR2A Fc fragment of IgG, low affinity receptor 2A 4.79 ± 0.09
    MIP2 macrophage inflammatory protein 2 14.24 ± 0.23
M-specific genes M vs. CD34
    GPIBA glycoprotein 1B-alpha 19.16 ± 0.38
    PF4 platelet factor 4 44.46 ± 0.88

Fold changes (FC) were calculated with the comparative CT (ΔΔCT) method, comparing the ΔCT values of each differentiated group with those of undifferentiated CD34+ cells. Values are means ± SD (n = 3) per group.

Abbreviations: E, erythropoietic cells; G, granulopoietic cells; M, megakaryopoietic cells.

Microarray-based Confirmation of Differentiated Cells by Lineage-specific Gene Expression

PCA was first used to identify outliers within groups and to characterize the stem cells and differentiated cells by their microarray expression profiles. The plot of principal component 1 (PC1) vs. principal component 2 (PC2) (Fig. 2A) showed clear separation of the four groups of cells. PC1 accounted for ~60% of the variability and reflected the difference between the M cells and the other three cell types.

Fig. 2.


A: bi-plot of the first 2 principal components, accounting for 82% of the variability in the data [principal component 1 (PC1, x-axis); principal component 2 (PC2, y-axis)]. The bi-plot shows a clear separation of the 4 cell types. PC1, which captures most of the variability (59.8%), separates megakaryocytes from all other cell types. The principal component analysis was calculated from RefSeq exon robust multiarray average (RMA) values averaged by RefSeq gene symbol. An ellipse is drawn around each group of the grouping variable (cell type: C, CD34+; E; G; M). Each ellipse is computed from the bivariate normal distribution fitted to the X and Y variables (PC1 and PC2) in JMP statistical software (Cary, NC); the bivariate normal density is a function of the means and standard deviations of PC1 and PC2 and of the correlation between them. B: confirmation of the in vitro differentiation of CD34+ cells by microarray analysis. During lineage-specific differentiation, the expression of well-known marker genes for each hematopoietic lineage was analyzed, and the increase in expression relative to CD34+ was determined. x-Axis: fold increase in expression for the differentiated E, G, and M groups. y-Axis: lineage-specific genes, by gene symbol. The genes analyzed include the erythropoietic-specific genes erythrocytic ankyrin 1 (ANK1), glycophorin A (GYPA), and tropomodulin 1 (TMOD1); the granulopoietic-specific genes FCGR2A and FCGR2B (Fc fragment of immunoglobulin G, low-affinity IIa and IIb receptors), cluster of differentiation 300A (CD300A), and macrophage inflammatory protein 2 (MIP2); and the megakaryopoietic-specific genes serine proteinase inhibitor, clade E (SERPINE1), glycoprotein 1B alpha (GPIBA), platelet factor 4 (PF4), and prostaglandin-endoperoxide synthase 1 (PTGS1).

To confirm the lineage-specific differentiation of CD34+ cells, we examined the microarray-based expression of genes characteristic of each differentiated cell type (Fig. 2B). As CD34+ cells differentiated into erythroid cells, expression of ankyrin 1, glycophorin A, and tropomodulin 1 (ANK1, GYPA, TMOD1), which encode erythrocyte membrane proteins, increased 2- to 14-fold (log scale). Similarly, the granulocyte-specific genes CD300A, MIP2, and the low-affinity immunoglobulin gamma Fc region receptors II-A and II-B (FCGR2A and FCGR2B) showed their highest fold change in G cells. Finally, the megakaryocyte-specific genes prostaglandin-endoperoxide synthase 1, platelet factor 4, glycoprotein 1B alpha, and serine (or cysteine) proteinase inhibitor, clade E (PTGS1, PF4, GPIBA, SERPINE1) were most highly expressed in megakaryocytic cells, confirming the lineage-specific differentiation of CD34+ cells.

Gene Level Analysis to Identify Differentially Expressed Transcripts During Hematopoietic Differentiation

Global gene-level analysis comparing CD34+ progenitor cells with the three differentiated lineages identified many genes whose expression differed significantly between the progenitors and each differentiated cell group. Applying the statistical filters described in materials and methods (P-A ≤ 10^−7 and a twofold or greater change), we identified a total of 172 genes differentially expressed in at least one of the three differentiated cell types compared with CD34+ progenitor cells: 148 genes for M vs. CD34+, 52 for G vs. CD34+, and 70 for E vs. CD34+. These lists partly overlap. Hierarchical clustering of these 172 transcripts revealed their expression patterns across the differentiated lineages (Fig. 3). A Venn diagram of the significantly altered transcripts (Fig. 4A) shows that 26 transcripts are differentially expressed in all three differentiated groups. In addition, 24 transcripts are shared only by the M and E groups, 14 only by the G and E groups, and eight only by the G and M groups. Six of the top-ranking genes form a signature unique to erythroid differentiation: SLC4A1, ALAS2, EPPB9, HBB, SELENBP1, and GYPA. The granulocyte-specific signature comprises only four genes (C20orf112, C19orf10, PRG2, and CEACAM6), whereas 90 transcripts are altered only in megakaryopoiesis. Table 2 lists all 172 transcripts, grouped by the lineages in which they are differentially expressed.
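The Venn counts can be checked against the per-comparison list sizes with simple arithmetic: each list size is the sum of the exclusive regions that include that lineage. The sketch below takes the G and M overlap as the eight genes labeled "G, M" in Table 2 (the other region counts are those reported for Fig. 4A); with these values the regions reproduce the reported totals of 70, 52, and 148 genes and the union of 172. The function name is hypothetical.

```python
def list_sizes(regions, lineages="EGM"):
    """Per-comparison list sizes implied by exclusive Venn-region counts."""
    return {lin: sum(n for key, n in regions.items() if lin in key) for lin in lineages}


# Exclusive region counts; each key names the lineages sharing those genes.
regions = {"EGM": 26, "EM": 24, "EG": 14, "GM": 8, "E": 6, "G": 4, "M": 90}
print(list_sizes(regions))     # {'E': 70, 'G': 52, 'M': 148}
print(sum(regions.values()))   # 172
```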

Fig. 3.


Hierarchical cluster analysis of the profiles of the 172 differentially expressed genes. Genes were declared differentially expressed by comparing each of the 3 lineage-specific cell types (M, G, E) with CD34+ stem cells and requiring a >2-fold change and P < 10^−7. Expression in E, G, and M cells is shown relative to CD34+.

Fig. 4.


A: Venn analysis of the overlap among the 172 differentially expressed genes, selected by P value and fold-change cutoff, in the M, G, and E groups relative to CD34+ (see Fig. 3). B: parallel plots of 331 specifically upregulated genes (see materials and methods). Specifically upregulated genes are those upregulated >2-fold in 1 lineage but unchanged or upregulated <1.4-fold in the other 2 lineages. Top: upregulated transcripts specific for E cells relative to CD34+. Middle: upregulated transcripts specific for G cells relative to CD34+. Bottom: upregulated transcripts specific for M cells relative to CD34+.

Table 2.

List of differentially expressed genes during lineage-specific differentiation of hematopoietic stem cells (fold changes vs. CD34+ on a log2 scale)

Gene Symbol Gene Title E vs. CD34+ G vs. CD34+ M vs. CD34+ Groups
ARHGEF17 Rho guanine nucleotide exchange factor (GEF) 17 −2.38 −1.79 −2.41 E, G, M
CD177 CD177 molecule −2.79 −2.71 −2.73 E, G, M
CDT1 chromatin licensing and DNA replication factor 1 1.50 1.46 −1.07 E, G, M
CMTM5 CKLF-like MARVEL transmembrane domain containing 5 3.30 1.64 5.79 E, G, M
DNTT deoxynucleotidyltransferase, terminal −3.94 −3.88 −4.12 E, G, M
FCGR3A Fc fragment of IgG, low affinity IIIa, receptor (CD16a) −2.71 −2.84 −2.77 E, G, M
GPSM1 G protein signaling modulator 1 (AGS3-like, C. elegans) −1.40 −1.33 −2.52 E, G, M
HBG1 hemoglobin, gamma A 6.37 2.73 1.77 E, G, M
HBG2 hemoglobin, gamma G 7.06 3.84 2.62 E, G, M
ID1 inhibitor of DNA binding 1 −2.11 −2.10 −3.18 E, G, M
ITGB3 integrin, beta 3 (platelet glycoprotein IIIa, antigen CD61) 3.49 1.58 7.26 E, G, M
KCNA3 potassium voltage-gated channel, member 3 −2.03 −2.05 2.46 E, G, M
LAT linker for activation of T cells 3.50 2.01 4.58 E, G, M
LSP1 lymphocyte-specific protein 1 −2.46 −1.72 −3.89 E, G, M
MDK midkine (neurite growth-promoting factor 2) −2.06 −1.98 −2.65 E, G, M
MGC29671 −1.12 1.49 −2.69 E, G, M
MGC35402 −3.03 −2.93 −2.81 E, G, M
MN1 meningioma (disrupted in balanced translocation) 1 −2.93 −2.78 −3.15 E, G, M
NPTX2 neuronal pentraxin II −1.55 −1.94 −1.67 E, G, M
P2RY11 purinergic receptor P2Y, G-protein coupled, 11 −1.47 −1.47 −2.87 E, G, M
PCDH21 protocadherin 21 1.78 1.52 4.51 E, G, M
RNASE2 ribonuclease, RNase A family, 2 3.01 4.56 −1.54 E, G, M
S100A12 S100 calcium binding protein A12 −4.64 −4.53 −4.92 E, G, M
SH2D3C SH2 domain containing 3C −1.70 −1.19 −2.50 E, G, M
SLC11A1 solute carrier family 11 −2.45 −2.55 −2.75 E, G, M
UBE2C ubiquitin-conjugating enzyme E2C 3.66 3.23 3.19 E, G, M
SLC4A1 solute carrier family 4, anion exchanger, member 1 3.42 −0.22 −0.56 E
ALAS2 aminolevulinate, delta-, synthase 2 3.26 0.41 −0.32 E
EPPB9 1.06 0.94 −0.31 E
HBB hemoglobin, beta 3.66 0.76 −0.20 E
SELENBP1 selenium binding protein 1 2.13 0.36 −0.09 E
GYPA glycophorin A (MNS blood group) 3.71 0.76 0.00 E
PRKG1 protein kinase, cGMP-dependent, type I −1.84 −2.15 −0.82 E, G
CA1 carbonic anhydrase I 4.39 1.40 −0.48 E, G
H2AFX H2A histone family, member X 1.59 1.18 −0.43 E, G
ERAF erythroid associated factor 4.51 1.07 −0.08 E, G
ELA2 1.19 5.17 −0.05 E, G
PRTN3 proteinase 3 1.17 4.69 0.13 E, G
PRG1 1.14 2.65 0.49 E, G
IL9R interleukin 9 receptor 3.00 2.24 0.69 E, G
NFKBIA nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor, alpha −2.18 −1.52 0.75 E, G
MYL4 myosin, light chain 4, alkali; atrial, embryonic 3.30 1.12 0.80 E, G
CST7 cystatin F (leukocystatin) 1.46 4.08 0.83 E, G
EBP emopamil binding protein (sterol isomerase) 2.08 2.44 0.94 E, G
H3/o 4.21 3.69 0.99 E, G
HIST2H3C histone cluster 2, H3c 4.21 3.69 0.99 E, G
LITAF lipopolysaccharide-induced TNF factor −1.21 −0.11 −4.20 E, M
LAIR1 leukocyte-associated immunoglobulin-like receptor 1 −1.56 0.15 −3.97 E, M
TNFRSF1B tumor necrosis factor receptor superfamily, member 1B −1.67 −0.80 −3.65 E, M
CECR1 cat eye syndrome chromosome region, candidate 1 −1.95 0.51 −3.19 E, M
PRAM1 PML-RARA regulated adaptor molecule 1 −1.85 0.07 −2.69 E, M
KLF4 Kruppel-like factor 4 (gut) −1.72 −0.86 −2.28 E, M
PTPRCAP protein tyrosine phosphatase, receptor type, C-associated protein −1.41 −0.67 −2.02 E, M
BEX2 brain expressed X-linked 2 −1.25 0.18 −1.98 E, M
HDC histidine decarboxylase 1.45 0.93 −1.98 E, M
HBA2 hemoglobin, alpha 2 2.16 −0.78 −1.58 E, M
HBA1 hemoglobin, alpha 1 2.04 −0.70 −1.41 E, M
HIST1H2BL histone cluster 1, H2bl 1.04 0.64 −1.09 E, M
FAM89A family with sequence similarity 89, member A 1.41 0.59 −1.07 E, M
PPP1R3B protein phosphatase 1, regulatory (inhibitor) subunit 3B −1.35 −0.67 1.37 E, M
PPP1R14A protein phosphatase 1, regulatory (inhibitor) subunit 14A 2.00 0.59 1.89 E, M
GATA1 GATA binding protein 1 (globin transcription factor 1) 2.31 0.62 2.30 E, M
CCND3 cyclin D3 1.04 0.35 2.33 E, M
THBS1 thrombospondin 1 1.44 −0.74 4.65 E, M
PF4 platelet factor 4 1.28 0.16 4.72 E, M
ITGA2B integrin, alpha 2b 2.83 0.99 5.14 E, M
ARHGAP6 Rho GTPase activating protein 6 2.11 0.63 5.22 E, M
CTTN cortactin 1.52 0.36 5.29 E, M
PPBP proplatelet basic protein (chemokine (C-X-C motif) ligand 7) 1.24 0.07 5.67 E, M
RGS6 regulator of G protein signaling 6 1.61 0.38 6.27 E, M
C20orf112 chromosome 20 open reading frame 112 −0.75 −1.32 −0.70 G
C19orf10 chromosome 19 open reading frame 10 0.91 1.85 −0.44 G
PRG2 proteoglycan 2, bone marrow 0.40 2.15 −0.24 G
CEACAM6 carcinoembryonic antigen-related cell adhesion molecule 6 0.42 4.00 −0.11 G
FLJ11151 −0.89 −1.09 −2.82 G, M
NKG7 natural killer cell group 7 sequence −0.81 2.01 −2.31 G, M
IGLL1 immunoglobulin lambda-like polypeptide 1 0.75 1.99 −2.13 G, M
CEBPA CCAAT/enhancer binding protein (C/EBP), alpha 0.22 1.25 −1.59 G, M
TRIM58 tripartite motif-containing 58 0.05 −1.06 1.87 G, M
SEPT5 septin 5 −0.55 −1.83 3.14 G, M
ADCY6 adenylate cyclase 6 −0.24 −1.24 3.70 G, M
KIAA0513 KIAA0513 −0.01 −1.11 4.23 G, M
ZFP36L2 zinc finger protein 36, C3H type-like 2 −0.97 −0.87 −3.22 M
PRSSL1 protease, serine-like 1 −0.77 0.76 −3.18 M
SPI1 spleen focus forming virus (SFFV) proviral integration oncogene −0.95 0.20 −2.68 M
TRIM14 tripartite motif-containing 14 −0.53 0.25 −2.61 M
UNQ501 −0.35 0.44 −2.34 M
GSTP1 glutathione S-transferase pi 1 −0.18 0.42 −2.19 M
RAB3D RAB3D, member RAS oncogene family −0.90 −0.32 −2.09 M
TIMM13 translocase of inner mitochondrial membrane 13 0.15 −0.04 −2.02 M
ITPA inosine triphosphatase 0.24 0.14 −1.96 M
IFITM1 interferon induced transmembrane protein 1 (9–27) 0.35 0.14 −1.96 M
NHP2L1 NHP2 nonhistone chromosome protein 2-like 1 0.41 0.49 −1.93 M
ASB13 ankyrin repeat and SOCS box-containing 13 0.27 0.73 −1.89 M
SYNGR1 synaptogyrin 1 0.42 0.53 −1.83 M
ARL6IP4 ADP-ribosylation-like factor 6 interacting protein 4 0.26 0.25 −1.72 M
POFUT1 protein O-fucosyltransferase 1 0.55 0.68 −1.66 M
ZXDB zinc finger, X-linked, duplicated B −0.88 −0.79 −1.59 M
CEBPD CCAAT/enhancer binding protein (C/EBP), delta −0.59 0.89 −1.54 M
UBE2L6 ubiquitin-conjugating enzyme E2L 6 −0.16 −0.32 −1.49 M
SLIC1 −0.83 −0.41 −1.46 M
TSNARE1 t-SNARE domain containing 1 −0.49 −0.25 −1.42 M
ECHS1 enoyl Coenzyme A hydratase, short chain, 1, mitochondrial 0.54 0.59 −1.41 M
RPUSD1 RNA pseudouridylate synthase domain containing 1 0.08 0.12 −1.37 M
MRPS16 mitochondrial ribosomal protein S16 0.45 0.50 −1.35 M
CUEDC2 CUE domain containing 2 0.06 0.21 −1.28 M
PDXP pyridoxal (pyridoxine, vitamin B6) phosphatase 0.54 0.24 −1.24 M
CKAP4 cytoskeleton-associated protein 4 0.78 0.83 −1.22 M
STK32C serine/threonine kinase 32C 0.56 0.24 −1.21 M
ENDOG endonuclease G 0.05 −0.09 −1.20 M
NDUFB7 NADH dehydrogenase (ubiquinone) 1 beta subcomplex, 7 0.36 0.38 −1.16 M
PALM paralemmin −0.14 0.26 −1.15 M
DCI dodecenoyl-Coenzyme A delta isomerase 0.54 0.49 −1.13 M
ATP5D ATP synthase, H+ transporting, delta subunit 0.47 0.44 −1.11 M
ABHD14A abhydrolase domain containing 14A 0.36 0.73 −1.08 M
LOC92154 −0.16 −0.42 1.35 M
SNN stannin −0.70 −0.47 1.37 M
BCL2L2 BCL2-like 2 −0.32 −0.34 1.37 M
LIPC lipase, hepatic 0.16 0.19 1.52 M
IRS2 insulin receptor substrate 2 −0.48 0.44 1.60 M
GLYATL1 glycine-N-acyltransferase-like 1 −0.14 −0.03 1.66 M
RGS10 regulator of G protein signaling 10 0.52 0.27 1.97 M
SRC v-src sarcoma (Schmidt-Ruppin A-2) viral oncogene homolog (avian) −0.74 −0.85 2.00 M
CDKN1A cyclin-dependent kinase inhibitor 1A (p21, Cip1) −0.63 −0.38 2.32 M
TULP2 tubby like protein 2 0.21 0.31 2.37 M
ADRA2A adrenergic, alpha-2A, receptor 0.73 −0.47 2.37 M
MGC13057 0.94 −0.31 2.38 M
TAL1 T-cell acute lymphocytic leukemia 1 0.62 −0.78 2.38 M
C20orf175 0.45 −0.89 2.48 M
HTR2A 5-hydroxytryptamine (serotonin) receptor 2A 0.10 −0.08 2.52 M
NRGN neurogranin (protein kinase C substrate, RC3) 0.57 0.18 2.53 M
BCL2L1 BCL2-like 1 0.69 −0.34 2.53 M
SEC14L5 SEC14-like 5 (S. cerevisiae) 0.15 0.16 2.57 M
TPM1 tropomyosin 1 (alpha) 0.78 −0.56 2.71 M
C20orf32 0.14 −0.21 2.75 M
C6orf188 −0.07 0.08 2.83 M
HEXIM1 hexamethylene bis-acetamide inducible 1 0.07 −0.12 2.95 M
ENDOD1 endonuclease domain containing 1 0.65 −0.57 3.00 M
GP5 glycoprotein V (platelet) 0.22 −0.01 3.05 M
PLA2G4C phospholipase A2, group IVC (cytosolic, calcium-independent) 0.03 0.22 3.10 M
TUBA8 tubulin, alpha 8 0.97 0.13 3.10 M
IL21R interleukin 21 receptor 0.90 0.08 3.16 M
PANX1 pannexin 1 0.57 0.06 3.22 M
C5orf4 chromosome 5 open reading frame 4 0.58 −0.45 3.29 M
PF4V1 platelet factor 4 variant 1 0.91 0.19 3.36 M
SERPINE1 serpin peptidase inhibitor, clade E 0.46 0.02 3.38 M
CXCL3 chemokine (C-X-C motif) ligand 3 0.16 0.16 3.52 M
SLC22A17 solute carrier family 22, member 17 0.69 −0.54 3.59 M
PDGFB platelet-derived growth factor beta polypeptide 0.05 0.29 3.63 M
F2RL2 coagulation factor II (thrombin) receptor-like 2 0.41 −0.28 3.65 M
MFAP3L microfibrillar-associated protein 3-like 0.16 −0.08 3.66 M
SDPR serum deprivation response 0.72 −0.40 3.66 M
EGLN3 egl nine homolog 3 (C. elegans) 0.88 0.14 3.86 M
CD9 CD9 molecule −0.09 −0.73 3.87 M
GABRE gamma-aminobutyric acid (GABA) A receptor, epsilon 0.24 0.36 3.96 M
ESAM endothelial cell adhesion molecule 0.81 −0.44 4.22 M
CCR4 chemokine (C-C motif) receptor 4 0.85 0.26 4.32 M
GNAZ guanine nucleotide binding protein (G protein), alpha z polypeptide 0.37 −0.01 4.52 M
GP9 glycoprotein IX (platelet) 0.81 −0.06 4.55 M
LRRC32 leucine rich repeat containing 32 0.06 −0.49 4.56 M
C6orf21 0.82 0.10 4.82 M
TRPC6 transient receptor potential cation channel, subfamily C, member 6 0.88 0.12 4.84 M
TSPAN9 tetraspanin 9 0.50 0.26 4.88 M
PCSK6 proprotein convertase subtilisin/kexin type 6 0.30 0.19 4.97 M
PDE5A phosphodiesterase 5A, cGMP-specific 0.10 −0.06 5.10 M
ABCC3 ATP-binding cassette, subfamily C (CFTR/MRP), member 3 −0.08 −0.30 5.25 M
ITGB5 integrin, beta 5 0.47 0.06 5.40 M
HPSE heparanase 0.62 −0.32 5.69 M
CD226 CD226 molecule 0.10 −0.24 5.75 M
VWF von Willebrand factor 0.44 −0.47 5.98 M
PDE3A phosphodiesterase 3A, cGMP-inhibited 0.63 −0.11 6.18 M
CLEC1B C-type lectin domain family 1, member B 0.66 0.67 6.62 M

FC values are given as log 2 means from an n of 3 per group. P < 0.0000001.
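The lineage labels in the last column of the table above appear to follow a simple rule. A lineage is listed when its |log2 FC| vs. CD34+ is at least 1 (twofold). This rule is our inference from the rows shown here, not the procedure stated in materials and methods, and the sketch applies no P-value filter. The helper name `lineage_labels` is hypothetical.

```python
# Hypothetical helper: infer lineage labels (E, G, M) from log2 fold changes
# vs. CD34+. The |log2 FC| >= 1 cutoff is inferred from the table, not taken
# from the Methods, and no P-value filter is applied here.
LINEAGES = ("E", "G", "M")


def lineage_labels(log2_fc, cutoff=1.0):
    """Return the lineages whose |log2 FC| reaches the cutoff, formatted as in the table."""
    hits = [name for name, fc in zip(LINEAGES, log2_fc) if abs(fc) >= cutoff]
    return ", ".join(hits)


# Rows copied from the table: (E, G, M) log2 FC vs. CD34+.
ROWS = {
    "LITAF": (-1.21, -0.11, -4.20),   # listed as E, M
    "CEACAM6": (0.42, 4.00, -0.11),   # listed as G
    "TRIM58": (0.05, -1.06, 1.87),    # listed as G, M
    "CLEC1B": (0.66, 0.67, 6.62),     # listed as M
}

if __name__ == "__main__":
    for gene, fc in ROWS.items():
        print(gene, lineage_labels(fc))
```

Each printed label matches the one listed for that gene in the table.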

Identification of Differentially Regulated Genes During Lineage-specific Differentiation to E, G, and M Cells

To identify transcripts that are altered during cell differentiation, we specifically examined “upregulated” genes (see materials and methods) relative to CD34+ cells. We identified three patterns of upregulation (Fig. 4B). Erythropoietic differentiation resulted in upregulation of 32 transcripts. When CD34+ cells differentiated into G cells, 30 genes were significantly upregulated, whereas megakaryopoietic differentiation produced a much larger change, with 269 upregulated transcripts. A partial list of these genes is given in Tables 3, 4, and 5; the complete list is provided in Supplemental Table S1.

Table 3.

Expression of genes highly specific for E cells

Gene Symbol Gene Title FC - E vs. CD34
TMEM56 transmembrane protein 56 3.60
SLC4A1 solute carrier family 4, anion exchanger, member 1 3.42
ALAS2 aminolevulinate, delta-, synthase 2 3.26
IRF6 interferon regulatory factor 6 2.51
DNAJA4 DnaJ (Hsp40) homolog, subfamily A, member 4 2.49
ADD2 adducin 2 (beta) 2.47
GYPB glycophorin B (MNS blood group) 2.36
GALNT5 UDP-N-acetyl-alpha-d-galactosamine:polypeptide N-acetylgalactosaminyltransferase 5 2.21
HBA2 hemoglobin, alpha 2 2.16
SELENBP1 selenium binding protein 1 2.13
HBA1 hemoglobin, alpha 1 2.04
ATP7B ATPase, Cu2+ transporting, beta polypeptide 1.95
SLC14A1 solute carrier family 14 (urea transporter), member 1 1.85
LBH limb bud and heart development homolog 1.83
GYPE glycophorin E 1.81
SEC14L4 SEC14-like 4 1.73
ACSBG1 acyl-CoA synthetase bubblegum family member 1 1.67
PCSK9 proprotein convertase subtilisin/kexin type 9 1.60
TGM2 transglutaminase 2 1.55
SLC25A21 solute carrier family 25 1.42
ITGB4 integrin, beta 4 1.30
LEF1 lymphoid enhancer-binding factor 1 1.29
PISD phosphatidylserine decarboxylase 1.18
NMNAT3 nicotinamide nucleotide adenylyltransferase 3 1.14
ICAM4 intercellular adhesion molecule 4 1.08
TRIB2 tribbles homolog 2 1.05
S100A4 S100 calcium binding protein A4 1.04
TRIM29 tripartite motif-containing 29 1.03
ZYX zyxin −1.07
BTG1 B-cell translocation gene 1, anti-proliferative −1.08
ZBTB34 zinc finger and BTB domain containing 34 −1.10
TCEA2 transcription elongation factor A (SII), 2 −1.18
CXCR4 chemokine (C-X-C motif) receptor 4 −1.21
TSC22D3 TSC22 domain family, member 3 −1.37

List of genes highly specific for E cells. FC values are given as log 2 means for an n of 3 per group.

Table 4.

Expression of genes highly specific for G cells

Gene Symbol Gene Title FC - G vs. CD34
BPI bactericidal/permeability-increasing protein 5.23
CEACAM6 carcinoembryonic antigen-related cell adhesion molecule 6 4.00
CD24 CD24 molecule 3.67
CLEC5A C-type lectin domain family 5, member A 2.88
IL2RA interleukin 2 receptor, alpha 2.58
BEX1 brain expressed, X-linked 1 2.42
FAM107B family with sequence similarity 107, member B 2.17
PRG2 plasticity-related gene 2 2.15
PRG2 proteoglycan 2, bone marrow 2.15
ELOVL3 elongation of very long chain fatty acids 2.10
SLPI secretory leukocyte peptidase inhibitor 2.04
NKG7 natural killer cell group 7 sequence 2.01
GLT25D2 glycosyltransferase 25 domain containing 2 1.84
RHOU ras homolog gene family, member U 1.75
P2RY2 purinergic receptor P2Y, G protein-coupled, 2 1.75
GPR97 G protein-coupled receptor 97 1.72
JDP2 Jun dimerization protein 2 1.68
RBP4 retinol binding protein 4, plasma 1.68
SERPINB2 serpin peptidase inhibitor, clade B, member 2 1.63
CD48 CD48 molecule 1.56
CDA cytidine deaminase 1.47
HIP1 huntingtin interacting protein 1 1.47
LPO lactoperoxidase 1.45
ANPEP alanyl (membrane) aminopeptidase 1.37
CILP2 cartilage intermediate layer protein 2 1.36
LYZ lysozyme (renal amyloidosis) 1.31
NT5DC3 5′-nucleotidase domain containing 3 1.27
CLEC11A C-type lectin domain family 11, member A 1.23
SERPINB8 serpin peptidase inhibitor, clade B, member 8 1.21
FGFR1 fibroblast growth factor receptor 1 1.20
PADI2 peptidyl arginine deiminase, type II 1.16
DERL3 Der1-like domain family, member 3 1.16
MFAP4 microfibrillar-associated protein 4 1.12
FUT4 fucosyltransferase 4 alpha (1,3) fucosyltransferase 1.05
GRAMD1B GRAM domain containing 1B 1.01
CBFA2T3 core-binding factor, runt domain, alpha subunit 2; translocated to, 3 −1.03
ARHGEF12 Rho guanine nucleotide exchange factor (GEF) 12 −1.10

List of genes highly specific for G cells. FC values are given as log 2 means for an n of 3 per group.

Table 5.

Expression of genes highly specific for M cells

Gene Symbol Gene Title FC - M vs. CD34
VWF von Willebrand factor 5.98
CD226 CD226 molecule 5.75
PKHD1L1 polycystic kidney and hepatic disease 1 5.49
ITGB5 integrin, beta 5 5.40
ABCC3 ATP-binding cassette, subfamily C 5.25
PDE5A phosphodiesterase 5A, cGMP-specific 5.10
PCSK6 proprotein convertase subtilisin/kexin type 6 4.97
TUBB1 tubulin, beta 1 4.93
TSPAN9 tetraspanin 9 4.88
LRRC32 leucine rich repeat containing 32 4.56
GNAZ guanine nucleotide binding protein (G protein) 4.52
PTPRJ protein tyrosine phosphatase, receptor type, J 4.47
MMD monocyte to macrophage differentiation-associated 4.45
EGF epidermal growth factor (beta-urogastrone) 4.43
VEGFC vascular endothelial growth factor C 4.42
MYOM1 myomesin 1, 185 kDa 4.37
KIAA0513 KIAA0513 4.23
TMEM40 transmembrane protein 40 4.13
GP1BA glycoprotein Ib (platelet), alpha polypeptide 3.99
GABRE gamma-aminobutyric acid (GABA) 3.96
ARHGAP21 Rho GTPase activating protein 21 3.96
TSPAN18 tetraspanin 18 3.93
DAB2 disabled homolog 2 3.90
SLC9A9 solute carrier family 9 3.90
CD9 CD9 molecule 3.87
LIPH lipase, member H 3.80
CCL5 chemokine (C-C motif) ligand 5 3.77
LGMN legumain 3.77
MFAP3L microfibrillar-associated protein 3-like 3.66
F2RL2 coagulation factor II (thrombin) receptor-like 2 3.65
PDGFB platelet-derived growth factor beta polypeptide 3.63
DENND2C DENN/MADD domain containing 2C 3.57
CXCL3 chemokine (C-X-C motif) ligand 3 3.52
C1orf71 chromosome 1 open reading frame 71 3.47
GNG11 guanine nucleotide binding protein gamma 11 3.42
SERPINE1 serpin peptidase inhibitor, clade E 3.38
GRIK4 glutamate receptor, ionotropic, kainate 4 3.37
OR2G3 olfactory receptor 2, subfamily G, member 3 3.36
SLC6A4 solute carrier family 6 3.36
DGKD diacylglycerol kinase, delta 130 kDa 3.35
AQP10 aquaporin 10 3.34
IRAK2 interleukin-1 receptor-associated kinase 2 3.31
ABLIM3 actin binding LIM protein family, member 3 3.26
MRVI1 murine retrovirus integration site 1 homolog 3.25
CD40LG CD40 ligand 3.24
NEXN nexilin (F actin binding protein) 3.22
SEPT5 septin 5 3.14
PAPSS2 3′-phosphoadenosine 5′-phosphosulfate synthase 2 3.14
PLA2G4C phospholipase A2, group IVC 3.10
ITPK1 inositol 1,3,4-triphosphate 5/6 kinase −2.00
UCK2 uridine-cytidine kinase 2 −2.01
ALDH1B1 aldehyde dehydrogenase 1 family, member B1 −2.02
TIMM13 translocase of inner mitochondrial membrane 13 −2.02
NSMCE1 nonSMC element 1 homolog (S. cerevisiae) −2.03
ANKH ankylosis, progressive homolog (mouse) −2.04
MRPL48 mitochondrial ribosomal protein L48 −2.05
HIST1H1D histone cluster 1, H1d −2.07
TIMM50 translocase of inner mitochondrial membrane 50 −2.07
DIP interstitial pneumonitis, desquamative, familial −2.07
NIPSNAP3A nipsnap homolog 3A (C. elegans) −2.07
UBL7 ubiquitin-like 7 (bone marrow stromal cell-derived) −2.08
EBPL emopamil binding protein-like −2.09
DENND1A DENN/MADD domain containing 1A −2.09
NT5C3L 5′-nucleotidase, cytosolic III-like −2.09
COMMD9 COMM domain containing 9 −2.10
NDUFA12 NADH dehydrogenase 1 alpha subcomplex, 12 −2.11
POLE4 polymerase (DNA-directed), epsilon 4 −2.12
LIG3 ligase III, DNA, ATP-dependent −2.12
KIAA0664 KIAA0664 −2.13
IGLL1 immunoglobulin lambda-like polypeptide 1 −2.13
MRPL16 mitochondrial ribosomal protein L16 −2.14
DFFA DNA fragmentation factor, 45 kDa, alpha −2.16
RPUSD4 RNA pseudouridylate synthase domain 4 −2.16
TFAP4 transcription factor AP-4 −2.19
GSTP1 glutathione S-transferase pi 1 −2.19
GLT25D1 glycosyltransferase 25 domain containing 1 −2.19
C9orf123 chromosome 9 open reading frame 123 −2.20
MRPL38 mitochondrial ribosomal protein L38 −2.21
MFHAS1 malignant fibrous histiocytoma sequence 1 −2.22
MRPL21 mitochondrial ribosomal protein L21 −2.22
MYBBP1A MYB binding protein (P160) 1a −2.23
MPO myeloperoxidase −2.23
NANS N-acetylneuraminic acid synthase −2.25
ALDH3A2 aldehyde dehydrogenase 3 family, member A2 −2.27
SLC35F2 solute carrier family 35, member F2 −2.29
ZNF259 zinc finger protein 259 −2.29
ALOX5AP arachidonate 5-lipoxygenase-activating protein −2.33
TGFBR1 transforming growth factor, beta receptor 1 −2.34
SPN sialophorin −2.35
FBL fibrillarin −2.37
POLR1E polymerase (RNA) I polypeptide E, 53 kDa −2.37
SFXN4 sideroflexin 4 −2.38
GYPC glycophorin C −2.40
CAT catalase −2.40
COMMD2 COMM domain containing 2 −2.61
IGFBP7 insulin-like growth factor binding protein 7 −2.78
RCN1 reticulocalbin 1, EF-hand calcium binding domain −2.81
SPG21 spastic paraplegia 21 −2.86
B3GNT5 UDP beta-1,3-N-acetylglucosaminyltransferase 5 −3.21
IL2RG interleukin 2 receptor, gamma −3.29
OSM oncostatin M −3.34
HINT1 histidine triad nucleotide binding protein 1 −3.42
ANXA2 annexin A2 −3.58

List of genes highly specific for M cells. FC values are given as log 2 means for an n of 3 per group.

Gene Ontology Analysis

Gene lists specific for each differentiated group were subjected to gene ontology analysis to determine their molecular functions. This analysis yielded seven major functional groups: binding, catalytic, signal transducer, transcription regulator, structural molecule, enzyme regulator, and transporter activity (Fig. 5). The proportion of genes with binding function was comparable across the three hematopoietic lineages. Genes with catalytic activity made up a higher percentage in the E group, the percentage of genes with signal transducer function was highest in the G group, and genes with enzyme regulator and transporter activity made up a higher percentage in the M group.

Fig. 5.

Functional classification of differentially expressed genes during lineage-specific differentiation of CD34+ hematopoietic cells to E, G, and M cells by Ingenuity Pathway Analysis. Categories are not mutually exclusive: a single gene may be assigned to more than one category.
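Because a gene can fall into several categories, as the Fig. 5 legend notes, per-lineage percentages are best computed with the number of genes as the denominator, and they need not sum to 100%. The following is a minimal sketch assuming a simple gene-to-category mapping. The gene names and annotations are hypothetical, and IPA's own computation may differ.

```python
from collections import Counter


def category_percentages(gene_to_categories):
    """Percentage of genes annotated to each functional category.

    The denominator is the number of genes, not the number of annotations,
    so a gene in several categories is counted once per category and the
    percentages can add up to more than 100.
    """
    n_genes = len(gene_to_categories)
    counts = Counter(
        category
        for categories in gene_to_categories.values()
        for category in set(categories)  # ignore duplicate labels within a gene
    )
    return {category: 100.0 * k / n_genes for category, k in counts.items()}


# Hypothetical annotations for illustration only (not the study's data).
EXAMPLE = {
    "GENE_A": ["binding", "catalytic"],
    "GENE_B": ["binding"],
    "GENE_C": ["transporter"],
    "GENE_D": ["binding", "signal transducer"],
}

if __name__ == "__main__":
    for category, pct in sorted(category_percentages(EXAMPLE).items()):
        print(f"{category}: {pct:.1f}%")
```

Here three of four genes carry "binding" (75%), and the four category percentages sum to 150% because two genes are counted twice.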

Validation of Microarray Data by QPCR

To confirm the expression data obtained from the exon microarray studies, we analyzed the expression of a selection of significantly differentially expressed genes in each group by real-time PCR (QPCR). Table 6 lists the fold changes in transcript expression for E, G, and M vs. CD34+ cells from the microarray and QPCR analyses. A high degree of correlation was observed between the microarray data and the QPCR data.

Table 6.

Validation of microarray data by QPCR

Gene Symbol Platform E vs. CD34 G vs. CD34 M vs. CD34 P Value
Solute carrier family 4, anion exchanger, member 1
SLC4A1 exon array 10.68 0.86 0.68
QPCR 117.26 1.14 0.03 < 0.001
Proteoglycan 2, bone marrow
PRG2 exon array 1.32 4.45 0.85
QPCR 1.28 49.70 1.23 < 0.05
C-type lectin domain family 1, member B
CLEC1B exon array 1.58 1.59 98.22
QPCR 19.08 5.21 222.86 < 0.05
Phosphodiesterase 3A, cGMP-inhibited
PDE3A exon array 1.55 0.93 72.41
QPCR 1.49 0.52 6.96 < 0.05
Spleen focus forming virus proviral integration oncogene
SPI1 exon array 0.52 1.14 0.16
QPCR 0.38 1.21 0.42 <0.001
Translocase of inner mitochondrial membrane 13
TIMM13 exon array 1.11 0.97 0.25
QPCR 1.14 0.51 0.13 <0.005
von Willebrand factor
VWF exon array 1.35 0.72 63.13
QPCR 0.26 0.04 29.16 <0.05
Elastase 2
ELA2 exon array 2.28 35.91 0.97
QPCR 47.50 192.67 104.69 <0.001
Erythroid-associated factor
ERAF exon array 22.84 2.09 0.95
QPCR 315.73 109.89 0.13 <0.001
Proteinase 3
PRTN3 exon array 2.25 25.79 1.09
QPCR 33.59 84.44 57.28 <0.001
Rho guanine nucleotide exchange factor (GEF) 17
ARHGEF17 exon array 0.19 0.29 0.19
QPCR 0.03 0.04 0.05 <0.001
Fc fragment of IgG, low affinity IIIa, receptor
FCGR3A exon array 0.15 0.14 0.15
QPCR 0.01 0.06 0.01 <0.001
Integrin, beta 3 (platelet glycoprotein IIIa, antigen CD61)
ITGB3 exon array 11.22 2.98 152.96
QPCR 3.65 1.01 42.22 <0.001
Solute carrier family 11 member 1
SLC11A1 exon array 0.18 0.17 0.15
QPCR 0.01 0.06 0.01 <0.05
Linker for activation of T lymphocytes
LAT exon array 11.16 4.01 23.92
QPCR 11.47 6.77 15.24 <0.001

Validation of microarray data by QPCR. Fold changes (FC, linear scale) in expression for selected differentially expressed genes are given as means of 3 samples per group from exon array analysis and QPCR.
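Tables 2 through 5 report FC as log2 means, whereas Table 6 appears to use linear FC. For example, 2^3.42 ≈ 10.7 matches the exon array value of 10.68 for SLC4A1 (E). The sketch below converts between the two scales and computes a Pearson correlation between log2 exon array and log2 QPCR values from Table 6. The text does not state how the correlation was assessed, so using Pearson on the log2 scale is our assumption, and the helper names are hypothetical.

```python
import math

# Linear fold changes (E, G, M vs. CD34+) copied from Table 6:
# gene -> ((exon array E, G, M), (QPCR E, G, M)).
TABLE6 = {
    "SLC4A1": ((10.68, 0.86, 0.68), (117.26, 1.14, 0.03)),
    "PRG2": ((1.32, 4.45, 0.85), (1.28, 49.70, 1.23)),
    "CLEC1B": ((1.58, 1.59, 98.22), (19.08, 5.21, 222.86)),
    "PDE3A": ((1.55, 0.93, 72.41), (1.49, 0.52, 6.96)),
    "SPI1": ((0.52, 1.14, 0.16), (0.38, 1.21, 0.42)),
    "TIMM13": ((1.11, 0.97, 0.25), (1.14, 0.51, 0.13)),
    "VWF": ((1.35, 0.72, 63.13), (0.26, 0.04, 29.16)),
    "ELA2": ((2.28, 35.91, 0.97), (47.50, 192.67, 104.69)),
    "ERAF": ((22.84, 2.09, 0.95), (315.73, 109.89, 0.13)),
    "PRTN3": ((2.25, 25.79, 1.09), (33.59, 84.44, 57.28)),
    "ARHGEF17": ((0.19, 0.29, 0.19), (0.03, 0.04, 0.05)),
    "FCGR3A": ((0.15, 0.14, 0.15), (0.01, 0.06, 0.01)),
    "ITGB3": ((11.22, 2.98, 152.96), (3.65, 1.01, 42.22)),
    "SLC11A1": ((0.18, 0.17, 0.15), (0.01, 0.06, 0.01)),
    "LAT": ((11.16, 4.01, 23.92), (11.47, 6.77, 15.24)),
}


def log2_fc(linear_fc):
    """Convert a linear fold change to log2 (the scale used in Tables 2-5)."""
    return math.log2(linear_fc)


def linear_fc(log2_value):
    """Convert a log2 fold change back to a linear fold change."""
    return 2.0 ** log2_value


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def array_vs_qpcr_log2(table):
    """Pair every (gene, lineage) value from both platforms on the log2 scale."""
    array_vals, qpcr_vals = [], []
    for array_fc, qpcr_fc in table.values():
        array_vals.extend(log2_fc(v) for v in array_fc)
        qpcr_vals.extend(log2_fc(v) for v in qpcr_fc)
    return array_vals, qpcr_vals


if __name__ == "__main__":
    # Consistency check: Table 6 SLC4A1 (E) 10.68 vs. Table 3 log2 FC 3.42.
    print(round(log2_fc(10.68), 2))
    x, y = array_vs_qpcr_log2(TABLE6)
    print(f"Pearson r (log2 array vs. log2 QPCR, n = {len(x)}): {pearson_r(x, y):.2f}")
```

Working on the log2 scale keeps a few very large QPCR fold changes from dominating the coefficient.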

Alternative Splicing Events During Lineage-specific Hematopoietic Differentiation

We identified 86 alternatively spliced genes among the differentiated cells, as shown in Tables 7, 8, and 9. In this analysis, an alternatively spliced gene is one with at least one exon whose behavior deviates from that of the other exons in the gene by at least twofold (see table legends). Across the three lineages, we observed 31 alternatively spliced genes in common, including CAST, EPHX1, FANCA, CLCN7, PDE4D, and PDE4DIP. The E and G groups showed an overlap of 14 alternatively spliced genes, while 11 genes were common to the E and M groups. The G and M groups showed the lowest overlap, with five spliced genes. Splicing was unique to 13 genes (ALDOA, ASS1, ATP5H, CLEC12A, ELMO1, NUMA1, PLK4CA, PTPN6, PTPRA, SLC39A4, SMG1, UBE2D3, XYLT1) in the M group, two genes (CD97, RGR4275) in the G group, and 10 genes (CMTM5, COL6A2, CYB5R3, DNMT3B, LPHN1, PHC2, SLC2A14, SORL1, STAB1, TPM1) in the E group. These gene lists also include the previously identified alternatively spliced genes CAST, CLMN, EPHX1, GAB1, and VWF.
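The splicing criterion above can be made concrete with a minimal sketch. Here each exon's log2 FC is compared with a gene-level summary, taken as the median over exons. That summary and the function names are assumptions for illustration and may differ from the model described in materials and methods. The twofold cutoff follows the table legends.

```python
import statistics


def exon_deviations(exon_log2_fc):
    """Deviation of each exon's log2 FC from the gene-level log2 FC.

    The gene-level value is taken as the median over exons, an assumption
    made for this sketch.
    """
    gene_level = statistics.median(exon_log2_fc)
    return [fc - gene_level for fc in exon_log2_fc]


def largest_deviation(exon_log2_fc):
    """Signed deviation with the largest magnitude across the gene's exons."""
    return max(exon_deviations(exon_log2_fc), key=abs)


def is_alternatively_spliced(exon_log2_fc, cutoff=1.0):
    """True if some exon deviates by at least twofold (|log2 deviation| >= cutoff)."""
    return abs(largest_deviation(exon_log2_fc)) >= cutoff


# Hypothetical exon-level log2 FC (differentiated vs. CD34+) for one gene.
EXONS = [0.2, 0.1, 0.3, -2.4, 0.25, 0.15]

if __name__ == "__main__":
    print(round(largest_deviation(EXONS), 3), is_alternatively_spliced(EXONS))
```

The signed largest deviation corresponds to the kind of value reported per gene in Tables 7, 8, and 9 (log2 scale).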

Table 7.

Alternatively spliced genes in E group

Gene Symbol Gene Title E vs. CD34
GAB1 GRB2-associated binding protein 1 2.73
AKR1C2 aldo-keto reductase family 1, member C2 2.62
COL24A1 collagen, type XXIV, alpha 1 2.62
SLC2A14 solute carrier family 2 member 14 2.46
EPHX1 epoxide hydrolase 1, microsomal 2.43
PARL presenilin associated, rhomboid-like 2.34
GRAP2 GRB2-related adaptor protein 2 2.20
ORC4L origin recognition complex, subunit 4-like 2.18
RBBP6 retinoblastoma binding protein 6 2.01
TYRO3 TYRO3 protein tyrosine kinase 1.82
CTNND1 catenin (cadherin-associated protein), delta 1 1.82
STAB1 stabilin 1 1.66
VWF von Willebrand factor 1.57
CLCN7 chloride channel 7 1.55
COL6A2 collagen, type VI, alpha 2 1.55
NID2 nidogen 2 (osteonidogen) 1.44
RPL21 ribosomal protein L21 1.36
ATP5O ATP synthase, H+ transporting, O subunit 1.26
RARA retinoic acid receptor, alpha 1.24
RABGAP1L RAB GTPase activating protein 1-like 1.22
TSC22D3 TSC22 domain family, member 3 1.22
SORL1 sortilin-related receptor, L 1.17
LDB1 LIM domain binding 1 1.13
HMHA1 histocompatibility (minor) HA-1 1.07
BOLA2 bolA homolog 2 (E. coli) 1.06
C13orf3 chromosome 13 open reading frame 3 1.06
COL18A1 collagen, type XVIII, alpha 1 0.89*
ATP5H ATP synthase, H+ transporting, subunit d 0.89*
ALDOA aldolase A, fructose-bisphosphate 0.81*
SMG1 SMG1 homolog, phosphatidylinositol 3-kinase-related kinase 0.78*
AKAP13 A kinase (PRKA) anchor protein 13 0.75*
UBE2D3 ubiquitin-conjugating enzyme E2D 3 0.63*
CLEC12A C-type lectin domain family 12, member A 0.58*
PTPRA protein tyrosine phosphatase, receptor type, A 0.56*
ELMO1 engulfment and cell motility 1 0.50*
HADHB hydroxyacyl-Coenzyme A dehydrogenase beta subunit 0.45*
PTPN6 protein tyrosine phosphatase, nonreceptor type 6 0.26*
XYLT1 xylosyltransferase I −0.31*
MPO myeloperoxidase −0.31*
NUMA1 nuclear mitotic apparatus protein 1 −0.44*
CD97 CD97 molecule −0.45*
IL16 interleukin 16 (lymphocyte chemoattractant factor) −0.58*
ASS1 argininosuccinate synthetase 1 −0.75*
SLC39A4 solute carrier family 39 (zinc transporter), member 4 −0.78*
CMTM5 CKLF-like MARVEL transmembrane domain containing 5 −1.04
ADAMTS13 ADAM metallopeptidase with thrombospondin type 1 motif, 13 −1.12
BPI bactericidal/permeability-increasing protein −1.16
PDE4D phosphodiesterase 4D, cAMP-specific −1.21
AARSD1 alanyl-tRNA synthetase domain containing 1 −1.23
CYB5R3 cytochrome b5 reductase 3 −1.31
RTN4 reticulon 4 −1.34
PHC2 polyhomeotic homolog 2 (Drosophila) −1.35
PDE4DIP phosphodiesterase 4D interacting protein −1.39
RGS3 regulator of G-protein signaling 3 −1.40
LPHN1 latrophilin 1 −1.42
UBAP2 ubiquitin associated protein 2 −1.47
TPM1 tropomyosin 1 (alpha) −1.53
WIPF1 WAS/WASL interacting protein family, member 1 −1.57
FANCA Fanconi anemia, complementation group A −1.58
ATP2A3 ATPase, Ca2+ transporting, ubiquitous −1.60
TRIM16 tripartite motif-containing 16 −1.63
EPB49 erythrocyte membrane protein band 4.9 −1.69
CR1 complement component (3b/4b) receptor 1 −1.73
SIGLEC12 sialic acid binding Ig-like lectin 12 −1.77
DNMT3B DNA (cytosine-5-)-methyltransferase 3 beta −1.79
MYLK myosin light chain kinase −1.79
TMEM49 transmembrane protein 49 −1.84
C1orf113 chromosome 1 open reading frame 113 −1.85
TLE1 transducin-like enhancer of split 1 −1.86
PTK2 PTK2 protein tyrosine kinase 2 −1.88
CD44 CD44 molecule (Indian blood group) −1.97
CLMN calmin (calponin-like, transmembrane) −2.04
TPCN2 two pore segment channel 2 −2.11
SLC25A3 solute carrier family 25 member 3 −2.30
CAST calpastatin −2.60
SMARCAD1 SWI/SNF matrix-associated regulator of chromatin −2.85
TPD52L2 tumor protein D52-like 2 −2.90

List of alternatively spliced genes. For each gene, the largest exon deviation in the E group is shown (log2 scale); genes without an asterisk deviate by at least 2-fold. The boldfaced genes are unique to the E group comparison.

*Genes whose deviation in this group is below 2-fold; they are alternatively spliced in at least one of the other 2 groups.

Table 8.

Alternatively spliced genes in G group

Gene Symbol Gene Title G vs. CD34
PARL presenilin associated, rhomboid-like 2.23
RBBP6 retinoblastoma binding protein 6 2.11
COL24A1 collagen, type XXIV, alpha 1 1.81
RGR4275 1.75
GAB1 GRB2-associated binding protein 1 1.68
TYRO3 TYRO3 protein tyrosine kinase 1.61
HADHB hydroxyacyl-Coenzyme A dehydrogenase/beta 1.60
GRAP2 GRB2-related adaptor protein 2 1.56
ATP5O ATP synthase, H+ transporting, O subunit 1.50
PDE4D phosphodiesterase 4D, cAMP-specific 1.50
ORC4L origin recognition complex, subunit 4-like 1.39
VWF von Willebrand factor 1.34
MPO myeloperoxidase 1.32
RPL21 ribosomal protein L21 1.27
C13orf3 chromosome 13 open reading frame 3 1.27
EPHX1 epoxide hydrolase 1, microsomal 1.17
RARA retinoic acid receptor, alpha 1.15
AKAP13 A kinase (PRKA) anchor protein 13 1.14
CTNND1 catenin (cadherin-associated protein), delta 1 1.08
AKR1C2 aldo-keto reductase family 1, member C2 0.96*
STAB1 stabilin 1 0.94*
ADAMTS13 ADAM metallopeptidase with thrombospondin 13 0.88*
BOLA2 bolA homolog 2 (E. coli) 0.87*
ELMO1 engulfment and cell motility 1 0.84*
ATP5H ATP synthase, H+ transporting, mitochondrial 0.78*
SLC2A14 solute carrier family 2 member 14 0.76*
SMG1 SMG1 homolog, phosphatidylinositol 3-kinase 0.76*
ALDOA aldolase A, fructose-bisphosphate 0.71*
DNMT3B DNA (cytosine-5-)-methyltransferase 3 beta 0.68*
LDB1 LIM domain binding 1 0.68*
SORL1 sortilin-related receptor, L (DLR class) 0.63*
PTPRA protein tyrosine phosphatase, receptor type, A 0.62*
TLE1 transducin-like enhancer of split 1 0.57*
SLC39A4 solute carrier family 39 member 4 0.50*
UBE2D3 ubiquitin-conjugating enzyme E2D 3 0.48*
TSC22D3 TSC22 domain family, member 3 0.44*
NUMA1 nuclear mitotic apparatus protein 1 0.43*
CLEC12A C-type lectin domain family 12, member A 0.39*
HMHA1 histocompatibility (minor) HA-1 0.25*
PTPN6 protein tyrosine phosphatase, nonreceptor type 6 −0.22*
XYLT1 xylosyltransferase I −0.34*
WIPF1 WAS/WASL interacting protein family, member 1 −0.41*
LPHN1 latrophilin 1 −0.50*
PHC2 polyhomeotic homolog 2 (Drosophila) −0.61*
TPM1 tropomyosin 1 (alpha) −0.66*
CYB5R3 cytochrome b5 reductase 3 −0.74*
COL6A2 collagen, type VI, alpha 2 −0.75*
CMTM5 CKLF-like MARVEL transmembrane domain 5 −0.82*
ASS1 argininosuccinate synthetase 1 −0.88*
CLMN calmin (calponin-like, transmembrane) −0.91*
BPI bactericidal/permeability-increasing protein −0.96*
SIGLEC12 sialic acid binding Ig-like lectin 12 −1.04
CD97 CD97 molecule −1.20
FANCA Fanconi anemia, complementation group A −1.22
TRIM16 tripartite motif-containing 16 −1.30
CR1 complement component (3b/4b) receptor 1 −1.31
IL16 interleukin 16 (lymphocyte chemoattractant factor) −1.32
COL18A1 collagen, type XVIII, alpha 1 −1.39
TMEM49 transmembrane protein 49 −1.41
CAST calpastatin −1.42
UBAP2 ubiquitin associated protein 2 −1.44
RTN4 reticulon 4 −1.44
MYLK myosin light chain kinase −1.45
RGS3 regulator of G protein signaling 3 −1.48
PTK2 PTK2 protein tyrosine kinase 2 −1.56
EPB49 erythrocyte membrane protein band 4.9 −1.57
RABGAP1L RAB GTPase activating protein 1-like −1.58
PDE4DIP phosphodiesterase 4D interacting protein −1.63
NID2 nidogen 2 (osteonidogen) −1.65
CD44 CD44 molecule (Indian blood group) −1.68
TPCN2 two pore segment channel 2 −1.71
ATP2A3 ATPase, Ca2+ transporting, ubiquitous −1.84
AARSD1 alanyl-tRNA synthetase domain containing 1 −1.90
TPD52L2 tumor protein D52-like 2 −2.16
C1orf113 chromosome 1 open reading frame 113 −2.22
SLC25A3 solute carrier family 25 member 3 −2.79
SMARCAD1 SWI/SNF matrix-associated regulator of chromatin −3.44

List of alternatively spliced genes. For each gene, the largest exon deviation in the G group is shown (log2 scale); genes without an asterisk deviate by at least 2-fold. The boldfaced genes are unique to the G group comparison.

*Genes whose deviation in this group is below 2-fold; they are alternatively spliced in at least one of the other 2 groups.

Table 9.

Alternatively spliced genes in M group

Gene Symbol Gene Title M vs. CD34
PARL presenilin associated, rhomboid-like 3.26
ATP5O ATP synthase, H+ transporting, O subunit 2.69
TYRO3 TYRO3 protein tyrosine kinase 2.55
COL24A1 collagen, type XXIV, alpha 1 2.27
CR1 complement component (3b/4b) receptor 1 2.26
XYLT1 xylosyltransferase I 2.18
PDE4DIP phosphodiesterase 4D interacting protein 2.12
CLEC12A C-type lectin domain family 12, member A 2.12
C13orf3 chromosome 13 open reading frame 3 2.10
CTNND1 catenin (cadherin-associated protein), delta 1 1.99
AARSD1 alanyl-tRNA synthetase domain containing 1 1.94
ASS1 argininosuccinate synthetase 1 1.93
AKR1C2 aldo-keto reductase family 1, member C2 1.88
NUMA1 nuclear mitotic apparatus protein 1 1.79
EPHX1 epoxide hydrolase 1, microsomal (xenobiotic) 1.77
RGS3 regulator of G protein signaling 3 1.71
BPI bactericidal/permeability-increasing protein 1.60
HMHA1 histocompatibility (minor) HA-1 1.52
HADHB hydroxyacyl-Coenzyme A dehydrogenase, beta 1.50
COL18A1 collagen, type XVIII, alpha 1 1.46
IL16 interleukin 16 1.44
PTPN6 protein tyrosine phosphatase, nonreceptor type 6 1.40
MPO myeloperoxidase 1.37
RBBP6 retinoblastoma binding protein 6 1.35
SMG1 SMG1 homolog, phosphatidylinositol 3-kinase 1.31
ORC4L origin recognition complex, subunit 4-like 1.29
ATP5H ATP synthase, H+ transporting, subunit d 1.28
TSC22D3 TSC22 domain family, member 3 1.21
CLCN7 chloride channel 7 1.15
BOLA2 bolA homolog 2 (E. coli) 1.10
UBE2D3 ubiquitin-conjugating enzyme E2D 3 1.10
SIGLEC12 sialic acid binding Ig-like lectin 12 1.10
SLC39A4 solute carrier family 39 zinc transporter 4 1.10
RARA retinoic acid receptor, alpha 1.03
ELMO1 engulfment and cell motility 1 1.01
LDB1 LIM domain binding 1 1.01
VWF von Willebrand factor 0.95*
RPL21 ribosomal protein L21 0.87*
DNMT3B DNA (cytosine-5-)-methyltransferase 3 beta 0.87*
LPHN1 latrophilin 1 0.81*
COL6A2 collagen, type VI, alpha 2 0.66*
STAB1 stabilin 1 0.66*
GRAP2 GRB2-related adaptor protein 2 0.62*
GAB1 GRB2-associated binding protein 1 0.57*
TPM1 tropomyosin 1 (alpha) 0.44*
SLC2A14 solute carrier family 2 member 14 0.41*
SORL1 sortilin-related receptor, L (DLR class) 0.30*
EPB49 erythrocyte membrane protein band 4.9 0.23*
PHC2 polyhomeotic homolog 2 −0.09*
CMTM5 CKLF-like MARVEL transmembrane domain 5 −0.10*
TMEM49 transmembrane protein 49 −0.38*
CD44 CD44 molecule (Indian blood group) −0.41*
CD97 CD97 molecule −0.43*
SLC25A3 solute carrier family 25, member 3 −0.76*
CYB5R3 cytochrome b5 reductase 3 −0.78*
RTN4 reticulon 4 −0.82*
NID2 nidogen 2 (osteonidogen) −0.85*
MYLK myosin light chain kinase −0.91*
PTK2 PTK2 protein tyrosine kinase 2 −0.93*
UBAP2 ubiquitin associated protein 2 −1.03
ALDOA aldolase A, fructose-bisphosphate −1.08
ADAMTS13 ADAM metallopeptidase with thrombospondin −1.10
AKAP13 A kinase (PRKA) anchor protein 13 −1.25
PTPRA protein tyrosine phosphatase, receptor type, A −1.39
CLMN calmin (calponin-like, transmembrane) −1.47
TPCN2 two pore segment channel 2 −1.51
FANCA Fanconi anemia, complementation group A −1.70
TPD52L2 tumor protein D52-like 2 −1.72
RABGAP1L RAB GTPase activating protein 1-like −1.76
C1orf113 chromosome 1 open reading frame 113 −1.78
ATP2A3 ATPase, Ca2+ transporting, ubiquitous −1.83
CAST calpastatin −1.89
WIPF1 WAS/WASL interacting protein family, member 1 −1.97
PDE4D phosphodiesterase 4D, cAMP-specific −2.06
TRIM16 tripartite motif-containing 16 −2.27
SMARCAD1 chromatin DEAD/H box 1 −2.89
TLE1 transducin-like enhancer of split 1 −3.24

List of alternatively spliced genes. For each gene, the largest exon deviation in the M group is shown (log2 scale); genes without an asterisk deviate by at least 2-fold. The boldfaced genes are unique to the M group comparison.

*Genes whose deviation in this group is below 2-fold; they are alternatively spliced in at least one of the other 2 groups.
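The overlap counts given in the text partition the 86 spliced genes into seven non-overlapping groups (31 + 14 + 11 + 5 + 13 + 2 + 10 = 86). The following minimal sketch computes that partition with Python sets. It assumes that each group's set consists of the genes listed without an asterisk in its table. Only a handful of genes from Tables 7, 8, and 9 are used here, not the full lists, and the function name is hypothetical.

```python
def venn_regions(e_genes, g_genes, m_genes):
    """Split three gene sets into the seven mutually exclusive Venn regions."""
    e, g, m = set(e_genes), set(g_genes), set(m_genes)
    return {
        "E, G, M": e & g & m,
        "E, G only": (e & g) - m,
        "E, M only": (e & m) - g,
        "G, M only": (g & m) - e,
        "E only": e - g - m,
        "G only": g - e - m,
        "M only": m - e - g,
    }


# A few genes per group, taken from Tables 7-9 (only genes listed without an
# asterisk in that group's table); these are not the full lists.
E_GENES = {"CAST", "EPHX1", "CMTM5", "TPM1"}
G_GENES = {"CAST", "EPHX1", "CD97"}
M_GENES = {"CAST", "EPHX1", "XYLT1", "ALDOA"}

if __name__ == "__main__":
    for region, genes in venn_regions(E_GENES, G_GENES, M_GENES).items():
        print(f"{region}: {sorted(genes)}")
```

Because the regions are disjoint, their sizes always add up to the size of the union. This is the same check that the reported counts sum to 86.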

Important Pathways and Networks Affected by Alternative Splicing

We analyzed these 86 lineage-specific alternatively spliced genes using Ingenuity Pathway Analysis software (http://www.ingenuity.com) for further insight into their potential functional roles during hematopoietic differentiation. Not surprisingly, these analyses showed preferential enrichment of biological processes related to hematological system development and function. Ingenuity Pathway Analysis implicated alternative splicing in cellular development, molecular transport, cellular function and maintenance, cell-to-cell signaling and interaction, hematological system development and function, immune cell trafficking, cell-mediated immune response, antigen presentation, and protein trafficking.

The top five canonical pathways were Rac and RhoA signaling, leukocyte extravasation signaling, Wnt/β-catenin signaling, alanine and aspartate metabolism, and regulation of actin-based motility by Rho.
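Pathway enrichment of a gene list is commonly scored with a right-tailed Fisher's exact (hypergeometric) test. As we understand it, this is also the basis of IPA's canonical-pathway p-values. The following minimal sketch implements that tail probability with exact integer arithmetic. It is not IPA's code, and the gene counts in the example are hypothetical.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """Right-tailed hypergeometric probability P(X >= k).

    N: genes in the reference set
    K: reference genes annotated to the pathway
    n: genes in the input list (e.g., the spliced genes)
    k: input genes annotated to the pathway
    """
    upper = min(n, K)
    # comb() returns 0 when the lower index exceeds the upper, so impossible
    # terms drop out automatically.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


if __name__ == "__main__":
    # Hypothetical numbers: 86 input genes, a 100-gene pathway in a
    # 20,000-gene reference, and 5 input genes falling in the pathway.
    print(f"{enrichment_p(k=5, n=86, K=100, N=20_000):.2e}")
```

With these numbers only about 0.43 pathway genes would be expected by chance, so observing 5 gives a small tail probability.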

Alternative Splicing During Erythroid Differentiation by RNA Sequence Analysis

Along with microarray analysis, we conducted a massively parallel sequencing study of the transcriptome of the CD34+ and differentiated E cells using the SOLiD next generation sequencing platform. Data generated from this study allowed direct comparison and validation of the microarray data at both the gene and exon levels and offered the opportunity to assess the reliability of this emerging sequencing technology for transcriptome analysis. Using RPKM (reads per kilobase of exon model per million mapped reads) values to quantify RNAseq expression of each exon or transcript, we observed a strong correlation between microarrays and RNA sequencing: R = 0.72 at the gene level (Fig. 6A) and R = 0.62 at the exon level (Fig. 6B). We further examined several genes with high between-platform correlations (R = 0.83–0.91), including TPD52L2, GAB1, SLC25A3, and CAST. The exon-level fold changes for these genes measured by both technologies are illustrated in Fig. 7, A–D. As the figure panels show, each gene tends to have at least one exon that deviates markedly from the behavior of the other exons, suggesting an alternative splicing event. Strikingly, the same exon deviates by a large magnitude on both platforms, indicating that these deviations are very likely due to alternative splicing. This is further supported by the RefSeq intron/exon isoforms plotted directly below the fold-change profile of each gene, which show the inclusion (Fig. 7, B and D) or exclusion (Fig. 7, A, C, and D) of a specific exon corresponding to an alternatively spliced RefSeq isoform. In TPD52L2 (Fig. 7A), exon 3 deviates in the microarray data only, but the magnitude is modest (less than twofold) and does not correspond to a known splice variant; it is therefore likely due to microarray noise. Similarly, in GAB1 (Fig. 7B), exon 2 deviates in the RNAseq data only, with a somewhat smaller magnitude than exon 8. This may be evidence of a novel transcript (exclusion in E), but because it has not been confirmed by another method, it remains speculative. Thus the RNAseq data confirm our microarray findings for statistically significant gene expression changes and for alternatively spliced genes during erythropoietic differentiation.
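
The exon-deviation reasoning above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in this study: it assumes per-exon read counts and exon lengths are available for each condition, uses the standard RPKM definition (reads × 10^9 / (exon length in bp × total mapped reads)), adds a hypothetical pseudocount of one read to avoid taking the log of zero, and flags an exon when its log2 fold change (E over CD34+) differs from the median exon of the same gene by more than 1, that is, by more than twofold. All function names and toy counts are hypothetical.

```python
import math
from statistics import median


def rpkm(reads: float, length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of exon per million mapped reads."""
    return reads * 1e9 / (length_bp * total_mapped_reads)


def exon_log2_fold_changes(exons, total_cd34, total_e, pseudocount=1.0):
    """exons: iterable of (name, length_bp, reads_cd34, reads_e).

    Returns {exon name: log2(RPKM in E / RPKM in CD34+)}.
    The pseudocount is an assumption used only to avoid log(0).
    """
    fold = {}
    for name, length_bp, reads_cd34, reads_e in exons:
        expr_cd34 = rpkm(reads_cd34 + pseudocount, length_bp, total_cd34)
        expr_e = rpkm(reads_e + pseudocount, length_bp, total_e)
        fold[name] = math.log2(expr_e / expr_cd34)
    return fold


def flag_deviating_exons(log2_fold, threshold=1.0):
    """Return exons whose log2 fold change differs from the gene's median exon
    by more than `threshold` (1.0 corresponds to a twofold deviation)."""
    center = median(log2_fold.values())
    return [name for name, fc in log2_fold.items() if abs(fc - center) > threshold]


# Hypothetical toy gene: exon3 is strongly reduced in E relative to the other exons.
toy_exons = [
    ("exon1", 200, 100, 200),
    ("exon2", 150, 80, 160),
    ("exon3", 300, 300, 30),
    ("exon4", 250, 120, 240),
]
toy_fold = exon_log2_fold_changes(toy_exons, total_cd34=20_000_000, total_e=20_000_000)
print(flag_deviating_exons(toy_fold))  # prints ['exon3']
```

Running the same flagging on microarray and RNAseq fold changes separately and keeping only exons flagged by both mirrors the criterion in the text that an exon must deviate on both platforms.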

Fig. 6.

Comparison of RNAseq to microarray fold change for E vs. CD34+. Correlation analysis between microarray (x-axis) and SOLiD (y-axis) data at the gene and exon levels. Red line shows the line of identity. A: gene level correlation analysis. Log10 fold change RNAseq vs. log10 fold change microarray for gene level data (n = 14,105, R = 0.72). Each point represents a gene. B: exon level correlation analysis. Log10 fold change RNAseq vs. log10 fold change microarray for exon level data (n = 149,765, R = 0.62). Each point represents an exon.
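
For readers reproducing the comparison in Fig. 6, the sketch below computes log10 fold changes (E over CD34+) and a correlation coefficient between the two platforms. We assume here that R denotes Pearson's r computed on log10 fold changes; the pseudocount, the function names, and the example values are illustrative assumptions, not taken from the study.

```python
import math


def log10_fold_change(treated: float, control: float, pseudocount: float = 1.0) -> float:
    """log10((treated + pseudocount) / (control + pseudocount)); the pseudocount is an assumption."""
    return math.log10((treated + pseudocount) / (control + pseudocount))


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    xs, ys = list(xs), list(ys)
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical (E, CD34+) expression values for four genes on each platform.
microarray = [(99, 9), (10, 40), (500, 480), (5, 60)]
rnaseq = [(150, 12), (8, 35), (300, 310), (3, 70)]
x = [log10_fold_change(e, c) for e, c in microarray]
y = [log10_fold_change(e, c) for e, c in rnaseq]
print(round(pearson_r(x, y), 3))
```

Working on the log scale places increases and decreases symmetrically around zero, so points falling on the red line of identity are genes or exons with the same fold change on both platforms.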

Fig. 7.

Fig. 7.

Fig. 7.

Exon-specific fold change (E over CD34+ expression) comparing RNAseq (blue) vs. microarray (red). A: gene TPD52L2. Correlation, R = 0.86. Exon 7 (red) signals an alternative splicing (AS) event on both platforms. Six known RefSeq transcripts are plotted under the overlay plot. Exon 7 is excluded in 2 known RefSeq isoforms (NM_199359 and NM_003288), agreeing with both the microarray and RNAseq data. B: gene GAB1. Correlation, R = 0.83. Exon 8 (green) signals an AS event on both platforms. Two known RefSeq transcripts are plotted under the overlay plot. Exon 8 is included in isoform NM_207123, agreeing with both the microarray and RNAseq data. C: gene SLC25A3. Correlation, R = 0.91. Exon 2 (red) signals an AS event on both platforms. Three known RefSeq transcripts are plotted under the overlay plot. Exon 2 is alternatively spliced in transcript NM_005888. D: gene CAST. Correlation, R = 0.83. Exons 2 and 3 (red) and exon 4 (green) signal AS events on both platforms. Ten known RefSeq transcripts are plotted under the overlay plot. Exons 2 and 3 show downregulation and are unique to 5 of the RefSeq transcripts. Exon 4 shows upregulation on both platforms and appears to be the first exon of 5 RefSeq transcripts.

DISCUSSION

Blood cells share numerous functional properties (cell motility, immune functions) that distinguish them from differentiated cells of solid tissues. These specific functions are acquired during hematopoietic cell differentiation, and the differentiated cells become fully operative once they leave the bone marrow or other organs of the immune system for the peripheral circulation.

Previous reports suggest that the self-renewal and differentiation of hematopoietic cells are not likely to be governed by a single factor or a few factors, but rather by the integration of many signal inputs affecting gene transcription, including chromatin regulation, transcription factors, alternative splicing, and posttranslational modification. In particular, alternative splicing of exons is believed to contribute extensively to transcript and protein complexity in differentiated stem cells. Although alternative splicing is acknowledged to be a major determinant of global transcript and protein complexity, it has not been examined in functional studies of hematopoietic cell differentiation. Hence, we undertook this study to gain insight into the transcriptome and exome of the hematopoietic process using an in vitro human hematopoietic model system that permits analysis of CD34+ differentiation into the major blood cell lineages.

Using QPCR and microarray technology, we analyzed the cytokine-induced differentiation of CD34+ progenitors to identify gene signatures and to determine the degree to which alternative splicing might regulate this process. To confirm and validate this in vitro model, we analyzed the parent and differentiated cells by flow cytometry for cell surface markers such as CD71, CD15, and CD66, specific for E, G, and M, respectively, and for lineage-specific transcripts such as PTGS1, SERPINE1, GPIBA, and PF4 (specific for M); FCGRA, MIP2, FCGR2B, and CD300A (specific for G); and GYPA, TMOD1, and ANK1 (specific for E). These analyses showed increased expression of lineage-specific transcripts and proteins in the differentiated groups compared with CD34+ cells, indicating that the day 11 cultures contain differentiated cells comparable to those found in the peripheral blood. Our data are also consistent with other published reports of lineage-specific gene expression, confirming the identity of the cells studied here.

Initial microarray analysis detected 172 genes that are significantly modulated during differentiation. Ingenuity Pathway Analysis showed these transcripts to be involved in cell motility, immune system development, and cell signaling, as would be expected for developing hematopoietic cells. CD34+ cells, on the other hand, showed upregulated expression of genes such as ARHGEF17, CD177, GPSM1, ID1, S100A12, and SLC11A1, which appear to participate in cell signaling processes. In E cells, genes such as SLC4A1, ALAS2, HBB, GYPA, and SELENBP1 are overexpressed. While this red cell signature is expected for SLC4A1 (band 3 red cell membrane protein), HBB (hemoglobin β subunit), and GYPA (glycophorin A membrane protein), SELENBP1 has previously been implicated only in the pathogenesis of cancer and neuronal disorders. The G cells showed very few overexpressed genes upon differentiation, including PRG2 and CEACAM6. Interestingly, differentiation of CD34+ cells into megakaryocytes produced significant changes in a diverse set of 90 genes, 32 downregulated and 58 upregulated. As shown in Table 2, these genes are associated with regulation of cell proliferation, cell cycle signaling, and immune system development. Whether individual genes within these signatures play a functional role in hematopoietic differentiation will require additional studies.

We further analyzed the dataset to generate highly selective gene lists that would predict the nature of the differentiated cell types and the parent stem cell. For a gene to be considered selectively upregulated in any of the three comparisons (E vs. CD34+, M vs. CD34+, G vs. CD34+), it had to have P < 0.0001 and be at least twofold up in the comparison of interest, and be unchanged or upregulated less than 1.4-fold in the other two comparisons. Applying these criteria, we generated three clusters of upregulated transcripts: 30 highly selective for the E group, 32 for the G group, and 269 for the M group. Gene ontology analysis classified these highly selective genes into seven major functions: binding, catalytic, signal transducer, transcription regulator, structural molecule, enzyme regulator, and transporter activity. Note, however, that functional classifications can be redundant, and some genes may be assigned to several functional categories.
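
The selectivity rule above can be written as a simple filter. In this sketch the per-gene input format, the gene names, and the reading of "unchanged or upregulated <1.4-fold" as a linear fold change below 1.4 in each of the other two comparisons are our assumptions; the thresholds themselves (P < 0.0001, at least twofold, 1.4-fold) are those stated in the text.

```python
from dataclasses import dataclass

COMPARISONS = ("E", "G", "M")  # each lineage vs. CD34+


@dataclass
class GeneStats:
    symbol: str
    fold: dict  # comparison -> linear fold change vs. CD34+ (hypothetical input format)
    p: dict     # comparison -> P value


def selective_genes(genes, target, p_max=1e-4, min_fold=2.0, max_other_fold=1.4):
    """Genes with P < p_max and >= min_fold up in `target`, and fold < max_other_fold
    in both other comparisons (assumed reading of 'unchanged or upregulated <1.4-fold')."""
    others = [c for c in COMPARISONS if c != target]
    return [
        g.symbol
        for g in genes
        if g.p[target] < p_max
        and g.fold[target] >= min_fold
        and all(g.fold[c] < max_other_fold for c in others)
    ]


# Hypothetical genes for illustration only.
genes = [
    GeneStats("GENE_A", {"E": 3.0, "G": 1.1, "M": 0.9}, {"E": 1e-6, "G": 0.5, "M": 0.7}),
    GeneStats("GENE_B", {"E": 2.5, "G": 1.6, "M": 1.0}, {"E": 1e-6, "G": 0.01, "M": 0.8}),
    GeneStats("GENE_C", {"E": 2.2, "G": 1.0, "M": 1.0}, {"E": 1e-3, "G": 0.9, "M": 0.9}),
]
print(selective_genes(genes, "E"))  # prints ['GENE_A']
```

GENE_B fails because it is also up 1.6-fold in G, and GENE_C fails the P value cutoff, which illustrates how each criterion narrows the list.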

Of particular interest, we observed that the expression of GTPase activator proteins is upregulated during erythropoietic differentiation. These GTPase activator proteins are known to be involved in actin cytoskeleton organization, membrane trafficking, gene expression, and cell proliferation. We also observed and validated increased expression of several genes associated with homeostasis and platelets during megakaryopoietic differentiation, including CD44, TPM1, vWF, GP5, PDGFB, F2RL2, and ELMO1.

Having defined the transcriptome of the differentiated hematopoietic cells, we then examined differences in exon expression that would represent putative alternative splicing. Using the ExonSVD model, we identified 86 known genes that are alternatively spliced in the differentiated cells compared with progenitors. These include 31 genes common to the E, G, and M groups. The E and G groups shared 14 alternatively spliced genes, whereas the G and M groups showed the lowest overlap, with five spliced genes. Eleven spliced genes were common to the E and M groups; these are of potential functional importance because erythrocytes and megakaryocytes share a common progenitor. Lineage-specific splicing was observed in 13 genes in megakaryocytes and 10 genes in erythroid cells, but in only two genes in the granulocytic group. This is the first report showing alternative splicing events during lineage-specific in vitro hematopoietic differentiation. Our gene list also includes alternatively spliced genes (CAST, CLMN, EPHX1, GAB1, vWF) that have previously been identified by others using PCR, SAGE, and sequencing technologies. Importantly, we identified additional novel spliced genes in the current study, as shown in Tables 7, 8, and 9. Finally, we validated our microarray-based detection of alternative splicing with whole transcriptome sequencing on a next generation massively parallel sequencing platform, with a significant degree of correlation. Such sequence-based expression profiling is likely to become a widely used platform for future transcriptome studies.
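
The overlap counts above form a three-way Venn partition. Assuming each pairwise count excludes the 31 genes shared by all three lineages (our reading, which is consistent with the seven regions summing exactly to the reported 86 genes), the number of spliced genes per lineage follows by simple addition, as in this sketch.

```python
# Exclusive Venn regions of alternatively spliced genes, keyed by lineage set.
# Assumption: pairwise counts exclude the genes common to all three lineages.
regions = {
    frozenset("EGM"): 31,  # common to E, G, and M
    frozenset("EG"): 14,   # E and G only
    frozenset("EM"): 11,   # E and M only
    frozenset("GM"): 5,    # G and M only
    frozenset("E"): 10,    # erythroid-specific
    frozenset("G"): 2,     # granulocytic-specific
    frozenset("M"): 13,    # megakaryocytic-specific
}

total = sum(regions.values())
# Genes spliced in a lineage = every region whose key contains that lineage.
per_lineage = {lineage: sum(n for key, n in regions.items() if lineage in key) for lineage in "EGM"}
print(total, per_lineage)  # prints 86 {'E': 66, 'G': 52, 'M': 60}
```

Under this reading the erythroid, granulocytic, and megakaryocytic groups would involve 66, 52, and 60 spliced genes, respectively; these derived totals are not stated explicitly in the text.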

Functional pathway analysis of these 86 alternatively spliced genes showed preferential enrichment of biological processes related to hematological system development and function, molecular transport, cellular function and maintenance, cell-to-cell signaling and interaction, immune cell trafficking, cell-mediated immune response, antigen presentation, and protein trafficking. Notably, the top five canonical pathways were Rac and RhoA signaling, leukocyte extravasation signaling, Wnt/β-catenin signaling, alanine and aspartate metabolism, and regulation of actin-based motility by Rho.
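
Pathway enrichment of a gene list is commonly scored with a right-tailed Fisher's exact test, which for a single pathway reduces to the upper tail of a hypergeometric distribution. We assume, without confirmation from the text, that the pathway analysis used here behaves similarly; the sketch below is a generic implementation, and the counts in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(k: int, n_list: int, k_pathway: int, n_universe: int) -> float:
    """P(X >= k) for X ~ Hypergeometric: n_list genes drawn from n_universe genes,
    of which k_pathway belong to the pathway; k is the observed overlap."""
    upper = min(n_list, k_pathway)
    # math.comb returns 0 when the lower index exceeds the upper, so impossible terms vanish.
    hits = sum(
        comb(k_pathway, i) * comb(n_universe - k_pathway, n_list - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(n_universe, n_list)


# Hypothetical numbers: 86 spliced genes, a 100-gene pathway, a 20,000-gene universe,
# and an observed overlap of 5 genes.
print(enrichment_p_value(5, 86, 100, 20_000))
```

Exact integer arithmetic via `math.comb` avoids underflow in the intermediate binomial coefficients; only the final ratio is converted to a float.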

The Rac signaling pathway has roles in many cellular functions, including cell motility and adhesion, cell growth and proliferation, and cell survival and apoptosis (5, 13, 20, 27). Rac proteins constitute a subgroup of the Rho family of small GTPases and include Rac1, Rac2, Rac3, and the Rac1 splice variant Rac1b. Acting as molecular switches, they control a variety of signaling pathways essential for cell function (5, 13, 20, 27). Rac GTPases are key regulators of the actin cytoskeleton, cell-cycle progression, gene transcription, cell survival and apoptosis, and of the NADPH oxidase that produces reactive oxygen species. Aberrant Rac signaling is found in some human cancers as a result of changes in the GTPase itself or in its regulatory loops (38).
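
The molecular switch behavior described above can be summarized as a two-state model: guanine nucleotide exchange factors (GEFs) promote release of GDP and loading of GTP, switching the GTPase on, while GTPase-activating proteins (GAPs) accelerate hydrolysis of bound GTP to GDP, switching it off. The toy sketch below encodes only this textbook cycle; it is a conceptual illustration and deliberately ignores other regulators such as guanine nucleotide dissociation inhibitors (GDIs).

```python
from enum import Enum


class GTPaseState(Enum):
    GDP_BOUND = "inactive"
    GTP_BOUND = "active"


def apply_regulator(state: GTPaseState, regulator: str) -> GTPaseState:
    """Return the new state of a Rho-family GTPase after one regulator acts on it."""
    if regulator == "GEF" and state is GTPaseState.GDP_BOUND:
        return GTPaseState.GTP_BOUND  # GEF: GDP released, GTP loaded -> switched on
    if regulator == "GAP" and state is GTPaseState.GTP_BOUND:
        return GTPaseState.GDP_BOUND  # GAP: GTP hydrolyzed to GDP -> switched off
    return state  # any other combination leaves the switch unchanged


state = GTPaseState.GDP_BOUND
for regulator in ["GEF", "GEF", "GAP"]:
    state = apply_regulator(state, regulator)
    print(regulator, "->", state.value)
```

A second GEF acting on an already active GTPase leaves it active, reflecting that the switch has only two nucleotide-bound states in this simplified model.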

The Rho signaling pathway, another pathway found to be modulated during hematopoietic differentiation, orchestrates cellular processes as diverse as cell migration, cell-cycle progression and cytokinesis, microbial killing (through phagocytosis and NADPH oxidase activity), and agonist-regulated gene transcription. In particular, Rac- and Rho-induced effects, which correlate with membrane protrusion and contractility, respectively, antagonize each other in a variety of cell types.

Wnt proteins are a family of highly conserved signaling factors that control cell fate and differentiation during development, including the signals that govern the interface between self-renewal and differentiation in hematopoietic stem cells (31). Alterations of the Wnt/β-catenin signaling pathway are known to be associated with tumorigenesis in tissues with high renewal potential, such as bone marrow hematopoietic tissue (2, 3, 9, 14, 32). Moreover, human Wnt genes show a complex organization and pattern of expression, with alternative promoters and RNA splicing responsible for the expression of isoforms. Therefore, although relatively little attention has so far been paid to the regulation of Wnt expression, such analysis appears necessary to understand and define the respective roles of individual Wnt proteins, if not individual Wnt isoforms, in the control of cell fate, differentiation, and tissue regeneration. Further studies are needed to determine whether alternatively spliced isoforms of Wnt pathway genes play a functional role in hematopoietic cell differentiation.

In summary, we used an in vitro model of differentiating hematopoietic stem cells and applied microarray and sequencing technologies to generate detailed expression and exon profiles during lineage-specific differentiation. Findings for erythroid differentiation were validated and extended with next generation sequencing technology. Knowledge of the specific transcriptional programs of normal hematopoiesis may further our understanding of the complex process of hematopoietic stem cell development and help define new pathophysiological pathways that could be exploited for target-specific treatment in the near future.

GRANTS

Funding by Intramural Research, NHLBI and CIT, NIH.

DISCLOSURES

No conflicts of interest, financial or otherwise, are declared by the author(s).

Supplementary Material

Table S1
tableS1.xls (228KB, xls)

ACKNOWLEDGMENTS

We acknowledge Intramural Research, NHLBI, for funding, and Edge Biosciences and Applied Biosystems for generous help with RNAseq analysis on the SOLiD platform. We appreciate the help of Drs. Harry L. Malech and Uimook Choi, in the Genetic Immunotherapy Section, Laboratory of Host Defenses, National Institute of Allergy and Infectious Diseases, NIH, for providing the CD34+ cells. We acknowledge Dr. Phil McCoy and Ms. Leigh Samsel in the NHLBI Flow Cytometry Core for help with the flow characterization of cells. We gratefully acknowledge Dr. Zu Xi Yu in the Pathology Core Facility, NHLBI, for help with the staining of cells.

Footnotes

1

The online version of this article contains supplemental material.

REFERENCES

  • 1. Adams GB. Deconstructing the hematopoietic stem cell niche: revealing the therapeutic potential. Regen Med 3: 523–530, 2008.
  • 2. Arnsdorf EJ, Tummala P, Jacobs CR. Non-canonical Wnt signaling and N-cadherin related beta-catenin signaling play a role in mechanically induced osteogenic cell fate. PLoS One 4: e5388, 2009.
  • 3. Auclair BA, Benoit YD, Rivard N, Mishina Y, Perreault N. Bone morphogenetic protein signaling is essential for terminal differentiation of the intestinal secretory cell lineage. Gastroenterology 133: 887–896, 2007.
  • 4. Blencowe BJ. Alternative splicing: new insights from global analyses. Cell 126: 37–47, 2006.
  • 5. Cancelas JA, Jansen M, Williams DA. The role of chemokine activation of Rac GTPases in hematopoietic stem cell marrow homing, retention, and peripheral mobilization. Exp Hematol 34: 976–985, 2006.
  • 6. Chasis JA, Mohandas N. Erythroblastic islands: niches for erythropoiesis. Blood 112: 470–478, 2008.
  • 7. Christofi T, Raptis DA, Kallis A, Ambasakoor F. True trilineage haematopoiesis in excised heterotopic ossification from a laparotomy scar: report of a case and literature review. Ann R Coll Surg Engl 90: W12–W14, 2008.
  • 8. Dent AL, Kaplan MH. T cell regulation of hematopoiesis. Front Biosci 13: 6229–6236, 2008.
  • 9. Eisenmann DM. Wnt signaling. WormBook: 1–17, 2005.
  • 10. Elefanty AG, Robb L, Begley CG. Factors involved in leukaemogenesis and haemopoiesis. Baillieres Clin Haematol 10: 589–614, 1997.
  • 11. Giebel B, Punzel M. Lineage development of hematopoietic stem and progenitor cells. Biol Chem 389: 813–824, 2008.
  • 12. Greaves MF. Differentiation-linked leukemogenesis in lymphocytes. Science 234: 697–704, 1986.
  • 13. Hall A. Rho GTPases and the control of cell behaviour. Biochem Soc Trans 33: 891–895, 2005.
  • 14. Katoh M. WNT signaling pathway and stem cell signaling network. Clin Cancer Res 13: 4042–4045, 2007.
  • 15. Komor M, Guller S, Baldus CD, de Vos S, Hoelzer D, Ottmann OG, Hofmann WK. Transcriptional profiling of human hematopoiesis during in vitro lineage-specific differentiation. Stem Cells 23: 1154–1169, 2005.
  • 16. Kosaki G. Platelet production by megakaryocytes: protoplatelet theory justifies cytoplasmic fragmentation model. Int J Hematol 88: 255–267, 2008.
  • 17. Mikhail A, Covic A, Goldsmith D. Stimulating erythropoiesis: future perspectives. Kidney Blood Press Res 31: 234–246, 2008.
  • 18. Muller-Sieburg C, Sieburg HB. Stem cell aging: survival of the laziest? Cell Cycle 7: 3798–3804, 2008.
  • 19. Nissen-Druey C, Tichelli A, Meyer-Monard S. Human hematopoietic colonies in health and disease. Acta Haematol 113: 5–96, 2005.
  • 20. Nowak JM, Grzanka A, Zuryn A, Stepien A. (The Rho protein family and its role in the cellular cytoskeleton). Postepy Hig Med Dosw (Online) 62: 110–117, 2008.
  • 21. Olsson I, Bergh G, Ehinger M, Gullberg U. Cell differentiation in acute myeloid leukemia. Eur J Haematol 57: 1–16, 1996.
  • 22. Orkin SH. Diversification of haematopoietic stem cells to specific lineages. Nat Rev Genet 1: 57–64, 2000.
  • 23. Orkin SH. Stem cell alchemy. Nat Med 6: 1212–1213, 2000.
  • 24. Orlovskaya I, Schraufstatter I, Loring J, Khaldoyanidi S. Hematopoietic differentiation of embryonic stem cells. Methods 45: 159–167, 2008.
  • 25. Palis J, Segel GB. Developmental biology of erythropoiesis. Blood Rev 12: 106–114, 1998.
  • 26. Passegue E, Weissman IL. Leukemic stem cells: where do they come from? Stem Cell Rev 1: 181–188, 2005.
  • 27. Pernis AB. Rho GTPase-mediated pathways in mature CD4+ T cells. Autoimmun Rev 8: 199–203, 2009.
  • 28. Pohar TT, Sun H, Davuluri RV. HemoPDB: Hematopoiesis Promoter Database, an information resource of transcriptional regulation in blood cell development. Nucleic Acids Res 32: D86–D90, 2004.
  • 29. Scandura JM. Advances in the molecular genetics of acute leukemia. Curr Oncol Rep 7: 323–332, 2005.
  • 30. Solier S, Barb J, Zeeberg BR, Varma S, Ryan MC, Kohn KW, Weinstein JN, Munson PJ, Pommier Y. Genome-wide analysis of novel splice variants induced by topoisomerase I poisoning shows preferential occurrence in genes encoding splicing factors. Cancer Res 70: 8055–8065, 2010.
  • 31. Staal FJ, Luis TC. Wnt signaling in hematopoiesis: crucial factors for self-renewal, proliferation, and cell fate decisions. J Cell Biochem 109: 844–849, 2010.
  • 32. Sumi T, Tsuneyoshi N, Nakatsuji N, Suemori H. Defining early lineage specification of human embryonic stem cells by the orchestrated balance of canonical Wnt/beta-catenin, Activin/Nodal and BMP signaling. Development 135: 2969–2979, 2008.
  • 33. Thomsen R, Solvsten CA, Linnet TE, Blechingberg J, Nielsen AL. Analysis of qPCR data by converting exponentially related Ct values into linearly related X0 values. J Bioinform Comput Biol 8: 885–900, 2010.
  • 34. Warren LA, Rossi DJ. Stem cells and aging in the hematopoietic system. Mech Ageing Dev 130: 46–53, 2009.
  • 35. Watkins NA, Gusnanto A, de Bono B, De S, Miranda-Saavedra D, Hardie DL, Angenent WG, Attwood AP, Ellis PD, Erber W, Foad NS, Garner SF, Isacke CM, Jolley J, Koch K, Macaulay IC, Morley SL, Rendon A, Rice KM, Taylor N, Thijssen-Timmer DC, Tijssen MR, van der Schoot CE, Wernisch L, Winzer T, Dudbridge F, Buckley CD, Langford CF, Teichmann S, Gottgens B, Ouwehand WH. A HaemAtlas: characterizing gene expression in differentiated human blood cells. Blood 113: e1–e9, 2009.
  • 36. Weih F, Carrasco D, Durham SK, Barton DS, Rizzo CA, Ryseck RP, Lira SA, Bravo R. Multiorgan inflammation and hematopoietic abnormalities in mice with a targeted disruption of RelB, a member of the NF-kappa B/Rel family. Cell 80: 331–340, 1995.
  • 37. Weiss MJ, dos Santos CO. Chaperoning erythropoiesis. Blood 113: 2136–2144, 2009.
  • 38. Yamazaki D, Kurisu S, Takenawa T. Regulation of cancer cell motility through actin reorganization. Cancer Sci 96: 379–386, 2005.

Articles from Physiological Genomics are provided here courtesy of American Physiological Society
