Author manuscript; available in PMC: 2015 Sep 1.
Published in final edited form as: Nat Biotechnol. 2015 Feb 9;33(3):269–276. doi: 10.1038/nbt.3154

Decoding the Regulatory Network for Blood Development from Single-Cell Gene Expression Measurements

Victoria Moignard 1,2,#, Steven Woodhouse 1,2,#, Laleh Haghverdi 3,4, Andrew J Lilly 5, Yosuke Tanaka 1,2,6, Adam C Wilkinson 1,2, Florian Buettner 3, Iain C Macaulay 7, Wajid Jawaid 1, Evangelia Diamanti 1,2, Shin-Ichi Nishikawa 6, Nir Piterman 8, Valerie Kouskoff 5, Fabian J Theis 3,4, Jasmin Fisher 9,10,*, Berthold Göttgens 1,2,*
PMCID: PMC4374163  EMSID: EMS61867  PMID: 25664528

Abstract

Here we report the use of diffusion maps and network synthesis from state transition graphs to better understand developmental pathways from single cell gene expression profiling. We map the progression of mesoderm towards blood in the mouse by single-cell expression analysis of 3,934 cells, capturing cells with blood-forming potential at four sequential developmental stages. By adapting the diffusion plot methodology for dimensionality reduction to single-cell data, we reconstruct the developmental journey to blood at single-cell resolution. Using transitions between individual cellular states as input, we develop a single-cell network synthesis toolkit to generate a computationally executable transcriptional regulatory network model that recapitulates blood development. Model predictions were validated by showing that Sox7 inhibits primitive erythropoiesis, and that Sox and Hox factors control early expression of Erg. We therefore demonstrate that single-cell analysis of a developing organ coupled with computational approaches can reveal the transcriptional programs that control organogenesis.


Blood has long served as a model to study organ development, owing to the accessibility of blood cells and the availability of markers for specific cell populations. Blood development initiates at gastrulation from multipotent Flk1+ (Kdr) mesodermal cells, which initially have the potential to form blood, endothelium and smooth muscle cells1,2. Blood development represents one of the earliest stages of organogenesis, as the production of primitive erythrocytes is required to support the growing embryo. Single-cell gene expression analysis has already been successfully applied to study the earliest stages of preimplantation mouse and human development3-5, to identify lineage commitment6 and transcriptional regulatory7 events in blood, and more recently to probe the emergence of HSCs from the haemogenic endothelium of the dorsal aorta8.

Here we report in-vivo gene-expression analysis of early blood development at the single-cell level, focusing on transcription factors (TFs) as regulators of cell fate. Using qRT-PCR, we analyzed >40 genes in 3,934 cells with blood and endothelial potential from five populations at four sequential stages of post-implantation mouse development between E7.0 and E8.25. We adapted the diffusion plot methodology previously reported in non-biological contexts9 for dimensional reduction of single-cell data, where pseudotemporal ordering of individual cells revealed a putative developmental hierarchy branching towards both blood and endothelial-like fates. To discover the underlying regulatory network, we developed a single-cell network synthesis (SCNS) toolkit for the synthesis of executable Boolean network models from binary single-cell expression states, which correspond to the on and off patterns of TF expression. Using this toolkit we identified a core network of 20 highly connected TFs, which could reach eight stable states representing blood and endothelium. We validated model predictions to demonstrate that Sox7 blocks primitive erythroid development, while Sox and Hox factors directly regulate expression of the HSC regulator, Erg. The SCNS toolkit therefore opens up network reconstruction for other systems without the requirement for prior knowledge of regulatory interactions.

Results

Capturing single cells with blood-forming potential from gastrulating embryos

The first wave of primitive haematopoiesis originates from Flk1+ mesoderm1,2,10, with all haematopoietic potential in the mouse contained within the Flk1+ population from E7.0 onwards. Although some blood progenitor cells lose Flk1 expression just before the onset of circulation11, previous work using a LacZ reporter knocked into the Runx1 locus showed that haematopoietic potential remains confined to the Runx1+ fraction12. This was confirmed with a GFP reporter driven by the Runx1 +23 enhancer, which reproduces Runx1 expression8. Using Flk1 expression in combination with a Runx1-ires-GFP reporter mouse13 therefore allowed us to capture cells with blood potential at distinct anatomical stages across a time course of mouse development (Fig. 1a,b). Single Flk1+ cells were flow sorted at E7.0 (primitive streak, PS), E7.5 (neural plate, NP) and E7.75 (head fold, HF) stages. We subdivided E8.25 cells into putative blood and endothelial populations by isolating GFP+ cells (four somite, 4SG) and Flk1+GFP− cells (4SFG), respectively (Fig. 1b, Supplementary Fig. 1a). Cells were sorted from multiple embryos at each time point, with 3,934 cells going on to subsequent analysis (Fig. 1c). Total cell numbers (Supplementary Fig. 1b) and numbers of cells of appropriate phenotypes (Fig. 1d) present in each embryo were estimated from FACS data, indicating that for the first three stages, more than one embryo equivalent of Flk1+ cells was collected.

Figure 1. Single-cell gene expression analysis of early blood development.

(a) Flk1 and Runx1 staining in E7.5 mesoderm and blood band, respectively. Scale bar is 100 μm. (b) Single cells sorted from five populations at four anatomically distinct stages from E7.0-8.25. (c) Quantification of cells sorted and retained for analysis after quality control. (d) Quantification of Flk1+, GFP+ or Flk1+GFP− cells in embryos at each time point from FACS data (Supplementary Fig. 1a). Line indicates median. (e) Unsupervised hierarchical clustering of gene expression for the 33 TFs and 7 markers in all cells. Coloured bar indicates embryonic stage. Major clusters indicated. ND, not detected.

We next quantified the expression of 33 TFs involved in endothelial and haematopoietic development14, nine marker genes including the embryonic globin Hbb-bH1 and cell surface markers such as Cdh5 (VE-Cadherin) and Itga2b (CD41), as well as four reference housekeeping genes in all 3,934 cells using microfluidic qRT-PCR technology7 (Supplementary Table 1), which resulted in >150,000 quantitative expression scores.

Development of blood progenitor cells is not synchronized

Unsupervised hierarchical clustering of the 33 TF and 9 marker genes across all 3,934 cells revealed three major clusters (Fig. 1e). Cluster I was small and comprised mostly PS and NP cells. It lacked expression of blood-associated genes, but showed low expression of some endothelial genes and high expression of Cdh1 (E-cadherin), likely representing mesodermal cells at the primitive streak15. Cluster II contained the greatest number of cells and included most of the PS, NP, HF and 4SFG cells, was characterized by endothelial gene expression, and contained sub-clusters with elevated expression of haemogenic endothelial genes, such as Cdh5, or haematopoietic genes such as Gfi1, indicating that this cluster contains a continuum of cells maturing from mesodermal to haematopoietic and endothelial fates. Cluster III was formed by most of the E8.25 Runx1GFP+ 4SG cells, and had robust expression of haematopoietic genes (including Hbb-bH1, Gata1, Nfe2, Gfi1b, Ikzf1 (Ikaros) and Myb), and low expression of endothelial genes (Erg, Sox7, Sox17, Hoxb4, Cdh5). The mixing of cells from different anatomical stages by hierarchical clustering analysis therefore suggested that developmental maturation of single cells in early mesodermal cell populations is asynchronous, with cells at multiple stages expressing similar combinations of developmental regulators. This is consistent with the gradual ingression of cells through the primitive streak and lineage commitment during gastrulation.

Principal component analysis (PCA) for all 3,934 cells confirmed the large-scale mixing of cells from different anatomical stages, with only 4SG cells forming a stage-specific group (Supplementary Fig. 2a). The PCA was retrospectively coloured to show which embryo each cell belongs to (Supplementary Fig. 2b), to determine whether this mixing is the result of developmental asynchrony within embryos or differences in maturation between different embryos classified as being of the same anatomical stage. We quantified the percentage of cells from each embryo belonging to clusters I, II and III identified by hierarchical clustering (Fig. 1e and Supplementary Fig. 2c,d). This showed that cells collected from each embryo at the PS, NP and HF stages were distributed across clusters I and II, with the earlier stages showing a greater bias towards cluster I than later stages. These results are therefore consistent with a model whereby cells representing both early and later stages along the differentiation trajectory towards blood are present throughout the PS, NP and HF timepoints, captured as snapshot measurements in our high-throughput single-cell expression profiling.

A proportion of Flk1+ cells will give rise to mesodermal lineages other than blood and endothelium, and the extent to which they emerge over time and contribute to the variability would need to be analyzed using different gene sets. Notably, however, >50% of PS, NP and HF cells co-expressed Flk1 and Runx1 at the mRNA level, highlighting the presence of Flk1+ cells with haemogenic potential8,12 from the earliest timepoints (Supplementary Fig. 3). Analysis of 50-cell pools from the PS, NP and HF stages by RNA-seq showed graded expression increases of haematopoietic and endothelial genes from the day 7.0 to the day 7.5 and day 7.75 samples. This is entirely consistent with the continuous emergence of blood-specified cells deduced from our single-cell data, as an increase in the proportion of cells expressing a given gene between stages will increase population-averaged expression measurements. Key mesodermal and cardiac genes, by contrast, showed graded down-regulation in the pooled-cell RNA-Seq (Supplementary Fig. 4). These graded expression changes over time are not consistent with a discrete on or off switch at a specific developmental timepoint, but could again be due to gradual changes in the proportion of cells expressing the marker genes, similar to our observations from single-cell analysis of blood and endothelial genes. Alternatively, quantitative changes in expression levels within a constant proportion of cardiac-specified cells would give a similar result and cannot be excluded from the pooled-cell RNA-Seq. Therefore, our results indicate, at least for cells destined to become blood and endothelium, that these cells arise at all stages of the analyzed time course rather than in a synchronized fashion at one precise time point, consistent with the gradual nature of gastrulation. 
Notably, only single-cell analysis over a developmental time-course has the power to reveal the contribution to cellular heterogeneity made by unsynchronized maturation of individual cells.

Diffusion maps identify developmental trajectories

To identify and visualize putative developmental trajectories from the PS to 4S stages in the single-cell gene expression data, we developed a computational approach for dimension reduction (Materials and Methods). Our method is based on the concept of diffusion distances which can be interpreted as a metric for objects (here: cells) which are related to each other via a gradual but stochastic diffusion-like process such as cellular differentiation. In brief, similarities between all 3,934 cells are calculated based on their gene expression patterns, and then visualized globally in a 3D map (Fig. 2, Supplementary Fig. 5). The resultant components span a low-dimensional diffusion-space in which distance reflects how similar cells are in terms of their diffusion distance and can be inferred to represent developmental time.
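The idea of a diffusion distance can be illustrated with a minimal sketch. It uses a generic Gaussian kernel and the textbook random-walk construction, which is an assumption made for illustration and not the adapted kernel described in Materials and Methods. The function names, the `sigma` and `steps` parameters and the toy profiles are all hypothetical.

```python
import math

def gaussian_kernel(cells, sigma):
    """Pairwise similarity between expression profiles (sequences of numbers)."""
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2 * sigma ** 2))
             for y in cells] for x in cells]

def matmul(a, b):
    """Plain-Python product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def diffusion_distance(cells, i, j, sigma=1.0, steps=3):
    """Distance between the `steps`-step random-walk distributions of cells i and j,
    weighted by the inverse stationary density of the walk."""
    kernel = gaussian_kernel(cells, sigma)
    degree = [sum(row) for row in kernel]
    stationary = [d / sum(degree) for d in degree]
    # Row-normalise the kernel to obtain transition probabilities between cells.
    walk = [[k / d for k in row] for row, d in zip(kernel, degree)]
    walk_t = walk
    for _ in range(steps - 1):
        walk_t = matmul(walk_t, walk)
    return math.sqrt(sum((walk_t[i][z] - walk_t[j][z]) ** 2 / stationary[z]
                         for z in range(len(cells))))

# Four made-up two-gene profiles lying along a line, standing in for a trajectory.
toy_cells = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
distances = [diffusion_distance(toy_cells, 0, j) for j in range(4)]
```

In this toy case the distance from the first cell grows with position along the line, which is the property that allows distance in diffusion space to be read as progression along a trajectory.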

Figure 2. Diffusion plots identify developmental trajectories.

Diffusion plot of all 3,934 cells calculated from the expression of 33 TFs and seven marker genes (top left). Blue, PS; green, NP; orange, HF; red, 4SG; purple, 4SFG. The expression levels of individual genes were then overlaid onto the diffusion plot to highlight patterns of expression (see Supplementary Fig. 5 for additional genes). Circle, PS; diamond, NP; triangle, HF; cross, 4SG; square, 4SFG (visible in high resolution version of figure).

Although there is extensive mixing between PS, NP, HF and 4SFG populations in the diffusion plot, there is a general progression in the cell stages present in different regions of the plot. Largely early E7.0 PS and E7.5 NP cells give way to the later HF cells and then either to the E8.25 4SG cells, which form a homogeneous cluster in line with the expected developmental progression of the blood system, or to the 4SFG cells (see Supplementary Fig. 6 for projection of individual populations). Furthermore, we observed that whereas the E8.25 Flk1+Runx1-GFP− (4SFG) cells mostly mix with earlier Flk1+ cells, a subset that was not identified by clustering or PCA branches off. This branch expresses endothelial and haemogenic endothelial genes (Cdh5, Erg, Itga2b, Pecam1 (CD31), Sox7, Fli1) with lower to absent expression of Etv2 and Runx1. This observation is consistent with the known bifurcation of blood and endothelium (reviewed in ref. 16) and the down-regulation of Runx1 in more mature endothelial cells17. This bifurcation was more apparent in the diffusion maps than by PCA, independent component analysis or t-SNE (Supplementary Fig. 7). Genes that mark early, intermediate and late stages of blood development showed dynamic expression across the diffusion map (Fig. 2), with Cdh1 expressed first, followed by Cdh5 and then the embryonic globin Hbb-bH1. The transcription factors Etv2, Tal1 (Scl), Runx1 and Gata1 were expressed in a pattern consistent with their known sequential roles during the development of haemangioblasts through to erythroid cells18-25. Dynamic expression patterns were also observed for other TFs not previously recognized as major regulators of primitive haematopoiesis, including Erg, Sox7 and Hoxb4. The diffusion map method therefore represents an attractive approach for ordering cells in developmental time, identifying patterns of expression for key regulators and bifurcation events not readily found with standard algorithms.

Synthesis of a network model for early blood development

The correspondence between the diffusion map and known developmental timelines suggested that the measured expression changes reflect developmental trajectories and might be exploited to define the regulatory networks that drive mesodermal cells toward a haematopoietic fate. Cell fate decisions have been modeled successfully using state space analysis of asynchronous Boolean regulatory network models26,27. In this approach, Boolean network dynamics are modeled by a series of asynchronous single-gene changes, and state space analysis reveals the final stable states of the model. We were interested in the inverse problem: if we think of the single-cell expression profiles as the state space of a Boolean network, can we identify the underlying gene regulatory logic? Although single-cell data have been used to refine static networks curated from the literature28, to our knowledge Boolean rules have not been derived directly from single-cell expression data without a priori knowledge of the structure of the network. To tackle this complex question of revealing the molecular changes underpinning cell state transitions, we developed the SCNS toolkit to synthesize Boolean networks based on single-gene transitions in our data.

We first discretized all 3,934 single-cell expression profiles to binary states and connected those states that differ in the expression of only one gene. The threshold for binary discretization was determined as described in Methods. This yielded a connected state-transition graph of 1,448 expression states, connected by single-gene transitions (Fig. 3a,b). The number of times each state occurs is indicated in Supplementary Fig. 8. The probability of seeing even one repeated state or neighbor in the whole theoretical state space is negligible, illustrating the non-random nature of the data. Most states that corresponded to the Runx1-GFP+ 4SG cells clustered together at one end of the state-transition graph, whereas states corresponding to cells from other time points were dispersed between two additional clusters. Likely developmental transitions were revealed, with specific genes consistently switching on or off along all routes linking the major clusters. We therefore considered this state-transition graph as a possible representation of developmental expression state changes based on single-gene switches, and next asked whether this could be used for regulatory network reconstruction. Notably, analysis of real and simulated populations of 20 cells showed that pools for the same stage clustered closely together, which masked variation and therefore would not have provided the number of transcriptional states required for network synthesis (Supplementary Fig. 9).
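The discretization and graph construction described above can be sketched as follows. This is a minimal illustration, not the SCNS implementation: the threshold value, the function names and the three-gene toy profiles are hypothetical, and the actual threshold is the one described in Methods.

```python
def binarize(profile, threshold):
    """Map one expression profile to a binary state (1 = expressed, 0 = not)."""
    return tuple(int(value > threshold) for value in profile)

def state_graph(profiles, threshold):
    """Collect the unique binary states and link states differing in one gene."""
    states = {binarize(profile, threshold) for profile in profiles}
    edges = set()
    for state in states:
        for gene in range(len(state)):
            # Flip one gene and check whether the resulting state was observed.
            neighbour = state[:gene] + (1 - state[gene],) + state[gene + 1:]
            if neighbour in states:
                edges.add(frozenset((state, neighbour)))
    return states, edges

# Made-up normalized values for three genes in four cells (-14 = not detected).
toy_profiles = [
    (-14.0, -14.0, -3.0),
    (-2.5, -14.0, -3.1),
    (-2.0, -4.0, -14.0),
    (-2.2, -3.5, -2.9),
]
states, edges = state_graph(toy_profiles, threshold=-10.0)
```

Edges are undirected at this point. Flipping each gene in turn and looking the result up in a set scales linearly with the number of states, which is convenient for a few thousand observed states.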

Figure 3. Regulatory network synthesis from single-cell expression profiles.

(a) Discretisation of 3,934 expression profiles for 33 TFs yields 3,070 unique binary states, 1,448 of which can be connected by single-gene changes to yield a state graph. (b) Representation of resulting state graph, coloured by first embryonic stage appearing in each state. Blue, PS; green, NP; orange, HF; red, 4SG; purple, 4SFG. Magnification of fate transition towards 4SG states, with for example Sox7 expression switching off along all routes. (c) Representation of synthesised asynchronous Boolean network models for core network of 20 TFs. Green edges indicate activation; red edges indicate repression. Square boxes represent AND operations. Circles connecting edges indicate multiple update rules.

The direction of movement between two states in the state transition graph is initially not defined. Our method assigns a direction to each connection based on overall movement from the early PS to the later 4SG states, and then finds Boolean update functions for each gene that are consistent with its expression changes across the entire transition graph. Unlike previous analyses of single-cell gene expression data, which have largely relied on statistical properties of the data viewed as a whole, our method can recover mechanistic logic and determine the direction of interactions. When the method was applied to our dataset, we obtained a core network of 20 TFs with endothelial and blood-associated gene modules centered on Sox7, Hoxb4 and Erg, and on Gata1 and Spi1 (PU.1), respectively. For some genes, there were multiple possible consistent update functions. For example, there are two solutions for Erg, both of which include activation by Hoxb4 and Sox17. In total there were 39 possible functions, an average of two per gene. This led to 46,656 possible models from the different combinations of the 39 update rules (Fig. 3c and Supplementary Table 2). Repeating the network synthesis with bootstrapping and a different discretization threshold demonstrated the robustness of our protocol (Supplementary Tables 3 and 4).

Network synthesis predicts direct regulation of Erg

We next asked whether links in our single-cell expression-derived network models can reveal direct regulatory interactions. To provide support for our model, we identified high-confidence gene regulatory regions in the gene loci of the 20 TFs in our network by interrogating a compendium of TF ChIP-seq data from haematopoietic cell types29, followed by identification of binding sites for the 20 TFs within these regions (Supplementary Fig. 10). Of the 39 Boolean rules, 27 (70%) are supported by the presence of evolutionarily highly conserved motifs for the upstream regulators in the target gene locus (Supplementary Table 2), with support for at least one Boolean rule for 16/20 TFs. This finding suggested that many of the regulatory interactions proposed in our model may be direct upstream regulator/downstream target gene relationships. To provide further validation, we focused on Erg, which our models predicted is activated by Sox17, or by Hoxb4 in combination with Lyl1 or Scl (Tal1). By analyzing a Hoxb4 ChIP-Seq dataset30, we showed that Hoxb4 can bind to the Erg+85kb enhancer (Supplementary Fig. 11a), which we previously showed to be active in blood stem and progenitor cells31,32. Moreover, comparative sequence analysis revealed that the Erg+85kb enhancer contains highly conserved Hox and Sox binding sites (Fig. 4a).

Figure 4. Network analysis predicts transcriptional interactions.

(a) Alignment of mammalian Erg+85 enhancer. Hox sites, red. Sox sites, light blue. (b) Percentage of Flk1+CD41−, Flk1+CD41+ and Flk1−CD41+ cells on days 3-7 of differentiation expressing YFP. Data are mean and s.e.m of triplicate differentiations of 2-3 clones per construct. P-values are reported in Supplementary Table 6.

To investigate regulation of Erg by Hox and Sox factors, we took advantage of a recently described embryonic stem cell-based reporter system in which single-copy enhancer transgenes linked to the Hsp68/Venus reporter are targeted to the Hprt locus33, allowing robust comparisons of wild type and mutant enhancer activity during in vitro differentiation. We tracked enhancer activity during embryoid body differentiation, where cells transit from a Flk1+CD41− mesoderm/haemangioblast state, through a Flk1+CD41+ intermediate, to a Flk1−CD41+ haematopoietic state33-36. Flow cytometric analysis revealed a dynamic pattern of YFP expression for the wild-type enhancer, peaking at days 4-5 and highest in the Flk1+CD41+ population (Fig. 4b, Supplementary Fig. 11b,c). Similar expression was seen in the Sox mutant, while mutation of the Hox motifs caused a reduction of YFP+ cells, and the combined Hox and Sox mutant reduced the proportion of YFP+ cells further still. We also saw similar patterns of expression in the other populations, which constitute a larger proportion of the EB cells but have a lower percentage of YFP+ cells (Fig. 4b and Supplementary Fig. 11b,c). This suggests that Hox and Sox factors activate and maintain Erg expression largely independently and additively. When abstracted to the Boolean level, this result is therefore more consistent with the OR logic in our network than with the alternative AND logic, where single mutations would result in an effect as strong as the combined mutant.
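The Boolean argument in the last sentence can be spelled out with a small truth table. The sketch below is purely illustrative: the function and construct names are hypothetical, and each motif class is reduced to a single intact-or-mutated flag.

```python
def enhancer_active(hox_sites_intact, sox_sites_intact, logic):
    """Predicted on/off enhancer activity under OR or AND input logic."""
    if logic == "OR":
        return hox_sites_intact or sox_sites_intact
    return hox_sites_intact and sox_sites_intact

# (Hox motifs intact, Sox motifs intact) for each reporter construct.
constructs = {
    "wild type": (True, True),
    "Hox mutant": (False, True),
    "Sox mutant": (True, False),
    "Hox+Sox mutant": (False, False),
}

predictions = {
    logic: {name: enhancer_active(hox, sox, logic)
            for name, (hox, sox) in constructs.items()}
    for logic in ("OR", "AND")
}
```

Under AND logic every single mutant is predicted to be as inactive as the combined mutant, whereas under OR logic activity is lost only when both motif classes are mutated. The graded reduction seen experimentally is closer to the second pattern.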

Model execution reveals key switches during development

Next, we assessed whether our network models faithfully recapitulate blood and cardiovascular development, in which endothelial and primitive blood cells emerge from a common mesodermal progenitor. To do this, we determined the stable states of the network model, which correspond to those expression patterns for the 20 TFs that satisfy all the Boolean network rules, and therefore can remain stable. We found that only eight stable states are reachable in total across all possible models, including “endothelial-like” (WT-S7) and “blood-like” expression states (WT-S2 to S6) (Fig. 5a). Of note, 432 models had both the endothelial-like state and at least one of the blood-like states (WT-S6) as stable states, thus capturing the functionality of bipotential Flk1+ precursors.
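A stable state is a state in which every gene already has the value its update rule assigns. The sketch below finds such states by exhaustive enumeration for a toy three-gene network with alternative rules per gene. The gene names `E`, `B` and `G` and all rules are hypothetical and are not taken from the synthesized network, and brute force of this kind is only practical for very small examples.

```python
from itertools import product

GENES = ("E", "B", "G")

# Alternative update rules per gene; one choice per gene defines one model.
RULES = {
    "E": [lambda s: not s["B"]],
    "B": [lambda s: not s["E"], lambda s: s["G"] and not s["E"]],
    "G": [lambda s: s["B"], lambda s: s["B"] or s["G"]],
}

def stable_states(model):
    """Return all states in which every rule reproduces the current value."""
    found = set()
    for values in product((False, True), repeat=len(GENES)):
        state = dict(zip(GENES, values))
        if all(bool(model[gene](state)) == state[gene] for gene in GENES):
            found.add(tuple(int(v) for v in values))
    return found

# Every combination of alternative rules gives one candidate model.
models = [dict(zip(GENES, choice))
          for choice in product(*(RULES[gene] for gene in GENES))]
stable_per_model = [stable_states(model) for model in models]
all_stable = set().union(*stable_per_model)
```

Here the 1 × 2 × 2 rule choices give four models, and the union of their stable states contains an `E`-only state alongside a `B`-and-`G` state, a toy analogue of models that hold both endothelial-like and blood-like states.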

Figure 5. In silico perturbations predict key regulators of blood development.

(a) Network stable states for wild-type and Sox7 overexpression. Red indicates expressed; blue indicates not expressed. (b) Colony assays with or without doxycycline from genotyped E8.25 embryos from iSox7+rtTA+ mice crossed with wild types. (c) Quantification of primitive erythroid colonies after 4 days (mean and s.e.m for the number of embryos indicated). P-value was determined using Student’s t-test.

Finally, we explored the consequences of in silico perturbation. Overexpression and knockout experiments were simulated for each TF and the ability of the network to reach wild type or new stable states was assessed (Supplementary Table 5). For a number of factors, stable states 6 or 7 were no longer reachable. Among these, enforced expression of Sox7, a factor normally down-regulated when cells transit towards the 4SG state (Fig. 3b), resulted in the stabilization of the endothelial module and an inability to reach any of the blood-like states (Fig. 5a). Only two stable states were possible, among the lowest for all factors, and furthermore, Sox7 is predicted to regulate more targets than any other TF, suggesting that perturbing its expression could have significant downstream consequences (Supplementary Table 5). To validate this prediction, we crossed the previously reported iSox7+rtTA+ male mice37 with wild type females, collected embryos at E8.25 and performed colony forming assays (Fig. 5b). Embryos carrying both transgenes showed a significant reduction of primitive erythroid colony formation and simultaneous appearance of undifferentiated haemangioblast-like colonies following doxycycline-induced Sox7 expression compared to controls (Fig. 5c and data not shown). This suggests, in agreement with modeling data and gene expression patterns, that down-regulation of Sox7 is important for the specification of primitive erythroid cells.
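In silico overexpression and knockout can be sketched by clamping one gene to 1 or 0 and asking which stable states remain reachable under asynchronous updates, where one gene changes at a time. The network below is a hypothetical three-gene toy (a mutually repressing pair `E` and `B` plus a downstream marker `G`), not the synthesized 20-TF model, and the function names are assumptions.

```python
from collections import deque

GENES = ("E", "B", "G")
RULES = {
    "E": lambda s: not s["B"],
    "B": lambda s: not s["E"],
    "G": lambda s: s["B"],
}

def successors(state, clamp):
    """States reachable by updating exactly one unclamped gene."""
    named = dict(zip(GENES, state))
    result = []
    for i, gene in enumerate(GENES):
        if gene in clamp:
            continue  # perturbed genes keep their enforced value
        target = int(bool(RULES[gene](named)))
        if target != state[i]:
            result.append(state[:i] + (target,) + state[i + 1:])
    return result

def reachable_stable_states(start, clamp=None):
    """Breadth-first search over asynchronous updates from `start`."""
    clamp = clamp or {}
    start = tuple(clamp.get(gene, value) for gene, value in zip(GENES, start))
    seen, queue, stable = {start}, deque([start]), set()
    while queue:
        state = queue.popleft()
        following = successors(state, clamp)
        if not following:
            stable.add(state)  # no gene wants to change: a stable state
        for nxt in following:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return stable

wild_type = reachable_stable_states((0, 0, 0))
overexpress_e = reachable_stable_states((0, 0, 0), clamp={"E": 1})
knockout_e = reachable_stable_states((0, 0, 0), clamp={"E": 0})
```

In this toy, the unperturbed network reaches both the `E`-only and the `B`-and-`G` stable states, while enforced `E` expression leaves only the `E`-only state reachable. This mirrors, in miniature, the loss of blood-like states predicted for enforced Sox7 expression.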

Discussion

Determining the structure and function of transcriptional regulatory networks is crucial to advancing our understanding of developmental and disease processes and is therefore a key aim of stem and developmental biology. However, studies to date have mainly used population-based data for network construction, or have focused on statistical properties of populations of single cells for network inference.

Bayesian network methods provide a very computationally efficient approach to inferring causal relationships among a set of variables, and have previously been applied to infer cellular signaling networks from single-cell data38. However, these approaches infer a directed acyclic graph where there is no feedback between nodes, a limitation not shared by our approach. In addition, the inference of edges is reliant on network interventions in which many different cell populations are generated by experimentally perturbing genes, and the differences between these populations are used to infer causality. Generating such intervention data is very time consuming and cannot be done when studying wild-type in vivo development. Instead, researchers typically look at the pairwise correlation of genes across single-cell measurements7,39. For example, partial correlation analysis measures the degree of association between two genes while controlling for potential effects of all other genes40. We performed this analysis (Supplementary Fig. 12), and found agreement with many of the edges in our synthesized network; however, this analysis failed to predict the Sox/Hox regulation of Erg which we validated experimentally. Moreover, connections do not specify which gene is the upstream regulator and which is the downstream target, and therefore do not reveal mechanistic logic.

To our knowledge no previous study has analyzed the development of an entire mammalian organ at single-cell resolution. We demonstrate that single-cell expression profiling coupled with computational approaches for network synthesis can reveal molecular control mechanisms of mammalian organogenesis. Analysis of 46 genes in blood precursors across 1.25 days of post-implantation mouse development showed that cellular maturation may be asynchronous, with individual cells maturing at different speeds and a large proportion expressing both Flk1 and Runx1, indicating that they are committing to haemogenic endothelial development. The graded changes in expression for key regulators of other mesodermal fates seen in the cell pools analyzed by RNA-Seq are also consistent with cells expressing the gene emerging over the time-course analyzed, although alternative explanations such as changes in the level of expression cannot be excluded. Furthermore, our diffusion map methodology highlighted the hierarchical nature of organ development, with waves of TF and marker expression and a bifurcation at the 4 somite stage. The presence of embryonic globin and the erythroid TF Gata1 in one branch and endothelial markers such as Pecam1 and Cdh5 in the other suggests that this bifurcation represents the separation of blood and endothelial fates14,16. Trapnell et al.41 recently reported an exciting method related to our diffusion map approach for the analysis of single-cell RNA-seq time-course data, where construction of a minimum spanning tree ordered differentiating cells in developmental pseudotime. Although the authors suggested that this methodology could be used to map regulatory networks, such results were not included in their paper. Moreover, cells were sampled from in vitro differentiating cells rather than directly from embryos.

Here we achieved reconstruction of regulatory network models by deriving expression state graphs from high-throughput single-cell gene expression profiling data and using the expression state graphs to determine gene regulatory rules. Firstly, gene expression profiles are discretized to binary expression states, where 1 represents a gene that is expressed and 0 represents a lack of measurable expression. Then, pairs of states are connected if they differ in the expression state of exactly one gene, resulting in a state graph. Finally, Boolean rules are found for each gene which allow a walk from early states to late states via a series of single-gene transitions. The result is a set of Boolean rules matching the experimental data that can be combined into a network model. This method is provided as the SCNS toolkit. It requires no prior knowledge of regulatory interactions but instead derives logic directly from the gene expression data.
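The final step, finding Boolean rules that agree with directed single-gene transitions, can be illustrated with a deliberately simplified brute-force search. This is not the SCNS algorithm: the candidate rule family (one regulator, its negation, or an AND/OR of two regulators), the stricter consistency criterion (the rule must predict the target gene's next value on every observed transition) and the three-gene toy data are all assumptions made for illustration.

```python
from itertools import combinations

GENES = ("A", "B", "C")

# Directed toy transitions (before, after), each differing in exactly one gene.
TRANSITIONS = [
    ((0, 0, 0), (1, 0, 0)),
    ((1, 0, 0), (1, 1, 0)),
    ((1, 1, 0), (1, 1, 1)),
]

def candidate_rules(target):
    """Yield (name, function) pairs from a small hypothetical rule family."""
    others = [gene for gene in GENES if gene != target]
    for g in others:
        yield g, lambda s, g=g: s[g]
        yield f"NOT {g}", lambda s, g=g: 1 - s[g]
    for g, h in combinations(others, 2):
        yield f"{g} AND {h}", lambda s, g=g, h=h: s[g] & s[h]
        yield f"{g} OR {h}", lambda s, g=g, h=h: s[g] | s[h]

def consistent_rules(target):
    """Names of candidate rules that predict the target on every transition."""
    index = GENES.index(target)
    names = []
    for name, rule in candidate_rules(target):
        if all(rule(dict(zip(GENES, before))) == after[index]
               for before, after in TRANSITIONS):
            names.append(name)
    return names

rules_for_c = consistent_rules("C")
```

For gene `C` the search returns two rules, `B` and `A AND B`, a small example of how several update functions can be consistent with the same data.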

We followed this method of network synthesis with steady state and in silico perturbation analyses that identified blood and endothelial-like expression patterns and implicated Sox7 in the regulation of erythroid fate, which we subsequently validated using transgenic mouse assays. Network synthesis also identified several previously known TF interactions, including close linkage of Etv2, Fli1 and Tal1, where the latter two are known to function downstream of Etv2 in the haemangioblast42,43. To test whether our network model reveals additional direct interactions, we focused on Erg, an essential TF for definitive haematopoiesis and adult HSC function44,45. Our network predicted that Erg expression can be activated either by Sox17 or Hoxb4. The Erg+85 enhancer was previously shown to be controlled by Ets and Gata factors and to be active during haematopoietic development32 and in HSCs31. However, neither Hox nor Sox TFs had been implicated in Erg+85 activity.

Sox7 and Sox17 belong to the SoxF family of TFs which have recently been shown to confer arterial identity in combination with RBPJ/Notch46. Arterial identity is linked with the blood-forming potential of haemogenic endothelial cells in the embryo. Moreover, Hoxb4 expression is also known to enhance blood potential47, yet there is very little knowledge about how SoxF factors or Hoxb4 integrate into the wider network regulating blood development. Our integrated approach of single-cell expression profiling coupled with network synthesis and subsequent experimental validation identifies Erg as a downstream target of Sox and Hox factors during early blood specification. Coupled with our observations here that down-regulation of Sox7 is a key event in the development of primitive erythroid cells, our study demonstrates how network modeling from single cells can help to reveal the transcriptional hierarchies that control mammalian development. Rapid technological advances in our ability to perform single-cell profiling48,49 suggest that this approach will be widely applicable to other organ systems, and may also inform the development of improved cellular programming strategies.

Methods

Timed matings and embryo collection

Timed matings were set up between homozygous Runx1 reporter male and female mice using the Runx1-ires-GFP knock-in mouse previously described13. Animals also contained a Gata1-mCherry reporter transgene not utilized in this study. All animal experiments were carried out in accordance with the RIKEN guidelines for animal and recombinant DNA experiments. Embryos were staged according to morphologic criteria50. Suspensions of embryo cells were prepared as described previously12 and single-cell suspensions were stained with Flk-1-APC (AVAS12 at 1:100 dilution; BD Bioscience). Cells were sorted using a FACS Aria II (BD Bioscience) with a 100 μm nozzle. 4SG cells were not sorted for Flk1 as its expression begins to be down-regulated by this time. 4SFG cells were specifically Runx1-GFP negative at the protein level in order to exclude committed blood cells of the 4SG population, but may express Runx1 at the mRNA level.

Single-cell qRT-PCR

Single-cell qRT-PCR was carried out using the Fluidigm BioMark platform as described7, with a limit of detection (LOD) of Ct 25. The LOD was determined according to Ståhlberg et al51 and the manufacturer’s instructions. Briefly, standard curves were run on the BioMark with six repeats of each dilution. For each gene, the LOD was the average Ct value for the last dilution at which all six replicates had positive amplification. The overall LOD for the gene set was the median Ct value across all genes. TaqMan assays (Life Technologies) used are listed in Supplementary Table 1. Raw Ct values and normalized data can be found in Supplementary Table 7. Gene expression was subtracted from the limit of detection and normalized on a cell-wise basis to the mean expression of the four housekeeping genes (Eif2b1, Mrpl19, Polr2a and Ubc) in each cell. Cells that did not express all four housekeeping genes were excluded from subsequent analysis, as were cells for which the mean of the four housekeepers was more than 3 SD above or below the mean of all cells. A dCt value of −14 was then assigned where a gene was not detected. 85-90% of sorted cells were retained for further analysis. Gata2 did not amplify correctly and HoxB3 was not expressed in any cells, so these factors have been excluded from the analysis. Hierarchical clustering was performed in R (www.r-project.org) using the hclust function and heatmap.2 from the gplots package, with Spearman rank correlation and complete linkage.
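The cell-wise normalization described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function name `normalize_cell` and the dictionary input format are assumptions, while the LOD of 25, the four housekeeping genes and the value of −14 for undetected genes come from the text. The 3 SD filter on housekeeper means operates across cells and is omitted here.

```python
from statistics import mean

LOD = 25.0            # limit of detection (Ct), as stated in the text
NOT_DETECTED = -14.0  # dCt assigned where a gene was not detected
HOUSEKEEPERS = ("Eif2b1", "Mrpl19", "Polr2a", "Ubc")


def normalize_cell(ct_values):
    """Return {gene: dCt} for one cell, or None if the cell is excluded.

    ct_values maps gene name -> raw Ct. A Ct of None, or a Ct at or above
    the LOD, is treated as "not detected" (an assumption of this sketch).
    """
    def detected(ct):
        return ct is not None and ct < LOD

    # Exclude cells that do not express all four housekeeping genes.
    if not all(detected(ct_values.get(g)) for g in HOUSEKEEPERS):
        return None

    # Expression is measured as (LOD - Ct); normalize to the housekeeper mean.
    hk_mean = mean(LOD - ct_values[g] for g in HOUSEKEEPERS)
    dct = {}
    for gene, ct in ct_values.items():
        if gene in HOUSEKEEPERS:
            continue
        dct[gene] = (LOD - ct) - hk_mean if detected(ct) else NOT_DETECTED
    return dct
```

For example, a cell in which all four housekeepers have Ct 15 and Runx1 has Ct 20 yields a Runx1 dCt of (25 − 20) − 10 = −5.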

RNA sequencing

Cells were sorted into 2 μl of lysis buffer (0.2 % (v/v) Triton X-100 and 2 U/μl RNase inhibitor (Clontech)) and stored at −80 °C. RNA-seq was carried out using the Smart-seq2 protocol according to Picelli et al52 and sequenced on an Illumina HiSeq 2500.

The reads for five biological replicates for each subtype were mapped using the RNA-Seq aligner STAR version 2.3.0 (ref. 53). Parameters used to align with STAR were “--outFilterMultimapScoreRange 1 --outSAMstrandField intronMotif --genomeLoad NoSharedMemory --outStd SAM”. Mus musculus Ensembl assembly GRCm38 (equivalent to UCSC mm10) was used to build the STAR index file, along with the GTF file (version GRCm38.74) from Ensembl. Samtools version 0.1.18 was used to sort the STAR SAM output file and convert it to BAM format, which was then used as input for the htseq-count program. Counts were determined using htseq-count from HTSeq version 0.6.1 (http://wwwhuber.embl.de/users/anders/HTSeq/doc/overview.html) with the parameter “--stranded=no”. FPKM values were calculated using in-house scripts and are provided in Supplementary Table 8.
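The in-house FPKM scripts are not described further. As a hedged illustration only, the standard FPKM definition (fragments per kilobase of transcript per million mapped fragments) could be computed from per-gene counts as sketched here. The function name `fpkm`, the use of the sum of counted fragments as the library size, and the gene-length table are all assumptions of this sketch.

```python
def fpkm(counts, lengths_bp):
    """Compute FPKM per gene from raw counts.

    counts:     {gene: number of fragments assigned to the gene}
    lengths_bp: {gene: gene (or transcript) length in base pairs}

    Assumption: the library size is the total number of counted fragments.
    """
    total = sum(counts.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    # count / (length in kb) / (total in millions) = count * 1e9 / (length * total)
    return {
        gene: counts[gene] * 1e9 / (lengths_bp[gene] * total)
        for gene in counts
    }
```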

Diffusion plots

Our visualization approach is based on the diffusion map formalism9. In brief, affinities between all cells based on their expression levels are calculated using a diffusion metric. Next, the cells are organised in 2D or 3D such that the Euclidean distance between the cells corresponds to the diffusion metric. We determined the cell-cell affinities using an isotropic Gaussian kernel

$$P(i,j) = \frac{1}{Z_i} \exp\left(-(x_i - x_j)^2\right)$$

with $x_i$ and $x_j$ being the gene-expression vectors for cells i and j and $Z_i$ being a normalization constant such that $\sum_{i=1}^{N} P(i,j) = 1$.

The quantity Pt (i,j) can then be interpreted as the transition probability of a diffusion process between cells. Consequently, it is particularly well suited for representing the gradual change in the transcriptional landscape related to developmental trajectories. In contrast, other methods for dimensionality reduction and visualization of high-dimensional data, such as t-SNE54,55, encourage a representation of the data as disjoint clusters, which is less meaningful for modeling continuous developmental trajectories.

In order to account for the non-uniform density ρ of cells in the gene-expression space (i.e. the potential presence of rare populations), we re-normalize the affinities P(i,j) between two cells i and j based on the local density ρ(i) and ρ(j) to $\tilde{P}(i,j)$. Furthermore, we encourage a better representation of local behavior by only using a subset of neighboring cells (20% nearest neighbors) for computing the affinities and by setting the diagonal of the affinity matrix $\tilde{P}$ to zero. We then calculate the eigenvectors of $\tilde{P}$ and retain the eigenvectors with the greatest eigenvalues, which we use for visualization.
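A minimal pure-Python sketch of the affinity computation is given here for orientation; it is not the implementation used for the figures. Several details are assumptions: the kernel width `scale`, the estimate of local density as the row sum of the kernel, the renormalization by the product of the two densities, and the final row normalization. The leading eigenvectors of the resulting matrix would then be obtained with any standard linear algebra routine.

```python
import math


def diffusion_transition_matrix(cells, scale=1.0, neighbor_fraction=0.2):
    """Build a density-normalized, row-stochastic affinity matrix.

    cells: list of equal-length expression vectors (one per cell).
    scale: hypothetical kernel width; affinities are exp(-d^2 / scale).
    neighbor_fraction: fraction of nearest neighbors kept per cell.
    """
    n = len(cells)
    d2 = [[sum((a - b) ** 2 for a, b in zip(cells[i], cells[j]))
           for j in range(n)] for i in range(n)]

    # Gaussian kernel restricted to each cell's nearest neighbors.
    # The diagonal stays zero because a cell is never its own neighbor.
    k = max(1, int(round(neighbor_fraction * (n - 1))))
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: d2[i][j])
        for j in others[:k]:
            w = math.exp(-d2[i][j] / scale)
            kernel[i][j] = w
            kernel[j][i] = w  # keep the kernel symmetric

    # Local density estimate (assumption: row sum of the kernel).
    density = [sum(row) for row in kernel]

    # Re-normalize affinities by the densities of both cells.
    tilde = [[kernel[i][j] / (density[i] * density[j]) if kernel[i][j] > 0 else 0.0
              for j in range(n)] for i in range(n)]

    # Row-normalize so that each row can be read as transition probabilities.
    result = []
    for row in tilde:
        z = sum(row)
        result.append([v / z for v in row] if z > 0 else row)
    return result
```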

Network analysis

In computer science, synthesis is a general term for the counterpart of verification. In verification, a hand-built model is given, along with a specification of how it ought to behave. Then the model is checked to ensure it satisfies the specification. In synthesis, a specification is given and a model is automatically generated which satisfies this specification. In biology, the specification is the experimental data that the model should reproduce. In our case, it is the state-transition graph, which was derived from the single-cell gene expression data. Synthesis has recently been applied in the context of biology56. In that work, state machine-like models were synthesized which were consistent with known experimental mutation results, given in a genotype-phenotype table. Both the data and the type of model considered were different to those dealt with in the current work, which called for a different approach.

To synthesize a Boolean network model, we would like to orient the transitions in the state graph (previously every pair of states that differ in the expression of exactly one gene were connected by an undirected edge) such that a given set of final states will be reachable from a given set of initial states. We will allow edges to be directed in one direction, both directions, or in neither. We would then like to extract the Boolean update functions that give rise to these directed transitions. We try to get the best possible network by maximizing the number of states in which no transitions induced by the update functions are missing (condition 2, below). We can state our synthesis problem formally as follows. We are given a set of variables V, corresponding to genes, and an undirected graph G = (N, E) where each node n ∈ N is labeled with a state s: V → {0, 1}, and each edge {s1, s2} ∈ E is labeled with the single variable that changes between state s1 and s2. Note that by s we denote both a state and the node labeled by that state, unambiguously. We are also given a designated set I ⊆ N of initial vertices and a designated set F ⊆ N of final vertices, along with a threshold ti for each variable vi ∈ V.
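To make the input concrete, the undirected state graph can be built from binarized expression profiles as in the following sketch. The function name `build_state_graph`, the tuple encoding of states and the example gene ordering are illustrative assumptions and do not reflect the F# implementation.

```python
from itertools import combinations


def build_state_graph(states, genes):
    """Connect every pair of distinct states that differ in exactly one gene.

    states: iterable of tuples of 0/1 values, one tuple per cell.
    genes:  tuple of gene names giving the meaning of each position.

    Returns (nodes, edges), where nodes is the sorted list of unique states
    and edges maps an unordered pair (s1, s2) to the gene that differs.
    """
    nodes = sorted(set(states))  # cells with identical states collapse to one node
    edges = {}
    for s1, s2 in combinations(nodes, 2):
        differing = [i for i, (a, b) in enumerate(zip(s1, s2)) if a != b]
        if len(differing) == 1:
            edges[(s1, s2)] = genes[differing[0]]
    return nodes, edges
```

The pairwise comparison is quadratic in the number of unique states, which is acceptable for graphs of a few thousand nodes.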

Our synthesis method searches for an orientation of G, along with an update function ui: {0, 1}^n → {0, 1} for each variable vi ∈ V, such that the following conditions hold:

  1. For each edge (s1, s2) labeled with variable vi in the orientated graph, the update function for vi takes state s1 to state s2: ui(s1) = s2(i).

  2. For every variable vi ∈ V, let Ni be the set of states without a vi-labeled edge. For every i the number of states s ∈ Ni such that ui(s) = s(i) is greater than or equal to ti.

  3. Every final vertex f ∈ F is reachable from some initial vertex i ∈ I by a directed path in the orientated graph.

We restrict the update function ui to have the form:

f1 ∧ ¬f2,

where fj is a Boolean formula that has and-nodes of in-degree two and or-nodes of arbitrary in-degree, and where f1 has a maximum depth of Ni and f2 has a maximum depth of Mi. Ni and Mi are given as parameters to the method.
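The restricted form of the update function, an activating formula combined with the negation of a repressing formula, can be illustrated with a small evaluator. This is a sketch under stated assumptions: formulas are encoded as nested tuples, the function names are hypothetical, and the depth limits are not enforced. The example rule for Erg (activation by Sox17 or Hoxb4) follows the prediction reported in the Discussion; "RepressorX" is a made-up placeholder.

```python
def evaluate(formula, state):
    """Evaluate a formula given as a gene name or ("and"|"or", sub1, sub2, ...)."""
    if isinstance(formula, str):
        return bool(state[formula])
    op, *args = formula
    values = [evaluate(arg, state) for arg in args]
    if op == "and":
        return all(values)
    if op == "or":
        return any(values)
    raise ValueError("unknown operator: %r" % (op,))


def update(activator, repressor, state):
    """Apply the rule 'activator AND NOT repressor'; repressor may be None."""
    active = evaluate(activator, state)
    repressed = evaluate(repressor, state) if repressor is not None else False
    return int(active and not repressed)


def satisfies_condition_1(gene, activator, repressor, directed_edges):
    """Check that the rule maps s1 to s2 for every directed edge labeled with gene."""
    return all(update(activator, repressor, s1) == s2[gene]
               for s1, s2 in directed_edges)


# Example: Erg switched on when Sox17 is expressed.
erg_activator = ("or", "Sox17", "Hoxb4")
before = {"Sox17": 1, "Hoxb4": 0, "Erg": 0, "RepressorX": 0}
after = dict(before, Erg=1)
print(satisfies_condition_1("Erg", erg_activator, None, [(before, after)]))
```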

The search for edge orientations and associated Boolean update rules is encoded as a Boolean satisfiability (SAT) problem. The update functions of each variable can be sought after separately, giving rise to reasonably sized satisfiability queries. We then combine compatible single gene update functions by restricting our attention to combinations that permit paths from initial to final nodes. Paths between initial and final nodes in oriented graphs are found using a breadth-first search for the shortest path between two nodes. We restrict our search to these shortest paths both for efficiency reasons, and to eliminate routes that seem “unbiological”, for example routes that cross a fate transition and then return to where they began. We exhaustively search for all satisfying solutions. The method is implemented in the F# programming language, and uses the Z3 solver to handle SAT queries.
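The reachability check on an oriented graph uses a breadth-first search for the shortest path, which can be sketched as follows. The adjacency-dictionary representation and the function name `shortest_path` are assumptions of this illustration; the SAT encoding itself is not reproduced here.

```python
from collections import deque


def shortest_path(successors, start, goal):
    """Return the shortest directed path from start to goal, or None.

    successors maps each node to an iterable of nodes reachable by one
    directed edge in the oriented state graph.
    """
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through the parent links to recover the path.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in successors.get(node, ()):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None
```

Because breadth-first search visits nodes in order of distance from the start, the first time the goal is dequeued the recovered path has the minimum number of transitions.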

After assessing the method’s capability to reconstruct literature-derived asynchronous Boolean networks from their own state spaces (Supplementary Note), we applied it to our biological data. From the resulting synthesized Boolean network models, we obtained a core network of 20 factors.

For our initial states, we took the set of the PS states in the earliest state cluster in the state transition graph. As the final states, we took a core of the 4SG states in the latest cluster. These states are listed in Supplementary Table 9.

Note that due to the intermixing between populations, there is no guarantee that a state measured on day 7.75 is further ahead in development than a state measured on day 7.5 (for example). We therefore only constrain reachability from start states to end states, and do not require that experimental measurement time is respected. To obtain the thresholds, ti, and the maximum sizes of the activating and repressing portions of update functions N and M, we performed an optimization step for each gene independently, where the size of allowed update functions was steadily increased until Ni reached a maximum. ti was then set to 0.66 Ni in order to allow the method room to find Boolean update rules that permit a path from initial to final states. To obtain the stable states of Boolean network models, the algorithm from Garg et al.57 was applied. Binary states can be found in Supplementary Table 10 and cells with equal cell states are listed in Supplementary Table 11.
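For intuition about what a stable state is, the fixed points of a small Boolean network can be enumerated by brute force, as in the sketch here. This is explicitly not the algorithm of Garg et al., which scales to much larger networks; exhaustive enumeration is only feasible for a handful of genes. The toy two-gene network with mutual repression is hypothetical.

```python
from itertools import product


def stable_states(rules):
    """Enumerate states in which every gene's update rule returns its current value.

    rules maps gene name -> function taking a state dict and returning 0/1.
    """
    genes = sorted(rules)
    fixed_points = []
    for values in product((0, 1), repeat=len(genes)):
        state = dict(zip(genes, values))
        if all(int(bool(rules[g](state))) == state[g] for g in genes):
            fixed_points.append(state)
    return fixed_points


# Hypothetical toy network: two genes that repress each other.
toy_rules = {
    "A": lambda s: not s["B"],
    "B": lambda s: not s["A"],
}
print(stable_states(toy_rules))
```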

Synthesis bootstrapping

To assess the robustness of the predictions of network synthesis, we performed bootstrapping. A random sample of 75% of the 3,934 gene expression profiles was retained, and a new state transition graph was built from this reduced data set. This state transition graph was then used as the basis to synthesize new Boolean rules, using the same parameters as the original analysis. The results of repeating this process five times are shown in Supplementary Tables 3a-e. Bold entries indicate a rule is identical to a rule synthesized from the original data set. Underlined entries indicate that a rule is contained within a larger rule from the original synthesis. We see that in most cases the original rule or a closely related, underlined, rule is synthesized. In general, the number of possible solutions for a gene’s update function grows as the amount of data used is decreased, and including the full data set narrows these possibilities.
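The resampling step of this procedure can be sketched as follows. The sketch assumes sampling without replacement (the text states only that 75% of profiles were retained) and uses a hypothetical function name and seed; rebuilding the state transition graph and re-running synthesis on each replicate are not shown.

```python
import random


def bootstrap_state_sets(profiles, replicates=5, fraction=0.75, seed=0):
    """Draw random subsets of binarized profiles and return their unique states.

    profiles: list of tuples of 0/1 values, one per cell.
    Returns a list with one set of unique states per replicate.
    """
    profiles = list(profiles)
    rng = random.Random(seed)  # fixed seed so that replicates are reproducible
    k = int(round(fraction * len(profiles)))
    return [set(rng.sample(profiles, k)) for _ in range(replicates)]
```

Each returned set would serve as the node set of a new state transition graph for that replicate.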

Assessing sensitivity of synthesized rules to binary discretization threshold

In order to construct a state transition graph and apply our synthesis method, experimental data must first be discretized to binary values that indicate whether a gene is expressed or not expressed. The details of how we determine this threshold are covered in the section entitled “Single-cell qRT-PCR”.

To assess sensitivity of results to the choice of threshold, we repeated our analysis with a more stringent cut off, increasing it by 2 cycles. This resulted in a state transition graph of 1249 nodes (199 fewer nodes than the original state transition graph) which was then used as the basis to synthesize new Boolean rules, using the same parameters as the original analysis. The results are shown in Supplementary Table 4. Bold entries indicate a rule is identical to a rule synthesized from the original data set. Underlined entries indicate that a rule is contained within a larger rule from the original synthesis. We see that in most cases the original rule or a closely related, underlined, rule is synthesized. In general, the number of possible solutions for a gene’s update function grows as the amount of data used is decreased, and including the full data set narrows these possibilities.
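The effect of shifting the discretization threshold on the number of distinct states can be explored with a sketch like the one here. It assumes that expression is given as normalized dCt values (higher means more expression) and that a gene is called expressed when its value exceeds the threshold; the baseline threshold and the toy profiles are hypothetical placeholders, not values from the study.

```python
def binarize(profile, threshold):
    """Convert one expression profile to a tuple of 0/1 calls."""
    return tuple(int(value > threshold) for value in profile)


def count_states(profiles, threshold):
    """Count distinct binary states obtained at a given threshold."""
    return len({binarize(p, threshold) for p in profiles})


# Hypothetical dCt profiles for four cells and two genes.
toy_profiles = [(-14.0, -5.0), (-12.5, -5.0), (-11.0, -3.0), (-14.0, -14.0)]
baseline = -13.0          # placeholder baseline threshold
stringent = baseline + 2  # cut-off raised by 2 cycles
print(count_states(toy_profiles, baseline), count_states(toy_profiles, stringent))
```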

Erg+85 enhancer reporter cassette generation

Hprt locus-targeting enhancer reporter cassettes containing the Erg+85 (wild-type or mutated) element upstream of a Venus YFP fluorescent reporter gene driven by the Hsp68 minimal promoter (Erg+85/Hsp/Venus) were generated by Gateway cloning as previously described33, and verified by sequencing. The coordinates of the cloned region in the mouse mm10 genome build are chr16:95439106-95439643. The wild-type and mutated Erg+85 elements were initially PCR amplified from synthetic Gene Art Strings (Life Technologies) using primers with attB sequences (underlined) upstream of enhancer-specific sequence (Erg+85attb1F GGGGACAAGTTTGTACAAAAAAGCAGGCTGCCTAAGGGCCGAGGTTG, Erg+85attb1R GGGGACCACTTTGTACAAGAAAGCTGGGTGCATGAAATCACCTTGGAAATTTGTC; see Fig. 4a for sequences of mutated motifs).

Embryonic stem (ES) cell gene targeting, differentiation and analysis

Erg+85/Hsp/Venus cassettes were targeted to the Hprt locus in Hprt-deficient mouse HM-1 ES cells58 to generate clonal lines that were differentiated into blood by embryoid body formation and analyzed at the stated time points by flow cytometry for Flk1 (as above) and CD41-PECy7 (eBioMWReg30, 1:500, Biolegend), all as previously described33. Data are the combined average of three biological replicates from two ES cell clones. Two HM-1 lines carrying an enhancer-less Hsp/Venus cassette were used as a control, as described previously33.

Sox7 induction and colony assays

Timed matings were set up between transgenic male iSox7+rtTA+ and wild-type ICR female mice37. The morning of vaginal plug detection was considered embryonic day (E) 0.5. All animal work was performed under regulations governed by the Home Office Legislation under the Animal Scientific Procedures Act (ASPA) 1986. Cells from E8.25 embryos were tested in a clonogenic replating assay for haematopoietic progenitors with or without 1μg/ml doxycycline as previously described37. For each embryo 1/10 of the cells was used for genotyping and the remaining cells were equally divided into −dox and +dox conditions. Primitive erythroid colonies were counted after 4 days in culture. Primers used for genotyping were rtTA-F ACAAGGTTTTTCACTAGAGAACGCG, rtTA-R AGATCGAAATCGTCTAGCGCGTCG, iSox7-F CTAGATCTCGAAGGATCTGGAG, iSox7-R ATACTTTCTCGGCAGGAGCA.

Availability of computational resources and data

We provide our SCNS toolkit and associated data at http://scns.stemcells.cam.ac.uk as well as online accompanying this paper. This includes the full code for the synthesis method, along with scripts for:

  1. Constructing a state transition graph from single-cell gene expression data.

  2. Automating the process of finding stable states and performing all single-gene in-silico perturbations of synthesised Boolean networks. This second script also categorises perturbations in terms of alterations to the stable states that the model is able to reach. Both a failure to reach states normally reachable for the wild-type model, as well as stabilisation at novel “unnatural” states can be important, with the former mimicking for example the failure of a cell to develop down a given lineage, while the latter could be used to gain mechanistic understanding of pathological cellular states (such as in cancer cells). For example, to look for factors involved in blood differentiation, we collected all perturbations which retained the desired “endothelial-like” state, removed the undesired “blood-like” state, and then ranked these perturbations by the number of additional, undesired states that were introduced.
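The perturbation analysis performed by the second script can be illustrated with the following sketch, which is not the toolkit's code. It assumes that a perturbation clamps one gene to 0 (knockout) or 1 (overexpression), finds stable states by brute-force enumeration, and reports stable states that are lost or newly gained relative to the wild-type model. The function names and the toy network are hypothetical.

```python
from itertools import product


def stable_states(rules, clamped=None):
    """Return the set of stable states (as tuples in sorted gene order).

    clamped maps gene -> fixed value; clamped genes ignore their update rule.
    """
    clamped = clamped or {}
    genes = sorted(rules)
    free = [g for g in genes if g not in clamped]
    found = set()
    for values in product((0, 1), repeat=len(free)):
        state = dict(clamped)
        state.update(zip(free, values))
        if all(int(bool(rules[g](state))) == state[g] for g in free):
            found.add(tuple(state[g] for g in genes))
    return found


def perturbation_report(rules):
    """Compare stable states of every single-gene perturbation with wild type."""
    wild_type = stable_states(rules)
    report = {}
    for gene in sorted(rules):
        for value, label in ((0, "knockout"), (1, "overexpression")):
            perturbed = stable_states(rules, {gene: value})
            report[(gene, label)] = {
                "lost": wild_type - perturbed,   # states no longer reachable as stable
                "novel": perturbed - wild_type,  # new "unnatural" stable states
            }
    return report


# Hypothetical toy network: two genes that repress each other.
toy_rules = {"A": lambda s: not s["B"], "B": lambda s: not s["A"]}
print(perturbation_report(toy_rules)[("A", "knockout")])
```

Ranking perturbations, for example by the number of novel states they introduce, then reduces to sorting the report entries by the size of their "novel" sets.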

RNA-Seq data have been deposited into the NCBI Gene Expression Omnibus portal under the accession number GSE61470 (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE61470).

Supplementary Material

1
dataset10
dataset11
dataset5
dataset7
dataset8
dataset9

Acknowledgements

We thank James Downing (St Jude Children’s Research Hospital, Memphis, TN) for the Runx1-ires-GFP mouse. Research in the authors’ laboratory is supported by the Medical Research Council, Biotechnology and Biological Sciences Research Council, Leukaemia and Lymphoma Research, The Leukemia and Lymphoma Society, Microsoft Research and core support grants by the Wellcome Trust to the Cambridge Institute for Medical Research and Wellcome Trust - MRC Cambridge Stem Cell Institute. V.M. is supported by a Medical Research Council Studentship and Centenary Award and S.W. by a Microsoft Research PhD Scholarship.

Footnotes

Competing financial interests statement

The authors declare no competing financial interests.

References

  • 1. Shalaby F, et al. A requirement for Flk1 in primitive and definitive hematopoiesis and vasculogenesis. Cell. 1997;89:981–90. doi: 10.1016/s0092-8674(00)80283-4.
  • 2. Shalaby F, et al. Failure of blood-island formation and vasculogenesis in Flk-1-deficient mice. Nature. 1995;376:62–6. doi: 10.1038/376062a0.
  • 3. Guo G, et al. Resolution of cell fate decisions revealed by single-cell gene expression analysis from zygote to blastocyst. Dev. Cell. 2010;18:675–85. doi: 10.1016/j.devcel.2010.02.012.
  • 4. Yan L, et al. Single-cell RNA-Seq profiling of human preimplantation embryos and embryonic stem cells. Nat. Struct. Mol. Biol. 2013;20:1131–9. doi: 10.1038/nsmb.2660.
  • 5. Xue Z, et al. Genetic programs in human and mouse early embryos revealed by single-cell RNA sequencing. Nature. 2013;500:593–7. doi: 10.1038/nature12364.
  • 6. Pina C, et al. Inferring rules of lineage commitment in haematopoiesis. Nat. Cell Biol. 2012;14:287–94. doi: 10.1038/ncb2442.
  • 7. Moignard V, et al. Characterization of transcriptional networks in blood stem and progenitor cells using high-throughput single-cell gene expression analysis. Nat. Cell Biol. 2013;15:363–72. doi: 10.1038/ncb2709.
  • 8. Swiers G, et al. Early dynamic fate changes in haemogenic endothelium characterized at the single-cell level. Nat. Commun. 2013;4:2924. doi: 10.1038/ncomms3924.
  • 9. Coifman RR, et al. Geometric diffusions as a tool for harmonic analysis and structure definition of data: diffusion maps. Proc. Natl. Acad. Sci. U. S. A. 2005;102:7426–7431. doi: 10.1073/pnas.0500334102.
  • 10. Lux CT, et al. All primitive and definitive hematopoietic progenitor cells emerging before E10 in the mouse embryo are products of the yolk sac. Blood. 2008;111:3435–8. doi: 10.1182/blood-2007-08-107086.
  • 11. Ding G, Tanaka Y, Hayashi M, Nishikawa S-I, Kataoka H. PDGF receptor alpha+ mesoderm contributes to endothelial and hematopoietic cells in mice. Dev. Dyn. 2013;242:254–68. doi: 10.1002/dvdy.23923.
  • 12. Tanaka Y, et al. Early ontogenic origin of the hematopoietic stem cell lineage. Proc. Natl. Acad. Sci. U. S. A. 2012;109:4515–20. doi: 10.1073/pnas.1115828109.
  • 13. Lorsbach RB, et al. Role of RUNX1 in adult hematopoiesis: analysis of RUNX1-IRES-GFP knock-in mice reveals differential lineage expression. Blood. 2004;103:2522–9. doi: 10.1182/blood-2003-07-2439.
  • 14. Moignard V, Woodhouse S, Fisher J, Göttgens B. Transcriptional hierarchies regulating early blood cell development. Blood Cells. Mol. Dis. 2013;51:239–47. doi: 10.1016/j.bcmd.2013.07.007.
  • 15. Thiery JP, Acloque H, Huang RYJ, Nieto MA. Epithelial-mesenchymal transitions in development and disease. Cell. 2009;139:871–890. doi: 10.1016/j.cell.2009.11.007.
  • 16. Costa G, Kouskoff V, Lacaud G. Origin of blood cells and HSC production in the embryo. Trends Immunol. 2012;33:215–23. doi: 10.1016/j.it.2012.01.012.
  • 17. Samokhvalov IM, Samokhvalova NI, Nishikawa S. Cell tracing shows the contribution of the yolk sac to adult haematopoiesis. Nature. 2007;446:1056–61. doi: 10.1038/nature05725.
  • 18. Fujiwara Y, Browne CP, Cunniff K, Goff SC, Orkin SH. Arrested development of embryonic red cell precursors in mouse embryos lacking transcription factor GATA-1. Proc. Natl. Acad. Sci. U. S. A. 1996;93:12355–8. doi: 10.1073/pnas.93.22.12355.
  • 19. Robb L, et al. Absence of yolk sac hematopoiesis from mice with a targeted disruption of the scl gene. Proc. Natl. Acad. Sci. U. S. A. 1995;92:7075–9. doi: 10.1073/pnas.92.15.7075.
  • 20. Shivdasani RA, Mayer EL, Orkin SH. Absence of blood formation in mice lacking the T-cell leukaemia oncoprotein tal-1/SCL. Nature. 1995;373:432–4. doi: 10.1038/373432a0.
  • 21. Schlaeger TM, Mikkola HKA, Gekas C, Helgadottir HB, Orkin SH. Tie2Cre-mediated gene ablation defines the stem-cell leukemia gene (SCL/tal1)-dependent window during hematopoietic stem-cell development. Blood. 2005;105:3871–4. doi: 10.1182/blood-2004-11-4467.
  • 22. Chen MJ, Yokomizo BM, Zeigler E, Dzierzak E, Speck NA. Runx1 is required for the endothelial to haematopoietic cell transition but not thereafter. Nature. 2009;457:887–91. doi: 10.1038/nature07619.
  • 23. North T, et al. Cbfa2 is required for the formation of intra-aortic hematopoietic clusters. Development. 1999;126:2563–75. doi: 10.1242/dev.126.11.2563.
  • 24. Wareing S, et al. The Flk1-Cre-mediated deletion of ETV2 defines its narrow temporal requirement during embryonic hematopoietic development. Stem Cells. 2012;30:1521–31. doi: 10.1002/stem.1115.
  • 25. Sumanas S, et al. Interplay among Etsrp/ER71, Scl, and Alk8 signaling controls endothelial and myeloid cell formation. Blood. 2008;111:4500–10. doi: 10.1182/blood-2007-09-110569.
  • 26. Krumsiek J, Marr C, Schroeder T, Theis FJ. Hierarchical differentiation of myeloid progenitors is encoded in the transcription factor network. PLoS One. 2011;6:e22649. doi: 10.1371/journal.pone.0022649.
  • 27. Bonzanni N, et al. Hard-wired heterogeneity in blood stem cells revealed using a dynamic regulatory network model. Bioinformatics. 2013;29:i80–8. doi: 10.1093/bioinformatics/btt243.
  • 28. Xu H, Ang Y-S, Sevilla A, Lemischka IR, Ma’ayan A. Construction and Validation of a Regulatory Network for Pluripotency and Self-Renewal of Mouse Embryonic Stem Cells. PLoS Comput. Biol. 2014;10:e1003777. doi: 10.1371/journal.pcbi.1003777.
  • 29. Sánchez-Castillo M, et al. CODEX: a next-generation sequencing experiment database for the haematopoietic and embryonic stem cell communities. Nucleic Acids Res. 2014. doi: 10.1093/nar/gku895. In press.
  • 30. Fan R, et al. Dynamic HoxB4-regulatory network during embryonic stem cell differentiation to hematopoietic cells. Blood. 2012;119:e139–47. doi: 10.1182/blood-2011-12-396754.
  • 31. Thoms JAI, et al. ERG promotes T-acute lymphoblastic leukemia and is transcriptionally regulated in leukemic cells by a stem cell enhancer. Blood. 2011;117:7079–89. doi: 10.1182/blood-2010-12-317990.
  • 32. Wilson NK, et al. The transcriptional program controlled by the stem cell leukemia gene Scl/Tal1 during early embryonic hematopoietic development. Blood. 2009;113:5456–65. doi: 10.1182/blood-2009-01-200048.
  • 33. Wilkinson AC, et al. Single site-specific integration targeting coupled with embryonic stem cell differentiation provides a high-throughput alternative to in vivo enhancer analyses. Biol. Open. 2013;2:1229–38. doi: 10.1242/bio.20136296.
  • 34. Mitjavila-Garcia MT, et al. Expression of CD41 on hematopoietic progenitors derived from embryonic hematopoietic cells. Development. 2002;129:2003–13. doi: 10.1242/dev.129.8.2003.
  • 35. Mikkola HKA, Fujiwara Y, Schlaeger TM, Traver D, Orkin SH. Expression of CD41 marks the initiation of definitive hematopoiesis in the mouse embryo. Blood. 2003;101:508–16. doi: 10.1182/blood-2002-06-1699.
  • 36. Kabrun N, et al. Flk-1 expression defines a population of early embryonic hematopoietic precursors. Development. 1997;124:2039–48. doi: 10.1242/dev.124.10.2039.
  • 37. Gandillet A, et al. Sox7-sustained expression alters the balance between proliferation and differentiation of hematopoietic progenitors at the onset of blood specification. Blood. 2009;114:4813–22. doi: 10.1182/blood-2009-06-226290.
  • 38. Sachs K, Perez O, Pe’er D, Lauffenburger DA, Nolan GP. Causal protein-signaling networks derived from multiparameter single-cell data. Science. 2005;308:523–9. doi: 10.1126/science.1105809.
  • 39. Guo G, et al. Mapping cellular hierarchy by single-cell analysis of the cell surface repertoire. Cell Stem Cell. 2013;13:492–505. doi: 10.1016/j.stem.2013.07.017.
  • 40. Bailey NTJ. Statistical Methods in Biology. Cambridge University Press; 1995. p. 255.
  • 41. Trapnell C, et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat. Biotechnol. 2014. doi: 10.1038/nbt.2859.
  • 42. Pimanda JE, et al. Gata2, Fli1, and Scl form a recursively wired gene-regulatory circuit during early hematopoietic development. Proc. Natl. Acad. Sci. U. S. A. 2007;104:17692–7. doi: 10.1073/pnas.0707045104.
  • 43. Kataoka H, et al. Etv2/ER71 induces vascular mesoderm from Flk1+PDGFRα+ primitive mesoderm. Blood. 2011;118:6975–86. doi: 10.1182/blood-2011-05-352658.
  • 44. Loughran SJ, et al. The transcription factor Erg is essential for definitive hematopoiesis and the function of adult hematopoietic stem cells. Nat. Immunol. 2008;9:810–9. doi: 10.1038/ni.1617.
  • 45. Taoudi S, et al. ERG dependence distinguishes developmental control of hematopoietic stem cell maintenance from hematopoietic specification. Genes Dev. 2011;25:251–62. doi: 10.1101/gad.2009211.
  • 46. Sacilotto N, et al. Analysis of Dll4 regulation reveals a combinatorial role for Sox and Notch in arterial development. Proc. Natl. Acad. Sci. U. S. A. 2013;110:11893–8. doi: 10.1073/pnas.1300805110.
  • 47. Kyba M, Perlingeiro RCR, Daley GQ. HoxB4 confers definitive lymphoid-myeloid engraftment potential on embryonic stem cell and yolk sac hematopoietic progenitors. Cell. 2002;109:29–37. doi: 10.1016/s0092-8674(02)00680-3.
  • 48. Tischler J, Surani MA. Investigating transcriptional states at single-cell-resolution. Curr. Opin. Biotechnol. 2013;24:69–78. doi: 10.1016/j.copbio.2012.09.013.
  • 49. Tang F, Lao K, Surani MA. Development and applications of single-cell transcriptome analysis. Nat. Methods. 2011;8:S6–11. doi: 10.1038/nmeth.1557.
  • 50. Downs KM, Davies T. Staging of gastrulating mouse embryos by morphological landmarks in the dissecting microscope. Development. 1993;118:1255–66. doi: 10.1242/dev.118.4.1255.
  • 51. Ståhlberg A, et al. Defining cell populations with single-cell gene expression profiling: correlations and identification of astrocyte subpopulations. Nucleic Acids Res. 2011;39:e24. doi: 10.1093/nar/gkq1182.
  • 52. Picelli S, et al. Full-length RNA-seq from single cells using Smart-seq2. Nat. Protoc. 2014;9:171–81. doi: 10.1038/nprot.2014.006.
  • 53. Dobin A, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15–21. doi: 10.1093/bioinformatics/bts635.
  • 54. Van der Maaten L, Hinton G. Visualizing Data using t-SNE. J. Mach. Learn. Res. 2008;9:2579–2605.
  • 55. Amir ED, et al. viSNE enables visualization of high dimensional single-cell data and reveals phenotypic heterogeneity of leukemia. Nat. Biotechnol. 2013;31:545–52. doi: 10.1038/nbt.2594.
  • 56. Koksal AS, et al. Synthesis of biological models from mutation experiments. POPL ’13 Proc. 40th Annu. ACM SIGPLAN-SIGACT Symp. Princ. Program. Lang. 48; 2013. pp. 469–482.
  • 57. Garg A, Di Cara A, Xenarios I, Mendoza L, De Micheli G. Synchronous versus asynchronous modeling of gene regulatory networks. Bioinformatics. 2008;24:1917–25. doi: 10.1093/bioinformatics/btn336.
  • 58. Magin TM, McWhir J, Melton DW. A new mouse embryonic stem cell line with good germ line contribution and gene targeting frequency. Nucleic Acids Res. 1992;20:3795–6. doi: 10.1093/nar/20.14.3795.
