Summary
Adult mitotic tissues like the intestine, skin, and blood undergo constant turnover throughout the life of an organism. Knowing the identity of the stem cell is crucial to understanding tissue homeostasis and its aberrations upon disease. Here we present a computational method for the derivation of a lineage tree from single-cell transcriptome data. By exploiting the tree topology and the transcriptome composition, we establish StemID, an algorithm for identifying stem cells among all detectable cell types within a population. We demonstrate that StemID recovers two known adult stem cell populations, Lgr5+ cells in the small intestine and hematopoietic stem cells in the bone marrow. We apply StemID to predict candidate multipotent cell populations in the human pancreas, a tissue with largely uncharacterized turnover dynamics. We hope that StemID will accelerate the search for novel stem cells by providing concrete markers for biological follow-up and validation.
Highlights
• StemID infers the lineage tree and identifies stem cells from single-cell mRNA-seq data
• Direct links of stem cells to distinct sub-types reflect transcriptome plasticity
• The permissive stem cell transcriptome is characterized by high entropy
• StemID infers candidate multipotent cell populations in the human pancreas
Grün et al. developed an algorithm, StemID, for the derivation of cell lineage trees and identification of stem cells from single-cell mRNA sequencing data. StemID successfully recovered known adult stem cell populations from the small intestine and bone marrow and was then used to predict a novel multipotent cell population in the human pancreas.
Introduction
The identification of a stem cell in a tissue is a major challenge of pivotal importance. Being able to detect the stem cell population allows for powerful approaches to study cell differentiation dynamics by, for example, lineage tracing (Barker et al., 2007, Busch et al., 2015). Additionally, it provides a first step toward ex vivo propagation of primary stem cells in organoid cultures (Lancaster et al., 2013, Sato et al., 2009), important for applications in regenerative medicine. Moreover, stem cell populations relevant for disease progression, such as cancer stem cells, are promising targets for therapeutic intervention. Stem cells are typically rare, which makes their discovery by traditional population-based assays very difficult. For example, it took decades of dedicated research to define the population of hematopoietic stem cells (HSCs) (Eaves, 2015), but it remains an open question how much heterogeneity exists within this subpopulation of bone marrow cells (Wilson et al., 2015). Similarly, the discovery of intestinal stem cells (van der Flier and Clevers, 2009) took years of work, and heterogeneity within this compartment remains under debate (Buczacki et al., 2013).
The recent availability of single-cell mRNA sequencing methods allows profiling of healthy and diseased tissues with single-cell resolution (Grün et al., 2015, Jaitin et al., 2014, Macosko et al., 2015, Patel et al., 2014, Paul et al., 2015, Treutlein et al., 2014, Zeisel et al., 2015). The transcriptome of a cell can be interpreted as a fingerprint, revealing its identity. However, biological gene expression noise (Eldar and Elowitz, 2010, Raj and van Oudenaarden, 2008) and technical noise arising from the amplification of minute amounts of mRNA from a single cell (Brennecke et al., 2013, Grün et al., 2014) affect the readout and make it challenging to discriminate cell types based on their transcriptome. By sequencing large numbers of randomly sampled single cells from a tissue, it is now possible to compile a nearly complete inventory of cell types.
These inventories can be screened for cell types of particular interest, such as stem cells. An obvious strategy for the identification of the stem cell is the derivation of a lineage tree from single-cell sequencing data. However, transcriptomes of randomly sampled cells only represent a snapshot of the system, and temporal differentiation dynamics cannot be directly derived. Nevertheless, if the system of interest comprises all differentiation stages, such as the intestinal epithelium or the bone marrow, then attempts can be made to infer a lineage tree by assembling single-cell transcriptomes in a pseudo-temporal order. Existing approaches assume a continuous temporal change of transcript levels to assemble differentiation trajectories (Bendall et al., 2014, Haghverdi et al., 2015, Trapnell et al., 2014), but resolving the correct tree topology remains a challenge.
Here we present a method to identify rare and abundant cell types of a system and use these cell type classifications to guide the inference of a lineage tree. We investigate the general properties characterizing the position of a cell type within the lineage tree and identify the number of branches and the transcriptome uniformity of a cell type as features correlating with the degree of multipotency. We show that our approach successfully recovers the identity of the stem cell in the intestine and in the bone marrow, two systems with a well-described stem cell population. We then use our method to predict multipotent cell populations in the adult human pancreas.
Results
Robust Identification of Mouse Intestinal Cell Types by RaceID2
To develop a robust approach for the inference of differentiation trajectories, we used a previously published dataset from a lineage tracing experiment comprising the progeny of Lgr5-positive mouse intestinal stem cells (Grün et al., 2015). This system is ideal for testing the inference of differentiation dynamics because the lineage tree is already well characterized (Figure 1A). The continuously self-renewing intestinal epithelium is arranged in crypts and villi, with a small number of Lgr5+ stem cells, also known as crypt base columnar cells (CBCs), residing near the crypt bottom. These CBCs give rise to rapidly proliferating transit-amplifying (TA) cells that migrate upward along the crypt-villus axis and develop into the terminally differentiated cell types (Barker, 2014, van der Flier and Clevers, 2009). Although absorptive enterocytes constitute the most abundant cell type, the secretory lineage comprises rare cells, such as mucus-producing goblet cells, hormone-secreting enteroendocrine cells, and antimicrobial Paneth cells. Labeled cells were collected 5 days after label induction using an Lgr5-CreERT2 construct and a Rosa26-YFP reporter with a loxP-flanked transcriptional roadblock (Figure 1B).
First, we improved the robustness of the initial clustering step of our previously developed RaceID algorithm (Grün et al., 2015) by replacing the k-means clustering with k-medoids clustering (Figure S1). Second, we noticed that the previously used gap statistic (Tibshirani et al., 2001) was not ideal for determining the cluster number. Although increasing the number of clusters in many cases leads to a growing gap statistic, the decrease of the within-cluster dispersion (Tibshirani et al., 2001) saturates quickly. A further increase of the cluster number, therefore, reduces cluster reproducibility. In RaceID2, we thus determine the cluster number by identifying the saturation point of the within-cluster dispersion. Together, these two changes lead to a more robust initial clustering of RaceID2 (Experimental Procedures; Figure S1).
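The two RaceID2 changes, k-medoids clustering and choosing the cluster number at the saturation point of the within-cluster dispersion, can be illustrated with the minimal NumPy sketch below. It is not the RaceID2 R implementation, and several details are assumptions. The distance is 1 minus the Pearson correlation. Initialization is greedy farthest-point rather than PAM. Dispersion is the summed distance of cells to their medoid. The saturation rule, "stop once one more cluster lowers the dispersion by less than `tol` times the one-cluster dispersion," is a hypothetical criterion. The function names and `tol` are illustrative.

```python
import numpy as np

def correlation_distance(x):
    """1 - Pearson correlation between the rows (cells) of x (cells x genes)."""
    return 1.0 - np.corrcoef(x)

def k_medoids(d, k, n_iter=100):
    """Alternating k-medoids on a precomputed distance matrix d (n x n)."""
    # Greedy farthest-point initialisation (a simplification, not PAM's BUILD step).
    medoids = [int(np.argmin(d.sum(axis=1)))]
    while len(medoids) < k:
        medoids.append(int(np.argmax(d[:, medoids].min(axis=1))))
    medoids = np.array(medoids)
    for _ in range(n_iter):
        labels = np.argmin(d[:, medoids], axis=1)  # assign cells to nearest medoid
        new = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:
                # New medoid: the member with the smallest summed distance to its cluster.
                new[c] = members[np.argmin(d[np.ix_(members, members)].sum(axis=1))]
        if np.array_equal(new, medoids):
            break
        medoids = new
    labels = np.argmin(d[:, medoids], axis=1)
    return medoids, labels

def dispersion(d, medoids, labels):
    """Within-cluster dispersion: summed distance of each cell to its own medoid."""
    return float(d[np.arange(len(labels)), medoids[labels]].sum())

def choose_k(d, k_max=8, tol=0.1):
    """Smallest k after which one more cluster lowers the dispersion by < tol * W(1)."""
    w = [dispersion(d, *k_medoids(d, k)) for k in range(1, k_max + 1)]
    for k in range(1, k_max):
        if (w[k - 1] - w[k]) / w[0] < tol:  # w[k - 1] is W(k), w[k] is W(k + 1)
            return k, w
    return k_max, w
```

Unlike k-means, k-medoids works directly on a precomputed distance matrix and represents each cluster by an actual cell. The same medoids can then serve as the link endpoints in the lineage tree.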
For the intestinal lineage tracing data (Experimental Procedures), RaceID2 recovered a larger group of Lgr5+ stem cells (cluster 2) and early progeny (clusters 1 and 8) as well as the major mature cell types; i.e., enterocytes (cluster 3), goblet (clusters 4 and 19), Paneth (clusters 5 and 6), and enteroendocrine cells (cluster 7) (Figures 1C and 1D). These cell types could be unambiguously assigned based on the cluster-specific upregulation of marker genes inferred by RaceID2 (Table S1).
Inference of the Lineage Tree with Guided Topology
One of the major challenges for the inference of differentiation pathways in a system with multiple cell lineages is the determination of branching points. To overcome this problem, we predefined the topology of the lineage tree by allowing differentiation trajectories linking each pair of clusters. A putative differentiation trajectory links the medoids of two clusters, and the ensemble of all inter-cluster links defines the possible topology of the lineage tree. To minimize the effect of technical noise and, at the same time, the computational burden, we first reduce the dimensionality of the input space while requiring maximal conservation of all point-to-point distances. In a second step, we assign each cell to its most likely position on a single inter-cluster link. To find this position, the vector connecting the medoid of a cluster to one of its cells is projected onto the links between the medoid of this cluster and the medoids of all remaining clusters, and the cell is assigned to the link with the longest projection after normalizing the length of each link to one. The projection also defines the most likely position of the cell on the link (Figure 2A), reflecting its differentiation state (Experimental Procedures). If this strategy is applied to the intestinal data, then only a subset of links is populated (Figure 2B). To determine links that are more highly populated than expected by chance and are therefore candidates for actual differentiation trajectories, we computed an enrichment p value based on comparison with a background distribution with randomized cell positions (Figure 2B; Figure S2A). Furthermore, we reasoned that the coverage of a link by cells indicates how likely it is that this link represents an actual differentiation trajectory, rather than merely biased perturbations that drive the transcriptome of a given cluster preferentially toward the transcriptome of another cluster without leading to actual differentiation events.
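The two steps above, distance-preserving dimensionality reduction and assignment of each cell to its longest normalized projection, can be sketched as follows. The use of classical multidimensional scaling (MDS) for the reduction is an assumption: it is one standard method that preserves pairwise distances as well as possible, but the text does not name the method. "Normalizing the link length to one" is read here as the scalar projection divided by the link length, so a cell at the target medoid sits at position 1. `classical_mds` and `assign_to_links` are illustrative names, not StemID functions.

```python
import numpy as np

def classical_mds(d, n_dims=2):
    """Classical MDS: coordinates whose Euclidean distances approximate d."""
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    b = -0.5 * j @ (d ** 2) @ j               # double-centred squared distances
    vals, vecs = np.linalg.eigh(b)            # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:n_dims]   # keep the largest ones
    return vecs[:, order] * np.sqrt(np.clip(vals[order], 0.0, None))

def assign_to_links(coords, labels, medoid_coords):
    """For each cell, return the target cluster of its link and its position on it.

    Position 0 is the cell's own medoid and position 1 is the target medoid.
    """
    targets, positions = [], []
    for x, a in zip(coords, labels):
        best_b, best_p = -1, -np.inf
        for b, m_b in enumerate(medoid_coords):
            if b == a:
                continue
            link = m_b - medoid_coords[a]
            # Projection of (cell - own medoid) onto the link, in units of link length.
            p = float(np.dot(x - medoid_coords[a], link) / np.dot(link, link))
            if p > best_p:
                best_b, best_p = b, p
        targets.append(best_b)
        positions.append(best_p)
    return np.array(targets), np.array(positions)
```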
We defined a link score as one minus the maximum difference between the positions of each pair of neighboring cells on the link after normalizing the length of each link to one (Figure S2B). If this score is close to one, then the link is densely covered with cells with only small gaps in between. If the link score is close to zero, then the cells are concentrated only near the two cluster centers connected by this link, leaving a large gap in between. A detailed description of the algorithm is given in the Experimental Procedures. The computationally inferred intestinal lineage tree is consistent with the known lineage tree (Figure 1A). Secretory cell types (clusters 4, 5, 6, and 7) populate individual branches emanating from the central Lgr5+ cluster, and absorptive enterocytes (cluster 3) differentiate from the same group via a more abundant group of TA cells (cluster 1).
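The link score defined above can be sketched in a few lines. Three details are assumptions not spelled out in the text. Cells from both clusters contribute: positions of cluster b's cells on the b-to-a link are flipped to 1 - p. The two medoids are included as fixed points at 0 and 1. Positions are clipped to [0, 1]. `link_score` is an illustrative name.

```python
import numpy as np

def link_score(positions_ab, positions_ba=()):
    """1 - largest gap between neighbouring cells on a link of length one."""
    pos = np.concatenate([
        np.asarray(positions_ab, dtype=float),         # cells of a on link a -> b
        1.0 - np.asarray(positions_ba, dtype=float),   # cells of b on link b -> a
        [0.0, 1.0],                                    # the two medoids
    ])
    pos = np.sort(np.clip(pos, 0.0, 1.0))
    return 1.0 - float(np.max(np.diff(pos)))
```

A link that is evenly covered by cells at spacing 0.1 scores 0.9. A link with no intermediate cells scores 0, because the largest gap then spans the whole link.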
We compared the inferred lineage tree to the tree predicted by Monocle (Trapnell et al., 2014), a recent method for the derivation of branched lineage trees that does not rely on a predefined tree topology, and found that Monocle could not resolve the different branches of secretory cells (Figure S2).
High Connectivity and High Transcriptome Entropy Reveal the Identity of the Stem Cell
Next we attempted to predict the stem cell identity from the lineage tree. For this purpose, our working definition of a stem cell relies purely on multipotency. More precisely, we try to identify, from the lineage tree, the cell population with the highest degree of multipotency. We noticed that different cell types showed a variable number of populated links to other clusters. The link score is reflected by the thickness of the line in our graphical representation (Figure 2B). We also show links with a low link score because they are informative about the associated cell state. For example, a cell type with many low-scoring links can fluctuate toward a diversity of fate biases, whereas cell types with only a few links are much more canalized. These two scenarios reflect a more promiscuous transcriptome, as expected for stem cells, versus a more confined transcriptome, as expected for a mature cell type. In our data, cluster 2, which contains cells positive for Lgr5 and other established stem cell markers (Ascl2 and Clca4) (Figure 2C), was the most highly connected cluster. Another putative property of stem cells is a more uniform composition of the transcriptome in comparison with differentiated cells. Mature cell types frequently express a small number of genes at very high levels, crucial for cell type-specific functions. The transcriptome of Paneth cells, for instance, is dominated by transcripts of lysozyme and other host defense genes. The uniformity of the transcriptome is reflected by Shannon's entropy (Shannon, 1948), and this concept has previously been applied to study cellular differentiation (Anavy et al., 2014, Banerji et al., 2013, Piras et al., 2014) (Experimental Procedures). We anticipate that the transcriptome of a multipotent cell type is more uniform in each individual cell.
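The entropy of a single cell's transcriptome can be computed from its transcript counts as H = -Σ p_i log p_i, where p_i is the fraction of the cell's transcripts that come from gene i. The sketch below uses the natural logarithm and no normalization. Whether StemID additionally divides by the log of the number of genes is not stated here, so treat that as an open detail. The function names are illustrative.

```python
import numpy as np

def transcriptome_entropy(counts):
    """Shannon entropy (natural log) of one cell's transcript count vector."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()   # genes with zero counts contribute nothing
    return float(-(p * np.log(p)).sum())

def cell_entropies(count_matrix):
    """Entropy for every cell (row) of a cells x genes count matrix."""
    return np.array([transcriptome_entropy(row) for row in count_matrix])
```

A cell that spreads its transcripts evenly over four genes reaches the maximum log(4). A Paneth-like cell dominated by one gene scores much lower.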
In addition, multiple state biases that can give rise to diverse mature cell types, either upon external stimuli or stochastically, could coexist within this population, leading to high entropy (Banerji et al., 2013, Ridden et al., 2015). For the intestinal lineage tracing data, both Paneth and goblet cells had clearly reduced entropy compared with Lgr5-positive cells, whereas the entropy of enterocytes and enteroendocrine cells was comparable with that of stem cells (Figure 2D). We found that, for all analyzed datasets (see below), the number of links discriminates better between multipotent and differentiated cells when rescaled by the entropy. Therefore, the simplest score that performs well in discriminating multipotent cells from the remaining cell types is the product of the median entropy (after subtracting the minimal entropy observed in the system) and the number of links of a cluster (Experimental Procedures). This score exhibits a clear maximum for cluster 2 comprising the Lgr5+ stem cells (Figure 2D). We named the combined procedure of lineage tree inference and derivation of this score StemID.
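The StemID score described above can be sketched as: score = (median entropy of the cluster - minimal entropy in the system) × number of links of the cluster. Two details are assumptions here. The minimal entropy is taken over all cells. The link count is the number of significant links touching the cluster, given as a list of undirected (cluster, cluster) pairs. `stemid_scores` is an illustrative name.

```python
import numpy as np

def stemid_scores(entropy, labels, significant_links):
    """Return {cluster: (median entropy - global minimum entropy) * number of links}."""
    entropy = np.asarray(entropy, dtype=float)
    labels = np.asarray(labels)
    clusters = [int(c) for c in np.unique(labels)]
    e_min = entropy.min()                      # minimal entropy observed in the system
    n_links = {c: 0 for c in clusters}
    for a, b in significant_links:             # each undirected link counts for both ends
        n_links[int(a)] += 1
        n_links[int(b)] += 1
    return {c: float((np.median(entropy[labels == c]) - e_min) * n_links[c])
            for c in clusters}
```

Multiplying by the entropy term down-weights highly connected but low-entropy clusters, such as mature Paneth or goblet cells. Subtracting the minimum entropy lets the least uniform cell type approach a score of zero.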
StemID Recovers Intestinal Stem Cells in a Complex Dataset with Non-random Cell Type Frequencies
Next we wanted to test whether StemID could identify Lgr5+ cells in a larger and more complex dataset comprising intestinal cells from various independent experiments conducted in our lab. In this dataset, we combined 3 weeks and 8 weeks of Lgr5 lineage tracing data. A subset of those was enriched in secretory cells by fluorescence-activated cell sorting (FACS) on CD24 (van Es et al., 2012; Figure S3). For both time points, we also sorted non-traced CD24+ control cells (Experimental Procedures; Figure S3). RaceID2 revealed the known intestinal cell types within this dataset based on cluster-specific expression of known cell type marker genes and subdivided these into stages of differentiation or maturation (Figures 3A and 3B; Figure S3A). A full list of differentially expressed genes for each cluster is given in Table S2. For example, intestinal stem cells in cluster 7, marked by high expression of Lgr5 and Clca4 (Figure 3B), were connected directly to all secretory branches, whereas TA cells (cluster 5) primarily gave rise to enterocytes (cluster 10) (Figure 3C; Figures S3C and S3D). Interestingly, we observed two distinct differentiation trajectories for Paneth cells (clusters 13 and 14): one via a Dll1-positive common precursor of Paneth and goblet cells (cluster 21), and another one directly connecting stem cells (cluster 7) or TA cells in cluster 5, marked by upregulation of the cell-cycle gene Pcna, to the mature Paneth cell clusters. Both the Dll1-dependent route (van Es et al., 2012) and the direct route (Farin et al., 2014, Sawada et al., 1991), which was observed after ablation of Paneth cells, have been described. The recovery of alternative differentiation pathways demonstrates the power of our guided lineage inference. We were not able to recapitulate this finding with a minimum spanning tree-based alternative approach (Figure S3E).
We then computed the StemID score and found that the Lgr5+/Clca4+ cells (cluster 7) exhibit the highest score (Figure 3D). The second-highest score was observed for cluster 21, which represents a common progenitor to Paneth and goblet cells. The TA cells in cluster 5, which our lineage inference identifies as progenitors with an enterocyte fate bias, acquire the third-highest StemID score.
Notably, Paneth cells in cluster 13 and mature goblet cells in cluster 2 show the same connectivity as the stem and progenitor cells in clusters 7, 5, and 21, but rescaling by entropy helps correctly assign a mature state to these cells (Figure S3F). In conclusion, StemID could identify intestinal stem cells and distinguish progenitor populations from more mature intestinal cell types.
StemID Recovers Hematopoietic Stem Cells within a Non-random Sample of Bone Marrow Cells
To test the performance of StemID in a different biological system, we applied the algorithm to single-cell sequencing data of mouse bone marrow cells. These cells were selected on the basis of physical interactions, as doublets or larger groups of cells, and are thus not a random sample of all cell types in the bone marrow. This dataset was complemented with Kit+Sca-1+Lin−CD48−CD150+ HSCs (Kiel et al., 2005) sorted from the bone marrow (Experimental Procedures; Figure S5B). Cell types identified by RaceID2 were dominated by the myeloid lineage and comprised HSCs, erythroblasts, megakaryocytes, two groups of granulocytes (neutrophils and eosinophils), macrophages, a small group of B lymphocytes, and several clusters representing progenitor stages of the myeloid lineage (Figures 4A and 4B; Figure S6A). A full list of differentially expressed genes for each cluster is shown in Table S3. Cluster 1 comprises almost exclusively sorted HSCs (Figure S4B). The inferred lineage tree (Figure 4C; Figures S6C and S6D) indicates that HSCs differentiate into multipotent progenitor cells (cluster 5) but are also directly linked to mature lineages. HSCs and multipotent progenitors are both linked to megakaryocytes (cluster 4), eosinophils (clusters 10 and 29), macrophages (cluster 28), and two branches covering a spectrum of progenitor and mature states of the neutrophil (clusters 11, 3, 2, 14, 12, and 22) and erythroid lineage (clusters 9, 8, 7, 6, and 13), respectively. The B lymphocytes are only directly linked to the HSCs, suggesting that cluster 5 represents a myeloid progenitor population and that no lymphoid progenitors were present in our sample. The inferred lineage tree is therefore consistent with the existence of a common myeloid progenitor population giving rise to erythrocytes, megakaryocytes, granulocytes, and macrophages (Orkin and Zon, 2008).
StemID assigns the highest score to cluster 1 and, therefore, correctly recovers HSCs among all cell types in the mixture (Figure 4D; Figure S6). The second-highest score discriminates the multipotent myeloid progenitors (cluster 5) from the remaining cell types, and the third-highest score was assigned to the earliest progenitor of the erythroblast lineage. Therefore, the level of multipotency also correlates with the StemID score of bone marrow-derived cells.
The high connectivity of cluster 1 provides evidence for early fate biases already in HSCs. Moreover, the high entropy of HSCs reflects a more uniform transcriptome in individual cells of this population. The entropy distribution across all cells in this cluster is shifted in comparison with all other groups (Figure 5A). In general, the inter-cluster variability substantially exceeds the intra-cluster variability. The narrow entropy distribution of cluster 1 also rules out a strong dependence on the cell cycle. However, we also observed that 54 of the 276 HSCs (20%) show distinct fate biases, revealed by low expression of lineage-specific marker genes (Figure 5B), a finding that is consistent with a recent report based on lineage tracing (Perié et al., 2015). Because the sensitivity of single-cell sequencing is limited, this number is almost certainly an underestimate. We note that the largest subset of HSCs (112 of 276, about 41%) is assigned to the link with the multipotent progenitors (cluster 5). We cannot address whether the observed fate bias persists during differentiation or whether stochastic switching between distinct cell fates occurs during differentiation. Our observation is also consistent with a recent single-cell transcriptome analysis showing an unexpected heterogeneity of myeloid progenitor cell populations and suggesting the existence of an early cell fate bias (Paul et al., 2015). We observe very similar sets of marker genes to those found in that study, but our lineage inference also permits an analysis of the temporal dynamics of gene expression. As an example, we extracted all cells from the neutrophil branch (clusters 1, 11, 3, 2, and 12) in pseudo-temporal order derived from the projection coordinates and clustered temporal expression profiles by using self-organizing maps (Experimental Procedures).
A z-score of gene expression values along this trajectory reveals that the RaceID2 clusters represent sets of cells with common modules of co-expressed genes and that gene expression within these modules changes smoothly over time (Figure 5C). Whereas ribosomal protein-encoding genes and other components of the translational machinery slowly decline during differentiation, other genes are transiently switched on in progenitor populations (e.g., Elane) or immature neutrophils (e.g., Ngp) or only upregulated in mature cells (e.g., Retnlg).
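The pseudo-temporal ordering and per-gene z-scoring described above can be sketched as below. The encoding of pseudotime is an assumption. Each cell carries the rank of its link along the branch (for example 0 for the link from cluster 1 to cluster 11, 1 for the next link, and so on) and its projection coordinate on that link. Cells are sorted by link rank, then by position. The self-organizing map step is not reproduced here. `pseudotime_zscores` is an illustrative name.

```python
import numpy as np

def pseudotime_zscores(expr, link_rank, position):
    """Order cells along a branch and z-score each gene across the ordered cells.

    expr: cells x genes matrix; link_rank, position: one value per cell.
    Returns the cell order and an (ordered cells) x genes z-score matrix.
    """
    expr = np.asarray(expr, dtype=float)
    # np.lexsort sorts by the last key first: link rank, then position on the link.
    order = np.lexsort((np.asarray(position, dtype=float), np.asarray(link_rank)))
    ordered = expr[order]
    mu = ordered.mean(axis=0)
    sd = ordered.std(axis=0)
    sd[sd == 0] = 1.0                 # constant genes get z = 0 instead of NaN
    return order, (ordered - mu) / sd
```

The columns of the returned matrix are the temporal expression profiles that would then be grouped into co-expression modules, for example with a self-organizing map.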
Finally, we note that the identification of the HSC population by StemID is robust to changing the contribution of this population to the mixed sample. For example, when only ten HSCs are randomly selected and all others are discarded from the dataset, StemID still assigns the highest score to the small HSC cluster (data not shown).
In summary, StemID could successfully identify the stem cell type in a complex mixture of cells isolated from bone marrow. The inferred lineage tree recovered known trajectories and also suggested an early cell fate bias present already in HSCs.
StemID Predicts Multipotent Ductal Cell Populations among Human Adult Pancreatic Cells
After having demonstrated that StemID can robustly identify the stem cell population in two distinct biological systems, we applied the algorithm to predict multipotent cell populations in a less characterized system: the human pancreas. The pancreas consists of acinar cells that produce the digestive enzymes, ductal cells secreting bicarbonate to neutralize stomach acidity, and hormone-producing endocrine cells that regulate glucose homeostasis (Jennings et al., 2015). It is unclear which multipotent cells maintain pancreatic homeostasis and can give rise to different mature cell types during regeneration upon injury. Although early studies have suggested that, in humans, these cell populations could reside within the exocrine compartment or that dedifferentiation of exocrine cells could give rise to endocrine cells (Bonner-Weir et al., 2000, Puri et al., 2015), the identity of multipotent cell populations is still unclear (Jiang and Morahan, 2014). We sequenced pancreatic cells from human donors (Experimental Procedures), and application of RaceID2 revealed all major cell types, including different subpopulations of acinar and ductal cells; hormone-producing α, β, δ, and pancreatic polypeptide-producing (PP) cells; and stellate cells (Figures 6A and 6B; Figures S5A and S5B). A full list of differentially expressed genes for each cluster is shown in Table S4. In particular, we discovered novel subpopulations of ductal cells. In one of these groups (cluster 14), the cell surface glycoprotein CEACAM6 was significantly upregulated (p < 0.01; Experimental Procedures), whereas the heavy and light chains (FTH1, FTL) of ferritin, the major intracellular iron storage protein, were significantly upregulated (p < 0.01; Experimental Procedures) in the other group (cluster 4) (Figure 6C).
The inferred lineage tree assigns a central position to the ductal cells (Figure 6D; Figures S7C–S7E). Distinct subtypes of ductal cells appear to give rise to different endocrine sub-types and acinar cells. Whereas differentiation trajectories link cluster 4 to acinar, PP, and β cells, cluster 14 is linked to α and δ cells. Consistently, clusters 4 and 14 acquire the highest StemID scores, indicating the highest level of multipotency among the cell types detected in this system (Figure 6E; Figure S7F). The following ranks of the StemID score were occupied by other ductal sub-types and precursor cells that give rise to two sub-states of acinar cells. Interestingly, cluster 4 also directly connects to stellate cells. Upon injury, these cells can switch to an activated state and migrate to the injured location to participate in tissue repair (Omary et al., 2007).
To collect further evidence that cluster 4 comprises endocrine progenitor cells, we plotted the expression of the cluster 4 marker FTH1 and the β cell marker insulin (INS) in single cells residing on the differentiation trajectory connecting these two cell types. Cells were ordered by their projection coordinate. The genes exhibited smooth, anti-correlated gradients suggestive of a continuous transition between these two cell types (Figure 6F). To independently validate this observation, we performed antibody staining against insulin and FTL in human pancreatic tissue sections. We were able to detect individual cells co-expressing insulin and FTL within ductal structures, confirming the existence of cluster 4 cells (Figure 7A). Co-staining of glucagon revealed that these cells specifically produce insulin and not glucagon (Figure 7B), as suggested by our analysis (Figure 6C). Our results indicate that the ferritin-positive sub-population of ductal cells might differentiate into mature β cells.
Discussion
In this study, we present an approach to identify stem cells using single-cell transcriptomics data. Because the transcriptome of a cell approximately reflects its physiological state, it is a reasonable assumption that cell types can be discriminated based on their transcriptome. However, determining the stem cell identity among all discovered cell types, including rare ones, also requires the derivation of a lineage tree.
To address this task, we combined cell type identification by RaceID2 with a tree reconstruction by guided topology. We first introduced an improved version of our previous RaceID algorithm (Grün et al., 2015) with a more robust initial clustering step. The replacement of k-means by k-medoids leads to increased robustness of clustering for all datasets analyzed in the paper. For the complex intestinal dataset (Figure 3), the fraction of clusters with a Jaccard similarity > 0.7 is 40% for k-means versus 73% for k-medoids. The corresponding fractions are 58% versus 83% for the bone marrow data and 40% versus 90% for the pancreas data.
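The robustness measure above, the fraction of clusters whose Jaccard similarity exceeds 0.7, can be sketched as follows. For each reference cluster, take its best Jaccard match (|A ∩ B| / |A ∪ B|) among the clusters of a second clustering, for example one obtained on a resampled dataset. Then count how many reference clusters exceed the threshold. Comparing against a single reclustering is a simplification. Averaging over many bootstrap resamples is assumed to be closer to how such stability values are usually obtained. The function names are illustrative.

```python
import numpy as np

def jaccard(a, b):
    """Jaccard similarity of two sets of cell indices."""
    return len(a & b) / len(a | b)

def cluster_stability(ref_labels, other_labels):
    """Best Jaccard match of every reference cluster within another clustering."""
    ref, other = np.asarray(ref_labels), np.asarray(other_labels)
    stability = {}
    for c in np.unique(ref):
        members = set(np.flatnonzero(ref == c))
        stability[int(c)] = max(jaccard(members, set(np.flatnonzero(other == o)))
                                for o in np.unique(other))
    return stability

def fraction_stable(stability, threshold=0.7):
    """Fraction of clusters whose Jaccard similarity exceeds the threshold."""
    return float(np.mean([v > threshold for v in stability.values()]))
```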
To infer differentiation trajectories, we assign every cell to a specific link between its cluster of origin and another cluster based on the longest projection of the vector connecting the cluster center with the cell position onto these links. This adequately reflects how much a cell has moved from the most representative cell state in the same cluster (the medoid) toward another cell identity (or vice versa). If significantly more cells reside on a link than expected by chance, this provides strong evidence that cells of the cluster of origin exhibit a pronounced transcriptome bias toward another cell fate. In addition, if a continuum of cell states covers a given link, as evidenced by a high link score, then this link represents a strong candidate for an actual differentiation trajectory. Significant links with reduced link scores, on the other hand, indicate plasticity of the connected cell types in the sense that the transcriptome of a cell type can, to some extent, fluctuate toward another fate.
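The test "significantly more cells reside on a link than expected by chance" can be illustrated with an empirical permutation p value. The background model used here is an assumption, because the text only says that cell positions are randomized. Each cell keeps its distance to its own medoid, but the direction of its displacement is drawn uniformly at random, and the cells are then reassigned to links. The p value is (1 + number of randomizations with at least the observed count) / (1 + number of randomizations). `link_counts` and `link_pvalues` are illustrative names.

```python
import numpy as np

def link_counts(coords, labels, medoid_coords):
    """Upper-triangular matrix of cell counts per undirected inter-cluster link."""
    k = len(medoid_coords)
    counts = np.zeros((k, k), dtype=int)
    for x, a in zip(coords, labels):
        links = medoid_coords - medoid_coords[a]
        sq = (links ** 2).sum(axis=1)
        sq[a] = 1.0                          # avoid 0/0 for the self-link
        proj = links @ (x - medoid_coords[a]) / sq
        proj[a] = -np.inf                    # never assign a cell to its own cluster
        b = int(np.argmax(proj))             # longest normalised projection
        counts[min(a, b), max(a, b)] += 1
    return counts

def link_pvalues(coords, labels, medoid_coords, n_perm=200, seed=0):
    """Observed link counts and empirical enrichment p values."""
    rng = np.random.default_rng(seed)
    observed = link_counts(coords, labels, medoid_coords)
    centres = medoid_coords[labels]
    radius = np.linalg.norm(coords - centres, axis=1, keepdims=True)
    exceed = np.zeros_like(observed)
    for _ in range(n_perm):
        # Assumed null model: same distance to the medoid, random direction.
        u = rng.normal(size=coords.shape)
        u /= np.linalg.norm(u, axis=1, keepdims=True)
        exceed += link_counts(centres + radius * u, labels, medoid_coords) >= observed
    return observed, (exceed + 1) / (n_perm + 1)
```

Under this null model, a cluster whose cells all lean toward one neighbor yields a small p value for that link. Clusters whose cells scatter isotropically around their medoid populate their links no more than expected by chance.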
The quality of our lineage inference is supported by the recovery of known differentiation trajectories in the intestinal epithelium and the bone marrow. Remarkably, we recovered a rare alternative differentiation pathway where Lgr5+ cells differentiate directly into Paneth cells without intermediate Dll1+ progenitors (Farin et al., 2014, Sawada et al., 1991). We could also show, for the intestinal and the bone marrow data, that StemID infers a lineage tree with substantially higher resolution in comparison with methods published previously (Haghverdi et al., 2015, Trapnell et al., 2014; Figure S6).
The derived lineage tree for the bone marrow suggested that, in contrast to the classical view of dichotomous differentiation via a hierarchy of increasingly restricted progenitor populations (Giebel and Punzel, 2008), a cell fate bias already exists as early as the HSC stage (Figure 5B). This observation is consistent with a recent single-cell transcriptome analysis revealing heterogeneity of the common myeloid progenitor cell population, indicating early fate bias (Paul et al., 2015). Moreover, direct generation of progenitors restricted to the myeloid fate from mouse HSCs has been described in the past (Yamamoto et al., 2013), and the existence of unipotent cells within human HSCs (Notta et al., 2016) and classically defined mouse multipotent progenitor populations was shown recently (Perié et al., 2015).
For both model systems, the StemID score, which quantifies very general properties of a cell type (i.e., the number of links and the entropy of the transcriptome), ranks RaceID2-predicted cell types by their level of multipotency. Lgr5+ CBCs and sorted HSCs acquire the highest score among all cell types of the intestine and bone marrow, respectively, demonstrating the performance of our algorithm. We could further demonstrate the performance of StemID on two previously published datasets (Figure S7) for cells from developing lung epithelium (Treutlein et al., 2014) and differentiating human radial glial cells (Pollen et al., 2015).
Potential problems for the StemID algorithm arise in the absence of intermediate progenitors or in the presence of unrelated cell types. In the absence of intermediate progenitors, StemID infers a link to a more multipotent population. For example, B lymphocytes in the bone marrow dataset are directly linked to HSCs. It is known that a spectrum of progenitors will reside on this trajectory, and, as we have observed for the other lineages, an early fate bias toward lymphocytes could exist in HSCs. In the absence of intermediate progenitors, a link to a more multipotent population reflects all information on the lineage relationship that can be extracted from the data. If the stem cell itself is missing from the sample, StemID will identify the cell type with the highest level of multipotency. The presence of unrelated cell types in the mixture could lead to false positive links. However, because the feature space is high-dimensional, it is likely that none of the links between an unrelated cell type and the remaining lineage tree will be significantly populated. We also argue that links of mature cell types to related progenitor or stem cell populations were identified with high specificity (often only a single link, in line with previous findings, was detected). This makes the occurrence of significant links between unrelated cell types unlikely.
Finally, we used StemID to screen human adult pancreatic cells for multipotent cell populations. It is unclear which adult pancreatic cell types can give rise to the different mature pancreatic lineages during normal tissue turnover or regeneration. Although initial evidence suggested that multipotent cells within the ductal compartment could differentiate into endocrine cells both in humans and mice (Jiang and Morahan, 2014), subsequent lineage-tracing experiments produced contradictory results. Whereas mouse lineage tracing of carbonic anhydrase II (Ca2)-positive ductal cells revealed that these cells give rise to β cells upon injury (Bonner-Weir et al., 2008), lineage tracing of Sox9-, Muc1-, or Hnf1β-positive cells could not confirm this finding (Furuyama et al., 2011, Kopinke and Murtaugh, 2010, Kopp et al., 2011, Solar et al., 2009). Using StemID, we were able to predict distinct sub-populations of ductal cells with varying differentiation potential. Whereas ductal cells marked by high levels of CEACAM6 are predicted to differentiate into α, δ, and PP cells, another sub-population expressing high levels of the ferritin complex primarily appears to give rise to β cells and acinar cells. We note that the latter sub-population does not express any of the markers used in previous lineage-tracing experiments, but we caution that expression of these genes might be too low to be reliably detected by single-cell mRNA sequencing. We further remark that β cell differentiation in the adult pancreas might not be conserved between human and mouse.
We provide the well-documented R source code for RaceID2 and the StemID algorithm at https://github.com/dgrun/StemID. We hope that StemID will be useful for a better understanding of differentiation dynamics in a variety of systems.
Experimental Procedures
Lineage-Tracing Experiments
For lineage-tracing experiments, we injected 0.4 mg tamoxifen into 3-month-old Lgr5-CreERT2 C57Bl6/J mice bred to Rosa26LSL-YFP reporter mice.
Isolation of Crypts from Mouse Small Intestine
Crypts were isolated from mice as described previously (Sato et al., 2009). See the Supplemental Experimental Procedures for more details.
Human Islet Isolation, Dispersion, and Sorting
Pancreatic cadaveric tissue was procured from a multiorgan donor program and only used when the pancreas could not be used for clinical pancreas or islet transplantation, according to national laws, and when research consent was present. Human islet isolations were performed in the islet isolation facility of the Leiden University Medical Center according to a modified protocol originally described by Ricordi et al. (1988). See the Supplemental Experimental Procedures for details regarding culturing and cell sorting.
Immunofluorescence
Pancreatic tissue samples were fixed overnight in 4% formaldehyde (Klinipath), stored in 70% ethanol, and subsequently embedded in paraffin. After deparaffinization and rehydration in xylene and ethanol, respectively, antigen retrieval was performed in citrate buffer for 20 min. Sections were blocked with 2% normal donkey serum and 1% lamb serum in PBS. Primary antibodies were rabbit anti-Ftl (ab69090), mouse anti-glucagon (ab10988), and guinea pig anti-insulin (ab7842). Alexa Fluor-conjugated secondary antibodies against rabbit, mouse, and guinea pig immunoglobulin G (IgG) (Life Technologies; A11008, A10037, and A21450) were used at a dilution of 1:200. Nuclei were counterstained by mounting in DAPI-containing Vectashield (Vector Laboratories, H-1500). Imaging was performed on a Leica SP8 confocal microscope using hybrid detectors.
Preparation of Mouse Hematopoietic Cells
We used female and male C57Bl/6 mice aged 23 to 52 weeks, bred in our facility. Experimental procedures were approved by the Dier Experimenten Commissie of the Royal Netherlands Academy of Arts and Sciences and performed according to its guidelines. Bone marrow was isolated from femur and tibia by flushing with Hank’s balanced salt solution (HBSS, Invitrogen) without calcium or magnesium, supplemented with 1% heat-inactivated fetal calf serum (FCS) (Sigma). See the Supplemental Experimental Procedures for details regarding single-cell isolation.
Single-Cell Sequencing Library Preparation
The protocol was carried out as described previously (Grün et al., 2015). See the Supplemental Experimental Procedures for a detailed description.
Quantification of Transcript Abundance
Read mapping and quantification were done as described previously (Grün et al., 2015). See the Supplemental Experimental Procedures for a detailed description.
RaceID2 and StemID
A brief overview is given in the Results. The algorithm and follow-up analyses are described in full detail in the Supplemental Experimental Procedures.
Author Contributions
D.G. and A.v.O. conceived the study. D.G. developed the algorithm and performed all computational analyses. Single-cell sequencing of pancreatic cells and antibody staining were performed by M.J.M. with the help of G.D. Single-cell sequencing of intestinal cells was performed by K.W. with the help of A.L., J.v.E., and M.v.d.B. Single-cell sequencing of bone marrow cells was performed by J.C.B. E.J. helped with antibody staining. D.G. wrote the manuscript, and all authors read and edited the manuscript. A.v.O. supervised D.G., M.J.M., K.W., and J.C.B. and the project itself. E.J.P.d.K. supervised G.D. and E.J. H.C. supervised M.v.d.B. and J.v.E.
Acknowledgments
This work was supported by European Research Council Advanced Grant ERC-AdG 294325-GeneNoiseControl, a Nederlandse Organisatie voor Wetenschappelijk Onderzoek (NWO) Vici award, the DON Foundation, and the Dutch Diabetes Research Foundation.
Published: June 23, 2016
Footnotes
Supplemental Information includes Supplemental Experimental Procedures, seven figures, and four tables and can be found with this article online at http://dx.doi.org/10.1016/j.stem.2016.05.010.
Contributor Information
Dominic Grün, Email: gruen@ie-freiburg.mpg.de.
Alexander van Oudenaarden, Email: a.vanoudenaarden@hubrecht.eu.
Accession Numbers
The accession numbers for the RNA sequencing datasets reported in this paper are GEO: GSE76408, GSE76983, and GSE81076.
References
- Anavy L., Levin M., Khair S., Nakanishi N., Fernandez-Valverde S.L., Degnan B.M., Yanai I. BLIND ordering of large-scale transcriptomic developmental timecourses. Development. 2014;141:1161–1166. doi: 10.1242/dev.105288.
- Banerji C.R.S., Miranda-Saavedra D., Severini S., Widschwendter M., Enver T., Zhou J.X., Teschendorff A.E. Cellular network entropy as the energy potential in Waddington’s differentiation landscape. Sci. Rep. 2013;3:3039. doi: 10.1038/srep03039.
- Barker N. Adult intestinal stem cells: critical drivers of epithelial homeostasis and regeneration. Nat. Rev. Mol. Cell Biol. 2014;15:19–33. doi: 10.1038/nrm3721.
- Barker N., van Es J.H., Kuipers J., Kujala P., van den Born M., Cozijnsen M., Haegebarth A., Korving J., Begthel H., Peters P.J., Clevers H. Identification of stem cells in small intestine and colon by marker gene Lgr5. Nature. 2007;449:1003–1007. doi: 10.1038/nature06196.
- Bendall S.C., Davis K.L., Amir A.D., Tadmor M.D., Simonds E.F., Chen T.J., Shenfeld D.K., Nolan G.P., Pe’er D. Single-cell trajectory detection uncovers progression and regulatory coordination in human B cell development. Cell. 2014;157:714–725. doi: 10.1016/j.cell.2014.04.005.
- Bonner-Weir S., Taneja M., Weir G.C., Tatarkiewicz K., Song K.H., Sharma A., O’Neil J.J. In vitro cultivation of human islets from expanded ductal tissue. Proc. Natl. Acad. Sci. USA. 2000;97:7999–8004. doi: 10.1073/pnas.97.14.7999.
- Bonner-Weir S., Inada A., Yatoh S., Li W.-C., Aye T., Toschi E., Sharma A. Transdifferentiation of pancreatic ductal cells to endocrine beta-cells. Biochem. Soc. Trans. 2008;36:353–356. doi: 10.1042/BST0360353.
- Brennecke P., Anders S., Kim J.K., Kołodziejczyk A.A., Zhang X., Proserpio V., Baying B., Benes V., Teichmann S.A., Marioni J.C., Heisler M.G. Accounting for technical noise in single-cell RNA-seq experiments. Nat. Methods. 2013;10:1093–1095. doi: 10.1038/nmeth.2645.
- Buczacki S.J.A., Zecchini H.I., Nicholson A.M., Russell R., Vermeulen L., Kemp R., Winton D.J. Intestinal label-retaining cells are secretory precursors expressing Lgr5. Nature. 2013;495:65–69. doi: 10.1038/nature11965.
- Busch K., Klapproth K., Barile M., Flossdorf M., Holland-Letz T., Schlenner S.M., Reth M., Höfer T., Rodewald H.-R. Fundamental properties of unperturbed haematopoiesis from stem cells in vivo. Nature. 2015;518:542–546. doi: 10.1038/nature14242.
- Eaves C.J. Hematopoietic stem cells: concepts, definitions, and the new reality. Blood. 2015;125:2605–2613. doi: 10.1182/blood-2014-12-570200.
- Eldar A., Elowitz M.B. Functional roles for noise in genetic circuits. Nature. 2010;467:167–173. doi: 10.1038/nature09326.
- Farin H.F., Karthaus W.R., Kujala P., Rakhshandehroo M., Schwank G., Vries R.G.J., Kalkhoven E., Nieuwenhuis E.E.S., Clevers H. Paneth cell extrusion and release of antimicrobial products is directly controlled by immune cell-derived IFN-γ. J. Exp. Med. 2014;211:1393–1405. doi: 10.1084/jem.20130753.
- Furuyama K., Kawaguchi Y., Akiyama H., Horiguchi M., Kodama S., Kuhara T., Hosokawa S., Elbahrawy A., Soeda T., Koizumi M. Continuous cell supply from a Sox9-expressing progenitor zone in adult liver, exocrine pancreas and intestine. Nat. Genet. 2011;43:34–41. doi: 10.1038/ng.722.
- Giebel B., Punzel M. Lineage development of hematopoietic stem and progenitor cells. Biol. Chem. 2008;389:813–824. doi: 10.1515/BC.2008.092.
- Grün D., Kester L., van Oudenaarden A. Validation of noise models for single-cell transcriptomics. Nat. Methods. 2014;11:637–640. doi: 10.1038/nmeth.2930.
- Grün D., Lyubimova A., Kester L., Wiebrands K., Basak O., Sasaki N., Clevers H., van Oudenaarden A. Single-cell messenger RNA sequencing reveals rare intestinal cell types. Nature. 2015;525:251–255. doi: 10.1038/nature14966.
- Haghverdi L., Buettner F., Theis F.J. Diffusion maps for high-dimensional single-cell analysis of differentiation data. Bioinformatics. 2015;31:2989–2998. doi: 10.1093/bioinformatics/btv325.
- Jaitin D.A., Kenigsberg E., Keren-Shaul H., Elefant N., Paul F., Zaretsky I., Mildner A., Cohen N., Jung S., Tanay A., Amit I. Massively parallel single-cell RNA-seq for marker-free decomposition of tissues into cell types. Science. 2014;343:776–779. doi: 10.1126/science.1247651.
- Jennings R.E., Berry A.A., Strutt J.P., Gerrard D.T., Hanley N.A. Human pancreas development. Development. 2015;142:3126–3137. doi: 10.1242/dev.120063.
- Jiang F.-X., Morahan G. Pancreatic stem cells remain unresolved. Stem Cells Dev. 2014;23:2803–2812. doi: 10.1089/scd.2014.0214.
- Kiel M.J., Yilmaz O.H., Iwashita T., Yilmaz O.H., Terhorst C., Morrison S.J. SLAM family receptors distinguish hematopoietic stem and progenitor cells and reveal endothelial niches for stem cells. Cell. 2005;121:1109–1121. doi: 10.1016/j.cell.2005.05.026.
- Kopinke D., Murtaugh L.C. Exocrine-to-endocrine differentiation is detectable only prior to birth in the uninjured mouse pancreas. BMC Dev. Biol. 2010;10:38. doi: 10.1186/1471-213X-10-38.
- Kopp J.L., Dubois C.L., Schaffer A.E., Hao E., Shih H.P., Seymour P.A., Ma J., Sander M. Sox9+ ductal cells are multipotent progenitors throughout development but do not produce new endocrine cells in the normal or injured adult pancreas. Development. 2011;138:653–665. doi: 10.1242/dev.056499.
- Lancaster M.A., Renner M., Martin C.-A., Wenzel D., Bicknell L.S., Hurles M.E., Homfray T., Penninger J.M., Jackson A.P., Knoblich J.A. Cerebral organoids model human brain development and microcephaly. Nature. 2013;501:373–379. doi: 10.1038/nature12517.
- Macosko E.Z., Basu A., Satija R., Nemesh J., Shekhar K., Goldman M., Tirosh I., Bialas A.R., Kamitaki N., Martersteck E.M. Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell. 2015;161:1202–1214. doi: 10.1016/j.cell.2015.05.002.
- Notta F., Zandi S., Takayama N., Dobson S., Gan O.I., Wilson G., Kaufmann K.B., McLeod J., Laurenti E., Dunant C.F. Distinct routes of lineage development reshape the human blood hierarchy across ontogeny. Science. 2016;351:126–127. doi: 10.1126/science.aab2116.
- Omary M.B., Lugea A., Lowe A.W., Pandol S.J. The pancreatic stellate cell: a star on the rise in pancreatic diseases. J. Clin. Invest. 2007;117:50–59. doi: 10.1172/JCI30082.
- Orkin S.H., Zon L.I. Hematopoiesis: an evolving paradigm for stem cell biology. Cell. 2008;132:631–644. doi: 10.1016/j.cell.2008.01.025.
- Patel A.P., Tirosh I., Trombetta J.J., Shalek A.K., Gillespie S.M., Wakimoto H., Cahill D.P., Nahed B.V., Curry W.T., Martuza R.L. Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma. Science. 2014;344:1396–1401. doi: 10.1126/science.1254257.
- Paul F., Arkin Y., Giladi A., Jaitin D.A., Kenigsberg E., Keren-Shaul H., Winter D., Lara-Astiaso D., Gury M., Weiner A. Transcriptional heterogeneity and lineage commitment in myeloid progenitors. Cell. 2015;163:1663–1677. doi: 10.1016/j.cell.2015.11.013.
- Perié L., Duffy K.R., Kok L., de Boer R.J., Schumacher T.N. The branching point in erythro-myeloid differentiation. Cell. 2015;163:1655–1662. doi: 10.1016/j.cell.2015.11.059.
- Piras V., Tomita M., Selvarajoo K. Transcriptome-wide variability in single embryonic development cells. Sci. Rep. 2014;4:7137. doi: 10.1038/srep07137.
- Pollen A.A., Nowakowski T.J., Chen J., Retallack H., Sandoval-Espinosa C., Nicholas C.R., Shuga J., Liu S.J., Oldham M.C., Diaz A. Molecular identity of human outer radial glia during cortical development. Cell. 2015;163:55–67. doi: 10.1016/j.cell.2015.09.004.
- Puri S., Folias A.E., Hebrok M. Plasticity and dedifferentiation within the pancreas: development, homeostasis, and disease. Cell Stem Cell. 2015;16:18–31. doi: 10.1016/j.stem.2014.11.001.
- Raj A., van Oudenaarden A. Nature, nurture, or chance: stochastic gene expression and its consequences. Cell. 2008;135:216–226. doi: 10.1016/j.cell.2008.09.050.
- Ricordi C., Lacy P.E., Finke E.H., Olack B.J., Scharp D.W. Automated method for isolation of human pancreatic islets. Diabetes. 1988;37:413–420. doi: 10.2337/diab.37.4.413.
- Ridden S.J., Chang H.H., Zygalakis K.C., MacArthur B.D. Entropy, ergodicity, and stem cell multipotency. Phys. Rev. Lett. 2015;115:208103. doi: 10.1103/PhysRevLett.115.208103.
- Sato T., Vries R.G., Snippert H.J., van de Wetering M., Barker N., Stange D.E., van Es J.H., Abo A., Kujala P., Peters P.J., Clevers H. Single Lgr5 stem cells build crypt-villus structures in vitro without a mesenchymal niche. Nature. 2009;459:262–265. doi: 10.1038/nature07935.
- Sawada M., Takahashi K., Sawada S., Midorikawa O. Selective killing of Paneth cells by intravenous administration of dithizone in rats. Int. J. Exp. Pathol. 1991;72:407–421.
- Shannon C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948;27:379–423, 623–656.
- Solar M., Cardalda C., Houbracken I., Martín M., Maestro M.A., De Medts N., Xu X., Grau V., Heimberg H., Bouwens L., Ferrer J. Pancreatic exocrine duct cells give rise to insulin-producing beta cells during embryogenesis but not after birth. Dev. Cell. 2009;17:849–860. doi: 10.1016/j.devcel.2009.11.003.
- Tibshirani R., Walther G., Hastie T. Estimating the number of clusters in a data set via the gap statistic. J. R. Stat. Soc. Ser. B. Stat. Methodol. 2001;63:411–423.
- Trapnell C., Cacchiarelli D., Grimsby J., Pokharel P., Li S., Morse M., Lennon N.J., Livak K.J., Mikkelsen T.S., Rinn J.L. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat. Biotechnol. 2014;32:381–386. doi: 10.1038/nbt.2859.
- Treutlein B., Brownfield D.G., Wu A.R., Neff N.F., Mantalas G.L., Espinoza F.H., Desai T.J., Krasnow M.A., Quake S.R. Reconstructing lineage hierarchies of the distal lung epithelium using single-cell RNA-seq. Nature. 2014;509:371–375. doi: 10.1038/nature13173.
- van der Flier L.G., Clevers H. Stem cells, self-renewal, and differentiation in the intestinal epithelium. Annu. Rev. Physiol. 2009;71:241–260. doi: 10.1146/annurev.physiol.010908.163145.
- van Es J.H., Sato T., van de Wetering M., Lyubimova A., Nee A.N.Y., Gregorieff A., Sasaki N., Zeinstra L., van den Born M., Korving J. Dll1+ secretory progenitor cells revert to stem cells upon crypt damage. Nat. Cell Biol. 2012;14:1099–1104. doi: 10.1038/ncb2581.
- Wilson N.K., Kent D.G., Buettner F., Shehata M., Macaulay I.C., Calero-Nieto F.J., Sánchez Castillo M., Oedekoven C.A., Diamanti E., Schulte R. Combined single-cell functional and gene expression analysis resolves heterogeneity within stem cell populations. Cell Stem Cell. 2015;16:712–724. doi: 10.1016/j.stem.2015.04.004.
- Yamamoto R., Morita Y., Ooehara J., Hamanaka S., Onodera M., Rudolph K.L., Ema H., Nakauchi H. Clonal analysis unveils self-renewing lineage-restricted progenitors generated directly from hematopoietic stem cells. Cell. 2013;154:1112–1126. doi: 10.1016/j.cell.2013.08.007.
- Zeisel A., Muñoz-Manchado A.B., Codeluppi S., Lönnerberg P., La Manno G., Juréus A., Marques S., Munguba H., He L., Betsholtz C. Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq. Science. 2015;347:1138–1142. doi: 10.1126/science.aaa1934.