Abstract
Single-cell genomics technology has transformed our understanding of complex cellular systems. However, excessive cost and a lack of strategies for the purification of newly identified cell types impede their functional characterization and large-scale profiling. Here, we have generated high-content single-cell proteo-genomic reference maps of human blood and bone marrow that quantitatively link the expression of up to 197 surface markers to cellular identities and biological processes across all main hematopoietic cell types in healthy aging and leukemia. These reference maps enable the automatic design of cost-effective high-throughput cytometry schemes that outperform state-of-the-art approaches, accurately reflect complex topologies of cellular systems and permit the purification of precisely defined cell states. The systematic integration of cytometry and proteo-genomic data enables the functional capacities of precisely mapped cell states to be measured at the single-cell level. Our study serves as an accessible resource and paves the way for a data-driven era in cytometry.
Subject terms: Gene expression analysis, Haematopoietic stem cells, Leukaemia, Haematopoiesis
Haas, Velten and colleagues use single-cell multiomics of human blood and bone marrow to generate a reference map allowing the quantitative linking of cytometry and proteo-genomic information.
Main
Single-cell transcriptomic technologies have revolutionized our understanding of tissues1–3. The systematic construction of whole-organ and whole-organism single-cell atlases has revealed an unanticipated diversity of cell types and cell states, and has provided detailed insights into cellular development and differentiation processes4–7. However, strategies for the prospective isolation of cell populations newly identified by single-cell genomics are needed to enable their functional characterization or therapeutic use. Furthermore, single-cell genomics technologies remain cost-intensive and scale poorly, impeding their integration into clinical routine.
Unlike single-cell transcriptomics, flow cytometry offers massive throughput in terms of samples and cells, is commonly used in routine clinical diagnostics8 and remains unrivaled in its ability to prospectively isolate live populations of interest for downstream applications. However, flow cytometry provides low-dimensional measurements and relies on predefined sets of surface markers and gating strategies that have evolved historically through trial and error. Indeed, single-cell transcriptomics (scRNA-seq) approaches have demonstrated that flow cytometry gating schemes frequently yield impure or heterogeneous populations9,10, and flow strategies for the precise identification of cell types defined by scRNA-seq are lacking. Moreover, the precision and efficiency of commonly used cytometry gating schemes are largely unknown, and the exact importance of many surface markers remains unclear. Together, these findings highlight a disconnect between single-cell genomics-based molecular cell type maps and data generated by widely used cytometry assays.
The differentiation of hematopoietic stem cells (HSCs) in the bone marrow (BM) constitutes a particularly striking example of this disconnect11–14. The classical model of hematopoiesis, which is based mainly on populations defined by flow cytometry15–17, has recently been challenged in several aspects by single-cell transcriptomic9,10,18–20, functional21,22 and lineage tracing23 approaches. These studies revealed that hematopoietic lineage commitment occurs earlier than previously anticipated, that putative oligopotent progenitors isolated by fluorescence-activated cell sorting (FACS) consist of heterogeneous mixtures of progenitor populations and that lineage commitment is represented most accurately by a continuous process of differentiation trajectories rather than by a stepwise differentiation series of discrete progenitor populations12–14,24. The frequency of functionally oligopotent progenitors in immunophenotypic hematopoietic stem and progenitor cell (HSPC) gates remains controversial9,25,26. These discrepancies have contributed to conflicting results between studies that employ scRNA-seq for the definition of progenitor populations9,10,18,19,27 and studies that use FACS15,16,28. As a consequence, flow-based assays that accurately reflect the molecular and cellular complexity of the hematopoietic system are urgently needed.
Recently, methods to simultaneously measure mRNA and surface protein expression in single cells have been developed29,30. Here, we demonstrate that ultrahigh-content single-cell proteo-genomic reference maps, alongside appropriate computational tools, can be used to systematically design and analyze cytometry assays that accurately reflect scRNA-seq-based molecular tissue maps at the level of cell types and differentiation states. For this purpose, we have generated proteo-genomic datasets encompassing 97–197 surface markers across 122,004 cells representing the cellular landscape of young, aged and leukemic human BM and blood, as well as all states of HSC differentiation. We demonstrate how such data can be used in an unbiased manner to evaluate and automatically design cytometry gating schemes for individual populations and entire biological systems without prior knowledge. We show that, compared with existing approaches, such optimized schemes are superior in the identification of cell types and more accurately reflect molecular cell states. Projecting datasets from malignant hematopoiesis onto our reference atlases enables the fine-mapping of the exact stage of differentiation arrest in leukemias, the identification of leukemia-specific surface markers and an unsupervised classification of disease states. Finally, we demonstrate how such data resources can be used to project low-dimensional cytometry data onto single-cell genomic atlases to enable functional analysis of precisely defined states of cellular differentiation. Our data resource and bioinformatic advances enable the efficient identification and isolation of any molecularly defined cell state from blood and BM while laying the groundwork for reconciling flow cytometry and single-cell genomics data across human tissues.
Results
A single-cell proteo-genomic reference map of BM
To establish a comprehensive single-cell transcriptomic and surface protein expression map of human BM, we performed a series of Abseq experiments in which mononuclear BM cells from hip aspirates were labeled with 97–197 oligo-tagged antibodies, followed by targeted or whole transcriptome scRNA-seq on the BD Rhapsody platform (Fig. 1a). For targeted single-cell transcriptome profiling, we established a custom panel, consisting of 462 mRNAs covering all HSPC differentiation stages, cell type identity genes, mRNAs of surface receptors and additional genes that permit the characterization of cellular states. These genes were selected systematically to capture all relevant layers of RNA expression heterogeneity observed in this system (Supplementary Note 1 and Supplementary Table 1). Whole transcriptome single-cell proteo-genomics confirmed that no populations were missed due to the targeted nature of the assay (Supplementary Note 2). Using this panel, in combination with 97 surface markers (Supplementary Table 2), we analyzed the BM of three young healthy donors, three aged healthy donors and three acute myeloid leukemia (AML) patients at diagnosis (Fig. 1a, Extended Data Fig. 1 and Supplementary Table 3). For samples from healthy donors, CD34+ cells were enriched to enable a detailed study of HSC differentiation (Extended Data Fig. 2). For samples from AML patients, CD3+ cells were enriched in some cases to ensure sufficient coverage of T cells.
Since single-cell proteo-genomic approaches are not commonly performed at this level of antibody multiplexing, we designed a series of control experiments. First, we performed matched Abseq experiments in the presence or absence of antibodies to ensure that highly multiplexed antibody stains do not affect the transcriptome of single cells (Supplementary Note 3). We further performed a series of Abseq experiments on fresh and frozen samples to demonstrate that the freeze–thaw process has little impact on the data (Supplementary Note 3). Finally, we evaluated the sequencing requirements for optimal cell type classification in high-parametric single-cell proteo-genomic experiments (Supplementary Note 4). In the main reference dataset, 70,017 high-quality BM cells were profiled with combined RNA and high-parametric surface protein information, and an average of ~7,500 surface molecules per cell were detected (Extended Data Fig. 3). Following data integration across experiments and measurement modalities, we identified 45 cell types and cell stages covering the vast majority of previously described hematopoietic cell types of the BM and peripheral blood (PB), including all stages of HSC differentiation in the CD34+ compartment, all T cell and natural killer (NK) cell populations of the CD3+ and CD56+ compartments, several dendritic cell and monocyte subpopulations from the CD33+ compartment and all main B cell differentiation states across CD10+, CD19+ and CD38high compartments (Fig. 1b,c, Supplementary Note 5 and Supplementary Table 4). In addition, poorly characterized populations, such as cytotoxic CD4+ T cells and mesenchymal stem or stromal cells (MSCs), are covered. Cells from young and aged BM occupied the same cell states in all individuals, whereas cell states in AML differed (Fig. 1b and see below).
Importantly, the combined RNA and surface protein information provided higher resolution and revealed cell types that are not readily identified by either data layer alone (Supplementary Note 6).
Besides our main reference dataset, we generated ‘query’ single-cell proteo-genomic datasets, which are displayed in the context of the main reference (Supplementary Note 7). These comprise, first, healthy BM and matched PB samples profiled with a 197-plex antibody panel to query the expression of additional surface markers in the context of our reference (Extended Data Fig. 4 and Supplementary Table 2); second, healthy BM profiled with a 97-plex antibody panel in combination with whole transcriptome profiling to query the expression of any gene in the space defined by our reference (Supplementary Note 2); third, the CD34+CD38− BM compartment profiled with a 97-plex antibody panel to provide higher resolution of immature HSPCs (see below and Extended Data Fig. 9c,d); and fourth, a cohort of 12 AML patients (see below and Fig. 4). To make our comprehensive resource accessible, we developed the Abseq App, a web-based application that permits visualization of gene and surface marker expression, differential expression testing and the data-driven identification of gating schemes across all datasets presented in this manuscript. A demonstration video of the app is available in the supplement (Supplementary Video 1). The Abseq App is accessible at https://abseqapp.shiny.embl.de/.
A directory of the biological importance of surface markers
While surface markers are widely used in immunology, stem-cell biology and cancer research to identify cell types, cell stages and biological processes, the exact importance of individual markers frequently remains ambiguous. To link surface marker expression quantitatively with biological processes, we assigned each cell in our dataset to its respective cell type, and determined its differentiation stage, its stemness score, its cytotoxicity score and its current cell cycle phase as well as technical covariates (see Methods and below). Moreover, we included covariates representing unknown biological processes that were defined in an unsupervised manner using a factor model. Nontechnical covariates were not affected by marker expression level (Extended Data Fig. 5a and Methods). For each surface marker, we then quantified the fraction of expression variance that is explained by each of these processes (Fig. 2a). This model identified markers that represent cell type identities or differentiation stages, as well as stemness, cytotoxicity and cell cycle properties (Fig. 2b–d and Extended Data Fig. 5b–f).
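As a simplified illustration of what "fraction of variance explained" means for one marker and one categorical covariate, the sketch below computes eta squared (between-group sum of squares divided by total sum of squares). This is an assumption made for illustration only: the function name `fraction_variance_explained` and the toy values are hypothetical, and the model described in the Methods considers several covariates jointly, including continuous ones, rather than one grouping at a time.

```python
from collections import defaultdict


def fraction_variance_explained(expression, labels):
    """Eta squared: share of one marker's total variance explained by a
    categorical covariate (for example, cell type).

    Hypothetical helper for illustration; it handles a single covariate,
    not the joint multi-covariate model used in the study.
    """
    n = len(expression)
    grand_mean = sum(expression) / n
    total_ss = sum((x - grand_mean) ** 2 for x in expression)
    if total_ss == 0:
        return 0.0  # a constant marker has no variance to explain

    # Group expression values by covariate level.
    groups = defaultdict(list)
    for x, g in zip(expression, labels):
        groups[g].append(x)

    # Between-group sum of squares: spread of group means around the grand mean.
    between_ss = sum(
        len(xs) * (sum(xs) / len(xs) - grand_mean) ** 2 for xs in groups.values()
    )
    return between_ss / total_ss


# Toy data: a marker that is high in one (hypothetical) cell type only.
marker = [1.0, 1.2, 0.8, 5.0, 5.1, 4.9]
cell_type = ["T", "T", "T", "B", "B", "B"]
print(round(fraction_variance_explained(marker, cell_type), 3))
```

A value near 1 indicates that the marker mainly reflects the covariate (here, cell identity), whereas a value near 0 indicates that the covariate explains little of its variation.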
To characterize new markers identified by this analysis, we focused initially on the evaluation of surface molecules that specifically mark distinct stages of HSC differentiation, since a lack of specific markers currently impedes the accurate representation of lineage commitment by flow cytometry9,10,18,21,27. For this purpose, we performed pseudotime analyses within the CD34+ HSPC compartment and identified surface markers that correlate with the progression of HSCs towards erythroid, megakaryocytic, monocyte, conventional dendritic cell or B cell differentiation trajectories (Methods; Figs. 2d and 3a and Extended Data Fig. 5g). Of note, the monocyte trajectory also includes neutrophil progenitor stages, but mature neutrophils are not included in the datasets due to the use of density gradient centrifugation of samples. Moreover, trajectory analyses were not performed for plasmacytoid dendritic and eosinophil/basophil lineages because the low number of intermediate cells impeded the unambiguous identification of branch points. Pseudotime analyses quantified the exact expression dynamics of many well-established markers, such as CD38 as a pandifferentiation marker, as well as CD10 and CD11c as early B cell and monocyte-dendritic cell lineage commitment markers, respectively (Fig. 2d and Extended Data Fig. 6a). Importantly, our analyses revealed new surface markers that specifically demarcate distinct stages of lineage commitment, including CD326, CD11a and Tim3 (Figs. 2d and 3). To confirm the high specificity of CD326 for erythroid commitment and of CD11a and Tim3 for myeloid commitment, we used FACS-based indexing of surface markers coupled to single-cell RNA-seq (‘index scRNA-seq’, see also Supplementary Note 8), or coupled to single-cell cultures (‘index cultures’) (Fig. 3b). As suggested by our proteo-genomic single-cell data, CD326 expression was associated with molecular priming and functional commitment into the erythroid lineage (Fig. 3c–g and Extended Data Fig. 6b,c).
By contrast, Tim3 and CD11a were identified as panmyeloid differentiation markers and were associated with transcriptomic priming and functional commitment into the myeloid lineage (Fig. 3c,h–o and Extended Data Fig. 6c). Finally, CD98 was identified as a new pandifferentiation marker of HSCs, which we confirmed by classical flow cytometry (Fig. 2d and Extended Data Fig. 6d–h). Beyond the progression of HSCs to lineage-committed cells, we also analyzed the surface marker dynamics throughout B cell differentiation, allowing us to identify markers specific to their lineage commitment, maturation, isotype switching and final plasma cell generation (Extended Data Fig. 6i–p).
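To make the pseudotime analysis more concrete, the following sketch ranks surface markers by their Spearman correlation with pseudotime along one trajectory. It assumes that pseudotime values have already been computed for the cells of a single branch; the helper names (`average_ranks`, `rank_markers_by_pseudotime`) and the toy values are hypothetical, and the procedure described in the Methods may use a different statistic or smoothing.

```python
def average_ranks(values):
    """1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def spearman(x, y):
    # Spearman correlation = Pearson correlation of the ranks.
    return pearson(average_ranks(x), average_ranks(y))


def rank_markers_by_pseudotime(pseudotime, markers):
    """Sort markers from most positively to most negatively correlated."""
    scores = {name: spearman(pseudotime, expr) for name, expr in markers.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy trajectory with three hypothetical markers.
pseudotime = [0.0, 0.2, 0.5, 0.8, 1.0]
markers = {
    "marker_up": [1.0, 2.0, 3.0, 5.0, 8.0],    # rises with differentiation
    "marker_down": [9.0, 7.0, 4.0, 2.0, 1.0],  # lost with differentiation
    "marker_flat": [2.0, 3.0, 1.0, 3.0, 2.0],  # no monotone trend
}
for name, rho in rank_markers_by_pseudotime(pseudotime, markers):
    print(f"{name}: {rho:+.2f}")
```

A rank-based statistic is used here because it captures monotone but nonlinear trends, such as markers that switch on abruptly at a branch point.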
Our model provides a global and quantitative understanding of how well cell type identities, differentiation stages and biological processes are related to the expression of individual surface markers. A comprehensive overview of surface markers associated with these processes is depicted in the supplement (Supplementary Data 1 and Extended Data Fig. 5).
Surface protein expression in healthy aging and cancer
To investigate surface protein expression throughout healthy aging, we compared Abseq data of BM from young and aged healthy individuals. These analyses revealed that the expression of surface molecules was highly similar across all BM populations between age groups (Fig. 4a,b and Supplementary Data 1), suggesting unexpectedly stable and highly regulated patterns of surface protein expression that are affected only modestly by aging. While cell type frequencies were also affected only modestly by aging, a substantial accumulation of cytotoxic effector CD8+ T cells was observed31 (Extended Data Fig. 7a). Moreover, several immune regulatory molecules showed age-related changes in surface presentation, including the death receptor FAS (CD95), the poliovirus receptor (CD155) and the ICOS ligand (CD275) (Fig. 4b). In particular, naive CD8+ and CD4+ T cell subsets displayed an aging-associated decline in surface expression of CD27, a costimulatory molecule required for the generation and maintenance of long-term T cell immunity32 (Fig. 4b,c). Together, these analyses suggest that the overall pattern of surface protein expression is largely maintained during healthy aging, with specific changes occurring most prominently in the surface presentation of immune regulatory molecules.
We next explored surface marker remodeling in AML, a blood cancer characterized by the accumulation of immature, dysfunctional myeloid progenitors, also called blasts. While the cellular BM of healthy donors displayed highly similar topologies across six individuals, initial analysis of three AML patients demonstrated that leukemic cells showed patient-specific alterations and a large degree of interpatient variability (Fig. 1b). To develop a generically applicable workflow to interpret data from hematological diseases in the context of our reference, we generated single-cell proteo-genomics datasets from a total of 15 AML patients, covering six t(15;17) translocated acute promyelocytic leukemias and nine normal karyotype AMLs with NPM1 mutations, of which four patients carried an additional FLT3 internal tandem duplication (Supplementary Table 3). While an unsupervised integration of these data highlighted primarily patient-to-patient variability (Extended Data Fig. 7b), projecting cells onto our healthy reference enabled a fine-mapping of the differentiation stages of leukemia cells (Fig. 4d and Supplementary Note 7). Unsupervised clustering of patients on the basis of the relative abundances of differentiation stages revealed three main categories: ‘monocytic AMLs’ that displayed an extensive accumulation of blasts with a classical monocyte phenotype, acute promyelocytic leukemias that were blocked in early and late promyelocyte states, and ‘immature AMLs’ that showed high numbers of immature blasts resembling HSC, multipotent progenitor (MPP), early lymphomyeloid progenitor and early promyelocyte states (Fig. 4e,f). In general, leukemic blasts retained many features reminiscent of the cell stage in which they were blocked (Extended Data Fig. 7c–e).
Accordingly, differential expression analyses revealed that many surface markers that distinguish the different AML states also mark their corresponding healthy counterparts, such as CD133 for immature AMLs or CD14 and CD11b for monocytic AMLs (Fig. 4g). This also translated into differential surface expression of potential drug targets, such as PD-L1 (CD274) and CTLA4 (CD152) (Fig. 4h and Extended Data Fig. 7f), suggesting that the myeloid differentiation stage of an AML may need to be considered when choosing targeted immunotherapies.
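The patient-level classification described above can be sketched as follows: each patient's projected cells are summarized as relative abundances of reference states, and patients are grouped by agglomerative clustering of these composition vectors. The state names, toy labels, Euclidean distance and average linkage are assumptions made for illustration; the text does not specify which distance or linkage was used in the study.

```python
from collections import Counter
from itertools import combinations


def composition(labels, states):
    """Relative abundance of each reference state among one patient's cells."""
    counts = Counter(labels)
    return [counts[s] / len(labels) for s in states]


def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def average_linkage(profiles, k):
    """Merge the two closest clusters (mean pairwise distance) until k remain."""
    clusters = [[name] for name in profiles]
    while len(clusters) > k:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
            d = sum(euclidean(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


states = ["HSC_MPP", "promyelocyte", "classical_monocyte"]
projected = {  # hypothetical per-cell labels after projection onto the reference
    "P1": ["classical_monocyte"] * 8 + ["HSC_MPP"] * 2,
    "P2": ["classical_monocyte"] * 9 + ["promyelocyte"],
    "P3": ["promyelocyte"] * 9 + ["HSC_MPP"],
    "P4": ["promyelocyte"] * 10,
    "P5": ["HSC_MPP"] * 7 + ["promyelocyte"] * 3,
}
profiles = {p: composition(labels, states) for p, labels in projected.items()}
print(average_linkage(profiles, k=3))
```

Clustering on state compositions rather than on raw expression is what removes most of the patient-to-patient noise: two patients blocked at the same stage look alike even if their blasts differ in other respects.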
By contrast, differential analyses between AML and healthy cells from the same differentiation stage revealed markers specifically overexpressed in leukemic cells (Fig. 4i, Extended Data Fig. 7c and Supplementary Data 2). Interestingly, these analyses readily identified several previously described leukemia stem-cell markers, including CD25, Tim3, CD123 and CD45RA33, supporting the validity of our approach. Quantifying the degree of interpatient heterogeneity of each marker while accounting for cell state revealed that many known leukemia stem-cell markers vary strongly in their expression between patients (Fig. 4i). Together, this workflow of projection to a well-annotated healthy reference in combination with cell-state-specific differential expression testing might become a standard in scRNA-seq analyses of hematological diseases. Our computational routines are available online at https://git.embl.de/triana/nrn.
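A simplified version of cell-state-aware differential analysis is sketched below: AML cells are compared only with healthy cells assigned to the same reference state, differences are averaged per patient, and the spread of patient-level effects serves as a crude interpatient heterogeneity score. The dictionary keys, the use of mean differences on log-scale values and the population standard deviation as heterogeneity measure are all assumptions; the routines in the repository linked above may implement this differently and include proper statistical testing.

```python
from collections import defaultdict
from statistics import mean, pstdev


def state_matched_effect(cells, marker):
    """Compare AML and healthy cells of the same reference state.

    cells: list of dicts with hypothetical keys 'state', 'condition'
    ('AML' or 'healthy'), 'patient' and one log-scale value per marker.
    Returns (mean effect across patients, heterogeneity, per-patient effects).
    """
    healthy = defaultdict(list)                   # state -> values
    aml = defaultdict(lambda: defaultdict(list))  # state -> patient -> values
    for c in cells:
        if c["condition"] == "healthy":
            healthy[c["state"]].append(c[marker])
        else:
            aml[c["state"]][c["patient"]].append(c[marker])

    per_patient = defaultdict(list)
    for state, by_patient in aml.items():
        if state not in healthy:
            continue  # no healthy counterpart: this state cannot be matched
        baseline = mean(healthy[state])
        for patient, values in by_patient.items():
            per_patient[patient].append(mean(values) - baseline)

    effects = {p: mean(diffs) for p, diffs in per_patient.items()}
    values = list(effects.values())
    return mean(values), pstdev(values), effects


# Toy data: patient P1 overexpresses the marker in HSC-like cells, P2 does not.
cells = (
    [{"state": "HSC", "condition": "healthy", "patient": "H1", "CD123": 1.0}] * 2
    + [{"state": "HSC", "condition": "AML", "patient": "P1", "CD123": 3.0}] * 2
    + [{"state": "HSC", "condition": "AML", "patient": "P2", "CD123": 1.0}] * 2
    + [{"state": "unmatched", "condition": "AML", "patient": "P1", "CD123": 9.0}]
)
overall, spread, effects = state_matched_effect(cells, "CD123")
print(overall, spread, effects)
```

Matching on state prevents a marker from appearing leukemia-specific merely because AML samples are enriched for an immature state that expresses it in healthy BM as well.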
Data-driven flow cytometry for immunology
Gating strategies for flow cytometry have evolved historically in a process of trial and error. In particular, the isolation of rare and poorly characterized cell subsets using flow cytometry remains challenging, whereas commonly used gating schemes are not necessarily optimal in purity (precision) and efficiency (recall). To tackle these problems, we explored different machine learning approaches for the data-driven definition of gating schemes. For all populations in our dataset, gating schemes defined by machine learning approaches provided higher precision (purity) when compared with classical gating schemes from the literature (Fig. 5a, Extended Data Fig. 8a–d and Supplementary Table 5). While different machine learning methods tested achieved similar purities, gates defined by the hypergate algorithm34 offered a higher recall (Fig. 5a and Extended Data Fig. 8a–d).
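Precision and recall of a gate can be computed directly from annotated single-cell data. The sketch below represents a gate as per-marker intervals and scores it against a target population; the helper names, marker thresholds and toy values are hypothetical and chosen only to mirror a CD4+CD28− gate. It evaluates a given gate and does not reproduce the hypergate optimization itself.

```python
import math


def in_gate(cell, gate):
    """True if every gated marker lies inside its inclusive (low, high) interval."""
    return all(low <= cell[m] <= high for m, (low, high) in gate.items())


def gate_precision_recall(cells, is_target, gate):
    selected = [in_gate(c, gate) for c in cells]
    true_pos = sum(s and t for s, t in zip(selected, is_target))
    n_selected = sum(selected)
    n_target = sum(is_target)
    precision = true_pos / n_selected if n_selected else 0.0  # purity
    recall = true_pos / n_target if n_target else 0.0         # efficiency
    return precision, recall


# Hypothetical gate: CD4 high and CD28 low (arbitrary units).
gate = {"CD4": (2.0, math.inf), "CD28": (-math.inf, 1.0)}
cells = [
    {"CD4": 3.0, "CD28": 0.5},
    {"CD4": 3.0, "CD28": 0.2},
    {"CD4": 3.0, "CD28": 2.5},
    {"CD4": 3.0, "CD28": 0.8},
    {"CD4": 0.5, "CD28": 0.1},
    {"CD4": 3.0, "CD28": 3.0},
]
is_target = [True, True, False, False, False, False]
print(gate_precision_recall(cells, is_target, gate))
```

In this toy example the gate captures both target cells (recall 1.0) but also one non-target cell (precision 2/3). A gate-search procedure such as hypergate trades these two quantities off against each other when choosing thresholds.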
To validate and demonstrate this approach, we focused on determining new gating strategies for rare and poorly characterized BM cell types, such as cytotoxic CD4+ T cells (Fig. 5b) and MSCs (Fig. 5g). Cytotoxic CD4+ T cells represent a rare T cell population characterized by the expression of cytotoxicity genes typically observed in their well-characterized CD8+ T cell counterparts35. While this cell type has been suggested to be involved in several physiological and pathophysiological processes, no coherent gating strategy for their prospective isolation exists36. Hypergate suggested that cytotoxic CD4+ T cells display an immunophenotype of CD4+CD28−, and differential expression analyses of surface markers revealed that cytotoxic CD4+ T cells express significantly lower levels of CD7, CD25, CD127 and CD197 when compared with other CD4+ T cell subsets (Fig. 5b–e). Flow cytometric analyses of CD4+CD28− T cells confirmed the expected immunophenotype in BM from healthy donors and patients with different hematological cancers, suggesting a robust and efficient prospective isolation of this rare cell type (Fig. 5d and Extended Data Fig. 8e). Finally, FACS-based sorting of CD4+CD28− T cells followed by gene expression analysis confirmed the expression of cytotoxicity genes in this population (Fig. 5f).
MSCs constitute a rare and heterogeneous group of cells in the BM37,38. While ex vivo expanded MSCs have been phenotyped extensively, primary human MSCs remain poorly characterized, in particular due to their extremely low frequency. In our dataset, we captured a small number of heterogeneous MSCs, with one subset (MSC-1) expressing high levels of the key BM-homing cytokine CXCL12 (Fig. 5g). Hypergate suggested CXCL12-expressing MSCs to be isolated most efficiently by expression of CD13 and absence of CD11a (Fig. 5h). Indeed, flow cytometric analyses of CD13+CD11a− MSCs validated the immunophenotype suggested by our Abseq data and confirmed known and new MSC surface markers identified by our approach (Fig. 5i,j and Extended Data Fig. 8f). Moreover, FACS-based isolation of CD13+CD11a− cells followed by transcriptomic analyses revealed a high enrichment of CXCL12 and other key MSC signature genes (Fig. 5k).
Together, these analyses demonstrate the utility of our approach for deriving gating schemes from data and mapping the surface marker expression of poorly characterized populations. In combination with our single-cell proteo-genomic reference map, the Abseq App allows users to define new data-driven gating schemes for any population of interest.
A data-defined gating scheme for human hematopoiesis
Gating schemes for complex biological systems, such as the HSPC compartment, are improving steadily. However, there is strong evidence from single-cell transcriptomics9,10,18,19, lineage tracing22,23 and single-cell functional experiments21 that even the most advanced gating schemes do not recapitulate the molecular and cellular heterogeneity observed by single-cell genomics approaches. This has contributed to several misconceptions in the understanding of the hematopoietic system, most notably incorrect assumptions on the purity of cell populations and inconsistent views on lineage commitment hierarchies11–14.
To generate flow cytometric gating schemes that most adequately reflect the transcriptomic states associated with HSC differentiation, we used the Abseq dataset of CD34+ cells from one BM sample (‘Young1’) to train a decision tree. Thereby, we obtained a gating scheme that uses 12 surface markers to define 14 leaves representing molecularly defined cell states with high precision (Fig. 6a–c). The data-derived scheme excelled in the identification of lineage-committed progenitors, a principal shortcoming of many current gating strategies (Fig. 6a–c)9,10,21,22. Importantly, cell populations defined by the data-defined gating scheme were transcriptionally more homogeneous than those defined by a widely used gating scheme17 (Fig. 6d,e), a state-of-the-art gating scheme focusing on lymphomyeloid differentiation25 (Fig. 6e and Extended Data Fig. 9a–d) and a ‘consensus gating’ scheme generated in silico to combine the latter with a scheme focusing on erythroid-myeloid differentiation26 (Fig. 6e and Extended Data Fig. 9b). Of note, individual populations from the data-defined scheme displayed a functional output comparable with that of populations of the ‘consensus gating’ scheme, while the data-defined scheme overall provided more information on functional lineage commitment (Extended Data Fig. 9e,f).
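The decision-tree idea can be illustrated with a minimal CART-style sketch: at each node, the marker threshold that most reduces the Gini impurity of the molecular labels is chosen, and every leaf becomes a gate defined by the chain of thresholds leading to it. This is a simplified, hypothetical implementation (no pruning, no minimum leaf size, toy marker values and labels), not the software used to derive the 12-marker scheme.

```python
from collections import Counter


def gini(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(cells, labels, markers):
    """Return (weighted Gini, marker, threshold) of the best axis-aligned split."""
    best = None
    n = len(labels)
    for m in markers:
        values = sorted({c[m] for c in cells})
        for low, high in zip(values, values[1:]):
            thr = (low + high) / 2  # midpoint between adjacent observed values
            left = [lab for c, lab in zip(cells, labels) if c[m] <= thr]
            right = [lab for c, lab in zip(cells, labels) if c[m] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, m, thr)
    return best  # None if no marker has two distinct values


def build_gates(cells, labels, markers, max_depth, path=()):
    """Grow a tree and return one (gate conditions, majority label) per leaf."""
    majority = Counter(labels).most_common(1)[0][0]
    split = None
    if max_depth > 0 and gini(labels) > 0:
        split = best_split(cells, labels, markers)
    if split is None:
        return [(path, majority)]
    _, m, thr = split
    gates = []
    for keep, cond in ((lambda v: v <= thr, f"{m} <= {thr:.2f}"),
                       (lambda v: v > thr, f"{m} > {thr:.2f}")):
        idx = [i for i, c in enumerate(cells) if keep(c[m])]
        gates += build_gates([cells[i] for i in idx], [labels[i] for i in idx],
                             markers, max_depth - 1, path + (cond,))
    return gates


cells = [
    {"CD38": 0.5, "CD10": 0.3}, {"CD38": 0.7, "CD10": 0.1}, {"CD38": 0.6, "CD10": 0.5},
    {"CD38": 3.0, "CD10": 2.5}, {"CD38": 2.8, "CD10": 2.9},
    {"CD38": 3.1, "CD10": 0.2}, {"CD38": 2.9, "CD10": 0.4},
]
labels = ["HSC", "HSC", "HSC", "B_prog", "B_prog", "My_prog", "My_prog"]
for conditions, label in build_gates(cells, labels, ["CD38", "CD10"], max_depth=3):
    print(" AND ".join(conditions), "->", label)
```

Because each split uses a single marker and threshold, every leaf translates directly into a sequence of one-dimensional gates that can be set on a conventional cytometer.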
To validate this new gating scheme, we implemented the suggested surface marker panel in a classical flow cytometry setup and performed Smart-seq2-based single-cell RNA-seq while simultaneously recording surface marker expression (index scRNA-seq) (Fig. 6f,g and Supplementary Note 8). This approach demonstrated that the new gating strategy efficiently separated molecularly defined cell states (Fig. 6g). Quantitatively, the data-defined gating scheme performed equally well at resolving molecularly defined cell states on the Abseq training data as on the Smart-seq2 validation data, and significantly outperformed the expert-defined gating scheme (Fig. 6h). A limitation of the Smart-seq2 analysis is its low cellular throughput, which means that the signature-based identification of cell states might result in the ‘over-identification’ of certain states. Together, our results demonstrate that high-content single-cell proteo-genomic maps can be used to derive data-defined cytometry panels that describe the molecular states of complex biological systems with high accuracy. Moreover, our gating scheme permits a faithful identification and prospective isolation of transcriptomically defined progenitor states in the human hematopoietic hierarchy using cost-effective flow cytometry.
Mapping flow cytometry data on single-cell reference maps
While classical FACS gating strategies are of great use for the prospective isolation and characterization of populations, single-cell genomics studies revealed that differentiation processes, including the first steps of hematopoiesis, are represented most accurately by a continuous process9,18,20,27,39. To complement the approach based on discrete gates, we propose here that high-dimensional flow cytometry data can be used to place single cells into the continuous space of hematopoietic differentiation spanned by single-cell proteo-genomics by exploiting shared surface markers (Fig. 7a). Based on the observation that surface marker expression levels in flow cytometry and Abseq follow similar distributions (Extended Data Fig. 10a), we developed a new projection algorithm termed nearest rank neighbors (NRN; https://git.embl.de/triana/nrn/; see Methods). Given an identical starting population, NRN employs sample ranks to transform surface marker expression of FACS and Abseq data to the same scale, followed by k-nearest neighbors-based projection into a space defined by the proteo-genomic single-cell data. We tested NRN on FACS-indexed Smart-seq2 datasets using the classification panel developed in Fig. 6 (12 markers) and a semiautomated panel based on our Abseq data to better resolve erythromyeloid lineages (11 markers; Supplementary Note 8). We evaluated the performance of NRN using a variety of methods. First, cell types molecularly defined by Smart-seq2 were placed correctly on the Abseq uniform manifold approximation and projection (UMAP) (Fig. 7b). For most molecularly defined cell types, the accuracy of the projection using the flow cytometry data was close to the performance of data integration using whole transcriptome data with a state-of-the-art algorithm (Extended Data Fig. 10b–d).
Most importantly, the projections closely reflected the gradual progression of cells through pseudotime, as confirmed by the expression dynamics of key lineage genes from our FACS-indexed Smart-seq2 data (Fig. 7c). This suggests that NRN, in combination with high-quality reference datasets, can be used to study the continuous nature of cellular differentiation processes by flow cytometry.
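The two steps attributed to NRN above, rank-based rescaling followed by k-nearest-neighbor projection, can be sketched in plain Python. This is a simplified, hypothetical re-implementation for illustration only (the function names, the [0, 1] rank scaling, squared Euclidean distance, coordinate averaging and majority voting are assumptions); the reference implementation is in the linked repository. Rank transformation is only meaningful if both datasets sample the same starting population, because ranks encode each cell's position within its own sample.

```python
from collections import Counter


def rank_scale(matrix):
    """Replace each marker column by within-sample ranks scaled to [0, 1].

    Ties share their mean rank. Assumes at least two cells.
    """
    n, n_markers = len(matrix), len(matrix[0])
    scaled = [[0.0] * n_markers for _ in range(n)]
    for j in range(n_markers):
        order = sorted(range(n), key=lambda i: matrix[i][j])
        i = 0
        while i < n:
            k = i
            while k + 1 < n and matrix[order[k + 1]][j] == matrix[order[i]][j]:
                k += 1
            for t in range(i, k + 1):
                scaled[order[t]][j] = (i + k) / 2 / (n - 1)
            i = k + 1
    return scaled


def project(query, reference, ref_coords, ref_labels, k=1):
    """Place query cells into the reference embedding via nearest rank neighbors."""
    q, r = rank_scale(query), rank_scale(reference)
    placed = []
    for cell in q:
        dist = [sum((a - b) ** 2 for a, b in zip(cell, ref)) for ref in r]
        nn = sorted(range(len(r)), key=dist.__getitem__)[:k]
        x = sum(ref_coords[i][0] for i in nn) / k  # mean embedding position
        y = sum(ref_coords[i][1] for i in nn) / k
        label = Counter(ref_labels[i] for i in nn).most_common(1)[0][0]
        placed.append(((x, y), label))
    return placed


# Reference: antibody counts (CD34, CD38); query: fluorescence intensities of
# the same population on a different but monotonically related scale.
reference = [[100, 5], [90, 8], [80, 50], [70, 60], [20, 200], [10, 220]]
query = [[1e4, 30], [9e3, 40], [8e3, 1e3], [7e3, 1.2e3], [2e3, 5e4], [1e3, 6e4]]
ref_coords = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (1.2, 0.9), (3.0, 2.0), (3.1, 2.2)]
ref_labels = ["HSC", "HSC", "MPP", "MPP", "Ery", "Ery"]
for coords, label in project(query, reference, ref_coords, ref_labels, k=1):
    print(coords, label)
```

Raw distances between these two datasets would be dominated by the difference in measurement scale; after rank scaling, each query cell lands on its counterpart. With k > 1, averaging neighbor coordinates places cells between reference states, which is what allows continuous trajectories to be read out from cytometry data.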
A key limitation of single-cell genomics remains the lack of insight into the functional differentiation capacities of cells. We therefore evaluated whether NRN can be used to interpret functional single-cell data in the context of single-cell genomic reference maps. For this purpose, we performed single-cell culture assays while recording surface markers of our data-defined gating scheme from Fig. 6, followed by data integration with our Abseq data via NRN. As expected, cells with the highest proliferative capacity and lineage potency were placed in the phenotypic HSC and MPP compartments, and HSPCs placed along the transcriptomically defined differentiation trajectories showed a continuously increasing relative generation of cells of the respective lineage (Fig. 7d). Functionally unipotent progenitor cells were observed along the respective transcriptomic trajectories, but were also present in the phenotypic HSC/MPP compartment (Fig. 7d,g), in line with previous findings on early lineage commitment of HSPCs9,10,21. By contrast, oligopotent cells with distinct combinations of cell fates were enriched specifically in the HSC/MPP compartment (Fig. 7d,g). Some of these fate combinations, in particular combinations of erythroid, megakaryocytic and eosinophilic/basophilic fates, and combinations of lymphoid, neutrophilic, monocytic and dendritic fates, co-occurred more frequently than expected by chance (Fig. 7e,f), in line with recent findings on routes of lineage segregation9,18,40,41. Despite strong associations between surface phenotype, transcriptome and function, cells with a highly similar phenotype can give rise to different combinations of lineages (Fig. 7g). This observation suggests a role of stochasticity in the process of lineage commitment, or hints at layers of cell fate regulation not observed in the transcriptome.
Together, our observations confirm that hematopoietic lineage commitment occurs predominantly continuously along the routes predicted by the transcriptome, with an early primary erythromyeloid versus lymphomyeloid split9,10,18,21,40,41, and might help reconcile discrepancies in the interpretation of previous studies.
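Whether two fates co-occur in single-cell-derived colonies more often than expected can be summarized by comparing the observed frequency of each lineage pair with the frequency expected if lineages were produced independently. The sketch below computes this observed-to-expected ratio on hypothetical colony outcomes. It assumes independence as the null model, reports no significance (a permutation or similar test would be needed for that) and is not necessarily the statistic used for Fig. 7e,f; the function name and lineage labels are illustrative.

```python
from itertools import combinations


def cooccurrence_ratio(colonies, lineages):
    """Observed / expected frequency of each lineage pair across colonies.

    Expected frequency assumes lineages occur independently:
    P(a and b) = P(a) * P(b). A ratio above 1 indicates co-occurrence
    more often than this null model predicts.
    """
    n = len(colonies)
    freq = {lin: sum(lin in c for c in colonies) / n for lin in lineages}
    ratios = {}
    for a, b in combinations(lineages, 2):
        observed = sum(a in c and b in c for c in colonies) / n
        expected = freq[a] * freq[b]
        ratios[(a, b)] = observed / expected if expected else float("nan")
    return ratios


# Hypothetical outcomes of six single-cell cultures (lineages detected per colony).
colonies = [
    {"Ery", "Meg"}, {"Ery", "Meg"}, {"Ery"},
    {"Mono", "Neu"}, {"Mono", "Neu"}, {"Mono"},
]
lineages = ["Ery", "Meg", "Mono", "Neu"]
for pair, ratio in cooccurrence_ratio(colonies, lineages).items():
    print(pair, round(ratio, 2))
```

In this toy data, erythroid-megakaryocytic and monocytic-neutrophilic pairs are twofold enriched while cross-branch pairs never co-occur, the kind of pattern that would point to an early split between the two branches.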
In summary, our data resource, alongside the NRN algorithm, enables accurate integration of flow data with single-cell genomics data. This permits the charting of continuous processes by flow cytometry and the mapping of single-cell functional data into the single-cell genomics space.
Discussion
In this study, we have demonstrated the power of single-cell proteo-genomic reference maps for the design and analysis of cytometry experiments. We have introduced a map of human blood and BM spanning the expression of 97–197 surface markers across 45 cell types and stages of HSC differentiation, healthy aging and leukemia. Our dataset is carefully annotated and will serve as a key resource for hematology and immunology.
While cytometry experiments remain the workhorse of immunology, stem-cell biology and hematology, recent single-cell atlas projects have revealed that current cytometry setups do not accurately reflect the full complexity of biological systems10,42. For the first time, we have exploited single-cell proteo-genomic data to systematically design and interpret flow cytometry experiments that mirror most accurately the cellular heterogeneity observed by single-cell transcriptomics. Unlike approaches based on index sorting9,10,43,44, single-cell proteo-genomics has a sufficient throughput to enable the profiling of entire tissues or organs, and at the same time covers up to several hundred surface markers. Unlike single-cell RNA-seq data, antibody tag counts reflect the true distribution of surface marker expression, enabling a quantitative integration of cell atlas data with FACS. Building on these unique properties of our reference map, we have automated the design of gating schemes for the isolation of rare cell types, devised a gating strategy that reflects the molecular routes of HSC differentiation and demonstrated the direct interpretation of flow cytometry data in the context of our reference.
These advances enable a functional characterization of molecularly defined cell states and thereby directly affect HSC research. There is a growing consensus in the field that lineage commitment occurs early from primed HSCs, that not all progenitor cells in the classical megakaryocyte-erythrocyte progenitor/granulocyte-macrophage progenitor (MEP/GMP) gates are functionally oligopotent and that the main branches of the hematopoietic system are a GATA2-positive branch of erythroid, megakaryocytic and eosinophil/basophil/mast cell progenitors, as well as a GATA2-negative branch of lymphomyeloid progenitors, including the progenitors of monocytes, neutrophils and dendritic cells9,18,19,27,40,41,45. For lack of better alternatives, many functional studies still use the classical gating scheme alongside the outdated concept of ‘common myeloid progenitors’15,16,28. Here, we introduce and validate a flow cytometry scheme that allows the prospective isolation of molecularly homogeneous progenitor populations. We have used this scheme to show that transcriptional lineage priming influences cellular fate in vitro9,21, thereby contributing further evidence for the revised model of hematopoiesis. In the future, a wider use of this scheme has the potential to avoid conflicting results stemming from imprecisely defined populations.
Furthermore, these advances enable the rapid profiling of blood formation and other BM phenotypes at a resolution comparable with that of single-cell genomics. Recently, BM phenotypes of disease, ranging from sickle cell disease46 to leukemia47, have been investigated using scRNA-seq. However, due to economic and experimental hurdles, the throughput of these studies has remained restricted to at most tens of patients. Accordingly, the ability to associate patient genotypes with phenotypes is highly limited, and these assays have not been translated into diagnostic routines. Our new gating schemes and analytical strategies are widely applicable to profile aberrations encountered in disease, both in research and, ultimately, in clinical diagnostics.
Although we have demonstrated the implementation of data-driven design and analysis strategies for cytometry assays in the context of BM, conceptually the approach presented here can be applied to any organ of interest. Thereby, it has the potential to enable the precise isolation and routine profiling of myriad cell types discovered by recent single-cell atlas projects.
Methods
All reagents and antibodies used are listed in Supplementary Tables 1 (primers for targeted transcriptomics), 2 (Abseq antibodies) and 6 (all other reagents, oligonucleotides, equipment and software).
Human samples
BM samples from healthy and diseased donors were obtained at the University clinics in Heidelberg and Mannheim after informed written consent under ethics application numbers S480/2011 and S-693/2018. For demographic characteristics of the sample donors, see Supplementary Table 3. BM aspirates were collected from the iliac crest. Healthy BM donors received financial compensation in some cases. Mononuclear cells were isolated from BM by Ficoll (GE Healthcare) density gradient centrifugation and stored in liquid nitrogen until further use. All experiments involving human samples were approved by the ethics committee of the University Hospital Heidelberg and were in accordance with the Declaration of Helsinki.
Cell sorting for Abseq
Human BM samples were thawed in a water bath at 37 °C and transferred dropwise into RPMI-1640 10% FCS. Cells were centrifuged for 5 min at 350g and washed once with RPMI-1640 10% FCS. Cells were resuspended in FACS buffer (FB) (PBS 5% FCS 0.5 mM EDTA) containing CD34-PE, CD3-PE-Cy7 and FcR blocking reagent (Miltenyi) and incubated for 15 min at 4 °C. Cells were washed with FB and resuspended in 1 ml FB, followed by addition of 1 µl CellEvent Caspase-3/7 Green (ThermoFisher) and 1 µl 4′,6-diamidino-2-phenylindole (DAPI) (ThermoFisher) to the cell suspension. After 3 min incubation at room temperature, cells were filtered through a 40 µm cell strainer. Singlet, CaspaseGreen− DAPI− total BM cells, singlet, CaspaseGreen− DAPI− CD34+ cells (HSPCs) and singlet, CaspaseGreen− DAPI− CD3+ cells (T cells) were sorted on an Aria Fusion II cell sorter (BD). In general, the entire CD34+ fraction from one thawed vial was sorted (~2 × 10⁴ cells) and combined with 1 × 10⁵ CD34− total BM cells (see also Extended Data Fig. 2). In CD3+ T cell-enriched AML samples, 2 × 10⁴ CD3+ T cells were mixed with the CD34+ HSPC fraction and combined with 1 × 10⁵ CD34− total BM cells. For the generation of the AML query datasets, 2 × 10⁴ live total BM cells were sorted from each of 12 different AML samples. For the CD34+ immature HSPC enrichment experiment, healthy adult human BM cells were stained with anti-human CD34, CD38, CD45RA and CD10 antibodies and fixable viability dye efluor506, and 5 × 10³ cells were sorted from each of four different gates (CD34+CD38+CD45RA−, CD34+CD38+CD45RA+, CD34+CD38−CD45RA−, CD34+CD38−CD45RA+). In cases where different biological samples or sorted populations were combined in the same run, cells of interest were sorted and labeled with cell hashing antibodies before surface labeling and single-cell capture, as described in Abseq surface labeling, single-cell capture and library preparation.
Cell sorting for gene expression analysis and flow cytometry
Human BM samples were thawed as described above. For dead cell exclusion and blocking of nonspecific binding, fixable viability dye efluor506 (ThermoFisher) and FcR blocking reagent (Miltenyi) were used in all staining solutions. Cells were generally stained for 15 min at 4 °C, washed once with FB, resuspended in 1 ml FB and filtered through a 40 µm cell strainer. For cytotoxic CD4+ T cell sorting, cells were stained in FB containing anti-CD3, CD4, CD7, CD28, CD45RA, CD45 and CD127 surface antibodies. Singlet, live, CD45+CD3+ cells were gated, and CD4+CD28− or CD4+CD28+ cells were sorted and processed as described below. For MSC gene expression analysis, cells were stained in FB containing anti-CD10, CD11a, CD13, CD26, CD31, CD45, CD49a, CD90, CD105, CD146 and CD271 surface antibodies. Singlet, live, CD11a−CD13+ MSCs or all cells outside this gate were sorted. Cells were sorted on either a FACSAria Fusion or a FACSAria II equipped with 100 µm nozzles.
For flow cytometric analysis, human BM samples were processed as described above. For analysis of cytotoxic CD4+ T cells across hematopoietic malignancies, cells were stained with anti-CD3, CD4, CD7, CD25, CD28, CD45RA, CD45, CD69 and CD127 surface antibodies. For analysis of CD98 expression in hematopoietic stem and progenitor cells, cells were stained with anti-human CD4, CD10, CD11a, CD34, CD38, CD45RA, CD49f, CD90, CD98, CD133 and Tim3 antibodies. For analysis of CD326 surface expression in comparison with CD71 and CD41, healthy adult human BM was stained with anti-human CD34, CD38, CD41, CD44, CD45RA, CD49b, CD49d, CD71, CD90 and CD123 antibodies. All samples were measured on BD FACSFortessa flow cytometers equipped with five lasers.
Panel design for targeted transcriptomics
Panel design is described in Supplementary Note 1. In short, we used a human cell atlas reference and followed the method described by Schraivogel et al. for target gene selection49.
Abseq surface labeling, single-cell capture and library preparation
Abseq surface antibody libraries (Supplementary Table 2) were pipetted 24 h before the experiments. For most antibodies, 1 µl was used for surface library preparation. Antibodies recognizing epitopes with well-known high surface expression (for example, HLA-ABC, CD45 and CD11a) were further diluted in PBS before 1 µl was added to the surface library. Sorted cells (around 1.2 × 10⁵–1.4 × 10⁵; described in Cell sorting for Abseq) were centrifuged for 5 min at 350g and resuspended in the surface library mix (around 100 µl for the 97 Ab panel and 200 µl for the 197 Ab panel). In cases where different biological samples or sorted populations were combined in the same run, sorted cells were labeled individually with oligonucleotide-coupled cell hashing antibodies (BD single-cell multiplexing kit) for 25 min on ice, washed three times (each wash followed by 5 min centrifugation at 350g), pooled and then subjected to Abseq cell surface labeling. Cells were then labeled for 30 min at 4 °C and washed three times, each wash followed by 5 min centrifugation at 350g. Cells were resuspended in sample buffer (BD Rhapsody Cartridge reagent kit), and between 1 × 10⁴ and 2 × 10⁴ cells were captured with the BD Rhapsody single-cell system following the manufacturer’s instructions50. Antibody tag libraries, multiplexing libraries and targeted mRNA gene expression libraries were generated following the manufacturer’s instructions. For mRNA libraries, either the targeted panel (Supplementary Table 1) or the whole transcriptome analysis library preparation protocol was used according to the manufacturer’s instructions (BD). Resulting libraries were quality checked by Qubit and Bioanalyzer, pooled and sequenced on a NextSeq500 or an Illumina NovaSeq S2 (Illumina; high-output mode).
Single-cell index cell cultures
Two days before index sorting, irradiated MS-5 feeder cells were plated at a density of 1 × 10⁴ cells per well into 96-well flat-bottom cell culture plates in α-minimal essential medium with ribonucleosides and deoxyribonucleosides (ThermoFisher) containing 10% FCS (Gibco), glutamine (2 mM) (ThermoFisher), penicillin/streptomycin (100 U ml−1) (ThermoFisher) and sodium pyruvate (2 mM) (Gibco). Several hours before index sorting, the medium was replaced by 100 µl H5100 medium (StemCell Technologies) containing glutamine (2 mM) (ThermoFisher), penicillin/streptomycin (100 U ml−1) (ThermoFisher), hydrocortisone (1 nM) (StemCell Technologies), SCF (20 ng ml−1), FLT3-L (100 ng ml−1), TPO (50 ng ml−1), IL-3 (20 ng ml−1), IL-5 (20 ng ml−1), IL-6 (20 ng ml−1), IL-7 (20 ng ml−1), IL-11 (20 ng ml−1), G-CSF (20 ng ml−1), GM-CSF (20 ng ml−1), M-CSF (20 ng ml−1) (all PeproTech) and EPO (3 U ml−1) (R&D Systems). Two BM samples from the same donor were thawed and washed as described above. The first sample was resuspended in 100 µl FB containing anti-human CD4, CD10 (BioLegend), CD11a, CD11c, CD19, CD33, CD34, CD38, CD61, CD123, CD133 and Tim3 antibodies (Classification panel), whereas the second sample was stained with anti-human CD11a, CD33, CD34 (BioLegend), CD38, CD49b, CD61, CD71, CD123, CD133, CD326 and FcεR1A (eBioscience) antibodies (Semiautomated panel). In another experiment, cells were labeled with anti-human CD11a, CD71, CD45RA, CD44, CD135, Tim3 (BioLegend), CD90, CD326, CD41 (BioLegend), CD123 (ThermoFisher), CD10, CD38 and CD34 (BioLegend) antibodies (Consensus panel). Unless otherwise indicated, all antibody clones for flow cytometry matched the clones used in the Abseq experiments and were purchased from BD. For dead cell exclusion and blocking of nonspecific binding, fixable viability dye efluor506 (ThermoFisher) and FcR blocking reagent (Miltenyi) were included in both staining solutions.
After staining for 15 min at 4 °C, cells were washed with FB, resuspended in 1 ml FB and filtered through a 40 µm cell strainer. For both assays, 480 single, live CD34+ cells were FACS indexed and sorted into the feeder cell-containing 96-well plates described above. Cells were incubated at 37 °C, 5% CO2 for 16–19 days. To analyze clonal output, cells were harvested, transferred to 96-well V-bottom plates, washed with FB and resuspended in 10 µl FB containing anti-human CD1c (BioLegend), CD14, CD19 (BioLegend), CD34 (BioLegend), CD41a (BioLegend), CD45, CD56, CD66b, CD123, CD235a, CD303, CD141, CD370 (BioLegend) and FcεR1a (eBioscience) antibodies. For dead cell exclusion and blocking of nonspecific binding, fixable viability dye efluor506 (ThermoFisher) and FcR blocking reagent (Miltenyi) were included in the staining solution. After staining for 15 min at 4 °C, cells were washed with FB, resuspended in 100 µl FB and filtered through a 40 µm cell strainer. Cells were analyzed on an LSRII (BD) flow cytometer. Erythroid lineage output was determined via CD235 expression, which was concomitant with the downregulation of CD45 expression (CD45−CD235+). Myeloid lineages were defined via CD66b and CD14 expression (CD235−CD45+CD66b+ or CD235−CD45+CD14+). Dendritic cell lineages were defined via CD1c, CD141, CD370, CD303 and CD123 expression. Lymphoid cell lineages were defined via CD19 and CD56 expression. Megakaryocyte output was determined via CD41a expression, and eosinophil/basophil output via FcεR1a expression. Unless stated otherwise, only wells that contained more than ten CD45+CD235−, CD45+CD235+ or CD45−CD235+ cells were considered in the analysis. For calculation of erythroid ratios, the number of all generated erythroid cells was divided by the sum of all other generated cells. Myeloid ratios were determined by dividing the sum of generated myeloid and dendritic cells by the sum of all other generated cells.
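The well filter and the two ratio definitions above can be sketched in Python. This is a minimal illustration, not the authors' analysis code: the per-well count dictionaries and their lineage keys (`erythroid`, `myeloid`, `dendritic` and so on) are hypothetical names, and the gate counts are assumed to have been exported from the flow cytometry analysis beforehand.

```python
from typing import Dict, Iterable, Optional

# Hypothetical per-well data: number of generated cells per lineage or per gate.
Counts = Dict[str, int]


def passes_filter(gate_counts: Counts, min_cells: int = 10) -> bool:
    """Keep a well if any of the CD45/CD235 gates contains more than `min_cells` cells."""
    return any(n > min_cells for n in gate_counts.values())


def lineage_ratio(counts: Counts, numerator: Iterable[str]) -> Optional[float]:
    """Cells in the `numerator` lineages divided by all other generated cells."""
    keys = set(numerator)
    num = sum(n for lineage, n in counts.items() if lineage in keys)
    rest = sum(n for lineage, n in counts.items() if lineage not in keys)
    if rest == 0:
        return None  # the ratio is undefined if no other cells were generated
    return num / rest


def erythroid_ratio(counts: Counts) -> Optional[float]:
    return lineage_ratio(counts, ["erythroid"])


def myeloid_ratio(counts: Counts) -> Optional[float]:
    # Myeloid and dendritic cells are pooled in the numerator.
    return lineage_ratio(counts, ["myeloid", "dendritic"])


well = {"erythroid": 30, "myeloid": 50, "dendritic": 10, "lymphoid": 10}
print(erythroid_ratio(well))  # 30 / 70, roughly 0.43
print(myeloid_ratio(well))    # 60 / 40 = 1.5
```

Returning `None` rather than raising keeps wells without any non-numerator output visible, so they can be excluded explicitly downstream.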
Single-cell index RNA-sequencing
For single-cell index RNA-sequencing, cells from the same samples that were prepared for single-cell index cultures were used. Hard-Shell 96-well polymerase chain reaction (PCR) plates (Bio-Rad) were prefilled with 4 µl lysis buffer containing 1 µl RNase inhibitor (40 U ml−1, Takara), 1.9 µl Triton X-100 (0.2%, Sigma), 1 µl oligo dT30VN (10 µM, Sigma) and dNTPs (10 mM, ThermoFisher). Cells were FACS indexed, sorted into lysis buffer and snap frozen on dry ice. For cell lysis, plates were incubated for 5 min at 10 °C, followed by incubation for 3 min at 72 °C in a thermocycler (PCRMax). For reverse transcription, 0.25 µl RNase inhibitor (40 U ml−1, Takara), 0.5 µl DTT (20 mM, Takara), 0.2 µl template switching oligonucleotides (50 µM, IDT), 1.05 µl H2O (Ambion), 2 µl SMARTScribe buffer (5×, Takara) and 1 µl SMARTScribe (100 U ml−1, Takara) were added to each well. Reverse transcription was performed by incubating plates for 90 min at 42 °C, followed by ten cycles of 2 min at 50 °C and 2 min at 42 °C, then 10 min at 72 °C and storage at 4 °C. To amplify cDNA, 12.5 µl KAPA HiFi HotStart (Roche), 0.25 µl ISPCR primer (10 µM, Sigma) and 2.25 µl H2O were added to each well. Plates were incubated for 3 min at 98 °C, followed by 23 cycles of 20 s at 98 °C, 15 s at 67 °C and 6 min at 72 °C, a final step of 5 min at 72 °C and storage at 4 °C. cDNA was then cleaned up using an equal volume (25 µl) of SPRIselect beads (Beckman) and tagmented using homemade Tn5 (ref. 51). Resulting libraries were quality checked by Qubit and Bioanalyzer, pooled and sequenced using all lanes of an Illumina HiSeq 4000.
Real-time-quantitative PCR
For real-time quantitative PCR (RT-qPCR) analysis, cells of interest were sorted directly into RNA lysis buffer (Arcturus PicoPure RNA Isolation Kit, Life Technologies, Invitrogen), snap frozen and stored at −80 °C, or processed directly for cDNA synthesis using the SuperScript VILO cDNA synthesis kit (Invitrogen) according to the manufacturer’s instructions. Depending on the sorted cell number, cDNA was further diluted 1:5–1:10 in RNase-free water, and 6 µl was mixed in technical triplicates in 384-well plates with 0.5 µl of forward and reverse primer (10 µM) and 7 µl PowerUp SYBR Green Master Mix (Thermo Fisher). Cycling program: 50 °C for 2 min, 95 °C for 10 min and 40 cycles of 95 °C for 15 s and 60 °C for 1 min. Primers were designed to be intron-spanning whenever possible using Primer-BLAST (National Center for Biotechnology Information) and purchased from Sigma Aldrich (purification: desalting). Experiments were performed on the ViiA7 System (Applied Biosystems), and gene amplification curves were analyzed using the QuantStudio Real-Time PCR Software v.1.3 (Applied Biosystems). RNA expression was normalized to the housekeeping genes glyceraldehyde-3-phosphate dehydrogenase and beta actin. Relative expression levels (2^(−ΔCt), with ΔCt = (gene of interest Ct) − (geometric mean of housekeeper Ct)) of replicates were log10 transformed and z-scored. Primers used in this study can be found in Supplementary Table 6.
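The relative expression calculation can be sketched as follows. The sketch assumes the conventional orientation ΔCt = Ct(gene of interest) − Ct(housekeeper reference), so that higher values indicate higher expression. It uses the geometric mean of the housekeeper Ct values as the reference and z-scores across samples with the sample standard deviation. Averaging of technical triplicates is left out, and the function names are illustrative.

```python
import math
import statistics
from typing import List, Sequence


def relative_expression(gene_ct: float, housekeeper_cts: Sequence[float]) -> float:
    """2^-dCt with dCt = Ct(gene) - geometric mean of housekeeper Cts (e.g. GAPDH, ACTB)."""
    reference = statistics.geometric_mean(housekeeper_cts)
    delta_ct = gene_ct - reference
    return 2.0 ** (-delta_ct)


def log10_zscore(values: Sequence[float]) -> List[float]:
    """log10-transform relative expression values and z-score them across samples."""
    logs = [math.log10(v) for v in values]
    mean = statistics.mean(logs)
    sd = statistics.stdev(logs)  # sample standard deviation; needs at least 2 values
    return [(x - mean) / sd for x in logs]


# A gene detected two cycles after the housekeeper reference is about 4-fold lower.
print(relative_expression(22.0, [20.0, 20.0]))  # approximately 0.25
print(log10_zscore([1.0, 10.0, 100.0]))         # [-1.0, 0.0, 1.0]
```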
Analysis of Abseq data
Fastq files were processed via the standard Rhapsody analysis pipeline (BD Biosciences) on Seven Bridges (https://www.sevenbridges.com) according to the manufacturer’s recommendations. The resulting unique molecular identifier (UMI) count matrices were imported into R (v.3.6.2) and processed with the R package Seurat (v.3.1.3 and 3.2.0)52. To account for differences in sequencing depth across cells, the RNA and antibody layers were normalized independently using Seurat defaults: RNA UMI counts were log-normalized, whereas antibody UMI counts were normalized with the centered log ratio (CLR) transformation to account for background signal from nonspecific binding. Both normalized matrices were then concatenated, and integration across patients was performed using Scanorama53. The resulting corrected counts were used for visualization and clustering analysis, whereas nonintegrated raw counts were used for differential expression testing.
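The two per-modality normalizations can be written out for a single vector of counts as follows. This is a minimal sketch, assuming Seurat's default scale factor of 10,000 for log-normalization and the centered log ratio form log1p(x / g) with g = exp(Σ log1p(x > 0) / n), which, to our understanding, is what Seurat applies for its CLR method. Whether CLR is computed per marker across cells or per cell across markers depends on a margin setting that the text does not state.

```python
import math
from typing import List, Sequence


def log_normalize(cell_counts: Sequence[float], scale_factor: float = 1e4) -> List[float]:
    """RNA: scale one cell's counts to `scale_factor` total counts, then apply log1p."""
    total = sum(cell_counts)
    if total == 0:
        return [0.0 for _ in cell_counts]
    return [math.log1p(c / total * scale_factor) for c in cell_counts]


def clr(values: Sequence[float]) -> List[float]:
    """Antibody tags: centered log ratio log1p(x / g), where g is built from nonzero values."""
    n = len(values)
    g = math.exp(sum(math.log1p(v) for v in values if v > 0) / n)
    return [math.log1p(v / g) for v in values]


print(log_normalize([0, 5, 5]))  # [0.0, log1p(5000), log1p(5000)]
print(clr([0, 3, 30, 300]))      # 0.0 for the zero count, then increasing
```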
Multiomics factor analysis integration, clustering and identification of cell type markers
Following integration, we removed genes and surface markers with near-zero variance using the caret package54 and used multiomics factor analysis (MOFA) to integrate data across modalities55. A total of 30 MOFA factors were used as a starting point, with a drop factor threshold of 0.001. The resulting MOFA dimensions were used to construct a shared nearest neighbor graph, and modularity-based clustering was performed using the Louvain algorithm. Finally, a UMAP visualization was calculated using 30 neighboring points for the local approximation of the manifold structure. Marker genes and surface markers for every cell type were identified by comparing the expression of each feature in a given cluster against the rest of the cells using the receiver operating characteristic test. To evaluate which genes classify a cell type, cell-type-specific markers were selected as those with the highest classification power, defined by the area under the receiver operating characteristic curve.
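Ranking markers by area under the ROC curve can be illustrated with a small, dependency-free sketch. It computes the AUC of each feature for one cluster versus all remaining cells as the probability that a randomly chosen in-cluster cell has higher expression than a randomly chosen other cell, with ties counted as one half. This pairwise (Mann-Whitney) form equals the ROC area. The data layout and function names are assumptions, not the original implementation.

```python
from typing import Dict, List, Sequence, Tuple


def auroc(in_cluster: Sequence[float], rest: Sequence[float]) -> float:
    """ROC area via pairwise comparisons; O(n*m), adequate for small examples."""
    wins = 0.0
    for a in in_cluster:
        for b in rest:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(in_cluster) * len(rest))


def rank_markers(expr: Dict[str, Sequence[float]], labels: Sequence[str],
                 cluster: str, top_n: int = 5) -> List[Tuple[str, float]]:
    """Return the `top_n` features with the highest AUC for `cluster` versus all other cells."""
    inside = [i for i, lab in enumerate(labels) if lab == cluster]
    outside = [i for i, lab in enumerate(labels) if lab != cluster]
    scores = [
        (feature, auroc([values[i] for i in inside], [values[i] for i in outside]))
        for feature, values in expr.items()
    ]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_n]


expr = {"CD34": [5.0, 6.0, 0.0, 1.0], "CD3": [0.0, 0.0, 4.0, 5.0]}
labels = ["HSC", "HSC", "T", "T"]
print(rank_markers(expr, labels, "HSC"))  # [('CD34', 1.0), ('CD3', 0.0)]
```

For real datasets, a rank-based AUC computation would scale better than the pairwise loop shown here.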
Processing of Smart-seq2 data
Count matrices were generated by pseudoalignment with Kallisto56 against the GRCh38 human reference genome as implemented in the Scater package v.1.14.6 (ref. 57). Gene-level expression counts were imported into Seurat. Low-quality cells were removed on the basis of the percentage of mitochondrial RNA reads (>20%) and the number of detected genes (<1,000). The remaining data were further processed using Seurat: data were log-normalized and scaled, and the top 2,000 highly variable genes were used for clustering and UMAP calculation. Cells were then annotated as described in Supplementary Note 8.
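The cell-level quality filter can be sketched as follows. The sketch assumes per-cell gene counts keyed by gene symbol and assumes that mitochondrial genes carry the human `MT-` prefix. Both the data layout and the helper names are hypothetical.

```python
from typing import Dict, Tuple


def qc_metrics(counts: Dict[str, int]) -> Tuple[int, float]:
    """Return (number of detected genes, percentage of reads from MT- genes) for one cell."""
    total = sum(counts.values())
    n_genes = sum(1 for c in counts.values() if c > 0)
    mito = sum(c for gene, c in counts.items() if gene.startswith("MT-"))
    pct_mito = 100.0 * mito / total if total else 0.0
    return n_genes, pct_mito


def keep_cell(counts: Dict[str, int], max_pct_mito: float = 20.0, min_genes: int = 1000) -> bool:
    """Drop cells with >20% mitochondrial reads or <1,000 detected genes."""
    n_genes, pct_mito = qc_metrics(counts)
    return pct_mito <= max_pct_mito and n_genes >= min_genes


cell = {f"GENE{i}": 1 for i in range(1200)}
cell["MT-CO1"] = 100
print(keep_cell(cell))  # True: 1,201 genes detected, about 7.7% mitochondrial reads
```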
Abseq App web application
Differential expression, data visualization and gating scheme calculation can be performed in the Abseq App shiny web application (https://abseqapp.shiny.embl.de/). The application was written in R and relies on the packages shiny and aws.s3. A demonstration video of the app is included as Supplementary Video 1.
Pseudotime analysis
To reconstruct possible cell lineages from our single-cell gene expression data, data from individual samples were subset to include only the cell types from the CD34+ hematopoietic stem and progenitor compartment. The MOFA–UMAP embedding was then used as input for pseudotime analysis by slingshot58. The HSC cluster was used as the start cluster, and the myelocyte, class-switched memory B cell, late erythroid progenitor, megakaryocyte progenitor and conventional dendritic cell compartments were used as end clusters. Genes that changed significantly along pseudotime were determined by fitting a generalized additive model (GAM) for each gene using the tradeSeq package59.
Modeling variance in surface marker expression
To attribute the variance in surface marker expression to biological processes, we used the variancePartition package60 on log-transformed antibody read count data. As covariates, we used cell type annotation (for all cells except CD34+ HSPCs), splines with three degrees of freedom fitted through pseudotime (for CD34+ HSPCs; see Pseudotime analysis), cell cycle scores (calculated using Seurat package defaults), scores for cytotoxicity and stemness (calculated using the gene lists in Supplementary Table 7 and the Seurat function AddModuleScore()), as well as technical covariates (number of genes observed, number of surface markers observed, reads on surface markers and reads on genes). To also account for variance explained by hypothetical processes not included in this predefined list, we additionally performed a factor analysis of the entire dataset (RNA plus surface markers) while accounting for the known covariates using ZINB-WaVE61. We ran ZINB-WaVE with four unknown factors on the concatenated mRNA and surface marker expression matrices, using a gene-level covariate specifying whether each row in the matrix is an mRNA or a surface marker. The unknown factors explained only a very small part of the variance and appeared to capture mostly differentiation processes not optimally explained by the pseudotime. Of note, markers with low absolute expression are more strongly affected by stochastic expression or measurement noise, while markers that are expressed by many different cell types are more strongly affected by technical effects, such as differences in single-cell library quality, likely because these markers lack true biological variability (Extended Data Fig. 5a). The other covariates are not affected by the expression level of the markers.
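variancePartition fits linear mixed models with several covariates jointly. As a much simpler illustration of what "fraction of variance explained" means for a single categorical covariate such as cell type, the sketch below computes the between-group share of the total sum of squares (eta squared) for one marker. It is a fixed-effects simplification and not the variancePartition model. The function name and example values are illustrative.

```python
from collections import defaultdict
from statistics import fmean
from typing import Dict, Hashable, List, Sequence


def fraction_explained(values: Sequence[float], groups: Sequence[Hashable]) -> float:
    """Share of total variance in `values` explained by the grouping (between-group SS / total SS)."""
    grand_mean = fmean(values)
    by_group: Dict[Hashable, List[float]] = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_between = sum(len(vs) * (fmean(vs) - grand_mean) ** 2 for vs in by_group.values())
    return ss_between / ss_total if ss_total > 0 else 0.0


# Hypothetical normalized values of one marker in four cells of two cell types.
print(fraction_explained([1.0, 1.2, 3.0, 3.2], ["HSC", "HSC", "T", "T"]))  # close to 1
```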
Projection on a reference atlas
The projection on the reference dataset is described in Supplementary Note 7. In short, we used scmap to calculate nearest neighbors and thereby determined the cell type label, MOFA–UMAP coordinates and pseudotime value of each query cell.
Differential expression testing between experimental groups and estimation of interpatient variability
For comparing surface protein abundance between young and aged healthy individuals as well as leukemic individuals, antibody tag read counts were summed at the level of cell types for each experimental batch (that is, donor). Differential expression testing was then performed on these pseudobulks using DESeq2 (ref. 62), either separately for each cell type (Fig. 4i, Extended Data Fig. 7c and Supplementary Data 2) or jointly across all cells while accounting for cell type as a covariate (Fig. 4b). For cell-type-specific comparisons, only samples in which the respective cell type was covered by at least 20 cells were included. When comparing leukemic with healthy individuals, age and gender were used as additional covariates. Unlike single-cell-specific methods, DESeq2 estimates the variance in expression between experimental replicates to separate signal from noise, using a negative binomial distribution that is sufficiently generic to capture the count nature of antibody-based pseudobulk expression values.
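The pseudobulk aggregation step can be sketched as follows. Antibody tag counts are summed per donor and cell type, and groups covered by fewer than 20 cells are dropped, as done for the cell-type-specific comparisons. The input layout is hypothetical, and the resulting count tables would then be passed to DESeq2, which is not reproduced here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Cell = Tuple[str, str, Dict[str, int]]  # (donor, cell type, {marker: tag count})


def pseudobulk(cells: Iterable[Cell], min_cells: int = 20) -> Dict[Tuple[str, str], Dict[str, int]]:
    """Sum tag counts per (donor, cell type); keep groups covered by at least `min_cells` cells."""
    sums = defaultdict(lambda: defaultdict(int))
    n_cells = defaultdict(int)
    for donor, cell_type, counts in cells:
        key = (donor, cell_type)
        n_cells[key] += 1
        for marker, count in counts.items():
            sums[key][marker] += count
    return {key: dict(totals) for key, totals in sums.items() if n_cells[key] >= min_cells}


cells = [("young1", "HSC", {"CD34": 2, "CD38": 0})] * 20 + [("old1", "HSC", {"CD34": 1})] * 5
print(pseudobulk(cells))  # {('young1', 'HSC'): {'CD34': 40, 'CD38': 0}}
```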
To estimate the degree of interpatient variability of surface marker abundance while accounting for cell state differences, we trained random forest classifiers to predict the experimental batch (that is, donor) from gene expression separately for each cell state. The feature importance score from these classifiers was then scaled from zero to one and used to estimate interpatient variability.
Changes in cell type abundance between experimental groups
To identify cell types that change in abundance between young and aged individuals (Extended Data Fig. 7a), we considered the following: first, different amounts of CD34+, CD3+ and total BM cells were sorted. Hence, frequencies were always computed within the respective gate (for example, for CD8+ effector T cells, the frequency among CD3+ T cells was computed). We then compared the following statistical models of the observed cell type frequency p_i in individual i:
Here C(i) indicates if individual i is young or old.
Finally, we sought to distinguish between a model where cell type frequencies change as a function of age, and a model where cell type frequencies are simply highly variable between individuals, with no relationship to age:
We compared the M1 and M2 models to M0 using a Bayesian strategy termed leave-one-out information criterion63 to identify cell types with high evidence for between-group and interindividual variability, respectively.
Thresholding of surface marker expression
For every sample separately, thresholds were calculated using the normalized antibody counts to distinguish marker-positive from marker-negative cells. For this we implemented the Otsu algorithm as described by Otsu64.
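Otsu's method chooses the cut that maximizes the between-class variance of a histogram. A minimal histogram-based sketch for one marker in one sample is shown below. The bin count of 256 and the convention that values above the returned cut are called marker-positive are assumptions, not details taken from the original analysis.

```python
from typing import List, Sequence


def otsu_threshold(values: Sequence[float], n_bins: int = 256) -> float:
    """Return the cut (a bin edge) that maximizes the between-class variance of the histogram."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return lo  # no separation possible
    width = (hi - lo) / n_bins
    hist = [0] * n_bins
    for v in values:
        hist[min(int((v - lo) / width), n_bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(n_bins)]
    total = len(values)
    sum_all = sum(h * c for h, c in zip(hist, centers))

    w0, sum0 = 0, 0.0
    best_between, best_cut = -1.0, lo + width
    for i in range(n_bins - 1):  # class 0 = bins 0..i, class 1 = bins i+1..end
        w0 += hist[i]
        sum0 += hist[i] * centers[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2  # proportional to between-class variance
        if between > best_between:
            best_between, best_cut = between, lo + (i + 1) * width
    return best_cut


def binarize(values: Sequence[float], cut: float) -> List[bool]:
    """Marker-positive if the value lies above the cut."""
    return [v > cut for v in values]


vals = [0.1, 0.2, 0.15, 0.12, 5.0, 5.1, 4.9, 5.2]
cut = otsu_threshold(vals)
print(binarize(vals, cut))  # four negative cells followed by four positive cells
```

Because thresholds were calculated for every sample separately, this function would be called once per marker and sample.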
Data-driven identification of gating schemes
To account for the CD34+ FACS enrichment of HSPCs performed in our samples, we divided the BM cells into CD34+ and CD34− subsets. For the calculation of gating schemes for individual cell types, we compared three different methods. The first two methods are based on a decision tree using either the continuous normalized surface marker expression matrix (‘Tree continuous’) or a transformed Boolean matrix (‘Tree Otsu’). For the latter method, a cutoff for each antibody was calculated using the histogram-based Otsu algorithm as described above, and the matrix was binarized accordingly. In both cases, the tree was determined using the package Rpart and, if needed, pruned to the maximum number of required surface markers. The third method is based on the Hypergate algorithm34. For this, we used the target population as the Hypergate gate vector input, calculated the predicted gating scheme and calculated the channel contributions of each surface marker using a beta of 1. We then used these contributions to reduce the predicted gating scheme to the selected maximum number of surface markers. Furthermore, we gated the samples using a canonical gating scheme (Expert) reported in the literature for some of the cell types present in the BM (Supplementary Table 5). For this, we used the Otsu threshold to split each population into marker-positive and marker-negative populations. For each gate, the following metrics were calculated: first, the purity (Pr), that is, the proportion of target cells in the final gate, and second, the recall (Rc), that is, the proportion of target cells gated from their original total population.
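Once a gating scheme is expressed as required marker states on the Otsu-binarized matrix, purity and recall follow directly. The sketch below covers only this evaluation step, not the decision-tree or Hypergate fitting. It uses hypothetical marker names and treats a gate as an AND of required positive or negative states.

```python
from typing import Dict, List, Sequence, Tuple

Gate = Dict[str, bool]  # marker -> required state (True = positive, False = negative)


def apply_gate(cells: Sequence[Dict[str, bool]], gate: Gate) -> List[bool]:
    """A cell falls into the gate if all required marker states match."""
    return [all(cell[marker] == state for marker, state in gate.items()) for cell in cells]


def purity_recall(in_gate: Sequence[bool], is_target: Sequence[bool]) -> Tuple[float, float]:
    """Purity: target share among gated cells. Recall: share of all target cells that were gated."""
    true_pos = sum(1 for g, t in zip(in_gate, is_target) if g and t)
    n_gated = sum(in_gate)
    n_target = sum(is_target)
    purity = true_pos / n_gated if n_gated else 0.0
    recall = true_pos / n_target if n_target else 0.0
    return purity, recall


cells = [
    {"CD34": True, "CD38": False},
    {"CD34": True, "CD38": True},
    {"CD34": True, "CD38": False},
    {"CD34": False, "CD38": False},
]
is_target = [True, False, False, True]
gated = apply_gate(cells, {"CD34": True, "CD38": False})
print(purity_recall(gated, is_target))  # (0.5, 0.5)
```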
For the simultaneous calculation of gates for all cells of the HSC and progenitor compartment, we selected cells from BM (Young1) with a CD34 surface expression higher than 0.95. We then downsampled all populations to the same number of cells and calculated the decision tree with the Rpart package, using the ‘continuous’ approach defined above.
The NRN algorithm for integrating FACS and single-cell genomics data
To project flow cytometric measurements of surface protein abundance from CD34+ cells onto the single-cell reference, we first subset the single-cell reference to exclude CD34− cells, and flow cytometry data were transformed with the logicle transform in FlowJo (v.10.7.1). Subsequently, the expression of each surface marker was normalized separately in both the flow cytometry and the Abseq dataset using a rank-based approach. In particular, ranks were computed and divided by the total number of cells; that is, data were mapped to a scale from 0 to 1, where 0 indicates the lowest and 1 the highest expression within the dataset. Within this normalized expression space, the cosine distance between every cell of the Abseq (reference) dataset and every cell of the FACS (query) dataset was computed, the four nearest reference neighbors of every query cell were identified, and the average position of these neighbors in UMAP and pseudotime space was computed using scmap65. Subsequently, the average Euclidean distance of the reference neighbors in MOFA space was computed to identify cells with inconsistent mapping results. These cells were then removed by applying a user-defined threshold (here, 8). In the case of the Smart-seq2 dataset, a total of 75 cells were removed in this way.
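The core steps of this projection can be sketched without scmap: per-marker rank normalization, cosine nearest neighbors, averaging of reference coordinates and a neighbor-consistency score. This is a simplified, brute-force illustration under several assumptions. Ties receive average ranks. The consistency score is taken as the mean pairwise Euclidean distance among the k neighbors in the latent (MOFA) space. Marker columns are in the same order in both datasets, and all names are hypothetical. For the published implementation, see the NRN repository listed under Code availability.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Matrix = List[List[float]]  # rows = cells, columns = surface markers (or coordinates)


def rank_normalize(column: Sequence[float]) -> List[float]:
    """Map one marker to (0, 1]: average rank of each value divided by the number of cells."""
    order = sorted(range(len(column)), key=lambda i: column[i])
    ranks = [0.0] * len(column)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and column[order[end + 1]] == column[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1  # 1-based average rank of the tie block
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1
    return [r / len(column) for r in ranks]


def normalize(rows: Matrix) -> Matrix:
    """Rank-normalize every marker (column) separately within one dataset."""
    columns = [rank_normalize(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*columns)]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def project(query: Matrix, reference: Matrix, ref_coords: Matrix,
            ref_latent: Matrix, k: int = 4) -> List[Tuple[List[float], float]]:
    """For each query cell: mean reference coordinates (e.g. UMAP1, UMAP2, pseudotime) of its
    k nearest neighbors, and the mean pairwise latent-space distance among those neighbors."""
    q_norm, r_norm = normalize(query), normalize(reference)
    results = []
    for cell in q_norm:
        neighbors = sorted(range(len(r_norm)), key=lambda j: cosine_distance(cell, r_norm[j]))[:k]
        coords = [sum(ref_coords[j][d] for j in neighbors) / len(neighbors)
                  for d in range(len(ref_coords[0]))]
        pairs = list(combinations(neighbors, 2))
        spread = (sum(math.dist(ref_latent[a], ref_latent[b]) for a, b in pairs) / len(pairs)
                  if pairs else 0.0)
        results.append((coords, spread))
    return results
```

Query cells whose spread exceeds a chosen cutoff (8 in the text, on the scale of the original MOFA space) would then be discarded as inconsistently mapped.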
Data visualization and definition of boxplot elements
All plots were generated using the ggplot2 (v.3.2.1) package in R 3.6.2, GraphPad Prism (v.8 and v.9.1 for MacOS) or FlowJo (v.10.7.1, BD). Boxplots are defined as follows: the middle line corresponds to the median; the lower and upper hinges correspond to the first and third quartiles, respectively; the upper whisker extends from the hinge to the largest value no further than 1.5× the interquartile range (that is, the distance between the first and third quartiles) from the hinge, and the lower whisker extends from the hinge to the smallest value at most 1.5× the interquartile range from the hinge. Data beyond the end of the whiskers are called ‘outlying’ points and are plotted individually.
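The whisker rule can be reproduced as follows. This sketch assumes quartiles computed with linear interpolation (Python's `statistics.quantiles(..., method="inclusive")`), which, to our understanding, corresponds to the default R quantile type that ggplot2's boxplot statistics rely on. GraphPad Prism and FlowJo may compute quartiles differently.

```python
import statistics
from typing import Dict, Sequence


def boxplot_stats(values: Sequence[float]) -> Dict[str, object]:
    """Median, hinges, 1.5 x IQR whiskers and outlying points (requires at least 2 values)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_whisker": min(inside),  # smallest value within 1.5 x IQR of the lower hinge
        "upper_whisker": max(inside),  # largest value within 1.5 x IQR of the upper hinge
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


print(boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
# q1 = 3.25, median = 5.5, q3 = 7.75, whiskers at 1 and 9, outlier 100
```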
Reporting Summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at 10.1038/s41590-021-01059-0.
Acknowledgements
We thank V. Lopez-Salmeron, V. Ramani, E. Kowalczyk and W. Keilholz from BD Biosciences/Multiomics for providing oligo-labeled antibodies and for their support in the implementation of the Rhapsody platform. We would like to thank members of the Haas, Velten, Trumpp and Steinmetz laboratories for helpful discussions. Moreover, we thank members of the DKFZ flow cytometry and the EMBL genomics core facility for support. This work was supported financially by the Emerson Foundation grant 643577 (to L.V.), grant PID2019-108082GA-I00 from the Spanish Ministry of Science, Innovation and Universities (MCIU/AEI/FEDER, UE), the German Bundesministerium für Bildung und Forschung (BMBF) through the Juniorverbund in der Systemmedizin ‘LeukoSyStem’ (FKZ 01ZX1911D to L.V., S.H. and S.R.), SFB873, FOR2674 and FOR2033 funded by the Deutsche Forschungsgemeinschaft (DFG), the SyTASC consortium (Deutsche Krebshilfe), The Darwin Trust of Edinburgh (to S.T.), the ERC Consolidator Grant METACELL (773089) (to T.A.), the Dietmar Hopp Foundation (all to A.T.) and the José Carreras Foundation for Leukemia Research (grant no. DCJLS 20 R/2017 to L.V., A.T. and S.H.). L.V. acknowledges the support of the Spanish Ministry of Science and Innovation to the EMBL partnership, the Centro de Excelencia Severo Ochoa and the CERCA Programme/Generalitat de Catalunya. D.N. is an endowed professor of the Deutsche José Carreras Leukämie Stiftung (DJCLS H 03/01). Contributions by D.N., J.-C.J., W.-K.H. and T.B. were supported by the Gutermuth Foundation, the H.W. & J. Hector fund, Baden-Württemberg. Figure 1a and Supplementary Note Fig. 3f were created at BioRender.com.
Author contributions
S.H., L.V. and M.P. conceived the study with help from D.H., A.T. and V.B. D.V., S.T. and M.P. performed the single-cell proteo-genomics experiments with help from D.L. and V.B. D.V. performed the experimental validations, established new experimental gating schemes and performed functional experiments with help from M.A. and P.H-M. S.T., L.J-S. and L.V. performed bioinformatics analyses with conceptual input from D.V., M.P. and S.H. S.T. developed the Abseq App. S.T. and L.V. established the NRN algorithm. S.H. supervised the experimental work with conceptual input from L.V. L.V. supervised the bioinformatics analyses with conceptual input from S.H. T.A. cosupervised S.T. M.P., D.O-R. and B.R. provided assistance in cell sorting and single-cell workflows. S.R., R.L., T.B., J-C.J, D.N., W-K.H. and C.M-T. provided clinical samples and conceptual input on data interpretation. S.H., L.V., S.T., L.J-S. and D.V. wrote the manuscript and prepared figures. All authors have carefully read the manuscript.
Funding
Open access funding provided by Deutsches Krebsforschungszentrum (DKFZ).
Data availability
Data are available for interactive browsing at https://abseqapp.shiny.embl.de. Datasets including raw and integrated gene expression data, cell type annotation, metadata and dimensionality reduction are available as Seurat v.3 objects through figshare: https://figshare.com/projects/Single-cell_proteo-genomic_reference_maps_of_the_human_hematopoietic_system/94469. FACS data are provided through figshare: https://figshare.com/projects/Supplementary_data_FACS_data_from_Single-cell_proteo-genomic_reference_maps_of_the_human_hematopoietic_system/122716. FASTQ files are available from the European Genome-Phenome Archive under accession number EGAS00001005593. Source data are provided with this paper.
Code availability
The implementation of the NRN algorithm and vignettes describing the workflow for projecting single-cell RNA-seq data on the reference are available at https://git.embl.de/triana/nrn.
Competing interests
The oligo-coupled antibodies used in this study were a gift from BD Biosciences. The authors declare no other relevant conflicts of interest.
Footnotes
Peer review information Nature Immunology thanks the anonymous reviewers for their contribution to the peer review of this work. Zoltan Fehervari was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team. Peer reviewer reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Sergio Triana, Dominik Vonficht, Lea Jopp-Saile.
Contributor Information
Lars Velten, Email: lars.velten@crg.eu.
Simon Haas, Email: simon.haas@bih-charite.de.
Extended data is available for this paper at 10.1038/s41590-021-01059-0.
Supplementary information
The online version contains supplementary material available at 10.1038/s41590-021-01059-0.
References
- 1. Stuart T, Satija R. Integrative single-cell analysis. Nat. Rev. Genet. 2019;20:257–272. doi: 10.1038/s41576-019-0093-7.
- 2. Tanay A, Regev A. Scaling single-cell genomics from phenomenology to mechanism. Nature. 2017;541:331–338. doi: 10.1038/nature21350.
- 3. Giladi A, Amit I. Single-cell genomics: a stepping stone for future immunology discoveries. Cell. 2018;172:14–21. doi: 10.1016/j.cell.2017.11.011.
- 4. Schaum N, et al. Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris. Nature. 2018;562:367–372. doi: 10.1038/s41586-018-0590-4.
- 5. Han X, et al. Mapping the mouse cell atlas by Microwell-seq. Cell. 2018;172:1091–1107.e17. doi: 10.1016/j.cell.2018.02.001.
- 6. Han X, et al. Construction of a human cell landscape at single-cell level. Nature. 2020;581:303–309. doi: 10.1038/s41586-020-2157-4.
- 7. Baccin C, et al. Combined single-cell and spatial transcriptomics reveal the molecular, cellular and spatial bone marrow niche organization. Nat. Cell Biol. 2020;22:38–48. doi: 10.1038/s41556-019-0439-6.
- 8. Van Dongen JJM, et al. EuroFlow antibody panels for standardized n-dimensional flow cytometric immunophenotyping of normal, reactive and malignant leukocytes. Leukemia. 2012;26:1908–1975. doi: 10.1038/leu.2012.120.
- 9. Velten L, et al. Human haematopoietic stem cell lineage commitment is a continuous process. Nat. Cell Biol. 2017;19:271–281. doi: 10.1038/ncb3493.
- 10. Paul F, et al. Transcriptional heterogeneity and lineage commitment in myeloid progenitors. Cell. 2015;163:1663–1677. doi: 10.1016/j.cell.2015.11.013.
- 11. Loughran SJ, Haas S, Wilkinson AC, Klein AM, Brand M. Lineage commitment of hematopoietic stem cells and progenitors: insights from recent single cell and lineage tracing technologies. Exp. Hematol. 2020;88:1–6. doi: 10.1016/j.exphem.2020.07.002.
- 12. Haas S, Trumpp A, Milsom MD. Causes and consequences of hematopoietic stem cell heterogeneity. Cell Stem Cell. 2018;22:627–638. doi: 10.1016/j.stem.2018.04.003.
- 13. Laurenti E, Göttgens B. From haematopoietic stem cells to complex differentiation landscapes. Nature. 2018;553:418–426. doi: 10.1038/nature25022.
- 14. Jacobsen SEW, Nerlov C. Haematopoiesis in the era of advanced single-cell technologies. Nat. Cell Biol. 2019;21:2–8. doi: 10.1038/s41556-018-0227-8.
- 15. Akashi K, Traver D, Miyamoto T, Weissman IL. A clonogenic common myeloid progenitor that gives rise to all myeloid lineages. Nature. 2000;404:193–197. doi: 10.1038/35004599.
- 16. Kondo M, Weissman IL, Akashi K. Identification of clonogenic common lymphoid progenitors in mouse bone marrow. Cell. 1997;91:661–672. doi: 10.1016/s0092-8674(00)80453-5.
- 17. Doulatov S, et al. Revised map of the human progenitor hierarchy shows the origin of macrophages and dendritic cells in early lymphoid development. Nat. Immunol. 2010;11:585–593. doi: 10.1038/ni.1889.
- 18. Tusi BK, et al. Population snapshots predict early haematopoietic and erythroid hierarchies. Nature. 2018;555:54–60. doi: 10.1038/nature25741.
- 19. Giladi A, et al. Single-cell characterization of haematopoietic progenitors and their trajectories in homeostasis and perturbed haematopoiesis. Nat. Cell Biol. 2018;20:836–846. doi: 10.1038/s41556-018-0121-4.
- 20. Nestorowa S, et al. A single-cell resolution map of mouse hematopoietic stem and progenitor cell differentiation. Blood. 2016;128:e20–e31. doi: 10.1182/blood-2016-05-716480.
- 21. Notta F, et al. Distinct routes of lineage development reshape the human blood hierarchy across ontogeny. Science. 2016;351:aab2116. doi: 10.1126/science.aab2116.
- 22. Perié L, Duffy KR, Kok L, De Boer RJ, Schumacher TN. The branching point in erythro-myeloid differentiation. Cell. 2015;163:1655–1662. doi: 10.1016/j.cell.2015.11.059.
- 23. Rodriguez-Fraticelli AE, et al. Clonal analysis of lineage fate in native haematopoiesis. Nature. 2018;553:212–216. doi: 10.1038/nature25168.
- 24. Haas S. Hematopoietic stem cells in health and disease—insights from single-cell multi-omic approaches. Curr. Stem Cell Rep. 2020;6:67–76.
- 25. Karamitros D, et al. Single-cell analysis reveals the continuum of human lympho-myeloid progenitor cells. Nat. Immunol. 2018;19:85–97. doi: 10.1038/s41590-017-0001-2.
- 26. Psaila B, et al. Single-cell profiling of human megakaryocyte-erythroid progenitors identifies distinct megakaryocyte and erythroid differentiation pathways. Genome Biol. 2016;17:83. doi: 10.1186/s13059-016-0939-7.
- 27. Pellin D, et al. A comprehensive single cell transcriptional landscape of human hematopoietic progenitors. Nat. Commun. 2019;10:2395. doi: 10.1038/s41467-019-10291-0.
- 28. Pei W, et al. Polylox barcoding reveals haematopoietic stem cell fates realized in vivo. Nature. 2017;548:456–460. doi: 10.1038/nature23653.
- 29. Shahi P, Kim SC, Haliburton JR, Gartner ZJ, Abate AR. Abseq: ultrahigh-throughput single cell protein profiling with droplet microfluidic barcoding. Sci. Rep. 2017;7:44447.
- 30. Stoeckius M, et al. Simultaneous epitope and transcriptome measurement in single cells. Nat. Methods. 2017;14:865–868. doi: 10.1038/nmeth.4380.
- 31. Fagnoni FF, et al. Expansion of cytotoxic CD8+ CD28− T cells in healthy ageing people, including centenarians. Immunology. 1996;88:501–507. doi: 10.1046/j.1365-2567.1996.d01-689.x.
- 32. Peters MJ, et al. The transcriptional landscape of age in human peripheral blood. Nat. Commun. 2015;6:8570. doi: 10.1038/ncomms9570.
- 33. Hanekamp D, Cloos J, Schuurhuis GJ. Leukemic stem cells: identification and clinical application. Int. J. Hematol. 2017;105:549–557. doi: 10.1007/s12185-017-2221-5.
- 34. Becht E, et al. Reverse-engineering flow-cytometry gating strategies for phenotypic labelling and high-performance cell sorting. Bioinformatics. 2019;35:301–308. doi: 10.1093/bioinformatics/bty491.
- 35. Szabo PA, et al. Single-cell transcriptomics of human T cells reveals tissue and activation signatures in health and disease. Nat. Commun. 2019;10:4706. doi: 10.1038/s41467-019-12464-3.
- 36. Takeuchi A, Saito T. CD4 CTL, a cytotoxic subset of CD4+ T cells, their differentiation and function. Front. Immunol. 2017;8:194. doi: 10.3389/fimmu.2017.00194.
- 37. Al-Sabah J, Baccin C, Haas S. Single-cell and spatial transcriptomics approaches of the bone marrow microenvironment. Curr. Opin. Oncol. 2020;32:146–153. doi: 10.1097/CCO.0000000000000602.
- 38. Frenette PS, Pinho S, Lucas D, Scheiermann C. Mesenchymal stem cell: keystone of the hematopoietic stem cell niche and a stepping-stone for regenerative medicine. Annu. Rev. Immunol. 2013;31:285–316. doi: 10.1146/annurev-immunol-032712-095919.
- 39. Macaulay IC, et al. Single-cell RNA-sequencing reveals a continuous spectrum of differentiation in hematopoietic cells. Cell Rep. 2016;14:966–977. doi: 10.1016/j.celrep.2015.12.082.
- 40. Drissen R, Thongjuea S, Theilgaard-Mönch K, Nerlov C. Identification of two distinct pathways of human myelopoiesis. Sci. Immunol. 2019;4:eaau7148. doi: 10.1126/sciimmunol.aau7148.
- 41. Görgens A, et al. Multipotent hematopoietic progenitors divide asymmetrically to create progenitors of the lymphomyeloid and erythromyeloid lineages. Stem Cell Rep. 2014;3:1058–1072. doi: 10.1016/j.stemcr.2014.09.016.
- 42. Papalexi E, Satija R. Single-cell RNA sequencing to explore immune cell heterogeneity. Nat. Rev. Immunol. 2018;18:35–45. doi: 10.1038/nri.2017.76.
- 43. Baron CS, et al. Cell type purification by single-cell transcriptome-trained sorting. Cell. 2019;179:527–542.e19. doi: 10.1016/j.cell.2019.08.006.
- 44. Wilson A, et al. Hematopoietic stem cells reversibly switch from dormancy to self-renewal during homeostasis and repair. Cell. 2008;135:1118–1129. doi: 10.1016/j.cell.2008.10.048.
- 45. Zheng S, Papalexi E, Butler A, Stephenson W, Satija R. Molecular transitions in early progenitors during human cord blood hematopoiesis. Mol. Syst. Biol. 2018;14:e8041. doi: 10.15252/msb.20178041.
- 46. Hua P, et al. Single-cell analysis of bone marrow–derived CD34+ cells from children with sickle cell disease and thalassemia. Blood. 2019;134:2111–2115. doi: 10.1182/blood.2019002301.
- 47. van Galen P, et al. Single-cell RNA-Seq reveals AML hierarchies relevant to disease progression and immunity. Cell. 2019;176:1265–1281.e24. doi: 10.1016/j.cell.2019.01.031.
- 48. van Dijk D, et al. Recovering gene interactions from single-cell data using data diffusion. Cell. 2018;174:716–729.e27. doi: 10.1016/j.cell.2018.05.061.
- 49. Schraivogel D, et al. Targeted Perturb-seq enables genome-scale genetic screens in single cells. Nat. Methods. 2020;17:629–635. doi: 10.1038/s41592-020-0837-5.
- 50. Erickson JR, et al. AbSeq protocol using the nano-well cartridge-based rhapsody platform to generate protein and transcript expression data on the single-cell level. STAR Protoc. 2020;1:100092. doi: 10.1016/j.xpro.2020.100092.
- 51. Hennig BP, et al. Large-scale low-cost NGS library preparation using a robust Tn5 purification and tagmentation protocol. G3 (Bethesda). 2018;8:79–89. doi: 10.1534/g3.117.300257.
- 52. Stuart T, et al. Comprehensive integration of single-cell data. Cell. 2019;177:1888–1902.e21. doi: 10.1016/j.cell.2019.05.031.
- 53. Hie B, Bryson B, Berger B. Efficient integration of heterogeneous single-cell transcriptomes using Scanorama. Nat. Biotechnol. 2019;37:685–691. doi: 10.1038/s41587-019-0113-3.
- 54. Kuhn M. Building predictive models in R using the caret package. J. Stat. Softw. 2008;28:1–26.
- 55. Argelaguet R, et al. Multi‐omics factor analysis—a framework for unsupervised integration of multi‐omics data sets. Mol. Syst. Biol. 2018;14:e8124. doi: 10.15252/msb.20178124.
- 56. Bray NL, Pimentel H, Melsted P, Pachter L. Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol. 2016;34:525–527. doi: 10.1038/nbt.3519.
- 57. McCarthy DJ, Campbell KR, Lun ATL, Wills QF. Scater: pre-processing, quality control, normalization and visualization of single-cell RNA-seq data in R. Bioinformatics. 2017;33:1179–1186. doi: 10.1093/bioinformatics/btw777.
- 58. Street K, et al. Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics. BMC Genomics. 2018;19:477. doi: 10.1186/s12864-018-4772-0.
- 59. Van den Berge K, et al. Trajectory-based differential expression analysis for single-cell sequencing data. Nat. Commun. 2020;11:1201. doi: 10.1038/s41467-020-14766-3.
- 60. Hoffman GE, Schadt EE. variancePartition: interpreting drivers of variation in complex gene expression studies. BMC Bioinf. 2016;17:483. doi: 10.1186/s12859-016-1323-z.
- 61. Risso D, Perraudeau F, Gribkova S, Dudoit S, Vert JP. A general and flexible method for signal extraction from single-cell RNA-seq data. Nat. Commun. 2018;9:284. doi: 10.1038/s41467-017-02554-5.
- 62. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15:550. doi: 10.1186/s13059-014-0550-8.
- 63. Vehtari A, Gelman A, Gabry J. Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Stat. Comput. 2017;27:1413–1432.
- 64. Otsu N. Threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 1979;9:62–66.
- 65. Kiselev VY, Yiu A, Hemberg M. Scmap: projection of single-cell RNA-seq data across data sets. Nat. Methods. 2018;15:359–362. doi: 10.1038/nmeth.4644.